Image source: GenBank Submission Handbook.

Quality Control - methods

  1. Data formatting guidelines for repositories ensure that the submitted data will be compatible with the database design.
  2. Automated data retrieval from other databases ensures that the same procedure is used each time data are downloaded. This minimizes data quality concerns, but some remain:
    1. Factors that affect data transmission and import into the receiving database.
    2. An incorrectly entered record or identifier in the collaborating database may create an incorrect record in the receiving database. Because databases differ in funding and technical support, this may be a greater concern for some smaller databases than for those maintained by nationally supported institutions.
  3. Manual curation procedures should be documented to minimize the variability in subjective decisions by curators. For example:
    1. Curators read the scientific literature and extract information to enter into a database. They prioritize and select papers to review, and then must distinguish experimentally supported information from inferential assertions. Automated curation may unintentionally capture inferential assertions (Hirschman et al., 2010).
    2. One gene name may have multiple literature-based synonyms and a synonym may be associated with different genes (Howe et al., 2008).
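
The synonym problem in the last item can be illustrated with a minimal sketch. It assumes curated records are available as a mapping from an official gene symbol to its literature-based synonyms. The symbols, the synonyms, and the function names below are hypothetical and chosen only for illustration; they do not come from any real database. The sketch inverts the mapping and flags every synonym that points to more than one gene, so that a curator can review it.

```python
from collections import defaultdict

# Hypothetical curated records: official gene symbol -> synonyms from the literature.
# These names are invented for illustration and are not real gene records.
GENE_SYNONYMS = {
    "GENE1": ["abc1", "xyz"],
    "GENE2": ["abc1", "def2"],
    "GENE3": ["ghi3"],
}


def build_synonym_index(gene_synonyms):
    """Invert the mapping: lowercased synonym -> set of official symbols that use it."""
    index = defaultdict(set)
    for symbol, synonyms in gene_synonyms.items():
        for synonym in synonyms:
            index[synonym.lower()].add(symbol)
    return index


def ambiguous_synonyms(index):
    """Return the synonyms associated with more than one gene, for curator review."""
    return {syn: sorted(genes) for syn, genes in index.items() if len(genes) > 1}


index = build_synonym_index(GENE_SYNONYMS)
flagged = ambiguous_synonyms(index)
print(flagged)  # {'abc1': ['GENE1', 'GENE2']}
```

Lowercasing the synonyms is a design choice made here so that spelling variants such as "ABC1" and "abc1" are treated as the same name. A real curation workflow might need stricter or looser matching rules.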