Workpackage 4

Data representation

WP4 marks the beginning of the technical part of Phase 2 of the Project and will assess the data collected during the consultation phase concerning data representation, ontologies and semantics.


  • Assessment of technical aspects of database interoperability as a barrier to scientific and financial sustainability.
  • Assessment of the variability in practice in the semantics of biological data representation (e.g. genotype, gene expression).
  • Assessment of emerging standards and current practice for data representation, annotation and ontologies.

Description of work

The WP marks the beginning of the technical part of Phase 2. With the expert technical help of the Web services and Database Coordinator, and working intensively with other partners and with the members of the IAG working on the relevant data standards, it will assess the following data from the Phase 1 list concerning current practice in existing databases:
    • Assess the requirements that data representation at each level of biological organisation will need to satisfy in order to support data exchange, database interoperability and long-term sustainability. Results from this analysis will feed into a technical feasibility study.
    • List and collate the emerging international standards and ontologies for representing each kind of data. Assess these in relation to the requirements analysis. Particular attention will be paid to the inter-relation of mouse and human/clinical data.
    • For each kind of data and aspect of representation, analyse and collate the consistencies and differences in representation across databases. This work will include an assessment of how the wide range of quantitative and qualitative mouse phenotype and pathology data is represented.
    • Using a similar approach to that above, outline the modes of data representation used by mouse-centric databases outside Europe and by databases for other model organisms. Particular emphasis will be placed on relevant human-centric databases. This work will be confined to providing the information needed to identify critical problems in interoperability between European mouse-centric databases and other databases with which they need to interact.
    • In the light of these results and in relation to requirements and emerging standards, identify the critical differences and deficiencies in the current representations of the same kinds of data across European mouse-centric databases that may limit data exchange and interoperability.
    • Similarly, identify the critical differences and deficiencies in current representations of different kinds of data that may limit exchange and interoperability between European mouse-centric databases.
    • Examine the sustainability and intrinsic flexibility of data representations as computational methods, experimental approaches and biological understanding change.
    • Assess the problems that arise from the need to import legacy data from older databases or experiments following the introduction of, and ongoing changes to, data models.
    • From the above work, report on the critical problems arising from data representation in European mouse-centric databases that will need to be addressed to ensure data exchange, interoperability and long-term sustainability. Working closely with groups involved in establishing international standards, lay out options and suggest possible strategic and practical solutions.

