The Protein Ontology
Here we discuss the elements of the Protein Ontology (PO) that are used to integrate protein data formats and to provide a structured, unified vocabulary for representing protein synthesis concepts. These elements integrate heterogeneous protein and biological data sources. They turn the large volumes of data collected by geneticists and molecular biologists into information that scientists, physicians, other health care professionals and researchers can use to understand the relationships inside protein molecules, the interactions between two protein molecules, and the interactions between proteins and other macromolecules at the cellular level.
PO consists of concepts (or classes), which are data descriptors for proteomics data, and the relationships among these concepts. PO has:
(1) a hierarchical classification of concepts represented as classes, from general to specific;
(2) a list of attributes related to each concept, for each class;
(3) a set of relationships between classes that link concepts in the ontology in more complex ways than those implied by the hierarchy, promoting reuse of concepts in the ontology; and
(4) a set of algebraic operators for querying protein ontology instances.
In this section we briefly discuss the concepts and relationships that make up the Protein Ontology; these Protein Ontology elements are elaborated further later on. The Protein Ontology Algebra and Query Language, which forms the fourth element of PO, is not discussed here, because this page is intended purely as documentation of PO concepts and relationships. For more information on the algebraic operators used by PO, see the PO Algebra page.

2. Generic Concepts of Protein Ontology

There are seven concepts of PO, called Generic Concepts, that are used to define complex PO concepts: Residues, Chains, Atoms, Family, Bind, AtomicBind and SiteGroup.

3. Derived Concepts of Protein Ontology

PO describes protein data for proteins in any organism using derived concepts, which are formed from the generic concepts.

3.1 Derived Concepts for Protein Entry Details

PO describes the Protein Complex Entry and the molecules contained in the protein complex using the Entry concept and its sub concepts Description, Molecule and Reference. Molecule reuses the generic concept Chain to link the molecules in a protein complex to their chains of residue sequences.

3.2 Derived Concepts for Protein Sequence and Structure Details

Protein sequence and structure data are described using the Structure concept in PO, with sub concepts ATOMSequence and UnitCell. ATOMSequence represents protein sequence and structure and is made up of the generic concepts Chain, Residue and Atom. Protein crystallography data is described using the UnitCell concept.

3.3 Derived Concepts for Structural Folds and Domains in Proteins

Protein structural folds and domains are defined in PO using the derived concept StructuralDomains. The family and superfamily of the organism in which the protein is present are represented in StructuralDomains by reference to the generic concept Family. Structural folds in a protein are represented by the sub concepts Helices, Sheets and OtherFolds. Each definition of structural folds and domains also reuses the generic concepts Chain and Residue to describe the secondary structure of proteins. Helix, which is a sub concept of Helices, identifies a helix. Helix has a sub concept HelixStructure that gives the detailed composition of the helix. In this way PO separates the concepts that identify a secondary structure in a protein from the concepts that describe its structure. Other secondary structures of proteins, such as sheets and turns (or loops), are represented in a similar way. The Sheets concept has a sub concept Sheet that identifies a sheet.
Sheet has a sub concept Strands that describes the detailed structure of a sheet. Similarly, turns in protein structure are represented in PO using the OtherFolds concept: Turn is a sub concept of OtherFolds that identifies a turn, and TurnStructure describes its structure. Turns are categorized as OtherFolds in Protein Ontology because they are less frequent than helices and sheets in protein structure.

3.4 Derived Concepts for Functional Domains in Proteins

PO has the first Functional Domain Classification Model for proteins, defined using the derived concept FunctionalDomains. As in StructuralDomains, the family and superfamily of the organism in which the protein is present are represented in FunctionalDomains by reference to the generic concept Family. FunctionalDomains describes the cellular and organism source of the protein using the SourceCell sub concept, the biological functionality of the protein using the BiologicalFunction sub concept, and the active binding sites in the protein using the ActiveBindingSites sub concept. Active binding sites are represented in PO as a collection of site groups, defined using the generic concept SiteGroup.

3.5 Derived Concepts for Chemical Bonds in Proteins

The chemical bonds that bind substructures in a complex protein structure are defined using the ChemicalBonds concept in PO. Chemical bonds are defined by the sub concepts DisulphideBond, CISPeptide, HydrogenBond, ResidueLink and SaltBridge, which are built from the generic concepts Bind and AtomicBind. Chemical bonds that have binding residues (DisulphideBond, CISPeptide) reuse the generic concept Bind. Similarly, chemical bonds that have binding atoms (HydrogenBond, ResidueLink and SaltBridge) reuse the generic concept AtomicBind.

3.6 Derived Concepts for Constraints Affecting the Protein Structural Conformation

Constraints that affect the final structural conformation of a protein are defined using the Constraints concept of PO.
The constraints currently described in PO are: monogenetic and polygenetic defects in the genes present in the molecules that make up proteins, described using the GeneticDefects sub concept; hydrophobic properties of proteins, described using the Hydrophobicity sub concept; and modifications in residue sequences due to changes in the chemical environment and mutations, described using the ModifiedResidue sub concept.

4. Relationships in Protein Ontology

Annotating systems do not normally interpret the semantics of protein data, since they are not aware of the specific structural, chemical and cellular interactions of protein complexes. The Protein Ontology Framework provides a specific set of rules to cover these application-specific semantics. The rules use only relationships whose semantics are predefined in PO to establish correspondence among terms. The set of relationships with predefined semantics is:

5. Protein Ontology as a Structured Hierarchy

Protein Ontology consists of a hierarchical classification of the concepts discussed above, represented as classes from general to specific. In PO, the notions of classification, reasoning and consistency are applied by defining new concepts from the defined generic concepts. The concepts derived from generic concepts are placed precisely into the class hierarchy of Protein Ontology so that together they completely represent the information defining a protein complex, as depicted in Figure 1.
Figure 1: Class Hierarchy of Protein Ontology

The root concept in PO is ProteinOntology. For each protein instance entered into PO, the submission information is entered for the ProteinOntology concept. The ProteinOntology concept is described as:
7. Protein Ontology Generic Concepts

There are seven concepts of PO, called Generic Concepts, that are used to define complex PO concepts. They are described in the following subsections.

7.1 Residues Concept

A residue is a portion of a larger molecule; in biochemistry and molecular biology, for example, a residue is a specific monomer of a polysaccharide, protein or nucleic acid. In the context of Protein Ontology, a residue represents one of the 20 amino acids that make up a protein sequence, written as a three-letter code such as CYS. The Residues concept in PO is described as:
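The three-letter residue codes mentioned above can be illustrated with a small Python sketch that checks whether codes in a sequence belong to the 20 standard amino acids. It is not part of PO: `STANDARD_RESIDUES`, `is_standard_residue` and `non_standard` are hypothetical names chosen for this example.

```python
# The 20 standard amino acids keyed by their three-letter codes (e.g. CYS).
STANDARD_RESIDUES = {
    "ALA": "Alanine", "ARG": "Arginine", "ASN": "Asparagine", "ASP": "Aspartic acid",
    "CYS": "Cysteine", "GLN": "Glutamine", "GLU": "Glutamic acid", "GLY": "Glycine",
    "HIS": "Histidine", "ILE": "Isoleucine", "LEU": "Leucine", "LYS": "Lysine",
    "MET": "Methionine", "PHE": "Phenylalanine", "PRO": "Proline", "SER": "Serine",
    "THR": "Threonine", "TRP": "Tryptophan", "TYR": "Tyrosine", "VAL": "Valine",
}


def is_standard_residue(code: str) -> bool:
    """Hypothetical helper: True if `code` is one of the 20 standard three-letter codes."""
    return code.strip().upper() in STANDARD_RESIDUES


def non_standard(codes):
    """Return the codes in a residue sequence that are not standard amino acids."""
    return [c for c in codes if not is_standard_residue(c)]
```

Flagging non-standard codes this way is one simple way to spot modified residues in a sequence, which PO handles separately through the ModifiedResidue sub concept.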
7.2 Chains Concept

Chains in Protein Ontology describe the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied. In protein data sources they are simply represented as a sequence of residues, such as the SEQRES records in PDB format. The Chains concept in PO is described as:
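Since chains are stored as residue sequences such as PDB SEQRES records, a minimal Python sketch of collecting those residues per chain is shown below. It relies on the fixed PDB column layout (chain identifier in column 12, residue names from column 20 to column 70); the function name `parse_seqres` is hypothetical and not part of PO.

```python
from collections import defaultdict


def parse_seqres(lines):
    """Collect SEQRES residue names per chain ID, in file order.

    Uses fixed PDB columns: chain ID in column 12 and residue names in
    columns 20-70 (Python slices [11] and [19:70]).
    """
    chains = defaultdict(list)
    for line in lines:
        if not line.startswith("SEQRES"):
            continue  # skip every other record type
        chain_id = line[11]
        chains[chain_id].extend(line[19:70].split())
    return dict(chains)
```

Because a long chain spans several SEQRES lines, the sketch appends residues in order rather than overwriting them; a fuller parser might also compare the count with the numRes field.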
7.3 Atoms Concept

Atom records in protein databases give the atomic coordinates for standard residues, together with the occupancy and temperature factor of each atom. The Atoms concept in PO is described as:
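To show what an atom record carries, here is a minimal Python sketch that reads the coordinates, occupancy and temperature factor from one PDB ATOM line using the fixed PDB column positions. `AtomRecord` and `parse_atom` are hypothetical names; PO's own attribute names for the Atoms concept may differ.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """Fields of one PDB ATOM record (hypothetical container, not a PO class)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM line by fixed PDB columns (1-based columns in comments)."""
    return AtomRecord(
        serial=int(line[6:11]),           # columns 7-11
        name=line[12:16].strip(),         # columns 13-16
        res_name=line[17:20].strip(),     # columns 18-20
        chain_id=line[21],                # column 22
        res_seq=int(line[22:26]),         # columns 23-26
        x=float(line[30:38]),             # columns 31-38
        y=float(line[38:46]),             # columns 39-46
        z=float(line[46:54]),             # columns 47-54
        occupancy=float(line[54:60]),     # columns 55-60
        temp_factor=float(line[60:66]),   # columns 61-66
    )
```

Slicing by column rather than splitting on whitespace matters here, because adjacent PDB fields can run together when values are wide.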
7.4 Family Concept

A protein family is a group of evolutionarily related proteins, and a superfamily is a group of protein families. More details about the classification of proteins into families and superfamilies can be found in SCOP. The Family concept in PO is described as:
7.5 Bind Concept

Data about both binding residues in chemical bonds such as disulphide bonds and cis peptides is entered into PO as the Bind concept. Records in protein databases identify and describe each disulphide bond, and each proline or other peptide found in the cis conformation in protein and polypeptide structures, by identifying the two residues involved in the bond. The Bind concept in PO is described as:
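A bond recorded through Bind is essentially a pair of residues. The Python sketch below, which is illustrative and not PO's definition, models that pair and adds one sanity check for disulphide bonds, which link two cysteine residues. `ResidueRef`, `Bind` and `is_valid_disulphide` are hypothetical names, and the check is an assumption added for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueRef:
    """One binding residue: residue name, chain ID and sequence number."""
    res_name: str
    chain_id: str
    res_seq: int


@dataclass(frozen=True)
class Bind:
    """Hypothetical Bind record: the two residues involved in a bond."""
    first: ResidueRef
    second: ResidueRef


def is_valid_disulphide(bind: Bind) -> bool:
    """A disulphide bond joins two distinct cysteine (CYS) residues."""
    both_cys = bind.first.res_name == "CYS" and bind.second.res_name == "CYS"
    return both_cys and bind.first != bind.second
```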
7.6 AtomicBind Concept

Data about the binding atoms in chemical bonds such as hydrogen bonds, residue links and salt bridges is entered into PO as the AtomicBind concept. Records in protein databases identify and describe each hydrogen bond, residue link and salt bridge found in protein and polypeptide structures by identifying the two atoms involved in the bond. The AtomicBind concept in PO is described as:
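The split between residue-level and atom-level bonds (section 3.5) can be sketched in Python as a lookup table plus an atom-pair record that refuses bond types belonging to Bind. Everything here is hypothetical (`BOND_GENERIC_CONCEPT`, `AtomRef`, `AtomicBind`); the distance method is an added illustration of why storing the two atoms' coordinates is useful, not something the text specifies.

```python
import math
from dataclasses import dataclass

# Which generic concept each chemical-bond sub concept reuses, per section 3.5.
BOND_GENERIC_CONCEPT = {
    "DisulphideBond": "Bind",
    "CISPeptide": "Bind",
    "HydrogenBond": "AtomicBind",
    "ResidueLink": "AtomicBind",
    "SaltBridge": "AtomicBind",
}


@dataclass(frozen=True)
class AtomRef:
    """One binding atom: atom name, residue, and (x, y, z) coordinates in angstroms."""
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    xyz: tuple


@dataclass(frozen=True)
class AtomicBind:
    """Hypothetical AtomicBind record: the two atoms involved in a bond."""
    bond_type: str
    first: AtomRef
    second: AtomRef

    def __post_init__(self):
        # Only atom-level bond types are allowed here; residue-level ones use Bind.
        if BOND_GENERIC_CONCEPT.get(self.bond_type) != "AtomicBind":
            raise ValueError(f"{self.bond_type} does not use AtomicBind")

    def distance(self) -> float:
        """Euclidean distance between the two bound atoms."""
        return math.dist(self.first.xyz, self.second.xyz)
```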
7.7 SiteGroup Concept

Groups of residues comprising important sites in the macromolecule are identified using the Site parameter, which is represented as a string in protein databases. Site groups specify the residues that make up catalytic, cofactor, anticodon, regulatory or other important sites. The SiteGroup concept in PO is described as:
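Because a site group is simply a set of residues collected under one site identifier, a minimal Python sketch of building site groups from flat (site, residue) records looks like this. The input tuple layout and the name `group_sites` are assumptions for illustration, not PO's or any database's format.

```python
from collections import defaultdict


def group_sites(site_residues):
    """Group (site_id, res_name, chain_id, res_seq) tuples into site groups.

    Returns a dict mapping each site identifier to its residues, keeping
    the input order both for sites and for residues within a site.
    """
    groups = defaultdict(list)
    for site_id, res_name, chain_id, res_seq in site_residues:
        groups[site_id].append((res_name, chain_id, res_seq))
    return dict(groups)
```

A collection of such groups corresponds to what section 3.4 calls ActiveBindingSites.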
A protein complex is a group of two or more associated proteins. Protein complexes are a form of quaternary structure. The study of the sequence, structure and functional interactions of proteins, often referred to as proteomics, is an important research focus in biochemistry. The main concept for defining protein complexes in the Protein Ontology is ProteinComplex. The ProteinComplex concept defines one or more proteins in the complex molecule and is described as:
The six sub concepts of ProteinComplex (Entry, Structure, StructuralDomains, FunctionalDomains, ChemicalBonds and Constraints) together define the sequence, structure, function and chemical bindings of the protein complex, and so provide a complete picture of the sequence, structure and functional interactions of its proteins.

9. Protein Complex Sub Concepts

PO describes protein entry, sequence, structure, function, chemical bindings and constraints using the Entry, Structure, StructuralDomains, FunctionalDomains, ChemicalBonds and Constraints concepts, which are derived from the generic concepts discussed earlier.

9.1 Entry Concept

Entry describes the experiment and the biological macromolecules present in the protein. The Entry concept is described in PO as:
The Protein Complex Entry and the molecules contained in the protein complex are described using the sub concepts of Entry: Description, Molecule and Reference.

9.1.1 Description Concept

Data and information about the protein entry and the experiments that identify the protein are described as follows:
9.1.2 Molecule Concept

The Molecule concept specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name (for example, genus and species). The strain and/or cell line of immortalized cells is given when it helps to uniquely identify the biological entity studied. The Molecule concept is described as:
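The rule that strain or cell line appear only when they help identify the source can be sketched in Python as optional fields that are printed only when set. `MoleculeSource` and its field names are hypothetical and do not reproduce PO's attribute list for the Molecule concept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoleculeSource:
    """Hypothetical record for the source of one molecule in an entry."""
    common_name: str
    genus: str
    species: str
    strain: Optional[str] = None
    cell_line: Optional[str] = None

    @property
    def scientific_name(self) -> str:
        return f"{self.genus} {self.species}"

    def label(self) -> str:
        """Readable source name; strain and cell line are added only when present."""
        extras = [x for x in (self.strain, self.cell_line) if x]
        base = f"{self.scientific_name} ({self.common_name})"
        return f"{base}, {', '.join(extras)}" if extras else base
```

For example, `MoleculeSource("human", "Homo", "sapiens", cell_line="HeLa").label()` gives "Homo sapiens (human), HeLa", while a source with no strain or cell line is labelled by its names alone.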
9.1.3 Reference Concept

The Reference concept describes the literature references for a protein entry as follows: