The Protein Ontology

Protein Ontology Project Guide

This guide describes the elements of the Protein Ontology (PO) that integrate protein data formats and provide a structured, unified vocabulary for representing protein synthesis concepts. These elements integrate heterogeneous protein and biological data sources. They convert the large volumes of data collected by geneticists and molecular biologists into information that scientists, physicians, and other health care professionals and researchers can use to understand the relationships inside protein molecules, the interactions between two protein molecules, and the interactions between proteins and other macromolecules at the cellular level.

Contents

  1. Protein Ontology Elements

  2. Generic Concepts of Protein Ontology

  3. Derived Concepts of Protein Ontology
    3.1 Derived Concepts for Protein Entry Details
    3.2 Derived Concepts for Protein Sequence and Structure Details
    3.3 Derived Concepts for Structural Folds and Domains in Proteins
    3.4 Derived Concepts for Functional Domains in Proteins
    3.5 Derived Concepts for Chemical Bonds in Proteins
    3.6 Derived Concepts for Constraints affecting the Protein Structural Conformation

  4. Relationships in Protein Ontology

  5. Protein Ontology as a Structured Hierarchy

  6. Protein Ontology Concept

  7. Protein Ontology Generic Concepts
    7.1 Residues Concept
    7.2 Chains Concept
    7.3 Atoms Concept
    7.4 Family Concept
    7.5 Bind Concept
    7.6 AtomicBind Concept
    7.7 SiteGroup Concept

  8. Protein Complex Concept

  9. Protein Complex Sub Concepts
    9.1 Entry Concept
      9.1.1 Description Concept
      9.1.2 Molecule Concept
      9.1.3 Reference Concept
    9.2 Structure Concept
      9.2.1 ATOMSequence Concept
      9.2.2 UnitCell Concept
    9.3 StructuralDomains Concept
      9.3.1 Helices Concept
      9.3.2 Sheets Concept
      9.3.3 OtherFolds Concept
    9.4 FunctionalDomains Concept
      9.4.1 SourceCell Concept
      9.4.2 BiologicalFunction Concept
      9.4.3 ActiveBindingSites Concept
    9.5 ChemicalBonds Concept
      9.5.1 DisulphideBond Concept
      9.5.2 CISPeptide Concept
      9.5.3 HydrogenBond Concept
      9.5.4 ResidueLink Concept
      9.5.5 SaltBridge Concept
    9.6 Constraints Concept
      9.6.1 GeneticDefects Concept
      9.6.2 Hydrophobicity Concept
      9.6.3 ModifiedResidue Concept

  10. Protein Ontology Relationships
    10.1 SubClassOf Relationship
    10.2 PropertyOf Relationship
    10.3 PartOf Relationship
    10.4 InstanceOf Relationship
    10.5 ValueOf Relationship
    10.6 Sequence Relationships
      10.6.1 Sequences defining PO Structure Concepts
      10.6.2 Sequences defining PO StructuralDomains Concepts
      10.6.3 Sequences defining PO ChemicalBonds Concepts

1. Protein Ontology Elements

PO consists of concepts (or classes), which are data descriptors for proteomics data, and the relationships among these concepts. PO has (1) a hierarchical classification of concepts represented as classes, from general to specific; (2) a list of attributes for each concept (class); (3) a set of relationships between classes that link concepts in more complicated ways than the hierarchy implies, to promote reuse of concepts in the ontology; and (4) a set of algebraic operators for querying protein ontology instances. This section briefly introduces the concepts and relationships that make up the Protein Ontology; later sections elaborate on them. The Protein Ontology Algebra and Query Language, which forms the fourth element of PO, is not discussed here, because this guide documents only PO concepts and relationships. For more information on the algebraic operators used by PO, see the PO Algebra page.

2. Generic Concepts of Protein Ontology

Seven concepts of PO, called Generic Concepts, are used to define complex PO Concepts: {Residues, Chains, Atoms, Family, AtomicBind, Bind, and SiteGroup}. We now briefly describe these generic concepts.
    Details and properties of residues in a protein sequence are defined by instances of the Residues Concept. Instances of chains of residues are defined in the Chains Concept. All three-dimensional structure data of protein atoms is represented as instances of the Atoms Concept. Defining Chains, Residues and Atoms as individual concepts has the benefit that any special properties or changes affecting a particular chain, residue or atom can be added easily. The Family Concept represents the Protein Super Family and Family details of proteins. Data about binding atoms in chemical bonds such as Hydrogen Bonds, Residue Links, and Salt Bridges is entered into the ontology as an instance of the AtomicBind Concept. Similarly, data about binding residues in chemical bonds such as Disulphide Bonds and CIS Peptides is entered into the ontology as an instance of the Bind Concept. The definitions of the generic concepts AtomicBind and Bind again reuse the generic concepts of Chain, Residue, and Atom. All data related to site groups of the active binding sites of proteins is defined as instances of the SiteGroup Concept. In PO the notions of classification, reasoning, and consistency are applied by defining new concepts from the defined generic concepts. The concepts derived from generic concepts are placed precisely into the class hierarchy of the Protein Ontology to completely represent the information defining a protein complex.

3. Derived Concepts of Protein Ontology

PO provides description for protein data that can be used to describe proteins in any organism using derived concepts formed of the generic concepts:

3.1 Derived Concepts for Protein Entry Details

PO describes the Protein Complex Entry and the Molecules contained in the Protein Complex using the Entry Concept and its sub concepts Description, Molecule and Reference. Molecule reuses the generic concept of Chain to link the molecules in a protein complex to chains of residue sequences.

3.2 Derived Concepts for Protein Sequence and Structure Details

Protein Sequence and Structure data are described using Structure concept in PO with sub concepts ATOMSequence and UnitCell. ATOMSequence represents protein sequence and structure and is made of generic concepts of Chain, Residue and Atom. Protein Crystallography Data is described using the UnitCell Concept.

3.3 Derived Concepts for Structural Folds and Domains in Proteins

Protein Structural Folds and Domains are defined in PO using the derived concept of StructuralDomains. Family and Super Family of the organism in which the protein is present are represented in StructuralDomains by reference to the generic concept of Family. Structural Folds in a protein are represented by the sub concepts Helices, Sheets and OtherFolds. Each definition of structural folds and domains also reuses the generic concepts of Chain and Residue to describe the secondary structure of proteins. Helix, which is a sub concept of Helices, identifies a helix. Helix has a sub concept HelixStructure that gives the detailed composition of the helix. In this way PO separates the concept that identifies a secondary structure from the concept that describes its composition. Other secondary structures of proteins, such as sheets and turns (or loops), are represented in a similar way. Sheets has a sub concept Sheet that identifies a sheet. Sheet has a sub concept Strands that describes the detailed structure of a sheet. Similarly, turns in protein structure are represented in PO using the OtherFolds Concept. Turn is a sub concept of OtherFolds that identifies a turn, and TurnStructure describes its structure. Turns are categorized as OtherFolds in Protein Ontology because they are less frequent than Helices and Sheets in protein structure.

3.4 Derived Concepts for Functional Domains in Proteins

PO has the first Functional Domain Classification Model for proteins, defined using the derived concept of FunctionalDomains. As in StructuralDomains, Family and Super Family of the organism in which the protein is present are represented in FunctionalDomains by reference to the generic concept of Family. FunctionalDomains describes the cellular and organism source of the protein using the SourceCell sub concept, the biological functionality of the protein using the BiologicalFunction sub concept, and the active binding sites in the protein using the ActiveBindingSites sub concept. Active Binding Sites are represented in PO as a collection of Site Groups, defined using the SiteGroup generic concept.

3.5 Derived Concepts for Chemical Bonds in Proteins

The chemical bonds that bind substructures in a complex protein structure are defined using the ChemicalBonds concept in PO. Chemical bonds are defined by their respective sub concepts: DisulphideBond, CISPeptide, HydrogenBond, ResidueLink, and SaltBridge. They are defined using the generic concepts of Bind and AtomicBind. Chemical bonds that have binding residues (DisulphideBond, CISPeptide) reuse the generic concept of Bind. Similarly, chemical bonds that have binding atoms (HydrogenBond, ResidueLink, and SaltBridge) reuse the generic concept of AtomicBind.
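
The split between residue-level and atom-level bonds can be written down as a small lookup. The following Python sketch is only an illustration: the concept names come from the text above, while the dictionary and the function generic_concept_for are hypothetical helpers and not part of PO.

```python
# Hypothetical lookup: which generic concept each ChemicalBonds sub concept reuses.
# The names are taken from the text; the table itself is an illustration.
BOND_TO_GENERIC_CONCEPT = {
    "DisulphideBond": "Bind",        # bonds described by binding residues
    "CISPeptide": "Bind",
    "HydrogenBond": "AtomicBind",    # bonds described by binding atoms
    "ResidueLink": "AtomicBind",
    "SaltBridge": "AtomicBind",
}


def generic_concept_for(bond_type: str) -> str:
    """Return the generic concept reused by a ChemicalBonds sub concept."""
    try:
        return BOND_TO_GENERIC_CONCEPT[bond_type]
    except KeyError:
        raise ValueError(f"unknown ChemicalBonds sub concept: {bond_type}") from None
```

For example, generic_concept_for("SaltBridge") returns "AtomicBind", and an unknown name raises ValueError instead of silently defaulting to one of the two concepts.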

3.6 Derived Concepts for Constraints affecting the Protein Structural Conformation

The constraints that affect the final protein structural conformation are defined using the Constraints concept of PO. The constraints currently described in PO are: monogenetic and polygenetic defects in the genes of the molecules that make up proteins, described using the GeneticDefects sub concept; hydrophobic properties of proteins, described using the Hydrophobicity sub concept; and modifications in residue sequences due to changes in chemical environment and mutations, described using the ModifiedResidue sub concept.

4. Relationships in Protein Ontology

Semantics in protein data is normally not interpreted by annotating systems, since they are not aware of the specific structural, chemical and cellular interactions of protein complexes. The Protein Ontology Framework provides a specific set of rules to cover these application-specific semantics. The rules use only the relationships whose semantics are predefined in PO to establish correspondence among terms. The set of relationships with predefined semantics is: {SubClassOf, PartOf, AttributeOf, InstanceOf, and ValueOf}. PO conceptual modelling encourages the use of strictly typed relations with precisely defined semantics. Some of these relationships (such as SubClassOf and InstanceOf) are somewhat similar to those in RDF Schema, but the set of relationships with defined semantics in the PO conceptual model is kept small to maintain the simplicity of the model.
    The following is a brief description of the predefined semantic relationships in the common PO conceptual model. The SubClassOf relationship indicates that one concept is a specialization of another concept. The AttributeOf relationship indicates that a concept is an attribute of another concept. The PartOf relationship indicates that a concept is a part of another concept. The InstanceOf relationship indicates that an object is an instance of a concept. The ValueOf relationship indicates the value of an attribute of an object. By themselves, the relationships described above do not impose an order among the children of a node. PO therefore defines a special relationship called Sequence(s) to describe and impose order in the complex concepts that define the Structure, Structural Folds and Domains, and Chemical Bonds of proteins.
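
The difference between the five unordered relationships and the ordered Sequence relationship can be sketched in Python. This is a minimal illustration under stated assumptions: the classes Relation, Statement and Sequence are hypothetical, and the example pairings (Entry PartOf ProteinComplex, and the child order inside ATOMSequence) are chosen for illustration only and are not taken from the PO definition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Relation(Enum):
    """The five relationships with predefined semantics named in the text."""
    SUBCLASS_OF = "SubClassOf"
    PART_OF = "PartOf"
    ATTRIBUTE_OF = "AttributeOf"
    INSTANCE_OF = "InstanceOf"
    VALUE_OF = "ValueOf"


@dataclass(frozen=True)
class Statement:
    """One typed link between two terms; carries no ordering information."""
    subject: str
    relation: Relation
    target: str


@dataclass(frozen=True)
class Sequence:
    """Hypothetical model of the Sequence relationship: ordered children."""
    parent: str
    children: Tuple[str, ...]

    def position(self, child: str) -> int:
        """Return the zero-based position of a child within the sequence."""
        return self.children.index(child)


# A set of statements is unordered: nothing says which child comes first.
# The pairings below are assumptions made for this example.
statements = {
    Statement("Entry", Relation.PART_OF, "ProteinComplex"),
    Statement("Structure", Relation.PART_OF, "ProteinComplex"),
}

# A Sequence fixes the order of the children (illustrative order only).
atom_sequence = Sequence("ATOMSequence", ("Chain", "Residue", "Atom"))
```

Keeping the relation as an enumeration rather than a free string mirrors the text's preference for strictly typed relations: Relation("PartOf") resolves to a known member, while an unknown name raises ValueError.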

5. Protein Ontology as a Structured Hierarchy

Protein Ontology consists of a hierarchical classification of the concepts discussed above, represented as classes, from general to specific. In PO the notions of classification, reasoning, and consistency are applied by defining new concepts from the defined generic concepts. The concepts derived from generic concepts are placed precisely into the class hierarchy of Protein Ontology to completely represent the information defining a protein complex, as depicted in Figure 1.

Figure 1 Class Hierarchy of Protein Ontology
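
As a rough textual counterpart to the class hierarchy, the ProteinComplex branch can be written as a nested dictionary and searched. The concept names below are taken from the section headings of this guide; the nested-dictionary encoding and the function path_to are assumptions made for this sketch, and the position of the generic concepts in the hierarchy is deliberately left out because it is not stated in the text.

```python
from typing import Dict, List, Optional

Tree = Dict[str, dict]

# ProteinComplex and its sub concepts, as listed in the guide's contents.
PROTEIN_COMPLEX: Tree = {
    "ProteinComplex": {
        "Entry": {"Description": {}, "Molecule": {}, "Reference": {}},
        "Structure": {"ATOMSequence": {}, "UnitCell": {}},
        "StructuralDomains": {"Helices": {}, "Sheets": {}, "OtherFolds": {}},
        "FunctionalDomains": {
            "SourceCell": {},
            "BiologicalFunction": {},
            "ActiveBindingSites": {},
        },
        "ChemicalBonds": {
            "DisulphideBond": {},
            "CISPeptide": {},
            "HydrogenBond": {},
            "ResidueLink": {},
            "SaltBridge": {},
        },
        "Constraints": {
            "GeneticDefects": {},
            "Hydrophobicity": {},
            "ModifiedResidue": {},
        },
    }
}


def path_to(concept: str, tree: Tree) -> Optional[List[str]]:
    """Depth-first search; return the path from the top to the concept."""
    for name, children in tree.items():
        if name == concept:
            return [name]
        below = path_to(concept, children)
        if below is not None:
            return [name] + below
    return None
```

Calling path_to("SaltBridge", PROTEIN_COMPLEX) yields the chain ProteinComplex, ChemicalBonds, SaltBridge, which reads from general to specific in the same way as the hierarchy described above.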

6. Protein Ontology Concept

The Root Concept in PO is ProteinOntology. For each instance of protein that is entered into PO, the submission information is entered for ProteinOntology Concept. ProteinOntology Concept is described as:

Property Type   | Property Name              | Property Description
Data Type       | ProteinOntologyID          | Protein Ontology Identifier (like PO000000000n, where n is an integer)
Data Type       | ProteinOntologyDescription | Protein Ontology Details
Annotation Type | ProteinOntologyAnnotation  | Provides Biological Description

7. Protein Ontology Generic Concepts

Seven concepts of PO, called Generic Concepts, are used to define complex PO Concepts: {Residues, Chains, Atoms, Family, Bind, AtomicBind, and SiteGroup}. Each is described below:

7.1 Residues Concept

Residue refers to a portion of a larger molecule; for example, in biochemistry and molecular biology, a residue is a specific monomer of a polysaccharide, protein or nucleic acid. In the context of Protein Ontology, a residue is one of the 20 amino acids that build a protein sequence, represented by a three-letter code such as CYS. Residues Concept in PO is described as:

Property Type   | Property Name      | Property Description
Data Type       | Residue            | Three-Alphabet Residue Identifier
Data Type       | ResidueName        | Residue Name
Data Type       | ResidueProperty    | Property of the Residue
Annotation Type | ResiduesAnnotation | Provides Biological Description

7.2 Chains Concept

Chains in Protein Ontology describe the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied. In protein data sources they are represented simply by a sequence of residues, as in the SEQRES records of the PDB format. Chains Concept in PO is described as:

Property Type   | Property Name    | Property Description
Data Type       | Chain            | One-Alphabet Chain Identifier
Data Type       | ChainName        | Chain Name
Data Type       | ChainProperty    | Property of the Chain
Annotation Type | ChainsAnnotation | Provides Biological Description

7.3 Atoms Concept

Atom records in protein databases present the atomic coordinates for standard residues. They also present the occupancy and temperature factor for each atom. Atoms Concept in PO is described as:

Property Type   | Property Name     | Property Description
Data Type       | Atom              | Atom Serial Number
Data Type       | AtomName          | Name of the Atom
Reference Type  | AtomChainRef      | Reference to concept of Chains
Reference Type  | AtomResidueRef    | Reference to concept of Residues
Data Type       | AtomResSeqNum     | Sequence number of residue to which atom is attached
Data Type       | X                 | X Coordinates of the Atom
Data Type       | Y                 | Y Coordinates of the Atom
Data Type       | Z                 | Z Coordinates of the Atom
Data Type       | Charge            | Value of electric charge on the atom
Data Type       | Element           | Element symbol for the atom
Data Type       | Occupancy         | Orthogonal occupancy of the atom in crystal structure of the molecule.
Data Type       | SegmentIdentifier | Specific segments of the molecule where atom is presented
Data Type       | TemperatureFactor | Temperature factor for the atom
Annotation Type | AtomsAnnotation   | Provides Biological Description
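
To make the Atoms properties concrete, the sketch below fills them from one fixed-width ATOM line of a PDB file. It is an illustration, not PO's own loader: the Atoms class and parse_atom_record are hypothetical, the column positions follow the legacy PDB fixed-column layout (with the segment identifier in columns 73 to 76), the two reference properties are stored as plain strings rather than as links to Chains and Residues instances, and the example line is made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atoms:
    """Hypothetical record mirroring the properties of the Atoms Concept."""
    atom: int                  # Atom: atom serial number
    atom_name: str             # AtomName
    atom_chain_ref: str        # AtomChainRef (stored here as a chain identifier)
    atom_residue_ref: str      # AtomResidueRef (stored here as a residue code)
    atom_res_seq_num: int      # AtomResSeqNum
    x: float
    y: float
    z: float
    charge: str
    element: str
    occupancy: float
    segment_identifier: str
    temperature_factor: float


def parse_atom_record(line: str) -> Atoms:
    """Parse one ATOM line using the legacy PDB fixed-column positions."""
    line = line.ljust(80)  # short lines are padded so the slices stay valid
    if line[:6] != "ATOM  ":
        raise ValueError("not an ATOM record")
    return Atoms(
        atom=int(line[6:11]),
        atom_name=line[12:16].strip(),
        atom_residue_ref=line[17:20].strip(),
        atom_chain_ref=line[21].strip(),
        atom_res_seq_num=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temperature_factor=float(line[60:66]),
        segment_identifier=line[72:76].strip(),
        element=line[76:78].strip(),
        charge=line[78:80].strip(),
    )


# Made-up example line, assembled field by field so the widths are explicit.
EXAMPLE_LINE = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A" + "   1"
    + " " + "   " + "  27.340" + "  24.430" + "   2.614" + "  1.00"
    + "  9.67" + " " * 10 + " N" + "  "
)

example_atom = parse_atom_record(EXAMPLE_LINE)
```

Here example_atom.atom_residue_ref is "MET" and example_atom.atom_chain_ref is "A". In a fuller model these two values would point at Residues and Chains instances, which is the reuse the Reference Type rows describe.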

7.4 Family Concept

A protein family is a group of evolutionarily related proteins. Superfamily is a group of protein families. More details about the classification of proteins into families and superfamilies can be found in SCOP. Family Concept in PO is described as:

Property Type   | Property Name      | Property Description
Data Type       | ProteinFamily      | Protein Family
Data Type       | ProteinSuperFamily | Protein Superfamily
Annotation Type | FamilyAnnotation   | Provides Biological Description

7.5 Bind Concept

Data about each of the two binding residues in chemical bonds such as Disulphide Bonds and CIS Peptides is entered into PO as an instance of the Bind Concept. Records in protein databases identify and describe each of the disulfide bonds, prolines and other peptides found in the cis conformation in protein and polypeptide structures by identifying the two residues involved in the bond. Bind Concept in PO is described as:

Property Type   | Property Name  | Property Description
Reference Type  | BindChainRef   | Reference to concept of Chains
Reference Type  | BindResidueRef | Reference to concept of Residues
Data Type       | BindResSeqNum  | Sequence Number of Binding Residue
Data Type       | BindSymmetry   | Symmetry Operator
Annotation Type | BindAnnotation | Provides Biological Description

7.6 AtomicBind Concept

Data about binding atoms in Chemical Bonds like Hydrogen Bond, Residue Links, and Salt Bridges is entered into PO as AtomicBind Concept. Records in Protein databases identify and describe each of the hydrogen bonds, residue links and salt bridges found in protein and polypeptide structures by identifying the two atoms involved in the bonds. AtomicBind Concept in PO is described as:

Property Type   | Property Name        | Property Description
Reference Type  | AtomicBindAtomRef    | Reference to concept of Atoms
Reference Type  | AtomicBindChainRef   | Reference to concept of Chains
Reference Type  | AtomicBindResidueRef | Reference to concept of Residues
Data Type       | AtomicBindResSeqNum  | Sequence Number of Binding Residue
Data Type       | AtomicBindSymmetry   | Symmetry Operator
Annotation Type | AtomicBindAnnotation | Provides Biological Description
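
Bind and AtomicBind differ only in that AtomicBind also references an atom, and each bond record identifies two partners. The Python sketch below shows one way to model that. The field names follow the property names in the two tables; the BondPair class, the use of plain strings for references, and the example residue numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Bind:
    """One binding residue (Disulphide Bonds, CIS Peptides)."""
    bind_chain_ref: str
    bind_residue_ref: str
    bind_res_seq_num: int
    bind_symmetry: str = ""
    bind_annotation: str = ""


@dataclass(frozen=True)
class AtomicBind:
    """One binding atom (Hydrogen Bonds, Residue Links, Salt Bridges)."""
    atomic_bind_atom_ref: str
    atomic_bind_chain_ref: str
    atomic_bind_residue_ref: str
    atomic_bind_res_seq_num: int
    atomic_bind_symmetry: str = ""
    atomic_bind_annotation: str = ""


@dataclass(frozen=True)
class BondPair:
    """Hypothetical container for the two partners of one chemical bond."""
    bond_type: str
    first: Union[Bind, AtomicBind]
    second: Union[Bind, AtomicBind]

    def __post_init__(self) -> None:
        # Both partners must be described at the same level (residue or atom).
        if type(self.first) is not type(self.second):
            raise ValueError("both partners must use the same generic concept")


# Illustrative instance: a disulphide bond between two cysteines in chain A.
disulphide = BondPair(
    "DisulphideBond",
    Bind("A", "CYS", 26),
    Bind("A", "CYS", 84),
)
```

The check in __post_init__ rejects a pair that mixes a Bind with an AtomicBind, which keeps residue-level and atom-level bonds from being combined by mistake.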

7.7 SiteGroup Concept

Groups comprising important sites in the macromolecule are identified using the Site parameter, represented as a string in protein databases. Site groups specify the residues comprising catalytic, cofactor, anticodon, regulatory, or other important sites. SiteGroup Concept in PO is described as:

Property Type   | Property Name       | Property Description
Reference Type  | SiteGroupChainRef   | Reference to concept of Chains
Reference Type  | SiteGroupResidueRef | Reference to concept of Residues
Data Type       | SiteGroupResSeqNum  | Sequence Number of the Residue
Annotation Type | SiteGroupAnnotation | Provides Biological Description

8. Protein Complex Concept

A protein complex is a group of two or more associated proteins. Protein complexes are a form of quaternary structure. Understanding the sequence, structure and functional interactions of proteins is an important research focus in biochemistry, often referred to as proteomics. The main concept for the definition of protein complexes in the Protein Ontology is ProteinComplex. The ProteinComplex Concept defines one or more proteins in the complex molecule, and is described as:

Property Type   | Property Name            | Property Description
Annotation Type | ProteinComplexAnnotation | Provides Biological Description

Six sub concepts of ProteinComplex: Entry, Structure, StructuralDomains, FunctionalDomains, ChemicalBonds, and Constraints provide complete understanding of the sequence, structure and functional interactions of proteins. These sub concepts define sequence, structure, function, and chemical bindings of the Protein Complex.

9. Protein Complex Sub Concepts

PO provides description for protein entry, sequence, structure, function, chemical bindings and constraints using Entry, Structure, StructuralDomains, FunctionalDomains, ChemicalBonds, and Constraints Concepts that are derived using generic concepts discussed earlier.

9.1 Entry Concept

Entry describes the experiment and the biological macromolecules present in the protein. Entry Concept is described in PO as:

Property Type   | Property Name        | Property Description
Data Type       | SourceDatabaseID     | Original Protein Data Source ID
Data Type       | SourceDatabaseName   | Original Protein Data Source
Data Type       | SourceSubmissionDate | Original Protein Data Source Submission Date
Data Type       | Classification       | Protein Classification Type
Object Type     | EntryFamily          | Protein Family Details of the Entry
Annotation Type | EntryAnnotation      | Provides Biological Description

Protein Complex Entry and the Molecules contained in Protein Complex are described using sub concepts of Entry: Description, Molecule and Reference.

9.1.1 Description Concept

Data and information about the protein entry and the experiments that identify the protein are described as follows:

Property Type | Property Name | Property Description
Data Type     | Title         | Title for the experiment or analysis that is represented in the Entry.
Data Type     | Authors       | Names of the people responsible for the contents of the entry.
Data Type     | Experiment    | Describes information about the experiment and identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modelling technique.
Data Type     | Keywords      | Contains a set of terms relevant to the entry. Keywords provide a simple means of categorizing entries and may be used to generate index files. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.
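
The Keywords property is described as a basis for generating index files. A minimal Python sketch of such an index follows; the function build_keyword_index, the normalization to lower case, and the identifiers and keywords in the usage note are hypothetical and serve only to show the idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_keyword_index(
    entries: Iterable[Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Map each normalized keyword to the sorted identifiers of its entries."""
    index = defaultdict(set)
    for entry_id, keywords in entries:
        for keyword in keywords:
            key = keyword.strip().lower()  # treat "Prion " and "prion" alike
            if key:
                index[key].add(entry_id)
    # Sort keywords and identifiers so the index is stable when written out.
    return {key: sorted(ids) for key, ids in sorted(index.items())}
```

Given two made-up entries, ("PO-A", ["Hydrolase", "Prion "]) and ("PO-B", ["prion"]), the index maps "prion" to both identifiers and "hydrolase" to the first one only.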

9.1.2 Molecule Concept

Molecule Concept specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied. Molecule Concept is described as: 

Property Type  | Property Name    | Property Description
Data Type      | MoleculeID       | Numbers each molecule component
Data Type      | MoleculeName     | Name of the Molecule
Data Type      | Synonyms         | List of Synonyms for the Molecule
Reference Type | MoleculeChainRef | Reference to concept of Chains
Data Type      | BiologicalUnit   | Larger Biological Unit of which molecule is the part
Data Type      | Fragment         | Specifies a domain or region of the Molecule
Data Type      | Mutations        | Describes Mutations in Molecule
Data Type      | Engineered       | Recombinant Technology or Chemical Synthesis
Data Type      | OtherDetails     | Other Details of the Molecule

9.1.3 Reference Concept

Reference Concept describes the Literature references for protein entry as follows: 

Property Type

Property Name

Property Description

Data Type

CitationTitle

Title of the Citation

Data Type

CitationAuthors

Authors of the Citation

Data Type

CitationEditors

Editors of the Citation

Data Type