CATH Database
CATH
Class: The secondary structure composition and packing within the protein structure. The classes are mainly alpha (alpha-helices), mainly beta (beta-sheets) and alpha-beta. The alpha-beta class includes both alpha/beta and alpha+beta structures.
Architecture
Distinguishes structures that belong to the same class but differ in the overall shape formed by their secondary structures. Groupings can sometimes be rather broad, as they describe general features of protein-fold shape. Examples: the TIM barrel, or the number of layers in a sandwich (Orengo C.A. et al., 1997).
Topology
Structures are grouped at this level when their secondary structure elements are the same in number, arrangement and connectivity. Within a topology level, structures share the same fold but may differ in function. Examples: the globin fold or the immunoglobulin fold.
Homology
Structures are grouped by their high structural similarity and similar functions. They may have evolved from a common ancestor. Example: among the non-bundle globin-like folds, the erythrocruorins, colicins, phycocyanins and domain 1 of diphtheria toxin all have the same CAT number (1.10.340), but are differentiated by their H numbers 10, 20, 30 and 40, respectively.
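The CAT and H numbers in the example above can be handled as dotted codes. The following is a minimal sketch, assuming a code is written as C.A.T or C.A.T.H; the names `CathNumber`, `parse_cath_number` and `same_topology` are hypothetical and are not part of any CATH tool.

```python
from typing import NamedTuple, Optional


class CathNumber(NamedTuple):
    """One dotted CATH code, e.g. 1.10.340.10 (hypothetical helper type)."""
    c: int                    # Class
    a: int                    # Architecture
    t: int                    # Topology
    h: Optional[int] = None   # Homologous superfamily (absent in a CAT number)


def parse_cath_number(code: str) -> CathNumber:
    """Split a dotted code such as '1.10.340' or '1.10.340.10' into levels."""
    parts = [int(p) for p in code.strip().split(".")]
    if len(parts) not in (3, 4):
        raise ValueError(f"expected C.A.T or C.A.T.H, got {code!r}")
    return CathNumber(*parts)


def same_topology(x: CathNumber, y: CathNumber) -> bool:
    """True when two codes share the same C, A and T levels."""
    return x[:3] == y[:3]


if __name__ == "__main__":
    erythrocruorin = parse_cath_number("1.10.340.10")
    colicin = parse_cath_number("1.10.340.20")
    # Same CAT number, different H number, as in the example above.
    print(same_topology(erythrocruorin, colicin))
    print(erythrocruorin.h == colicin.h)
```

Comparing only the first three levels mirrors the idea that proteins can share a fold (same CAT number) while belonging to different homologous superfamilies.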
Sequence family
Members have sequence identities >35%.
They are presumed to have extremely similar structures and functions. They may be slightly different examples of the same protein from different species belonging to the same sequence superfamily. The sequence-based levels below H are labelled S, O, L, I and D (SOLID).
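The >35% criterion can be illustrated with a small identity check on two sequences that have already been aligned. This is only a sketch under stated assumptions: the slides do not say how identity is normalised, so counting matches over the columns where neither sequence has a gap is an assumption, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-').

    Assumption: identity is normalised by the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_sequence_family(aligned_a: str, aligned_b: str,
                         threshold: float = 35.0) -> bool:
    """Apply the >35% sequence identity criterion described above."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For instance, `same_sequence_family("ACDEFGHIKL", "ACDWWWWWWW")` compares ten columns with three matches (30%), which falls below the threshold.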
Any query structure that remains unmatched is scanned against a library of representative structures from each close sequence family in CATH.
The Dictionary of Homologous Superfamilies (DHS) is a database of validated multiple structural alignments, annotated with consensus functional information for evolutionary protein families.
A powerful resource to validate, examine and visualize key structural and functional features of each homologous superfamily.
Also provides a tool for examining sequence-structure relationships for proteins within each fold group.
The comparisons provide a complete data set for analyzing analogues and homologues and for checking for incorrect classifications.
The DHS-VALID program is used to check automatically all the pairwise sequence and structure comparison data generated for each fold group and homologous superfamily in CATH.
Conserved Residue Attributes
Uses the pairwise structural comparison data from SSAP to determine the initial set of proteins to be aligned
Identifies conserved characteristics and expresses them as a 3D structural profile
Profiles encapsulate the core
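The idea of identifying conserved characteristics across a set of aligned proteins can be sketched in a much simplified form. The example below is not the Conserved Residue Attributes method: it assumes a ready-made multiple alignment given as equal-length strings and looks only at residue identity per column, whereas the method described above works from SSAP structural comparisons and builds a 3D profile. All names are hypothetical.

```python
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Simplification: only residue identity is scored; gaps ('-') never count
    as the conserved residue but still count towards the column size.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(alignment: List[str],
                        min_fraction: float = 1.0) -> List[int]:
    """Indices of columns whose conservation reaches min_fraction."""
    return [i for i, score in enumerate(column_conservation(alignment))
            if score >= min_fraction]
```

With the toy alignment `["ACDE", "ACGE", "AC-E"]`, columns 0, 1 and 3 are fully conserved while column 2 is not, which gives a crude stand-in for the conserved core that a profile would capture.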
Gene3D is focused on providing structural annotation for protein sequences without structural representatives.
The protein sequences have also been clustered into whole-chain families to aid functional prediction.
The structural annotation is generated using hidden Markov models (HMMs) based on the CATH domain families.
Applications:
The CATH database was used as a guide to select proteins from a wide variety of protein families (Lees J.G. et al., 2006)
The organization of proteins by global structural similarity helps improve prediction algorithms based on fold recognition
Allows the distribution of common motifs to be explored more easily
Gives insights into which combinations of motifs generate stable protein architectures
Allows newly determined structures to be easily examined for recognizable folds (Orengo C.A. et al., 1997)
Crossword puzzle (grid not recoverable; visible entries: GENE3D, SSAP). Clues:
1. Boundary assignment by inheriting from other chain
2. Predicts hypothetical proteins
3. Database of validated multiple structural alignments
4. Scores used for identifying matches
REFERENCES
Orengo C.A. et al., 1997. CATH - a hierarchic classification of protein domain structures.
Bray J.E. et al., 2000. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues.
Orengo C.A. et al., 1999. The CATH Database provides insights into protein structure/function relationships.
Greene L.H. et al., 2007. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution.
Pearl F. et al., 2005. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.
Buchan D.W.A. et al., 2002. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database.
Yeats C. et al., 2006. Gene3D: modelling protein structure, function and evolution.