Protein function prediction

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.[1]

Generally, function can be thought of as, "anything that happens to or through a protein".[1] The Gene Ontology Consortium provides a useful classification of functions, based on a dictionary of well-defined terms divided into three main categories of molecular function, biological process and cellular component.[2] Researchers can query this database with a protein name or accession number to retrieve associated Gene Ontology (GO) terms or annotations based on computational or experimental evidence.

While techniques such as microarray analysis, RNA interference, and the yeast two-hybrid system can be used to experimentally demonstrate the function of a protein, advances in sequencing technologies have made the rate at which proteins can be experimentally characterized much slower than the rate at which new sequences become available.[3] Thus, the annotation of new sequences is mostly by prediction through computational methods, as these types of annotation can often be done quickly and for many genes or proteins at once. The first such methods inferred function based on homologous proteins with known functions (homology-based function prediction). The development of context-based and structure based methods have expanded what information can be predicted, and a combination of methods can now be used to get a picture of complete cellular pathways based on sequence data.[3] The importance and prevalence of computational prediction of gene function is underlined by an analysis of 'evidence codes' used by the GO database: as of 2010, 98% of annotations were listed under the code IEA (inferred from electronic annotation) while only 0.6% were based on experimental evidence.[4]

  1. ^ a b Rost B, Liu J, Nair R, Wrzeszczynski KO, Ofran Y (December 2003). "Automatic prediction of protein function". Cellular and Molecular Life Sciences. 60 (12): 2637–50. doi:10.1007/s00018-003-3114-8. PMC 11138487. PMID 14685688. S2CID 8800506.
  2. ^ Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (May 2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium". Nature Genetics. 25 (1): 25–9. doi:10.1038/75556. PMC 3037419. PMID 10802651.
  3. ^ a b Gabaldón T, Huynen MA (April 2004). "Prediction of protein function and pathways in the genome era". Cellular and Molecular Life Sciences. 61 (7–8): 930–44. doi:10.1007/s00018-003-3387-y. PMC 11138568. PMID 15095013. S2CID 18032660.
  4. ^ du Plessis L, Skunca N, Dessimoz C (November 2011). "The what, where, how and why of gene ontology--a primer for bioinformaticians". Briefings in Bioinformatics. 12 (6): 723–35. doi:10.1093/bib/bbr002. PMC 3220872. PMID 21330331.