GeneMark

Original author(s): Bioinformatics group of Mark Borodovsky
Developer(s): Georgia Institute of Technology
Initial release: 1993
Operating system: Linux, Windows, and Mac OS
License: Free binary-only for academic, non-profit or U.S. Government use
Website: opal.biology.gatech.edu/GeneMark

GeneMark is a generic name for a family of ab initio gene prediction algorithms and software programs developed at the Georgia Institute of Technology in Atlanta. The original GeneMark, developed in 1993, was used in 1995 as a primary gene prediction tool for annotation of the first completely sequenced bacterial genome, that of Haemophilus influenzae, and in 1996 for the first archaeal genome, that of Methanococcus jannaschii. The algorithm introduced inhomogeneous three-periodic Markov chain models of protein-coding DNA sequence, which became standard in gene prediction, as well as a Bayesian approach to predicting genes in both DNA strands simultaneously. Species-specific parameters of the models were estimated from training sets of sequences of known type (protein-coding and non-coding). The major step of the algorithm computes, for a given DNA fragment, the posterior probabilities of seven alternatives: that the fragment is "protein-coding" (carrying genetic code) in one of six possible reading frames (three in the given strand and three in the complementary strand), or that it is "non-coding". The original GeneMark, developed before the advent of HMM applications in bioinformatics, was an HMM-like algorithm; it can be viewed as an approximation to the posterior decoding algorithm known from HMM theory, applied to an appropriately defined HMM of a DNA sequence.
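The posterior computation described above can be illustrated with a minimal sketch. It is not the GeneMark implementation: the first-order chain, the add-one pseudocounts, the uniform initial base distribution, the prior split and all function names (train_periodic, log_likelihood, posteriors) are assumptions made for illustration only. A coding model holds one transition table per codon position (the three-periodic part), a non-coding model holds a single table, and Bayes' rule turns the seven likelihoods into posterior probabilities.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def train_periodic(seqs, period):
    """Estimate first-order transition tables, one per phase.

    period=3 gives a three-periodic (coding) model, period=1 an ordinary
    homogeneous (non-coding) model. Add-one pseudocounts are an assumption.
    Coding training sequences are assumed to start at codon position 0.
    """
    counts = [{a: {b: 1.0 for b in BASES} for a in BASES} for _ in range(period)]
    for s in seqs:
        for i in range(1, len(s)):
            counts[i % period][s[i - 1]][s[i]] += 1.0
    return [
        {a: {b: t[a][b] / sum(t[a].values()) for b in BASES} for a in BASES}
        for t in counts
    ]


def log_likelihood(seq, model, phase):
    """Log P(seq | model) when seq[0] sits at codon position `phase`."""
    period = len(model)
    ll = math.log(0.25)  # uniform initial base distribution (assumption)
    for i in range(1, len(seq)):
        ll += math.log(model[(i + phase) % period][seq[i - 1]][seq[i]])
    return ll


def posteriors(fragment, coding, noncoding, prior_coding=0.5):
    """Posterior probability of each of the seven hypotheses for a fragment.

    Keys "+0".."+2" and "-0".."-2" give the strand and the codon position
    of the first base of the (reverse-complemented) fragment. The prior
    split between coding and non-coding is a hypothetical choice.
    """
    rc = reverse_complement(fragment)
    scores = {}
    for f in range(3):
        prior = math.log(prior_coding / 6)
        scores[f"+{f}"] = log_likelihood(fragment, coding, f) + prior
        scores[f"-{f}"] = log_likelihood(rc, coding, f) + prior
    scores["non-coding"] = (
        log_likelihood(fragment, noncoding, 0) + math.log(1 - prior_coding)
    )
    # Normalise in log space to avoid underflow on long fragments.
    top = max(scores.values())
    total = sum(math.exp(v - top) for v in scores.values())
    return {h: math.exp(v - top) / total for h, v in scores.items()}


if __name__ == "__main__":
    # Toy training sets, far too small for real use.
    coding = train_periodic(["GCT" * 20], 3)
    noncoding = train_periodic(["ACGT" * 10, "TGCA" * 10], 1)
    result = posteriors("GCT" * 5, coding, noncoding)
    for hypothesis, p in sorted(result.items(), key=lambda kv: -kv[1]):
        print(f"{hypothesis:>10s}  {p:.4f}")
```

In practice such a computation would be repeated for fragments taken along the whole sequence, so that each region receives its own set of seven posterior probabilities; the fragment length and the way fragments are chosen are not specified here and are left out of the sketch.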