SOX genes (SRY-related HMG-box genes) encode a family of transcription factors that bind to the minor groove of DNA. They belong to a superfamily of genes characterized by a homologous sequence called the HMG box (for high mobility group). The HMG box is a DNA-binding domain that is highly conserved throughout eukaryotic species, and homologues have been identified in insects, nematodes, amphibians, reptiles, birds and a range of mammals. However, HMG-box sequences can still be very diverse, with only a few amino acids conserved between species.
Sox genes are defined by an HMG box related to that of SRY, a gene on the Y chromosome involved in sex determination. There are 20 SOX genes in humans and mice, and 8 in Drosophila. Almost all Sox genes show at least 50% amino acid similarity to the HMG box of Sry. The family is divided into subgroups according to homology within the HMG domain and other structural motifs, as well as according to functional assays.[1]
The developmentally important Sox family has no single function, and many members regulate several different aspects of development. While some Sox genes are involved in sex determination, others are important in processes such as neuronal development. For example, Sox2 and Sox3 are involved in the transition of epithelial granule cells in the cerebellum to their migratory state. Granule cells then differentiate into granule neurons, a process in which Sox11 is involved.[3] Because of this sequential expression in the cerebellum, it is thought that some Sox genes may be useful in the early diagnosis of childhood brain tumours, and they are therefore the subject of considerable research. Beyond the nervous system, Sox2 is also a transcription factor that maintains pluripotency in both early embryos and embryonic stem (ES) cells.[2]
Sox proteins bind to the sequence WWCAAW and similar sequences (where W is A or T). They have weak binding specificity and unusually low affinity for DNA. Sox genes are related to the Tcf/Lef1 group of genes, which also contain a sequence-specific HMG box and have a similar sequence specificity (roughly TWWCAAAG).[4]
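To illustrate what a degenerate motif such as WWCAAW means in practice, the following is a minimal sketch that scans a DNA sequence for matches on both strands. It assumes the standard IUPAC convention that W stands for A or T; the function names and the example sequence are hypothetical and chosen only for illustration. Because Sox proteins bind with weak specificity, a pattern match like this only marks candidate sites and says nothing on its own about actual binding. Binding-site prediction tools typically use position weight matrices rather than exact patterns.

```python
import re

# IUPAC codes needed for the motifs above (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}

# Complement table; W (A or T) is its own complement.
COMPLEMENT = str.maketrans("ACGTW", "TGCAW")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence or IUPAC motif."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_sites(seq, motif):
    """Return sorted (start, strand, matched_text) tuples for both strands.

    Reverse-strand sites are found by searching the forward sequence for the
    reverse complement of the motif, so positions refer to the forward strand.
    """
    seq = seq.upper()
    forward = motif_to_regex(motif)
    reverse = motif_to_regex(reverse_complement(motif))
    hits = [(m.start(), "+", m.group(1)) for m in forward.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in reverse.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence containing one site on each strand.
    print(find_sites("GGAACAATGCATTGTTCC", "WWCAAW"))
    # -> [(2, '+', 'AACAAT'), (10, '-', 'ATTGTT')]
```

The same function accepts the longer Tcf/Lef-like motif TWWCAAAG, which makes the overlap between the two sequence preferences easy to explore on a given stretch of DNA.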