Investigators
Bo Han, M.S.
Brown Celeste, Ph.D.
Dunker Keith, Ph.D.
Garner Ethan
Iakoucheva M. Lilia, Ph.D.
Lawson J. David, Ph.D.
Li Xiaohong, Ph.D.
O'Connor Tim
Obradovic Zoran, Ph.D.
Peng Kang, M.S.
Radivojac Predrag, M.Sc.
Romero Pedro, Ph.D.
Vucetic Slobodan, Ph.D.
Xie Hongbo, M.S.
Wang Junping, Ph.D.
Problem
Protein function is generally thought to follow from the prior formation of a specific three-dimensional structure. In contrast to this view, many proteins that require a lack of three-dimensional structure for function have been reported in the literature over the last 50 years. These "intrinsically disordered" proteins exist as structural ensembles, at either the secondary or the tertiary structure level. In other words, disordered proteins or regions have atomic coordinates and Ramachandran angles that vary significantly over time. Both extended (i.e., random coil-like) regions - with perhaps some secondary structure - and collapsed (i.e., partially folded or molten globule-like) domains - with poorly packed secondary structure units - are included in this definition. The existence of intrinsically disordered proteins calls for a re-assessment of the view that prior folding into 3-D structure is always required for protein function, a view sometimes called "the protein structure-function paradigm."
Results
In summary, our bioinformatics work provides strong evidence for
the importance of disordered protein. Recently, Peter Wright,
who is Editor in Chief of the Journal of Molecular Biology, and
H. Jane Dyson emphasized the importance of our results to the molecular
biology community in the first section of a survey on intrinsically
unstructured proteins (J. Mol. Biol. v. 293:321-331, 1999). Our
results suggest that there is a need to critically re-assess the protein
structure-function paradigm taken for granted by most molecular
biologists.
Protein function is not only the basis for interpreting the
data from the human genome project, but also one of the cornerstones
of molecular biology. Our work therefore has the potential for widespread
impact, not only in academia, but also across the biotechnology
and pharmaceutical industries.
Summary
Towards the objective of understanding the commonness, flavors, complexity and function of protein disorder, we assembled a database of known disordered protein sequence segments and used it to develop predictors of protein disorder from primary sequence information. The preliminary results were obtained by analyzing sequences from the Protein Data Bank (PDB), the Swiss Protein (SwissProt) database, and 34 complete or nearly complete genomes. In summary, these prior studies provide strong evidence that: (1) disorder is a very common element of protein structure; (2) the strength of disorder prediction is correlated with sequence complexity; and (3) eukaryotes evidently have a much larger fraction of proteins with intrinsic disorder than eubacteria or archaebacteria.
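Point (2) refers to sequence complexity. The summary does not say which complexity measure was used; as an illustration only, the sketch below assumes the Shannon entropy of the residue composition within a sliding window, a common choice. The function names and the window length of 21 are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    # Sum over residues actually present; absent residues contribute zero.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_complexity(sequence, window=21):
    """Entropy of every full-length window along the sequence.

    The window length is an assumption made for this sketch.
    """
    if len(sequence) < window:
        return []
    return [shannon_entropy(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]
```

A window containing a single repeated residue scores 0 bits, while a window that uses all 20 amino acids equally often approaches log2(20), about 4.32 bits.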
Prediction of disorder from sequence
Since amino acid sequence determines protein 3-D structure, we reasoned that, if disorder were crucial to function, then amino acid sequence would determine lack of 3-D structure, or disorder, as well. To test the hypothesis that disorder is encoded by the sequence, we assembled a dataset of ordered and disordered protein sequence segments and used it to develop several predictors of disorder. Observed prediction accuracies were in the 70-83% range [Romero, P., Obradovic, Z., Kissinger, C.R., Villafranca, J.E., and Dunker, A.K., Proc. Pacific Symposium on Biocomputing, Hawaii, 1998, vol. 3, pp. 435-446][Romero, P., Obradovic, Z., and Dunker, A.K., Artificial Intelligence Review, 2000, vol. 14, no. 6, S2, pp. 447-484][Romero, P., Obradovic, Z., and Dunker, A.K., Proc. IEEE Int. Conf. on Neural Networks, Houston, TX, 1997, vol. 1, pp. 90-95][Garner, E., Cannon, P., Romero, P., Obradovic, Z., and Dunker, A.K., Proc. Genome Informatics 1998, Tokyo, Japan, pp. 201-213][Li, X., Romero, P., Rani, M., Dunker, A.K., and Obradovic, Z., Proc. Genome Informatics 10, Tokyo, Japan, 1999, pp. 30-40]. These accuracies far exceed the 50% expected by chance, demonstrating that disorder is indeed very likely to be encoded by the sequence. Our most accurate predictor, an ensemble of neural networks, achieves 82.6% overall accuracy (88.8% accuracy on ordered proteins and 76.5% accuracy on disordered proteins) [Vucetic, S., Radivojac, P., Obradovic, Z., Brown, C.J., and Dunker, A.K., Proc. 2001 IEEE/INNS International Joint Conference on Neural Networks, Washington D.C., 2001, vol. 4, pp. 2718-2723]. However, its accuracy exceeds that of logistic regression classifiers by less than 1% [Vucetic, S., Radivojac, P., Obradovic, Z., Brown, C.J., and Dunker, A.K., Proc. 2001 IEEE/INNS International Joint Conference on Neural Networks, Washington D.C., 2001, vol. 4, pp. 2718-2723].
Such relatively high accuracies strongly support the hypothesis that disorder is an element of native protein structure that is encoded by the amino acid sequence.
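The three accuracy figures quoted above can be related with a small sketch. Our reading, which the text does not state explicitly, is that the overall figure is the mean of the two per-class accuracies; this is also the setting in which 50% is the chance level. The function name and the label encoding (0 = ordered, 1 = disordered) are hypothetical.

```python
def per_class_accuracy(y_true, y_pred):
    """Return (accuracy on ordered, accuracy on disordered, mean of the two).

    Labels are assumed to be 0 for ordered and 1 for disordered, and both
    classes are assumed to be present in y_true.
    """
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    acc_ordered = correct[0] / total[0]
    acc_disordered = correct[1] / total[1]
    return acc_ordered, acc_disordered, (acc_ordered + acc_disordered) / 2


# Mean of the two per-class accuracies reported in the text (in percent).
reported_mean = (88.8 + 76.5) / 2
```

The mean of 88.8% and 76.5% is 82.65%, which agrees with the reported 82.6% to within the rounding of the two inputs.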
Understanding the relationship between protein sequence and disordered protein.
We have constructed more than 6,000 composition-based and 265 property-based
sequence attributes and compared them with respect to their ability to
discriminate protein order and disorder [Li, X., Obradovic, Z., Brown, C.J.,
Garner, E.C., and Dunker, A.K., Proc. Genome Informatics 11, Tokyo, Japan,
2000, pp. 172-184][Williams, R.M., Obradovic, Z., Mathura, V., Braun, W.,
Garner, E.C., Young, J., Takayama, S., Brown, C.J., and Dunker, A.K., 2000,
Proc. 6th Pacific Symposium on Biocomputing,
Estimation of the commonness of protein disorder.
Proteins with long disordered regions (>40 amino acids) were
occasionally found in protein structures characterized by X-ray
diffraction [Romero, P., Obradovic, Z., Kissinger, C.R., Villafranca,
J.E., and Dunker, A.K., Proc. IEEE Int. Conf. on Neural Networks,
Evolution of disordered protein.
Differences in the amino-acid composition of ordered and disordered protein may result in or from evolutionary differences between these two types of protein. We find that both the quantity and the quality of amino-acid replacements in disordered protein differ from those in ordered protein. We recently completed an evolutionary study of 28 protein families with ordered and disordered regions, and found that 20 of the families have disordered regions that evolve significantly more rapidly than their ordered regions, and 3 families have disordered regions that evolve more slowly [Brown, C.J., Takayama, S., Campen, A.M., Vise, P., Marshall, T., Oldfield, C.J., Williams, C.J., and Dunker, A.K., 2002]. Differences in amino-acid composition may also affect the types of amino-acid replacements that accumulate in disordered protein. Matrices that furnish the probability of replacing a given amino acid by another are generally based on ordered protein sequences. We are developing scoring matrices using disordered protein families. We find that scoring matrices based on disordered protein are more successful in aligning homologous disordered protein sequences than the commonly used scoring matrices [Radivojac, P., Obradovic, Z., Brown, C.J., and Dunker, A.K., Proc. 7th Pacific Symposium on Biocomputing, Hawaii, 2002, pp. 589-600].
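To make the idea of a scoring matrix concrete, the sketch below derives log-odds replacement scores from aligned sequence pairs. This is the standard log-odds construction used for substitution matrices in general, not necessarily the procedure followed in the cited work. The function name, the input format and the use of '-' for gaps are assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Log-odds (base 2) scores from pairs of equal-length aligned strings.

    Columns containing a gap ('-') are skipped. Each aligned column is
    counted in both orders so that the resulting scores are symmetric.
    """
    pair_counts = Counter()
    for s1, s2 in aligned_pairs:
        for a, b in zip(s1, s2):
            if a == '-' or b == '-':
                continue
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    total = sum(pair_counts.values())

    # Background frequency of each residue, taken from the pair counts.
    background = Counter()
    for (a, _), c in pair_counts.items():
        background[a] += c

    scores = {}
    for (a, b), c in pair_counts.items():
        observed = c / total
        expected = (background[a] / total) * (background[b] / total)
        scores[(a, b)] = math.log2(observed / expected)
    return scores
```

A positive score means the pair is aligned more often than expected from residue frequencies alone. Building the counts from disordered families instead of ordered ones changes both the observed pair frequencies and the background composition, and therefore the scores.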
Function of confirmed disordered proteins.
We recently completed a survey of functions associated with disordered
protein from over 100 proteins [Dunker, A.K., Brown, C.J., Lawson, J.D.,
Iakoucheva, L.M., and Obradovic, Z., Biochemistry, 2002, May 28th,
vol. 41, issue 21, pp. 6573-6582]. Disordered protein
was identified either by missing electron density in X-ray crystal
structure entries in PDB, or b
Disorder in cell-signaling and cancer.
Many disordered regions are involved in binding to DNA, RNA, or other proteins [Dunker, A.K., Brown, C.J., Lawson, J.D., Iakoucheva, L.M., and Obradovic, Z., Biochemistry, 2002, May 28th, vol. 41, issue 21, pp. 6573-6582]. This observation led to the hypothesis that disorder plays an important role in molecular recognition, signaling and regulation. To test this hypothesis, we applied our predictor of disorder to a database of signaling proteins involved in the broadest cascade of macromolecular interactions. Cancer-associated proteins were also tested, since they are closely linked to the cell signaling machinery; many are transcription factors overexpressed as a result of activation during tumorigenesis. We found that there is significantly more predicted disorder in signaling and cancer-associated proteins than in several other categories of protein function, such as metabolism, biosynthesis and degradation [Iakoucheva, L.M., Brown, C.J., Lawson, J.D., Obradovic, Z., and Dunker, A.K., Journal of Molecular Biology, 2002, vol. 323, pp. 573-584].
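One way such a comparison between functional categories could be set up is sketched below, under stated assumptions. Given per-residue disorder predictions for each protein, it reports the fraction of proteins in a category that contain a long run of consecutive predicted-disordered residues. The text above does not specify the summary statistic or cutoff used in the cited study; the function names, the input format and the default cutoff of 30 residues are hypothetical.

```python
def longest_disordered_run(predictions):
    """Length of the longest run of consecutive residues predicted disordered.

    predictions is a sequence of booleans, one per residue.
    """
    longest = current = 0
    for is_disordered in predictions:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    return longest


def fraction_with_long_disorder(proteins, min_run=30):
    """Fraction of proteins whose longest predicted-disordered run is at
    least min_run residues. proteins maps a name to per-residue booleans.
    The min_run default is an assumption made for this sketch.
    """
    if not proteins:
        return 0.0
    hits = sum(1 for preds in proteins.values()
               if longest_disordered_run(preds) >= min_run)
    return hits / len(proteins)
```

Computing this fraction separately for each category (for example signaling, cancer-associated and metabolic proteins) gives numbers that can be compared directly, although whether a difference is significant would still need a statistical test.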

