|
Title: Improving Protein Disorder Prediction by Incorporating
Evolutionary Information and Optimizing Knowledge Representation
Talk by Kang Peng
Abstract: A dominant view in molecular biology is that protein
function depends on the protein's 3-D structure, which is determined
by its amino acid sequence. However, many disordered proteins,
or proteins without a unique 3-D structure, still carry out important
functions. Our previous work predicted disorder from sequence
information with a position-by-position, out-of-sample accuracy of
about 70%, compared to the 50% expected by chance on balanced datasets.
This result supported the hypothesis that amino acid sequence determines
three-dimensional structure as well as the lack of three-dimensional
structure. Recently,
the prediction accuracy has been boosted to 82.6% by using a larger
dataset and better knowledge representation. Using the same method,
we were able to achieve an accuracy of 83.6% on a new, even larger
dataset in this study. We also propose two new methods designed to
incorporate evolutionary information. The first one directly uses
homologous sequence segments of the true disordered regions in training
disorder predictors. The second one builds disorder predictors
from family profiles generated by PSI-BLAST. Both methods achieve an
improved prediction accuracy of 85%.
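
The accuracy figures above are position-by-position measures on balanced data. As a minimal sketch of what that means, the example below scores a toy prediction residue by residue and compares it with a trivial baseline. The function name, the 'D'/'O' labels, and the sequences are hypothetical illustrations and are not taken from the talk or its datasets.

```python
from typing import Sequence


def per_residue_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of positions where the predicted state matches the true state."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if len(truth) == 0:
        raise ValueError("cannot score an empty sequence")
    matches = sum(t == p for t, p in zip(truth, predicted))
    return matches / len(truth)


# Hypothetical toy data: 'D' = disordered residue, 'O' = ordered residue.
# The set is balanced: four disordered and four ordered positions.
truth = "DDDDOOOO"
prediction = "DDDOOOOD"

accuracy = per_residue_accuracy(truth, prediction)

# Trivial baseline: call every residue disordered. On a balanced set this
# scores 50%, the chance level the abstract compares against.
baseline = per_residue_accuracy(truth, "D" * len(truth))

print(f"predictor accuracy: {accuracy:.2f}")
print(f"all-disordered baseline: {baseline:.2f}")
```

Balancing the classes is what makes 50% a meaningful reference point: on an unbalanced set, a predictor that always outputs the majority class would score above 50% without learning anything.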
|