
Experimental annotations of intrinsic disorder are available for only 0.1% of the 147 000 000 currently sequenced proteins. More than 60 sequence-based disorder predictors have been developed to help bridge this gap. Current benchmarks of these methods assess predictive performance on whole datasets of proteins; however, predictions are often interpreted for individual proteins. We demonstrate that protein-level predictive performance differs substantially from dataset-level benchmarks. We therefore perform a first-of-its-kind protein-level assessment of 13 popular disorder predictors using 6200 disorder-annotated proteins. We show that the protein-level distributions are substantially skewed toward high predictive quality while having long tails of poor predictions. Consequently, between 57% and 75% of proteins obtain higher predictive performance than the currently used dataset-level assessment suggests, but as many as 30% of proteins, located in the long tails, suffer from low predictive performance. These proteins typically have relatively large amounts of disorder, in contrast to the mostly structured proteins that are predicted accurately by all 13 methods. Interestingly, each predictor provides the most accurate results for some proteins, while the method that performs best at the dataset level is in fact the best for only about 30% of proteins. Moreover, the majority of proteins are predicted by at least four disorder predictors more accurately than the dataset-level performance of the most accurate tool. While these results suggest that disorder predictors outperform their current benchmark performance for the majority of proteins and that they complement each other, novel tools are needed that accurately identify the hard-to-predict proteins and make accurate predictions for them. © The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
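The contrast between dataset-level and protein-level assessment can be illustrated with a minimal sketch. It assumes binary per-residue disorder labels (1 = disordered, 0 = ordered) and uses the Matthews correlation coefficient (MCC) as the metric. The metric choice, the `mcc` helper, and the four toy proteins are hypothetical illustrations, not the authors' data or evaluation protocol. The dataset-level score pools all residues into one calculation, while the protein-level assessment scores each protein separately and then inspects the resulting distribution.

```python
import math
from statistics import median


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = disordered, 0 = ordered).

    Returns 0.0 when the denominator is zero (e.g. a protein with no disordered
    residues); this convention is an assumption of the sketch.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: (annotated labels, predicted labels) per protein.
proteins = {
    "mostly_ordered_1": ([0, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1]),
    "mostly_ordered_2": ([0, 0, 0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1, 0, 0]),
    "mostly_ordered_3": ([0, 0, 0, 1, 1, 1, 0, 0], [0, 0, 0, 1, 1, 1, 0, 0]),
    "highly_disordered": ([1, 1, 1, 1, 1, 1, 0, 0], [0, 0, 1, 0, 0, 0, 1, 0]),
}

# Dataset-level assessment: pool every residue of every protein, compute one score.
all_true = [t for labels, _ in proteins.values() for t in labels]
all_pred = [p for _, preds in proteins.values() for p in preds]
dataset_level = mcc(all_true, all_pred)

# Protein-level assessment: one score per protein, then inspect the distribution.
per_protein = {name: mcc(t, p) for name, (t, p) in proteins.items()}
frac_above = sum(s > dataset_level for s in per_protein.values()) / len(per_protein)

print(f"dataset-level MCC: {dataset_level:.3f}")
for name, score in per_protein.items():
    print(f"  {name}: {score:.3f}")
print(f"median protein-level MCC: {median(per_protein.values()):.3f}")
print(f"fraction of proteins above dataset level: {frac_above:.2f}")
```

On these made-up data, the pooled MCC is about 0.61, yet three of the four proteins score 1.0 and the highly disordered protein scores about -0.33. This is a miniature version of a skewed distribution with a long tail that a single dataset-level number hides.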

Citation

Akila Katuwawala, Christopher J Oldfield, Lukasz Kurgan. Accuracy of protein-level disorder predictions. Briefings in bioinformatics. 2020 Sep 25;21(5):1509-1522

PMID: 31616935