


    While conventional random forest regression (RFR) virtual screening models appear to have excellent accuracy on random held-out test sets, they prove lacking in actual practice. Analysis of 18 historical virtual screens showed that random test sets are far more similar to their training sets than are the compounds project teams actually order. A new, cluster-based "realistic" training/test set split, which mirrors the chemical novelty of real-life virtual screens, recapitulates the poor predictive power of RFR models in real projects. The original Profile-QSAR (pQSAR) method greatly broadened the domain of applicability over conventional models by using as independent variables a profile of activity predictions from all historical assays in a large protein family. However, the accuracy still fell short of experiment on realistic test sets. The improved "pQSAR 2.0" method replaces probabilities of activity from naïve Bayes categorical models at several thresholds with predicted IC50s from RFR models. Unexpectedly, the high accuracy also requires removing the RFR model for the actual assay of interest from the independent variable profile. With these improvements, pQSAR 2.0 activity predictions are now statistically comparable to medium-throughput four-concentration IC50 measurements even on the realistic test set. Beyond the yes/no activity predictions from a typical high-throughput screen (HTS) or conventional virtual screen, these semiquantitative IC50 predictions allow for predicted potency, ligand efficiency, lipophilic efficiency, and selectivity against antitargets, greatly facilitating hitlist triaging and enabling virtual screening panels such as toxicity panels and overall promiscuity predictions.
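    The cluster-based "realistic" split described above can be illustrated with a minimal sketch. The abstract does not give the clustering algorithm, similarity threshold, or split ratio, so everything here is an assumption made for illustration: greedy leader clustering on Tanimoto similarity, a 0.5 threshold, a 60% training fraction, assignment of the largest clusters to training, and the toy fingerprints in `EXAMPLE_FPS`. The one property it is meant to show is that whole clusters go to either training or test, so test compounds come from chemical series the model never saw.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, FrozenSet[int]], threshold: float) -> List[List[str]]:
    """Greedy leader clustering (an assumed stand-in for the paper's clustering).

    Each compound joins the first cluster whose leader (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for members in clusters:
            if tanimoto(fp, fps[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def realistic_split(
    fps: Dict[str, FrozenSet[int]],
    threshold: float = 0.5,
    train_fraction: float = 0.6,
) -> Tuple[List[str], List[str]]:
    """Assign whole clusters, largest first, to training; the rest become the test set."""
    clusters = sorted(leader_cluster(fps, threshold), key=len, reverse=True)
    target = train_fraction * len(fps)
    train: List[str] = []
    test: List[str] = []
    for members in clusters:
        if len(train) < target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical fingerprints: two multi-member series (a*, b*) and three singletons.
EXAMPLE_FPS: Dict[str, FrozenSet[int]] = {
    "a1": frozenset({1, 2, 3, 4}),
    "a2": frozenset({1, 2, 3, 5}),
    "a3": frozenset({1, 2, 3, 4, 5}),
    "b1": frozenset({10, 11, 12}),
    "b2": frozenset({10, 11, 12, 13}),
    "c1": frozenset({20, 21, 22}),
    "d1": frozenset({30, 31}),
    "e1": frozenset({40, 41, 42}),
}

train_ids, test_ids = realistic_split(EXAMPLE_FPS)
print("train:", train_ids)
print("test:", test_ids)
```

    In this toy data the two larger series land in training and the three singletons form the test set. A random split of the same compounds would typically place close analogs on both sides, which is the over-optimistic situation the abstract attributes to random held-out test sets.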

    Citation

    Eric J Martin, Valery R Polyakov, Li Tian, Rolando C Perez. Profile-QSAR 2.0: Kinase Virtual Screening Accuracy Comparable to Four-Concentration IC50s for Realistically Novel Compounds. Journal of Chemical Information and Modeling. 2017 Aug 28;57(8):2077-2088.


    PMID: 28651433