


    Accurate prediction of open reading frames (ORFs) is important for studying and using genome sequences. Ribosomes move along mRNA strands in steps of three nucleotides, and datasets carrying this information can be used to predict ORFs. Ribosome-protected footprints (RPFs) show a significant 3-nt periodicity on mRNAs and are powerful in predicting translating ORFs, including small ORFs (sORFs), but the application of RPFs is limited because they are too short to be accurately mapped in complex genomes. In this study, we found a significant 3-nt periodicity in datasets of population-level genomic variants in coding sequences, in which the nucleotide diversity increases every three nucleotides. We suggest that this feature can be used to predict ORFs and developed the Python package 'OrfPP', which recovers ~83% of the annotated ORFs in the tested genomes on average, independent of population size and genome complexity. The novel ORFs, including sORFs, identified from single-nucleotide polymorphisms are supported by protein mass spectrometry evidence comparable to that of the annotated ORFs. The application of OrfPP to tetraploid cotton and hexaploid wheat genomes successfully identified 76.17% and 87.43% of the annotated ORFs in the genomes, respectively, as well as 4704 sORFs, including 1182 upstream and 2110 downstream ORFs, in cotton, and 5025 sORFs, including 232 upstream and 234 downstream ORFs, in wheat. Overall, we propose an alternative and supplementary approach for ORF prediction that can extend the studies of sORFs to more complex genomes. © The Author(s) 2022. Published by Oxford University Press.
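
The 3-nt periodicity described in the abstract can be illustrated with a minimal Python sketch. This is not OrfPP's algorithm or API, which the abstract does not describe; the function names, the per-site diversity dictionary, and the scoring rule are all hypothetical. The sketch also assumes the diversity peak falls on the third codon position, as would be expected if most variation sits at synonymous sites. It averages nucleotide diversity at each codon position of a candidate ORF and reports the share carried by the third position.

```python
from typing import Dict, List


def diversity_by_codon_position(pi: Dict[int, float], start: int, end: int) -> List[float]:
    """Mean per-site nucleotide diversity at codon positions 1, 2 and 3.

    pi maps a 0-based genomic position to its nucleotide diversity
    (sites without variants are treated as 0.0). The candidate ORF
    covers the half-open interval [start, end) on the forward strand.
    """
    totals = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for pos in range(start, end):
        phase = (pos - start) % 3  # 0, 1, 2 -> codon positions 1, 2, 3
        totals[phase] += pi.get(pos, 0.0)
        counts[phase] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def periodicity_score(means: List[float]) -> float:
    """Fraction of the summed mean diversity found at the third codon position.

    A hypothetical score: about 1/3 means no periodicity, values near 1
    mean diversity is concentrated on every third nucleotide.
    """
    total = sum(means)
    return means[2] / total if total > 0 else 0.0


# Toy data (invented for illustration): a 9-nt candidate ORF whose
# variants fall mostly on third codon positions.
toy_pi = {2: 0.4, 3: 0.1, 5: 0.2, 8: 0.3}
toy_means = diversity_by_codon_position(toy_pi, start=0, end=9)
toy_score = periodicity_score(toy_means)
```

On real data the same signal would have to be tested in all reading frames and on both strands, and against a null of no periodicity, before calling an ORF; the sketch only shows the quantity being compared.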

    Citation

    Mengyun Jiang, Weidong Ning, Shishi Wu, Xingwei Wang, Kun Zhu, Aomei Li, Yongyao Li, Shifeng Cheng, Bo Song. Three-nucleotide periodicity of nucleotide diversity in a population enables the identification of open reading frames. Briefings in bioinformatics. 2022 Jul 18;23(4)



    PMID: 35698834
