Correlation Engine 2.0
Clear Search sequence regions


Sizes of these terms reflect their relevance to your search.

Discovering pre-microRNAs (miRNAs) is the core of miRNA discovery. Using traditional sequence/structural features, many tools have been published to discover miRNAs. However, in practical applications like genomic annotations, their actual performance has been very low. This becomes more grave in plants where unlike animals pre-miRNAs are much more complex and difficult to identify. A huge gap exists between animals and plants for the available software for miRNA discovery and species-specific miRNA information. Here, we present miWords, a composite deep learning system of transformers and convolutional neural networks which sees genome as a pool of sentences made of words with specific occurrence preferences and contexts, to accurately identify pre-miRNA regions across plant genomes. A comprehensive benchmarking was done involving >10 software representing different genre and many experimentally validated datasets. miWords emerged as the best one while breaching accuracy of 98% and performance lead of ~10%. miWords was also evaluated across Arabidopsis genome where also it outperformed the compared tools. As a demonstration, miWords was run across the tea genome, reporting 803 pre-miRNA regions, all validated by small RNA-seq reads from multiple samples, and most of them were functionally supported by the degradome sequencing data. miWords is freely available as stand-alone source codes at https://scbb.ihbt.res.in/miWords/index.php. © The Author(s) 2023. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Citation

Sagar Gupta, Ravi Shankar. miWords: transformer-based composite deep learning for highly accurate discovery of pre-miRNA regions across plant genomes. Briefings in bioinformatics. 2023 Mar 19;24(2)

Expand section icon Mesh Tags

Expand section icon Substances


PMID: 36907657

View Full Text