Correlation Engine 2.0
Clear Search sequence regions


Sizes of these terms reflect their relevance to your search.

Parallelization in Python integrates Message Passing Interface via the mpi4py module. Since mpi4py does not support parallelization of objects greater than 231 bytes, we developed BigMPI4py, a Python module that wraps mpi4py, supporting object sizes beyond this boundary. BigMPI4py automatically determines the optimal object distribution strategy, and uses vectorized methods, achieving higher parallelization efficiency. BigMPI4py facilitates the implementation of Python for Big Data applications in multicore workstations and High Performance Computer systems. We use BigMPI4py to speed-up the search for germ line specific de novo DNA methylated/unmethylated motifs from the 59 whole genome bisulfite sequencing DNA methylation samples from 27 human tissues of the ENCODE project. We developed a parallel implementation of the Kruskall-Wallis test to find CpGs with differential methylation across germ layers. The parallel evaluation of the significance of 55 million CpG achieved a 22x speedup with 25 cores allowing us an efficient identification of a set of hypermethylated genes in ectoderm and mesoderm-related tissues, and another set in endoderm-related tissues and finally, the discovery of germ layer specific DNA demethylation motifs. Our results point out that DNA methylation signal provide a higher degree of information for the demethylated state than for the methylated state. BigMPI4py is available at https://https://www.arauzolab.org/tools/bigmpi4py and https://gitlab.com/alexmascension/bigmpi4py and the Jupyter Notebook with WGBS analysis at https://gitlab.com/alexmascension/wgbs-analysis.

Citation

Alex M Ascension, Marcos J Arauzo-Bravo. BigMPI4py: Python Module for Parallelization of Big Data Objects Discloses Germ Layer Specific DNA Demethylation Motifs. IEEE/ACM transactions on computational biology and bioinformatics. 2022 May-Jun;19(3):1507-1522

Expand section icon Mesh Tags

Expand section icon Substances


PMID: 33301409

View Full Text