Wood-rotting fungi play an important role in the global carbon cycle because they are the only known organisms that digest wood, the largest carbon stock in nature. In the present study, we used linear discriminant analysis and random forest (RF) machine learning algorithms to predict white- or brown-rot decay modes from the numbers of genes encoding Carbohydrate-Active enZymes with over 98% accuracy. Unlike other algorithms, RF identified specific genes involved in cellulose and lignin degradation, including auxiliary activities (AAs) family 9 lytic polysaccharide monooxygenases, glycoside hydrolase family 7 cellobiohydrolases, and AA family 2 peroxidases, as critical factors. This study sheds light on the complex interplay between genetic information and decay modes and underscores the potential of RF for comparative genomics studies of wood-rotting fungi.

Wood-rotting fungi are categorized into white- or brown-rot decay modes based on the coloration of decomposed wood, a classification process that can be influenced by human biases. The random forest machine learning algorithm effectively distinguishes between white- and brown-rot fungi based on the numbers of Carbohydrate-Active enZyme genes. These findings not only aid in the classification of wood-rotting fungi but also facilitate the identification of the enzymes responsible for degrading woody biomass.
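
As a rough illustration of the idea of classifying decay mode from gene counts, the following Python sketch trains a toy random-forest-style ensemble: bootstrap-sampled decision stumps, each limited to a random subset of families and combined by majority vote. It is a simplified stand-in, not the study's method or software. The gene counts are invented for illustration, the GH5 column is an assumed uninformative family, and all function names and parameters are hypothetical. Counting how often a family is used for a split is only a crude proxy for a proper RF importance measure.

```python
import random
from collections import Counter

# Hypothetical CAZyme gene counts per genome (NOT data from the study).
FAMILIES = ["AA2", "AA9", "GH7", "GH5"]
GENOMES = [
    ([12, 18, 4, 20], "white"),
    ([10, 15, 5, 18], "white"),
    ([15, 22, 6, 21], "white"),
    ([11, 14, 4, 19], "white"),
    ([0, 3, 0, 19], "brown"),
    ([0, 2, 0, 21], "brown"),
    ([1, 4, 1, 18], "brown"),
    ([0, 5, 1, 20], "brown"),
]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, feature_ids):
    """Pick the one-feature threshold split with the lowest weighted Gini impurity."""
    labels = [y for _, y in rows]
    # Fallback when no candidate feature varies in this sample: constant prediction.
    stump = (None, None, majority(labels), majority(labels))
    best = None
    for f in sorted(feature_ids):  # sorted so ties break deterministically
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best:
                best, stump = score, (f, t, majority(left), majority(right))
    return stump


def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if x[f] <= t else right_label


def fit_forest(rows, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of families."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: sample with replacement
        feature_ids = rng.sample(range(len(FAMILIES)), n_features)
        forest.append(fit_stump(sample, feature_ids))
    return forest


def predict(forest, x):
    """Majority vote over all stumps."""
    return majority([predict_stump(s, x) for s in forest])


def split_counts(forest):
    """How often each family was chosen as a split (crude importance proxy)."""
    return Counter(FAMILIES[s[0]] for s in forest if s[0] is not None)


forest = fit_forest(GENOMES)
print(predict(forest, [13, 20, 5, 19]))    # prediction for a made-up genome
print(split_counts(forest).most_common())  # which families the stumps relied on
```

In practice, a full random forest implementation with deeper trees and a permutation- or impurity-based importance measure would replace the stumps and split counts used here.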
Natsuki Hasegawa, Masashi Sugiyama, Kiyohiko Igarashi. Random forest machine-learning algorithm classifies white- and brown-rot fungi according to the number of the genes encoding Carbohydrate-Active enZyme families. Applied and environmental microbiology. 2024 Jul 24;90(7):e0048224
PMID: 38832775