Volume 17, Issue 4 (11-2025)                   J Crop Breed 2025, 17(4): 32-43





Nazari L, Zinati Z. (2025). Comparison of Machine-Learning Algorithms for Identifying Key Genes Involved in Sorghum's Response to Temperature Stress. J Crop Breed. 17(4), 32-43. doi:10.61882/jcb.2025.1611
URL: http://jcb.sanru.ac.ir/article-1-1611-en.html
1- Department of Crop and Horticultural Science Research, Fars Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Shiraz, Iran
2- Department of Agroecology, College of Agriculture and Natural Resources of Darab, Shiraz University, Shiraz, Iran.
Abstract:

Extended Abstract
Background: Sorghum, a C4 plant, is relatively tolerant to various abiotic stresses. However, its performance is significantly affected by temperatures above 32 °C and below 15 °C. Identifying key genes from gene expression data with feature selection methods is a valuable approach to understanding stress tolerance. Feature selection uses statistical and computational algorithms to retain the most relevant genes. Filter-based methods, which are independent of any machine-learning algorithm, offer a fast and efficient way to identify relevant features. Combining multiple filter methods allows for a more precise and robust selection of key genes involved in sorghum’s response to temperature stress. Therefore, this study aimed to identify key genes involved in the cold and heat stress response in sorghum using transcriptomic data and filter-based methods, including Information Gain, Gain Ratio, and Relief.
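To make the filter scores concrete, the following is a minimal Python sketch of Information Gain and Gain Ratio for a single gene. It is an illustration only, not the authors' implementation: it assumes expression values have already been discretized into bins, and the function names and toy data (gene_a, gene_b) are hypothetical. Relief, which scores features from nearest-neighbor differences, is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_bins, classes):
    """H(class) - H(class | feature) for one already-discretized feature."""
    total = len(classes)
    conditional = 0.0
    for value in set(feature_bins):
        subset = [c for f, c in zip(feature_bins, classes) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(classes) - conditional

def gain_ratio(feature_bins, classes):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature_bins)
    return information_gain(feature_bins, classes) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: four samples, two classes, two binned genes.
classes = ["control", "control", "stress", "stress"]
gene_a = ["low", "low", "high", "high"]   # separates the classes perfectly
gene_b = ["low", "high", "low", "high"]   # carries no class information

ig_a = information_gain(gene_a, classes)
ig_b = information_gain(gene_b, classes)
gr_a = gain_ratio(gene_a, classes)
```

In this toy case gene_a receives the maximum score for a two-class problem and gene_b scores zero, which is the behavior a filter method relies on when ranking genes independently of any classifier.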
Methods: Gene expression count data were retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE225632. The analysis focused on sorghum shoot data under control conditions and temperature stress at four different times of day. The data were divided into two classes, control and stress (cold and heat), and differentially expressed genes were identified using the DESeq2 package in R. Subsequently, top genes were selected using three feature selection algorithms (Information Gain, Gain Ratio, and Relief), and a Venn diagram was used to examine the overlap of genes identified by the algorithms. Two machine-learning algorithms, Bayes Net and Random Forest, were employed for validation. These algorithms were run in WEKA 3.7, and their performance was compared in classifying samples based on the identified features. The classification algorithms were evaluated and compared using the following metrics: True Positive Rate (TP Rate), False Positive Rate (FP Rate), Precision, Recall, F1 score, Matthews Correlation Coefficient (MCC), Area Under the ROC Curve (ROC AUC), and Area Under the PRC Curve (PRC AUC). A confusion matrix was used to display classification errors.
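The per-class metrics listed above can all be derived from a confusion matrix by treating each class in turn as "positive" and the rest as "negative". The sketch below shows that calculation in Python under stated assumptions: the function name is hypothetical, the confusion matrix is invented toy data and not the study's results, and ROC AUC and PRC AUC are omitted because they require per-sample scores rather than counts.

```python
import math

def per_class_metrics(confusion, labels):
    """One-vs-rest metrics per class.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0  # identical to TP Rate
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        results[label] = {
            "tp_rate": recall,
            "fp_rate": fp_rate,
            "precision": precision,
            "f1": f1,
            "mcc": mcc,
        }
    return results

# Invented toy confusion matrix (rows = true class, columns = predicted class).
labels = ["control", "cold", "heat"]
confusion = [
    [10, 0, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = per_class_metrics(confusion, labels)
```

In the toy matrix, one cold sample misclassified as control lowers the TP Rate of the cold class and the Precision of the control class at the same time, which is why the two numbers are usually read together.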
Results: Gene expression changes were first analyzed by comparing control conditions with temperature stress (cold and heat). Among 34,129 genes examined, 2,136 genes with significant expression changes at a 0.05 level and a log2FoldChange greater than 1 were selected and used in subsequent feature selection and machine-learning analyses. Key genes responsive to temperature stress were identified using three feature selection algorithms, with the top 50 genes extracted by each algorithm based on ranking. Nine genes were identified by all three feature selection approaches. The performance of two classification models was evaluated in classifying three classes (control, cold stress, and heat stress). The Bayes Net algorithm showed high discriminative accuracy; a TP Rate of 1, FP Rate of 0.21, and Precision of 0.980 were obtained for the control class; a Precision of 1 and a TP Rate of 0.958 were achieved for the cold-stress class; both accuracy and TP Rate were 1 for the heat-stress class. The Random Forest algorithm also demonstrated strong discriminative power. A correct classification rate of 1 and a Precision of 0.96 were observed for the control class; a correct classification rate of 0.958 and a Precision of 1 were obtained for the cold-stress and heat-stress classes, indicating robust performance in accurately identifying stressed samples.
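The overlap step, taking the top-ranked genes from each algorithm and keeping those common to all three, amounts to a set intersection. A minimal Python sketch follows; the function names, gene identifiers, and scores are hypothetical placeholders, and k is set to 2 only to keep the toy example small (the study used the top 50 per algorithm).

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring gene IDs from a {gene: score} dict."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def consensus_genes(score_tables, k):
    """Genes that appear in the top k of every ranking in score_tables."""
    top_sets = [top_k(scores, k) for scores in score_tables.values()]
    return set.intersection(*top_sets)

# Hypothetical scores from three filter methods for three placeholder genes.
score_tables = {
    "information_gain": {"gene_A": 0.9, "gene_B": 0.8, "gene_C": 0.1},
    "gain_ratio":       {"gene_A": 0.7, "gene_B": 0.2, "gene_C": 0.6},
    "relief":           {"gene_A": 0.5, "gene_B": 0.1, "gene_C": 0.4},
}
shared = consensus_genes(score_tables, k=2)
```

Only gene_A ranks in the top two under all three toy rankings, so it is the single consensus gene here. Requiring agreement across methods trades recall for robustness: a gene favored by only one scoring scheme is dropped.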
Conclusion: This study demonstrates that identifying and analyzing key genes involved in sorghum’s temperature stress response can provide insights into the biological pathways and regulatory networks activated under such conditions. Nine out of the 2,136 differentially expressed genes were consistently identified by three different selection algorithms. These genes can serve as potential molecular markers, but further biological validation is necessary across different sorghum varieties. The high accuracy of Bayes Net and Random Forest confirms the strength of these models in distinguishing gene expression patterns between stressed and control conditions. Homology analysis of genes, such as Sobic.004G283300, Sobic.010G100600, and Sobic.006G093500, in Arabidopsis and maize supports their role in heat stress response. However, six genes (Sobic.010G128900, Sobic.001G093100, Sobic.007G168100, Sobic.002G269100, Sobic.006G183701, and Sobic.002G047800) remain uncharacterized, with no documented molecular function. Further research is required to explore the roles of these genes in physiological and stress-related processes. Understanding their functions could contribute to breeding sorghum varieties that are more resilient to environmental stresses, ultimately supporting sustainable agriculture. Field-based and experimental validation of these molecular markers is also recommended to confirm their applicability under real-world farming conditions.

 

Type of Study: Research | Subject: Special
Received: 2025/03/12 | Accepted: 2025/07/22



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2026 CC BY-NC 4.0 | Journal of Crop Breeding
