1- Department of Genetic Engineering and Biology, Tabarestan Agricultural Genetics and Biotechnology Research Institute, Sari University of Agricultural Sciences and Natural Resources, Sari, Iran
2- Plant Production Department, Gonbad Faculty of Agricultural Sciences and Natural Resources, Gonbad Kavos University, Gorgan, Iran
Abstract:
Introduction: Camelina (Camelina sativa), commonly known as “false flax” or “wild flax”, is one of the oldest plants in the Brassicaceae family. It is a valuable source of high-quality plant oil with great potential for use in various industries. Like other plant species, Camelina encounters numerous environmental and non-environmental stresses during its vegetative growth stage. Investigating the gene regulatory patterns and gene networks involved in signaling and stress responses can improve our understanding of the mechanisms of stress tolerance in this plant. One of the transcription factor families that contributes to appropriate responses to environmental stresses is the heat shock factor (HSF) family. HSFs regulate the expression of heat shock proteins (HSPs), whose accumulation is generally essential for the survival of cells exposed to various environmental stresses. However, no bioinformatic characterization of the HSF genes in Camelina has been reported. Therefore, the present study aimed to identify the HSF genes of Camelina and to analyze their gene structures, conserved motifs and domains, and the three-dimensional structures of the encoded proteins.
Materials and methods: To identify and analyze the HSF gene family in Camelina sativa, genomic and protein sequence resources were obtained from the NCBI database. First, a tBLASTN search was performed against the Camelina genome using HSF protein sequences from the model plant Arabidopsis as queries. After redundant sequences were removed, the presence of the specific domains was confirmed using InterProScan. Domain searches were also conducted against the C. sativa proteome with HMMER v3.0, using the HSF HMM profile (PF00447) obtained from the Pfam database. Duplicate records were merged, and incomplete sequences were excluded. To identify authentic members of the HSF gene family in Camelina sativa, a phylogenetic analysis was conducted using Arabidopsis thaliana HSF sequences as references. This analysis distinguished true HSF proteins from proteins that merely contain the HSF-type winged-helix DNA-binding domain, ensuring precise classification of HSF family members. Gene structure and physicochemical properties were analyzed using TBtools and ProtParam, respectively, while conserved motifs and subcellular localization were predicted using MEME and WoLF PSORT, respectively. Phylogenetic analysis was conducted in MEGA 12.0 using the maximum-likelihood method with bootstrap testing. Three-dimensional structures were predicted with Phyre2, and protein–protein interactions were analyzed using the STRING database.
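The HMMER screening and isoform-collapsing steps described above can be illustrated with a minimal Python sketch. It assumes the search results were saved in HMMER's tabular `--tblout` format; the E-value cutoff, the protein-to-gene mapping, the rule of keeping the best-scoring isoform per locus, and all identifiers in the toy data are hypothetical choices for illustration and are not taken from the study.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    protein_id: str
    evalue: float
    score: float


def parse_tblout(text: str) -> List[Hit]:
    """Parse the text of an hmmsearch --tblout file into Hit records."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Column 0: target name; column 4: full-sequence E-value;
        # column 5: full-sequence bit score.
        hits.append(Hit(fields[0], float(fields[4]), float(fields[5])))
    return hits


def best_hit_per_locus(
    hits: List[Hit],
    protein_to_gene: Dict[str, str],
    max_evalue: float = 1e-5,  # assumed cutoff, not stated in the abstract
) -> Dict[str, Hit]:
    """Keep one isoform per gene locus: the hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        # Proteins missing from the mapping are treated as their own locus.
        gene = protein_to_gene.get(hit.protein_id, hit.protein_id)
        if gene not in best or hit.score > best[gene].score:
            best[gene] = hit
    return best


# Toy data (made up): two isoforms of one gene plus one weak hit.
TOY_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  ...
CsProtA.1 - HSF_DNA-bind PF00447 1.0e-30 100.5 0.1 2.0e-30 99.8 0.1 1.0 1 0 0 1 1 1 1 toy record
CsProtA.2 - HSF_DNA-bind PF00447 1.0e-25 85.0 0.1 2.0e-25 84.1 0.1 1.0 1 0 0 1 1 1 1 toy record
CsProtB.1 - HSF_DNA-bind PF00447 0.5 10.0 0.0 0.7 9.5 0.0 1.0 1 0 0 1 1 0 0 toy record
"""
TOY_GENE_MAP = {"CsProtA.1": "geneA", "CsProtA.2": "geneA", "CsProtB.1": "geneB"}

if __name__ == "__main__":
    loci = best_hit_per_locus(parse_tblout(TOY_TBLOUT), TOY_GENE_MAP)
    for gene, hit in loci.items():
        print(gene, hit.protein_id, hit.evalue, hit.score)
```

In practice the protein-to-gene mapping would come from the genome annotation, and other rules for choosing a representative isoform (for example, the longest protein) are equally plausible.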
Results: The HMMER search with the HSF profile (PF00447) identified a total of 137 protein isoforms containing the HSF domain in C. sativa. After redundant isoforms were removed, these corresponded to 96 gene loci. Phylogenetic analysis of these sequences alongside 21 HSF family proteins from A. thaliana ultimately confirmed 64 protein sequences as members of the HSF gene family in Camelina. Structural and physicochemical analysis showed that these proteins ranged from 146 to 496 amino acids in length, with predicted molecular weights between 17.1 kDa and 55.2 kDa. The isoelectric points varied from 4.7 to 10.2, and the aliphatic index ranged from 58.28 to 76.54. An instability index above 40 was observed in 83% of the sequences, suggesting that these proteins may have lower stability under laboratory or intracellular conditions and are likely to degrade more rapidly. Furthermore, the negative GRAVY values of all CsHSF proteins indicate their hydrophilic nature. Subcellular localization analysis predicted that CsHSF proteins are located predominantly in the nucleus, followed by the cytoplasm. Based on the phylogenetic tree, the CsHSF proteins were classified into three distinct major groups. Gene structure analysis of these groups, based on the presence or absence of introns, showed that, except for six intronless genes, the HSF genes contain introns with more than one phase. The variation in intron number and phase in some Camelina HSF genes may be due to intron loss or gain during evolution, which likely contributes to the functional diversification of these proteins. Comparison of phylogenetic grouping with exon number showed that all groups had a similar distribution of exon counts. While protein length and molecular weight varied significantly across groups, the majority of proteins in all groups contained only 2 or 3 exons.
A total of 15 conserved motifs, ranging from 8 to 50 amino acids in length, were identified across all these proteins. Secondary structure analysis showed that group I proteins predominantly featured regular and continuous alpha-helices. Group II proteins exhibited more diverse secondary structures, including more prominent beta-sheets, extended coils, and a higher proportion of disordered regions. Proteins in group III displayed a combination of structural features from both groups I and II. Furthermore, interaction analysis indicated coordinated behavior among these proteins in response to various environmental conditions.
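Two of the physicochemical indices reported in the Results, GRAVY and the aliphatic index, have simple closed-form definitions, and the following minimal Python sketch shows how they can be computed from a protein sequence. It uses the standard Kyte-Doolittle hydropathy scale and the standard aliphatic index formula (Ala + 2.9·Val + 3.9·(Ile + Leu), in mole percent); the function names and example sequences are illustrative assumptions and do not reproduce the study's ProtParam runs.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Negative values indicate an overall hydrophilic protein.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: relative volume occupied by A, V, I and L side chains."""
    seq = sequence.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    toy = "MKRSDEELLK"  # made-up peptide, not a CsHSF sequence
    print(f"GRAVY = {gravy(toy):.3f}, aliphatic index = {aliphatic_index(toy):.2f}")
```

The instability index is omitted here because it depends on a lookup table of dipeptide weights that is too long to reproduce in a short example.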
Conclusion: Bioinformatic identification and analysis of the HSF gene family in Camelina sativa led to the identification of 64 members with diverse structural and functional characteristics. The relatively high number of family members compared to other species is likely associated with Camelina’s allohexaploid nature, as its genome consists of three distinct genomes. Gene structure analysis revealed the presence of the conserved DNA-binding domain (DBD) and HR-A/B motifs in all members, consistent with a central role for these proteins in responding to environmental stress. Given Camelina’s significance as a stress-tolerant oilseed crop, identifying these genes can pave the way for breeding programs aimed at developing cultivars with enhanced tolerance to environmental stresses.
Type of Study: Research | Subject: Other
Received: 2025/07/14 | Accepted: 2025/10/25