Jiali Zhao · Huanhuan Xiong · Junhui Wang ·Hanguo Zhang · Lei Zhang
Abstract Larix olgensis A.Henry is a fast-growing tree used for afforestation in northeastern China and has great ecological and economic value. For studying developmental genes in the xylem of this species, we investigated the Myb transcription factor family, one of the largest families of transcription factors in plants, which plays an important role in the regulation of lignif ication in plant secondary walls. By sequencing a L. olgensis cDNA library using the Illumina HiSeq2500 high-throughput sequencing platform,we obtained 58,683 unigene sequences, of which 16,554 unigenes were longer than 1000 bp, accounting for 28.2%of the total database. The alignment of these genes with the GO, COG, KEGG, Swiss-Prot and NR databases resulted in annotated 29,350 unigenes. We obtained a total of 1460 differentially expressed genes, of which 453 were upregulated and 1007 were downregulated at the two developmental stages analyzed. The gene annotations showed a wide range of biological functions and metabolic pathways. The 10 Myb transcription factors that were obtained from the differentially expressed genes were analyzed by real-time quantitative PCR (qRT-PCR). The results showed that four Myb transcription factors may be associated with xylem development in L. olgensis. Due to the large genome size of conifers,genomics research on these species has lagged behind that for other plant groups. Our data provide the basis for further studies on xylem development in L. olgensis.
Keywords Larix olgensis · RNA-seq · Myb t ranscription factors · Differentially expressed genes
Larix olgensisA.Henry is a fast-growing tree species used for afforestation in northeastern China forest areas. Its economically valuable traits are thus an important focus for research and improvement. The use of conventional genetic engineering techniques withL. olgensiscould not overcome the problem of the long breeding cycle for conventional tree breeding, and many important economic traits could only be fully and effectively assessed during the rotation period.Coniferous tree genome sizes range from 6500 to 37,000 Mb due to a large number of rDNA repeats. The limitations of research techniques and insufficient funding have caused progress on coniferous genomics to be relatively slow compared to that of other species (Ahuja and Neale 2005).
Transcriptome sequencing, however, allows the sequencing of transcripts at any time or under any conditions, ref lecting the expression of protein-coding genes at specif ic time points or under specif ic conditions. It has been widely used in expression prof ile analyses, gene mining and function prediction, and metabolic pathway research in animals, plants, and microorganisms (Sekeroglu et al. 2016; Guo et al. 2017a, b, c; Chen et al. 2018;Gomez et al. 2018) and played an important role in elucidating gene expression patterns in both model and nonmodel plants (Correll et al. 2013; Guo et al. 2017a, b, c)and identifying genes related to resistance and biosynthesis, gene functions, and transcription factors and promoters(Wong et al. 2011; Qiu et al. 2016; Meng et al. 2018; Yu et al. 2018).
Since the release of the f irst EST sequence related to the formation of wood inPinus taedain 1998, high-throughput transcriptome data on conifers has grown signif icantly(Allona et al. 1998; Li et al. 2011, 2017; Zhang et al. 2012a,b). In the f ield of forestry, wood-related genes have been identif ied in Chinese f ir, Japanese cedar,Pinus radiata, andPinus canariensisusing transcriptome sequencing and analysis (Li et al. 2009; Huang et al. 2012; Mishima et al. 2014;Chano et al. 2017).
Such transcriptome analyses have also contributed to identifying transcription factors, which regulate functional genes (Feng et al. 2010; Guo et al. 2017a, b, c; Yang et al.2017), such as members of the Myb transcription factor family, one of the largest families of transcription factors in plants, known for their importance in regulating transcription. A network of transcription factors has also been shown to regulate secondary wall synthesis (Zhong et al. 2007a,b) and the phenylpropanoid metabolic pathways involved in lignin synthesis during secondary cell wall formation. InArabidopsis thaliana, MYB46 and MYB83 are part of the transcriptional network in the secondary wall developmental regulatory networks that are directly regulated by SECONDARY WALL-ASSOCIATED NAC DOMAIN PROTEIN1(SND1) in a primary network switch. WhenA. thalianaoverexpresses AtMYB46 and AtMYB83, downstream lignin-related genes are upregulated and specif ic expression of f iber and catheter-related genes is promoted, leading to cell wall thickening, thus suggesting their involvement in secondary wall synthesis (Zhong et al. 2007a, b; McCarthy et al. 2009). MYB58, MYB63, MYB85 are tertiary transcriptional regulators directly regulated by AtMYB46 and AtMYB83; and genes for these three regulators specif ically affect the synthesis of lignin without affecting the synthesis of components such as cellulose in the cell wall (Zhong et al.2008; Zhou et al. 2009). A number of Myb family transcription factors related to lignin synthesis, such as EgMYB2,PgMYB1, PtMYB1, PtMYB4, PtMYB8, PtrMYB 3, and PtrMYB20, have also been found in various tree species(Patzlaffet al. 2003a, b; Goicoechea et al. 2005; Bedon et al. 2007; McCarthy et al. 2010). In addition to promoting the synthesis of lignin, Myb transcription factors can also suppress it, as in the case of EgMYB1 and AtMYB75(Bhargava et al. 2010; Legay et al. 2010).
Lignin and cellulose are the main components of wood formation, and the key enzymes and genes in their synthetic pathways have been identif ied (Desprez et al. 2007; Vanholme et al. 2010). Because the regulation of biosynthetic genes can affect wood formation (Chiaiese et al. 2011), we are particularly interested in transcription factors related to the control of wood formation inL. olgensis. In this highthroughput sequencing analysis to lay a foundation for improving valuable traits inL. olgensis, we examined differences in gene expression during secondary wall thickening and at the end of wall thickening and identif ied Myb transcription factors related to wood formation.
FromL. olgensisin Maoer Mountain Forest Farm of Northeast Forestry University, Heilongjiang Province, China(44°29?—45°34?N, 127°17?—129°12?E), we collected lateral branches based on the phenological stages: during secondary wall thickening and the end of wall thickening. We peeled offthe bark and phloem, placed the cambium into liquid nitrogen and immediately stored samples at - 80 °C for later RNA extraction and qRT-PCR.
Total RNA was extracted from the cambium in two stages using a Universal Plant Total RNA Extraction Kit(BIOTEKE, Beijing, China) according to the manufacturer’s instructions. Total RNA was treated with DNase I (TaKaRa Bio, Shiga, Japan) to remove genomic DNA. High-quality RNA was extracted from three plants. Nanodrop, Qubit 2.0,and Agilent 2100 methods were used to detect the purity,concentration and integrity of RNA samples. The effective concentration of the library was quantif ied using qPCR.Illumina sequencing was performed using the HiSeq2500 platform, and the sequencing read length was PE125. The sample inspection, cDNA library construction, quality control, and sequencing were all completed by Biomarker Technology Company, Beijing, China.
Sequencing produced a large number of high-quality reads,with most having a base quality score that met or exceeded Q30. Any primer and adapter sequences were removed to ensure data quality. Clean reads were assembled using Trinity (Grabherr et al. 2011). First, shorter reads were extended to longer contigs. Unigenes and transcript sequences were obtained by clustering the contigs.
Unigenes were compared to GO, COG, KEGG, Swiss-Prot and NR databases to obtain functional annotations and classif ication information using BLAST (Altschul et al.1997). The BLAST E-value parameter was not greater than 10 -5 .
The reads were sequenced by Bowtie and compared with the unigene library (Langmead et al. 2009). FPKM (fragments per kilobase of transcript per million mapped reads), a commonly used gene expression level estimation method in transcriptome sequencing data analysis, was applied (Trapnell et al. 2010). FPKM can eliminate the inf luence of gene length and sequencing depth variation on gene expression.The calculated gene expression can be directly used to compare gene expression differences between different samples.DESeq was used for differential expression analysis between samples, and the Benjamini—Hochberg method was used to correct the signif icantpvalue obtained from the original hypothesis test, that is, adjust the false discovery rate (FDR). During the screening process, FDR < 0.01 and fold-change (FC) ≥ 2 were used as screening criteria, and the FC indicated the ratio of expression levels between two samples.
According to the results of transcriptome sequencing, Primer Premier 6.0 was used to design specif ic primers.α-tubulin was selected as a reference gene, and primers were synthesized by Sanon Biotech (Sangon Biotech, Shanghai, China).The total RNA extraction method was the same as the one used for the cDNA library. Reverse transcription of RNA into cDNA was performed according to the instructions of PrimeScriptTM RT reagent Kit with gDNA Eraser (Perfect Real Time) (TaKaRa Biotech, Dalian, China). The reaction system for qRT-PCR was 20 μL, containing 2 μL of cDNA solution, 0.4 μL of 50 × ROX reference dye, 1 μL of forward primer, 1 μL of reverse primer, and 10 μL of THUNDERBIRD SYBR qPCR Mix (TOYOBO, Shanghai, China) solution, with the remaining volume being ddH2O. Each sample
was run three times. The 2 -ΔΔCt (Bustin et al. 2009) method
was used to analyze the relative gene expression data.
Table 1 Summary statistics of sequencing data for L. olgensis based on the RNA-sequences
Table 2 Unigene annotation statistics of the transcriptome of L.olgensis using different databases
We obtained 101,592,425 clean reads after removing lowquality reads from the transcriptome sequencing results.The GC content of each sample ranged from 46.05 to 46.43%, and the percentage of Q30 bases in each sample was ≥ 87.88%. The clean sequence data were assembled with Trinity, and 58,683 unigenes were obtained. The N50 was 1490 bp, and the average length was 942 bp. Transcriptome sequencing resulted in a high degree of assembly integrity.There were 16,554 unigenes ≥ 1000 bp long, accounting for 28.2% of the total database (Table 1 ).
Unigenes were aligned by BLASTx for the annotation analysis using the Gene Ontology (GO) database, Clusters of Orthologous Groups of proteins (COG, NCBI) database,Kyoto Encyclopedia of Genes and Genomes (KEGG) database, Swiss-Prot database, and non-redundant (nr, NCBI)protein database (Ashburner et al. 2000; Tatusov et al.2000; Apweiler et al. 2004; Kanehisa et al. 2004; Deng et al. 2006). The project selected the BLAST parameters that were ≤ 10 -5 . In total, 29,350 unigenes were obtained in the f ive databases. The f inal gene annotation statistics are shown in Table 2.
To study the differences in gene expression during secondary wall thickening and the end of wall thickening, we first identified the differentially expressed genes (DEGs).The R package DESeq was used to analyze differential expression between samples, and the significance of DEGs bypvalue (FDR < 0.01, fold-change ≥ 2) was used to correct the differential expression analysis (Anders and Huber 2010). The comparison of the two stages showed that there were 1460 DEGs, of which 453 genes were upregulated, and 1007 genes were downregulated.
To describe the role of these genes in the development of the xylem, we carried out GO function annotation and method enrichment analysis for these DEGs. According to the results of GO enrichment, 948 (64.93%) DEGs were grouped into cellular component, molecular function,and biochemical process. These categories were divided into 53 functional genes, some of which were involved in multiple regulatory processes simultaneously. The results showed that a high percentage of genes belonged in the categories metabolic process, cellular process, catalytic activity, binding, cell part, and organelle, while only a few genes were related to extracellular matrix and receptor activity (Fig. 1).
We used the KEGG database to analyze the differentially expressed genes between the two stages. We obtained a total of 175 DEGs categorized in 81 pathways.Figure 2 shows the main metabolic pathways, and the top 10 metabolic pathways are listed in Table 3. Among them,the three pathways with the largest numbers of DEGs were starch and sucrose metabolism (22 DEGs, 12.6%),plant hormone signal transduction (17 DEGs, 9.71%), and protein processing in endoplasmic reticulum (17 DEGs,9.71%).
We chose α-tubulin as a housekeeping gene for our analysis.RNA was extracted fromL. olgensisstems at 60 days (stems not lignif ied) and 120 days (stems lignif ied). The reverse transcribed cDNA was used as a template for validation. The results showed that the expression of 10 genes encoding Myb transcription factors was downregulated. The gene expression at 60 days (stems not lignif ied) was higher than that at 120 days(stems lignif ied). Among these 10 genes, c68153.graph_c0 and c71764.graph_c0 were downregulated over twofold (Fig. 3 a).To further understand the relationship between Myb transcription factors and stem lignif ication inL. olgensis, we performed qRT-PCR analysis of expression levels at different tissue during the same period. Four Myb transcription factor genes were expressed more highly in the stem and root than in the leaves. The expression levels of c68153.graph_c0 and c71764.graph_c0 in the stem were both higher than in the leaves and roots in the two stages. These are likely to be the transcription factors associated with the xylem of the stem. The expression of c59059.graph_c0 and c81229.graph_c0 in the root was higher than that in the stem and leaf in both periods. Because lignif ication also occurs during root development, these two genes might also be involved in the synthesis of transcription factors associated with xylem. We predict that these four Myb transcription factors might play roles in the synthesis of the secondary wall (Fig. 3 b. c). The regulation of these genes is a topic of further study.
Fig. 1 GO functional classif ications of differentially expressed genes
Fig. 2 Main KEGG pathway classif ication map
Table 3 Top 10 KEGG pathways of DEGs
Fig. 3 a qRT-PCR analysis of 10 Myb transcription factors. b Expression levels of four Myb transcription factors in roots, stems and leaves of 60 days (stems not lignif ied). c Expression level of four Myb transcription factors in roots, stems and leaves at 120 days.The expression levels of Myb transcription factors in 60-d and 120-d leaves were set to 1, respectively
In the present study, using transcriptome sequencing technology, we generated 5.62 Gb of clean sequence data for each sample in two stages, and the percentage of Q30 bases was above 87.88%. A total of 58,683 unigenes were obtained after de novo assembly. Functional annotation of unigenes,including comparison with NR, Swiss-Prot, KEGG, COG,and GO databases, resulted in 29,350 annotation results. The number of unigenes inL. olgensiswas higher than in marine pine (55,322) and loblolly pine (52,112). However, the number of annotated unigenes was lower than obtained for marine pine (32,919) and loblolly pine (30,844) (Fernández-Pozo et al. 2011; Visser et al. 2015). Because genomics data for conifers is limited and the database is incomplete, not all genes could be annotated. Moreover, there likely are new species-specif ic genes when unigene populations are compared between different species. Through high-throughput sequencing ofL. olgensis, 16,554 unigenes with a length of 1 kb or more were obtained. Some of them had a complete coding region, which could greatly aid in the discovery of important genes expressed during the growth and development ofL. olgensis.
The demand for wood materials is growing, and biological research on controlling plant biomass and reducing lignin has focused on the wood synthetic pathway (Vanholme et al.2008). Conifer species provide good raw materials for industrial materials, pulp, and papermaking, and the identif ication of wood quality genes plays an important role in genetic improvement. Myb transcription factors are widely involved in biological processes such as plant growth, secondary metabolism and abiotic stress (Lee and Schiefelbein 2002;Machado et al. 2009; Seo et al. 2009; Zhang et al. 2012a, b;Li et al. 2015). Currently, the lignin biosynthesis pathway has been clarif ied in the model speciesArabidopsis thaliana,and multiple Myb transcription factors have been reported to be correlated with the synthesis of lignin (Humphreys and Chapple 2002; Chai et al. 2014). In this study, we found that 10 DEGs encoding Myb transcription factors were signif icantly downregulated at the end of lignif ication. To further understand their relationship with stem lignif ication, we performed qRT-PCR analysis of their expression in different tissues. The results showed that the expression levels of c68153.graph_c0 and c71764.graph_c0 in stems were higher than in leaves and roots, which is consistent with other research results (McCarthy et al. 2009; Zhou et al. 2009).The expression of c59059.graph_c0 and c81229.graph_c0 in the root was higher than that in the leaf and stem. Studies have shown that Myb transcription factors have a moderating effect on lignif ication in roots (Gibbs et al. 2014).
Members of the Myb family share a conserved domain(Dubos et al. 2010). The unigene PtrMYB152 is closely related to that ofArabidopsisMyb, which is involved in lignin synthesis (Li et al. 2014). This result suggests that homologous genes may have conserved functions in transcriptional regulatory networks in different species. These results showed that that the domains of homologous transcriptional regulatory network genes were highly conserved.c68153. graph_c0 was the most closely related to the lignin synthesis-related transcription factor PgMYB8 inPicea glauca. PgMYB8 was expressed in the secondary xylem of stem and roots of young plants and mature trees; c71764.graph_c0 and PgMYB7 were the most closely related, and PgMYB7 was expressed in the secondary xylem (Bedon et al. 2007). Therefore, c68153.graph_c0 and c71764.graph_c0, which are homologous to PgMYB8 and gMYB7,are regulatory genes with Myb structure domains and likely play a role in lignin synthesis. Among the DEGs of Myb transcription factors, c45178.graph_c0, c74635.graph_c0,c81229.graph_c0, c59059.graph_c0, c80275.graph_c0 were annotated as AtMYB44, AtMYB39, AtMYB98, AtMYB82,and AtMYB3, respectively, but their function remains to be studied.
Larix olgensisis important for building materials and wood pulp, and genetic improvement is expected be instrumental in alleviating wood shortages. Because its genome is very large, genomics research in this species lags behind that of model plants such as broad-leaved tree species. However, as new sequencing and molecular biology technologies enable the study of theL. olgensistranscriptome, identifying gene function is becoming increasingly feasible. Many of the genes associated with lignin synthesis in plants have been successfully cloned (Hu et al. 2014; Liu et al. 2015;Kasirajan et al. 2017). The regulatory networks between these genes and transcription factors will be improved. In this study, we obtained the full-length cDNA sequence of Myb transcription factors and lignin synthesis-related genes. Although the resources are limited, the identif ication of these genes will provide a basis for the study of the molecular mechanisms involved in xylem formation and accelerate the molecular breeding process. Clarifying these mechanisms is the focus of future studies.
Journal of Forestry Research2020年6期