Keywords
DNA methylation, Methylation profiling, Whole-Genome Bisulfite Sequencing (WGBS), Enzymatic methyl-seq (EM-seq), Bisulfite, Cytosine, Uracil
This article is included in the Genomics and Genetics gateway.
This article is included in the Bioinformatics gateway.
This article is included in the Bioinformatics Education and Training Collection collection.
DNA methylation, Methylation profiling, Whole-Genome Bisulfite Sequencing (WGBS), Enzymatic methyl-seq (EM-seq), Bisulfite, Cytosine, Uracil
DNA methylation is a key epigenetic regulator of many biological processes, including transposon activation, heterosis, biotic and abiotic stress response, development, and reproduction (Moore et al., 2013). Whole-genome DNA methylation profiling (WGDM) is among the most efficient techniques available for the detection of DNA methylation, providing >90% single-nucleotide-based genome-wide coverage for cytosines followed by guanine (CpG) residues (Chatterjee et al., 2017). WGDM is similar to the traditional whole-genome sequencing technique with additional steps of cytosine conversion to uracil (Plongthongkum et al., 2014; Li et al., 2018). Of deployed cytosine conversion techniques, sodium bisulfite (SB) conversion is a gold standard process that utilizes single-base-level chemical treatment to convert cytosine to uracil while leaving 5-methylcytosine (5-mC) unaffected prior to PCR enrichment (Figure 1). During PCR amplification, uracil is converted to thiamine, allowing researchers to perform high-throughput sequencing to determine the average methylation level in a sample (Feng et al., 2020). Although considered a gold standard, several limitations can arise through SB treatment, such as low conversion efficiency (failed conversion of non-methylated cytosines to uracil) and DNA degradation. These can impair the efficiency of downstream steps such as PCR amplification, library yield, and false detection of methylated cytosine (Hernández et al., 2013; Iurlaro et al., 2016; Tierling et al., 2018).
An approach known as enzymatic methyl-seq (EM) conversion has recently been developed to circumvent the limitations of SB conversion (Vaisvila et al., 2021). In the first step of this method, the 5-mC is converted to 5-hydroxymethylcytosine (5-hmC), then 5-formylcytosine (5-fC), and eventually to 5-carboxylcytosine (5-caC) using the ten-eleven translocation (TET) enzyme, which protects the methylated cytosines to be converted in downstream steps (Ito et al., 2011). In the second reaction, an enzyme known as apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3A (APOBEC3A) deaminates non-methylated cytosines into uracil (Figure 1) (Wijesinghe and Bhagwat, 2012; Schutsky et al., 2017). The final sequences containing methylated cytosines remain cytosines, while non-methylated cytosines appear as thymine (like in SB conversion-mediated methylation detection). Therefore, the same analytical methods can be used for both EM- and SB-converted DNA (Feng et al., 2020).
Recent data indicate EM conversion provides superior results compared to SB treatment. In a current research, Feng et al., (2020) investigated DNA methylation in Arabidopsis thaliana using the enzymatic methylation sequencing (EM-seq) method and compared their findings to whole-genome bisulfite sequencing (WGBS) data. They reported EM-seq provided better data than that of WGBS. Specifically, libraries prepared with EM-seq kits had higher mapping and lower duplication rates, lower background noise, higher average coverage, and higher total cytosine coverage between replicates (Feng et al., 2020). In another study using human gDNA, similar results were obtained where EM-seq libraries outscored bisulfite-converted libraries in most of the examined parameters, including coverage, duplication, sensitivity, and nucleotide composition (Vaisvila et al., 2021). Additionally, even with extremely low levels of DNA (100 pg), EM-seq proved to be a more reliable method for detecting methylation state than WBGS, which requires 10–200 ng DNA input (Vaisvila et al., 2021).
However, optimized sample preparation protocols are only available for a handful of species and are missing for many important crop plants. In addition, most existing benchmark methods have been performed for mammalian genomes, which are different from plant genomes (e.g., plant genomes are enriched with repetitive elements, as well as CHG and CHH methylation). Therefore, in this study, we have compared two commercial SB conversion kits: Zymo Conversion kit and QIAGEN Conversion kit with an EM conversion kit, NEB EM-seq kit, using soybean (Glycine max [L.] Merrill) DNA samples. Further, we have benchmarked the use of these three kits with the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB) for sample preparation for high-throughput sequencing.
Seeds of soybean genotype Williams 82 were sown in a Miracle-Gro Moisture Control potting medium and grown at 25°C, 16 h d−1 light at 230 to 365 μM m−2 s−1, and 90% relative humidity in a growth chamber. At the V2 developmental stage (Fehr and Caviness, 1977), a trifoliate leaf was collected for DNA isolation (Figure 2).
A detailed protocol can be found on protocols.io. Genomic DNA (gDNA) was isolated using the Zymo Research Quick-DNA Plant/Seed Miniprep kit (Cat #D6020; Irvine, CA, USA). DNA concentration was measured using a Qubit fluorometer (Thermo Fisher Scientific; Waltham, MA, USA) coupled with a Thermo Fisher Scientific dsDNA High Sensitivity Assay kit (Cat #Q32851). DNA purity was assessed from 260/230 and 260/280 nm absorbance ratios using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific). Furthermore, the quality and size of obtained DNA was determined using gel electrophoresis (Fisher Scientific gel casting and electrophoresis apparatus (Cat #FB-SB-1316); 1% agarose gel with 1X Tris acetate-EDTA buffer (ThermoFisher; Cat # B49). Nucleic Acid was stained with Biotium GelRed Nucleic Acid Gel Stain (Fisher Scientific; Cat # NC9594719) and gel imaging was performed using LiCOR ODYSSEY imager (Serial # OFC1321) under a 520 nm channel. As a non-methylated control, the soybean gDNA was spiked with Escherichia coli gDNA (Zymo Research, Cat #D5016) representing 1% of the total gDNA input. The spiked sample was then separated into three aliquots (each 65 μL), which were sheared using Qsonica sonicators (Newtown, CT, USA). Sonication settings entailed 15 sec on/90 sec off for 8 cycles at a 20 kHz frequency, resulting in DNA fragments of approximately 350 bp. Sheared aliquots were used for library preparation, as described below.
Sequencing libraries were prepared using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (New England Biolabs; Cat #E7645S; Ipswich, MA, USA). A total of 200 ng (for NEB EM-seq kit) or 1 μg (Zymo and QIAGEN Conversion kits) of starting DNA was used based on the manufacturer’s instructions (Figure 2). The sheared DNA fragments were repaired, and Illumina Methylated Adaptors were ligated using NEBNext® Multiplex Oligos for Illumina (NEB Cat #E7535S/L). Adapter-ligated fragments were then purified using Solid Phase Reversible Immobilization (SPRI) magnetic beads (Beckman Coulter Inc.; Cat #B23317; Brea, CA, USA). For the cytosine conversion, the adapter-ligated, purified DNA samples were processed with either a Zymo Conversion kit (EZ DNA Methylation™ Kit, Zymo Research; Cat #D5001), QIAGEN Conversion kit (EpiTect Bisulfite Kit, QIAGEN; Cat #59104), or NEB EM-seq kit (NEBNext® Enzymatic Methyl-seq Conversion Module, New England Biolab; Cat #E7125S/L) (Figure 2).
For SB conversion using the Zymo Conversion kit, DNA was incubated with CT Conversion Reagent (Zymo Research; Cat #D5001-1) (Table 3 and Figure 1) in a thermocycler for approximately 16 hrs. The converted DNA samples were then desulphonated using M-Desulphonation Buffer (Zymo Research; Cat #D5001-5). For SB conversion using the QIAGEN Conversion kit, the sample was incubated in a thermocycler for approximately 5 hrs with the Bisulfite mix (Table 2 and Figure 1) and the DNA protection buffer provided in the kit. Converted DNA samples were then treated with Buffer BD from the kit for desulphonation. For EM conversion, the sample was first treated with TET2 (NEB Cat #E7130AVIAL; part of the NEBNext® Enzymatic Methyl-seq Conversion Module) and APOBEC (NEB Cat #E7133AVIAL) (Table 1 and Figure 1) enzymes in a series of steps, which ultimately converted cytosines to uracil, allowing for the detection of 5 mC and 5 hMC.
Converted samples were purified using Beckman Coulter SPRIselect magnetic beads (Cat #B23317). The barcoding of converted DNA fragments was carried out with NEBNext® Multiplex Oligos for Illumina (Methylated Adaptor, Index Primers Set 1) (NEB Cat #E7535S/L) using PCR (16 cycles) with EpiMark® Hot Start Taq DNA Polymerase (NEB Cat #M0490S) following manufacturer instructions (more details can be found on protocols.io).
After amplification, prepared libraries were purified using SPRIselect magnetic beads (Cat #B23317). The concentration of prepared libraries was measured using the Thermo Fisher Scientific dsDNA High Sensitivity Assay kit (Cat #Q32851) with an Invitrogen Qubit Fluorometer as well as a NanoDrop ND-1000 spectrophotometer (Cat# E1123552; Thermo Fisher Scientific). Prepared libraries were then sent to Novogene Corporation Inc. (Sacramento, CA, USA) for downstream quality assessment and whole-genome sequencing. Fragment size and concentration were validated with an Agilent Bioanalyzer (Agilent Technologies; Santa Clara, CA, USA). Libraries were sequenced on an Illumina Hi-Seq sequencer to obtain 150 bp paired-end reads.
The quality of raw reads obtained from the three samples was assessed using FastQC (RRID:SCR_014583) (v.0.11.8) (Andrews, 2010) and MultiQC (RRID:SCR_014982) (v1.9) (Ewels et al., 2016). Adapter sequences and poor-quality bases (Phred <20) were removed using Trim Galore (RRID:SCR_011847) (v0.4.2) (Bolger et al., 2014). The pre-processed reads were then aligned to the soybean reference genome (Gmax_275_v2.0) obtained from Phytozome (Goodstein et al., 2012) using Bismark (RRID:SCR_005604) (v.0.22.3) aligner with default parameters (Krueger and Andrews, 2011). After removing duplicate reads, Bismark (v.0.22.3) (Krueger and Andrews, 2011) was used to detect cytosine methylation at a single-base resolution. Cytosine methylation levels at each region of the sequence were calculated as the number of methylated C vs. total C and T present. To estimate and evaluate false-positive methylation levels and efficiency of cytosine deamination, reads were aligned to the non-methylated controls, soybean chloroplast genome (NC_007942.1), and E. coli non-methylated genomic DNA sequences. Data were visualized with RStudio (RRID:SCR_000432) (v1.1.463) (R Core Team, 2020) implementing the ggplot2 package (RRID:SCR_014601) (v.3.3.5) (Wickham, 2016). The code is available from GitHub and is archived with Zenodo (Wijeratne, 2022).
Despite the utility and growing popularity of WGBS for epigenome analysis in crop plants, relatively little has been done to identify various factors affecting sample preparation and data quality. Cytosine conversion is a crucial step of this process, as partial conversion can lead to false methylation calls. In this survey experiment, we used soybean DNA to prepare samples using three different cytosine conversion protocols. We have evaluated their ease of use, DNA loss during the conversion step, and subsequent data quality. These evaluations shed light on the performance of these three kits and their use for preparing samples to analyze crop plant epigenomes.
Essential factors to consider when selecting a kit are the final yield of libraries and the ease of the protocol. Based on the time required for cytosine conversion steps, the QIAGEN Conversion kit was the fastest method to prepare the whole library (≈ 24 hrs), followed by the NEB EM-seq kit (≈ 26 hrs) and Zymo Conversion kit (≈ 35 hrs). When evaluating the cytosine conversion step, the Zymo Conversion kit needed the longest time of ≈ 16 hrs. However, the QIAGEN Conversion kit and the NEB EM-seq kit required a total of ≈ 5 hrs, which included the post-conversion cleanup step (Tables 1-3).
EZ DNA Methylation Kit (Zymo Research) | ||||
---|---|---|---|---|
Step | Time | Temperature | Cycles | Total time required for cytosine conversion |
Denaturation | 30 sec | 95°C | 55 | ≈ 16 hr |
Bisulfite conversion | 15 min | 50°C | ||
Hold | ∞ | 4°C |
Following cytosine conversion, the highest recovery of DNA was observed with the QIAGEN conversion kit (100%), followed by the Zymo Conversion (20%) and NEB EM-seq (7.40%) kits (Figure 3). Previously, researchers noted factors that resulted in the reduction of libraries while using SB-based cytosine conversion. In a study conducted by Tanaka and Okamoto (2007), a real-time PCR experiment indicated that DNA degradation during SB conversion occurs primarily due to the hydrolytic reaction or the prolonged exposure to the SB. Additionally, prolonged exposure to SB results in the formation of abasic sites and subsequent DNA strand damage (Tanaka and Okamoto, 2007). An abasic site is a region of DNA that lacks both purine and pyrimidine bases due to DNA damage formed by spontaneous hydrolysis of the N-glycosidic bond (Boiteux and Guillet, 2004). According to the manufacturer's manuals for SB-based conversion kits, DNA is denatured and fragmented before or during cytosine conversion in the thermocycler to chemically convert non-methylated cytosines into uracil. Thus, the prolonged exposure to SB during denaturation, as well as DNA fragmentation, likely contributed to the loss of small DNA fragments during the downstream purification steps, reducing library yield. Conversely, high recovery of DNA when using the QIAGEN conversion kit may be attributed to the presence of DNA protection buffer in the kit (Izzi et al., 2014; Leontiou et al., 2015; Tierling et al., 2018). The drastic loss of DNA with the NEB EM-seq kit may be due to comparatively more purification steps, increasing the chances of DNA loss due to pipetting.
Prepared samples were sequenced to see if the cytosine conversion method had any effect on the sequencing. The initial quality of sequencing data was analyzed based on the following criteria: the total number of sequencing reads obtained, total coverage (X), duplication rate (%), Phred quality score, Per-base sequence, and GC content (%) in the sample. These parameters indicated no major differences among the three samples as discussed below. The number of raw sequencing reads per sample ranged from 33–50 million, with coverage ranging from 20X to 30X (Figure 3). The Phred quality score of each data set was found to be excellent, ranging between 31 to 40 (Figure 4). Based on the per-base sequence plot, thymine was the most abundant base in each library with 50% of all four bases, whereas cytosine was the least abundant (Figure 5). GC content was similar between samples, with the Zymo Research library having the highest level, followed by the QIAGEN and NEB kits (Figure 3). As expected, the per-sequence GC content for each sample was biased due to cytosine conversion into thiamine during PCR amplification. Thus, our data suggest that the cytosine conversion method has little effect on the initial sequence quality.
Nevertheless, initial data quality may not reflect data utility. Therefore, we mapped reads to the soybean genome to see the mapping efficiency (defined as how many reads can be aligned to the genome proportional to the total reads). Reads generated from all three methods had similar mapping efficiencies. The NEB kit library had the highest mapping efficiency (74.4%), followed by the QIAGEN kit (72.5%) and the Zymo Research kit (70.8%) (Figure 3). The observed mapping efficiency (>70%) is comparable with previously reported efficiencies for cytosine-converted samples (≈ 60–75%) (Hari and Parthasarathy, 2019). These results implied that sequences obtained from all three methods could be aligned correctly to the soybean genome.
We also calculated the duplication rate for each sample, as a higher rate of duplication distorts the true sequence proportion in a given sample. Duplication rates between 13 and 15% were detected for each library (Figure 3), which is lower than previously reported duplication rates during soybean methylome profiling (Rambani et al., 2015). Duplication can occur for several reasons. For instance, biased PCR enrichment and overamplification of DNA fragments can result in an overrepresentation of library fragments. Duplication can also occur when an identical template binds to numerous clusters on a flow cell. Duplicates of PCR products distort the actual proportion of sequences in the sample (please see more details on the Babraham Institute website).
It is crucial to analyze the cytosine conversion efficiency during DNA methylation analysis. If a library has low conversion efficiency, it can result in false methylation detection (Singer, 2019). Adding methylated, non-methylated, or both types of genomic controls in a sample can help identify these limitations presented by bisulfite-converted DNA. Here, the cytosine conversion efficiency was calculated using reads aligned to non-methylated E. coli and chloroplast gDNA for each sample. According to these alignments, the bisulfite conversion efficiency of samples ranged between 90 and 99.8% (Figure 3). QIAGEN and Zymo Conversion kits had a similar conversion efficiency of roughly 99% (Figure 3). Similar results were reported for both bisulfite kits in a study conducted on human chromosomal DNA (Tierling et al., 2018). For the NEB EM-seq kit, the cytosine conversion efficiency was found to be 90.9%, around 9% lower than the data obtained from a published study (Figure 3) (Feng et al., 2020). The efficiency and accuracy of a conversion kit are dependent primarily on the PCR cycling procedure, conditions, the length of the desulphonation process, and the addition of reagents to prevent DNA degradation (Izzi et al., 2014; Tierling et al., 2018). In this case, the QIAGEN kit included a DNA protection buffer containing a pH indicator dye to ensure the correct pH for cytosine conversion. Alternatively, the Zymo Research kit protocol distinguishes the alkaline or denaturation step from the conversion step (Izzi et al., 2014). However, it is unclear what contributed to poor cytosine conversion efficiency in the NEB EM-seq kit. It is possible that further optimization of the input DNA quantity is necessary for this kit when it is used with the NEB library preparation kit.
Furthermore, the data were aligned to the soybean reference genome to detect the number of methylated cytosines in each library. The highest number of methylated cytosines in all three contexts (i.e., CpG, CHG, and CHH) was found in the library prepared with the NEB EM-seq kit, with around 1.295 × 109 total methylated cytosines, followed by 1.169 × 109 in the QIAGEN library and 703 million in the Zymo Research library. Moreover, the QIAGEN and Zymo Research libraries displayed a similar distribution of methylated cytosine across contexts. The NEB library showed a distinct distribution compared to the other kits, with an increase in methylated cytosines at CHH context (Figure 6). This can be attributed to the comparatively poor cytosine conversion efficiency, which may have resulted in a high number of erroneously methylated cytosines (Simpson et al., 2017). As expected, the majority of cytosines in each library were methylated in CpG context, followed by CHG and CHH (Figure 6). These data are congruent with a past soybean DNA methylation study, which revealed the greatest levels of methylation in CpG context, followed by CHG and CHH contexts, under normal developmental conditions (Song et al., 2013).
The current work is the first to compare WGBS and EM-seq to analyze DNA methylation in the soybean genome. Here, a survey study was conducted deploying three commercially available DNA methylation detection kits. The results obtained suggested that the QIAGEN kit provided the best DNA yield and number of sequencing reads. The Zymo Research kit demonstrated the best cytosine conversion efficiency and a low duplication rate. Nevertheless, we recommend the NEBNext® Enzymatic Methyl-seq Conversion Module, which was superior to the other kits based upon required DNA input, quality of generated libraries, and hands-on time. Follow-up studies with biological/technical replication are needed to validate the reproducibility and efficiency of the defined conversion kits.
Raw sequencing data are available as follows:
BioProject: Glycine max cultivar: Williams 82 (soybean). Accession number PRJNA902392; https://identifiers.org/bioproject:PRJNA902392 (Arkansas State University, 2022a).
Sequence Read Archive: Evaluation of cytosine conversion methods for whole-genome DNA methylation profiling (Run: SRR22331514). Accession number SRX18304416; https://identifiers.org/insdc.sra:SRX18304416 (Arkansas State University, 2022b).
Sequence Read Archive: Evaluation of cytosine conversion methods for whole-genome DNA methylation profiling (Run: SRR22331515). Accession number SRX18304415; https://identifiers.org/insdc.sra:SRX18304415 (Arkansas State University, 2022c).
Sequence Read Archive: Evaluation of cytosine conversion methods for whole-genome DNA methylation profiling (Run: SRR22331516). Accession number SRX18304414; https://identifiers.org/insdc.sra:SRX18304414 (Arkansas State University, 2022d).
Analysis code available from: https://github.com/ajwije/DNA_methylation_analysis.git
Archived analysis code at time of publication: https://doi.org/10.5281/zenodo.7328525 (Wijeratne, 2022).
License: MIT
Views | Downloads | |
---|---|---|
F1000Research | - | - |
PubMed Central
Data from PMC are received and updated monthly.
|
- | - |
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
No
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
No
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Epigenetics, Genomics, Rice, Photosynthesis, DNA methylation, Histone posttranslational modifications, RNA-seq
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
No
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
No
Are the conclusions drawn adequately supported by the results?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: DNA methylation, genomics, genetics, gene structure, microbiome, rnaseq, bioinformatics
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | ||
---|---|---|
1 | 2 | |
Version 1 07 Dec 22 |
read | read |
Provide sufficient details of any financial or non-financial competing interests to enable users to assess whether your comments might lead a reasonable person to question your impartiality. Consider the following examples, but note that this is not an exhaustive list:
Sign up for content alerts and receive a weekly or monthly email with all newly published articles
Already registered? Sign in
The email address should be the one you originally registered with F1000.
You registered with F1000 via Google, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Google account password, please click here.
You registered with F1000 via Facebook, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Facebook account password, please click here.
If your email address is registered with us, we will email you instructions to reset your password.
If you think you should have received this email but it has not arrived, please check your spam filters and/or contact for further assistance.
Comments on this article Comments (0)