Supplementary Figure 10: Bioinformatics protocol.
From: epiGBS: reference-free reduced representation bisulfite sequencing

A) Mapping
(1) Because part of each sequenced fragment originates from the methylated adapter sequence, that part must be excluded from the analysis. Therefore, the first 4 bases of merged and forward Watson reads, the last 4 bases of merged Crick reads and the first 4 bases of reverse Crick reads are removed.
(2) Reads are mapped against either a de novo or an existing reference using BWA-METH¹, as it is more sensitive and accurate than similar bisulfite sequence aligners and allows easy transfer of read group tags from the sequence name to the BAM output file¹.
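The trimming rule in step (1) can be sketched in Python as follows. This is a minimal illustration only: the function name, the strand and read-type labels and the string-based read representation are assumptions made for the example, not part of the published pipeline.

```python
ADAPTER_LEN = 4  # number of adapter-derived bases, as stated in step (1)


def trim_adapter_bases(seq, qual, strand, read_type, n=ADAPTER_LEN):
    """Return (seq, qual) with adapter-derived bases removed.

    strand:    "watson" or "crick"              (hypothetical labels)
    read_type: "merged", "forward" or "reverse" (hypothetical labels)
    """
    key = (strand, read_type)
    # Merged and forward Watson reads, and reverse Crick reads: drop the first n bases.
    if key in {("watson", "merged"), ("watson", "forward"), ("crick", "reverse")}:
        return seq[n:], qual[n:]
    # Merged Crick reads: drop the last n bases.
    if key == ("crick", "merged"):
        end = max(len(seq) - n, 0)
        return seq[:end], qual[:end]
    # Any other combination is left untouched in this sketch.
    return seq, qual
```

For example, `trim_adapter_bases("ACGTTTGGCC", "IIIIIIIIII", "crick", "merged")` keeps the first six bases and their qualities.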
B) Variant calling
(3) Variant calling is done with Freebayes², separately for Watson and Crick reads and for all samples simultaneously. The settings force every position to be called for all samples. Freebayes is used because it allows for indel realignment and sensitive variant calling.
(4) By iterating over the Watson and Crick variant call files (VCF) simultaneously, SNPs and methylation polymorphisms can be distinguished. A C/T polymorphism on the Watson strand combined with a C on the Crick strand indicates a methylation polymorphism on the Watson strand, whereas a G/A polymorphism on the Crick strand combined with a G on the Watson strand indicates a methylation polymorphism on the Crick strand. Where a combined SNP and methylation polymorphism leads to a C/T or G/A polymorphism on both the Watson and the Crick strand, only the SNP is called, as the methylation ratio cannot be determined.
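The decision rule in step (4) can be illustrated with a short Python sketch. It assumes that the alleles observed at one position have already been extracted from the Watson and Crick VCF files; the function name, the return labels and the reading of "on both strands" as the same allele pair on both strands are assumptions for illustration, and VCF parsing is omitted.

```python
CT = frozenset("CT")
GA = frozenset("GA")


def classify_site(watson_alleles, crick_alleles):
    """Classify one position from the alleles seen in the Watson and Crick calls."""
    w = frozenset(a.upper() for a in watson_alleles)
    c = frozenset(a.upper() for a in crick_alleles)
    # Same C/T or G/A polymorphism on both strands: only the SNP is called.
    if (w == CT and c == CT) or (w == GA and c == GA):
        return "snp"
    # C/T on Watson with an unchanged C on Crick: methylation polymorphism on Watson.
    if w == CT and c == frozenset("C"):
        return "methylation_watson"
    # G/A on Crick with an unchanged G on Watson: methylation polymorphism on Crick.
    if c == GA and w == frozenset("G"):
        return "methylation_crick"
    # Everything else is outside the rules described in step (4).
    return "unclassified"
```

For example, `classify_site(["C", "T"], ["C"])` yields `"methylation_watson"`, while `classify_site(["C", "T"], ["C", "T"])` yields `"snp"`.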
C) Visualization
(5) SNPs and methylation polymorphisms are exported in VCF format. In addition, methylation ratios are calculated and exported in a tab-separated, IGV-specific format (.IGV) that includes the sequence context (CG, CHG or CHH). Datasets for all species studied are available on GenomeSpace (see https://gsui.genomespace.org/jsui/gsui.html?pathOrUrl=/Home/thomasvangurp/epiGBS%20Nature%20Methods/).
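As a rough illustration of the quantities exported in step (5), the sketch below assigns a cytosine context from the two bases downstream of the cytosine and computes a methylation ratio. The legend does not give the formula; the ratio used here, unconverted (C) reads divided by C plus T reads, is the conventional definition and is an assumption, as are the function names.

```python
def cytosine_context(downstream):
    """Return "CG", "CHG" or "CHH" from the bases following a cytosine (5'->3').

    Returns None when the context cannot be determined, for example near a
    contig end or when an ambiguous base is present.
    """
    d = downstream.upper()[:2]
    if d[:1] == "G":
        return "CG"
    if len(d) < 2 or any(base not in "ACGT" for base in d):
        return None
    return "CHG" if d[1] == "G" else "CHH"


def methylation_ratio(c_count, t_count):
    """Fraction of reads supporting an unconverted (methylated) cytosine."""
    total = c_count + t_count
    if total == 0:
        return None  # no coverage, ratio undefined
    return c_count / total
```

For example, a cytosine followed by "AG" is in CHG context, and 3 C reads with 1 T read give a ratio of 0.75.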
D) Annotation
(6) A Usearch blastx³ search is performed against reference protein sequences related to the species sequenced.
(7) The resulting blastx hits are imported into blast2go for mapping to gene ontology (GO) terms and enzyme codes. A list of all annotated genes is exported. Optionally, a list of contigs that map to specific genes or carry specific GO terms can be exported, allowing for a focused analysis in RnBeads⁴.
E) Analysis
(8) A pipeline processes the sample-specific methylation bed files together with experimental details such as treatment and/or sample groups. This allows differential methylation to be compared between combinations of treatments, including all possible 2-way interactions between treatment groups, using RnBeads⁴ or other methylation analysis tools.
1 Brent S Pedersen et al., "Fast and Accurate Alignment of Long Bisulfite-Seq Reads," arXiv:1401.1129v2, January 6, 2014.
2 Erik Garrison and Gabor Marth, "Haplotype-Based Variant Detection From Short-Read Sequencing," arXiv:1207.3907v2, July 17, 2012.
3 Robert C Edgar, "Search and Clustering Orders of Magnitude Faster Than BLAST," Bioinformatics 26, no. 19 (October 1, 2010): 2460–61, doi:10.1093/bioinformatics/btq461.
4 Yassen Assenov et al., "Comprehensive Analysis of DNA Methylation Data with RnBeads," Nature Methods 11, no. 11 (November 2014): 1138–40, doi:10.1038/nmeth.3115.