Extended Data Fig. 5: NGS data analysis for consortium experiments.
From: Sentinel cells programmed to respond to environmental DNA including human sequences

Overview of the method used to obtain the data in Fig. 5c–f. (a) Bowtie2 was used to align reads to a reference file containing the sequences of all recorder constructs in sentinel cells, both with the terminator present (recorder sequences SEN1, 2, 3, 4, 5) and without it (recorder sequences SEN1ΔDT, 2ΔDT, 3ΔDT, 4ΔDT, 5ΔDT). (b) Unique reads were selected based on Bowtie2-assigned MAPQ scores ≥20. Reads 1, 2 and 3 have unique alignments: read 1 aligns to the junction between homology arms after terminator excision (found in recorder sequence SEN1ΔDT only), and reads 2 and 3 align to the junction between homology arms and the terminator (found in recorder sequence SEN1 only), so these reads would be assigned high MAPQ scores (≥20). Reads 4 and 5 have multiple possible alignments: read 4 aligns within a homology arm (found in recorder sequences SEN1ΔDT and SEN1) and read 5 aligns to gfp (found in all recorder sequences), so these reads would be assigned low MAPQ scores. MAPQ score distributions for all samples are shown in Supplementary Fig. 13. (c) Alignment profiles are generated from all reads with MAPQ scores ≥20. For SEN#ΔDT sequences (no terminator), the number of aligned reads is reported by summing the reads aligning to the SNP position (Fig. 5e and Supplementary Fig. 16). For SEN# sequences (with terminator), the number of aligned reads is reported by averaging the numbers of reads aligned at either end of the terminator (Supplementary Figs. 15 and 16). Alignment profiles for all samples are shown in Supplementary Fig. 14. (d) Reads aligned at the SNP position in SEN#ΔDT sequences can be used to generate a consensus sequence logo, from which the identity of the recorded SNP can be extracted. All extracted SNP sequences are shown in Extended Data Fig. 6.
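
The filtering and counting steps in panels (b) and (c) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes Bowtie2 output is available as SAM text, that alignments are ungapped (so a read spans as many reference bases as its sequence length), and it uses hypothetical reference names (`SEN1`, `SEN1_dDT` as an ASCII stand-in for SEN1ΔDT) and hypothetical coordinates (SNP at position 100, terminator ends at positions 100 and 200). Only the MAPQ threshold of 20 comes from the legend.

```python
MAPQ_THRESHOLD = 20  # threshold stated in the legend


def parse_sam_line(line):
    """Return (reference name, 1-based start, MAPQ, read length) from a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[2], int(fields[3]), int(fields[4]), len(fields[9])


def unique_alignments(sam_lines, threshold=MAPQ_THRESHOLD):
    """Keep alignments with MAPQ >= threshold, skipping header lines and unmapped reads."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        rname, pos, mapq, length = parse_sam_line(line)
        if rname != "*" and mapq >= threshold:
            kept.append((rname, pos, length))
    return kept


def coverage_at(alignments, reference, position):
    """Count alignments on `reference` covering the 1-based `position`.

    Simplifying assumption: ungapped alignments, so a read covers
    [pos, pos + length - 1] on the reference.
    """
    return sum(
        1
        for rname, pos, length in alignments
        if rname == reference and pos <= position < pos + length
    )


# Toy SAM records (hypothetical names, positions and MAPQ values).
sam = [
    "@SQ\tSN:SEN1\tLN:300",
    "read1\t0\tSEN1_dDT\t95\t42\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t0\tSEN1\t96\t42\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read3\t0\tSEN1\t195\t40\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read4\t0\tSEN1\t20\t1\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read5\t0\tSEN3\t250\t0\t10M\t*\t0\t0\tACGTACGTAC\t*",
]

kept = unique_alignments(sam)

# SEN#ΔDT-style count: reads covering the (hypothetical) SNP position.
snp_reads = coverage_at(kept, "SEN1_dDT", 100)

# SEN#-style count: average of reads covering either (hypothetical) terminator end.
left_end = coverage_at(kept, "SEN1", 100)
right_end = coverage_at(kept, "SEN1", 200)
terminator_reads = (left_end + right_end) / 2
```

In this toy input, reads 4 and 5 are dropped by the MAPQ filter, mirroring the multi-mapping reads in panel (b). A real analysis would need to account for CIGAR operations such as indels and soft clipping when computing coverage.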