Extended Data Fig. 1: Overview of experimental and computational workflow. | Nature

From: Defining genome architecture at base-pair resolution

a, Cells are first fixed with formaldehyde and then permeabilized with digitonin. They are subsequently treated with MNase at a range of concentrations. End repair and ligation are then performed, which joins sequences that are in close proximity in the nucleus. DNA is then extracted to generate an MNase 3C library. This library is sonicated to a fragment size of around 200 bp and Illumina sequencing adaptors are added. Library preparation is scaled up to maximize the amount of DNA available and the complexity of the libraries. Multiple samples with different sequencing indices are then pooled. The DNA is denatured and mixed with a pool of biotinylated oligonucleotides. These 120-mer oligonucleotides were designed to capture the central portion of the hypersensitive site at the promoter or the central sequence of CTCF sites, guided by a combination of motif analysis and DNase I footprinting. Following a hybridization reaction, a streptavidin bead pull-down is performed and the uncaptured material is washed away. The captured material is PCR amplified and the oligonucleotide capture is repeated to improve the purity. The library is then sequenced with 300-bp paired-end reads, which allows the entire sequence of each fragment to be determined because the DNA is fragmented to around 200 bp by sonication.

b, Overview of the data analysis. The raw FASTQ file is processed to reconstruct a single read from each pair of paired-end reads. The single reads are then mapped to the 800-bp sequence surrounding the capture oligonucleotide using the non-stringent aligner BLAT. This enables the reads to be cut into ‘slices’ according to which portions align to the sequence around the capture site. This strategy allows ligation junctions in the read to be determined with base-pair accuracy. The resulting FASTQ file is aligned to the genome using Bowtie 2 (ref. 28). The aligned file is then processed to remove PCR duplicates, and the junctions of the ‘slices’ within the reads are identified.

c, Methods used for data visualization. Simple read pile-ups are generally used. However, the resolution can be further increased by reporting the precise base-pair position of the ligation junctions. As protein binding protects against DNA digestion by MNase, the regions of protein binding can be inferred from footprints in the junction plots that are similar to those seen with DNase I footprinting. More detailed localization of the protein-binding site that results in the interaction can be achieved by separating the junction profiles based on whether the read, and therefore the protein-binding site, is upstream or downstream of the junction. Finally, single-base-pair resolution maps of junctions between the capture site and the peaks at regulatory elements can be generated. In the example above, the central binding site of each of the two interacting CTCF sites is protected and the ligation junctions surround it. The direction of the reads at the capture and reporter sites can be used to identify the sites of the proteins giving rise to the ligation junctions. This can be shown with arrow plots. Here, these show that the central CTCF motifs at both the capture and reporter sites are the origin of the contacts between the two sites. This is more easily visualized using 3D surface plots. These were constructed by converting each data point into a rectangle 20 bp long and 4 bp wide in the direction of the reads giving rise to the interactions. The surface plot shows the central binding sites of the two CTCF sites giving rise to the interactions, as well as contacts between these central CTCF motifs and the neighbouring nucleosomes.
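
The duplicate removal and strand-separated junction profiles described in panels b and c can be sketched roughly as follows. This is a minimal Python illustration, not the authors' pipeline. The `Slice` record and both function names are hypothetical. Treating reads that share an identical set of slice coordinates as PCR duplicates is an assumption made here for illustration, as is the choice of 0-based, half-open coordinates.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Slice(NamedTuple):
    """Hypothetical record for one aligned 'slice' of a read."""
    read_id: str
    chrom: str
    start: int               # 0-based, inclusive
    end: int                 # 0-based, exclusive
    junction_at_start: bool  # True if the ligation junction is at `start`


def remove_pcr_duplicates(slices: Iterable[Slice]) -> List[Slice]:
    """Keep the first read seen for each unique set of slice coordinates.

    Assumption: reads whose slices have identical coordinates are
    PCR duplicates of one another.
    """
    by_read = {}
    for s in slices:
        by_read.setdefault(s.read_id, []).append(s)

    seen = set()
    kept = []
    for group in by_read.values():
        key = frozenset(
            (s.chrom, s.start, s.end, s.junction_at_start) for s in group
        )
        if key in seen:
            continue  # same coordinates as an earlier read: drop it
        seen.add(key)
        kept.extend(group)
    return kept


def junction_profiles(slices: Iterable[Slice]) -> Tuple[Counter, Counter]:
    """Count junctions per base, split by which side of the junction the read lies on.

    Returns (upstream, downstream): counters keyed by (chrom, position),
    where 'upstream' holds junctions whose read lies upstream of the
    junction and 'downstream' holds those whose read lies downstream.
    """
    upstream = Counter()
    downstream = Counter()
    for s in slices:
        if s.junction_at_start:
            # Junction is the first base of the slice; the read extends downstream.
            downstream[(s.chrom, s.start)] += 1
        else:
            # Junction is the last base of the slice; the read extends upstream.
            upstream[(s.chrom, s.end - 1)] += 1
    return upstream, downstream
```

Summing the two counters at each position gives the combined junction profile. Keeping them separate corresponds to the upstream and downstream split described in panel c.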
