Extended Data Fig. 1: Characterization of GRID-seq libraries.

a, Major steps of the GRID-seq technology. Seedlings are crosslinked with formaldehyde and nuclei are isolated. Genomic DNA is fragmented by AluI. The 3’ end of the RNA is ligated to a 5’-adenylated linker, and the linker is extended by reverse transcription to generate first-strand cDNA complementary to the RNA. The other end of the linker is then ligated to AluI-cleaved DNA, forming an RNA-linker-DNA chimera. After reverse crosslinking, DNA purification, biotin selection, second-strand cDNA synthesis and MmeI digestion, the desired RNA-DNA chimeras are purified and subjected to library construction and sequencing. b, Pipeline of GRID-seq bioinformatic analysis. PCR duplicates and adapter sequences are removed. Filtered reads, which contain the linker sequence, are split into RNA and DNA parts according to the orientation of the linker sequence. The RNA and DNA parts are mapped to the Arabidopsis nuclear genome (TAIR10), and mate pairs are merged to identify unique RNA-DNA interactions. Finally, background interactions are subtracted to identify specific RNA-DNA interactions. c, Scatter plot showing background-corrected RNA reads in two biological replicates of GRID-seq data. The Pearson correlation coefficient is shown. d, Nucleotide frequency of mapped RNA (upper panel) and DNA (lower panel) reads. e, Strand orientation of mapped RNA and DNA reads. f, Upper: pie chart showing the percentages of different types of RNAs in GRID-seq data. Lower: pie chart showing the percentages of uniquely mapped DNA reads distributed across different genomic regions in GRID-seq data. Distal intergenic: > 1 Kb from the nearest gene; upstream: within 1 Kb upstream of the nearest TSS; downstream: within 1 Kb downstream of the nearest TTS. g, Scatter plot showing background-corrected RNA reads in two biological replicates of GRO-seq data. The Pearson correlation coefficient is shown. h, Meta-analyses showing two biological replicates of GRO-seq signals across the gene region.
TSS, transcription start site; TTS, transcription termination site. i, Scatter plot of background-corrected RNA reads in GRID-seq data versus RNA reads in GRO-seq data. Blue dots represent RNAs that have comparable read counts in GRID-seq and GRO-seq. Pink dots represent RNAs that have more reads in GRID-seq. Orange dots represent RNAs that have more reads in GRO-seq. RPKM: reads per kilobase per million mapped reads. FPKM: fragments per kilobase per million mapped reads. j, Boxplot showing the transcription levels of target genes that are involved in cis, short-range intra-chromosomal, long-range intra-chromosomal and inter-chromosomal interactions, as determined by two biological replicates of GRO-seq. On each box, the horizontal mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers (99.3% coverage). The significance of differences between groups is determined by the Kruskal-Wallis test.
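The RPKM normalization in panel i and the whisker coverage in panel j can be illustrated with a minimal Python sketch. The legend does not state the whisker multiplier; the code assumes the conventional 1.5 × IQR rule, which covers roughly 99.3% of normally distributed data. The function names and example numbers are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import NormalDist, quantiles


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def whisker_bounds(values, k=1.5):
    """Return (low, high) limits; points outside them count as outliers.

    Assumption: k = 1.5 (the conventional rule); the legend only gives
    the resulting coverage, not the multiplier.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def normal_coverage(k=1.5):
    """Fraction of a standard normal distribution inside the whisker limits."""
    nd = NormalDist()
    q3 = nd.inv_cdf(0.75)        # upper quartile; the lower one is -q3
    upper = q3 + k * (2 * q3)    # the IQR of a standard normal is 2 * q3
    return 2 * nd.cdf(upper) - 1


# Hypothetical example: 500 reads on a 2 kb transcript, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))   # 25.0
print(round(normal_coverage(), 3))   # 0.993
```

Under these assumptions, whiskers reach about ±2.7 standard deviations for normally distributed values, which matches the 99.3% coverage quoted for panel j.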