  • Protocol

Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2

Abstract

Fit-Hi-C is a software application that computes statistical confidence estimates for Hi-C contact maps in order to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit, together with the correction of contact probabilities for bin- or locus-specific biases, accounts for previously characterized covariates that affect Hi-C contact counts. Fit-Hi-C is best applied to the study of mid-range (e.g., 20 kb–2 Mb for the human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis of high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to a significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (human lung fibroblast), and (iii) budding yeast whole-genome Hi-C data at single restriction cut site (EcoRI) resolution. The procedure, including preprocessing, takes ~12 h when all use cases are run sequentially (~4 h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) also scales to genome-wide analysis of the highest-resolution (1 kb) Hi-C data available to date (~48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.
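The statistical model above is described only in words. The sketch below walks through the same sequence of steps on a toy single-chromosome example: average contact probability per genomic distance, a monotone non-increasing fit of that probability, a bias-corrected expected probability for each locus pair, an upper-tail p-value, and Benjamini-Hochberg q-values (ref. 17). It is hypothetical throughout and is not the FitHiC2 code: the function names (`pava_nonincreasing`, `binom_upper_tail`, `benjamini_hochberg`, `score_contacts`) and the input layout are invented for illustration, a weighted pool-adjacent-violators (isotonic regression) fit stands in for the spline, only the pairs passed in are averaged, and the binomial test with the total contact count as the number of trials is an assumption of this sketch.

```python
import math
from collections import defaultdict


def pava_nonincreasing(values, weights):
    """Weighted pool-adjacent-violators fit constrained to be non-increasing.
    Stand-in for a monotone spline; this swap is an assumption of the sketch."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbouring blocks while the fit would increase with distance.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def score_contacts(contacts, bias):
    """Hypothetical scorer: contacts are (locus1, locus2, count) on one chromosome;
    bias maps each locus (bin midpoint) to a multiplicative bias factor."""
    total = sum(c for _, _, c in contacts)
    # 1) Empirical contact probability, averaged per genomic distance.
    by_distance = defaultdict(list)
    for a, b, c in contacts:
        by_distance[abs(b - a)].append(c / total)
    distances = sorted(by_distance)
    means = [sum(by_distance[d]) / len(by_distance[d]) for d in distances]
    weights = [len(by_distance[d]) for d in distances]
    # 2) Monotone non-increasing fit of probability versus distance.
    prior = dict(zip(distances, pava_nonincreasing(means, weights)))
    # 3) Bias-corrected expected probability, binomial p-value, then q-values.
    pvalues = []
    for a, b, c in contacts:
        p = min(prior[abs(b - a)] * bias[a] * bias[b], 1.0)
        pvalues.append(binom_upper_tail(c, total, p))
    return pvalues, benjamini_hochberg(pvalues)


if __name__ == "__main__":
    # Toy 5-kb bins; the 60-count pair at 10 kb is a planted "loop".
    contacts = [
        (0, 5000, 50), (5000, 10000, 48), (10000, 15000, 52), (15000, 20000, 50),
        (0, 10000, 20), (5000, 15000, 22), (10000, 20000, 60),
        (0, 15000, 10), (5000, 20000, 12),
        (0, 20000, 5),
    ]
    bias = {locus: 1.0 for locus in range(0, 25000, 5000)}
    pvals, qvals = score_contacts(contacts, bias)
    for (a, b, c), p, q in zip(contacts, pvals, qvals):
        print(f"{a}-{b}\tcount={c}\tp={p:.2e}\tq={q:.2e}")
```

The tail sum loops over every count up to the total, which is fine for a toy example but far too slow for real Hi-C totals; a practical variant could call `scipy.stats.binom.sf(k - 1, n, p)`, which returns the same P(X >= k).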

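The merging filter is likewise described only at a high level. To convey the idea of removing bystander calls, the sketch below collapses each cluster of adjacent significant bin pairs into its single most significant pair. The function name `merge_filter`, the (bin1, bin2, q-value) input layout and the adjacency rule (both bin indices within 1) are assumptions made for illustration; the actual FitHiC2 module may use different rules.

```python
from collections import deque


def merge_filter(contacts):
    """Keep the most significant contact in each cluster of adjacent contacts.

    contacts: list of (bin1, bin2, qvalue) with integer bin indices.
    Two contacts are treated as adjacent when both bin indices differ by at
    most 1 (an 8-neighbourhood in the contact map); this rule is an assumption.
    """
    qvalue = {(a, b): q for a, b, q in contacts}
    seen = set()
    kept = []
    for start in qvalue:
        if start in seen:
            continue
        # Breadth-first search collects one connected cluster of contacts.
        cluster = []
        queue = deque([start])
        seen.add(start)
        while queue:
            a, b = queue.popleft()
            cluster.append((a, b))
            for da in (-1, 0, 1):
                for db in (-1, 0, 1):
                    neighbour = (a + da, b + db)
                    if neighbour in qvalue and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        # Report only the lowest-q-value contact as the cluster's representative.
        best = min(cluster, key=qvalue.__getitem__)
        kept.append((best[0], best[1], qvalue[best]))
    return sorted(kept)


if __name__ == "__main__":
    significant = [(10, 20, 1e-3), (10, 21, 1e-5), (11, 21, 1e-4), (50, 80, 1e-6)]
    print(merge_filter(significant))  # [(10, 21, 1e-05), (50, 80, 1e-06)]
```

Keeping one representative per cluster is the bluntest possible choice: touching but genuinely distinct loops would be collapsed as well, so treat this only as an illustration of why merging reduces the number of reported contacts.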
Fig. 1: FitHiC2 flowchart.
Fig. 2: Reproducibility and validation of FitHiC2 calls.
Fig. 3

Data availability

FitHiC2 calls for different Hi-C datasets, as well as processed files from published data that are used as references, are provided in the Zenodo repository: https://doi.org/10.5281/zenodo.3380589 (ref. 35).

Code availability

The source code and the documentation of FitHiC2 are publicly available through GitHub: https://github.com/ay-lab/fithic. An executable version is also provided on Code Ocean at https://codeocean.com/capsule/4528858 (ref. 36). The source code is distributed under the MIT license (https://opensource.org/licenses/MIT).

References

  1. Bickmore, W. A. The spatial organization of the human genome. Annu. Rev. Genomics Hum. Genet. 14, 67–84 (2013).

  2. Dekker, J., Marti-Renom, M. A. & Mirny, L. A. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403 (2013).

  3. Quinodoz, S. A. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell 174, 744–757.e24 (2018).

  4. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  5. Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).

  6. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).

  7. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  8. Stadhouders, R. et al. Transcription regulation by distal enhancers: who’s in the loop? Transcription 3, 181–186 (2012).

  9. Ay, F. & Noble, W. S. Analysis methods for studying the 3D architecture of the genome. Genome Biol. 16, 183 (2015).

  10. Lajoie, B. R., Dekker, J. & Kaplan, N. The hitchhiker’s guide to Hi-C analysis: practical guidelines. Methods 72, 65–75 (2015).

  11. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

  12. Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).

  13. Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).

  14. Ay, F., Bailey, T. L. & Noble, W. S. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 24, 999–1011 (2014).

  15. Bhattacharyya, S., Chandra, V., Vijayanand, P. & Ay, F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat. Commun. 10, 4221 (2019).

  16. Knight, P. A. & Ruiz, D. A fast algorithm for matrix balancing. IMA J. Numer. Anal. 33, 1029–1047 (2013).

  17. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).

  18. Ay, F. et al. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Res. 24, 974–988 (2014).

  19. Wang, C. et al. Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res. 25, 246–256 (2015).

  20. Wang, M. et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat. Genet. 49, 579–587 (2017).

  21. Ay, F. et al. Identifying multi-locus chromatin contacts in human cells using tethered multiple 3C. BMC Genomics 16, 121 (2015).

  22. Bunnik, E. M. et al. Comparative 3D genome organization in apicomplexan parasites. Proc. Natl Acad. Sci. USA 116, 3183–3192 (2019).

  23. Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).

  24. Hwang, Y. C. et al. HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics 31, 1290–1292 (2015).

  25. Lun, A. T. & Smyth, G. K. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics 16, 258 (2015).

  26. Mifsud, B. et al. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data. PLoS One 12, e0174744 (2017).

  27. Carty, M. et al. An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data. Nat. Commun. 8, 15454 (2017).

  28. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).

  29. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  30. Chakraborty, A. & Ay, F. Identification of copy number variations and translocations in cancer cells from Hi-C data. Bioinformatics, https://doi.org/10.1093/bioinformatics/btx664 (2017).

  31. Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018).

  32. Jin, F. et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294 (2013).

  33. Yardimci, G. G. et al. Measuring the reproducibility and quality of Hi-C data. Genome Biol. 20, 57 (2019).

  34. Huang, J., Marco, E., Pinello, L. & Yuan, G. C. Predicting chromatin organization using histone marks. Genome Biol. 16, 162 (2015).

  35. Kaul, A., Bhattacharyya, S. & Ay, F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Zenodo, https://doi.org/10.5281/zenodo.3380589 (2019).

  36. Kaul, A., Bhattacharyya, S. & Ay, F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Code Ocean, https://doi.org/10.24433/CO.5589539.v2 (2019).

  37. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).

  38. Yardimci, G. G. & Noble, W. S. Software tools for visualizing Hi-C data. Genome Biol. 18, 26 (2017).

Acknowledgements

We would like to thank William S. Noble and Timothy L. Bailey for their contributions to earlier versions of Fit-Hi-C. We are also thankful to Abhijit Chakraborty for his feedback on the Fit-Hi-C package. Finally, we would like to thank all users of Fit-Hi-C/FitHiC2 who have reached out to us with their questions and valuable suggestions leading to significant improvements in the implementation and documentation. This work was funded by NIH grant R35-GM128938 to F.A.

Author information

Contributions

A.K. implemented the current version of FitHiC2. S.B. developed the merging filter module. A.K. and S.B. performed data analysis and wrote the manuscript under the supervision of F.A., who developed the original Fit-Hi-C code. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ferhat Ay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Ay, F. et al. Genome Res. 24, 999–1011 (2014): https://doi.org/10.1101/gr.160374.113

Sima, J. et al. Cell 176, 816–830.e18 (2019): https://doi.org/10.1016/j.cell.2018.11.036

Bunnik, E. et al. Proc. Natl Acad. Sci. USA 116, 3183–3192 (2019): https://doi.org/10.1073/pnas.1810815116

Zheng, Y. et al. eLife 8, e38070 (2019): https://doi.org/10.7554/eLife.38070.001

Key data used in this protocol

Rao, S. et al. Cell 159, 1665–1680 (2014): https://doi.org/10.1016/j.cell.2014.11.021

Quinodoz, S. et al. Cell 174, 744–757.e24 (2018): https://doi.org/10.1016/j.cell.2018.05.024

Dixon, J. et al. Nat. Genet. 50, 1388–1398 (2018): https://doi.org/10.1038/s41588-018-0195-8

About this article

Cite this article

Kaul, A., Bhattacharyya, S. & Ay, F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Nat Protoc 15, 991–1012 (2020). https://doi.org/10.1038/s41596-019-0273-0

  • DOI: https://doi.org/10.1038/s41596-019-0273-0
