Evaluation of methods for modeling transcription factor sequence specificity

Weirauch, Matthew T; Cote, Atina; Norel, Raquel; Annala, Matti; Zhao, Yue; Riley, Todd R; Saez-Rodriguez, Julio; Cokelaer, Thomas; Vedenko, Anastasia; Talukder, Shaheynoor; Bussemaker, Harmen J; Morris, Quaid D; Bulyk, Martha L; Stolovitzky, Gustavo; Hughes, Timothy R

doi:10.1038/nbt.2486

Analysis
Published: 27 January 2013

Nature Biotechnology volume 31, pages 126–134 (2013)

Abstract

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA-binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
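
As a rough illustration of two concepts the abstract relies on, a mononucleotide position weight matrix (PWM) and its information content, the Python sketch below scans a sequence with a log-odds PWM and sums the motif's information content in bits. The toy count matrix, the uniform background, the pseudocount, the example sequence and all function names are hypothetical and are not taken from the study. It is not one of the 26 evaluated approaches.

```python
import math

BASES = "ACGT"

# Hypothetical toy counts for a 4-position motif (one dict per position).
TOY_COUNTS = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 2, "C": 2, "G": 2, "T": 2},
]


def to_probabilities(counts, pseudocount=1.0):
    """Convert per-position base counts to probabilities (assumed pseudocount)."""
    probs = []
    for column in counts:
        total = sum(column[b] + pseudocount for b in BASES)
        probs.append({b: (column[b] + pseudocount) / total for b in BASES})
    return probs


def information_content(probs):
    """Total information content in bits, assuming a uniform background."""
    total = 0.0
    for column in probs:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy  # 2 bits is the maximum for four bases
    return total


def log_odds(probs, background=0.25):
    """Build a log2-odds PWM against an assumed uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in probs]


def scan(sequence, pwm):
    """Score every window of the sequence; positions contribute additively."""
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        scores.append((start, score))
    return scores


probs = to_probabilities(TOY_COUNTS)
pwm = log_odds(probs)
hits = scan("TTACGATT", pwm)  # forward strand only, for brevity
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 3), round(information_content(probs), 3))
```

In this toy setup the flat fourth column contributes no bits, which is the sense in which a motif with lower information content is more degenerate. The additive per-position score is what makes the model "mononucleotide": it ignores dependencies between positions, which the more complex models mentioned in the abstract try to capture.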

**Figure 1: Evaluation criteria used in this study.**

**Figure 2: Comparison of algorithm performance by TF.**

**Figure 3: Comparison of algorithm performance on *in vivo* data.**

**Figure 4: Characteristics of Klf9 motifs produced by the eight PWM-based algorithms evaluated in this study.**

Accession codes

Accessions

Gene Expression Omnibus

Acknowledgements

We thank H. van Bakel and M. Albu for database assistance, and members of the Hughes laboratory for helpful discussion. M.T.W. was supported by fellowships from the Canadian Institutes of Health Research (CIHR) and the Canadian Institute for Advanced Research (CIFAR) Junior Fellows Genetic Networks Program. This work was supported in part by the Ontario Research Fund and Genome Canada through the Ontario Genomics Institute, and the March of Dimes (T.R.H.). Funding was also provided by Operating Grant MOP-77721 from CIHR to T.R.H. and M.L.B., and grant no. R01 HG003985 from the US National Institutes of Health/National Human Genome Research Institute to M.L.B., as well as US National Institutes of Health grants R01HG003008 and U54CA121852 and a John Simon Guggenheim Foundation Fellowship to H.J.B. M.A., K.L., H.L. and M.L. were supported by the Academy of Finland (project 260403) and EU ERASysBio ERA-NET. Y.O., C.L. and R.S. were funded by the European Community's Seventh Framework Programme under grant agreement no. HEALTH-F4-2009-223575 for the TRIREME project, and by the Israel Science Foundation (grant no. 802/08). Y.O. was supported in part by a fellowship from the Edmond J. Safra Bioinformatics Program at Tel Aviv University. J.G., I.G., S.P. and J.K. were supported by grant XP3624HP/0606T by the Ministry of Culture of Saxony-Anhalt. A.M. was supported by US National Science Foundation (NSF) grant PHY-1022140. C.C. was supported by NSF grant PHY-0957573. J.B.K. was supported by the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory.

Author information

Authors and Affiliations

Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
Matthew T Weirauch, Atina Cote, Shaheynoor Talukder, Quaid D Morris & Timothy R Hughes
Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Rheumatology and Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
Matthew T Weirauch
IBM Computational Biology Center, Yorktown Heights, New York, USA
Raquel Norel & Gustavo Stolovitzky
Department of Signal Processing, Tampere University of Technology, Tampere, Finland
Matti Annala
Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Yue Zhao
Department of Biological Sciences, Columbia University, and Center for Computational Biology and Bioinformatics, Columbia University Medical Center, New York, New York, USA
Todd R Riley & Harmen J Bussemaker
EMBL-EBI European Bioinformatics Institute, Cambridge, UK
Julio Saez-Rodriguez & Thomas Cokelaer
Department of Medicine, Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Anastasia Vedenko & Martha L Bulyk
Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
Quaid D Morris & Timothy R Hughes
Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Martha L Bulyk
Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA
Martha L Bulyk
Computational Biology Program, Sloan-Kettering Institute, Memorial Sloan-Kettering Cancer Center, New York, New York, USA
Phaedra Agius, Aaron Arvey & Christina Leslie
Swiss Institute of Bioinformatics, Lausanne, Switzerland
Philipp Bucher, Vidhya Jagannathan & Christoph D Schmid
EPFL (École Polytechnique Fédérale de Lausanne) SV ISREC (The Swiss Institute for Experimental Cancer Research) GR-BUCHER, Lausanne, Switzerland
Philipp Bucher
Department of Physics, Princeton University, Princeton, New Jersey, USA
Curtis G Callan Jr & Anand Murugan
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
Curtis G Callan Jr
Genome Institute of Singapore, Singapore
Cheng Wei Chang & Wing-Kin Sung
Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan
Chien-Yu Chen, Yong-Syuan Chen & Yu-Wei Chu
Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
Yu-Wei Chu
Institute of Computer Science, Martin Luther University, Halle-Wittenberg, Germany
Jan Grau, Ivo Grosse & Stefan Posch
Institute for Genetics, University of Bern, Bern, Switzerland
Vidhya Jagannathan
Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
Jens Keilwagen
Max Planck Institute for Molecular Genetics, Berlin, Germany
Szymon M Kiełbasa, Alena Myšičková & Martin Vingron
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
Justin B Kinney
MicroDiscovery GmbH, Berlin, Germany
Holger Klein
Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
Miron B Kursa & Witold R Rudnicki
Department of Information and Computer Science, Aalto University School of Science and Technology, Aalto, Finland
Harri Lähdesmäki
Turku Centre for Biotechnology, Turku University, Turku, Finland
Harri Lähdesmäki
Department of Signal Processing, Tampere University of Technology, Tampere, Finland
Kirsti Laurila & Matti Nykter
Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas, USA
Chengwei Lei & Jianhua Ruan
Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Chaim Linhart, Yaron Orenstein & Ron Shamir
Department of Genome Sciences, University of Washington, Seattle, Washington, USA
William Stafford Noble
Swiss Tropical and Public Health Institute (Swiss TPH), Basel, Switzerland
Christoph D Schmid
University of Basel, Basel, Switzerland
Christoph D Schmid
School of Computing, National University of Singapore, Singapore
Wing-Kin Sung & Zhizhuo Zhang

Consortia

DREAM5 Consortium

Phaedra Agius, Aaron Arvey, Philipp Bucher, Curtis G Callan Jr, Cheng Wei Chang, Chien-Yu Chen, Yong-Syuan Chen, Yu-Wei Chu, Jan Grau, Ivo Grosse, Vidhya Jagannathan, Jens Keilwagen, Szymon M Kiełbasa, Justin B Kinney, Holger Klein, Miron B Kursa, Harri Lähdesmäki, Kirsti Laurila, Chengwei Lei, Christina Leslie, Chaim Linhart, Anand Murugan, Alena Myšičková, William Stafford Noble, Matti Nykter, Yaron Orenstein, Stefan Posch, Jianhua Ruan, Witold R Rudnicki, Christoph D Schmid, Ron Shamir, Wing-Kin Sung, Martin Vingron & Zhizhuo Zhang

Contributions

M.T.W. and T.R.H. wrote the manuscript. T.R.H., M.T.W., M.L.B. and A.V. conceived of the study. M.T.W. did the majority of the computational analyses. M.A., Y.Z. and T.R.R. did additional computational analyses. A.C. and S.T. performed the PBM experiments. T.R.H., M.T.W., G.S. and R.N. designed and carried out the DREAM5 TF challenge. The DREAM5 Consortium and M.A. participated in the DREAM5 TF challenge. R.N., J.S.-R., T.C. and M.T.W. designed and created the prediction server. M.L.B., G.S., Q.D.M. and H.J.B. provided critical feedback on the manuscript.

Corresponding author

Correspondence to Timothy R Hughes.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Weirauch, M., Cote, A., Norel, R. et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat Biotechnol 31, 126–134 (2013). https://doi.org/10.1038/nbt.2486

Received: 23 July 2012
Accepted: 18 December 2012
Published: 27 January 2013
Issue date: February 2013
DOI: https://doi.org/10.1038/nbt.2486
