  • Analysis

Can literature analysis identify innovation drivers in drug discovery?

Key Points

  • Drug discovery has traditionally responded to the 'pull' of unmet medical need and commercial potential. Here, we evaluate the scientific areas that are providing 'push' in terms of scientific innovation, as measured by publications and patents.

  • The amount of funding from the US National Institutes of Health (NIH), the number of doctoral degrees awarded, the number of publications and the number of new molecular entities (NMEs) approved by the US Food and Drug Administration (FDA) all show a general upward trend from the 1950s to the present. Although explanations can be suggested for some of the short-term ups and downs, these trends are also affected by numerous hidden variables, as well as by time lags, feedback loops and other complications.

  • The numbers of publications in various disease areas correlate well with global disease burden. The correlation is stronger in the developed world, where the impact of infectious and parasitic diseases, respiratory infections and infant mortality is considerably lower than in the developing world.

  • A few therapeutic areas stand out when analysing trends in publications, citations, publications in high-impact journals and patents. Oncology is the clearest outlier on almost every metric. Viruses, nutrition and metabolism, and the immune system also show increases in the majority of metrics.

  • In terms of individual diseases, insulin resistance, Orthomyxoviridae infections, depression, autism, macular degeneration, inflammation, obesity, cognitive disorders and ventricular dysfunction show strong publication growth over both 2- and 5-year periods. Publications related to the genes encoding forkhead box P3, leucine-rich repeat kinase 2, Janus kinase 2, transcription factor 7-like 2, interleukin-17A, toll-like receptor 2 (TLR2), TLR4, FK506 binding protein 12-rapamycin associated protein 1 and ADIPOQ (adiponectin, C1Q and collagen domain-containing) show the strongest publication growth over the same periods.

  • Assessing scientific innovation is a complex endeavour; however, the 'bibliome' seems to offer many approaches that could enhance decision-making in drug discovery.
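The Key Points single out diseases and genes whose publication counts grew strongly over both a 2-year and a 5-year window. As a minimal sketch of how such a dual-window filter could work, assuming yearly publication counts per term are already available, the code below computes simple growth ratios. The ratio-based growth measure, the thresholds (`min_2yr`, `min_5yr`, `min_count`) and the toy counts are all hypothetical; they are not the criteria used in the article.

```python
# Hypothetical dual-window growth filter. The ratio-based growth measure and
# the thresholds are illustrative assumptions, not the article's criteria.

def growth_ratio(counts_by_year, end_year, window):
    """Count in end_year divided by the count `window` years earlier."""
    start = counts_by_year.get(end_year - window, 0)
    end = counts_by_year.get(end_year, 0)
    if start == 0:
        # Term absent at the start of the window: infinite growth if it
        # has any publications now, otherwise no growth.
        return float("inf") if end > 0 else 0.0
    return end / start


def strong_growers(counts, end_year, min_2yr=1.2, min_5yr=1.5, min_count=100):
    """Terms whose growth passes BOTH thresholds, ranked by 5-year growth.

    counts: {term: {year: publication_count}}
    min_count: skip terms still too small for ratios to be meaningful.
    """
    hits = []
    for term, by_year in counts.items():
        if by_year.get(end_year, 0) < min_count:
            continue
        g2 = growth_ratio(by_year, end_year, 2)
        g5 = growth_ratio(by_year, end_year, 5)
        if g2 >= min_2yr and g5 >= min_5yr:
            hits.append((term, round(g2, 2), round(g5, 2)))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Toy counts (invented for illustration only)
toy = {
    "term_A": {2003: 200, 2006: 300, 2008: 400},  # grows in both windows
    "term_B": {2003: 500, 2006: 520, 2008: 530},  # nearly flat
    "term_C": {2003: 10, 2006: 30, 2008: 60},     # fast but below min_count
}
print(strong_growers(toy, 2008))  # [('term_A', 1.33, 2.0)]
```

Requiring growth in both windows filters out terms with a single recent spike as well as terms whose growth has already levelled off; the `min_count` floor keeps tiny, newly coined terms from dominating the ranking.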

Abstract

Drug discovery must be guided not only by medical need and commercial potential, but also by the areas in which new science is creating therapeutic opportunities, such as target identification and the understanding of disease mechanisms. To systematically identify such areas of high scientific activity, we use bibliometrics and related data-mining methods to analyse over half a terabyte of data, including PubMed abstracts, literature citation data and patent filings. These analyses reveal trends in disease-related scientific activity at varying levels of detail, down to individual genes and pathways, and provide methods for monitoring areas in which scientific advances are likely to create new therapeutic opportunities.
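Publication counts of the kind analysed here can be retrieved from PubMed through the NCBI Entrez Programming Utilities (ref. 53). The sketch below, which assumes the standard ESearch endpoint and its `db`, `term`, `rettype=count`, `datetype`, `mindate` and `maxdate` parameters, builds a query URL that returns only the number of matching records for one publication year and parses the XML reply. The example query string is hypothetical; the article's own queries are listed in Supplementary information S1 and S3.

```python
# Sketch: per-year PubMed hit counts via NCBI E-utilities ESearch.
# The example query is hypothetical, not one of the article's queries.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_count_url(term, year):
    """URL asking ESearch for only the number of PubMed records in `year`."""
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",   # return the count, not the list of IDs
        "datetype": "pdat",   # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Read the <Count> element from an ESearch XML response."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def pubmed_count(term, year, timeout=30):
    """Network call; respect NCBI's request-rate limits when looping."""
    with urllib.request.urlopen(esearch_count_url(term, year), timeout=timeout) as resp:
        return parse_count(resp.read())


# Offline demonstration (no network access needed):
print(esearch_count_url("obesity[MeSH Terms]", 2005))
print(parse_count("<eSearchResult><Count>42</Count></eSearchResult>"))
```

Looping `pubmed_count` over years and disease queries yields the time series needed for trend analysis. NCBI limits how many requests a client may send per second, so a real pipeline would pause between calls or use an API key.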

Figure 1: NIH funding, US biology and chemistry doctorates awarded, PubMed publications and FDA NME approvals by year.
Figure 2: Rate of scientific publication versus relative disease burden for key therapeutic areas.
Figure 3: Scientific publication over three decades classified by MeSH disease headings.
Figure 4: Rates of change in scientific publication by year for MeSH disease categories.
Figure 5: Representation of MeSH disease categories in high-impact journals.
Figure 6: Highly cited articles and associations with patent filings by disease area over 5 years.
Figure 7: Recent growth in publications by disease and by individual gene.
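Figure 2 plots publication rate against relative disease burden, and the Key Points report that the two correlate well. A minimal sketch of such a comparison follows, computing both a Pearson correlation on the raw values and a Spearman rank correlation across therapeutic areas. Which statistic the authors used is not stated on this page, so the choice here is an assumption, and the per-area numbers are invented placeholders rather than the article's data.

```python
# Illustrative correlation of publication volume with disease burden per
# therapeutic area. Pearson vs. Spearman is an assumption; the toy numbers
# are invented and are not the article's data.
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-area values: publications and burden (e.g. DALYs)
pubs = [40, 30, 20, 10]
burden = [35, 30, 25, 10]
print(round(pearson(pubs, burden), 3), spearman(pubs, burden))
```

Because Pearson correlation is unchanged by rescaling either variable, raw counts and percentage shares give the same coefficient. To examine the stronger correlation reported for the developed world, one would repeat the calculation with region-specific burden vectors.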

References

  1. Zhong, X. & Moseley, G. B. Mission possible: managing innovation in drug discovery. Nature Biotechnol. 25, 945–946 (2007).

  2. Ullman, F. & Boutellier, R. A case study of lean drug discovery: from project driven research to innovation studios and process factories. Drug Discov. Today 13, 543–550 (2008).

  3. Sams-Dodd, F. Optimizing the discovery organization for innovation. Drug Discov. Today 10, 1049–1056 (2005).

  4. Cohen, F. J. Macro trends in pharmaceutical innovation. Nature Rev. Drug Discov. 4, 78–84 (2005).

  5. Chin-Dusting, J., Mizrahi, J., Jennings, G. & Fitzgerald, D. Finding improved medicines: the role of academic–industrial collaboration. Nature Rev. Drug Discov. 4, 891–897 (2005).

  6. Vallance, P. & Levick, M. Drug discovery and development in the age of molecular medicine. Clin. Pharmacol. Ther. 82, 363–366 (2007).

  7. Kneller, R. The origins of new drugs. Nature Biotechnol. 23, 529–530 (2005).

  8. Davenport, T. H. & Harris, J. G. Competing on Analytics: The New Science of Winning. (Harvard Business School Press, Boston, Massachusetts, 2007).

  9. US Department of Health and Human Services. Innovation or stagnation? Challenge and opportunity on the critical path to new medical products. The National Institute for Pharmaceutical Technology and Education website [online] (2004).

  10. Card, D. & Lemieux, T. Going to college to avoid the draft: the unintended legacy of the Vietnam war. Am. Econ. Rev. 91, 97–102 (2001).

  11. Shumway, R. H. & Stoffer, D. S. Time Series Analysis and Its Applications. (Springer, New York, 2005).

  12. Ruffolo, R. R. Why has R&D productivity declined in the pharmaceutical industry? Expert Opin. Drug Discov. 1, 99–102 (2006).

  13. Bren, L. Frances Oldham Kelsey: FDA medical reviewer leaves her mark on history. FDA Consum. 35, 24–29 (2001).

  14. Mathers, C. D. et al. The global burden of disease in 2002: data sources, methods and results. Global Programme on Evidence for Health Policy. Discussion Paper No. 54. World Health Organization (2003; revised 2004).

  15. Teitelbaum, M. S. Research funding: structural disequilibria in biomedical research. Science 321, 644–645 (2008).

  16. Cohen, J. Bang for the buck. Science 321, 518–519 (2008).

  17. Taroncher-Oldenburg, G. & Marshall, A. Trends in biotech literature 2006. Nature Biotechnol. 25, 961 (2007).

  18. Payne, D. J., Gwynn, M. N., Holmes, D. J. & Pompliano, D. L. Drugs for bad bugs: confronting the challenges of antibacterial discovery. Nature Rev. Drug Discov. 6, 29–40 (2007).

  19. Vicente, M. et al. The fallacies of hope: will we discover new antibiotics to combat pathogenic bacteria in time? FEMS Microbiol. Rev. 30, 841–852 (2006).

  20. Coates, A. R. & Hu, Y. Novel approaches to developing new antibiotics for bacterial infections. Br. J. Pharmacol. 152, 1147–1154 (2007).

  21. Ashiya, M. & Smith, R. E. T. Non-insulin therapies for type 2 diabetes. Nature Rev. Drug Discov. 6, 777–778 (2007).

  22. Das, S. K. & Chakrabarti, R. Non-insulin dependent diabetes mellitus: present therapies and new drug targets. Mini Rev. Med. Chem. 5, 1019–1034 (2005).

  23. Morral, N. Novel targets and therapeutic strategies for type 2 diabetes. Trends Endocrinol. Metab. 14, 169–175 (2003).

  24. Webby, R. J. & Webster, R. G. Are we ready for pandemic influenza? Science 302, 1519–1522 (2003).

  25. Caviedes, J. E. & Cimino, J. J. Towards the development of a conceptual distance metric for the UMLS. J. Biomed. Inform. 37, 77–85 (2004).

  26. Wang, X. et al. Automating terminological networks to link heterogeneous biomedical databases. Medinfo 11, 555–559 (2004).

  27. Patel, C. O. & Cimino, J. J. Mining cross-terminology links in the UMLS. AMIA Annu. Symp. Proc. 2006, 624–628 (2006).

  28. Pedersen, T., Pakhomov, S. V., Patwardhan, S. & Chute, C. G. Measures of semantic similarity and relatedness in the biomedical domain. J. Biomed. Inform. 40, 288–299 (2007).

  29. Agarwal, P. & Searls, D. B. Literature mining in support of drug discovery. Brief Bioinform. 9, 479–492 (2008). In this article, the authors of the Analysis provide details of methods used herein and review wider applications of literature mining that are specifically aimed at drug discovery.

  30. Kalberer, J. T. Jr & Newell, G. R. Jr. Funding impact of the National Cancer Act and beyond. Cancer Res. 39, 4274–4284 (1979).

  31. Karpas, A. Human retroviruses in leukaemia and AIDS: reflections on their discovery, biology and epidemiology. Biol. Rev. Camb. Philos. Soc. 79, 911–933 (2004).

  32. Cohen, J. HIV/AIDS. Where have all the dollars gone? Science 321, 520 (2008).

  33. Dorsey, E. R. et al. Financial anatomy of neuroscience research. Ann. Neurol. 60, 652–659 (2006).

  34. Bollen, J., Rodriguez, M. A. & Van de Sompel, H. Journal status. Scientometrics 69, 669–687 (2006).

  35. Evans, J. A. Electronic publication and the narrowing of science and scholarship. Science 321, 395–399 (2008). A much discussed study showing that online publishing, and the ease of following hyperlinks, tends to channel researchers towards a narrower and more recent set of publications, with a possible loss of diversity and historical perspective.

  36. Ducor, P. Intellectual property: coauthorship and coinventorship. Science 289, 873–875 (2000).

  37. Murray, F. Innovation as co-evolution of scientific and technological networks: exploring tissue engineering. Res. Policy 31, 1389–1403 (2002).

  38. Fontenot, J. D. & Rudensky, A. Y. A well adapted regulatory contrivance: regulatory T cell development and the forkhead family transcription factor Foxp3. Nature Immunol. 6, 331–337 (2005).

  39. Mesa, R. A. New insights into the pathogenesis and treatment of chronic myeloproliferative disorders. Curr. Opin. Hematol. 15, 121–126 (2008).

  40. Gable, D. R., Hurel, S. J. & Humphries, S. E. Adiponectin and its gene variants as risk factors for insulin resistance, the metabolic syndrome and cardiovascular disease. Atherosclerosis 188, 231–244 (2006).

  41. Ramanana-Rahary, S., Zitt, M. & Rousseau, R. Aggregation properties of relative impact and other classical indicators: convexity issues and the Yule–Simpson paradox. Scientometrics 79, 311–327 (2009). Although somewhat technical, this paper describes important statistical artefacts that can arise when classifications of the scientific literature are aggregated or subdivided, including the reversal of certain trends.

  42. Zitt, M., Ramanana-Rahary, S. & Bassecoulard, E. Relativity of citation performance and excellence measures: from cross-field to cross-scale effects of field-normalisation. Scientometrics 63, 373–401 (2005).

  43. Lehmann, J. M. et al. An antidiabetic thiazolidinedione is a high affinity ligand for peroxisome proliferator-activated receptorγ (PPARγ). J. Biol. Chem. 270, 12953–12956 (1995).

  44. Calabrese, L. & Fleischer, A. B. Thalidomide: current and potential clinical applications. Am. J. Med. 108, 487–495 (2000).

  45. Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl Acad. Sci. USA 105, 1118–1123 (2008).

  46. Searls, D. B. Mining the bibliome. Pharmacogenomics J. 1, 88–89 (2001).

  47. De Solla Price, D. J. Little Science, Big Science. (Yale University, New Haven, 1963).

  48. Price, D. J. Networks of scientific papers. Science 149, 510–515 (1965). A classic paper by the founder of scientometrics, which showed that citations among scientific papers follow a power-law distribution. It was published more than three decades before the study of such scale-free networks achieved prominence.

  49. Lawrence, P. A. The mismeasurement of science. Curr. Biol. 17, R583–R585 (2007).

  50. Lawrence, P. A. The politics of publication. Nature 422, 259–261 (2003).

  51. Garfield, E. & Melino, G. The growth of the cell death field: an analysis from the ISI-Science citation index. Cell Death Differ. 4, 352–361 (1997). In this paper, the originator of the impact factor, Eugene Garfield, uses bibliometrics to trace and analyse the historical development of the field of apoptosis.

  52. Takahashi, K., Aw, T. C. & Koh, D. An alternative to journal-based impact factors. Occup. Med. (Lond.) 49, 57–59 (1999).

  53. Sayers, E. & Wheeler, D. Building Customized Data Pipelines Using the Entrez Programming Utilities (eUtils). The NCBI website [online] (2004).

  54. American Association for the Advancement of Science. Historical data on federal R&D, FY 1978–2009. The AAAS website [online] (2008).

  55. National Science Foundation. Doctoral degrees awarded, by detailed field: 1920–99. The National Science Foundation website [online] (accessed 2009).

  56. Falkenheim, J. C. & Fiegener, M. K. 2007 records fifth consecutive annual increase in US doctoral awards. The National Science Foundation website [online] (2008).

Acknowledgements

We gratefully acknowledge P. Vallance for posing the problem, V. Kumar for implementing the methods to identify disease areas and genes with strong publication growth, and R. Liu for suggesting the use of common authors to map patents to publications, and thereby to associate patents with MeSH keywords. D. Rajagopalan, K. Kabnick, L. Liu, T. White, W. Reisdorf and Y. Li provided technical assistance and valuable suggestions.
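The acknowledgement mentions mapping patents to publications through common authors so that patents inherit MeSH keywords. A minimal sketch of that idea follows, assuming each patent comes with its inventor names and each paper with its author list and MeSH headings. The name normalisation (surname plus first initial), the `min_shared` threshold and the function names are hypothetical choices for illustration, not the procedure used in the article; crude name keys like this will produce false matches for common surnames.

```python
# Hypothetical sketch: link a patent to papers that share people with it
# (inventors who are also authors), then carry the papers' MeSH headings
# over to the patent. Name normalisation and thresholds are assumptions.
from collections import Counter


def name_key(full_name):
    """'Smith, John A.' or 'John A. Smith' -> 'smith j' (surname + initial)."""
    name = full_name.strip()
    if "," in name:
        surname, given = [p.strip() for p in name.split(",", 1)]
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname.lower()} {given[:1].lower()}".strip()


def mesh_for_patent(inventors, papers, min_shared=2):
    """MeSH headings weighted by the number of linked papers carrying them.

    inventors: inventor names on one patent
    papers: list of {'authors': [...], 'mesh': [...]}
    min_shared: require at least this many shared people per paper, which
                reduces spurious matches on a single common name.
    """
    inv = {name_key(n) for n in inventors}
    terms = Counter()
    for paper in papers:
        shared = inv & {name_key(a) for a in paper["authors"]}
        if len(shared) >= min_shared:
            terms.update(set(paper["mesh"]))
    return terms


papers = [
    {"authors": ["John Smith", "Ann Lee", "X Y"], "mesh": ["Obesity", "Insulin Resistance"]},
    {"authors": ["J. Smith", "B. Jones"], "mesh": ["Asthma"]},
    {"authors": ["Ann Lee", "John Smith"], "mesh": ["Obesity"]},
]
print(mesh_for_patent(["Smith, John", "Lee, Ann"], papers))
```

Requiring two or more shared names trades recall for precision; a production system would also need to disambiguate authors using affiliations, dates or identifiers.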

Author information

Corresponding author

Correspondence to Pankaj Agarwal.

Supplementary information

Supplementary information S1 (table)

The PubMed queries that were used to generate publication counts that corresponded to the WHO disease areas in Figure 2. (PDF 454 kb)

Supplementary information S2 (table)

(a) Overlap in publications between disease areas: the number in each cell is the percentage of publications in the row's disease area that are also contained in the column's disease area. (PDF 194 kb)
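The overlap table described for Supplementary information S2 is asymmetric: each cell divides the shared publications by the size of the row's disease area. As a minimal sketch, assuming each disease area is represented as a set of PubMed IDs (how those sets were retrieved is outside this sketch, and the toy IDs are invented), the table can be computed as follows.

```python
# Sketch of the S2-style overlap table: cell (row, col) is the percentage of
# the row area's publications that also fall in the column area.
# Inputs are sets of PubMed IDs per area; the toy IDs below are invented.

def overlap_matrix(pmids_by_area):
    areas = list(pmids_by_area)
    table = {}
    for row in areas:
        row_set = pmids_by_area[row]
        table[row] = {}
        for col in areas:
            if not row_set:
                table[row][col] = 0.0  # empty area: define overlap as 0
            else:
                shared = len(row_set & pmids_by_area[col])
                table[row][col] = 100.0 * shared / len(row_set)
    return table


m = overlap_matrix({"A": {1, 2, 3, 4}, "B": {3, 4, 5}})
print(m["A"]["B"], round(m["B"]["A"], 1))
```

Because the denominator is the row area, a small area nested inside a large one shows high overlap in its own row but low overlap in the large area's row, which is why the table is not symmetric.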

Supplementary information S3 (table)

PubMed queries used for each disease area in Figures 3, 4, 5 and 6. (PDF 108 kb)

Supplementary information S4 (table)

List of diseases with MeSH History notes dated between 1998 and 2007. (PDF 116 kb)

Supplementary information S5 (table)

List of pathways (of size between 10 and 100 associated genes) from Ingenuity (http://www.ingenuity.com), GeneGO (http://www.genego.com) and Biocarta (http://www.biocarta.com) that have a significant fraction of 'hot' genes (that is, genes that are associated with a significant increase in publication counts). (PDF 99 kb)
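Supplementary information S5 lists pathways of 10 to 100 genes that contain a significant fraction of 'hot' genes. The page does not say which test was used, so the sketch below assumes a one-sided hypergeometric (over-representation) test, a common choice for this kind of question. The gene universe, the hot-gene set, the `alpha` cut-off and the toy pathways are hypothetical inputs for illustration.

```python
# Sketch: flag pathways (10-100 genes) with more 'hot' genes than expected by
# chance. The one-sided hypergeometric test and alpha are assumptions; the
# source only says "a significant fraction".
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are hot."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def enriched_pathways(pathways, hot, universe, alpha=0.01):
    """Return (name, hot_in_pathway, pathway_size, p) sorted by p-value."""
    N = len(universe)
    hot_in_universe = hot & universe
    K = len(hot_in_universe)
    results = []
    for name, genes in pathways.items():
        genes = genes & universe
        n = len(genes)
        if not 10 <= n <= 100:  # size filter stated for S5
            continue
        k = len(genes & hot_in_universe)
        p = hypergeom_sf(k, N, K, n)
        if p < alpha:
            results.append((name, k, n, p))
    return sorted(results, key=lambda r: r[3])


# Toy data: genes are integers; the first 50 are 'hot'.
universe = set(range(1000))
hot = set(range(50))
pathways = {"P_hot": set(range(20)), "P_cold": set(range(500, 520)), "P_tiny": set(range(5))}
print([r[0] for r in enriched_pathways(pathways, hot, universe)])
```

When many pathways are tested at once, a multiple-testing correction such as Benjamini–Hochberg would normally be applied to the p-values before declaring significance; that step is omitted here for brevity.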

Related links

FURTHER INFORMATION

David B. Searls's homepage

National Science Foundation Science and Engineering Statistics

US FDA Center for Drug Evaluation and Research

World Health Report

Glossary

Impact factor

The average number of citations made in a given year to articles that were published in the journal in question during the two preceding years. Impact factors (and many variants) are used to rank journals for their impact on a field, and are sometimes misused in evaluating individual researchers.
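The glossary definition translates directly into a ratio: citations received in year y by items the journal published in years y−1 and y−2, divided by the number of items it published in those two years. The sketch below assumes per-journal citation and publication counts are already tabulated; the input layout and the toy numbers are hypothetical, and the indexing service's rules for which items count as "citable" are not modelled.

```python
# Sketch of the two-year journal impact factor as defined in the glossary.
# Input layout and toy numbers are hypothetical.

def impact_factor(year, citations, items_published):
    """
    citations: {(citing_year, cited_pub_year): count} for one journal
    items_published: {pub_year: number of citable items}
    """
    prior = (year - 1, year - 2)
    cites = sum(citations.get((year, p), 0) for p in prior)
    items = sum(items_published.get(p, 0) for p in prior)
    if items == 0:
        raise ValueError("no items published in the two preceding years")
    return cites / items


citations = {(2008, 2007): 600, (2008, 2006): 900, (2008, 2005): 400}
items = {2007: 100, 2006: 150}
print(impact_factor(2008, citations, items))  # 6.0
```

Citations to items from 2005 are ignored for the 2008 value, which illustrates why the measure favours fields in which papers are cited quickly.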

About this article

Cite this article

Agarwal, P., Searls, D. Can literature analysis identify innovation drivers in drug discovery? Nat Rev Drug Discov 8, 865–878 (2009). https://doi.org/10.1038/nrd2973

  • DOI: https://doi.org/10.1038/nrd2973
