Key Points
- Drug discovery has traditionally responded to the 'pull' of unmet medical need and commercial potential. Here, we evaluate the scientific areas that are providing 'push' in terms of scientific innovation, as measured by publications and patents.
- The amount of funding from the US National Institutes of Health, the number of doctoral degrees awarded, the number of publications and the number of approvals for new molecular entities issued by the US Food and Drug Administration show a general upward trend from the 1950s to the present. Although explanations can be suggested for some of the short-term ups and downs, these trends are also affected by numerous hidden variables, as well as time lags, feedback loops and other complications.
- The numbers of publications in various disease areas correlate well with global disease burden. The correlation is greater in the developed world, where the impact of infectious and parasitic diseases, respiratory infections and infant mortality is considerably lower than in the developing world.
- A few therapeutic areas stand out when analysing trends in publications, citations, publications in high-impact journals and patents. Oncology is the clearest outlier on almost every metric. Viruses, nutrition and metabolism, and the immune system also show increases in the majority of metrics.
- In terms of individual diseases, insulin resistance, orthomyxoviridae infections, depression, autism, macular degeneration, inflammation, obesity, cognitive disorders and ventricular dysfunction show strong publication growth in both 2- and 5-year periods. Publications related to the genes encoding forkhead box P3, leucine-rich repeat kinase 2, Janus kinase 2, transcription factor 7-like 2, interleukin-17A, toll-like receptor 2 (TLR2), TLR4, FK506 binding protein 12-rapamycin associated protein 1 and ADIPOQ (adiponectin, C1Q and collagen domain-containing) show the strongest publication growth in the same periods.
- Assessing scientific innovation is a complex endeavour; however, the 'bibliome' seems to offer many approaches that could enhance decision-making in drug discovery.
Abstract
Drug discovery must be guided not only by medical need and commercial potential, but also by the areas in which new science is creating therapeutic opportunities, such as target identification and the understanding of disease mechanisms. To systematically identify such areas of high scientific activity, we use bibliometrics and related data-mining methods to analyse over half a terabyte of data, including PubMed abstracts, literature citation data and patent filings. These analyses reveal trends in scientific activity related to disease studied at varying levels, down to individual genes and pathways, and provide methods to monitor areas in which scientific advances are likely to create new therapeutic opportunities.
Acknowledgements
We gratefully acknowledge P. Vallance for posing the problem, V. Kumar for implementing the methods to identify disease areas and genes with strong publication growth, and R. Liu for suggesting the use of common authors to map patents to publications, and thereby associate patents with MeSH key words. D. Rajagopalan, K. Kabnick, L. Liu, T. White, W. Reisdorf and Y. Li provided technical assistance and valuable suggestions.
Supplementary information
Supplementary information S1 (table)
The PubMed queries that were used to generate publication counts that corresponded to the WHO disease areas in Figure 2. (PDF 454 kb)
Supplementary information S2 (table)
(a) Overlap in publications between disease areas: the number in each cell is the percentage of publications in the disease area in the specific row that is also contained in the disease area in a specific column. (PDF 194 kb)
Supplementary information S4 (table)
List of diseases with MeSH History notes dated between 1998 and 2007. (PDF 116 kb)
Supplementary information S5 (table)
List of pathways (of size between 10 and 100 associated genes) from Ingenuity (http://www.ingenuity.com), GeneGO (http://www.genego.com) and Biocarta (http://www.biocarta.com) that have a significant fraction of 'hot' genes (that is, genes that are associated with a significant increase in publication counts). (PDF 99 kb)
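The overlap table described for Supplementary information S2 can be illustrated with a minimal sketch. It assumes each disease area is represented as a set of publication identifiers; the function name, area names and identifiers below are hypothetical and are not taken from the article.

```python
from typing import Dict, Set


def overlap_percentages(areas: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """For each (row, column) pair of disease areas, return the percentage of
    publications in the row area that also appear in the column area."""
    table: Dict[str, Dict[str, float]] = {}
    for row, row_ids in areas.items():
        table[row] = {}
        for col, col_ids in areas.items():
            if not row_ids:
                # An empty row area has no publications to overlap with.
                table[row][col] = 0.0
            else:
                table[row][col] = 100.0 * len(row_ids & col_ids) / len(row_ids)
    return table


# Hypothetical publication identifiers, for illustration only.
example_areas = {
    "oncology": {"p1", "p2", "p3", "p4"},
    "immune system": {"p3", "p4", "p5"},
}
example_table = overlap_percentages(example_areas)
```

Because each cell is normalised by the size of the row area, the table is generally not symmetric: two shared publications are half of the four in the hypothetical oncology set but two-thirds of the three in the hypothetical immune system set.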
Glossary
- Impact factor: The average number of citations made in a given year to articles that were published in the journal in question during the two preceding years. Impact factors (and many variants) are used to rank journals for their impact on a field, and are sometimes misused in evaluating individual researchers.
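As a worked illustration of the impact factor definition above, the following minimal sketch computes the value from per-year counts. The function name and all counts are hypothetical; it assumes citations are tallied by the publication year of the cited article.

```python
from typing import Mapping


def impact_factor(year: int,
                  citations_by_pub_year: Mapping[int, int],
                  articles_by_pub_year: Mapping[int, int]) -> float:
    """Average number of citations made in `year` to articles that the
    journal published during the two preceding years.

    citations_by_pub_year: citations received in `year`, keyed by the
        publication year of the cited article.
    articles_by_pub_year: number of articles the journal published per year.
    """
    window = (year - 1, year - 2)
    citations = sum(citations_by_pub_year.get(y, 0) for y in window)
    articles = sum(articles_by_pub_year.get(y, 0) for y in window)
    if articles == 0:
        raise ValueError("no articles published in the two preceding years")
    return citations / articles


# Hypothetical counts, for illustration only.
citations_made_in_2008 = {2007: 300, 2006: 450}
articles_published = {2007: 100, 2006: 150}
example_if = impact_factor(2008, citations_made_in_2008, articles_published)
```

With these hypothetical counts, 750 citations to 250 articles give an impact factor of 3.0. The value is a journal-level average, which is one reason it can mislead when applied to individual researchers.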
About this article
Cite this article
Agarwal, P., Searls, D. Can literature analysis identify innovation drivers in drug discovery? Nat Rev Drug Discov 8, 865–878 (2009). https://doi.org/10.1038/nrd2973
DOI: https://doi.org/10.1038/nrd2973