Fig. 2: Computationally inferred keywords of documents show the involvement of NHGRI in important developments in the nascent field of genomics. | Nature Communications

From: A digital archive reveals how a funding agency cooperated with academics to support the nascent field of genomics

A Documents in the Core Collection have been annotated with keywords by the History of Genomics Program, but the scale of the collection left gaps in annotation, even after a decade of manual annotation work. B Entity recognition and pattern matching fill this gap by computationally generating relevant keywords on biological phenomena, techniques, organizations, and individuals. We use a two-sided Fisher’s exact test with Bonferroni correction to detect keywords that appear more often in projects that followed the HGP (henceforth “enriched” keywords). C Hierarchical clustering of 1246 keywords from panel B by the share of documents in each project containing the keyword, normalized across projects as standard Z-scores. See Data S2 for the list of keywords and Fig. S17 for the entire row-wide dendrograms. D Swarm plot showing that genomic techniques (dots) enriched among the four genomics projects that followed the HGP already occurred in documents before the start of these projects. One such technique is genome-wide association studies (GWAS; red dot). E Bibliometric analysis of investigative techniques (n = 2601; defined using Medical Subject Headings) in the biomedical literature. Top: the share of publications that are among the 5% most cited publications. Bottom: the share of new genes introduced to the biomedical literature, according to the investigative techniques used in these initial publications. The red vertical line indicates GWAS. F Timeline of the development of GWAS. The histogram shows the occurrence of GWAS in the Core Collection. The dashed lines indicate key publications: Risch and Merikangas's demonstration of the mathematical feasibility of GWAS [33], the start of the International HapMap Project [34], and what have been independently considered [35,36,37] to be the first GWAS article [38] and the first large-scale GWAS [39].
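The enrichment test described for panel B can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy counts, and the 0.05 threshold are assumptions made for illustration, and the two-sided p-value uses the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed
    # one; the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def enriched_keywords(counts, n_post, n_pre, alpha=0.05):
    """Flag keywords that are more frequent in post-HGP documents.

    counts maps keyword -> (documents containing it in post-HGP projects,
    documents containing it elsewhere). Hypothetical input layout.
    """
    n_tests = len(counts)
    result = {}
    for keyword, (k_post, k_pre) in counts.items():
        p = fisher_exact_two_sided(k_post, n_post - k_post, k_pre, n_pre - k_pre)
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        more_frequent = k_post / n_post > k_pre / n_pre
        result[keyword] = (p_adj, p_adj < alpha and more_frequent)
    return result


# Toy counts, invented for illustration only (not data from the paper).
toy_counts = {"GWAS": (40, 5), "PCR": (20, 20)}
toy_result = enriched_keywords(toy_counts, n_post=100, n_pre=100)
for keyword, (p_adj, is_enriched) in toy_result.items():
    print(keyword, p_adj, is_enriched)
```

Because the test is two-sided, a small p-value alone does not indicate direction, so the sketch additionally checks that the keyword's share is higher in the post-HGP documents before labelling it enriched.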
