Fig. 6 | Scientific Data

Fig. 6

From: Evaluating FAIR maturity through a scalable, automated, community-governed framework

The Metadata Harvester workflow. Yellow boxes are the starting GUIDs; text nodes are some form of URI. Arrows show the flow of information. White boxes are resolution activities, annotated with the associated Accept headers used. The pink box is a suite of third-party metadata extraction tools. Tika operates on a wide range of non-textual data (e.g. PDFs) to extract embedded metadata. Extruct and Distiller extract metadata embedded within HTML in a variety of formats. Purple barrels are metadata collection steps, where all Linked Data and hash-style metadata are cached, together with the raw output of the resolution. InChI Keys (left pathway) have a defined two-step resolution mechanism that supports content negotiation, and are therefore treated as a special case for efficiency. DOIs and other Handles are converted into URIs and thereafter treated in the same manner as other URIs. The first step of URI harvesting is to follow all Link rel="meta" headers after resolution, extracting any metadata from these locations following the same workflow as for other URIs. These headers are followed only one layer deep, after which the system returns to the content of the original URI resolution, using content negotiation.
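
The URI pathway described in the caption can be illustrated with a minimal Python sketch. It is not the Harvester's actual code: the names `meta_links`, `harvest`, and `ACCEPT`, the Accept header value, and the injected `fetch` callable are all assumptions made for illustration. The Link header parser is deliberately simplified and does not handle commas or semicolons inside quoted parameter values.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical fetch signature: (url, accept_header) -> (response_headers, body).
# Injecting it keeps the sketch independent of any particular HTTP library.
Fetch = Callable[[str, str], Tuple[Dict[str, str], str]]

# Assumed Accept header; the header actually used by the Harvester is not given here.
ACCEPT = "text/turtle, application/ld+json, application/rdf+xml;q=0.9, */*;q=0.1"

# Matches one Link header entry: <target> followed by its ";param" list.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def meta_links(link_header: str) -> List[str]:
    """Return the target URIs of Link entries whose rel value includes 'meta'."""
    targets = []
    for uri, params in LINK_RE.findall(link_header or ""):
        match = re.search(r'rel\s*=\s*"?([^";]+)"?', params, re.IGNORECASE)
        if match and "meta" in match.group(1).lower().split():
            targets.append(uri)
    return targets


def harvest(uri: str, fetch: Fetch) -> List[Tuple[str, str]]:
    """Collect (source_uri, body) pairs: linked metadata first, then the URI itself."""
    # Assumes the headers dict uses the exact key "Link".
    headers, body = fetch(uri, ACCEPT)
    collected = []
    for target in meta_links(headers.get("Link", "")):
        # One layer deep only: Link headers returned by the target are ignored.
        _, meta_body = fetch(target, ACCEPT)
        collected.append((target, meta_body))
    # Return to the content of the original, content-negotiated resolution.
    collected.append((uri, body))
    return collected
```

In a real setting `fetch` would wrap an HTTP client that sends the Accept header and follows redirects; passing it in as a parameter also makes the one-layer-deep rule easy to check with a stub.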
