Fig. 1: Overview of the proposed LLM-based automatic synoptic reporting framework.
From: Synoptic reporting by summarizing cancer pathology reports using large language models

a An excerpt of a de-identified sample pathology report is shown. The boxes show example data elements along with the corresponding data element responses as reported by Mayo Clinic pathologists. Here “Procedure” and “Tumor Size” are NLP type data elements, and “Distant Metastasis” and “Lymphovascular Invasion” are classification type data elements. The highlighted text marks the region of the report most relevant to each response. b To generate a synoptic report from an unstructured pathology report, we take an element-by-element, prompt-based approach. The unstructured sections of the pathology report are combined, cleaned, and processed to create training prompts. Each training prompt is the concatenation of the instruction, the unstructured report, and the reference response. The model is fine-tuned on the training prompts and learns to associate the specific reference response with the unstructured report and the given instruction. To obtain an inference from the model, an inference prompt is created that is identical to the training prompt except that it does not contain the reference response. During inference the model is given only the instruction and the unstructured report, and is expected to generate the response text that follows the inference prompt.
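The prompt construction described in panel b can be illustrated with a minimal Python sketch. The template wording, the section markers, the whitespace-based cleaning step, and all function names are assumptions made for illustration; the caption does not specify the actual prompt format or preprocessing used.

```python
# Hypothetical prompt template: the caption does not give the real wording.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Report:\n{report}\n\n"
    "### Response:\n"
)


def combine_sections(sections):
    """Join report sections into one string, collapsing runs of whitespace.

    This stands in for the 'combined, cleaned, and processed' step; the
    real cleaning rules are not described in the caption.
    """
    return " ".join(" ".join(s.split()) for s in sections if s.strip())


def build_inference_prompt(instruction, report):
    """Instruction plus unstructured report, with no reference response."""
    return PROMPT_TEMPLATE.format(instruction=instruction, report=report)


def build_training_prompt(instruction, report, reference):
    """Inference prompt with the reference response appended."""
    return build_inference_prompt(instruction, report) + reference


if __name__ == "__main__":
    # Made-up example text, not taken from any real report.
    report = combine_sections(["Specimen:  left breast,\n lumpectomy.", "Tumor measures 2.1 cm."])
    instruction = "What is the tumor size?"
    print(build_training_prompt(instruction, report, "2.1 cm"))
```

Because the training prompt is built by appending the reference response to the inference prompt, the two are guaranteed to be identical up to the response, which matches the relationship the caption describes. One prompt pair would be built per data element, in line with the element-by-element approach.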