WO2025111606A1 - Methods and systems for vector embedding search for predicting abnormalities in medical images - Google Patents
- Publication number
- WO2025111606A1 (PCT/US2024/057320)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- ehr
- processors
- image
- data
- medical image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Definitions
- the present disclosure generally relates to medical image analysis, and more particularly relates to using machine learning techniques to, among other things: (i) generate embeddings for medical images, (ii) leverage the large archive of unlabeled clinical data, (iii) predict abnormalities in medical images, and/or (iv) perform inference without the need for task-specific training or manual labeling.
- Machine learning (ML) and artificial intelligence (AI) models are used by professionals in a wide variety of fields to analyze large amounts of data and make determinations based on said data.
- a computer-implemented method for predicting abnormalities in medical images may include: (1) obtaining, by one or more processors, a digital medical image of an anatomical region of interest corresponding to a patient; (2) generating, by the one or more processors, a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; (3) obtaining, by the one or more processors, one or more similar feature vectors from a feature vector datastore based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; (4) generating, by the one or more processors, at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and (5) presenting, via a graphical user interface and for review by a clinician, one or more of:
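The claimed steps (1)-(5) can be sketched end to end. Every component below is a hypothetical placeholder (a flatten-and-normalize "embedding," brute-force cosine search, and a string-join "summary"), not the disclosed implementations:

```python
import numpy as np

# Hypothetical stand-ins for the disclosed components; none of these are
# the claimed implementations.
def embed_image(image):
    """Stand-in for the trained image embedding model: flatten and L2-normalize."""
    v = np.asarray(image, dtype=float).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def nearest_neighbors(query, store, k=3):
    """Brute-force cosine search against the feature vector datastore."""
    return np.argsort(-(store @ query))[:k]

def summarize_ehr(records):
    """Stand-in for the trained language model's summary of common findings."""
    return " | ".join(records)

# (1) obtain the digital medical image (synthetic here)
rng = np.random.default_rng(0)
image = rng.random((4, 4))
# feature vector datastore built from "historical" images
store = np.stack([embed_image(rng.random((4, 4))) for _ in range(5)])
ehr = {i: f"report for historical patient {i}" for i in range(5)}
# (2) embed, (3) retrieve similar vectors, (4) summarize associated EHR
query = embed_image(image)
idx = nearest_neighbors(query, store, k=2)
summary = summarize_ehr([ehr[int(i)] for i in idx])
# (5) present for review by a clinician (printed here)
print(summary)
```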
- the computer-implemented method further comprises: generating the trained image embedding model by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
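A minimal sketch of the contrastive objective such self-supervised training could use, assuming a SimCLR-style NT-Xent loss over paired views of the same image (the disclosure does not specify the loss or augmentation scheme):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over a batch of positive pairs (z1[i], z2[i]),
    as used in SimCLR-style self-supervised training (an assumed objective)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> cosine sims
    sim = (z @ z.T) / temperature
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # each row's positive is its paired augmented view
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))

rng = np.random.default_rng(0)
views = rng.standard_normal((4, 16))
aligned = nt_xent_loss(views, views)                   # perfectly matched views
mismatched = nt_xent_loss(views, rng.standard_normal((4, 16)))
```

Matched views score a lower loss than mismatched ones, which is the signal that pulls embeddings of augmented versions of the same image together.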
- generating the feature vector comprises: transforming, via the trained image embedding model, three-dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
- the computer-implemented method further comprises: generating, by the one or more processors, a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter; and issuing, by the one or more processors, the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
- the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
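A filtered vector store query along these lines might look as follows; the in-memory record list and metadata field names (anatomy, modality) are illustrative assumptions:

```python
import numpy as np

# Hypothetical datastore records; "vec" holds the stored feature vector and
# the remaining fields are the filterable metadata.
records = [
    {"id": 1, "anatomy": "chest", "modality": "CT",  "vec": [1.0, 0.0]},
    {"id": 2, "anatomy": "chest", "modality": "MRI", "vec": [0.9, 0.1]},
    {"id": 3, "anatomy": "head",  "modality": "CT",  "vec": [0.0, 1.0]},
]

def query_store(query_vec, filters, k=5):
    """Restrict candidates by the vector store filters, then rank by cosine similarity."""
    candidates = [r for r in records
                  if all(r.get(key) == val for key, val in filters.items())]
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    scored = []
    for r in candidates:
        v = np.asarray(r["vec"], dtype=float)
        scored.append((float(q @ (v / np.linalg.norm(v))), r["id"]))
    scored.sort(reverse=True)
    return [rid for _, rid in scored[:k]]

print(query_store([1.0, 0.05], {"anatomy": "chest", "modality": "CT"}))  # → [1]
```

Only record 1 passes both filters, so it is the sole result even though record 2 is geometrically closer than record 3.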
- the digital medical image corresponding to the patient and the plurality of historical digital medical images are one of: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
- the clinician is a radiologist.
- generating the at least one summary comprises: determining, by the one or more processors, a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors; and obtaining, by the one or more processors, the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
- the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures, or (v) clinical outcomes.
- the trained language model is generated by training a language model on a set of preprocessed EHR data.
- the computer-implemented method further comprises: preprocessing, by the one or more processors, a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tuning, by the one or more processors, the trained language model using the replicated radiology reports.
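Steps (i)-(iv) could be sketched as below, with a naive whitespace tokenizer, a simple record schema, and a replication factor standing in for details the disclosure leaves unspecified:

```python
import random

def preprocess_ehr(records, test_fraction=0.2, replicas=2, seed=0):
    """Steps (i)-(iv) with illustrative stand-ins for unspecified details."""
    # (i) tokenize: naive whitespace tokenization
    tokenized = [{"kind": r["kind"], "tokens": r["text"].lower().split()}
                 for r in records]
    # (ii) format for input to the language model: delimited token sequences
    formatted = [["<bos>"] + t["tokens"] + ["<eos>"] for t in tokenized]
    # (iii) split into an EHR training set and an EHR testing set
    idx = list(range(len(formatted)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test = [formatted[i] for i in idx[:n_test]]
    train = [formatted[i] for i in idx[n_test:]]
    # (iv) replicate radiology reports for later fine-tuning
    replicated = [f for t, f in zip(tokenized, formatted)
                  if t["kind"] == "radiology_report"] * replicas
    return train, test, replicated

records = [
    {"kind": "radiology_report", "text": "CT chest: 4 mm nodule."},
    {"kind": "clinical_note", "text": "Cough resolved."},
    {"kind": "lab_result", "text": "WBC within normal limits."},
    {"kind": "clinical_note", "text": "Follow-up in 6 months."},
    {"kind": "procedure", "text": "Low-dose CT screening."},
]
train, test, replicated = preprocess_ehr(records)
```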
- generating the at least one summary comprises: inputting, by the one or more processors and with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
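One possible shape for such a prompt defining the summarization task (the wording and delimiters are assumptions, not the claimed prompt):

```python
# Hypothetical prompt assembly for the summarization task; the exact wording
# and record delimiters are assumptions.
def build_summary_prompt(ehr_texts):
    header = ("Summarize the common findings and abnormalities across the "
              "following de-identified patient records:")
    body = "\n\n".join(f"[Record {i + 1}]\n{t}" for i, t in enumerate(ehr_texts))
    return f"{header}\n\n{body}\n\nSummary:"

prompt = build_summary_prompt(
    ["CT chest: 4 mm nodule.", "CT chest: stable 4 mm nodule, no growth."]
)
print(prompt)
```

The assembled prompt and the retrieved EHR text would then be submitted together to the trained language model.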
- the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
- a computing system for predicting abnormalities in medical images may be provided.
- the computing system may include: one or more medical imaging devices; one or more processors; and/or one or more non-transitory memories coupled to the one or more processors.
- the one or more non-transitory memories including computer executable instructions stored therein that, when executed by the one or more processors, may cause the one or more processors to: (1) obtain, from a medical imaging device, a digital medical image of an anatomical region of interest corresponding to a patient; (2) generate a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; (3) obtain one or more similar feature vectors from a feature vector datastore based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; (4) generate at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and (5) present, via a graphical user interface and for review by a clinician, one or more of:
- the computer-executable instructions when executed by the one or more processors, further cause the computing system to: generate the trained image embedding model by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
- the computer-executable instructions when executed by the one or more processors, generate the feature vector by causing the computing system to: transform, via the trained image embedding model, three-dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
- the computer-executable instructions when executed by the one or more processors, further cause the computing system to: generate a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter; and issue the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
- the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
- the digital medical image corresponding to the patient and the plurality of historical digital medical images are one of: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
- the clinician is a radiologist.
- the computer-executable instructions when executed by the one or more processors, generate the at least one summary by causing the computing system to: determine a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors; and obtain the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
- the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures, or (v) clinical outcomes.
- the trained language model is generated by training a language model on a set of preprocessed EHR data.
- the computer-executable instructions, when executed by the one or more processors, further cause the computing system to: preprocess a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tune the trained language model using the replicated radiology reports.
- the computer-executable instructions when executed by the one or more processors, generate the at least one summary by causing the computing system to: input, with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
- the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
- a tangible, non-transitory computer readable medium storing computer-readable instructions for predicting abnormalities in medical images may be provided.
- the computer-readable instructions, when executed by one or more processors of a computing system, may cause the computing system to: (1) obtain, from a medical imaging device, a digital medical image of an anatomical region of interest corresponding to a patient;
- FIG. 1 illustrates an example computing environment for predicting abnormalities in medical images in which the techniques disclosed herein may be implemented.
- FIG. 2 illustrates a combined block and logic diagram for an example computer- implemented method for predicting abnormalities in medical images.
- FIG. 3 illustrates a combined block and logic diagram for an example embedding model architecture, in accordance with one or more aspects.
- FIG. 4 illustrates an example computer-implemented method for predicting abnormalities in medical images.
- FIG. 5 illustrates an example computer-implemented method for generating embeddings for medical images.
- the present techniques provide methods and systems for, inter alia, medical image analysis, and more particularly, techniques for training and operating one or more machine learning models to, among other things, generate embeddings for medical images, leverage the large archive of unlabeled clinical data without the need for task-specific training or manual labeling, and facilitate the identification of abnormalities in medical images.
- the present techniques may include generating an image embedding for a medical image using a trained image embedding model and identifying a plurality of historical patients associated with similar historical medical images by querying a vector database containing embeddings for a plurality of historical medical images with the image embedding for the medical image.
- the present techniques may include providing electronic health records (EHR), from a clinical archive, for the plurality of historical patients with similar medical images to a trained language model.
- An example user computing device may receive an output summary from the trained language model that conveys the common findings and abnormalities of the EHR for the plurality of historical patients with similar medical images.
- the present techniques include leveraging a large clinical archive (e.g., a longitudinal patient record, data from a longitudinal patient record, another clinical archive, etc.) through trained embedding models and trained language models.
- an example system may generate and access a corpus of embeddings (e.g., a vector database) for a plurality of medical images from a clinical archive, thereby allowing patients to be easily grouped based upon associated medical images.
- robust insight may be provided into the common characteristics of the patients, abnormalities/conditions associated with the similar medical images, and other relevant information contained in the EHR.
- the summaries generated by the trained language models discussed herein may be used as a guide for a clinician or another medical professional when assessing a medical image for a patient.
- the present techniques need not employ manual labeling, training, or human intuition to leverage such clinical archives to generate the summaries of EHR data.
- the methods and systems discussed herein provide significant advantages over conventional medical image analysis technologies.
- FIG. 1 is a block diagram representative of an example computing environment 100 capable of implementing the example methods and/or operations described herein, including, for example, one or more steps of the method 200 of FIG. 2 and/or one or more steps of the method 400 of FIG. 4 discussed in greater detail below.
- the computing environment 100 of FIG. 1 includes a server computing device 102, a user computing device 104, one or more networks 106, one or more medical imaging devices 108, a vector database 110, and an electronic health care records (EHR) database 112.
- the computing environment 100 further includes one or more cloud application programming interfaces (APIs) 114.
- the exemplary network 106 of FIG. 1 may be a single communication link directly connecting the server computing device 102 and the user computing device 104 (e.g., a direct wireless link), or one or more networks 106 may include multiple links (e.g., connecting the server computing device 102, the user computing device 104, the medical imaging device 108, and an additional computing device) and/or communication networks of one or more types (e.g., one or more wired and/or wireless local area networks (LANs), and/or one or more wired and/or wireless wide area networks (WANs) such as the Internet, public networks, private networks, etc.).
- the network 106 may include one or more networking devices such as routers, switches, etc., and the network 106 may perform routing and/or switching operations using the networking devices.
- the network 106 may include one or more firewalls, file servers and/or storage devices.
- each device within the network 106 may be assigned a unique identifier, such as an IP address, to facilitate communication.
- the network 106 may include any suitable topology, such as a star topology (devices connected to a central hub), a bus topology (devices connected along a single cable), a ring topology (devices connected in a circular fashion), and/or a mesh topology (devices connected to multiple other devices).
- the one or more networks 106 may be referred to using the singular tense.
- the server computing device 102 includes one or more communication interfaces 120, one or more processors 122, one or more memories 124, and one or more application programming interfaces 130.
- the memories 124 may have stored thereon one or more modules 140 (e.g., one or more sets of instructions).
- the server computing device 102 includes one or more input devices and one or more output devices (e.g., similar to the input devices 168a and the output devices 168b, discussed in greater detail below) that enable receipt of user input and communication of output data to a user (e.g., a clinician, a user of the user computing device 104).
- the server computing device 102 may be an individual server, a group (e.g., cluster) of multiple servers, a computing device (e.g., a personal computer, a laptop, a smart phone, a tablet, a wearable device, etc.), or another suitable type of computing device or system (e.g., a collection of computing resources).
- the one or more communication interfaces 120 may enable communication with other machines (e.g., user computing device 104) via, for example, the one or more networks 106.
- the example communication interface 120 may include any suitable type of communication interface(s) (e.g., wired and/or wireless interfaces) configured to operate in accordance with any suitable protocol(s).
- the communication interfaces 120 may be configured to transmit and receive data using a suitable wired communication protocol such as an Ethernet protocol, a USB protocol, a UART protocol, an I2C protocol, a SPI protocol, or wireless communication protocols such as a Bluetooth protocol, a Wi-Fi® (IEEE 802.11 standard) protocol, a near-field communication (NFC) protocol, a cellular (e.g., GSM, CDMA, LTE, WiMAX, etc.) protocol, a peer-to-peer wireless protocol, a short-range wireless protocol, and/or other suitable wired or wireless communication protocols.
- the communication interface 120 may be a network interface controller (NIC) and may include any suitable NIC, such as wired/wireless controllers (e.g., Ethernet controllers), and facilitate bidirectional/multiplexed networking over the network 106 between the server computing device 102 and the user computing device 104 and/or other components of the environment 100 (e.g., the medical imaging device 108, the vector database 110, the EHR database 112, another computing device, etc.).
- the communication interface 120 may include advanced features such as hardware acceleration, specialized networking protocols, etc.
- the processors 122 may include one or more microprocessors, controllers, and/or any suitable type of processor, and the memories 124 (e.g., volatile memory, non-volatile memory) may be accessible by the processor 122 (e.g., via a memory controller).
- the one or more processors 122 may include one or more central processing units, one or more graphics processing units, one or more field-programmable gate arrays, one or more application-specific integrated circuits, one or more tensor processing units, one or more digital signal processors, one or more neural processing units, one or more RISC-V processors, one or more coprocessors, one or more specialized processors/accelerators for artificial intelligence or machine learning-specific applications, one or more microcontrollers, etc.
- the processor 122 may interact with the memory 124 to obtain, for example, machine-readable instructions and/or computer-executable instructions stored in the memory 124 corresponding to, for example, the operations represented by the flowcharts of this disclosure (e.g., the method 200 of FIG. 2 and/or the method 400 of FIG. 4).
- the memories 124 may include one or more random access memories, one or more read-only memories, one or more cache memories, one or more hard disk drives, one or more solid-state drives, one or more non-volatile memory express (NVMe) drives, one or more optical drives, one or more universal serial bus flash drives, one or more external hard drives, one or more network-attached storage devices, one or more cloud storage instances, one or more tape drives, etc.
- the memories 124 may have stored thereon one or more modules 140, for example, as one or more sets of computer-executable instructions.
- the modules 140 may include an image embedding module 142, a data processing module 144, a machine learning (ML) training module 150, and an ML operation module 152. In some aspects, more or fewer modules 140 may be included.
- the computer-executable instructions may be stored on non-transitory computer-readable media, in some aspects.
- the modules 140 may include additional storage, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, macOS, etc.). The operating systems may be configured to run the modules 140 during operation of the server computing device 102.
- the modules 140 may include additional modules and/or services for receiving and processing data from one or more other components of the environment 100 such as the EHR database 112, the vector database 110, the one or more cloud application programming interfaces (APIs) 114, or the user computing device 104.
- the modules 140 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.).
- the modules 140 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.).
- the modules 140 may respond to network requests (e.g., via the API 130) or other requests received via the network 106 (e.g., via the user computing device 104 or other components of the environment 100).
- the memories 124 of the server computing device 102 of FIG. 1 may store instructions for executing an image embedding module 142.
- the image embedding module 142 may generate vector embeddings, or feature vectors, for medical images.
- the image embedding module 142 may generate one or more feature vectors for a medical image of a patient (e.g., an image from medical imaging device 108, EHR database 112, or another clinical source).
- the image embedding module 142 may store one or more image embedding models 143 trained to generate feature vectors for a digital image.
- the one or more image embedding models 143 may be image encoders (e.g., a convolutional neural network), such as a residual network (ResNet), visual geometry group (VGG), or another suitable foundational image encoder.
- the image embedding module 142 may generate image embedding models 143 by training a self-supervised autoencoder (e.g., using contrastive learning techniques) on unlabeled historical medical images from a clinical archive (e.g., EHR database 112).
- the image embedding models 143 may be trained on digital medical images of a particular anatomical region of interest.
- the image embedding module 142 may store one or more image embedding models 143 each trained on a specific anatomical region of interest.
- the embeddings generated by the image embedding module 142 may be stored in the vector database 110 and/or a remote embedding database service (e.g., a remote database accessible via the cloud APIs 114).
- the embeddings may be stored on a remote embedding database service based on an open source or proprietary model (e.g., Milvus, Pinecone, Redis, Postgres, MongoDB, Facebook AI Similarity Search (FAISS), etc.).
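The add/search contract such a vector store exposes can be illustrated with a brute-force in-memory stand-in; a real deployment would use one of the listed services rather than this sketch:

```python
import numpy as np

class InMemoryVectorIndex:
    """Brute-force stand-in for a vector index service's add/search contract."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, vecs):
        """Normalize and append embeddings to the index."""
        vecs = np.asarray(vecs, dtype=np.float32).reshape(-1, self.dim)
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, vecs])

    def search(self, query, k):
        """Return (similarities, indices) of the k most similar stored vectors."""
        q = np.asarray(query, dtype=np.float32).ravel()
        q = q / np.linalg.norm(q)
        sims = self.vectors @ q
        top = np.argsort(-sims)[:k]
        return sims[top], top

index = InMemoryVectorIndex(dim=2)
index.add([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
sims, top = index.search([1.0, 0.05], k=2)
```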
- the image embedding module 142 may include further instructions for accessing vector database 110 or a vector store, and for creating, retrieving, updating, deleting, and/or configuring the data therein.
- the one or more image embedding models 143 may be referred to using the singular tense.
- the ML training module 150 includes instructions for training the one or more image embedding models 143 to generate image embeddings for medical images. Further, the ML training module 150 may obtain image embeddings from a vector database (e.g., vector database 110) and/or digital images from a datastore (e.g., medical images from a clinical archive such as the EHR database 112) for training the image embedding models 143. In some embodiments, the ML training module 150 may perform preprocessing operations on obtained image data, such as resizing, normalization, data augmentation, binarization, etc.
- the ML training module 150 may obtain a set of medical images from the EHR database 112 and the ML training module 150 may preprocess the medical images to generate training images for the image embedding model(s) 143.
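Such preprocessing might be sketched as follows; the target size, nearest-neighbour resizing, and zero-mean/unit-variance normalization are illustrative simplifications of whatever the training pipeline actually uses:

```python
import numpy as np

def preprocess_image(image, size=(8, 8)):
    """Resize (nearest-neighbour stand-in for interpolation) and normalize
    a raw medical image into a training input."""
    img = np.asarray(image, dtype=np.float32)
    # pick evenly spaced source rows/columns to reach the target size
    rows = np.linspace(0, img.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size[1]).round().astype(int)
    img = img[np.ix_(rows, cols)]
    # normalize to zero mean, unit variance
    return (img - img.mean()) / (img.std() + 1e-8)

training_image = preprocess_image(np.arange(100, dtype=float).reshape(10, 10))
```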
- the ML training module 150 may train the image embedding models 143 on medical images labeled with corresponding embeddings.
- the ML training module 150 may train a foundational image embedding model (e.g., a base/pretrained model, an open-source embedding model, etc.) for a certain modality of medical image (e.g., x-ray images, a computed tomography image, etc.).
- a foundational embedding model may be trained to have specific levels of knowledge common to medical images, or a modality of medical images. Further, the ML training module 150 may train the foundational model to generate the one or more image embedding models 143, each having a specific level of knowledge common to a certain type or modality of medical image (e.g., image of the lungs, CT image, etc.). Additionally or alternatively, the ML training module 150 may include instructions for fine-tuning and/or training the one or more image embedding models 143 using various fine-tuning techniques, such as layer freezing, learning rate adjustments, weight adjustments, etc., thereby improving the compression capabilities of the image embedding models 143.
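The layer-freezing and learning-rate adjustments mentioned above can be expressed as a per-layer fine-tuning plan; the layer names, base rate, and decay factor below are assumptions for illustration:

```python
# Hypothetical per-layer fine-tuning plan: freeze the earliest layers and
# train the rest at a reduced learning rate.
def build_finetune_plan(layers, freeze_upto, base_lr=1e-3, decay=0.1):
    plan = {}
    for i, name in enumerate(layers):
        if i < freeze_upto:
            plan[name] = {"trainable": False, "lr": 0.0}          # frozen layer
        else:
            plan[name] = {"trainable": True, "lr": base_lr * decay}  # reduced rate
    return plan

plan = build_finetune_plan(["conv1", "conv2", "conv3", "head"], freeze_upto=2)
```

An optimizer would then skip the frozen layers entirely and apply the reduced rate to the remaining ones.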
- the ML training module 150 may train and/or fine-tune a first instance of a foundational model on X-ray images and a second instance of the foundational model on three-dimensional medical imaging data.
- the ML training module 150 may train and/or fine-tune the image embedding models 143 on sub-sections of a medical image (e.g., a subsection of a medical image of a lung including a particular portion of the lung), allowing small scale embeddings to be generated by the embedding models 143 and thereby providing additional granularity to the embedding techniques described herein.
- the ML training module 150 may train image embedding model 143 for a specific organ (e.g., lungs, kidney, brain, etc.).
- the memories 124 of the server computing device 102 of FIG. 1 may also store instructions for executing a data processing module 144.
- the data processing module 144 may process and/or analyze data from the EHR database 112 and/or the vector database 110.
- the data processing module 144 may process data from the EHR database 112 and/or the vector database 110 to sort and/or chunk the data into multiple subsets (e.g., based on data type, according to time, based on file size, etc.).
- the data processing module 144 may obtain one or more sets of unstructured text (e.g., radiology reports, clinical notes, clinical outcomes) from the EHR database 112 and the data processing module 144 may organize the unstructured text into a structured format.
- the data processing module 144 may include instructions for parsing the unstructured text to generate structured text. For instance, when the text relates to historical electronic health records (EHR) associated with a plurality of individuals, the data processing module 144 may generate a time series data structure in which each electronic health record is represented by one or more timestamps, and at each timestamp, text of one or more records (e.g., radiology reports, clinical notes, clinical outcomes) is labeled.
- the data processing module 144 may also label the data from the EHR database 112 according to the condition associated with the one or more health records and/or one or more biographic features (e.g., age, sex, body weight).
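The time-series structuring of unstructured EHR text described above can be sketched as follows. The field names (`patient_id`, `timestamp`, `kind`, `text`) are hypothetical stand-ins; the actual EHR schema is not specified in this disclosure.

```python
from collections import defaultdict

def build_timeline(records):
    """Organize unstructured EHR entries into a per-patient time series:
    each patient maps to a chronologically sorted list of
    (timestamp, entries) pairs, with each entry labeled by record type."""
    by_patient = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_patient[rec["patient_id"]][rec["timestamp"]].append(
            {"kind": rec["kind"], "text": rec["text"]})
    # Sort each patient's entries by timestamp to form the time series.
    return {pid: sorted(stamps.items()) for pid, stamps in by_patient.items()}
```

A downstream labeling step could then attach condition or biographic labels to each patient's timeline.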
- the data processing module 144 may obtain embeddings from the vector database 110 and process the embeddings to generate training embeddings for the one or more image embedding models 143.
- the embeddings obtained from the vector database 110 may correspond to respective anatomical regions of interest and the data processing module 144 may chunk the embeddings based on anatomical region (e.g., using a clustering algorithm).
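The chunking of embeddings by anatomical region via a clustering algorithm can be sketched with a toy k-means implementation. This is one plausible choice of clustering algorithm, not necessarily the one contemplated by the disclosure, and the two-dimensional embeddings are purely illustrative.

```python
import numpy as np

def chunk_by_region(embeddings, k=2, iters=10, seed=0):
    """Toy k-means that partitions embeddings into k chunks, standing in
    for grouping image embeddings by anatomical region of interest."""
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest centroid.
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned embeddings.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = embeddings[labels == j].mean(axis=0)
    return labels, centroids
```

Each resulting label would identify one chunk of embeddings (e.g., one anatomical region) used downstream as training data.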
- the data processing module 144 may generate training data for the machine learning models described herein by performing preprocessing operations on data (e.g., data from the vector database 110 and/or the EHR database 112, data obtained by the ML training module 150, other data of the environment 100), similar to the preprocessing operations described below with respect to the ML training module 150.
- the memories 124 of the server computing device 102 of FIG. 1 may also store instructions for executing a machine learning (ML) training module 150.
- the ML training module 150 may train one or more machine learning models to make predictions and/or inferences based on input data (e.g., text data and/or image data).
- the ML training module 150 may store instructions for pretraining and/or training image embedding models, language models, or other machine learning models suitable for image and/or text processing such as a convolutional neural network (CNN), a recurrent neural network (RNN), a multi-modal language model, etc.
- a single model (e.g., a foundational language model, a foundation image embedding model) may be trained and copied. Copies of this model may serve as respective base models for a plurality of fine-tuned models (e.g., trained image embedding models, trained language models).
- the ML training module 150 may also perform checkpointing to save parameters (e.g., in a local electronic database) of the models (e.g., language models, embedding models) at various stages, allowing for efficient training and retraining. Additionally, the ML training module 150 may adjust hyperparameters like batch size, model size, and learning rate to optimize the training process (e.g., by tuning hyperparameters using successive evaluation).
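The checkpointing behavior described above can be sketched minimally. Real checkpoints would also capture optimizer state and use a binary format; this toy version stores plain JSON-serializable parameters, and the class and file-naming scheme are illustrative assumptions.

```python
import json
import os
import tempfile

class Checkpointer:
    """Minimal checkpointing sketch: parameters are written to disk
    under a stage name so training can resume from, or roll back to,
    any saved stage."""

    def __init__(self, directory):
        self.directory = directory

    def save(self, stage, params):
        path = os.path.join(self.directory, f"ckpt_{stage}.json")
        with open(path, "w") as fh:
            json.dump(params, fh)
        return path

    def load(self, stage):
        with open(os.path.join(self.directory, f"ckpt_{stage}.json")) as fh:
            return json.load(fh)
```

A hyperparameter sweep could save one checkpoint per candidate configuration and reload the best-performing one.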
- the ML training module 150 includes instructions for validating and testing the models to ensure their accuracy and reliability.
- the ML training module 150 may validate and test the embedding models 143 and the language models 153 using various statistical metrics/techniques, human or clinician feedback, and/or other clinical validation techniques.
- the ML training module 150 and/or the ML operation module 152 may flag abnormal data generated by and/or input to language models 153 and/or image embedding models 143 for clinician review.
- the ML training module 150 may include instructions for collecting information used to pretrain or train the one or more language models 153 (e.g., training data).
- the ML training module 150 may access data sources by active means (e.g., scraping, API calls, or other retrieval) and/or may access existing corpuses (e.g., the vector database 110, the EHR database 112, another existing corpus). Further, the ML training module 150 may include instructions for preprocessing the collected data to generate training data by, for example, performing cleaning operations on the data, resizing the data, normalizing the data, extracting text from the data, and/or performing other preprocessing operations on the data. In some embodiments, the ML training module 150 may perform data collection and/or data preprocessing in parallel to obtain training data for the one or more language models 153.
- the ML training module 150 may perform data deduplication, filtering, annotation, compliance, version control, validation, quality control, etc. Further, the ML training module 150 may perform other preprocessing operations, such as lexical parsing, tokenizing (e.g., the process of dividing training text into units such as words, subwords, or characters), case conversions and other string splitting/munging. In some embodiments, the ML training module 150 may store large data volumes used for training the language models 153 (e.g., training data, and/or collected data) in a structured and/or unstructured format. For example, the ML training module 150 and/or the data processing module 144 may store terabytes, petabytes, exabytes or more of training data.
- the ML training module 150 may include instructions for adding copies, and/or modified copies, of the training data (e.g., token embeddings) to one or more specialized electronic databases, such as a database accessible via the cloud APIs 114 or a local electronic database (not depicted), and instructions for accessing it to train models.
- the ML training module 150 may use a specialized database (e.g., a vector database or store, a graph database, another local electronic database, etc.) to store and query the token embeddings (e.g., training data).
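Storing and querying embeddings in a vector store can be sketched with a brute-force nearest-neighbor search. A production vector database would use approximate indexes; the dictionary-as-store and cosine-similarity metric here are illustrative assumptions.

```python
import numpy as np

def query_store(store, query, top_k=2):
    """Brute-force stand-in for a vector database query: returns the ids
    of the top_k stored embeddings most cosine-similar to the query."""
    ids = list(store)
    mat = np.array([store[i] for i in ids], dtype=float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)   # unit-normalize rows
    q = np.asarray(query, dtype=float)
    q /= np.linalg.norm(q)
    sims = mat @ q                                      # cosine similarities
    order = np.argsort(-sims)[:top_k]
    return [ids[i] for i in order]
```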
- the one or more language models 153 may be pretrained language models.
- the ML training module 150 includes instructions for performing pretraining (e.g., preprocessing of training data, initialization of an untrained language model, non-task-specific training) of a language model.
- pretraining e.g., preprocessing of training data, initialization of an untrained language model, non-task-specific training
- the ML training module 150 may pretrain the one or more language models 153 using unsupervised learning.
- the language models 153 may process one or more tokens (e.g., preprocessed data from the data processing module 144 and/or the ML training module 150) to learn to predict one or more elements (e.g., one or more most-likely next tokens, based on the likelihood of tokens in the training data).
- pretraining the one or more language models 153 may include additional steps such as additional training, data batching, hyperparameter tuning, model checkpointing, etc.
- a pretrained language model is taught linguistic features (e.g., grammar, syntax, vocabulary, etc.) enabling the model to effectively understand complex natural language.
- one or more human reviewers may be looped into the process of pre-processing data collected by the ML training module 150.
- a distributed work queue may be used to transmit batch jobs and receive human-computed responses from one or more human workers.
- the ML training module 150 may include instructions for language modeling. Moreover, the ML training module 150 may include instructions for training one or more deep learning models by processing token sequences (e.g., tokens embedded in a vector) using a large language model architecture (e.g., a transformer architecture).
- This architecture may enable the model to learn contextual relationships between the tokens, and to predict the next token in a sequence, based upon the preceding tokens.
- the model is provided with the sequence of tokens, and it learns to predict a probability distribution over the next token in the sequence.
- This training process may include updating one or more model parameters (e.g., weights or biases) using an objective function that minimizes the difference between the predicted distribution and a true next token in the training data.
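The next-token objective described above can be made concrete with a count-based bigram model, the simplest case of learning a probability distribution over the next token. This is a pedagogical stand-in for a transformer, not the training procedure of the language models 153.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count-based next-token model: estimates P(next | prev) from
    bigram counts in the training sequence."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {p: {t: c / sum(cs.values()) for t, c in cs.items()}
            for p, cs in counts.items()}

def cross_entropy(model, tokens):
    """Average negative log-likelihood of the observed next tokens --
    the quantity a training objective would minimize."""
    nll = [-math.log(model[p].get(n, 1e-9))
           for p, n in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)
```

When the model's predicted distribution matches the true next tokens exactly, the cross-entropy falls to zero; gradient-based training drives a transformer's parameters in the same direction.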
- the language models 153 may be multi-modal language models, or multi-modal transformer models, configured to process various modalities of data (e.g., text data, image data, audio data, etc.).
- the ML training module 150 may collect data for training, or pretraining, language models via web scraping, via API calls/access, via database extract-transform-load (ETL) processes, etc.
- the ML training module 150 may access and obtain data for training from websites, academic publications, social media websites, books, web forums/ interest sites (e.g., Wikipedia, Reddit, bulletin boards, etc.), or other suitable data sources.
- trained language models may be unpredictable, and thus, processing outputs for ethical and bias concerns (especially in a medical context) may be important.
- medical professionals may use the present techniques for purposes of thought experimentation, i.e., specifically to facilitate a clinician recommendation. In doing so, the present techniques may advantageously assist practitioners with analyzing all possible conditions, treatment options, and relevant factors.
- the present techniques may be used to augment and solidify human decision making, rather than as a substitute for such deliberate thinking.
- the ML training module 150 may include instructions for processing both the input data and output data of the language models 153 to ensure that objectionable content is managed effectively (e.g., not passed to the language models 153 and/or not provided as an output of the language models 153) in order to address ethics and bias concerns.
- the ML training module 150 includes instructions for training or fine-tuning the one or more language model(s) 153 to perform specific tasks.
- the language model 153 may be a pretrained large language model (LLM) or another transformer model, such as a generative pretrained transformer (GPT), a bidirectional encoder representations from transformers (BERT) model, etc.
- the ML training module 150 may include instructions for training or fine-tuning a base model to perform a specific task.
- the ML training module 150 may include instructions for training and/or fine-tuning language model 153 on EHR data from the EHR database 112 associated with a particular anatomical region of interest. Moreover, the ML training module 150 may provide tokenized EHR data from a group of patients associated with a particular condition or a particular condition localized within an anatomical region labeled with example summaries to language model 153. As another example, the ML training module 150 may fine-tune language model 153 to learn knowledge corresponding to cancer, and/or may further train the model to learn knowledge of a non-small cell lung cancer, small cell lung cancer, cardiac sarcoma, etc.
- the ML training module 150 may include instructions for fine-tuning a cancer model (e.g., based on a base language model), a brain trauma model (e.g., based on the same base model used to fine-tune the cancer model), etc.
- the ML training module 150 may include instructions for training other machine learning models (e.g., non-transformer models) to perform language tasks.
- alternatives to a transformer architecture may include recurrent neural networks, long short-term memory networks, gated recurrent networks, convolutional neural networks, recursive neural networks, and other modeling architectures.
- the ML training module 150 may train and/or fine-tune many (e.g., hundreds or more) additional models (e.g., supervised or unsupervised machine learning models). For example, the ML training module 150 may generate a fine-tuned language model specific to an organ or anatomical region (e.g., brain, lungs, heart, abdomen, spine, etc.), and/or a specific feature (e.g., cardiac chambers, coronary arteries, lung parenchyma, lymph nodes, etc.) associated with a specific organ or anatomic region.
- one or more open-source frameworks may be used.
- an example framework may be TensorFlow, Keras, MXNet, Caffe, SciKit learn, PyTorch, or another suitable machine learning framework.
- frameworks such as OpenLLM and LangChain may be used, in some aspects.
- the ML training module 150 may use an algorithm such as stochastic gradient descent or another optimization technique to adjust weights of the pretrained model.
- the ML training module 150 and/or the ML operation module 152 may enrich inputs, or prompts, to the language models 153 with additional information, providing more context or directions for the model, which can be particularly useful in fine-tuning the model with limited examples.
- the ML training module 150 may fine-tune a base model using one-shot or few-shot learning.
- the ML training module 150 and/or the ML operation module 152 may use such enriched prompting to focus the output of the one or more language models 153.
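The prompt enrichment and few-shot learning described above can be sketched as a simple prompt-construction routine. The template (Task/Input/Output lines) is a hypothetical format, not the one used by any particular model.

```python
def build_few_shot_prompt(task, examples, query):
    """Enrich a prompt with labeled examples (few-shot learning) so a
    base model can be steered toward a task using limited examples."""
    lines = [f"Task: {task}"]
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    # The trailing empty Output invites the model to complete the pattern.
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)
```

With one example this becomes one-shot prompting; with none, zero-shot.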
- the ML training module 150 may train language model 153 on a set of preprocessed EHR data. Further, the data processing module 144 and/or the ML training module 150 may preprocess a set of EHR data from the EHR database 112, such as EHR data related to a particular type of cancer. For example, the preprocessing may include tokenizing the set of EHR data, formatting the tokenized set of EHR data for input to the language model, splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and replicating radiology reports included in the set of EHR data.
- the ML training module 150 may use the EHR training data set and the EHR testing data set to train language model 153 using, for example, self-supervised learning techniques. In some embodiments, the ML training module 150 may fine-tune trained language model 153 using the replicated radiology reports.
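The tokenize/format/split/replicate pipeline described above can be sketched end to end. The field names, the split rule, and the ×2 replication factor for radiology reports are illustrative assumptions.

```python
def preprocess_ehr(records, test_fraction=0.25):
    """Sketch of the described preprocessing: tokenize each record's
    text, format it for model input, split into train/test sets, and
    replicate radiology reports (here simply duplicated) for later
    fine-tuning."""
    formatted = [{"patient_id": r["patient_id"],
                  "kind": r["kind"],
                  "tokens": r["text"].lower().split()} for r in records]
    n_test = max(1, int(len(formatted) * test_fraction))
    test, train = formatted[:n_test], formatted[n_test:]
    replicated = [r for r in formatted if r["kind"] == "radiology_report"] * 2
    return train, test, replicated
```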
- the memories 124 of the server computing device 102 of FIG. 1 may also store instructions for executing a machine learning (ML) operation module 152.
- the ML operation module 152 may operate one or more machine learning models (e.g., image embedding models 143, language models 153, CNNs, RNNs, multimodal language models, etc.) capable of implementing the example methods and/or operations described herein.
- the ML operation module 152 may select one or more trained models (e.g., one or more trained language models 153 and/or one or more trained image embedding models 143) at runtime based on data about a specific subject, based on data contained in a prompt (e.g., a prompt received via the user computing device 104), and/or based on other conditions that may be preprogrammed into ML operation module 152.
- the ML operation module 152 may store the one or more trained language models 153. Further, the ML operation module 152 may initialize the one or more trained language model(s) 153, load parameters into the model(s), and/or provide the model(s) with inference data (e.g., prompt inputs).
- the ML operation module 152 may deploy the one or more language model(s) 153 (e.g., a pretrained model, a fine-tuned model, and/or a trained model) onto a cloud computing device (e.g., via the cloud APIs 114 and/or the API 130).
- the ML operation module 152 may receive one or more inputs and may provide those inputs (e.g., one or more prompts) to language model 153.
- the ML operation module 152 may generate a summary of EHR data by inputting EHR data and/or a prompt to language models 153.
- the language models 153 may output a summary of common findings and abnormalities associated with the input EHR data.
- the API 130 may include elements for receiving requests to language model 153, and for generating outputs for a user computing device based on model outputs.
- the API 130 may include a RESTful API that receives a GET or POST request including a prompt parameter.
- the ML operation module 152 may receive the request from the API 130, and pass the prompt parameter into language model 153 and receive a corresponding output.
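The GET/POST handling described above can be sketched as a plain dispatch function, a framework-free stand-in for the RESTful API 130. The response shape and parameter names are illustrative assumptions.

```python
def handle_request(method, params, run_model):
    """Minimal stand-in for a RESTful endpoint: accepts a GET or POST
    request carrying a 'prompt' parameter, passes the prompt to the
    model callable, and wraps the output in a JSON-style response."""
    if method not in ("GET", "POST"):
        return {"status": 405, "error": "method not allowed"}
    prompt = params.get("prompt")
    if not prompt:
        return {"status": 400, "error": "missing prompt parameter"}
    return {"status": 200, "output": run_model(prompt)}
```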
- the ML operation module 152 may operate machine learning models (e.g., image embedding models 143, language models 153) in different modes. For example, the ML operation module 152 may generate a query embedding for a query medical image using image embedding model 143 and provide EHR data for patients with similar medical images to language model 153, as described below with respect to FIG. 2. Further, a user may provide the query medical image to the ML operation module 152 using the user computing device 104 (e.g., via the input devices 168a).
- the ML operation module 152 may cause output data (e.g., a summary of the EHR data) generated by language model(s) 153 and/or image embedding model(s) 143 to be presented on the user computing device (e.g., via the output devices 168b).
- the ML operation module 152 may receive user input (e.g., from the user computing device 104) and provide it to one or more machine learning models (e.g., one or more language models 153, one or more image embedding models 143).
- the ML operation module 152 may provide outputs of the machine learning models to other models.
- the ML operation module 152 may provide the output of one or more first language models 153 to one or more second language models so that information and/or data can be aggregated.
- an RCC language model and a kidney language model (e.g., fine-tuned language models 153) may be prompted and/or provided with EHR data (e.g., in response to user input, in response to an inference generated by language models 153, etc.).
- a user may be provided with a plurality of summaries, each generated by a respective fine-tuned language model, and/or an aggregated summary, generated by a language model based on the plurality of summaries, so that the user can be informed by the different perspectives from models trained on respective tasks.
- these summaries may be generated (e.g., by server computing device 102 via language models 153) very quickly (e.g., in minutes, seconds, or less). This represents an advantageous improvement over conventional techniques, which may take many hours or even days to generate a summary of common findings and abnormalities of medical subjects corresponding to a more limited set of medical images.
- the ML operation module 152 may operate and/or store one or more multi-modal machine learning models. Additionally or alternatively, the ML operation module 152 may generate and/or store one or more prompts, or template prompts, for the language models 153. In some embodiments, the ML operation module 152 may be configured to facilitate communication between the language models 153, the image embedding models, and/or the vector database 110. For example, the language model 153 may convert an input image (e.g., in response to an input prompt including the input image) to an image embedding, or feature vector, using the image embedding model 143.
- the language model 153 may query the vector database 110 (e.g., in response to the input prompt) with the image embedding to identify a plurality of patients associated with images similar to the input image. Based on the identified patients, the language model 153 may retrieve electronic health records for the plurality of patients (e.g., in response to the input prompt) and generate a summary of the EHR data.
- a prompt parameter of an input prompt may be a computed tomography (CT) image of the lung of a subject.
- the input prompt may cause the language model 153 to output a summary of EHRs associated with a plurality of CT images similar to the prompt parameter CT image.
- the ML operation module 152 may facilitate supplementing the language models 153 using a version of retrieval augmented generation (RAG) driven by the image embedding models 143 and the vector database 110.
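The image-driven RAG flow described above (embed the query image, find similar stored embeddings, retrieve the matching patients' EHRs, summarize) can be sketched end to end. Every callable here (`embed`, `summarize`) and the dictionary-based stores are hypothetical stand-ins for the modules and databases described in this disclosure.

```python
import numpy as np

def rag_summary(image, embed, store, ehr_db, summarize, top_k=2):
    """End-to-end sketch of image-driven retrieval augmented generation:
    embed the query image, rank stored patient embeddings by cosine
    similarity, pull the nearest patients' EHR text, and summarize it."""
    q = np.asarray(embed(image), dtype=float)
    q /= np.linalg.norm(q)
    scored = []
    for patient_id, vec in store.items():
        v = np.asarray(vec, dtype=float)
        scored.append((float(q @ (v / np.linalg.norm(v))), patient_id))
    neighbors = [pid for _, pid in sorted(scored, reverse=True)[:top_k]]
    records = [ehr_db[pid] for pid in neighbors]
    return summarize(records)   # e.g., a language model prompt in practice
```

In the described system, `summarize` would be a prompt to a language model 153 rather than a string join.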
- Language models (e.g., large language models) may not inherently maintain state information between sequential prompts, which motivates context management techniques.
- the ML operation module 152 may include one or more sets of computer-executable instructions for maintaining state information of various types and/or for implementing other context management techniques.
- ML operation module 152 may implement sliding window context, wherein a predetermined number of tokens (e.g., 4096 maximum tokens in the case of GPT 3.5, equivalent to about 3000 words) may be “remembered” by the language model and can be used to enrich multiple sequential prompts input into the language model (for example, when the language model is used in a chat mode).
- the ML operation module 152 may perform other types of state maintaining strategies.
- the ML operation module 152 may implement a strategy in which information from the immediately preceding prompt is part of the window, regardless of the size of that prior prompt.
- the ML operation module 152 may implement a strategy in which one or more prior prompts are included in each current prompt. This prompt stuffing technique, or prompt concatenation, may be limited by prompt size constraints: once the total size of the prompt exceeds the prompt limit, the model immediately loses state information related to parts of the prompt truncated from the prompt.
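The sliding-window and truncation behavior described above can be sketched as follows. Whitespace-delimited words stand in for model tokens, and the token limit is arbitrary; real systems count subword tokens against limits such as the 4096-token window mentioned above.

```python
def sliding_window(history, new_prompt, max_tokens=8):
    """Sliding-window context sketch: concatenate prior prompts with the
    new prompt and keep only the most recent max_tokens tokens, so older
    state is silently truncated once the limit is exceeded."""
    tokens = " ".join(history + [new_prompt]).split()
    return " ".join(tokens[-max_tokens:])
```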
- the user computing device 104 includes one or more communication interfaces 160, one or more processors 162, one or more memories 164, one or more input devices 168a, one or more output devices 168b, and one or more application programming interfaces 170.
- the memories 164 may have stored thereon one or more modules 180 (e.g., one or more sets of instructions).
- the user computing device 104 may be a computing device (e.g., a personal computer, a laptop, a smart phone, a tablet, a wearable device, etc.), an individual server, a group (e.g., cluster) of multiple servers, or another suitable type of computing device or system (e.g., a collection of computing resources).
- the user computing device 104 may be included in and/or associated with a medical environment.
- one or more user computing devices may be included in the computing environment 100 - for example, a first user may access a user computing device 104 that is a laptop, a second user may access a user computing device 104 that is a desktop computer, etc.
- each of these respective users may receive output data from the server computing device 102 (e.g., outputs from language model 153) via their use of their respective user computing device 104.
- the one or more user computing devices 104 may be referred to herein using the singular tense.
- the one or more communication interfaces 160 may, similar to the communication interface 120, enable communication with other machines (e.g., server computing device 102) via, for example, the one or more networks 106.
- the example communication interface 160 may include any suitable type of communication interface(s) (e.g., wired and/or wireless interfaces) configured to operate in accordance with one or more suitable protocol(s).
- the communication interface 160 may be a network interface controller (NIC) and may include any suitable NIC, such as wired/wireless controllers (e.g., Ethernet controllers), and facilitate bidirectional/multiplexed networking over the network 106 between the user computing device 104 and the server computing device 102 and/or other components of the environment 100 (e.g., the medical imaging device 108, the vector database 110, the EHR database 112, another computing device, etc.).
- the communication interface 160 may include advanced features such as hardware acceleration, specialized networking protocols, etc.
- the processors 162 may include one or more microprocessors, controllers, and/or any suitable type of processor (e.g., one or more processors similar to the processors 122), and the memories 164 (e.g., one or more volatile memories or non-volatile memories similar to the memories 124) may be accessible by the processor 162 (e.g., via a memory controller).
- the processor 162 may interact with the memory 164 to obtain, for example, machine-readable instructions and/or computer-executable instructions stored in the memory 164 corresponding to, for example, the operations represented by the flowcharts of this disclosure (e.g., the method 200 of FIG. 2, the method 400 of FIG. 4).
- the memories 164 may have stored thereon one or more modules 180, for example, as one or more sets of computer-executable instructions.
- the modules 180 may include a model configuration module 181, an application programming interface (API) module 182, and an authentication module 184. In some aspects, more or fewer modules 180 may be included.
- the computer-executable instructions may be stored on non-transitory computer-readable media, in some aspects.
- the modules 180 may include additional storage, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OSX, etc.). The operating systems may be configured to run the modules 180 during operation of the user computing device 104.
- the modules 180 may include additional modules and/or services for receiving and processing data from one or more other components of the environment 100 such as the server computing device 102, the EHR database 112, the vector database 110, or the one or more cloud APIs 114.
- the modules 180 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.).
- the modules 180 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.).
- the modules 180 may respond to network requests (e.g., via the API 170) or other requests received via the network 106 (e.g., via the server computing device 102 or other components of the environment 100).
- the memories 164 of the user computing device 104 of FIG. 1 may store instructions for executing a model configuration module 181.
- the model configuration module 181 may exchange instructions, information, and/or data with the server computing device 102 (e.g., via the communication interface 160 and over the networks 106).
- the model configuration module 181 may configure or select one or more machine learning models (e.g., image embedding models 143, language models 153) based on a modality of a medical image, a patient of interest, one or more conditions/aspects of the patient, user input data, and/or other preprogrammed information.
- the server computing device 102 may store one or more language models 153 each fine-tuned and/or trained (e.g., via the ML training module 150) to perform a specific task (e.g., learn knowledge corresponding to cancer, a specific kind of cancer, a lung condition, etc.). Additionally, the server computing device 102 (e.g., the image embedding module 142) may store one or more image embedding models 143 each fine-tuned and/or trained on a specific modality of image data.
- a user of the user computing device 104 may provide an indication of one or more preferred models (e.g., via the input devices 168a) for image data analysis (e.g., image embedding model 143) and/or text analysis (e.g., language model 153).
- the model configuration module 181 may include instructions for generating one or more graphical user interfaces that allow a user (e.g., a clinician) to review and/or select one or more preferred models.
- the model configuration module 181 may enable one or more language models 153 and/or one or more image embedding models 143 to be stored, for example in the memory 124 or in a database accessible via the cloud APIs 114 or via a local electronic database (not depicted). In some embodiments, the model configuration module 181 may access electronic health records for a patient via a personnel records database (not depicted) and/or the EHR database 112 to enable information associated with the patient to be provided to one or more models (e.g., image embedding models 143, language models 153). In some aspects, the model configuration module 181 may be omitted from the modules 180, or its access may be restricted to administrative users only (e.g., clinicians).
- one or more of the modules 180 may be packaged into a downloadable application (e.g., a smart phone app available from an app store) that enables registered but non-privileged (i.e., non-administrative) users to access the environment 100 using their consumer user computing device 104.
- one or more of the user computing devices 104 may be locked down, such that the user computing device 104 is controlled hardware, accessible only to those who have physical access to certain areas.
- the memories 164 of the user computing device 104 of FIG. 1 may also store instructions for executing an application programming interface (API) module 182.
- the API module 182 may access the API(s) 170, the cloud API(s) 114, and/or one or more remote APIs (not depicted), and further may enable one or more components of the environment 100 to access functionality of the user computing device 104 and/or the server computing device 102.
- the API module 182 may enable a remote user to query the language model 153 via the cloud APIs 114.
- the API module 182 may enable other user applications (i.e., applications not facilitated by the modules 180) to connect to the user computing device 104, for example, to send queries or prompts, and to receive responses from the user computing device 104.
- the API module 182 may include instructions for authentication, rate limiting and error handling.
- the memories 164 of the user computing device 104 of FIG. 1 may also store instructions for executing an authentication module 184.
- the authentication module 184 may implement access control mechanisms for one or more trained models (e.g., image embedding models 143, language models 153), ensuring that a model can only be accessed by those who are authorized to do so, and that the access of those users is private and secure. It should be appreciated that the authentication module 184 may assign permissions to users based upon their respective position (e.g., physician, nurse, physician assistant, etc.).
- the present techniques may store training data, training parameters and/or trained models in an electronic database (not depicted), accessible via the cloud APIs 114, for example.
- one or more trained machine learning models (e.g., language models 153, image embedding models 143) may be stored in a database (e.g., as a binary, a JSON object, etc.), and the one or more trained models and their respective training parameters (e.g., weights) may be accessed via cloud computing APIs (e.g., AWS SageMaker, Google AI Platform, Azure Machine Learning).
- the input device(s) 168a and the output device(s) 168b of the user computing device 104 may enable receipt of user input and communication of output data to the user.
- the input devices 168a may include one or more suitable types of user input devices, such as keyboards, touch screen displays, microphones, mice, touchpads, and/or any suitable types of remote and/or local user input devices.
- the output devices 168b may include one or more suitable types of output devices, such as touch screen displays, speakers, and the like. In some embodiments, the output devices 168b may present or display information to a user.
- a display 168b may use any suitable display technology (e.g., LED, OLED, LCD, etc.).
- the input device 168a may be integrated with the output device 168b as a touchscreen display. Further, the input devices 168a and the output devices 168b may combine to form an integral user interface to enable a user of the user computing device 104 to interact with graphical user interfaces (GUIs) provided by the user computing device 104.
- the input devices 168a and/or the output devices 168b may be configured to present outputs generated by the language models 153 for review by a user.
- the input devices 168a and/or the output devices 168b may enable a user to interact with a graphical user interface for selecting and/or configuring one or more machine learning models (e.g., via the model configuration module 181).
- the input devices 168a and/or the output devices 168b may enable a user to review medical images captured by the medical imaging device 108.
- the input devices 168a and the output devices 168b are not integral to the user computing device 104 and receive instructions from the user computing device 104 via wired and/or wireless transmissions over communication interface 160, for example.
- the input devices 168a and the output devices 168b may include one or more local interfaces, and/or may include one or more remote interfaces that are communicatively connected to the user computing device 104 and/or the server computing device 102 via the network 106 (e.g., that are provided by an application, web browser, or other software executing on a device of a user).
- the input devices 168a and/or the output devices 168b may be referred to herein using the singular tense.
- the user computing device 104 and the server computing device 102 may communicate with one another via the network 106.
- the user computing device 104 and/or the server computing device 102 may offload some or all of their respective functionality to the one or more cloud APIs 114.
- the one or more cloud APIs 114 may include one or more public clouds, one or more private clouds and/or one or more hybrid clouds.
- the one or more cloud APIs 114 may include one or more resources provided under one or more service models, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Function as a Service (FaaS).
- the one or more cloud APIs 114 may include one or more cloud computing resources, such as computing instances, electronic databases, operating systems, email resources, etc.
- the one or more cloud APIs 114 may include distributed computing resources that enable, for example, the ML training module 150 and/or other of the modules 140 to distribute parallel model training jobs across many processors.
- the one or more cloud APIs 114 may include one or more language operation APIs, such as OpenAI, Med-PaLM, Bing, Claude.ai, etc.
- the one or more cloud APIs 114 may include an API configured to operate one or more open-source models, such as Llama 2.
- the medical imaging device 108 may send CT image data, PET image data, X-ray image data, MRI image data, and/or other medical image data to the components of the computing environment (e.g., the server computing device 102, the user computing device 104, the EHR database 112, the vector database 110, etc.).
- the vector database 110 may be a specialized electronic database (e.g., a vector store, a graph database, etc.) that may be used to store and query embeddings.
- the embeddings may be vector representations of medical images collected from a clinical archive. Further, the embeddings stored on the vector database 110 may be generated and/or retrieved by the image embedding module 142 (e.g., via image embedding models 143). In some embodiments, due to the size of some clinical archives, the embedding sets stored in the vector database 110 may number in the millions, far larger than conventional medical image embedding sets.
- Embedding databases and/or vector databases may include specialized features, such as efficient retrieval, similarity search and scalability.
- the embeddings stored in the vector database 110 may include identifying information for the associated patients, thereby further enhancing retrieval capabilities of the vector database 110.
- a query embedding (e.g., for a medical image of a patient) may be issued against the vector database 110, and the vector database 110 may provide a plurality of similar embeddings and/or associated identifying information for the corresponding medical images to the server computing device 102 (e.g., image embedding module 142).
- a remote embedding database service may be used (e.g., via the cloud APIs 114).
- one or more vector databases 110 may include organized sets of vectors grouped based on a modality of the medical image they represent, or a medical topic they relate to.
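The role described above for the vector database 110 can be sketched as a minimal in-memory store: embeddings carry metadata (e.g., a modality grouping), and a query returns the most cosine-similar entries. This is an illustrative assumption for exposition; the class and field names are hypothetical, not from the disclosure, and a production system would use a dedicated vector database.

```python
import numpy as np

class SimpleVectorStore:
    """Toy stand-in for a vector database: cosine-similarity search
    over embeddings, with an optional metadata filter by modality."""

    def __init__(self):
        self._entries = []  # list of (embedding, metadata) pairs

    def add(self, embedding, metadata):
        self._entries.append((np.asarray(embedding, dtype=float), metadata))

    def query(self, embedding, k=5, modality=None):
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        scored = []
        for vec, meta in self._entries:
            if modality is not None and meta.get("modality") != modality:
                continue  # restrict the search to one imaging modality
            score = float(np.dot(q, vec / np.linalg.norm(vec)))
            scored.append((score, meta))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

store = SimpleVectorStore()
store.add([1.0, 0.0], {"patient_id": "A", "modality": "CT"})
store.add([0.9, 0.1], {"patient_id": "B", "modality": "CT"})
store.add([0.0, 1.0], {"patient_id": "C", "modality": "X-ray"})
results = store.query([1.0, 0.05], k=2, modality="CT")
```

Because the store keeps identifying metadata alongside each vector, a similarity query directly yields the historical patients whose images most resemble the query image.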
- the electronic health record (EHR) database 112 may be an electronic database that stores all suitable EHR data available in a longitudinal patient record (LPR) or other medical institution database. Typically, a longitudinal patient record stores EHR data chronologically for each patient of a medical institution. In some embodiments, the EHR database 112 may store sets of electronic health records organized based on associated demographic information (e.g., age, sex, body weight, medical condition). Additionally or alternatively, the sets of EHR data stored by the EHR database 112 may consolidate the electronic health record of an individual. In some embodiments, the EHR database 112 may store EHR data for a portion of the plurality of patients associated with a longitudinal patient record.
- the EHR database 112 may only store EHR data that relates to patients with medical conditions identifiable through medical imaging techniques.
- the environment 100 may include one or more electronic databases (e.g., EHR database 112), such as a relational database that uses structured query language (SQL) and/or a NoSQL database or other schema-less database suited for the storage of unstructured or semi-structured data.
- the data from a clinical archive included in the EHR database 112 may include a wide variety of metadata, such as dates, times, doctor’s notes, etc., that may, using the techniques described herein, be analyzed by the language models described herein (e.g., language models 153), thereby further enhancing the ability to leverage information included in a clinical archive.
- FIG. 2 depicts an exemplary block flow diagram for a method 200, according to one or more aspects.
- the method 200 may include offline processes 202 and online processes 204. Components of the offline processes 202 and online processes 204 may utilize the vector database 110 and EHR database 112 of FIG. 1, for example.
- the method 200 may be implemented in part or in full by the server computing device 102, the user computing device 104, and/or the cloud APIs 114. Though the method below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement the method 200.
- the offline processes 202 may include receiving/obtaining (e.g., by the server computing device 102) one or more medical images from a medical image archive 210 at an anatomy specific encoder 212 (e.g., image embedding model 143).
- the anatomy specific encoder 212 may be a self-supervised image embedding model trained to encode medical images of a specific anatomical region (e.g., the chest, the respiratory system, the lungs, etc.) into one or more feature vectors.
- the anatomy specific encoder 212 may generate feature vectors for the one or more medical images from the medical image archive 210.
- the method 200 may also include storing the feature vectors generated by the anatomy specific encoder 212 in the vector database 110.
- the vector database 110 may store identifying information for each respective vector embedding.
- the identifying information may include patient identifiers, patient EHR record identifiers, etc.
- the vector database 110 may store embeddings for all available medical images in a clinical archive, such as the EHR database 112. In this way, large clinical archives of unlabeled historical data may be leveraged through embedding and vectorization techniques.
- a query embedding for the query medical image 220 may be issued to the vector database 110 and every historical patient associated with a medical image similar to the query medical image 220, that has an electronic health record in the clinical archive, may be identified efficiently (e.g., in less than a second, in a manner of seconds, etc.).
- the electronic health records for these historical patients may be summarized using the exemplary language models described herein, thereby providing a robust and efficient analysis of relevant, or potentially relevant, medical histories for a clinician.
- the embeddings generated by the anatomy specific encoder 212 may be stored in a structured format.
- one or more vector databases 110 may respectively correspond to one or more anatomy specific encoders 212 each trained to encode medical images of a specific anatomical region.
- the vector database 110 may store feature vectors / embeddings in a structured format based on a corresponding anatomical region.
- the vector database 110 and/or the anatomy specific encoder 212 may be referred to using the singular tense.
- a nearest neighbor search is performed for the feature vectors of the query medical image 220.
- the method 200 may include querying the vector database 110 using the feature vectors generated by the anatomy specific encoder 222 to identify similar vector embeddings (e.g., embeddings corresponding to medical images specific to the anatomical region of interest and from the medical image archive 210).
- the method 200 may include specifying additional parameters when querying the vector database 110, such as a filter specifying the query to a subset of embeddings stored on the vector database 110 corresponding to the anatomical region of interest, a filter limiting the output to a specific number of nearest embeddings, etc.
- the additional parameters may be used to filter out noise vectors to improve output quality.
- the query medical image 220 may be suspected to depict a very rare or very common condition. If the condition is suspected to be rare, the query may specify that the three nearest embeddings to the query vector are to be output. In the case of a more common condition, the query may specify that the twenty nearest embeddings to the query vector are to be output.
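The query-sizing heuristic above can be sketched as a small helper that assembles the vector store query with its filters. The function and parameter names, the default of ten neighbors, and the query-dictionary shape are assumptions for illustration only; the three/twenty values come from the example in the text.

```python
def build_vector_query(embedding, anatomical_region, suspected_rarity="unknown"):
    """Assemble a nearest-neighbor query: fewer neighbors for a suspected
    rare condition, more for a common one, plus an anatomical filter."""
    k_by_rarity = {"rare": 3, "common": 20}
    return {
        "vector": embedding,
        "filter": {"anatomical_region": anatomical_region},
        "top_k": k_by_rarity.get(suspected_rarity, 10),  # assumed default
    }

query = build_vector_query([0.1, 0.2], "chest", suspected_rarity="rare")
```

A clinician's suspicion about rarity thus translates directly into the `top_k` parameter of the nearest neighbor search.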
- the method 200 may include identifying N nearest patients associated with the embeddings from the vector database 110 responsive to the nearest neighbor search 224. Further, the method 200 may include obtaining identifying information for each of the N nearest patients. Additionally, the method 200 may include querying the EHR database 112 using the identifying information for the N nearest patients to obtain EHR data (block 230), such as radiology reports and other text data associated with the patient and/or the medical images for the patient corresponding to the responsive embeddings from the vector database 110.
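The EHR lookup at block 230 can be sketched as a parameterized query keyed on the identifying information returned by the vector search. The table schema, column names, and sample reports below are hypothetical, using an in-memory SQLite database purely for illustration.

```python
import sqlite3

# Hypothetical EHR table: one radiology report per patient identifier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ehr (patient_id TEXT, report TEXT)")
conn.executemany(
    "INSERT INTO ehr VALUES (?, ?)",
    [("A", "Nodule in right upper lobe."),
     ("B", "Clear lungs, no acute findings."),
     ("C", "Pleural effusion noted.")],
)

def fetch_reports(conn, patient_ids):
    """Pull text records for the N nearest patients found by vector search."""
    placeholders = ",".join("?" for _ in patient_ids)
    rows = conn.execute(
        f"SELECT patient_id, report FROM ehr WHERE patient_id IN ({placeholders})",
        patient_ids,
    ).fetchall()
    return dict(rows)

nearest = ["A", "C"]  # identifying information from the nearest neighbor search
reports = fetch_reports(conn, nearest)
```

The retrieved text records are what the language model then receives for summarization.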
- the online process 204 of the method 200 may also include generating, via language model 232 (e.g., large language model 153), a summary of the EHR data (block 234).
- the method 200 may include processing the EHR data 230 (e.g., radiology reports, clinical notes, treatments, clinical outcomes, and/or other text data) using the language model 232 to generate a plain language summary.
- the summary of EHR data 234 may include indications of common findings, recurring abnormalities, and/or general commonalities for the N nearest patients 226.
- the summary of EHR data 234 may detail recurring demographic information (e.g., age, weight, sex, etc.), medication history, vital signs (e.g., blood pressure, heart rate, etc.), medical procedures, familial medical history, subject habits (e.g., smoking, drinking, etc.), or abnormalities (e.g., medical conditions, chronic illnesses, etc.) associated with the N nearest patients 226.
- the summary of EHR data 234 may be provided to a clinician reviewing the medical image 220 to supplement clinical review of the image.
- the user computing device 104 may present the summary of EHR data 234 via the output device 168b.
- an image embedding may be generated in response to the medical imaging device 108 capturing a medical image, and a summary of common findings and abnormalities of the related patients may be provided to a clinician automatically.
- the embeddings generated automatically for a new medical image may be stored and compared to historical embeddings for the patient to, for example, track an organ's health over time and easily identify abnormalities across the embeddings for a patient.
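Tracking an organ over time, as described above, could amount to comparing a new embedding against the patient's historical embeddings and flagging unusual drift. The distance metric, threshold value, and function names below are assumptions sketching one way this comparison might work, not the disclosed method.

```python
import numpy as np

def cosine_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_embedding_drift(history, new_embedding, threshold=0.2):
    """Flag the new scan if its embedding is far from every prior scan,
    suggesting a change worth clinical review (threshold is assumed)."""
    distances = [cosine_distance(h, new_embedding) for h in history]
    return min(distances) > threshold

# Two prior scans of the same organ, stable over time (illustrative data).
history = [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0]]
stable_scan = [0.98, 0.02, 0.0]     # close to baseline, not flagged
changed_scan = [0.0, 1.0, 0.0]      # far from baseline, flagged
```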
- FIG. 3 depicts an exemplary block flow diagram for an architecture of an image embedding model 300, according to one or more aspects.
- the image embedding model 300 (e.g., embedding models 143) may be a transformer autoencoder, such as a self-supervised autoencoder trained using contrastive learning techniques on unlabeled historical medical images.
- the image embedding model 300 may include a linear projection layer 310 and a transformer encoder layer 320.
- the embedding model 300 may be implemented in part or in full by the server computing device 102, the user computing device 104, and/or the cloud APIs 114. Though the embedding model 300 below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement the embedding model 300.
- the server computing device 102 obtains a medical image 302 from the medical imaging device 108.
- the ML operation module 152 may then identify a region of interest (e.g., a portion of the image depicting a specific organ or region) in the medical image 302 using one or more machine learning models (e.g., a CNN, a multi-modal language model 153), as described below with respect to FIG. 5. Based on the identified region of interest, the ML operation module 152 may obtain an image patch 304 corresponding to the identified region of interest in the medical image 302.
- the ML operation module 152 may provide the image patch 304 to the embedding model 300, and specifically, the image patch 304 may be flattened (e.g., by the ML operation module 152 and/or by a flattening layer of the embedding model 300) into a one dimensional (1D) array (e.g., a patch of 10 by 10 pixels with 2 color channels would be flattened into a 1D array with 200 elements) and input to linear projection layer 310.
- the linear projection layer 310 may transform the 1D array into a lower dimensional raw embedding 312 (e.g., with index values 314A-314N and color channel values 316A-316N).
- the raw embedding 312 may be passed to the transformer encoder layer 320 including a self-attention mechanism or layer(s) (e.g., a multi-head self-attention mechanism), one or more feed-forward network layers, one or more layer normalization layers, and/or one or more residual connections around the self-attention layers and the feed-forward layers.
- the transformer encoder layer 320 may output the image embedding 330 (e.g., with elements 332A-332N) for the image patch 304.
- the ML operation module 152 may then query the vector database 110 using the image embedding 330 and/or add the image embedding 330 to the vector database 110, as described with respect to FIGS. 1-2 and FIG. 4.
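The flatten/project/attend pipeline of FIG. 3 can be sketched in NumPy with random weights: rows of the patch are flattened into tokens, linearly projected to a lower dimension (the role of layer 310), passed through a single self-attention step with a residual connection (a drastically simplified stand-in for encoder layer 320), and pooled into one embedding. The dimensions, pooling choice, and single-head simplification are assumptions for exposition, not the actual model architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def embed_patch(patch, w_proj, w_q, w_k, w_v):
    """Flatten a patch into tokens, linearly project, apply one
    self-attention step with a residual, and mean-pool to one vector."""
    tokens = patch.reshape(patch.shape[0], -1) @ w_proj   # linear projection
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(tokens.shape[1]))    # attention weights
    out = tokens + attn @ v                               # residual connection
    return out.mean(axis=0)                               # pooled embedding

d = 8                                            # assumed embedding width
patch = rng.standard_normal((10, 10, 2))         # 10x10 pixels, 2 channels
w_proj = 0.1 * rng.standard_normal((20, d))      # 20 = 10 pixels * 2 channels/row
w_q = 0.1 * rng.standard_normal((d, d))
w_k = 0.1 * rng.standard_normal((d, d))
w_v = 0.1 * rng.standard_normal((d, d))
embedding = embed_patch(patch, w_proj, w_q, w_k, w_v)
```

The resulting fixed-length vector is what would be stored in, or queried against, the vector database.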
- referring to FIG. 4, a flowchart illustrates a computer-implemented method 400 for predicting abnormalities in medical images.
- the method 400 of FIG. 4 may be implemented by the processors 122, the processors 162, and/or other suitable processors, etc., executing instructions stored on the memories 124, the memories 164, and/or another suitable non-transitory computer readable medium, etc., described above with respect to FIGS. 1-3. Though the method below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement method 400.
- the image embedding module 142 may obtain a digital medical image of an anatomical region of interest corresponding to a patient.
- the digital medical image may be captured by medical imaging device 108, or another suitable medical imaging device, and obtained by the server computing device 102.
- the digital medical image corresponding to the patient and the plurality of historical digital medical images are either: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
- the image embedding module 142 may generate a feature vector by inputting the digital medical image to a trained image embedding model (e.g., image embedding model 143), the feature vector corresponding to a plurality of image features of the digital medical image.
- the method 400 further comprises: generating the trained image embedding model (e.g., via the image embedding module 142 and/or the ML training module 150) by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
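The contrastive objective mentioned above can be illustrated with an NT-Xent-style loss, as used in SimCLR-like self-supervised training: two augmented views of the same unlabeled image should embed close together, and views of different images far apart. This is an assumed sketch of one common contrastive formulation, not the specific loss of the disclosure.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Normalized temperature-scaled cross-entropy over paired views:
    each embedding's positive is its counterpart view of the same image."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n, loss = len(z1), 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)            # index of the positive pair
        logits = np.delete(sim[i], i)    # exclude self-similarity
        loss += -sim[i, j] + np.log(np.exp(logits).sum())
    return loss / (2 * n)

views = np.eye(4)[:2]                         # embeddings of two "images"
aligned = nt_xent_loss(views, views)          # matched augmented views
mismatched = nt_xent_loss(views, views[::-1]) # deliberately swapped pairs
```

Minimizing this loss over unlabeled historical images is what lets the encoder learn clinically useful structure without annotations.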
- the image embedding module 142 may generate the feature vectors by transforming, via the trained image embedding model, three dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
- the image embedding module 142 may obtain one or more similar feature vectors from a feature vector datastore (e.g., vector database 110 of the computing environment 100) based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive.
- the image embedding module 142 may generate a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter and issue the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
- the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
- the ML operation module 152 may generate at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model (e.g., language model 153).
- the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures or (v) clinical outcomes.
- the ML operation module 152 may generate the at least one summary by determining a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors and obtaining the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
- the trained language model (e.g., language model 153) is generated by training a language model on a set of preprocessed EHR data.
- the method 400 may further comprise: preprocessing a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tuning the trained language model using the replicated radiology reports.
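The four preprocessing steps enumerated above can be sketched as a small pipeline. The whitespace tokenizer, record schema, split ratio, and replication factor are all illustrative assumptions; a real pipeline would use a proper clinical tokenizer and a randomized split.

```python
def preprocess_ehr(records, test_fraction=0.25, replication_factor=3):
    """(i) tokenize, (ii) format, (iii) split train/test, and
    (iv) replicate radiology reports for fine-tuning emphasis."""
    tokenized = [{"tokens": r["text"].lower().split(),     # (i) tokenize
                  "kind": r["kind"]} for r in records]
    formatted = [" ".join(r["tokens"]) for r in tokenized]  # (ii) format
    split = int(len(formatted) * (1 - test_fraction))
    train, test = formatted[:split], formatted[split:]      # (iii) split
    replicated = [f for r, f in zip(tokenized, formatted)   # (iv) replicate
                  if r["kind"] == "radiology_report"] * replication_factor
    return train, test, replicated

records = [
    {"kind": "radiology_report", "text": "Opacity in left lung"},
    {"kind": "clinical_note", "text": "Patient reports cough"},
    {"kind": "lab_result", "text": "WBC elevated"},
    {"kind": "clinical_note", "text": "Follow up in two weeks"},
]
train, test, replicated = preprocess_ehr(records)
```

Replication simply over-represents radiology reports in the fine-tuning corpus, biasing the language model toward radiology vocabulary and phrasing.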
- the ML operation module 152 may generate the at least one summary by inputting, with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
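Combining the summarization prompt with the retrieved EHR text, as described above, might look like the following. The prompt wording and function names are assumptions for illustration, not quoted from the disclosure.

```python
# Assumed prompt defining the summarization task for the language model.
SUMMARIZATION_PROMPT = (
    "Summarize the common findings, recurring abnormalities, and general "
    "commonalities across the following patient records:"
)

def build_summary_request(ehr_texts):
    """Prefix the task prompt to the numbered EHR records for the model."""
    body = "\n\n".join(f"Record {i + 1}:\n{t}" for i, t in enumerate(ehr_texts))
    return f"{SUMMARIZATION_PROMPT}\n\n{body}"

request = build_summary_request(["Nodule in right upper lobe.",
                                 "Stable nodule, no growth."])
```

The assembled string is what would be submitted to the trained language model (e.g., language model 153) to produce the summary of block 234.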
- the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
- the user computing device 104 may present, via a graphical user interface (e.g., a graphical user interface presented via the output device 168b and/or the input device 168a) and for review by a clinician, one or more of: the at least one summary, or the digital medical image.
- the clinician is a radiologist.
- referring to FIG. 5, a flowchart illustrates a computer-implemented method 500 for generating embeddings for medical images.
- the method 500 of FIG. 5 may be implemented by the processors 122, the processors 162, and/or other suitable processors, etc., executing instructions stored on the memories 124, the memories 164, and/or another suitable non-transitory computer readable medium, etc., described above with respect to FIGS. 1-3. Though the method below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement method 500.
- the ML operation module 152 may identify an anatomical region depicted in a medical image (e.g., medical image 302 of FIG. 3, medical image 220 of FIG. 2) using one or more machine learning models, such as a convolutional neural network (CNN), a multi-modal language model (e.g., language models 153), and/or another machine learning model capable of processing image data and providing a classification of the image data.
- the ML training module 150 may fine-tune and/or train the one or more machine learning models on general medical knowledge (e.g., information from clinical data sources), historical medical images, various medical conditions (e.g., clinical documentation related to medical conditions), etc.
- the ML operation module 152 may identify a region of interest (e.g., a bounding box containing the image patch 304 of FIG. 3) containing an organ in the medical image using the one or more machine learning models.
- the ML operation module 152 may select a trained image embedding model (e.g., a trained autoencoder) from among a plurality of image embedding models (e.g., image embedding models 143) based on one or more of: the anatomical region depicted in the medical image, or the organ contained in the region of interest.
- the trained image embedding model may be fine-tuned and/or trained (e.g., by the image embedding model 143 and/or the ML training module 150) on medical images of a particular modality (e.g., X-ray images, CT images, etc.), medical images of a particular anatomical region, and/or image patches of a particular organ or sub-region (e.g., an image patch of a vertebra from an image of a spine).
- the ML operation module 152 may apply the trained image embedding model to the medical image to generate an embedding for the medical image.
- the ML operation module 152 may then query a vector store (e.g., vector database 110) with the image embedding to retrieve similar embeddings, as described above with respect to FIG. 1-4, and/or the ML operation module 152 may add the image embedding to the vector store.
- routines, subroutines, applications, or instructions may constitute either software (code embodied on a non-transitory, tangible machine-readable medium) or hardware.
- routines, etc. are tangible units capable of performing certain operations and may be configured or arranged in a certain manner.
- in example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
- a hardware module may be implemented mechanically or electronically.
- a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations.
- a hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
- considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time.
- for example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times.
- Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
- the modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
- the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of geographic locations.
- any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment.
- the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- some embodiments may be described using the terms “coupled” and “connected” along with their derivatives.
- some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact.
- the term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
- the embodiments are not limited in this context.
- the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
- “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Abstract
Methods and systems for predicting abnormalities in medical images are disclosed herein. An example computer-implemented method comprises: obtaining a digital medical image of an anatomical region of interest; generating a feature vector using a trained image embedding model; obtaining one or more similar feature vectors from a feature vector datastore, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; generating at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and presenting, for review by a clinician, one or more of: the at least one summary, or the digital medical image.
Description
METHODS AND SYSTEMS FOR VECTOR EMBEDDING SEARCH FOR PREDICTING ABNORMALITIES IN MEDICAL IMAGES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/602536, entitled “METHODS AND SYSTEMS FOR VECTOR EMBEDDING SEARCH FOR PREDICTING ABNORMALITIES IN MEDICAL IMAGES” (filed November 24, 2023), the entirety of which is incorporated by reference herein.
FIELD
[0002] The present disclosure generally relates to medical image analysis, and more particularly relates to using machine learning techniques to, among other things: (i) generate embeddings for medical images, (ii) leverage the large archive of unlabeled clinical data, (iii) predict abnormalities in medical images, and/or (iv) perform inference without the need for task-specific training or manual labeling.
BACKGROUND
[0003] Clinicians and other medical professionals often analyze medical images such as X-ray images, computed tomography (CT) images, magnetic resonance imaging (MRI) images, etc., when providing a clinical recommendation to a patient. Generally, clinicians rely on their own intuition and training when searching for abnormalities in medical images.
[0004] Many clinical archives contain a large amount of data including a wide variety of historical medical images associated with various conditions or ailments. Moreover, leveraging such clinical archives would significantly improve the efficiency and accuracy of medical image analysis by a clinician. However, an individual medical professional can review only a small percentage of the medical images available in a clinical archive.
[0005] Machine learning (ML) and artificial intelligence (AI) models are used by professionals in a wide variety of fields to analyze large amounts of data and make determinations based on said data. However, annotated and curated datasets do not exist in current clinical archives, and manually annotating such a large amount of data is extremely expensive and time-consuming.
[0006] Therefore, there is an opportunity to leverage large archives of unlabeled clinical data to perform inference without the need for manual labeling, training, or human intuition by using ML tools that can identify typical and frequently overlooked abnormalities in medical images.
BRIEF SUMMARY
[0007] In one aspect, a computer-implemented method for predicting abnormalities in medical images may be provided. In one example, the method may include: (1) obtaining, by one or more processors, a digital medical image of an anatomical region of interest corresponding to a patient; (2) generating, by the one or more processors, a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; (3) obtaining, by the one or more processors, one or more similar feature vectors from a feature vector datastore based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; (4) generating, by the one or more processors, at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and (5) presenting, via a graphical user interface and for review by a clinician, one or more of: the at least one summary, or the digital medical image.
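For illustration only, steps (1)-(5) above can be sketched as a short program. Every name here (`embed_image`, `nearest`, `summarize_ehr`, the in-memory `datastore`) is a hypothetical stand-in for the trained image embedding model, the feature vector datastore, and the trained language model; none is part of the disclosure.

```python
def embed_image(image):
    # stand-in for the trained image embedding model (step 2):
    # map pixels to a small feature vector
    return [sum(image) / len(image), max(image), min(image)]

def nearest(query_vec, datastore, k=1):
    # brute-force similarity search over the feature vector datastore (step 3)
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(query_vec, entry[1])) ** 0.5
    return sorted(datastore, key=dist)[:k]

def summarize_ehr(records):
    # stand-in for the trained language model's summary (step 4)
    return "Common findings: " + "; ".join(records)

image = [0.1, 0.9, 0.4]                       # (1) toy digital medical image
vec = embed_image(image)                      # (2) feature vector
datastore = [("patient_a", [0.45, 0.9, 0.1]),
             ("patient_b", [5.0, 9.0, 2.0])]
similar = nearest(vec, datastore, k=1)        # (3) similar feature vectors
ehr = {"patient_a": "ground-glass opacity", "patient_b": "normal study"}
summary = summarize_ehr([ehr[pid] for pid, _ in similar])  # (4) summary
print(summary)                                # (5) presented for review
```

In a real deployment each stub would be replaced by the trained models and datastores described below; the sketch only makes the data flow between the five steps concrete.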
[0008] In a variation of this embodiment, the computer-implemented method further comprises: generating the trained image embedding model by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
[0009] In a variation of this embodiment, generating the feature vector comprises: transforming, via the trained image embedding model, three-dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
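One hedged way to picture this dimensionality reduction: mean-pool each axial slice of a toy three-dimensional volume into a single value, yielding one one-dimensional feature per slice (keeping per-row means instead would yield two-dimensional features). The pooling choice is illustrative only; the trained model learns its own transformation.

```python
# toy volume: 3 slices of 2x2 voxels
volume = [
    [[0.0, 0.2], [0.4, 0.6]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

def slice_means(vol):
    # reduce each 2-D slice of the 3-D region to one scalar feature
    feats = []
    for sl in vol:
        voxels = [v for row in sl for v in row]
        feats.append(sum(voxels) / len(voxels))
    return feats

features_1d = slice_means(volume)
print(features_1d)
```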
[0010] In a variation of this embodiment, the computer-implemented method further comprises: generating, by the one or more processors, a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter; and issuing, by the one or more processors, the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
[0011] In a variation of this embodiment, the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
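A filtered vector store query of this kind can be sketched as follows; the in-memory `store`, its metadata keys, and the brute-force ranking are hypothetical simplifications of a production vector database.

```python
import math

# each entry pairs a feature vector with metadata used by the filters
store = [
    {"vec": [0.10, 0.20], "region": "chest", "modality": "CT", "age_group": "40-60"},
    {"vec": [0.90, 0.80], "region": "chest", "modality": "CT", "age_group": "20-40"},
    {"vec": [0.12, 0.22], "region": "head", "modality": "MRI", "age_group": "60+"},
]

def query(store, vec, filters, k=1):
    # restrict candidates to entries matching every filter, then rank the
    # remainder by Euclidean distance to the query feature vector
    candidates = [e for e in store
                  if all(e.get(key) == val for key, val in filters.items())]
    candidates.sort(key=lambda e: math.dist(vec, e["vec"]))
    return candidates[:k]

# the head/MRI entry is the closest vector overall, but the filter excludes it
hits = query(store, [0.12, 0.22], {"region": "chest", "modality": "CT"}, k=1)
print(hits[0]["age_group"])
```

The example shows why the filter matters: without it, the nearest neighbor would come from a different anatomical region and modality.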
[0012] In a variation of this embodiment, the digital medical image corresponding to the patient and the plurality of historical digital medical images are either: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
[0013] In a variation of this embodiment, the clinician is a radiologist.
[0014] In a variation of this embodiment, generating the at least one summary comprises: determining, by the one or more processors, a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors; and obtaining, by the one or more processors, the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
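The LPR lookup can be pictured as a keyed fetch: the list of identifying information drives retrieval of each historical patient's EHR. The dictionary-backed `lpr` and `fetch_ehr` are illustrative stand-ins for a real longitudinal patient record system.

```python
# toy longitudinal patient record keyed by patient identifier
lpr = {
    "p1": {"radiology_reports": ["nodule RUL"], "labs": ["WBC normal"]},
    "p2": {"radiology_reports": ["clear lungs"], "labs": []},
}

def fetch_ehr(lpr, patient_ids):
    # return the EHR for each known identifier; unknown IDs are skipped
    return {pid: lpr[pid] for pid in patient_ids if pid in lpr}

# "p1" comes from the similar feature vectors' metadata; "p9" has no record
ehr = fetch_ehr(lpr, ["p1", "p9"])
print(sorted(ehr))
```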
[0015] In a variation of this embodiment, the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures or (v) clinical outcomes.
[0016] In a variation of this embodiment, the trained language model is generated by training a language model on a set of preprocessed EHR data, and the computer-implemented method further comprises: preprocessing, by the one or more processors, a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tuning, by the one or more processors, the trained language model using the replicated radiology reports.
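Steps (i)-(iv) can be rendered as a minimal pipeline. Whitespace tokenization, the tagged input format, and the fixed split ratio are all assumptions standing in for the language model's actual tokenizer and training setup.

```python
ehr_records = [
    {"type": "radiology_report", "text": "Nodule in right upper lobe"},
    {"type": "clinical_note", "text": "Patient reports chronic cough"},
    {"type": "lab_result", "text": "WBC within normal limits"},
    {"type": "radiology_report", "text": "No acute cardiopulmonary process"},
]

# (i) tokenize the set of EHR data
tokenized = [{**r, "tokens": r["text"].lower().split()} for r in ehr_records]
# (ii) format the tokenized data for input to the language model
formatted = [f"[{r['type']}] {' '.join(r['tokens'])}" for r in tokenized]
# (iii) split into an EHR training data set and an EHR testing data set
cut = int(0.75 * len(formatted))
train_set, test_set = formatted[:cut], formatted[cut:]
# (iv) replicate radiology reports so fine-tuning sees them more often
replicas = [f for f in train_set if f.startswith("[radiology_report]")]
train_set = train_set + replicas
print(len(train_set), len(test_set))
```

Replication up-weights the radiology reports relative to other record types, which is one simple way to bias fine-tuning toward the report language the summaries must emulate.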
[0017] In a variation of this embodiment, generating the at least one summary comprises: inputting, by the one or more processors and with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
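A prompt of this kind might be assembled as below; the wording of the summarization task and the `Record N:` framing are hypothetical, since the disclosure does not fix a prompt template.

```python
def build_summary_prompt(ehr_texts):
    # prompt defining the summarization task, followed by the retrieved EHR
    task = ("Summarize the common findings and abnormalities across the "
            "following electronic health records of similar historical patients.")
    body = "\n".join(f"Record {i + 1}: {t}" for i, t in enumerate(ehr_texts))
    return f"{task}\n\n{body}\n\nSummary:"

prompt = build_summary_prompt([
    "CT chest: 6 mm nodule, right upper lobe.",
    "CT chest: stable 5 mm nodule, right upper lobe.",
])
print(prompt)
```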
[0018] In a variation of this embodiment, the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
[0019] In another aspect, a computing system for predicting abnormalities in medical images may be provided. In one example, the computing system may include: one or more medical imaging devices; one or more processors; and/or one or more non-transitory memories coupled to the one or more processors. The one or more non-transitory memories may include computer-executable instructions stored therein that, when executed by the one or more processors, cause the one or more processors to: (1) obtain, from a medical imaging device, a digital
medical image of an anatomical region of interest corresponding to a patient; (2) generate a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; (3) obtain one or more similar feature vectors from a feature vector datastore based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; (4) generate at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and (5) present, via a graphical user interface and for review by a clinician, one or more of: the at least one summary, or the digital medical image.
[0020] In a variation of this embodiment, the computer-executable instructions, when executed by the one or more processors, further cause the computing system to: generate the trained image embedding model by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
[0021] In a variation of this embodiment, the computer-executable instructions, when executed by the one or more processors, generate the feature vector by causing the computing system to: transform, via the trained image embedding model, three-dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
[0022] In a variation of this embodiment, the computer-executable instructions, when executed by the one or more processors, further cause the computing system to: generate a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter; and issue the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
[0023] In a variation of this embodiment, the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
[0024] In a variation of this embodiment, the digital medical image corresponding to the patient and the plurality of historical digital medical images are either: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
[0025] In a variation of this embodiment, the clinician is a radiologist.
[0026] In a variation of this embodiment, the computer-executable instructions, when executed by the one or more processors, generate the at least one summary by causing the computing system to: determine a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors; and obtain the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
[0027] In a variation of this embodiment, the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures or (v) clinical outcomes.
[0028] In a variation of this embodiment, the trained language model is generated by training a language model on a set of preprocessed EHR data, and the computer-executable instructions, when executed by the one or more processors, further cause the computing system to: preprocess a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tune the trained language model using the replicated radiology reports.
[0029] In a variation of this embodiment, the computer-executable instructions, when executed by the one or more processors, generate the at least one summary by causing the computing system to: input, with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
[0030] In a variation of this embodiment, the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
[0031] In yet another aspect, a tangible, non-transitory computer readable medium storing computer-readable instructions for predicting abnormalities in medical images may be provided. In one example, the instructions stored on the tangible, non-transitory computer readable medium, when executed by one or more processors of a computing system, may cause the computing system to: (1) obtain, from a medical imaging device, a digital medical image of an anatomical region of interest corresponding to a patient;
(2) generate a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; (3) obtain one or more similar feature vectors from a feature vector datastore
based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; (4) generate at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and (5) present, via a graphical user interface and for review by a clinician, one or more of: the at least one summary, or the digital medical image.
BRIEF DESCRIPTION OF THE FIGURES
[0032] Advantages will become more apparent to those skilled in the art from the following description of the preferred embodiments which have been shown and described by way of illustration. As will be realized, the present embodiments may be capable of other and different embodiments, and their details are capable of modification in various respects. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
[0033] The figures described below depict various aspects of the applications, methods, and systems disclosed herein. It should be understood that each figure depicts an embodiment of a particular aspect of the disclosed applications, systems and methods, and that each of the figures is intended to accord with a possible embodiment thereof. Furthermore, wherever possible, the following description refers to the reference numerals included in the following figures, in which features depicted in multiple figures are designated with consistent reference numerals.
[0034] FIG. 1 illustrates an example computing environment for predicting abnormalities in medical images in which the techniques disclosed herein may be implemented.
[0035] FIG. 2 illustrates a combined block and logic diagram for an example computer-implemented method for predicting abnormalities in medical images.
[0036] FIG. 3 illustrates a combined block and logic diagram for an example embedding model architecture, in accordance with one or more aspects.
[0037] FIG. 4 illustrates an example computer-implemented method for predicting abnormalities in medical images.
[0038] FIG. 5 illustrates an example computer-implemented method for generating embeddings for medical images.
DETAILED DESCRIPTION
Overview
[0039] The present techniques provide methods and systems for, inter alia, medical image analysis, and more particularly, techniques for training and operating one or more machine learning models to, among other things, generate embeddings for medical images, leverage the large archive of unlabeled clinical data without the need for task-specific training or manual labeling, and facilitate the identification of abnormalities in medical images.
[0040] Specifically, the present techniques may include generating an image embedding for a medical image using a trained image embedding model and identifying a plurality of historical patients associated with similar historical medical images by querying a vector database containing embeddings for a plurality of historical medical images with the image embedding for the medical image. Furthermore, the present techniques may include providing electronic health records (EHR), from a clinical archive, for the plurality of historical patients with similar medical images to a trained language model. An example user computing device may receive an output summary from the trained language model that conveys the common findings and abnormalities of the EHR for the plurality of historical patients with similar medical images.
[0041] Additionally, the present techniques include leveraging a large clinical archive (e.g., a longitudinal patient record, data from a longitudinal patient record, another clinical archive, etc.) through trained embedding models and trained language models. Further, an example system may generate and access a corpus of embeddings (e.g., a vector database) for a plurality of medical images from a clinical archive, thereby allowing patients to be easily grouped based upon associated medical images. Moreover, summarizing the electronic health records for a group of patients associated with medical images sharing certain characteristics using the trained language model may provide robust insight into the common characteristics of the patients, the abnormalities/conditions associated with the similar medical images, and other relevant information contained in the EHR. Advantageously, the summaries generated by the trained language models discussed herein may be used as a guide for a clinician or another medical professional when assessing a medical image for a patient. Furthermore, the present techniques need not employ manual labeling, training, or human intuition to leverage such clinical archives to generate the summaries of EHR data. Thus, the methods and systems discussed herein provide significant advantages over conventional medical image analysis technologies.
Example Computing Environment
[0042] Referring now to the drawings, FIG. 1 is a block diagram representative of an example computing environment 100 capable of implementing the example methods and/or operations described herein, including, for example, one or more steps of the method 200 of FIG. 2 and/or one or more steps of the method 400 of FIG. 4 discussed in greater detail below. The computing environment 100 of FIG. 1 includes a server computing device 102, a user computing device 104, one or more networks 106, one or more medical imaging devices 108, a vector database 110, and an electronic health care records (EHR) database 112. In some embodiments, the computing environment 100 further includes one or more cloud application programming interfaces (APIs) 114.
[0043] The exemplary network 106 of FIG. 1 may be a single communication link directly connecting the server computing device 102 and the user computing device 104 (e.g., a direct wireless link), or one or more networks 106 may include multiple links (e.g., connecting the server computing device 102, the user computing device 104, the medical imaging device 108, and an additional computing device) and/or communication networks of one or more types (e.g., one or more wired and/or wireless local area networks (LANs), and/or one or more wired and/or wireless wide area networks (WANs) such as the Internet, public networks, private networks, etc.). Additionally, the network 106 may include one or more networking devices such as routers, switches, etc., and the network 106 may perform routing and/or switching operations using the networking devices. In some embodiments, the network 106 may include one or more firewalls, file servers and/or storage devices. Further, each device within the network 106 may be assigned a unique identifier, such as an IP address, to facilitate communication. The network 106 may include any suitable topology, such as a star topology (devices connected to a central hub), a bus topology (devices connected along a single cable), a ring topology (devices connected in a circular fashion), and/or a mesh topology (devices connected to multiple other devices). For ease of reading herein (and not for limitation purposes), the one or more networks 106 may be referred to using the singular tense.
[0044] The server computing device 102 includes one or more communication interfaces 120, one or more processors 122, one or more memories 124, and one or more application programming interfaces 130. The memories 124 may have stored thereon one or more modules 140 (e.g., one or more sets of instructions). In some embodiments, the server computing device 102 includes one or more input devices and one or more output devices (e.g., similar to the input devices 168a and the output devices 168b, discussed in greater detail below) that enable
receipt of user input and communication of output data to a user (e.g., a clinician, a user of the user computing device 104). The server computing device 102 may be an individual server, a group (e.g., cluster) of multiple servers, a computing device (e.g., a personal computer, a laptop, a smart phone, a tablet, a wearable device, etc.), or another suitable type of computing device or system (e.g., a collection of computing resources).
[0045] The one or more communication interfaces 120 may enable communication with other machines (e.g., user computing device 104) via, for example, the one or more networks 106. The example communication interface 120 may include any suitable type of communication interface(s) (e.g., wired and/or wireless interfaces) configured to operate in accordance with any suitable protocol(s). For example, the communication interfaces 120 may be configured to transmit and receive data using a suitable wired communication protocol such as an Ethernet protocol, a USB protocol, a UART protocol, an I2C protocol, a SPI protocol, or wireless communication protocols such as a Bluetooth protocol, a Wi-Fi® (IEEE 802.11 standard) protocol, a near-field communication (NFC) protocol, a cellular (e.g., GSM, CDMA, LTE, WiMAX, etc.) protocol, a peer-to-peer wireless protocol, a short-range wireless protocol, and/or other suitable wired or wireless communication protocols. In some embodiments, for data throughput and efficiency reasons, a combination of such protocols may be used by the communication interface 120. In some embodiments, the communication interface 120 may be a network interface controller (NIC) and may include any suitable NIC, such as wired/wireless controllers (e.g., Ethernet controllers), and facilitate bidirectional/multiplexed networking over the network 106 between the server computing device 102 and the user computing device 104 and/or other components of the environment 100 (e.g., the medical imaging device 108, the vector database 110, the EHR database 112, another computing device, etc.). In some embodiments, the communication interface 120 may include advanced features such as hardware acceleration, specialized networking protocols, etc.
[0046] The processors 122 may include one or more microprocessors, controllers, and/or any suitable type of processor, and the memories 124 (e.g., volatile memory, non-volatile memory) may be accessible by the processor 122 (e.g., via a memory controller). For example, the one or more processors 122 may include one or more central processing units, one or more graphics processing units, one or more field-programmable gate arrays, one or more application-specific integrated circuits, one or more tensor processing units, one or more digital signal processors, one or more neural processing units, one or more RISC-V processors, one or more coprocessors, one or more specialized processors/accelerators for artificial intelligence or
machine learning-specific applications, one or more microcontrollers, etc. The processor 122 may interact with the memory 124 to obtain, for example, machine-readable instructions and/or computer-executable instructions stored in the memory 124 corresponding to, for example, the operations represented by the flowcharts of this disclosure (e.g., the method 200 of FIG. 2, the method 400 of FIG. 4). For example, the memories 124 may include one or more random access memories, one or more read-only memories, one or more cache memories, one or more hard disk drives, one or more solid-state drives, one or more non-volatile memory express, one or more optical drives, one or more universal serial bus flash drives, one or more external hard drives, one or more network-attached storage devices, one or more cloud storage instances, one or more tape drives, etc.
[0047] As mentioned above, the memories 124 may have stored thereon one or more modules 140, for example, as one or more sets of computer-executable instructions. The modules 140 may include an image embedding module 142, a data processing module 144, a machine learning (ML) training module 150, and a ML operation module 152. In some aspects, more or fewer modules 140 may be included. The computer-executable instructions may be stored on non-transitory computer-readable media, in some aspects. In some aspects, the memories 124 may also store one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OS X, etc.). The operating systems may be configured to run the modules 140 during operation of the server computing device 102. For example, the modules 140 may include additional modules and/or services for receiving and processing data from one or more other components of the environment 100 such as the EHR database 112, the vector database 110, the one or more cloud application programming interfaces (APIs) 114, or the user computing device 104. The modules 140 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.). The modules 140 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.). In some embodiments, the modules 140 may respond to network requests (e.g., via the API 130) or other requests received via the network 106 (e.g., via the user computing device 104 or other components of the environment 100).
[0048] The memories 124 of the server computing device 102 of FIG. 1 may store instructions for executing an image embedding module 142. Generally speaking, the image embedding module 142 may generate vector embeddings, or feature vectors, for medical images. For example, the image embedding module 142 may generate one or more feature
vectors for a medical image of a patient (e.g., an image from medical imaging device 108, EHR database 112, or another clinical source). Further, the image embedding module 142 may store one or more image embedding models 143 trained to generate feature vectors for a digital image. For example, the one or more image embedding models 143 may be image encoders (e.g., a convolutional neural network), such as a residual network (ResNet), visual geometry group (VGG), or another suitable foundational image encoder. In some embodiments, the image embedding module 142 may generate image embedding models 143 by training a self-supervised autoencoder (e.g., using contrastive learning techniques) on unlabeled historical medical images from a clinical archive (e.g., EHR database 112). In some embodiments, the image embedding models 143 may be trained on digital medical images of a particular anatomical region of interest. Moreover, the image embedding module 142 may store one or more image embedding models 143, each trained on a specific anatomical region of interest. In some embodiments, the embeddings generated by the image embedding module 142 may be stored in the vector database 110 and/or a remote embedding database service (e.g., a remote database accessible via the cloud APIs 114). For example, the embeddings may be stored on a remote embedding database service based on an open-source or proprietary model (e.g., Milvus, Pinecone, Redis, Postgres, MongoDB, Facebook AI Similarity Search (FAISS), etc.). The image embedding module 142 may include further instructions for accessing the vector database 110 or a vector store, and for creating, retrieving, updating, deleting, and/or configuring the data therein. For ease of reading herein (and not for limitation purposes), the one or more image embedding models 143 may be referred to using the singular tense.
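The image embedding module's "image in, feature vector out, vector indexed" flow can be sketched as follows. A real deployment would use a trained encoder such as a ResNet; here a fixed-size mean-pooling over pixel chunks is a deliberately simple stand-in, and `vector_store`, `embed`, and `index_image` are hypothetical names.

```python
def embed(pixels, dim=4):
    # stand-in encoder: mean-pool equal chunks of the pixel list into a
    # fixed-length feature vector
    chunk = max(len(pixels) // dim, 1)
    return [sum(pixels[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]

vector_store = {}  # stand-in for the vector database 110

def index_image(image_id, pixels):
    # generate the embedding and persist it keyed by the image identifier
    vector_store[image_id] = embed(pixels)

index_image("study_001", [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.5, 0.5])
print(vector_store["study_001"])
```

Once indexed this way, similarity search over `vector_store` reduces to comparing fixed-length vectors, which is what production vector databases such as Milvus or FAISS optimize.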
[0049] In some embodiments, the ML training module 150 includes instructions for training the one or more image embedding models 143 to generate image embeddings for medical images. Further, the ML training module 150 may obtain image embeddings from a vector database (e.g., vector database 110) and/or digital images from a datastore (e.g., medical images from a clinical archive such as the EHR database 112) for training the image embedding models 143. In some embodiments, the ML training module 150 may perform preprocessing operations on obtained image data, such as resizing, normalization, data augmentation, binarization, etc. For example, the ML training module 150 may obtain a set of medical images from the EHR database 112 and the ML training module 150 may preprocess the medical images to generate training images for the image embedding model(s) 143. In some embodiments, the ML training module 150 may train the image embedding models 143 on medical images labeled with corresponding embeddings. In some embodiments, the ML training module 150 may train a foundational image embedding model (e.g., a base/pretrained model, an open-source embedding model, etc.) for a certain modality of medical image (e.g., x-ray images, computed tomography images, etc.). For example, a foundational embedding model may be trained to have specific levels of knowledge common to medical images, or a modality of medical images. Further, the ML training module 150 may train the foundational model to generate the one or more image embedding models 143, each having a specific level of knowledge common to a certain type or modality of medical image (e.g., an image of the lungs, a CT image, etc.). Additionally or alternatively, the ML training module 150 may include instructions for fine-tuning and/or training the one or more image embedding models 143 using various fine-tuning techniques, such as layer freezing, learning rate adjustments, weight adjustments, etc., thereby improving the compression capabilities of the image embedding models 143. For example, the ML training module 150 may train and/or fine-tune a first instance of a foundational model on X-ray images and a second instance of the foundational model on three-dimensional medical imaging data. As another example, the ML training module 150 may train and/or fine-tune the image embedding models 143 on sub-sections of a medical image (e.g., a sub-section of a medical image of a lung including a particular portion of the lung), allowing small-scale embeddings to be generated by the embedding models 143 and thereby providing additional granularity to the embedding techniques described herein. In some embodiments, the ML training module 150 may train an image embedding model 143 for a specific organ (e.g., lungs, kidney, brain, etc.).
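The sub-section embedding idea mentioned above can be illustrated with a toy example. The tile size, image, and two-feature "embedding" below are assumptions chosen only to show the granularity gained by embedding sub-sections separately:

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int):
    """Split a 2-D image into non-overlapping sub-sections so each one can be
    embedded separately (tile size is an illustrative assumption)."""
    h, w = image.shape
    tiles = []
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            tiles.append(image[r:r + tile, c:c + tile])
    return tiles

def embed_tile(t: np.ndarray) -> np.ndarray:
    """Toy per-tile embedding: mean and standard deviation of intensities."""
    return np.array([t.mean(), t.std()])

image = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a scan
embeddings = [embed_tile(t) for t in tile_image(image, tile=2)]
print(len(embeddings))  # 4 small-scale embeddings, one per 2x2 sub-section
```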
[0050] The memories 124 of the server computing device 102 of FIG. 1 may also store instructions for executing a data processing module 144. Generally speaking, the data processing module 144 may process and/or analyze data from the EHR database 112 and/or the vector database 110. In some embodiments, the data processing module 144 may process data from the EHR database 112 and/or the vector database 110 to sort and/or chunk the data into multiple subsets (e.g., based on data type, according to time, based on file size, etc.). For example, the data processing module 144 may obtain one or more sets of unstructured text (e.g., radiology reports, clinical notes, clinical outcomes) from the EHR database 112 and the data processing module 144 may organize the unstructured texts into a structured format. Continuing with this example, the data processing module 144 may include instructions for parsing the unstructured text to generate structured text. For instance, when the text relates to historical electronic health records (EHR) associated with a plurality of individuals, the data processing module 144 may generate a time series data structure in which each electronic health record is represented by one or more timestamps, and at each timestamp, text of one or more records (e.g., radiology reports, clinical notes, clinical outcomes) is labeled. In some embodiments, the data processing module 144 may also label the data from the EHR database 112 according to the condition associated with the one or more health records and/or one or more biographic features (e.g., age, sex, body weight). As another example, the data processing module 144 may obtain embeddings from the vector database 110 and process the embeddings to generate training embeddings for the one or more image embedding models 143. Continuing with this example, the embeddings obtained from the vector database 110 may correspond to respective anatomical regions of interest and the data processing module 144 may chunk the embeddings based on anatomical region (e.g., using a clustering algorithm). In some embodiments, the data processing module 144 may generate training data for the machine learning models described herein by performing preprocessing operations on data (e.g., data from the vector database 110 and/or the EHR database 112, data obtained by the ML training module 150, other data of the environment 100), similar to the preprocessing operations described below with respect to the ML training module 150.
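One plausible sketch of turning unstructured EHR text into the timestamp-keyed time series structure described above is shown below; the record fields, types, and dates are hypothetical, not the patent's actual schema:

```python
from collections import defaultdict

def structure_records(raw_records):
    """Group free-text EHR entries into a timestamp-keyed structure, with the
    text of each record labeled by its type."""
    timeline = defaultdict(list)
    for rec in raw_records:
        timeline[rec["timestamp"]].append(
            {"type": rec["type"], "text": rec["text"].strip()}
        )
    return dict(sorted(timeline.items()))

raw = [
    {"timestamp": "2021-03-02", "type": "radiology_report", "text": " Nodule in left lung. "},
    {"timestamp": "2021-03-02", "type": "clinical_note", "text": "Follow-up in 6 weeks."},
    {"timestamp": "2020-11-10", "type": "clinical_outcome", "text": "Biopsy negative."},
]
timeline = structure_records(raw)
print(list(timeline))  # ['2020-11-10', '2021-03-02']: chronological timestamps
```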
[0051] The memories 124 of the server computing device 102 of FIG. 1 may also store instructions for executing a machine learning (ML) training module 150. Generally speaking, the ML training module 150 may train one or more machine learning models to make predictions and/or inferences based on input data (e.g., text data and/or image data). For example, the ML training module 150 may store instructions for pretraining and/or training image embedding models, language models, or other machine learning models suitable for image and/or text processing such as a convolutional neural network (CNN), a recurrent neural network (RNN), a multi-modal language model, etc. In some embodiments, a single model (e.g., a foundational language model, a foundational image embedding model) may be trained and copied. Copies of this model may serve as respective base models for a plurality of fine-tuned models (e.g., trained image embedding models, trained language models). Further, the ML training module 150 may also perform checkpointing to save parameters (e.g., in a local electronic database) of the models (e.g., language models, embedding models) at various stages, allowing for efficient training and retraining. Additionally, the ML training module 150 may adjust hyperparameters like batch size, model size, and learning rate to optimize the training process (e.g., by tuning hyperparameters using successive evaluation). In some embodiments, the ML training module 150 includes instructions for validating and testing the models to ensure their accuracy and reliability. For example, the ML training module 150 may validate and test the embedding models 143 and the language models 153 using various statistical metrics/techniques, human or clinician feedback, and/or other clinical validation techniques.
For example, the ML training module 150 and/or the ML operation module 152 may flag abnormal data generated by and/or input to language models 153 and/or image embedding models 143 for clinician review.
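Checkpointing of model parameters at various training stages might be sketched as follows; the JSON file and parameter names are illustrative stand-ins for a framework-native checkpoint format:

```python
import json
import os
import tempfile

def save_checkpoint(params: dict, path: str) -> None:
    """Persist model parameters at a given training stage (JSON is used here
    only for illustration; real systems typically use a binary format)."""
    with open(path, "w") as f:
        json.dump(params, f)

def load_checkpoint(path: str) -> dict:
    """Restore saved parameters so training can resume from that stage."""
    with open(path) as f:
        return json.load(f)

# Hypothetical parameters captured mid-training.
params = {"step": 1000, "weights": [0.12, -0.53, 0.07], "lr": 1e-4}
path = os.path.join(tempfile.mkdtemp(), "ckpt_step1000.json")
save_checkpoint(params, path)
restored = load_checkpoint(path)
print(restored["step"])  # 1000: training can resume from this stage
```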
[0052] Additionally, the ML training module 150 may include instructions for collecting information used to pretrain or train the one or more language models 153 (e.g., training data). In some embodiments, the ML training module 150 may access data sources by active means (e.g., scraping, API calls, or other retrieval) and/or may access existing corpora (e.g., the vector database 110, the EHR database 112, another existing corpus). Further, the ML training module 150 may include instructions for preprocessing the collected data to generate training data by, for example, performing cleaning operations on the data, resizing the data, normalizing the data, extracting text from the data, and/or performing other preprocessing operations on the data. In some embodiments, the ML training module 150 may perform data collection and/or data preprocessing in parallel to obtain training data for the one or more language models 153. Additionally or alternatively, the ML training module 150 may perform data deduplication, filtering, annotation, compliance, version control, validation, quality control, etc. Further, the ML training module 150 may perform other preprocessing operations, such as lexical parsing, tokenizing (e.g., the process of dividing training text into units such as words, subwords, or characters), case conversions, and other string splitting/munging. In some embodiments, the ML training module 150 may store large data volumes used for training the language models 153 (e.g., training data and/or collected data) in a structured and/or unstructured format. For example, the ML training module 150 and/or the data processing module 144 may store terabytes, petabytes, exabytes, or more of training data. In some embodiments, the ML training module 150 may include instructions for adding copies, and/or modified copies, of the training data (e.g., token embeddings) to one or more specialized electronic databases, such as a database accessible via the cloud APIs 114 or a local electronic database (not depicted), and instructions for accessing it to train models. For example, the ML training module 150 may use a specialized database (e.g., a vector database or store, a graph database, another local electronic database, etc.) to store and query the token embeddings (e.g., training data). In some embodiments, the one or more language models 153 may be pretrained language models.
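A minimal sketch of the cleaning and deduplication steps named above, assuming exact-duplicate detection via a content hash (real pipelines may also perform fuzzier near-duplicate detection):

```python
import hashlib
import re

def clean(text: str) -> str:
    """Basic cleaning: collapse whitespace and lowercase (illustrative only)."""
    return re.sub(r"\s+", " ", text).strip().lower()

def deduplicate(docs):
    """Drop exact duplicates after cleaning, keyed on a SHA-256 content hash."""
    seen, unique = set(), []
    for doc in docs:
        c = clean(doc)
        h = hashlib.sha256(c.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(c)
    return unique

docs = [
    "Chest X-ray:  no acute findings.",
    "chest x-ray: no acute findings.",
    "CT head: normal.",
]
corpus = deduplicate(docs)
print(len(corpus))  # 2: the first two records collapse to one after cleaning
```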
[0053] In some embodiments, the ML training module 150 includes instructions for performing pretraining (e.g., preprocessing of training data, initialization of an untrained language model, non-task-specific training) of a language model. For example, the ML training module 150 may pretrain the one or more language models 153 using unsupervised learning. During an example unsupervised learning process, the language models 153 may process one or more tokens (e.g., preprocessed data from the data processing module 144 and/or the ML training module 150) to learn to predict one or more elements (e.g., one or more most-likely next tokens, based on the likelihood of tokens in the training data). Further, pretraining the one or
more language models 153 may include additional steps such as additional training, data batching, hyperparameter tuning, model checkpointing, etc. Generally, a pretrained language model is taught linguistic features (e.g., grammar, syntax, vocabulary, etc.), enabling the model to effectively understand complex natural language. In some embodiments, one or more human reviewers may be looped into the process of preprocessing data collected by the ML training module 150. For example, a distributed work queue may be used to transmit batch jobs and receive human-computed responses from one or more human workers.
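The unsupervised next-token objective described above can be illustrated, at toy scale, by a bigram frequency model that learns to predict the most likely next token from training data; the corpus and tokens below are invented for illustration:

```python
from collections import Counter, defaultdict

def train_bigram(token_sequences):
    """Count next-token frequencies: a toy stand-in for the unsupervised
    next-token prediction objective used when pretraining language models."""
    counts = defaultdict(Counter)
    for seq in token_sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most likely next token observed in the training data."""
    return counts[token].most_common(1)[0][0]

corpus = [
    ["no", "acute", "findings"],
    ["no", "acute", "fracture"],
    ["no", "acute", "findings"],
]
model = train_bigram(corpus)
print(predict_next(model, "acute"))  # findings (seen twice vs. once)
```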
[0054] At a high level, the ML training module 150 may include instructions for language modeling. Moreover, the ML training module 150 may include instructions for training one or more deep learning models by processing token sequences (e.g., tokens embedded in a vector) using a large language model architecture (e.g., a transformer architecture). For example, a transformer model (e.g., language models 153) may include a plurality of layers including self-attention and feedforward neural networks. This architecture may enable the model to learn contextual relationships between the tokens, and to predict the next token in a sequence based upon the preceding tokens. During training and pretraining, the model is provided with a sequence of tokens, and it learns to predict a probability distribution over the next token in the sequence. This training process may include updating one or more model parameters (e.g., weights or biases) using an objective function that minimizes the difference between the predicted distribution and the true next token in the training data. In some embodiments, the language models 153 may be multi-modal language models, or multi-modal transformer models, configured to process various modalities of data (e.g., text data, image data, audio data, etc.). As mentioned above, the ML training module 150 may collect data for training, or pretraining, language models via web scraping, via API calls/access, via database extract-transform-load (ETL) processes, etc. For example, the ML training module 150 may access and obtain data for training from websites, academic publications, social media websites, books, web forums/interest sites (e.g., Wikipedia, Reddit, bulletin boards, etc.), or other suitable data sources. It should be appreciated that trained language models may be unpredictable, and thus, processing outputs for ethical and bias concerns (especially in a medical context) may be important.
Notwithstanding, medical professionals may use the present techniques for purposes of thought experimentation, i.e., specifically to facilitate a clinician recommendation. In doing so, the present techniques may advantageously assist practitioners with analyzing all possible conditions, treatment options, and relevant factors. Ultimately, the present techniques may be used to augment and solidify human decision making, rather than as a substitute for such deliberate thinking. In some embodiments, the ML training module 150 may include instructions for
processing both the input data and output data of the language models 153 to ensure that objectionable content is managed effectively (e.g., not passed to the language models 153 and/or not provided as an output of the language models 153) in order to address ethics and bias concerns.
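The objective function described above, which minimizes the difference between the predicted next-token distribution and the true next token, is conventionally the cross-entropy loss over a softmax distribution; a minimal sketch with an invented three-token vocabulary:

```python
import math

def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_token_loss(logits, true_index):
    """Cross-entropy between the predicted next-token distribution and the
    true next token: the quantity minimized during (pre)training."""
    probs = softmax(logits)
    return -math.log(probs[true_index])

# The model strongly favors token 0, which happens to be the true next token.
confident = next_token_loss([5.0, 0.0, 0.0], true_index=0)
# The model is uniform over the 3 tokens, so the loss is log(3).
uncertain = next_token_loss([0.0, 0.0, 0.0], true_index=0)
print(confident < uncertain)  # True: better predictions yield lower loss
```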
[0055] In some embodiments, the ML training module 150 includes instructions for training or fine-tuning the one or more language model(s) 153 to perform specific tasks. For example, the language model 153 may be a pretrained large language model (LLM) or another transformer model, such as a generative pretrained transformer (GPT), a bidirectional encoder representations from transformers (BERT) model, etc. Generally, such pretrained language models may be known as a “base model” in some aspects. Further, the ML training module 150 may include instructions for training or fine-tuning a base model to perform a specific task. For example, the ML training module 150 may include instructions for training and/or fine-tuning language model 153 on EHR data from the EHR database 112 associated with a particular anatomical region of interest. Moreover, the ML training module 150 may provide tokenized EHR data from a group of patients associated with a particular condition, or a particular condition localized within an anatomical region, labeled with example summaries to language model 153. As another example, the ML training module 150 may fine-tune language model 153 to learn knowledge corresponding to cancer, and/or may further train the model to learn knowledge of non-small cell lung cancer, small cell lung cancer, cardiac sarcoma, etc. Further, the ML training module 150 may include instructions for fine-tuning a cancer model (e.g., based on a base language model), a brain trauma model (e.g., based on the same base model used to fine-tune the cancer model), etc. In some embodiments, the ML training module 150 may include instructions for training other machine learning models (e.g., non-transformer models) to perform language tasks. For example, alternatives to a transformer architecture may include recurrent neural networks, long short-term memory networks, gated recurrent networks, convolutional neural networks, recursive neural networks, and other modeling architectures.
It should be understood that the ML training module 150 may train and/or fine-tune many (e.g., hundreds or more) additional models (e.g., supervised or unsupervised machine learning models). For example, the ML training module 150 may generate a fine-tuned language model specific to an organ or anatomical region (e.g., brain, lungs, heart, abdomen, spine, etc.), and/or a specific feature (e.g., cardiac chambers, coronary arteries, lung parenchyma, lymph nodes, etc.) associated with a specific organ or anatomic region. In some embodiments, to manage the complexity of fine-tuning and other machine learning operations of the server computing device 102, one or more open-source frameworks may be used. For example, an example framework may be TensorFlow, Keras, MXNet, Caffe, scikit-learn, PyTorch, or another suitable machine learning framework. Specifically for training and operating language models, frameworks such as OpenLLM and LangChain may be used, in some aspects. Additionally, the ML training module 150 may use an algorithm such as stochastic gradient descent or another optimization technique to adjust weights of the pretrained model. In some embodiments, the ML training module 150 and/or the ML operation module 152 may enrich inputs, or prompts, to the language models 153 with additional information, providing more context or directions for the model, which can be particularly useful in fine-tuning the model with limited examples. For example, the ML training module 150 may fine-tune a base model using one-shot or few-shot learning. As another example, the ML training module 150 and/or the ML operation module 152 may use such enriched prompting to focus the output of the one or more language models 153.
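Few-shot prompt enrichment of the kind described above might be sketched as follows; the instruction wording and example reports are assumptions for illustration, not actual clinical prompts:

```python
def build_few_shot_prompt(examples, query):
    """Enrich a prompt with labeled examples (few-shot learning), so the
    model sees worked demonstrations before the new input."""
    parts = ["Summarize the key findings in each radiology report."]
    for report, summary in examples:
        parts.append(f"Report: {report}\nSummary: {summary}")
    parts.append(f"Report: {query}\nSummary:")
    return "\n\n".join(parts)

examples = [
    ("Lungs clear. No effusion.", "Normal chest study."),
    ("2 cm nodule in right upper lobe.", "Pulmonary nodule; follow-up advised."),
]
prompt = build_few_shot_prompt(examples, "Ground-glass opacity in left lower lobe.")
print(prompt.count("Summary:"))  # 3: two worked examples plus the open query
```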
[0056] In an example training scenario, the ML training module 150 may train language model 153 on a set of preprocessed EHR data. Further, the data processing module 144 and/or the ML training module 150 may preprocess a set of EHR data from the EHR database 112, such as EHR data related to a particular type of cancer. For example, the preprocessing may include tokenizing the set of EHR data, formatting the tokenized set of EHR data for input to the language model, splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and replicating radiology reports included in the set of EHR data. The ML training module 150 may use the EHR training data set and the EHR testing data set to train language model 153 using, for example, self-supervised learning techniques. In some embodiments, the ML training module 150 may fine-tune trained language model 153 using the replicated radiology reports.
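The split-and-replicate portion of the scenario above could be sketched as follows; the record shapes, split fraction, single-copy replication, and seed are illustrative assumptions:

```python
import random

def split_ehr(records, test_fraction=0.25, replicate_type="radiology_report", seed=7):
    """Split tokenized EHR records into training and testing sets, then
    replicate radiology reports within the training set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Append one extra copy of each radiology report in the training split.
    extra = [r for r in train if r["type"] == replicate_type]
    return train + extra, test

records = [
    {"type": "radiology_report" if i % 2 else "clinical_note", "tokens": [i]}
    for i in range(8)
]
train, test = split_ehr(records)
print(len(test))  # 2 of the 8 records are held out for testing
```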
[0057] The memories 124 of the server computing device 102 of FIG. 1 may also store instructions for executing a machine learning (ML) operation module 152. Generally speaking, the ML operation module 152 may operate one or more machine learning models (e.g., image embedding models 143, language models 153, CNNs, RNNs, multimodal language models, etc.) capable of implementing the example methods and/or operations described herein. In some embodiments, the ML operation module 152 may select one or more trained models (e.g., one or more trained language models 153 and/or one or more trained image embedding models 143) at runtime based on data about a specific subject, based on data contained in a prompt (e.g., a prompt received via the user computing device 104), and/or based on other conditions that may be preprogrammed into ML operation module 152. In some embodiments, the ML operation module 152 may store the one or more trained language models 153. Further, the ML
operation module 152 may initialize the one or more trained language model(s) 153, load parameters into the model(s), and/or provide the model(s) with inference data (e.g., prompt inputs). In some embodiments, the ML operation module 152 may deploy the one or more language model(s) 153 (e.g., a pretrained model, a fine-tuned model, and/or a trained model) onto a cloud computing device (e.g., via the cloud APIs 114 and/or the API 130). At a high level, the ML operation module 152 may receive one or more inputs and may provide those inputs (e.g., one or more prompts) to language model 153. Further, the ML operation module 152 may generate a summary of EHR data by inputting EHR data and/or a prompt to language models 153. For example, the language models 153 may output a summary of common findings and abnormalities associated with the input EHR data. In some embodiments, the API 130 may include elements for receiving requests to language model 153, and for generating outputs for a user computing device based on model outputs. For example, the API 130 may include a RESTful API that receives a GET or POST request including a prompt parameter. The ML operation module 152 may receive the request from the API 130, pass the prompt parameter into language model 153, and receive a corresponding output.
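The request flow described above, in which a prompt parameter arrives via a GET or POST request and is passed to the model, might look like the following; the request shape, error format, and model stub are all assumptions for illustration:

```python
def handle_request(request, model):
    """Pull the prompt parameter out of a GET/POST-style request dict and
    pass it to a language-model callable, returning a status and output."""
    prompt = request.get("params", {}).get("prompt")
    if not prompt:
        return {"status": 400, "error": "missing prompt parameter"}
    return {"status": 200, "output": model(prompt)}

def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for language model 153."""
    return f"Summary of findings for: {prompt}"

ok = handle_request({"method": "POST", "params": {"prompt": "lung CT cohort"}}, toy_model)
bad = handle_request({"method": "GET", "params": {}}, toy_model)
print(ok["status"], bad["status"])  # 200 400
```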
[0058] The ML operation module 152 may operate machine learning models (e.g., image embedding models 143, language models 153) in different modes. For example, the ML operation module 152 may generate a query embedding for a query medical image using image embedding model 143 and provide EHR data for patients with similar medical images to language model 153, as described below with respect to FIG. 2. Further, a user may provide the query medical image to the ML operation module 152 using the user computing device 104 (e.g., via the input devices 168a). Additionally or alternatively, the ML operation module 152 may cause output data (e.g., a summary of the EHR data) generated by language model(s) 153 and/or image embedding model(s) 143 to be presented on the user computing device (e.g., via the output devices 168b). As another example, the ML operation module 152 may receive user input (e.g., from the user computing device 104) and provide it to one or more machine learning models (e.g., one or more language models 153, one or more image embedding models 143). In some embodiments, the ML operation module 152 may provide outputs of the machine learning models to other models. Moreover, the ML operation module 152 may provide the output of one or more first language models 153 to one or more second language models so that information and/or data can be aggregated. For example, in many subjects renal cell carcinoma (RCC) metastasizes to the liver. Conventional medical imaging techniques may struggle to correctly identify the RCC, as it is often difficult to distinguish between RCC and other kidney conditions. Continuing with this example, if a patient is suspected of having RCC, an RCC language model and a kidney language model (e.g., fine-tuned language models 153) may be prompted and/or provided with EHR data (e.g., in response to user input, in response to an inference generated by language models 153, etc.). In some embodiments, a user may be provided with a plurality of summaries, each generated by a respective fine-tuned language model, and/or an aggregated summary, generated by a language model based on the plurality of summaries, so that the user can be informed by the different perspectives from models trained on respective tasks. It should be noted that these summaries may be generated (e.g., by server computing device 102 via language models 153) very quickly (e.g., in minutes, seconds, or less). This represents an advantageous improvement over conventional techniques, which may take many hours or even days to generate a summary of common findings and abnormalities of medical subjects corresponding to a more limited set of medical images.
[0059] In some embodiments, the ML operation module 152 may operate and/or store one or more multi-modal machine learning models. Additionally or alternatively, the ML operation module 152 may generate and/or store one or more prompts, or template prompts, for the language models 153. In some embodiments, the ML operation module 152 may be configured to facilitate communication between the language models 153, the image embedding models, and/or the vector database 110. For example, the language model 153 may convert an input image (e.g., in response to an input prompt including the input image) to an image embedding, or feature vector, using the image embedding model 143. Continuing with this example, the language model 153 may query the vector database 110 (e.g., in response to the input prompt) with the image embedding to identify a plurality of patients associated with images similar to the input image. Based on the identified patients, the language model 153 may retrieve electronic health records for the plurality of patients (e.g., in response to the input prompt) and generate a summary of the EHR data. As a more specific example, a prompt parameter of an input prompt may be a computed tomography (CT) image of the lung of a subject. Further, the input prompt may cause the language model 153 to output a summary of EHRs associated with a plurality of CT images similar to the prompt parameter CT image. Expressed another way, the ML operation module 152 may facilitate supplementing the language models 153 using a version of retrieval augmented generation (RAG) driven by the image embedding models 143 and the vector database 110.
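The image-driven retrieval augmented generation flow described above can be sketched end to end; the embeddings, patient records, and prompt wording below are invented for illustration and are not the actual models 143/153 or vector database 110:

```python
import numpy as np

def nearest_patients(query_vec, index, k=2):
    """Find the patients whose stored image embeddings are closest to the
    query embedding (dot-product similarity, linear scan for illustration)."""
    sims = {pid: float(np.dot(vec, query_vec)) for pid, vec in index.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def build_rag_prompt(query_vec, index, ehr_store):
    """Retrieve the EHRs of patients with similar images and prepend them to
    a summarization instruction, per the RAG pattern."""
    patients = nearest_patients(query_vec, index)
    context = "\n".join(ehr_store[p] for p in patients)
    return f"Given these records:\n{context}\nSummarize common findings and abnormalities."

index = {"p1": np.array([1.0, 0.0]), "p2": np.array([0.8, 0.6]), "p3": np.array([0.0, 1.0])}
ehr_store = {
    "p1": "p1: small lung nodule.",
    "p2": "p2: lung nodule, stable.",
    "p3": "p3: normal study.",
}
prompt = build_rag_prompt(np.array([1.0, 0.1]), index, ehr_store)
print("p1" in prompt and "p2" in prompt and "p3" not in prompt)  # True
```

The resulting prompt would then be submitted to a language model; only the retrieval and prompt-assembly steps are shown here.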
[0060] Language models (e.g., large language models) are generally stateless, meaning that after they process a prompt, they have no internal record or memory of the information that was input, or the information that was generated as part of the language model’s processing. Thus,
many systems add statefulness to models using context information. The ML operation module 152 may include one or more sets of computer-executable instructions for maintaining state of this type, maintaining other types of state information, and/or for implementing other context management techniques. For example, ML operation module 152 may implement a sliding window context, wherein a predetermined number of tokens (e.g., 4096 maximum tokens in the case of GPT-3.5, equivalent to about 3000 words) may be “remembered” by the language model and can be used to enrich multiple sequential prompts input into the language model (for example, when the language model is used in a chat mode). In some embodiments, the ML operation module 152 may perform other types of state-maintaining strategies. For example, the ML operation module 152 may implement a strategy in which information from the immediately preceding prompt is part of the window, regardless of the size of that prior prompt. In some aspects, the ML operation module 152 may implement a strategy in which one or more prior prompts are included in each current prompt. This prompt stuffing technique, or prompt concatenation, may be limited by prompt size constraints: once the total size of the prompt exceeds the prompt limit, the model immediately loses state information related to parts of the prompt truncated from the prompt.
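A sliding window context of the kind described above can be sketched with a bounded queue; the window size here is arbitrary and far smaller than a real model's token limit:

```python
from collections import deque

class SlidingWindowContext:
    """Keep only the most recent tokens across chat turns, mimicking a fixed
    context window: older tokens silently fall out of scope."""
    def __init__(self, max_tokens: int):
        self.window = deque(maxlen=max_tokens)

    def add_turn(self, tokens):
        self.window.extend(tokens)

    def current_context(self):
        return list(self.window)

ctx = SlidingWindowContext(max_tokens=5)
ctx.add_turn(["what", "causes", "lung", "nodules"])
ctx.add_turn(["and", "how", "are", "they", "treated"])
print(ctx.current_context())  # ['and', 'how', 'are', 'they', 'treated']
```

After the second turn, all four tokens of the first prompt have been truncated, which is exactly the loss-of-state behavior the paragraph above describes.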
[0061] Returning to the example user computing device 104 of FIG. 1, the user computing device 104 includes one or more communication interfaces 160, one or more processors 162, one or more memories 164, one or more input devices 168a, one or more output devices 168b, and one or more application programming interfaces 170. The memories 164 may have stored thereon one or more modules 180 (e.g., one or more sets of instructions). The user computing device 104 may be a computing device (e.g., a personal computer, a laptop, a smart phone, a tablet, a wearable device, etc.), an individual server, a group (e.g., cluster) of multiple servers, or another suitable type of computing device or system (e.g., a collection of computing resources). In some embodiments, the user computing device 104 may be included in and/or associated with a medical environment. In some embodiments, one or more user computing devices may be included in the computing environment 100 - for example, a first user may access a user computing device 104 that is a laptop, a second user may access a user computing device 104 that is a desktop computer, etc. For example, each of these respective users may receive output data from the server computing device 102 (e.g., outputs from language model 153) via their use of their respective user computing device 104. For ease of reading (and not limitation) purposes, the one or more user computing devices 104 may be referred to herein in the singular.
[0062] The one or more communication interfaces 160 may, similar to the communication interface 120, enable communication with other machines (e.g., server computing device 102) via, for example, the one or more networks 106. The example communication interface 160 may include any suitable type of communication interface(s) (e.g., wired and/or wireless interfaces) configured to operate in accordance with one or more suitable protocol(s). In some embodiments, the communication interface 160 may be a network interface controller (NIC) and may include any suitable NIC, such as wired/wireless controllers (e.g., Ethernet controllers), and facilitate bidirectional/multiplexed networking over the network 106 between the user computing device 104 and the server computing device 102 and/or other components of the environment 100 (e.g., the medical imaging device 108, the vector database 110, the EHR database 112, another computing device, etc.). In some embodiments, the communication interface 160 may include advanced features such as hardware acceleration, specialized networking protocols, etc.
[0063] The processors 162 may include one or more microprocessors, controllers, and/or any suitable type of processor (e.g., one or more processors similar to the processors 122), and the memories 164 (e.g., one or more volatile memories or non-volatile memories similar to the memories 124) may be accessible by the processor 162 (e.g., via a memory controller). The processor 162 may interact with the memory 164 to obtain, for example, machine-readable instructions and/or computer-executable instructions stored in the memory 164 corresponding to, for example, the operations represented by the flowcharts of this disclosure (e.g., the method 200 of FIG. 2, the method 400 of FIG. 4).
[0064] As mentioned above, the memories 164 may have stored thereon one or more modules 180, for example, as one or more sets of computer-executable instructions. The modules 180 may include a model configuration module 181, an application programming interface (API) module 182, and an authentication module 184. In some aspects, more or fewer modules 180 may be included. The computer-executable instructions may be stored on non-transitory computer-readable media, in some aspects. In some aspects, the modules 180 may include additional storage, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OSX, etc.). The operating systems may be configured to run the modules 180 during operation of the user computing device 104. For example, the modules 180 may include additional modules and/or services for receiving and processing data from one or more other components of the environment 100 such as the server computing device 102, the EHR database 112, the vector database 110, or the one or more cloud APIs 114. The modules 180
may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.). The modules 180 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.). In some embodiments, the modules 180 may respond to network requests (e.g., via the API 170) or other requests received via the network 106 (e.g., via the server computing device 102 or other components of the environment 100).
[0065] The memories 164 of the user computing device 104 of FIG. 1 may store instructions for executing a model configuration module 181. In some embodiments, the model configuration module 181 may exchange instructions, information, and/or data with the server computing device 102 (e.g., via the communication interface 160 and over the networks 106). Generally speaking, the model configuration module 181 may configure or select one or more machine learning models (e.g., image embedding models 143, language models 153) based on a modality of a medical image, a patient of interest, one or more conditions/aspects of the patient, user input data, and/or other preprogrammed information. As mentioned above, the server computing device 102 (e.g., the ML operation module 152) may store one or more language models 153 each fine-tuned and/or trained (e.g., via the ML training module 150) to perform a specific task (e.g., learn knowledge corresponding to cancer, a specific kind of cancer, a lung condition, etc.). Additionally, the server computing device 102 (e.g., the image embedding module 142) may store one or more image embedding models 143 each fine-tuned and/or trained on a specific modality of image data. In some embodiments, a user of the user computing device 104 may provide an indication of one or more preferred models (e.g., via the input devices 168a) for image data analysis (e.g., image embedding model 143) and/or text analysis (e.g., language model 153). Additionally or alternatively, the model configuration module 181 may include instructions for generating one or more graphical user interfaces that allow a user (e.g., a clinician) to review and/or select one or more preferred models.
In some embodiments, the model configuration module 181 may enable one or more language models 153 and/or one or more image embedding models 143 to be stored, for example in the memory 124 or in a database accessible via the cloud APIs 114 or via a local electronic database (not depicted). In some embodiments, the model configuration module 181 may access electronic health records for a patient via a personnel records database (not depicted) and/or the EHR database 112 to enable information associated with the patient to be provided to one or more models (e.g., image embedding models 143, language models 153).
[0066] In some aspects, the model configuration module 181 may be omitted from the modules 180, or its access may be restricted to administrative users only (e.g., clinicians). For example, in some aspects, one or more of the modules 180 may be packaged into a downloadable application (e.g., a smart phone app available from an app store) that enables registered but non-privileged (i.e., non-administrative) users to access the environment 100 using their consumer user computing device 104. In other aspects, one or more of the user computing devices 104 may be locked down, such that the user computing device 104 is controlled hardware, accessible only to those who have physical access to certain areas.
[0067] The memories 164 of the user computing device 104 of FIG. 1 may also store instructions for executing an application programming interface (API) module 182. Generally speaking, the API module 182 may access the API(s) 170, the cloud API(s) 114, and/or one or more remote APIs (not depicted), and further may enable one or more components of the environment 100 to access functionality of the user computing device 104 and/or the server computing device 102. For example, the API module 182 may enable a remote user to query a language model 153 via the cloud APIs 114. In some embodiments, the API module 182 may enable other user applications (i.e., applications not facilitated by the modules 180) to connect to the user computing device 104, for example, to send queries or prompts, and to receive responses from the user computing device 104. The API module 182 may include instructions for authentication, rate limiting, and error handling.
[0068] The memories 164 of the user computing device 104 of FIG. 1 may also store instructions for executing an authentication module 184. Generally speaking, the authentication module 184 may implement access control mechanisms for one or more trained models (e.g., image embedding models 143, language models 153), ensuring that a model can only be accessed by those who are authorized to do so, and that the access of those users is private and secure. It should be appreciated that the authentication module 184 may assign permissions to users based upon their respective positions (e.g., physician, nurse, physician assistant, etc.).
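A role-based access check of the kind the authentication module 184 might perform can be sketched as follows. This is a minimal illustration only; the role names and action names are hypothetical, and a real deployment would tie permissions to the institution's identity provider rather than a hard-coded mapping:

```python
# Hypothetical role-to-permission mapping; actual roles and actions would
# be defined by the deploying institution and its identity provider.
ROLE_PERMISSIONS = {
    "physician": {"query_model", "view_summary", "view_image"},
    "physician_assistant": {"view_summary", "view_image"},
    "nurse": {"view_summary"},
}

def is_authorized(role: str, action: str) -> bool:
    """Return True if the given role is permitted to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```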
[0069] As mentioned above, the present techniques may store training data, training parameters, and/or trained models in an electronic database (not depicted), accessible via the cloud APIs 114, for example. Specifically, one or more trained machine learning models (e.g., language models 153, image embedding models 143) may be serialized and stored in a database (e.g., as a binary, a JSON object, etc.). Such a model can later be retrieved, deserialized, loaded into memory, and then used for predictive purposes. The one or more trained models and their respective training parameters (e.g., weights) may also be stored as
blob objects. In some embodiments, cloud computing APIs (e.g., AWS SageMaker, Google AI Platform, Azure Machine Learning) may also be used to store trained models, via the cloud APIs 114.
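The serialize/store/deserialize cycle described above can be sketched in a few lines. This example uses Python's standard `pickle` and `base64` modules and a stand-in parameter dictionary; an actual trained model object would be serialized the same way, provided its class supports pickling:

```python
import pickle
import base64

# Stand-in "trained model" parameters for illustration; a real model's
# weights would be serialized identically.
model_params = {"weights": [0.12, -0.34, 0.56], "bias": 0.05}

blob = pickle.dumps(model_params)           # serialize to a binary blob
stored = base64.b64encode(blob).decode()    # encode for a text/JSON column
restored = pickle.loads(base64.b64decode(stored))  # deserialize for use
```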
[0070] The input device(s) 168a and the output device(s) 168b of the user computing device 104 may enable receipt of user input and communication of output data to the user. The input devices 168a may include one or more suitable types of user input devices, such as keyboards, touch screen displays, microphones, mice, touchpads, and/or any suitable types of remote and/or local user input devices. The output devices 168b may include one or more suitable types of output devices, such as touch screen displays, speakers, and the like. In some embodiments, the output devices 168b may present or display information to a user. For example, a display 168b may use any suitable display technology (e.g., LED, OLED, LCD, etc.). In some embodiments, the input device 168a may be integrated with the output device 168b as a touchscreen display. Further, the input devices 168a and the output devices 168b may combine to form an integral user interface to enable a user of the user computing device 104 to interact with graphical user interfaces (GUIs) provided by the user computing device 104. For example, the input devices 168a and/or the output devices 168b may be configured to present outputs generated by the language models 153 for review by a user. As another example, the input devices 168a and/or the output devices 168b may enable a user to interact with a graphical user interface for selecting and/or configuring one or more machine learning models (e.g., via the model configuration module 181). As yet another example, the input devices 168a and/or the output devices 168b may enable a user to review medical images captured by the medical imaging device 108. In some embodiments, the input devices 168a and the output devices 168b are not integral to the user computing device 104 and receive instructions from the user computing device 104 via wired and/or wireless transmissions over communication interface 160, for example.
In some embodiments, the input devices 168a and the output devices 168b may include one or more local interfaces, and/or may include one or more remote interfaces that are communicatively connected to the user computing device 104 and/or the server computing device 102 via the network 106 (e.g., that are provided by an application, web browser, or other software executing on a device of a user). For ease of reading (and not limitation) purposes, the input devices 168a and/or the output devices 168b may be referred to herein using the singular tense.
[0071] The user computing device 104 and the server computing device 102 may communicate with one another via the network 106. In some aspects, the user computing
device 104 and/or the server computing device 102 may offload some or all of their respective functionality to the one or more cloud APIs 114. In some aspects, the one or more cloud APIs 114 may include one or more public clouds, one or more private clouds, and/or one or more hybrid clouds. The one or more cloud APIs 114 may include one or more resources provided under one or more service models, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Function as a Service (FaaS). For example, the one or more cloud APIs 114 may include one or more cloud computing resources, such as computing instances, electronic databases, operating systems, email resources, etc. The one or more cloud APIs 114 may include distributed computing resources that enable, for example, the ML training module 150 and/or other of the modules 140 to distribute parallel model training jobs across many processors. In some aspects, the one or more cloud APIs 114 may include one or more language operation APIs, such as OpenAI, Med-PaLM, Bing, Claude.ai, etc. In other aspects, the one or more cloud APIs 114 may include an API configured to operate one or more open-source models, such as Llama 2.
[0072] The medical imaging device 108 may be a computed tomography (CT) scanner, a positron emission tomography (PET) scanner, an X-ray machine, a magnetic resonance imaging (MRI) machine, an ultrasound machine, another three-dimensional medical imaging device, or any other suitable medical imaging device. Additionally, the medical imaging device 108 may communicate with the components of the computing environment 100, such as the server computing device 102 and the user computing device 104, via, for example, the network 106. For example, the medical imaging device 108 may send CT image data, PET image data, X-ray image data, MRI image data, and/or other medical image data to the components of the computing environment (e.g., the server computing device 102, the user computing device 104, the EHR database 112, the vector database 110, etc.).
[0073] The vector database 110 may be a specialized electronic database (e.g., a vector store, a graph database, etc.) that may be used to store and query embeddings. The embeddings may be vector representations of medical images collected from a clinical archive. Further, the embeddings stored on the vector database 110 may be generated and/or retrieved by the image embedding module 142 (e.g., via image embedding models 143). In some embodiments, due to the size of some clinical archives, the embedding sets stored in the vector database 110 may number in the millions, far larger than conventional medical image embedding sets. Embedding databases and/or vector databases may include specialized features, such as efficient retrieval, similarity search, and scalability. Further, the embeddings
stored in the vector database 110 may include identifying information for the associated patients, thereby further enhancing retrieval capabilities of the vector database 110. For example, a query embedding (e.g., for a medical image of a patient) may be issued against the vector database 110, and the vector database 110 may provide a plurality of similar embeddings and/or associated identifying information for the corresponding medical images to the server computing device 102 (e.g., image embedding module 142). In some aspects, a remote embedding database service may be used (e.g., via the cloud APIs 114). Such a remote embedding database service may be based on an open-source or proprietary offering (e.g., Milvus, Pinecone, Redis, Postgres, MongoDB, Facebook AI Similarity Search (FAISS), etc.). In some embodiments, one or more vector databases 110 may include organized sets of vectors grouped based on a modality of the medical image they represent, or a medical topic they relate to.
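The core similarity-search operation a vector database performs can be illustrated with a minimal pure-Python sketch. A production system would use an optimized engine such as FAISS or Milvus, as noted above; the patient identifiers and two-dimensional embeddings here are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, records, k=3):
    """records: list of (patient_id, embedding) pairs.
    Returns the k records most similar to the query embedding."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in records]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```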
[0074] The electronic health record (EHR) database 112 may be an electronic database that stores all suitable EHR data available in a longitudinal patient record (LPR) or other medical institution database. Typically, a longitudinal patient record stores EHR data chronologically for each patient of a medical institution. In some embodiments, the EHR database 112 may store sets of electronic health records organized based on associated demographic information (e.g., age, sex, body weight, medical condition). Additionally or alternatively, the sets of EHR data stored by the EHR database 112 may consolidate the electronic health record of an individual. In some embodiments, the EHR database 112 may store EHR data for a portion of the plurality of patients associated with a longitudinal patient record. Limiting the stored EHR data in this way is advantageous for training, because searching through all available EHR data may be highly inefficient. For example, the EHR database 112 may only store EHR data that relates to patients with medical conditions identifiable through medical imaging techniques. In some embodiments, the environment 100 may include one or more electronic databases (e.g., EHR database 112), such as a relational database that uses structured query language (SQL) and/or a NoSQL database or other schema-less database suited for the storage of unstructured or semi-structured data. It should be noted that the data from a clinical archive included in the EHR database 112 may include a wide variety of metadata, such as dates, times, doctor’s notes, etc., that may, using the techniques described herein, be analyzed by the language models (e.g., language models 153) described herein, thereby further enhancing the ability to leverage information included in a clinical archive.
Example Methods
[0075] FIG. 2 depicts an exemplary block flow diagram for a method 200, according to one or more aspects. The method 200 may include offline processes 202 and online processes 204. Components of the offline processes 202 and online processes 204 may utilize the vector database 110 and the EHR database 112 of FIG. 1, for example. The method 200 may be implemented in part or in full by the server computing device 102, the user computing device 104, and/or the cloud APIs 114. Though the method below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement method 200.
[0076] The offline processes 202 may include receiving/obtaining (e.g., by the server computing device 102) one or more medical images from a medical image archive 210 at an anatomy specific encoder 212 (e.g., image embedding model 143). For example, the anatomy specific encoder 212 may be a self-supervised image embedding model trained to encode medical images of a specific anatomical region (e.g., the chest, the respiratory system, the lungs, etc.) into one or more feature vectors. Further, the anatomy specific encoder 212 may generate feature vectors for the one or more medical images from the medical image archive 210. The method 200 may also include storing the feature vectors generated by the anatomy specific encoder 212 in the vector database 110. As mentioned above, the vector database 110 may store identifying information for each respective vector embedding. For example, the identifying information may include patient identifiers, patient EHR record identifiers, etc. Furthermore, the vector database 110 may store embeddings for all available medical images in a clinical archive, such as the EHR database 112. In this way, large clinical archives of unlabeled historical data may be leveraged through embedding and vectorization techniques. Moreover, a query embedding for the query medical image 220 may be issued to the vector database 110, and every historical patient who has an electronic health record in the clinical archive and whose medical image is similar to the query medical image 220 may be identified efficiently (e.g., in less than a second, or in a matter of seconds). Advantageously, the electronic health records for these historical patients may be summarized using the exemplary language models described herein, thereby providing a robust and efficient analysis of relevant, or potentially relevant, medical histories for a clinician.
Additionally, the embeddings generated by the anatomy specific encoder 212 may be stored in a structured format. For example, one or more vector databases 110 may respectively correspond to one or more anatomy specific encoders 212 each trained to encode medical images of a specific anatomical region. Further, the vector database 110 may store feature vectors / embeddings in a structured format based on a corresponding anatomical region. For ease of reading herein (and not for limitation
purposes), the vector database 110 and/or the anatomy specific encoder 212 may be referred to using the singular tense.
[0077] The online processes 204 may include receiving/obtaining (e.g., by the server computing device 102) a query medical image 220 at an anatomy specific encoder 222 (e.g., image embedding model 143). For example, the anatomy specific encoder 222 may be a self-supervised image embedding model trained to encode medical images of a specific anatomical region into one or more feature vectors, similar to the anatomy specific encoder 212. Further, the anatomy specific encoder 222 may generate one or more feature vectors for the query medical image 220. Depending on the implementation, the query medical image 220 may be obtained from the medical imaging device 108, the EHR database 112, or another clinical archive/source. In some embodiments, the anatomy specific encoder 212 may instead receive/obtain the query medical image 220 and generate the corresponding feature vector for the nearest neighbor search 224.
[0078] At block 224, a nearest neighbor search is performed for the feature vectors of the query medical image 220. As mentioned above, the vector database 110 may store feature vectors / embeddings in a structured format based on a corresponding anatomical region. Further, the method 200 may include querying the vector database 110 using the feature vectors generated by the anatomy specific encoder 222 to identify similar vector embeddings (e.g., embeddings corresponding to medical images specific to the anatomical region of interest and from the medical image archive 210). In some embodiments, the method 200 may include specifying additional parameters when querying the vector database 110, such as a filter restricting the query to a subset of embeddings stored on the vector database 110 corresponding to the anatomical region of interest, a filter limiting the output to a specific number of nearest embeddings, etc. Moreover, the additional parameters may be used to filter out noise vectors to improve output quality. For example, the query medical image 220 may be suspected to depict a very rare or very common condition. If the condition is suspected to be rare, the query may specify that the three nearest embeddings to the query vector are to be output. In the case of a more common condition, the query may specify that the twenty nearest embeddings to the query vector are to be output.
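The query parameters described above can be sketched as a simple builder function. The parameter names (`vector`, `filter`, `top_k`) are hypothetical; real vector stores (Milvus, Pinecone, FAISS wrappers, etc.) each use their own query schema:

```python
def build_vector_query(query_vector, anatomical_region, suspected_rare):
    """Assemble query parameters for a vector store lookup.

    Hypothetical schema for illustration: restricts the search to one
    anatomical region and sizes the neighbor count by suspected rarity.
    """
    return {
        "vector": query_vector,
        "filter": {"anatomical_region": anatomical_region},
        # A rare suspected condition warrants fewer, closer neighbors;
        # a common one can cast a wider net.
        "top_k": 3 if suspected_rare else 20,
    }
```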
[0079] At block 226, the method 200 may include identifying N nearest patients associated with the embeddings from the vector database 110 responsive to the nearest neighbor search 224. Further, the method 200 may include obtaining identifying information for each of the N nearest patients. Additionally, the method 200 may include querying the EHR database 112
using the identifying information for the N nearest patients to obtain EHR data (block 230), such as radiology reports and other text data associated with the patient and/or the medical images for the patient corresponding to the responsive embeddings from the vector database 110.
[0080] The online process 204 of the method 200 may also include generating, via language model 232 (e.g., large language model 153), a summary of the EHR data (block 234). In some embodiments, the method 200 may include processing the EHR data 230 (e.g., radiology reports, clinical notes, treatments, clinical outcomes, and/or other text data) using the language model 232 to generate a plain language summary. For example, the summary of EHR data 234 may include indications of common findings, recurring abnormalities, and/or general commonalities for the N nearest patients 226. Further, the summary of EHR data 234 may detail recurring demographic information (e.g., age, weight, sex, etc.), medication history, vital signs (e.g., blood pressure, heart rate, etc.), medical procedures, familial medical history, subject habits (e.g., smoking, drinking, etc.), or abnormalities (e.g., medical conditions, chronic illnesses, etc.) associated with the N nearest patients 226. In operation, the summary of EHR data 234 may be provided to a clinician reviewing the medical image 220 to supplement clinical review of the image. For example, the user computing device 104 may present the summary of EHR data 234 via the output device 168b. In an example use case, an image embedding may be generated in response to the medical imaging device 108 capturing a medical image, and a summary of common findings and abnormalities of the related patients may be provided to a clinician automatically. Expanding on this use case, the embeddings generated automatically for a new medical image may be stored and compared to historical embeddings for the patient to, for example, track an organ's health over time and easily identify abnormalities across the embeddings for a patient.
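One way the retrieved EHR text might be assembled into a summarization prompt for the language model is sketched below. The instruction wording and record delimiter are illustrative assumptions, not a prescribed prompt format:

```python
def build_summary_prompt(ehr_texts):
    """Concatenate retrieved EHR text into one summarization prompt.

    ehr_texts: list of de-identified record strings for the N nearest
    patients. The instruction text and "---" delimiter are illustrative.
    """
    header = (
        "Summarize the common findings, recurring abnormalities, and "
        "shared demographics across the following de-identified patient "
        "records. Use plain language suitable for clinical review.\n\n"
    )
    body = "\n---\n".join(
        f"Patient record {i + 1}:\n{text}"
        for i, text in enumerate(ehr_texts)
    )
    return header + body
```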
Example Embedding Models
[0081] FIG. 3 depicts an exemplary block flow diagram for an architecture of an image embedding model 300, according to one or more aspects. In some embodiments, the image embedding model 300 (e.g., embedding models 143) may be a transformer autoencoder, such as a self-supervised autoencoder trained using contrastive learning techniques on unlabeled historical medical images. Further, the image embedding model 300 may include a linear projection layer 310 and a transformer encoder layer 320. The embedding model 300 may be implemented in part or in full by the server computing device 102, the user computing device 104, and/or the cloud APIs 114. Though the embedding model 300 below is described
with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement embedding model 300.
[0082] In an example scenario, the server computing device 102 obtains a medical image 302 from the medical imaging device 108. The ML operation module 152 may then identify a region of interest (e.g., a portion of the image depicting a specific organ or region) in the medical image 302 using one or more machine learning models (e.g., a CNN, a multi-modal language model 153), as described below with respect to FIG. 5. Based on the identified region of interest, the ML operation module 152 may obtain an image patch 304 corresponding to the identified region of interest in the medical image 302.
[0083] The ML operation module 152 may provide the image patch 304 to the embedding model 300, and specifically, the image patch 304 may be flattened (e.g., by the ML operation module 152 and/or by a flattening layer of the embedding model 300) into a one-dimensional (1D) array (e.g., a patch of 10 by 10 pixels with 2 color channels would be flattened into a 1D array with 200 elements) and input to the linear projection layer 310. The linear projection layer 310 may transform the 1D array into a lower dimensional raw embedding 312 (e.g., with index values 314A-314N and color channel values 316A-316N). The raw embedding 312 may be passed to the transformer encoder layer 320, including a self-attention mechanism or layer(s) (e.g., a multi-head self-attention mechanism), one or more feed-forward network layers, one or more layer normalization layers, and/or one or more residual connections around the self-attention layers and the feed-forward layers. The transformer encoder layer 320 may output the image embedding 330 (e.g., with elements 332A-332N) for the image patch 304. The ML operation module 152 may then query the vector database 110 using the image embedding 330 and/or add the image embedding 330 to the vector database 110, as described with respect to FIG. 1-2 and FIG. 4.
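The flatten-and-project steps can be sketched in NumPy using the 10x10x2 patch from the example above. The projection dimension (64) and the random stand-in weights are purely illustrative; trained weights would come from the embedding model, and the transformer encoder stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 10x10-pixel patch with 2 color channels, as in the example above.
patch = rng.random((10, 10, 2))
flat = patch.reshape(-1)          # flattened to a 1D array of 200 elements

# Learned projection weights would come from training; random values
# stand in here purely to illustrate the shapes involved.
d_model = 64                      # illustrative embedding dimension
W = rng.standard_normal((200, d_model)) * 0.02
raw_embedding = flat @ W          # lower-dimensional raw embedding
```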
Example Methods
[0084] Referring next to FIG. 4, a flowchart illustrates a computer-implemented method 400 for predicting abnormalities in medical images. The method 400 of FIG. 4 may be implemented by the processors 122, the processors 162, and/or other suitable processors, etc., executing instructions stored on the memories 124, the memories 164, and/or another suitable non-transitory computer readable medium, etc., described above with respect to FIG. 1-3. Though the method below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement method 400.
[0085] At block 402, the image embedding module 142 may obtain a digital medical image of an anatomical region of interest corresponding to a patient. In some embodiments, the digital medical image may be captured by medical imaging device 108, or another suitable medical imaging device, and obtained by the server computing device 102. In some embodiments, the digital medical image corresponding to the patient and the plurality of historical digital medical images are either: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
[0086] At block 404, the image embedding module 142 may generate a feature vector by inputting the digital medical image to a trained image embedding model (e.g., image embedding model 143), the feature vector corresponding to a plurality of image features of the digital medical image. In some embodiments, the method 400 further comprises: generating the trained image embedding model (e.g., via the image embedding module 142 and/or the ML training module 150) by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques. In some embodiments, the image embedding module 142 may generate the feature vectors by transforming, via the trained image embedding model, three dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
[0087] At block 406, the image embedding module 142 may obtain one or more similar feature vectors from a feature vector datastore (e.g., vector database 110 of the computing environment 100) based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive. In some embodiments, the image embedding module 142 may generate a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter and issue the vector store query against the feature vector datastore to obtain the one or more similar feature vectors. In some embodiments, the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
[0088] At block 408, the ML operation module 152 may generate at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model (e.g., language model 153). In some
embodiments, the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures, or (v) clinical outcomes. In some embodiments, the ML operation module 152 may generate the at least one summary by determining a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors and obtaining the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information. In some embodiments, the trained language model (e.g., language model 153) is generated by training a language model on a set of preprocessed EHR data. Further, the method 400 may comprise: preprocessing a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tuning the trained language model using the replicated radiology reports. In some embodiments, the ML operation module 152 may generate the at least one summary by inputting, with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model. In some embodiments, the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
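The preprocessing steps above (tokenize, split into train/test, replicate radiology reports) can be sketched as follows. Whitespace tokenization and the `(kind, text)` record format are simplifying assumptions; a real pipeline would use a subword tokenizer matched to the language model:

```python
import random

def preprocess_ehr(records, test_fraction=0.2, seed=42):
    """records: list of (kind, text) pairs from a set of EHR data.

    Radiology reports are replicated so they carry extra weight during
    fine-tuning; all text is whitespace-tokenized for illustration.
    Returns (train, test) lists of token sequences.
    """
    expanded = []
    for kind, text in records:
        expanded.append(text)
        if kind == "radiology_report":
            expanded.append(text)          # replicate radiology reports
    tokenized = [text.lower().split() for text in expanded]
    rnd = random.Random(seed)              # deterministic shuffle/split
    rnd.shuffle(tokenized)
    n_test = max(1, int(len(tokenized) * test_fraction))
    return tokenized[n_test:], tokenized[:n_test]
```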
[0089] At block 410, the user computing device 104 may present, via a graphical user interface (e.g., a graphical user interface presented via the output device 168b and/or the input device 168a) and for review by a clinician, one or more of: the at least one summary, or the digital medical image. In some embodiments, the clinician is a radiologist.
Example Methods
[0090] Referring next to FIG. 5, a flowchart illustrates a computer-implemented method 500 for generating embeddings for medical images. The method 500 of FIG. 5 may be implemented by the processors 122, the processors 162, and/or other suitable processors, etc., executing instructions stored on the memories 124, the memories 164, and/or another suitable non-transitory computer readable medium, etc., described above with respect to FIG. 1-3. Though the method below is described with regard to the computing environment 100, it will be recognized that any similarly suitable environment(s) and/or system(s) may be used to implement method 500.
[0091] At block 502, the ML operation module 152 may identify an anatomical region depicted in a medical image (e.g., medical image 302 of FIG. 3, medical image 220 of FIG. 2) using one or more machine learning models, such as a convolutional neural network (CNN), a multi-modal language model (e.g., language models 153), and/or another machine learning model capable of processing image data and providing a classification of the image data. For example, the ML training module 150 may fine-tune and/or train the one or more machine learning models on general medical knowledge (e.g., information from clinical data sources), historical medical images, various medical conditions (e.g., clinical documentation related to medical conditions), etc.
[0092] At block 504, the ML operation module 152 may identify a region of interest (e.g., a bounding box containing the image patch 304 of FIG. 3) containing an organ in the medical image using the one or more machine learning models.
[0093] At block 506, the ML operation module 152 may select a trained image embedding model (e.g., a trained autoencoder) from among a plurality of image embedding models (e.g., image embedding models 143) based on one or more of: the anatomical region depicted in the medical image, or the organ contained in the region of interest. For example, the trained image embedding model may be fine-tuned and/or trained (e.g., by the image embedding module 142 and/or the ML training module 150) on medical images of a particular modality (e.g., X-ray images, CT images, etc.), medical images of a particular anatomical region, and/or image patches of a particular organ or sub-region (e.g., an image patch of a vertebra from an image of a spine).
[0094] At block 508, the ML operation module 152 may apply the trained image embedding model to the medical image to generate an embedding for the medical image. The ML operation module 152 may then query a vector store (e.g., vector database 110) with the image embedding to retrieve similar embeddings, as described above with respect to FIG. 1-4, and/or the ML operation module 152 may add the image embedding to the vector store.
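The model-selection logic of blocks 506–508 can be sketched as a lookup that prefers the most specific available encoder. The registry keys and model names here are hypothetical placeholders for trained image embedding models 143:

```python
def select_embedding_model(registry, anatomical_region, organ=None):
    """Pick the most specific trained encoder available: an organ-level
    model first, then a region-level model, then a general fallback.
    The registry key scheme is a hypothetical illustration."""
    if organ is not None and ("organ", organ) in registry:
        return registry[("organ", organ)]
    if ("region", anatomical_region) in registry:
        return registry[("region", anatomical_region)]
    return registry["default"]
```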
OTHER MATTERS
[0095] Although the text herein sets forth a detailed description of numerous different embodiments, it should be understood that the legal scope of the invention is defined by the words of the claims set forth at the end of this patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment, as describing every possible embodiment would be impractical, if not impossible. One could implement numerous
alternate embodiments, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims.
[0096] It should also be understood that, unless a term is expressly defined in this patent using the sentence “As used herein, the term ‘ ’ is hereby defined to mean...” or a similar sentence, there is no intent to limit the meaning of that term, either expressly or by implication, beyond its plain or ordinary meaning, and such term should not be interpreted to be limited in scope based upon any statement made in any section of this patent (other than the language of the claims). To the extent that any term recited in the claims at the end of this disclosure is referred to in this disclosure in a manner consistent with a single meaning, that is done for sake of clarity only so as to not confuse the reader, and it is not intended that such claim term be limited, by implication or otherwise, to that single meaning.
[0097] Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
[0098] Additionally, certain embodiments are described herein as including logic or a number of routines, subroutines, applications, or instructions. These may constitute either software (code embodied on a non-transitory, tangible machine-readable medium) or hardware. In hardware, the routines, etc., are tangible units capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
[0099] In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC) to perform certain
operations). A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
[00100] Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
[00101] Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
[00102] The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently
configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
[00103] Similarly, the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of geographic locations.
[00104] Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
[00105] As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
[00106] Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. For example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
[00107] As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed
or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
[00108] In addition, use of the “a” or “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the description. This description, and the claims that follow, should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
[00109] Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for the approaches described herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
[00110] The particular features, structures, or characteristics of any specific embodiment may be combined in any suitable manner and in any suitable combination with one or more other embodiments, including the use of selected features without corresponding use of other features. In addition, many modifications may be made to adapt a particular application, situation or material to the essential scope and spirit of the present invention. It is to be understood that other variations and modifications of the embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered part of the spirit and scope of the present invention.
[00111] While the preferred embodiments of the invention have been described, it should be understood that the invention is not so limited and modifications may be made without departing from the invention. The scope of the invention is defined by the appended claims, and all devices that come within the meaning of the claims, either literally or by equivalence, are intended to be embraced therein.
[00112] It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention.
[00113] Furthermore, the patent claims at the end of this patent application are not intended to be construed under 35 U.S.C. § 112(f) unless traditional means-plus-function language is expressly recited, such as “means for” or “step for” language being explicitly recited in the claim(s). The systems and methods described herein are directed to an improvement to computer functionality, and improve the functioning of conventional computers.
Claims
1. A computer-implemented method for predicting abnormalities in medical images, the method comprising: obtaining, by one or more processors, a digital medical image of an anatomical region of interest corresponding to a patient; generating, by the one or more processors, a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; obtaining, by the one or more processors, one or more similar feature vectors from a feature vector datastore based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; generating, by the one or more processors, at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and presenting, via a graphical user interface and for review by a clinician, one or more of: the at least one summary, or the digital medical image.
2. The computer-implemented method of claim 1, further comprising: generating the trained image embedding model by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
3. The computer-implemented method of claim 2, wherein generating the feature vector comprises: transforming, via the trained image embedding model, three dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
4. The computer-implemented method of claim 1, further comprising: generating, by the one or more processors, a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter; and issuing, by the one or more processors, the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
5. The computer-implemented method of claim 4, wherein the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical
image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
6. The computer-implemented method of claim 1, wherein the digital medical image corresponding to the patient and the plurality of historical digital medical images are one of: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
7. The computer-implemented method of claim 6, wherein the clinician is a radiologist.
8. The computer-implemented method of claim 1, wherein generating the at least one summary comprises: determining, by the one or more processors, a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors; and obtaining, by the one or more processors, the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
9. The computer-implemented method of claim 8, wherein the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures, or (v) clinical outcomes.
10. The computer-implemented method of claim 1, wherein the trained language model is generated by training a language model on a set of preprocessed EHR data, and the method further comprises: preprocessing, by the one or more processors, a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tuning, by the one or more processors, the trained language model using the replicated radiology reports.
11. The computer-implemented method of claim 10, wherein generating the at least one summary comprises:
inputting, by the one or more processors and with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
12. The computer-implemented method of claim 10, wherein the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
13. A computing system for predicting abnormalities in medical images, the computing system comprising: one or more medical imaging devices; one or more processors; and one or more non-transitory memories, the one or more non-transitory memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to: obtain, from a medical imaging device, a digital medical image of an anatomical region of interest corresponding to a patient; generate a feature vector by inputting the digital medical image to a trained image embedding model, the feature vector corresponding to a plurality of image features of the digital medical image; obtain one or more similar feature vectors from a feature vector datastore based on the feature vector corresponding to the digital medical image, the feature vector datastore including a plurality of feature vectors corresponding to a plurality of historical digital medical images included in a clinical archive; generate at least one summary of common findings and abnormalities of one or more historical patients associated with the one or more similar feature vectors by inputting electronic health records (EHR) associated with the one or more historical patients to a trained language model; and present, via a graphical user interface and for review by a clinician, one or more of: the at least one summary, or the digital medical image.
14. The computing system of claim 13, the one or more non-transitory memories having stored thereon computer executable instructions that, when executed by the one or more processors, cause the one or more processors to: generate the trained image embedding model by training a self-supervised autoencoder on unlabeled historical medical images using contrastive learning techniques.
15. The computing system of claim 14, the one or more non-transitory memories having stored thereon computer executable instructions that, when executed by the one or more processors, generate the feature vector by causing the one or more processors to: transform, via the trained image embedding model, three dimensional regions of the digital medical image to two-dimensional image features or one-dimensional image features.
16. The computing system of claim 13, the one or more non-transitory memories having stored thereon computer executable instructions that, when executed by the one or more processors, cause the one or more processors to: generate a vector store query specifying the feature vector corresponding to the digital medical image and at least one vector store filter; and issue the vector store query against the feature vector datastore to obtain the one or more similar feature vectors.
17. The computing system of claim 16, wherein the at least one vector store filter restricts results of the vector store query to feature vectors corresponding to at least one of: the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
18. The computing system of claim 13, wherein the digital medical image corresponding to the patient and the plurality of historical digital medical images are one of: computed tomography (CT) images, X-ray images, magnetic resonance imaging (MRI) images, ultrasound images, or positron emission tomography (PET) images.
19. The computing system of claim 18, wherein the clinician is a radiologist.
20. The computing system of claim 13, the one or more non-transitory memories having stored thereon computer executable instructions that, when executed by the one or more processors, generate the at least one summary by causing the one or more processors to: determine a list of identifying information corresponding to the one or more historical patients associated with the one or more similar feature vectors; and obtain the EHR associated with the one or more historical patients by querying a longitudinal patient record (LPR) with the list of identifying information.
21. The computing system of claim 20, wherein the EHR includes at least one of: (i) radiology reports, (ii) clinical notes, (iii) lab results, (iv) procedures, or (v) clinical outcomes.
22. The computing system of claim 13, wherein the trained language model is generated by training a language model on a set of preprocessed EHR data, and wherein the
one or more non-transitory memories have stored thereon computer executable instructions that, when executed by the one or more processors, cause the one or more processors to: preprocess a set of EHR data corresponding to a plurality of patients by: (i) tokenizing the set of EHR data, (ii) formatting the tokenized set of EHR data for input to the language model, (iii) splitting the formatted EHR data into an EHR training data set and an EHR testing data set, and (iv) replicating radiology reports included in the set of EHR data; and fine-tune the trained language model using the replicated radiology reports.
23. The computing system of claim 22, the one or more non-transitory memories having stored thereon computer executable instructions that, when executed by the one or more processors, generate the at least one summary by causing the one or more processors to: input, with the EHR associated with the one or more historical patients, a prompt defining a summarization task to the trained language model.
24. The computing system of claim 22, wherein the set of preprocessed EHR data is associated with the anatomical region of the digital medical image, a modality of the digital medical image, a sex of the patient, a medical condition of the patient, an organ of interest, or an age group.
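The EHR preprocessing recited in claims 10 and 22 — tokenizing, formatting, splitting into training and testing sets, and replicating radiology reports — can be illustrated with a minimal Python sketch. The record layout, the `preprocess_ehr` helper, and the whitespace tokenizer below are hypothetical stand-ins for exposition, not the claimed implementation:

```python
import random

def preprocess_ehr(records, test_fraction=0.25, seed=0):
    """Illustrative EHR preprocessing mirroring steps (i)-(iv) of claim 10."""
    # (i) Tokenize each record (trivial whitespace tokenizer as a placeholder).
    tokenized = [{"kind": r["kind"], "tokens": r["text"].split()} for r in records]
    # (ii) Format the tokenized records for language-model input.
    formatted = [
        {"kind": t["kind"], "input": " ".join(t["tokens"]).lower()} for t in tokenized
    ]
    # (iii) Split the formatted data into training and testing sets.
    rng = random.Random(seed)
    shuffled = formatted[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test_set, train_set = shuffled[:n_test], shuffled[n_test:]
    # (iv) Replicate radiology reports (e.g., to up-weight them for fine-tuning).
    replicated = [r for r in train_set if r["kind"] == "radiology_report"] * 2
    return train_set, test_set, replicated

records = [
    {"kind": "radiology_report", "text": "Nodule in right upper lobe"},
    {"kind": "clinical_note", "text": "Patient reports mild cough"},
    {"kind": "lab_result", "text": "WBC within normal range"},
    {"kind": "radiology_report", "text": "No acute findings"},
]
train, test, reps = preprocess_ehr(records)
print(len(train), len(test))  # → 3 1
```

The replication factor, split fraction, and tokenizer would in practice be chosen to match the language model being fine-tuned; the sketch only fixes the order of the four preprocessing steps.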
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363602536P | 2023-11-24 | 2023-11-24 | |
| US 63/602,536 | 2023-11-24 | | |
| GR20240100831 | 2024-11-21 | | |
| GR20240100832 | 2024-11-22 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025111606A1 (en) | 2025-05-30 |
Family
ID=93924936
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/057320 (WO2025111606A1, pending) | Methods and systems for vector embedding search for predicting abnormalities in medical images | 2023-11-24 | 2024-11-25 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025111606A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN121148734A (en) * | 2025-11-18 | 2025-12-16 | 浙江飞图影像科技有限公司 | Medical image data management method and system based on large language model |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230154164A1 (en) * | 2021-11-12 | 2023-05-18 | Siemens Healthcare Gmbh | Self-supervised learning for artificial intelligence-based systems for medical imaging analysis |
| EP4227960A1 (en) * | 2022-02-14 | 2023-08-16 | Siemens Healthcare GmbH | Generation of findings in radiology reports by machine learning based on impressions |
| EP4266195A1 (en) * | 2022-04-19 | 2023-10-25 | Microsoft Technology Licensing, LLC | Training of text and image models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24827442; Country of ref document: EP; Kind code of ref document: A1 |