
WO2022066725A1 - Training end-to-end weakly supervised networks in a multi-task fashion at the specimen (supra-image) level - Google Patents


Info

Publication number
WO2022066725A1
WO2022066725A1 (application PCT/US2021/051495)
Authority
WO
WIPO (PCT)
Prior art keywords
pathological
image
supra
versus
neural network
Application number
PCT/US2021/051495
Other languages
French (fr)
Inventor
Julianna IANNI
Saul KOHN
Sivaramakrishnan SANKARAPANDIAN
Rajath Elias SOANS
Original Assignee
Proscia Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Proscia Inc.
Publication of WO2022066725A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/20 ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Definitions

  • This disclosure relates generally to machine learning, e.g., in the context of pathology, such as dermatopathology.
  • a supra-image level weak label provides information for its supra-image rather than the individual constituent images. Labelling constituent images with the weak label from their supra-image and using them separately can introduce noise into the training process, because one constituent image may contain no features relevant to its label. That is, ignoring the connection of the constituent images that form a supra-image and simply using them separately to train a machine learning classifier produces inaccurate and unsatisfactory results.
  • weakly-supervised networks have been trained to perform a single task on a single image or image sub-region.
  • the task is often either classification, where input data is sorted into one or more output classes, or regression, where a single number (e.g., a probability of a particular classification) is predicted based on the input data. If two classifications on the same input data are desired, then two networks, one for each classification, would be required, according to past techniques.
  • a computer-implemented method of classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks includes: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset including at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image including a plurality of images, each image corresponding to a plurality of components, where the training dataset provides at least one batch of components, where the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, where the electronic neural network includes a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch including a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and
  • the plurality of binary pathological classification tasks can include: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
  • the plurality of binary pathological classification tasks can include: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
  • the plurality of binary pathological classification tasks can include: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
  • the plurality of binary pathological classification tasks can include: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
  • the plurality of binary pathological classification tasks can include: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
  • the plurality of binary pathological classification tasks can include: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
  • the plurality of pathological classes can consist of a number c of pathological classes, and the multiple pathological tasks can consist of a number c(c-1)/2 of binary classification tasks.
  • Each component can include a feature vector.
  • the plurality of pathological classes can include a plurality of dermatopathological classes.
  • the system includes a processor; and a memory communicatively coupled to the processor, the memory storing instructions which, when executed on the processor, perform operations including: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset including at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image including a plurality of images, each image corresponding to a plurality of components, where the training dataset provides at least one batch of components, where the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, where the electronic neural network includes a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch including a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, where each task-specific branch is configured to produce, for a given
  • the plurality of binary pathological classification tasks can include: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
  • the plurality of binary pathological classification tasks can include: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
  • the plurality of binary pathological classification tasks can include: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
  • the plurality of binary pathological classification tasks can include: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
  • the plurality of binary pathological classification tasks can include: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
  • the plurality of binary pathological classification tasks can include: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
  • the plurality of pathological classes can consist of a number c of pathological classes, and the multiple pathological tasks can consist of a number c(c-1)/2 of binary classification tasks.
  • Each component can include a feature vector.
  • the plurality of pathological classes can include a plurality of dermatopathological classes.
  • Fig. 1 is a schematic diagram depicting an example supra-image, its constituent images, a tiling of one of its constituent images, and vector representations of the tiles of the constituent image according to various embodiments;
  • Fig. 2 is a schematic diagram of an architecture of a system that uses a weakly-supervised neural network to perform multiple tasks according to various embodiments;
  • Fig. 3 is a flow diagram for a method of iteratively training, at the supra-image level, a neural network to classify supra-images according to various embodiments;
  • Fig. 4 is a flow diagram for a method of automatically classifying a supra-image according to various embodiments;
  • Fig. 5 is a schematic diagram of a hardware computer system suitable for implementing various embodiments;
  • Fig. 6 is a schematic diagram of the system architecture of an example reduction to practice;
  • Fig. 7 is a schematic diagram representing a hierarchical classification technique implemented by the reduction to practice of Fig. 6;
  • Fig. 8 depicts receiver operating characteristic curves for the neural networks implemented by the reduction to practice of Fig. 6;
  • Fig. 9 depicts a chart comparing reference lab performance on the same test set when trained on consensus and non-consensus data; and
  • Fig. 10 depicts a chart of mean and standard deviation of sensitivity to melanoma versus percentage reviewed for 1,000 simulated sequentially accessioned datasets, drawn from reference lab confidence scores.
  • Embodiments can use weakly-labeled supra-images to train a machine learning algorithm, such as an electronic neural network, in a manner that provides superior classification results in comparison to existing techniques.
  • some embodiments provide a single network that can be trained to perform multiple tasks on supra-images in a weakly supervised manner using multi-instance learning. While previous work in multiple-instance learning has trained a single network to perform a single task, such as classification or regression for single images or image subregions, some embodiments extend the multiple-instance framework for a single network to an arbitrary number of tasks, which can be trained at the same time as one another, and be used to predict different attributes of the supra-image input data simultaneously.
  • Fig. 1 is a schematic diagram 100 depicting an example supra-image 102, its constituent images 104, a tiling 108 of one of its constituent images 106, and vector representations 112 of the tiles of the constituent image 106 according to various embodiments.
  • the term “supra-image” includes one or more constituent images of a specimen.
  • the specimen may be a medical specimen, a landscape specimen, or any other specimen amenable to image capture.
  • a supra-image may represent images from a single resection or biopsy (the supra-image) constituting several slides (the constituent images).
  • the supra-image may be a three-dimensional volume representing the results of a radiological scan such as a Computer Tomography (CT) or Magnetic Resonance Imaging (MRI) scan, and the constituent images may include two- dimensional slices of the three-dimensional volume.
  • the images forming a supra-image may be of tissue stained with Hematoxylin and Eosin (H&E), and a label may be associated with the supra-image, for example, the diagnosis rendered by the pathologist.
  • a supra-image may be of any type of specimen in any field, not limited to pathology, e.g., a set of satellite images.
  • supra-image 102 may represent a three-dimensional volume, by way of non-limiting examples.
  • Supra-image 102 may be, for example, a representation of a CT or MRI scan.
  • Images 104 represent the constituent images of supra-image 102.
  • images 104 may be slices derived from, or used to derive, a CT or MRI scan, or may be whole-slide images, e.g., representing multiple images from a biopsy of a single specimen.
  • each constituent image of a supra-image may be broken down into a number of tiles, which may be, e.g., 128 pixels by 128 pixels.
  • image 106 of constituent images 104 may be partitioned into tiles, such as tile 110, to form partitioned image 108.
  • an individual tile may be represented by one or more corresponding feature vectors.
  • Such feature vectors may be obtained from tiles using a separate neural network, trained to produce feature vectors from tiles.
  • Each such feature vector may encode the presence or absence of one or more features in the tile that it represents.
  • Each feature vector may be in the form of a tuple of numbers.
  • feature vectors 112 represent the tiles of partitioned image 108.
  • feature vector 114 may correspond to and represent a presence or absence of a particular feature in tile 110.
  • each component is implemented as a tile of a constituent image of a supra-image.
  • each component is implemented as a vector, such as a feature vector, that represents and corresponds to a respective tile in a constituent image of a supra-image.
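The tiling step described above can be sketched in a few lines; the following is a minimal illustration assuming a NumPy-array image and the 128-pixel tile size mentioned earlier (the function name and discarding of partial edge tiles are assumptions, not the patent's specification):

```python
import numpy as np

def tile_image(image, tile_size=128):
    """Partition an image (H x W x C array) into non-overlapping
    tile_size x tile_size components, discarding partial edge tiles."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(image[y:y + tile_size, x:x + tile_size])
    return tiles

# A hypothetical 384 x 256 RGB constituent image yields a 3 x 2 mosaic.
image = np.zeros((384, 256, 3), dtype=np.uint8)
tiles = tile_image(image)
print(len(tiles))  # → 6
```

Each resulting tile could then be passed through a separate, pre-trained network to obtain the feature-vector components described above.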
  • weakly-supervised networks were trained to operate either only in the specific case of a weak label per-image, or using a downstream classifier or alternative numerical method to combine the output of a weakly-supervised classifier from the image level to the supra-image level.
  • the former case restricts the usability of a trained network, while the latter relies on two models’ or methods’ performance to generate and combine image-level classifications to produce a representative supra-image level classification.
  • Multi-task learning often improves performance when compared to multiple models performing the tasks individually. This may be because the tasks can act as implicit regularizers, stopping one task from overfitting, since other tasks require the same input data representation. Also, some tasks may be easier than others to learn, and training them together can lead to a faster convergence on useful data representations for all of the tasks. Further, training models in a multi-task framework means that there are fewer models for a team or organization to verify and maintain.
  • III. Description of Example Embodiments
  • Some embodiments provide weakly-supervised multiple-instance multitask learning at the supra-image level. Some embodiments train a neural network in a weakly-supervised fashion, using collections of components from images constituting supra-images as the input data, with a single label per collection. Some embodiments provide a trained neural network that is able to predict multiple different attributes of an input supra-image simultaneously. Some embodiments provide a trained network that is capable of performing an arbitrary number and variety of tasks, e.g., both classification and regression.
  • Some embodiments utilize multi-task learning as a method specifically of handling noisy data labels (e.g., training corpora in which some small proportion of labels, such as less than 1%, are incorrect, or labels associated with phenomena that have no objective ground truth, such as cancer risk categories, where different human classifiers may arrive at different classifications for the same data).
  • some embodiments provide, for the first time, the ability to: implement weak supervision at the specimen-level, use multi-task learning to minimize the impact of noisy labels, and perform multiple tasks in a neural network to solve problems, e.g., in the domain of dermatopathology.
  • Fig. 2 is a schematic diagram of an architecture of a system 200 that uses a weakly-supervised neural network to perform multiple tasks according to various embodiments.
  • system 200 includes a neural network with shared representation layers 204, followed by individual task-specific representation layers 206, each of which feeds into a respective instance (component) aggregation layer 208, each of which is coupled to a respective output layer 210.
  • the weights and biases of the various layers of system 200 are determined, as described in detail herein in reference to Fig. 3.
  • one or more attributes of a novel input supra-image are determined by the trained system 200, as described in detail herein in reference to Fig. 4.
  • System 200 may be implemented using the hardware of system 500, as shown and described herein in reference to Fig. 5, for example.
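The layer structure of system 200 can be read as a shared trunk feeding parallel task branches. The following NumPy sketch mirrors that topology; it is untrained and randomly initialized, and the layer sizes, ReLU activations, and mean pooling as the aggregation-of-instances step are illustrative assumptions, not the patent's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class MultiTaskMIL:
    """Sketch of system 200: shared representation layers, then one
    branch per binary task, each with task-specific layers, an
    aggregation-over-instances step, and a scalar sigmoid output."""
    def __init__(self, in_dim, hidden, tasks):
        self.shared = rng.normal(0, 0.1, (in_dim, hidden))
        self.branches = {t: (rng.normal(0, 0.1, (hidden, hidden)),
                             rng.normal(0, 0.1, (hidden, 1)))
                         for t in tasks}

    def forward(self, components):
        # components: (n_instances, in_dim), one collection of feature vectors
        shared = relu(components @ self.shared)         # shared layers 204
        out = {}
        for task, (w_task, w_out) in self.branches.items():
            h = relu(shared @ w_task)                   # task-specific layers 206
            pooled = h.mean(axis=0)                     # aggregation layer 208
            out[task] = 1.0 / (1.0 + np.exp(-(pooled @ w_out)))  # output 210
        return out

model = MultiTaskMIL(in_dim=64, hidden=32,
                     tasks=["A_vs_B", "B_vs_C", "A_vs_C"])
preds = model.forward(rng.normal(size=(100, 64)))
print(sorted(preds))  # one scalar prediction per binary task
```

Note that each branch consumes the same shared representation of the collection, which is what lets the tasks act as implicit regularizers for one another.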
  • some embodiments forego the typical multi-class formulation, in which the network is trained to accurately separate c classes simultaneously.
  • some embodiments break the problem into c(c-1)/2 binary (i.e., two-class) classification tasks in a multi-task framework.
  • such embodiments can divide the problem into tasks such that each subnetwork of the model can be trained to focus its attention on only the features necessary to distinguish between a single pair of classes at a time.
  • some embodiments are trained to individually identify the boundaries between each pair of classes: high and intermediate, low and intermediate, and low and high.
  • system 200 may include a first output layer 212 that distinguishes between class A and class B, a second output layer 214 that distinguishes between class B and class C, and a third output layer 216 that distinguishes between class A and class C. More generally, the system may include c(c-1)/2 output layers 210 for classifying a supra-image into c ≥ 3 classes.
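The c(c-1)/2 count follows from taking one binary task per unordered pair of classes; a small illustration (function and task names are hypothetical):

```python
from itertools import combinations

def binary_tasks(classes):
    """Enumerate one binary classification task per unordered pair of
    classes, giving c*(c-1)//2 tasks for c classes."""
    return [f"{a}_vs_{b}" for a, b in combinations(classes, 2)]

tasks = binary_tasks(["high", "medium", "low"])
print(tasks)  # → ['high_vs_medium', 'high_vs_low', 'medium_vs_low']

c = 3
assert len(tasks) == c * (c - 1) // 2  # three pairwise boundaries
```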
  • melanocytic risk may be characterized as high for cancerous invasive melanoma, medium for melanoma that has not spread past the dermis in situ, and low for benign or dysplastic conditions.
  • three-class classifications suitable for implementations include: a first, second, and third Gleason score (e.g., Gleason 2-4, 5-7, or 8-10); a first, second, and third survival quantification (e.g., low, medium, or high risk, correlating to varying expected survival time in months, such as less than 3 months, 3 months to 12 months, or greater than 12 months); a first, second, and third prognosis (e.g., recovery, hospitalization, or death); and a first, second, and third drug response (e.g., nonresponder, moderate responder, strong responder).
  • embodiments can utilize (and predict) more than one weak label per supra-image.
  • three branches of the network could be used to distinguish between melanocytic high/medium/low risk, while a fourth branch could be used to predict some other number, e.g., a survival quantification.
  • each image in a supra-image is divided into a mosaic of tiles, e.g., squares of 128 pixels-per-side.
  • a sampled collection of such tiles, or feature vector representations thereof, small enough to be stored in the available volatile memory of the training computer, and labeled with the label of the supra-image from which the tiles are obtained, may serve as a single element of the training corpus for weakly-supervised iterative training according to various embodiments. Multiple such labeled collections of components may comprise a full training corpus. No region-of-interest need be identified.
  • An example iterative training technique that accommodates current hardware volatile memory limitations is shown and described presently in reference to Fig. 3.
  • Fig. 3 is a flow diagram for a method 300 of iteratively training, at the supra-image level, a neural network to classify supra-images, according to various embodiments.
  • Method 300 may be implemented using system architecture as shown and described herein in reference to Fig. 2, as instantiated by system 500, as shown and described herein in reference to Fig. 5.
  • an embodiment iteratively accepts as input collections of components of a supra-image from a training corpus of supra-images.
  • each image of a supra-image is typically too large to feed into the hardware used to hold and train the deep learning neural network. Therefore, some embodiments train a weakly supervised neural network at the supra-image level, within these hardware limitations, by sampling (e.g., randomly sampling) components from constituent images of supra-images into collections of components that are close to the maximum size the hardware is able to hold in RAM.
  • the random sampling may not take into account which image from a supra-image the components are drawn from; components may be randomly drawn without replacement from a common pool for the supra-image.
  • the sampling can be performed several times for a given supra-image, creating more than one collection to train with for a given supra-image. Multiple such collections may form a partition of a given supra-image; that is, the set-theoretic union of the collections from a single supra-image may cover the entire supra-image, and the set-theoretic intersection of such collections may be empty.
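The pool-and-partition sampling described above can be sketched as follows; a minimal illustration under the assumption that components are drawn without replacement from a single pooled list per supra-image (names and the fixed seed are illustrative):

```python
import random

def sample_collections(components, collection_size, seed=0):
    """Shuffle the pooled components of one supra-image (ignoring which
    constituent image each came from) and split the pool into collections
    of at most collection_size. The collections partition the pool: their
    union covers it and their pairwise intersections are empty."""
    pool = list(components)
    random.Random(seed).shuffle(pool)
    return [pool[i:i + collection_size]
            for i in range(0, len(pool), collection_size)]

# Ten hypothetical components drawn into collections of at most four,
# where collection_size stands in for the RAM-limited maximum.
collections = sample_collections(range(10), collection_size=4)
print([len(c) for c in collections])  # → [4, 4, 2]
```

Each resulting collection would carry the weak label of the supra-image it was drawn from.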
  • method 300 accesses a training corpus of supra-images.
  • the supra-images may be in any field of interest.
  • the supra-images include or may be otherwise associated with weak labels.
  • the supra-images and weak labels may be obtained from an electronic clinical records system, such as an LIS.
  • the supra-images may be accessed over a network communication link, or from electronic persistent memory, by way of non-limiting examples.
  • the training corpus may include hundreds, thousands, or even tens of thousands or more supra-images.
  • method 300 selects a batch of supra-images for processing.
  • the training corpus of supra-images with supra-image level labels to be used for training is divided into one or more batches of one or more supra-images.
  • the loss incurred by the network is computed over all batches through the actions of 304, 306, 308, 310, 312, and 314.
  • the losses over all of the batches are accumulated, e.g., according to the Overall Loss described in detail below, and then the weights and biases of the network are updated, at which point the accumulated loss is reset, and the process repeats until the iteration is complete.
  • method 300 samples, e.g., randomly samples, a collection of components from the batch of supra-images selected at 304.
  • each batch of supra-images is identified with a respective batch of collections of components, where each collection of components includes one or more components sampled, e.g., randomly sampled, from one or more images from a single supra-image in the batch of supra-images.
  • the term “batch” may refer to both a batch of one or more supra-images and a corresponding batch of collections of components from the batch of one or more supra-images.
  • Embodiments may not take into account which constituent image a given component in a collection comes from; components in the collection may be randomly drawn without replacement from a common pool for a given supra-image. Each collection of components is labeled according to the label(s) of the supra-image whose constituent images supply the components in the collection.
  • the components may be tiles of images within the selected supra-image batch, or may be feature vectors representative thereof.
  • the collections of components, when implemented as tiles, may form a partition of a given supra- image, and when implemented as vectors, the corresponding tiles may form a partition.
  • Embodiments may iterate through a single batch, i.e., a batch of collections of components, through the actions of 306, 308, and 310, until all components from the images of the supra-images for the batch are included in some collection of components that is forward propagated through the network. Embodiments may iterate through all of the batches through the actions of 304, 306, 308, 310, 312, and 314 to access the entire training dataset to completely train a network.
  • the collection of components sampled at 306 is forward propagated through the neural network to compute loss. Briefly, when a collection of components is forward propagated through the multiple-instance learning neural network, the network's prediction is compared to the weak label for the collection. The more incorrect the prediction, the larger the loss value. Such a loss value is accumulated each time a collection of components is propagated through the network, until all collections of components in the batch are used and the overall loss for that batch is determined. The actions outlined in this paragraph are elaborated upon and described in detail presently.
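The accumulate-then-update cycle of actions 304-314 can be sketched as a short loop; the model, loss function, and update stand-ins below are hypothetical toys, not the patent's network:

```python
def train_epoch(batches, model, loss_fn, update):
    """Iterate over batches of (collection, weak_label) pairs, accumulate
    the loss over all collections in a batch, then update the model once
    per batch, mirroring actions 304-314."""
    batch_losses = []
    for batch in batches:
        accumulated = 0.0
        for collection, weak_label in batch:
            prediction = model(collection)         # forward propagation (308)
            accumulated += loss_fn(prediction, weak_label)
        update(accumulated)                        # weight update; loss resets
        batch_losses.append(accumulated)
    return batch_losses

# Toy stand-ins: the collection mean as "prediction", squared-error loss.
model = lambda collection: sum(collection) / len(collection)
loss_fn = lambda pred, label: (pred - label) ** 2
updates = []
losses = train_epoch(
    [[([1.0, 3.0], 2.0), ([0.0, 2.0], 0.0)]],  # one batch, two collections
    model, loss_fn, updates.append)
print(losses)  # → [1.0]
```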
  • the network will have at least one layer for a shared data representation of a component, which is subsequently passed to the task-specific branches.
• Each task-specific branch could, in and of itself, represent a weakly-supervised neural network. It includes a number of neural network layers, followed by an aggregation of the component representations, and layers corresponding to the final output.
• the prediction of a given task t and batch b is denoted ŷ_{b,t}.
  • the sampled collection of components is presented to the network, with a weak label y.
• A batch of collections of size N_b will have a list of weak labels y_b.
  • the label y corresponds to the correct prediction for at least one of the tasks. Note that some labels may be irrelevant to some tasks.
  • the overall loss is determined.
• the overall loss may be characterized as follows, by way of non-limiting example:

  L = Σ_t a_t Σ_{b=1}^{N_b} p_{b,t} c_{b,t} H_t(y_{b,t}, ŷ_{b,t}) + λ Σ_{i=1}^{N_w} w_i²
• the parameters may be characterized as follows.
• the term N_b represents the number of collections in a batch.
• the term N_t represents the number of batches in the training corpus.
• the term N_w represents the number of weights in the network.
• a_t represents the weight assigned to every prediction in task t. This can be thought of as the overall importance of a given task. This importance governs the extent to which each task contributes to the overall loss, and therefore the relative extent to which performance at each task is optimized during training.
• a_t may represent the ranked importance of each task in terms of clinical importance (e.g., which tasks are most critical).
• Other embodiments may use a larger a_t value for a task t that typically has lower performance in comparison to other tasks. (For example, the inventors have seen in practice that melanocytic high vs. medium risk is a more difficult task for the model to perform. If this performance is valued above other tasks, an embodiment could increase its a_t value or correspondingly decrease the alpha values associated with the other tasks.) Without prior knowledge of these requirements or otherwise, a_t may be set to one for all tasks t.
• p_{b,t} represents the weight assigned on a batch-by-batch basis for task t. This corresponds to the importance of a given task-batch combination. For a binary classification task that does not correspond to the ground-truth class of data in a batch, this value can be set to zero to ignore the prediction of that task arm in the overall loss computation. For example, for the task of classifying between high-risk and medium-risk, batch data classified as low-risk may be masked by being weighted zero.
• c_{b,t} H_t(y_{b,t}, ŷ_{b,t}) represents the weighted cost function for a particular task t.
• the cost function, on a basic level, calculates how wrong a prediction is (producing a larger number for a worse answer) so that the loss function can update the model weights proportionally.
• In binary classification, there are several commonly used cost functions, such as binary cross-entropy.
• the parameters for this weighted function are ŷ_{b,t}, the predicted value; y_{b,t}, the ground truth; and c_{b,t}, the class weight associated with that task: the relative proportion of the number of times that task will be calculated, given the ground-truth values in the dataset.
  • the B vs. C task may be weighted more than the A vs. B and A vs. C tasks.
• the term λ represents the weight assigned to L2 regularization, which corresponds to stopping the model from overfitting by taxing the cumulative size of the weights in the network by this amount.
• Σ_{i=1}^{N_w} w_i² represents the squared L2 norm of the weights in the network, where i indexes weights and N_w is the total number of weights in the network.
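By way of non-limiting illustration, the overall loss above may be sketched in code as follows, assuming binary cross-entropy as the per-task cost H_t; all function and variable names are illustrative and not drawn from the disclosure:

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy: larger for worse predictions."""
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

def overall_loss(preds, labels, p, a, c, lam, weights):
    """preds[t][b] / labels[t][b]: prediction and ground truth for task t,
    collection b; p[t][b]: per-batch task mask (zero ignores a task arm);
    a[t]: per-task importance; c[t][b]: class weight; lam: L2 weight."""
    loss = 0.0
    for t in range(len(preds)):
        for b in range(len(preds[t])):
            loss += a[t] * p[t][b] * c[t][b] * bce(labels[t][b], preds[t][b])
    return loss + lam * sum(w * w for w in weights)  # L2 regularization term
```

Setting p[t][b] to zero reproduces the masking behavior described above: that task arm contributes nothing to the loss for that batch.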
  • method 300 determines whether there are additional collections of components from the batch selected at 304 that have not yet been processed. If so, control reverts to 306, where another collection of components is selected for processing as described above. If not, then control passes to 312.
  • method 300 back propagates the accumulated loss to update the weights and biases of the neural network. That is, after iterating through the collections of components from a single batch, the neural network weights and biases are updated according to the magnitude of the aggregated loss.
  • Method 300 may implement gradient descent to perform actions of 312. The actions of 312 may repeat over all batches in the dataset.
  • method 300 determines whether there are additional batches of supra-images from the training corpus accessed at 302 that have not yet been processed during the current iteration. Embodiments may iterate over the batches to access the entire training dataset. If additional batches exist, then control reverts to 304, where another batch of one or more supra-images is selected. Otherwise, control passes to 316. The repetitions may continue, e.g., until convergence, in order to train the network for optimal performance across all tasks.
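The accumulate-then-update pattern of 304 through 314 may be illustrated with a deliberately simplified toy (a one-weight linear model, not the disclosed network); accumulating gradients of the loss over a batch is equivalent to accumulating the loss and back propagating once:

```python
# Toy illustration of steps 304-314: gradients of a squared-error loss are
# summed over every collection in a batch, and the weight is updated once
# per batch, repeating over epochs until convergence.
def train(batches, w, lr=0.1, epochs=50):
    for _ in range(epochs):                      # repeat, e.g., until convergence
        for batch in batches:                    # 304/314: select the next batch
            grad = 0.0
            for x, y in batch:                   # 306-310: each collection in turn
                pred = w * x                     # forward propagate
                grad += 2.0 * (pred - y) * x     # accumulate d(loss)/dw
            w -= lr * grad / len(batch)          # 312: single update per batch
    return w

w = train([[(1.0, 2.0), (2.0, 4.0)]], w=0.0)     # learns w close to 2
```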
  • Embodiments may train the neural networks for hundreds, or even thousands or more, of epochs.
  • method 300 provides the neural network that has been trained using the training corpus accessed at 302.
  • Method 300 may provide the trained neural network in a variety of ways.
  • the trained neural network is stored in electronic persistent memory.
  • the neural network is made available on a network, such as the internet.
  • an interface to the trained neural network is provided, such as a Graphical User Interface (GUI) or Application Program Interface (API).
  • Fig. 4 is a flow diagram for a method 400 of automatically classifying a supra-image according to various embodiments.
  • Method 400 may use a neural network trained according to method 300 as shown and described herein in reference to Fig. 3.
  • Method 400 may be implemented by system 500, as shown and described herein in reference to Fig. 5.
  • a supra-image is obtained.
  • the supra-image may be in any field.
  • the supra-image may be obtained over a network link or by retrieval from persistent storage, by way of non-limiting example.
  • the neural network is applied to the supra-image obtained at 402.
  • the supra-image may be broken down into parts (e.g., components or sets of components) and the parts may be individually passed through the network up to a particular layer, where the features from the various parts are aggregated, and then the parts are passed through to a further particular layer, where the features are again aggregated, until all parts are iteratively passed and all features aggregated such that one or more outputs are produced.
  • Three (or more) output layers may be present (e.g., as shown and described herein in reference to Fig. 2). In operation, each such layer provides an output. These outputs may be independently useful.
  • the multiple outputs may be synthesized to produce a final, single output.
  • post-processing of the output layers’ output is instituted to obtain a single output from the network, where the single output reflects (broadly) a score, e.g., a severity score, between zero and one, inclusive.
  • the higher the score the more likely the specimen is (according to the model) to contain an invasive melanoma (rather than a benign nevus, at the other end of the spectrum).
• Such embodiments may obtain the score by synthesizing an output from two or more of the multiple tasks. For example, some embodiments synthesize the outputs from tasks low-risk vs. high-risk and low-risk vs. intermediate-risk. Such embodiments may take the maximum score of these two tasks to assign a severity score, and use logic based on these scores to assign an output classification, e.g., Melanocytic (lower risk), Suspect, or High Risk.
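A minimal sketch of such synthesis logic follows; the threshold values and the decision rules are illustrative assumptions, not values from the disclosure:

```python
# Hedged sketch of the post-processing step: take the maximum of two task
# scores as a severity score in [0, 1], then map it to an output class.
def classify(low_vs_high, low_vs_intermediate, suspect_thr=0.5, high_thr=0.8):
    severity = max(low_vs_high, low_vs_intermediate)   # severity score
    if severity >= high_thr:
        return severity, "High Risk"
    if severity >= suspect_thr:
        return severity, "Suspect"
    return severity, "Melanocytic (lower risk)"
```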
  • method 400 provides the output.
  • the output may be provided by displaying a corresponding datum to a user of method 400, e.g., on a computer monitor.
  • a datum may indicate the presence or absence of a feature of interest in the supra-image, by way of non-limiting example.
  • Fig. 5 is a schematic diagram of a hardware computer system 500 suitable for implementing various embodiments.
  • Fig. 5 illustrates various hardware, software, and other resources that can be used in implementations of method 200 as shown and described herein in reference to Fig. 2, method 300 as shown and described herein in reference to Fig. 3, and/or method 400 as shown and described herein in reference to Fig. 4.
• System 500 includes training corpus source 502 and computer 501. Training corpus source 502 and computer 501 may be communicatively coupled by way of one or more networks 504, e.g., the internet.
  • Training corpus source 502 may include an electronic clinical records system, such as an LIS, a database, a compendium of clinical data, or any other source of supra-images suitable for use as a training corpus as disclosed herein.
• Computer 501 may be implemented as a desktop or laptop computer, may be incorporated in one or more servers, clusters, or other computers or hardware resources, or may be implemented using cloud-based resources.
• Computer 501 includes volatile memory 514 and persistent memory 512, the latter of which can store computer-readable instructions that, when executed by electronic processor 510, configure computer 501 to perform any of methods 200, 300, and/or 400, as shown and described herein.
  • Computer 501 further includes network interface 508, which communicatively couples computer 501 to training corpus source 502 via network 504. Other configurations of system 500, associated network connections, and other hardware, software, and service resources are possible.
  • This Section presents an example reduction to practice.
  • the example reduction to practice is configured to perform hierarchical classification of digitized whole-slide image specimens into six classes defined by their morphological characteristics, including classification of “Melanocytic Suspect” specimens likely representing melanoma or severe dysplastic nevi.
• the reduction to practice was trained on 7,685 images from a single lab (the reference lab), including the largest set of triple-concordant melanocytic specimens compiled to date, and was tested on 5,099 images from two distinct validation labs.
  • the reduction to practice achieved Area Underneath the Receiver Operating Characteristics Curve (AUC) values of 0.93 classifying Melanocytic Suspect specimens on the reference lab, 0.95 on the first validation lab, and 0.82 on the second validation lab.
• The Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis (MPATH-Dx; "MPATH" hereafter) reporting schema was introduced by Piepkorn, et al., The MPATH-Dx reporting schema for melanocytic proliferations and melanoma, Journal of the American Academy of Dermatology, 70(1):131-141, 2014, to provide a precise and consistent framework for dermatopathologists to grade the severity of melanocytic proliferation in a specimen.
  • MPATH scores are enumerated from I to V, with I denoting a benign melanocytic lesion and V denoting invasive melanoma. It has been shown that discordance rates are related to the MPATH score, with better inter-observer agreement on both ends of the scale than in the middle.
  • a tool that allows labs to sort and prioritize melanoma cases in advance of pathologist review could improve turnaround time, allowing pathologists to review cases requiring faster turnaround time early in the day. This is particularly important as shorter turnaround time is correlated with improved overall survival for melanoma patients. It could also alleviate common lab bottlenecks such as referring cases to specialized dermatopathologists, or ordering additional tissue staining beyond the standard H&E. These contributions are especially important as the number of skin biopsies performed per year has skyrocketed, while the number of practicing pathologists has declined.
• Pantanowitz, et al., An artificial intelligence algorithm for prostate cancer diagnosis in whole-slide images of core needle biopsies: a blinded clinical validation and deployment study, The Lancet Digital Health, 2(8):e407-e416, 2020, describes using pixel-wise annotations to develop a model, trained on ~550 whole-slide images, that distinguishes high-grade from low-grade prostate cancer.
  • this Section presents a reduction to practice that can classify skin cases for triage and prioritization prior to pathologist review.
• the reduction to practice performs hierarchical melanocytic specimen classification into Low (MPATH I-II), Intermediate (MPATH III), or High (MPATH IV-V) diagnostic categories, allowing for prioritization of melanoma cases.
  • the reduction to practice was the first to classify skin biopsies at the specimen level through a collection of whole-slide images that represent the entirety of the tissue from a single specimen, e.g., a supra-image.
  • This training procedure is analogous to the process of a dermatopathologist, who reviews the full collection of scanned whole-slide images corresponding to a specimen to make a diagnosis.
  • the reduction to practice was trained and validated on the largest dataset of consensus-reviewed melanocytic specimens published to date.
• the reduction to practice was built to be scalable and ready for the real world: it was developed without any pixel-level annotations and incorporates the automatic removal of scanning artifacts.
  • the reduction to practice was trained using slides from 3511 specimens (consisting of 7685 whole-slide images) collected from a leading dermatopathology lab in a top academic medical center (Department of Dermatology at University of Florida College of Medicine), which is referred to as the “Reference Lab”.
• the Reference Lab dataset consisted of both an uninterrupted series of sequentially-accessioned cases (69% of total specimens) and a targeted set, curated to enrich for rarer melanocytic pathologies (31% of total specimens). Melanocytic specimens were only included in this set if three dermatopathologists' consensus on diagnosis could be established.
• the whole-slide images consisted exclusively of H&E-stained, formalin-fixed, paraffin-embedded dermatopathology tissue and were scanned using a 3DHistech P250 High Capacity Slide Scanner at an objective power of 20X, corresponding to 0.24 µm/pixel.
  • the final classification given by the reduction to practice was one of six classes, defined by their morphologic characteristics:
  • Basaloid containing abnormal proliferations of basaloid-oval cells, primarily basal cell carcinoma of various types;
  • Squamous containing malignant squamoid epithelial proliferations, consisting primarily of squamous cell carcinoma (invasive and in situ);
  • the overall reference set was composed of 544 Basaloid, 530
  • Table 1 Counts of each of the general pathologies in the reference set from the Reference Lab, broken-out into specific diagnostic entities
  • specimen counts presented herein for the melanocytic classes reflect counts following three-way consensus review (see Section IV(C)). For training, validating, and testing the reduction to practice, this dataset was divided into three partitions by sampling at random without replacement with 70% of specimens used for training, and 15% used for each of validation and testing.
• Specimens from Validation Lab 2 consisted of slides from 2066 specimens (2066 whole-slide images; each specimen represented by a single whole-slide image), with whole-slide images scanned using a Ventana DP 200 scanner at an objective power of 20X (0.47 µm/pixel). Note: specimen and whole-slide image counts above reflect specimens included in the study after screening melanocytic specimens for inter-pathologist consensus. Table 2 shows the class distribution for the Validation Labs.
  • Fig. 6 is a schematic diagram of the system architecture 600 of an example reduction to practice.
  • the reduction to practice includes three main components: quality control 610, feature extraction 620, and hierarchical classification 630.
  • a brief description of how the reduction to practice was used to classify a novel supra-image follows.
• Each specimen 602 (a supra-image) was first segmented into tissue-containing regions, subdivided into 128x128-pixel tiles by tiling 604, and extracted at an objective power of 10X.
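The tiling step may be sketched as follows; dropping partial tiles at the region edges is an assumption for illustration:

```python
# Sketch of tiling 604: coordinates of non-overlapping 128x128-pixel tiles
# covering a tissue region of the given height and width.
def tile_coords(height, width, size=128):
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]
```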
  • Each tile was passed through the quality control 610, which includes ink filtering 612, blur filtering 616, and image adaptation 614.
  • the image-adapted tiles were then passed through the feature extraction 620 stage, including a pretrained ResNet50 network 622, to obtain embedded vectors 624 as components corresponding to the tiles.
  • the embedded vectors 624 were propagated through the hierarchical classification 630 stage, including an upstream neural network 632 performing a binary classification between “Melanocytic Suspect” and “Rest”. Specimens that were classified as “Melanocytic Suspect” were fed into a first downstream neural network 634, which classified between “Melanocytic High Risk, Melanocytic Intermediate Risk” and “Rest”. The remaining specimens were fed into a second downstream “Rest” neural network 636, which classified between “Basaloid, Squamous, Melanocytic Low Risk” and “Other”. This classification process of the reduction to practice is described in detail presently.
  • Quality control 610 included ink filtering 612, blur filtering 616, and image adaptation 614.
• Pen ink marking the location of possible malignancy is common on slides from labs migrating their workload from glass slides to whole-slide images. This pen ink represented a biased distractor signal in training the reduction to practice, as it is highly correlated with malignant or High Risk pathologies.
  • Tiles containing pen ink were identified by a weakly supervised neural network trained to detect inked slides. These tiles were removed from the training and validation data and before inference on the test set. Areas of the image that were out of focus due to scanning errors were also removed to the extent possible by blur filtering 616 by setting a threshold on the variance of the Laplacian over each tile.
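The blur filter described above may be sketched as thresholding the variance of the Laplacian over each tile; the threshold value here is an illustrative assumption, and a production implementation would typically use an optimized convolution (e.g., OpenCV) rather than the naive loop below:

```python
import numpy as np

def laplacian_variance(tile):
    """Variance of the Laplacian of a grayscale tile; low values suggest blur."""
    k = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    h, w = tile.shape
    lap = np.empty((h - 2, w - 2))
    for i in range(h - 2):                    # naive 3x3 convolution
        for j in range(w - 2):
            lap[i, j] = np.sum(tile[i:i + 3, j:j + 3] * k)
    return lap.var()

def is_sharp(tile, threshold=100.0):
    """Keep a tile only when its Laplacian variance exceeds the threshold."""
    return laplacian_variance(tile) >= threshold
```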
• the reduction to practice adopted as its image adaptation 614 the image adaptation procedure in Ianni 2020.
• feature extraction 620 extracted informative features from the quality-controlled, color-standardized tiles. To capture higher-level features in these tiles, they were propagated through a neural network (ResNet50; He, et al., Deep residual learning for image recognition, arXiv preprint arXiv:1512.03385, 2015) trained on the ImageNet dataset (Deng, et al., Imagenet: A large-scale hierarchical image database, In IEEE Conference on Computer Vision and Pattern Recognition, pages 248-255, 2009) to embed each input tile into 1024-channel vectors, which were then used in subsequent neural networks.
  • the hierarchical neural network architecture was developed in order to classify both Melanocytic High and Intermediate Risk specimens with high sensitivity.
  • the upstream neural network 632 performed a binary classification between “Melanocytic Suspect” (defined as “High or Intermediate Risk”) and “Basaloid, Squamous, Low Risk”, or “Other” (which are collectively defined as the “Rest” class). Specimens that were classified as “Melanocytic Suspect” were fed into the downstream neural network 634, which further classified the specimen between “Melanocytic High Risk, Melanocytic Intermediate Risk” and “Rest”.
• Each neural network 632, 634, 636 included four fully-connected layers (two layers of 1024 channels each, followed by two of 512 channels each). Each neuron in the three layers after the input layer was ReLU activated.
• the three neural networks 632, 634, 636 in the hierarchy were trained under a weakly-supervised multiple-instance learning (MIL) paradigm. Each embedded tile was treated as an instance of a bag containing all quality-assured tiles of a specimen. Embedded tiles were aggregated using sigmoid-activated attention heads. To help prevent over-fitting, the training dataset included augmented versions of the tiles. Augmentations were generated with the following strategies: random variations in brightness, hue, contrast, and saturation (up to a maximum of 15%), Gaussian noise with 0.001 variance, and random 90° image rotations. The upstream binary "Melanocytic Suspect vs. Rest" classification neural network 632 and the downstream "Rest" subclassifier neural network 636 were each trained end-to-end with cross-entropy loss.
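A numpy sketch of one such classifier head follows, using the layer sizes described above and assuming an attention-weighted mean as the instance aggregation; the weights are random for illustration (the actual networks were trained), and all names are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_layer(n_in, n_out, rng):
    """Random fully-connected layer (weights, biases) for illustration only."""
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

def bag_representation(tiles, layers, attention_w):
    """tiles: (n_tiles, 1024) embedded vectors forming one specimen's bag."""
    h = tiles
    for w, b in layers:                               # 1024 -> 1024 -> 512 -> 512
        h = relu(h @ w + b)                           # ReLU-activated layers
    att = 1.0 / (1.0 + np.exp(-(h @ attention_w)))    # sigmoid attention per tile
    return (att[:, None] * h).sum(axis=0) / att.sum() # attention-weighted pooling
```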
  • the “Melanocytic Suspect” subclassifier neural network 634 was also trained with cross-entropy loss, but with a multi-task learning strategy. This subclassifier neural network 634 was presented with three tasks: differentiating “Melanocytic High Risk” from “Melanocytic Intermediate Risk” specimens, “Melanocytic High Risk” from “Rest” specimens, and “Melanocytic Intermediate Risk” from “Rest” specimens.
  • the training loss for this subclassifier neural network 634 was computed for each task, but was masked if it did not relate to the ground truth label of the specimen. Two out of three tasks were trained for any given specimen in a training batch. By training in this manner, the shared network layers were used as a generic representation of melanocytic pathologies, while the task branches learned to attend to specific differences to accomplish their tasks.
  • Fig. 7 is a schematic diagram representing a hierarchical classification technique 700 implemented by the reduction to practice of Fig. 6.
• the hierarchical classification technique 700 may be implemented by hierarchical classification 630 as shown and described above in reference to Fig. 6.
  • Fig. 7 depicts Melanocytic Suspect Subclassifier 734, corresponding to the first downstream neural network 634 of Fig. 6, and depicts Rest subclassifier 736, corresponding to the second downstream neural network 636 of Fig. 6.
• the hierarchy determined the predicted classes of an input specimen 702 (e.g., a supra-image).
  • the larger of the two confidence values 704 (see below for the confidence thresholding procedure) output from the upstream classifier determined which downstream classifier a specimen was passed to.
  • the hierarchical classification technique 700 performed classification with uncertainty quantification to establish a confidence score for each prediction using a Monte Carlo dropout method following a similar procedure as used by Gal et al., Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, In International Conference on Machine Learning, pages 1050-1059, 2016.
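A minimal sketch of Monte Carlo dropout on a single sigmoid unit follows; the one-layer model is a stand-in for illustration, not the disclosed classifier:

```python
import numpy as np

def mc_dropout_score(x, weights, rng, p_drop=0.5, T=100):
    """Run T stochastic forward passes with dropout active; the spread of the
    sigmoid outputs serves as an uncertainty signal for the prediction."""
    preds = []
    for _ in range(T):
        keep = rng.random(weights.shape) >= p_drop        # dropout mask
        w = np.where(keep, weights / (1 - p_drop), 0.0)   # inverted-dropout scaling
        preds.append(1.0 / (1.0 + np.exp(-float(x @ w))))
    preds = np.asarray(preds)
    return preds.mean(), preds.std()                      # mean score, uncertainty
```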
• the hierarchical classification technique 700 computed confidence threshold values for each predicted class following the procedure outlined in Ianni 2020 by requiring classifications to meet a predefined level of accuracy in the validation set.
  • Fig. 8 depicts Receiver Operating Characteristic (“ROC”) curves 800 for the neural networks implemented by the reduction to practice of Fig. 6.
• Fig. 8 depicts such results for the upstream classifier (left column), the High & Melanocytic Intermediate classifier (middle column), and the Basaloid, Squamous, Low Risk Melanocytic & Rest classifier (right column), for the Reference Lab (first row), for Validation Lab 1 (second row), and for Validation Lab 2 (third row).
  • Table 3 shows metrics for selected diagnoses of clinical interest, based on the reference Lab test set, representing the classification performance of the individual diagnoses into their higher-level classes: e.g., a correct classification of “Melanoma” is the prediction “Melanocytic High Risk”. Results are class-weighted according to the relative prevalence in the test set.
  • Table 3 Metrics for selected diagnoses of clinical interest
• the sensitivity of the reduction to practice to the Melanocytic Suspect class was found to be 0.83 and 0.85 for the Melanocytic High and Intermediate Risk classes, respectively.
  • the PPV to Melanocytic High Risk was found to be 0.57.
  • the dropout Monte Carlo procedure set the threshold for Melanocytic High Risk classification very high; specimens below this threshold were classified as Melanocytic Suspect, maximizing the sensitivity to this class.
• the AUC values for Validation Lab 1 were 0.95, 0.88, 0.81, 0.87, 0.87, 0.95, and 0.92 for the Basaloid, Squamous, Other, Melanocytic High Risk, Intermediate Risk, Suspect, and Low Risk classes, respectively, and the AUC values for the same classes for Validation Lab 2 were 0.93, 0.92, 0.69, 0.76, 0.75, 0.82, and 0.92.
  • Fig. 9 depicts a chart 900 comparing reference lab performance on the same test set when trained on consensus and non-consensus data.
  • the melanocytic class referenced in chart 900 is defined as the Low, Intermediate and High Risk classes.
• the sensitivities of the Melanocytic Intermediate and High Risk classes are defined with respect to the reduction to practice classifying these classes as Suspect.
  • the PPV to melanocytic high risk in the non-consensus trained model was 0.33, while the consensus model was 0.57.
• the first neural network was trained only including melanocytic specimens for which consensus was obtained under the diagnostic categories of MPATH I/II, MPATH III, or MPATH IV/V.
  • the other neural network was trained by also including non-consensus data: melanocytic specimens whose diagnostic category was not agreed upon by the experts.
• validation sets for both neural network versions and a common consensus test set derived from the Reference Lab were reserved. The sensitivities of the reduction to practice to different classes on both consensus and non-consensus data are shown in Fig. 9.
  • This document discloses a reduction to practice capable of automatically sorting and triaging skin specimens with high sensitivity to Melanocytic Suspect cases prior to review by a pathologist.
  • prior art techniques may provide diagnostically-relevant information on a potential melanoma specimen only after a pathologist has reviewed the specimen and classified it as a Melanocytic Suspect lesion.
• Fig. 10 depicts a chart 1000 showing mean 1002 and standard deviation of sensitivity to melanoma versus percentage reviewed for 1,000 simulated sequentially-accessioned datasets, drawn from Reference Lab confidence scores.
  • 95% of melanoma suspect cases are detected within the first 30% of cases, when ordered by melanoma suspect model confidence.
  • Fig. 10 demonstrates the resulting sensitivity to the Melanocytic Suspect class against the percentage of total specimens that a pathologist would have to review in this sorting scheme in order to achieve that sensitivity.
• a pathologist would only need to review between 30% and 60% of the caseload to address all melanoma specimens according to this dataset.
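The sorting computation behind this estimate may be sketched as follows, on synthetic data; names and values are illustrative:

```python
# Illustration of the triage analysis of Fig. 10: sort cases by descending
# model confidence and find the fraction that must be reviewed to capture a
# target share of the suspect cases.
def fraction_to_review(scores, is_suspect, target_sensitivity=0.95):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total, found = sum(is_suspect), 0
    for k, i in enumerate(order, start=1):
        found += is_suspect[i]
        if found >= target_sensitivity * total:
            return k / len(scores)     # fraction of the caseload reviewed
    return 1.0
```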
  • the reduction to practice also enables other automated pathology workflows in addition to triage and prioritization of suspected melanoma cases. Sorting and triaging specimens into other classifications such as Basaloid could allow the majority of less complicated cases (such as basal cell carcinoma) to be directly assigned to general pathologists, or to dermatologists who routinely sign out such cases. Relevant to any system designed for clinical use is how well its performance generalizes to sites on which the system was not trained. Performance of the reduction to practice on the Validation Labs after calibration (as shown in Fig. 10) was in many cases close to that of the Reference Lab.
• a computer-implemented method of classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks comprising: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset comprising at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising a plurality of images, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components, wherein the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and
  • Clause 2 The method of Clause 1 , wherein the plurality of binary pathological classification tasks comprises: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
  • Clause 3 The method of Clause 1 or Clause 2, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
  • Clause 4 The method of any of Clauses 1-3, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
  • Clause 5 The method of any of Clauses 1-4, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
  • Clause 6 The method of any of Clauses 1-5, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
  • Clause 7 The method of any of Clauses 1-6, wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
• Clause 8 The method of any of Clauses 1-7, wherein the plurality of pathological classes consist of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
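For illustration, the c(c-1)/2 binary tasks of Clause 8 can be enumerated as one task per unordered pair of classes:

```python
from itertools import combinations

def pairwise_tasks(classes):
    """One binary classification task per unordered pair: c(c-1)/2 tasks."""
    return list(combinations(classes, 2))
```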
  • Clause 9 The method of any of Clauses 1-8, wherein each component comprises a feature vector.
  • Clause 10 The method of any of Clauses 1-9, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
  • Clause 11 The method of any of Clauses 1-10, wherein the training dataset provides a plurality of batches of components, and wherein the method further comprises repeating the forward propagating and the back propagating for another batch of components of the plurality of batches of components.
  • Clause 12 The method of any of Clauses 1-11, wherein the electronic neural network comprises at least one layer for a shared data representation of components.
  • Clause 13 The method of any of Clauses 1-12, wherein each supra-image represents a biopsy.
  • Clause 14 The method of any of Clauses 1-13, wherein each image comprises a whole-slide image.
  • Clause 15 The method of any of Clauses 1-8 or 10-14, wherein each component comprises a 128-pixel-by-128-pixel square.
  • Clause 16 A system for classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks comprising: a processor; and a memory communicatively coupled to the processor, the memory storing instructions which, when executed on the processor, perform operations comprising: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset comprising at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising a plurality of images, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components, wherein the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks,
  • Clause 17 The system of Clause 16, wherein the plurality of binary pathological classification tasks comprise: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
  • Clause 18 The system of Clause 16 or Clause 17, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
  • Clause 19 The system of any of Clauses 16-18, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
  • Clause 20 The system of any of Clauses 16-19, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
  • Clause 21 The system of any of Clauses 16-20, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
  • Clause 22 The system of any of Clauses 16-21, wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
  • Clause 23 The system of any of Clauses 16-22, wherein the plurality of pathological classes consist of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
  • Clause 24 The system of any of Clauses 16-23, wherein each component comprises a feature vector.
  • Clause 25 The system of any of Clauses 16-24, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
  • Clause 26 The system of any of Clauses 16-25, wherein the training dataset provides a plurality of batches of components, and wherein the training further comprises repeating the forward propagating and the back propagating for another batch of components of the plurality of batches of components.
  • Clause 27 The system of any of Clauses 16-26, wherein the electronic neural network comprises at least one layer for a shared data representation of components.
  • Clause 28 The system of any of Clauses 16-27, wherein each supra-image represents a biopsy.
  • Clause 29 The system of any of Clauses 16-28, wherein each image comprises a whole-slide image.
  • Clause 30 The system of any of Clauses 16-23 or 25-29, wherein each component comprises a 128-pixel-by-128-pixel square.
  • Clause 31 A method of training an electronic neural network to perform a plurality of binary pathological classification tasks for classifying a novel supra-image as one of a plurality of pathological classes comprising: obtaining a training dataset comprising at least one supra-image, wherein each supra-image is associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising at least one image, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components; forward propagating the at least one batch of components, and their respective supra-image labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, wherein each task-specific branch is configured to produce, for a given input batch of components,
  • Clause 32 The method of Clause 31, wherein the plurality of binary pathological classification tasks comprise: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
  • Clause 33 The method of Clause 31 or Clause 32, wherein the plurality of binary pathological classification tasks comprise: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
  • Clause 34 The method of any of Clauses 31-33, wherein the plurality of binary pathological classification tasks comprise: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
  • Clause 35 The method of any of Clauses 31-34, wherein the plurality of binary pathological classification tasks comprise: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
  • Clause 36 The method of any of Clauses 31-35, wherein the plurality of binary pathological classification tasks comprise: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
  • Clause 37 The method of any of Clauses 31-36, wherein the plurality of binary pathological classification tasks comprise: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
  • Clause 38 The method of any of Clauses 31-37, wherein the training dataset provides a plurality of batches of components, and wherein the method further comprises repeating the forward propagating and the back propagating for another batch of components of the plurality of batches of components.
  • Clause 39 The method of any of Clauses 31-38, wherein the plurality of pathological classes consist of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
  • Clause 40 The method of any of Clauses 31-39, wherein the electronic neural network comprises at least one layer for a shared data representation of components.
  • Clause 41 The method of any of Clauses 31-40, wherein each supra-image represents a biopsy.
  • Clause 42 The method of any of Clauses 31-41 , wherein each image comprises a whole-slide image.
  • Clause 43 The method of any of Clauses 31-42, wherein each component comprises a 128-pixel-by-128-pixel square.
  • Clause 44 The method of any of Clauses 31-42, wherein each component comprises a feature vector.
  • Clause 45 The method of any of Clauses 31-44, wherein the plurality of pathological classes comprise a plurality of dermatopathological classes.
  • Clause 46 Computer readable storage comprising a representation of an electronic neural network produced by operations of any of Clauses 31-45.
  • Clause 47 An electronic computer comprising at least one electronic processor communicatively coupled to electronic persistent memory comprising instructions that, when executed by the at least one processor, configure the at least one processor to perform operations of any of Clauses 1-15 or 31-45.
  • Clause 48 At least one non-transitory computer readable storage medium comprising instructions that, when executed by at least one electronic processor, configure the at least one processor to perform operations of any of Clauses 1-15 or 31-45.
  • Certain embodiments can be performed using a computer program or set of programs.
  • The computer programs can exist in a variety of forms, both active and inactive.
  • For example, the computer programs can exist as software program(s) comprised of program instructions in source code, object code, executable code, or other formats; firmware program(s); or hardware description language (HDL) files. Any of the above can be embodied on a transitory or non-transitory computer readable medium, which includes storage devices and signals, in compressed or uncompressed form.
  • Exemplary computer readable storage devices include conventional computer system RAM (random access memory), ROM (read-only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

Examples may provide an electronic neural network that has been trained using at least one supra-image associated with a supra-image label indicating a pathological class by: forward propagating at least one batch of components through the electronic neural network, where the electronic neural network includes a plurality of task-specific branches, one task-specific branch corresponding to each of a plurality of binary pathological classification tasks; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, where the overall loss function includes a task-specific loss function for each task; and updating weights of the electronic neural network. The electronic neural network is configured to provide an output pathological class of the plurality of pathological classes for an input supra-image.

Description

TRAINING END-TO-END WEAKLY SUPERVISED NETWORKS IN A MULTI-TASK FASHION AT THE SPECIMEN (SUPRA-IMAGE) LEVEL
Related Application
[0001] This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/082,276, filed September 23, 2020, and entitled, “Training End-To-End Weakly Supervised Networks in a Multi-Task Fashion at the Specimen (Supra-Image) Level”, which is hereby incorporated by reference herein in its entirety.
Field
[0002] This disclosure relates generally to machine learning, e.g., in the context of pathology, such as dermatopathology.
Background
[0003] Much recent research has advanced the application of deep learning techniques to classification problems in digital pathology, satellite imaging, and other fields that use gigapixel images with weak labels (e.g., labels at the level of the supra-image). Until recently, deep learning techniques for these applications required time-consuming pixel-wise annotations of positive regions of interest within these images for training. Multiple-instance learning techniques now allow one or several images to be divided into patches or tiles, which are then treated as instances or components when training neural networks in this paradigm; only one label, at the image or specimen (or supra-image, comprising several images) level, is required.
[0004] A supra-image level weak label provides information for its supra-image rather than the individual constituent images. Labelling constituent images with the weak label from their supra-image and using them separately can introduce noise into the training process, because one constituent image may contain no features relevant to its label. That is, ignoring the connection of the constituent images that form a supra-image and simply using them separately to train a machine learning classifier produces inaccurate and unsatisfactory results.
[0005] In the past, weakly-supervised networks have been trained to perform a single task on a single image or image sub-region. The task is often either classification, where input data is sorted into one or more output classes, or regression, where a single number (e.g., a probability of a particular classification) is predicted based on the input data. If two classifications on the same input data are desired, then two networks, one for each classification, would be required, according to past techniques.
Summary
[0006] According to various embodiments, a computer-implemented method of classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks is presented. The method includes: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset including at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image including a plurality of images, each image corresponding to a plurality of components, where the training dataset provides at least one batch of components, where the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, where the electronic neural network includes a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch including a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, where each task-specific branch is configured to produce, for a given batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, where the overall loss function includes a task-specific loss function for each task of the plurality of binary pathological classification tasks, where task-specific loss functions for respective tasks are masked for batches of components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the 
revised weights, where the electronic neural network is configured to provide an output pathological class of the plurality of pathological classes in response to inputting the novel supra-image; receiving from the electronic neural network an output indicative of one of the plurality of pathological classes for the novel supra-image; and providing the output.
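The training step above masks the task-specific loss of any binary task whose class pair does not involve a batch's supra-image label. The following is an illustrative sketch only, not part of any claim; the task names and loss values are invented:

```python
# One-vs-one binary tasks over three invented risk classes.
TASKS = [("low", "medium"), ("medium", "high"), ("low", "high")]

def masked_overall_loss(task_losses, batch_label):
    """Sum the per-task losses into an overall loss, zeroing (masking) any
    task whose class pair does not involve the batch's supra-image label."""
    total = 0.0
    for (class_a, class_b), loss in zip(TASKS, task_losses):
        if batch_label in (class_a, class_b):  # task involves this label
            total += loss
    return total

# A batch labeled "low" trains only the low-vs-medium and low-vs-high heads;
# the medium-vs-high loss is masked out.
assert masked_overall_loss([0.7, 0.4, 0.2], "low") == 0.7 + 0.2
```

Backpropagating this masked sum lets every task-specific branch share one backbone while each branch only learns from batches relevant to its class pair.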
[0007] Various optional features of the above embodiments include the following. The plurality of binary pathological classification tasks can include: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk. The plurality of binary pathological classification tasks can include: atypical vs. benign, atypical vs. malignant, and benign vs. malignant. The plurality of binary pathological classification tasks can include: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score. The plurality of binary pathological classification tasks can include: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification. The plurality of binary pathological classification tasks can include: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis. The plurality of binary pathological classification tasks can include: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response. The plurality of pathological classes can consist of a number c of pathological classes, and the multiple pathological tasks can consist of a number c(c-1)/2 of binary classification tasks. Each component can include a feature vector. The plurality of pathological classes can include a plurality of dermatopathological classes.
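The c(c-1)/2 relationship above follows from taking every unordered pair of classes as one binary (one-vs-one) task. A minimal illustrative sketch, with invented class names:

```python
from itertools import combinations

def pairwise_tasks(classes):
    """Enumerate every one-vs-one binary classification task for the given
    pathological classes: c classes yield c*(c-1)/2 tasks."""
    return list(combinations(classes, 2))

# Three invented risk classes -> 3*(3-1)/2 = 3 binary tasks.
classes = ["low_risk", "medium_risk", "high_risk"]
tasks = pairwise_tasks(classes)
assert len(tasks) == len(classes) * (len(classes) - 1) // 2  # 3 tasks
```

Each enumerated pair corresponds to one task-specific branch of the network.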
[0008] According to various embodiments, a system for classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks is presented. The system includes a processor; and a memory communicatively coupled to the processor, the memory storing instructions which, when executed on the processor, perform operations including: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset including at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image including a plurality of images, each image corresponding to a plurality of components, where the training dataset provides at least one batch of components, where the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, where the electronic neural network includes a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch including a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, where each task-specific branch is configured to produce, for a given batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, where the overall loss function includes a task-specific loss function for each task of the plurality of binary pathological classification tasks, where task-specific loss functions for respective tasks are masked
for batches of components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the revised weights, where the electronic neural network is configured to provide an output pathological class of the plurality of pathological classes in response to inputting the novel supra-image; receiving from the electronic neural network an output indicative of one of the plurality of pathological classes for the novel supra-image; and providing the output.
[0009] Various optional features of the above embodiments include the following. The plurality of binary pathological classification tasks can include: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk. The plurality of binary pathological classification tasks can include: atypical vs. benign, atypical vs. malignant, and benign vs. malignant. The plurality of binary pathological classification tasks can include: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score. The plurality of binary pathological classification tasks can include: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification. The plurality of binary pathological classification tasks can include: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis. The plurality of binary pathological classification tasks can include: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response. The plurality of pathological classes can consist of a number c of pathological classes, and the multiple pathological tasks can consist of a number c(c-1)/2 of binary classification tasks. Each component can include a feature vector. The plurality of pathological classes can include a plurality of dermatopathological classes.
Drawings
[0010] The above and/or other aspects and advantages will become more apparent and more readily appreciated from the following detailed description of examples, taken in conjunction with the accompanying drawings, in which:
[0011] Fig. 1 is a schematic diagram depicting an example supra-image, its constituent images, a tiling of one of its constituent images, and vector representations of the tiles of the constituent image according to various embodiments;
[0012] Fig. 2 is a schematic diagram of an architecture of a system that uses a weakly-supervised neural network to perform multiple tasks according to various embodiments;
[0013] Fig. 3 is a flow diagram for a method of iteratively training, at the supra- image level, a neural network to classify supra-images according to various embodiments;
[0014] Fig. 4 is a flow diagram for a method of automatically classifying a supra- image according to various embodiments;
[0015] Fig. 5 is a schematic diagram of a hardware computer system suitable for implementing various embodiments;
[0016] Fig. 6 is a schematic diagram of the system architecture of an example reduction to practice;
[0017] Fig. 7 is a schematic diagram representing a hierarchical classification technique implemented by the reduction to practice of Fig. 6;
[0018] Fig. 8 depicts receiver operating characteristic curves for the neural networks implemented by the reduction to practice of Fig. 6;
[0019] Fig. 9 depicts a chart comparing reference lab performance on the same test set when trained on consensus and non-consensus data; and
[0020] Fig. 10 is a chart depicting mean and standard deviation sensitivity to melanoma versus percentage reviewed for 1,000 simulated sequentially accessioned datasets, drawn from reference lab confidence scores.
Description of the Embodiments
[0021] Reference will now be made in detail to example implementations. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the invention. The following description is, therefore, merely exemplary.
[0022] I. Introduction
[0023] Embodiments can use weakly-labeled supra-images to train a machine learning algorithm, such as an electronic neural network, in a manner that provides superior classification results in comparison to existing techniques. Moreover, some embodiments provide a single network that can be trained to perform multiple tasks on supra-images in a weakly supervised manner using multi-instance learning. While previous work in multiple-instance learning has trained a single network to perform a single task, such as classification or regression for single images or image subregions, some embodiments extend the multiple-instance framework for a single network to an arbitrary number of tasks, which can be trained at the same time as one another, and be used to predict different attributes of the supra-image input data simultaneously. Further, some embodiments extend multiple-instance learning for supra-images to an arbitrary variety of tasks, allowing the network to learn to both classify and regress, based on the same input data and weak label. Embodiments may be applied to provide pathology classifications in the medical field, to classify features in satellite images, or in any other field that involves detecting and/or classifying features in sets of images.
[0024] Fig. 1 is a schematic diagram 100 depicting an example supra-image 102, its constituent images 104, a tiling 108 of one of its constituent images 106, and vector representations 112 of the tiles of the constituent image 106 according to various embodiments. As used herein, the term “supra-image” includes one or more constituent images of a specimen. The specimen may be a medical specimen, a landscape specimen, or any other specimen amenable to image capture. For example, a supra-image may represent images from a single resection or biopsy (the supra-image) constituting several slides (the constituent images).
As another example, the supra-image may be a three-dimensional volume representing the results of a radiological scan such as a Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scan, and the constituent images may include two-dimensional slices of the three-dimensional volume. Within the domain of digital pathology, the images forming a supra-image may be of tissue stained with Hematoxylin and Eosin (H&E), and a label may be associated with the supra-image, for example, the diagnosis rendered by the pathologist. Frequently, more tissue is cut than can be scanned in a single slide, especially for suspected malignant cases, and several images may then share the same weak label. A supra-image may be of any type of specimen in any field, not limited to pathology, e.g., a set of satellite images.
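The hierarchy just described (one weak label per supra-image, with constituent images and their components carrying no labels of their own) can be pictured as a simple nested structure. An illustrative sketch only; the identifiers and label value are invented:

```python
# A supra-image holds constituent images; each image is broken into
# components (tiles or feature vectors); exactly one weak, specimen-level
# label attaches to the whole supra-image -- never to individual images.
supra_image = {
    "label": "malignant",  # hypothetical diagnosis-derived weak label
    "images": [
        {"id": "slide_1", "components": ["tile_0", "tile_1", "tile_2"]},
        {"id": "slide_2", "components": ["tile_0", "tile_1"]},
    ],
}

# No per-image label exists -- the label lives only at the top level.
assert "label" not in supra_image["images"][0]
```

This mirrors how a biopsy spanning several slides receives a single diagnosis for the specimen.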
[0025] As shown in Fig. 1, supra-image 102 may represent a three-dimensional volume, by way of non-limiting example. Supra-image 102 may be, for example, a representation of a CT or MRI scan. Images 104 represent the constituent images of supra-image 102. By way of non-limiting examples, images 104 may be slices derived from, or used to derive, a CT or MRI scan, or may be whole-slide images, e.g., representing multiple images from a biopsy of a single specimen.
[0026] In general, due to hardware volatile memory limitations, each constituent image of a supra-image may be broken down into a number of tiles when processed by a computer; each tile may be, e.g., 128 pixels by 128 pixels. As shown in Fig. 1, image 106 of constituent images 104 may be partitioned into tiles, such as tile 110, to form partitioned image 108.
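The tiling arithmetic implied above is just a ceiling division of the image dimensions by the tile size. A minimal sketch (the 128-pixel tile size follows the text; the example image dimensions are invented):

```python
def tile_grid(width, height, tile=128):
    """Return (rows, cols) of tile-size squares needed to cover a
    width x height image, padding partial tiles at the edges."""
    cols = -(-width // tile)   # ceiling division without math.ceil
    rows = -(-height // tile)
    return rows, cols

# e.g. a hypothetical 1000 x 600 constituent image needs 5 rows x 8 columns
# of 128x128 tiles (edge tiles padded).
assert tile_grid(1000, 600) == (5, 8)
```

Each resulting tile (or its feature vector) then becomes one instance for multiple-instance training.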
[0027] In general, an individual tile may be represented by one or more corresponding feature vectors. Such feature vectors may be obtained from tiles using a separate neural network, trained to produce feature vectors from tiles. Each such feature vector may encode the presence or absence of one or more features in the tile that it represents. Each feature vector may be in the form of a tuple of numbers. As shown in Fig. 1, feature vectors 112 represent the tiles of partitioned image 108. For example, feature vector 114 may correspond to and represent a presence or absence of a particular feature in tile 110.
[0028] Both tiles and their representative feature vectors are examples of “components” as that term is used herein. According to some embodiments, each component is implemented as a tile of a constituent image of a supra-image. According to some embodiments, each component is implemented as a vector, such as a feature vector, that represents and corresponds to a respective tile in a constituent image of a supra-image.
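The feature-vector form of a component described above can be modeled as a small immutable data type. A hypothetical sketch; the field names and feature semantics are invented, not from the disclosure:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FeatureVectorComponent:
    """A component implemented as a feature vector: a tuple of numbers,
    each encoding the presence/absence (or strength) of one feature in
    the tile it represents. Feature meanings below are invented."""
    tile_id: str
    features: Tuple[float, ...]  # e.g. (atypia, mitotic_activity, pigment)

fv = FeatureVectorComponent(tile_id="tile_110", features=(0.9, 0.1, 0.0))
assert len(fv.features) == 3
```

In practice such vectors would be emitted by the separate embedding network mentioned above, one (or more) per tile.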
[0029] While previous work in multiple-instance learning has been limited to training at the level of small image patches, or subsets of an image identified by a preprocessing step or network, this disclosure extends tile-based multiple-instance learning training to the supra-image level, which does not require selecting out small regions of interest. Moreover, this disclosure extends multiple-instance learning to an arbitrary number and type of classification tasks. While embodiments may be applied within the domain of digital pathology, the supra-image methods disclosed herein generalize to other fields with problems that involve several images with shared labels, such as time series of satellite images. These and other features and advantages are disclosed in detail herein.
[0030] II. Description of the Problem
[0031] Datasets that contain large numbers of high-resolution images, such as neural network training corpora, can be extremely costly to annotate in detail. A time-saving and cost-saving alternative to annotations is to supply weak labels to the images or supra-images, simply stating whether or not certain features are present.
[0032] In past work, weakly-supervised networks were trained to operate either only in the specific case of a weak label per-image, or using a downstream classifier or alternative numerical method to combine the output of a weakly-supervised classifier from the image level to the supra-image level. The former case restricts the usability of a trained network, while the latter relies on two models’ or methods’ performance to generate and combine image-level classifications to produce a representative supra-image level classification.
[0033] None of these prior methods of artificial intelligence training allow for training based on how diagnoses are made in clinical practice, where the pathologist renders a diagnosis for each specimen only, not for each individual slide pertaining to that specimen. This diagnosis may be stored in an electronic clinical records system, such as a Laboratory Information System (“LIS”), a Laboratory Information Management System (“LIMS”), or an Electronic Medical Record (“EMR”) system. By abstracting training to the specimen level, some embodiments provide a training method that may operate on diagnoses made straight from an electronic clinical records system, without the requirement of human intervention to label relevant slides. That is, some embodiments may use, as a training corpus, supra-images with weak labels taken from diagnoses stored in an electronic clinical records system.
[0034] Current frameworks for performing machine learning analyses of collections of images often involve training a neural network to predict numeric values or class labels for a given input. If two different predictions are required for a single input, it is possible to train a single model to perform both predictions, using backpropagation on the sum of their errors to try to achieve accuracy in each prediction. However, if the two prediction tasks are very different in nature, or one is significantly more challenging than the other, training a network in this way could be detrimental to the performance of both tasks. Multi-task learning attempts to account for this risk of detriment, where the model performs, and is judged on, one or more tasks at a time during training, but can perform all tasks during inference.
[0035] Multi-task learning often improves performance when compared to multiple models performing the tasks individually. This may be because the tasks can act as implicit regularizers, stopping one task from overfitting, since other tasks require the same input data representation. Also, some tasks may be easier than others to learn, and training them together can lead to a faster convergence on useful data representations for all of the tasks. Further, training models in a multi-task framework means that there are fewer models for a team/organization to verify and maintain.

[0036] III. Description of Example Embodiments
[0037] Some embodiments provide weakly-supervised multiple-instance multi-task learning at the supra-image level. Some embodiments train a neural network in a weakly-supervised fashion, using collections of components from images constituting supra-images as the input data, with a single label per collection. Some embodiments provide a trained neural network that is able to predict multiple different attributes of an input supra-image simultaneously. Some embodiments provide a trained network that is capable of performing an arbitrary number and variety of tasks, e.g., both classification and regression. Some embodiments utilize multi-task learning as a method specifically of handling noisy data labels (e.g., training corpora in which some small proportion of labels, such as less than 1%, are incorrect, or labels associated with phenomena that have no objective ground truth, such as cancer risk categories, where different human classifiers may arrive at different classifications for the same data).
[0038] Thus, some embodiments provide, for the first time, the ability to: implement weak supervision at the specimen-level, use multi-task learning to minimize the impact of noisy labels, and perform multiple tasks in a neural network to solve problems, e.g., in the domain of dermatopathology. These and other features and advantages are described in detail herein.
[0039] Fig. 2 is a schematic diagram of an architecture of a system 200 that uses a weakly-supervised neural network to perform multiple tasks according to various embodiments. As shown, system 200 includes a neural network with shared representation layers 204, followed by individual task-specific representation layers 206, each of which feeds into a respective instance (component) aggregation layer 208, each of which is coupled to a respective output layer 210. During training, the weights and biases of the various layers of system 200 are determined, as described in detail herein in reference to Fig. 3. During operation to process a novel supra-image (e.g., performing classification and/or regression), one or more attributes of a novel input supra-image are determined by the trained system 200, as described in detail herein in reference to Fig. 4. System 200 may be implemented using the hardware of system 500, as shown and described herein in reference to Fig. 5, for example.

[0040] To minimize the effects of label noise in a multi-class classification problem (e.g., for diagnostic stratification of melanocytic specimens in dermatopathology), some embodiments forego the typical multi-class formulation, in which the network is trained to accurately separate c classes simultaneously. Instead, for c ≥ 3 classes, some embodiments break the problem into c(c-1)/2 binary (i.e., two-class) classification tasks in a multi-task framework. In this way, such embodiments can divide the problem into tasks such that each subnetwork of the model can be trained to focus its attention on only the features necessary to distinguish between a single pair of classes at a time.
For example, rather than training a network to distinguish into three categories of, for example, high-risk, intermediate-risk, and low-risk specimens, some embodiments are trained to individually identify the boundaries between each pair of classes: high and intermediate, low and intermediate, and low and high. Identifying boundaries between binary classification tasks can improve performance when there is class-dependent noise in the set of labels used to train the model (e.g., an input labeled “high risk” may actually be “intermediate risk,” and an “intermediate risk” specimen may be mislabeled as “low risk,” but a “high risk” specimen is rarely labeled as “low risk”). Thus, for example, for any three classes A, B, and C that a supra-image can be alternatively classified as, system 200 may include a first output layer 212 that distinguishes between class A and class B, a second output layer 214 that distinguishes between class B and class C, and a third output layer 216 that distinguishes between class A and class C. More generally, system 200 may include c(c-1)/2 output layers 210 for classifying a supra-image into c ≥ 3 classes.
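By way of illustration only, the pairwise task decomposition described above may be sketched in a few lines; the function name, class names, and the use of Python are illustrative and not part of any claimed method.

```python
from itertools import combinations

def pairwise_binary_tasks(classes):
    """Decompose a c-class problem into c*(c-1)/2 binary classification
    tasks, one task per unordered pair of classes."""
    return list(combinations(classes, 2))

# Three risk classes yield 3 * 2 / 2 = 3 binary tasks.
tasks = pairwise_binary_tasks(["high", "intermediate", "low"])
```

For the three melanocytic risk classes this enumerates the high vs. intermediate, high vs. low, and intermediate vs. low boundaries; for c = 4 classes it would yield six tasks.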
[0041] Thus, by way of non-limiting example, an embodiment can classify a supra-image into one of three melanocytic risk classes, where melanocytic risk may be characterized as high for cancerous invasive melanoma, medium for melanoma in situ that has not spread past the dermis, and low for benign or dysplastic conditions. Other examples of three-class classifications suitable for implementation include: a first, second, and third Gleason score (e.g., Gleason 2-4, 5-7, or 8-10); a first, second, and third survival quantification (e.g., low, medium, or high risk, correlating to varying expected survival time in months, such as less than 3 months, 3 months to 12 months, or greater than 12 months); a first, second, and third prognosis (e.g., recovery, hospitalization, or death); and a first, second, and third drug response (e.g., nonresponder, moderate responder, strong responder). Note, however, that embodiments are not limited to three classes; any number of classes may be considered.
[0042] Note also that embodiments can utilize (and predict) more than one weak label per supra-image. For example, three branches of the network could be used to distinguish between melanocytic high/medium/low risk, while a fourth branch could be used to predict some other number, e.g., a survival quantification.
[0043] Due to electronic hardware memory limitations, some embodiments provide a framework in which each image in a supra-image is divided into a mosaic of tiles, e.g., squares of 128 pixels-per-side. A sampled collection of such tiles, or feature vector representations thereof, small enough to be stored in the available volatile memory of the training computer, and labeled with the label of the supra-image from which the tiles are obtained, may serve as a single element of the training corpus for weakly-supervised iterative training according to various embodiments. Multiple such labeled collections of components may comprise a full training corpus. No region-of-interest need be identified. An example iterative training technique that accommodates current hardware volatile memory limitations is shown and described presently in reference to Fig. 3.
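A minimal sketch of the tiling step follows; the toy image dimensions and the choice to drop partial tiles at the image edges are assumptions for illustration, since the text does not specify edge handling.

```python
def tile_grid(width, height, tile=128):
    """Enumerate top-left (x, y) coordinates of a mosaic of square tiles
    covering an image; partial tiles at the edges are dropped here."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

# A toy 512 x 384 image yields a 4 x 3 mosaic of 128-pixel tiles.
coords = tile_grid(512, 384)
```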
[0044] Fig. 3 is a flow diagram for a method 300 of iteratively training, at the supra-image level, a neural network to classify supra-images, according to various embodiments. Method 300 may be implemented using system architecture as shown and described herein in reference to Fig. 2, as instantiated by system 500, as shown and described herein in reference to Fig. 5.
[0045] During training, an embodiment iteratively accepts as input collections of components of a supra-image from a training corpus of supra-images. Current hardware (e.g., Graphics Processing Units or GPUs) commonly used to train neural networks cannot always hold all the image tiles from a supra-image or constituent image at once due to Random Access Memory (RAM) limitations. For example, each image of a supra-image is typically too large to feed into the hardware used to hold and train the deep learning neural network. Therefore, some embodiments train a weakly supervised neural network at the supra-image level, within these hardware limitations, by sampling (e.g., randomly sampling) components from constituent images of supra-images into collections of components that are close to the maximum size the hardware is able to hold in RAM.
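The sizing constraint can be made concrete with a small back-of-the-envelope calculation; the feature dimension, bytes per value, and memory budget below are hypothetical values, not values from this disclosure.

```python
def max_collection_size(ram_bytes, feature_dim=1024, bytes_per_value=4):
    """Estimate how many feature-vector components fit within a volatile
    memory budget (all parameter values here are illustrative)."""
    bytes_per_component = feature_dim * bytes_per_value
    return ram_bytes // bytes_per_component

# e.g., a 4 GiB budget with hypothetical 1024-dimensional float32 vectors:
n = max_collection_size(4 * 1024**3)
```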
[0046] The random sampling may not take into account which image from a supra-image the components are drawn from; components may be randomly drawn without replacement from a common pool for the supra-image. The sampling can be performed several times for a given supra-image, creating more than one collection to train with for a given supra-image. Multiple such collections may form a partition of a given supra-image; that is, the set-theoretic union of the collections from a single supra-image may cover the entire supra-image, and the set-theoretic intersection of such collections may be empty.
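The sampling-without-replacement scheme described above may be sketched as follows, assuming components have already been pooled across the supra-image's constituent images; the fixed seed is only to make the example deterministic.

```python
import random

def partition_into_collections(components, collection_size, seed=0):
    """Randomly draw components without replacement from a common pool,
    producing disjoint collections whose set-theoretic union covers the
    entire pool (a partition of the supra-image's components)."""
    pool = list(components)
    random.Random(seed).shuffle(pool)
    return [pool[i:i + collection_size]
            for i in range(0, len(pool), collection_size)]

collections = partition_into_collections(range(10), collection_size=4)
```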
[0047] Turning to Fig. 3, at block 302, method 300 accesses a training corpus of supra-images. The supra-images may be in any field of interest. The supra-images include or may be otherwise associated with weak labels. The supra-images and weak labels may be obtained from an electronic clinical records system, such as an LIS. The supra-images may be accessed over a network communication link, or from electronic persistent memory, by way of non-limiting examples. The training corpus may include hundreds, thousands, or even tens of thousands or more supra-images.
[0048] At 304, method 300 selects a batch of supra-images for processing. In general, the training corpus of supra-images with supra-image level labels to be used for training is divided into one or more batches of one or more supra-images. In general, during training, the loss incurred by the network is computed over all batches through the actions of 304, 306, 308, 310, 312, and 314. The losses over all of the batches are accumulated, e.g., according to the Overall Loss described in detail below, and then the weights and biases of the network are updated, at which point the accumulated loss is reset, and the process repeats until the iteration is complete.
[0049] At 306, method 300 samples, e.g., randomly samples, a collection of components from the batch of supra-images selected at 304. In general, each batch of supra-images is identified with a respective batch of collections of components, where each collection of components includes one or more components sampled, e.g., randomly sampled, from one or more images from a single supra-image in the batch of supra-images. Thus, the term “batch” may refer to both a batch of one or more supra-images and a corresponding batch of collections of components from the batch of one or more supra-images. Embodiments may not take into account which constituent image a given component in a collection comes from; components in the collection may be randomly drawn without replacement from a common pool for a given supra-image. Each collection of components is labeled according to the label(s) of the supra-image making up the images from which the components from the collection are drawn. The components may be tiles of images within the selected supra-image batch, or may be feature vectors representative thereof. The collections of components, when implemented as tiles, may form a partition of a given supra-image, and when implemented as vectors, the corresponding tiles may form a partition.
[0050] Embodiments may iterate through a single batch, i.e., a batch of collections of components, through the actions of 306, 308, and 310, until all components from the images of the supra-images for the batch are included in some collection of components that is forward propagated through the network. Embodiments may iterate through all of the batches through the actions of 304, 306, 308, 310, 312, and 314 to access the entire training dataset to completely train a network.
[0051] At 308, the collection of components sampled at 306 is forward propagated through the neural network to compute loss. Briefly, when a collection of components is forward propagated through the multiple-instance learning neural network, the network's prediction is compared to the weak label for the collection. The more incorrect the prediction, the larger the loss value. Such a loss value is accumulated each time a collection of components is propagated through the network, until all collections of components in the batch are used and the overall loss for that batch is determined. The actions outlined in this paragraph are elaborated upon and described in detail presently.
[0052] The network will have at least one layer for a shared data representation of a component, which is subsequently passed to the task-specific branches. Each task-specific branch could, in and of itself, represent a weakly-supervised neural network. It includes a number of neural network layers, followed by an aggregation of the component representations, and layers corresponding to the final output.

[0053] The prediction of a given task t and batch b is denoted ŷ_{b,t}. The sampled collection of components is presented to the network with a weak label y. A batch of N_b collections will have a list of weak labels y_b. The label y corresponds to the correct prediction for at least one of the tasks. Note that some labels may be irrelevant to some tasks.
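The shared-trunk-plus-branches structure can be illustrated schematically; a real embodiment would use learned neural network layers and richer aggregation, so the scalar weights and mean aggregation below are stand-in assumptions.

```python
def forward(components, shared_w, task_ws):
    """Schematic multi-task forward pass: a shared representation is
    computed per component, then each task-specific branch transforms and
    aggregates (here, by averaging) the component representations into a
    single prediction for that task."""
    shared = [shared_w * c for c in components]          # shared layers
    predictions = {}
    for task, w in task_ws.items():
        branch = [w * s for s in shared]                 # task-specific layers
        predictions[task] = sum(branch) / len(branch)    # instance aggregation
    return predictions

out = forward([1.0, 2.0, 3.0], shared_w=0.5,
              task_ws={"high_vs_med": 2.0, "med_vs_low": 4.0})
```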
[0054] Once all of the task predictions are obtained, the overall loss is determined. The overall loss may be characterized as follows, by way of non-limiting example:

L = Σ_{b=1}^{N_t} (1/N_b) Σ_t α_t β_{b,t} c_{b,t} H_t(y_{b,t}, ŷ_{b,t}) + λ Σ_{i=1}^{N_w} |w_i|²

[0055] In the Overall Loss formula, the term N_b represents the number of collections in a batch, the term N_t represents the number of batches in the training corpus, and the term N_w represents the number of weights in the network.
[0056] The term α_t represents the weight assigned to every prediction in task t. This can be thought of as the overall importance of a given task. This importance governs the extent to which each task contributes to the overall loss, and therefore the relative extent to which performance at each task is optimized during training. For example, in various embodiments, α_t may represent the ranked importance of each task in terms of clinical importance (e.g., which tasks are most critical). Other embodiments may use a larger α_t value for a task t that typically has lower performance in comparison to other tasks. (For example, the inventors have seen in practice that melanocytic high vs. medium risk is a more difficult task for the model to perform. If this performance is valued above other tasks, an embodiment could increase its α_t value or correspondingly decrease the α values associated with the other tasks.) Without prior knowledge of these requirements or otherwise, α_t may be set to one for all tasks t.
[0057] The term β_{b,t} represents the weight assigned on a batch-by-batch basis for task t. This corresponds to the importance of a given task-batch combination. For a binary classification task that does not correspond to the ground-truth class of data in a batch, this value can be set to zero to ignore the prediction of that task arm in the overall loss computation. For example, for the task of classifying between high-risk and medium-risk, batch data classified as low-risk may be masked by being weighted zero.
[0058] The term c_{b,t} H_t(y_{b,t}, ŷ_{b,t}) represents the weighted cost function for a particular task t. The cost function, on a basic level, calculates how wrong a prediction is (producing a larger number for a worse answer) in order for the loss function to update the model weights proportionally. In binary classification, there are several commonly used cost functions, such as binary cross-entropy. The parameters for this weighted function are ŷ_{b,t}, the predicted value, y_{b,t}, the ground truth, and c_{b,t}, the class weight associated with that task: the relative proportion of the number of times that task will be calculated, given the ground-truth values in the dataset. For example, for a dataset that is 80% label A, 10% label B, and 10% label C, where the tasks are binary classification from among these labels, the B vs. C task may be weighted more than the A vs. B and A vs. C tasks.
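For the 80/10/10 example above, inverse-frequency task weights might be computed as follows; the weighting rule is one plausible choice for illustration, not one prescribed by this disclosure.

```python
from itertools import combinations

def task_class_weights(label_counts):
    """Weight each pairwise binary task inversely to how often its pair of
    classes appears in the dataset, so rarer tasks are not drowned out."""
    total = sum(label_counts.values())
    weights = {}
    for a, b in combinations(sorted(label_counts), 2):
        share = (label_counts[a] + label_counts[b]) / total
        weights[(a, b)] = 1.0 / share
    return weights

w = task_class_weights({"A": 80, "B": 10, "C": 10})
```

Here the B vs. C task receives the largest weight, consistent with weighting it more heavily than the A vs. B and A vs. C tasks.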
[0059] The term λ represents the weight assigned to L2-regularization, which corresponds to stopping the model from overfitting by taxing the cumulative size of the weights in the network by this amount. The term Σ_{i=1}^{N_w} |w_i|² represents the squared L2 norm of the weights in the network, where i indexes weights and N_w is the total number of weights in the network.
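The Overall Loss can be sketched numerically as follows, with binary cross-entropy as the cost H_t; the data layout and the masked second task (β = 0) are illustrative assumptions.

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, one common choice for the per-task cost H_t."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))

def overall_loss(batches, alpha, lam, net_weights):
    """Schematic overall loss: task weight alpha_t, per-batch mask
    beta_{b,t}, class weight c_{b,t}, cost H_t, plus an L2 penalty on the
    network weights."""
    loss = 0.0
    for batch in batches:  # each batch maps task -> (y, y_hat, beta, c)
        for task, (y, y_hat, beta, c) in batch.items():
            loss += alpha[task] * beta * c * bce(y, y_hat)
    return loss + lam * sum(w * w for w in net_weights)

batch = {"high_vs_low": (1.0, 0.9, 1.0, 1.0),   # relevant task
         "high_vs_med": (1.0, 0.3, 0.0, 1.0)}   # masked out: beta = 0
alphas = {"high_vs_low": 1.0, "high_vs_med": 1.0}
L = overall_loss([batch], alphas, lam=0.0, net_weights=[])
```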
[0060] At 310, method 300 determines whether there are additional collections of components from the batch selected at 304 that have not yet been processed. If so, control reverts to 306, where another collection of components is selected for processing as described above. If not, then control passes to 312.
[0061] At 312, method 300 back propagates the accumulated loss to update the weights and biases of the neural network. That is, after iterating through the collections of components from a single batch, the neural network weights and biases are updated according to the magnitude of the aggregated loss. Method 300 may implement gradient descent to perform the actions of 312. The actions of 312 may repeat over all batches in the dataset.
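The accumulate-then-update pattern of blocks 306 through 312 may be illustrated with a toy scalar model; the model ŷ = w·x, the squared-error loss, and the learning rate are hypothetical stand-ins for the full network.

```python
def train_batch(w, collections, lr=0.1):
    """Accumulate the loss gradient over every collection in a batch, then
    apply a single gradient-descent update to the weight (toy model
    y_hat = w * x with squared-error loss)."""
    grad = 0.0
    for x, y in collections:              # forward passes accumulate loss
        y_hat = w * x
        grad += 2.0 * (y_hat - y) * x     # d/dw of (y_hat - y)**2
    return w - lr * grad                  # one update per batch

w_updated = train_batch(0.0, [(1.0, 2.0), (2.0, 4.0)])
```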
[0062] Thus, at 314, method 300 determines whether there are additional batches of supra-images from the training corpus accessed at 302 that have not yet been processed during the current iteration. Embodiments may iterate over the batches to access the entire training dataset. If additional batches exist, then control reverts to 304, where another batch of one or more supra-images is selected. Otherwise, control passes to 316. The repetitions may continue, e.g., until convergence, in order to train the network for optimal performance across all tasks.
[0063] At 316, once all collections of components from all batches of supra- images are processed according to 304, 306, 308, 310, 312, and 314, a determination is made as to whether an additional epoch is to be performed. In general, each iteration over all batches of supra-images in the training corpus may be referred to as an “epoch”. Embodiments may train the neural networks for hundreds, or even thousands or more, of epochs.
[0064] At 318, method 300 provides the neural network that has been trained using the training corpus accessed at 302. Method 300 may provide the trained neural network in a variety of ways. According to some embodiments, the trained neural network is stored in electronic persistent memory. According to some embodiments, the neural network is made available on a network, such as the internet. According to some such embodiments, an interface to the trained neural network is provided, such as a Graphical User Interface (GUI) or Application Program Interface (API).
[0065] Fig. 4 is a flow diagram for a method 400 of automatically classifying a supra-image according to various embodiments. Method 400 may use a neural network trained according to method 300 as shown and described herein in reference to Fig. 3. Method 400 may be implemented by system 500, as shown and described herein in reference to Fig. 5.
[0066] At 402, a supra-image is obtained. The supra-image may be in any field. The supra-image may be obtained over a network link or by retrieval from persistent storage, by way of non-limiting example.
[0067] At 404, the neural network is applied to the supra-image obtained at 402. To do so, the supra-image may be broken down into parts (e.g., components or sets of components) and the parts may be individually passed through the network up to a particular layer, where the features from the various parts are aggregated, and then the parts are passed through to a further particular layer, where the features are again aggregated, until all parts are iteratively passed and all features aggregated such that one or more outputs are produced.

[0068] Three (or more) output layers may be present (e.g., as shown and described herein in reference to Fig. 2). In operation, each such layer provides an output. These outputs may be independently useful. That is, depending on usage, it is possible in some cases that all outputs are of interest. Alternatively, the multiple outputs may be synthesized to produce a final, single output. According to some embodiments, e.g., that assess cancer or other risk, post-processing of the output layers’ output is instituted to obtain a single output from the network, where the single output reflects (broadly) a score, e.g., a severity score, between zero and one, inclusive. According to such embodiments, the higher the score, the more likely the specimen is (according to the model) to contain an invasive melanoma (rather than a benign nevus, at the other end of the spectrum). Such embodiments may obtain the score by synthesizing an output from two or more of the multiple tasks. For example, some embodiments synthesize the outputs from tasks low-risk vs. high-risk and low-risk vs. intermediate-risk. Such embodiments may take the maximum score of these two tasks to assign a severity score, and use logic based on these scores to assign an output classification, e.g., Melanocytic (lower risk), Suspect, or High Risk.
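The synthesis of a single severity score from two task outputs may be sketched as follows; the numeric thresholds are hypothetical, as the text does not specify them.

```python
def severity_and_class(task_scores, suspect_at=0.5, high_at=0.8):
    """Take the maximum of the low-vs-high and low-vs-intermediate task
    scores as the severity score, then map it to an output classification
    (threshold values are illustrative assumptions)."""
    severity = max(task_scores["low_vs_high"],
                   task_scores["low_vs_intermediate"])
    if severity >= high_at:
        label = "High Risk"
    elif severity >= suspect_at:
        label = "Suspect"
    else:
        label = "Melanocytic (lower risk)"
    return severity, label

score, label = severity_and_class(
    {"low_vs_high": 0.9, "low_vs_intermediate": 0.4})
```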
[0069] At 406, method 400 provides the output. The output may be provided by displaying a corresponding datum to a user of method 400, e.g., on a computer monitor. Such a datum may indicate the presence or absence of a feature of interest in the supra-image, by way of non-limiting example.
[0070] Fig. 5 is a schematic diagram of a hardware computer system 500 suitable for implementing various embodiments. For example, Fig. 5 illustrates various hardware, software, and other resources that can be used in implementations of system 200 as shown and described herein in reference to Fig. 2, method 300 as shown and described herein in reference to Fig. 3, and/or method 400 as shown and described herein in reference to Fig. 4. System 500 includes training corpus source 502 and computer 501. Training corpus source 502 and computer 501 may be communicatively coupled by way of one or more networks 504, e.g., the internet.
[0071] Training corpus source 502 may include an electronic clinical records system, such as an LIS, a database, a compendium of clinical data, or any other source of supra-images suitable for use as a training corpus as disclosed herein.

[0072] Computer 501 may be implemented as a desktop computer or a laptop computer, may be incorporated in one or more servers, clusters, or other computers or hardware resources, or may be implemented using cloud-based resources. Computer 501 includes volatile memory 514 and persistent memory 512, the latter of which can store computer-readable instructions that, when executed by electronic processor 510, configure computer 501 to implement system 200 and/or to perform methods 300 and/or 400, as shown and described herein. Computer 501 further includes network interface 508, which communicatively couples computer 501 to training corpus source 502 via network 504. Other configurations of system 500, associated network connections, and other hardware, software, and service resources are possible.
[0073] IV. Example Reduction to Practice
[0074] This Section presents an example reduction to practice. The example reduction to practice is configured to perform hierarchical classification of digitized whole-slide image specimens into six classes defined by their morphological characteristics, including classification of “Melanocytic Suspect” specimens likely representing melanoma or severe dysplastic nevi. The reduction to practice was trained on 7,685 images from a single lab (the reference lab), including the largest set of triple-concordant melanocytic specimens compiled to date, and was tested on 5,099 images from two distinct validation labs. The reduction to practice achieved Area Underneath the Receiver Operating Characteristics Curve (AUC) values of 0.93 classifying Melanocytic Suspect specimens on the reference lab, 0.95 on the first validation lab, and 0.82 on the second validation lab. The reduction to practice is capable of automatically sorting and triaging skin specimens with high sensitivity to Melanocytic Suspect cases, such that a pathologist would only need to review between 30% and 60% of the caseload to address all melanoma specimens.
[0075] A. Introduction to the Reduction to Practice
[0076] More than five million diagnoses of skin cancer are made each year in the United States, about 106,000 of which are melanoma of the skin. Diagnosis requires microscopic examination of H&E-stained, paraffin-wax-embedded biopsies of skin lesion specimens on glass slides. These slides can be manually observed under a microscope, or digitally on a whole-slide image scanned on specialty hardware.

[0077] The five-year survival rate of patients with metastatic malignant melanoma is less than 20%. Melanoma occurs more rarely than several other types of skin cancer, and its diagnosis is challenging, as evidenced by a high discordance rate among pathologists when distinguishing between melanoma and benign melanocytic lesions (~40% discordance rate). The Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis (MPATH-Dx; “MPATH” hereafter) reporting schema was introduced by Piepkorn, et al., The mpath-dx reporting schema for melanocytic proliferations and melanoma, Journal of the American Academy of Dermatology, 70(1):131-141, 2014 to provide a precise and consistent framework for dermatopathologists to grade the severity of melanocytic proliferation in a specimen. MPATH scores are enumerated from I to V, with I denoting a benign melanocytic lesion and V denoting invasive melanoma. It has been shown that discordance rates are related to the MPATH score, with better inter-observer agreement on both ends of the scale than in the middle.
[0078] A tool that allows labs to sort and prioritize melanoma cases in advance of pathologist review could improve turnaround time, allowing pathologists to review cases requiring faster turnaround time early in the day. This is particularly important as shorter turnaround time is correlated with improved overall survival for melanoma patients. It could also alleviate common lab bottlenecks such as referring cases to specialized dermatopathologists, or ordering additional tissue staining beyond the standard H&E. These contributions are especially important as the number of skin biopsies performed per year has skyrocketed, while the number of practicing pathologists has declined.
[0079] The advent of digital pathology has brought the revolution in machine learning and artificial intelligence to bear on a variety of tasks common to pathology labs. Several deep learning algorithms have been introduced to distinguish between different skin cancers and healthy tissue with very high accuracy. See, e.g., De Logu, et al., Recognition of cutaneous melanoma on digitized histopathological slides via artificial intelligence algorithm, Frontiers in Oncology, 10, 2020; Thomas, et al., Interpretable deep learning systems for multi-class segmentation and classification of nonmelanoma skin cancer, Medical Image Analysis, 68:101915, 2021; Zormpas-Petridis, et al., Superhistopath: A deep learning pipeline for mapping tumor heterogeneity on low-resolution whole-slide digital histopathology images, Frontiers in Oncology, 10:3052, 2021; and Geijs, et al., End-to-end classification on basal-cell carcinoma histopathology whole-slides images, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, February 2021. However, almost all of these studies fail to demonstrate the robustness required for use in a clinical workflow setting because they were tested on a small number (fewer than ~1,000) of whole-slide images. Moreover, these algorithms are often not capable of triaging whole-slide images, as they use curated training and test datasets that do not represent the diversity of cases encountered in a dermatopathology lab. Many of them rely on pixel-level annotations to train their models, which is slow and expensive to scale to a large dataset with greater variability.
[0080] Considerable advancements have been made towards systems capable of use in clinical practice for prostate cancer. In Campanella, et al., Clinical-grade computational pathology using weakly supervised deep learning on whole-slide images, Nature Medicine, 25(8):1301-1309, 2019, the authors trained a model in a weakly-supervised framework that did not require pixel-level annotations to classify prostate cancer and validated on ~10,000 whole-slide images sourced from multiple countries. However, some degree of human-in-the-loop curation was performed on their dataset, including manual quality control such as post-hoc removal of slides with pen ink from the study. Pantanowitz, et al., An artificial intelligence algorithm for prostate cancer diagnosis in whole-slide images of core needle biopsies: a blinded clinical validation and deployment study, The Lancet Digital Health, 2(8):e407-e416, 2020 describes using pixel-wise annotations to develop a model trained on ~550 whole-slide images that distinguishes high-grade from low-grade prostate cancer. In dermatopathology, the model developed in Ianni, et al., Tailored for real-world: A whole-slide image classification system validated on uncurated multi-site data emulating the prospective pathology workload, Nature Scientific Reports, 10(1):1-12, 2020, hereinafter, “Ianni 2020”, classified skin lesion specimens between four morphology-based groups, was tested on ~13,500 whole-slide images, and also demonstrated that use of confidence thresholding could provide a high accuracy; however, it grouped malignant melanoma with benign melanocytic lesions, limiting its potential uses. Additionally, all previous attempts at pathology classification using deep learning have, at their greatest level of abstraction, performed classification at the level of a whole-slide image or a sub-region of a whole-slide image.
Because a pathologist is required to review all whole-slide images from a tissue specimen, previous deep learning pathology efforts do not leverage the same visual information that a pathologist would have at hand to perform a diagnosis, require some curation of datasets to ensure that pathology is present in all training slides, and implement ad-hoc rules for combining the predictions of each whole-slide image corresponding to a specimen. Most have also neglected the effect of diagnostic discordance on their ground truth, resulting in potentially mislabeled training and testing data.
[0081] Thus, this Section presents a reduction to practice that can classify skin cases for triage and prioritization prior to pathologist review. Unlike previous systems, the reduction to practice performs hierarchical melanocytic specimen classification into Low (MPATH I-II), Intermediate (MPATH III), or High (MPATH IV-V) diagnostic categories, allowing for prioritization of melanoma cases. The reduction to practice was the first to classify skin biopsies at the specimen level, through a collection of whole-slide images that represents the entirety of the tissue from a single specimen, i.e., a supra-image. This training procedure is analogous to the process of a dermatopathologist, who reviews the full collection of scanned whole-slide images corresponding to a specimen to make a diagnosis. Finally, the reduction to practice was trained and validated on the largest dataset of consensus-reviewed melanocytic specimens published to date. The reduction to practice was built to be scalable and ready for the real world: it was built without any pixel-level annotations and incorporates the automatic removal of scanning artifacts.
[0082] B. Reference and Validation Lab Data Collection
[0083] The reduction to practice was trained using slides from 3511 specimens (consisting of 7685 whole-slide images) collected from a leading dermatopathology lab in a top academic medical center (Department of Dermatology at University of Florida College of Medicine), which is referred to as the "Reference Lab". The Reference Lab dataset consisted of both an uninterrupted series of sequentially-accessioned cases (69% of total specimens) and a targeted set, curated to enrich for rarer melanocytic pathologies (31% of total specimens). Melanocytic specimens were only included in this set if three dermatopathologists' consensus on diagnosis could be established. The whole-slide images consisted exclusively of H&E-stained, formalin-fixed, paraffin-embedded dermatopathology tissue and were scanned using a 3DHistech P250 High Capacity Slide Scanner at an objective power of 20X, corresponding to 0.24 µm/pixel. The final classification given by the reduction to practice was one of six classes, defined by their morphologic characteristics:
[0084] 1. Basaloid: containing abnormal proliferations of basaloid-oval cells, primarily basal cell carcinoma of various types;
[0085] 2. Squamous: containing malignant squamoid epithelial proliferations, consisting primarily of squamous cell carcinoma (invasive and in situ);
[0086] 3. Melanocytic Low Risk: benign to moderately atypical melanocytic nevi/proliferations of cells of melanocytic origin, classified as the MPATH I or MPATH II diagnostic category;
[0087] 4. Melanocytic Intermediate Risk: severely atypical melanocytic nevi or melanoma in situ, classified as the MPATH III diagnostic category;
[0088] 5. Melanocytic High Risk: invasive melanoma, classified as the MPATH IV or V diagnostic category; or
[0089] 6. Other: all skin specimens that do not fit into the above classes, including but not limited to inflammatory conditions and benign proliferations of squamoid epithelial cells.
[0090] The overall reference set was composed of 544 Basaloid, 530 Squamous, 1079 Melanocytic, and 1358 Other specimens. Of the Melanocytic specimens, 764 were Low Risk, 213 were Intermediate Risk, and 102 were High Risk. The heterogeneity of this reference set is illustrated in Table 1, below.
Table 1: Counts of each of the general pathologies in the reference set from the Reference Lab, broken out into specific diagnostic entities
[0091] The specimen counts presented herein for the melanocytic classes reflect counts following three-way consensus review (see Section IV(C)). For training, validating, and testing the reduction to practice, this dataset was divided into three partitions by sampling at random without replacement with 70% of specimens used for training, and 15% used for each of validation and testing.
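By way of illustration only, the described partitioning of the dataset into 70% training, 15% validation, and 15% test by random sampling without replacement may be sketched as follows (the function name and the seed parameter are illustrative, not part of the disclosed implementation):

```python
import random

def split_specimens(specimen_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Partition specimen IDs into train/validation/test sets by shuffling
    (i.e., sampling at random without replacement) and slicing 70/15/15."""
    rng = random.Random(seed)
    ids = list(specimen_ids)
    rng.shuffle(ids)
    n_train = int(round(train_frac * len(ids)))
    n_val = int(round(val_frac * len(ids)))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Because the split is performed at the specimen (supra-image) level, all whole-slide images belonging to one specimen land in the same partition.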
[0092] To validate performance and generalizability across labs, scanners, and associated histopathology protocols, several large datasets of similar composition to the Reference Lab dataset were collected from the leading dermatopathology labs of two additional top academic medical centers (Jefferson Dermatopathology Center, Department of Dermatology and Cutaneous Biology, Thomas Jefferson University, denoted "Validation Lab 1", and the Department of Pathology and Laboratory Medicine at Cedars-Sinai Medical Center, denoted "Validation Lab 2"). Both datasets comprised: (1) an uninterrupted set of sequentially-accessioned cases (65% for Validation Lab 1, 24% for Validation Lab 2), and (2) a set targeted to heavily sample melanoma, pathologic entities that mimic melanoma, and other rare melanocytic specimens. Specimens from Validation Lab 1 consisted of slides from 2795 specimens (3033 whole-slide images), scanned using a 3DHistech P250 High Capacity Slide Scanner at an objective power of 20X (0.24 µm/pixel). Specimens from Validation Lab 2 consisted of slides from 2066 specimens (2066 whole-slide images; each specimen represented by a single whole-slide image), scanned using a Ventana DP 200 scanner at an objective power of 20X (0.47 µm/pixel). Note: the specimen and whole-slide image counts above reflect specimens included in the study after screening melanocytic specimens for inter-pathologist consensus. Table 2 shows the class distribution for the Validation Lab datasets.
Table 2: Class counts for the Validation Lab datasets
[0093] C. Consensus Review
[0094] There are high discordance rates in diagnosing melanocytic specimens. Elmore et al. [4] studied 240 dermatopathology cases and found that the consensus rate for MPATH Class II lesions was 25%, for MPATH Class III lesions 40%, and for MPATH Class IV 45%. Therefore, three board-certified pathologists reviewed each melanocytic specimen to establish a reliable ground truth for melanocytic cases in the implementation of the reduction to practice described herein. The first review was the original specimen diagnosis made via glass slide examination under a microscope. Two additional dermatopathologists independently reviewed and rendered a diagnosis digitally for each melanocytic specimen. The patient’s year of birth and gender were provided with each specimen upon review. Melanocytic specimens were considered to have a consensus diagnosis and included in the study if:
[0095] 1. All three dermatopathologists were in consensus on a diagnostic class for the specimen, or
[0096] 2. Two of three dermatopathologists were in consensus on a diagnostic class for the specimen, and a fourth and fifth pathologist reviewed the specimen digitally and both agreed with the majority classification.
[0097] A diagnosis was rendered in the above fashion for every melanocytic specimen obtained from the Reference Lab and Validation Lab 1. All dysplastic and malignant melanocytic specimens from Validation Lab 2 were reviewed by three dermatopathologists, and only the specimens for which consensus could be established were included in the study. No non-melanocytic specimens were reviewed for concordance due to inherently lower known rates of discordance.
[0098] For the specimens obtained from the Reference Lab, consensus was established for 75% of specimens originally diagnosed as MPATH I/II, 66% of those diagnosed as MPATH III, 87% of those diagnosed as MPATH IV/V, and for 74% of the reviewed specimens in total. For specimens obtained from Validation Lab 1, pathologist consensus was established for 84% of specimens originally diagnosed as MPATH I/II, 51% of those diagnosed as MPATH III, 54% of those diagnosed as MPATH IV/V, and for 61% of the reviewed specimens in total.
[0099] D. Reduction to Practice System Architecture
[00100] Fig. 6 is a schematic diagram of the system architecture 600 of an example reduction to practice. The reduction to practice includes three main components: quality control 610, feature extraction 620, and hierarchical classification 630. A brief description of how the reduction to practice was used to classify a novel supra-image follows. Each specimen 602, a supra-image, was first segmented into tissue-containing regions, subdivided into 128x128 pixel tiles by tiling 604, and extracted at an objective power of 10X. Each tile was passed through the quality control 610, which includes ink filtering 612, blur filtering 616, and image adaptation 614. The image-adapted tiles were then passed through the feature extraction 620 stage, including a pretrained ResNet50 network 622, to obtain embedded vectors 624 as components corresponding to the tiles. Next, the embedded vectors 624 were propagated through the hierarchical classification 630 stage, including an upstream neural network 632 performing a binary classification between “Melanocytic Suspect” and “Rest”. Specimens that were classified as “Melanocytic Suspect” were fed into a first downstream neural network 634, which classified between “Melanocytic High Risk, Melanocytic Intermediate Risk” and “Rest”. The remaining specimens were fed into a second downstream “Rest” neural network 636, which classified between “Basaloid, Squamous, Melanocytic Low Risk” and “Other”. This classification process of the reduction to practice is described in detail presently.
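The tiling step described above may be sketched as follows. This is an illustrative reconstruction only: the function name and the decision to discard partial edge tiles are assumptions, and the tissue segmentation step is assumed to have already cropped the input array to a tissue-containing region.

```python
import numpy as np

def tile_image(image, tile_size=128):
    """Subdivide an H x W x C image array into non-overlapping
    tile_size x tile_size tiles (128x128 in the described system),
    discarding partial tiles at the right and bottom edges."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(image[y:y + tile_size, x:x + tile_size])
    return tiles
```

Each resulting tile then flows through the quality control and feature extraction stages described below.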
[00101] Quality control 610 included ink filtering 612, blur filtering 616, and image adaptation 614. Pen ink, used to mark the location of possible malignancy, is common on slides from labs migrating their workload from glass slides to whole-slide images. This pen ink represented a biased distractor signal in training the reduction to practice, as it is highly correlated with malignant or High Risk pathologies. Tiles containing pen ink were identified by a weakly supervised neural network trained to detect inked slides. These tiles were removed from the training and validation data and before inference on the test set. Areas of the image that were out of focus due to scanning errors were also removed to the extent possible by blur filtering 616, by setting a threshold on the variance of the Laplacian over each tile. In order to avoid domain shift between the colors of the training data and validation data, the reduction to practice adopted as its image adaptation 614 the image adaptation procedure in Ianni 2020.
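A minimal sketch of the described blur filter, which thresholds the variance of the Laplacian over each tile, might look as follows. The 3x3 kernel, the numpy-only convolution, and the threshold value are illustrative assumptions; the actual cutoff would have been tuned on the study data.

```python
import numpy as np

# Standard 3x3 discrete Laplacian kernel (an assumption; any Laplacian
# approximation yields a comparable focus measure).
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def laplacian_variance(tile):
    """Variance of the Laplacian of a 2-D grayscale tile; low values
    indicate a blurry (out-of-focus) tile."""
    h, w = tile.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):          # valid-mode convolution via shifted sums
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * tile[dy:dy + h - 2, dx:dx + w - 2]
    return out.var()

def keep_sharp(tiles, threshold=100.0):
    """Discard tiles whose Laplacian variance falls below threshold."""
    return [t for t in tiles if laplacian_variance(t) >= threshold]
```

In practice a library routine such as OpenCV's Laplacian would typically replace the hand-rolled convolution.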
[00102] The next component of the reduction to practice, feature extraction 620, extracted informative features from the quality controlled, color-standardized tiles. To capture higher-level features in these tiles, they were propagated through a neural network (ResNet50; He, et al., Deep residual learning for image recognition, arXiv preprint arXiv: 1512.03385, 2015) trained on the ImageNet (Deng, et al., Imagenet: A large-scale hierarchical image database, In IEEE Conference on Computer Vision and Pattern Recognition, pages 248-255, 2009) dataset to embed each input tile into 1024 channel vectors which were then used in subsequent neural networks.
[00103] The hierarchical neural network architecture was developed in order to classify both Melanocytic High and Intermediate Risk specimens with high sensitivity. First, the upstream neural network 632 performed a binary classification between “Melanocytic Suspect” (defined as “High or Intermediate Risk”) and “Basaloid, Squamous, Low Risk”, or “Other” (which are collectively defined as the “Rest” class). Specimens that were classified as “Melanocytic Suspect” were fed into the downstream neural network 634, which further classified the specimen between “Melanocytic High Risk, Melanocytic Intermediate Risk” and “Rest”. The remaining specimens, classified as “Rest”, were fed into a separate downstream neural network 636, which further classified the specimen between “Basaloid, Squamous, Melanocytic Low Risk” and “Other”. Each neural network 632, 634, 636 included four fully-connected layers (two layers of 1024 channels each, followed by two of 512 channels each). Each neuron in the three layers after the input layer was ReLU activated.
[00104] The three neural networks 632, 634, 636 in the hierarchy were trained under a weakly-supervised multiple-instance learning (MIL) paradigm. Each embedded tile was treated as an instance of a bag containing all quality-assured tiles of a specimen. Embedded tiles were aggregated using sigmoid-activated attention heads. To help prevent over-fitting, the training dataset included augmented versions of the tiles. Augmentations were generated with the following augmentation strategies: random variations in brightness, hue, contrast, and saturation (up to a maximum of 15%), Gaussian noise with 0.001 variance, and random 90° image rotations. The upstream binary “Melanocytic Suspect vs. Rest” classification neural network 632 and the downstream “Rest” subclassifier neural network 636 were each trained end-to-end with cross-entropy loss. The “Melanocytic Suspect” subclassifier neural network 634 was also trained with cross-entropy loss, but with a multi-task learning strategy. This subclassifier neural network 634 was presented with three tasks: differentiating “Melanocytic High Risk” from “Melanocytic Intermediate Risk” specimens, “Melanocytic High Risk” from “Rest” specimens, and “Melanocytic Intermediate Risk” from “Rest” specimens. The training loss for this subclassifier neural network 634 was computed for each task, but was masked if it did not relate to the ground truth label of the specimen. Two out of three tasks were trained for any given specimen in a training batch. By training in this manner, the shared network layers were used as a generic representation of melanocytic pathologies, while the task branches learned to attend to specific differences to accomplish their tasks.
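The masked multi-task loss described above may be sketched as follows. The class names, the probability-vector representation, and the helper functions are illustrative; the actual implementation would operate on network logits inside the training loop. The key point is that a task whose class pair does not include the specimen's ground-truth label contributes no gradient, so exactly two of the three tasks are trained per specimen.

```python
import numpy as np

# The three pairwise tasks of the Melanocytic Suspect subclassifier.
TASKS = [("high", "intermediate"), ("high", "rest"), ("intermediate", "rest")]

def cross_entropy(p, label_idx):
    """Binary cross-entropy for a length-2 probability vector."""
    return -np.log(p[label_idx] + 1e-12)

def multitask_loss(task_probs, true_class):
    """Sum cross-entropy over the pairwise tasks, masking any task whose
    class pair does not include the specimen's ground-truth class.
    task_probs maps each task tuple -> length-2 probability vector."""
    total = 0.0
    for task in TASKS:
        if true_class not in task:   # mask: task unrelated to this label
            continue
        total += cross_entropy(task_probs[task], task.index(true_class))
    return total
```

For a "high" specimen, only the ("high", "intermediate") and ("high", "rest") tasks contribute to the loss, mirroring the two-of-three masking described above.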
[00105] Fig. 7 is a schematic diagram representing a hierarchical classification technique 700 implemented by the reduction to practice of Fig. 6. For example, the hierarchical classification technique 700 may be implemented by hierarchical classification 630 as shown and described above in reference to Fig. 6. Thus, Fig. 7 depicts Melanocytic Suspect subclassifier 734, corresponding to the first downstream neural network 634 of Fig. 6, and depicts Rest subclassifier 736, corresponding to the second downstream neural network 636 of Fig. 6. During inference, the predicted classes of an input specimen 702 (e.g., a supra-image) were computed as follows:
[00106] 1. The larger of the two confidence values 704 (see below for the confidence thresholding procedure) output from the upstream classifier determined which downstream classifier a specimen was passed to.
[00107] 2. If the specimen was handed to the “Rest” subclassifier 736, the highest confidence class probability was used as the predicted label.
[00108] 3. If the specimen was handed to the Melanocytic Suspect subclassifier 734, the highest confidence class probability between the “Melanocytic High Risk vs Rest” and “Melanocytic Intermediate Risk vs Rest” tasks was used as the predicted label.
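The routing logic of steps 1-3 above can be sketched as follows. The dictionary-based probability representation and the function name are illustrative, and the confidence thresholding described in the next paragraph is omitted here.

```python
def route_specimen(upstream_probs, suspect_probs, rest_probs):
    """Route a specimen through the two-level hierarchy.
    upstream_probs: {"suspect": p, "rest": p} from the upstream binary classifier.
    suspect_probs: class probabilities from the Melanocytic Suspect subclassifier
    (the highest between its "High Risk vs Rest" and "Intermediate Risk vs Rest"
    tasks would populate this mapping).
    rest_probs: class probabilities from the Rest subclassifier.
    Returns (branch, predicted_label)."""
    if upstream_probs["suspect"] >= upstream_probs["rest"]:
        branch, probs = "suspect", suspect_probs
    else:
        branch, probs = "rest", rest_probs
    # Step 2/3: highest-confidence class of the chosen subclassifier wins.
    label = max(probs, key=probs.get)
    return branch, label
```

A suspect-dominant upstream output thus sends the specimen to the Melanocytic Suspect subclassifier; otherwise the Rest subclassifier's top class is used.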
[00109] As an additional step in the classification pipeline, the hierarchical classification technique 700 performed classification with uncertainty quantification to establish a confidence score for each prediction, using a Monte Carlo dropout method following a procedure similar to that used by Gal et al., Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, In International Conference on Machine Learning, pages 1050-1059, 2016. Using the confidence distribution of the specimens in the validation set of the Reference Lab, the hierarchical classification technique 700 computed confidence threshold values for each predicted class following the procedure outlined in Ianni 2020, by requiring classifications to meet a predefined level of accuracy in the validation set. Specimens had to pass two confidence thresholds, an accuracy threshold 712 and a PPV threshold 714, both established a priori on the validation set, in order to be predicted as “Melanocytic High Risk”. Specimens that would otherwise have been predicted as “Melanocytic High Risk” but failed to meet these thresholds were instead predicted as “Melanocytic Suspect”. Thresholds were set that maximized the sensitivity of the reduction to practice to the “Melanocytic Suspect” class while simultaneously maximizing the PPV to the “Melanocytic High Risk” class.
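A minimal sketch of the Monte Carlo dropout confidence estimate follows. Averaging softmax outputs over stochastic forward passes is the general method of Gal et al.; the function signature, the sample count, and the use of an externally supplied stochastic logit function are illustrative assumptions (in a real network, dropout layers left active at inference provide the stochasticity).

```python
import numpy as np

def mc_dropout_confidence(logit_fn, x, n_samples=50, seed=0):
    """Monte Carlo dropout: run n_samples stochastic forward passes
    (logit_fn is assumed to apply dropout noise internally) and use the
    mean softmax probability of the predicted class as a confidence score."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_samples):
        z = logit_fn(x, rng)
        e = np.exp(z - z.max())          # numerically stable softmax
        probs.append(e / e.sum())
    mean_p = np.mean(probs, axis=0)
    pred = int(mean_p.argmax())
    return pred, float(mean_p[pred])
```

The resulting confidence score is then compared against the per-class thresholds established on the validation set.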
[00110] To evaluate how the reduction to practice generalizes to data from other labs, the neural networks trained on data from the Reference Lab were fine-tuned for both Validation Lab 1 and Validation Lab 2. A quantity of 255 specimens was set aside from each validation lab (using an equal class distribution of specimens) as the calibration set, of which 210 specimens were used as the training set and 45 specimens were used as the validation set for fine-tuning the neural networks. (The remaining specimens in each validation lab were used as the test set.) The final validation lab metrics presented below are reported on the test set with these calibrated neural networks.
[00111] E. Performance Evaluation
[00112] Fig. 8 depicts Receiver Operating Characteristic (“ROC”) curves 800 for the neural networks implemented by the reduction to practice of Fig. 6. In particular, the ROC curves derived from the Reference Lab test dataset for the hierarchical neural networks 632, 634, 636 of the reduction to practice, as shown and described in reference to Fig. 6, are depicted in Fig. 8. Fig. 8 depicts such results for the upstream classifier (left column), the High & Melanocytic Intermediate classifier (middle column), and the Basaloid, Squamous, Low Risk Melanocytic & Rest classifier (right column), for the Reference Lab (first row), for Validation Lab 1 (second row), and for Validation Lab 2 (third row).
[00113] The Area Underneath the ROC Curve (“AUC”) values, calculated with the one-vs-rest scoring scheme, were 0.97, 0.95, 0.87, 0.84, 0.81, 0.93, and 0.96 for the Basaloid, Squamous, Other, Melanocytic High Risk, Melanocytic Intermediate Risk, Melanocytic Suspect, and Melanocytic Low Risk classes, respectively. Table 3 shows the performance of the reduction to practice with respect to diagnostic entities of clinical interest on the Reference Lab test dataset. In particular, Table 3 shows metrics for selected diagnoses of clinical interest, based on the Reference Lab test set, representing the classification performance of the individual diagnoses into their higher-level classes: e.g., a correct classification of “Melanoma” is the prediction “Melanocytic High Risk”. Results are class-weighted according to the relative prevalence in the test set.
Table 3: Metrics for selected diagnoses of clinical interest
[00114] The sensitivity of the reduction to practice to the Melanocytic Suspect class was found to be 0.83 and 0.85 for the Melanocytic High and Intermediate Risk classes, respectively. The PPV to Melanocytic High Risk was found to be 0.57. The dropout Monte Carlo procedure set the threshold for Melanocytic High Risk classification very high; specimens below this threshold were classified as Melanocytic Suspect, maximizing the sensitivity to this class.
[00115] After fine-tuning all three neural networks in the hierarchy through the calibration procedure in each validation lab, the reduction to practice was able to generalize to unseen data from both validation labs, as depicted in Fig. 8. Note that fine-tuning was not performed for any of the neural networks in the pre-processing pipeline (Colorization, Ink Detection, or ResNet). The ROC curves derived from the Validation Lab 1 and Validation Lab 2 test datasets are shown in Fig. 8. The AUC values for Validation Lab 1 were 0.95, 0.88, 0.81, 0.87, 0.87, 0.95, and 0.92 for the Basaloid, Squamous, Other, Melanocytic High Risk, Intermediate Risk, Suspect, and Low Risk classes, respectively, and the AUC values for the same classes for Validation Lab 2 were 0.93, 0.92, 0.69, 0.76, 0.75, 0.82, and 0.92.
[00116] F. Consensus Ablation Study
[00117] Fig. 9 depicts a chart 900 comparing Reference Lab performance on the same test set when trained on consensus and non-consensus data. The Melanocytic class referenced in chart 900 is defined as the Low, Intermediate, and High Risk classes. The sensitivity for the Melanocytic Intermediate and High Risk classes is defined with respect to the reduction to practice classifying these classes as Suspect. The PPV to Melanocytic High Risk for the non-consensus-trained model was 0.33, while for the consensus-trained model it was 0.57.
[00118] In general, diagnosing melanocytic cases is challenging. Although some specimens (such as ones diagnosed as compound nevi) clearly exhibit very low risk, and others (such as invasive melanoma) exhibit very high risk of progressing into life-threatening conditions, reproducible stratification in the middle of the morphological spectrum has historically proved difficult. The results disclosed in this Section were derived with the reduction to practice trained and evaluated on consensus data: data for which the ground truth melanocytic specimen diagnostic categories were agreed upon by multiple experts. To understand the effect of consensus on training deep learning neural networks, an ablation study was performed by training two hierarchical neural networks. Both neural networks used all non-melanocytic specimens available in the training set. The first neural network was trained only including melanocytic specimens for which consensus was obtained under the diagnostic categories of MPATH I/II, MPATH III, or MPATH IV/V. The other neural network was trained by also including non-consensus data: melanocytic specimens whose diagnostic category was not agreed upon by the experts. To facilitate a fair comparison, validation sets for both neural network versions and a common consensus test set derived from the Reference Lab were reserved. The sensitivities of the reduction to practice to different classes on both consensus and non-consensus data are shown in Fig. 9, which shows a clear improvement of over 40% in sensitivity to the Melanocytic class for melanocytic specimens annotated with consensus labels over ones that are not; this improvement primarily manifested from a reduction in false positive Melanocytic Suspect classifications.
[00119] G. Discussion
[00120] This document discloses a reduction to practice capable of automatically sorting and triaging skin specimens with high sensitivity to Melanocytic Suspect cases prior to review by a pathologist. By contrast, prior art techniques may provide diagnostically-relevant information on a potential melanoma specimen only after a pathologist has reviewed the specimen and classified it as a Melanocytic Suspect lesion.
[00121] The ability of the reduction to practice to classify suspected melanoma prior to pathologist review could substantially reduce diagnostic turnaround time for melanoma by not only allowing timely review and expediting the ordering of additional tests or stains, but also ensuring that suspected melanoma cases are routed directly to subspecialists. The potential clinical impact of an embodiment with these capabilities is underscored by the fact that early melanoma detection is correlated with improved patient outcomes.
[00122] Fig. 10 depicts a chart 1000 showing the mean and standard deviation of sensitivity to melanoma versus percentage reviewed for 1,000 simulated sequentially-accessioned datasets, drawn from Reference Lab confidence scores. In the clinic, 95% of melanoma suspect cases are detected within the first 30% of cases, when ordered by melanoma suspect model confidence.
[00123] As the reduction to practice was optimized to maximize melanoma sensitivity, its performance was investigated as a simple Melanocytic Suspect binary classifier. The reduction to practice may be used to sort a pathologist’s work list of specimens by the reduction to practice’s confidence (in descending order) in the upstream classifier’s suspect melanocytic classification. Fig. 10 demonstrates the resulting sensitivity to the Melanocytic Suspect class against the percentage of total specimens that a pathologist would have to review in this sorting scheme in order to achieve that sensitivity. According to this dataset, a pathologist would only need to review between 30% and 60% of the caseload to address all melanoma specimens.
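The work-list sorting experiment described above can be sketched as follows (the function name and inputs are illustrative): sort specimens by descending suspect-class confidence, review the top fraction of the list, and report the fraction of melanoma specimens captured.

```python
import numpy as np

def sensitivity_vs_reviewed(confidences, is_melanoma, review_frac):
    """Sort specimens by descending suspect-class confidence and report
    the fraction of melanoma specimens captured within the first
    review_frac of the sorted work list."""
    order = np.argsort(-np.asarray(confidences, dtype=float))
    labels = np.asarray(is_melanoma)[order]
    n_review = int(np.ceil(review_frac * len(labels)))
    return labels[:n_review].sum() / max(labels.sum(), 1)
```

Sweeping review_frac from 0 to 1 traces out a curve of the kind shown in Fig. 10.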
[00124] Diagnostic classification of melanocytic lesions remains challenging. There is a known lack of consensus among pathologists, and a disturbing lack of intra-pathologist concordance over time was recently reported. Training with consensus data resulted in improved performance for classifications excluding Melanocytic Suspect, which has the highest pathologist discordance rates, as shown in chart 900. Because pathologists tend to cautiously diagnose a benign lesion as malignant, the reduction to practice learned the same bias in the absence of consensus. By training on the consensus of multiple dermatopathologists, the reduction to practice may have the unique ability to learn a more consistent feature representation of melanoma and aid in flagging misdiagnosis. While the reduction to practice is highly sensitive to melanoma (84% correctly detected as Intermediate or High Risk in the Reference Lab test set), a large number of false positives (2.7% of sequentially-accessioned specimens in the Reference Lab) were classified as Suspect. It may therefore be possible to flag initial diagnoses discordant with the reduction to practice’s highly confident classifications for review, in order to lower the false positive rate.
[00125] The reduction to practice also enables other automated pathology workflows in addition to triage and prioritization of suspected melanoma cases. Sorting and triaging specimens into other classifications such as Basaloid could allow the majority of less complicated cases (such as basal cell carcinoma) to be directly assigned to general pathologists, or to dermatologists who routinely sign out such cases. Relevant to any system designed for clinical use is how well its performance generalizes to sites on which the system was not trained. Performance of the reduction to practice on the Validation Labs after calibration (as shown in Fig. 8) was in many cases close to that of the Reference Lab.
[00126] Some further aspects are defined in the following clauses:
[00127] Clause 1: A computer-implemented method of classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks, the method comprising: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset comprising at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising a plurality of images, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components, wherein the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, wherein each task-specific branch is configured to produce, for a given batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, wherein the overall loss function comprises a task-specific loss function for each task of the plurality of binary pathological classification tasks, wherein task-specific loss functions for respective tasks are masked for batches of components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the revised weights,
wherein the electronic neural network is configured to provide an output pathological class of the plurality of pathological classes in response to inputting the novel supra-image; receiving from the electronic neural network an output indicative of one of the plurality of pathological classes for the novel supra-image; and providing the output.
[00128] Clause 2: The method of Clause 1, wherein the plurality of binary pathological classification tasks comprises: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
[00129] Clause 3: The method of Clause 1 or Clause 2, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
[00130] Clause 4: The method of any of Clauses 1-3, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
[00131] Clause 5: The method of any of Clauses 1-4, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
[00132] Clause 6: The method of any of Clauses 1-5, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
[00133] Clause 7: The method of any of Clauses 1-6, wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
[00134] Clause 8: The method of any of Clauses 1-7, wherein the plurality of pathological classes consist of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
[00135] Clause 9: The method of any of Clauses 1-8, wherein each component comprises a feature vector.
[00136] Clause 10: The method of any of Clauses 1-9, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
[00137] Clause 11: The method of any of Clauses 1-10, wherein the training dataset provides a plurality of batches of components, and wherein the method further comprises repeating the forward propagating and the back propagating for another batch of components of the plurality of batches of components.
[00138] Clause 12: The method of any of Clauses 1-11, wherein the electronic neural network comprises at least one layer for a shared data representation of components.
[00139] Clause 13: The method of any of Clauses 1-12, wherein each supra-image represents a biopsy.
[00140] Clause 14: The method of any of Clauses 1-13, wherein each image comprises a whole-slide image.
[00141] Clause 15: The method of any of Clauses 1-8 or 10-14, wherein each component comprises a 128-pixel-by-128-pixel square.
[00142] Clause 16: A system for classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks, the system comprising: a processor; and a memory communicatively coupled to the processor, the memory storing instructions which, when executed on the processor, perform operations comprising: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset comprising at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising a plurality of images, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components, wherein the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, wherein each task-specific branch is configured to produce, for a given batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, wherein the overall loss function comprises a task-specific loss function for each task of the plurality of binary pathological classification tasks, wherein task-specific loss functions for respective tasks are masked for batches of
components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the revised weights, wherein the electronic neural network is configured to provide an output pathological class of the plurality of pathological classes in response to inputting the novel supra-image; receiving from the electronic neural network an output indicative of one of the plurality of pathological classes for the novel supra-image; and providing the output.
[00143] Clause 17: The system of Clause 16, wherein the plurality of binary pathological classification tasks comprises: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
[00144] Clause 18: The system of Clause 16 or Clause 17, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
[00145] Clause 19: The system of any of Clauses 16-18, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
[00146] Clause 20: The system of any of Clauses 16-19, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
[00147] Clause 21: The system of any of Clauses 16-20, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
[00148] Clause 22: The system of any of Clauses 16-21 , wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
[00149] Clause 23: The system of any of Clauses 16-22, wherein the plurality of pathological classes consists of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
[00150] Clause 24: The system of any of Clauses 16-23, wherein each component comprises a feature vector.
[00151] Clause 25: The system of any of Clauses 16-24, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
[00152] Clause 26: The system of any of Clauses 16-25, wherein the training dataset provides a plurality of batches of components, and wherein the training further comprises repeating the forward propagating and the back propagating for another batch of components of the plurality of batches of components.
[00153] Clause 27: The system of any of Clauses 16-26, wherein the electronic neural network comprises at least one layer for a shared data representation of components.
[00154] Clause 28: The system of any of Clauses 16-27, wherein each supra-image represents a biopsy.
[00155] Clause 29: The system of any of Clauses 16-28, wherein each image comprises a whole-slide image.
[00156] Clause 30: The system of any of Clauses 16-23 or 25-29, wherein each component comprises a 128-pixel-by-128-pixel square.
[00157] Clause 31: A method of training an electronic neural network to perform a plurality of binary pathological classification tasks for classifying a novel supra-image as one of a plurality of pathological classes, the method comprising: obtaining a training dataset comprising at least one supra-image, wherein each supra-image is associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising at least one image, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components; forward propagating the at least one batch of components, and their respective supra-image labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, wherein each task-specific branch is configured to produce, for a given input batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, wherein the overall loss function comprises a task-specific loss function for each task of the plurality of binary pathological classification tasks, wherein task-specific loss functions for respective tasks are masked for batches of components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the revised weights, wherein the electronic neural network is configured to provide an output pathological class of the plurality of pathological classes in response to
inputting the novel supra-image.
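As a non-limiting illustration of the loss masking described in Clause 31 (the function names and data layout below are hypothetical and are not drawn from the disclosure), an overall loss may sum per-task losses while zeroing out any task whose pair of classes does not involve the batch's supra-image label:

```python
def overall_loss(task_losses, batch_label, task_class_pairs):
    """Sum per-task losses, masking tasks whose class pair does not
    involve the batch's supra-image label.

    task_losses      -- list of per-task loss values (floats)
    batch_label      -- pathological class label of the current batch
    task_class_pairs -- list of (class_a, class_b) tuples, one per binary task
    """
    total = 0.0
    for loss, pair in zip(task_losses, task_class_pairs):
        # Mask out tasks that do not involve this batch's label.
        mask = 1.0 if batch_label in pair else 0.0
        total += mask * loss
    return total

# Hypothetical example: a batch labeled "benign" contributes nothing to the
# "atypical vs. malignant" task, so only the other two task losses are summed.
pairs = [("atypical", "benign"), ("atypical", "malignant"), ("benign", "malignant")]
masked = overall_loss([0.7, 0.4, 0.9], "benign", pairs)  # sums only the two "benign" tasks
```

In a gradient-based framework, multiplying a task loss by zero also zeroes its gradient, so back propagation leaves that branch's weights unaffected by the unrelated batch.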
[00158] Clause 32: The method of Clause 31, wherein the plurality of binary pathological classification tasks comprises: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
[00159] Clause 33: The method of Clause 31 or Clause 32, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
[00160] Clause 34: The method of any of Clauses 31-33, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
[00161] Clause 35: The method of any of Clauses 31-34, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
[00162] Clause 36: The method of any of Clauses 31-35, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
[00163] Clause 37: The method of any of Clauses 31-36, wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
[00164] Clause 38: The method of any of Clauses 31-37, wherein the training dataset provides a plurality of batches of components, and wherein the method further comprises repeating the forward propagating and the back propagating for another batch of components of the plurality of batches of components.
[00165] Clause 39: The method of any of Clauses 31-38, wherein the plurality of pathological classes consists of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
[00166] Clause 40: The method of any of Clauses 31-39, wherein the electronic neural network comprises at least one layer for a shared data representation of components.
[00167] Clause 41: The method of any of Clauses 31-40, wherein each supra-image represents a biopsy.
[00168] Clause 42: The method of any of Clauses 31-41 , wherein each image comprises a whole-slide image.
[00169] Clause 43: The method of any of Clauses 31-42, wherein each component comprises a 128-pixel-by-128-pixel square.
[00170] Clause 44: The method of any of Clauses 31-42, wherein each component comprises a feature vector.
[00171] Clause 45: The method of any of Clauses 31-44, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
[00172] Clause 46: Computer readable storage comprising a representation of an electronic neural network produced by operations of any of Clauses 31-45.
[00173] Clause 47: An electronic computer comprising at least one electronic processor communicatively coupled to electronic persistent memory comprising instructions that, when executed by the at least one processor, configure the at least one processor to perform operations of any of Clauses 1-15 or 31-45.
[00174] Clause 48: At least one non-transitory computer readable storage medium comprising instructions that, when executed by at least one electronic processor, configure the at least one processor to perform operations of any of Clauses 1-15 or 31-45.
[00175] Certain embodiments can be performed using a computer program or set of programs. The computer programs can exist in a variety of forms, both active and inactive. For example, the computer programs can exist as software program(s) comprised of program instructions in source code, object code, executable code, or other formats; firmware program(s); or hardware description language (HDL) files. Any of the above can be embodied on a transitory or non-transitory computer readable medium, which includes storage devices and signals, in compressed or uncompressed form. Exemplary computer readable storage devices include conventional computer system RAM (random access memory), ROM (read-only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes.
[00176] While the invention has been described with reference to the exemplary embodiments thereof, those skilled in the art will be able to make various modifications to the described embodiments without departing from the true spirit and scope. The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. In particular, although the method has been described by examples, the steps of the method can be performed in a different order than illustrated or simultaneously. Those skilled in the art will recognize that these and other variations are possible within the spirit and scope as defined in the following claims and their equivalents.

Claims

What is claimed is:
1. A computer-implemented method of classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks, the method comprising: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset comprising at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising a plurality of images, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components, wherein the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, wherein each task-specific branch is configured to produce, for a given batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, wherein the overall loss function comprises a task-specific loss function for each task of the plurality of binary pathological classification tasks, wherein task-specific loss functions for respective tasks are masked for batches of components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the revised weights, wherein the
electronic neural network is configured to provide an
output pathological class of the plurality of pathological classes in response to inputting the novel supra-image; receiving from the electronic neural network an output indicative of one of the plurality of pathological classes for the novel supra-image; and providing the output.
2. The method of claim 1, wherein the plurality of binary pathological classification tasks comprises: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
3. The method of claim 1, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
4. The method of claim 1, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
5. The method of claim 1, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
6. The method of claim 1, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
7. The method of claim 1, wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
8. The method of claim 1, wherein the plurality of pathological classes consists of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
9. The method of claim 1, wherein each component comprises a feature vector.
10. The method of claim 1, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
11. A system for classifying a novel supra-image as one of a plurality of pathological classes using an electronic neural network to perform a plurality of binary classification tasks, the system comprising: a processor; and a memory communicatively coupled to the processor, the memory storing instructions which, when executed on the processor, perform operations comprising: receiving the novel supra-image; providing the novel supra-image to the electronic neural network that has been trained using a training dataset comprising at least one supra-image, each supra-image associated with a respective supra-image label indicating a pathological class of the plurality of pathological classes, each supra-image comprising a plurality of images, each image corresponding to a plurality of components, wherein the training dataset provides at least one batch of components, wherein the electronic neural network has been trained by: forward propagating the at least one batch of components, and their respective labels, through the electronic neural network, wherein the electronic neural network comprises a plurality of task-specific branches, one task-specific branch corresponding to each of the binary pathological classification tasks, each task-specific branch comprising
a plurality of respective task-specific layers, at least one respective aggregation of instances layer, and at least one respective output layer, wherein each task-specific branch is configured to produce, for a given batch of components, an estimated pathological class of the plurality of pathological classes; back propagating the at least one batch of components with respect to an overall loss function to obtain revised weights for the electronic neural network, wherein the overall loss function comprises a task-specific loss function for each task of the plurality of binary pathological classification tasks, wherein task-specific loss functions for respective tasks are masked for batches of components having labels that do not involve the respective tasks; and updating weights of the electronic neural network based on the revised weights, wherein the electronic neural network is configured to provide an output pathological class of the plurality of pathological classes in response to inputting the novel supra-image; receiving from the electronic neural network an output indicative of one of the plurality of pathological classes for the novel supra-image; and providing the output.
12. The system of claim 11, wherein the plurality of binary pathological classification tasks comprises: melanocytic high risk versus melanocytic medium risk, melanocytic medium risk versus melanocytic low risk, and melanocytic low risk versus melanocytic high risk.
13. The system of claim 11, wherein the plurality of binary pathological classification tasks comprises: atypical vs. benign, atypical vs. malignant, and benign vs. malignant.
14. The system of claim 11, wherein the plurality of binary pathological classification tasks comprises: a first Gleason score versus a second Gleason score, the second Gleason score versus a third Gleason score, and the third Gleason score versus the first Gleason score.
15. The system of claim 11, wherein the plurality of binary pathological classification tasks comprises: a first survival quantification versus a second survival quantification, the second survival quantification versus a third survival quantification, and the first survival quantification versus the third survival quantification.
16. The system of claim 11, wherein the plurality of binary pathological classification tasks comprises: a first prognosis versus a second prognosis, the second prognosis versus a third prognosis, and the first prognosis versus the third prognosis.
17. The system of claim 11, wherein the plurality of binary pathological classification tasks comprises: a first drug response versus a second drug response, the second drug response versus a third drug response, and the first drug response versus the third drug response.
18. The system of claim 11, wherein the plurality of pathological classes consists of a number c of pathological classes, and wherein the multiple pathological tasks consist of a number c(c-1)/2 of binary classification tasks.
19. The system of claim 11, wherein each component comprises a feature vector.
20. The system of claim 11, wherein the plurality of pathological classes comprises a plurality of dermatopathological classes.
PCT/US2021/051495 2020-09-23 2021-09-22 Training end-to-end weakly supervised networks in a multi-task fashion at the specimen (supra-image) level WO2022066725A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063082276P 2020-09-23 2020-09-23
US63/082,276 2020-09-23

Publications (1)

Publication Number Publication Date
WO2022066725A1 true WO2022066725A1 (en) 2022-03-31

Family

ID=80845775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/051495 WO2022066725A1 (en) 2020-09-23 2021-09-22 Training end-to-end weakly supervised networks in a multi-task fashion at the specimen (supra-image) level

Country Status (1)

Country Link
WO (1) WO2022066725A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190286880A1 (en) * 2018-03-16 2019-09-19 Proscia Inc. Deep learning automated dermatopathology
US20190325621A1 (en) * 2016-06-24 2019-10-24 Rensselaer Polytechnic Institute Tomographic image reconstruction via machine learning
US20200058126A1 (en) * 2018-08-17 2020-02-20 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
US20200129263A1 (en) * 2017-02-14 2020-04-30 Dignity Health Systems, methods, and media for selectively presenting images captured by confocal laser endomicroscopy
US20200226422A1 (en) * 2019-01-13 2020-07-16 Lightlab Imaging, Inc. Systems and methods for classification of arterial image regions and features thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JI JUNYI: "Gradient-based Interpretation on Convolutional Neural Network for Classification of Pathological Images", 2019 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND COMPUTER APPLICATION (ITCA), IEEE, 20 December 2019 (2019-12-20), pages 83 - 86, XP033770782, DOI: 10.1109/ITCA49981.2019.00026 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11423678B2 (en) 2019-09-23 2022-08-23 Proscia Inc. Automated whole-slide image classification using deep learning
US11462032B2 (en) 2019-09-23 2022-10-04 Proscia Inc. Stain normalization for automated whole-slide image classification
US11861881B2 (en) 2020-09-23 2024-01-02 Proscia Inc. Critical component detection using deep learning and attention
CN116805926A (en) * 2023-08-21 2023-09-26 上海飞旗网络技术股份有限公司 Network service type identification model training method and network service type identification method
CN116805926B (en) * 2023-08-21 2023-11-17 上海飞旗网络技术股份有限公司 Network service type identification model training method and network service type identification method

Similar Documents

Publication Publication Date Title
Roy et al. Patch-based system for classification of breast histology images using deep learning
EP3155592B1 (en) Predicting breast cancer recurrence directly from image features computed from digitized immunohistopathology tissue slides
Zewdie et al. Classification of breast cancer types, sub-types and grade from histopathological images using deep learning technique
US20230360208A1 (en) Training end-to-end weakly supervised networks at the specimen (supra-image) level
WO2022066725A1 (en) Training end-to-end weakly supervised networks in a multi-task fashion at the specimen (supra-image) level
AU2021349226B2 (en) Critical component detection using deep learning and attention
Sankarapandian et al. A pathology deep learning system capable of triage of melanoma specimens utilizing dermatopathologist consensus as ground truth
Narayanan et al. DeepSDCS: Dissecting cancer proliferation heterogeneity in Ki67 digital whole slide images
Sreelekshmi et al. SwinCNN: An Integrated Swin Trasformer and CNN for improved breast cancer grade classification
Selcuk et al. Automated HER2 scoring in breast cancer images using deep learning and pyramid sampling
Saranyaraj et al. Early prediction of breast cancer based on the classification of HER‐2 and ER biomarkers using deep neural network
US20250037883A1 (en) Systems and methods for the detection and classification of biological structures
Mir et al. Artificial intelligence-based techniques for analysis of body cavity fluids: a review
Saito et al. Dawn of the digital diagnosis assisting system, can it open a new age for pathology?
Narayanan et al. Unmasking the tissue microecology of ductal carcinoma in situ with deep learning
Lang et al. Breast cancer magnification-independent multi-class histopathology classification using dual-step model
Parvatikar et al. Prototypical models for classifying high-risk atypical breast lesions
Zulfqar et al. Breast Cancer Diagnosis: A Transfer Learning-based System for Detection of Invasive Ductal Carcinoma (IDC)
Grote et al. Exploring the spatial dimension of estrogen and progesterone signaling: detection of nuclear labeling in lobular epithelial cells in normal mammary glands adjacent to breast cancer
Goswami et al. Application of Deep Learning in Cytopathology and Prostate Adenocarcinoma Diagnosis
Yuenyong et al. Detection of centroblast cells in H&E stained whole slide image based on object detection
US20240212146A1 (en) Method and apparatus for analyzing pathological slide images
Hägele Advancing computer vision algorithms to overcome challenges in computational pathology
Razzaq et al. A NOVEL FRAMEWORK FOR BREAST CANCER SCORING BASED ON MACHINE LEARNING TECHNIQUE USING IMMUNOHISTOCHEMISTRY IMAGES
Rahman et al. Investigating Hyperparameter Effects on U-Net for Oral Epithelial Layer Segmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21873329

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21873329

Country of ref document: EP

Kind code of ref document: A1