

Scalable vector cages: vector-to-pixel metadata transfer for defect alignment

Info

Publication number
WO2026000088A1
Authority
WO
WIPO (PCT)
Prior art keywords
defects
cage
image
captured images
captured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CA2025/050907
Other languages
French (fr)
Inventor
Vinitha Rajagopal MUTHU
Roisul Islam RUMI
Abhishek CHANDAR
Shamisa KASPOUR
Stephen SOUTHIN
Robert Mackenzie BEGGS
Thorsteinn Hjortur JONSSON
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Discovery Loft Inc
Original Assignee
Discovery Loft Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Discovery Loft Inc filed Critical Discovery Loft Inc
Publication of WO2026000088A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/06Recognition of objects for industrial automation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

According to an aspect, there are provided systems and methods for classifying segments of an object. The method includes receiving captured images, aligning the captured images onto a cage for image alignment to identify segments of the object in the captured images, detecting one or more defects in the captured images, combining the aligned captured images and the detected one or more defects to generate data representative of the one or more defects, and appending the data representative of the one or more defects to the cage. Storing data representative of the one or more defects with the cage may provide more computationally efficient storage and processing.

Description

SCALABLE VECTOR CAGES: VECTOR-TO-PIXEL METADATA TRANSFER FOR DEFECT ALIGNMENT
CROSS-REFERENCE
[0001] The present application claims priority to US provisional patent application no. 63/665,992, titled “SCALABLE VECTOR CAGES: VECTOR-TO-PIXEL METADATA TRANSFER FOR DEFECT ALIGNMENT”, filed on 28 June 2024, the contents of which are incorporated herein by reference.
FIELD
[0002] The present disclosure generally relates to the field of computing platforms, artificial intelligence, computer vision, and image processing. In particular, this disclosure relates to systems and methods of processing images to segment an object into its constituent parts.
INTRODUCTION
[0003] Object part classification is traditionally carried out using image-based methods that rely on pixel-level segmentation techniques. These methods are often limited by the complexity of the object, as well as the quality of the image captured. Object complexity, image quality, object variability, and human bias are all factors that can affect the accuracy of part segmentation. Factors such as the number of parts being identified, and their size and shape substantially increase the complexity of performing object part classification. Furthermore, challenges arise in classifying parts where the parts have defects.
[0004] There is a need for improvement in the processes used to segment objects from captured images, or at least alternatives, particularly to reduce bias in identifying small object parts.
SUMMARY
[0005] Embodiments of the invention described herein relate to object part classification utilizing Scalable Vector Cages (SVC) (also referred to as vector graphics or scalable vector graphics (SVG)). In particular, embodiments described herein relate to using known defects of an object to aid in the object part classification. An example approach utilizes vector-based outlines to identify an object's components accurately, providing a one-shot learning method for mapping components based on their locations in the SVC to the pixels of a captured image. By labelling (manually or automatically) each component within the SVCs, the outlined information can be transferred to a machine model that can manipulate and overlay this information onto a captured image of the same object type. Additionally, known defects of the object can be used as additional information in the SVC to aid in image alignment. This can allow for the precise classification of all the components and known defects in the object which may further be usable for other applications. Furthermore, SVCs are designed to be in a format that is efficient to render as part of human interactive web interfaces.
[0006] Object part classification can be carried out using image-based methods that may rely solely on pixel-level segmentation or 3D mesh based techniques. However, these methods are computationally expensive and can be limited by the complexity of the object, as well as the quality of the image captured. In contrast, embodiments described herein provide an SVC approach that utilizes vector-based outlines that provide more accurate and precise identification of an object's components, which can provide efficient and accurate classification processes within human interactive interfaces. Embodiments described herein enable the SVC approach to perform efficiently with more complex objects and retain a scalability that is better than systems which utilize 3D models, such as 3D point clouds, 3D meshes or 3D Voxel Grids.
[0007] An advantage of the SVC approach is the combination of metadata in a vector format with a pixel format, which enables more accurate and efficient object recognition and classification. Information related to any known defects can also be appended to the SVC to aid in subsequent object recognition and classification. This approach can provide highly accurate traced outlines of an object and its defined parts, which can be easily transferred to a machine-learning model. Any such model can then be trained to recognize these outlines and classify objects more accurately, which can improve the overall efficiency and reliability of an inspection process.
[0008] Systems and methods described herein can be advantageous in repeated object inspection (e.g., inspecting a vehicle before and after a renter has used it), where other physical inspection methods can be time-consuming, costly, and potentially hazardous. Photo-based assessments can offer several advantages over conventional methods, such as time and cost savings, convenience, safety, improved accuracy, and data storage and sharing. However, determining part segmentation or identifying the boundaries between different parts, components, or defects of an object in a photo can be a significant challenge.

[0009] Object complexity, image quality, object variability, and human bias are all factors that can affect the accuracy of part segmentation. Machine learning approaches can be used to address these challenges, but they can be limited by training data availability, object variability, limited scope, ambiguity, and interpretability. Factors such as the number of different object parts, their sizes and shapes, can increase the complexity of the machine learning process so that it requires annotated data which can be expensive to collect.
[0010] Some embodiments of the systems and methods described herein aim to overcome these limitations by employing advanced machine learning techniques that can achieve accurate and reliable part segmentation to provide improvements to photo-based object inspection particularly for objects with known defects which are repeatedly being inspected. These techniques apply to imagery data in a 2D pixel format or a 3D mesh format and their results can be enhanced from being run in conjunction with the SVC approach, which enables one-shot learning for object part classification.
[0011] Applications of the present systems and methods can include damage monitoring and inspection for online/digital vehicle marketplaces, damage inspection for vehicle rental, manufacturing, robotics, medical imaging, autonomous vehicles, augmented reality, etc. Applications of the present systems and methods may include any object type made up of parts, including, but not limited to merchandise, industrial equipment, aircraft parts and components, construction equipment and machinery, medical equipment and devices, electronic devices and components, furniture and fixtures, agricultural machinery and equipment, marine vessels and components, manufacturing machinery and equipment, power generation and distribution equipment, scientific instruments, equipment, etc.
[0012] According to an aspect, there is provided a system for classifying segments of an object. The system comprising a server having non-transitory computer readable storage medium with executable instructions for causing one or more processors to process captured images using a cage for image alignment, the cage defining segments of the object, align the captured images onto the cage to identify segments of the object in the captured images, detect one or more defects in the segments of the object in the captured images, and append data representative of the one or more defects to the cage.
[0013] According to an aspect, there is provided a system for classifying segments of an object. The system comprising a server having non-transitory computer readable storage medium with executable instructions for causing one or more processors to process captured images using a cage for image alignment, the cage defining segments of the object and one or more defects, align the captured images onto the cage to identify segments of the object and the one or more defects in the captured images, and detect one or more physical conditions of the segments of the object and the one or more defects in the captured images.
[0014] According to an aspect, there is provided a method for classifying segments of an object. The method comprising processing captured images using a cage for image alignment, the cage defining segments of the object, aligning the captured images onto the cage to identify segments of the object in the captured images, detecting one or more defects in the segments of the object in the captured images, and appending data representative of the one or more defects to the cage.
[0015] According to an aspect, there is provided a method for classifying segments of an object. The method comprising processing captured images using a cage for image alignment, the cage defining segments of the object and one or more defects, aligning the captured images onto the cage to identify segments of the object and the one or more defects in the captured images, and detecting one or more physical conditions of the segments of the object and the one or more defects in the captured images.
[0016] According to an aspect, there is provided a system for classifying segments of an object. The system includes a server having non-transitory computer readable storage medium with executable instructions for causing one or more processors to: receive captured images; align the captured images onto a cage for image alignment to identify segments of the object in the captured images, the cage defining segments of the object; detect one or more defects in the captured images; combine the aligned captured images and the detected one or more defects to generate data representative of the one or more defects; and append the data representative of the one or more defects to the cage.
[0017] In some embodiments, the one or more defects are characterized with a multi-modal data structure comprising linguistic data and visual data.
[0018] In some embodiments, detecting the one or more defects comprises detecting one or more of severity, type, size, location, and penetration of the one or more defects.
[0019] In some embodiments, the object comprises a vehicle.

[0020] In some embodiments, the cage comprises a 3D model.
[0021] In some embodiments, the cage includes a graphic script in a domain specific language.
[0022] In some embodiments, the server is further configured to compute cost data for repair of the one or more defects.
[0023] In some embodiments, the processor is configured to align the captured images onto the cage by: generating a semantic mask from the captured images using semantic segmentation models; and optimizing a homography matrix between the cage and the semantic mask, wherein the homography matrix defines a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask, wherein the cage and the semantic mask are compared as two binary arrays of a same shape.
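By way of a non-limiting illustration, the following sketch shows one way the homography optimization between a rasterized cage and a semantic mask could be realized, here using OpenCV's ECC criterion; the function name and the choice of `cv2.findTransformECC` are assumptions for illustration and do not form part of the claimed method.

```python
# Sketch: optimize a homography that warps a rasterized cage onto a semantic
# mask, where both are binary arrays of the same shape (see [0023]).
import cv2
import numpy as np

def align_cage_to_mask(cage_mask: np.ndarray, semantic_mask: np.ndarray) -> np.ndarray:
    """Return a 3x3 homography warping the cage onto the semantic mask."""
    cage = cage_mask.astype(np.float32)
    mask = semantic_mask.astype(np.float32)
    warp = np.eye(3, dtype=np.float32)  # identity initialization
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    # ECC iteratively refines the warp (a composition of rotation, translation,
    # scaling, and perspective distortion) to maximize correlation.
    _, warp = cv2.findTransformECC(mask, cage, warp, cv2.MOTION_HOMOGRAPHY,
                                   criteria, None, 5)
    return warp

# The aligned cage could then be rendered with:
# cv2.warpPerspective(cage_mask.astype(np.float32), warp, cage_mask.shape[::-1])
```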
[0024] In some embodiments, the cage is selected from a plurality of cages based on at least one of an object identification number, user selection, and system selection.
[0025] In some embodiments, the data representative of the one or more defects is appended to the cage on first observation of the one or more defects.
[0026] In some embodiments, the one or more defects are detected by: dividing the captured images into one or more object grids; processing each cell of the one or more object grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the data representative of the one or more defects.
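As a non-limiting illustration of the grid-based detection above, the following sketch divides an image into cells, screens every cell with a cheap presence check, and runs attribute prediction only on flagged cells; both stand-in models (`cell_has_defect`, `predict_defect_attributes`) are hypothetical placeholders for trained computer vision models.

```python
import numpy as np

def cell_has_defect(cell: np.ndarray) -> bool:
    # Stand-in presence check; a real system would use a trained classifier.
    return float(cell.std()) > 25.0

def predict_defect_attributes(cell: np.ndarray) -> dict:
    # Stand-in attribute model; a real system would predict severity, type,
    # size, location, and penetration.
    return {"type": "scratch", "severity": 0.4}

def detect_defects(image: np.ndarray, rows: int = 8, cols: int = 8) -> list:
    h, w = image.shape[:2]
    findings = []
    for r in range(rows):
        for c in range(cols):
            cell = image[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            if cell_has_defect(cell):            # cheap screen of every cell
                attrs = predict_defect_attributes(cell)
                attrs["grid_cell"] = (r, c)      # defect location in the grid
                findings.append(attrs)
    return findings
```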
[0027] In some embodiments, the processor is further configured to: receive a second set of captured images; align the second set of captured images onto the cage to identify segments of the object in the captured images; detect one or more physical conditions of the one or more defects in the captured images; and combine the aligned second set of captured images and the detected one or more physical conditions of one or more defects to generate data representative of a change in the one or more defects.
[0028] In some embodiments, aligning the captured images comprises aligning the one or more defects.

[0029] In some embodiments, the one or more defects is updated when the system detects a change in the physical condition of the one or more defects.
[0030] In some embodiments, the one or more defects is removed from the cage when the system fails to identify the one or more defects.
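A minimal sketch of this defect lifecycle, assuming defects are keyed by an identifier and described by a dictionary of underlying states (the record layout is an assumption for illustration): a known defect is updated when its condition has changed, removed when it can no longer be identified, and newly observed defects are appended.

```python
def reconcile_defects(cage_defects: dict, observed: dict) -> dict:
    updated = {}
    for defect_id, prior in cage_defects.items():
        current = observed.get(defect_id)
        if current is None:
            continue  # not re-identified: drop the defect from the cage
        if current != prior:
            prior = {**prior, **current}  # condition changed: update states
        updated[defect_id] = prior
    for defect_id, state in observed.items():
        updated.setdefault(defect_id, state)  # first observation: append
    return updated

prior = {"scratch-001": {"severity": 0.4}}
seen = {"scratch-001": {"severity": 0.7}, "dent-002": {"severity": 0.2}}
print(reconcile_defects(prior, seen))
```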
[0031] In some embodiments, the one or more physical conditions of the one or more defects are detected by: dividing the captured images into one or more object grids and one or more defect grids; processing each cell of the one or more object grids and the one or more defect grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the one or more physical conditions of the one or more defects.
[0032] In some embodiments, receiving the captured images includes: receiving a captured image and the cage; processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image; determining presence or absence of consensus between outputs of each of the plurality of neural networks; and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
[0033] In some embodiments, each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks, wherein each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition, and wherein the quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
[0034] In some embodiments, each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
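As a non-limiting illustration of the consensus logic in [0032], the following sketch polls several networks and accepts or rejects only on unanimous agreement about both the verdict and the perspective; the verdict format and the stand-in networks are assumptions.

```python
def triage_image(image, networks):
    """Accept/reject by unanimous consensus, otherwise flag for inspection."""
    verdicts = [net(image) for net in networks]        # (accept, perspective)
    accepts = {accept for accept, _ in verdicts}
    perspectives = {view for _, view in verdicts}
    if len(accepts) == 1 and len(perspectives) == 1:   # consensus is present
        return "accept" if accepts.pop() else "reject"
    return "flag_for_inspection"                       # consensus is absent

# Example with stand-in networks that each return (accept, perspective):
nets = [lambda img: (True, "front"), lambda img: (True, "front")]
print(triage_image(object(), nets))  # -> "accept"
```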
[0035] According to an aspect, there is provided a method for classifying segments of an object. The method including: receiving captured images; aligning the captured images onto a cage for image alignment to identify segments of the object in the captured images, the cage defining segments of the object; detecting one or more defects in the captured images; combining the aligned captured images and the detected one or more defects to generate data representative of the one or more defects; and appending the data representative of the one or more defects to the cage.
[0036] In some embodiments, the one or more defects are characterized with a multi-modal data structure comprising linguistic data and visual data.
[0037] In some embodiments, detecting the one or more defects comprises detecting one or more of severity, type, size, location, and penetration of the one or more defects.
[0038] In some embodiments, the object comprises a vehicle.
[0039] In some embodiments, the cage comprises a 3D model.
[0040] In some embodiments, the cage comprises a graphic script in a domain specific language.
[0041] In some embodiments, the method further comprises computing cost data for repair of the one or more defects.
[0042] In some embodiments, aligning the captured images onto the cage comprises: generating a semantic mask from the captured images using semantic segmentation models; and optimizing a homography matrix between the cage and the semantic mask, wherein the homography matrix defines a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask, wherein the cage and the semantic mask are compared as two binary arrays of a same shape.
[0043] In some embodiments, the cage is selected from a plurality of cages based on at least one of an object identification number, user selection, and system selection.

[0044] In some embodiments, the data representative of the one or more defects is appended to the cage on first observation of the one or more defects.
[0045] In some embodiments, the one or more defects are detected by: dividing the captured images into one or more object grids; processing each cell of the one or more object grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the data representative of the one or more defects.
[0046] In some embodiments, the method further includes: receiving a second set of captured images; aligning the second set of captured images onto the cage to identify segments of the object in the captured images; detecting one or more physical conditions of the one or more defects in the captured images; and combining the aligned second set of captured images and the detected one or more physical conditions of one or more defects to generate data representative of a change in the one or more defects.
[0047] In some embodiments, aligning the captured images comprises aligning the one or more defects.
[0048] In some embodiments, the one or more defects is updated when the system detects a change in the physical condition of the one or more defects.
[0049] In some embodiments, the one or more defects is removed from the cage when the system fails to identify the one or more defects.
[0050] In some embodiments, the one or more physical conditions of the one or more defects are detected by: dividing the captured images into one or more object grids and one or more defect grids; processing each cell of the one or more object grids and the one or more defect grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the one or more physical conditions of the one or more defects.
[0051] In some embodiments, receiving the captured images includes: receiving a captured image and the cage; processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image; determining presence or absence of consensus between outputs of each of the plurality of neural networks; and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
[0052] In some embodiments, each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks, wherein each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition, and wherein the quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
[0053] In some embodiments, each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
[0054] According to an aspect, there is provided a method of accepting or rejecting an image. The method comprising: receiving a captured image and a cage, processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image, determining presence or absence of consensus between outputs of each of the plurality of neural networks, and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
[0055] In some embodiments, each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks. Each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition. The quaternary classification classifies an image based on 1) whether the associated rejection condition is met, 2) whether the image should be accepted, 3) whether another expert neural network’s associated rejection condition is met, and 4) the perspective of the image.
[0056] In some embodiments, each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
[0057] According to an aspect there is provided a non-transitory computer recordable storage medium having stored therein computer executable program code, which when executed by a processor, causes the processor to carry out methods described herein.
DESCRIPTION OF THE FIGURES
[0058] In the figures, embodiments are illustrated by way of example. It is to be expressly understood that the description and figures are only for the purpose of illustration and as an aid to understanding.
[0059] Embodiments will now be described, by way of example only, with reference to the attached figures, wherein in the figures:
[0060] FIG. 1 illustrates a block schematic diagram of an example system for object segmentation and defect identification, according to some embodiments.
[0061] FIG. 2A illustrates a process diagram of a method 200 of detecting defects in segments of an object and appending them to a cage, according to some embodiments.
[0062] FIG. 2B illustrates another process diagram of a method 210 of detecting the physical condition of segments of an object and previously identified defects, according to some embodiments.
[0063] FIG. 3 illustrates different approaches to object segmentation, according to some embodiments.

[0064] FIG. 4 illustrates a process for aligning a captured image with a cage, according to some embodiments.
[0065] FIG. 5 illustrates an example interface of a front view of an object cage with example parts and damage shown, according to some embodiments.
[0066] FIG. 6 illustrates a process diagram of a method 600 of comparing a defect identified in previous inspection to its state in a subsequent inspection, according to some embodiments.
[0067] FIG. 7 illustrates a schematic diagram of a computing device, according to some embodiments.
DETAILED DESCRIPTION
[0068] The Scalable Vector Cages (SVC) approach can be a versatile technology with numerous potential applications. It can be used to recognize object parts accurately, making it particularly useful for situations where objects can be represented as a collection of distinct features. Once the parts of the object have been identified, then other processes (e.g., fault/damage detection/condition monitoring) may be carried out for use in a variety of applications.
[0069] The systems and methods described herein utilize, for example, SVC. In some embodiments, this approach using SVCs may require only a single labelled example of the object type (e.g., the vehicle model) to segment the object. This approach may provide more efficient segmentation of an object (e.g., a vehicle) into its component parts. This approach may better be used to identify the constituent parts of an object. The approach may be useful to identify defects in the constituent parts. Once identified, information describing the defect can be appended to the SVC (e.g., as an additional component of the SVC, as an exception for existing components of the SVC, etc.). This transforms the defect into an aspect of the object that can be used to help more accurately identify and segment the object in the future. This may be particularly useful in circumstances where an object may be repeatedly inspected (e.g., vehicle rental). This can also help ensure that the defect is incorporated into the segmentation process rather than carrying out a potentially computationally taxing defect identification process on a known defect every time the object is inspected. This may also render functionality associated with the defect more consistent by looking for changes to the defect in repeated inspections rather than requiring de novo defect identification each time the object is inspected.

[0070] In some embodiments, the systems and methods described herein are directed to an approach that may require only a single labelled example of the object type (e.g., the vehicle model) to segment the object. This labelled example can take the form of a cage for each object type, an SVC specification, which is an established standard in the industry of web-accessible graphics. The labelled example can include defect information specific to a particular object to generate a bespoke SVC for that object. The approach described herein can produce high-quality graphs of individual types and models of objects (such as vehicles). The cages can be automatically coded with a labeling tool in the form of SVCs as opposed to pixel arrays.
[0071] These cages can be stored as fixed models of segments (parts). The defects or damage may be stored as an additional segment, a modification to an existing segment, or in some other manner that modifies the typical cage for the object type. In some embodiments, estimates of repair of damage or impact on fair market value can be generated by determining which parts are damaged and how much it may cost to repair or replace. In some embodiments, the system may estimate this based on the replacement cost of the part. In some embodiments, the system may be configured to identify and factor in damage which may be repaired without replacing the part (e.g., minor dents in a vehicle). Upon identification of a new defect or damage in an object, the system may be configured to update the cage associated with that object to include the new defect or damage and use the defect or damage as part of the segmentation and alignment process for subsequent inspections.
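A minimal sketch of the repair estimate logic described above, assuming a parts price table keyed by cage segment and a flag for damage that is repairable without replacement; the table values and the repair-cost fraction are illustrative assumptions only.

```python
# Example replacement costs per cage segment (illustrative values).
REPLACEMENT_COST = {"front_bumper": 850.0, "side_panel": 600.0}

def estimate_repair(defects: list) -> float:
    """Sum repair costs for defects mapped to damaged cage segments."""
    total = 0.0
    for d in defects:
        part = d["segment"]
        if d.get("repairable"):                    # e.g., a minor dent
            total += 0.3 * REPLACEMENT_COST[part]  # assumed repair fraction
        else:                                      # full part replacement
            total += REPLACEMENT_COST[part]
    return total

print(estimate_repair([{"segment": "front_bumper", "repairable": True}]))
```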
[0072] Embodiments described herein can help, for example, the rental item marketplace to better standardize inspections before and after a renter has used the item. The systems and methods described herein can aid in image capture standardization, object (e.g., vehicle) segmentation, and object assessment.
[0073] It can be difficult for users to assess the quality of the vehicle based on images captured by different users in different lighting conditions, at different angles, and with cameras of varying quality.
[0074] A critical step in inspecting an object is first classifying parts, especially those that may be impacted by damage or are afflicted with other deviations. For example, when inspecting the exterior of a vehicle, it is important to identify which part of the vehicle is damaged, such as the front bumper, rear bumper, or side panel. This information can be essential because the repair cost and impact on the fair market value of the vehicle can vary depending on the impacted part. Furthermore, in repeated inspections, it can be critical to ensure that known damage is consistently identified in subsequent inspections. This may better track accountability for damage repair and may provide more consistent estimates for damage repair values.
[0075] Embodiments described herein can provide systems and methods for classifying segments of an object by processing images (e.g., images of the object or portions thereof) using one or more cages for image alignment. A cage can be a file that defines segments for an object. The cage may also include information associated with known damage or defects. A cage can be a graphics script that is written in a domain specific language. This domain specific language can be a specialized computer language designed for the particular application domain and tailored to the specific context (e.g., vehicle condition). The domain specific language can be a key to databases. The script can define individual object components and label the components according to their condition. The script can define individual parts of an object (e.g., a vehicle) and label the parts by their current or estimated condition.
[0076] The cage or script can be used to render in two dimensions (2D) or three dimensions (3D), for example. The cage can define a 2D or 3D model of the object and the segments of the object. Segments of an object can be parts of an object. An object can be partitioned into multiple segments. A cage can categorize different areas within an image (e.g., an image of the object) and map the areas to segments of the object. There can be different cage views that correspond to different perspectives or fields of view of the object. A cage can be used to locate segments of the object in an image. The cage may further include information associated with known defects/damage to the vehicle to use the defects/damage to segment the vehicle.
[0077] An example cage is a SVC. An SVC can be a file in vector image format. An SVC can be a graphic script in a domain specific language. An SVC can be a file that uses formulas to define shapes, borders, colors, and so on. In some embodiments, defects or damage may be stored as their own formulas that define shapes, borders, colors, and so on. In some embodiments, defects or damage may be stored as modification to formulas for existing components. In some embodiments, defect or damage may be stored in another way which flags an exception to a typical cage.
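For illustration only, a cage of this kind might resemble the following SVG-style markup, with a detected defect appended as its own path element; the markup, identifiers, and metadata attributes shown here are assumptions (a real SVG file would also declare the SVG XML namespace).

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical cage: each path formula outlines one segment.
CAGE = """<svg viewBox="0 0 400 200">
  <path id="front_bumper" d="M10,150 C60,170 340,170 390,150 L390,180 L10,180 Z"/>
  <path id="hood" d="M60,80 C120,60 280,60 340,80 L340,140 L60,140 Z"/>
</svg>"""

root = ET.fromstring(CAGE)
defect = ET.SubElement(root, "path")   # defect stored as its own formula
defect.set("id", "defect:scratch-001")
defect.set("d", "M120,100 L180,104")   # traced outline of the scratch
defect.set("data-severity", "minor")   # underlying state kept as metadata
print(ET.tostring(root, encoding="unicode"))
```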
[0078] In some embodiments, the defect or damage is detected to be likely in a segment of the object (e.g., vehicle). In such cases, the defect or damage detection model may use underlying states to determine the likelihood that there is damage on the object (e.g., vehicle). In such embodiments, when the presence of damage is sufficiently likely, the system may save one or more underlying states along with the cage. The underlying states may be discrete or continuous. Example states can include, but are not limited to, the type of damage (e.g., scratch, dent, etc.), the severity of the defect or damage, the penetration of the defect or damage (e.g., whether it penetrates the paint), size, and location. The location may in part be determined by aligning the cage with the images and determining the impacted segments of the object based on where within the cage the defect or damage presents.
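A minimal sketch of such underlying states as a data structure; the field names and value ranges are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class DefectState:
    kind: str             # e.g., "scratch", "dent"
    severity: float       # continuous severity score in [0, 1]
    penetrates_paint: bool
    size_mm: float
    segment: str          # impacted cage segment, e.g., "front_bumper"
    likelihood: float     # model confidence that the damage is present
```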
[0079] Saving, for example, a full video or image of the object (e.g., vehicle) may use a substantial amount of memory. By using the SVC to represent objects, the system can reuse precise SVCs for a large number of objects of the same type (e.g., only one SVC is needed for each new make, model, and other variation of a vehicle despite thousands or tens of thousands of said vehicles being included in the fleet). By using underlying defect or damage states (e.g., type, severity, penetration, location, and size), the defects or damage in objects can be tracked in a manner that takes up less memory than an image of the defect or damage itself.
[0080] In some embodiments, the system may also save an image of the defect or damage for future use. Such images may be stored in a server and used for, for example, future audits and reports. However, describing defects or damage using underlying defect or damage states may still reduce the amount of data that needs to be transmitted to, for example, an agent device in the field, or a user device. This may minimize data usage or assist in providing fulsome reports in conditions of unreliable internet or other network access.
[0081] Systems and methods described herein can use cages (e.g., SVCs) to align images taken of an object onto a bespoke representation of that object which includes any known defects or damage (i.e., a cage). This can then be used to segment the image based on the pre-segmented components of the SVCs and the defects/damage. The SVCs can also contain language specific information relating to individual object parts and their current condition (e.g., a hierarchical naming scheme for that segment, type of the object, name of the object, hash that relates the object to the image and the capture session).
[0082] In some embodiments, SVCs can be, for example, 2D graphic scripts. The systems and methods described herein can use these graphic scripts to generate a 2D representation of the object. The graphics scripts can include code that provides a specific way of mapping the cage components onto the image. For example, the outlines of segments may each be associated with a parametric curve, where the parameters may be coefficients of polynomials or other smooth functions (e.g., basic curves, Bezier curves, etc.), or a sequence of same. These curves can describe the shape of the segment. The SVC may include some or all relevant information about each of the components (e.g., name, type, history, etc.). These Bezier curves can be efficiently aligned to map onto a captured image. As an example, the curve may define a closed shape with an interior, and the mapping process may look for all of one pixel colour (e.g., white) on the interior and all of another pixel colour (e.g., black) outside the closed shape. Each segment may have a Bezier curve associated with it, which may further include additional information associated with that segment (e.g., a hierarchical naming scheme for that segment, colour of the segment, hierarchical colour of the segment, type of the object, name of the object, hash that relates the object to the image and the capture session). In some embodiments, damage or defects may be stored as their own smooth functions (e.g., Bezier curves). In some embodiments, damage or defects may be stored as modifications to the smooth functions (e.g., Bezier curves) of existing object components. In some embodiments, damage or defects may be stored in a manner using a different identification schema.
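As a non-limiting illustration of the interior/exterior mapping just described, the following sketch defines a closed Bezier outline and classifies sample pixel centres as inside or outside it; the control points are arbitrary illustrative values.

```python
import numpy as np
from matplotlib.path import Path

# One segment outline: a cubic Bezier span plus a quadratic span closing it.
verts = [(0, 0), (1, 2), (3, 2), (4, 0),   # MOVETO + cubic control points
         (2, -1), (0, 0), (0, 0)]          # quadratic span back to the start
codes = [Path.MOVETO, Path.CURVE4, Path.CURVE4, Path.CURVE4,
         Path.CURVE3, Path.CURVE3, Path.CLOSEPOLY]
segment = Path(verts, codes)

ys, xs = np.mgrid[-1:3:0.5, 0:4:0.5]             # sample pixel centres
pixels = np.column_stack([xs.ravel(), ys.ravel()])
inside = segment.contains_points(pixels)          # True = segment interior
print(int(inside.sum()), "of", len(pixels), "sample pixels fall inside")
```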
[0083] In operation, the SVCs can be moved to Graphics Processing Unit (GPU) memory, where they can be manipulated (e.g., aligned) to map onto the captured images. Once in GPU memory, specialized instructions can apply multiple operations to multiple data elements simultaneously. The GPU may be configured to handle such manipulations more efficiently than the Central Processing Unit (CPU). The GPU may be able to carry out these manipulations on a plurality of SVCs and for a plurality of captured images (e.g., different views) in parallel. Results of such manipulations can be carried over to CPU memory and used to efficiently adjust graphics transformations which render the cage onto an image or a web interface. As such, Scalable Vector Cages represent a powerful tool both for computational efficiency and for segment-by-segment information tracking.
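For illustration, the following sketch warps a batch of rasterized cages in parallel on the GPU with PyTorch; the batch size, raster resolution, and affine parameters are placeholder assumptions standing in for computed alignment results.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
cages = torch.rand(32, 1, 256, 256, device=device)  # 32 rasterized cages
theta = torch.eye(2, 3, device=device).repeat(32, 1, 1)  # per-cage 2x3 warps
theta[:, 0, 2] = 0.05                                # example shared shift

# One fused, batched operation aligns all 32 cages simultaneously on the GPU,
# rather than manipulating each cage in a CPU loop.
grid = F.affine_grid(theta, list(cages.shape), align_corners=False)
aligned = F.grid_sample(cages, grid, align_corners=False)
print(aligned.shape)  # torch.Size([32, 1, 256, 256])
```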
[0084] This process can be, for example, carried out using a GPU and can be manipulated quickly in response to calculated results using a CPU. SVCs can contain information about the 3D configuration of an object as well as the individual segments that make up the object (e.g., in the form of the Bezier curves). When running the graphics script, the system can generate a 2D representation of the 3D information based on the alignment of the object. This 2D representation can then be compared to the object (e.g., a semantic mask). By generating a 2D representation of the object before comparison, the system does not need to track the 3D information about the object (e.g., locations and orientations of components) that are not visible in the image and can run comparisons on a more computationally efficient basis for different views of the object.
[0085] Embodiments described herein can involve semantic masks and semantic segmentation models. A semantic mask can be a result of an equivariant transformation of the image that defines semantic classifications (e.g., object component identifiers). The output is also an image that preserves the bitmap structure (e.g., aligning with the initial image). Semantic segmentation models can be parametric machine learning models that learn to produce semantic masks. In some embodiments, the system further comprises one or more machine learning models that use the images and cages to produce output indicating conditions of components of the object.
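A minimal sketch of the semantic-mask step, using an off-the-shelf segmentation model purely as a stand-in; a production system would use a semantic segmentation model trained on the relevant object classes, and the class index shown is only an example.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# DeepLabV3 is a placeholder for a trained semantic segmentation model.
model = deeplabv3_resnet50(weights="DEFAULT").eval()
image = torch.rand(1, 3, 520, 520)       # placeholder normalized image
with torch.no_grad():
    logits = model(image)["out"]         # (1, num_classes, H, W)
mask = logits.argmax(dim=1)              # per-pixel class ids (semantic mask)
binary_mask = (mask == 7)                # e.g., class id 7 ("car" in VOC)
```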
[0086] Furthermore, this process can also provide the technical benefit of efficiently segmenting the object into its constituent parts by comparing a pre-segmented cage and defining the segments in the captured image based on the overlay and alignment with the cage. Segmenting an object in this manner may provide a more efficient way of identifying and reporting defects or damage to the object, and storing information in the context of data transfer protocols. Storing the defects or damage with the cage may also provide a more computationally efficient means of tracking the defects or damage in subsequent inspections.
[0087] In addition, some methods for damage identification rely on a classification of individual pixels in coloured images based on the outputs of a neural network and corresponding pixel-wise classification of the imagery data. This kind of algorithm can be referred to as a segmentation algorithm. Based on the information being captured from the imagery data, this segmentation algorithm can treat the classification of pixel-arrays in different ways. Different segmentation algorithms include "segmentation", "semantic segmentation", "panoptic segmentation", "instance segmentation", and “binary segmentation”. These approaches may have challenges in providing a detailed classification of small components and in identifying rare components. These approaches may also have challenges in providing detailed classification of unique damage or defects in an object component. In order to alleviate these challenges, a very large, labelled dataset may be required, involving detailed pixel-level annotations that can be very cumbersome to generate.
[0088] SVCs can offer a solution for photo-based inspections, particularly in the context of online marketplaces, Human-in-the-Loop (HITL) systems, and digital transactions. Its use of advanced machine learning techniques and its standardized representation of the object type may allow for more accurate and efficient inspection and assessment of an object's quality and value. In some embodiments, the system further comprises one or more machine learning models that use the images and cages to identify and assess conditions of components of the object.
[0089] Systems and methods described herein can facilitate an online platform for the transfer of objects (particularly large objects made of many components such as vehicles). In particular, the systems and methods described herein can improve image capture, alignment, segmentation, and assessment. Such improvements can facilitate online platforms or apply to other applications.
[0090] For example, the SVCs approach can be used in the inspection of cars from photos. By identifying and classifying the various components of a vehicle in an image, such as the tires, wheels, doors, windows, and body panels, inspectors or machine learning models can use the SVCs approach to quickly identify and assess any damage or issues with specific components of the vehicle. The damage or issues may then be used to update the SVC to include information associated with the damage or issues. Pre-existing damage or issues can be used to identify and assess the vehicle to determine if the pre-existing damage or other issues have worsened. An SVC can define segments of the object, such as parts of a vehicle. For example, an SVC can be a script to define individual components of a vehicle and label the components according to their condition by processing images of the vehicle.
[0091] The SVCs approach could also be helpful in other inspection scenarios, such as pre-purchase inspections, insurance claims assessments, or accident damage assessments. By providing a more precise and detailed understanding of the various components of the object being inspected, the SVCs approach may help improve the accuracy and efficiency of inspections.
[0092] The SVCs approach can use 2D images of objects. SVCs can also use other representations (e.g., to create 3D models from 2D images) in other embodiments. For example, one possible approach to creating 3D models using Vector Cages could involve using the 2D outlines to create a 3D model of the object. In some embodiments, the 3D model may be further refined and improved based on additional images and data.
[0093] Described herein are systems and methods for object part segmentation. These systems and methods can implement a database of numerous unique models (via SVC or cages) with, for example, numerous distinct exterior parts. These systems and methods can be capable of producing granular part descriptions.

[0094] Individual cages can be crafted to outline and define each part's exact shape, position, and classification using a cage creation tool. These object cages serve as a reference point that machine learning processes may use in addition to any available pixel arrays. By considering the pixel arrays that have been captured by a user and the outputs of any machine learning algorithms, the systems and methods described herein may identify the settings of the camera view that are needed to accurately align the SVCs with outputs of machine learning algorithms. This can result in a robust machine learning algorithm for segmenting parts and effectively localizing any damage or defects. The damage or defect can then be added to the cage associated with that object for subsequent inspections. The systems and methods described herein can, for example, detect and classify more than one unique part in any given image, regardless of the capture device or conditions. This approach may be significantly more data-efficient, thereby eliminating challenges that come with existing approaches.
[0095] One example use is for vehicles. In such an example, the object type may initially be based on the vehicle’s make and model. SVCs may be highly adaptive to minor body style differences between specific vehicle models in different regions, making them versatile for a wide range of applications. This capability is particularly crucial when using SVCs for, for example, particular vehicle models that may have different body styles in other markets. For example, a 2012 Toyota Corolla manufactured for the North American market may have slightly different body styling if made for Germany, with more "European" style features. The systems and methods described herein can be configured to adjust the SVCs to align with these differences, ensuring accurate and effective use of the technology across diverse markets and regions. When damage or defects are detected in the vehicle, the cage may be updated with this information. The bespoke cage may be associated with the vehicle through a unique identifier such as its VIN or license plate. This may ensure that the defects or damage are taken into account upon subsequent inspection.
[0096] Continuing with the vehicle example application, the systems and methods described herein may provide enhanced accuracy and reliability in identifying and positioning, for example, every one of the 255 unique exterior vehicle parts, including small items like sensors on bumpers and mirrors. SVCs may provide a new standard in vehicle inspection technology.
[0097] Furthermore, systems and methods making use of SVCs may be able to perform one-shot learning. For example, the system may be configured to carry out identifying and/or mapping an SVC onto an image. Such a system may be capable of receiving the information for a new SVC (e.g., for a new vehicle type) and efficiently mapping and/or identifying that new object in a captured image without ever having been trained with that object type. This makes the use of SVCs powerful for applications where new object types may be added for which the system cannot be trained or cannot provide results of sufficient quality with the limited labeled data available (e.g., cars of different makes and models and with minor variations).
[0098] In some embodiments, the systems and methods described herein can be used for rental marketplaces. Users may be able to put many different object types up for rent (e.g., merchandise, mass produced products, supplies, etc.). The user may be prompted or otherwise permitted to use the object segmentation process described herein to analyze the object. In these embodiments, it may first be necessary to determine whether the object has a corresponding prebuilt Scalable Vector Cage stored in the platform. In some embodiments, the user may, for example, look the object type up in a database to determine whether a corresponding SVC exists. In some embodiments, the system may, for example, determine whether a corresponding SVC exists by conducting a preliminary analysis of a captured image of the object. Other methods of retrieving a corresponding SVC are conceived. In some embodiments, where no SVC exists, the system may be configured to capture images and/or other information about the object in order to generate a SVC and label the parts therein. Once a corresponding SVC is retrieved, the platform may align the SVC with any captured images of the object to segment the object into its parts. The platform may also be configured to assess the object based on damage, deviations, or other irregularities. The platform may be able to assign the damage, deviations, or irregularities to parts of the object. The platform may be configured to append information about the damage, deviations, or irregularities with the cage such that subsequent inspections of the object will take the damage, deviations, or irregularities into account.
[0099] In some embodiments, the systems and methods described herein can be used as part of the inspection process for, for example, critical infrastructure (e.g., roadways, bridges, energy infrastructure, dams, buildings, etc.). For example, infrastructure which is difficult to visually inspect (e.g., because it is dangerous or difficult to physically visit, because it is remote, etc.) may benefit from a system configured to assess the infrastructure using remote or autonomous inspection devices (e.g., unmanned aerial vehicles equipped with image or video capture devices). In these systems, the remote or autonomous vehicles may be configured to retrieve images or video of the relevant infrastructure. In some embodiments, the images retrieved will be images from predefined locations at predefined angles. In some embodiments, the images can be retrieved at any location or angle. In some embodiments, video is captured and a full view of the infrastructure can be produced from the video (e.g., by stitching frames of the video together to generate a full image or by generating a 3D model of the infrastructure). In some embodiments, the images captured may be of an exterior, interior, or of specific internal components of the infrastructure. In these implementations, the images (or video) captured by the remote or autonomous devices can be compared to a SVC corresponding to the infrastructure to segment the infrastructure into its component parts. In some embodiments, the SVC corresponding to the infrastructure may be a general SVC (e.g., where the infrastructure spec is highly standardized). In some embodiments, the SVCs corresponding to the infrastructure may be tailor-made to the specific infrastructure asset (e.g., where the infrastructure is unique). The tailor-made SVCs may be generated manually or automatically based on the infrastructure asset’s plans or during its life (e.g., after it is built). In some embodiments, the SVCs may be generated in part based on general features and in part be tailor-made (e.g., default SVCs of component parts such as valves or piping may be combined with tailor-made SVCs of other parts such as specific floorplans into a full SVCs corresponding to the full infrastructure asset). The images can further be assessed for damage, defects, irregularities, wear, or other concerning aspects. The damage, defects, irregularities, wear, or other concerning aspects can be mapped to parts of the infrastructure using the full SVC. The parts to which, for example, damage is observed may further inform applicable next steps. For example, weathering observed on an exterior wall of a dam may be within safety standards, but the same weathering observed on a water intake pipe may prompt a further inspection or reconditioning or replacement of the piping. The output of such an implementation may be used to, for example, generate a status report for the infrastructure, provide recommendations to preserve the integrity of the infrastructure, automatically prompt further action (e.g., place work orders for replacement parts or repair work), or other actions. The damage, defects, or irregularities may be appended to the cage such that subsequent inspections of the infrastructure asset take the damage, deviations, or irregularities into account and can provide reports on the progression of the noted damage, defects, or irregularities.
[0100] The SVCs approach may also be applied in manufacturing (e.g., to classify the components of products on assembly lines, ensuring that each part may be correctly identified, positioned and/or free of defects), robotics (e.g., to help robots accurately identify and manipulate objects and parts thereof, enabling them to possibly perform tasks with greater precision and accuracy), medical imaging (e.g., to identify and track specific parts of the human body, which may aid in the diagnosis and treatment of medical conditions), autonomous vehicles (e.g., to identify and respond to objects on the road, which may improve safety and reduce accidents), and/or augmented reality applications (e.g., to place virtual objects within real-world environments, which may create more realistic and immersive experiences).
[0101] In some embodiments, the SVCs approach may be used to image and segment parts of the human body. Such applications may assist in the accurate and expedient diagnosis and thus treatment of diseases or other conditions. In such medical embodiments, the patient may be scanned with a variety of imaging modalities (e.g., visible light, X-ray, ultrasound, nuclear imaging, etc.). In some embodiments, the SVC may be generated for each patient based on their biographical details (e.g., height, weight, known medical issues, etc.). In some embodiments, the SVC may be based on past images taken of the same patient (which may be particularly useful to track new issues as they emerge). A pre-generated SVC may be configured to be modified as the patient changes (e.g., a child’s SVC may be updated based on expected growth of the child since their previous visit). In some embodiments, the SVC is based on specific parts of the patient (e.g., a cage of a specific organ or body part to further assess and elucidate any disease or damage to the sub-parts). The medical system may be configured to compare a patient’s SVC to the images captured of the patient to better segment any observed anomalies (that may be indicative of a disease or condition) onto specific parts of the patient. This may be useful to accurately assess damage or defects in very small parts of certain organs or other patient structures which may be difficult to analyze using other pixel-wise segmentation methods. This may further be useful to track the condition of a patient to ascertain, for example, if they are ameliorating or not. The output of such a system can include, for example, a report of the patient’s condition, intelligent recommendations for treatment (or management), or automatic next steps (e.g., automatically ordering further tests). The damage or defects may be appended to the cage such that subsequent analysis of the patient takes the damage or defects into account and can provide reports on the progression of the noted damage or defects. This may also better focus subsequent analysis of images of the patient by focusing on locations where the damage or defects were previously noted.
[0102] Applications of the present systems and methods may include inspection of any object type made up of parts, including, but not limited to merchandise, industrial equipment, aircraft parts and components, construction equipment and machinery, medical equipment and devices, electronic devices and components, furniture and fixtures, agricultural machinery and equipment, marine vessels and components, manufacturing machinery and equipment, power generation and distribution equipment, scientific instruments, equipment, etc.
[0103] Example System Implementation
[0104] FIG. 1 illustrates a block schematic diagram of an example system 100 for object segmentation and defect identification, according to some embodiments.
[0105] The system 100 can be used to align images captured by a user 10 with a cage. This cage can be used to segment the captured image into its constituent components. These segmented components may further be used to ascertain and/or localize any damage or defects existing in the constituent components and append the damage or defects into the cage.
[0106] The system 100 can include an I/O Unit 102, a processor 104, communication interface 106, and data storage 108. The processor 104 can execute instructions in memory 110 to implement aspects of processes described herein. The processor 104 can execute instructions in memory 110 to configure an alignment engine 120, a recognition engine 122, application programming interface (API) 124, and other functions described herein. The system 100 may be software (e.g., code segments compiled into machine code), hardware, embedded firmware, or a combination of software and hardware, according to various embodiments. The system 100 may further comprise other components within its data storage 108 such as databases 112, to provide stored data to memory 110, and persistent storage 114.
[0107] The system 100 can implement a task manager process to divide the data computation process into different tasks to be dispatched to different processors. For each of the captured images, the system 100 may have defined specific tasks to be performed for each view of the images, which vary depending on the content of the specific images collected. The tasks can be, for example, dispatched to different agents (if agents are part of the system processes) simultaneously for an efficient data process.
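By way of non-limiting illustration only, the per-view task dispatch described above might be sketched in Python as follows; the run_view_tasks helper and the (task_fn, image) pairing are hypothetical stand-ins for the tasks the task manager defines, not the claimed implementation:

    from concurrent.futures import ProcessPoolExecutor

    def run_view_tasks(view_tasks):
        # view_tasks: hypothetical list of (task_fn, image) pairs, one per
        # captured view; each task function is chosen based on the content
        # of that particular view.
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(task_fn, image) for task_fn, image in view_tasks]
            # Collect results in submission order so outputs map back to views.
            return [future.result() for future in futures]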
[0108] The I/O unit 102 can enable the system 100 to interconnect with one or more input devices, such as a keyboard, mouse, camera, touch screen and a microphone, and/or with one or more output devices such as a display screen and a speaker.
[0109] The processor 104 can be, for example, any type of general-purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an integrated circuit, a field programmable gate array (FPGA), a reconfigurable processor, or any combination thereof.
[0110] The communication interface 106 can enable the system 100 to communicate with other components, to exchange data with other components, to access and connect to network resources, to serve applications, and perform other computing applications by connecting to a network 140 (or multiple networks) capable of carrying data including the Internet, Ethernet, plain old telephone service (POTS) line, public switch telephone network (PSTN), integrated services digital network (ISDN), digital subscriber line (DSL), coaxial cable, fiber optics, satellite, mobile, wireless (e.g. Wi-Fi, WiMAX), SS7 signaling network, fixed line, local area network, wide area network, and others, including any combination of these.
[0111] The data storage 108 may be configured to store information associated with or created by the system 100, such as for example image data, cage data, metadata, object metrics, cost data, valuation reports and so on. Data storage device 108 can include memory 110, databases 112 (e.g., cage database), and persistent storage 114. The data storage 108 can implement databases 112, for example. Storage 108 and/or persistent storage 114 may be provided using various types of storage technologies, such as solid state drives, hard disk drives, flash memory, and may be stored in various formats, such as relational databases, non-relational databases, flat files, spreadsheets, extended markup files, and so on.
[0112] Memory 110 may include a suitable combination of any type of computer memory that is located either internally or externally such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), and electrically- erasable programmable read-only memory (EEPROM), Ferroelectric RAM (FRAM) or the like.
[0113] The system 100 can have an alignment engine 120 to align the captured images with a cage. The alignment engine 120 may determine which cage type to use based on the captured images, metadata, inputs from the user, a serial or identification number (e.g., a vehicle identification number or VIN or a license plate), system configuration, or another source. The cage defines segments of an object of that type. The alignment engine 120 can use numerous methods to align the captured image such as by using Image Registration. Image Registration is the process of transforming different data sets into one coordinate system to be compared and analyzed. In aligning two images, Image Registration can find the geometric transformation that maps one image onto the other. This transformation may include translation, rotation, scaling, and/or distortion correction. One method of Image Registration includes generating a semantic mask from the captured images using semantic segmentation models and optimizing for a best homography matrix (defining a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask) between the cage and the semantic mask. The homography matrix may also be referred to as a perspective transformation in the context of 3D modeling, or as the camera matrix transformation, or as an affine transformation in the context of 2D models (images). Upon alignment, the captured images can have their segments (e.g., parts) defined based on segments defined in the cage. The alignment engine may make use of a GPU to carry out some or all of the alignment procedures. Variables from the I/O Unit 102 may initially be put into a CPU memory where they may be formatted in such a way that they can be efficiently concatenated with other requests for GPU processing and subsequently loaded into the GPU for processing. The GPU can take on a plurality of requests and run the computations in parallel. The output may modify information in the CPU memory based on the calculation that was done in GPU memory. Exploitation of the differing processing units can be referred to as heterogeneous computing.
[0114] In an example implementation the cage and the semantic mask can be compared as two binary arrays of a same shape. Each element in the array represents a pixel in the image. The element's value indicates whether that pixel is part of the object represented by the cage or the semantic mask. The binary arrays are compared to align the cage with the semantic mask, and a homography matrix is optimized to minimize the difference between them. This process involves finding the best rotation, translation, scaling, and/or distortion correction combination to align the cage with the semantic mask. The result of this process is a transformed cage that is aligned with the semantic mask. The alignment can be assessed by comparing the transformed cage with the original semantic mask and evaluating the difference between them. Other comparative methods could be used to align two images, such as Cross-Correlation or Feature Matching. Cross-Correlation involves calculating the similarity between two images by sliding one over the other and comparing the pixel values at each position. Feature Matching involves identifying distinctive features in each image and then matching those features to find corresponding points in both pictures.
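A minimal sketch of the binary-array comparison described above, assuming OpenCV, NumPy, and SciPy are available and both masks are 8-bit binary arrays of the same shape; the eight-parameter homography parameterization and the derivative-free Powell optimizer are illustrative assumptions rather than the method claimed here:

    import numpy as np
    import cv2
    from scipy.optimize import minimize

    def alignment_cost(params, cage_mask, semantic_mask):
        # Rebuild a 3x3 homography from 8 free parameters (h33 pinned to 1).
        H = np.append(params, 1.0).reshape(3, 3)
        h, w = semantic_mask.shape
        warped = cv2.warpPerspective(cage_mask, H, (w, h))
        # Mean squared difference between the two binary arrays.
        return np.mean((warped.astype(np.float32)
                        - semantic_mask.astype(np.float32)) ** 2)

    def align_cage(cage_mask, semantic_mask):
        # Start from the identity transform and minimize mask disagreement.
        x0 = np.eye(3).ravel()[:8]
        result = minimize(alignment_cost, x0, args=(cage_mask, semantic_mask),
                          method="Powell")
        return np.append(result.x, 1.0).reshape(3, 3)

In practice, a coarse initialization (for example, from the minimal-shape comparison described in the next paragraph) can substantially reduce the number of optimizer iterations.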
[0115] In some embodiments, a minimal shape of the object (e.g., a rectangle for vehicles) may be compared to the semantic mask in place of the cage. This can be useful where the cage is not known initially (e.g., where the system also needs to determine which cage to apply). Using the minimal shape can help establish the homography matrix that can then be applied to cages when testing cage match to determine the best cage fit. A second alignment step can further be performed when determining which cage best matches the semantic mask. This second alignment step can seek to compute the optimal alignment using second-order gradient descent methods that consider the slope of the gradient descent optimization process. This process may provide good convergence properties and can be run with multiple cages simultaneously.
[0116] In some embodiments, the alignment may make use of hierarchical alignment. For example, the captured image may initially have the shape of a full object detected (e.g., identifying the vehicle as opposed to the background) and one or more cages may be aligned against that shape. This may give a rudimentary initial alignment and a means of excluding cages that cannot match at this rudimentary level. The captured image may then have segments of a certain type identified (e.g., broad categories of segments which, for a vehicle, may include the body, the parts (e.g., mirror, headlights, etc.), windows, tires, and wheels) and the cage or cages may be aligned at this level. Finally, the individual segments of the cages may be mapped to the captured image to determine which is the best cage or to fine-tune the alignment (e.g., determining how the graphics should be modified to overlay the cage). This process may provide a computationally efficient method of reaching alignment based on a captured image. In the alignment technique, the minimal cost associated with using the cage instead of the mask (defined in terms of, for example, the Kullback-Leibler divergence, the negative likelihood score, or the mean squared error) may be sought. When identifying the best matching cage, the individual cost scores for each cage can be determined and the cage with the lowest score selected.
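Continuing the earlier sketch, lowest-cost cage selection might look as follows; the select_best_cage helper is hypothetical and reuses the align_cage routine sketched above, with mean squared error standing in for any of the cost functions named in this paragraph:

    import numpy as np
    import cv2

    def select_best_cage(candidate_cages, semantic_mask, align_cage):
        # candidate_cages: maps a cage identifier to its binary mask;
        # align_cage: the alignment routine from the earlier sketch.
        h, w = semantic_mask.shape
        costs = {}
        for cage_id, cage_mask in candidate_cages.items():
            H = align_cage(cage_mask, semantic_mask)
            warped = cv2.warpPerspective(cage_mask, H, (w, h))
            # MSE cost; KL divergence or negative likelihood could be
            # substituted here, as the text notes.
            costs[cage_id] = np.mean((warped.astype(np.float32)
                                      - semantic_mask.astype(np.float32)) ** 2)
        return min(costs, key=costs.get), costs

Because each candidate's cost is independent, the loop parallelizes naturally, consistent with the multi-cage parallel processing described elsewhere in this disclosure.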
[0117] The system 100 can have a recognition engine 122 to, for example, process the captured images and/or metadata to compute object metrics. The recognition engine 122 processes the images that are captured by the object capture modules 132. Images are preliminarily analyzed by determining whether the image has, for example, appropriate brightness, clarity, and/or other factors (e.g., cleanliness of an interior) to confirm that the images can be fully processed for defect detection. If the images cannot be processed, this is communicated to the user through interfaces of the object capture module 132 along with further instructions. If the user is unable to rectify the issue due to surroundings, time-of-day, or weather, the system 100 is able to schedule a notification for another date and time to complete the appraisal. In some embodiments, images are then processed by the recognition engine 122 to identify defects or damage in the captured images. In some embodiments, the system 100 uses the recognition engine 122 for analyzing the captured images to determine, for example, vehicle metrics such as the exterior and interior color, odometer reading, and any detected paint chips, scratches, defects, and dents, listing the affected body parts and the severity of damage. The recognition engine 122 can be configured to recognize other aspects or features of the parts of an object (e.g., other deviations from a standard or ideal model). The recognition engine may also make use of heterogeneous computing.
[0118] The recognition engine 122 can use the aligned cages to identify different locations or components of the object shown in the images. The damage shown in the image is then linked to the respective location or component. These sections are then inspected by the system 100 to detect any defects or issues. Defects or issues that are detected for the first time can be added to the cage associated with that object such that they are used for alignment in subsequent inspections.
[0119] In some embodiments related to vehicles, the system 100 can also use the recognition engine 122 to process image data to detect the remaining tread life on the tires, any chips and cracks on any glass parts, the condition and type of wheels/rims, any illuminated OBD-2 instrument lights, and any noises that may indicate engine, exhaust, loose-belt, or other audibly detectable mechanical issues. The object capture modules 132 can be used to define metadata that can be processed along with the captured image data to compute object metrics.
[0120] For vehicular examples, images containing the vehicle's VIN are processed and the detected barcode and data is sent to an integrated VIN decoder to gather vehicle metrics like year, make, model, and trim. Images of the tire tread can be analyzed to detect the depth of the tread, any signs of weathering, and the tire type. The images of the instrument cluster are processed to recognize characters to capture the odometer reading and whether the mileage is measured in kilometers or miles. The recognition engine 122 can implement or integrate with different tasks that can be dispatched to different processors (or agents should they be used in a system implementation), for example. The tasks can be linked to different images or to different segments.
[0121] Continuing with the vehicle example, the alignment engine 120 and/or the recognition engine 122 can access a repository of cages for vehicles to identify different components of the vehicle. There can be a different cage for each type of vehicle (e.g., make and model). The type of vehicle can be linked to a VIN, for example. The recognition engine 122 can compute metrics to assess damage on the vehicle shown in an image relative to a location or component of the vehicle. The cage can be used to define the location or component. The cage can provide a reference for the location or component so that it can be aggregated with other reference or location data in a common way (via the cage definition). For example, an image may indicate damage on the back left corner of the vehicle body. The cage as aligned with the captured images of a vehicle can be combined with the output of the recognition engine 122 to determine that an image shows damage to a specific part of the vehicle by cross-referencing the location of recognized damage to the location in the aligned cage. The cage data can thus provide an anatomy of the vehicle. The image (showing the damage) can be routed to an improved interface application 130. The system can append information associated with damage to the cage and store the bespoke cage for future inspections. In some embodiments, cages are mapped onto the captured images to identify the best matching cage if no bespoke cage is presently found.
[0122] In some embodiments of system 100, the alignment engine 120 and the recognition engine 122 can work in parallel. For example, the system 100 may work simultaneously to align the cage with the captured image using the alignment engine 120 and detect damage or defects using the recognition engine 122. These results may, when completed, be combined and the damaged segments identified by cross-referencing the identified damage in the captured image with the segment location in the aligned cage. Detected damage may be appended to the cage associated with the object for subsequent inspection.
[0123] The system 100 can be operable to register and authenticate users (using a login, unique identifier, and password for example) prior to providing access to applications, a local network, network resources, other networks and network security devices. The system 100 can connect to different machines, and/or data sources.
[0124] The system 100 configures an interface application 130 with an object capture module 132 to capture images of an object and/or metadata for the captured images. The system 100 may configure an interface application 130 with visual elements to guide a user through an image capture process using object capture module 132, based on, for example, specified angles and views. In some embodiments, the system 100 may have other methods to capture images (e.g., taken at fixed times to identify possible objects in a field of view or taken as objects emerge from an assembly line). In some embodiments, the capture module may be configured to capture images by pulling frames from a feed of video. In some embodiments, the images may be captured in some other way from the video feed. In some embodiments, alternative imaging methods (e.g., infrared, LIV, ultrasonic, etc.) may be used to capture images of the object based on other imaging modalities.
[0125] The system 100 can have an application programming interface (API) 124 to integrate with other systems for data exchange.
[0126] The system 100 may optionally have a cost estimation tool to process object metrics to detect new defects of the object and compute cost data for repair of the new defects. The system 100 may optionally estimate cost data for repairs of defects based on the progression of the defect since last inspection (e.g., determine the cost of damage associated with how much the damage has worsened since the last inspection). This may be based on the type, severity, location, size, and penetration of the defect or damage. This may further be based on the segment of the object that has the defect or damage (e.g., based on its material, paint, replacement cost, etc.) depending on the remedial action. The system 100 may have a valuation tool to compute a market value estimate for the object using the object metrics and the cost data. The system 100 can implement different machine learning methods for determining the condition, estimated reconditioning costs and estimated market value, for example, by participating as an autonomous third-party validation source between the seller and the buyer.
[0127] In some embodiments, system 100 has an SVC Repository in data storage 108. The system 100 uses cages to identify each component of the vehicle and link each type of damage to respective components. The cage may also include information associated with known damage or defects of that object. In some embodiments, the cage data can be used to generate an interactive guide to aid the user 10 in capturing images of the object. In some embodiments, the system 100 has a cage repository manager. The system 100 can generate an interface for the cage repository manager. The interface can list all cages in data storage 108 along with attributes or properties of the cages. Example attributes for vehicle cages include type, make, model, year, body, trim, code, status, and unique identifiers.
[0128] In some embodiments, the interface application 130 dynamically configures for a user 10 as part of an image capture process or an image review process. In some embodiments, the interface application 130 dynamically configures the object capture module 132 based on an object type. The object type can be linked to one or more cages maintained by the system 100. The interface application 130 can compute the object type, for example, by capturing or scanning an identification or serial number from a barcode (for example, scanning the VIN). In such embodiments, the system 100 may query, for example, the database 112 to see if that specific vehicle has a bespoke cage associated with that object (e.g., one which includes damage or defect information) and use the bespoke cage if found or generate a new cage based on the vehicle type if a bespoke cage is not found. In some embodiments, the object capture module 132 generates visual elements corresponding to an object shape to facilitate the image capture process. Once the object type is identified, the system 100 can retrieve the relevant cage, based on which the view is captured. The system 100 can use the cage to generate a silhouette as an SVG file (as an illustrative example) that is used by the user interface process to guide the user on the required alignment needed to frame the object. The overlay can be based on cage data. The cage data can be used to render the shapes on the interface to help the image capture process.
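As one hedged illustration of the silhouette generation described above, a cage outline might be rendered to an SVG overlay along these lines; the cage_outline_to_svg helper and the outline_points structure are hypothetical:

    def cage_outline_to_svg(outline_points, width, height):
        # outline_points: hypothetical list of (x, y) vertices taken from
        # the cage's outer contour, expressed in cage coordinates.
        points = " ".join(f"{x},{y}" for x, y in outline_points)
        return (
            f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {width} {height}">'
            f'<polygon points="{points}" fill="none" stroke="#ffffff" '
            f'stroke-width="2" stroke-dasharray="6 4"/>'
            f"</svg>"
        )

The dashed, unfilled polygon is one plausible overlay style for guiding the user without obscuring the live camera view.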
[0129] The visual elements can be overlays that can be aligned with the object to capture an image of the object. In some embodiments, the object capture module 132 generates visual elements corresponding to different parts of the object to generate the metadata. The metadata can indicate a location on the image and a corresponding descriptor label. The system 100 links this metadata to different locations of the object using the cage. This provides the metadata required to know the shape and location coordinates of the entire anatomy of the object that is visible within the image. For example, for an object, the metadata can be used to label parts of the image of the object or to indicate defects or marks.
[0130] Image Capture Details
[0131] In the foregoing example, the images of the object were captured via an object capture module 132 as part of interface application 130. It is to be understood that the systems and methods described herein are technologically agnostic to image source and many different image capture methods and modalities may be used. The image capture module 132 need not be part of an interface application 130. In some embodiments, images may be captured automatically by remote or autonomous systems. In some embodiments, images may comprise video feed. In some embodiments, images may comprise other imaging modalities such as LIV detection, ultrasound, infrared, nuclear imaging, etc. In some embodiments, a user may upload images originating from a camera external to the system into the system and the system may be configured to segment and analyze those images. In such embodiments, the system may be configured to conduct a preliminary review of the images to ensure they meet any quality and/or angle requirements (that may exist). In some embodiments the images may be captured using multiple cameras (e.g., cameras configured to capture an object at different angles, cameras positioned along an object’s path to capture images as an object travels, etc.). In such embodiments, further systems and algorithms may be configured to correlate images captured from multiple cameras to the same object instance and/or to correlate the images to each other. Other image sources, imaging modalities, and imaging capture methods are conceived.
[0132] Example Method
[0133] FIG. 2A illustrates a process diagram of a method 200 of detecting defects in segments of an object and appending them to a cage, according to some embodiments.
[0134] The method comprises the steps of Capturing Images of the Object (block 202), Aligning the Images with a Cage (block 204), Detecting Defects in the Segments (block 206), and Appending Defects to the Cage (block 208).
[0135] In operation, the method 200 may begin, for example, with a user 10 capturing images of the object (e.g., a vehicle) (step 202) using, for example, an object capture module 132. The interface application 130 may provide guidance to the user as to how to capture the images (e.g., guidance on angles and brightness). The interface application 130 may provide the user guidance in the form of overlays on the interface application 130 to assist the user 10 in aligning the object with their object capture module 132. The interface application 130 may be configured to determine the object type (e.g., vehicle make and model) before capturing images (e.g., via input from the user 10, scanning an identification or serial number such as a VIN, etc.) or while capturing images (e.g., analyzing the images to determine object type). In such embodiments, the interface application 130 may be configured to provide an overlay on the interface application 130 to guide the user in aligning the images based on the cage for that object type. In embodiments where defect or damage information has been appended to the SVC, the system 100 may generate additional guidance for the user so that the user captures images specifically of the defect or damage. In such embodiments, a direct image (e.g., an anchor view) of the defect or damage may be requested. In some embodiments, two support views (e.g., at 45° from the angle of the anchor view) may also be requested. In such embodiments, the interface application 130 may provide overlays specific to the component with the defect or damage to assist the user in orienting the object capture module 132 or may even provide overlays based on the defect or damage itself (e.g., as it previously appeared). In some embodiments, the cage may be the closest cage type to the captured object where a matching cage type cannot be found and/or identified. In some embodiments, a given image can be optimized with multiple different cages in parallel within GPU memory, based on a best matching cage from a set of candidate cages. In some embodiments, the process of identifying a best matching cage can be done with multiple images in parallel. In some embodiments, a plurality of images (e.g., from different perspectives) can be run against a plurality of cages in parallel to determine the best overall matching cage for all the images. These embodiments enable the system to retain a scalability that may be better than systems which utilize 3D models, such as 3D point clouds, 3D meshes, or 3D voxel grids. In some embodiments, the cage may comprise a series of 2D views and the user 10 is directed to capture images corresponding to each of the views in turn. In some embodiments, the system is configured to detect which 2D view the user 10 is about to capture an image of and automatically show the relevant 2D view overlay. In some embodiments, the cage may comprise a 3D model and orient the model to overlay with the object at any angle (though guidance may still be provided to ensure the user captures a sufficient number of varied images). Other methods of capturing images of an object are conceived (e.g., capturing images of a field of view to detect objects and components thereof in the field of view, capturing images of objects as they emerge from an assembly line). In some embodiments, the method 200 can work in a drive-through environment for vehicles (e.g., an enclosed area with cameras situated at desired locations to collect the required images as a user drives into the area).
[0136] The system may then align the images with a cage (step 204) using, for example, the alignment engine 120. The system may use outputs of computer vision models to 'see' the object in the captured images. If not already done (e.g., as part of step 202), the system may also retrieve the appropriate cage for the object type. It may retrieve the cage type appropriate for the object type based on a user input, a scanned identification or serial number (e.g., a VIN), the captured images, or some other method and compare those outputs to the cage for that object type. In some embodiments, the system may analyze the captured images to determine the appropriate cage. In some embodiments where the appropriate cage cannot be determined, the system may then select a cage that is similar to the captured object or it may generate a new cage (e.g., based on the captured images and/or other cages in its data storage). In some embodiments, the system may use semantic segmentation models to generate a mask of the object in the captured images. The system may then optimize a homography matrix (defining the composition of rotations, translations, and scaling required to align the cage with the semantic mask) by, for example, comparing the cage and the mask as two binary arrays of the same shape. In some embodiments, other computer vision models may be used to generate the computer vision output. Such models may include Object Detection Models (which can identify the presence and location of objects within an image, and could be useful for identifying the object type and its components within the captured images), Instance Segmentation Models (which can not only identify the presence and location of objects within an image, but also differentiate between multiple instances of the same object, and could be useful for identifying multiple objects of the same type within the captured images), and Depth Estimation Models (which can estimate the depth of objects within an image and could be useful for determining the appropriate cage size for the object). In some embodiments, other methods of comparing the output from the computer vision model and the cage may be used. Once the captured images have been aligned with the cage, the captured images can be segmented using the segments defined by the cage. These segments can correspond to a variety of parts that make up the object (e.g., side door, tire, windshield, steering wheel, engine and subdivisions thereof, etc.). In some embodiments, the interface application 130 may be configured to carry out the operation of the alignment engine 120. In some embodiments another component carries out the operation of the alignment engine 120.
[0137] The system may then detect defects or damage in the segments of the captured images (step 206) using, for example, recognition engine 122. The system may analyze the captured images to determine whether there is damage or defects. This may be carried out over the whole image and then mapped to segments depending on where the damage is found in the captured image (and how that location maps to the cage) before, in parallel with, or after step 204, or it may be carried out on a segment-by-segment basis or on targeted parts (i.e., after step 204). In some embodiments, the damage may be detected and the aligned cage may be used to identify the segment on which the damage exists. In some embodiments, user 10 or someone else assesses captured images to identify defects or damage. In some embodiments, damage or defects are detected using computer vision methods. For example, in some embodiments, the system may compare the captured images to the cage to ascertain whether there are deviations consistent with damage or defects. This comparison can be made over the whole image or carried out on a segment-by-segment basis. Other methods of defect detection are also conceived. In some embodiments, the interface application 130 may be configured to carry out the operation of the recognition engine 122. In some embodiments another component carries out the operation of the recognition engine 122. In some embodiments, the system may be configured to detect deviations which are not damage.
[0138] In some embodiments, the defect or damage detection can be carried out by tiling the image into a grid of approximately equal cell sizes and processing those cells using a machine-learning computer vision model. In such embodiments, the model may not be as effective in detecting damage which occurs along the borders of the cells. Accordingly, an additional grid (or more) may be used which tiles the image into a different grid offset from (or otherwise different from) the first grid. The offset (or other difference) can ensure that the borders of the first grid are more centrally captured in the cells of the second grid. This can effectively reduce the difficulty the model may have in detecting damage along borders.
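A small sketch of the offset-grid tiling described above, assuming NumPy-style image arrays; the cell size, half-cell offset, and helper names are illustrative choices rather than prescribed values:

    import numpy as np

    def grid_cells(image, cell_size, offset=(0, 0)):
        # Yield (top, left, cell) tiles of roughly equal size over the image.
        h, w = image.shape[:2]
        for top in range(offset[0], h, cell_size):
            for left in range(offset[1], w, cell_size):
                yield top, left, image[top:top + cell_size, left:left + cell_size]

    def dual_grid_cells(image, cell_size=256):
        # First grid tiles the image normally; the second is shifted by half
        # a cell so damage on the first grid's borders lands near a cell
        # centre of the second grid.
        yield from grid_cells(image, cell_size, (0, 0))
        yield from grid_cells(image, cell_size, (cell_size // 2, cell_size // 2))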
[0139] Using a grid, the image can be divided into cells which can themselves be analyzed by the computer vision model. By performing the analysis on the cells as opposed to the full captured image, the model is able to detect damage with a higher resolution, which can enable the system to detect damage or defects which may be too small to see in the larger image and can enable the system to provide a clearer assessment of the nature of the damage (e.g., type, severity, etc.), which can improve the accuracy of the damage or defect analysis and subsequent cost and/or valuation estimations.
[0140] In some embodiments, the grid can be configured independent of the SVC. For example, the grid may be a square grid with approximately equal cell sizes used to divide the image. In some embodiments, the grid can be configured with the SVC. For example, rather than uniformly tiling the image, the system may be configured to tile the image using the location of the segments or it may be configured to differently tile different segments within the SVC.
[0141] Using a grid, the machine-learning model (e.g., the neural network) can ‘see’ different parts of the vehicle or object and understand how the cells relate to the rest of the vehicle or object without reading the entire image at once. This may be more computationally efficient. This neural network may be configured to detect damage or defects in the cells of the grid. These cells may be further analyzed for information such as defect or damage size, location, penetration, severity, and type. This analysis may be carried out by another neural network.
[0142] In some embodiments, a deep neural network may perform a binary classification on a segment to predict whether it has damage or a defect. A subsequent neural network may then predict information about the damage or defect (e.g., label a key-point in the vicinity of the damage to characterize the damage).
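For illustration only, the two-stage arrangement described above might be realised with a pair of small PyTorch modules along these lines; the layer sizes are placeholder assumptions, not the networks contemplated by this disclosure:

    import torch.nn as nn

    class DefectClassifier(nn.Module):
        # Stage 1: binary prediction of whether a cell/segment shows a defect.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, 1))  # logit; apply a sigmoid at inference

        def forward(self, x):
            return self.net(x)

    class KeypointHead(nn.Module):
        # Stage 2: regress an (x, y) key-point in the vicinity of the defect,
        # run only on cells the classifier flags.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, 2))

        def forward(self, x):
            return self.net(x)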
[0143] The system may then append the defects to the cage and associate the updated cage with that particular object (step 208). The updated cage may be used in subsequent inspections to, for example, align the cage more easily with the object. In some embodiments, the updated cage may change the directions given to the user (e.g., it may guide the user to capture an image of the defects or damage itself and may generate overlays thereof to guide the user). The cage may also be used to continue to monitor the status of the defect or damage. For example, by appending the damage or defect information into the cage, the step-wise changes of that feature can be observed in each subsequent inspection (e.g., to see any further damage and identify between which inspections the further damage occurred). In some embodiments, the defects or damage may be appended to the cage as additional landmark information stored with the cage. The landmark data may localize the defect as an individual feature of the object and provide information about the defect (e.g., severity, type, size, and location). These parameters may help in a cost estimate for repairing the defect. These parameters may also describe the defect sufficiently such that it can later be regenerated with the cage. In some embodiments, the defect or damage may be appended as its own continuous curve (e.g., a Bezier curve) within the cage. In some embodiments, the defect or damage may be appended as a modification of an existing smooth curve within the cage.
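As a hedged sketch of the landmark-appending step, defect data might be attached to a cage record as follows; the DefectLandmark fields and the cage dictionary layout are hypothetical:

    from dataclasses import dataclass, field, asdict

    @dataclass
    class DefectLandmark:
        segment: str                  # e.g., "fender_front_right"
        defect_type: str              # e.g., "scratch", "dent"
        severity: str
        location: tuple               # (x, y) in cage coordinates
        size: float
        bezier_controls: list = field(default_factory=list)  # optional outline curve

    def append_defect(cage: dict, landmark: DefectLandmark) -> dict:
        # Attach the landmark so later inspections can align against it and
        # regenerate the defect with the cage.
        cage.setdefault("defects", []).append(asdict(landmark))
        return cage

Storing the optional Bezier control points alongside the discrete attributes is one way the defect could later be regenerated as its own continuous curve within the cage, as described above.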
[0144] In some embodiments, the defects or damage can be characterized using a variety of states (discrete and/or continuous) (e.g., representations, embeddings, or vector states). These states may include, for example, type, location, severity, penetration, size, etc. Continuous states may be stored as vectors. In some embodiments, the system may use some or all of these states to generate a report of the damage. In some embodiments, the system may save these states to describe the defects or damage for subsequent analysis, but truncate the states when reporting (e.g., a continuous size state may be converted into a discrete state of small, medium, or large).
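A minimal sketch of the truncation described above; the threshold values are illustrative placeholders, not values taken from this disclosure:

    def truncate_size_state(size_mm: float) -> str:
        # Collapse a continuous size state into a discrete reporting label,
        # while the full continuous value remains stored for later analysis.
        if size_mm < 10.0:
            return "small"
        if size_mm < 50.0:
            return "medium"
        return "large"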
[0145] In some embodiments, the system may apply a defect grid to the captured images to subdivide and process the damage. This may be more computationally taxing on the system, but may provide a better assessment of the damage (and a better baseline for subsequent comparison). In some embodiments, the defects or damage may be appended as a multi-modal data structure. For example, the defect or damage may be stored visually. The defect or damage may also be stored with linguistic data associated therewith. Such linguistic data may include descriptors of the damage or defect (severity, size, location, type, etc.).
[0146] In some embodiments, the method may be configured to generate a virtual representation of the object using the captured images and the cage. For example, the method may generate a series of views based on the captured images, but aligned with the cage. As another example, the method may generate a virtual 3D model by applying the captured images onto a 3D cage. In these embodiments it may further provide an interactive view of the object, for example, by allowing the user 10 to see the segment-by-segment analysis and having the virtual model zoom in or otherwise focus on that segment.
[0147] In some embodiments, the defects or damage may undergo a cost or value estimation (e.g., to determine how much it will cost to fix it). The defect or damage information (e.g., the location, severity, type, penetrations, etc.) can indicate which segments are impacted and what remedial actions may be required. Using the segment information, the materials used for the components and the economic value of the defects or damage may be ascertained.
[0148] FIG. 2B illustrates another process diagram of a method 210 of detecting the physical condition of segments of an object and previously identified defects, according to some embodiments.
[0149] The method comprises the steps of Capturing Images of the Object (212), Aligning the Images with the Cage (214), and Detecting Physical Condition of Defects (216).
[0150] In operation, the method 210 may begin, for example, with a user 10 capturing images of the object (e.g., a vehicle) using, for example, an object capture module 132. The interface application 130 may provide guidance to the user as to how to capture the images (e.g., guidance on angles and brightness). The interface application 130 may provide the user guidance in the form of overlays on the interface application 130 to assist the user 10 in aligning the object with their object capture module 132. The interface application 130 or server 108 may be configured to determine the object type (e.g., vehicle make and model) while capturing images (e.g., analyzing the images to determine object type). In such embodiments, the system may process captured images to determine whether a bespoke cage has previously been generated for this object.
[0151] The user may capture images of the object using, for example, a camera or other imaging device. The system may then retrieve the appropriate cage for the object (block 212). It may retrieve the cage appropriate for the object based on the captured images. For example, the system may take other input based on a user input, a scanned identification or serial number (e.g., a VIN or a license plate), or some other method and retrieve a cage previously generated for an object matching that input. The cage may be stored in a memory. The memory may store only information about previously identified defects and use that information to modify a template cage.
[0152] The system may then optimize a homography matrix (defining the composition of rotations, translations, and scaling required to align the cage with the semantic mask) by, for example, comparing the cage and the mask as two binary arrays of the same shape (block 214). In some embodiments, alignment may take place while identifying the cage (block 212). In some embodiments, other computer vision models may be used to generate the computer vision output. Such models may include Object Detection Models (which can identify the presence and location of objects within an image, and could be useful for identifying the object type and its components within the captured images), Instance Segmentation Models (which can not only identify the presence and location of objects within an image, but also differentiate between multiple instances of the same object and could be useful for identifying multiple objects of the same type within the captured images), Depth Estimation Models (which can estimate the depth of objects within an image and could be useful for determining the appropriate cage size for the object). In some embodiments, other methods of comparing the output from the computer vision model and the cage may be used. Once the captured images have been aligned with the cage, the captured images can be segmented using the segments and defects/damage defined by the cage. These segments can correspond to a variety of parts that make up the object (e.g., side door, tire, windshield, steering wheel, engine and subdivisions thereof, etc.). The defects/damage may correspond to previously identified defects and damage. In some embodiments, the interface application 130 may be configured to carry out the operation of the alignment engine 120. In some embodiments, another component carries out the operation of the alignment engine 120.
[0153] The system may then detect the physical condition such as defects or damage in the segments of the captured images (step 216) using, for example, recognition engine 122. The system may also detect the physical condition of any previously identified defect/damage to determine whether the damage has been repaired, remained the same, or gotten worse. In some embodiments, physical condition is detected using computer vision methods. This may be the same analysis conducted to identify the defect/damage (as described in relation to FIG. 2A), but now the method can assess the change in the defect/damage (e.g., its physical condition). For example, in some embodiments, the system may compare the captured images to the cage to ascertain whether there are additional deviations consistent with damage or defects. This comparison can be between the whole image or carried out on a segment-by-segment basis. Other methods of physical condition detection are also conceived. In some embodiments, the interface application 130 may be configured to carry out the operation of the recognition engine 122. In some embodiments, another component carries out the operation of the recognition engine 122.
[0154] As described above, detecting the physical condition of the defect (step 216) may occur before or in parallel with aligning the images with the cage (step 214). In such embodiments, the defect may be detected and the aligned cage may be used to identify the segment on which the defect exists.
[0155] In some embodiments, the detection of the physical condition of the defect or damage (step 216) can be carried out by tiling as described above in relation to FIG. 2A to detect the defect. Where a defect or damage has been appended to the cage, then this defect information may be used while processing the image. In some embodiments, the defect information may be used to distort or vary the vehicle grids to better assess the defect. However, such approaches may generate challenges for processing the images within the GPU. Accordingly, in some embodiments, a further defect grid may be applied over the defect or damage specifically to subdivide the defect area for machine-learning computer vision processing. As described above, there may be a second defect grid offset from the first defect grid to better cover border regions of the individual grids. In some embodiments, the method may be configured to generate a virtual representation of the object using the captured images and the cage. For example, the method may generate a series of views based on the captured images, but aligned with the cage. As another example, the method may generate a virtual 3D model by applying the captured images onto a 3D cage. In these embodiments it may further provide an interactive view of the object, for example, by allowing the user 10 to see the segment-by-segment analysis and having the virtual model zoom in or otherwise focus on that segment.
[0156] According to an aspect there is provided a non-transitory computer recordable storage medium having stored therein computer executable program code, which when executed by a processor, causes the processor to carry out the methods 200 and 210 described above.
[0157] Approaches to Segmentation
[0158] FIG. 3 illustrates different approaches 302 and 304 to object segmentation, according to some embodiments.
[0159] In approach 302, the system performs a pixel-wise comparison of individual pixels in coloured images. These approaches may be able to ascertain that different regions of a captured image correspond to different segments of the object (e.g., in approach 302, segment 302a is the tire and segment 302b approximately corresponds to the front bumper). These approaches may have challenges in providing detailed classification of small components and in identifying rare components. In order to alleviate these challenges a very large, labelled dataset may be required, involving detailed pixel-level annotations that can be very cumbersome to generate.
[0160] In approach 304, the system performs a comparison of the whole object to the SVC 304a to identify how the SVC 304a maps on top of the object in the captured image. The SVC 304a itself holds information related to the positioning of segments relative to the whole object. This information may have been manually designed in advance or it may be automatically generated beforehand based on ideal captured images of an object type (e.g., captured images of an ideal object type (e.g., a vehicle fresh off the assembly line) may be segmented by a computing device and that file may be stored for later use). As such, rather than conducting a pixel-wise comparison for each pixel in the image, the system is able to efficiently segment the captured image by accurately positioning the SVC 304a on top of the whole object. Such an approach may be more efficient for processing and may be able to more readily segment small or rare parts or known damage or defects that a pixel-wise comparison may struggle with. For example, with vehicles, this approach may be more computationally efficient for areas such as a vehicle's interior or under its hood (i.e., where the system may need to identify a number of small components or rarer components).
[0161] Example Vehicular Method
[0162] FIG. 4 illustrates a process 400 for aligning a captured image with a cage, according to some embodiments.
[0163] In this exemplary process 400, the user 10 captures an image of the object (here a vehicle) (402). The image is then processed using advanced machine learning algorithms to extract imagery based vehicle information (406). In some embodiments, process 400 can process the image using various semantic segmentation algorithms that use Deep Neural Networks to process images in a pixel format. In some embodiments, process 400 can use prompt-based segmentation approaches and Generative AI methods for image processing. Process 400 can use any advanced image processing transformations of the image that can produce informative representations of regions of pixels confined within the various components of the SVC. The imagery based vehicle information can include information such as the make and model of the vehicle (e.g., through a VIN), the location and extent of any damage or defects, information from the instrument cluster (e.g., odometer, engine lights, etc.), etc. Other information is also conceived.
[0164] The system pulls an appropriate SVC. SVCs may be stored in a computing device, a remote server, or elsewhere. In some embodiments, the SVC type may be selected based on information extracted from the captured image. The SVCs are then processed using a Scalable Vector Cages View Alignment Algorithm to align the SVCs with the captured image (408).
[0165] These two processes are then combined as the Advanced Vehicle Classification 410. This Advanced Vehicle Classification 410 comprises the captured image with any imagery based vehicle information with the SVC (or cages for multiple views) superimposed thereon. In having superimposed the SVC on the captured image, it may be possible to identify the location of any defects or damage done to the vehicle based on its location relative to the overlaid SVC. The system may further use this information to provide an appraisal report or estimate repair costs and/or fair market value of the vehicle.
[0166] Example Vehicular Cage
[0167] In some embodiments, SVCs can be used to distinguish between thousands of makes and models of vehicles. These SVCs can further be used to distinguish between 255 or more parts of the vehicle in a captured image. The SVCs may only require annotation once (for example, by a person or through computer algorithms such as those generated by machine learning). The SVCs may be updated based on damage or defect information during an inspection, and that damage or defect information may be used in subsequent alignment and detection processes.
[0168] FIG. 5 illustrates an example interface of a front view 500 of an object cage with example parts and damage shown, according to some embodiments.
[0169] In some embodiments an example interface of a front view 500 of an object such as a vehicle with example parts shown may include the front cage 502, with part labels 504 (including part labels 504a, 504b, and 504c) and defect/damage labels 508. The interface may also include view options 506 (including 506a, 506b, and 506c) to toggle between other views of the object included in the object’s complete cage.
[0170] Front cage 502 can be colour-coded to visually distinguish different parts or regions of the object. For example, the image shows four different regions corresponding to the body, the features, the windows, and the tires. Defect/damage labels 508 can be shown in another colour to further differentiate them from the underlying object. Such colour coordination can assist a user in visually distinguishing the parts or regions of the vehicle.
[0171] The part labels 504 can indicate which part they are pointing to. In some embodiments, the part labels 504 can also be colour-coded to match or correspond to the colour-coding of the front cage 502. In some embodiments part labels 504 may be colour-coded differently than the colour-code used for front cage 502. Part labels 504a correspond to those of the features and may be used to indicate parts such as trim windshield right, light head right, light fog right, grill left, badge front, housing mirror left, vent hood, etc. Part labels 504b correspond to those of the body and may be used to indicate pillar A center, fender front right, bumper valance, etc. Part labels 504c correspond to those of the windows and may indicate windshield, etc. In some embodiments, part labels 504 may not be differentiated based on regions of the object.
[0172] Part labels 504 may further be customizable, for example, when generating a new object cage, a user or a system may start with a default cage (labelled or otherwise) and add, remove, or rename part labels 504 as is appropriate.
[0173] Damage/defect labels 508 can indicate which part of the vehicle has been damaged.
[0174] View options 506 can be used to toggle between different views of an object. In the Figure, view option 506a indicates the front view and is presently selected (hence front cage 502 being displayed). Selecting view option 506b, which indicates rear, may update the interface to display a rear cage (e.g., a rear cage 702). Selecting view option 506c, which indicates left/right, may update the interface to display a side cage (e.g., a side cage 602).
[0175] Example Damage Alignment Process
[0176] FIG. 6 illustrates a process diagram of a method 600 of comparing a defect identified in previous inspection to its state in a subsequent inspection, according to some embodiments.
[0177] The method 600 can include image capture and segmentation (block 602), region alignment (block 604), and comparison and analysis (block 606).
[0178] Images of the vehicle can be captured from the specified angles (block 602). The cage segmentation method can use SVCs (vector image representation of the outline of the vehicle with all the components labelled) to segment and isolate specific regions of a vehicle. This can identify distinct vehicle parts that are to be inspected for damage or changes. In some embodiments, images are captured from an anchor view (e.g., perpendicular to the vehicle's side) and two side support views (e.g., approximately 45 degrees on either side of the anchor view). These multiple views can ensure comprehensive coverage of the vehicle's condition. This can aid in generating more data for training (damage may be explicitly labelled in the anchor view and implicitly labelled in the support views).
[0179] Isolated regions can be matched across the different views using a transformation algorithm that searches for the necessary transformation of the SVC and a feature matching methodology with the photographs to align these regions (block 604). This alignment process can ensure consistent comparison across the different inspections. With this transformation, select sets of vehicle components can be considered as they appear across the different views. These can be used to produce stacks of the three different views of each image region. This can facilitate the comparison of vehicle components across different inspection times.
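One plausible (but not prescribed) realisation of the feature-matching step in block 604 uses ORB features and a RANSAC homography, assuming OpenCV and 8-bit grayscale crops of the same region from two views:

    import cv2
    import numpy as np

    def align_region_to_anchor(anchor_region, support_region):
        # Warp a support-view crop onto the anchor view; both crops are
        # expected as 8-bit grayscale arrays.
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(anchor_region, None)
        kp2, des2 = orb.detectAndCompute(support_region, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        # RANSAC discards outlier matches when fitting the homography.
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = anchor_region.shape[:2]
        return cv2.warpPerspective(support_region, H, (w, h))

Stacking the anchor crop with the warped support crops then yields the per-region view stacks compared in block 606.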
[0180] Stacked views of each region can be compared using the matching methodology and damage detection (block 606). This method can identify and document any changes or damage to the vehicle across inspections (e.g., against the previous inspection or a baseline). With these stacks the same matching methodology used before can be used to compare each side of the vehicle across distinct inspections which can provide a very granular understanding of changes to the vehicle condition.
[0181] Example Cage Alignment Method
[0182] The cage alignment method can be applied to an image to isolate the foreground vehicle from the background and perform a range of quality verification steps before the image is passed to another system, for example, a Human-in-the-Loop system.
[0183] Based on the fit from the cage alignment method the following rejection conditions can be determined: 1) whether the image is of the vehicle being inspected, 2) whether the door is open, 3) whether the image corresponds to the appropriate camera perspective being captured by the user, 4) whether the image is clear, 5) whether the vehicle is fully in frame, 6) whether the view of the vehicle is obstructed by another object, 7) whether the region of the image associated with the inspected vehicle is under appropriate lighting conditions (not too dark or too bright), 8) whether the region of the image associated with the inspected vehicle is captured under appropriate weather conditions. There can be additional rejection conditions. The rejection conditions can be based on quality issues associated with the captured images or rejection conditions may also be based on other factors (e.g., a photo violates a business policy). This configuration prevents the system from making bad predictions based on out-of-distribution inputs. This is one way to remove ambiguous data before being presented to the defect detection system.
[0184] If the image clears all of these rejection conditions, the image can be sent to a Human-in-the-Loop system while performing automated defect analysis to provide the agents with updated defect detections using the Cage Data Structure that allows for easy overlaying of AI results with the web-interfaces that we create for the Human-in-the-Loop system. This prevents the situation wherein images that do not contain the object are presented to the defect analysis system, which may result in hallucinations on the part of the defect analysis system producing highly volatile outputs.
[0185] A committee of several neural networks can predict likelihoods for all the rejection conditions as well as the different camera perspectives. The averaged likelihoods from this committee of neural networks can be used to determine how confident the model is in its prediction and whether there is consensus across all the different neural networks in the committee. If there is consensus, then the system accepts or rejects the image based on that consensus. If there is not consensus, the system may still carry out the favoured answer of the committee, but it may flag the image for further inspection to improve the system.
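A rough sketch of the consensus check described above; the committee_decision helper and both thresholds are illustrative assumptions rather than values from this disclosure:

    import numpy as np

    def committee_decision(accept_likelihoods, accept_threshold=0.5,
                           consensus_spread=0.15):
        # accept_likelihoods: each network's probability that the image
        # should be accepted. Returns (accept, flag_for_review).
        scores = np.asarray(accept_likelihoods, dtype=np.float64)
        accept = scores.mean() >= accept_threshold
        # A tight spread across the committee is treated as consensus;
        # otherwise the favoured answer is still used but flagged.
        flag_for_review = scores.std() >= consensus_spread
        return accept, flag_for_review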
[0186] In some embodiments, this process is carried out while, for example, an imaging device is fed a live feed of images, and a confident result triggers the system to capture an image from the live feed at that point in time.
[0187] The committee of neural networks can be trained using a strategy to perform data sanitization with ambiguous data as follows. The system can use a collection of expert neural networks where each one acts as an expert to predict one of the given rejection conditions. Each of the expert neural networks can predict: 1) whether the rejection condition is met and the image should be rejected for that particular reason, 2) whether the image should be accepted, 3) whether the image should be rejected due to any of the other rejection conditions being met, and 4) what the camera perspective is. Having each expert neural network predict the camera perspective can ensure that the system receives a holistic view of the vehicle. The perspective prediction can be used to identify misalignment of the cage with the vehicle, and cage misalignment with the vehicle can in turn be used to obtain a likelihood for the perspective prediction.
[0188] Using this quaternary classification scheme, a large collection of expert classifiers is obtained. These can be combined so that the average likelihood of an image being accepted is obtained, as well as the likelihoods for each of the different camera perspectives that an image could be associated with. Where an image gets rejected, the combined likelihood of the expert classifier for that rejection condition can be considered in combination with the average likelihood of the other reject classifiers predicting that one of the other rejection conditions is met.
[0189] Using these averaged expert likelihoods, a more compressed and efficient neural network can be trained to predict the aggregated likelihoods of the expert system. The training data may be the subset of the data where, for example, the Human-in-the-Loop system has a documented consensus with the collection of expert classifiers. This procedure can prevent the second network model from being exposed to ambiguous data inputs during training which are prevalent in the context of quality control applications and can negatively impact the downstream defect analysis.
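As a sketch of this distillation step, assume each expert emits a probability vector over the quaternary scheme and that the Human-in-the-Loop decisions are available as booleans; the output index and threshold choices here are assumptions for illustration.

import numpy as np

ACCEPT_INDEX = 1  # assumed position of the "accept" output in each vector

def averaged_expert_targets(expert_probs):
    """expert_probs: (n_experts, n_outputs) likelihoods over the quaternary
    scheme. The mean over experts is the soft target that the compressed
    student network is trained to predict."""
    return np.asarray(expert_probs).mean(axis=0)

def consensus_training_subset(targets, hitl_accept_labels, threshold=0.5):
    """Keep only samples where the expert ensemble's accept decision agrees
    with the documented Human-in-the-Loop decision, so ambiguous inputs are
    excluded from the student's training data."""
    kept = []
    for target, human_accept in zip(targets, hitl_accept_labels):
        experts_accept = target[ACCEPT_INDEX] >= threshold
        if experts_accept == bool(human_accept):
            kept.append((target, human_accept))
    return kept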
[0190] This training process can be repeated several times with different partitions of the dataset being used to produce several neural networks. This plurality of neural networks forms the committee of several neural networks that can predict likelihoods for all the rejection conditions as well as the different camera perspectives.
[0191] Example Implementations
[0192] According to an aspect, there is provided a system 100 for classifying segments of an object. The system 100 comprises a server having a non-transitory computer readable storage medium 108 with executable instructions for causing one or more processors 104 to process captured images using a cage for image alignment, the cage defining segments of the object, align the captured images onto the cage to identify segments of the object in the captured images using an alignment engine 120, detect one or more defects in the segments of the object in the captured images using a recognition engine 122, and append data representative of the one or more defects to the cage.
[0193] According to an aspect, there is provided a system 100 for classifying segments of an object. The system 100 comprises a server having a non-transitory computer readable storage medium 108 with executable instructions for causing one or more processors 104 to process captured images using a cage for image alignment, the cage defining segments of the object and one or more defects, align the captured images onto the cage to identify segments of the object and the one or more defects in the captured images using an alignment engine 120, and detect one or more physical conditions of the segments of the object and the one or more defects in the captured images using a recognition engine 122.
[0194] According to an aspect, there is provided a system 100 for classifying segments of an object. The system 100 includes a server having a non-transitory computer readable storage medium 108 with executable instructions for causing one or more processors 104 to receive captured images from an object capture module 132, align the captured images onto a cage for image alignment to identify segments of the object in the captured images using an alignment engine 120, the cage defining segments of the object, detect one or more defects in the captured images using a recognition engine 122, combine the aligned captured images and the detected one or more defects to generate data representative of the one or more defects, and append the data representative of the one or more defects to the cage.
[0195] In some embodiments, the one or more defects are characterized with a multi-modal data structure comprising linguistic data and visual data.
[0196] In some embodiments, detecting the one or more defects comprises detecting one or more of severity, type, size, location, and penetration of the one or more defects.
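By way of example only, a defect record combining the linguistic and visual modalities with the detected attributes might look like the following sketch; the field names and types are assumptions, not a normative schema of the Cage Data Structure.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DefectRecord:
    segment_id: str               # cage segment the defect is appended to
    description: str              # linguistic data, e.g. "deep scratch, 4 cm"
    crop_png: bytes               # visual data: pixel crop of the defect
    severity: Optional[str] = None
    defect_type: Optional[str] = None
    size_mm: Optional[float] = None
    penetration: Optional[str] = None
    first_observed: str = ""      # ISO-8601 timestamp of first observation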
[0197] In some embodiments, the object comprises a vehicle.
[0198] In some embodiments, the cage comprises a 3D model.
[0199] In some embodiments, the cage includes a graphic script in a domain specific language.
[0200] In some embodiments, the server is further configured to compute cost data for repair of the one or more defects.
[0201] In some embodiments, the processor 104 is configured to align the captured images onto the cage by generating a semantic mask from the captured images using semantic segmentation models; and optimizing a homography matrix between the cage and the semantic mask. The homography matrix defines a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask, wherein the cage and the semantic mask are compared as two binary arrays of a same shape.
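A toy version of this optimization is sketched below: the rasterized cage and the semantic mask are compared as binary arrays under an intersection-over-union score while the homography is perturbed. A real implementation would use a proper solver (gradient-based or feature-correspondence-based); the random search, step count, and noise scale here are assumptions.

import numpy as np
import cv2  # OpenCV, for perspective warping

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def refine_homography(cage_mask, semantic_mask, H0, steps=200, sigma=1e-3):
    """Random-search refinement of a 3x3 homography warping the rasterized
    cage onto the semantic mask, scored by IoU of the two binary arrays."""
    rng = np.random.default_rng(0)
    h, w = semantic_mask.shape
    best_H = np.asarray(H0, dtype=np.float64)
    warped = cv2.warpPerspective(cage_mask.astype(np.uint8), best_H, (w, h))
    best_score = mask_iou(warped > 0, semantic_mask > 0)
    for _ in range(steps):
        candidate = best_H + rng.normal(0.0, sigma, size=(3, 3))
        candidate[2, 2] = 1.0  # keep the usual normalization
        warped = cv2.warpPerspective(cage_mask.astype(np.uint8),
                                     candidate, (w, h))
        score = mask_iou(warped > 0, semantic_mask > 0)
        if score > best_score:
            best_H, best_score = candidate, score
    return best_H, best_score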
[0202] In some embodiments, the cage is selected from a plurality of cages based on at least one of an object identification number, user selection, and system selection.
[0203] In some embodiments, the data representative of the one or more defects is appended to the cage on first observation of the one or more defects.
[0204] In some embodiments, the one or more defects are detected by dividing the captured images into one or more object grids, processing each cell of the one or more object grids using computer vision to detect the presence of the one or more defects, and processing cells with the one or more defects to predict at least some of the data representative of the one or more defects.
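A minimal sketch of this two-stage grid pass follows; has_defect and characterize stand in for the computer-vision models, which are assumed rather than specified here.

def grid_cells(image, rows=8, cols=8):
    """Yield (row, col, cell) tiles covering the aligned object image."""
    h, w = image.shape[:2]
    for r in range(rows):
        for c in range(cols):
            yield r, c, image[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]

def detect_defects(image, has_defect, characterize, rows=8, cols=8):
    """Stage 1: cheap presence check per cell; stage 2: predict defect data
    (severity, type, size, ...) only for cells flagged as positive."""
    findings = []
    for r, c, cell in grid_cells(image, rows, cols):
        if has_defect(cell):
            findings.append({"row": r, "col": c, **characterize(cell)})
    return findings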
[0205] In some embodiments, the processor 104 is further configured to receive a second set of captured images from an object capture module 132, align the second set of captured images onto the cage to identify segments of the object in the captured images using the alignment engine 120, detect one or more physical conditions of the one or more defects in the captured images using the recognition engine 122, and combine the aligned second set of captured images and the detected one or more physical conditions of one or more defects to generate data representative of a change in the one or more defects.
[0206] In some embodiments, aligning the captured images comprises aligning the one or more defects.
[0207] In some embodiments, the one or more defects is updated when the system detects a change in the physical condition of the one or more defects.
[0208] In some embodiments, the one or more defects is removed from the cage when the system fails to identify the one or more defects.
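Taken together, the preceding three paragraphs describe a reconciliation rule for defects held on the cage across inspections. A sketch follows, assuming defects are keyed by an identifier and carry a "condition" field; both the keying and the field name are assumptions.

def reconcile_defects(cage_defects, observations):
    """cage_defects / observations: dicts keyed by defect id. Update a
    defect when its physical condition changed, drop it when the new
    inspection fails to identify it, and append defects seen for the
    first time."""
    updated = {}
    for defect_id, record in cage_defects.items():
        obs = observations.get(defect_id)
        if obs is None:
            continue  # not re-identified -> removed from the cage
        if obs.get("condition") != record.get("condition"):
            record = {**record, "condition": obs["condition"]}
        updated[defect_id] = record
    for defect_id, obs in observations.items():
        updated.setdefault(defect_id, obs)  # first observation -> append
    return updated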
[0209] In some embodiments, the one or more physical conditions of the one or more defects are detected by dividing the captured images into one or more object grids and one or more defect grids, processing each cell of the one or more object grids and the one or more defect grids using computer vision to detect the presence of the one or more defects, and processing cells with the one or more defects to predict at least some of the one or more physical conditions of the one or more defects.
[0210] In some embodiments, receiving the captured images includes receiving a captured image and the cage, processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image, determining presence or absence of consensus between outputs of each of the plurality of neural networks, and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
[0211] In some embodiments, each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks, wherein each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition, and wherein the quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
[0212] In some embodiments, each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
[0213] According to an aspect, there is provided a method 200 for classifying segments of an object. The method 200 comprises processing captured images using a cage for image alignment, the cage defining segments of the object, aligning the captured images onto the cage to identify segments of the object in the captured images (step 204), detecting one or more defects in the segments of the object in the captured images (step 206), and appending data representative of the one or more defects to the cage (step 208).
[0214] According to an aspect, there is provided a method 210 for classifying segments of an object. The method 210 comprises processing captured images using a cage for image alignment, the cage defining segments of the object and one or more defects, aligning the captured images onto the cage to identify segments of the object and the one or more defects in the captured images (step 214), and detecting one or more physical conditions of the segments of the object and the one or more defects in the captured images (step 216).
[0215] According to an aspect, there is provided a method 200 for classifying segments of an object. The method 200 includes: receiving captured images (step 202), aligning the captured images onto a cage for image alignment to identify segments of the object in the captured images (step 204), the cage defining segments of the object, detecting one or more defects in the captured images (step 206), combining the aligned captured images and the detected one or more defects to generate data representative of the one or more defects, and appending the data representative of the one or more defects to the cage (step 208).
[0216] In some embodiments, the one or more defects are characterized with a multi-modal data structure comprising linguistic data and visual data.
[0217] In some embodiments, detecting the one or more defects (step 206) comprises detecting one or more of severity, type, size, location, and penetration of the one or more defects.
[0218] In some embodiments, the object comprises a vehicle.
[0219] In some embodiments, the cage comprises a 3D model.
[0220] In some embodiments, the cage comprises a graphic script in a domain specific language.
[0221] In some embodiments, the method 200 further comprises computing cost data for repair of the one or more defects.
[0222] In some embodiments, aligning the captured images onto the cage (step 204) comprises generating a semantic mask from the captured images using semantic segmentation models; and optimizing a homography matrix between the cage and the semantic mask. The homography matrix defines a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask. The cage and the semantic mask are compared as two binary arrays of a same shape.
[0223] In some embodiments, the cage is selected from a plurality of cages based on at least one of an object identification number, user selection, and system selection.
[0224] In some embodiments, the data representative of the one or more defects is appended to the cage on first observation of the one or more defects.
[0225] In some embodiments, the one or more defects are detected (step 206) by dividing the captured images into one or more object grids, processing each cell of the one or more object grids using computer vision to detect the presence of the one or more defects, and processing cells with the one or more defects to predict at least some of the data representative of the one or more defects.
[0226] In some embodiments, the method 200 further includes method 210 which includes receiving a second set of captured images (step 212), aligning the second set of captured images onto the cage to identify segments of the object in the captured images (step 214), detecting one or more physical conditions of the one or more defects in the captured images (step 216), and combining the aligned second set of captured images and the detected one or more physical conditions of one or more defects to generate data representative of a change in the one or more defects.
[0227] In some embodiments, aligning the captured images (step 214) comprises aligning the one or more defects.
[0228] In some embodiments, the one or more defects is updated when the system detects a change in the physical condition of the one or more defects.
[0229] In some embodiments, the one or more defects is removed from the cage when the system fails to identify the one or more defects.
[0230] In some embodiments, the one or more physical conditions of the one or more defects are detected (step 216) by dividing the captured images into one or more object grids and one or more defect grids, processing each cell of the one or more object grids and the one or more defect grids using computer vision to detect the presence of the one or more defects, and processing cells with the one or more defects to predict at least some of the one or more physical conditions of the one or more defects.
[0231] In some embodiments, receiving the captured images (step 202) includes receiving a captured image and the cage, processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image, determining presence or absence of consensus between outputs of each of the plurality of neural networks, and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
[0232] In some embodiments, each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks. Each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition. The quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
[0233] In some embodiments, each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
[0234] According to an aspect, there is provided a method of accepting or rejecting an image. The method comprises receiving a captured image and a cage, processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image, determining presence or absence of consensus between outputs of each of the plurality of neural networks, and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
[0235] In some embodiments, each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks. Each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition. The quaternary classification classifies an image based on 1) whether the associated rejection condition is met, 2) whether the image should be accepted, 3) whether another expert neural network’s associated rejection condition is met, and 4) the perspective of the image.
[0236] In some embodiments, each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
[0237] According to an aspect there is provided a non-transitory computer recordable storage medium having stored therein computer executable program code, which when executed by a processor, causes the processor to carry out methods described herein.
[0238] Computing Device Implementation Details
[0239] The embodiments of the devices, systems and methods described herein may be implemented in a combination of both hardware and software. These embodiments may be implemented on programmable computers, each computer including at least one processor, a data storage system (including volatile memory or non-volatile memory or other data storage elements or a combination thereof), and at least one communication interface.
[0240] Program code is applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices. In some embodiments, the communication interface may be a network communication interface. In embodiments in which elements may be combined, the communication interface may be a software communication interface, such as those for inter-process communication. In still other embodiments, there may be a combination of communication interfaces implemented as hardware, software, or a combination thereof.
[0241] Throughout this discussion, numerous references have been made regarding servers, services, interfaces, portals, platforms, or other systems formed from computing devices. It should be appreciated that the use of such terms is deemed to represent one or more computing devices having at least one processor configured to execute software instructions stored on a computer readable tangible, non-transitory medium. For example, a server can include one or more computers operating as a web server, database server, or other type of computer server in a manner to fulfill described roles, responsibilities, or functions.
[0242] One should appreciate that the systems and methods described herein may provide better memory usage, improved processing, improved bandwidth usage, efficient image processing, accurate defect identification, enhanced cost estimates, etc.
[0243] The technical solution of embodiments may be in the form of a software product. The software product may be stored in a non-volatile or non-transitory storage medium, which can be a compact disk read-only memory (CD-ROM), a USB flash disk, or a removable hard disk. The software product includes a number of instructions that enable a computer device (personal computer, server, or network device) to execute the methods provided by the embodiments.
[0244] The embodiments described herein can be implemented by physical computer hardware, including computing devices, servers, receivers, transmitters, processors, memory, displays, and networks. The embodiments described herein provide useful physical machines and particularly configured computer hardware arrangements. The embodiments described herein are directed to electronic machines and methods implemented by electronic machines adapted for processing and transforming electromagnetic signals which represent various types of information. The embodiments described herein pervasively and integrally relate to machines, and their uses; and the embodiments described herein have no meaning or practical applicability outside their use with computer hardware, machines, and various hardware components. Substituting the physical hardware particularly configured to implement various acts for nonphysical hardware, using mental steps for example, may substantially affect the way the embodiments work. Such computer hardware limitations are clearly essential elements of the embodiments described herein, and they cannot be omitted or substituted for mental means without having a material effect on the operation and structure of the embodiments described herein. The computer hardware is essential to implement the various embodiments described herein and is not merely used to perform steps expeditiously and in an efficient manner.
[0245] FIG. 7 illustrates a schematic diagram of computing device 700, according to some embodiments.
[0246] For simplicity only one computing device 700 is shown but the system may include more computing devices 700 operable by users to access remote network resources and exchange data. The computing devices 700 may be the same or different types of devices. As depicted, computing device 700 includes at least one processor 702, memory 704, at least one I/O interface 706, and at least one network interface 708. The computing device components may be connected in various ways including directly coupled, indirectly coupled via a network, and distributed over a wide geographic area and connected via a network (which may be referred to as “cloud computing”). For example, and without limitation, the computing device 700 may be a server, network appliance, set-top box, embedded device, computer expansion module, personal computer, laptop, personal digital assistant, cellular telephone, smartphone device, UMPC tablet, video display terminal, gaming console, electronic reading device, or wireless hypermedia device, or any other computing device capable of being configured to carry out the methods described herein.
[0247] Each processor 702 may be, for example, any type of general-purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an integrated circuit, a field programmable gate array (FPGA), a reconfigurable processor, a programmable read-only memory (PROM), a graphics processing unit (GPU), or any combination thereof.
[0248] Memory 704 may include a suitable combination of any type of computer memory that is located either internally or externally such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), Ferroelectric RAM (FRAM), or the like.
[0249] Each I/O interface 706 enables computing device 700 to interconnect with one or more input devices, such as a keyboard, mouse, camera, touch screen and a microphone, or with one or more output devices such as a display screen and a speaker.
[0250] Each network interface 708 enables computing device 700 to communicate with other components, to exchange data with other components, to access and connect to network resources, to serve applications, and to perform other computing applications by connecting to a network (or multiple networks) capable of carrying data, including the Internet, Ethernet, plain old telephone service (POTS) line, public switched telephone network (PSTN), integrated services digital network (ISDN), digital subscriber line (DSL), coaxial cable, fiber optics, satellite, mobile, wireless (e.g., Wi-Fi, WiMAX), SS7 signaling network, fixed line, local area network, wide area network, and others, including any combination of these.
[0251] Computing device 700 is operable to register and authenticate users (using a login, unique identifier, and password, for example) prior to providing access to applications, a local network, network resources, other networks and network security devices. Computing devices 700 may serve one user or multiple users.
[0252] Implementation with Other Related Solutions
[0253] The systems and methods described herein may be used in conjunction with the systems and methods described in PCT Application No. PCT/CA2019/051463, filed 16 October 2019, entitled “AUTOMATED ARTIFICIAL INTELLIGENCE VEHICLE APPRAISALS”, published as PCT Pub. No. WO2020/077449, the contents of which are herein incorporated by reference.
[0254] The systems and methods described herein are compatible with systems and methods described in PCT Patent Application No PCT/CA2024/050358, filed on 22 March 2024, titled “SCALABLE VECTOR CAGES: VECTOR-TO-PIXEL METADATA TRANSFER FOR OBJECT PART CLASSIFICATION”, the contents of which are incorporated herein by reference.
[0255] In some embodiments, the systems and methods provided herein may be used with a system for vehicle appraisals using image processing. The system has a server having a non-transitory computer readable storage medium with executable instructions for causing one or more processors to configure: an interface application with a vehicle capture module to capture images of a vehicle and metadata for the captured images, the interface application displaying an interactive guide to assist in capturing the images, the interactive guide generated using a cage for a vehicle type, the cage defining locations or components of the vehicle, a vehicle identification number being metadata for the captured images, the vehicle identification number indicating the vehicle type; a recognition engine to process the captured images and metadata to detect defects of the vehicle and compute vehicle metrics, the processing based on different tasks dispatched to agent interfaces to receive input data for detecting the defects of the vehicle and computing the vehicle metrics; a cost estimate tool to process the vehicle metrics to compute cost data for repair of the defects of the vehicle; and a valuation tool to compute a market value estimate for the vehicle using the vehicle metrics and the cost data; wherein the interface application has visual elements corresponding to the interactive guide, the market value estimate, the cost data, and at least a portion of the vehicle metrics.
[0256] In some embodiments, the systems and methods provided herein may be used with a system for vehicle appraisals using image processing. The system has a server having a non-transitory computer readable storage medium with executable instructions for causing one or more processors to configure: an interface application with a vehicle capture module to capture images of a vehicle and metadata for the captured images, the interface application displaying an interactive guide to assist in capturing the images, the interactive guide having overlays that update to assist in capturing different images of views of the vehicle that correspond to side views of the vehicle, a front view of the vehicle, and a rear view of the vehicle, the interactive guide generated using a cage for a vehicle type, the cage defining locations or components of the vehicle, a vehicle identification number being metadata for the captured images, the vehicle identification number indicating the vehicle type, wherein the cage has different cage views comprising the side views of the vehicle, the front view of the vehicle, and the rear view of the vehicle, wherein the overlays comprise the different cage views to assist in capturing the different images with corresponding views of the vehicle. The system also has a plurality of agent interfaces, each agent interface having a task dashboard to display a portion of the captured images to receive input data, wherein the plurality of agent interfaces display the different cage views as overlays for the captured images with corresponding views. The server further configures: a recognition engine to process the captured images and metadata to detect defects of the vehicle and compute vehicle metrics, the processing based on a plurality of tasks dispatched to the plurality of agent interfaces to receive the input data for detecting the defects of the vehicle and computing the vehicle metrics, each task dispatched to a corresponding agent interface, each task associated with the portion of the captured images to display within the corresponding agent interface, wherein the system defines tasks for each view of the vehicle; a cost estimate tool to process the vehicle metrics to compute cost data for repair of the defects of the vehicle; and a valuation tool to compute a market value estimate for the vehicle using the vehicle metrics and the cost data. The interface application has visual elements corresponding to the interactive guide, the market value estimate, the cost data, and at least a portion of the vehicle metrics.
[0257] General Implementation Details
[0258] Applicant notes that the described embodiments and examples are illustrative and non-limiting. Practical implementation of the features may incorporate a combination of some or all of the aspects, and features described herein should not be taken as indications of future or existing product plans. Applicant partakes in both foundational and applied research, and in some cases, the features described are developed on an exploratory basis.
[0259] The foregoing has been described with particular attention to segmenting parts of a vehicle for exemplary purposes only. The systems and methods described herein, even when explicitly referencing a vehicle, are broadly applicable to many object categories. Other applicable object categories include other merchandise, industrial equipment, aircraft parts and components, construction equipment and machinery, medical equipment and devices, electronic devices and components, furniture and fixtures, agricultural machinery and equipment, marine vessels and components, manufacturing machinery and equipment, power generation and distribution equipment, scientific instruments and equipment, and, more generally, any item with parts. These processes may also be useful in any application where something needs to be visually inspected, particularly complex objects comprising parts.
[0260] The foregoing discussion provides many example embodiments. Although each embodiment represents a single combination of inventive elements, other examples may include all possible combinations of the disclosed elements. Thus if one embodiment comprises elements A, B, and C, and a second embodiment comprises elements B and D, other remaining combinations of A, B, C, or D, may also be used.
[0261] The term “connected” or “coupled to” may include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements).
[0262] The terms “defect” and “damage” have been used interchangeably in the above description and may generally relate to, for example, damage caused by accidents (e.g., vehicular accidents), wear to the object, manufacturing defects, and other object conditions that may benefit from ongoing monitoring. Furthermore, in other applications, “defect” and “damage” may refer to a condition of something living (e.g., an organ) or a tumour (in applications for the living body).
[0263] Although the embodiments have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope.
[0264] Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
[0265] As can be understood, the examples described above and illustrated are intended to be exemplary only. The scope is indicated by the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A system for classifying segments of an object, the system comprising: a server having non-transitory computer readable storage medium with executable instructions for causing one or more processors to: receive captured images; align the captured images onto a cage for image alignment to identify segments of the object in the captured images, the cage defining segments of the object; detect one or more defects in the captured images; combine the aligned captured images and the detected one or more defects to generate data representative of the one or more defects; and append the data representative of the one or more defects to the cage.
2. The system of claim 1, wherein the one or more defects are characterized with a multi-modal data structure comprising linguistic data and visual data.
3. The system of claim 1, wherein detecting the one or more defects comprises detecting one or more of severity, type, size, location, and penetration of the one or more defects.
4. The system of claim 1, wherein the object comprises a vehicle.
5. The system of claim 1, wherein the cage comprises a 3D model.
6. The system of claim 1, wherein the cage comprises a graphic script in a domain specific language.
7. The system of claim 1, wherein the server is further configured to compute cost data for repair of the one or more defects.
8. The system of claim 1, wherein the processor is configured to align the captured images onto the cage by: generating a semantic mask from the captured images using semantic segmentation models; and optimizing a homography matrix between the cage and the semantic mask, wherein the homography matrix defines a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask, wherein the cage and the semantic mask are compared as two binary arrays of a same shape.
9. The system of claim 1, wherein the cage is selected from a plurality of cages based on at least one of an object identification number, user selection, and system selection.
10. The system of claim 1, wherein the data representative of the one or more defects is appended to the cage on first observation of the one or more defects.
11. The system of claim 1, wherein the one or more defects are detected by: dividing the captured images into one or more object grids; processing each cell of the one or more object grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the data representative of the one or more defects.
12. The system of claim 1, wherein the processor is further configured to: receive a second set of captured images; align the second set of captured images onto the cage to identify segments of the object in the captured images; detect one or more physical conditions of the one or more defects in the captured images; and combine the aligned second set of captured images and the detected one or more physical conditions of one or more defects to generate data representative of a change in the one or more defects.
13. The system of claim 12, wherein aligning the captured images comprises aligning the one or more defects.
14. The system of claim 12, wherein the one or more defects is updated when the system detects a change in the physical condition of the one or more defects.
15. The system of claim 12, wherein the one or more defects is removed from the cage when the system fails to identify the one or more defects.
16. The system of claim 12, wherein the one or more physical conditions of the one or more defects are detected by: dividing the captured images into one or more object grids and one or more defect grids; processing each cell of the one or more object grids and the one or more defect grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the one or more physical conditions of the one or more defects.
17. The system of claim 1, wherein receiving the captured images comprises: receiving a captured image and the cage; processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image; determining presence or absence of consensus between outputs of each of the plurality of neural networks; and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
18. The system of claim 17, wherein each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks, wherein each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition, and wherein the quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
19. The system of claim 18, wherein each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
20. A method for classifying segments of an object, the method comprising: receiving captured images; aligning the captured images onto a cage for image alignment to identify segments of the object in the captured images, the cage defining segments of the object; detecting one or more defects in the captured images; combining the aligned captured images and the detected one or more defects to generate data representative of the one or more defects; and appending the data representative of the one or more defects to the cage.
21. The method of claim 20, wherein the one or more defects are characterized with a multi-modal data structure comprising linguistic data and visual data.
22. The method of claim 20, wherein detecting the one or more defects comprises detecting one or more of severity, type, size, location, and penetration of the one or more defects.
23. The method of claim 20, wherein the object comprises a vehicle.
24. The method of claim 20, wherein the cage comprises a 3D model.
25. The method of claim 20, wherein the cage comprises a graphic script in a domain specific language.
26. The method of claim 20, wherein the method further comprises computing cost data for repair of the one or more defects.
27. The method of claim 20, wherein aligning the captured images onto the cage comprises: generating a semantic mask from the captured images using semantic segmentation models; and optimizing a homography matrix between the cage and the semantic mask, wherein the homography matrix defines a composition of rotations, translations, scaling, and/or distortion correction to align the cage with the semantic mask, wherein the cage and the semantic mask are compared as two binary arrays of a same shape.
28. The method of claim 20, wherein the cage is selected from a plurality of cages based on at least one of an object identification number, user selection, and system selection.
29. The method of claim 20, wherein the data representative of the one or more defects is appended to the cage on first observation of the one or more defects.
30. The method of claim 20, wherein the one or more defects are detected by: dividing the captured images into one or more object grids; processing each cell of the one or more object grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the data representative of the one or more defects.
31. The method of claim 20, the method further comprising: receiving a second set of captured images; aligning the second set of captured images onto the cage to identify segments of the object in the captured images; detecting one or more physical conditions of the one or more defects in the captured images; and combining the aligned second set of captured images and the detected one or more physical conditions of one or more defects to generate data representative of a change in the one or more defects.
32. The method of claim 31, wherein aligning the captured images comprises aligning the one or more defects.
33. The method of claim 31, wherein the one or more defects is updated when the system detects a change in the physical condition of the one or more defects.
34. The method of claim 31, wherein the one or more defects is removed from the cage when the system fails to identify the one or more defects.
35. The method of claim 31, wherein the one or more physical conditions of the one or more defects are detected by: dividing the captured images into one or more object grids and one or more defect grids; processing each cell of the one or more object grids and the one or more defect grids using computer vision to detect the presence of the one or more defects; and processing cells with the one or more defects to predict at least some of the one or more physical conditions of the one or more defects.
36. The method of claim 20, wherein receiving the captured images comprises: receiving a captured image and the cage; processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image; determining presence or absence of consensus between outputs of each of the plurality of neural networks; and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
37. The method of claim 36, wherein each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks, wherein each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition, and wherein the quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
38. The method of claim 37, wherein each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
39. A method of accepting or rejecting an image, the method comprising: receiving a captured image and a cage; processing the captured image with a plurality of neural networks, each neural network predicting whether the captured image should be accepted or rejected and the perspective of the captured image; determining presence or absence of consensus between outputs of each of the plurality of neural networks; and accepting or rejecting the captured image when consensus is present or flagging the captured image for further inspection where consensus is absent.
40. The method of claim 39, wherein each neural network of the plurality of neural networks is trained by predicting an averaged likelihood of a quaternary classification for a training image from a plurality of expert neural networks, wherein each neural network of the plurality of neural networks is trained on a different partition of training data, each of the plurality of expert neural networks is an expert for predicting an associated rejection condition, and wherein the quaternary classification classifies an image based on: whether the associated rejection condition is met, whether the image should be accepted, whether another expert neural network’s associated rejection condition is met, and the perspective of the image.
41. The method of claim 40, wherein each neural network of the plurality of neural networks has an associated rejection condition selected from a group consisting of: the captured image is not of the object under inspection, a door is open, the captured image corresponds to an inappropriate camera perspective, the image is not clear, the object is not fully in frame, a view of the object in the captured image is obstructed by another object, a region of the captured image associated with the inspected object is not under appropriate lighting conditions, and the region of the captured image associated with the inspected object is not captured under appropriate weather conditions.
PCT/CA2025/050907 2024-06-28 2025-06-27 Scalable vector cages: vector-to-pixel metadata transfer for defect alignment Pending WO2026000088A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463665992P 2024-06-28 2024-06-28
US63/665,992 2024-06-28

Publications (1)

Publication Number Publication Date
WO2026000088A1 (en) 2026-01-02


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200317165A1 (en) * 2019-01-04 2020-10-08 Robert Lee Watts Systems and Methods for Repair of Vehicle Body Damage
US20210142464A1 (en) * 2019-09-09 2021-05-13 Genpact Luxembourg S.À R.L System and method for artificial intelligence based determination of damage to physical structures

