CN121359004A - System and method for intraoral scanning
- Publication number
- CN121359004A (application CN202480039853.8A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- information
- near infrared
- machine learning
- pixels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01B—MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
- G01B11/00—Measuring arrangements characterised by the use of optical techniques
- G01B11/24—Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
- G01B11/25—Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0059—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
- A61B5/0082—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence adapted for particular medical purposes
- A61B5/0088—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence adapted for particular medical purposes for oral or dental tissue
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61C—DENTISTRY; APPARATUS OR METHODS FOR ORAL OR DENTAL HYGIENE
- A61C9/00—Impression cups, i.e. impression trays; Impression methods
- A61C9/004—Means or methods for taking digitized impressions
- A61C9/0046—Data acquisition means or methods
- A61C9/0053—Optical means or methods, e.g. scanning the teeth by a laser or light beam
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/63—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30036—Dental; Teeth
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/67—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Biomedical Technology (AREA)
- Primary Health Care (AREA)
- Pathology (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Veterinary Medicine (AREA)
- General Physics & Mathematics (AREA)
- Animal Behavior & Ethology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Dentistry (AREA)
- General Business, Economics & Management (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biophysics (AREA)
- Heart & Thoracic Surgery (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Optics & Photonics (AREA)
- Dental Tools And Instruments Or Auxiliary Dental Instruments (AREA)
- Apparatus For Radiation Diagnosis (AREA)
Abstract
An intraoral scanning system (102) is provided that includes a handheld intraoral scanner (104), a processor (106), and a continuous volume machine learning (ML) model (108) for generating a 3D internal geometry (120) of a tooth (306). The intraoral scanner includes a sensor (214D) for detecting NIR and visible light. The processor receives visible light and NIR information from the sensor, generates a 3D surface model (118) of the tooth using the visible light information, and determines input parameters of the continuous volume ML model by capturing 2D NIR images (114) of an interior region of the tooth from the NIR information. Each NIR image includes pixels (722), each having an associated projection object. The input parameters include spatial position and view angle, and the projection object contains point coordinates (726, 732). The continuous volume ML model processes the input parameters to determine intensity values and density values for pixels in the 3D internal geometry of the 3D surface model.
Description
Technical Field
Exemplary embodiments of the present invention relate generally to intraoral scan registration, and more particularly, to intraoral scanning systems and methods for generating three-dimensional internal geometries of a subject's teeth.
Background
Intraoral scanners are electronic devices that can be used, for example, to capture three-dimensional (3D) digital images. In one example, an intraoral scanner may include a light source that may project light onto an object to be scanned (e.g., teeth, gums, and other intraoral structures within a patient's mouth). In one example, images captured by an intraoral scanner may be processed to generate a digital impression, such as a 3D surface model of a patient's mouth. The 3D surface model may be displayed on a screen for examination of the oral cavity.
Typically, intraoral scanners are used for examination or treatment within a patient's mouth. The use of an intraoral scanner can eliminate the need for conventional impression materials and plaster models, and can simplify the dentist's clinical workflow while reducing patient discomfort. However, conventional intraoral scanners may not be suitable for detecting internal tooth structures, such as the enamel and dentin structures within a patient's teeth, because they are typically designed only for detecting dental surfaces. In particular, intraoral scanners may not be able to determine the general internal composition of teeth in a patient's mouth. Thus, intraoral scanners are not suitable for detecting conditions that develop below the tooth surface, such as caries and cracks in the enamel and underlying dentin of a tooth, bleeding or other damage in the enamel and underlying dentin, or deep cracks, margin lines, or other errors in dental restorations (e.g., crowns, bridges, implants, inlays, onlays, etc.), and may not be suitable for preparing a patient for treatment. In particular, internal structural information of a natural tooth or prosthetic implant, in combination with surface information, is critical for effective treatment of the patient and for ensuring long-lasting restorations using natural teeth or prosthetic implants.
Conventional methods for constructing 3D internal geometries of intraoral dental structures, such as teeth, using intraoral scanners have limitations. In one example, conventional approaches may require an intraoral scanner to capture a large number of images from various viewpoints to reconstruct the internal geometry of a dental structure in 3D. In this regard, a plurality of points corresponding to the 3D internal geometry of the dental structure are projected based on the pixels, and their corresponding intensities, of a large number of images captured from different viewpoints. Then, only the minimum projection-object intensity, or minimum scattering intensity, corresponding to each projected point is considered when reconstructing the 3D internal geometry of the dental structure.
Conventional methods of generating 3D models of the 3D internal geometries of dental structures have several drawbacks. In one example, conventional methods cannot reliably disambiguate between different components of the 3D internal geometry of the dental structure. Furthermore, conventional methods require capturing a very large number of images from a large number of viewpoints, since reliable scattering coefficients can be determined only if multiple viewpoints are covered. This increases the computational effort and leads to delays in generating the 3D model. Thus, real-time operations may not be possible on such 3D models of the 3D internal geometry of the dental structure. For example, all points along a given ray are determined to belong to the same material unless a point on the ray is also seen, at lower intensity, by another ray from another image; that is, the other ray observing the point must not intersect any part of the dental structure's 3D internal geometry other than the point itself. For example, to determine that a certain point is enamel, at least one of the light rays projected onto the dental structure and reflected back to form an image must not intersect dentin or any other internal dental structure at any point. Accordingly, there is a need for systems and methods that improve intraoral scanning to overcome the shortcomings of conventional methods for generating 3D models of the internal geometry of dental structures.
Disclosure of Invention
An intraoral scanning system, method, and computer program product are provided for generating a three-dimensional (3D) internal geometry of a subject's teeth using a handheld intraoral scanner comprising sensors for detecting Near Infrared (NIR) and visible light, a rendered three-dimensional (3D) surface model of the subject's teeth, and a continuous volume machine learning model.
Some embodiments are based on the insight that the 3D internal geometry of a subject's teeth is determined using visible light information and NIR information from one or more sensors, wherein the visible light information and the NIR information are obtained from the one or more sensors either simultaneously or with a delay between them, such delay being caused by switching between the visible light and the NIR light. Such switching may be required when the intraoral scanning system is not configured to capture visible light information and NIR information at the same time. The period of delay may be within 200 ms or 500 ms.
In one aspect, an intraoral scanning system for generating three-dimensional (3D) internal geometry of a subject's teeth is disclosed. The intraoral scanning system may include a handheld intraoral scanner configured to operate with one or more sensors to detect Near Infrared (NIR) and visible light. The one or more sensors may include an image sensor. The intraoral scanning system may include one or more processors operatively connected to a handheld intraoral scanner. The one or more processors may be configured to receive visible light information and Near Infrared (NIR) information from the one or more sensors. The one or more processors may be configured to determine surface information in real-time from the visible light information to generate a three-dimensional (3D) surface model of the subject's teeth using the surface information. The one or more processors may be configured to capture, with the image sensor, a plurality of two-dimensional (2D) near infrared images of an interior region of the subject's teeth in real time from the near infrared information. Each of the plurality of near infrared images includes a plurality of corresponding pixels. The one or more processors may be configured to determine a set of input parameters of the projection object corresponding to each of the plurality of pixels, wherein the set of input parameters may include spatial position information and perspective information, and the projection object may include a plurality of point coordinates. The intraoral scanning system may include a continuous volume machine learning model configured to receive and process the set of input parameters to determine intensity values and density values of a three-dimensional internal geometry of the three-dimensional surface model. The continuous volume machine learning model may be configured to be trained using a plurality of two-dimensional near infrared images.
In some embodiments, the continuous volume machine learning model may be trained by receiving the set of input parameters corresponding to the projected object for each of a plurality of pixels corresponding to each of a plurality of 2D NIR images. The projection object includes a plurality of point coordinates associated with the three-dimensional surface model. The continuous volume machine learning model may be trained by generating an intensity value and a density value for each of a plurality of point coordinates based on the continuous volume machine learning model using the set of input parameters. The continuous volume machine learning model may be trained by determining a composite pixel value of the projected object based on the corresponding determined intensity value and density value for each of the plurality of point coordinates. Further, the continuous volume machine learning model may be trained by minimizing a loss function between the synthesized pixel value and a corresponding true pixel value in a plurality of pixels in the plurality of two-dimensional near-infrared images.
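To make the training procedure above concrete, the following is a minimal, hypothetical sketch in PyTorch (all names, shapes, and the sampling scheme are our assumptions, not taken from the patent). Rays are cast through NIR-image pixels, the model predicts an intensity and a density at each sampled point coordinate, the per-sample predictions are composited into a synthesized pixel value by volume rendering, and the squared error against the true pixel value is minimized:

```python
import torch

def composite(intensity, density, deltas):
    # Volume-rendering quadrature: alpha from density and step size, then
    # transmittance as the exclusive cumulative product of (1 - alpha).
    alpha = 1.0 - torch.exp(-density * deltas)                 # (n_rays, n_samples)
    ones = torch.ones_like(alpha[:, :1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans
    return (weights * intensity).sum(dim=-1)                   # synthesized pixel value

def train_step(model, optimizer, rays_o, rays_d, t_vals, true_pixels):
    # Point coordinates along each ray form the "projection object" of a pixel.
    pts = rays_o[:, None, :] + t_vals[..., None] * rays_d[:, None, :]
    deltas = torch.cat([t_vals[:, 1:] - t_vals[:, :-1],
                        t_vals[:, -1:] * 0 + 1e10], dim=-1)    # last interval open-ended
    dirs = rays_d[:, None, :].expand_as(pts)
    intensity, density = model(pts, dirs)                      # per-sample predictions
    synth = composite(intensity, density, deltas)
    loss = torch.mean((synth - true_pixels) ** 2)              # vs. true NIR pixel values
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```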
In some embodiments, the one or more processors may be further configured to determine a plurality of grid points within the three-dimensional surface model and determine the three-dimensional internal geometry by arranging at least one of intensity values or density values at each of the plurality of grid points.
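A short sketch of this grid-point arrangement, assuming a trained model with a (position, direction) to (intensity, density) interface as in the sketch above (all names hypothetical):

```python
import torch

@torch.no_grad()
def evaluate_on_grid(model, bbox_min, bbox_max, resolution=128):
    # Regular grid of 3D points spanning the bounding box of the surface model.
    axes = [torch.linspace(float(bbox_min[i]), float(bbox_max[i]), resolution)
            for i in range(3)]
    grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
    view_dir = torch.tensor([0.0, 0.0, 1.0]).expand_as(grid)  # fixed canonical view
    intensity, density = model(grid, view_dir)
    shape = (resolution,) * 3
    # Arrange the values at the grid points to form the 3D internal geometry.
    return intensity.reshape(shape), density.reshape(shape)
```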
In some embodiments, the intraoral scanning system may further comprise a display unit configured to display the 3D internal geometry based on at least one of intensity values or density values determined by the continuous volume machine learning model.
In some embodiments, the display unit may be configured to display a three-dimensional internal geometry within the three-dimensional surface model.
In some embodiments, the one or more processors may be further configured to determine a boundary between enamel and dentin in the dentition of the subject's teeth based on the intensity value and the density value of the projected object for each of the plurality of pixels.
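As one illustration of how such a boundary might be extracted from the determined values, the sketch below thresholds a density volume (enamel scatters little, dentin scatters more) and marks voxels where the material label changes; the threshold and the use of density alone are illustrative assumptions, not the patent's method:

```python
import numpy as np
from scipy import ndimage

def enamel_dentin_boundary(density, threshold=0.5):
    # Voxels above the (illustrative) threshold are treated as dentin.
    dentin_mask = density > threshold
    # Boundary voxels: dentin voxels with at least one non-dentin neighbor.
    eroded = ndimage.binary_erosion(dentin_mask)
    return dentin_mask & ~eroded
```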
In some embodiments, the projection object may be one of a ray or cone.
In some embodiments, a handheld intraoral scanner may include a projector unit that may be configured to illuminate a subject's teeth with one or more white wavelength pulses and one or more Near Infrared (NIR) wavelength pulses. Further, the handheld intraoral scanner may include one or more sensors that may be configured to generate a set of white light images and a plurality of two-dimensional near infrared images based on the illumination.
In some embodiments, the three-dimensional surface model is determined based on the set of white light images.
In some embodiments, the one or more processors may be further configured to estimate a relative position between the handheld intraoral scanner and the subject's teeth corresponding to each of the plurality of near infrared images. The estimated relative position indicates perspective information and spatial position information of the corresponding projection object.
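The estimated relative position can be converted into per-pixel spatial position and view angle parameters in the usual pinhole-camera way. A minimal sketch, assuming a 4x4 camera-to-world pose matrix and a known focal length in pixels (hypothetical names):

```python
import torch

def pixel_rays(pose_c2w, height, width, focal):
    # pose_c2w: 4x4 camera-to-world transform estimated for one NIR image.
    j, i = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                          torch.arange(width, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(i - width * 0.5) / focal,
                        -(j - height * 0.5) / focal,
                        -torch.ones_like(i)], dim=-1)   # camera-space view directions
    rays_d = dirs @ pose_c2w[:3, :3].T                  # rotate into world space
    rays_d = rays_d / rays_d.norm(dim=-1, keepdim=True)
    rays_o = pose_c2w[:3, 3].expand_as(rays_d)          # scanner position per pixel
    return rays_o, rays_d
```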
In some embodiments, the projection object may be a cone. In this regard, the one or more processors may be further configured to determine one or more truncated cones for each of the cones corresponding to each of the plurality of pixels. The one or more truncated cones are associated with a plurality of point coordinates. The one or more processors may be further configured to determine an integrated positional encoding of the one or more truncated cones for transforming the plurality of point coordinates of each cone. The integrated positional encoding includes at least Gaussian encoding and sinusoidal encoding. The one or more processors may be further configured to generate a composite pixel value for each cone based on the integrated positional encoding of the corresponding one or more truncated cones and the continuous volume machine learning model.
In some embodiments, the one or more processors may be further configured to determine a radius indicative of each cone of the projection object corresponding to each of the plurality of pixels based on a size of the corresponding pixel of the plurality of pixels.
In some embodiments, the one or more processors may be further configured to determine an average of content within the viewable volume of each of the plurality of pixels and render each cone indicative of the projected object for each of the plurality of pixels based on the corresponding average of content. The content indicates color intensity.
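The three preceding paragraphs closely mirror mip-NeRF-style cone tracing: the cone radius follows the pixel footprint, each truncated cone (frustum) is approximated by a Gaussian, and the expected sinusoidal encoding under that Gaussian damps high frequencies for wide cones so that the model effectively averages the content seen by a pixel. A minimal sketch under those assumptions (all names hypothetical):

```python
import torch

def cone_base_radius(pixel_width_world):
    # mip-NeRF ties the base radius of a pixel's cone to the pixel footprint
    # (scaled by 2 / sqrt(12) in the original paper); an assumption here.
    return pixel_width_world * 2.0 / 12.0 ** 0.5

def integrated_pos_enc(mean, var, num_freqs=10):
    # mean, var: (..., 3) Gaussian approximation of one truncated cone.
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32)   # 1, 2, 4, ...
    scaled_mean = mean[..., None, :] * freqs[:, None]             # (..., F, 3)
    scaled_var = var[..., None, :] * freqs[:, None] ** 2
    # E[sin(x)] for x ~ N(mu, s^2) is sin(mu) * exp(-s^2 / 2), so wide
    # frustums suppress high-frequency terms.
    damp = torch.exp(-0.5 * scaled_var)
    enc = torch.cat([torch.sin(scaled_mean) * damp,
                     torch.cos(scaled_mean) * damp], dim=-1)
    return enc.flatten(-2)                                        # (..., F * 6)
```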
In some embodiments, the continuous volume machine learning model may include a machine learning network based on neural radiance fields (NeRF).
In another aspect, a method for generating a three-dimensional (3D) internal geometry of a subject's teeth using an intraoral scanning system is disclosed. The intraoral scanning system may include a handheld intraoral scanner configured to operate with one or more sensors to detect Near Infrared (NIR) and visible light, one or more processors operatively connected to the handheld intraoral scanner, and a continuous volume machine learning model. The one or more sensors may include an image sensor. The method includes receiving visible light information and Near Infrared (NIR) information from one or more sensors. The method may further include determining surface information from the visible light information to generate a three-dimensional (3D) surface model of the subject's teeth in real-time using the surface information. The method may further include capturing, with an image sensor, a plurality of two-dimensional (2D) near infrared images of an interior region of the subject's teeth in real time from the near infrared information. Each of the plurality of near infrared images includes a plurality of corresponding pixels. The method may include determining a set of input parameters of the projection object corresponding to each of a plurality of pixels. The set of input parameters may include spatial position information and perspective information, and the projection object may include a plurality of point coordinates. The method may further include processing the set of input parameters using a continuous volume machine learning model to determine intensity values and density values of the three-dimensional internal geometry of the three-dimensional surface model. The continuous volume machine learning model is configured to be trained using a plurality of near infrared images.
In yet another aspect, a computer program product includes a non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by processing circuitry, cause the processing circuitry to perform operations. The operations may include receiving visible light information and Near Infrared (NIR) information from one or more sensors mounted within a handheld intraoral scanner. The one or more sensors may be configured to detect Near Infrared (NIR) and visible light, wherein the one or more sensors may include an image sensor. The operations may also include determining surface information from the visible light information to generate a three-dimensional (3D) surface model of the subject's teeth in real-time using the surface information. The operations may also include capturing, in real-time, a plurality of two-dimensional (2D) near infrared images of the interior region of the subject's teeth from the near infrared information using the image sensor. Each of the plurality of near infrared images may include a plurality of corresponding pixels. The operations may also include determining a set of input parameters for the projection object corresponding to each of the plurality of pixels. The set of input parameters may include spatial position information and perspective information, and the projection object may include a plurality of point coordinates. The operations may also include processing the set of input parameters using a continuous volume machine learning model to determine intensity values and density values of the three-dimensional internal geometry of the three-dimensional surface model. The continuous volume machine learning model may be configured to be trained using the plurality of near infrared images.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the exemplary aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
In accordance with the present invention, an intraoral scanning system, method, and computer program product are provided. One object of the present invention is to provide enhanced processing power in a handheld intraoral scanner to generate a three-dimensional (3D) internal geometry of a subject's teeth.
Conventional systems may include intraoral scanners for capturing two-dimensional (2D) intraoral scans and/or three-dimensional (3D) information of a patient's dental arch. However, conventional intraoral scanners have limited processing power and may only be used to capture 2D intraoral scans and/or 3D models of the surface of a patient's dental arch. A 3D surface model of the dental arch, however, cannot reveal internal structures of the teeth, such as the enamel and dentin structures within the patient's dental arch. Thus, unless a disease or abnormality in the internal structure of a tooth progresses to the tooth surface, or the dentist uses other means such as an X-ray machine, such a disease or abnormality may go unidentified. X-rays, however, are ionizing radiation, which can cause cell mutations in the human body upon exposure and should therefore be avoided. For example, conventional intraoral scanners may not be able to identify abnormalities or early stages of disease occurring inside a tooth, such as dentin erosion, enamel erosion, or cracks or caries inside a tooth, unless the bone or the surface of the tooth is affected. Delayed diagnosis of internal diseases or abnormalities can lead to irreversible damage to a tooth, necessitating its surgical removal. This can cause significant discomfort to the patient. Furthermore, conventional intraoral scanners may not be able to identify abnormalities occurring inside a dental prosthetic implant. Identifying any abnormality that develops inside a dental prosthetic implant can be critical to ensuring its long-term service life. For this reason, information related to the internal geometry of a patient's teeth or dental arch, combined with surface information, can be critical for early diagnosis, effective treatment, and ensuring long-lasting restorations using natural teeth or prosthetic implants.
In some conventional approaches, intraoral scanning may be used to reconstruct the internal geometry of a tooth. However, conventional methods for generating tooth internal geometry may be practically infeasible due to the large number of viewpoints and angles required for the intraoral 2D images, among other limitations. To project points indicative of a portion of the internal geometry of one or more teeth, the values of a plurality of pixels in near infrared two-dimensional images taken from different perspectives and corresponding to that portion of the three-dimensional internal geometry may be processed to identify the intensity of the portion in each of the plurality of pixels.
Then, based on the processing of the plurality of pixels, a three-dimensional point indicating the portion may be projected using the minimum intensity value corresponding to the portion. For example, a minimum intensity value for a pixel of a two-dimensional image may indicate that the reflection for the portion originated from the material of the portion itself. Thus, for a three-dimensional point to be determined as belonging to a portion of enamel, at least one ray forming a pixel on a two-dimensional image must be received such that the ray corresponding to that portion of enamel does not intersect dentin at any point. In other words, when a ray is reflected back from the portion of enamel to form a two-dimensional image, the pixel corresponding to the portion of enamel may have a minimum intensity. The three-dimensional point may then be projected, based on the minimum intensity, as indicating the portion of enamel.
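To make the conventional rule concrete, the following illustrative sketch (with a hypothetical data layout) keeps, for each candidate 3D point, the minimum pixel intensity over all views in which the point is seen; a point is then labeled from that single minimum-intensity observation:

```python
import numpy as np

def min_intensity_per_point(point_pixels_per_view, images):
    # point_pixels_per_view[v][k]: (row, col) of point k in view v, or None if unseen.
    n_points = len(point_pixels_per_view[0])
    best = np.full(n_points, np.inf)
    for pixels, image in zip(point_pixels_per_view, images):
        for k, rc in enumerate(pixels):
            if rc is not None:
                best[k] = min(best[k], float(image[rc]))
    return best  # minimum scattering intensity per projected 3D point
```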
For this reason, such processing of pixels corresponding to the portion across a large number of near infrared two-dimensional images may impose a heavy processing load. A significant amount of computing power may be required, thereby increasing the size and cost of the intraoral scanning system. Furthermore, since the projected three-dimensional points depend on minimum intensity values, 3D models of 3D internal geometries generated using conventional methods may be inaccurate. For example, all points along a given ray (e.g., a ray reflected from dentin) may be determined to relate to the same material (e.g., dentin) unless a certain point on the ray (corresponding to a portion of the internal geometry) is also seen by another, lower-intensity ray from another image (e.g., a ray reflected from enamel). It will be appreciated that there is little reflection in enamel, whereas the reflection from dentin is somewhat stronger, as the light is scattered much more. Thus, conventional methods may not be able to delineate the boundary between enamel and dentin in the internal geometry of one or more teeth. Consequently, generating a 3D model of internal geometry using conventional systems for intraoral scanning may be inaccurate, time consuming, computationally expensive, cumbersome to operate, and in some cases practically infeasible.
In another aspect, an intraoral scanning system of the present disclosure includes a handheld intraoral scanner, a processor, and a continuous volume machine learning model for generating a three-dimensional internal geometry of a subject's teeth. In one embodiment, this generation may be done in real time, or may be done as a subsequent processing step after the data acquisition. The intraoral scanning system may provide enhanced processing capabilities within a handheld intraoral scanner, thereby eliminating the need for additional equipment or devices to generate three-dimensional internal geometries of teeth. In particular, additional cumbersome equipment or devices may not be needed to process the two-dimensional images to generate the 3D model of the internal geometry of the subject's teeth as well as the three-dimensional surface model. Furthermore, a user (e.g., dentist) may not have to carry and manipulate cumbersome equipment to generate a 3D model of the internal geometry and/or surface of a subject's teeth.
An intraoral scanning system of the present disclosure includes a handheld intraoral scanner that emits light and generates a two-dimensional image with a sensor. For example, the handheld intraoral scanner may continuously emit light in a spectral range corresponding to the visible white light wavelength range (e.g., 400-750 nm) and the Near Infrared (NIR) wavelength range (e.g., 750-2500 nm). In this regard, white light and two-dimensional near infrared scans or images of the teeth are generated without multiple separate exposures of white light and/or near infrared light to the subject's teeth. For example, based on scanning of a subject's mouth or teeth using a handheld intraoral scanner, a white light image and a two-dimensional near infrared image may be generated by switching the visible light source and the near infrared light source on and off in sequence. On-off switching may be required when the intraoral scanner is not configured to capture visible and near infrared information at the same time. In one example, the switching time interval may be within 200 ms or 500 ms. Alternatively, white light illumination and near infrared illumination may be performed simultaneously to avoid any relative movement between the intraoral scanner and the patient's teeth occurring between the two illuminations. In this case, the intraoral scanner may comprise suitable optical filters for selectively exposing individual pixels or groups of pixels (in the case of a single sensor) to a subset of the illumination light. Alternatively, the white light illumination and the near infrared illumination may be performed simultaneously using different image sensors within the intraoral scanner, with each image sensor configured to be sensitive only in the selective range associated with white light or near infrared light.
Furthermore, the intraoral scanning system of the present disclosure alleviates the inaccuracy problems associated with dental internal geometry by using a continuous volume machine learning model based on neural radiance fields (NeRF). In particular, the processor of the intraoral scanning system does not estimate the material of a three-dimensional point, i.e., whether the three-dimensional point corresponds to enamel or dentin, based on the minimum intensity value of a pixel. Instead, point coordinates are integrated along the ray through each pixel in the two-dimensional near infrared image. Furthermore, the integrated point coordinates of each pixel in the plurality of two-dimensional near infrared images are optimized to predict intensity and density values at three-dimensional points along the ray using a function defined by the continuous volume machine learning model. The intensity and density values may describe material properties corresponding to the three-dimensional points, enabling differentiation between enamel and dentin and identification of the boundary between them. Thus, the point coordinates on the ray passing through a pixel need not belong to the same material, and the optimization process is applied over all rays. As a result, the number of two-dimensional near infrared images required to reconstruct the three-dimensional internal geometry of a tooth in 3D is significantly reduced. This is advantageous because, when the intraoral scanner moves relatively quickly over the patient's dental arch, the coverage of two-dimensional near infrared images from multiple angles and viewpoints obtained during an intraoral scanning session may be limited, and the intraoral scanner views or illuminates the same area of the patient's teeth from only a limited number of poses (perspectives) (e.g., bite, lingual, and labial orientations).
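The per-pixel compositing that this optimization relies on can be written as the standard volume-rendering quadrature used by NeRF-family models (our notation, not taken from the patent):

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) c_i,
\qquad
T_i = \exp\Bigl(-\textstyle\sum_{j<i} \sigma_j \delta_j\Bigr),
\qquad
\mathcal{L} = \sum_{\mathbf{r}} \bigl\| \hat{C}(\mathbf{r}) - C(\mathbf{r}) \bigr\|^2,
```

where σ_i and c_i are the density and intensity predicted at the i-th point coordinate sampled along ray r, δ_i is the spacing between adjacent samples, and C(r) is the true pixel value in the two-dimensional near infrared image; minimizing the loss over all rays optimizes the model.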
Further, the intraoral scanning system of the present disclosure includes or is coupleable to a display that is communicably coupled to the processor and the continuous volume machine learning model via a communication network. Based on the three-dimensional internal geometry of the tooth and the three-dimensional surface model of the tooth generated using the white light image, a full interactive rendering of the subject's tooth may be displayed on a display. For example, the display may be a display of a smart device, such as a smart phone, monitor, television, tablet computer, laptop computer, or the like. Thus, the intraoral scanning system may provide a user-friendly and time-efficient process for generating a 3D model of internal geometry and rendering a comprehensive interactive view of the subject's teeth or 3D model.
To this end, the intraoral scanning system of the present disclosure enables reconstructing the internal geometry of a subject's teeth in 3D using a limited number of images of the interior region of the teeth obtained by acquiring two-dimensional Near Infrared (NIR) images as well as three-dimensional surface information.
Drawings
The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 illustrates a network environment in which an intraoral scanning system for oral scanning and generating 3D internal geometry is implemented in accordance with an illustrative embodiment;
FIG. 2 illustrates a block diagram of a handheld intraoral scanner in accordance with an illustrative embodiment;
FIG. 3 is a schematic diagram of a scanning session in accordance with an exemplary embodiment;
FIG. 4 shows a sequence diagram depicting the generation of a 3D surface model in accordance with an example embodiment;
FIG. 5 illustrates a method for generating a set of input parameters for a projection object in accordance with an exemplary embodiment;
FIG. 6 illustrates a method for training a continuous volume machine learning model in accordance with an exemplary embodiment;
FIG. 7A shows a sequence diagram depicting the generation of a 3D internal geometry according to an example embodiment;
FIG. 7B shows a schematic diagram of a first type of projection object in accordance with an example embodiment;
FIG. 7C illustrates a schematic diagram of a second type of projection object in accordance with an exemplary embodiment;
FIG. 7D shows a schematic diagram of a generated 3D model of a subject's teeth, according to an example embodiment;
FIG. 8 illustrates a method for generating synthesized pixel values using a continuous volume machine learning model in accordance with an exemplary embodiment;
FIG. 9 illustrates a method for generating a three-dimensional (3D) internal geometry and 3D surface model of a subject's teeth, according to an example embodiment, and
FIG. 10 shows a schematic diagram depicting an exemplary environment for generating a three-dimensional (3D) internal geometry and 3D surface model of teeth and rendering an interactive 3D graphical representation in real-time, according to an exemplary embodiment.
Detailed Description
For purposes of explanation, the following description sets forth numerous specific details in order to provide a thorough understanding of the present disclosure. However, it will be understood by those skilled in the art that the present disclosure may be practiced without these specific details. In other instances, the systems and methods have been shown in block diagram form in order to avoid obscuring the present disclosure.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Furthermore, the terms "a" and "an" herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item. In addition, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described herein which may be applicable to some embodiments but not others.
Some embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the disclosure are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather as provided to enable the disclosure to meet applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, "data," "content," "information," and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present disclosure. Furthermore, the terms "processor," "controller," and "processing circuitry" and the like may be used interchangeably to refer to a processor capable of processing information in accordance with embodiments of the present disclosure. Furthermore, the terms "electronic device," "electronic apparatus," and "apparatus" are used interchangeably to refer to an electronic device being monitored by a system in accordance with embodiments of the present disclosure. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present disclosure.
The embodiments described herein are for illustrative purposes only and many variations are possible. It will be appreciated that various omissions and substitutions of equivalents may be made as circumstances suggest or render expedient, with the intention of covering the application or implementation without departing from the spirit or scope of the disclosure. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. Any headings used in this specification are for convenience only and have no legal or limiting effect.
In this specification and claims, the terms "for example," "for instance," and "such as," and the verbs "comprising," "having," "including," and their other verb forms, when used in conjunction with a listing of one or more components or other items, are each to be construed as open-ended, meaning that the listing is not to be considered as excluding other, additional components or items. Unless used in a context where a different interpretation is required, other terms should be interpreted in their broadest reasonable sense.
The present invention provides an intraoral scanning system, method, and computer program product for generating a three-dimensional (3D) internal geometry of a subject's teeth and rendering the 3D internal geometry with 3D surface information of the teeth, thereby providing a comprehensive view of the teeth.
For example, an exemplary network environment for an intraoral scanning system for oral scanning and generating 3D internal geometry is provided below with reference to fig. 1.
Fig. 1 illustrates an exemplary network environment 100 in which an intraoral scanning system 102 for oral scanning and generating 3D internal geometry is implemented in accordance with an exemplary embodiment. The intraoral scanning system 102 may be used to generate 3D internal geometries of teeth or dental arches in a subject's mouth. For example, the intraoral scanning system 102 may be used by a user, such as a person with dental knowledge, e.g., dentist, dental technician, etc. Moreover, one or more components may be rearranged, altered, added, and/or removed without departing from the scope of the disclosure.
The intraoral scanning system 102 may include a handheld intraoral scanner 104, one or more processors 106, and a continuous volume machine learning model 108. The network environment 100 may also include communication channels 110 that may be configured to establish communication links between components of the intraoral scanning system 102 (i.e., the handheld intraoral scanner 104, the one or more processors 106 (hereinafter processor 106), and the continuous volume machine learning model 108).
The network environment 100 may also include data generated by the intraoral scanning system 102. For example, the network environment 100 may include data captured by the handheld intraoral scanner 104, which is presented in the form of a plurality of 2D images 112. As shown, the plurality of 2D images 112 generated by the handheld intraoral scanner 104 include a plurality of two-dimensional (2D) Near Infrared (NIR) images 114 and white light images 116. Further, the data generated by the processor 106 may include a three-dimensional (3D) model. The 3D model includes a three-dimensional (3D) surface model 118 of the subject's teeth and a 3D internal geometry 120 of the subject's teeth. Further, the data generated by the continuous volume machine learning model 108 may include intensity values and density values 122 of the 3D internal geometry 120.
Intraoral scanning system 102 may be used to register intraoral scans and generate three-dimensional internal geometry 120 and three-dimensional surface model 118 of teeth. The intraoral scanning system 102 may include a plurality of components, such as a handheld intraoral scanner 104, a processor 106, and a continuous volume machine learning model 108, which may communicate with one another to register intraoral scans having three-dimensional internal geometry 120 of teeth.
In one example, the handheld intraoral scanner 104 may include a processor 106 and/or a continuous volume machine learning model 108. In another example, the handheld intraoral scanner 104 may be coupled to the processor 106 and/or the continuous volume machine learning model 108, wherein the processor 106 and/or the continuous volume machine learning model 108 may be remotely located and may perform operations associated with the intraoral scanning system 102. The intraoral scanning system 102 may have enhanced processing capabilities that may be required to process multiple two-dimensional near infrared images 114 and visible light images 116 in real time to generate a three-dimensional surface model and three-dimensional internal geometry.
The handheld intraoral scanner 104 may be configured to capture a plurality of 2D images 112 during a scanning session of a subject's teeth. The plurality of 2D images 112 may include a plurality of two-dimensional near infrared images 114 that are indicative of images of the subject's teeth captured using light emitted at wavelengths corresponding to the NIR wavelength range. The plurality of two-dimensional near infrared images 114 may include images of the subject's teeth from different perspectives or viewpoints. Further, the plurality of 2D images 112 may include visible light images 116 that indicate images of the subject's teeth captured using light emitted at wavelengths corresponding to a selected sub-spectrum of white light wavelength range or visible light, such as blue light (400-495 nm). White light image 116 may also include images of the subject's teeth from different perspectives. In some cases, the plurality of 2D images 112 may also include images captured at other visible wavelengths.
Based on the captured white light image 116 and the visible light information, the processor 106 may be configured to determine three-dimensional surface information of the subject's teeth. The three-dimensional surface information may be a digital representation of the dental arch of the subject's teeth depicted in 3D space. For example, the three-dimensional surface information may include three-dimensional point cloud data corresponding to the white light image 116. The three-dimensional point cloud data may correspond to three-dimensional real world coordinates. In addition, the three-dimensional surface information may also include color textures reflected from the tooth surface.
In one example, the handheld intraoral scanner 104 may include a web server that may be configured to communicate over the web and establish a connection to the communication channel 110. The handheld intraoral scanner 104 may include one or more sensors. The handheld intraoral scanner 104 may be configured to execute a web server to provide visible light information and near infrared information obtained from the sensors to the processor 106. In one example, the handheld intraoral scanner 104 may be configured to provide three-dimensional surface information to the processor 106 based on visible light information and/or white light images 116. The handheld intraoral scanning device 104 may also include a processing unit, a memory unit, a communication interface, and additional components. The processing unit, the memory unit, the communication interface, and the additional components may be communicatively coupled to each other. Details of the components of the handheld intraoral scanner 104 are further provided, for example, in fig. 2.
The processor 106 may include the processing power required to process the visible light information or white light image 116 and the near infrared information or the plurality of two-dimensional near infrared images 114. The processor 106 may be configured to establish a connection with the communication channel 110. In one example, the processor 106 may be located within the handheld intraoral scanner 104. For example, the processor 106 may receive visible light information or a white light image 116 from the handheld intraoral scanner 104 or a sensor of the handheld intraoral scanner 104 to generate three-dimensional surface information of the subject's teeth. Further, a three-dimensional surface model 118 of the subject's teeth is generated based on three-dimensional surface information derived from the white light image 116. The processor 106 may further render the 3D surface information or 3D surface model 118 of the tooth into an interactive 3D graphical representation.
For example, the processor 106 may be implemented as one or more of a variety of hardware processing devices, such as a coprocessor, a microprocessor, a controller, a Digital Signal Processor (DSP), a processing element with or without a companion DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. Thus, in some embodiments, the processor 106 may include one or more processing cores configured to execute independently. Multi-core processors may implement multiprocessing within a single physical package. Additionally or alternatively, the processor 106 may include one or more processors configured in series via a bus to enable independent execution, pipelining, and/or multithreading of instructions. Additionally or alternatively, the processor 106 may include one or more processors capable of handling a large number of workloads and operations to provide support for large data analysis. In an exemplary embodiment, the processor 106 may communicate with other components of the intraoral scanning system 102 (hereinafter referred to simply as system 102) over a bus or communication network 110 to communicate information between the components of the system 102.
In one example, when the processor 106 is implemented as an executor of software instructions, the instructions may configure the processor 106 specifically to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor 106 may be a processor-specific device (e.g., a mobile terminal or a stationary computing device) configured to employ embodiments of the present disclosure by further configuring the processor 106 with instructions for performing the algorithms and/or operations described herein. The processor 106 may include a clock, an Arithmetic Logic Unit (ALU), logic gates configured to support the operation of the processor 106, and the like. The communication network 110 may be used to access a network environment, such as the network environment 100.
The continuous volume machine learning model 108 may include a fully connected neural network that may generate a view or model of a three-dimensional scene based on a set of two-dimensional images. The continuous volume machine learning model 108 may enhance the ability of the system 102 to generate three-dimensional internal geometries of the subject's teeth. The continuous volume machine learning model 108 may be configured to process the near infrared information or the plurality of two-dimensional near infrared images 114 to generate an output indicative of an intensity value and a density value for each point coordinate of each pixel of each of the plurality of two-dimensional near infrared images 114. The continuous volume machine learning model may be used to generate a three-dimensional internal geometry 120, or internal volume, of the tooth from a set of input parameters. The continuous volume machine learning model 108 is based on the concept that the three-dimensional internal geometry 120 may be represented as a continuous field of optical density and intensity values, where each pixel in a two-dimensional near infrared image 114 corresponds to a projection object in that continuous field. By using a deep neural network, the continuous volume machine learning model 108 may learn the mapping between the set of input parameters and the optical density and intensity values. This enables it to generate the internal volume 120 of the tooth from a given set of inputs. For example, the continuous volume machine learning model 108 may be trained using volume rendering to map values of view angle and spatial position to density and intensity. The continuous volume machine learning model 108 may receive a plurality of two-dimensional near infrared images 114 representing an interior region of a subject's teeth and process them to generate synthesized pixel values for the pixels of the plurality of two-dimensional near infrared images 114. In one example, the continuous volume machine learning model 108 may include a neural radiance field (NeRF) based model, a multiscale anti-aliased neural radiance field (mip-NeRF) model, or any other suitable architecture or variant from the same family of neural radiance field models.
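As a concrete illustration of such a fully connected network, the following minimal PyTorch sketch (layer sizes, activations, and encoding dimensions are illustrative assumptions) maps an encoded spatial position and view direction to a density and a view-dependent intensity:

```python
import torch
import torch.nn as nn

class ContinuousVolumeModel(nn.Module):
    # Fully connected network: encoded position -> features -> density;
    # features + encoded view direction -> view-dependent intensity.
    def __init__(self, pos_dim=60, dir_dim=24, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.density_head = nn.Linear(hidden, 1)
        self.intensity_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 1))

    def forward(self, enc_pos, enc_dir):
        h = self.trunk(enc_pos)
        density = torch.relu(self.density_head(h))        # non-negative density
        intensity = torch.sigmoid(
            self.intensity_head(torch.cat([h, enc_dir], dim=-1)))
        return intensity.squeeze(-1), density.squeeze(-1)
```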
In one example, the processor 106 may be configured to generate and render a three-dimensional internal geometry 120 of the tooth within the three-dimensional surface model 118 based on the trained continuous volume machine learning model. For example, the three-dimensional internal geometry 120 of the tooth may be rendered within the three-dimensional surface model 118 in the form of an interactive three-dimensional graphical representation. The interactive 3D graphical representation may be rendered on a display unit of the device. A user (e.g., a dentist) can view the interactive 3D graphical representation on a display to diagnose any diseases or abnormalities within the teeth and/or treat the subject. The user may modify the view of the interactive 3D graphical representation according to preferences. For example, the perspective of the interactive 3D graphical representation may be altered, or the interactive 3D graphical representation may be enlarged or reduced, according to user preferences.
The display may be associated with any user-accessible device, such as a display unit, monitor, mobile phone, smart phone, tablet computer, extended reality (XR) device, and the like. In some examples, the display may be part of a user-accessible device. For example, the display may be a touch screen display. Additional, different, or fewer components may be provided. Moreover, one or more components may be rearranged, altered, added, and/or removed without departing from the scope of the disclosure.
The communication channel 110 may be wired, wireless, or any combination of wired and wireless communication networks, such as cellular, wireless fidelity (Wi-Fi), the internet, a local area network, and the like. According to an embodiment, the communication channel 110 may be one or more wireless full duplex communication channels. In one embodiment, the communication channel 110 may include one or more networks, such as a data network, a wireless network, a telephone network, or any combination thereof. It is contemplated that the data network may be any local area network (LAN), metropolitan area network (MAN), wide area network (WAN), a public data network (e.g., the internet), a short range wireless network, or any other suitable packet-switched network, such as a commercially available proprietary packet-switched network (e.g., a proprietary cable or fiber-optic network), the like, or any combination thereof. Further, the wireless network may be, for example, a cellular network, and may employ various technologies including enhanced data rates for global evolution (EDGE), general packet radio service (GPRS), global system for mobile communications (GSM), internet protocol multimedia subsystem (IMS), universal mobile telecommunications system (UMTS), etc., as well as any other suitable wireless medium, such as worldwide interoperability for microwave access (WiMAX), long term evolution (LTE) networks (e.g., LTE-Advanced Pro), 5G New Radio networks, ITU IMT-2020 networks, code division multiple access (CDMA), wideband code division multiple access (WCDMA), wireless fidelity (Wi-Fi), wireless local area network (WLAN), Bluetooth, internet protocol (IP) data broadcasting, satellite, mobile ad hoc network (MANET), etc., or any combination thereof. The handheld intraoral scanner 104 may be configured to communicate with the processor 106 and the continuous volume machine learning model 108 via a communication channel 110.
For example, the subject may need dental treatment. In this case, the system 102 may be used by a user (e.g., a dentist) to provide dental treatment to the subject. In one embodiment, the subject may be present in a dental office. In this case, the system 102 may be used in a treatment room of the dental office. In another embodiment, the subject may require a home visit for dental treatment. In this case, the system 102 may be used in the home of the subject. To initiate a dental treatment, the user may use the handheld intraoral scanner 104 to capture a white light image 116 and a plurality of two-dimensional near infrared images 114 of the subject's teeth with one or more sensors.
In operation, the one or more sensors of the handheld intraoral scanner 104 may be configured to detect near infrared light and/or visible light. For example, near infrared light and visible light may be emitted by the handheld intraoral scanner 104 and reflected from the interior regions and surfaces of the teeth. Characteristics (e.g., wavelength, frequency, etc.) of the emitted and received near infrared light may be stored as near infrared information and may be used to generate the plurality of two-dimensional near infrared images 114. In addition, characteristics (e.g., wavelength, frequency, etc.) of the emitted and received visible light may be stored as visible light information and may be used to generate the white light image 116. The captured near infrared information and visible light information may be provided to the processor 106 of the system 102.
After receiving the near infrared information and the visible light information, the processor 106 may determine three-dimensional surface information from the visible light information. Based on the three-dimensional surface information, the processor 106 may be configured to generate a three-dimensional surface model 118 of the subject's teeth. Details of capturing visible light information of the white light image 116, determining three-dimensional surface information, and generating the three-dimensional surface model 118 are further provided, for example, in fig. 4.
Once the three-dimensional surface model 118 of the tooth is created, a 3D model of the three-dimensional internal geometry of the tooth may also need to be created. In this regard, the processor 106 may capture (or generate) a plurality of two-dimensional near infrared images 114 of the interior region of the tooth using the received near infrared information. For example, the processor 106 may filter the received NIR information so as to generate only the plurality of two-dimensional near infrared images 114 of the interior region. Each of the plurality of two-dimensional near infrared images 114 may include a plurality of pixels. That is, a two-dimensional near infrared image from the plurality of two-dimensional near infrared images 114 may be a collection of pixels indicating different portions of the interior region captured by that two-dimensional near infrared image. Further, the processor 106 is configured to determine a set of input parameters of the projection object corresponding to each of the plurality of pixels of each of the plurality of two-dimensional near infrared images 114. The set of input parameters may include spatial position information and perspective information corresponding to each of the plurality of pixels. The projection object corresponding to a pixel includes a plurality of point coordinates along the projection object. The plurality of point coordinates may correspond to different materials within the interior region of the tooth, such as enamel or dentin. For example, the set of input parameters may include five-dimensional (5D) information, i.e., the spatial position (x, y, z) of points along the projection object and the viewing direction (θ, φ) of the corresponding 2D near infrared image 114, which may be fed into the continuous volume machine learning model 108 for generating intensity and density values 122 for grid points generated in the 3D surface model. Details of generating the set of input parameters for the continuous volume machine learning model 108 are provided further, for example, in fig. 5, 7A, and 8.
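As a hedged illustration of how such 5D input parameters could be assembled for one pixel, the sketch below assumes a pinhole camera model, a known camera-to-world pose, and arbitrary near/far sampling depths; none of these specifics come from the disclosure.

```python
# Derive the 5D input parameters for one pixel: cast a projection object (ray)
# through the pixel, sample point coordinates along it, and pair each sample
# with the viewing direction expressed as polar angles (theta, phi).
import numpy as np

def input_parameters_for_pixel(u, v, fx, fy, cx, cy, cam_to_world,
                               n_samples=64, near=0.001, far=0.03):
    # Ray direction through pixel (u, v) in camera coordinates, then world space.
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    d_world = cam_to_world[:3, :3] @ d_cam
    d_world /= np.linalg.norm(d_world)
    origin = cam_to_world[:3, 3]

    # Sample point coordinates along the projection object between near and far.
    t = np.linspace(near, far, n_samples)
    points = origin[None, :] + t[:, None] * d_world[None, :]

    # Viewing direction as polar angles (theta, phi), shared by all samples.
    theta = np.arccos(d_world[2])
    phi = np.arctan2(d_world[1], d_world[0])
    angles = np.tile([theta, phi], (n_samples, 1))
    return np.hstack([points, angles])  # (n_samples, 5): (x, y, z, theta, phi)
```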
The continuous volume machine learning model 108 of the system 102 may be configured to receive and process the set of input parameters. The continuous volume machine learning model 108 may be trained using the plurality of 2D near infrared images 114. Based on the set of input parameters, the continuous volume machine learning model 108 may determine the synthesized pixel value 122 from the intensity values and density values along the projection object corresponding to each pixel of the plurality of pixels. An intensity value and a density value are associated with each of the plurality of point coordinates along the projection object. In one example, the continuous volume machine learning model 108 may generate synthesized data, i.e., synthesized pixel values 122, indicative of the color intensity values or density values of the material corresponding to each of the plurality of pixels of each of the plurality of two-dimensional near infrared images 114. For example, the synthesized data for a pixel may represent an integration and optimization of the data related to the plurality of point coordinates associated with that pixel. The synthesized data for a pixel may be used to determine the material (e.g., enamel or dentin) associated with the pixel and/or the corresponding 3D point in the three-dimensional internal geometry 120.
The processor 106 may be configured to generate and render the three-dimensional internal geometry 120 of the tooth within the three-dimensional surface model 118 based on the trained continuous volume machine learning model. For example, the one or more processors are configured to generate grid points within the 3D surface model for different directions of view of the three-dimensional surface model, and to supply the grid points along with the directions of view to the trained continuous volume machine learning model 108, which then generates intensity values and density values for the corresponding grid points. The trained continuous volume machine learning model provides a continuous definition of the three-dimensional internal geometry; that is, the trained continuous volume machine learning model is configured to evaluate and output intensity values and density values at any given grid point of the three-dimensional internal geometry. The manner in which the continuous volume machine learning model 108 generates the synthesized pixel values 122 is described in detail in connection with later figures, such as fig. 7A and 8.
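A minimal sketch of this grid-point evaluation follows; it reuses the illustrative ContinuousVolumeModel sketched earlier, and the single shared viewing direction is a simplifying assumption.

```python
# Evaluate a trained continuous volume model at grid points generated within the
# three-dimensional surface model, for one assumed viewing direction (theta, phi).
import torch

@torch.no_grad()
def evaluate_internal_geometry(model, grid_xyz: torch.Tensor, theta: float, phi: float):
    # grid_xyz: (N, 3) grid points lying inside the 3D surface model
    view = torch.tensor([theta, phi], dtype=grid_xyz.dtype).expand(grid_xyz.shape[0], 2)
    density, intensity = model(torch.cat([grid_xyz, view], dim=1))
    return density, intensity  # per-grid-point values mapped onto the surface model
```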
After generating the three-dimensional internal geometry 120 of the subject's teeth, the system 102 may be connected to a display to render an interactive three-dimensional graphical representation comprising the three-dimensional internal geometry 120 and the three-dimensional surface model 118. For example, the display may be any display unit capable of rendering an interactive three-dimensional graphical representation. The dentist can use the interactive three-dimensional graphical representation of the subject's teeth as a human-readable three-dimensional rendering of the white light image 116 and the plurality of two-dimensional near infrared images 114. For example, the interactive three-dimensional graphical representation of the subject's teeth may be used to comprehensively evaluate the condition of the subject's teeth from both the inside and the outside. The interactive three-dimensional graphical representation may be modified or edited on the display, for example, by using gestures provided by the dentist as input. Details of rendering the 3D internal geometry 120 and the 3D surface model 118 as an interactive 3D graphical representation are further provided, for example, in fig. 10.
Fig. 2 illustrates a block diagram 200 of the handheld intraoral scanning device 104 in accordance with an illustrative embodiment. Fig. 2 is explained in connection with the elements in fig. 1. The handheld intraoral scanning device 104 may include at least one processing unit (hereinafter "processing unit 202"), a memory unit 204, a web server 206, a monitoring unit 208, a temporary storage unit 210, a scan feedback unit 212, an input/output (I/O) unit 214, and a communication interface 216.
The processing unit 202 may be implemented in a number of different ways. For example, the processing unit 202 may be implemented as one or more of various hardware processing devices, such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. In one embodiment, the processing unit 202 may be implemented as a high performance microprocessor, such as a system on a chip (SoC) that combines relatively powerful and power-efficient graphics processing units (GPUs) and central processing units (CPUs) in a small form factor. Thus, in some embodiments, the processing unit 202 may include one or more processing cores configured to execute independently. A multi-core processor may implement multiprocessing within a single physical package. Additionally or alternatively, the processing unit 202 may include one or more processors configured in tandem via a bus to enable independent execution, pipelining, and/or multithreading of instructions.
In some embodiments, the processing unit 202 may be configured to detect near infrared light and visible light using the I/O unit 214 during a scanning session of a tooth of a subject (e.g., a patient in need of dental treatment). The detected near infrared light and visible light may be used to generate a plurality of 2D images 112, such as a plurality of two-dimensional near infrared images 114 and a white light image 116. The plurality of 2D images 112 may include images of the subject's teeth from different angles or viewpoints. For example, the processing unit 202 may be configured to generate 3D surface information of the tooth based on the white light image 116, and the continuous volume machine learning model may generate the intensity values and the density values 122 based on the plurality of 2D near infrared images 114.
In one exemplary embodiment, the processing unit 202 may communicate with the memory unit 204 via a bus to transfer information between the components of the handheld intraoral scanner 104.
The memory unit 204 may be non-transitory and may include, for example, one or more volatile memories and/or non-volatile memories. In other words, for example, the memory unit 204 may be an electronic storage device (e.g., a computer-readable storage medium) that includes gates configured to store data (e.g., bits) retrievable by a machine (e.g., a computing device such as the processing unit 202). The memory unit 204 may be configured to store information, data, content, applications, instructions, or the like to enable the device to perform various functions in accordance with exemplary embodiments of the present disclosure. For example, the memory unit 204 may be configured to store the detected near infrared light and the detected visible light after the end of a scanning session of the tooth. Near infrared light and visible light detected in a scanning session may be stored as near infrared information and visible light information, respectively. In some cases, the memory unit 204 may be configured to store compressed near infrared information and visible light information. In some embodiments, the memory unit 204 may be configured to store calibration data required for measuring the detected near infrared and visible light to generate the near infrared information, the visible light information, the white light image 116, and/or the plurality of two-dimensional near infrared images 114. As exemplarily shown in fig. 2, the memory unit 204 may be configured to store instructions for execution by the processing unit 202. Thus, whether configured by hardware or software, or by a combination thereof, the processing unit 202 may represent an entity (e.g., one physically embodied in circuitry) that, when configured accordingly, is capable of performing operations in accordance with embodiments of the present disclosure. For example, when the processing unit 202 is implemented as a microprocessor, the processing unit 202 may be specially configured hardware for performing the operations described herein. Alternatively, as another example, when the processing unit 202 is implemented as an executor of software instructions, the instructions may specifically configure the processing unit 202 to perform the algorithms and/or operations described herein when the instructions are executed. The processing unit 202 may include a clock, an arithmetic logic unit (ALU), logic gates, and the like, configured to support operation of the processing unit 202.
The web server 206 may be software, hardware, or a combination thereof configured to store and provide data to a web browser associated with the processor 106 and/or the continuous volume machine learning model 108. For example, visible light information and near infrared (NIR) information may be provided to a web browser of the processor 106 through the web server 206. Since any web browser can access the web server 206, no additional software needs to be installed for the processor 106 to connect to the web server 206. The web server 206 may communicate over one of the communication channels 110 via a web network. In one example, the web server 206 and the processor 106 and/or the continuous volume machine learning model 108 may communicate over a common wireless full duplex communication channel via a web network to transmit and receive the visible light information and the NIR information. The web server 206 and the web browser may communicate via, for example, hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), or file transfer protocol (FTP). After the web server 206 is connected to the web browser, the web server 206 may provide a web application in the web browser.
The monitoring unit 208 may be software, hardware, or a combination thereof, which may be configured to monitor the bandwidth of one of the communication channels 110 (e.g., a wireless full duplex communication channel) through which the handheld intraoral scanning device 104 and the processor 106 and/or the continuous volume machine learning model 108 may be connected. In addition, the monitoring unit 208 may be further configured to monitor the connection of one of the communication channels 110 through which the handheld intraoral scanning device 104 and the processor 106 and/or the continuous volume machine learning model 108 may be connected.
In one embodiment, if the monitoring unit 208 determines that the bandwidth of the communication channel 110 is below the minimum bandwidth, the monitoring unit 208 may provide such information to the processing unit 202. The processing unit 202 may downsample the visible light information and the near infrared information based on the received information. In another embodiment, if the monitoring unit 208 determines that the bandwidth of the communication channel 110 has remained below the minimum bandwidth for longer than a maximum period, the monitoring unit 208 may provide such information to the processing unit 202. The processing unit 202 may then compress the visible light information and the NIR information and store them in the memory unit 204. In some embodiments, if the monitoring unit 208 determines that the connection between the handheld intraoral scanning device 104 and the processor 106 and/or the continuous volume machine learning model 108 is lost, the monitoring unit 208 may provide such information to the processing unit 202. In this case, the processing unit 202 may compress the visible light information and the NIR information and store them in the memory unit 204.
The temporary storage unit 210 may be software, hardware, or a combination thereof, which may be configured to store the visible light information and NIR information when the bandwidth of the communication channel 110 (e.g., a wireless full duplex communication channel) is determined to be below the minimum bandwidth. The temporary storage unit 210 may further be configured to transmit the stored visible light information and NIR information to the processor 106 and/or the continuous volume machine learning model 108 when the bandwidth is determined to be greater than or equal to the minimum bandwidth. Examples of the temporary storage unit 210 include, but are not limited to, random access memory (RAM) or cache memory.
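The following sketch summarizes the monitoring behaviour described above. The channel interface, the thresholds, and the compress/downsample placeholders are illustrative assumptions rather than the disclosed implementation.

```python
# Illustrative bandwidth-adaptive handling of scan data (assumed interfaces).
import zlib

MIN_BANDWIDTH_BPS = 10_000_000   # assumed minimum bandwidth
MAX_LOW_BW_SECONDS = 5.0         # assumed maximum tolerated low-bandwidth period

def compress(frame: bytes) -> bytes:
    return zlib.compress(frame)          # stands in for the scanner's compression

def downsample(frame: bytes) -> bytes:
    return frame[::2]                    # crude placeholder for real downsampling

def handle_scan_frame(channel, frame: bytes, low_bw_seconds: float,
                      memory_store: list, temp_store: list) -> None:
    if not channel.connected or low_bw_seconds > MAX_LOW_BW_SECONDS:
        memory_store.append(compress(frame))    # persist in the memory unit 204
    elif channel.bandwidth_bps < MIN_BANDWIDTH_BPS:
        temp_store.append(downsample(frame))    # buffer in temporary storage 210
    else:
        channel.send(frame)
        while temp_store:                       # flush backlog once bandwidth recovers
            channel.send(temp_store.pop(0))
```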
The scan feedback unit 212 may be software, hardware, or a combination thereof, which may be configured to receive status inputs from the monitoring unit 208. Based on the received status input, the scan feedback unit 212 may provide a scan feedback signal to a user (e.g., a dentist) of the handheld intraoral scanner 104. In one embodiment, the scan feedback signal provides guidance to the user regarding tooth areas for which the scan quality of the scanning session is low and insufficient visible and/or near infrared information has been received. For example, the scan feedback unit 212 may provide the scan feedback signal as acoustic feedback, haptic feedback, or visual feedback.
The I/O unit 214 may include circuitry and/or software that may be configured to provide output to a user of the handheld intraoral scanning device 104 and to receive, measure, or sense input information. The I/O unit 214 may include a speaker 214A, a vibrator 214B, a projector unit 214C, and one or more sensors 214D. In one embodiment, the speaker 214A may be configured to output an acoustic feedback signal to guide the user. Vibrator 214B may be, for example, a transducer configured to convert a scanning feedback signal (which may be an electrical signal) into a mechanical output, such as tactile feedback in the form of vibrations, to guide a user.
It is to be appreciated that the handheld intraoral scanner 104 may be configured to detect near infrared light and visible light that may be reflected from a subject's teeth. In this regard, the projector unit 214C may be configured to output one or more visible or white wavelength pulses, as well as one or more near infrared wavelength pulses. For example, white or visible wavelength pulses and near infrared wavelength pulses may be projected onto the teeth to illuminate the teeth of a subject (e.g., a patient). In addition, the white wavelength pulses and the near infrared wavelength pulses may be reflected or refracted from the tooth surface and/or interior regions. The one or more sensors 214D may be configured to detect the visible or white wavelength pulses and near infrared wavelength pulses that may be reflected and/or refracted from the tooth surface or interior region. In one example, the one or more sensors 214D may include one or more image sensors, such as a camera. For example, the image sensor may be configured to generate a white light image 116 and a plurality of two-dimensional near infrared images 114 based on illumination of the tooth with near infrared light and visible light.
For example, the I/O unit 214 may include a light emitting device, such as a light emitting diode (LED), to provide the scan feedback signal in the form of light to guide the user or dentist. For example, a plurality of LEDs may be divided into left and right LED groups that flash depending on which side of the tooth has not been scanned correctly. In one example, the I/O unit 214 may include additional input and output devices, such as buttons for receiving operation signals from a user, power sensors, battery charging sensors, and the like.
The communication interface 216 may include an input interface and an output interface for supporting communication to and from the handheld intraoral scanner 104. The communication interface 216 may be a device or circuitry embodied in hardware or a combination of hardware and software that is configured to receive and/or transmit data to/from the handheld intraoral scanner 104. In this regard, the communication interface 216 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface 216 may include circuitry to interact with the antenna(s) to cause transmission of signals through the antenna(s) or to process reception of signals received through the antenna(s). In some environments, the communication interface 216 may additionally or alternatively support wired communications. Thus, for example, the communication interface 216 may include a communication modem and/or other hardware and/or software to support communication over a cable, digital subscriber line (DSL), universal serial bus (USB), or other mechanisms.
In operation, the handheld intraoral scanner 104 may be configured to capture two-dimensional images or two-dimensional scans of a subject's mouth. In one example, a user of the handheld intraoral scanner 104 may perform a scanning session of the subject's mouth. The handheld intraoral scanner 104 may illuminate the oral cavity or teeth of the subject with one or more white wavelength pulses and one or more near infrared wavelength pulses simultaneously. Such one or more white wavelength pulses and one or more near infrared wavelength pulses may be generated by the projector unit 214C. For example, within a single time frame, the pulses emitted by the projector unit 214C may include multiple wavelengths, such as one or more white wavelength pulses and one or more near infrared wavelength pulses. The number of near infrared wavelength pulses may be less than the number of white wavelength pulses. In one embodiment, the number of white wavelength pulses may be three times the number of near infrared wavelength pulses over a given time frame. It should be noted that this ratio of white wavelength pulses to near infrared wavelength pulses is merely an exemplary illustration and should not be construed as limiting. In other examples, the projector unit 214C may be configured to illuminate the teeth with different numbers of white wavelength pulses and near infrared wavelength pulses within a single time frame, to illuminate the teeth with only white wavelength pulses or only near infrared wavelength pulses within a single time frame, and so on. In some cases, the projector unit 214C may also be configured to illuminate the subject's teeth or oral cavity with visible light (e.g., corresponding to blue wavelength pulses, red wavelength pulses, and/or green wavelength pulses).
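Purely as an illustration of the exemplary 3:1 interleaving described above (the ratio is expressly non-limiting), a trivial schedule generator might look as follows.

```python
# Generate an illustrative pulse schedule for n_frames time frames, assuming the
# exemplary ratio of three white wavelength pulses per near infrared pulse.
def pulse_schedule(n_frames: int, white_per_frame: int = 3, nir_per_frame: int = 1):
    schedule = []
    for _ in range(n_frames):
        schedule.extend(["white"] * white_per_frame + ["nir"] * nir_per_frame)
    return schedule

print(pulse_schedule(1))  # ['white', 'white', 'white', 'nir']
```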
After illuminating the tooth or oral cavity, the reflected or refracted wavelength pulses may be detected by one or more sensors 214D. For example, the one or more sensors 214D may be image sensors. The one or more sensors 214D may be configured to generate a white light image 116 using the detected white light wavelength pulses and a plurality of two-dimensional near infrared images 114 using the detected near infrared wavelength pulses. The handheld intraoral scanner 104 may be configured to transmit detected visible light information (or detected white light wavelength pulses) and near infrared information (or detected near infrared wavelength pulses) to the processor 106. Details of the operations performed by the processor 106 will be further described in connection with fig. 3.
Fig. 3 is a schematic diagram 300 of a scanning session in accordance with an example embodiment. Schematic 300 may include a user (e.g., dentist 302 of system 102) and a subject (e.g., patient 304 whose teeth 306 may need treatment). The handheld intraoral scanner 104 may be used by dentist 302 to capture a plurality of two-dimensional images 308 of teeth 306 of patient 304. During the scan session, a plurality of two-dimensional images 308 of the teeth 306 of the patient 304 may be captured, including, for example, a white light image (as shown by white light image 308A) and a two-dimensional near-infrared image (as shown by two-dimensional near-infrared image 308B), to generate three-dimensional information 310, including, for example, the three-dimensional surface model 118 and the three-dimensional internal geometry 120 of the teeth 306 of the patient 304.
In one exemplary scenario, dentist 302 and patient 304 can be present at a dental office. The dentist 302 can initiate a scan session of the teeth 306 of the patient 304 to perform a diagnosis of the condition of the teeth 306 of the patient 304. The handheld intraoral scanner 104 may include a projector unit 214C and one or more sensors 214D. For example, the handheld intraoral scanner 104 is placed within the mouth or oral cavity of the patient 304 and moved around the teeth 306 and gums of the patient 304 to record the topography of the oral cavity of the patient 304. In this regard, the projector unit 214C may illuminate the oral cavity of the patient 304 with visible (or white light) wavelength pulses and near infrared wavelength pulses. In addition, one or more sensors (e.g., image sensors) can capture white light images and two-dimensional near infrared images of the tooth 306.
In one example, the handheld intraoral scanner 104 may record the size and shape of each tooth, interproximal separation, palate surface appearance, gums, implants, prostheses, and other elements that make up the interior of the mouth of the patient 304. In one embodiment, the handheld intraoral scanner 104 may be moved multiple times over the teeth 306 and gums of the patient 304 to capture high quality two-dimensional images. For example, the plurality of two-dimensional images 308 may include a white light image 308A detected based on the visible wavelength pulses illuminating the tooth 306 and a two-dimensional near infrared image 308B detected based on the near infrared wavelength pulses illuminating the tooth 306. It is noted that the plurality of two-dimensional images 308 may include a plurality of white light images captured using visible wavelength pulses and a plurality of near infrared images captured using near infrared wavelength pulses. Multiple two-dimensional images 308, such as a white light image 308A and a two-dimensional near infrared image 308B, may be used to generate three-dimensional information 310, particularly the three-dimensional surface model 118 and the three-dimensional internal geometry 120 of the tooth 306.
Once the scanning session begins, a plurality of two-dimensional images 308, such as white light images 308A and near infrared images 308B, may be captured. The handheld intraoral scanner 104 may be configured to process a plurality of two-dimensional images 308 by applying calibration parameters and filtering noise from the plurality of two-dimensional images 308. For example, calibration parameters of one or more sensors 214D may be applied to process a plurality of two-dimensional images 308. In addition, noise may be filtered from the plurality of two-dimensional images 308. Based on the processing of the plurality of two-dimensional images 308, the handheld intraoral scanner 104 may be configured to provide high quality two-dimensional images to the processor 106 of the system 102. The details of the generation of the three-dimensional surface model 118 of the tooth 306 are further provided, for example, in fig. 4.
FIG. 4 illustrates a sequence diagram 400 depicting the generation of the three-dimensional surface model 118, according to an example embodiment. Fig. 4 is explained in connection with the elements in fig. 1,2 and 3. The sequence diagram 400 may include a handheld intraoral scanner 104 and one or more processors 106. The sequence diagram 400 may depict operations performed by at least one of the handheld intraoral scanner 104 and/or the processor 106.
In step 402, the projector unit 214C of the handheld intraoral scanner 104 may illuminate the oral cavity of the patient 304. The projector unit 214C illuminates the teeth 306 and other portions of the oral cavity with visible or white light wavelength pulses and near infrared wavelength pulses. Details of illuminating the mouth and teeth 306 of the patient 304 are provided in detail, for example, in fig. 2 and 3.
In step 404, one or more sensors 214D of the handheld intraoral scanner 104 may detect visible light and near infrared light. Such detected visible and near infrared light may be reflected or refracted from the surface or interior regions of the tooth 306. In other words, the one or more sensors 214D can detect visible light information and near infrared information reflected or refracted from the tooth 306.
In step 406, one or more sensors 214D of the handheld intraoral scanner 104 may capture a white light image 116 and a plurality of two-dimensional near infrared images 114 (hereinafter referred to as NIR images 114). For example, the one or more sensors 214D may include an image sensor that may be configured to capture a white light image 116 based on detected visible light information and a NIR image 114 based on detected NIR information. In this way, the one or more sensors 214D may be configured to generate a plurality of 2D images 112 of the tooth 306.
In step 408, the processor 106 receives a plurality of 2D images 112 of the tooth 306. As described above, the plurality of 2D images 112 of the tooth 306 may include the white light image 116 and the NIR image 114.
In step 410, three-dimensional surface information is determined using the visible light information. In particular, the white light image 116 may be processed to determine three-dimensional surface information. In one example, focus sharpness measures may be determined in the white light images 116 and the projected features of the tooth 306 may be tracked across the white light images 116. For example, a correspondence function may be solved to triangulate depth information for the tooth 306 and determine 3D surface information of the tooth 306.
In step 412, a three-dimensional surface model 118 of the tooth 306 is generated based on the three-dimensional surface information. For example, a 3D patch (patch) corresponding to a portion of the surface of tooth 306 may be generated by accessing calibration data of one or more sensors 214D and transforming the three-dimensional surface information corresponding to the portion into real world three-dimensional coordinates and texture information. For example, different 3D patches of different portions of the tooth 306 surface may be generated using the white light image 116. In some cases, there may be overlap in the portions covered by the different 3D patches. For example, the 3D patch of the portion of the tooth 306 surface may be registered or associated with one or more previously generated 3D patches of other and/or overlapping portions of the tooth 306 surface by locating corresponding data points. The 3D patch of that portion may then be fused with one or more previously generated 3D patches associated with other portions of the tooth 306 surface to generate the 3D surface model 118. For example, the 3D surface model 118 may include 3D points within voxels in a signed distance field, and the signed distance field may be converted to a 3D mesh to render the 3D surface model 118. The 3D surface model 118 may also include texture data for the surface of the tooth 306.
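As a hedged illustration of the final conversion step, the signed distance field could be turned into a renderable triangle mesh with a marching cubes routine; the use of scikit-image here is an assumed implementation choice, not the disclosed one.

```python
# Convert a voxelized signed distance field into a 3D mesh; the zero level set
# corresponds to the tooth surface.
import numpy as np
from skimage import measure

def sdf_to_mesh(sdf: np.ndarray, voxel_size: float):
    # sdf: (X, Y, Z) array of signed distances sampled on a regular voxel grid
    verts, faces, normals, _ = measure.marching_cubes(
        sdf, level=0.0, spacing=(voxel_size,) * 3)
    return verts, faces, normals
```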
After generating the 3D surface model 118 of the tooth 306, the processor 106 may be configured to generate the 3D internal geometry 120 of the tooth 306. Details of generating the 3D internal geometry 120 are further provided, for example, in fig. 5, 7A, and 8.
Fig. 5 illustrates a method 500 for generating a set of input parameters for a projection object in accordance with an exemplary embodiment. The set of input parameters is generated by the processor 106 using the near infrared image 114. For example, the set of input parameters is generated for the continuous volume machine learning model 108. Fig. 5 is explained in conjunction with the elements in fig. 1, 2, 3 and 4.
In step 502, the relative position between the handheld intraoral scanner 104 and the tooth 306 is estimated. Once the processor 106 receives the near infrared images 114, the processor 106 may be configured to estimate a relative pose between the tooth 306 and the handheld intraoral scanner 104 for each near infrared image 114.
Further, the estimated relative pose indicates, for example, perspective information and spatial position information of the handheld intraoral scanner, i.e., the position of the image sensor unit within the handheld intraoral scanner relative to the subject's teeth at the time the corresponding near infrared image of the near infrared images 114 was captured. The perspective information of a near infrared image indicates the angle formed between the handheld intraoral scanner 104 and the tooth 306 at the time the near infrared image was captured. Further, the spatial position information indicates where the handheld intraoral scanner 104 was physically located relative to the position of the tooth 306.
For example, the projector unit 214C may illuminate the mouth and teeth 306 with near infrared wavelength pulses and white light wavelength pulses within a single time frame. Thus, a near infrared image 114 may be captured together with a white light image 116 or between white light images 116. To estimate the relative pose between the handheld intraoral scanner 104 and the tooth 306 for a given near infrared image, the poses of one or more white light images associated with the given near infrared image may be interpolated. For example, the one or more white light images associated with a given near infrared image may be captured during the same time frame as that near infrared image. Based on the poses of the one or more white light images associated with the given near infrared image, the perspective information and spatial position information for the given near infrared image may be determined.
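A minimal sketch of this pose interpolation is given below, assuming linear interpolation for the translation and spherical linear interpolation (SLERP) for the rotation via SciPy; the disclosure does not prescribe these choices.

```python
# Interpolate the pose of a NIR image captured at time t_nir between two white
# light image poses (rotation matrices R0, R1 and translations p0, p1).
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(t0: float, t1: float, t_nir: float, R0, p0, R1, p1):
    alpha = (t_nir - t0) / (t1 - t0)
    slerp = Slerp([t0, t1], Rotation.from_matrix(np.stack([R0, R1])))
    R_nir = slerp(t_nir).as_matrix()                               # interpolated rotation
    p_nir = (1 - alpha) * np.asarray(p0) + alpha * np.asarray(p1)  # interpolated translation
    return R_nir, p_nir
```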
In step 504, the set of input parameters is determined based on the estimated relative pose. For example, the set of input parameters includes spatial position information and perspective information of the projection object corresponding to each of the plurality of pixels of each near infrared image 114. In this regard, the processor 106 may be configured to determine the object region of each near infrared image 114. The object region in a near infrared image may include a plurality of pixels indicating the tooth 306, or the portion of the tooth 306, captured in the corresponding near infrared image. In one example, the object region, or the plurality of pixels associated with the tooth 306 in the near infrared image, may be determined by projecting the 3D surface model 118 back into the near infrared image and determining which pixels in the near infrared image are hit by the projection. In this case, the plurality of pixels of the object region may be identified as those pixels hit by the back-projection of the 3D surface model 118 into the near infrared image. In another example, the object region, or the plurality of pixels associated with the tooth 306 within the near infrared image, may be determined by performing edge detection on the near infrared image to identify the edges of the tooth 306 or of the portion of the tooth 306 covered by the near infrared image. In this case, the plurality of pixels of the object region may be identified as the pixels lying within the detected edges in the near infrared image.
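For the edge-detection variant, a hedged sketch using OpenCV follows; the Canny thresholds and the contour-filling step are illustrative assumptions.

```python
# Identify the object region (pixels inside the detected tooth edges) of a
# single-channel 8-bit near infrared image.
import cv2
import numpy as np

def object_region_mask(nir_image: np.ndarray) -> np.ndarray:
    edges = cv2.Canny(nir_image, 50, 150)     # thresholds are assumptions
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(nir_image)
    cv2.drawContours(mask, contours, -1, 255, cv2.FILLED)
    return mask                               # nonzero where the tooth is depicted
```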
Once the plurality of pixels associated with the tooth 306 are determined for each near infrared image 114, a projection object may be cast for each of the plurality of pixels. For example, the projection object may be a camera ray or cone, which may be projected through each of the plurality of pixels of each near infrared image 114. Further, the projection object projected through a pixel may include a plurality of point coordinates. The plurality of point coordinates of the projection object may correspond to different depths in the interior region of the tooth 306 and may relate to different materials of the interior region of the tooth 306. Based on the plurality of point coordinates of the projection object, the set of input parameters corresponding to the projection object may be determined.
For example, the set of input parameters indicative of the projection object may include spatial position information (e.g., values of the (x, y, z) coordinates) and perspective information (e.g., values of the polar angles (θ, φ)). The set of input parameters indicative of the projection object may be determined based on the estimated relative pose at which the corresponding near infrared image was captured. For example, the spatial position information of the projection object may indicate the spatial position (x, y, z) of each of the plurality of point coordinates. Furthermore, the perspective information of the projection object may indicate the viewing direction (θ, φ) from which the plurality of point coordinates are seen along the projection object during capture of the corresponding near infrared image. In this way, the set of input parameters may be determined for each projection object cast through each of the plurality of pixels of the object region within each near infrared image 114.
FIG. 6 illustrates a method 600 for training a continuous volume machine learning model 108, according to an example embodiment. The continuous volume machine learning model 108 may be trained based on a plurality of two-dimensional near infrared images 114. Fig. 6 is explained in conjunction with the elements in fig. 1,2, 3, 4 and 5.
For example, the plurality of near infrared images 114 may be captured by the handheld intraoral scanner 104 from different locations. In one example, a projection object may be generated from each of a plurality of pixels of the plurality of near infrared images 114 captured from the different locations of the handheld intraoral scanner. For example, an image sensor (e.g., a camera) of the handheld intraoral scanner 104 may capture the two-dimensional near infrared images 114. The two-dimensional pixels of the plurality of two-dimensional near infrared images 114 may depict, for example, a portion of one or more teeth of the patient 304. It is noted that each projection object may comprise a corresponding plurality of point coordinates. In addition, a three-dimensional surface model 118 of the subject's teeth may be generated based on the white light images.
In one example, the determined data, e.g., the plurality of near infrared images 114, the locations, the projection objects, the plurality of point coordinates, the three-dimensional surface model, etc., may constitute training data for training the continuous volume machine learning model 108. Training data may be generated for each patient so that the continuous volume machine learning model 108 is trained separately on each patient's data.
In step 602, the set of input parameters corresponding to the projection objects of the plurality of two-dimensional near infrared images 114 is received. For example, the set of input parameters may be determined based on a known or calibrated position of a sensor (e.g., an image sensor or camera of the handheld intraoral scanner 104 that may be used to capture the plurality of two-dimensional near infrared images 114). It is noted that the projection objects may be cast or projected through the two-dimensional pixels within the plurality of two-dimensional near infrared images 114 that comprise a portion of a tooth. Furthermore, each projection object includes a plurality of point coordinates.
For example, a three-dimensional surface model 118 of the subject may be generated beforehand, e.g., using white light images 116 captured together with the two-dimensional near infrared images 114. Further, based on the three-dimensional surface model 118, grid points may be generated at which artificial pixels indicating the corresponding point coordinates may need to be drawn or rendered. Further, the near infrared camera rays, or near infrared wavelength rays, are intersected with the external geometry (i.e., the three-dimensional surface model 118) to determine which portions of the camera rays or near infrared rays (and the corresponding two-dimensional pixels of the plurality of near infrared images 114) lie within the outer boundaries of the one or more teeth.
The continuous volume machine learning model 108 may be trained to determine synthesized pixel values for rendering artificial pixels of the 3D internal geometry 120. In addition, the set of input parameters (e.g., the spatial position parameters (x, y, z) and the viewing direction or perspective parameters (θ, φ)) is determined based on a known calibration of the image sensor. For example, a plurality of 3D positions along the projection object that lie within the outer boundary of the corresponding tooth, i.e., within the three-dimensional surface model 118, may be sampled to determine the plurality of point coordinates. Further, combining the 3D spatial position (x, y, z) of each sampled point with the viewing direction (θ, φ) indicated for that point yields the set of input parameters (x, y, z, θ, φ) for the plurality of point coordinates.
In step 604, an intensity value and a density value for each of the plurality of point coordinates are generated based on the continuous volume machine learning model 108. For example, the continuous volume machine learning model 108 may use the received set of input parameters. Based on the set of input parameters related to the point coordinates of the projection object, the continuous volume machine learning model 108 may be trained to determine or output intensity and density values for each point coordinate of the projection object from the corresponding input parameters (x, y, z, θ, φ). It should be noted that the continuous volume machine learning model 108 may be trained to determine intensity values and density values for the point coordinates of a plurality of different projection objects belonging to the two-dimensional near infrared images 114 and associated with the subject.
In step 606, a synthesized pixel value for each projection object is determined based on the corresponding determined intensity value and density value of each of the plurality of point coordinates. For example, the synthesized pixel value of the projection object may be determined as the expected intensity along the camera ray capturing the corresponding two-dimensional pixel within the corresponding two-dimensional near infrared image. The expected intensity Ĉ(r) may be calculated based on an integration of the determined intensity values and density values of the point coordinates associated with the projection object. The synthesized pixel value is then set to the expected intensity value Ĉ(r).
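A hedged sketch of this integration is given below: the standard volume rendering quadrature used by NeRF-style models, assumed here, though not stated in the disclosure, to be the concrete form of the expected intensity Ĉ(r).

```python
# Numerically integrate densities and intensities sampled along one projection
# object into a single expected intensity (the synthesized pixel value).
import torch

def expected_intensity(density: torch.Tensor, intensity: torch.Tensor,
                       deltas: torch.Tensor) -> torch.Tensor:
    # density, intensity, deltas: (n_samples,) values along one projection object;
    # deltas are the distances between consecutive sample points.
    alpha = 1.0 - torch.exp(-density * deltas)             # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]  # transmittance
    weights = trans * alpha                                # per-sample contribution
    return (weights * intensity).sum()                     # C_hat(r)
```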
In step 608, a loss function between the synthesized pixel value and corresponding true pixel values of a plurality of pixels of the plurality of two-dimensional near-infrared images 114 is minimized. For example, the synthesized pixel value generated for the projection object may indicate an intensity value of the corresponding artificial pixel.
For example, given the synthesized pixel values of a plurality of projection objects, indicating a set of calculated expected intensities Ĉ(r), the loss function for the synthesized pixel values may be calculated based on the corresponding true pixel values as follows:

L = Σ_{r ∈ R} ‖Ĉ(r) - C(r)‖²

where R is the set of projection objects, and C(r) is the reference true intensity observed in the plurality of two-dimensional near infrared images 114 used for training.
The formulation of the loss function is based on the understanding that the projection objects in the set R do not necessarily come from the same camera or image sensor viewpoint. Furthermore, the weights of the continuous volume machine learning model 108 are updated using the loss function computed over the plurality of synthesized values, and the training steps described in fig. 8 are iterated until convergence or until a fixed number of iterations has been performed.
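A minimal training step consistent with this description might look as follows; it reuses the illustrative ContinuousVolumeModel and expected_intensity sketches from above, and the optimizer and batching scheme are assumptions.

```python
# One gradient step minimizing the photometric loss over a batch of projection
# objects; each batch entry holds (samples, deltas, true_pixel) for one ray r.
import torch

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    loss = torch.zeros(())
    for samples, deltas, true_pixel in batch:   # samples: (n_samples, 5)
        density, intensity = model(samples)     # each (n_samples, 1)
        pred = expected_intensity(density.squeeze(-1), intensity.squeeze(-1), deltas)
        loss = loss + (pred - true_pixel) ** 2  # ||C_hat(r) - C(r)||^2
    loss = loss / len(batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```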
To this end, the continuous volume machine learning model 108 may be trained anew each time a set of input parameters is generated for a different patient. Because of this retraining, the synthesized pixel values generated for a patient are specific to that patient. In addition, significantly less training data is required for training the continuous volume machine learning model 108.
The set of input parameters is provided to a continuous volume machine learning model 108 for solving a continuous volume scene function. The manner in which the continuous volume machine learning model 108 operates will be described in detail in connection with fig. 7A.
Fig. 7A is a sequence diagram 700 depicting the generation of 3D internal geometry 120, according to an example embodiment. The processor 106 generates the 3D internal geometry 120 using the continuous volume machine learning model 108. Fig. 7A is explained in conjunction with the elements in fig. 1,2, 3, 4, 5 and 6. The sequence diagram 700 may include the processor 106 and the continuous volume machine learning model 108. The sequence diagram 700 may depict operations performed by the processor 106 of the system 102 and the continuous volume machine learning model 108 to generate the 3D internal geometry 120 and 3D model of the teeth 306 of the subject 304.
In step 702, a plurality of 2D near infrared images 114 are captured. For example, the processor 106 may capture a near infrared image 114 indicative of an interior region of the subject's teeth 306. Near infrared image 114 may be captured using one or more sensors 214D (particularly an image sensor) from near infrared information received from handheld intraoral scanner 104. For example, near infrared image 114 may be captured based on calibration information of the image sensor and near infrared wavelength pulses detected by the image sensor.
In step 704, the set of input parameters for the continuous volume machine learning model 108 is determined. The set of input parameters may include values corresponding to the input variables, i.e., the spatial position (x, y, z) and the viewing angle or viewing direction (θ, φ). The processor 106 may determine the set of input parameters for each projection object corresponding to each of the plurality of pixels of each near infrared image 114. The set of input parameters may be generated to optimize a function of the continuous volume machine learning (ML) model 108. For example, the continuous volume scene function F: (x, y, z, θ, φ) → (intensity, density) of the continuous volume ML model 108 can be optimized. Details of the generation of the set of input parameters are provided, for example, in fig. 5.
At step 706, the continuous volume machine learning model 108 receives the plurality of near infrared images 114 and the set of input parameters. For example, the processor 106 may forward the near infrared image 114 and the set of input parameters to the continuous volume machine learning model 108. The continuous volume machine learning model 108 may be implemented as a continuous volume scene function, where the continuous volume scene function or continuous volume machine learning model 108 is a fully connected deep neural network.
For example, the continuous volume machine learning model 108 may include a neural network model based on a neural radiance field (NeRF). In some cases, the continuous volume machine learning model 108 may include an enhanced version of NeRF, such as, for example, mip-NeRF. Mip-NeRF can be used to reduce aliasing, reveal fine details in the 3D internal geometry, and reduce error rates. It is understood that the continuous volume machine learning model 108 may include a NeRF based neural network model when the projection object is cast as a ray, and a mip-NeRF based neural network model when the projection object is cast as a cone. An advantage of using cone-shaped projection objects is that the model takes into account the telecentric viewing characteristics of the intraoral scanner when acquiring the 2D near infrared images, which may improve the final result.
The set of input parameters is related to the projected object projected through each of the plurality of pixels of near infrared image 114. The plurality of pixels of near infrared image 114 may correspond to teeth 306, such as a portion of teeth 306 or a point on teeth 306. Furthermore, a projection object (i.e., a ray or cone) may be projected through each of the plurality of pixels of each near infrared image 114. Details of the type of projection object projected through the pixel are further provided, for example, in fig. 7B and 7C.
In step 708, an intensity value and a density value for each of a plurality of point coordinates along the projection object are generated using the set of input parameters. For example, a projection object is cast from each of the plurality of pixels of each near infrared image 114. Each projection object may comprise a plurality of point coordinates, which may have corresponding values of, for example, spatial position and viewing angle or viewing direction. Furthermore, each pixel through which a projection object is cast may also have corresponding values such as color intensity and texture information.
The continuous volume machine learning model 108 may be configured to determine an intensity value and a density value for each point coordinate based on the obtained near infrared images 114 and the set of input parameters. For example, the continuous volume scene function may be optimized for the sampled point coordinates of a projection object based on the input parameters (x, y, z) corresponding to the spatial position information of the point coordinates and the input parameters (θ, φ) corresponding to the perspective information of the point coordinates. The optimized continuous volume scene function can then be used to generate the output. The one or more processors may be configured to generate grid points within the three-dimensional surface model for different directions of view of the three-dimensional surface model. The three-dimensional surface model is determined based on the received visible light information. The model 108 receives the generated grid points and the directions of view of the grid points and determines intensity and density values for each grid point and for the different directions of view. Based on the intensity values and the density values, the one or more processors are configured to generate a three-dimensional internal geometry of the three-dimensional surface model by mapping the intensity values and the density values onto the corresponding grid points for the corresponding directions of view of the three-dimensional surface model. A three-dimensional surface model containing the three-dimensional internal geometry may then be displayed. Based on user input, the one or more processors are configured to zoom in on the three-dimensional surface model to scrutinize the three-dimensional internal geometry of the three-dimensional surface model. Based on user input, the one or more processors are configured to rotate the three-dimensional surface model such that the three-dimensional internal geometry is rotated in the same manner.
In step 710, a synthesized pixel value of the projection object is determined. For example, the intensity value and the density value of each of the plurality of point coordinates of the projection object may be integrated to determine a synthesized pixel value corresponding to the projection object. Synthesized pixel values 122 may be determined for the plurality of projection objects corresponding to the plurality of pixels of each near infrared image 114. The synthesized pixel value of a projection object may indicate pixel information corresponding to an artificial pixel, such as color, depth, transparency, position, texture, and the like. Such artificial pixels may be projected as artificial points in the three-dimensional internal geometry of the subject's teeth 306. For example, for the two-dimensional pixels in the near infrared images 114, the near infrared images 114 may be processed using the continuous volume machine learning model 108 to determine a plurality of artificial points or artificial pixels.
In step 712, the processor 106 may be configured to generate grid points within the three-dimensional surface model 118 based on the different directions of view of the three-dimensional surface model 118. It should be noted that the processor 106 may use the visible light information and/or the white light image 116 received from the handheld intraoral scanner 104 to generate the three-dimensional surface model 118 of the subject's teeth 306. The generated grid points and the direction of view are fed into a continuous volume machine learning model 108, and the model 108 is then configured to determine intensity values and density values for the corresponding grid points and forward these values to the one or more processors 106. The one or more processors 106 are then configured to generate a three-dimensional internal geometry of the three-dimensional surface model 118 based on the intensity values and the density values of the corresponding grid points.
In one embodiment, the intensity values and density values of the plurality of point coordinates may be indicative of a property of the material at the plurality of coordinate points within a pixel of the near infrared image.
Further, the processor 106 may be configured to identify differences between enamel and dentin, and the boundary between enamel and dentin, within the three-dimensional internal geometry 120 of the tooth 306 based on the intensity and density values that have been determined for grid points within the three-dimensional surface model and the directions from which those grid points are viewed. The processor 106 may identify changes of material in the three-dimensional internal geometry based on changes in the intensity values and density values between grid points. Changes in the intensity value and density value that exceed a certain change threshold may correspond to a change of material. The one or more processors may be configured to determine whether the intensity values and density values correspond to dental features, such as anatomical features, disease features, or mechanical features. The anatomical feature may be enamel, dentin, or pulp. The disease feature may be a crack or caries. The mechanical feature may be a filling and/or a composite prosthesis.
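As a hedged illustration, a material transition such as the enamel-dentin junction could be flagged wherever the spatial variation of the density values between neighbouring grid points exceeds a threshold; the gradient criterion and the threshold are assumptions.

```python
# Flag grid points where the density changes sharply between neighbours,
# indicating a likely material boundary within the 3D internal geometry.
import numpy as np

def material_boundaries(density_grid: np.ndarray, change_threshold: float) -> np.ndarray:
    # density_grid: (X, Y, Z) density values evaluated at regular grid points
    gx, gy, gz = np.gradient(density_grid)
    change = np.sqrt(gx**2 + gy**2 + gz**2)   # magnitude of local density change
    return change > change_threshold          # boolean mask of boundary candidates
```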
To this end, a three-dimensional internal geometry 120 may be generated within the three-dimensional surface model 118 such that the three-dimensional internal geometry 120 identifies different materials in the dentition of the subject's teeth 306. The different materials may correspond to enamel and dentin, but are not limited thereto. In some cases, the different materials may correspond to any prosthetic implant or any filling that may be inserted into a tooth, or a material of an artificial tooth.
Referring to fig. 7B, a schematic diagram 720 of a first type of projection object is shown in accordance with an exemplary embodiment. Referring to fig. 7C, a schematic diagram 730 of a second type of projection object is shown according to another exemplary embodiment. For example, the first type of projection object is a ray and the second type is a cone. The near infrared image 718 may be one of the near infrared images 114 captured by the processor 106 using an image sensor and the near infrared information. Upon receiving the near infrared image 718, the processor 106 may be configured to determine a region of the object, i.e., pixels depicting portions of the subject's teeth 306. For example, a plurality of pixels within the near infrared image 718 indicating the region of the object (i.e., the tooth 306) may be identified.
Further, the processor 106 may be configured to cast or project a projection object from each of the plurality of pixels of the near infrared image 718 and the other near infrared images 114 captured by the processor 106. The projection objects may be cast from the plurality of pixels to determine the sets of input parameters for optimizing the continuous volume machine learning model 108. According to the present example, the entire near infrared image 718 may be rendered by casting or projecting a projection object through each of the plurality of pixels.
Referring specifically to fig. 7B, a ray-based projection object 724 (hereinafter ray 724) may be cast or projected through a pixel 722 of the near infrared image 718. Ray 724 may include a plurality of point coordinates, such as point coordinates 726A, 726B, 726C, 726D, and 726E (hereinafter collectively referred to as point coordinates 726) as shown in the figure. The point coordinates 726 may be sampled along the ray 724, and each point coordinate 726 may be transformed using a positional encoding, such as a gamma function. The spatial position information and perspective information corresponding to each transformed point coordinate are used to determine the set of input parameters for optimizing the continuous volume scene function of the continuous volume machine learning model 108.
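As an illustration of the positional-encoding transform mentioned above, the sketch below samples point coordinates along a ray and applies a sinusoidal gamma function in the style of NeRF; the frequency count and the sampling bounds are illustrative assumptions.

```python
import numpy as np

def gamma(p, num_freqs=10):
    """Sinusoidal positional encoding: each coordinate p is mapped to
    (sin(2^k * pi * p), cos(2^k * pi * p)) for k = 0..num_freqs-1."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    scaled = p[..., None] * freqs                       # (..., 3, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)               # (..., 3 * 2 * num_freqs)

def sample_ray(origin, direction, t_near, t_far, n=5):
    """Sample n point coordinates (e.g. 726A..726E) along a ray cast
    through a pixel of the near infrared image."""
    t = np.linspace(t_near, t_far, n)
    return origin + t[:, None] * direction              # (n, 3)

pts = sample_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), 0.5, 2.0)
print(gamma(pts).shape)   # (5, 60): 3 coordinates x 10 frequencies x {sin, cos}
```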
Referring specifically to fig. 7C, a cone-based projection object 728 (hereinafter cone 728) may be cast or projected through the pixel 722 of the near infrared image 718. For example, the radius of the cone 728, i.e., of the projection object corresponding to the pixel 722, may be determined based on the size of the corresponding pixel 722 on the image plane.
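One plausible way to derive the cone radius from the pixel size, following the mip-NeRF convention of matching the cone's cross-section to the pixel footprint, is sketched below; the focal length and pixel width are assumed values, not parameters from the disclosure.

```python
import numpy as np

# Assumed intrinsics (illustrative values only).
focal = 800.0        # focal length, in pixel units
pixel_width = 1.0    # width of pixel 722 on the image plane, in pixel units

# mip-NeRF scales the base radius by 2/sqrt(12) so the cone's cross-section
# has the same variance as the square pixel footprint it replaces.
base_radius = (pixel_width / focal) * (2.0 / np.sqrt(12.0))

# The radius of cone 728 then grows linearly with distance t along the ray.
for t in (0.5, 1.0, 2.0):
    print(f"t={t}: radius={base_radius * t:.6f}")
```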
The processor 106 may be configured to determine an average of the content within the visible volume 734 of the pixel 722. The content within the visible volume 734 may indicate the color intensity of the pixel 722, and its average may enable identification of, for example, portions or points of the tooth 306, caries, lesions, cracks, dentin, enamel junctions, fillings, or any other objects present on or within the part of the tooth 306 captured by the pixel 722. Based on the corresponding average of the content, the processor 106 may be configured to render or cast the cone 728 as the projection object corresponding to the pixel 722. Further, the cone 728 may be rendered such that it models the entire volume of space visible through the pixel 722, based on the content within the visible volume 734 or the average of the color intensities.
Cone 728 may include a plurality of point coordinates, depicted as point coordinates 732A, 732B, 732C, 732D, and 732E (hereinafter collectively referred to as point coordinates 732). Furthermore, the cone 728 may be cut into truncated cones, one per point coordinate 732; for example, the truncated cone 736 may correspond to the point coordinate 732D. To this end, each point coordinate 732 along the cone 728 may be transformed using a positional encoding corresponding to the volume of its truncated cone. The point coordinates 732 may be sampled along the cone 728. The spatial position information and perspective information corresponding to each transformed point coordinate are used to determine the set of input parameters for optimizing the continuous volume scene function of the continuous volume machine learning model 108, where the continuous volume machine learning model 108 is based on a mip-NeRF neural network. The details of generating the set of input parameters for the cone-based projection object are further described, for example, in connection with fig. 8.
The manner in which the continuous volume machine learning model 108 uses the set of input parameters corresponding to the projection object is described in detail in connection with fig. 7A and 8.
Fig. 7D shows a schematic diagram of a generated 3D model 740 of the subject's teeth 306, according to an example embodiment. For example, the processor 106 may be configured to generate the 3D model 740. The 3D model 740 includes the 3D surface model 118 and the 3D internal geometry 120 of the tooth 306. It is noted that the 3D internal geometry 120 may be generated by rendering grid points within the 3D surface model 118 in different directions of view of the 3D surface model. Further, the intensities and densities of the corresponding grid points are determined based on a continuous volume machine learning model.
The 3D model 740 of the tooth 306 shows the geometry of the interior region of the tooth 306. This has the advantage of enabling the user or dentist to identify any disease or abnormality beneath the surface of the tooth 306. In particular, the 3D model 740 provides an accurate representation of the condition of different layers beneath the surface of the tooth 306. For example, the 3D model may be used to identify caries, cracks, lesions, abnormal hyperplasia, fillings, boundaries between enamel and dentin, and the like beneath the surface of the tooth 306.
For example, the 3D model 740 may include the three-dimensional internal geometry 120 rendered within the three-dimensional surface model 118. Based on the three-dimensional internal geometry 120, the 3D model 740 may represent the portion of the tooth 306 corresponding to enamel 742, the portion corresponding to dentin 744, and the boundary 746 or interface between enamel 742 and dentin 744 in the dentition of the tooth 306.
Fig. 8 illustrates a method 800 for generating synthesized pixel values 122 using the continuous volume machine learning model 108, according to an example embodiment. The method 800 depicts generating the synthesized pixel value 122 when the projection object is of the second type (e.g., cone 728). In this case, the continuous volume machine learning model 108 may be implemented using, for example, a mip-NeRF based neural network model. Fig. 8 is explained in conjunction with the elements in fig. 1, fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7A, fig. 7B, fig. 7C, and fig. 7D.
In step 802, one or more truncated cones are determined for each cone corresponding to each pixel of the plurality of pixels. For example, a truncated cone 736 may be determined for the cone 728 corresponding to the pixel 722. The truncated cone 736 may be associated with a point coordinate 732D of the plurality of point coordinates 732 of the cone 728. In one example, the processor 106 may determine the frustum 736 such that the frustum covers the visible volume around the point coordinates 732D. It should be noted that a similar truncated cone may be determined for each point coordinate 732 of the cone 728.
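A minimal sketch of this partitioning is shown below, assuming the truncated cones are taken as consecutive depth intervals centered on the point coordinates; the interval bounds are illustrative.

```python
import numpy as np

def frustum_intervals(t_near, t_far, n=5):
    """Partition the cone into n truncated cones, one per point coordinate:
    point coordinate i lies at the centre of [t0[i], t1[i]], so each
    truncated cone covers the visible volume around that coordinate."""
    edges = np.linspace(t_near, t_far, n + 1)
    t0, t1 = edges[:-1], edges[1:]
    centers = 0.5 * (t0 + t1)      # e.g. point coordinates 732A..732E
    return t0, t1, centers

t0, t1, c = frustum_intervals(0.5, 2.0)
print(np.round(c, 2))              # [0.65 0.95 1.25 1.55 1.85]
```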
In step 804, one or more integrated positional encodings of the truncated cones are determined for transforming the point coordinates 732 of the cone 728. The integrated positional encoding of the truncated cone at a point coordinate 732 may include a Gaussian encoding and/or a sinusoidal encoding.
For example, fitting a multivariate Gaussian distribution to the truncated cone 736 allows its positional encoding to be handled analytically, reducing complexity. The expected integrated positional encoding may then be determined for the Gaussian-fitted truncated cone 736 and for the other Gaussian-fitted truncated cones corresponding to the point coordinates 732 of, for example, the cone 728. Similarly, an expected integrated positional encoding may be determined for a cone projected from each of the plurality of pixels of the plurality of near infrared images 114. The integrated positional encoding of a truncated cone of the cone 728 may combine the characteristics of a sinusoidal function and a Gaussian function.
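The sketch below follows the published mip-NeRF formulation for fitting a Gaussian to a conical frustum and computing the resulting integrated positional encoding (the diagonal-covariance variant); the frequency scales and parameter values are assumptions, and the disclosure's exact encoding may differ.

```python
import numpy as np

def frustum_gaussian(origin, d, t0, t1, base_radius):
    """Closed-form multivariate Gaussian fit to the conical frustum
    [t0, t1] along ray direction d (closed form from the mip-NeRF paper)."""
    t_mu, t_d = 0.5 * (t0 + t1), 0.5 * (t1 - t0)
    denom = 3.0 * t_mu ** 2 + t_d ** 2
    mu_t = t_mu + 2.0 * t_mu * t_d ** 2 / denom
    var_t = t_d ** 2 / 3.0 - (4.0 * t_d ** 4 * (12.0 * t_mu ** 2 - t_d ** 2)) / (15.0 * denom ** 2)
    var_r = base_radius ** 2 * (t_mu ** 2 / 4.0 + 5.0 * t_d ** 2 / 12.0
                                - 4.0 * t_d ** 4 / (15.0 * denom))
    mean = origin + mu_t * d
    dd = np.outer(d, d) / np.dot(d, d)
    cov = var_t * dd + var_r * (np.eye(3) - dd)        # Gaussian fitted to frustum
    return mean, cov

def integrated_pos_enc(mean, cov, num_freqs=8):
    """Expected sinusoidal encoding of the Gaussian: each frequency band
    is damped by exp(-0.5 * 4^k * sigma^2), combining the sinusoidal and
    Gaussian characteristics noted above."""
    scales = 2.0 ** np.arange(num_freqs)
    mu = mean[None, :] * scales[:, None]               # (num_freqs, 3)
    var = np.diag(cov)[None, :] * scales[:, None] ** 2
    damp = np.exp(-0.5 * var)
    return np.concatenate([np.sin(mu) * damp, np.cos(mu) * damp]).ravel()

m, C = frustum_gaussian(np.zeros(3), np.array([0.0, 0.0, 1.0]), 0.8, 1.1, 0.001)
print(integrated_pos_enc(m, C).shape)                  # (48,)
```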
In step 806, a synthesized pixel value for each cone is generated based on the integrated positional encodings of the corresponding one or more truncated cones and the continuous volume machine learning model 108.
For example, the width of the Gaussian fitted to a truncated cone modulates the higher-frequency terms of the integrated positional encoding used by the continuous volume machine learning model 108 to determine the synthesized pixel value 122. When the width is widened, the integrated positional encoding approaches zero, indicating that the visible volume of the cone segment is large; when the width is narrowed, the integrated positional encoding increases toward non-zero values, indicating that the volume is small. Thus, the mip-NeRF based continuous volume machine learning model 108 is able to infer the scale of the input near infrared image 114 by observing the scale of the integrated positional encoding of the projected cone, and can distinguish between small and large visible volumes corresponding to the cones.
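A small numeric illustration of this scale behaviour, under the same assumed encoding as the sketch above:

```python
import numpy as np

# Damping of each frequency band k: exp(-0.5 * (2^k)^2 * sigma^2).
for sigma in (0.01, 0.1, 1.0):            # narrow -> wide Gaussian fit
    damp = np.exp(-0.5 * (2.0 ** np.arange(6)) ** 2 * sigma ** 2)
    print(f"sigma={sigma}:", np.round(damp, 3))
# A wide Gaussian (large visible volume) drives the high-frequency terms
# toward zero; a narrow one (small volume) leaves them near one, which is
# how the model can infer the scale of the projected cone.
```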
Fig. 9 illustrates a method 900 for generating a three-dimensional internal geometry 120 and a three-dimensional surface model 118 of a subject's teeth 306, according to an example embodiment. Fig. 9 is explained in conjunction with the elements of fig. 1, fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7A, fig. 7B, fig. 7C, fig. 7D, and fig. 8. The steps of method 900 may be performed by system 102. As previously described, the system 102 includes a handheld intraoral scanner 104, a processor 106, and a continuous volume machine learning model 108. The handheld intraoral scanner 104 includes a projector unit 214C and one or more sensors 214D. Projector unit 214C may illuminate subject's teeth 306 with visible wavelength pulses and near infrared wavelength pulses. The one or more sensors 214D may detect near infrared light and visible light. For example, the one or more sensors 214D may include an image sensor that may be configured to capture a white light image 116 based on detecting visible light and a near infrared image 114 based on detecting near infrared light. Further, the processor 106 may be configured to generate the three-dimensional internal geometry 120 of the tooth 306 using the continuous volume machine learning model 108.
In step 902, visible light information and near infrared information are received from one or more sensors 214D. The visible light information and the near infrared information may indicate visible light wavelength pulses and near infrared wavelength pulses, respectively, that may be detected by the one or more sensors 214D. For example, the detected visible wavelength pulses and near infrared wavelength pulses may be reflected from the subject's teeth 306. In some cases, the visible light information and the near infrared information may also include calibration information associated with the one or more sensors 214D, based on which the visible light wavelength pulses and the near infrared wavelength pulses are detected. For example, the visible light information and the near infrared information may also include a white light image 116 and a near infrared image 114, respectively.
In step 904, surface information is determined in real time. For example, the processor 106 may be configured to generate surface information of the tooth 306 from the visible light information or the white light image 116. Based on the surface information, the processor 106 may be configured to generate a three-dimensional surface model 118 of the subject's teeth 306. Details of generating the 3D surface model 118 are provided in connection with fig. 4.
In step 906, a plurality of 2D near infrared images 114 of an interior region of the subject's teeth 306 may be captured from the near infrared information. For example, the processor 106 may capture the near infrared image 114 using an image sensor of the one or more sensors 214D. It should be noted that each near infrared image 114 may contain a corresponding set of pixels. In one example, a plurality of pixels of each near infrared image 114 corresponding to a portion or point on the tooth 306 may be determined for further processing.
In step 908, a set of input parameters is determined. In this regard, a projection object may be projected from each of the plurality of pixels of each near infrared image 114. The projection object contains a corresponding plurality of point coordinates, which correspond to different depths within the corresponding projection object and the corresponding pixel. For example, the plurality of point coordinates may correspond to different materials within the subject's teeth 306; thus, the point coordinates of a projection object need not belong to the same material. Further, the processor 106 may be configured to determine the set of input parameters for the plurality of point coordinates of the projection objects projected from the plurality of pixels of the near infrared images 114. For example, the set of input parameters may include spatial position information (e.g., (x, y, z) values) and perspective information (e.g., (θ, φ) values). Based on the set of input parameters, an optimization process is applied over the projection objects of all of the plurality of pixels, whereby the parameters Θ of the optimized continuous volume scene function F_Θ can be defined.
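A minimal sketch of assembling such a set of input parameters from sampled point coordinates and a direction of view is given below; expressing the perspective information as spherical angles (θ, φ) follows the NeRF convention and is an assumption here.

```python
import numpy as np

def input_parameters(points, direction):
    """Assemble the set of input parameters (x, y, z, theta, phi) for the
    scene function: spatial position per point coordinate plus the
    direction of view expressed as spherical angles."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    theta = np.arccos(np.clip(d[2], -1.0, 1.0))   # polar angle
    phi = np.arctan2(d[1], d[0])                  # azimuthal angle
    angles = np.tile([theta, phi], (points.shape[0], 1))
    return np.hstack([points, angles])            # (N, 5)

pts = np.random.rand(5, 3)                        # sampled point coordinates
print(input_parameters(pts, [0.0, 0.0, 1.0]).shape)   # (5, 5)
```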
In step 910, the set of input parameters is processed using the continuous volume machine learning model 108. For example, the continuous volume machine learning model may process the set of input parameters to determine a composite pixel value of the projection object based on the intensity values and the density values of the point coordinates. The continuous volume machine learning model 108 is configured to be trained using a plurality of near infrared images 114. Training details of the continuous volume machine learning model 108 are described in connection with, for example, fig. 8.
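A minimal sketch of the loss between synthesized pixel values and real pixel values that such training may minimize; the mean-squared form and the sample values are assumptions of the sketch.

```python
import numpy as np

def photometric_loss(synth_pixels, real_pixels):
    """Mean squared error between synthesized pixel values 122 and the
    real pixel values of the 2D NIR images; training minimizes this
    loss over the parameters of the continuous volume scene function."""
    return float(np.mean((synth_pixels - real_pixels) ** 2))

synth = np.array([0.30, 0.55, 0.80])
real = np.array([0.28, 0.60, 0.75])
print(photometric_loss(synth, real))
```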
It is to be appreciated that each step of the sequence diagram 900 can be implemented in various ways, such as by hardware, firmware, one or more processors, circuitry, and/or other devices associated with execution of software comprising one or more computer program instructions. For example, one or more of the steps described above may be implemented by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored in a memory of a system 102 employing an embodiment of the present disclosure. It should be appreciated that any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the sequence diagram 900. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the functions specified in the sequence diagram 900. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the sequence diagram 900.
Thus, the steps of the sequence diagram 900 support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more steps of the sequence diagram 900, and combinations of steps in the sequence diagram 900, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or by combinations of special purpose hardware and computer instructions. The sequence diagram 900 shown in fig. 9 is used to generate and render a 3D model of a tooth that contains the three-dimensional internal geometry of the tooth. Fewer, more, or different steps may be provided.
Fig. 10 shows a schematic diagram 1000 depicting an exemplary environment for generating three-dimensional internal geometry 120 and three-dimensional surface model 118 of a tooth in real-time and rendering an interactive three-dimensional graphical representation 1010 in accordance with an exemplary embodiment. Fig. 10 is explained in conjunction with the elements of fig. 1, fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7A, fig. 7B, fig. 7C, fig. 7D, fig. 8, and fig. 9. Schematic 1000 may include a dentist 1002 and a patient 1004.
The dentist 1002 can use the handheld intraoral scanning device 104 to capture a plurality of 2D images 112 of the teeth 1006 of the patient 1004, including near infrared images 114 and white light images 116. The handheld intraoral scanning device 104 may transmit a plurality of 2D images 112 to the processor 106. The processor 106 may process the white light image 116 and generate three-dimensional surface information of the tooth 1006. Based on the three-dimensional surface information, the processor 106 may generate a 3D surface model 118 of the teeth 1006 of the patient 1004.
In addition, the processor 106 may process the near infrared image 114 to generate a set of input parameters for optimizing a function of the continuous volume machine learning model 108. The set of input parameters is generated for each projection object projected from each of the plurality of pixels of each near infrared image 114; the projection object may be a ray or a cone. For example, the set of input parameters may be used to optimize the continuous volume scene function of the continuous volume machine learning model 108. Given the continuous volume scene function F_Θ optimized with the set of input parameters, the continuous volume machine learning model 108 may generate intensity values and density values for each of a plurality of point coordinates of the projection object. Further, the continuous volume machine learning model 108 may generate the synthesized pixel value of the projection object, for example, by integrating the intensity values and density values of the plurality of point coordinates. In this way, synthesized pixel values 122 for a plurality of projection objects corresponding to a plurality of pixels in each near infrared image 114 may be determined.
The 3D model of the teeth 1006 of the patient 1004 may be displayed in real time on the display unit 1008 as an interactive three-dimensional graphical representation 1010. The representation 1010 may be generated in real time and manipulated by the dentist 1002 as desired, e.g., rotated or viewed from multiple perspectives.
Thus, the intraoral scanning system 102 may enable processing of the multiple 2D images 112 to generate the three-dimensional surface model 118 and the three-dimensional internal geometry 120 of the tooth 1006. The three-dimensional internal geometry 120 can display the condition, shape, size, and other characteristics of the internal region of the tooth 1006. The generated three-dimensional internal geometry 120 identifies different materials, such as enamel and dentin, within the interior region of the tooth 1006 based on a comparison of the intensity and/or density values of the projection objects to a change threshold. For example, if the difference between a first intensity value or density value of a first projection object and the corresponding second intensity value or density value of a second projection object is greater than the change threshold, the two projection objects may be identified as belonging to different materials. In this way, boundaries between enamel and dentin within the teeth 1006 of the patient 1004 can be identified in the three-dimensional internal geometry 120.
A user (e.g., a dentist) can access the interactive three-dimensional graphical representation 1010 of the 3D model of the tooth 1006 through the display unit. Thus, the intraoral scanning system 102 is capable of processing multiple 2D images 112 in real time and generating a 3D model for real time access by a user through a display unit.
Many modifications and other embodiments of the invention will come to mind to one skilled in the art to which this invention pertains having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Furthermore, while the foregoing description and drawings describe exemplary embodiments of certain element and/or functional combinations, it should be appreciated that other embodiments can provide different element and/or functional combinations without departing from the scope of the appended claims. In this regard, for example, other combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Parts list:
Intraoral scanning system-102
Hand-held intraoral scanner-104
Processor-106
Continuous volume machine learning model-108
Two-dimensional image-112, 308
Two-dimensional near infrared images-114, 308B, 718
Two-dimensional white light image-116, 308A
Three-dimensional surface model-118
Three-dimensional internal geometry-120
Dentist-302, 1002
Patient-304, 1004
Subject's teeth-306, 1006
Three-dimensional information-310
Pixel-722
Ray casting object-724
Point coordinates-726A, 726B, 726C, 726D, 726E, 732A, 732B, 732C, 732D, and 732E
Cone projection object-728
Visible volume-734
Cone frustum-736
3D model-740
Enamel-742
Dentin-744
Boundary-746
Display unit-1008
Interactive three-dimensional graphic representation-1010
Item list:
1. an intraoral scanning system (102) configured to generate a three-dimensional (3D) internal geometry (120) of a subject's teeth (306), the system comprising:
a handheld intraoral scanner (104) configured to operate with one or more sensors (214D) to detect Near Infrared (NIR) and visible light, wherein the one or more sensors include an image sensor;
One or more processors (106) operatively connected to the handheld intraoral scanner, the one or more processors configured to:
receiving visible light information and Near Infrared (NIR) information from one or more sensors;
Determining surface information from the visible light information in real time to generate a three-dimensional (3D) surface model (118) of the subject's teeth using the surface information;
Capturing a plurality of two-dimensional (2D) NIR images (114) of an interior region of a subject's teeth in real time from NIR information using an image sensor, wherein each of the plurality of NIR images includes a plurality of corresponding pixels (722), and
Determining a set of input parameters of the projection object corresponding to each of the plurality of pixels, wherein the set of input parameters includes spatial position information and perspective information, and the projection object includes a plurality of point coordinates (726, 732), and
A continuous volume machine learning model (108) configured to receive and process the set of input parameters to determine intensity values and density values of a three-dimensional internal geometry of the 3D surface model, wherein the continuous volume machine learning model is configured to be trained using a plurality of 2D NIR images.
2. The intraoral scanning system (102) of item 1, wherein the continuous volume machine learning model (108) is trained by:
Receiving (602) the set of input parameters corresponding to a projection object of each of the plurality of 2D NIR images (114), wherein the projection object comprises a plurality of point coordinates (726, 732) associated with the 3D surface model (118);
Generating (804) an intensity value and a density value for each of a plurality of point coordinates using the set of input parameters based on a continuous volume machine learning model;
Determining (606) a composite pixel value of the projection object based on the corresponding determined intensity value and density value for each of the plurality of point coordinates, and
By varying the intensity value and the density value of each of the plurality of point coordinates, a loss function between the synthesized pixel value and corresponding real pixel values of a plurality of pixels (722) of the plurality of 2D NIR images is minimized (608).
3. The intraoral scanning system (102) of any preceding item, wherein the one or more processors (106) are further configured to:
Determining a plurality of grid points within the 3D surface model (118), and
The 3D internal geometry (120) is determined by arranging at least one of intensity values or density values at each of the plurality of grid points.
4. The intraoral scanning system (102) according to any one of the preceding items, further comprising a display unit (1008) configured to display the 3D internal geometry (120) based on at least one of intensity values or density values determined by the continuous volume machine learning model (108).
5. The intraoral scanning system (102) of item 4, wherein the display unit (1008) is configured to display the three-dimensional internal geometry (120) inside the three-dimensional surface model (118).
6. The intraoral scanning system (102) of any preceding item, wherein the one or more processors (106) are further configured to:
Based on the determined intensity and density values of the projected object for each of the plurality of pixels (722), a boundary (746) between enamel (742) and dentin (744) in the dentition of the subject's tooth (306) is determined.
7. The intraoral scanning system (102) of item 6, wherein the one or more processors (106) are further configured to:
The boundary (746) is determined based on a change in at least one of an intensity value or a density value of at least two of the projection objects of the plurality of pixels (722) being above a change threshold.
8. The intraoral scanning system (102) of any preceding item, wherein the projection object is one of a ray (724) or a cone (728).
9. The intraoral scanning system (102) of any preceding item, wherein:
the handheld intraoral scanner (104) includes a projector unit (214C) configured to illuminate a subject's teeth (306) with one or more white wavelength pulses and one or more Near Infrared (NIR) wavelength pulses, and
The one or more sensors (214D) are configured to generate a set of white light images (116) and a plurality of 2D NIR images (114) based on the illumination.
10. The intraoral scanning system (102) of item 9, wherein the three-dimensional surface model (118) is determined based on the set of white light images (116).
11. The intraoral scanning system (102) of any preceding item, wherein the one or more processors (106) are further configured to:
a relative position between a handheld intraoral scanner (104) and a subject's teeth (306) corresponding to each of a plurality of NIR images (114) is estimated, wherein the estimated relative position is indicative of perspective information and spatial position information of a corresponding projection object.
12. The intraoral scanning system (102) of any preceding item, wherein the projection object is a cone (728), the one or more processors further configured to:
determining (802) one or more truncated cones (736) for each cone corresponding to each of the plurality of pixels (722), wherein the one or more truncated cones are associated with a plurality of point coordinates (732);
Determining (804) an integrated positional encoding of one or more truncated cones for transforming a plurality of point coordinates of each cone, the integrated positional encoding comprising at least a Gaussian encoding and a sinusoidal encoding, and
A composite pixel value for each cone is generated (806) based on the integrated positional encoding of the corresponding one or more truncated cones and a continuous volume machine learning model (108).
13. The intraoral scanning system (102) of item 12, wherein the one or more processors (106) are further configured to:
A radius of each cone (728) indicating the projection object corresponding to each of the plurality of pixels (722) is determined based on the size of the corresponding pixel in the plurality of pixels.
14. The intraoral scanning system (102) of item 12, wherein the one or more processors (106) are further configured to:
Determining an average value of content within a visible volume (734) of each of the plurality of pixels (722), wherein the content is indicative of color intensity, and
Each cone (728) indicating the projection object for each of the plurality of pixels is rendered based on the average of the corresponding content.
15. The intraoral scanning system (102) of any preceding item, wherein the continuous volume machine learning model (108) is a machine learning based neural radiance field (NeRF) network.
16. A method (900) for generating a three-dimensional (3D) internal geometry (120) of a tooth (306) of a subject using an intraoral scanning system (102), the intraoral scanning system comprising:
A handheld intraoral scanner (104) configured to operate with one or more sensors (214D) to detect Near Infrared (NIR) and visible light, wherein the one or more sensors include an image sensor;
one or more processors (106) operatively connected to the handheld intraoral scanner, and
A continuous volume machine learning model (108);
wherein the method comprises the following steps:
receiving (902) visible light information and Near Infrared (NIR) information from one or more sensors;
determining (904) surface information from the visible light information in real time to generate a three-dimensional (3D) surface model (118) of the subject's teeth using the surface information;
Capturing (906), in real-time, a plurality of two-dimensional (2D) NIR images (114) of an interior region of a subject's tooth from NIR information using an image sensor, wherein each of the plurality of NIR images includes a plurality of corresponding pixels (722);
Determining (908) a set of input parameters of a projection object corresponding to each of the plurality of pixels, wherein the set of input parameters includes spatial position information and perspective information, and the projection object includes a plurality of point coordinates (726, 732), and
The set of input parameters is processed (910) using a continuous volume machine learning model to determine intensity values and density values of a three-dimensional internal geometry of a three-dimensional surface model, wherein the continuous volume machine learning model is configured to be trained using a plurality of 2D NIR images.
17. The method of item 16, further comprising:
Determining a plurality of grid points within the three-dimensional surface model (118);
determining a three-dimensional internal geometry (120) by arranging at least one of an intensity value or a density value at each of a plurality of grid points based on the set of input parameters, and
The three-dimensional internal geometry is displayed within the three-dimensional surface model.
18. The method of any preceding item, further comprising:
a boundary (746) between enamel (742) and dentin (744) in a dentition of a subject's tooth (306) is determined based on a change in at least one of an intensity value or a density value of at least two projected objects of the plurality of pixels above a change threshold.
19. A computer program product comprising a non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by processing circuitry, cause the processing circuitry to perform operations comprising:
Receiving visible light information and Near Infrared (NIR) information from one or more sensors (214D) mounted within the handheld intraoral scanner (104), the one or more sensors configured to detect Near Infrared (NIR) and visible light, wherein the one or more sensors include an image sensor;
Determining surface information in real-time from the visible light information to generate a three-dimensional (3D) surface model (118) of the subject's teeth (306) using the surface information;
Capturing, in real-time, a plurality of two-dimensional (2D) NIR images (114) of an interior region of a subject's tooth from NIR information using an image sensor, wherein each of the plurality of NIR images includes a plurality of corresponding pixels (722);
determining a set of input parameters of the projection object corresponding to each of the plurality of pixels, wherein the set of input parameters includes spatial position information and perspective information, and the projection object includes a plurality of point coordinates (726, 732), and
The set of input parameters is processed using a continuous volume machine learning model (108) configured to be trained using the plurality of two-dimensional near infrared images, to determine intensity values and density values of a three-dimensional internal geometry (120) of the three-dimensional surface model.
Claims (15)
1. An intraoral scanning system (102) configured to generate a three-dimensional (3D) internal geometry (120) of a subject's teeth (306), the intraoral scanning system comprising:
A handheld intraoral scanner (104) configured to operate with one or more sensors (214D) to detect Near Infrared (NIR) and visible light, wherein the one or more sensors include an image sensor;
One or more processors (106) operatively connected to the handheld intraoral scanner, the one or more processors configured to:
receiving visible light information and Near Infrared (NIR) information from the one or more sensors;
Determining surface information from the visible light information in real time to generate a three-dimensional (3D) surface model (118) of the subject's teeth using the surface information;
Capturing a plurality of two-dimensional (2D) NIR images (114) of an interior region of a subject's teeth in real time from the NIR information using the image sensor, wherein each of the plurality of NIR images includes a plurality of corresponding pixels (722), and
Determining a set of input parameters of a projection object corresponding to each of the plurality of pixels, wherein the set of input parameters includes spatial position information and perspective information, and the projection object includes a plurality of point coordinates (726, 732), and
A continuous volume machine learning model (108) configured to receive and process the set of input parameters to determine intensity values and density values of a three-dimensional internal geometry of the 3D surface model, wherein the continuous volume machine learning model is configured to be trained using the plurality of 2D NIR images.
2. The intraoral scanning system (102) of claim 1, wherein the continuous volume machine learning model (108) is trained by:
receiving (602) the set of input parameters corresponding to a projection object of each of the plurality of 2D NIR images (114), wherein the projection object comprises a plurality of point coordinates (726, 732) associated with the 3D surface model (118);
Generating (804) an intensity value and a density value for each of the plurality of point coordinates using the set of input parameters based on the continuous volume machine learning model;
Determining (606) a composite pixel value of the projection object based on the corresponding determined intensity value and density value for each of the plurality of point coordinates, and
By varying the intensity value and the density value of each of the plurality of point coordinates, a loss function between the synthesized pixel value and corresponding real pixel values of a plurality of pixels (722) of the plurality of 2D NIR images is minimized (608).
3. The intraoral scanning system (102) of any preceding claim, wherein the one or more processors (106) are further configured to:
Determining a plurality of grid points within the 3D surface model (118), and
The 3D internal geometry (120) is determined by arranging at least one of intensity values or density values at each of the plurality of grid points.
4. The intraoral scanning system (102) according to any one of the preceding claims, further comprising a display unit (1008) configured to display the 3D internal geometry (120) based on at least one of intensity values or density values determined by the continuous volume machine learning model (108).
5. The intraoral scanning system (102) of claim 4, wherein the display unit (1008) is configured to display the 3D internal geometry (120) inside the 3D surface model (118).
6. The intraoral scanning system (102) of any preceding claim, wherein the one or more processors (106) are further configured to:
Based on the determined intensity and density values of the projected object for each of the plurality of pixels (722), a boundary (746) between enamel (742) and dentin (744) in the dentition of the subject's tooth (306) is determined.
7. The intraoral scanning system (102) of claim 6, wherein the one or more processors (106) are further configured to:
The boundary (746) is determined based on a change in at least one of an intensity value or a density value of at least two projected objects of the plurality of pixels (722) above a change threshold.
8. The intraoral scanning system (102) of any preceding claim, wherein the projection object is one of a ray (724) or a cone (728).
9. The intraoral scanning system (102) of any preceding claim, wherein:
The handheld intraoral scanner (104) includes a projector unit (214C) configured to illuminate a subject's teeth (306) with one or more white wavelength pulses and one or more Near Infrared (NIR) wavelength pulses, and
The one or more sensors (214D) are configured to generate a set of white light images (116) and the plurality of 2D NIR images (114) based on the illumination.
10. The intraoral scanning system (102) of claim 9, wherein the 3D surface model (118) is determined based on the set of white light images (116).
11. The intraoral scanning system (102) of any preceding claim, wherein the one or more processors (106) are further configured to:
a relative position between the handheld intraoral scanner (104) and a subject's teeth (306) corresponding to each of the plurality of NIR images (114) is estimated, wherein the estimated relative position is indicative of perspective information and spatial position information of the corresponding projection object.
12. The intraoral scanning system (102) of any preceding claim, wherein the projection object is a cone (728), the one or more processors further configured to:
Determining (802) one or more truncated cones (736) for each cone corresponding to each of the plurality of pixels (722), wherein the one or more truncated cones are associated with the plurality of point coordinates (732);
Determining (804) an integrated positional encoding of the one or more truncated cones for transforming a plurality of point coordinates of each cone, the integrated positional encoding comprising at least a Gaussian encoding and a sinusoidal encoding, and
A composite pixel value for each cone is generated (806) based on the integrated positional encoding of the corresponding one or more truncated cones and the continuous volume machine learning model (108).
13. The intraoral scanning system (102) of claim 12, wherein the one or more processors (106) are further configured to:
A radius of each cone (728) indicating the projection object corresponding to each of the plurality of pixels (722) is determined based on the size of the corresponding pixel in the plurality of pixels.
14. The intraoral scanning system (102) of claim 12, wherein the one or more processors (106) are further configured to:
determining an average value of content within a visible volume (734) of each of the plurality of pixels (722), wherein the content is indicative of color intensity, and
Each cone (728) indicating the projection object for each of the plurality of pixels is rendered based on the average of the corresponding content.
15. The intraoral scanning system (102) of any preceding claim, wherein the continuous volume machine learning model (108) is a machine learning based neural radiance field (NeRF) network.