WO2025027583A1 - Method and system for identifying a medical disposable product - Google Patents
- Publication number
- WO2025027583A1 (PCT/IB2024/057513)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- camera
- video stream
- computer
- text
- image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/63—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/67—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Definitions
- the present technology pertains to the field of object identification, and more particularly to the identification of medical disposable products.
- a computer- implemented method for identifying a medical disposable product comprising: receiving from a camera a video stream of an area of interest; detecting a presence of an object within the area of interest; determining that an image capture procedure is to be performed; providing for display a user interface for guiding a user to position the object at a desired position relative to the camera; when the object is at the desired position, capturing an image of the object; extracting packaging data from the image of the object; identifying the object as being a given medical disposable product based on the packaging data; and outputting the identification of the given medical disposable product.
- the method further comprises correcting the video stream.
- the step of correcting the video stream comprises: measuring a luminance of a space surrounding the camera; comparing the measured luminance to a luminance threshold; and when the measured luminance is below the luminance threshold, performing said correcting the video stream.
- the step of correcting the video stream comprises adjusting at least one of a frame rate, a resolution, an aspect ratio and a contrast.
- the step of correcting the video stream comprises applying a white balance method to the video stream.
- the step of detecting the presence of the object is performed by detecting differences between successive frames of the video stream.
- the step of determining that the image capture procedure is to be performed comprises determining that the object is moving closer to the camera and towards a center of the camera.
- the step of determining that the image capture procedure is to be performed further comprises determining that a speed of motion of the object decreases.
- the method further comprises providing the user with a feedback once said capturing the image of the object is performed.
- the step of extracting the packaging data is performed using at least one of optical character recognition, a machine learning model configured for extracting text, text position and text size, and a computer vision model.
- a system for identifying a medical disposable product comprising: a processor; a non- transitory storage medium operatively connected to the processor, the non-transitory storage medium comprising computer-readable instructions; the processor, upon executing the instructions, being configured to: receiving from a camera a video stream of an area of interest; detecting a presence of an object within the area of interest; determining that an image capture procedure is to be performed; providing for display a user interface for guiding a user to position the object at a desired position relative to the camera; when the object is at the desired position, capturing an image of the object; extracting packaging data from the image of the object; identifying the object as being a given medical disposable product based on the packaging data; and outputting the identification of the given medical disposable product.
- the processor is further configured for correcting the video stream.
- said correcting the video stream comprises: measuring a luminance of a space surrounding the camera; comparing the measured luminance to a luminance threshold; and when the measured luminance is below the luminance threshold, performing said correcting the video stream.
- said correcting the video stream comprises adjusting at least one of a frame rate, a resolution, an aspect ratio and a contrast.
- said correcting the video stream comprises applying a white balance method to the video stream.
- said detecting the presence of the object is performed by detecting differences between successive frames of the video stream.
- said determining that the image capture procedure is to be performed comprises determining that the object is moving closer to the camera and towards a center of the camera.
- said determining that the image capture procedure is to be performed further comprises determining that a speed of motion of the object decreases.
- the processor is further configured for providing the user with a feedback once said capturing the image of the object is performed.
- said extracting the packaging data is performed using at least one of optical character recognition, a machine learning model configured for extracting text, text position and text size, and a computer vision model.
- a “server” is a computer program that is running on appropriate hardware and is capable of receiving requests (e.g., from computing devices) over a network (e.g., a communication network), and carrying out those requests, or causing those requests to be carried out.
- the hardware may be one physical computer or one physical computer system, but neither is required to be the case with respect to the present technology.
- a “server” is not intended to mean that every task (e.g., received instructions or requests) or any particular task will have been received, carried out, or caused to be carried out, by the same server (i.e., the same software and/or hardware); it is intended to mean that any number of software elements or hardware devices may be involved in receiving/sending, carrying out or causing to be carried out any task or request, or the consequences of any task or request; and all of this software and hardware may be one server or multiple servers, both of which are included within the expressions “at least one server” and “a server”.
- a “computing device” is any computing apparatus or computer hardware that is capable of running software appropriate to the relevant task at hand.
- computing devices include general purpose personal computers (desktops, laptops, netbooks, etc.), mobile computing devices, smartphones, and tablets, and network equipment such as routers, switches, and gateways.
- a computing device in the present context is not precluded from acting as a server to other computing devices.
- the use of the expression “a computing device” does not preclude multiple computing devices being used in receiving/sending, carrying out or causing to be carried out any task or request, or the consequences of any task or request, or steps of any method described herein.
- a “client device” refers to any of a range of end-user client computing devices, associated with a user, such as personal computers, tablets, smartphones, and the like.
- a “computer readable storage medium”, also referred to as “storage medium” and “storage”, is intended to include non-transitory media of any nature and kind whatsoever, including without limitation RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drives, etc.), USB keys, solid-state drives, tape drives, etc.
- a plurality of components may be combined to form the computer information storage media, including two or more media components of a same type and/or two or more media components of different types.
- a “database” is any structured collection of data, irrespective of its particular structure, the database management software, or the computer hardware on which the data is stored, implemented or otherwise rendered available for use.
- a database may reside on the same hardware as the process that stores or makes use of the information stored in the database or it may reside on separate hardware, such as a dedicated server or plurality of servers.
- the expression “information” includes information of any nature or kind whatsoever capable of being stored in a database.
- information includes, but is not limited to, audiovisual works (images, movies, sound records, presentations, etc.), data (location data, numerical data, etc.), text (opinions, comments, questions, messages, etc.), documents, spreadsheets, lists of words, etc.
- an “indication” of an information element may be the information element itself or a pointer, reference, link, or other indirect mechanism enabling the recipient of the indication to locate a network, memory, database, or other computer-readable medium location from which the information element may be retrieved.
- an indication of a document could include the document itself (i.e. its contents), or it could be a unique document descriptor identifying a file with respect to a particular file system, or some other means of directing the recipient of the indication to a network location, memory address, database table, or other location where the file may be accessed.
- the degree of precision required in such an indication depends on the extent of any prior understanding about the interpretation to be given to information being exchanged as between the sender and the recipient of the indication. For example, if it is understood prior to a communication between a sender and a recipient that an indication of an information element will take the form of a database key for an entry in a particular table of a predetermined database containing the information element, then the sending of the database key is all that is required to effectively convey the information element to the recipient, even though the information element itself was not transmitted as between the sender and the recipient of the indication.
- the expression “communication network” is intended to include a telecommunications network such as a computer network, the Internet, a telephone network, a Telex network, a TCP/IP data network (e.g., a WAN network, a LAN network, etc.), and the like.
- the term “communication network” includes a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared and other wireless media, as well as combinations of any of the above.
- the words “first”, “second”, “third”, etc. have been used as adjectives only for the purpose of allowing for distinction between the nouns that they modify from one another, and not for the purpose of describing any particular relationship between those nouns.
- the use of the terms “first server” and “third server” is not intended to imply any particular order, type, chronology, hierarchy or ranking (for example) of/between the servers, nor is their use (by itself) intended to imply that any “second server” must necessarily exist in any given situation.
- reference to a “first” element and a “second” element does not preclude the two elements from being the same actual real-world element.
- a “first” server and a “second” server may be the same software and/or hardware, in other cases they may be different software and/or hardware.
- Implementations of the present technology each have at least one of the above-mentioned objects and/or aspects, but do not necessarily have all of them. It should be understood that some aspects of the present technology that have resulted from attempting to attain the above-mentioned object may not satisfy this object and/or may satisfy other objects not specifically recited herein.
- Fig. 1 is a block diagram illustrating a system for identifying a disposable medical product, in accordance with an embodiment;
- Fig. 2 is a flow chart illustrating a method for identifying a disposable medical product, in accordance with an embodiment;
- Fig. 3 illustrates an exemplary capture procedure;
- Fig. 4 illustrates an exemplary user interface to be displayed to a user for guiding the user to position a disposable medical product at a desired position relative to a camera;
- Fig. 5 illustrates a first exemplary package for a disposable medical product;
- Fig. 6 illustrates a second exemplary package for a disposable medical product;
- Fig. 7 illustrates a third exemplary package for a disposable medical product; and
- Fig. 8 depicts a schematic diagram of a computing device in accordance with one or more non-limiting implementations of the present technology.
- for any functional block labeled as a "processor" or a "graphics processing unit", the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared.
- the processor may be a general purpose processor, such as a central processing unit (CPU) or a processor dedicated to a specific purpose, such as a graphics processing unit (GPU).
- processor or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage.
- Other hardware, conventional and/or custom, may also be included.
- Fig. 1 illustrates one embodiment of a system 10 for identifying medical disposable products.
- the system 10 may be installed into a medical room such as a surgical room to track usage of medical disposable products 12 during a medical procedure such as a surgery.
- the system 10 comprises a capture device 14 and optionally a remote server 16 for identification of a medical disposable product.
- the server 16 may be seen as a fallback device in the event the capture device 14 would be unsuccessful in identifying a medical disposable product 12.
- the capture device 14 comprises at least a processing unit 20, a storing unit 22, a communication interface or communication means 24, a camera 26 and a display unit 28.
- the server 16 comprises at least a processing unit 30, a storing unit 32 and a communication interface or communication means 34.
- the capture device 14 and the server 16 are communicatively connected together via a communication network for example.
- the communication network is the Internet.
- the communication network may be implemented as any suitable local area network (LAN), wide area network (WAN), a private communication network or the like. It will be appreciated that implementations for the communication network are for illustration purposes only. How a communication link between the capture device 14, the server 16, and/or another computing device (not shown) and the communications network is implemented will depend inter alia on how each computing device is implemented.
- the camera 26 of the capture device 14 monitors an area of interest. Any object entering the monitored area of interest is detected and the object is tracked to determine whether an image capture procedure should be executed. If it is determined that the image capture procedure is to be executed, a graphical user interface (GUI) is displayed on the display unit 28 for guiding the user to position the object at a desired position relative to the camera 24. Once the object is at the desired position, at least one image of the object is captured. If the captured object is a medical disposable product 12, the medical disposable product 12 is identified based on the captured images of the object. First, a local identification is executed on the capture device 14.
- the system 10 is used by a medical professional or practitioner to identify the medical disposable products that are consumed or used during the medical procedure.
- the medical practitioner scans the medical disposable product 12 by placing it in front of the camera 24 so that the capture device 14 may identify the medical disposable product 12 that is to be used.
- the capture device 14 may provide the medical practitioner with feedback such as visual feedback to indicate that the medical disposable product 12 was successfully identified. The medical practitioner may then use the medical disposable product 12.
- While in the present description the medical disposable product 12 is placed in front of the camera 24 before it is used, it will be understood that the medical disposable product 12 or its packaging may be placed in front of the camera 24 for identification once it has been used.
- Fig. 2 illustrates one embodiment of a computer-implemented method 100 for identifying a medical disposable product such as the medical disposable product 12. It should be understood that the method 100 is executed by a computer machine comprising at least one processing unit, a storing unit and communication means such as by the capture device 14 and optionally the server 16. While in the following the method 100 is described with reference to the system 10, it should be understood that the method 100 can be executed by any other adequate system.
- a video stream of an area of interest is received.
- the camera 24 monitors the area of interest and generates a video stream which is transmitted to the processing unit 20.
- the video stream comprises a series of successive or temporally ordered frames each corresponding to an image of the area of interest at a respective time.
- the video stream may comprise time-stamped frames or images.
- the method 100 further comprises a step of correcting the video stream received at step 102.
- the luminance of the surrounding space, i.e., the space surrounding the capture device 14 or the camera 24, is measured.
- the system 10 further comprises a light sensor (not shown) for measuring the luminance of the surrounding space.
- the processing unit 20 receives the measured luminance from the light sensor and compares the measured luminance to a luminance threshold. If the measured luminance is less than the luminance threshold, the processing unit 20 corrects the frames of the video stream based on the measured luminance to obtain clearer images and facilitate the object detection and identification. In one embodiment, the correction of the frames is performed substantially concurrently while the video frames are received.
- the correction of the frames includes the adjustment of the frame rate, the resolution, the aspect ratio and/or the contrast.
- the processing unit 20 applies a white balance method to the frames of the received video stream to compensate for the low light intensity.
- the Gray World method may be used to adjust the average values of the R, G, and B components of the frames to a common grey value.
- the video frames correspond to “green images”, i.e., the Green component in the images is dominant.
- the processing unit 20 may be configured for compensating for the Red and Blue components in order for them to become in the same range of the Green component.
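As an illustration only (not part of the claimed method), the luminance check and Gray World correction described above could be sketched as follows in Python with OpenCV; the threshold value, gain factors and function names are assumptions made for this example.

```python
import cv2
import numpy as np

LUMINANCE_THRESHOLD = 80.0  # illustrative threshold; the actual value is implementation-specific


def gray_world_balance(frame_bgr: np.ndarray) -> np.ndarray:
    """Scale the B, G and R channels so their averages converge to a common grey value."""
    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    grey = (b.mean() + g.mean() + r.mean()) / 3.0
    # Green-dominant frames are compensated by boosting the weaker R and B channels.
    b *= grey / max(b.mean(), 1e-6)
    g *= grey / max(g.mean(), 1e-6)
    r *= grey / max(r.mean(), 1e-6)
    return np.clip(cv2.merge([b, g, r]), 0, 255).astype(np.uint8)


def correct_frame(frame_bgr: np.ndarray, measured_luminance: float) -> np.ndarray:
    """Correct a video frame only when the measured room luminance is below the threshold."""
    if measured_luminance >= LUMINANCE_THRESHOLD:
        return frame_bgr
    balanced = gray_world_balance(frame_bgr)
    # One example of the additional adjustments mentioned above (contrast/brightness).
    return cv2.convertScaleAbs(balanced, alpha=1.2, beta=10)
```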
- the frames of the video stream are analysed by the processing unit 20 to detect the presence of an object 12 within the area of interest monitored by the camera 26, at step 104. It should be understood that any adequate method for detecting the presence of an object 12 within an area of interest from a video stream may be used.
- the processing unit 20 compares successive frames of the video stream to determine any difference between them. When the differences between successive frames exceed a given threshold, the processing unit 20 determines that an object 12 is present within the monitored area of interest.
- the processing unit 20 is configured for analyzing the pixels of the video frames and detecting a group of similar pixels (in proximity, brightness and color) that is different from the expected background.
- the grouping of similar pixels may be determined by looking at brightness and colour characteristics (i.e., the boundaries of the pixel group are determined based on the difference between the expected pixels from the undisturbed background and the new pixels in the field of view of the camera 24). The group of similar pixels is then considered as corresponding to an object entering the field of view of the camera 24.
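A minimal sketch of the frame-differencing detection described above, using OpenCV; the pixel-difference and area thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

DIFF_THRESHOLD = 25        # per-pixel intensity change treated as "different" (assumption)
MIN_CHANGED_PIXELS = 5000  # minimum size of the changed pixel group (assumption)


def object_entered(prev_frame: np.ndarray, curr_frame: np.ndarray) -> bool:
    """Detect an object entering the area of interest by differencing successive frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, changed = cv2.threshold(diff, DIFF_THRESHOLD, 255, cv2.THRESH_BINARY)
    # A sufficiently large group of changed pixels is treated as an object in the field of view.
    return int(np.count_nonzero(changed)) > MIN_CHANGED_PIXELS
```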
- the processing unit 20 determines that an image capture procedure is to be executed, i.e., it determines that the medical practitioner intends to scan the object 12 for identification. To do so, the processing unit 20 tracks the movement of the object 12 detected at step 104 within the area of interest by analyzing the frames of the received video stream and determines whether an image capture procedure should be performed based on the movement of the object 12 within the monitored area of interest.
- the processing unit 20 determines that an image capture procedure is to be executed when it is determined that the object 12 is moving closer to the camera 24 and towards the center of the camera 24, i.e., towards the center of the monitored area of interest. It should be understood that any adequate method for tracking the movement of the detected object 12 may be used.
- successive frames of the video stream are analyzed, and dense optical flow is used to determine whether the detected object 12 is moving and if so, determine the direction of the motion. Contour detection is used to track the shape of the object 12 as well as calculate the density of the pixels (color histogram and luminosity) corresponding to the object 12 between successive frames.
- the surface area, shape and direction of the group of pixels corresponding to the object 12 is determined over successive frames.
- when the surface area of the group of pixels grows over the successive frames (which indicates that the object 12 is moving closer to the camera 24) and the movement of the group of pixels is towards the centerline of the field of view of the camera 24, i.e., is towards the center of the monitored area of interest, the processing unit 20 determines that an image capture procedure should be executed.
- the processing unit 20 determines that the object 12 is moving towards the centerline of the camera 24 by comparing the position of the center of the group of pixels corresponding to the object 12 to the position of the centerline of the camera 24 over the successive video frames. When, from one video frame to another successive video frame, the distance between the center of the group of pixels and the centerline of the camera 24 decreases, the processing unit 20 determines that the object 12 is moving towards the centerline of the camera 24.
- a third condition has to be met in addition to the increase of the surface area of the group of pixels and the motion of the group of pixels towards the centerline of the camera to determine that the image capture procedure should be executed.
- This third condition is related to the speed of motion of the group of pixels, i.e., the relative speed of the group of pixels must slow down as the group of pixels nears the centerline of the camera 24.
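The three conditions (growing pixel group, motion toward the centerline, decreasing speed) could be checked as in the following sketch; the data structure and frame-history length are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedGroup:
    area: float      # number of pixels in the group for a given frame
    center_x: float  # center of the group, in pixels
    center_y: float


def should_start_capture(history: list, frame_center: tuple) -> bool:
    """Trigger the image capture procedure when, over the last three frames, the group
    grows, approaches the camera centerline, and its speed of approach decreases."""
    if len(history) < 3:
        return False
    cx, cy = frame_center
    last = history[-3:]
    areas = [g.area for g in last]
    dists = [math.hypot(g.center_x - cx, g.center_y - cy) for g in last]
    growing = areas[0] < areas[1] < areas[2]        # object moving closer to the camera
    approaching = dists[0] > dists[1] > dists[2]    # object moving towards the centerline
    slowing = (dists[1] - dists[2]) < (dists[0] - dists[1])  # approach speed decreases
    return growing and approaching and slowing
```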
- Fig. 3 illustrates the motion of a group of pixels corresponding to an object such as object 12 that is indicative that an image capture procedure is to be performed.
- at a time t1, a first video frame is received.
- a group of similar pixels is detected at a first position relative to the camera.
- a second video frame is received.
- the comparison between the first and second video frames indicates that the size of the detected group of pixels has increased (i.e., the number of pixels belonging to the group increased) which indicates that the object is moving closer to the camera and is located at a second position relative to the camera, and that the group of pixels moved fast towards the centerline of the camera.
- a third video frame is received.
- the comparison between the second and third video frames indicates that the size of the detected group of pixels has further increased which indicates that the object further moved closer to the camera and is now at a third position relative to the camera, and that the group of pixels further moved towards the centerline of the camera but at a lower speed.
- a fourth video frame is received.
- the comparison between the third and fourth video frames indicates that the size of the detected group of pixels has slightly increased which indicates that the object slightly moved closer to the camera, and that the group of pixels further moved towards the centerline of the camera but at an even lower speed so as to be located at a fourth position relative to the camera.
- the center of the group of pixels is close to the centerline of the camera.
- a graphical user interface is provided for display to help the medical practitioner position the object at a desired position relative to the camera 24 at step 108.
- the processing unit 20 generates the user interface and transmits the user interface to the display unit 28 for display thereon.
- Fig. 4 illustrates an exemplary user interface 150 which comprises a first section 152 in which the video stream received from the camera 24 is inserted for display, thereby allowing the medical practitioner to see the images captured by the camera 24. This further allows the medical practitioner to adequately position the object relative to the camera.
- the user interface may further comprise a second section such as section 154 in which information about previously identified disposable products is inserted.
- the user interface is substantially always displayed during the execution of the method 100.
- the appearance of the user interface may be changed upon determination that the image capture procedure is to be executed.
- the size of the section 152 of the interface 150 may be increased upon determination that the image capture procedure is to be executed while the size of the section 154 may be decreased, or the section 154 may be deleted from the user interface.
- the medical practitioner may position the object 12 at a desired position relative to the camera 24. It should be understood that the desired position may comprise a range of desired positions.
- the section 152 of the interface 150 comprises visual markers to help the medical practitioner position the object at the desired position relative to the camera.
- the section 152 may comprise four L-shaped markers positioned so as to form the corners of a square or rectangle for visually indicating to the medical practitioner where to position the object 12 relative to the camera 24, e.g., the object 12 has to be positioned relative to the camera 24 so that its corners substantially align with the L-shaped markers within the section 152 of the interface 150.
- the color of the L-shaped markers may change to indicate that the object 12 approaches the desired position. For example, the color of the L-shaped markers may change from yellow to green as the object 12 is getting closer to the camera 24.
- At least one image of the object 12 is captured or taken at step 110.
- the capture of the image comprises selecting at least one of the video frames.
- the capture of the image comprises taking a screenshot of at least a portion of the user interface such as section 152 of the interface 150.
- feedback is provided to the medical practitioner once the object 12 has been captured.
- the system 10 may further comprise a speaker and the processing unit 20 may provide the medical practitioner with audio feedback indicative that an image of the object 12 has been captured.
- the feedback provided to the user comprises visual feedback.
- the displayed interface may blink to indicate to the medical practitioner that the capture procedure has been performed.
- packaging data is extracted by the processing unit 20 from the captured image of the object 12 at step 112.
- the packaging data may comprise any data present on a package of a medical disposable product.
- the packaging data may comprise a code such as a barcode, a QR code and/or a data matrix code, logos, and text such as a symbol, product name, vendor markings and specific product descriptions (e.g., dimensions, etc.). It should be understood that any adequate method may be used for extracting the packaging data from the captured image.
- Figs. 5-7 illustrate different packages of medical disposable products on which different information is present.
- the information may comprise, text such as company names, product names, reference numbers, size, dimensions, group of text/words, identifiers, text encoded in a barcode, etc., codes such as barcodes and QR codes, symbols, logos, etc.
- optical character recognition (OCR) may be used for extracting text, and image search techniques may be used for extracting logos and codes.
- a pre-trained machine learning (ML) model such as Google ML Kit or Apple Vision Framework is used for extracting text and the size and position of bounding boxes containing text from the image
- a pre-trained computer vision model such as OpenCV is used for extracting codes and logos from the image.
- the packaging data extraction is performed in two steps. First, text is extracted from the captured image via OCR and the extracted text is analyzed to determine whether it conforms to text usually found on a package of a medical disposable product. If the extracted text corresponds to text usually present on the packaging of a medical disposable product, it is then determined that the object 12 is a medical disposable product enclosed in a packaging. Then a code such as a barcode, QR code or data matrix code is searched for within the captured image and extracted, using a pre-trained code computer vision model for example, in order to identify the disposable product.
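A sketch of this two-step extraction, under the assumption that Tesseract (via pytesseract) performs the OCR and OpenCV's QR detector stands in for the code extraction; the keyword list is purely illustrative.

```python
import cv2
import pytesseract

# Words commonly printed on medical disposable packaging (illustrative assumption).
PACKAGING_KEYWORDS = {"sterile", "lot", "ref", "single use", "use by", "latex"}


def extract_packaging_data(image_bgr) -> dict:
    """Step 1: OCR the text and check it looks like medical packaging.
    Step 2: if so, search the image for a machine-readable code (QR code here)."""
    text = pytesseract.image_to_string(image_bgr).lower()
    is_packaging = any(keyword in text for keyword in PACKAGING_KEYWORDS)

    code = None
    if is_packaging:
        code, _, _ = cv2.QRCodeDetector().detectAndDecode(image_bgr)

    return {"is_medical_packaging": is_packaging, "text": text, "code": code or None}
```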
- the following elements are used to determine whether the object 12 corresponds to a disposable medical product:
- the processing unit 20 identifies the medical disposable product based on the extracted packaging data at step 114.
- the processing unit 20 determines the ID of the medical disposable product based on the extracted code. For example, the processing unit 20 may access a database in which reference codes and a respective ID of medical disposable product for each code are stored. The database may be stored locally on the storing unit 22 or on a remote server for example.
- in embodiments for which a code was not successfully extracted at step 112, different methods may be used for trying to identify the medical disposable product. For example, the package of some medical disposable products has printed thereon an alphanumeric string that corresponds to a code.
- the database may comprise predefined alphanumeric strings that are each associated with a respective ID of medical disposable product and the processing unit 20 is configured for identifying alphanumeric strings from the information extracted at step 112.
- the processing unit 20 compares any alphanumeric string identified from the extracted information to the predefined alphanumeric strings stored in the database and if an identified alphanumeric string corresponds to a predefined alphanumeric string stored in the database, the processing unit 20 identifies the object as being the disposable medical product associated with the corresponding predefined alphanumeric string.
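A sketch of matching extracted alphanumeric strings against predefined strings in the database; the regular expression, table and column names are assumptions made for this example.

```python
import re
import sqlite3

# Loose pattern for candidate alphanumeric product strings (illustrative assumption).
ALPHANUM_PATTERN = re.compile(r"\b[A-Z0-9][A-Z0-9\-]{4,}\b")


def identify_by_string(extracted_text: str, db_path: str):
    """Return the product ID whose predefined alphanumeric string matches the extracted text."""
    candidates = ALPHANUM_PATTERN.findall(extracted_text.upper())
    with sqlite3.connect(db_path) as conn:
        for candidate in candidates:
            row = conn.execute(
                "SELECT product_id FROM reference_strings WHERE string = ?",
                (candidate,),
            ).fetchone()
            if row:
                return row[0]  # ID of the matching medical disposable product
    return None  # fall back to other identification methods
```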
- some packages of disposable medical products have printed thereon a manufacturer catalog or reference number that is uniquely associated with a respective disposable medical product.
- the database may comprise predefined manufacturer reference numbers that are each associated with a respective ID of medical disposable product and the processing unit 20 is configured for identifying manufacturer reference numbers from the information extracted at step 112. The processing unit 20 then compares any manufacturer reference number identified from the extracted information to the predefined manufacturer reference numbers stored in the database and if an identified manufacturer reference number corresponds to a predefined manufacturer reference number stored in the database, the processing unit 20 identifies the object as being the disposable medical product associated with the corresponding predefined manufacturer reference number.
- the processing unit 20 is further configured for determining the name of the manufacturer from the information extracted at step 112. For example, the name of a manufacturer may be identified from the extracted information using elements such as the text size, the text color, superscript symbols, etc.
- the processing unit 20 is configured for comparing the extracted manufacturer reference number to the predefined manufacturer reference numbers stored in the database. If there is a match, the processing unit is further configured for comparing the manufacturer name extracted from the image to the manufacturer name associated with the manufacturer reference number identified in the database. If there is a match, then the object is identified as being the disposable medical product associated with the manufacturer reference number identified from the database.
- all visible text is extracted at step 112.
- the extracted text is classified according to parameters such as text size, text position, text color, symbols, logos and/or the like.
- a search query is performed in the database to look for a disposable medical product of which the associated text and text parameters match the extracted text and parameters, and thereby identify the disposable medical product.
- the ID of the medical disposable product is outputted at step 116.
- the ID of the medical disposable product may be stored in the storing unit 22.
- the ID of the medical disposable product may be sent to any other capture devices linked to the capture device 14, such as capture devices present in the same room as the capture device 14.
- the ID of the medical disposable product may also be added in the section 154 of the user interface 150 to be displayed to the medical practitioner.
- the method 100 further comprises a step of receiving the identification of the medical procedure being performed during the execution of the method 100.
- different databases each specific to a respective medical procedure may be accessible by the processing unit 20.
- a database specific to a given medical procedure comprises reference codes of medical disposable products that are used during the given medical procedure.
- the processing unit 20 determines the specific database to be accessed based on the ID of the medical procedure.
- the step of receiving the identification of the medical procedure comprises receiving an identification of a medical room and accessing a time schedule associated with the medical room.
- the time schedule is indicative of the type of medical procedure to be performed in the medical room for each date and time.
- while the identification step 114 is performed locally on the capture device 14 in the illustrated embodiment, other embodiments are possible.
- the identification of the medical disposable product by the capture device 14 may be unsuccessful.
- the capture device 14 transmits the captured image to the server 16 in order to identify the medical disposable product.
- the identification of the medical disposable product by the capture device 14 is unsuccessful when a code extracted from the captured image was not identified by the capture device 14, when text extracted from the captured image was unsuccessful in identifying the medical disposable product and/or when information extracted from the captured image could not be matched to a medical disposable product.
- the server 16 receives the captured image from the capture device 14 and the processing unit 30 is configured for extracting packaging information such as text content and metadata from the received captured image.
- the processing unit 30 is configured for extracting packaging information such as: size and position of text bounding boxes, text included in each text bounding box, package colors, estimated package dimensions, text font size, text language, text case sensitivity, text color, text capitalization, symbols such as trademarks and copyright, logos, Global Trade Item Number (GTIN) encoded string, dates such as expiry dates, codes such as barcodes, QR codes, data matrix codes, and/or the like.
- the processing unit 30 uses a pre-trained ML computer vision model such as Google Vision™ to extract the above-described information.
- the processing unit 30 is further configured for accessing a database containing a plurality of reference medical disposable products, and for each reference medical disposable product, reference packaging information and a weight determined based on metadata associated with the packaging information.
- the different metadata associated with each disposable medical product stored in the database may comprise the following:
- the size of the text may refer to the relative text size. More weight may be given to larger text.
- the text may be classified according to three predefined sizes such as small, medium and large.
- More weight may be given to text positioned at predefined locations. For example, text located at the top of a package may be given more weight.
- More weight may be given to text having a color different from that of the majority of the text. For example, if all of the extracted text is black except for some words that are in red, more weight will be given to the red text than to the black text.
- more weight may be given to text containing dimensions and/or sizes such as text containing units of measurement in comparison to the rest of the text.
- the processing unit 30 identifies the given medical disposable product associated with the received captured image by analyzing similarities between the packaging information extracted from the received captured image and the packaging information associated with the reference packages. To do so, the processing unit 30 accesses the database and uses the packaging information extracted from the captured image.
- a search request consisting of a data structure that contains text tokens along with the relative size, position and color may be generated and the search request is used for searching the database.
- the data structure may be further processed to identify brand names, manufacturers and sizes/dimensions.
- the tokens are then matched against the search index. Tokens that match more important text will be ranked higher.
- An ordered list of products with corresponding confidence scores is returned.
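The weighted token search could be sketched as below; the weighting factors, token fields and reference-data layout are illustrative assumptions reflecting the metadata rules described above.

```python
from dataclasses import dataclass


@dataclass
class TextToken:
    text: str
    relative_size: str  # "small", "medium" or "large"
    position: str       # e.g. "top", "middle", "bottom"
    color: str          # e.g. "black", "red"


SIZE_WEIGHT = {"small": 1.0, "medium": 2.0, "large": 3.0}      # larger text weighs more
POSITION_WEIGHT = {"top": 2.0, "middle": 1.0, "bottom": 1.0}   # top-of-package text weighs more


def token_weight(token: TextToken, dominant_color: str = "black") -> float:
    weight = SIZE_WEIGHT.get(token.relative_size, 1.0) * POSITION_WEIGHT.get(token.position, 1.0)
    if token.color != dominant_color:
        weight *= 1.5  # text in an unusual color stands out on the package
    return weight


def rank_products(query_tokens, reference_products: dict):
    """Score each reference product by the summed weight of matching tokens and return
    an ordered list of (product_id, score) pairs, highest score first."""
    scores = []
    for product_id, reference_terms in reference_products.items():
        score = sum(token_weight(t) for t in query_tokens if t.text.lower() in reference_terms)
        if score > 0:
            scores.append((product_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```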
- the server 16 transmits the identification of the medical disposable product to the capture device 14.
- the capture device 14 may then store the received identification into the storing unit 22 and insert the identification into the section 154 of the interface 150.
- the processing unit 30 executes a Bayesian probability model against reference products used historically for the same medical procedure to obtain a confidence score for the identification, and thereby increase the confidence in the identification of the disposable medical product.
- Known products used in specific procedures provide the base contextual knowledge (e.g. staplers and staples for thoracic procedures).
- This base knowledge is augmented by the captured data to train Bayesian probabilistic models with a focus on the conditional relationships between products, between products and procedures and between products, procedures and doctors.
- the model is deployed to generate real-time probabilistic distributions of disposable medical products expected to be used based on the procedure and the doctor. As new products are identified and relationships between captured products become known, the models’ probabilistic distributions are updated.
- An ongoing interpretation processor running on the processing unit uses these probabilistic distributions to calculate a confidence score for each identified product, and low confidence score products are automatically flagged for manual review.
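A simplified sketch of how historical procedure/product usage can act as a Bayesian-style prior that re-weights the recognition scores into confidence values; the prior floor and normalisation are assumptions, not the disclosed model.

```python
from collections import Counter


def procedure_priors(history):
    """Estimate P(product | procedure) from historical (procedure, product) records."""
    counts = {}
    for procedure, product in history:
        counts.setdefault(procedure, Counter())[product] += 1
    return {proc: {prod: n / sum(c.values()) for prod, n in c.items()}
            for proc, c in counts.items()}


def contextual_confidence(recognition_scores: dict, procedure: str, priors: dict) -> dict:
    """Weight each candidate's recognition score by its contextual prior and normalise;
    candidates whose resulting confidence is low can be flagged for manual review."""
    proc_priors = priors.get(procedure, {})
    weighted = {p: s * proc_priors.get(p, 0.01) for p, s in recognition_scores.items()}
    total = sum(weighted.values()) or 1.0
    return {p: w / total for p, w in weighted.items()}
```

For example, a stapler recognized during a thoracic procedure in which staplers are historically common would end up with a higher confidence than the same recognition score obtained during a procedure where staplers are rarely used.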
- the image of the captured object is provided for display for manual identification.
- the server 16 can be implemented as a conventional computer server and may comprise at least some of the features of the computing device 200 shown in Figure 8.
- the server 16 is implemented as a server running an operating system (OS).
- the server 16 may be implemented in any suitable hardware and/or software and/or firmware or a combination thereof.
- the server 16 is a single server.
- the functionality of the server 16 may be distributed and may be implemented via multiple servers (not shown).
- the server 16 comprises a communication interface (not shown) configured to communicate with various entities (such as a database, for example and other devices potentially coupled to a communication network) via a communication network.
- the server 16 further comprises at least one computer processor (e.g., the processing device of the computing device 200) operationally connected with the communication interface and structured and configured to execute various processes to be described herein.
- the capture device 14 is provided with a plurality of cameras each located for monitoring a different and respective area of interest, or the same area of interest.
- the capture device 14 may comprise three cameras, e.g., a facing-up camera, a facing-down camera and a camera angled relative to the horizontal plane.
- additional image stream processing may be added to support dynamic changes in camera position between two successive package recognition operations.
- the additional processing may allow a product to be better detected and recognized by eliminating background features from an image in order to more reliably identify when an object enters the camera field of view, i.e., the area of interest.
- a substantially continuous recalibration is performed to support changes in the position of the camera such as when a background change is detected, a recalibration to the new background is performed.
- the structural similarity index method may be used to determine the similarity between a series of time-stamped frames of the received video in order to establish common feature points between the frames that can be removed. Once the feature points are removed from the frames, the disposable medical product may be identified. Such an embodiment may reduce the risk of false positives.
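A sketch of an SSIM-based background check using scikit-image; the similarity threshold is an illustrative assumption.

```python
import cv2
from skimage.metrics import structural_similarity


def background_changed(reference_bg, current_frame, threshold: float = 0.85) -> bool:
    """Compare the current frame to the stored background; a low SSIM score suggests the
    camera has moved and the background reference should be recalibrated."""
    ref_gray = cv2.cvtColor(reference_bg, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY)
    score = structural_similarity(ref_gray, cur_gray)
    return score < threshold
```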
- at least two different recognition techniques are performed concurrently or in parallel to identify the medical disposable product. For example, the three following recognition methods may be performed in parallel.
- the first recognition technique consists of a machine learning model trained to match specific extracted metadata, which processes the extracted metadata to identify the medical disposable product based on the relevance of the text snippets in the context of medical packaging.
- the second recognition technique consists of a vector search match against reference document embeddings of the full reference text from the packaging label.
- the process may comprise the following steps: embedding generation, indexing, query embedding, nearest neighbor search and result ranking.
- each document is converted into an embedding, i.e., a numeric vector representation, by using a machine learning model.
- An embedding is an array of floating point numbers that represents a document in a numeric form. These embeddings are generated by models such as Sentence-BERT or CLIP.
- Indexing: the generated embeddings are stored in an index to efficiently retrieve and compare them during searches.
- Query Embedding: a search query for an inference image is converted into an embedding using the same machine learning model used for the embedding generation.
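The embedding generation, indexing, query embedding, nearest-neighbour search and result ranking steps could look roughly as follows; the model name and cosine-similarity index are assumptions made for this sketch.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model


def build_index(reference_texts):
    """Embedding generation + indexing: encode full reference packaging texts into vectors."""
    return model.encode(reference_texts, normalize_embeddings=True)


def search(query_text: str, index: np.ndarray, reference_ids, top_k: int = 5):
    """Query embedding, nearest-neighbour search and result ranking by cosine similarity."""
    query_vec = model.encode([query_text], normalize_embeddings=True)[0]
    similarities = index @ query_vec  # cosine similarity, since the vectors are normalised
    best = np.argsort(-similarities)[:top_k]
    return [(reference_ids[i], float(similarities[i])) for i in best]
```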
- the third recognition technique consists of a reverse image search, i.e., computer vision object recognition against trained models built from reference product images.
- the training involves a convolutional neural network (CNN) with supervised learning against product packaging images data sets. This may comprise the following steps: dataset preparation, model training and model validation and testing.
- Model Training: during this process, the model learns to make accurate predictions by iteratively adjusting its parameters based on the difference between its predictions and the actual target values, using optimization techniques to minimize the error.
- Validation Dataset: the model is evaluated on a separate validation set during training to monitor performance and prevent overfitting.
- Testing Dataset: after training, the final model is evaluated on a test set to assess its generalization performance.
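A compact sketch of the dataset split, supervised CNN training and held-out evaluation steps using PyTorch; the network architecture, split ratios and hyperparameters are illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split


def train_packaging_classifier(dataset, num_classes: int, epochs: int = 10):
    """Dataset preparation (train/validation/test split), model training and evaluation
    for a small CNN over product packaging images (assumes items are (image, label) pairs)."""
    n = len(dataset)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n - n_train - n_val])

    model = nn.Sequential(  # deliberately small CNN, for illustration only
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(32, num_classes),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for images, labels in DataLoader(train_set, batch_size=32, shuffle=True):
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
        model.eval()  # a validation pass over val_set would go here to monitor overfitting

    return model, test_set  # the held-out test set is used for the final evaluation
```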
- results from the three different recognition techniques are combined together with the context in which the images have been captured (e.g., the medical procedure being performed, the room ID, the medical practitioner ID, and/or the like) and a prediction model trained with data associated with the context determines the probability that the captured object is a given medical disposable product. Potential matches are ranked and outputted with an associated confidence score.
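One simple way to combine the three ranked result lists with the capture context is plain score fusion weighted by a contextual prior, as sketched below; the disclosed prediction model is trained on context data, so this is only a stand-in.

```python
def fuse_results(metadata_matches, vector_matches, image_matches, context_prior):
    """Sum the scores each technique assigns to a product, weight by the contextual prior
    (procedure/room/practitioner), and return a ranked list with confidence scores."""
    combined = {}
    for ranked in (metadata_matches, vector_matches, image_matches):
        for product_id, score in ranked:
            combined[product_id] = combined.get(product_id, 0.0) + score
    for product_id in combined:
        combined[product_id] *= context_prior.get(product_id, 1.0)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```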
- a computing device 200 suitable for use with some implementations of the present technology, the computing device 200 comprising various hardware components including one or more single or multi-core processors collectively represented by processor 210, a graphics processing unit (GPU) 211, a solid-state drive 220, a random-access memory 230, a display interface 240, and an input/output interface 250.
- Communication between the various components of the computing device 200 may be enabled by one or more internal and/or external buses 260 (e.g., a PCI bus, universal serial bus, IEEE 1394 “Firewire” bus, SCSI bus, Serial-ATA bus, etc.), to which the various hardware components are electronically coupled.
- the input/output interface 250 may be coupled to a touchscreen 290 and/or to the one or more internal and/or external buses 260.
- the touchscreen 290 may be part of the display.
- the touchscreen 290 is the display.
- the touchscreen 290 may equally be referred to as a screen 1290.
- the touchscreen 290 comprises touch hardware 294 (e.g., pressure-sensitive cells embedded in a layer of a display allowing detection of a physical interaction between a user and the display) and a touch input/output controller 292 allowing communication with the display interface 240 and/or the one or more internal and/or external buses 260.
- the input/output interface 250 may be connected to a keyboard (not shown), a mouse (not shown) or a trackpad (not shown) allowing the user to interact with the computing device 200 in addition or in replacement of the touchscreen 290.
- the solid-state drive 220 stores program instructions suitable for being loaded into the random-access memory 230 and executed by the processor 210 and/or the GPU 211 for [INSERT].
- the program instructions may be part of a library or an application.
- the computing device 200 may be implemented as a server, a desktop computer, a laptop computer, a tablet, a smartphone, a personal digital assistant or any device that may be configured to implement the present technology, as it may be understood by a person skilled in the art.
Abstract
There is provided a computer-implemented method for identifying a medical disposable product, the method comprising: receiving from a camera a video stream of an area of interest; detecting a presence of an object within the area of interest; determining that an image capture procedure is to be performed; providing for display a user interface for guiding a user to position the object at a desired position relative to the camera; when the object is at the desired position, capturing an image of the object; extracting packaging data from the image of the object; identifying the object as being a given medical disposable product based on the packaging data; and outputting the identification of the given medical disposable product.
Description
METHOD AND SYSTEM FOR IDENTIFYING A MEDICAL DISPOSABLE
PRODUCT
CROSS-REFERENCE TO RELATED APPLICATION
[1] The present application claims priority on US Provisional application No. 63/517,454 filed on August 3, 2023, the content of which is incorporated herein by reference.
FIELD
[2] The present technology pertains to the field of object identification, and more particularly to the identification of medical disposable products.
BACKGROUND
[3] Medical disposable products account for one of the highest variable costs in medical institutions such as hospitals. However, it is usually difficult for such medical institutions to accurately track their usage of the medical disposable products.
[4] In order to track usage of medical disposable products, manual recording of used medical disposable products into an electronic system is often performed. However, such a tracking method is prone to errors in addition to being labor-intensive. The tracking can also be performed using a reader to manually scan a code present on the package of the medical disposable products. Such a tracking method is also labor-intensive and time-consuming and only applies to packages having a code printed thereon. RFID tags or stickers may also be attached to the package of medical disposable products. However, such a tracking method is expensive and time-consuming since an RFID tag or sticker must be attached to each package.
[5] Therefore, there is a need for an improved method and system for identifying medical disposable products in medical institutions in order to track their usage for example.
SUMMARY
[6] In accordance with a first broad aspect, there is provided a computer- implemented method for identifying a medical disposable product, the method comprising: receiving from a camera a video stream of an area of interest; detecting a presence of an object within the area of interest; determining that an image capture procedure is to be performed; providing for display a user interface for guiding a user to position the object at a desired position relative to the camera; when the object is at the desired position, capturing an image of the object; extracting packaging data from the image of the object; identifying the object as being a given medical disposable product based on the packaging data; and outputting the identification of the given medical disposable product.
[7] In some embodiments, the method further comprises correcting the video stream.
[8] In some embodiments, the step of correcting the video stream comprises: measuring a luminance of a space surrounding the camera; comparing the measured luminance to a luminance threshold; and when the measured luminance is below the luminance threshold, performing said correcting the video stream.
[9] In some embodiments, the step of correcting the video stream comprises adjusting at least one of a frame rate, a resolution, an aspect ratio and a contrast.
[10] In some embodiments, the step of correcting the video stream comprises applying a white balance method to the video stream.
[11] In some embodiments, the step of detecting the presence of the object is performed by detecting differences between successive frames of the video stream.
[12] In some embodiments, the step of determining that the image capture procedure is to be performed comprises determining that the object is moving closer to the camera and towards a center of the camera.
[13] In some embodiments, the step of determining that the image capture procedure is to be performed further comprises determining that a speed of motion of the object decreases.
[14] In some embodiments, the method further comprises providing the user with a feedback once said capturing the image of the object is performed.
[15] In some embodiments, the step of extracting the packaging data is performed using at least one of optical character recognition, a machine learning model configured for extracting text, text position and text size, and a computer vision model.
[16] In accordance with another broad aspect, there is provided a system for identifying a medical disposable product, the system comprising: a processor; a non- transitory storage medium operatively connected to the processor, the non-transitory storage medium comprising computer-readable instructions; the processor, upon executing the instructions, being configured to: receiving from a camera a video stream of an area of interest; detecting a presence of an object within the area of interest; determining that an image capture procedure is to be performed; providing for display a user interface for guiding a user to position the object at a desired position relative to the camera; when the object is at the desired position, capturing an image of the object; extracting packaging data from the image of the object; identifying the object as being a given medical disposable product based on the packaging data; and outputting the identification of the given medical disposable product.
[17] In some embodiments, the processor is further configured for correcting the video stream.
[18] In some embodiments, said correcting the video stream comprises: measuring a luminance of a space surrounding the camera; comparing the measured luminance to a luminance threshold; and when the measured luminance is below the luminance threshold, performing said correcting the video stream.
[19] In some embodiments, said correcting the video stream comprises adjusting at least one of a frame rate, a resolution, an aspect ratio and a contrast.
[20] In some embodiments, said correcting the video stream comprises applying a white balance method to the video stream.
[21] In some embodiments, said detecting the presence of the object is performed by detecting differences between successive frames of the video stream.
[22] In some embodiments, said determining that the image capture procedure is to be performed comprises determining that the object is moving closer to the camera and towards a center of the camera.
[23] In some embodiments, said determining that the image capture procedure is to be performed further comprises determining that a speed of motion of the object decreases.
[24] In some embodiments, the processor is further configured for providing the user with a feedback once said capturing the image of the object is performed.
[25] In some embodiments, said extracting the packaging data is performed using at least one of optical character recognition, a machine learning model configured for extracting text, text position and text size, and a computer vision model.
[26] Terms and Definitions
[27] In the context of the present specification, a “server” is a computer program that is running on appropriate hardware and is capable of receiving requests (e.g., from computing devices) over a network (e.g., a communication network), and carrying out those requests, or causing those requests to be carried out. The hardware may be one physical computer or one physical computer system, but neither is required to be the case with respect to the present technology. In the present context, the use of the expression a “server” is not intended to mean that every task (e.g., received instructions or requests) or any particular task will have been received, carried out, or caused to be carried out, by the same server (i.e., the same software and/or hardware); it is intended to mean that any number of software elements or hardware devices may be involved in receiving/sending, carrying out or causing to be carried out any task or request, or the consequences of any task or request; and all of this software and hardware may be one server or multiple servers, both of which are included within the expressions “at least one server” and “a server”.
[28] In the context of the present specification, “computing device” is any computing apparatus or computer hardware that is capable of running software appropriate to the relevant task at hand. Thus, some (non-limiting) examples of computing devices include general purpose personal computers (desktops, laptops,
netbooks, etc.), mobile computing devices, smartphones, and tablets, and network equipment such as routers, switches, and gateways. It should be noted that a computing device in the present context is not precluded from acting as a server to other computing devices. The use of the expression “a computing device” does not preclude multiple computing devices being used in receiving/sending, carrying out or causing to be carried out any task or request, or the consequences of any task or request, or steps of any method described herein. In the context of the present specification, a “client device” refers to any of a range of end-user client computing devices, associated with a user, such as personal computers, tablets, smartphones, and the like.
[29] In the context of the present specification, the expression "computer readable storage medium" (also referred to as "storage medium” and “storage”) is intended to include non-transitory media of any nature and kind whatsoever, including without limitation RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drives, etc.), USB keys, solid-state drives, tape drives, etc. A plurality of components may be combined to form the computer information storage media, including two or more media components of a same type and/or two or more media components of different types.
[30] In the context of the present specification, a "database" is any structured collection of data, irrespective of its particular structure, the database management software, or the computer hardware on which the data is stored, implemented or otherwise rendered available for use. A database may reside on the same hardware as the process that stores or makes use of the information stored in the database or it may reside on separate hardware, such as a dedicated server or plurality of servers.
[31] In the context of the present specification, the expression “information” includes information of any nature or kind whatsoever capable of being stored in a database. Thus information includes, but is not limited to audiovisual works (images, movies, sound records, presentations etc.), data (location data, numerical data, etc.), text (opinions, comments, questions, messages, etc.), documents, spreadsheets, lists of words, etc.
[32] In the context of the present specification, unless expressly provided otherwise, an “indication” of an information element may be the information element itself or a pointer, reference, link, or other indirect mechanism enabling the recipient of
the indication to locate a network, memory, database, or other computer-readable medium location from which the information element may be retrieved. For example, an indication of a document could include the document itself (i.e. its contents), or it could be a unique document descriptor identifying a file with respect to a particular file system, or some other means of directing the recipient of the indication to a network location, memory address, database table, or other location where the file may be accessed. As one skilled in the art would recognize, the degree of precision required in such an indication depends on the extent of any prior understanding about the interpretation to be given to information being exchanged as between the sender and the recipient of the indication. For example, if it is understood prior to a communication between a sender and a recipient that an indication of an information element will take the form of a database key for an entry in a particular table of a predetermined database containing the information element, then the sending of the database key is all that is required to effectively convey the information element to the recipient, even though the information element itself was not transmitted as between the sender and the recipient of the indication.
[33] In the context of the present specification, the expression “communication network” is intended to include a telecommunications network such as a computer network, the Internet, a telephone network, a Telex network, a TCP/IP data network (e.g., a WAN network, a LAN network, etc.), and the like. The term “communication network” includes a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared and other wireless media, as well as combinations of any of the above.
[34] In the context of the present specification, the words “first”, “second”, “third”, etc. have been used as adjectives only for the purpose of allowing for distinction between the nouns that they modify from one another, and not for the purpose of describing any particular relationship between those nouns. Thus, for example, it should be understood that the use of the terms “server” and “third server” is not intended to imply any particular order, type, chronology, hierarchy or ranking (for example) of/between the server, nor is their use (by itself) intended to imply that any “second server” must necessarily exist in any given situation. Further, as is discussed herein in other contexts, reference to a “first” element and a “second” element does not preclude
the two elements from being the same actual real-world element. Thus, for example, in some instances, a “first” server and a “second” server may be the same software and/or hardware, in other cases they may be different software and/or hardware.
[35] Implementations of the present technology each have at least one of the above-mentioned objects and/or aspects, but do not necessarily have all of them. It should be understood that some aspects of the present technology that have resulted from attempting to attain the above-mentioned object may not satisfy this object and/or may satisfy other objects not specifically recited herein.
[36] Additional and/or alternative features, aspects and advantages of implementations of the present technology will become apparent from the following description, the accompanying drawings and the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[37] For a better understanding of the present technology, as well as other aspects and further features thereof, reference is made to the following description which is to be used in conjunction with the accompanying drawings, where:
[38] Fig. 1 is a block diagram illustrating a system for identifying a disposable medical product, in accordance with an embodiment;
[39] Fig. 2 is a flow chart illustrating a method for identifying a disposable medical product, in accordance with an embodiment;
[40] Fig. 3 illustrates an exemplary capture procedure;
[41] Fig. 4 illustrates an exemplary user interface to be displayed to a user for guiding the user to position a disposable medical product at a desired position relative to a camera;
[42] Fig. 5 illustrates a first exemplary package for a disposable medical product;
[43] Fig. 6 illustrates a second exemplary package for a disposable medical product; and
[44] Fig. 7 illustrates a third exemplary package for a disposable medical product.
[45] Fig. 8 depicts a schematic diagram of a computing device in accordance with one or more non-limiting implementations of the present technology.
DETAILED DESCRIPTION
[46] The examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the present technology and not to limit its scope to such specifically recited examples and conditions. It will be appreciated that those skilled in the art may devise various arrangements which, although not explicitly described or shown herein, nonetheless embody the principles of the present technology and are included within its spirit and scope.
[47] Furthermore, as an aid to understanding, the following description may describe relatively simplified implementations of the present technology. As persons skilled in the art would understand, various implementations of the present technology may be of a greater complexity.
[48] In some cases, what are believed to be helpful examples of modifications to the present technology may also be set forth. This is done merely as an aid to understanding, and, again, not to define the scope or set forth the bounds of the present technology. These modifications are not an exhaustive list, and a person skilled in the art may make other modifications while nonetheless remaining within the scope of the present technology. Further, where no examples of modifications have been set forth, it should not be interpreted that no modifications are possible and/or that what is described is the sole manner of implementing that element of the present technology.
[49] Moreover, all statements herein reciting principles, aspects, and implementations of the present technology, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof, whether they are currently known or developed in the future. Thus, for example, it will be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the present technology. Similarly, it will be appreciated that any flowcharts, flow diagrams, state transition diagrams, pseudo-code, and the like represent various processes which may be substantially
represented in computer-readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
[50] The functions of the various elements shown in the figures, including any functional block labeled as a "processor" or a “graphics processing unit”, may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. In one or more non-limiting implementations of the present technology, the processor may be a general purpose processor, such as a central processing unit (CPU) or a processor dedicated to a specific purpose, such as a graphics processing unit (GPU). Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included.
[51] Software modules, or simply modules which are implied to be software, may be represented herein as any combination of flowchart elements or other elements indicating performance of process steps and/or textual description. Such modules may be executed by hardware that is expressly or implicitly shown.
[52] With these fundamentals in place, we will now consider some non-limiting examples to illustrate various implementations of aspects of the present technology.
[53] Fig. 1 illustrates one embodiment of a system 10 for identifying medical disposable products. For example, the system 10 may be installed into a medical room such as a surgical room to track usage of medical disposable products 12 during a medical procedure such as a surgery.
[54] The system 10 comprises a capture device 14 and optionally a remote server 16 for identification of a medical disposable product. When it is present in the system
10, the server 16 may be seen as a fallback device in the event the capture device 14 is unsuccessful in identifying a medical disposable product 12.
[55] The capture device 14 comprises at least a processing unit 20, a storing unit 22, a communication interface or communication means 24, a camera 26 and a display unit 28. The server 16 comprises at least a processing unit 30, a storing unit 32 and a communication interface or communication means 34. The capture device 14 and the server 16 are communicatively connected together via a communication network for example.
[56] In one or more implementations of the present technology, the communication network is the Internet. In one or more alternative non-limiting implementations, the communication network may be implemented as any suitable local area network (LAN), wide area network (WAN), a private communication network or the like. It will be appreciated that implementations for the communication network are for illustration purposes only. How a communication link between the capture device 14, the server 16, and/or another computing device (not shown) and the communications network is implemented will depend inter alia on how each computing device is implemented.
[57] As described in greater detail below, the camera 26 of the capture device 14 monitors an area of interest. Any object entering the monitored area of interest is detected and the object is tracked to determine whether an image capture procedure should be executed. If it is determined that the image capture procedure is to be executed, a graphical user interface (GUI) is displayed on the display unit 28 for guiding the user to position the object at a desired position relative to the camera 24. Once the object is at the desired position, at least one image of the object is captured. If the captured object is a medical disposable product 12, the medical disposable product 12 is identified based on the captured images of the object. First, a local identification is executed on the capture device 14. If the identification by the capture device 14 is unsuccessful, the image(s) of the object is(are) transmitted to the server 16 which identifies the object as being a given medical disposable product. The identification of the given medical disposable product 12 is then transmitted to the capture device 14.
[58] During a medical procedure, the system 10 is used by a medical professional or practitioner to identify the medical disposable products that are consumed or used during the medical procedure. Each time a medical disposable product 12 is to be used, the medical practitioner scans the medical disposable product 12 by placing it in front of the camera 24 so that the capture device 14 may identify the medical disposable product 12 that is to be used. Optionally, the capture device 14 may provide the medical practitioner with feedback such as visual feedback to indicate that the medical disposable product 12 was successfully identified. The medical practitioner may then use the medical disposable product 12. By keeping record of all the identified medical disposable products, it is then possible to know which medical disposable products have been consumed during the medical procedure.
[59] While in the present description the medical disposable product 12 is placed in front of the camera 24 before it is used, it will be understood that the medical disposable product 12 or its packaging may be placed in front of the camera 24 for identification once it has been used.
[60] Fig. 2 illustrates one embodiment of a computer-implemented method 100 for identifying a medical disposable product such as the medical disposable product 12. It should be understood that the method 100 is executed by a computer machine comprising at least one processing unit, a storing unit and communication means such as by the capture device 14 and optionally the server 16. While in the following the method 100 is described with reference to the system 10, it should be understood that the method 100 can be executed by any other adequate system.
[61] At step 102, a video stream of an area of interest is received. The camera 24 monitors the area of interest and generates a video stream which is transmitted to the processing unit 20. It should be understood that the video stream comprises a series of successive or temporally ordered frames each corresponding to an image of the area of interest at a respective time. For example, the video stream may comprise time-stamped frames or images.
[62] In some embodiments, the method 100 further comprises a step of correcting the video stream received at step 102. In some embodiments, the luminance of the surrounding space, i.e., the space surrounding the capture device 14 or the camera 24,
is measured. In this case, the system 10 further comprises a light sensor (not shown) for measuring the luminance of the surrounding space. The processing unit 20 receives the measured luminance from the light sensor and compares the measured luminance to a luminance threshold. If the measured luminance is less than the luminance threshold, the processing unit 20 corrects the frames of the video stream based on the measured luminance to obtain clearer images and facilitate the object detection and identification. In one embodiment, the correction of the frames is performed substantially concurrently while the video frames are received. In some embodiments, the correction of the frames includes the adjustment of the frame rate, the resolution, the aspect ratio and/or the contrast. In some embodiments, when it is determined that the measured luminance is below the luminance threshold, the processing unit 20 applies a white balance method to the frames of the received video stream to compensate for the low light intensity. For example, the Gray World method may be used to scale the average values of the R, G, and B components of the frames to a common grey value.
[63] In some embodiments in which green light is mainly used to illuminate the room in which the medical procedure is performed, the video frames correspond to “green images”, i.e., the Green component in the images is dominant. In this case, the processing unit 20 may be configured for compensating for the Red and Blue components in order to bring them into the same range as the Green component.
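By way of non-limiting illustration, the following sketch shows one possible implementation of the luminance check and Gray World white balance described above; the function names, the threshold value and the use of the NumPy library are assumptions of the sketch rather than requirements of the present technology.

import numpy as np

# Illustrative sketch (assumed names and threshold): Gray World white balance applied
# only when the measured luminance falls below a threshold.
def gray_world_balance(frame_bgr):
    frame = frame_bgr.astype(np.float32)
    channel_means = [frame[..., c].mean() for c in range(3)]
    grey_mean = sum(channel_means) / 3.0
    for c, mean in enumerate(channel_means):
        frame[..., c] *= grey_mean / max(mean, 1e-6)  # scale each channel towards the grey mean
    return np.clip(frame, 0, 255).astype(np.uint8)

def correct_frame(frame_bgr, measured_lux, lux_threshold=50.0):
    # Correct the frame only when the surrounding space is too dark.
    return gray_world_balance(frame_bgr) if measured_lux < lux_threshold else frame_bgr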
[64] Referring back to Fig. 1 and Fig. 2, the frames of the video stream are analysed by the processing unit 20 to detect the presence of an object 12 within the area of interest monitored by the camera 26, at step 104. It should be understood that any adequate method for detecting the presence of an object 12 within an area of interest from a video stream may be used.
[65] In some embodiments, the processing unit 20 compares successive frames of the video stream to determine any difference between them. When the differences between successive frames exceed a given threshold, the processing unit 20 determines that an object 12 is present within the monitored area of interest.
[66] In some embodiments, the processing unit 20 is configured for analyzing the pixels of the video frames and detecting a group of similar pixels (proximity, brightness and color) that is different from the expected background. The grouping of
similar pixels may be determined by looking at brightness and colour characteristics (i.e., the boundaries of the pixel group are determined based on the difference between the expected pixels from the undisturbed background and the new pixels in the field of view of the camera 24). The group of similar pixels is then considered as corresponding to an object entering the field of view of the camera 24.
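As a non-limiting illustration, the sketch below shows one way of detecting an object entering the area of interest by differencing successive frames and grouping the changed pixels; the thresholds and the use of the OpenCV library are assumptions of the sketch, not requirements of the present technology.

import cv2

def detect_object(prev_frame, curr_frame, diff_threshold=25, min_area=5000):
    # Difference the two frames to find pixels that changed with respect to the background.
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)  # merge nearby changed pixels into one group
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = [c for c in contours if cv2.contourArea(c) >= min_area]
    if not contours:
        return None  # no object detected within the area of interest
    return max(contours, key=cv2.contourArea)  # the largest group of changed pixels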
[67] At step 106, the processing unit 20 determines that an image capture procedure is to be executed, i.e., it determines that the medical practitioner intends to scan the object 12 for identification. To do so, the processing unit 20 tracks the movement of the object 12 detected at step 104 within the area of interest by analyzing the frames of the received video stream and determines whether an image capture procedure should be performed based on the movement of the object 12 within the monitored area of interest.
[68] In some embodiments, the processing unit 20 determines that an image capture procedure is to be executed when it is determined that the object 12 is moving closer to the camera 24 and towards the center of the camera 24, i.e., towards the center of the monitored area of interest. It should be understood that any adequate method for tracking the movement of the detected object 12 may be used. In some embodiments, successive frames of the video stream are analyzed, and dense optical flow is used to determine whether the detected object 12 is moving and if so, determine the direction of the motion. Contour detection is used to track the shape of the object 12 as well as calculate the density of the pixels (color histogram and luminosity) corresponding to the object 12 between successive frames. Then, the surface area, shape and direction of the group of pixels corresponding to the object 12 is determined over successive frames. When the shape of the group of pixels is consistent over the successive frames, the surface area of the group of pixels (or the number of pixels contained in the group) grows over the successive frames (which indicates that the object 12 is moving closer to the camera 24), and the movement of the group of pixels is towards the centerline of the field of view of the camera 24, i.e., is towards the center of the monitored area of interest, the processing unit 20 determines that an image capture procedure should be executed. In some embodiments, the processing unit 20 determines that the object 12 is moving towards the centerline of the camera 24 by comparing the position of the center of the group of pixels corresponding to the object
12 to the position of the centerline of the camera 24 over the successive video frames. When, from one video frame to another successive video frame, the distance between the center of the group of pixels and the centerline of the camera 24 decreases, the processing unit 20 determines that the object 12 is moving towards the centerline of the camera 24.
[69] In some embodiments, a third condition has to be met in addition to the increase of the surface area of the group of pixels and the motion of the group of pixels towards the centerline of the camera to determine that the image capture procedure should be executed. This third condition is related to the speed of motion of the group of pixels, i.e., the relative speed of the group of pixels must slow down as the group of pixels nears the centerline of the camera 24.
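A non-limiting sketch of the three conditions (growing pixel area, motion towards the centerline, decreasing speed) is given below; the per-frame measurements are assumed to come from the tracking described above, and the thresholds are arbitrary examples.

import numpy as np

def should_start_capture(areas, centers, frame_center, growth=1.05, slowdown=0.8):
    # areas: surface area of the pixel group per frame; centers: its center per frame.
    if len(areas) < 3:
        return False
    growing = all(a2 >= a1 * growth for a1, a2 in zip(areas, areas[1:]))
    dists = [np.linalg.norm(np.subtract(c, frame_center)) for c in centers]
    approaching = all(d2 < d1 for d1, d2 in zip(dists, dists[1:]))
    speeds = [d1 - d2 for d1, d2 in zip(dists, dists[1:])]  # positive when approaching
    slowing = all(s2 <= s1 * slowdown for s1, s2 in zip(speeds, speeds[1:]))
    return growing and approaching and slowing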
[70] Fig. 3 illustrates the motion of a group of pixels corresponding to an object such as object 12 that is indicative that an image capture procedure is to be performed. At a first time t1, a first video frame is received. In this first video frame, a group of similar pixels is detected at a first position relative to the camera. At a second time t2, a second video frame is received. The comparison between the first and second video frames indicates that the size of the detected group of pixels has increased (i.e., the number of pixels belonging to the group increased) which indicates that the object is moving closer to the camera and is located at a second position relative to the camera, and that the group of pixels moved fast towards the centerline of the camera. At a third time t3, a third video frame is received. The comparison between the second and third video frames indicates that the size of the detected group of pixels has further increased which indicates that the object further moved closer to the camera and is now at a third position relative to the camera, and that the group of pixels further moved towards the centerline of the camera but at a lower speed. At a fourth time t4, a fourth video frame is received. The comparison between the third and fourth video frames indicates that the size of the detected group of pixels has slightly increased which indicates that the object slightly moved closer to the camera, and that the group of pixels further moved towards the centerline of the camera but at an even lower speed so as to be located at a fourth position relative to the camera. In the fourth video frame, the center of the group of pixels is close to the centerline of the camera.
[71] Referring back to Fig. 1 and Fig. 2, once it has been determined that the image capture procedure is to be executed, a graphical user interface is provided for display to help the medical practitioner position the object at a desired position relative to the camera 24 at step 108. The processing unit 20 generates the user interface and transmits the user interface to the display unit 28 for display thereon.
[72] Fig. 4 illustrates an exemplary user interface 150 which comprises a first section 152 in which the video stream received from the camera 24 is inserted for display, thereby allowing the medical practitioner to see the images captured by the camera 24. This further allows the medical practitioner to adequately position the object relative to the camera. The user interface may further comprise a second section such as section 154 in which information about previously identified disposable products is inserted.
[73] In some embodiments, the user interface is substantially always displayed during the execution of the method 100. In this case, the appearance of the user interface may be changed upon determination that the image capture procedure is to be executed. For example, and with reference to Fig. 4, the size of the section 152 of the interface 150 may be increased upon determination that the image capture procedure is to be executed while the size of the section 154 may be decreased, or the section 154 may be deleted from the user interface.
[74] While looking at the user interface, the medical practitioner may position the object 12 at a desired position relative to the camera 24. It should be understood that the desired position may comprise a range of desired positions.
[75] In some embodiments, the section 152 of the interface 150 comprises visual markers to help the medical practitioner position the object at the desired position relative to the camera. As illustrated in Fig. 4, the section 152 may comprise four L-shaped markers positioned so as to form the corners of a square or rectangle for visually indicating to the medical practitioner where to position the object 12 relative to the camera 24, e.g., the object 12 has to be positioned relative to the camera 24 so that its corners substantially align with the L-shaped markers within the section 152 of the interface 150. Furthermore, the color of the L-shaped markers may change to indicate
that the object 12 approaches the desired position. For example, the color of the L-shaped markers may change from yellow to green as the object 12 gets closer to the camera 24.
[76] Once the object 12 is located at the desired position, at least one image of the object 12 is captured or taken at step 110. In some embodiments, the capture of the image comprises selecting at least one of the video frames. In other embodiments, the capture of the image comprises taking a screenshot of at least a portion of the user interface such as section 152 of the interface 150.
[77] In some embodiments, feedback is provided to the medical practitioner once the object 12 has been captured. For example, the system 10 may further comprise a speaker and the processing unit 20 may provide the medical practitioner with audio feedback indicating that an image of the object 12 has been captured. In some embodiments, the feedback provided to the user comprises visual feedback. For example, the displayed interface may blink to indicate to the medical practitioner that the capture procedure has been performed.
[78] Referring back to Fig. 1 and Fig. 2, packaging data is extracted by the processing unit 20 from the captured image of the object 12 at step 112. The packaging data may comprise any data present on a package of a medical disposable product. For example, the packaging data may comprise a code such as a barcode, a QR code and/or a data matrix code, logos, and text such as a symbol, a product name, vendor markings and specific product descriptions (e.g., dimensions, etc.). It should be understood that any adequate method may be used for extracting the packaging data from the captured image.
[79] Figs. 5-7 illustrate different packages of medical disposable products on which different information is present. As illustrated, the information may comprise text such as company names, product names, reference numbers, sizes, dimensions, groups of text/words, identifiers, text encoded in a barcode, etc., as well as codes such as barcodes and QR codes, symbols, logos, etc.
[80] For example, optical character recognition (OCR) may be used for extracting text from the captured image and image search techniques may be used for extracting logos and codes. In some embodiments, a pre-trained machine learning (ML) model such as Google ML Kit or Apple Vision Framework is used for extracting text and the size
and position of bounding boxes containing text from the image, and a pre-trained computer vision model such as OpenCV is used for extracting codes and logos from the image.
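By way of non-limiting illustration, the sketch below extracts text with bounding boxes and a QR code from a captured image using the pytesseract and OpenCV libraries; these particular libraries and the returned data structure are assumptions of the sketch (an on-device SDK such as Google ML Kit or the Apple Vision framework could play the same role).

import cv2
import pytesseract

def extract_packaging_data(image_path):
    image = cv2.imread(image_path)
    # Words with their bounding-box position and size.
    ocr = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words = [
        {"text": txt, "left": l, "top": t, "width": w, "height": h}
        for txt, l, t, w, h in zip(ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"])
        if txt.strip()
    ]
    # QR code content, if any; a linear barcode would need an additional decoder (e.g., pyzbar).
    qr_text, _, _ = cv2.QRCodeDetector().detectAndDecode(image)
    return {"words": words, "qr_code": qr_text or None}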
[81] In some embodiments, the packaging data extraction is performed in two steps. First, text is extracted from the captured image via OCR and the extracted text is analyzed to determine whether it conforms to text usually found on a package of a medical disposable product. If the extracted text corresponds to text usually present on the packaging of a medical disposable product, it is then determined that the object 12 is a medical disposable product enclosed into a packaging. Then a code such as a barcode, QR code or data matrix code is searched for within the captured image and extracted, using a pre-trained computer vision model for example, in order to identify the disposable product.
[82] In some embodiments, the following elements are used to determine whether the object 12 corresponds to a disposable medical product (a heuristic check along these lines is sketched after the list):
[83] - Barcodes, QR codes, data matrix, lot number and/or expiry dates;
[84] - Predefined keywords such as keywords typically found on packages of disposable medical products;
[85] - Dimensions, sizes and/or units of measurement; and
[86] - Predefined text snippets.
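The sketch below shows one possible heuristic built on the elements listed above; the keywords, patterns and threshold are illustrative assumptions only.

import re

KEYWORDS = {"sterile", "single use", "do not reuse", "lot", "ref", "latex free"}
DIMENSIONS = re.compile(r"\b\d+(\.\d+)?\s?(mm|cm|fr|ml|gauge)\b", re.IGNORECASE)
EXPIRY = re.compile(r"\b(19|20)\d{2}[-/.](0?[1-9]|1[0-2])\b")

def looks_like_medical_packaging(extracted_text, min_hits=2):
    text = extracted_text.lower()
    hits = sum(keyword in text for keyword in KEYWORDS)   # predefined keywords and text snippets
    hits += bool(DIMENSIONS.search(text))                 # dimensions, sizes, units of measurement
    hits += bool(EXPIRY.search(text))                     # expiry-date-like patterns
    return hits >= min_hits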
[87] Once the packaging data has been extracted from the captured image, the processing unit 20 identifies the medical disposable product based on the extracted packaging data at step 114.
[88] In embodiments in which a code has been extracted at step 112, the processing unit 20 determines the ID of the medical disposable product based on the extracted code. For example, the processing unit 20 may access a database in which reference codes and a respective ID of medical disposable product for each code are stored. The database may be stored locally on the storing unit 22 or on a remote server for example.
[89] In embodiments for which a code was not successfully extracted at step 112, different methods may be used for trying to identify the medical disposable product. For example, the package of some medical disposable products has printed thereon an alphanumeric string that corresponds to a code. The database may comprise predefined alphanumeric strings that are each associated with a respective ID of medical disposable product and the processing unit 20 is configured for identifying alphanumeric strings from the information extracted at step 112. The processing unit 20 then compares any alphanumeric string identified from the extracted information to the predefined alphanumeric strings stored in the database and if an identified alphanumeric string corresponds to a predefined alphanumeric string stored in the database, the processing unit 20 identifies the object as being the disposable medical product associated with the corresponding predefined alphanumeric string.
[90] In some embodiments, some packages of disposable medical products have printed thereon a manufacturer catalog or reference number that is uniquely associated with a respective disposable medical product. In this case, the database may comprise predefined manufacturer reference numbers that are each associated with a respective ID of medical disposable product and the processing unit 20 is configured for identifying manufacturer reference numbers from the information extracted at step 112. The processing unit 20 then compares any manufacturer reference number identified from the extracted information to the predefined manufacturer reference numbers stored in the database and if an identified manufacturer reference number corresponds to a predefined manufacturer reference number stored in the database, the processing unit 20 identifies the object as being the disposable medical product associated with the corresponding predefined manufacturer reference number.
[91] In some embodiments in which different manufacturers may use identical manufacturer reference numbers, the processing unit 20 is further configured for determining the name of the manufacturer from the information extracted at step 112. For example, the name of a manufacturer may be identified from the extracted information using elements such as the text size, the text color, superscript symbols, etc. In this case, once the manufacturer name and the manufacturer reference number have been extracted from the image, the processing unit 20 is configured for comparing the extracted manufacturer reference number to the predefined manufacturer reference
numbers stored in the database. If there is a match, the processing unit is further configured for comparing the manufacturer name extracted from the image to the manufacturer name associated with the manufacturer reference number identified in the database. If there is a match, then the object is identified as being the disposable medical product associated with the manufacturer reference number identified from the database.
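As a non-limiting illustration, the sketch below looks up a decoded code, and falls back to a manufacturer name plus reference number, against an in-memory mapping standing in for the database; the sample entries and function name are assumptions of the sketch.

PRODUCTS_BY_CODE = {"0123456789012": "product-id-example-1"}  # placeholder entries
PRODUCTS_BY_MANUFACTURER_REF = {("example manufacturer", "REF-0001"): "product-id-example-2"}

def identify_product(code=None, manufacturer=None, reference_number=None):
    if code and code in PRODUCTS_BY_CODE:
        return PRODUCTS_BY_CODE[code]
    if manufacturer and reference_number:
        key = (manufacturer.strip().lower(), reference_number.strip())
        return PRODUCTS_BY_MANUFACTURER_REF.get(key)
    return None  # unresolved locally; the image may then be sent to the server 16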
[92] In some embodiments in which the object captured at step 110 is partially obstructed such as by fingers of the user, all visible text is extracted at step 112. The extracted text is classified according to parameters such as text size, text position, text color, symbols, logos and/or the like. A search query is performed in the database to look for a disposable medical product of which the associated text and text parameters match the extracted text and parameters, and thereby identify the disposable medical product.
[93] Once the identification is performed at step 114, the ID of the medical disposable product is outputted at step 116. For example, the ID of the medical disposable product may be stored in the storing unit 22. The ID of the medical disposable product may be sent to any other capture devices linked to the capture device 14, such as capture devices present in the same room as the capture device 14.
[94] For example, the ID of the medical disposable product may also be added in the section 154 of the user interface 150 to be displayed to the medical practitioner.
[95] In some embodiments, the method 100 further comprises a step of receiving the identification of the medical procedure being performed during the execution of the method 100. In this case, different databases each specific to a respective medical procedure may be accessible by the processing unit 20. A database specific to a given medical procedure comprises reference codes of medical disposable products that are used during the given medical procedure. The processing unit 20 determines the specific database to be accessed based on the ID of the medical procedure.
[96] In some embodiments, the step of receiving the identification of the medical procedure comprises receiving an identification of a medical room and accessing a time schedule associated with the medical room. The time schedule is indicative of the type of medical procedure to be performed in the medical room for each date and time.
Thereby, by knowing the identification of the medical room and the date and time, the identification of the medical procedure can be determined.
[97] While in the above description, the identification step 114 is performed locally on the capture device 14, other embodiments are possible. For example, the identification of the medical disposable product by the capture device 14 may be unsuccessful. In this case, the capture device 14 transmits the captured image to the server 16 in order to identify the medical disposable product.
[98] In some embodiments, the identification of the medical disposable product by the capture device 14 is unsuccessful when a code extracted from the captured image was not identified by the capture device 14, when text extracted from the captured image did not allow the medical disposable product to be identified and/or when information extracted from the captured image could not be matched to a medical disposable product.
[99] The server 16 receives the captured image from the capture device 14 and the processing unit 30 is configured for extracting packaging information such as text content and metadata from the received captured image.
[100] In some embodiments, the processing unit 30 is configured for extracting packaging information such as: size and position of text bounding boxes, text included in each text bounding box, package colors, estimated package dimensions, text font size, text language, text case sensitivity, text color, text capitalization, symbols such as trademarks and copyright, logos, Global Trade Item Number (GTIN) encoded string, dates such as expiry dates, codes such as barcodes, QR codes, data matrix codes, and/or the like.
[101] In some embodiments, the processing unit 30 uses a pre-trained ML computer vision model such as Google Vision™ to extract the above-described information.
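A minimal sketch of such a server-side extraction is given below, assuming the Google Cloud Vision client library is used; any comparable pre-trained vision service could be substituted, and the returned structure is an assumption of the sketch.

from google.cloud import vision

def extract_text_blocks(image_bytes):
    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=image_bytes))
    annotations = response.text_annotations
    full_text = annotations[0].description if annotations else ""
    blocks = []
    for annotation in annotations[1:]:  # entry 0 holds the full text, the rest are individual words
        box = [(vertex.x, vertex.y) for vertex in annotation.bounding_poly.vertices]
        blocks.append({"text": annotation.description, "box": box})
    return full_text, blocks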
[102] The processing unit 30 is further configured for accessing a database containing a plurality of reference medical disposable products, and for each reference medical disposable product, reference packaging information and a weight determined based on metadata associated with the packaging information. The different metadata
associated with each disposable medical product stored in the database may comprise the following:
[103] the size of text;
[104] the position of the text;
[105] the color of the text;
[106] text containing sizes and dimensions;
[107] text tokens; and/or the like.
[108] For example, the size of the text may refer to the relative text size. More weight may be given to larger text. In some embodiments, the text may be classified according to three predefined sizes such as small, medium and large.
[109] More weight may be given to text positioned at predefined locations. For example, text located at the top of a package may be given more weight.
[110] More weight may be given to text having a color different from that of the majority of the text. For example, if all of the extracted text is black except for some words that are in red, more weight will be given to the red text than to the black text.
[111] Also, more weight may be given to text containing dimensions and/or sizes such as text containing units of measurement in comparison to the rest of the text.
[112] The processing unit 30 identifies the given medical disposable product associated with the received captured image by analyzing similarities between the packaging information extracted from the received captured image and the packaging information associated with the reference packages. To do so, the processing unit 30 accesses the database and uses the packaging information extracted from the captured image.
[113] For example, a search request consisting of a data structure that contains text tokens along with the relative size, position and color may be generated and the search request is used for searching the database. The data structure may be further processed to identify brand names, manufacturers and sizes/dimensions. The tokens are then
matched against the search index. Tokens that match more important text will be ranked higher. An ordered list of products with corresponding confidence scores is returned.
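The following sketch illustrates one possible weighting and ranking scheme of the kind described above; the token attributes, weights and normalisation are illustrative assumptions, not requirements of the present technology.

def score_product(query_tokens, reference_tokens):
    # query_tokens: dicts with "text", "size" (small/medium/large), "position", "color_differs".
    reference = {token.lower() for token in reference_tokens}
    score = 0.0
    for token in query_tokens:
        if token["text"].lower() not in reference:
            continue
        weight = {"small": 1.0, "medium": 2.0, "large": 3.0}[token["size"]]  # larger text counts more
        if token["position"] == "top":
            weight *= 1.5  # text at the top of the package counts more
        if token["color_differs"]:
            weight *= 1.5  # text in a colour different from the rest counts more
        score += weight
    return score

def rank_products(query_tokens, catalog):
    scored = [(pid, score_product(query_tokens, tokens)) for pid, tokens in catalog.items()]
    total = sum(score for _, score in scored) or 1.0
    # Normalised scores act as rough confidence values for the ordered result list.
    return sorted(((pid, score / total) for pid, score in scored), key=lambda item: item[1], reverse=True)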
[114] Once the processing unit 30 has identified the medical disposable product, the server 16 transmits the identification of the medical disposable product to the capture device 14. The capture device 14 may then store the received identification into the storing unit 22 and insert the identification into the section 154 of the interface 150.
[115] In some embodiments in which the identification of the medical procedure is received and after identifying the medical disposable product, the processing unit 30 executes a Bayesian probability model against reference products used historically for the same medical procedure to obtain a confidence score for the identification, and thereby increase the confidence in the identification of the disposable medical device. Known products used in specific procedures provide the base contextual knowledge (e.g., staplers and staples for thoracic procedures). This base knowledge is augmented by the captured data to train Bayesian probabilistic models with a focus on the conditional relationships between products, between products and procedures, and between products, procedures and doctors. The model is deployed to generate real-time probabilistic distributions of disposable medical products expected to be used based on the procedure and the doctor. As new products are identified and relationships between captured products become known, the models’ probabilistic distributions are updated. An ongoing interpretation processor running on the processing unit uses these probabilistic distributions to calculate a confidence score for each identified product, and low-confidence products are automatically flagged for manual review.
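By way of non-limiting illustration, the sketch below applies a simple Bayesian update of the recognition score using how often the candidate product has historically been used in the current procedure; the smoothing and the specific update rule are assumptions of the sketch and only one of many possible formulations.

def bayesian_confidence(recognition_score, historical_uses, procedure_cases, smoothing=1.0):
    # P(product | recognition, procedure) is proportional to
    # P(recognition | product) * P(product used | procedure).
    likelihood = (historical_uses + smoothing) / (procedure_cases + 2 * smoothing)
    evidence = recognition_score * likelihood + (1 - recognition_score) * (1 - likelihood)
    return recognition_score * likelihood / evidence

# Example: a 0.7 recognition score for a product seen in 40 of 50 cases of the same
# procedure yields a confidence close to 0.9, whereas a rarely used product would be
# flagged with a lower confidence for manual review.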
[116] In some embodiments in which both the capture device 14 and the server 16 failed in identifying the disposable medical product, the image of the captured object is provided for display for manual identification.
[117] In embodiments in which the system 10 comprises the server 16, it will be appreciated that the server 16 can be implemented as a conventional computer server and may comprise at least some of the features of the computing device 200 shown in Figure 8. In a non-limiting example of one or more implementations of the present technology, the server 16 is implemented as a server running an operating system (OS). Needless to say, the server 16 may be implemented in any suitable hardware and/or
software and/or firmware or a combination thereof. In the disclosed non-limiting implementation of the present technology, the server 16 is a single server. In one or more alternative non-limiting implementations of the present technology, the functionality of the server 16 may be distributed and may be implemented via multiple servers (not shown).
[118] The implementation of the server 16 is well known to the person skilled in the art. However, the server 16 comprises a communication interface (not shown) configured to communicate with various entities (such as a database, for example and other devices potentially coupled to a communication network) via a communication network. The server 16 further comprises at least one computer processor (e.g., the processing device of the computing device 200) operationally connected with the communication interface and structured and configured to execute various processes to be described herein.
[119] In some embodiments, the capture device 14 is provided with a plurality of cameras each located for monitoring a different and respective area of interest, or the same area of interest. For example, the capture device 14 may comprise three cameras, e.g., a facing-up camera, a facing-down camera and a camera angled relative to the horizontal plane.
[120] In this case, additional image stream processing may be added to support dynamic changes in camera position between two successive package recognition operations. The additional processing may allow a product to be better detected and recognized by eliminating background features from an image in order to more reliably identify when an object enters the camera field of view, i.e., the area of interest.
[121] In some embodiments, a substantially continuous recalibration is performed to support changes in the position of the camera such as when a background change is detected, a recalibration to the new background is performed.
[122] In some embodiments, the structural similarity index method may be used to determine the similarity between a series of time-stamped frames of the received video in order to establish common feature points between the frames that can be removed. Once the feature points are removed from the frames, the disposable medical product may be identified. Such an embodiment may reduce the risk of false positives.
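A minimal sketch of such an SSIM-based masking step is shown below, assuming the scikit-image and OpenCV libraries; the similarity cutoff is an arbitrary example.

import cv2
from skimage.metrics import structural_similarity

def foreground_mask(frame_a, frame_b, similarity_cutoff=0.9):
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    score, ssim_map = structural_similarity(gray_a, gray_b, full=True)
    # Regions that stay highly similar across time-stamped frames are treated as background
    # and removed; only the remaining (changed) regions are passed to product recognition.
    mask = (ssim_map < similarity_cutoff).astype("uint8") * 255
    return score, mask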
[123] In some embodiments, at least two different recognition techniques are performed concurrently or in parallel to identify the medical disposable product. For example, the three following recognition methods may be performed in parallel.
[124] In some embodiments, the first recognition technique consists in a machine learning model trained to match specific extracted metadata, which processes the extracted metadata to identify the medical disposable product based on the relevance of the text snippets in the context of medical packaging.
[125] In some embodiments, the second recognition technique consists in a vector search match against reference document embeddings of full reference text from packaging label. The process may comprise the following steps: embedding generation, indexing, query embedding, nearest neighbor search and result ranking.
[126] Embedding Generation: each document is converted into an embedding, i.e., a numeric vector representation, by using a machine learning model. An embedding is an array of floating point numbers that represents a document in a numeric form. These embeddings are generated by models such as Sentence-BERT or CLIP.
[127] Indexing: the generated embeddings are stored in an index to efficiently retrieve and compare them during searches.
[128] Query Embedding: a search query for an inference image is converted into an embedding using the same machine learning model used for the embedding generation.
[129] Nearest Neighbor Search: the query embedding is compared to document embeddings using metrics such as cosine similarity or Euclidean distance to find the closest matches.
[130] Result Ranking: the closest matches are ranked and returned as search results.
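The steps above may be realised, for example, as in the following sketch using the sentence-transformers library and a brute-force cosine-similarity search; the model name and the absence of a dedicated vector index (such as FAISS) are simplifying assumptions of the sketch.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # embedding generation model (example choice)

def build_index(reference_label_texts):
    # Indexing: embeddings of the full reference packaging texts.
    return model.encode(reference_label_texts, normalize_embeddings=True)

def search(query_text, index, top_k=5):
    # Query embedding, nearest neighbor search and result ranking.
    query = model.encode([query_text], normalize_embeddings=True)[0]
    similarities = index @ query  # cosine similarity, since embeddings are normalised
    order = np.argsort(similarities)[::-1][:top_k]
    return [(int(i), float(similarities[i])) for i in order]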
[131] In some embodiments, the third recognition technique consists in a reverse image search computer vision object recognition against trained models from reference product images. In some embodiments, the training involves a convolutional neural network (CNN) with supervised learning against product packaging images data sets.
This may comprise the following steps: dataset preparation, model training and model validation and testing.
[132] Dataset preparation: a large dataset of images is collected and the objects of interest are annotated with bounding boxes. Transformations such as rotation, scaling, and flipping may be applied to increase the variety of training data.
[133] Model Training: during this process, the model learns to make accurate predictions by iteratively adjusting its parameters based on the difference between its predictions and the actual target values, using optimization techniques to minimize the error.
[134] Model Validation and Testing: this process comprises the three following steps:
[135] Validation Dataset: the model is evaluated on a separate validation set during training to monitor performance and prevent overfitting.
[136] Model Evaluation Metrics: precision, recall, F1 score, and mAP are calculated on the validation set.
[137] Testing Dataset: after training, the final model is evaluated on a test set to assess its generalization performance.
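By way of non-limiting illustration, the sketch below fine-tunes a pre-trained convolutional network on reference packaging images, as one possible realisation of the training described above; the choice of ResNet-18 and the PyTorch/torchvision libraries are assumptions of the sketch.

import torch.nn as nn
from torchvision import models

def build_product_classifier(num_products):
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained backbone
    model.fc = nn.Linear(model.fc.in_features, num_products)          # new classification head
    return model

def train_one_epoch(model, loader, optimizer, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:  # loader yields augmented packaging images and product labels
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()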
[138] The results from the three different recognition techniques are combined together with the context of where the images have been captured (e.g., the medical procedure being performed, the room ID, the medical practitioner ID, and/or the like) and a prediction model trained with data associated with the context determines the probability that the captured object is a given medical disposable product. Potential matches are ranked and outputted with an associated confidence score.
[139] Referring to Figure 8, there is shown a computing device 200 suitable for use with some implementations of the present technology, the computing device 200 comprising various hardware components including one or more single or multi-core processors collectively represented by processor 210, a graphics processing unit (GPU) 211, a solid-state drive 220, a random-access memory 230, a display interface 240, and an input/output interface 250. For example, the computing device 200 may be used as
a capture device when comprising at least one camera. In another example, the computing device 200 may be used as a server 16.
[140] Communication between the various components of the computing device 200 may be enabled by one or more internal and/or external buses 260 (e.g., a PCI bus, universal serial bus, IEEE 1394 “Firewire” bus, SCSI bus, Serial-ATA bus, etc.), to which the various hardware components are electronically coupled.
[141] The input/output interface 250 may be coupled to a touchscreen 290 and/or to the one or more internal and/or external buses 260. The touchscreen 290 may be part of the display. In one or more implementations, the touchscreen 290 is the display. The touchscreen 290 may equally be referred to as a screen 290. In the implementations illustrated in Figure 8, the touchscreen 290 comprises touch hardware 294 (e.g., pressure-sensitive cells embedded in a layer of a display allowing detection of a physical interaction between a user and the display) and a touch input/output controller 292 allowing communication with the display interface 240 and/or the one or more internal and/or external buses 260. In one or more implementations, the input/output interface 250 may be connected to a keyboard (not shown), a mouse (not shown) or a trackpad (not shown) allowing the user to interact with the computing device 200 in addition to or in replacement of the touchscreen 290.
[142] According to implementations of the present technology, the solid-state drive 220 stores program instructions suitable for being loaded into the random-access memory 230 and executed by the processor 210 and/or the GPU 211 for [INSERT]. For example, the program instructions may be part of a library or an application.
[143] The computing device 200 may be implemented as a server, a desktop computer, a laptop computer, a tablet, a smartphone, a personal digital assistant or any device that may be configured to implement the present technology, as may be understood by a person skilled in the art.
[144] Modifications and improvements to the above-described implementations of the present technology may become apparent to those skilled in the art. The foregoing description is intended to be exemplary rather than limiting.
Claims
1. A computer-implemented method for identifying a medical disposable product, the method comprising:
receiving from a camera a video stream of an area of interest;
detecting a presence of an object within the area of interest;
determining that an image capture procedure is to be performed;
providing for display a user interface for guiding a user to position the object at a desired position relative to the camera;
when the object is at the desired position, capturing an image of the object;
extracting packaging data from the image of the object;
identifying the object as being a given medical disposable product based on the packaging data; and
outputting the identification of the given medical disposable product.
2. The computer-implemented method of claim 1, further comprising correcting the video stream.
3. The computer-implemented method of claim 2, wherein said correcting the video stream comprises: measuring a luminance of a space surrounding the camera; comparing the measured luminance to a luminance threshold; and when the measured luminance is below the luminance threshold, performing said correcting the video stream.
4. The computer-implemented method of claim 2 or 3, wherein said correcting the video stream comprises adjusting at least one of a frame rate, a resolution, an aspect ratio and a contrast.
5. The computer-implemented method of claim 2 or 3, wherein said correcting the video stream comprises applying a white balance method to the video stream.
6. The computer-implemented method of any one of claims 1 to 5, wherein said detecting the presence of the object is performed by detecting differences between successive frames of the video stream.
7. The computer-implemented method of any one of claims 1 to 6, wherein said determining that the image capture procedure is to be performed comprises determining that the object is moving closer to the camera and towards a center of the camera.
8. The computer-implemented method of claim 7, wherein said determining that the image capture procedure is to be performed further comprises determining that a speed of motion of the object decreases.
9. The computer-implemented method of any one of claims 1 to 8, further comprising providing the user with a feedback once said capturing the image of the object is performed.
10. The computer-implemented method of any one of claims 1 to 9, wherein said extracting the packaging data is performed using at least one of optical character recognition, a machine learning model configured for extracting text, text position and text size, and a computer vision model.
11. A system for identifying a medical disposable product, the system comprising:
a processor;
a non-transitory storage medium operatively connected to the processor, the non-transitory storage medium comprising computer-readable instructions;
the processor, upon executing the instructions, being configured to:
receiving from a camera a video stream of an area of interest;
detecting a presence of an object within the area of interest;
determining that an image capture procedure is to be performed;
providing for display a user interface for guiding a user to position the object at a desired position relative to the camera;
when the object is at the desired position, capturing an image of the object;
extracting packaging data from the image of the object;
identifying the object as being a given medical disposable product based on the packaging data; and
outputting the identification of the given medical disposable product.
12. The system of claim 11, further comprising correcting the video stream.
13. The system of claim 12, wherein said correcting the video stream comprises: measuring a luminance of a space surrounding the camera; comparing the measured luminance to a luminance threshold; and when the measured luminance is below the luminance threshold, performing said correcting the video stream.
14. The system of claim 12 or 13, wherein said correcting the video stream comprises adjusting at least one of a frame rate, a resolution, an aspect ratio and a contrast.
15. The system of claim 12 or 13, wherein said correcting the video stream comprises applying a white balance method to the video stream.
16. The system of any one of claims 11 to 15, wherein said detecting the presence of the object is performed by detecting differences between successive frames of the video stream.
17. The system of any one of claims 11 to 16, wherein said determining that the image capture procedure is to be performed comprises determining that the object is moving closer to the camera and towards a center of the camera.
18. The system of claim 17, wherein said determining that the image capture procedure is to be performed further comprises determining that a speed of motion of the object decreases.
19. The system of any one of claims 11 to 18, further comprising providing the user with a feedback once said capturing the image of the object is performed.
20. The system of any one of claims 11 to 19, wherein said extracting the packaging data is performed using at least one of optical character recognition, a machine learning model configured for extracting text, text position and text size, and a computer vision model.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363517454P | 2023-08-03 | 2023-08-03 | |
US63/517,454 | 2023-08-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2025027583A1 (en) | 2025-02-06 |
Family
ID=94394623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2024/057513 WO2025027583A1 (en) | 2023-08-03 | 2024-08-02 | Method and system for identifying a medical disposable product |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2025027583A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5770841A (en) * | 1995-09-29 | 1998-06-23 | United Parcel Service Of America, Inc. | System and method for reading package information |
US20200134543A1 (en) * | 2018-10-30 | 2020-04-30 | Ge Healthcare Bio-Sciences Corp. | Sterile product inventory and information control |
US11100633B2 (en) * | 2018-06-13 | 2021-08-24 | Cosmo Artificial Intelligence—Al Limited | Systems and methods for processing real-time video from a medical image device and detecting objects in the video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24848479; Country of ref document: EP; Kind code of ref document: A1 |