FR3158825A1

FR3158825A1 - Process for classifying objects, particularly shoes, with a view to sorting and then recycling them.

Info

Publication number: FR3158825A1
Application number: FR2400826A
Authority: FR
Inventors: Joel DIEBE
Original assignee: Cetia; Eram Interservices; Decathlon SE
Current assignee: Cetia; Eram Interservices; Decathlon SE
Priority date: 2024-01-29
Filing date: 2024-01-29
Publication date: 2025-08-01
Also published as: WO2025163267A1

Abstract

The present invention relates to a method for classifying a product in a system (1), with a view to sorting it, the method being characterized in that it comprises the implementation, by data processing means (21a) of a first server (2a) connected to said system (1), of steps of: obtaining at least a first image of said product and a second image potentially representing said product, respectively from a first viewpoint and from a second viewpoint different from the first viewpoint, from cameras (11, 12) of said system (1); detecting said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image; when said product is detected, obtaining a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model respectively applied to the first image and the second image; classifying a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, by means of a classification model. Fig. 1.

Description

Process for classifying objects, particularly shoes, with a view to sorting and then recycling them.

GENERAL TECHNICAL FIELD

The present invention relates to the field of recycling. More specifically, it relates to a method for classifying objects, in particular shoes, with a view to sorting them and then possibly recycling them.

STATE OF THE ART

Recycling products requires knowing their composition and how they are assembled, so that they can be sent to the correct processing line.

Techniques for classifying objects moving on a conveyor belt are known, in particular techniques using neural networks; see patent US10824936.

While such techniques work very well for classifying waste (for example, distinguishing paper, plastic and aluminum), classification is much harder for products such as shoes, which come in a very large number of models, some of them almost identical in appearance but completely different in composition.

It has therefore been proposed to use RFID tags embedded in the shoes instead, but it has been found that:

Often only one shoe of the pair carries the RFID tag, in particular to avoid interference;
The tag is sometimes destroyed if the shoe is too worn.

Today, therefore, there is no choice but to rely on human operators, which is not acceptable.

The invention improves the situation.

PRESENTATION OF THE INVENTION

The present invention therefore relates, according to a first aspect, to a method for classifying a product in a system, with a view to sorting it, the method being characterized in that it comprises the implementation, by data processing means of a first server connected to said system, of steps of:

(a) Obtaining at least a first image of said product and a second image potentially representing said product, respectively from a first viewpoint and from a second viewpoint different from the first viewpoint, from cameras of said system;
(b) Detecting said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image;
(c) When said product is detected, obtaining a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model respectively applied to the first image and the second image;
(d) Classifying a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, by means of a classification model.
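As an illustration only, the steps listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function `classify_product` and its parameters `detect`, `extract_first`, `extract_second` and `classify` are hypothetical names, and the models are passed in as plain callables so that any detector, feature extractor or classifier can be plugged in.

```python
from typing import Any, Callable, List, Optional

Vector = List[float]


def classify_product(
    first_image: Any,
    second_image: Any,
    detect: Callable[[Any, Any], bool],        # hypothetical: joint detection on both views
    extract_first: Callable[[Any], Vector],    # hypothetical: feature extractor for view 1
    extract_second: Callable[[Any], Vector],   # hypothetical: feature extractor for view 2
    classify: Callable[[Vector], int],         # hypothetical: classifier of the long vector
) -> Optional[int]:
    """Return a class index, or None when the product is not detected."""
    # Detection step: decide from both images whether the product is present.
    if not detect(first_image, second_image):
        return None
    # Feature extraction step: one descriptive vector per viewpoint.
    first_vector = extract_first(first_image)
    second_vector = extract_second(second_image)
    # Classification step: classify the concatenation of the two vectors.
    long_vector = list(first_vector) + list(second_vector)
    return classify(long_vector)


# Usage with stand-in callables (no real models involved).
label = classify_product(
    "top_view", "side_view",
    detect=lambda a, b: True,
    extract_first=lambda img: [0.1, 0.9],
    extract_second=lambda img: [0.4],
    classify=lambda vec: len(vec),   # dummy classifier: returns the vector length
)
```

Passing the models as arguments keeps the sketch independent of any particular library; the detection, extraction and classification models discussed in the rest of the description would take the place of the stand-in callables.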

According to advantageous and non-limiting characteristics:

Step (b) comprises calculating a first detection score by applying the first object detection model to the first image, and calculating a second detection score by applying the second object detection model to the second image.

Said product is detected in step (b) if the first detection score multiplied by the second detection score is greater than a predetermined threshold.
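The decision rule of the preceding paragraph can be illustrated by the following minimal Python sketch. The function name `is_product_detected` and the threshold value used in the example are assumptions made for illustration; the text only states that the threshold is predetermined.

```python
def is_product_detected(first_score: float, second_score: float, threshold: float) -> bool:
    """Detect the product when the two scores multiplied together exceed the threshold."""
    for score in (first_score, second_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("detection scores are expected to lie between 0 and 1")
    return first_score * second_score > threshold


# Hypothetical threshold of 0.9 on the combined score.
both_confident = is_product_detected(0.99, 0.97, threshold=0.9)   # 0.9603 > 0.9
one_view_weak = is_product_detected(0.99, 0.50, threshold=0.9)    # 0.495 <= 0.9
```

Because each score lies between 0 and 1, the multiplied value can never exceed the lower of the two scores, so a weak detection from either viewpoint is enough to reject the pair of images.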

The first and second object detection models are object localization models, step (b) further comprising localizing said product in the first and second images and cropping the first and second images to the localized product, step (c) being implemented on the cropped first and second images.

Step (b) comprises determining a first bounding box of the product by applying the first object localization model to the first image, and determining a second bounding box of the product by applying the second object localization model to the second image, the product being detected in step (b) only if the first and/or the second bounding box of the product satisfies at least one given criterion.
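The text does not specify which criterion the bounding box must satisfy. Purely as an illustration, the Python sketch below checks two hypothetical criteria: the box must cover a minimum share of the image, and it must stay away from the image border (suggesting that the product is fully in view). The names `Box` and `box_is_acceptable` and the default values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (x_max and y_max exclusive)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def box_is_acceptable(
    box: Box,
    image_width: int,
    image_height: int,
    min_area_ratio: float = 0.05,   # hypothetical: box must cover at least 5% of the image
    margin: int = 2,                # hypothetical: box must stay 2 px away from the border
) -> bool:
    width = box.x_max - box.x_min
    height = box.y_max - box.y_min
    if width <= 0 or height <= 0:
        return False
    area_ratio = (width * height) / (image_width * image_height)
    inside = (
        box.x_min >= margin
        and box.y_min >= margin
        and box.x_max <= image_width - margin
        and box.y_max <= image_height - margin
    )
    return area_ratio >= min_area_ratio and inside
```

The caller can then require the criterion of the first box, of the second box, or of both, which corresponds to the "and/or" wording above.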

The first and second object localization models are convolutional neural networks, particularly of the YOLO type.

The method comprises a step (a0) of training said first and second object detection models, first and second feature extraction models, and/or classification model, on first and second reference image bases representing a plurality of instances of the product respectively from said first and second viewpoints.

Step (a0) comprises generating reference images representing a plurality of instances of the product respectively from said first and second viewpoints, by inserting said instances of the product into empty images taken from said first and second viewpoints.
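Inserting a product instance into an empty image can be illustrated by the following minimal Python sketch. It is an assumption-laden toy example: images are plain nested lists of grayscale values, the product cutout comes with a boolean mask, and the function name `make_reference_image` is hypothetical. A real pipeline would typically rely on an image library.

```python
from typing import List, Tuple

Image = List[List[int]]    # grayscale image as rows of pixel values
Mask = List[List[bool]]    # True where the cutout pixel belongs to the product


def make_reference_image(
    empty_image: Image, cutout: Image, mask: Mask, top: int, left: int
) -> Tuple[Image, Tuple[int, int, int, int]]:
    """Paste `cutout` into a copy of `empty_image`; return the image and the pasted box."""
    cut_h, cut_w = len(cutout), len(cutout[0])
    img_h, img_w = len(empty_image), len(empty_image[0])
    if top < 0 or left < 0 or top + cut_h > img_h or left + cut_w > img_w:
        raise ValueError("cutout does not fit inside the empty image")
    result = [row[:] for row in empty_image]   # copy, the input stays untouched
    for dy in range(cut_h):
        for dx in range(cut_w):
            if mask[dy][dx]:                   # keep the background elsewhere
                result[top + dy][left + dx] = cutout[dy][dx]
    box = (left, top, left + cut_w, top + cut_h)   # (x_min, y_min, x_max, y_max)
    return result, box


empty = [[0, 0, 0, 0] for _ in range(3)]
shoe = [[7, 8], [9, 1]]
shoe_mask = [[True, False], [True, True]]
synthetic, pasted_box = make_reference_image(empty, shoe, shoe_mask, top=1, left=2)
```

In this sketch the position of the pasted instance is known by construction, so the returned box could serve as a localization label for the synthetic image; whether the invention uses it that way is not stated in the text.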

The first and second feature extraction models are feature extraction blocks of convolutional neural networks, particularly of the Swin Transformer type.

The classification model is a neural network, particularly of the perceptron type.
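To make the perceptron-type classifier concrete, the following Python sketch runs a forward pass of a small multilayer perceptron on the concatenated vector. The architecture (one hidden layer with ReLU, softmax output), the function names and the hand-picked weights are all assumptions for illustration; the text does not specify the layer sizes or activations.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(inputs: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    """One fully connected layer: weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)                           # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify_long_vector(
    first_vector: Sequence[float], second_vector: Sequence[float],
    hidden_w: Matrix, hidden_b: Sequence[float],
    out_w: Matrix, out_b: Sequence[float],
) -> Tuple[int, List[float]]:
    """Concatenate the two descriptive vectors and return (class index, probabilities)."""
    long_vector = list(first_vector) + list(second_vector)
    hidden = relu(dense(long_vector, hidden_w, hidden_b))
    probabilities = softmax(dense(hidden, out_w, out_b))
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, probabilities


# Toy weights chosen by hand (not trained): 4 inputs, 2 hidden units, 3 classes.
best_class, probs = classify_long_vector(
    [1.0, 0.0], [0.0, 2.0],
    hidden_w=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], hidden_b=[0.0, 0.0],
    out_w=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], out_b=[0.0, 0.0, 0.0],
)
```

With these toy weights the hidden layer outputs [1.0, 2.0] and the output scores are [1.0, 2.0, -3.0], so the second class (index 1) is selected.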

According to a second aspect, the invention relates to a method for sorting a product in a system, characterized in that it comprises the implementation of the method for classifying the product according to the first aspect, and a step (e) of sorting the product by the system according to the result of said classification.

According to a third aspect, the invention proposes a server for classifying a product in a system, with a view to sorting it, the server being characterized in that it comprises data processing means configured to:

Obtain at least a first image of said product and a second image potentially representing said product, respectively from a first viewpoint and from a second viewpoint different from the first viewpoint, from cameras of said system;
Detect said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image;
When said product is detected, obtain a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model respectively applied to the first image and the second image;
Classify a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, by means of a classification model.

According to a fourth aspect, the invention relates to an assembly comprising a server according to the third aspect and the system, connected to each other.

According to a fifth and a sixth aspect, the invention relates to a computer program product comprising code instructions for executing a method according to the first aspect for classifying a product in a system, with a view to sorting it; and to a storage means readable by computer equipment, on which is recorded a computer program product comprising code instructions for executing a method according to the first aspect for classifying a product in a system, with a view to sorting it.

PRESENTATION OF FIGURES

Other characteristics and advantages of the present invention will become apparent on reading the following description of a preferred embodiment. This description is given with reference to the appended drawings, in which:

FIG. 1 is a diagram of a system for implementing the method according to the invention;

FIG. 2a shows an example image of a product from a first viewpoint;

FIG. 2b shows an example image of a product from a second viewpoint;

FIG. 3 is a flowchart illustrating the steps of an embodiment of the method according to the invention;

FIG. 4 schematically illustrates the arrangement of the models used;

FIG. 5 shows an example of a synthetic reference image.

DETAILED DESCRIPTION

Architecture

The present invention relates to a method for classifying a product in a system 1 as shown in FIG. 1, in particular for sorting said product, notably with a view to recycling it.

Said product can be any product that is to be sorted, in particular an item of clothing; the remainder of the description takes a shoe as its example, but it can be any type of product, notably a spare part, bulk goods, waste, etc. It will be understood that the present method is capable of classifying a large number of instances (i.e. individual items) of said product, presented sequentially.

To this end, said system 1 is typically of the belt conveyor type (as seen in FIG. 1), that is to say it comprises an endless conveyor belt 10 that continuously moves instances of said product, but any other technique may be used (the system 1 may use rollers, buckets or arms, or even a cavity through which the product passes in free fall, etc.).

Classification means assigning to each product one class from a plurality of predetermined classes, each corresponding for example to a shoe model. The number of classes can be very high; there may notably be hundreds of shoe models. Alternatively, the class may relate to a material (or combination of materials), or directly to a recycling technique applicable to the product. The present method is not limited to any particular classification strategy.

Sorting means separating the instances of the product according to the classification result. To this end, said system 1 preferably comprises sorting means 13 for the product, for example diverters, gates, various actuators, or gripping arms, etc. The sorting means 13 make it possible to group together the instances of the product belonging to the same class (for example all shoes of the same model), with a view to recycling them. Note that the system 1 may itself be a recycling system and comprise means for recycling the sorted products.
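Grouping instances by class, as described above, can be illustrated in software by the short Python sketch below. It only models the bookkeeping, not the physical sorting means 13; the function name `sort_by_class`, the item identifiers and the "unclassified" bin for items without a class are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def sort_by_class(
    classified_items: Iterable[Tuple[str, Optional[str]]],
    fallback_bin: str = "unclassified",   # hypothetical bin for items with no class
) -> Dict[str, List[str]]:
    """Group item identifiers by their class label, preserving arrival order."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for item_id, class_label in classified_items:
        bins[class_label if class_label is not None else fallback_bin].append(item_id)
    return dict(bins)


bins = sort_by_class([
    ("shoe-1", "model_A"),
    ("shoe-2", "model_B"),
    ("shoe-3", "model_A"),
    ("shoe-4", None),
])
```

In this example the two shoes of class "model_A" end up in the same bin, which is the behaviour expected of the sorting means.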

The system 1 further comprises cameras 11, 12 for observing said product. Preferably, there is at least a first camera 11 for observing said product from a first viewpoint and a second camera 12 for observing said product from a second viewpoint different from the first. The idea is to observe the product from several angles so as to increase the quality of the information and make classification easier. It will be understood that, to this end, all instances of the product in the system 1 are, as far as possible, placed in substantially the same orientation, for example toe pointing forward on the belt 10 as in FIG. 1, although the method has a certain robustness and is able to recognize a shoe that has, for example, been placed upside down, notably by taking into account more than two possible viewpoints (an example with eight possible viewpoints is given below).

For example, the first viewpoint is from above the product and the second viewpoint is from one side, which gives example images such as those shown in FIGS. 2a and 2b respectively.

Note that there may be more than two viewpoints and therefore more than two cameras 11, 12. In particular, as mentioned above, up to eight viewpoints could be taken into account, with, in the case of a shoe:

Left side view;
Left ¾ view;
Front view;
Right ¾ view;
Right side view;
Top view;
Oblique top view.

The present method is implemented by a first server 2a, which may be part of the system 1 or be remote and connected to it via a network 20 such as the Internet. Note that a single first server 2a may be connected to several systems 1. Advantageously, there is a second server 2b (a training device, as will be seen), typically remote (i.e. in the network 20), but which may be one and the same as the first server 2a.

Each server 2a, 2b has data processing means 21a, 21b (typically a processor) and data storage means 22a, 22b (a memory, for example a hard disk). As will be seen, the data storage means 22b of the second server 2b can store a training database. For simplicity, the training database is called the “training base” in the remainder of this description.

Method

With reference to FIG. 3, the present method is implemented by the data processing means 21a of the first server 2a, and begins with a step (a) of obtaining (from the system 1) at least a first image of said product and a second image potentially representing said product, respectively from the first viewpoint and from the second viewpoint different from the first, from the cameras 11, 12 of said system 1.

In other words, step (a) advantageously comprises the acquisition of the first image of the product by the first camera 11 and of the second image of the product by the second camera 12, and the transmission of the pair formed by the first image and the second image to the server 2a. These images are called “candidate” images, as opposed to the “reference” images that will be used for training the models.

The method then comprises a step (b) of detecting said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image. Detection means identifying the product in said images, i.e. recognizing that the images do indeed represent an occurrence of the product (and not some other object).

According to a first embodiment, steps (a) and (b) are repeated in a loop, in particular at a given frequency (notably corresponding to an acquisition frequency of the cameras 11 and 12, for example every 100 ms). The idea is that the cameras 11, 12 continuously acquire pairs of images (i.e. they film) and the means 21a attempt to detect the product in them. The rest of the method is implemented only when the product is detected, since there is no point in classifying images that are known not to represent the product. This embodiment is particularly suited to a belt conveyor on which the product instances move continuously and are therefore in the right place in front of the cameras 11, 12 only at a precise moment.
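The looped embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `grab_pair` stands for whatever call returns one pair of images from the two cameras, `detect` for the joint detection of step (b), and `on_detected` for the rest of the method; all three names, like `acquire_pairs` and `detection_loop`, are hypothetical.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Tuple

Pair = Tuple[Any, Any]


def acquire_pairs(grab_pair: Callable[[], Pair], period_s: float, max_pairs: int) -> Iterator[Pair]:
    """Yield up to `max_pairs` image pairs, spaced by at least `period_s` seconds."""
    for _ in range(max_pairs):
        started = time.monotonic()
        yield grab_pair()
        # Time spent processing the pair counts toward the period.
        remaining = period_s - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)


def detection_loop(
    pairs: Iterable[Pair],
    detect: Callable[[Any, Any], bool],
    on_detected: Callable[[Any, Any], None],
) -> int:
    """Run detection on every pair; call `on_detected` only when the product is found."""
    handled = 0
    for first_image, second_image in pairs:
        if detect(first_image, second_image):
            on_detected(first_image, second_image)
            handled += 1
    return handled


# Usage with stand-in data: only the middle pair "contains" the product.
frames = iter([("a1", "a2"), ("b1", "b2"), ("c1", "c2")])
classified = []
count = detection_loop(
    acquire_pairs(lambda: next(frames), period_s=0.0, max_pairs=3),
    detect=lambda first, second: first.startswith("b"),
    on_detected=lambda first, second: classified.append((first, second)),
)
```

With an acquisition every 100 ms as in the example above, `period_s` would be set to 0.1; the usage example uses 0.0 only so that it runs instantly.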

Selon un deuxième mode de réalisation, les étapes (a) et (b) sont mises en œuvre après que le produit ait été mis en place à une position donnée, par exemple un fonctionnement discontinu avec des bras attrapant les instances du produit une à une. On sait alors que la paire d’images obtenue à l’étape (a) devrait représenter le produit, l’objectif de l’étape (b) est de le confirmer et de potentiellement localiser le produit.According to a second embodiment, steps (a) and (b) are implemented after the product has been placed in a given position, for example a discontinuous operation with arms grabbing the product instances one by one. It is then known that the pair of images obtained in step (a) should represent the product, the objective of step (b) is to confirm this and potentially locate the product.

Dans tous les cas, les premier et deuxième modèles de détection d’objet peuvent être des modèles de localisation d’objet, en particulier des réseaux de neurones à convolution, par exemple de type YOLO (mais également R-CNN, Mobilenet, etc.), entraînés sur des bases d’images de référence représentant une pluralité d’instances du produit respectivement selon lesdits premier et deuxième points de vue (et avantageusement d’autres points de vue comme évoqué avant, on discutera plus en détails plus loin l’apprentissage). L’homme du métier connait de nombreux réseaux de détection aptes à reconnaître un produit, et les deux modèles peuvent être différents ou identiques.In all cases, the first and second object detection models may be object localization models, in particular convolutional neural networks, for example of the YOLO type (but also R-CNN, Mobilenet, etc.), trained on reference image bases representing a plurality of instances of the product respectively according to said first and second points of view (and advantageously other points of view as mentioned before, learning will be discussed in more detail later). Those skilled in the art know numerous detection networks capable of recognizing a product, and the two models may be different or identical.

As shown in FIG. 4, step (b) then advantageously further comprises locating said product in the first and second images and cropping the first and second images to the located product, the remainder of the method (step (c)) being carried out on the cropped first and second images so as to “standardize” them and facilitate their classification. Further preprocessing, such as cutting the product out of its background, may also be applied.
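As an illustration of the cropping described above, the following minimal sketch crops an image to a bounding box. It assumes the image is a plain list of pixel rows and that the box is given as (x1, y1, x2, y2) in pixels; the function name `crop_to_box` is hypothetical and not part of the described method.

```python
def crop_to_box(image, box):
    """Return the sub-image covered by box = (x1, y1, x2, y2), in pixels.

    Assumption: `image` is a list of rows, each row a list of pixel values.
    """
    x1, y1, x2, y2 = box
    height = len(image)
    width = len(image[0]) if height else 0
    # Clamp the box to the image so a slightly oversized box cannot fail.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


# Toy 4x6 image whose pixel value encodes its (row, column) position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
cropped = crop_to_box(image, (1, 1, 4, 3))
print(cropped)
```

In a real pipeline the same slicing would typically be applied to an array-based image; the list-of-rows form is used here only to keep the sketch dependency-free.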

Note that said detection is in itself already a classification, since what the images represent is classified as being an occurrence of the product (i.e. a shoe), but at a coarser granularity than the classification sought here. For example, YOLO is capable of recognizing an object as being a shoe, but not of distinguishing hundreds of different shoe models, which can be very similar. As will be seen, the present method achieves this.

Note that step (b) is specific in that there are two images and therefore two detection results. Typically, step (b) comprises computing a first detection score by applying the first object detection model to the first image, and computing a second detection score by applying the second object detection model to the second image. Each detection score is generally a value between 0 and 1 representative of the detection probability, and a detection threshold is often used (for example 95%).

Thus, there may be cases in which only one of the two detection models detects the product (for example, the first score is above the threshold and the second score is below it).

This can be handled in many different ways:

According to a first, conventional mode, the product is considered detected in step (b) by unanimity, i.e. if the product is detected in each image (each model returns a detection score higher than the individual threshold, for example 95%).
According to a second mode, the product is detected in step (b) if the product of the first and second detection scores is greater than a predetermined threshold, for example 90%. This avoids discarding cases where just one of the two models returns a score slightly below the individual threshold. Note that both modes can be implemented at the same time, i.e. the product is detected in step (b) in either case (i.e. if each of the first and second detection scores is greater than an individual threshold, or if the product of the first and second detection scores is greater than a common threshold).
According to a third mode, the product is detected in step (b) if one of the first and second detection scores is greater than a low threshold (for example 92%) and the other is greater than a high threshold (for example 97%). This is an “intermediate” mode between the two previous modes in terms of strictness. Note that it can also be implemented at the same time as either of these modes.
According to a fourth mode, potentially in combination with any of the preceding modes, when the detection models are localization models, step (b) comprises determining a first bounding box of the product by applying the first object detection model to the first image, and determining a second bounding box of the product by applying the second object detection model to the second image, and the product is detected in step (b) only if the first and/or the second bounding box satisfies at least one criterion, such as a minimum size or a sufficiently centered position, in order to avoid truncated detections that would penalize the subsequent classification. In other words, even if the product is perfectly detected, the method waits until the bounding box is of sufficient quality before launching the remainder of the method.
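The first three decision modes can be sketched as simple predicates on the two detection scores. This is only an illustrative sketch: the function names are hypothetical, and the default thresholds are the example values given in the text (95%, 90%, 92% and 97%), expressed as fractions.

```python
def detected_unanimous(p1, p2, individual=0.95):
    """First mode: each score exceeds the individual threshold."""
    return p1 > individual and p2 > individual


def detected_by_product(p1, p2, common=0.90):
    """Second mode: the product of the two scores exceeds a common threshold."""
    return p1 * p2 > common


def detected_low_high(p1, p2, low=0.92, high=0.97):
    """Third mode: one score exceeds the low threshold, the other the high one."""
    return min(p1, p2) > low and max(p1, p2) > high


def detected_first_or_second(p1, p2):
    """First and second modes used together: detection in either case."""
    return detected_unanimous(p1, p2) or detected_by_product(p1, p2)


# One score slightly below the individual threshold: rejected by the first
# mode, accepted by the second (0.96 * 0.94 = 0.9024 > 0.90).
print(detected_unanimous(0.96, 0.94), detected_by_product(0.96, 0.94))
```

With the example thresholds, the pair (0.98, 0.93) is rejected by the first mode but accepted by the third, whereas (0.96, 0.96) is accepted by the first mode and rejected by the third.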

In the preferred embodiment based on YOLO, the outputs of the two detection (localization) models are denoted:

[x11, y11, x21, y21, p1, c1] (first image)

[x12, y12, x22, y22, p2, c2] (second image)

With:

(x1i, y1i) the coordinates (in pixels) of the upper-left corner of the bounding box
(x2i, y2i) the coordinates (in pixels) of the lower-right corner of the bounding box (with 0 ≤ x1i ≤ x2i ≤ width of image i, and 0 ≤ y1i ≤ y2i ≤ height of image i).
pi the detection score.
ci the class of the detected object (potentially always 0 in the present case, since only one “shoe” class is expected).

If the third mode and the fourth mode are used in combination, the trigger condition for classification (i.e. for carrying out the remainder of the method) would be formulated as:

If ((p1 > low threshold and p2 > high threshold) OR (p1 > high threshold and p2 > low threshold)) AND (x1i > min position 1 and/or x2i > min position 2, for example the middle of the image, or the equivalent on y1i and/or y2i).
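The combined trigger condition can be sketched as follows. This is one possible reading, under stated assumptions: each detection is a list [x1, y1, x2, y2, p, c] as defined above, the position criterion is taken to be "the right edge x2 of the box lies beyond a minimum position" (here the middle of a hypothetical 640-pixel-wide image), and the criterion is required for both boxes. The names `scores_ok`, `box_ok` and `should_trigger` are illustrative only.

```python
def scores_ok(p1, p2, low=0.92, high=0.97):
    """Third mode: one score above the low threshold, the other above the high one."""
    return (p1 > low and p2 > high) or (p1 > high and p2 > low)


def box_ok(detection, min_x2):
    """Fourth mode (assumed criterion): right edge of the box beyond min_x2."""
    _x1, _y1, x2, _y2, _p, _c = detection
    return x2 > min_x2


def should_trigger(det1, det2, low=0.92, high=0.97, min_x2=320):
    """Return True if the remainder of the method (step (c)) should be launched."""
    return (
        scores_ok(det1[4], det2[4], low, high)
        and box_ok(det1, min_x2)
        and box_ok(det2, min_x2)
    )


first = [100, 50, 400, 300, 0.98, 0]   # first image
second = [120, 60, 380, 310, 0.93, 0]  # second image
print(should_trigger(first, second))
```

The score test is grouped before the position test, so that a well-scored but truncated detection (for example a box ending at x2 = 200) does not trigger classification.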

Then, when said product is detected, the method comprises a step (c) of obtaining a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model applied respectively to the first image and the second image (cropped, where applicable).

Preferably, said first and second feature extraction models are feature extraction blocks of neural networks performing vision tasks (in particular classification, localization, segmentation, etc.), in particular of the Swin Transformer type (but also VGG, or again YOLO, etc.).

Indeed, all these networks first have a feature extraction block, then a final encoding block that performs the expected task from the extracted feature vector. For example, the encoding block of YOLO generates the vector [x1, y1, x2, y2, p, c] from the feature vector.

This feature vector, or “feature map”, is a high-level representation of the input image. Note that the feature extraction block may in practice generate a feature matrix, which is then converted into a vector by placing its rows one after another.
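The conversion of a feature matrix into a vector by placing its rows one after another (row-major flattening) can be sketched as below. The function name `flatten_rows` and the toy values are illustrative assumptions.

```python
def flatten_rows(matrix):
    """Turn a feature matrix into a vector by placing its rows end to end."""
    return [value for row in matrix for value in row]


feature_matrix = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]
feature_vector = flatten_rows(feature_matrix)
print(feature_vector)
```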

How the first and second feature extraction models can be obtained is described below; again, they may be identical or different, and may potentially be taken directly off the shelf.

Triggering step (c) only upon detection saves resources, because the feature extraction models are much heavier and slower to run if sufficient reliability is desired.

Finally, in a particularly original step (d), as shown in FIG. 4, the method comprises classifying a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, by means of a classification model.

Indeed, each image could be classified directly (with the results of the two classifications combined in one way or another), but as explained, the reliability is insufficient for fine-grained classification, for example at the level of shoe models. Concatenating the two vectors combines the information from the two images so that a single classifier sees both views at once, which makes it possible to reach a sufficient level of reliability.

Said classification model may be any model capable of classifying a vector (since feature extraction has already been carried out), in particular any feedforward neural network, or even a simple perceptron, and not necessarily a convolutional neural network for vision.
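Step (d) can be illustrated by the minimal sketch below: the two descriptive vectors are concatenated into a long vector, which is then scored by a single linear layer (a simple perceptron-style classifier). The weights, biases and vector values are made-up toy numbers, and the function names are hypothetical; a real system would use trained parameters and much longer vectors.

```python
def concatenate(first_vector, second_vector):
    """Build the long vector from the two descriptive vectors."""
    return list(first_vector) + list(second_vector)


def classify(long_vector, weights, biases):
    """Score each class with one linear layer and return (best class, scores)."""
    scores = [
        sum(w * x for w, x in zip(row, long_vector)) + b
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


v1 = [1.0, 0.0]  # descriptive vector from the first image (toy values)
v2 = [0.5, 2.0]  # descriptive vector from the second image (toy values)
long_vector = concatenate(v1, v2)

# One weight row per class; the row length equals the long vector length.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
biases = [0.0, 0.0, 0.0]
predicted_class, class_scores = classify(long_vector, weights, biases)
print(predicted_class, class_scores)
```

Because each weight row spans both halves of the long vector, a class score can depend jointly on features from the two points of view, which is not possible when each image is classified separately.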

Training

The method advantageously comprises a prior step (a0) of training said first and second object detection models, first and second feature extraction models, and/or classification model on first and second reference image databases representing a plurality of instances of the product from said first and second points of view respectively, this step being implemented in particular by the processing means 21b of the second server 2b, the trained model then being loaded onto the first server 2a. It is recalled that the second server 2b may be one and the same as the first server 2a.

These reference images are grouped into sets of images representing the same instance of the product from various points of view (at least one pair from the first and second points of view) and are already classified, that is to say they are known to represent the product and are associated with an expected classification result (ground truth), which is naturally the same for all the images of a given set.

In particular:

the first and second object detection models may be pre-trained (or fully trained) on public databases. Alternatively or in addition, they are trained (this is fine-tuning if the models are already pre-trained) on the reference images representing a plurality of instances of the product from said first and second points of view respectively. The first and second models may be identical and trained on all the reference images (including from additional points of view, up to eight points of view) to improve their robustness, or trained more specifically on the reference images corresponding to the associated point of view (i.e. the first point of view for the first detection model and the second point of view for the second detection model) to improve their accuracy.
Regarding the first and second feature extraction models, it is recalled that each is typically the feature extraction block of a complete model (further comprising a final encoding block). It is these complete models that are trained in practice, in a manner similar to the detection models: they may also be pre-trained (or fully trained) on public databases (for any task such as classification, detection, segmentation, etc.). Alternatively or in addition, they are trained (this is fine-tuning if the models are already pre-trained) on the reference images representing a plurality of instances of the product from said first and second points of view respectively, this time specifically on the classification task. Again, the two models may be identical and trained on all the reference images (including from additional points of view, up to eight points of view) to improve their robustness, or trained more specifically on the reference images corresponding to the associated point of view (i.e. the first point of view for the first feature extraction model and the second point of view for the second feature extraction model) to improve their accuracy. The feature extraction models are finally recovered by removing the encoding block from the trained complete model.
the classification model requires the reference image database: preferably, the weights of the feature extraction models are frozen; for each pair or set of reference images (representing the same instance of the product from the first and second points of view, and again potentially from other points of view), the corresponding long vector is constructed, and the classification model is trained to predict the expected classification result for these reference images.
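The last training stage (frozen feature extractors, classifier trained on long vectors) can be sketched as follows. This is a deliberately small stand-in, not the training procedure of the source: the classifier is a multiclass perceptron, the descriptive vectors are assumed to have been precomputed by the frozen feature extraction models, and all names and toy values are hypothetical.

```python
def argmax(values):
    """Index of the largest value (first one in case of ties)."""
    return max(range(len(values)), key=values.__getitem__)


def class_scores(weights, x):
    """One linear score per class for the long vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_classifier(descriptor_pairs, labels, n_classes, epochs=10):
    """Train a multiclass perceptron on long vectors.

    descriptor_pairs: (first_vector, second_vector) per reference instance,
    as produced by the frozen feature extractors (assumed precomputed).
    labels: expected classification result (ground truth) per instance.
    """
    long_vectors = [list(v1) + list(v2) for v1, v2 in descriptor_pairs]
    dim = len(long_vectors[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(long_vectors, labels):
            predicted = argmax(class_scores(weights, x))
            if predicted != y:
                # Perceptron update: move toward the true class,
                # away from the wrongly predicted one.
                for i, xi in enumerate(x):
                    weights[y][i] += xi
                    weights[predicted][i] -= xi
    return weights


pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
labels = [0, 1]
weights = train_classifier(pairs, labels, n_classes=2)
predictions = [argmax(class_scores(weights, v1 + v2)) for v1, v2 in pairs]
print(predictions)
```

Only the classifier weights are updated here; the descriptive vectors are treated as fixed inputs, which mirrors the idea of freezing the feature extraction models during this stage.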

In a particularly preferred manner, step (a0) comprises generating reference images representing a plurality of instances of the product from said first and second points of view respectively, by inserting said instances of the product into empty images taken from said first and second points of view, i.e. creating synthetic reference images.

Alternatively or in addition, “authentic” reference images can naturally be generated by placing numerous instances of the product in the system 1 and using the cameras 11, 12, but this is time-consuming, particularly if a fine-grained classification is desired, for example at the level of the shoe model, because many reference images are required.

Thus, with reference to FIG. 5, the product instances can simply be photographed against any background (in particular a neutral background, for example white or green) from the various points of view (in particular the eight points of view mentioned), preferably in a standardized manner in terms of shooting conditions (lighting, etc.), which can be done simply and anywhere, without having the system available.

Then, knowing the actual first and second points of view of the system 1, the images corresponding to the first and second points of view are selected from among the images taken from the eight points of view, the product is cut out (i.e. the neutral background is removed and the part of the image that represents the product is extracted), and it is overlaid on the empty image (see the result in the right-hand part of FIG. 5).

Note that the “empty image” may be a plain background, or an image from a camera 11, 12 in the absence of the product, which then represents the “empty” system 1.
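The creation of a synthetic reference image (cut-out followed by overlay on the empty image) can be sketched as below. The sketch assumes single-channel images stored as lists of rows, a neutral background identified by one exact pixel value, and a product photo that fits inside the empty image at the chosen offset; the function name `paste_cutout` is hypothetical. A real cut-out would need a tolerance on the background color or a segmentation mask.

```python
def paste_cutout(empty_image, product_photo, background, top, left):
    """Overlay the non-background pixels of product_photo onto a copy of empty_image.

    Assumptions: images are lists of rows of pixel values, `background` is the
    exact value of the neutral background, and the photo fits at (top, left).
    """
    result = [row[:] for row in empty_image]  # leave the empty image untouched
    for r, row in enumerate(product_photo):
        for c, pixel in enumerate(row):
            if pixel != background:  # keep only the pixels showing the product
                result[top + r][left + c] = pixel
    return result


empty = [[0] * 5 for _ in range(4)]  # "empty" view of the system
photo = [
    [255, 7, 255],  # 255 stands for the neutral background
    [8, 9, 255],
]
synthetic = paste_cutout(empty, photo, background=255, top=1, left=2)
for row in synthetic:
    print(row)
```

Varying `top` and `left` (and the empty image itself) yields several synthetic reference images from a single product photo.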

In all cases, augmentation techniques can be used to multiply the number of synthetic and/or authentic reference images.

Server

According to a second aspect, the invention relates to a method for sorting a product in the system 1 (connected to the first server 2a).

As explained, it comprises implementing the method for classifying the product according to the first aspect (steps (a)-(d)), followed by a step (e) of sorting the product by the system 1 according to the result of said classification, in particular using sorting means 13 of the system 1, for example diverters, gates, various actuators, or gripping arms, etc.

According to a further aspect, the invention may even relate to a method for recycling a product in the system 1, if the system 1 is suitable for this.

In this respect, the method further comprises a step (f) of recycling the sorted product, each product class being associated with a suitable recycling technique (so that the products sorted into the same class can all undergo the same suitable technique together).
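Steps (e) and (f) amount to a lookup from the classification result to a sorting destination and a recycling technique. The sketch below is purely illustrative: the class identifiers, outlet names and technique names are hypothetical placeholders, and the fallback outlet for unknown classes is an assumption not stated in the source.

```python
# Hypothetical routing table: class identifier -> (sorting outlet, recycling technique).
ROUTES = {
    0: ("outlet_1", "technique_A"),
    1: ("outlet_2", "technique_B"),
}


def route(class_id, default=("reject_outlet", None)):
    """Return the sorting outlet and recycling technique for a classified product."""
    return ROUTES.get(class_id, default)


outlet, technique = route(1)
print(outlet, technique)
```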

Server

According to a third aspect, the invention relates to the first server 2a for implementing the method according to the first aspect.

Thus, as explained, this first server 2a comprises at least data processing means 21a and a memory 22a. It is typically a server for classifying a product in a system 1, potentially integrated into said system.

The data processing means 21a are configured to implement steps consisting of:

Obtaining at least a first image of said product and a second image potentially representing said product, respectively from a first point of view and from a second point of view different from the first point of view, from cameras 11, 12 of said system 1;
Detecting said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image;
When said product is detected, obtaining a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model applied respectively to the first image and the second image;
Classifying a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, by means of a classification model.

According to a fourth aspect, the invention proposes an assembly comprising said first server 2a, as well as at least one connected system 1 (via the network 20). Advantageously, said system also comprises the second server 2b, connected to the first server 2a, again via the network 20.

The second server 2b comprises data processing means 21b configured to implement the training of said first and second object detection models, first and second feature extraction models, and/or classification model, on first and second reference image databases representing a plurality of instances of the product from said first and second points of view respectively.

Again, the system 1 and/or the first server 2a and/or the second server 2b may be one and the same piece of equipment.

Computer program product

According to a fifth and a sixth aspect, the invention relates to a computer program product comprising code instructions for executing (on the data processing means 21a of the first server 2a) a method according to the first aspect for classifying a product in a system 1, with a view to sorting it, as well as to storage means readable by computer equipment (for example the data storage means 22a of the first server 2a) on which this computer program product is stored.

Claims

Method for classifying a product in a system (1), with a view to sorting it, the method being characterized in that it comprises the implementation by data processing means (21a) of a first server (2a) connected to said system (1) of steps of:

(a) Obtaining at least a first image of said product and a second image potentially representing said product, respectively from a first point of view and from a second point of view different from the first point of view, from cameras (11, 12) of said system (1);
(b) Detecting said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image;
(c) When said product is detected, obtaining a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model respectively applied to the first image and the second image;
(d) Classifying a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, by means of a classification model.

The method of claim 1, wherein step (b) comprises calculating a first detection score by applying the first object detection model to the first image, and calculating a second detection score by applying the second object detection model to the second image.

The method of claim 2, wherein the product is detected in step (b) if the product of the first and second detection scores is greater than a predetermined threshold.
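As an illustration of the detection rule in the preceding claim, the comparison can be written in a few lines. The function name `is_detected` and the threshold value 0.5 used in the example are assumptions; the claim only requires a predetermined threshold.

```python
def is_detected(score_1: float, score_2: float, threshold: float) -> bool:
    """Detect the product when the product of both view scores exceeds the threshold."""
    return score_1 * score_2 > threshold


# Both views are confident: 0.9 * 0.8 = 0.72, above the threshold.
both_confident = is_detected(0.9, 0.8, threshold=0.5)

# One weak view is enough to reject: 0.9 * 0.4 = 0.36, below the threshold.
one_weak_view = is_detected(0.9, 0.4, threshold=0.5)
```

Multiplying the scores means that a low score from either point of view pulls the combined value down, so a confident detection in one view cannot by itself compensate for a near-zero score in the other.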

The method of one of claims 1 to 3, wherein the first and second object detection models are object localization models, step (b) further comprising localizing said product in the first and second images, and cropping the first and second images onto the localized product, step (c) being performed on the first and second cropped images.

Method according to one of claims 2 and 3 in combination with claim 4, wherein step (b) comprises determining a first bounding box of the product by applying the first object localization model to the first image, and determining a second bounding box of the product by applying the second object localization model to the second image, the product being detected in step (b) only if the first and/or second bounding box of the product meets at least one given criterion.
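The cropping and bounding-box check described in the two preceding claims might look like the sketch below. The claims do not say which criterion is used, so the one shown here (the box lies inside the image and covers a minimum fraction of it) is purely hypothetical, as are the names `box_is_acceptable`, `crop_to_box`, and `min_area_fraction`.

```python
from typing import Tuple

import numpy as np

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels, max exclusive.
Box = Tuple[int, int, int, int]


def box_is_acceptable(box: Box, image_shape: Tuple[int, ...],
                      min_area_fraction: float = 0.05) -> bool:
    """Example criterion: box inside the image and not too small."""
    x0, y0, x1, y1 = box
    height, width = image_shape[:2]
    inside = 0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height
    if not inside:
        return False
    return (x1 - x0) * (y1 - y0) >= min_area_fraction * width * height


def crop_to_box(image: np.ndarray, box: Box) -> np.ndarray:
    """Crop the image onto the localized product (rows are y, columns are x)."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]
```

The cropped image returned by `crop_to_box` is what the feature extraction model would then receive in place of the full frame.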

Method according to one of claims 4 and 5, in which the first and second object localization models are convolutional neural networks, in particular of the YOLO type.

Method according to one of claims 1 to 6, comprising a step (a0) of training said first and second object detection models, first and second feature extraction models, and/or classification model, on first and second reference image bases representing a plurality of instances of the product respectively according to said first and second points of view.

The method of claim 7, wherein step (a0) comprises generating reference images representing a plurality of instances of the product respectively according to said first and second viewpoints by inserting said instances of the product into empty images according to said first and second viewpoints.
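One way to picture the generation of reference images in the preceding claim is to paste a cut-out of a product instance into an image of the empty scene. The sketch below assumes that a boolean mask marking the product pixels of the cut-out is available; that mask, the function name `insert_instance`, and the paste position arguments are assumptions not stated in the claim.

```python
import numpy as np


def insert_instance(empty_image: np.ndarray, instance: np.ndarray,
                    mask: np.ndarray, top: int, left: int) -> np.ndarray:
    """Return a copy of empty_image with the masked instance pixels pasted in."""
    h, w = instance.shape[:2]
    if top < 0 or left < 0 or top + h > empty_image.shape[0] or left + w > empty_image.shape[1]:
        raise ValueError("instance does not fit inside the empty image")
    out = empty_image.copy()
    # Slicing gives a view of `out`, so the assignment below writes into it.
    region = out[top:top + h, left:left + w]
    region[mask] = instance[mask]
    return out
```

Repeating the paste for each point of view, with the empty image taken from the corresponding camera, would yield paired reference images for the first and second reference image bases.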

Method according to one of claims 1 to 8, in which the first and second feature extraction models are feature extraction blocks of convolutional neural networks, in particular of the Swin Transformer type.

Method according to one of claims 1 to 9, in which the classification model is a neural network, in particular of the perceptron type.
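To make the perceptron-type classifier of the preceding claim concrete, here is a minimal forward pass over a long vector. Everything numeric is an assumption chosen for illustration: the vector sizes, the single hidden layer, the ReLU activation, the three classes, and the random weights, which stand in for trained parameters.

```python
import numpy as np


def mlp_classify(long_vector: np.ndarray,
                 w_hidden: np.ndarray, b_hidden: np.ndarray,
                 w_out: np.ndarray, b_out: np.ndarray) -> int:
    """Forward pass of a one-hidden-layer perceptron; returns the top class index."""
    hidden = np.maximum(0.0, w_hidden @ long_vector + b_hidden)  # ReLU activation
    logits = w_out @ hidden + b_out
    return int(np.argmax(logits))


rng = np.random.default_rng(0)

# Hypothetical descriptive vectors, one per point of view (8 values each).
vector_1 = rng.normal(size=8)
vector_2 = rng.normal(size=8)
long_vector = np.concatenate([vector_1, vector_2])  # 16 values

# Random weights stand in for trained parameters (16 -> 32 -> 3 classes).
w_hidden, b_hidden = rng.normal(size=(32, 16)), np.zeros(32)
w_out, b_out = rng.normal(size=(3, 32)), np.zeros(3)

predicted_class = mlp_classify(long_vector, w_hidden, b_hidden, w_out, b_out)
```

Because the classifier sees both descriptive vectors at once, its input size is the sum of the two extractor output sizes.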

Method for sorting a product in a system (1), characterized in that it comprises the implementation of the method for classifying the product according to one of claims 1 to 10, and a step (e) of sorting the product by the system (1) according to the result of said classification.

Server (2a) for classifying a product in a system (1), with a view to sorting it, the server (2a) being characterized in that it comprises data processing means (21a) configured to:

Obtain at least a first image of said product and a second image potentially representing said product, respectively from a first point of view and from a second point of view different from the first point of view, from cameras (11, 12) of said system (1);
Detect said product based on the result of applying a first object detection model to the first image and a second object detection model to the second image;
When said product is detected, obtain a first descriptive vector of said product and a second descriptive vector of said product by means of a first feature extraction model and a second feature extraction model respectively applied to the first image and the second image;
Classify a long vector corresponding to a concatenation of the first and second descriptive vectors of said product, using a classification model.

Assembly comprising a server (2a) according to claim 12 and the system (1) connected thereto.

Computer program product comprising code instructions for executing a method according to one of claims 1 to 10 for classifying a product in a system (1), with a view to sorting it, when said program is executed on a computer.

Storage means readable by computer equipment on which is recorded a computer program product comprising code instructions for the execution of a method according to one of claims 1 to 10 for classifying a product in a system (1), with a view to sorting it.