CN110647888B - Three-dimensional information extraction method based on monocular image and electronic device
- Publication number: CN110647888B
- Application number: CN201810674456.4A
- Authority: CN (China)
- Prior art keywords: monocular image, information, algorithm model, monocular, acquiring
- Legal status: Active
Classifications
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06F18/214—Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/12—Details of acquisition arrangements; constructional details thereof
- G06V10/143—Sensing or illuminating at different wavelengths
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method for extracting three-dimensional information based on a monocular image, comprising: acquiring a second 2D monocular image, the second 2D monocular image comprising a target object; and inputting the second 2D monocular image into a trained algorithm model to compute the 3D information of the target object. Embodiments of the invention offer low computational complexity and low cost, produce no blind areas, and are unaffected by the usage scene.
Description
Technical Field
The invention relates to the field of computer vision, and in particular to a method for extracting three-dimensional information based on a monocular image, and to an electronic device.
Background
In existing computer-vision methods, there are two ways to obtain the 3D information of an object in a shooting scene, such as its 3D position. The first uses the stereoscopic imaging principle of two cameras: two image sensors capture two images from different angles at the same moment, corresponding feature points are found in the two images, and the spatial 3D position of each point is then computed from the geometric relationship. The disadvantages of this method are: 1. because only the part common to both images can be used, the region for which 3D position information is obtained is smaller than a single image, and blind areas appear in some regions; 2. the two cameras must be synchronized, so the computational complexity is high and the cost is high; 3. finding corresponding feature points in the images is very demanding on the algorithm, and in scenes with few features it is essentially very difficult to find them.
The second emits infrared light toward the object, receives the infrared light reflected by the object with a sensor, and computes the 3D position of the object from the propagation time of the light. The disadvantages of this method are: 1. it requires an active light source, which is costly; 2. different materials reflect infrared light differently, which introduces deviations into the computed distance; 3. because an active light source is used, distant objects require a very powerful source, which severely limits where the method can be used.
In short, the prior-art schemes suffer from high computational complexity, high cost, blind areas, and strong sensitivity to the usage scene.
Disclosure of Invention
The embodiment of the invention provides a three-dimensional information extraction method based on a monocular image and an electronic device.
In a first aspect, an embodiment of the present invention provides a method for extracting three-dimensional information based on a monocular image, including:
acquiring a second 2D monocular image, the second 2D monocular image comprising a target object;
and inputting the second 2D monocular image into a trained algorithm model, and calculating to obtain 3D information of the target object.
In a possible embodiment, before the acquiring the second 2D monocular image, the method further includes:
acquiring relation data of a first 2D monocular image and 3D information;
training an algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm.
In a possible embodiment, the acquiring the relationship data of the first 2D monocular image and the 3D information includes:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, the training the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model includes:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters are used to characterize features of the object;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
In a second aspect, an embodiment of the present invention provides an electronic device, including:
an acquisition unit configured to acquire a second 2D monocular image, the second 2D monocular image including a target object;
and the calculating unit is used for inputting the second 2D monocular image into the trained algorithm model, and calculating to obtain the 3D information of the target object.
In a possible embodiment, the electronic device further comprises:
the acquisition unit is further used for acquiring the relation data of the first 2D monocular image and the 3D information before acquiring the second 2D monocular image;
the training unit is used for training the algorithm model according to the relation data of the first 2D monocular image and the 3D information so as to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm.
In a possible embodiment, the acquisition unit is specifically configured to:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, the training unit is specifically configured to:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters are used to characterize features of the object;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform all or part of the method of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform all or part of the method as described in the first aspect.
In a fifth aspect, embodiments of the present invention provide a program product comprising instructions which, when run on a computer, cause the computer to perform all or part of the method as described in the first aspect.
It can be seen that, in the solution of the embodiment of the invention, a second 2D monocular image containing a target object is acquired, the image is input into a trained algorithm model, and the 3D information of the target object is computed. Because a 2D monocular image is used, the computational complexity and the cost are low; because no overlapping image region needs to be found, there are no blind areas, in contrast to the binocular approach; and because multiple methods can be fused to establish the relation data between the first 2D monocular image and the 3D information, the method is unaffected by the usage scene in actual use.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention;
fig. 2 is a flow chart of another method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the invention in detail.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
"plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Embodiments of the present application are described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flow chart of a method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention. As shown in fig. 1, the method includes:
s101, the electronic device acquires a second 2D monocular image, wherein the second 2D monocular image comprises a target object.
The electronic device comprises at least one camera, and the electronic device acquires the second 2D monocular image through the at least one camera.
Optionally, all of the at least one camera may be infrared (IR) cameras, RGB cameras, or other cameras; alternatively, some of the at least one camera may be IR cameras and the others RGB cameras.
In a possible embodiment, before the acquiring the second 2D monocular image, the method further includes:
acquiring relation data of a first 2D monocular image and 3D information;
training an algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm.
Further, the acquiring the relationship data between the first 2D monocular image and the 3D information includes:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
The 3D information of the object includes three-dimensional coordinate information of a feature point of the object in three-dimensional coordinate axes or a distance between the feature point of the object and the electronic device.
Specifically, the electronic device acquires the first 2D monocular image through its at least one camera, the first 2D monocular image comprising at least one object. The electronic device extracts each of the at least one object from the first 2D monocular image, and then obtains the 3D information of each object by marker-point positioning, a focusing method, computer vision, an information-optics method, or other methods, or obtains the 3D information of each object with a three-dimensional marker measuring instrument or a follow-up mechanical arm. The electronic device then establishes, according to a deep learning algorithm, a mapping relationship between each of the at least one object in the first 2D monocular image and its 3D information, to obtain the relation data of the first 2D monocular image and the 3D information.
For example, the electronic device may acquire the 3D information of object A through two first 2D monocular images, establish the mapping relationship between object A and the 3D information, and then use only either one of the two first 2D monocular images together with the 3D information of object A, thereby obtaining the relation data of the first 2D monocular image and the 3D information.
For another example, the electronic device includes a monocular 2D camera and a depth camera; the monocular 2D camera obtains the first 2D monocular image containing object A, and the depth camera obtains the 3D information of object A. The electronic device binds the monocular 2D camera and the depth camera; the monocular 2D camera then captures the first 2D monocular image while the depth camera captures its image; the two images are calibrated and aligned, and a mapping relationship is established between object A in the first 2D monocular image and the 3D information of object A. Following this method, a mapping relationship is established between each object in the first 2D monocular image and its 3D information, to obtain the relation data of the first 2D monocular image and the 3D information.
It should be noted that the image acquired by the depth camera includes 3D information of the first 2D monocular image.
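A minimal sketch of this pairing step is given below, assuming the depth camera has already been calibrated and aligned to the 2D camera; the function names, the intrinsics matrix `K`, and the keypoint format are illustrative assumptions, not details specified by the patent.

```python
import numpy as np

def backproject(depth, K):
    """Back-project an aligned depth map into per-pixel 3D points
    (camera coordinates) using the 2D camera intrinsics K (3x3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1)          # shape (h, w, 3)

def make_relation_entry(rgb, depth_aligned, keypoints_2d, K):
    """Map each labelled 2D feature point of an object to its 3D
    coordinates, yielding one (image, 3D information) relation-data pair."""
    points_3d = backproject(depth_aligned, K)
    kp_3d = np.array([points_3d[int(v), int(u)] for (u, v) in keypoints_2d])
    return {"image": rgb, "keypoints_3d": kp_3d}
```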
For another example, the electronic device may further obtain 3D information of the object a through an optical-based motion capture device, and then one-to-one correspond the 3D information of the object a to the object a, so as to establish a mapping relationship between the object a and the 3D information thereof, so as to obtain relationship data between the first 2D monocular image and the 3D information.
In a possible embodiment, because object A may map to multiple pieces of 3D information (that is, multiple first 2D monocular images map to multiple pieces of 3D information), or multiple objects A may map to one piece of 3D information, the electronic device needs to disambiguate object A and the 3D information to obtain a unique mapping relationship between object A and its 3D information.
Specifically, the electronic device performs data normalization on the 3D information, and disambiguates it according to additional conditions and critical parameters, such as the height of a human body, the size of a palm, or the interrelationship between an object and the scene, so as to obtain a unique mapping relationship between object A and the 3D information, and thereby the relation data of object A and its 3D information. The critical parameters are feature parameters that characterize object A and distinguish it from similar objects.
According to the method, the electronic device obtains the relation data of the first 2D monocular image and the 3D information.
For example, suppose the above method yields two results for the palm of a person in two first 2D monocular images: the palm of an adult and the palm of a minor. This is an ambiguity for the electronic device. To resolve it, the electronic device determines whether the palm in the first 2D monocular image belongs to an adult or a minor based on information such as a reference object in the image or the length of the palm. The reference object in the first 2D monocular image is an additional condition, and the length of the palm is a critical parameter.
Wherein the object a is any one of at least one object in the first 2D monocular image.
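As a toy illustration of this disambiguation, the sketch below resolves the adult-versus-minor palm ambiguity using a reference object of known physical length as the additional condition and palm length as the critical parameter; the 10 cm threshold and all names are assumptions chosen purely for illustration.

```python
def resolve_palm_ambiguity(palm_len_px, ref_len_px, ref_len_cm,
                           adult_palm_cm=10.0):
    """Pick the unique 2D-to-3D mapping: a reference object of known
    physical length (additional condition) fixes the pixel scale, and
    the palm length (critical parameter) then selects the hypothesis."""
    cm_per_px = ref_len_cm / ref_len_px
    palm_len_cm = palm_len_px * cm_per_px
    return "adult" if palm_len_cm >= adult_palm_cm else "minor"
```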
Further, the training the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model includes:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters are used to characterize features of the object;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
Specifically, the electronic device performs forward computation on the first 2D monocular image with a convolutional neural network (CNN) to obtain the feature map of the first 2D monocular image, and then computes the critical parameters from the feature map and the parameters of the first 2D monocular image according to a machine learning or deep learning algorithm; alternatively, it estimates the critical parameters from the parameters and the feature map of the first 2D monocular image. The critical parameters include the height of a human body, the size of a palm, the interrelationship between an object and the scene, and the like. The electronic device then computes the critical parameters with a deep network algorithm or another algorithm to obtain the 3D information of the first 2D monocular image.
Other algorithms may also be used, such as a deep neural network (DNN) algorithm or a random forest algorithm.
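To make this pipeline concrete, the PyTorch sketch below is one possible reading of it: a convolutional backbone performs the forward computation to a feature map, a small head estimates the critical parameters, and a deep network regresses the first 3D information from them. The layer sizes, the keypoint output format, and the use of a plain MLP as the deep network are assumptions, not details prescribed by the patent.

```python
import torch
import torch.nn as nn

class Monocular3DNet(nn.Module):
    def __init__(self, num_keypoints=21, num_critical=16):
        super().__init__()
        # Forward computation on the 2D monocular image -> feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Feature map -> critical parameters (feature parameters of the object).
        self.critical_head = nn.Linear(64, num_critical)
        # Deep network: critical parameters -> first 3D information
        # (x, y, z for each keypoint).
        self.deep_net = nn.Sequential(
            nn.Linear(num_critical, 128), nn.ReLU(),
            nn.Linear(128, num_keypoints * 3),
        )
        self.num_keypoints = num_keypoints

    def forward(self, image):
        feat = self.backbone(image).flatten(1)     # (B, 64)
        critical = self.critical_head(feat)        # (B, num_critical)
        out = self.deep_net(critical)              # (B, num_keypoints * 3)
        return out.view(-1, self.num_keypoints, 3)
```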
The electronic device obtains an error value of the 3D information from the input 3D information of the first 2D monocular image and the computed 3D information of the first 2D monocular image, back-propagates this error value through the algorithm model, and adjusts the relevant parameters of the algorithm model to obtain the trained algorithm model. This process is referred to as optimization of the algorithm model.
Optionally, the electronic device optimizes the algorithm model with any other suitable model optimization algorithm.
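A training loop matching this optimization step might look like the following sketch, assuming the `Monocular3DNet` above and a data loader yielding relation-data pairs; the L2 loss and the Adam optimizer are assumptions, since the patent only specifies that the difference value is back-propagated.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-4):
    """Compute first 3D information, take its difference from the 3D
    information in the relation data, and back-propagate the difference
    to adjust the model's parameters."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for image, keypoints_3d in loader:   # relation-data pairs
            pred = model(image)              # first 3D information
            loss = loss_fn(pred, keypoints_3d)
            opt.zero_grad()
            loss.backward()                  # back-propagate the difference
            opt.step()
    return model
```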
S102, the electronic device inputs the second 2D monocular image into the trained algorithm model, and 3D information of the target object is obtained through calculation.
Specifically, the electronic device inputs the second 2D monocular image into the trained algorithm model, which computes, according to its trained parameters, the 3D information of the target object in the second 2D monocular image.
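In actual use, step S102 then reduces to a single forward pass; a hedged sketch, assuming the `Monocular3DNet` above and an image already preprocessed into a tensor:

```python
import torch

@torch.no_grad()
def extract_3d(model, image):
    """Input the second 2D monocular image into the trained model and
    compute the 3D information (keypoint coordinates) of the target object."""
    model.eval()
    return model(image.unsqueeze(0))[0]      # shape (num_keypoints, 3)
```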
The 3D information of the object refers to 3D information of the object feature points, that is, three-dimensional coordinate information of the object feature points in three-dimensional coordinate axes.
It should be noted that the objects described in the above steps S101-S102 may be replaced by human bodies, that is, 3D information of the human bodies may be obtained by the above method.
In a specific application scenario, the electronic device captures a whole-body image and a hand image of a user through a monocular IR camera; each image is a 2D monocular image. The important joints of the body and the joints of the hand are labelled with marker points, and the 3D information of the marker points is obtained. Through deep learning, a mapping relationship is established between the labelled points of the whole body and hand in the 2D monocular images and the corresponding 3D information, yielding relation data that comprises the marker points and their corresponding 3D information; the algorithm model is then trained on this relation data. Taking a fully convolutional network as an example, training finally yields a trained algorithm model with a defined input and output. In actual use, a monocular IR image of an arbitrary user is input into the trained algorithm model, which computes the 3D information of the joints of the user's body and hand.
It can be seen that, in the solution of the embodiment of the invention, a second 2D monocular image containing a target object is acquired, the image is input into a trained algorithm model, and the 3D information of the target object is computed. Because a 2D monocular image is used, the computational complexity and the cost are low; because no overlapping image region needs to be found, there are no blind areas, in contrast to the binocular approach; and because multiple methods can be fused to establish the relation data between the first 2D monocular image and the 3D information, the method is unaffected by the usage scene in actual use.
Referring to fig. 2, fig. 2 is a flow chart of another method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention. As shown in fig. 2, the method includes:
s201, the electronic device acquires a first 2D monocular image, wherein the first 2D monocular image comprises at least one object.
The electronic device comprises at least one camera, and the electronic device acquires a first 2D monocular image through the at least one camera.
Optionally, all of the at least one camera may be infrared (IR) cameras, RGB cameras, or other cameras; alternatively, some of the at least one camera may be IR cameras and the others RGB cameras.
Further, the electronic device acquires the first 2D monocular image, and then extracts the object a from the first 2D monocular image. The object a is any one of the at least one object.
S202, the electronic device acquires 3D information of each object in at least one object in the first 2D monocular image.
The 3D information of the object includes three-dimensional coordinate information of a feature point of the object in three-dimensional coordinate axes or a distance between the feature point of the object and the electronic device.
Specifically, the electronic device obtains the 3D information of each object by marker-point positioning, a focusing method, computer vision, an information-optics method, or other methods, or obtains it with a three-dimensional marker measuring instrument or a follow-up mechanical arm.
S203, the electronic device establishes a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
Specifically, the electronic device establishes a mapping relationship between each object in at least one object in the first 2D monocular image and 3D information thereof according to a deep learning algorithm, so as to obtain relationship data of the first 2D monocular image and the 3D information.
Further, after the mapping relationship is established, the electronic device uses the first 2D monocular image and the corresponding 3D information thereof as relationship data for subsequent use.
For example, the electronic device may acquire the 3D information of object A through two first 2D monocular images, establish the mapping relationship between object A and the 3D information, and then use only either one of the two first 2D monocular images together with the 3D information of object A, thereby obtaining the relation data of the first 2D monocular image and the 3D information.
For another example, the electronic device includes a monocular 2D camera and a depth camera; the monocular 2D camera obtains the first 2D monocular image containing object A, and the depth camera obtains the 3D information of object A. The electronic device binds the monocular 2D camera and the depth camera; the monocular 2D camera then captures the first 2D monocular image while the depth camera captures its image; the two images are calibrated and aligned, and a mapping relationship is established between object A in the first 2D monocular image and the 3D information of object A. Following this method, a mapping relationship is established between each object in the first 2D monocular image and its 3D information, to obtain the relation data of the first 2D monocular image and the 3D information.
It should be noted that the image acquired by the depth camera includes 3D information of the first 2D monocular image.
For another example, the electronic device may further obtain 3D information of the object a through an optical-based motion capture device, and then one-to-one correspond the 3D information of the object a to the object a, so as to establish a mapping relationship between the object a and the 3D information thereof, so as to obtain relationship data between the first 2D monocular image and the 3D information.
In a possible embodiment, because object A may map to multiple pieces of 3D information (that is, multiple first 2D monocular images map to multiple pieces of 3D information), or multiple objects A may map to one piece of 3D information, the electronic device needs to disambiguate object A and the 3D information to obtain a unique mapping relationship between object A and its 3D information.
Specifically, the electronic device performs data normalization on the 3D information, and disambiguates it according to additional conditions and critical parameters, such as the height of a human body, the size of a palm, or the interrelationship between an object and the scene, so as to obtain a unique mapping relationship between object A and the 3D information, and thereby the relation data of object A and its 3D information. The critical parameters are feature parameters that characterize object A and distinguish it from similar objects.
According to the method, the electronic device obtains the relation data of the first 2D monocular image and the 3D information.
For example, suppose the above method yields two results for the palm of a person in two first 2D monocular images: the palm of an adult and the palm of a minor. This is an ambiguity for the electronic device. To resolve it, the electronic device determines whether the palm in the first 2D monocular image belongs to an adult or a minor based on information such as a reference object in the image or the length of the palm. The reference object in the first 2D monocular image is an additional condition, and the length of the palm is a critical parameter.
S204, the electronic device trains the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model.
Specifically, the electronic device performs forward computation on the first 2D monocular image with a convolutional neural network (CNN) to obtain the feature map of the first 2D monocular image, and then computes the critical parameters from the feature map and the parameters of the first 2D monocular image according to a machine learning or deep learning algorithm; alternatively, it estimates the critical parameters from the parameters and the feature map of the first 2D monocular image. The critical parameters include the height of a human body, the size of a palm, the interrelationship between an object and the scene, and the like. The electronic device then computes the critical parameters with a deep network algorithm or another algorithm to obtain the 3D information of the first 2D monocular image.
Other algorithms may also be used in place of the deep network algorithm, such as a deep neural network (DNN) algorithm or a random forest algorithm. The electronic device then obtains an error value of the 3D information from the input 3D information of the first 2D monocular image and the computed 3D information of the first 2D monocular image, back-propagates this error value through the algorithm model, and adjusts the relevant parameters of the algorithm model to obtain the trained algorithm model. This process is referred to as optimization of the algorithm model.
Optionally, the electronic device optimizes the algorithm model with any other suitable model optimization algorithm.
S205, the electronic device acquires a second 2D monocular image, and the second 2D monocular image comprises a target object.
S206, the electronic device inputs the second 2D monocular image into the trained algorithm model to obtain 3D information of the second 2D monocular image.
Specifically, the electronic device inputs the second 2D monocular image into the trained algorithm model, which computes, according to its trained parameters, the 3D information of the target object in the second 2D monocular image.
The 3D information of the object refers to 3D information of the object feature points, that is, three-dimensional coordinate information of the object feature points in three-dimensional coordinate axes.
It should be noted that the objects described in the above steps S201-S206 may be replaced by human bodies; that is, the 3D information of a human body may be obtained by the above method.
In a specific application scenario, the electronic device captures a whole-body image and a hand image of a user through a monocular IR camera; each image is a 2D monocular image. The important joints of the body and the joints of the hand are labelled with marker points, and the 3D information of the marker points is obtained. Through deep learning, a mapping relationship is established between the labelled points of the whole body and hand in the 2D monocular images and the corresponding 3D information, yielding relation data that comprises the marker points and their corresponding 3D information; the algorithm model is then trained on this relation data. Taking a fully convolutional network as an example, training finally yields a trained algorithm model with a defined input and output. In actual use, a monocular IR image of an arbitrary user is input into the trained algorithm model, which computes the 3D information of the joints of the user's body and hand.
It can be seen that, in the solution of the embodiment of the invention, a second 2D monocular image containing a target object is acquired, the image is input into a trained algorithm model, and the 3D information of the target object is computed. Because a 2D monocular image is used, the computational complexity and the cost are low; because no overlapping image region needs to be found, there are no blind areas, in contrast to the binocular approach; and because multiple methods can be fused to establish the relation data between the first 2D monocular image and the 3D information, the method is unaffected by the usage scene in actual use.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention. As shown in fig. 3, the electronic device 300 includes:
an acquisition unit 301 configured to acquire a second 2D monocular image, where the second 2D monocular image includes a target object;
the computing unit 302 is configured to input the second 2D monocular image into the trained algorithm model, and compute to obtain 3D information of the target object.
In one possible embodiment, the electronic device 300 further comprises:
the acquiring unit 301 is further configured to acquire relationship data between the first 2D monocular image and the 3D information before acquiring the second 2D monocular image;
and a training unit 303, configured to train an algorithm model according to the relationship data of the first 2D monocular image and the 3D information, so as to obtain the trained algorithm model.
In a possible embodiment, the obtaining unit 301 is specifically configured to:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, the training unit 303 is specifically configured to:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters are used to characterize features of the object;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
It should be noted that the above units (the acquisition unit 301, the calculation unit 302, and the training unit 303) are configured to perform the relevant steps of the above three-dimensional information extraction method.
In the present embodiment, the electronic device 300 is presented in the form of a unit. "unit" herein may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above described functionality. Further, the above acquisition unit 301, calculation unit 302, and training unit 303 may be implemented by the processor 401 of the electronic device shown in fig. 4.
The electronic device 400 may be implemented with the structure shown in fig. 4. The electronic device 400 comprises at least one processor 401, at least one memory 402, and at least one communication interface 403. The processor 401, the memory 402, and the communication interface 403 are connected via a communication bus and communicate with each other.
The processor 401 may be a general purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits for controlling the execution of the above program.
The communication interface 403 is used for communicating with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN).
The memory 402 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be standalone and coupled to the processor via a bus, or integrated with the processor.
The memory 402 stores the application program code for executing the above schemes, and execution is controlled by the processor 401. The processor 401 executes the application code stored in the memory 402.
The code stored in the memory 402 may perform the monocular-image-based three-dimensional information extraction method provided above, for example: acquiring a second 2D monocular image, the second 2D monocular image comprising a target object; and inputting the second 2D monocular image into a trained algorithm model to compute the 3D information of the target object.
The embodiment of the invention also provides a computer storage medium. The storage medium may store a program which, when executed, performs some or all of the steps of any of the monocular-image-based three-dimensional information extraction methods described in the above method embodiments.
The embodiment of the present invention also provides a program product, including instructions, which when executed on a computer, cause the computer to perform part or all of the steps of any one of the three-dimensional information extraction methods based on monocular images described in the above method embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, such as the division of the units, merely a logical function division, and there may be additional manners of dividing the actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present invention may be embodied essentially or partly in the form of a software product, or all or part of the technical solution, which is stored in a memory, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned memory includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer readable memory, which may include: flash disk, read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic disk or optical disk.
The foregoing has described the embodiments of the invention in detail, explaining the principles and implementations of the invention with specific examples; the above examples are provided solely to help understand the method and core concept of the invention. Meanwhile, those skilled in the art may make modifications to the specific embodiments and the application scope according to the idea of the invention, and this disclosure should not be construed as limiting the invention.
Claims (6)
1. A method for extracting three-dimensional information based on a monocular image, characterized by comprising:
acquiring a second 2D monocular image, the second 2D monocular image comprising a target object;
inputting the second 2D monocular image into a trained algorithm model, and calculating to obtain 3D information of the target object; wherein, before the acquiring the second 2D monocular image, the method further comprises:
acquiring relation data of a first 2D monocular image and 3D information;
training an algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm;
training the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model, including:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the parameters are parameters of the target object, and the critical parameters are feature parameters that distinguish the target object from similar objects;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
2. The method of claim 1, wherein the acquiring the relationship data of the first 2D monocular image and the 3D information comprises:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
3. An electronic device, comprising:
an acquisition unit configured to acquire a second 2D monocular image, the second 2D monocular image including a target object;
the computing unit is used for inputting the second 2D monocular image into the trained algorithm model, and computing to obtain 3D information of the target object;
the acquisition unit is further used for acquiring the relation data of the first 2D monocular image and the 3D information before acquiring the second 2D monocular image;
the electronic device further includes:
the training unit is used for training the algorithm model according to the relation data of the first 2D monocular image and the 3D information so as to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm;
the training unit is specifically used for:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the parameters are parameters of the target object, and the critical parameters are feature parameters that distinguish the target object from similar objects;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
4. The electronic device according to claim 3, wherein the obtaining unit is specifically configured to:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
5. An electronic device, comprising:
A memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform the method of any of claims 1-2.
6. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-2.
Priority Applications (1)
- CN201810674456.4A (filed 2018-06-26, priority date 2018-06-26): Three-dimensional information extraction method based on monocular image and electronic device

Publications (2)
- CN110647888A, published 2020-01-03
- CN110647888B, granted 2023-07-25

Family
- ID=69008880
- CN201810674456.4A (China, filed 2018-06-26), granted as CN110647888B, status: Active
Patent Citations (4)
- CN106157307A (priority 2016-06-27, published 2016-11-23), Zhejiang Gongshang University: Monocular image depth estimation method based on multi-scale CNN and continuous CRF
- CN107204010A (priority 2017-04-28, published 2017-09-26), Institute of Computing Technology, Chinese Academy of Sciences: Monocular image depth estimation method and system
- CN107578436A (priority 2017-08-02, published 2018-01-12), Nanjing University of Posts and Telecommunications: Depth estimation method for monocular images based on a fully convolutional neural network (FCN)
- CN107767413A (priority 2017-09-20, published 2018-03-06), South China University of Technology: Image depth estimation method based on convolutional neural networks
Non-Patent Citations (2)
- David Eigen et al., "Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network," NeurIPS Proceedings, 2014.
- Xu Lu et al., "Monocular infrared image depth estimation based on deep convolutional neural networks" (基于深层卷积神经网络的单目红外图像深度估计), Acta Optica Sinica (光学学报), no. 7, 2016.
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant