
CN110647888B - Three-dimensional information extraction method based on monocular image and electronic device - Google Patents


Info

Publication number
CN110647888B
Authority
CN
China
Prior art keywords
monocular image
information
algorithm model
monocular
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810674456.4A
Other languages
Chinese (zh)
Other versions
CN110647888A (en)
Inventor
毛文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201810674456.4A
Publication of CN110647888A
Application granted
Publication of CN110647888B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition
    • G06V 10/12 Details of acquisition arrangements; Constructional details thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition
    • G06V 10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V 10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V 10/143 Sensing or illuminating at different wavelengths
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for extracting three-dimensional information from a monocular image, comprising the following steps: acquiring a second 2D monocular image, the second 2D monocular image comprising a target object; and inputting the second 2D monocular image into a trained algorithm model to calculate the 3D information of the target object. Embodiments of the invention offer low computational complexity and low cost, have no blind area, and are not constrained by the usage scene.

Description

Three-dimensional information extraction method based on monocular image and electronic device
Technical Field
The invention relates to the field of computer vision, and in particular to a method for extracting three-dimensional information from monocular images and an electronic device.
Background
In existing computer-vision methods, there are two ways to obtain 3D information, such as 3D position information, of an object in a captured scene. The first relies on the dual-camera stereoscopic imaging principle: two image sensors capture two images of the scene from different angles at the same moment, corresponding feature points are located in both images, and the spatial 3D position of each point is then computed from the geometric relationship. The disadvantages of this method are: 1. because only the part common to both images can be used, the area for which 3D position information is obtained is smaller than that of a single image, and blind areas appear in some regions; 2. the two cameras must be synchronized, so the computational complexity is high and the cost is high; 3. finding corresponding feature points in the images is very demanding on the algorithm, and in scenes with few features it is essentially very difficult.
The second method emits infrared light toward the object, a sensor receives the infrared light reflected by the object, and the 3D position information of the object is calculated from the propagation time of the light. The disadvantages of this method are: 1. an active light source is required, so the cost is high; 2. different materials reflect infrared light differently, which introduces deviations into the distance calculation; 3. because an active light source is used, distant objects require a very powerful emitter, which greatly limits the usable range.
In short, the prior-art schemes suffer from high computational complexity, high cost, blind areas, and strong dependence on the usage scene.
Disclosure of Invention
The embodiment of the invention provides a three-dimensional information extraction method based on a monocular image and an electronic device.
In a first aspect, an embodiment of the present invention provides a method for extracting three-dimensional information based on a monocular image, including:
acquiring a second 2D monocular image, the second 2D monocular image comprising a target object;
and inputting the second 2D monocular image into a trained algorithm model, and calculating to obtain 3D information of the target object.
In a possible embodiment, before the acquiring the second 2D monocular image, the method further includes:
acquiring relation data of a first 2D monocular image and 3D information;
training an algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm.
In a possible embodiment, the acquiring the relationship data of the first 2D monocular image and the 3D information includes:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, the training the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model includes:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters characterize features that distinguish the object from similar objects;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
In a second aspect, an embodiment of the present invention provides an electronic device, including:
an acquisition unit configured to acquire a second 2D monocular image, the second 2D monocular image including a target object;
and the calculating unit is used for inputting the second 2D monocular image into the trained algorithm model, and calculating to obtain the 3D information of the target object.
In a possible embodiment, the electronic device further comprises:
the acquisition unit is further used for acquiring the relation data of the first 2D monocular image and the 3D information before acquiring the second 2D monocular image;
the training unit is used for training the algorithm model according to the relation data of the first 2D monocular image and the 3D information so as to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm.
In a possible embodiment, the acquisition unit is specifically configured to:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, the training unit is specifically configured to:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters characterize features that distinguish the object from similar objects;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform all or part of the method of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform all or part of the method as described in the first aspect.
In a fifth aspect, embodiments of the present invention provide a program product comprising instructions which, when run on a computer, cause the computer to perform all or part of the method as described in the first aspect.
It can be seen that in the solution of the embodiment of the present invention, a second 2D monocular image containing a target object is acquired, and the second 2D monocular image is input into a trained algorithm model to calculate the 3D information of the target object. Because a single 2D monocular image is used, the computational complexity and the cost are low; because no overlapping image region needs to be found, there is no blind area, unlike the binocular approach; and because multiple methods can be fused to establish the relation data between the first 2D monocular image and the 3D information, the final use is not constrained by the usage scene.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention;
fig. 2 is a flow chart of another method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention are described in detail below.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
"plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Embodiments of the present application are described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flow chart of a method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention. As shown in fig. 1, the method includes:
s101, the electronic device acquires a second 2D monocular image, wherein the second 2D monocular image comprises a target object.
The electronic device comprises at least one camera, and the electronic device acquires the second 2D monocular image through the at least one camera.
Optionally, the at least one camera may consist entirely of infrared (IR) cameras, entirely of RGB cameras, or entirely of other types of cameras; alternatively, some of the cameras may be IR cameras and the rest RGB cameras.
In a possible embodiment, before the acquiring the second 2D monocular image, the method further includes:
acquiring relation data of a first 2D monocular image and 3D information;
training an algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm.
Further, the acquiring the relationship data between the first 2D monocular image and the 3D information includes:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
The 3D information of an object includes the three-dimensional coordinates of the object's feature points, or the distance between those feature points and the electronic device.
Specifically, the electronic device acquires the first 2D monocular image through at least one of its cameras, the first 2D monocular image comprising at least one object. The electronic device extracts each of the at least one object from the first 2D monocular image, and then obtains the 3D information of each object by marker-point positioning, a focusing method, computer vision, information optics, or other methods, or through a three-dimensional coordinate measuring instrument or a follow-up mechanical arm. The electronic device then establishes, according to a deep learning algorithm, a mapping between each of the at least one object in the first 2D monocular image and its 3D information, so as to obtain the relation data of the first 2D monocular image and the 3D information.
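For illustration only (this sketch is not part of the patent disclosure), the relation data described above can be organized as records that pair an object in a first 2D monocular image with its measured 3D information; all field names below are assumptions.

```python
# Illustrative sketch of "relation data"; field names are assumptions,
# not taken from the patent text.
from dataclasses import dataclass
import numpy as np

@dataclass
class RelationRecord:
    image: np.ndarray         # H x W (IR) or H x W x 3 (RGB) monocular image
    object_id: str            # which object in the image this record labels
    keypoints_2d: np.ndarray  # N x 2 pixel coordinates of the feature points
    keypoints_3d: np.ndarray  # N x 3 coordinates, or N x 1 device distances

# Example record for one object with four feature points.
record = RelationRecord(
    image=np.zeros((480, 640), dtype=np.uint8),
    object_id="object_a",
    keypoints_2d=np.array([[100.0, 120.0], [110.0, 130.0],
                           [90.0, 140.0], [105.0, 150.0]]),
    keypoints_3d=np.random.rand(4, 3),  # stand-in for measured 3D data
)
```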
For example, the electronic device may obtain the 3D information of object A from two first 2D monocular images, establish the mapping between object A and that 3D information, and then retain only one of the two images paired with the 3D information of object A, thereby obtaining the relation data of the first 2D monocular image and the 3D information.
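Obtaining 3D information from two images in this way amounts to stereo triangulation. A minimal OpenCV sketch follows; the intrinsics, camera poses, and pixel coordinates are placeholder values, and the function comes from OpenCV rather than from the patent.

```python
import numpy as np
import cv2

# Placeholder intrinsics and the two camera poses (P = K [R|t]).
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # first view
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])  # second view

# Matching feature points of object A in the two images (2 x N arrays,
# one column per point).
pts1 = np.array([[320.0, 330.0], [240.0, 250.0]])
pts2 = np.array([[310.0, 321.0], [240.0, 250.0]])

# Triangulate to homogeneous 3D points, then de-homogenize.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).T  # N x 3 coordinates of object A's feature points
print(X)
```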
For another example, the electronic device includes a monocular 2D camera for capturing the first 2D monocular image containing object A, and a depth camera for obtaining the 3D information of object A. The electronic device rigidly binds the monocular 2D camera and the depth camera; the monocular 2D camera then captures the first 2D monocular image while the depth camera captures a depth image; the two images are calibrated and aligned; and the mapping between object A in the first 2D monocular image and its 3D information is established. Following this method, a mapping is established between every object in the first 2D monocular image and its 3D information, yielding the relation data of the first 2D monocular image and the 3D information.
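One plausible realization of the calibrate-and-align step, assuming pinhole camera models and known calibration (all names and values below are illustrative), is to back-project each depth pixel into 3D and re-project it into the monocular camera:

```python
import numpy as np

def align_depth_to_monocular(depth, K_depth, K_mono, R, t):
    """Re-project each valid depth pixel into the monocular camera frame,
    so 3D information can be looked up per pixel of the 2D monocular image.
    K_depth, K_mono: 3x3 intrinsics; R (3x3), t (3-vector): depth-to-monocular
    extrinsics. A minimal, unoptimized sketch with no occlusion handling."""
    fx, fy = K_depth[0, 0], K_depth[1, 1]
    cx, cy = K_depth[0, 2], K_depth[1, 2]
    aligned = {}
    h, w = depth.shape
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:
                continue  # invalid depth sample
            # Back-project the depth pixel to a 3D point in the depth frame.
            p = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
            # Transform into the monocular frame and re-project to a pixel.
            q = R @ p + t
            uv = K_mono @ (q / q[2])
            aligned[(int(round(uv[0])), int(round(uv[1])))] = q
    return aligned  # pixel in the monocular image -> 3D point
```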
It should be noted that the image acquired by the depth camera includes 3D information of the first 2D monocular image.
For another example, the electronic device may also obtain the 3D information of object A through an optical motion-capture device and then associate that 3D information with object A one-to-one, thereby establishing the mapping between object A and its 3D information and obtaining the relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, object A may map to multiple pieces of 3D information (that is, multiple first 2D monocular images map to multiple pieces of 3D information), or multiple instances of object A may map to a single piece of 3D information. The electronic device therefore needs to perform ambiguity processing on object A and the 3D information to obtain a unique mapping between them.
Specifically, the electronic device normalizes the 3D information and performs ambiguity processing on it according to additional conditions and critical parameters, such as the height of a human body, the size of a palm, and the relationships between an object and the scene, so as to obtain the unique mapping between object A and the 3D information, and thereby the relation data of object A and its 3D information. The critical parameters are feature parameters that distinguish object A from similar objects.
According to the method, the electronic device obtains the relation data of the first 2D monocular image and the 3D information.
For example, suppose the above method yields two candidate interpretations for a human palm appearing in two first 2D monocular images: the palm of an adult and the palm of a minor. To the electronic device this is ambiguous. To resolve it, the electronic device determines whether the palm in the first 2D monocular image belongs to an adult or a minor based on information such as a reference object in the image or the length of the palm. Here the reference object in the first 2D monocular image is an additional condition, and the palm length is a critical parameter.
Object A is any one of the at least one object in the first 2D monocular image.
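The ambiguity processing in the palm example can be pictured as a simple rule that converts the critical parameter (palm length in pixels) into physical units via the additional condition (a reference object of known size). The sketch below is illustrative; the cut-off value is an assumption, not a value from the patent.

```python
# Illustrative ambiguity processing; the cut-off value is assumed.
ADULT_PALM_MIN_CM = 15.0

def resolve_palm_ambiguity(palm_length_px, ref_length_px, ref_length_cm):
    # Pixels-per-centimeter scale derived from the reference object.
    px_per_cm = ref_length_px / ref_length_cm
    palm_length_cm = palm_length_px / px_per_cm
    return "adult_palm" if palm_length_cm >= ADULT_PALM_MIN_CM else "minor_palm"

# 180 px palm with a 10 cm reference measuring 120 px -> 15 cm -> adult.
print(resolve_palm_ambiguity(180.0, 120.0, 10.0))
```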
Further, the training the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model includes:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters characterize features that distinguish the object from similar objects;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
Specifically, the electronic device performs a forward calculation on the first 2D monocular image with a convolutional neural network (CNN) to obtain a feature map of the first 2D monocular image, and then computes the critical parameters from the feature map and the parameters of the first 2D monocular image with a machine learning or deep learning algorithm, or estimates the critical parameters from the parameters and the feature map of the first 2D monocular image. The critical parameters include the height of a human body, the size of a palm, the relationships between an object and the scene, and the like. The electronic device then calculates the 3D information of the first 2D monocular image from the critical parameters with a deep network algorithm or another algorithm.
Other algorithms, such as a deep neural network (DNN) or a random forest, may be used as well.
The electronic device computes the error between the measured 3D information of the first 2D monocular image and the calculated 3D information, back-propagates this error through the algorithm model, and adjusts the model's parameters to obtain the trained algorithm model. This process is referred to as optimization of the algorithm model.
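A minimal PyTorch-style rendering of this training loop is sketched below, under the assumption that the algorithm model is a small CNN regressing the 3D coordinates of N feature points; the architecture, loss, and optimizer are illustrative choices, not the network prescribed by the patent.

```python
import torch
import torch.nn as nn

N_POINTS = 21  # assumed number of feature points (e.g. hand joints)

# Illustrative model: a CNN backbone (the forward pass that yields the
# feature map) followed by a regression head standing in for the deep
# network that maps critical parameters to first 3D information.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, N_POINTS * 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(image, gt_3d):
    """image: 1 x 1 x H x W monocular tensor; gt_3d: 1 x (N_POINTS*3)
    measured 3D information taken from the relation data."""
    pred_3d = model(image)           # first 3D information
    loss = loss_fn(pred_3d, gt_3d)   # difference vs. the relation data
    optimizer.zero_grad()
    loss.backward()                  # back-propagate the difference
    optimizer.step()                 # adjust the model's parameters
    return loss.item()

# One illustrative step on random stand-in data.
print(train_step(torch.randn(1, 1, 128, 128), torch.randn(1, N_POINTS * 3)))
```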
Optionally, the electronic device may optimize the algorithm model with any other model optimization algorithm.
S102, the electronic device inputs the second 2D monocular image into the trained algorithm model, and 3D information of the target object is obtained through calculation.
Specifically, the electronic device inputs the second 2D monocular image into a trained algorithm model, and calculates according to relevant parameters of the trained algorithm model, so as to obtain 3D information of the target object in the second 2D monocular image.
The 3D information of the target object refers to the 3D information of its feature points, that is, the three-dimensional coordinates of those feature points.
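Continuing the illustrative PyTorch sketch above (same assumed `model` and `N_POINTS`), inference on the second 2D monocular image reduces to a single forward pass:

```python
import torch

# Inference sketch reusing the hypothetical `model` and N_POINTS above.
model.eval()
with torch.no_grad():
    second_image = torch.randn(1, 1, 128, 128)    # stand-in monocular image
    pred = model(second_image).view(N_POINTS, 3)  # N x (x, y, z) feature points
print(pred.shape)  # torch.Size([21, 3])
```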
It should be noted that the objects described in the above steps S101-S102 may be replaced by human bodies; that is, the 3D information of a human body can be obtained by the above method.
In a specific application scenario, the electronic device captures a whole-body image and a hand image of a user through a monocular IR camera; these are 2D monocular images. The important joints of the human body and the joints of the hand are marked with marker points, and the 3D information of the marker points is obtained. A mapping between the marked whole-body and hand points in the 2D monocular image and their corresponding 3D information is established by deep learning, yielding relation data that comprises the marker points and their corresponding 3D information; the algorithm model is then trained on this relation data by deep learning. Taking a fully convolutional network as an example, the training finally yields a trained algorithm model with an input and an output. In actual use, a monocular IR image of an arbitrary user is input into the trained algorithm model, which computes the 3D information of the joints of the user's body and hands.
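For the fully convolutional variant mentioned in this scenario, one common rendering (an assumption, not the patent's specified architecture) predicts a 2D heatmap per joint plus a depth channel and reads off the 3D joint positions:

```python
import torch
import torch.nn as nn

N_JOINTS = 38  # assumed total of body and hand joints; illustrative count

# Fully convolutional sketch: per-joint 2D heatmaps plus one depth channel.
fcn = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, N_JOINTS + 1, 1),
)

def joints_3d(ir_image):
    """ir_image: 1 x 1 x H x W monocular IR tensor -> list of (x, y, z)."""
    out = fcn(ir_image)[0]
    heatmaps, depth = out[:N_JOINTS], out[N_JOINTS]
    coords = []
    for hm in heatmaps:
        idx = torch.argmax(hm).item()  # heatmap peak = 2D joint location
        v, u = divmod(idx, hm.shape[1])
        coords.append((u, v, depth[v, u].item()))
    return coords

print(len(joints_3d(torch.randn(1, 1, 64, 64))))  # -> 38
```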
It can be seen that in the solution of the embodiment of the present invention, a second 2D monocular image containing a target object is acquired, and the second 2D monocular image is input into a trained algorithm model to calculate the 3D information of the target object. Because a single 2D monocular image is used, the computational complexity and the cost are low; because no overlapping image region needs to be found, there is no blind area, unlike the binocular approach; and because multiple methods can be fused to establish the relation data between the first 2D monocular image and the 3D information, the final use is not constrained by the usage scene.
Referring to fig. 2, fig. 2 is a flow chart of another method for extracting three-dimensional information based on monocular images according to an embodiment of the present invention. As shown in fig. 2, the method includes:
s201, the electronic device acquires a first 2D monocular image, wherein the first 2D monocular image comprises at least one object.
The electronic device comprises at least one camera, and the electronic device acquires a first 2D monocular image through the at least one camera.
Optionally, the at least one camera may consist entirely of infrared (IR) cameras, entirely of RGB cameras, or entirely of other types of cameras; alternatively, some of the cameras may be IR cameras and the rest RGB cameras.
Further, the electronic device acquires the first 2D monocular image and then extracts object A from it; object A is any one of the at least one object.
S202, the electronic device acquires 3D information of each object in at least one object in the first 2D monocular image.
The 3D information of an object includes the three-dimensional coordinates of the object's feature points, or the distance between those feature points and the electronic device.
Specifically, the electronic device obtains the 3D information of each object by marker-point positioning, a focusing method, computer vision, information optics, or other methods, or through a three-dimensional coordinate measuring instrument or a follow-up mechanical arm.
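The two forms of 3D information above are related by the pinhole camera model: given a feature point's pixel, its depth, and the camera intrinsics, the camera-frame coordinates and the distance to the device follow directly. A sketch with placeholder intrinsics:

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) at depth z -> camera-frame
    3D coordinates, plus the feature point's distance to the device."""
    point = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
    distance = np.linalg.norm(point)  # the distance form of the 3D information
    return point, distance

# Placeholder intrinsics for a 640x480 camera.
print(backproject(400, 300, z=1.5, fx=525.0, fy=525.0, cx=320.0, cy=240.0))
```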
S203, the electronic device establishes a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
Specifically, the electronic device establishes a mapping relationship between each object in at least one object in the first 2D monocular image and 3D information thereof according to a deep learning algorithm, so as to obtain relationship data of the first 2D monocular image and the 3D information.
Further, after the mapping relationship is established, the electronic device uses the first 2D monocular image and the corresponding 3D information thereof as relationship data for subsequent use.
For example, the electronic device may obtain the 3D information of object A from two first 2D monocular images, establish the mapping between object A and that 3D information, and then retain only one of the two images paired with the 3D information of object A, thereby obtaining the relation data of the first 2D monocular image and the 3D information.
For another example, the electronic device includes a monocular 2D camera for capturing the first 2D monocular image containing object A, and a depth camera for obtaining the 3D information of object A. The electronic device rigidly binds the monocular 2D camera and the depth camera; the monocular 2D camera then captures the first 2D monocular image while the depth camera captures a depth image; the two images are calibrated and aligned; and the mapping between object A in the first 2D monocular image and its 3D information is established. Following this method, a mapping is established between every object in the first 2D monocular image and its 3D information, yielding the relation data of the first 2D monocular image and the 3D information.
It should be noted that the image acquired by the depth camera includes 3D information of the first 2D monocular image.
For another example, the electronic device may also obtain the 3D information of object A through an optical motion-capture device and then associate that 3D information with object A one-to-one, thereby establishing the mapping between object A and its 3D information and obtaining the relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, object A may map to multiple pieces of 3D information (that is, multiple first 2D monocular images map to multiple pieces of 3D information), or multiple instances of object A may map to a single piece of 3D information. The electronic device therefore needs to perform ambiguity processing on object A and the 3D information to obtain a unique mapping between them.
Specifically, the electronic device normalizes the 3D information and performs ambiguity processing on it according to additional conditions and critical parameters, such as the height of a human body, the size of a palm, and the relationships between an object and the scene, so as to obtain the unique mapping between object A and the 3D information, and thereby the relation data of object A and its 3D information. The critical parameters are feature parameters that distinguish object A from similar objects.
According to the method, the electronic device obtains the relation data of the first 2D monocular image and the 3D information.
For example, suppose the above method yields two candidate interpretations for a human palm appearing in two first 2D monocular images: the palm of an adult and the palm of a minor. To the electronic device this is ambiguous. To resolve it, the electronic device determines whether the palm in the first 2D monocular image belongs to an adult or a minor based on information such as a reference object in the image or the length of the palm. Here the reference object in the first 2D monocular image is an additional condition, and the palm length is a critical parameter.
S204, the electronic device trains the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model.
Specifically, the electronic device performs a forward calculation on the first 2D monocular image with a convolutional neural network (CNN) to obtain a feature map of the first 2D monocular image, and then computes the critical parameters from the feature map and the parameters of the first 2D monocular image with a machine learning or deep learning algorithm, or estimates the critical parameters from the parameters and the feature map of the first 2D monocular image. The critical parameters include the height of a human body, the size of a palm, the relationships between an object and the scene, and the like. The electronic device then calculates the 3D information of the first 2D monocular image from the critical parameters with a deep network algorithm or another algorithm.
The deep network algorithm may be replaced by other algorithms, such as a deep neural network (DNN) or a random forest. The electronic device computes the error between the measured 3D information of the first 2D monocular image and the calculated 3D information, back-propagates this error through the algorithm model, and adjusts the model's parameters to obtain the trained algorithm model. This process is referred to as optimization of the algorithm model.
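As a sketch of the non-deep-learning alternative mentioned here, a random forest can regress 3D information directly from the critical parameters; the feature layout and data below are stand-ins, with scikit-learn used purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in data: three critical parameters per sample (e.g. body height,
# palm size, an object-scene relation score) regressed to 3D coordinates.
rng = np.random.default_rng(0)
X = rng.random((200, 3))  # critical parameters
y = rng.random((200, 3))  # 3D information (x, y, z)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:1]))  # predicted 3D information for one sample
```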
Optionally, the electronic device may optimize the algorithm model with any other model optimization algorithm.
S205, the electronic device acquires a second 2D monocular image, and the second 2D monocular image comprises a target object.
S206, the electronic device inputs the second 2D monocular image into the trained algorithm model to obtain 3D information of the second 2D monocular image.
Specifically, the electronic device inputs the second 2D monocular image into the trained algorithm model, and calculates according to the relevant parameters of the trained algorithm model, so as to obtain the 3D information of the target object in the second 2D monocular image.
The 3D information of the target object refers to the 3D information of its feature points, that is, the three-dimensional coordinates of those feature points.
It should be noted that the objects described in the above steps S201-S206 may be replaced by human bodies; that is, the 3D information of a human body can be obtained by the above method.
In a specific application scenario, the electronic device captures a whole-body image and a hand image of a user through a monocular IR camera; these are 2D monocular images. The important joints of the human body and the joints of the hand are marked with marker points, and the 3D information of the marker points is obtained. A mapping between the marked whole-body and hand points in the 2D monocular image and their corresponding 3D information is established by deep learning, yielding relation data that comprises the marker points and their corresponding 3D information; the algorithm model is then trained on this relation data by deep learning. Taking a fully convolutional network as an example, the training finally yields a trained algorithm model with an input and an output. In actual use, a monocular IR image of an arbitrary user is input into the trained algorithm model, which computes the 3D information of the joints of the user's body and hands.
It can be seen that in the solution of the embodiment of the present invention, a second 2D monocular image containing a target object is acquired, and the second 2D monocular image is input into a trained algorithm model to calculate the 3D information of the target object. Because a single 2D monocular image is used, the computational complexity and the cost are low; because no overlapping image region needs to be found, there is no blind area, unlike the binocular approach; and because multiple methods can be fused to establish the relation data between the first 2D monocular image and the 3D information, the final use is not constrained by the usage scene.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention. As shown in fig. 3, the electronic device 300 includes:
an acquisition unit 301 configured to acquire a second 2D monocular image, where the second 2D monocular image includes a target object;
the computing unit 302 is configured to input the second 2D monocular image into the trained algorithm model, and compute to obtain 3D information of the target object.
In one possible embodiment, the electronic device 300 further comprises:
the acquiring unit 301 is further configured to acquire relationship data between the first 2D monocular image and the 3D information before acquiring the second 2D monocular image;
and a training unit 303, configured to train an algorithm model according to the relationship data of the first 2D monocular image and the 3D information, so as to obtain the trained algorithm model.
In a possible embodiment, the obtaining unit 301 is specifically configured to:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
In a possible embodiment, the training unit 303 is specifically configured to:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining critical parameters according to the feature map and the parameters of the first 2D monocular image, wherein the critical parameters characterize features that distinguish the object from similar objects;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
It should be noted that the above units (the acquisition unit 301, the calculation unit 302, and the training unit 303) are configured to perform the relevant steps of the above extraction method.
In the present embodiment, the electronic device 300 is presented in the form of a unit. "unit" herein may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above described functionality. Further, the above acquisition unit 301, calculation unit 302, and training unit 303 may be implemented by the processor 401 of the electronic device shown in fig. 4.
The electronic device 400 may be implemented with the structure shown in fig. 4; it comprises at least one processor 401, at least one memory 402, and at least one communication interface 403. The processor 401, the memory 402, and the communication interface 403 are connected and communicate with each other via a communication bus.
The processor 401 may be a general purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits for controlling the execution of the above program.
The communication interface 403 is used for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 402 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be stand-alone and coupled to the processor via a bus, or integrated with the processor.
The memory 402 is used for storing the application program code for executing the above schemes, and execution is controlled by the processor 401. The processor 401 is configured to execute the application program code stored in the memory 402.
The code stored in the memory 402 may perform the extraction method provided above, for example: acquiring a second 2D monocular image, the second 2D monocular image comprising a target object; and inputting the second 2D monocular image into a trained algorithm model, and calculating to obtain the 3D information of the target object.
The embodiment of the invention also provides a computer storage medium that may store a program; when executed, the program performs some or all of the steps of any of the monocular-image-based three-dimensional information extraction methods described in the above method embodiments.
The embodiment of the present invention also provides a program product, including instructions, which when executed on a computer, cause the computer to perform part or all of the steps of any one of the three-dimensional information extraction methods based on monocular images described in the above method embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, such as the division of the units, merely a logical function division, and there may be additional manners of dividing the actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present invention may be embodied essentially or partly in the form of a software product, or all or part of the technical solution, which is stored in a memory, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned memory includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer readable memory, which may include: flash disk, read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic disk or optical disk.
The foregoing has described the embodiments of the invention in detail; the principles and implementations of the invention are explained herein with specific examples, which are provided solely to facilitate understanding of the method and its core concepts. Meanwhile, those skilled in the art may make modifications to the specific embodiments and the scope of application in accordance with the idea of the invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (6)

1. A method for extracting three-dimensional information based on a monocular image, characterized by comprising the following steps:
acquiring a second 2D monocular image, the second 2D monocular image comprising a target object;
inputting the second 2D monocular image into a trained algorithm model, and calculating to obtain 3D information of the target object; wherein, before the acquiring the second 2D monocular image, the method further comprises:
acquiring relation data of a first 2D monocular image and 3D information;
training an algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm;
training the algorithm model according to the relation data of the first 2D monocular image and the 3D information to obtain the trained algorithm model, including:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining a critical parameter according to the feature map and the parameters of the first 2D monocular image, wherein the parameter is a parameter of the target object, and the critical parameter is used for representing the feature parameter of the target object which is different from a similar object;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
2. The method of claim 1, wherein the acquiring the relationship data of the first 2D monocular image and the 3D information comprises:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
3. An electronic device, comprising:
an acquisition unit configured to acquire a second 2D monocular image, the second 2D monocular image including a target object;
the computing unit is used for inputting the second 2D monocular image into the trained algorithm model, and computing to obtain 3D information of the target object;
the acquisition unit is further used for acquiring the relation data of the first 2D monocular image and the 3D information before acquiring the second 2D monocular image;
the electronic device further includes:
the training unit is used for training the algorithm model according to the relation data of the first 2D monocular image and the 3D information so as to obtain the trained algorithm model;
the algorithm model is a deep learning model or a model obtained through training of a machine learning algorithm;
the training unit is specifically used for:
performing forward calculation on the first 2D monocular image according to a convolutional neural network to obtain a feature map of the first 2D monocular image;
obtaining a critical parameter according to the feature map and the parameters of the first 2D monocular image, wherein the parameter is a parameter of the target object, and the critical parameter is used for representing the feature parameter of the target object which is different from a similar object;
calculating the critical parameters according to a deep network algorithm to obtain first 3D information;
acquiring a difference value of the first 3D information and the 3D information in the relation data;
and back-propagating the difference value to the algorithm model, and training the algorithm model to obtain the trained algorithm model.
4. The electronic device according to claim 3, wherein the obtaining unit is specifically configured to:
acquiring the first 2D monocular image, the first 2D monocular image comprising at least one object;
acquiring 3D information of each object in at least one object in the first 2D monocular image;
and establishing a mapping relation between each object in the first 2D monocular image and 3D information thereof to obtain relation data of the first 2D monocular image and the 3D information.
5. An electronic device, comprising:
A memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform the method of any of claims 1-2.
6. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-2.
CN201810674456.4A 2018-06-26 2018-06-26 Three-dimensional information extraction method based on monocular image and electronic device Active CN110647888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810674456.4A CN110647888B (en) 2018-06-26 2018-06-26 Three-dimensional information extraction method based on monocular image and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810674456.4A CN110647888B (en) 2018-06-26 2018-06-26 Three-dimensional information extraction method based on monocular image and electronic device

Publications (2)

Publication Number Publication Date
CN110647888A CN110647888A (en) 2020-01-03
CN110647888B true CN110647888B (en) 2023-07-25

Family

ID=69008880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810674456.4A Active CN110647888B (en) 2018-06-26 2018-06-26 Three-dimensional information extraction method based on monocular image and electronic device

Country Status (1)

Country Link
CN (1) CN110647888B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157307A (en) * 2016-06-27 2016-11-23 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107578436A (en) * 2017-08-02 2018-01-12 南京邮电大学 A Depth Estimation Method for Monocular Image Based on Fully Convolutional Neural Network FCN
CN107767413A (en) * 2017-09-20 2018-03-06 华南理工大学 A kind of image depth estimation method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Depth map prediction from a single image using a multi-scale deep network; David Eigen et al.; NeurIPS Proceedings; 2014-12-31; full text *
Monocular infrared image depth estimation based on deep convolutional neural networks; Xu Lu et al.; Acta Optica Sinica; 2016-07-10 (Issue 07); full text *

Also Published As

Publication number Publication date
CN110647888A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
US11727593B1 (en) Automated data capture
US11640694B2 (en) 3D model reconstruction and scale estimation
CN111060101B (en) Vision-assisted distance SLAM method and device and robot
US11847796B2 (en) Calibrating cameras using human skeleton
US12260575B2 (en) Scale-aware monocular localization and mapping
Staranowicz et al. Practical and accurate calibration of RGB-D cameras using spheres
CN110702111A (en) Simultaneous localization and map creation (SLAM) using dual event cameras
US11788845B2 (en) Systems and methods for robust self-relocalization in a visual map
WO2019102442A1 (en) Systems and methods for 3d facial modeling
US20140225988A1 (en) System and method for three-dimensional surface imaging
US10750157B1 (en) Methods and systems for creating real-time three-dimensional (3D) objects from two-dimensional (2D) images
JP2022542858A (en) Deep network training methods
CN107016348B (en) Face detection method and device combined with depth information and electronic device
JP7103357B2 (en) Information processing equipment, information processing methods, and programs
KR20150082379A (en) Fast initialization for monocular visual slam
US10679376B2 (en) Determining a pose of a handheld object
CN106991378B (en) Depth-based face orientation detection method and device and electronic device
JP2020524355A5 (en)
CN112164099A (en) Self-checking and self-calibration method and device based on monocular structured light
JP7103354B2 (en) Information processing equipment, information processing methods, and programs
CN115088244B (en) Depth sensor activation for localization based on data from a monocular camera
US20210004978A1 (en) Method for acquiring depth information of target object and movable platform
CN114494582B (en) Three-dimensional model dynamic updating method based on visual perception
CN110647888B (en) Three-dimensional information extraction method based on monocular image and electronic device
CN113030960A (en) Monocular vision SLAM-based vehicle positioning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant