
CN111880657B - Control method and device of virtual object, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111880657B
CN111880657B (application CN202010753268.8A)
Authority
CN
China
Prior art keywords
display position
position information
real scene
coordinate system
hand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010753268.8A
Other languages
Chinese (zh)
Other versions
CN111880657A (en)
Inventor
李国雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202010753268.8A
Publication of CN111880657A
Priority to PCT/CN2021/091072 (WO2022021980A1)
Priority to TW110119707A (TW202205059A)
Application granted
Publication of CN111880657B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a control method and apparatus for a virtual object, an electronic device, and a storage medium. The control method includes: acquiring a real scene image, and displaying, on a terminal device, an augmented reality image formed by superimposing a virtual object on the real scene image; identifying first display position information of a target key point of a hand in the real scene image under a device coordinate system, and identifying length information of an arm in the real scene image; converting the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm; and controlling the display position of the virtual object in the augmented reality image based on the second display position information.

Description

Control method and device of virtual object, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a method and an apparatus for controlling a virtual object, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, Augmented Reality (AR) technology is being applied to an increasingly wide range of scenarios. In an AR scene, interaction between the user and virtual objects can be enriched, and the movement of a virtual object can be controlled through hand movement. Hand movement is therefore an important action for human-computer interaction in AR scenes, and the accuracy and efficiency of hand tracking directly affect the control effect on the virtual object.
Disclosure of Invention
The embodiment of the disclosure at least provides a control scheme of a virtual object.
In a first aspect, the present disclosure provides a method for controlling a virtual object, including:
acquiring a real scene image, and displaying an augmented reality image overlapped by the real scene image and a virtual object on terminal equipment;
identifying first display position information of a target key point of a hand in the real scene image under an equipment coordinate system, and identifying length information of an arm in the real scene image;
converting the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm;
and controlling the display position of the virtual object in the augmented reality image based on the second display position information.
In the embodiments of the present disclosure, the second display position information of the target key point of the hand under the world coordinate system can be determined based on the first display position information of that key point under the device coordinate system and the length information of the arm. Because the arm length stands in for depth, the second display position information can be determined quickly, without identifying a large number of hand skeleton points. This improves the identification efficiency of the target key point of the hand; in turn, when the display position of the virtual object is controlled based on that key point, the processing efficiency of the control process is improved and the control effect is optimized.
In one possible implementation, the identifying first display position information of a target key point of a hand in the real scene image in a device coordinate system includes:
performing hand detection on the real scene image, and determining a detection area containing a hand in the real scene image;
and acquiring the position coordinates of the target key points of the detection area in the real scene image, and taking the position coordinates as the first display position information.
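The detection step above can be sketched as follows; using the centre of the hand detection box as the target key point is one simple choice, and the function name and box format here are illustrative assumptions rather than details from the disclosure:

```python
def keypoint_from_detection(box):
    """Take the centre of a hand detection box as the target key point's
    first display position, in device (screen pixel) coordinates.
    `box` is assumed to be (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
```

A single box centre avoids detecting every hand skeleton point, which is the efficiency gain the disclosure emphasizes.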
In a possible embodiment, the identifying length information of the arm in the image of the real scene includes:
detecting the real scene image, and determining the arm gesture type in the real scene image;
and determining the length information of the arm in the real scene image based on the arm gesture type and a mapping relation between the pre-established arm gesture type and the length information of the arm.
In the embodiment of the disclosure, the arm posture type contained in the real scene image is determined through image detection, and then the length information of the arm in the real scene image can be rapidly determined according to the mapping relationship established in advance.
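A minimal sketch of such a pre-established mapping might look like the following; the posture classes and length values are invented placeholders, since the disclosure does not specify them:

```python
# Hypothetical mapping from a recognised arm posture class to an
# approximate arm length in metres (illustrative values only).
ARM_LENGTH_BY_POSTURE = {
    "extended": 0.60,
    "half_bent": 0.40,
    "retracted": 0.25,
}

def arm_length_for(posture_type, default=0.40):
    """Look up the arm length for a detected posture type, falling back
    to a default length for unrecognised postures."""
    return ARM_LENGTH_BY_POSTURE.get(posture_type, default)
```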
In one possible embodiment, the converting the first display position information into second display position information in a world coordinate system based on the first display position information and the length information of the arm includes:
normalizing the first display position information based on the screen size information of the terminal device to obtain third display position information of the target key point of the hand under a normalized device coordinate system;
and determining the second display position information based on the third display position information, a camera projection matrix of an image acquisition unit of the terminal equipment and the length information of the arm.
In the embodiments of the present disclosure, normalization can be applied to the first display position information of the target key point of the hand under the device coordinate systems of different types of terminal devices, so that the second display position information of the target key point under the world coordinate system can be determined quickly and in a unified manner for augmented reality images displayed on different types of terminal devices.
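The normalization step can be sketched as follows, assuming a mapping from screen pixels to a [-1, 1] normalized device coordinate (NDC) range; the exact convention (range and y-axis direction) is an assumption, not stated in the disclosure:

```python
def to_ndc(x_px, y_px, screen_w, screen_h):
    """Normalise device-coordinate pixels to [-1, 1] normalised device
    coordinates, so different screen sizes map into one shared space."""
    x_ndc = 2.0 * x_px / screen_w - 1.0
    y_ndc = 1.0 - 2.0 * y_px / screen_h  # flip y: screen y grows downward
    return x_ndc, y_ndc
```

For example, the centre of a 1080x1920 screen maps to the NDC origin regardless of the device model.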
In a possible implementation manner, the determining the second display position information based on the third display position information, a camera projection matrix of an image acquisition unit of the terminal device, and the length information of the arm includes:
determining fourth display position information of the target key point of the hand under a camera coordinate system based on the third display position information, the camera projection matrix and the length information of the arm;
determining the second display position information based on the fourth display position information and a camera extrinsic matrix used when the image acquisition unit shoots the real scene image;
the camera projection matrix is the conversion matrix between the normalized device coordinate system and the camera coordinate system, and the camera extrinsic matrix is the conversion matrix between the world coordinate system and the camera coordinate system.
In the embodiment of the disclosure, the depth information of the target key point of the hand under the camera coordinate system is approximately represented by introducing the length information of the arm, so that the fourth display position information of the target key point of the hand under the camera coordinate system can be quickly determined, and the second display position information of the target key point of the hand under the world coordinate system can be quickly determined.
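A rough sketch of this two-step conversion is shown below, using a simple pinhole field-of-view model in place of the actual camera projection matrix, and a row-major rotation plus translation for the extrinsics; all conventions here are assumptions for illustration, not the patent's concrete matrices:

```python
import math

def unproject(x_ndc, y_ndc, fov_y_deg, aspect, depth):
    """Back-project an NDC point into camera coordinates, using the arm
    length as the approximate depth along the optical axis (step one:
    third display position + projection + arm length -> fourth position)."""
    tan_half = math.tan(math.radians(fov_y_deg) / 2.0)
    x_cam = x_ndc * tan_half * aspect * depth
    y_cam = y_ndc * tan_half * depth
    return (x_cam, y_cam, depth)

def camera_to_world(p_cam, rotation, translation):
    """Invert the camera extrinsics (step two: fourth position ->
    second display position), assuming p_cam = R @ p_world + t."""
    d = [p_cam[i] - translation[i] for i in range(3)]
    # R is orthonormal, so its inverse is its transpose.
    return tuple(sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3))
```

With an identity extrinsic matrix the camera and world frames coincide, which makes the round trip easy to sanity-check.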
In a possible embodiment, the controlling the display position of the virtual object in the augmented reality image based on the second display position information includes:
determining position change data of the target key points of the hand in a target scene based on second display position information corresponding to the target key points of the hand in different frames of real scene images;
and controlling the display position of the virtual object in the augmented reality image to move based on the position change data.
In the embodiment of the disclosure, the display position of the virtual object can be adjusted based on the position change data of the target key point of the hand in the real scene, so that the purpose of controlling the display position of the virtual object by the hand is achieved.
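The frame-to-frame update described above reduces to applying the hand's displacement to the object; a sketch (names are illustrative):

```python
def move_by_hand_delta(object_pos, hand_prev, hand_curr):
    """Shift the virtual object's world position by the displacement of
    the hand's target key point between two real-scene image frames."""
    return tuple(o + (c - p) for o, p, c in zip(object_pos, hand_prev, hand_curr))
```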
In a possible embodiment, the controlling the display position of the virtual object in the augmented reality image based on the second display position information includes:
determining a target display position of the virtual object based on second display position information corresponding to the target key point of the hand and a preset relative position relationship between the target key point of the hand and the virtual object;
controlling the virtual object in the augmented reality image to move to the target display position based on the determined target display position of the virtual object.
In the embodiment of the disclosure, the display position of the virtual object can be adjusted based on the second display position information of the target key point of the hand in the world coordinate system and the preset relative position relationship, so that the purpose of controlling the display position of the virtual object through the hand is achieved.
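This placement rule reduces to adding the preset relative offset to the hand's world position; a sketch with an invented offset (e.g. the object floating slightly above the hand):

```python
def target_position(hand_world, relative_offset):
    """Place the virtual object at a preset offset from the hand's
    target key point in world coordinates; the offset is the preset
    relative position relationship (values here are illustrative)."""
    return tuple(h + o for h, o in zip(hand_world, relative_offset))
```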
In a second aspect, an embodiment of the present disclosure provides an apparatus for controlling a virtual object, including:
the display module is used for acquiring a real scene image and displaying an augmented reality image formed by overlapping the real scene image and a virtual object on the terminal equipment;
the identification module is used for identifying first display position information of a target key point of a hand in the real scene image under an equipment coordinate system and identifying length information of the arm in the real scene image;
the adjusting module is used for converting the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm;
and the control module is used for controlling the display position of the virtual object in the augmented reality image based on the second display position information.
In one possible embodiment, the identification module, when configured to identify first display position information of a target key point of a hand in the real scene image in a device coordinate system, includes:
performing hand detection on the real scene image, and determining a detection area containing a hand in the real scene image;
and acquiring the position coordinates of the target key point of the detection area in the real scene image, and taking the position coordinates as the first display position information.
In a possible implementation, the identification module, when identifying the length information of the arm in the real scene image, includes:
detecting the real scene image, and determining the arm posture type in the real scene image;
and determining the length information of the arm in the real scene image based on the arm gesture type and a mapping relation between the pre-established arm gesture type and the length information of the arm.
In one possible embodiment, the adjusting module, when configured to convert the first display position information into second display position information in a world coordinate system based on the first display position information and the length information of the arm, includes:
normalizing the first display position information based on the screen size information of the terminal device to obtain third display position information of the target key point of the hand under a normalized device coordinate system;
and determining the second display position information based on the third display position information, a camera projection matrix of an image acquisition unit of the terminal equipment and the length information of the arm.
In a possible implementation, the adjusting module, when configured to determine the second display position information based on the third display position information, a camera projection matrix of an image capturing unit of the terminal device, and the length information of the arm, includes:
determining fourth display position information of a target key point of the hand under a camera coordinate system based on the third display position information, the camera projection matrix and the length information of the arm;
determining the second display position information based on the fourth display position information and a camera extrinsic matrix used when the image acquisition unit shoots the real scene image;
the camera projection matrix is the conversion matrix between the normalized device coordinate system and the camera coordinate system, and the camera extrinsic matrix is the conversion matrix between the world coordinate system and the camera coordinate system.
In a possible embodiment, the control module, when being configured to control the display position of the virtual object in the augmented reality image based on the second display position information, includes:
determining position change data of the target key point of the hand in the real scene based on second display position information corresponding to the target key point of the hand in different frames of real scene images;
and controlling the display position of the virtual object in the augmented reality image to move based on the position change data.
In a possible embodiment, the control module, when configured to control the display position of the virtual object in the augmented reality image based on the second display position information, includes:
determining a target display position of the virtual object based on second display position information corresponding to the target key point of the hand and a preset relative position relationship between the target key point of the hand and the virtual object;
controlling the virtual object in the augmented reality image to move to the target display position based on the determined target display position of the virtual object.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the control method according to the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the control method according to the first aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. The following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art can derive additional related drawings from them without inventive effort.
Fig. 1 illustrates a flowchart of a control method for a virtual object according to an embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a method for determining first display location information of a target keypoint of a hand provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating an image of a real scene including a hand provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for determining length information of an arm according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a method for determining second display location information of a target keypoint of a hand provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a specific method for determining second display location information of a target key point of a hand according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a transformation of a camera coordinate system to a standardized device coordinate system provided by an embodiment of the present disclosure;
FIG. 8 is a flowchart illustrating a method for controlling movement of a virtual object according to an embodiment of the present disclosure;
FIG. 9 illustrates a flow chart of another method for controlling movement of a virtual object provided by an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram illustrating a control apparatus for a virtual object according to an embodiment of the present disclosure;
fig. 11 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the figures here, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an association relationship, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set formed by A, B, and C.
Augmented Reality (AR) technology may be applied to an AR device, which may be any electronic device capable of supporting AR functions, including but not limited to AR glasses, a tablet computer, a smart phone, and the like. When the AR device is operated in a real scene, the AR scene superimposed by the virtual object and the real scene can be viewed through the AR device, in the AR scene, the application of controlling the virtual object based on the hand is very wide, for example, the display position of the virtual object in the AR scene can be changed through the movement of the hand.
When recognizing the position of the hand, one approach is to compute the centroid of the hand from the detected coordinates of each hand skeleton point, and to control the display position of the virtual object through the movement of that centroid. This approach must detect the coordinates of every skeleton point of the hand and then compute the centroid from those coordinates, which is a cumbersome process; controlling the virtual object based on the determined centroid position is therefore inefficient.
Based on this research, the present disclosure provides a method for controlling a virtual object. The method determines second display position information of a target key point of the hand under a world coordinate system based on first display position information of that key point under a device coordinate system and length information of the arm. By using the arm length, the second display position information can be determined quickly without identifying a large number of hand skeleton points, which improves the identification efficiency of the target key point; in turn, when the target key point of the hand controls the display position of the virtual object, the processing efficiency of the control process is improved and the control effect is optimized.
To facilitate understanding of the present embodiments, the control method for a virtual object disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the control method is generally a computer device with certain computing capability, which may specifically be a terminal device, a server, or another processing device. For example, the terminal device may be an AR device such as a mobile phone, a tablet, or AR glasses, which is not limited here. The AR device may connect to the server through an application. In some possible implementations, the control method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a control method for a virtual object according to an embodiment of the present disclosure is shown, where the control method includes the following steps S101 to S104:
s101, acquiring a real scene image, and displaying an augmented reality image formed by overlapping the real scene image and a virtual object on a terminal device.
For example, the real scene image may be acquired through an image acquisition unit of the terminal device. The image acquisition unit may be a camera built into the terminal device, such as the camera of a smartphone or tablet, or an external camera, such as one connected to AR glasses.
After the image acquisition unit of the terminal device acquires the real scene image, the real scene image may be superimposed with a virtual object generated in advance by means of computer graphics and visualization technology, to obtain an augmented reality image displayed on the screen of the terminal device. For example, the real scene image may show a park with a virtual panda superimposed on it, or a playground track with a virtual racing car superimposed on it.
S102, identifying first display position information of a target key point of a hand in the real scene image under the equipment coordinate system, and identifying length information of an arm in the real scene image.
Illustratively, key points of the hand region in the real scene image captured by the image acquisition unit can be identified; at least one key point of the hand region is obtained, and a target key point can be selected from among them. For example, for convenience of recognition, the central point of the hand may be selected as the target key point, or a key point on a finger may be selected as the target key point.
After a real scene image including a hand is obtained, the hand included in the real scene image may be detected based on a pre-trained neural network for target detection, a detection area including the hand in the real scene image, such as a detection frame including the hand, may be determined, and first display position information of a target key point of the hand in a device coordinate system may be determined based on a position of the detection frame.
For example, a corner of the display screen may be taken as the origin of the device coordinate system, with two perpendicular, intersecting edges of the screen as its coordinate axes. Taking a mobile phone as an example, one corner of the phone screen serves as the origin, the long edge passing through that corner as the horizontal axis (x axis), and the short edge passing through it as the vertical axis (y axis).
If the terminal device is an AR glasses, and the image of the real scene is displayed on the lens of the AR glasses, the device coordinate system may be a coordinate system established with an angle point of the lens screen projected thereon as an origin and two mutually perpendicular and intersecting lines passing through the origin in the lens screen as coordinate axes.
In addition, when the user holds the terminal device in hand, the real scene image captured by the image capturing unit further includes an arm, and the length information of the arm in the real scene image is related to the posture of the arm, for example, the length of the arm in the extended state is longer than the length of the arm in the retracted state, so that the length information of the arm included in the real scene image can be determined by recognizing the posture of the arm in the real scene image.
And S103, converting the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm.
The first display position information corresponding to the target key point of the hand includes the coordinate values of that key point along the x axis and the y axis under the device coordinate system. Because the real scene image is captured by the image acquisition unit, the second display position information of the target key point under the world coordinate system can be determined from the camera projection matrix of the image acquisition unit.
For example, a world coordinate system may be pre-constructed for a real scene, for example, an exhibition hall with the real scene being an exhibition hall, a set position point of the exhibition hall may be used as an origin of the world coordinate system, three mutually perpendicular straight lines are used as three coordinate axes of the world coordinate system, and the world coordinate system corresponding to the exhibition hall is obtained after a positive direction of each coordinate axis is determined.
Illustratively, when determining world position coordinates of the target key point of the hand in the world coordinate system, considering that the arm of the user is aligned with the hand and is approximately parallel to the optical axis of the image acquisition unit of the terminal device, depth information of the target key point of the hand in the camera coordinate system can be represented by length information of the arm, and then second display position information of the target key point of the hand in the world coordinate system is determined by combining first display position information corresponding to the target key point of the hand.
And S104, controlling the display position of the virtual object in the augmented reality image based on the second display position information.
For example, the initial display position of the virtual object in the augmented reality scene may be determined in a pre-constructed three-dimensional scene model representing the real scene, where the three-dimensional scene model and the real scene are presented at a 1:1 scale, so that a display position determined in the model corresponds directly to a position in the real scene.
Illustratively, the movement of the virtual object in the augmented reality image may be controlled by the movement of the target key point of the hand, or the virtual object may be triggered by the target key point of the hand so as to change the special effect it presents. For example, suppose the virtual object is initially static in the augmented reality image. When it is detected that the second display position information of the target key point of the hand coincides with the position information of the virtual object in the world coordinate system, a hand-triggered special effect of the virtual object can be achieved; from that moment the virtual object may start to move along with the target key point of the hand, so that the virtual object is controlled by the hand.
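As an informal illustration of this trigger-and-follow behaviour, the sketch below (Python with NumPy; all names and the coincidence threshold are assumptions of this sketch, not from the disclosure) checks whether the hand key point coincides with the virtual object in world coordinates and, once triggered, lets the object follow the hand:

```python
import numpy as np

TRIGGER_RADIUS = 0.05  # metres; assumed coincidence threshold, not from the disclosure

def update_virtual_object(object_pos, hand_world_pos, attached):
    """Attach the virtual object once the hand key point coincides with it
    in world coordinates, then let the object follow the hand."""
    if not attached:
        attached = np.linalg.norm(hand_world_pos - object_pos) < TRIGGER_RADIUS
    if attached:
        object_pos = hand_world_pos.copy()  # object moves with the hand
    return object_pos, attached

obj = np.array([0.0, 0.0, 0.5])
hand = np.array([0.0, 0.0, 0.5])
obj, attached = update_virtual_object(obj, hand, False)  # coincident: triggers
obj, attached = update_virtual_object(obj, np.array([0.1, 0.0, 0.5]), attached)
```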
In the embodiment of the disclosure, the second display position information of the target key point of the hand in the world coordinate system can be determined based on the first display position information of the target key point in the device coordinate system and the length information of the arm. With the help of the arm length information, this second display position information can be determined quickly, without identifying a large number of hand skeleton points, which improves the recognition efficiency of the target key point of the hand. Further, when the display position of the virtual object is controlled based on the target key point of the hand, the processing efficiency of controlling the virtual object is improved and the control effect is optimized.
The above-mentioned S101 to S104 will be described in detail with reference to specific embodiments.
In relation to S102 above, when recognizing the first display position information of the target key point of the hand in the real scene image in the device coordinate system, as shown in fig. 2, the method may include:
S1021, performing hand detection on the real scene image, and determining a detection area containing the hand in the real scene image;
S1022, acquiring a position coordinate of the target position point of the detection area in the real scene image, and using the position coordinate as the first display position information.
For example, the hand detection may be performed on the real scene image based on a pre-trained neural network for performing target detection, so as to obtain a detection area including the hand in the real scene image, that is, obtain a detection frame labeled for the hand position in the real scene image, as shown in fig. 3, which is a schematic diagram of the detected detection frame including the hand.
Further, when the neural network outputs the detection frame containing the hand, it may simultaneously output the position coordinates of the four corner points of the detection frame. For example, as shown in fig. 3, a device coordinate system is established on the real scene image, and the four corner points of the detection frame are the upper left corner point k1, the upper right corner point k2, the lower left corner point k3, and the lower right corner point k4. The position coordinate of a target position point of the detection frame, for example its central point, in the real scene image may then be determined based on the position coordinates of the four corner points in the real scene image.
For example, a straight line equation corresponding to the first diagonal may be determined from the upper left corner point k1 and the lower right corner point k4, and a straight line equation corresponding to the second diagonal from the upper right corner point k2 and the lower left corner point k3. The position coordinate of the central point of the detection frame in the real scene image may then be determined from these two straight line equations; for example, their intersection may be used as the position coordinate of the central point of the detection frame in the real scene image.
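A minimal sketch of this center-point computation (Python/NumPy; the corner coordinates are made up for illustration) intersects the two diagonals in homogeneous coordinates:

```python
import numpy as np

def line_intersection(p1, p2, p3, p4):
    """Intersect the line through p1, p2 with the line through p3, p4
    (2-D points, using homogeneous coordinates and cross products)."""
    l1 = np.cross(np.append(p1, 1.0), np.append(p2, 1.0))  # first diagonal
    l2 = np.cross(np.append(p3, 1.0), np.append(p4, 1.0))  # second diagonal
    x = np.cross(l1, l2)                                   # intersection point
    return x[:2] / x[2]

# corner points of a hypothetical detection frame (pixels)
k1, k2 = np.array([10.0, 20.0]), np.array([110.0, 20.0])
k3, k4 = np.array([10.0, 220.0]), np.array([110.0, 220.0])
center = line_intersection(k1, k4, k2, k3)  # diagonals k1-k4 and k2-k3
```

For an axis-aligned rectangular frame this reduces to the midpoint of the corners, but the diagonal-intersection form mirrors the straight-line-equation procedure described in the text.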
In the case of recognizing the length information of the arm in the real scene image in S102, as shown in fig. 4, the method may include the following steps S1023 to S1024:
S1023, detecting the real scene image, and determining the arm posture type in the real scene image;
S1024, determining the length information of the arm in the real scene image based on the arm posture type and the pre-established mapping relationship between arm posture types and arm length information.
Similarly, the real scene image is detected through a pre-trained neural network for posture detection, and the arm posture type corresponding to the arm contained in the real scene image can be determined. For example, the arm posture types may include three types, such as an arm extended state, an arm semi-extended state, and an arm retracted state. Based on the pre-established mapping relationship between arm posture types and arm length information, the length information of the arm in the real scene image can then be determined.
For example, the pre-established mapping relationship between arm posture types and arm length information may be as follows: when the arm is in the extended state, the corresponding arm length information is 0.65 m; in the semi-extended state, 0.45 m; and in the retracted state, 0.2 m. If the real scene image is detected and the arm posture type is determined to be the extended state, the arm length information can be rapidly determined to be 0.65 m.
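The mapping relationship can be sketched as a simple lookup table (Python; the posture class names are assumptions, while the three lengths are the example values above):

```python
# Example lengths from the text; the posture class names are assumed.
ARM_LENGTH_BY_POSTURE = {
    "extended": 0.65,       # metres
    "semi_extended": 0.45,
    "retracted": 0.20,
}

def arm_length(posture_class):
    """Look up the arm length for a detected arm posture class."""
    return ARM_LENGTH_BY_POSTURE[posture_class]
```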
In the embodiment of the disclosure, the arm posture type contained in the real scene image is determined through image detection, and then the length information of the arm in the real scene image can be rapidly determined according to the mapping relationship established in advance.
In the case of converting the first display position information into the second display position information in the world coordinate system based on the first display position information and the arm length information in S103, as shown in fig. 5, the following steps S1031 to S1032 may be included:
S1031, normalizing the first display position information based on the screen size information of the terminal device, to obtain third display position information of the target key point of the hand in a standardized device coordinate system;
S1032, determining the second display position information based on the third display position information, the camera projection matrix of the image acquisition unit of the terminal device, and the length information of the arm.
Considering that the device coordinate systems of different terminal devices are not uniform, a standardized device coordinate system may be introduced, in which the value range along both the x axis and the y axis is 0 to 1. To map the coordinate values of the first display position information along the x axis and the y axis of the device coordinate system into the standardized device coordinate system, the first display position information must be normalized: the coordinate value along the x axis of the device coordinate system is normalized to a value between 0 and 1, and likewise the coordinate value along the y axis.
Considering that the device coordinate system is a coordinate system constructed on the screen of the terminal device, the normalization of the first display position information can be performed using the screen size information of the terminal device. After normalization, the third display position information of the target key point of the hand in the standardized device coordinate system is obtained, so that third display position information can be determined in a unified manner across terminal devices with different screen sizes.
Specifically, denote the first display position information of the target key point of the hand in the device coordinate system as p_screen = (x_screen, y_screen). The third display position information P_ndc = (x_ndc, y_ndc) of the target key point of the hand in the standardized device coordinate system may be determined by the following formula (1) and formula (2):
x_ndc = x_screen / W (1);
y_ndc = y_screen / L (2);
wherein x_ndc is the coordinate value of the target key point of the hand along the x-axis direction in the standardized device coordinate system; y_ndc is the coordinate value of the target key point of the hand along the y-axis direction in the standardized device coordinate system; x_screen is the coordinate value of the target key point of the hand along the x-axis direction in the device coordinate system; y_screen is the coordinate value of the target key point of the hand along the y-axis direction in the device coordinate system; W represents the length of the screen of the terminal device along the x-axis direction in the device coordinate system; and L denotes the length of the screen of the terminal device along the y-axis direction in the device coordinate system.
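Formulas (1) and (2) amount to a two-line helper (Python; the sample screen size and coordinates are illustrative):

```python
def to_ndc(x_screen, y_screen, screen_w, screen_h):
    """Formulas (1) and (2): normalize device-coordinate pixel values
    to the [0, 1] range of the standardized device coordinate system."""
    return x_screen / screen_w, y_screen / screen_h

# a key point at the centre of a hypothetical 1080 x 1920 screen
x_ndc, y_ndc = to_ndc(540.0, 960.0, 1080.0, 1920.0)
```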
After the third display position information corresponding to the target key point of the hand is obtained, fourth display position information of the target key point in a camera coordinate system can be determined based on the third display position information and the camera projection matrix of the image acquisition unit of the terminal device, where the coordinate of the target key point along the z axis of the camera coordinate system can be determined from the length information of the arm. The second display position information of the target key point of the hand in the world coordinate system can then be determined based on this fourth display position information.
In the embodiment of the disclosure, the first display position information of the target key point of the hand can be normalized under the device coordinate systems of different types of terminal devices, so that for augmented reality images displayed on different types of terminal devices, the second display position information of the target key point of the hand in the world coordinate system can be quickly determined in a unified manner.
Specifically, when determining the second display position information based on the third display position information, the camera projection matrix of the image capturing unit of the terminal device, and the length information of the arm, as shown in fig. 6, the following S10321 to S10322 may be included:
S10321, determining fourth display position information of the target key point of the hand in a camera coordinate system based on the third display position information, the camera projection matrix, and the length information of the arm;
S10322, determining the second display position information based on the fourth display position information and the camera extrinsic matrix used when the image acquisition unit captures the real scene image.
The camera projection matrix is a conversion matrix between the standardized device coordinate system and the camera coordinate system, and the camera extrinsic matrix is a conversion matrix between the world coordinate system and the camera coordinate system.
The camera projection matrix M_proj can be expressed by the following formula (3):
[Formula (3): the camera projection matrix M_proj, given in the original as an image; it is constructed from the view frustum parameters n, r, l, t, b and the camera focal length f described below.]
To explain the parameters n, r, l, t, and b contained in the camera projection matrix, fig. 7 is introduced, which is a schematic diagram of converting the camera coordinate system corresponding to the image acquisition unit into the standardized device coordinate system. In fig. 7, (a) is the view frustum in the camera coordinate system, and (b) is the unit cube in the standardized device coordinate system obtained after the view frustum is processed by the camera projection matrix; that is, the camera projection matrix normalizes the view frustum. A scene inside the view frustum is visible and a scene outside it is not. The view frustum comprises a far clipping plane ABCD and a near clipping plane EFGH. Here (r, t, n) represents the coordinates of point F in the near clipping plane, where r is the coordinate value of point F along the x-axis direction in the camera coordinate system, t the coordinate value along the y-axis direction, and n the coordinate value along the z-axis direction; (l, b, n) represents the coordinates of point H in the near clipping plane, where l is the coordinate value of point H along the x-axis direction, b the coordinate value along the y-axis direction, and n the coordinate value along the z-axis direction; and f represents the focal length of the camera corresponding to the image acquisition unit.
The common parameters of the camera, namely, the camera field angle fov and the camera aspect ratio aspect, may be determined according to the following formulas (4) and (5):
[Formula (4): the camera field angle fov, given in the original as an image.]
[Formula (5): the camera aspect ratio aspect, given in the original as an image.]
The camera projection matrix can therefore also be determined from the camera field angle, the camera aspect ratio, the near clipping plane, the far clipping plane, and the camera focal length.
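As one concrete form, the standard OpenGL-style symmetric perspective matrix can be built from exactly these quantities. This is a sketch of a common convention, not necessarily the matrix of formula (3), which is given only as an image in the original:

```python
import numpy as np

def perspective(fov_y, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix built from the
    field angle, aspect ratio, and near/far clipping planes (a common
    convention; the patent's own formula (3) may differ)."""
    t = np.tan(fov_y / 2.0)
    return np.array([
        [1.0 / (aspect * t), 0.0,     0.0,                          0.0],
        [0.0,                1.0 / t, 0.0,                          0.0],
        [0.0,                0.0,     -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0,                0.0,     -1.0,                         0.0],
    ])

M = perspective(np.pi / 2.0, 1.0, 0.1, 100.0)  # 90-degree field angle
```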
Next, a projection formula (6) is introduced, and fourth display position information of the target key point of the hand in the camera coordinate system is determined through the projection formula (6).
P_ndc = M_proj * P_cam (6);
wherein P_cam = (x_cam, y_cam, z_cam) represents the fourth display position information of the target key point of the hand in the camera coordinate system; x_cam is the coordinate value of the target key point along the x-axis direction in the camera coordinate system, y_cam the coordinate value along the y-axis direction, and z_cam the coordinate value along the z-axis direction. z_cam can be determined from the length information of the arm in the camera coordinate system, and x_cam and y_cam can be determined by formula (6).
After fourth display position information of the target key point of the hand in the camera coordinate system is obtained, second display position information of the target key point of the hand may be determined by the following formula (7):
P_world = P_cam * M_cam (7);
wherein P_world represents the second display position information of the target key point of the hand in the world coordinate system, and M_cam is the inverse of the camera extrinsic matrix used when the image acquisition unit captures the real scene image, that is, the conversion matrix from the camera coordinate system to the world coordinate system.
Specifically, the camera extrinsic matrix used when the image acquisition unit of the terminal device captures the real scene image may comprise a translation vector and a rotation matrix for converting the world coordinate system into the camera coordinate system. It may be represented by the position of the world coordinate system origin in the camera coordinate system and the rotation angles of the world coordinate axes in the camera coordinate system at the moment the real scene image is captured, or equivalently by the pose data of the image acquisition unit in the world coordinate system at that moment. The pose data at the moment of capture may be determined by a simultaneous localization and mapping (SLAM) algorithm, which is not described in detail in this disclosure. After the camera extrinsic matrix is obtained, the second display position information of the target key point of the hand can be determined according to the above formula (7).
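Putting S10321 and S10322 together, the following is a hedged sketch of the unprojection pipeline (Python/NumPy). The NDC range, the negative-z viewing direction, and the column-vector convention with the matrix on the left (rather than on the right as in formula (7)) are assumptions of this sketch, not statements from the disclosure:

```python
import numpy as np

def hand_world_position(ndc_xy, proj, cam_to_world, arm_length_m):
    """Recover the hand key point's camera-space position using the arm
    length as its depth, then map it into the world coordinate system."""
    z_cam = -arm_length_m                    # camera assumed to look down -z
    # With a standard perspective matrix, x_ndc = proj[0,0] * x_cam / -z_cam
    # and y_ndc = proj[1,1] * y_cam / -z_cam, so invert at the known depth:
    x_cam = ndc_xy[0] * (-z_cam) / proj[0, 0]
    y_cam = ndc_xy[1] * (-z_cam) / proj[1, 1]
    p_cam = np.array([x_cam, y_cam, z_cam, 1.0])
    return (cam_to_world @ p_cam)[:3]        # camera-to-world transform

# identity matrices stand in for a real projection and extrinsic inverse
p_world = hand_world_position(np.array([0.5, 0.2]), np.eye(4), np.eye(4), 0.65)
```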
In the embodiment of the disclosure, the depth information of the target key point of the hand under the camera coordinate system is approximately represented by introducing the length information of the arm, so that the fourth display position information of the target key point of the hand under the camera coordinate system can be quickly determined, and the second display position information of the target key point of the hand under the world coordinate system can be quickly determined.
As to S104, in an embodiment, as shown in fig. 8, the method may include the following steps S1041 to S1042:
S1041, determining position change data of the target key point of the hand in the real scene based on the second display position information corresponding to the target key point of the hand in different frames of real scene images;
S1042, controlling the display position of the virtual object in the augmented reality image to move based on the position change data.
For example, the image acquisition unit of the terminal device may acquire real scene images at set time intervals, and determine second display position information corresponding to the target key point of the hand in each frame of the real scene images in the manner described above, so as to determine position change data of the target key point of the hand in the real scene within a set time, and then control the display position of the virtual object based on the position change data.
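A sketch of this position-change control (Python/NumPy; function and variable names are illustrative): the displacement of the hand key point between two frames is applied to the virtual object's display position.

```python
import numpy as np

def move_virtual_object(object_pos, hand_pos_prev, hand_pos_curr):
    """Apply the hand key point's frame-to-frame displacement
    (the 'position change data') to the virtual object."""
    return object_pos + (hand_pos_curr - hand_pos_prev)

obj = np.array([1.0, 0.0, 2.0])
obj = move_virtual_object(obj,
                          np.array([0.0, 0.0, 0.5]),   # previous frame
                          np.array([0.1, 0.0, 0.5]))   # current frame
```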
Exemplarily, taking an indoor exhibition hall as the real scene, suppose the exhibition hall contains three physical tables, denoted table A, table B, and table C, and the virtual object is a virtual vase whose initial position in the world coordinate system corresponding to the exhibition hall is on table A. The virtual vase on table A can then be seen in the augmented reality image and can be triggered by the target key point of the hand, after which its display position starts to move. For example, when the second display position information corresponding to the target key point of the hand in two consecutive frames of real scene images is detected to move from table A to table C, the virtual vase can be controlled to move from table A to table C, presenting an augmented reality image of the virtual vase on table C.
For example, when it is detected that the target key point of the hand coincides with the virtual object in the world coordinate system, an adjustment process for the display position of the virtual object may be triggered; for instance, when the second display position information corresponding to the target key point of the hand is detected to coincide with the position information of the virtual object in the world coordinate system for a set length of time, the adjustment of the display position of the virtual object in the augmented reality image may begin.
In another application scenario, for example, in an AR game scenario, the control of the dynamic virtual object may also be completed through the second display position information corresponding to the target key point of the hand in the different frames of real scene images, for example, the virtual object is a virtual racing car, and the driving trajectory of the virtual racing car in the augmented reality image is continuously adjusted through the position change data of the target key point of the hand in the different frames of real scene images in the real scene.
In the embodiment of the disclosure, the display position of the virtual object can be adjusted based on the position change data of the target key point of the hand in the real scene, so that the purpose of controlling the display position of the virtual object by the hand is achieved.
In another embodiment, as shown in fig. 9, the method for S104 may include the following steps S1043 to S1044:
S1043, determining a target display position of the virtual object based on the second display position information corresponding to the target key point of the hand and a preset relative position relationship between the target key point of the hand and the virtual object;
S1044, controlling the virtual object in the augmented reality image to move to the target display position based on the determined target display position of the virtual object.
Illustratively, the preset relative position relationship between the target key point of the hand and the virtual object may be configured in advance, or may be the initial relative position relationship between the two when they first appear together in the augmented reality image.
Based on the preset relative position relationship between the target key point of the hand and the virtual object, the target display position of the virtual object can be determined based on the second display position information of the target key point of the hand, which is acquired in real time, in the world coordinate system at the current moment, and then the virtual object in the augmented reality image is controlled to move to the target display position.
For example, suppose the virtual object is a virtual color brush. When the virtual color brush moves in the augmented reality image, a colored line corresponding to its movement track may be presented, for example a pink line drawn along the track. As multiple frames of real scene images are collected and the target key point of the hand keeps moving, the successive target display positions of the virtual color brush may be determined based on the second display position information and the preset relative position relationship, so that a movement track is formed in the augmented reality image and the corresponding colored line is presented. For example, if the movement track of the target key point of the user's hand is circular, the special effect of the virtual color brush drawing a circular colored line can be presented in the augmented reality image.
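The target-display-position step of S1043 can be sketched as follows (Python/NumPy; expressing the preset relative relationship as a world-coordinate offset vector is an assumption of this sketch):

```python
import numpy as np

def target_display_position(hand_world_pos, preset_offset):
    """S1043 as a sketch: target position = hand key point position plus
    the preset relative offset between hand and virtual object."""
    return hand_world_pos + preset_offset

# offset captured, e.g., when hand and object first appear together
preset_offset = np.array([0.0, -0.05, 0.1])
target = target_display_position(np.array([0.3, 1.2, -0.5]), preset_offset)
```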
In the embodiment of the disclosure, the display position of the virtual object can be adjusted based on the second display position information of the target key point of the hand in the world coordinate system and the preset relative position relationship, so that the purpose of controlling the display position of the virtual object through the hand is achieved.
It will be understood by those of skill in the art that in the above method of the present embodiment, the order of writing the steps does not imply a strict order of execution and does not impose any limitations on the implementation, as the order of execution of the steps should be determined by their function and possibly inherent logic.
Based on the same technical concept, a control device of a virtual object corresponding to a control method of the virtual object is also provided in the embodiments of the present disclosure, and because the principle of solving the problem of the device in the embodiments of the present disclosure is similar to the control method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 10, a schematic structural diagram of a control apparatus 1000 for a virtual object according to an embodiment of the present disclosure is shown, where the control apparatus 1000 for a virtual object includes:
a display module 1001 configured to acquire a real scene image and display an augmented reality image in which the real scene image and a virtual object are superimposed on a terminal device;
the identification module 1002 is configured to identify first display position information of a target key point of a hand in a real scene image in an equipment coordinate system, and identify length information of the arm in the real scene image;
an adjusting module 1003, configured to convert the first display position information into second display position information in a world coordinate system based on the first display position information and the length information of the arm;
and a control module 1004 configured to control a display position of the virtual object in the augmented reality image based on the second display position information.
In one possible implementation, the identifying module 1002, when configured to identify first display position information of a target key point of a hand in a real scene image in a device coordinate system, includes:
performing hand detection on the real scene image, and determining a detection area containing the hand in the real scene image;
and acquiring the position coordinates of the target position point of the detection area in the real scene image, and taking the position coordinates as first display position information.
In one possible implementation, the recognition module 1002, when recognizing the length information of the arm in the real scene image, includes:
detecting the real scene image, and determining the arm posture type in the real scene image;
and determining the length information of the arm in the real scene image based on the arm gesture type and the mapping relation between the pre-established arm gesture type and the length information of the arm.
In one possible embodiment, the adjusting module 1003, when being configured to convert the first display position information into the second display position information in the world coordinate system based on the first display position information and the length information of the arm, includes:
normalizing the first display position information based on the screen size information of the terminal equipment to obtain third display position information of the target key point of the hand under a standardized equipment coordinate system;
and determining second display position information based on the third display position information, the camera projection matrix of the image acquisition unit of the terminal equipment and the length information of the arm.
In a possible implementation, the adjusting module 1003, when configured to determine the second display position information based on the third display position information, the camera projection matrix of the image capturing unit of the terminal device, and the length information of the arm, includes:
determining fourth display position information of the target key point of the hand under a camera coordinate system based on the third display position information, the camera projection matrix and the length information of the arm;
determining the second display position information based on the fourth display position information and the camera extrinsic matrix used when the image acquisition unit captures the real scene image;
the camera projection matrix is a conversion matrix between the standardized device coordinate system and the camera coordinate system, and the camera extrinsic matrix is a conversion matrix between the world coordinate system and the camera coordinate system.
In one possible implementation, the control module 1004, when configured to control the display position of the virtual object in the augmented reality image based on the second display position information, includes:
determining position change data of the target key points of the hand in the real scene based on second display position information corresponding to the target key points of the hand in different frames of real scene images;
and controlling the display position of the virtual object in the augmented reality image to move based on the position change data.
In one possible implementation, the control module 1004, when configured to control the display position of the virtual object in the augmented reality image based on the second display position information, includes:
determining a target display position of the virtual object based on second display position information corresponding to the target key point of the hand and a preset relative position relationship between the target key point of the hand and the virtual object;
and controlling the virtual object in the augmented reality image to move to the target display position based on the determined target display position of the virtual object.
The description of the processing flow of each module in the apparatus and the interaction flow between the modules may refer to the relevant description in the above method embodiments, and will not be described in detail here.
Corresponding to the control method of the virtual object in fig. 1, an embodiment of the present disclosure further provides an electronic device 1100, and as shown in fig. 11, a schematic structural diagram of the electronic device 1100 provided in the embodiment of the present disclosure includes:
a processor 111, a memory 112, and a bus 113. The memory 112 is used for storing execution instructions and includes an internal memory 1121 and an external memory 1122. The internal memory 1121 is used for temporarily storing operation data in the processor 111 and data exchanged with the external memory 1122, such as a hard disk; the processor 111 exchanges data with the external memory 1122 through the internal memory 1121. When the electronic device 1100 operates, the processor 111 communicates with the memory 112 through the bus 113, causing the processor 111 to execute the following instructions: acquiring a real scene image, and displaying on a terminal device an augmented reality image in which the real scene image and a virtual object are superimposed; identifying first display position information of a target key point of a hand in the real scene image in a device coordinate system, and identifying length information of an arm in the real scene image; converting the first display position information into second display position information in a world coordinate system based on the first display position information and the length information of the arm; and controlling the display position of the virtual object in the augmented reality image based on the second display position information.
The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the control method for a virtual object described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the control method for a virtual object provided in the embodiments of the present disclosure includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the steps of the control method for a virtual object described in the above method embodiments; for details, reference may be made to the above method embodiments, which are not repeated here.
The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be implemented in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a software development kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the system and apparatus described above, which are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the illustrated or discussed mutual couplings, direct couplings, or communication connections may be implemented through some communication interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-transitory computer-readable storage medium executable by a processor. Based on such an understanding, the technical solutions of the present disclosure may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or substitute equivalents for some of the technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall all be covered within its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
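The two control strategies described in the embodiments above — moving the virtual object by the hand key point's frame-to-frame displacement, or snapping it to a preset offset from the hand — can be sketched as follows. The function names and the offset convention are illustrative assumptions, not part of the disclosure.

```python
def move_by_hand_delta(object_pos, prev_hand_pos, curr_hand_pos):
    """Shift the virtual object by the hand key point's world-space
    displacement between two frames (position-change-based control)."""
    return [o + (c - p) for o, p, c in zip(object_pos, prev_hand_pos, curr_hand_pos)]

def move_to_hand_offset(hand_pos, relative_offset):
    """Place the virtual object at a preset relative offset from the
    hand key point (target-display-position control)."""
    return [h + r for h, r in zip(hand_pos, relative_offset)]
```

The first strategy preserves any existing spacing between hand and object, while the second enforces a fixed spatial relationship; which one is used would depend on the interaction being implemented.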

Claims (16)

1. A method for controlling a virtual object, comprising:
acquiring a real scene image, and displaying an augmented reality image formed by overlapping the real scene image and a virtual object on terminal equipment;
identifying first display position information of a target key point of a hand in the real scene image under an equipment coordinate system, and identifying length information of an arm in the real scene image;
converting the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm;
and controlling the display position of the virtual object in the augmented reality image based on the second display position information.
2. The control method according to claim 1, wherein the identifying first display position information of a target key point of a hand in the real scene image in a device coordinate system comprises:
performing hand detection on the real scene image, and determining a detection area containing a hand in the real scene image;
and acquiring the position coordinates of the target position point of the detection area in the real scene image, and taking the position coordinates as the first display position information.
3. The control method according to claim 1, wherein the identifying length information of the arm in the real scene image comprises:
detecting the real scene image, and determining an arm posture type in the real scene image;
and determining the length information of the arm in the real scene image based on the arm posture type and a mapping relation between the pre-established arm posture type and the length information of the arm.
4. The control method according to any one of claims 1 to 3, wherein the converting the first display position information into second display position information in a world coordinate system based on the first display position information and the length information of the arm includes:
normalizing the first display position information based on the screen size information of the terminal equipment to obtain third display position information of the target key point of the hand under a standardized equipment coordinate system;
and determining the second display position information based on the third display position information, a camera projection matrix of an image acquisition unit of the terminal equipment and the length information of the arm.
5. The control method according to claim 4, wherein the determining the second display position information based on the third display position information, a camera projection matrix of an image acquisition unit of the terminal device, and the length information of the arm includes:
determining fourth display position information of the target key point of the hand under a camera coordinate system based on the third display position information, the camera projection matrix and the length information of the arm;
determining the second display position information based on the fourth display position information and a camera external reference matrix used when the image acquisition unit shoots the real scene image;
the camera projection matrix is a conversion matrix between the standardized equipment coordinate system and the camera coordinate system, and the camera external reference matrix is a conversion matrix between the world coordinate system and the camera coordinate system.
6. The control method according to any one of claims 1 to 5, wherein the controlling the display position of the virtual object in the augmented reality image based on the second display position information includes:
determining position change data of the target key point of the hand in the real scene based on second display position information corresponding to the target key point of the hand in different frames of real scene images;
and controlling the display position of the virtual object in the augmented reality image to move based on the position change data.
7. The control method according to any one of claims 1 to 5, wherein the controlling the display position of the virtual object in the augmented reality image based on the second display position information includes:
determining a target display position of the virtual object based on second display position information corresponding to the target key point of the hand and a preset relative position relationship between the target key point of the hand and the virtual object;
controlling the virtual object in the augmented reality image to move to the target display position based on the determined target display position of the virtual object.
8. An apparatus for controlling a virtual object, comprising:
the display module is used for acquiring a real scene image and displaying an augmented reality image formed by overlapping the real scene image and a virtual object on the terminal equipment;
the identification module is used for identifying first display position information of a target key point of a hand in the real scene image under an equipment coordinate system and identifying length information of an arm in the real scene image;
the adjusting module is used for converting the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm;
and the control module is used for controlling the display position of the virtual object in the augmented reality image based on the second display position information.
9. The control apparatus according to claim 8, wherein the identification module, when configured to identify the first display position information of the target key point of the hand in the real scene image under the equipment coordinate system, comprises:
performing hand detection on the real scene image, and determining a detection area containing a hand in the real scene image;
and acquiring the position coordinates of the target position point of the detection area in the real scene image, and taking the position coordinates as the first display position information.
10. The control apparatus according to claim 8, wherein the identification module, when configured to identify the length information of the arm in the real scene image, comprises:
detecting the real scene image, and determining the arm posture type in the real scene image;
and determining the length information of the arm in the real scene image based on the arm posture type and a mapping relation between the pre-established arm posture type and the length information of the arm.
11. The control apparatus according to any one of claims 8 to 10, wherein the adjusting module, when configured to convert the first display position information into second display position information under a world coordinate system based on the first display position information and the length information of the arm, comprises:
normalizing the first display position information based on the screen size information of the terminal equipment to obtain third display position information of the target key point of the hand under a standardized equipment coordinate system;
and determining the second display position information based on the third display position information, a camera projection matrix of an image acquisition unit of the terminal equipment and the length information of the arm.
12. The control apparatus according to claim 11, wherein the adjusting module, when configured to determine the second display position information based on the third display position information, the camera projection matrix of the image acquisition unit of the terminal equipment, and the length information of the arm, comprises:
determining fourth display position information of the target key point of the hand under a camera coordinate system based on the third display position information, the camera projection matrix and the length information of the arm;
determining the second display position information based on the fourth display position information and a camera external reference matrix used when the image acquisition unit shoots the real scene image;
the camera projection matrix is a conversion matrix between the standardized equipment coordinate system and the camera coordinate system, and the camera external reference matrix is a conversion matrix between the world coordinate system and the camera coordinate system.
13. The control apparatus according to any one of claims 8 to 12, wherein the control module, when configured to control the display position of the virtual object in the augmented reality image based on the second display position information, comprises:
determining position change data of the target key point of the hand in the real scene based on second display position information corresponding to the target key point of the hand in different frames of real scene images;
and controlling the display position of the virtual object in the augmented reality image to move based on the position change data.
14. The control apparatus according to any one of claims 8 to 12, wherein the control module, when configured to control the display position of the virtual object in the augmented reality image based on the second display position information, comprises:
determining a target display position of the virtual object based on second display position information corresponding to the target key point of the hand and a preset relative position relationship between the target key point of the hand and the virtual object;
controlling the virtual object in the augmented reality image to move to the target display position based on the determined target display position of the virtual object.
15. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the control method of any of claims 1 to 7.
16. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the control method according to one of claims 1 to 7.
CN202010753268.8A 2020-07-30 2020-07-30 Control method and device of virtual object, electronic equipment and storage medium Active CN111880657B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010753268.8A CN111880657B (en) 2020-07-30 2020-07-30 Control method and device of virtual object, electronic equipment and storage medium
PCT/CN2021/091072 WO2022021980A1 (en) 2020-07-30 2021-04-29 Virtual object control method and apparatus, and electronic device and storage medium
TW110119707A TW202205059A (en) 2020-07-30 2021-05-31 Control method, electronic device and computer-readable storage medium for virtual object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010753268.8A CN111880657B (en) 2020-07-30 2020-07-30 Control method and device of virtual object, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111880657A CN111880657A (en) 2020-11-03
CN111880657B true CN111880657B (en) 2023-04-11

Family

ID=73205754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010753268.8A Active CN111880657B (en) 2020-07-30 2020-07-30 Control method and device of virtual object, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN111880657B (en)
TW (1) TW202205059A (en)
WO (1) WO2022021980A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880657B (en) * 2020-07-30 2023-04-11 北京市商汤科技开发有限公司 Control method and device of virtual object, electronic equipment and storage medium
CN114584681A (en) * 2020-11-30 2022-06-03 北京市商汤科技开发有限公司 Target object motion display method and device, electronic equipment and storage medium
CN114584679A (en) * 2020-11-30 2022-06-03 北京市商汤科技开发有限公司 Race condition data presentation method and device, computer equipment and readable storage medium
CN114584680A (en) * 2020-11-30 2022-06-03 北京市商汤科技开发有限公司 Motion data display method and device, computer equipment and storage medium
CN112714337A (en) * 2020-12-22 2021-04-27 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and storage medium
CN112799507B (en) * 2021-01-15 2022-01-04 北京航空航天大学 Human body virtual model display method and device, electronic equipment and storage medium
CN112817447B (en) * 2021-01-25 2024-05-07 暗物智能科技(广州)有限公司 AR content display method and system
CN113253838B (en) * 2021-04-01 2024-12-13 作业帮教育科技(北京)有限公司 AR-based video teaching method and electronic equipment
CN113359985A (en) * 2021-06-03 2021-09-07 北京市商汤科技开发有限公司 Data display method and device, computer equipment and storage medium
CN114115528B (en) * 2021-11-02 2024-01-19 深圳市雷鸟网络传媒有限公司 Virtual object control method, device, computer equipment and storage medium
CN114463226B (en) * 2021-12-27 2024-11-08 浙江大华技术股份有限公司 Filtering method, projection mapping method, electronic device and storage medium
CN114422644B (en) * 2022-01-25 2024-11-12 Oppo广东移动通信有限公司 Device control method, device, user device and computer readable storage medium
CN114742977A (en) * 2022-03-30 2022-07-12 青岛虚拟现实研究院有限公司 Video perspective method based on AR technology
CN114911384B (en) * 2022-05-07 2023-05-12 青岛海信智慧生活科技股份有限公司 Mirror display and remote control method thereof
CN115100257B (en) * 2022-07-21 2025-05-30 上海微创医疗机器人(集团)股份有限公司 Casing alignment method, device, computer equipment, and storage medium
CN115631321A (en) * 2022-09-09 2023-01-20 阿里巴巴(中国)有限公司 Information display method, device and electronic equipment
CN115937430B (en) * 2022-12-21 2023-10-10 北京百度网讯科技有限公司 Method, device, equipment and medium for displaying virtual object
CN115861581B (en) * 2023-02-08 2023-05-05 成都艺馨达科技有限公司 Mobile internet cloud service method and system based on mixed reality
CN116363331B (en) * 2023-04-03 2024-02-23 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium
CN116309850B (en) * 2023-05-17 2023-08-08 中数元宇数字科技(上海)有限公司 Virtual touch identification method, device and storage medium
CN117032617B (en) * 2023-10-07 2024-02-02 启迪数字科技(深圳)有限公司 Multi-screen-based grid pickup method, device, equipment and medium
CN119399283B (en) * 2025-01-03 2025-06-13 深圳市志航精密科技有限公司 Industrial vision automatic alignment method and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762495A (en) * 2018-05-18 2018-11-06 深圳大学 The virtual reality driving method and virtual reality system captured based on arm action
CN111161335A (en) * 2019-12-30 2020-05-15 深圳Tcl数字技术有限公司 Virtual image mapping method, virtual image mapping device and computer readable storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4701424B2 (en) * 2009-08-12 2011-06-15 島根県 Image recognition apparatus, operation determination method, and program
US9245388B2 (en) * 2013-05-13 2016-01-26 Microsoft Technology Licensing, Llc Interactions of virtual objects with surfaces
US10304248B2 (en) * 2014-06-26 2019-05-28 Korea Advanced Institute Of Science And Technology Apparatus and method for providing augmented reality interaction service
US9911235B2 (en) * 2014-11-14 2018-03-06 Qualcomm Incorporated Spatial interaction in augmented reality
US10290152B2 (en) * 2017-04-03 2019-05-14 Microsoft Technology Licensing, Llc Virtual object user interface display
CN111103967A (en) * 2018-10-25 2020-05-05 北京微播视界科技有限公司 Control method and device of virtual object
CN110941337A (en) * 2019-11-25 2020-03-31 深圳传音控股股份有限公司 Control method of avatar, terminal device and computer readable storage medium
CN111880657B (en) * 2020-07-30 2023-04-11 北京市商汤科技开发有限公司 Control method and device of virtual object, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762495A (en) * 2018-05-18 2018-11-06 深圳大学 The virtual reality driving method and virtual reality system captured based on arm action
CN111161335A (en) * 2019-12-30 2020-05-15 深圳Tcl数字技术有限公司 Virtual image mapping method, virtual image mapping device and computer readable storage medium

Also Published As

Publication number Publication date
TW202205059A (en) 2022-02-01
WO2022021980A1 (en) 2022-02-03
CN111880657A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111880657B (en) Control method and device of virtual object, electronic equipment and storage medium
US11398044B2 (en) Method for face modeling and related products
CN106355153B (en) A kind of virtual objects display methods, device and system based on augmented reality
CN112148197A (en) Augmented reality AR interaction method and device, electronic equipment and storage medium
JP7387202B2 (en) 3D face model generation method, apparatus, computer device and computer program
CN110827376A (en) Augmented reality multi-plane model animation interaction method, device, equipment and storage medium
CN107341827B (en) Video processing method, device and storage medium
CN112348968B (en) Display method and device in augmented reality scene, electronic equipment and storage medium
CN111882674A (en) Virtual object adjusting method and device, electronic equipment and storage medium
CN112882576B (en) AR interaction method and device, electronic equipment and storage medium
CN110072046A (en) Image composition method and device
US11138743B2 (en) Method and apparatus for a synchronous motion of a human body model
CN110389703A (en) Acquisition methods, device, terminal and the storage medium of virtual objects
CN111651057A (en) Data display method and device, electronic equipment and storage medium
CN111860252A (en) Image processing method, apparatus and storage medium
CN112905014A (en) Interaction method and device in AR scene, electronic equipment and storage medium
CN108989681A (en) Panorama image generation method and device
CN112950711A (en) Object control method and device, electronic equipment and storage medium
CN113178017A (en) AR data display method and device, electronic equipment and storage medium
CN111640167A (en) AR group photo method, AR group photo device, computer equipment and storage medium
CN111569414A (en) Flight display method and device of virtual aircraft, electronic equipment and storage medium
CN111638795B (en) Method and device for controlling virtual object display state
CN114299263A (en) Display method and device for augmented reality AR scene
HK40037924A (en) Virtual object control method, device, electronic equipment and storage medium
CN110597397A (en) Augmented reality implementation method, mobile terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40037924

Country of ref document: HK

GR01 Patent grant