CN113315914B - Panoramic video data processing method and device - Google Patents
- Publication number
- CN113315914B (application CN202110571013.4A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- target object
- polar coordinate
- image
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/698—Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/695—Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Studio Devices (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The application provides a panoramic video data processing method and device, the method comprising: when the center of a target object is detected to have entered a fixed view angle region, determining the reference polar coordinate of the target frame that frames the target object in the current video frame; generating a tracking plane image of the next video frame centered on the reference polar coordinate, according to the panoramic image of the next video frame in a spherical polar coordinate system; and repeating the step of generating the tracking plane image of the next video frame centered on the reference polar coordinate until the center of the target object leaves a fixed view angle release region. Thus, as long as the target object does not leave the fixed view angle release region, every plane image is generated around the same polar coordinate, which effectively eliminates the inaccurate tracking otherwise caused by large-angle movement of the target object.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a panoramic video data processing method. The application also relates to a panoramic video data processing device, a computing device and a computer readable storage medium.
Background
With the rapid development of computer and image processing technology, panoramic video has emerged: shooting and producing panoramic videos is increasingly popular, and many video websites offer panoramic video as a dedicated category for users to browse and watch. A panoramic video is a dynamic video shot by a panoramic camera that contains a full 360-degree field of picture content; it turns a static panorama into a moving video image, and the user can freely watch the video anywhere within the shooting angle range of the panoramic camera.
In the prior art, when a target object in a panoramic video is tracked, the panoramic video is usually decoded directly into a sequence of frame images, and the target object is then identified and tracked frame by frame on the decoded original panoramic images. However, when the target object crosses the region directly below or above the spherical polar coordinate system at a large angle, that is, when the target object moves at a large angle, the currently played picture undergoes an instantaneous near-vertical flip or large-angle deflection, so the target object can no longer be tracked and the tracking effect is poor.
Disclosure of Invention
In view of this, an embodiment of the present application provides a method for processing panoramic video data. The application also relates to a panoramic video data processing device, a computing device and a computer readable storage medium, which are used for solving the problem of poor tracking effect when a target object moves at a large angle in a panoramic video in the prior art.
According to a first aspect of embodiments of the present application, there is provided a panoramic video data processing method, including:
determining, when the center of a target object is detected to have entered a fixed view angle region, the reference polar coordinate of the target frame that frames the target object in the current video frame;
generating a tracking plane image of the next video frame centered on the reference polar coordinate, according to a panoramic image of the next video frame in the spherical polar coordinate system;
and taking the next video frame as the current video frame and returning to the step of generating the tracking plane image of the next video frame centered on the reference polar coordinate according to the panoramic image of the next video frame in the spherical polar coordinate system, until the center of the target object leaves a fixed view angle release region.
According to a second aspect of embodiments of the present application, there is provided a panoramic video data processing apparatus including:
a first determination module configured to determine, when the center of the target object is detected to have entered the fixed view angle region, the reference polar coordinate of the target frame that frames the target object in the current video frame;
a generation module configured to generate a tracking plane image of the next video frame centered on the reference polar coordinate, according to a panoramic image of the next video frame in the spherical polar coordinate system;
and an execution module configured to take the next video frame as the current video frame and return to the step of generating the tracking plane image of the next video frame centered on the reference polar coordinate according to the panoramic image of the next video frame in the spherical polar coordinate system, until the center of the target object leaves the fixed view angle release region.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method of:
determining, when the center of a target object is detected to have entered a fixed view angle region, the reference polar coordinate of the target frame that frames the target object in the current video frame;
generating a tracking plane image of the next video frame centered on the reference polar coordinate, according to a panoramic image of the next video frame in the spherical polar coordinate system;
and taking the next video frame as the current video frame and returning to the step of generating the tracking plane image of the next video frame centered on the reference polar coordinate according to the panoramic image of the next video frame in the spherical polar coordinate system, until the center of the target object leaves a fixed view angle release region.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the panoramic video data processing methods.
According to the panoramic video data processing method, when the center of a target object is detected to have entered a fixed view angle region, the reference polar coordinate of the target frame that frames the target object in the current video frame is determined; a tracking plane image of the next video frame is generated centered on the reference polar coordinate, according to the panoramic image of the next video frame in the spherical polar coordinate system; and the next video frame is taken as the current video frame and the generating step is repeated until the center of the target object leaves a fixed view angle release region. In this way, a fixed view angle region and a fixed view angle release region can be preset as buffer regions: once the center of the target object enters the fixed view angle region, a fixed reference polar coordinate is used as the center when generating the tracking plane images of all subsequent video frames. In other words, as long as the target object does not leave the fixed view angle release region, every tracking plane image is generated around the same polar coordinate, which effectively eliminates large deflections of the tracking plane image when the target object moves at a large angle and ensures the tracking effect.
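The loop described above can be sketched as follows. Here `detect_center`, `in_fixed_region`, `in_release_region`, and `render_plane` are hypothetical stand-ins for the detection and rendering steps described in this document, not functions it names; this is a minimal illustration, not the patented implementation.

```python
def process_panoramic_video(frames, detect_center, in_fixed_region,
                            in_release_region, render_plane):
    """Sketch of the fixed-view-angle tracking loop.

    The view-angle center is locked to a reference polar coordinate when
    the target's center enters the fixed view angle region, and unlocked
    only when it leaves the (larger) fixed view angle release region.
    """
    reference = None  # locked reference polar coordinate (azimuth, elevation)
    for frame in frames:
        center = detect_center(frame)  # polar coords of the target's center
        if reference is None:
            if in_fixed_region(center):
                reference = center     # lock the view-angle center
        elif not in_release_region(center):
            reference = None           # target left the release area: unlock
        view_center = reference if reference is not None else center
        yield render_plane(frame, view_center)
```

The key design point is the hysteresis between the two regions: locking happens in the smaller fixed region, unlocking only outside the larger release region, so a target hovering near the boundary does not cause repeated recomputation.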
Drawings
Fig. 1 is a flowchart of a panoramic video data processing method according to an embodiment of the present application;
fig. 2 is a top view of a sphere corresponding to a polar coordinate system of the sphere provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a spherical polar coordinate system according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a panoramic video data processing method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a panoramic video data processing apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish one item of information from another. For example, a first aspect may be termed a second aspect and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
In the present application, a panoramic video data processing method is provided, and the present application relates to a panoramic video data processing apparatus, a computing device, and a computer readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 shows a flowchart of a panoramic video data processing method according to an embodiment of the present application, which specifically includes the following steps:
step 102: and under the condition that the center of the target object is detected to enter a fixed visual angle area, determining the reference polar coordinates of a target frame for framing the target object in the current video frame.
In practical applications, when a target object in a panoramic video is tracked at present, the panoramic video is usually decoded directly into a sequence of frame images, and the target object is then identified and tracked frame by frame on the decoded original panoramic images. However, when the target object crosses directly below or above the spherical polar coordinate system at a large angle, that is, when the target object moves at a large angle (for example, from an azimuth angle of 90 and an elevation angle of -89.9 to an azimuth angle of -90 and an elevation angle of -89.9), the current plane image undergoes an instantaneous near-vertical flip or large-angle deflection, so the target object cannot be tracked and the tracking effect is poor.
That is, when the target object is located near an angular boundary in the spherical polar coordinate system, a small movement to the left or right produces a relatively large change in the angle, even though the angle through which the target object actually moves is very small. In one possible case, when the target object is near the bottom of the spherical polar coordinate system (near the south pole), a slight left or right movement changes its azimuth angle greatly. For example, if the target object is at an azimuth angle of -85 degrees and moves 10 degrees to the right, its azimuth angle becomes 85 degrees; the target object has actually moved through an angle of only 10 degrees, but its azimuth angle has changed from -85 degrees to 85 degrees.
For example, fig. 2 is a top view of the sphere corresponding to the spherical polar coordinate system provided in an embodiment of the present application. As shown in fig. 2, the target object crosses the bottom of the spherical polar coordinate system corresponding to the panoramic video at a large angle, where zone 1 is the azimuth -85 degree zone and zone 2 is the azimuth -80 degree zone. When the target object crosses the bottom at a large angle, its center moves from point a to point b.
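The mismatch between the apparent coordinate jump and the actual movement can be checked numerically. The sketch below is an illustration added here, not part of the original method: it computes the central angle between two (azimuth, elevation) points on the sphere, and for the example above shows that an azimuth change of 170 degrees at an elevation of -89.9 degrees corresponds to a physical arc of only about 0.2 degrees.

```python
import math

def great_circle_deg(lon1, lat1, lon2, lat2):
    """Central angle (degrees) between two (azimuth, elevation) points
    on a sphere, via the spherical law of cosines."""
    l1, p1, l2, p2 = map(math.radians, (lon1, lat1, lon2, lat2))
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(l2 - l1))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

# Near the south pole, an azimuth jump from -85 to 85 degrees at an
# elevation of -89.9 degrees is a physical movement of only ~0.2 degrees:
jump = great_circle_deg(-85.0, -89.9, 85.0, -89.9)
```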
Therefore, in order to improve the tracking effect of the target object, the application provides a panoramic video data processing method, which determines the reference polar coordinates of the target frame of the framed target object in the current video frame under the condition that the center of the target object is detected to enter a fixed view angle area; generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system; and taking the next video frame as the current video frame, returning to execute the operation step of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system until the center of the target object leaves a fixed visual angle release area. In this case, after the center of the target object enters the fixed view angle region, the fixed reference polar coordinate is used as the center to generate the planar image corresponding to the subsequent video frame, that is, as long as the target object does not leave the fixed view angle release region, the corresponding planar image is always generated by using the same polar coordinate as the center, so that the large-amplitude deflection of the planar image when the target object moves at a large angle is effectively eliminated, and the tracking effect of the target object is ensured.
Specifically, the target object is an object that the user wants to subsequently display in the center of the plane image, and the content framed by the framing operation is the target object; the target frame is a frame for selecting a target object; the current video frame is a panoramic video frame when the center of the target object is detected to enter a fixed view angle area, namely the current video frame is a video frame of the fixed view angle area. In addition, the fixed view angle region is a region range which is preset, and if the target object enters the fixed view angle region, the reference polar coordinate of the target frame of the framing target object in the current video frame can be fixed as the view angle center of the planar image of the subsequent video frame, namely, the reference polar coordinate is taken as the center to generate the corresponding planar image subsequently.
In addition, the spherical polar coordinate system, also called the spatial polar coordinate system, is a three-dimensional coordinate system extended from the two-dimensional polar coordinate system to determine the positions of points, lines, planes, and solids in three-dimensional space. It takes the coordinate origin as the reference point and consists of an azimuth angle, an elevation angle, and a radial distance. In this application, the radial distance of the spherical polar coordinate system is preset to a default value, usually between 100 and 300, such as 128. That is, the spherical polar coordinate system in this application has a fixed spherical radius, so the reference polar coordinate consists of an azimuth angle and an elevation angle, which together uniquely determine a point on the sphere (namely, the point on the sphere corresponding to the center position of the target object).
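With the radius fixed, an (azimuth, elevation) pair maps to a unique point on the sphere. A minimal sketch of that mapping follows, under the usual convention that elevation is measured from the equator; the axis convention is an assumption, since the document does not spell one out.

```python
import math

RADIUS = 128.0  # fixed radial distance, matching the text's example default

def polar_to_cartesian(lon_deg, lat_deg, r=RADIUS):
    """Map an (azimuth, elevation) pair in degrees to a 3D point on the
    sphere of radius r. lon is the azimuth in the x-y plane; lat is the
    elevation above the equator."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    x = r * math.cos(lat) * math.cos(lon)
    y = r * math.cos(lat) * math.sin(lon)
    z = r * math.sin(lat)
    return x, y, z
```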
In practical application, for a planar image of each video frame, the coordinate position of the center of a target object in the video frame can be determined through image recognition, then whether the coordinate position is located in the fixed view angle area or not is judged, if yes, the center of the target object is determined to enter the fixed view angle area, and the video frame is determined to be the current video frame.
For example, fig. 3 is a schematic diagram of the spherical polar coordinate system according to an embodiment of the present application. As shown in fig. 3, lat (elevation angle) and lon (azimuth angle) are the polar coordinate representation of the elevation and azimuth of point A on the sphere.
In an optional implementation manner of this embodiment, when it is detected that the center of the target object enters the fixed view angle region, before determining the reference polar coordinates of the target frame that frames the target object in the current video frame, the method further includes:
and presetting the fixed visual angle area and the fixed visual angle release area, wherein the fixed visual angle area is included in the fixed visual angle release area.
In a specific implementation, the region of the spherical polar coordinate system lying beyond a positive first threshold or below the corresponding negative first threshold may be determined as the fixed view angle region, and the region beyond a positive or negative second threshold as the fixed view angle release region, where the first threshold is greater than the second threshold. The fixed view angle region is therefore contained in the fixed view angle release region; in other words, the fixed view angle release region is larger than the fixed view angle region.
It should be noted that when the target object passes directly above or below the spherical polar coordinate system, the azimuth angle of the view angle center of the plane image can change greatly in an instant; moreover, because of algorithmic deviation, when the center of the target object fluctuates back and forth around an elevation angle of 90 degrees, the azimuth angle may also keep changing greatly, resulting in a poor tracking effect. The fixed view angle region is set so that this large-angle change of the reference polar coordinate does not have to be computed repeatedly: the reference polar coordinate of the target frame is computed only once, when the target object enters the fixed view angle region. The fixed view angle release region, in turn, prevents the reference polar coordinate from being recomputed many times when the target object moves back and forth near the boundary of the fixed view angle region; this is why the two regions are deliberately different in this application.
For example, the fixed view angle region may be the region where the elevation angle is greater than 85 degrees or less than -85 degrees, and the fixed view angle release region the region where the elevation angle is greater than 80 degrees or less than -80 degrees.
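With those example thresholds, the lock/release logic forms a simple hysteresis. The sketch below assumes the thresholds apply to the magnitude of the elevation angle, as the surrounding text suggests; the function name is illustrative.

```python
FIX_THRESHOLD = 85.0      # enter the fixed view angle region
RELEASE_THRESHOLD = 80.0  # leave the fixed view angle release region

def update_lock(locked, lat_deg):
    """Return the new lock state for a target center at elevation lat_deg.

    Lock when |elevation| exceeds 85 degrees; once locked, release only
    when |elevation| drops to 80 degrees or below."""
    if not locked:
        return abs(lat_deg) > FIX_THRESHOLD
    return abs(lat_deg) > RELEASE_THRESHOLD
```

Because the release threshold is lower than the lock threshold, a target oscillating around the 85-degree boundary does not repeatedly trigger recomputation of the reference polar coordinate.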
Step 104: and generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to the panoramic image of the next video frame in the spherical polar coordinate system.
Specifically, on the basis of determining the reference polar coordinate of the target frame that frames the target object in the current video frame when the center of the target object is detected to have entered the fixed view angle region, a tracking plane image of the next video frame is generated centered on the reference polar coordinate, according to the panoramic image of the next video frame in the spherical polar coordinate system. The tracking plane image is the plane image used for target tracking of the target object.
It should be noted that the reference polar coordinates of the target frame of the frame selection target object in the current video frame may be used as the polar coordinates of the search target object in the next video frame. That is, the tracking plane image of the next video frame is generated according to the reference polar coordinates of the target frame of the frame-selected target object in the current video frame, and after the tracking plane image of the next video frame is generated, the target object may be searched in the tracking plane image, so as to determine whether the center of the target object is away from the fixed view release area.
In an optional implementation manner of this embodiment, the generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system may specifically be implemented as follows:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
cropping the panoramic image centered on the reference polar coordinate over a preset angle range;
and converting the cropped panoramic image into the tracking plane image of the next video frame.
Specifically, the predetermined angle range refers to a predetermined range of elevation and azimuth angles, such as an elevation angle of 0-30 degrees and an azimuth angle of 10-45 degrees. It should be noted that, when generating the tracking plane image corresponding to each video frame of the panoramic video, the preset angle ranges are the same, that is, a range of an elevation angle and an azimuth angle is preset, the tracking plane image of the first frame of the panoramic video frame is generated according to the range of the elevation angle and the azimuth angle, and the tracking plane image of each subsequent frame of the panoramic video frame is generated according to the range of the elevation angle and the azimuth angle.
In the present application, the panoramic image of the next video frame can be projected into the spherical polar coordinate system; a portion of the panorama is then cropped centered on the determined reference polar coordinate and mapped to a two-dimensional plane image, yielding the tracking plane image of the next video frame.
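Assuming the panorama is stored as an equirectangular image (an assumption; the document does not fix a storage format), the crop-and-map step can be sketched as a perspective sampling of the panorama around the reference polar coordinate. The function name, rotation conventions, and nearest-neighbour sampling are illustrative choices, not the patented implementation.

```python
import numpy as np

def crop_plane_from_equirect(pano, center_lon, center_lat,
                             fov_deg=90.0, size=256):
    """Sample a perspective (tracking-plane) view, centered on the given
    (azimuth, elevation) in degrees, from an equirectangular panorama
    stored as an H x W x C array. Nearest-neighbour sampling only."""
    h, w = pano.shape[:2]
    f = (size / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length, pixels
    u, v = np.meshgrid(np.arange(size) - size / 2,
                       np.arange(size) - size / 2)
    # Ray directions in camera space (z points forward), normalised.
    x, y, z = u, v, np.full_like(u, f, dtype=float)
    n = np.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    # Rotate the camera to look at (center_lon, center_lat):
    # tilt about the x-axis by the elevation, then pan about the y-axis.
    lat, lon = np.radians(center_lat), np.radians(center_lon)
    y1 = y * np.cos(lat) - z * np.sin(lat)
    z1 = y * np.sin(lat) + z * np.cos(lat)
    x2 = x * np.cos(lon) + z1 * np.sin(lon)
    z2 = -x * np.sin(lon) + z1 * np.cos(lon)
    # Back to spherical angles, then to pixel coordinates in the panorama.
    out_lon = np.arctan2(x2, z2)
    out_lat = np.arcsin(np.clip(y1, -1, 1))
    px = ((out_lon / np.pi + 1) / 2 * (w - 1)).astype(int)
    py = ((out_lat / (np.pi / 2) + 1) / 2 * (h - 1)).astype(int)
    return pano[py % h, px % w]
```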
In an optional implementation manner of this embodiment, after generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system, the method further includes:
according to the object characteristics in the target frame, determining the updated polar coordinates of the central position of the target frame in the tracking plane image of the next video frame in the spherical polar coordinate system;
and playing the plane image of the next video frame by using the updated polar coordinates.
It should be noted that, image recognition is performed on the playing plane image of the current video frame, so as to obtain an object feature (i.e., an object feature in a target frame) of a target object located in the center of the current playing plane image, based on the object feature, target tracking can be performed in the generated tracking plane image of the next video frame to find a corresponding target object, then based on the newly determined target object, the reference polar coordinates are updated, and subsequently, the plane image of the next video frame is played with the updated polar coordinates. The playing plane image refers to a plane image actually played by the client, that is, a plane image that can be seen by the user.
That is, the fixed view angle region only fixes the polar coordinate at which the target object is searched for; this search polar coordinate (the reference polar coordinate) merely has to keep the target object inside the tracking plane image, and the target object need not lie at the center of the playing plane image. When the panoramic video is actually played, the target object still has to be fixed at the center of the playing plane image; that is, the updated polar coordinate of the center position of the target frame in the tracking plane image of the next video frame has to be determined from the object features in the target frame, and the panoramic video is played according to the updated polar coordinate.
In an optional implementation manner of this embodiment, according to the object feature of the target object, an updated polar coordinate of the center position of the target object in the tracking plane image of the next video frame in the spherical polar coordinate system is determined, and a specific implementation process may be as follows:
carrying out image recognition on a recognition area in a tracking plane image of the next video frame, and determining the central position of a target object in the next video frame;
and determining the updated polar coordinates of the central position of the target object in the next video frame in the spherical polar coordinate system.
It should be noted that, image recognition is performed on the tracking plane image of the next video frame to obtain corresponding image features, then the image features having the object features of the target object are determined as the target object, and then the updated polar coordinates of the center position of the updated target object can be determined.
In addition, when the tracking plane image of the next video frame is subjected to image recognition and the corresponding image feature is recognized, the whole tracking plane image can be subjected to image recognition to recognize the corresponding image feature. In addition, only the area near the target frame of the frame selection target object can be subjected to image recognition, namely, the recognition area in the tracking plane image of the next video frame can be determined firstly, and then the image recognition is carried out only in the recognition area.
For example, the area framed by the target frame corresponding to the framing operation may be determined first, then the area of the preset multiple is determined as the recognition area, and image recognition is performed only in the recognition area.
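The "preset multiple" step above can be sketched as expanding the target frame around its own center and clipping the result to the image bounds. The function name and the box format (top-left x, y, width, height) are assumptions for illustration.

```python
def expand_box(box, scale, img_w, img_h):
    """Grow the target frame by `scale` (the preset multiple) around its
    center and clip to the image, giving the recognition area."""
    x, y, w, h = box  # top-left corner plus width/height, in pixels
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * scale, h * scale
    nx = max(0, int(cx - nw / 2))
    ny = max(0, int(cy - nh / 2))
    return nx, ny, min(int(nw), img_w - nx), min(int(nh), img_h - ny)
```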
In an optional implementation manner of this embodiment, the image recognition is performed on the recognition area in the tracking plane image of the next video frame, and the center position of the target object in the next video frame is determined, which may be specifically implemented as follows:
carrying out image recognition on the recognition area in the tracking plane image of the next video frame to obtain image characteristics;
analyzing and processing the image features and the object features to obtain a confidence that the target object exists in the recognition area and a position offset of the image features relative to the center position of the recognition area;
And under the condition that the confidence degree is greater than a confidence degree threshold value, determining the central position of the target object in the next video frame according to the central position of the target object in the playing plane image and the position offset.
Specifically, the confidence, also called reliability or confidence coefficient, indicates how certain it is that the target object exists in the recognition area. It should be noted that, after image recognition is performed on the recognition area in the tracking plane image of the next video frame to obtain the image features, it is necessary to determine whether the recognized image features correspond to the initially framed target object. The image features and the object features can therefore be analyzed to determine the confidence that the target object exists in the recognition area, that is, the reliability of its presence; in specific implementations, different algorithms may be used to analyze the image features and the object features to obtain this confidence.
In one possible implementation, the similarity between the image feature and the object feature may be determined through feature comparison, so as to obtain a confidence that the target object exists in the recognition region. In a specific implementation, the image features and the object features may be compared, a similarity between the image features and the object features may be determined, and the similarity may be determined as a confidence level that a target object exists in the recognition region.
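As one concrete reading of the feature-comparison option, cosine similarity between the two feature vectors can serve as the confidence. This is a common choice, sketched here as an illustration rather than the method the document mandates.

```python
import math

def cosine_confidence(image_feat, object_feat):
    """Cosine similarity between two feature vectors, used here as the
    confidence that the target object is present in the recognition area.
    Returns a value in [-1, 1]; higher means more similar."""
    dot = sum(a * b for a, b in zip(image_feat, object_feat))
    na = math.sqrt(sum(a * a for a in image_feat))
    nb = math.sqrt(sum(b * b for b in object_feat))
    return dot / (na * nb) if na and nb else 0.0
```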
In addition, the confidence that the target object exists in the recognition area can also be obtained by convolving the image features with the object features. Of course, in practical applications, other tracking algorithms may also be adopted: the image features and the object features are input into the tracking algorithm, which outputs the confidence that the target object exists in the recognition area; this is not limited in the present application.
In addition to the confidence that the target object exists in the recognition area, the analysis processing of the image feature and the object feature may obtain a position offset amount of the image feature with respect to the center position of the recognition area. Since the identification area is determined according to the target object in the playing plane image of the current video frame, the central position of the identification area may actually represent the central position of the target object in the current video frame. In addition, the image feature is a feature obtained by identifying the identification region in the next video frame, and the object feature is an object feature of the target object in the current video frame (i.e. a feature when the target object is located at the image center position), and by analyzing and comparing the image feature and the object feature, a change of the image feature in the next video frame relative to the feature when the target object is located at the image center position can be obtained, and the change can represent a position offset of the image feature relative to the center position of the identification region.
In addition, since the image feature is obtained by performing image recognition on the recognition area in the tracking plane image of the next video frame, it corresponds to a candidate target object, and the position offset of the image feature with respect to the center position of the recognition area is therefore the position offset of the candidate target object with respect to that center position. When the candidate target object is subsequently determined to be the target object of the current video frame, this position offset indicates how far the target object has moved in the next video frame relative to the current video frame.
When the confidence is greater than the confidence threshold, it is indicated that the identified image feature has a high probability of being the target object initially framed, so that the updated center position of the target object (i.e., the center position of the target object in the next video frame) can be obtained according to the initial center position (the center position of the target object in the playing plane image) and the position offset (i.e., the moving distance of the target object).
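The update step just described can be sketched as follows. The tuple layout and the threshold value are illustrative assumptions, not details fixed by the application.

```python
def update_center(prev_center, offset, confidence, threshold=0.5):
    """If the confidence exceeds the threshold, the new center of the target
    object is the previous center shifted by the measured position offset;
    otherwise the recognized feature is not trusted to be the framed object."""
    if confidence <= threshold:
        return None  # feature did not match the initially framed target object
    x, y = prev_center
    dx, dy = offset
    return (x + dx, y + dy)
```

The returned coordinates are the center position of the target object in the next video frame, from which the updated polar coordinates can then be derived.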
It should be noted that, target tracking is performed in a new video frame (i.e., a next video frame), a position of the target object in the next video frame is determined, and then updated polar coordinates of a center position of the target object are further determined, so that it is convenient to continue to generate a tracking plane image of a subsequent video frame and continue to track the target object.
Step 106: and taking the next video frame as the current video frame, and returning to the operation step of the step 104 until the center of the target object leaves the fixed view release area.
Specifically, on the basis of generating a tracking plane image of a next video frame according to a panoramic image of the next video frame in a spherical polar coordinate system with the reference polar coordinate as a center, the next video frame is further taken as the current video frame, and the operation step of generating the tracking plane image of the next video frame with the reference polar coordinate as the center is executed again, until the center of the target object leaves the fixed view angle release area.
In practical applications, if the target object leaves the fixed view angle release area, it indicates that the target may have left the center position of the tracking plane image obtained by the fixed polar coordinates, and at this time, the fixed reference polar coordinates cannot be adopted as the center position of the tracking plane image, and other tracking methods need to be adopted. In specific implementation, image recognition can be performed on a tracking plane image of a next video frame, object features of a target object are obtained, and whether the target object leaves a fixed view release area or not is determined through the object features.
It should be noted that, after the target object enters the fixed view angle region, the view angle may be fixed by using the reference polar coordinate of the target frame of the framed target object in the current video, and then the corresponding tracking plane image is generated with the reference polar coordinate as the center until the target object leaves the fixed view angle release region.
For example, as shown in fig. 2, the center of the target object moves from point a to point b, the viewing angle is fixed at point a, and the fixation of the viewing angle is not released until point b is moved, and a viewing angle range corresponding to the polar coordinate of point a as the center and a viewing angle range corresponding to the polar coordinate of point b as the center are shown in fig. 2.
In addition, after the target object leaves the fixed view release area, target tracking in subsequent video frames can be performed either through the vertex positions of the target frame or by interpolation. With the vertex-position method, a rectangle selected in one plane image is not necessarily a rectangle in another plane image, so no matter how the coordinate mapping is performed, the content of the new rectangular frame will deviate somewhat from the content of the frame in the previous plane image; the rectangular frame used for feature extraction therefore may not fit the target object exactly, and the tracking effect may be reduced. With the interpolation method, no frame processing is performed by coordinate conversion or mapping; instead, while the view angle changes, a tracking algorithm fits a frame to the target in the new plane image according to the semantic information of the image, so the final rectangular frame fits the target better, but the interpolation consumes a relatively large amount of computation. Therefore, when reducing the amount of computation is the main concern, the target can be tracked in subsequent video frames through the vertex positions of the target frame; when the tracking effect of the target object is the main concern, interpolation can be used.
In an optional implementation manner of this embodiment, the method further includes:
under the condition that the center of the target object is detected to leave the fixed visual angle release area, determining an updated target frame in the planar image corresponding to the fixed visual angle release area, from which the target object in the current video frame leaves, according to the vertex position of the target frame in the planar image corresponding to the fixed visual angle of the current video frame;
performing image recognition on a planar image corresponding to the target object in the current video frame leaving the fixed view relief area, and determining the object characteristics in the updated target frame;
and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics in the updated target frame.
It should be noted that, when entering the fixed view angle region, the vertex position of the target frame is fixed, and when it is detected that the center of the target object leaves the fixed view angle release region, the current video frame may first generate a planar image corresponding to the fixed view angle, where a target frame in the planar image is the same as the target frame in the fixed view angle region, and may also generate a planar image corresponding to the fixed view angle release region, and then determine an updated target frame based on the vertex position of the target frame in the planar image corresponding to the fixed view angle, where the vertex position includes the polar coordinates of 4 vertices. The polar coordinates of the 4 vertices may then be mapped to the planar image corresponding to the dismissed area away from the fixed perspective, thereby continuing subsequent target tracking.
In an optional implementation manner of this embodiment, an updated target frame in the planar image corresponding to the fixed view release area from which the target object in the current video frame leaves is determined according to a vertex position of the target frame in the planar image corresponding to the fixed view of the current video frame, and a specific implementation process may be as follows:
determining the vertex coordinates of the target frame in the plane image corresponding to the fixed visual angle of the current video frame;
according to the vertex coordinates of the target frame, determining updated vertex coordinates in a plane image corresponding to the fixed visual angle release area away from the target object in the current video frame;
determining the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate in the updated vertex coordinates;
and determining an updated target frame in the planar image corresponding to the target object in the current video frame leaving the fixed visual angle release area according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate.
In practical application, after determining the vertex coordinates of the target frame in the planar image corresponding to the fixed view angle of the current video frame, the vertex coordinates of the target frame may be mapped into a spherical polar coordinate system, and the updated vertex coordinates in the planar image corresponding to the fixed view angle release region from which the target object in the current video frame leaves are determined according to the vertex mapped into the spherical polar coordinate system and the picture when the target object in the current video frame leaves the fixed view angle release region in the spherical polar coordinate system.
It should be noted that, after the vertex coordinates of the target frame in the planar image corresponding to the fixed view angle of the current video frame are mapped into the planar image corresponding to the fixed view angle release area from which the target object leaves, updated vertex coordinates are obtained. At this point the four mapped vertices no longer form a rectangle, so the maximum and minimum values of the horizontal and vertical coordinates among the four updated vertex coordinates may be calculated and connected by straight lines to form a new, updated target frame. Subsequently, the object features in the updated target frame may be re-extracted and the tracking calculation continued.
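The min/max construction of the updated target frame can be sketched as below, assuming the four mapped vertices are given as (x, y) pairs in the new planar image; the coordinate layout is an illustrative assumption.

```python
def updated_target_frame(mapped_vertices):
    """After the four vertices of the original target frame are mapped into
    the new planar image they no longer form a rectangle; take the minimum
    and maximum of the horizontal and vertical coordinates to rebuild an
    axis-aligned target frame enclosing all four mapped vertices."""
    xs = [p[0] for p in mapped_vertices]
    ys = [p[1] for p in mapped_vertices]
    # (x_min, y_min, x_max, y_max) describes the new rectangular target frame.
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, four skewed vertices such as (10, 5), (40, 8), (12, 30), (38, 28) produce the enclosing frame (10, 5, 40, 30), from which object features can be re-extracted.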
In an optional implementation manner of this embodiment, the method further includes:
under the condition that the center of the target object is detected to leave the fixed view angle releasing area, determining the initial polar coordinate of the center of the target frame when the target object enters the fixed view angle area, and determining the central polar coordinate of the central position of the playing plane image of the current video frame;
interpolating according to the initial polar coordinates and the central polar coordinates to obtain intermediate polar coordinates with preset values;
determining the object characteristics corresponding to the central position of the target frame according to the intermediate polar coordinates and the central polar coordinates;
and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics.
When the target object leaves the fixed viewing angle release area, it is far from the center position of the playing plane image; moving it directly to the position corresponding to where it left the fixed viewing angle release area may cause distortion of the target object, so the view angle is instead moved back toward the center gradually by interpolation.
In an optional implementation manner of this embodiment, the object feature corresponding to the target object located at the central position is determined according to the intermediate polar coordinate and the central polar coordinate, and a specific implementation process may be as follows:
sequentially arranging the intermediate polar coordinates and the central polar coordinates to obtain a polar coordinate set;
carrying out image recognition on the playing plane image of the current video frame, and determining the object characteristics of the target object;
generating a tracking plane image of the current video frame by taking the ith polar coordinate in the polar coordinate set as a center according to the panoramic image of the current video frame in the spherical polar coordinate system, wherein i is equal to 1;
and enabling the i to be increased by 1, taking the tracking plane image as the playing plane image, and returning to the operation step of executing the image recognition on the playing plane image of the current video frame to obtain the object characteristics of the target object until the i is equal to the number of polar coordinates in the polar coordinate set, so as to obtain the object characteristics corresponding to the central position of the target object.
The intermediate polar coordinates and the central polar coordinates are sequentially arranged to obtain a polar coordinate set, that is, the intermediate polar coordinates obtained by interpolation are arranged from small to large according to the distance from the initial polar coordinates, and after the intermediate polar coordinates are arranged, the central polar coordinates are arranged to obtain the polar coordinate set; that is, the first polar coordinate in the polar coordinate set is the polar coordinate having the smallest distance from the initial polar coordinate, and the last polar coordinate in the polar coordinate set is the polar coordinate having the largest distance from the initial polar coordinate (i.e., the center polar coordinate).
In practical implementation, the center of the target frame needs to be moved from the current initial polar coordinate to the central polar coordinate of the center position of the playing plane image; that is, the initial polar coordinate is the starting point of the interpolation and the central polar coordinate is its end point. When interpolating, the value range between the initial polar coordinate and the central polar coordinate is divided evenly into a preset number of parts, yielding several interpolated intermediate polar coordinates. Each intermediate polar coordinate obtained by interpolation is then taken in turn as the center to generate a corresponding plane image, until the last one generates the tracking plane image in which the target object lies at the center of the playing plane image of the current video frame; image recognition is performed on this tracking plane image centered on the central polar coordinate to obtain the object features of the target object located at that center. In addition, the interpolation parameter, that is, the number of parts into which the value range between the initial and central polar coordinates is divided, may be determined based on the larger coordinate of the initial polar coordinate.
For example, assuming that the central polar coordinates are (0, 0), the initial polar coordinates are (50, 80), that is, the initial value of the interpolation is (50, 80), the end value is (0, 0), and the preset value is 20, it is necessary to divide 80-0 into 20 on average and 50-0 into 20 on average, thereby obtaining the intermediate polar coordinates ((47.5, 76), (45, 72), … …, (5, 8), (2.5, 4)). Thus, the polar coordinate set obtained at this time is ((47.5, 76), (45, 72), … …, (5, 8), (2.5, 4), (0, 0)).
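The interpolation in the example above can be sketched as follows; the function evenly divides the span between the initial and central polar coordinates and returns the intermediate coordinates followed by the center itself (the polar coordinate set). The tuple representation is an illustrative assumption.

```python
def interpolate_polar(initial, center, steps):
    """Evenly divide the span between the initial polar coordinate and the
    central polar coordinate into `steps` parts, returning the intermediate
    polar coordinates in order of increasing distance from the initial
    coordinate, with the central polar coordinate appended last."""
    (t0, p0), (t1, p1) = initial, center
    dt, dp = (t1 - t0) / steps, (p1 - p0) / steps
    coords = [(t0 + dt * i, p0 + dp * i) for i in range(1, steps)]
    coords.append(center)  # the end point of the interpolation
    return coords
```

With initial (50, 80), center (0, 0), and 20 steps, this reproduces the set in the example: (47.5, 76), (45, 72), ..., (5, 8), (2.5, 4), (0, 0).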
Performing image recognition on a playing plane image of the current video frame (namely a plane image with the polar coordinates of (50, 80)) to obtain the object characteristics of the target object; then, generating a tracking plane image of the current video frame by taking (47.5, 76) as a center according to a panoramic image of the current video frame in the spherical polar coordinate system, and performing image recognition by taking the tracking plane image as a playing plane image to obtain object characteristics of a target object; and then continuously taking (45, 72) as a center, generating a tracking plane image of the current video frame, and performing image recognition by taking the tracking plane image as a playing plane image to obtain the object characteristics of the target object. And repeating the steps until the tracking plane image of the current video frame is generated by taking (0, 0) as the center, and carrying out image recognition on the tracking plane image to obtain the object characteristics of the target object positioned at the center position of the playing plane image of the current video frame.
In an optional implementation manner of this embodiment, the image recognition is performed on the planar image corresponding to the fixed view dismissal area of the current video frame, and the object feature in the updated target frame is determined, which may be implemented as follows:
determining a target frame corresponding to the frame selection operation;
determining a corresponding recognition area according to the target frame;
and performing image recognition in the recognition area in the plane image corresponding to the fixed visual angle release area where the current video frame leaves, and determining the object characteristics of the target object.
When the target object is selected, a target frame may be used for frame selection, and then a partial image content greater than or equal to the target frame may be selected as an identification area according to the target frame, and then image identification may be performed only in the identification area.
In actual application, the area framed by the target frame may be determined, and a preset multiple of the area may be determined as the identification area. Of course, the length and width of the target frame may also be determined, and a region formed by preset multiples of the length and width may be determined as the recognition region. Specifically, the preset multiple may be preset, and the preset multiple is used to determine the area where the image recognition is finally performed, for example, the preset multiple may be 1.5 times, 2 times, and the like.
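The preset-multiple expansion of the target frame into a recognition region can be sketched as below, assuming the frame is given as (x, y, width, height) and is scaled about its own center; that layout and the centering choice are illustrative assumptions.

```python
def recognition_region(frame, multiple=1.5):
    """Expand the framed target (x, y, w, h) around its center by a preset
    multiple; image recognition is then restricted to this larger region
    instead of the full planar image, speeding up recognition."""
    x, y, w, h = frame
    cx, cy = x + w / 2, y + h / 2          # center of the target frame
    nw, nh = w * multiple, h * multiple    # scaled width and height
    return (cx - nw / 2, cy - nh / 2, nw, nh)
```

For instance, a 40x20 frame expanded by a multiple of 2 yields an 80x40 recognition region sharing the same center; in practice the result would also be clipped to the image bounds.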
It should be noted that, in order to track a target object in a subsequent video frame, image recognition needs to be performed on the target object framed and selected in the planar image corresponding to the position where the current video frame leaves the fixed view angle release area, so as to obtain the object features of the target object located in the center of the planar image (i.e., the object features in the updated target frame). In specific implementation, the tracking algorithm may be a correlation-filtering-based tracker, such as KCF (Kernelized Correlation Filter) or DSST (Discriminative Scale Space Tracker, a filtering algorithm combining position and scale estimation), or a deep-learning-based tracker, such as SiamRPN or SiamFC; the specific tracking algorithm is not limited in this application.
In addition, when the planar image corresponding to the fixed view angle release area of the current video frame is subjected to image recognition to extract the object features of the target object in the updated target frame, the image recognition can be performed on the whole planar image to extract the features. In addition, only the object characteristics of the target object need to be acquired finally, so that only the area near the target frame can be subjected to image recognition, namely, the area framed by the framing operation can be determined first, then the area of the preset multiple is determined as the recognition area, and the image recognition is only carried out in the recognition area, so that the image recognition of the whole plane image is not needed, the image recognition speed is increased, and the processing efficiency of the whole panoramic video is improved.
According to the panoramic video data processing method, under the condition that the center of a target object is detected to enter a fixed visual angle area, the reference polar coordinates of a target frame of the framing target object in a current video frame are determined; generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system; and taking the next video frame as the current video frame, returning to execute the operation step of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system until the center of the target object leaves a fixed visual angle release area. When the center of the target object leaves the fixed view angle release area, the target can be tracked in the subsequent video frame through the vertex position of the target frame, and the target can also be tracked in the subsequent video frame by utilizing an interpolation mode.
In this case, a fixed view angle area and a fixed view angle release area may be preset as buffer areas, and after the center of the target object enters the fixed view angle area, a fixed reference polar coordinate is used as the center to generate a tracking plane image corresponding to a subsequent video frame, that is, as long as the target object does not leave the fixed view angle release area, a corresponding tracking plane image is always generated with the same polar coordinate as the center, thereby effectively eliminating large-amplitude deflection of the tracking plane image when the target object moves at a large angle, and ensuring the tracking effect of the target object. In addition, when the center of the target object leaves the fixed view angle release area, the target can be tracked in the subsequent video frame through the vertex position of the target frame, and the target can also be tracked in the subsequent video frame by utilizing an interpolation mode, so that the tracking effect of the target object in the subsequent panoramic video is ensured.
Fig. 4 is a flowchart illustrating another panoramic video data processing method according to an embodiment of the present application, which specifically includes the following steps:
step 402: and under the condition that the center of the target object is detected to enter a fixed visual angle area, determining the reference polar coordinates of a target frame for framing the target object in the current video frame.
Step 404: and generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to the panoramic image of the next video frame in the spherical polar coordinate system.
Step 406: and carrying out image recognition on the tracking plane image of the next video frame, and determining the object characteristics of the target object.
Step 408: and determining whether the center of the target object has left the fixed view angle release area according to the object characteristics of the target object; if so, executing steps 410 to 414 or steps 416 to 422, and if not, returning to execute step 404.
Step 410: and determining an updated target frame in the planar image corresponding to the fixed view angle release area that the target object in the current video frame leaves, according to the vertex position of the target frame in the planar image corresponding to the fixed view angle of the current video frame.
Step 412: and performing image recognition on the plane image corresponding to the fixed visual angle release area of the current video frame, and determining the object characteristics in the updated target frame.
Step 414: and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics in the updated target frame.
Step 416: and determining the initial polar coordinates of the center of the target frame when the target frame enters the fixed visual angle area, and determining the central polar coordinates of the central position of the playing plane image of the current video frame.
Step 418: and carrying out interpolation according to the initial polar coordinates and the central polar coordinates to obtain intermediate polar coordinates with preset values.
Step 420: and determining the object characteristics corresponding to the central position of the target frame according to the intermediate polar coordinates and the central polar coordinates.
Step 422: and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics.
The panoramic video data processing method can preset a fixed visual angle area and a fixed visual angle relief area as buffer areas, and after the center of a target object enters the fixed visual angle area, a fixed reference polar coordinate is adopted as the center to generate a planar image corresponding to a subsequent video frame. In addition, when the center of the target object leaves the fixed view angle release area, the target can be tracked in the subsequent video frame through the vertex position of the target frame, and the target can also be tracked in the subsequent video frame by utilizing an interpolation mode, so that the tracking effect of the target object in the subsequent panoramic video is ensured.
The above is a schematic scheme of a panoramic video data processing method according to this embodiment. It should be noted that the technical solution of the panoramic video data processing method shown in fig. 4 is the same as the technical solution of the panoramic video data processing method shown in fig. 1, and details of the technical solution of the panoramic video data processing method shown in fig. 4, which are not described in detail, can be referred to the description of the technical solution of the panoramic video data processing method shown in fig. 1.
Corresponding to the above method embodiment, the present application further provides an embodiment of a panoramic video data processing apparatus, and fig. 5 shows a schematic structural diagram of a panoramic video data processing apparatus provided in an embodiment of the present application.
As shown in fig. 5, the apparatus includes:
a first determining module 502 configured to determine a reference polar coordinate of a target frame for framing a target object in a current video frame in case that it is detected that a center of the target object enters a fixed viewing angle region;
a generating module 504 configured to generate a tracking plane image of a next video frame with the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
an executing module 506, configured to take the next video frame as the current video frame, and return to the operation step of generating a tracking plane image of the next video frame with the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system, until the center of the target object leaves the fixed view release area.
Optionally, the apparatus further comprises a setting module configured to:
and presetting the fixed visual angle area and the fixed visual angle release area, wherein the fixed visual angle area is included in the fixed visual angle release area.
Optionally, the apparatus further comprises a second determining module configured to:
under the condition that the center of the target object is detected to leave the fixed visual angle release area, determining an updated target frame in the planar image corresponding to the fixed visual angle release area, from which the target object in the current video frame leaves, according to the vertex position of the target frame in the planar image corresponding to the fixed visual angle of the current video frame;
performing image recognition on a planar image corresponding to the target object in the current video frame leaving the fixed view relief area, and determining the object characteristics in the updated target frame;
and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics in the updated target frame.
Optionally, the second determination module is further configured to:
determining the vertex coordinates of the target frame in a plane image corresponding to the fixed visual angle of the current video frame;
according to the vertex coordinates of the target frame, determining updated vertex coordinates in a plane image corresponding to the fixed visual angle release area away from the target object in the current video frame;
determining the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate in the updated vertex coordinates;
and determining an updated target frame in the planar image corresponding to the target object in the current video frame leaving the fixed visual angle release area according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate.
Optionally, the apparatus further comprises a third determining module configured to:
under the condition that the center of the target object is detected to leave the fixed view angle releasing area, determining the initial polar coordinate of the center of the target frame when the target object enters the fixed view angle area, and determining the central polar coordinate of the central position of the playing plane image of the current video frame;
interpolating according to the initial polar coordinates and the central polar coordinates to obtain intermediate polar coordinates with preset values;
determining the object characteristics corresponding to the central position of the target frame according to the intermediate polar coordinates and the central polar coordinates;
and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics.
Optionally, the third determining module is further configured to:
sequentially arranging the intermediate polar coordinates and the central polar coordinates to obtain a polar coordinate set;
carrying out image recognition on the playing plane image of the current video frame, and determining the object characteristics of the target object;
generating a tracking plane image of the current video frame by taking the ith polar coordinate in the polar coordinate set as a center according to the panoramic image of the current video frame in the spherical polar coordinate system, wherein i is equal to 1;
and enabling the i to be increased by 1, taking the tracking plane image as the playing plane image, and returning to the operation step of executing the image recognition on the playing plane image of the current video frame to obtain the object characteristics of the target object until the i is equal to the number of polar coordinates in the polar coordinate set, so as to obtain the object characteristics corresponding to the central position of the target object.
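The interpolation procedure above can be sketched as component-wise linear interpolation between the initial polar coordinate and the central polar coordinate, after which the resulting polar coordinate set is traversed in order (an illustrative two-component sketch; the per-step feature update is omitted and the angle values are hypothetical):

```python
def interpolate_polar(start, end, n):
    """Return n intermediate polar coordinates strictly between start and end,
    linearly interpolated component-wise (longitude, latitude)."""
    lon0, lat0 = start
    lon1, lat1 = end
    step = 1.0 / (n + 1)
    return [(lon0 + (lon1 - lon0) * step * i, lat0 + (lat1 - lat0) * step * i)
            for i in range(1, n + 1)]

# Polar coordinate set: the intermediates followed by the central polar coordinate,
# visited in sequence so the view center slides gradually back toward the target
start, end = (0.0, 10.0), (90.0, 40.0)
polar_set = interpolate_polar(start, end, 2) + [end]
```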
Optionally, the apparatus further comprises a playing module configured to:
determining an updated polar coordinate of the central position of the target frame in the tracking plane image of the next video frame in the spherical polar coordinate system according to the object characteristics in the target frame;
and playing the plane image of the next video frame by using the updated polar coordinates.
Optionally, the second determination module is further configured to:
determining a target frame corresponding to the frame selection operation;
determining a corresponding recognition area according to the target frame;
and performing image recognition in the recognition area in the plane image corresponding to the target object in the current video frame leaving the fixed visual angle release area, and determining the object characteristics of the target object.
Optionally, the generating module 504 is further configured to:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
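The three mapping steps are commonly realized in a single inverse pass: for each pixel of the output tracking plane image, compute the longitude/latitude it represents around the reference polar coordinate via an inverse gnomonic projection, then sample the panorama there. A sketch under the assumption of an equirectangular panorama (illustrative only; the patent does not fix a particular projection):

```python
import math

def plane_pixel_to_pano(px, py, size, fov_deg, ref_lon, ref_lat, pano_w, pano_h):
    """Map pixel (px, py) of a size x size tracking plane image, centered on the
    reference polar coordinate (ref_lon, ref_lat) with field of view fov_deg,
    to fractional pixel coordinates in an equirectangular panorama."""
    # Normalized plane coordinates spanning the preset angle (field of view)
    half = math.tan(math.radians(fov_deg) / 2.0)
    x = (2.0 * px / (size - 1) - 1.0) * half
    y = (2.0 * py / (size - 1) - 1.0) * half
    # Inverse gnomonic projection about (ref_lon, ref_lat)
    rho = math.hypot(x, y)
    c = math.atan(rho)
    lat0, lon0 = math.radians(ref_lat), math.radians(ref_lon)
    if rho == 0.0:
        lat, lon = lat0, lon0
    else:
        lat = math.asin(math.cos(c) * math.sin(lat0) + y * math.sin(c) * math.cos(lat0) / rho)
        lon = lon0 + math.atan2(x * math.sin(c),
                                rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c))
    # Longitude/latitude back to equirectangular pixel coordinates
    u = (math.degrees(lon) % 360.0) / 360.0 * (pano_w - 1)
    v = (90.0 - math.degrees(lat)) / 180.0 * (pano_h - 1)
    return u, v

# The center pixel of the plane image lands exactly on the reference polar coordinate
u, v = plane_pixel_to_pano(128, 128, 257, 90.0, 180.0, 0.0, 3840, 1920)
```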
Optionally, the playing module is further configured to:
carrying out image recognition on a recognition area in a tracking plane image of the next video frame, and determining the central position of a target object in the next video frame;
and determining the updated polar coordinates of the central position of the target object in the next video frame in the spherical polar coordinate system.
Optionally, the playing module is further configured to:
carrying out image recognition on a recognition area in a tracking plane image of the next video frame to obtain image characteristics;
analyzing and processing the image features and the object features to obtain a confidence coefficient that the target object exists in the identification region and a position offset of the image features relative to the center position of the identification region;
and under the condition that the confidence degree is greater than a confidence degree threshold value, determining the central position of the target object in the next video frame according to the central position of the target object in the playing plane image and the position offset.
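The confidence-gated update can be sketched as follows (illustrative only; in practice the confidence and position offset typically come from a learned tracker head, and the threshold value here is an assumed placeholder):

```python
def update_center(prev_center, confidence, offset, threshold=0.5):
    """Return the new target center when the tracker is confident enough,
    otherwise keep the previous center (target treated as lost or occluded)."""
    if confidence > threshold:
        return (prev_center[0] + offset[0], prev_center[1] + offset[1])
    return prev_center

# Confident detection: the center moves by the predicted offset
new_center = update_center((100.0, 80.0), confidence=0.9, offset=(4.0, -2.0))
# Low confidence: the center stays put
kept_center = update_center((100.0, 80.0), confidence=0.2, offset=(4.0, -2.0))
```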
The panoramic video data processing apparatus provided by the present application can preset a fixed view angle area and a fixed view angle release area as a buffer. After the center of the target object enters the fixed view angle area, the tracking plane images of subsequent video frames are generated with the fixed reference polar coordinate as the center. That is, as long as the target object does not leave the fixed view angle release area, the corresponding tracking plane images are always generated with the same polar coordinate as the center, thereby effectively eliminating the large deflection of the tracking plane image when the target object moves at a large angle, and ensuring the tracking effect for the target object. In addition, when the center of the target object leaves the fixed view angle release area, the target can be tracked in subsequent video frames through the vertex positions of the target frame, or tracked in subsequent video frames by means of interpolation, thereby ensuring the tracking effect for the target object in the subsequent panoramic video.
The above is a schematic scheme of a panoramic video data processing apparatus of the present embodiment. It should be noted that the technical solution of the panoramic video data processing apparatus and the technical solution of the panoramic video data processing method belong to the same concept, and details that are not described in detail in the technical solution of the panoramic video data processing apparatus can be referred to the description of the technical solution of the panoramic video data processing method.
Fig. 6 illustrates a block diagram of a computing device 600 provided according to an embodiment of the present application. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to store data.
Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 6 is for purposes of example only and is not limiting as to the scope of the present application. Other components may be added or replaced as desired by those skilled in the art.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.
Wherein the processor 620 is configured to execute the following computer-executable instructions to implement the following method:
determining a reference polar coordinate of a target frame of a framing target object in a current video frame under the condition that the center of the target object is detected to enter a fixed visual angle area;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
and taking the next video frame as the current video frame, returning to execute the operation step of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system until the center of the target object leaves a fixed visual angle release area.
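The three instructions form a simple per-frame loop: the reference polar coordinate is latched when the target center enters the fixed view angle area and reused until the center leaves the larger release area. A schematic sketch (illustrative only; the region tests and the rendering callback are hypothetical placeholders):

```python
def run_fixed_view_tracking(frames, in_view_area, in_release_area, get_center, render_at):
    """Schematic per-frame loop: latch the reference polar coordinate when the target
    center enters the fixed view angle area, and keep rendering tracking plane
    images around it until the center leaves the fixed view angle release area."""
    ref = None  # latched reference polar coordinate
    outputs = []
    for frame in frames:
        center = get_center(frame)
        if ref is None and in_view_area(center):
            ref = center   # latch on entry into the fixed view angle area
        if ref is not None and not in_release_area(center):
            ref = None     # release only when leaving the larger release area
        outputs.append(render_at(frame, ref if ref is not None else center))
    return outputs

# Toy 1-D example: view area is |c| <= 1, release area is |c| <= 3
centers = [5, 0.5, 2, 4]
out = run_fixed_view_tracking(
    centers,
    in_view_area=lambda c: abs(c) <= 1,
    in_release_area=lambda c: abs(c) <= 3,
    get_center=lambda f: f,
    render_at=lambda f, c: c,
)
```

Note how the frame with center 2 is still rendered around the latched coordinate 0.5, because the target has left the view area but not yet the release area; this buffering is what suppresses large view deflections.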
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the panoramic video data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the panoramic video data processing method.
An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, which are executed by a processor to implement the operational steps of the panoramic video data processing method described above.
The above is an illustrative scheme of a computer-readable storage medium of the embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above-mentioned panoramic video data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned panoramic video data processing method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.
Claims (13)
1. A panoramic video data processing method, comprising:
under the condition that the center of a target object is detected to enter a fixed visual angle area, determining a reference polar coordinate of a target frame of a frame selection target object in a current video frame, wherein the fixed visual angle area is a preset area range;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in a spherical polar coordinate system;
taking the next video frame as a current video frame, returning to execute the operation step of generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to the panoramic image of the next video frame in the spherical polar coordinate system until the center of the target object leaves a fixed visual angle release area, wherein the fixed visual angle area is included in the fixed visual angle release area;
wherein, the generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to the panoramic image of the next video frame in the spherical polar coordinate system comprises:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
2. The method of claim 1, wherein before determining the reference polar coordinates of the target bounding box for bounding the target object in the current video frame in case that the center of the target object is detected to enter the fixed view region, the method further comprises:
the fixed view angle region and the fixed view angle release region are preset.
3. The panoramic video data processing method according to claim 1 or 2, characterized in that the method further comprises:
under the condition that the center of the target object is detected to leave the fixed visual angle release area, determining, according to the vertex positions of the target frame in the planar image corresponding to the fixed visual angle of the current video frame, an updated target frame in the planar image corresponding to the target object in the current video frame leaving the fixed visual angle release area;
performing image recognition on the planar image corresponding to the target object in the current video frame leaving the fixed visual angle release area, and determining the object characteristics in the updated target frame;
and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics in the updated target frame.
4. The method of claim 3, wherein the determining, according to the vertex positions of the target frame in the planar image corresponding to the fixed view angle of the current video frame, an updated target frame in the planar image corresponding to the target object in the current video frame leaving the fixed view angle release area comprises:
determining the vertex coordinates of the target frame in the plane image corresponding to the fixed visual angle of the current video frame;
according to the vertex coordinates of the target frame, determining updated vertex coordinates in a plane image corresponding to the fixed visual angle release area away from the target object in the current video frame;
determining the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate in the updated vertex coordinates;
and determining an updated target frame in the planar image corresponding to the target object in the current video frame leaving the fixed visual angle release area according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate.
5. The panoramic video data processing method according to claim 1 or 2, characterized in that the method further comprises:
under the condition that the center of the target object is detected to leave the fixed view angle release area, determining the initial polar coordinate of the center of the target frame when the target object enters the fixed view angle area, and determining the central polar coordinate of the central position of the playing plane image of the current video frame;
interpolating according to the initial polar coordinates and the central polar coordinates to obtain a preset number of intermediate polar coordinates;
determining the object characteristics corresponding to the central position of the target frame according to the intermediate polar coordinates and the central polar coordinates;
and determining the updated central polar coordinates of the target frame in the tracking plane image of the next video frame according to the object characteristics.
6. The method of claim 5, wherein the determining the object characteristics corresponding to the central position of the target frame according to the intermediate polar coordinates and the central polar coordinates comprises:
sequentially arranging the intermediate polar coordinates and the central polar coordinates to obtain a polar coordinate set;
carrying out image recognition on the playing plane image of the current video frame, and determining the object characteristics of the target object;
generating a tracking plane image of the current video frame by taking the ith polar coordinate in the polar coordinate set as a center according to the panoramic image of the current video frame in the spherical polar coordinate system, wherein i is equal to 1;
and increasing the i by 1, taking the tracking plane image as the playing plane image, and returning to execute the operation step of performing image recognition on the playing plane image of the current video frame to obtain the object characteristics of the target object until the i is equal to the number of polar coordinates in the polar coordinate set, so as to obtain the object characteristics corresponding to the central position of the target object.
7. The method of processing panoramic video data according to claim 1, wherein after generating a tracking plane image of the next video frame from the panoramic image of the next video frame in the spherical polar coordinate system centering on the reference polar coordinate, the method further comprises:
according to the object characteristics in the target frame, determining the updated polar coordinates of the central position of the target frame in the tracking plane image of the next video frame in the spherical polar coordinate system;
and playing the plane image of the next video frame by using the updated polar coordinates.
8. The method of claim 3, wherein the performing image recognition on the planar image corresponding to the target object in the current video frame leaving the fixed view angle release area and determining the object characteristics in the updated target frame comprises:
determining a corresponding recognition area according to the target frame;
and performing image recognition in the recognition area in the plane image corresponding to the target object in the current video frame leaving the fixed visual angle release area, and determining the object characteristics of the target object.
9. The method according to claim 7, wherein said determining updated polar coordinates of the center position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object comprises:
carrying out image recognition on a recognition area in a tracking plane image of the next video frame, and determining the central position of a target object in the next video frame;
and determining the updated polar coordinates of the central position of the target object in the next video frame in the spherical polar coordinate system.
10. The method of claim 9, wherein the image recognition of the identified region in the tracking plane image of the next video frame and the determination of the center position of the target object in the next video frame comprises:
carrying out image recognition on a recognition area in a tracking plane image of the next video frame to obtain image characteristics;
analyzing and processing the image features and the object features to obtain a confidence coefficient that the target object exists in the identification region and a position offset of the image features relative to the center position of the identification region;
and under the condition that the confidence degree is greater than a confidence degree threshold value, determining the central position of the target object in the next video frame according to the central position of the target object in the playing plane image and the position offset.
11. A panoramic video data processing apparatus, comprising:
the first determination module is configured to determine a reference polar coordinate of a target frame of a framing target object in a current video frame under the condition that the center of the target object is detected to enter a fixed view angle area, wherein the fixed view angle area is a preset area range;
a generating module configured to generate a tracking plane image of a next video frame with the reference polar coordinate as a center according to a panoramic image of the next video frame in a spherical polar coordinate system;
an execution module configured to take the next video frame as the current video frame and return to execute the operation step of generating the tracking plane image of the next video frame by taking the reference polar coordinate as a center according to the panoramic image of the next video frame in the spherical polar coordinate system, until the center of the target object leaves a fixed view angle release area, wherein the fixed view angle area is included in the fixed view angle release area;
wherein the generation module is further configured to:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
12. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method of:
under the condition that the center of a target object is detected to enter a fixed visual angle area, determining a reference polar coordinate of a target frame of a frame selection target object in a current video frame, wherein the fixed visual angle area is a preset area range;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in a spherical polar coordinate system;
taking the next video frame as a current video frame, returning to execute the operation step of generating a tracking plane image of the next video frame according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the reference polar coordinate as a center until the center of the target object leaves a fixed visual angle release area, wherein the fixed visual angle area is included in the fixed visual angle release area;
wherein, the generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to the panoramic image of the next video frame in the spherical polar coordinate system comprises:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the panoramic video data processing method of any one of claims 1 to 10.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110571013.4A CN113315914B (en) | 2021-05-25 | 2021-05-25 | Panoramic video data processing method and device |
US17/730,950 US11647294B2 (en) | 2021-05-25 | 2022-04-27 | Panoramic video data process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110571013.4A CN113315914B (en) | 2021-05-25 | 2021-05-25 | Panoramic video data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113315914A CN113315914A (en) | 2021-08-27 |
CN113315914B true CN113315914B (en) | 2022-05-17 |
Family
ID=77374598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110571013.4A Active CN113315914B (en) | 2021-05-25 | 2021-05-25 | Panoramic video data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113315914B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115190237B (en) * | 2022-06-20 | 2023-12-15 | 亮风台(上海)信息科技有限公司 | Method and device for determining rotation angle information of bearing device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011066860A (en) * | 2009-09-18 | 2011-03-31 | Loarant Corp | Panoramic image generation method and panoramic image generation program |
JP2016099941A (en) * | 2014-11-26 | 2016-05-30 | 日本放送協会 | System and program for estimating position of object |
WO2017053822A1 (en) * | 2015-09-23 | 2017-03-30 | Behavioral Recognition Systems, Inc. | Detected object tracker for a video analytics system |
WO2021040555A1 (en) * | 2019-08-26 | 2021-03-04 | Общество С Ограниченной Ответственностью "Лаборатория Мультимедийных Технологий" | Method for monitoring a moving object in a stream of video frames |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9781349B2 (en) * | 2016-01-05 | 2017-10-03 | 360fly, Inc. | Dynamic field of view adjustment for panoramic video content |
CN107025659B (en) * | 2017-04-11 | 2020-03-31 | 西安理工大学 | Panoramic target tracking method based on unit spherical coordinate mapping |
CN107316305A (en) * | 2017-06-11 | 2017-11-03 | 成都吱吖科技有限公司 | A kind of interactive panoramic video scaling method and device based on virtual reality |
US11272160B2 (en) * | 2017-06-15 | 2022-03-08 | Lenovo (Singapore) Pte. Ltd. | Tracking a point of interest in a panoramic video |
EP3915247A4 (en) * | 2019-03-27 | 2022-04-20 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Three-dimensional tracking using hemispherical or spherical visible light-depth images |
CN111242977B (en) * | 2020-01-09 | 2023-04-25 | 影石创新科技股份有限公司 | Target tracking method of panoramic video, readable storage medium and computer equipment |
CN112699839B (en) * | 2021-01-13 | 2024-02-27 | 安徽水天信息科技有限公司 | Automatic video target locking and tracking method under dynamic background |
CN112785628B (en) * | 2021-02-09 | 2023-08-08 | 成都视海芯图微电子有限公司 | Track prediction method and system based on panoramic view angle detection tracking |
Also Published As
Publication number | Publication date |
---|---|
CN113315914A (en) | 2021-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liao et al. | DR-GAN: Automatic radial distortion rectification using conditional GAN in real-time | |
WO2019218824A1 (en) | Method for acquiring motion track and device thereof, storage medium, and terminal | |
JP6694829B2 (en) | Rule-based video importance analysis | |
KR101479387B1 (en) | Methods and apparatuses for face detection | |
US10452953B2 (en) | Image processing device, image processing method, program, and information recording medium | |
Liu et al. | Aggregation signature for small object tracking | |
EP3093822B1 (en) | Displaying a target object imaged in a moving picture | |
CN107564063B (en) | Virtual object display method and device based on convolutional neural network | |
US20200285859A1 (en) | Video summary generation method and apparatus, electronic device, and computer storage medium | |
US11647294B2 (en) | Panoramic video data process | |
CN113518214B (en) | Panoramic video data processing method and device | |
CN110309721B (en) | Video processing method, terminal and storage medium | |
Kavitha et al. | Convolutional Neural Networks Based Video Reconstruction and Computation in Digital Twins. | |
CN113313735B (en) | Panoramic video data processing method and device | |
US10163000B2 (en) | Method and apparatus for determining type of movement of object in video | |
CN113315914B (en) | Panoramic video data processing method and device | |
Tan et al. | Beyond visual retargeting: A feature retargeting approach for visual recognition and its applications | |
US9317770B2 (en) | Method, apparatus and terminal for detecting image stability | |
WO2022034678A1 (en) | Image augmentation apparatus, control method, and non-transitory computer-readable storage medium | |
US20250061545A1 (en) | Diversity-preserved domain adaptation using text-to-image diffusion for 3d generative model | |
CN114827567B (en) | Video quality analysis method, apparatus and readable medium | |
US20220180531A1 (en) | Method and apparatus with object tracking using dynamic field of view | |
CN106033550B (en) | Target tracking method and device | |
CN115205325A (en) | Target tracking method and device | |
CN115205721A (en) | Target tracking method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |