
CN113822126B - Icon recognition method, device, and computer-readable storage medium - Google Patents

Icon recognition method, device, and computer-readable storage medium

Info

Publication number
CN113822126B
CN113822126B
Authority
CN
China
Prior art keywords
image
target
icon
video
image sub-block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110711205.0A
Other languages
Chinese (zh)
Other versions
CN113822126A (en)
Inventor
郑斌
殷泽龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110711205.0A priority Critical patent/CN113822126B/en
Publication of CN113822126A publication Critical patent/CN113822126A/en
Application granted granted Critical
Publication of CN113822126B publication Critical patent/CN113822126B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract


The embodiments of the present application relate to the field of artificial intelligence technology and disclose an icon recognition method, device, and computer-readable storage medium. The embodiments of the present application obtain an image to be recognized from a video to be detected; a preset area of the image to be recognized is cropped to obtain an image sub-block corresponding to the preset area; the image sub-block is input into a trained detection model to obtain a confidence level that the image sub-block contains a target icon; when it is detected that the confidence level is within a preset confidence range, the image sub-block within the preset confidence range is subjected to optical character recognition processing to obtain an optical character recognition result; and the target recognition result of the video to be detected is determined based on the result of the optical character recognition processing. In this way, the image to be recognized is cropped to obtain an image sub-block, the confidence level of the image sub-block is determined by the detection model, and the image sub-block with a confidence level within the preset confidence range is further subjected to optical character recognition processing, thereby improving the accuracy and efficiency of icon recognition.

Description

Icon recognition method, icon recognition device and computer readable storage medium
Technical Field
The application relates to the field of artificial intelligence, in particular to an icon identification method, an icon identification device and a computer readable storage medium.
Background
With the rise of short-video creation and sharing platforms, short videos have brought rich experiences to people. However, some users chase popularity by re-uploading large numbers of popular videos, plagiarizing the work of original creators and infringing their rights and interests. To ensure that each short video released to a platform is original, every video to be released needs to be reviewed. Notably, a short video that has already been released carries a watermark mark, such as an icon, of the corresponding platform, so whether a video to be released is original can be determined by checking whether it carries such an icon. In the related art, this check is performed by scanning every video frame of the short video and determining from the scanning result whether the video carries an icon.
In the research and practice of the prior art, the inventors of the present application found that existing methods for identifying icons in short videos must scan every video frame and then determine from the scanning result whether an icon is carried, which is computationally expensive and inefficient.
Disclosure of Invention
The embodiments of the present application provide an icon identification method, an icon identification apparatus, and a computer-readable storage medium, which can improve the efficiency and accuracy of icon identification.
An embodiment of the present application provides an icon identification method, which includes the following steps:
acquiring an image to be identified from a video to be detected;
cropping a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area;
inputting the image sub-block into a trained detection model to obtain a confidence that the image sub-block contains a target icon;
when the confidence is detected to be within a preset confidence range, performing optical character recognition processing on the image sub-block to obtain an optical character recognition result;
and determining a target recognition result of the video to be detected according to the result of the optical character recognition processing.
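The steps above can be sketched as a small routing function. This is a minimal illustration only: the concrete threshold values and the stubbed OCR callback are assumptions for the sake of the example, not values disclosed in the embodiments.

```python
# Hypothetical bounds of the "preset confidence range": scores in
# [T_LOW, T_HIGH) are ambiguous and routed to optical character recognition.
T_LOW, T_HIGH = 0.3, 0.8

def recognize_icon(confidence, ocr_fn, sub_block):
    """Route an image sub-block according to the detector's confidence."""
    if confidence >= T_HIGH:
        # Above the preset range: the sub-block is taken to contain the icon.
        return ("icon", None)
    if confidence >= T_LOW:
        # Within the preset range: confirm via optical character recognition.
        text = ocr_fn(sub_block)
        return ("icon", text) if text else ("no_icon", None)
    # Below the preset range: the image does not carry the target icon.
    return ("no_icon", None)
```

In use, `ocr_fn` would be a real OCR engine applied to the cropped sub-block; here any callable returning recognized text suffices.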
Correspondingly, an embodiment of the present application provides an icon identification apparatus, including:
an acquisition unit, configured to acquire an image to be identified from a video to be detected;
a preprocessing unit, configured to crop a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area;
an input unit, configured to input the image sub-block into a trained detection model to obtain a confidence that the image sub-block contains a target icon;
a recognition unit, configured to perform optical character recognition processing on the image sub-block when the confidence is detected to be within a preset confidence range, to obtain an optical character recognition result;
and a determining unit, configured to determine a target recognition result of the video to be detected according to the result of the optical character recognition processing.
In some embodiments, the determining unit is further configured to:
identify the optical character recognition result according to part-of-speech information to obtain a part-of-speech result;
when the part-of-speech result is detected to contain target word information in a preset word information list, determine the target word information as a target icon and acquire an identifier of the target icon;
acquire mark position information corresponding to the target icon in the image sub-block;
and determine the identifier of the target icon and the mark position information as the target recognition result.
In some embodiments, the determining unit is further configured to:
when the part-of-speech result is detected not to contain target word information in the preset word information list, perform optical character recognition processing on the image to be identified to obtain an optical character recognition result corresponding to the image to be identified;
identify the optical character recognition result corresponding to the image to be identified according to the part-of-speech information to obtain a word information result corresponding to the image to be identified;
match the word information result against the target word information in the preset word information list;
when the number of matched target word information items is greater than or equal to a preset word information number threshold, determine the matched target word information as a target icon in the image to be identified;
acquire the identifier of the target icon and the mark position information of the target icon in the image to be identified;
and determine the identifier of the target icon and the mark position information as the target recognition result.
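The word-matching step above can be sketched as follows; the example word list and the threshold of two matches are illustrative assumptions, since the embodiments leave the preset word information list and the number threshold unspecified.

```python
def match_target_words(word_results, target_words, min_count=2):
    """Match OCR-derived words against a preset word information list.

    Returns the matched target words when their number reaches the
    (hypothetical) preset word-information-number threshold `min_count`,
    and an empty list otherwise."""
    matched = [w for w in word_results if w in target_words]
    return matched if len(matched) >= min_count else []
```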
In some embodiments, the input unit is further configured to:
input the image sub-block into the trained detection model;
extract features from the image sub-block through the detection model to obtain image sub-features;
perform convolution processing on the image sub-features through the detection model to obtain a prediction feature map;
and decode the dimensions of the prediction feature map through a classification layer in the detection model to obtain the confidence that the image sub-block contains the target icon.
In some embodiments, the determining unit is further configured to:
when the confidence is detected to be above the preset confidence range, determine that the image sub-block corresponding to the confidence contains the target icon;
acquire the mark position information of the image sub-block corresponding to the confidence, and acquire the identifier of the target icon;
and determine the identifier of the target icon and the mark position information as the target recognition result of the video to be detected.
In some embodiments, the determining unit is further configured to:
when the confidence is detected to be below the preset confidence range, determine that the image to be identified does not carry the target icon;
and determine the absence of the target icon as the target recognition result of the video to be detected.
In some embodiments, the icon recognition device further includes a training unit for:
acquire a sample image, wherein a preset area in the sample image carries a sample icon;
crop the sample image to obtain a sample image sub-block carrying the sample icon, and acquire a sample confidence that the sample image sub-block contains the sample icon;
input the sample image sub-block into a preset model to obtain a prediction confidence that the sample image sub-block contains the sample icon;
acquire the confidence difference between the sample confidence and the prediction confidence;
and iteratively train the network parameters of the preset model according to the confidence difference until the confidence difference converges, to obtain the trained detection model.
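The training loop above can be illustrated with a deliberately tiny stand-in model: a single sigmoid unit fit by gradient descent so that its predicted confidence matches the sample confidence, iterating until the squared confidence difference converges. The one-parameter model, learning rate, and tolerance are assumptions for illustration; the actual detection model is a deep network.

```python
import math

def train_detector(samples, lr=0.5, tol=1e-4, max_iter=5000):
    """Toy stand-in for the training step: fit sigmoid(w*x + b) so that
    the prediction confidence matches the sample confidence."""
    w = b = 0.0
    for _ in range(max_iter):
        loss = gw = gb = 0.0
        for x, y in samples:           # x: scalar feature, y: sample confidence
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # prediction confidence
            diff = p - y                              # confidence difference
            loss += diff * diff
            grad = 2.0 * diff * p * (1.0 - p)         # d(loss)/d(logit)
            gw += grad * x
            gb += grad
        if loss < tol:                 # confidence difference has converged
            break
        w -= lr * gw
        b -= lr * gb
    return w, b
```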
In some embodiments, the acquisition unit is further configured to:
receive a video to be detected;
extract a target image frame from the video to be detected;
and determine the target image frame as the image to be identified.
In addition, an embodiment of the present application provides a computer device including a processor and a memory. The memory stores an application program, and the processor runs the application program in the memory to implement the icon identification method provided by the embodiments of the present application.
In addition, an embodiment of the present application provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the steps of any icon identification method provided by the embodiments of the present application.
Furthermore, embodiments of the present application provide a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in any of the icon recognition methods provided by the embodiments of the present application.
In the embodiments of the present application, an image to be identified is acquired from a video to be detected; a preset area of the image is cropped to obtain an image sub-block corresponding to the preset area; the image sub-block is input into a trained detection model to obtain a confidence that it contains a target icon; when the confidence is detected to be within a preset confidence range, optical character recognition processing is performed on the image sub-block to obtain an optical character recognition result; and a target recognition result of the video to be detected is determined according to that result. Because only the cropped image sub-blocks, rather than the full image, are passed to the trained detection model, the amount of image area to be recognized is reduced and icon recognition efficiency is improved; and because optical character recognition is applied only when the confidence falls within the preset range, the accuracy of icon recognition in videos is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scenario of an icon recognition system according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of an icon recognition method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another step of the icon recognition method according to the embodiment of the present application;
FIG. 4 is a flowchart of an icon recognition method according to an embodiment of the present application;
FIG. 5 is a schematic view of a sample image with sample icons according to an embodiment of the present application;
Fig. 6 is a schematic view of a scenario of an icon recognition method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an icon recognition device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiments of the present application provide an icon identification method, an icon identification apparatus, and a computer-readable storage medium. Specifically, the embodiments are described from the perspective of an icon identification apparatus, which may be integrated in a computer device; the computer device may be a server, a terminal, or another device. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
The solution provided by the embodiments of the present application relates to artificial intelligence technologies such as icon recognition, and is specifically described through the following embodiments:
for example, referring to fig. 1, a schematic view of a scenario of an icon recognition system according to an embodiment of the present application is provided. The scene comprises a terminal 10 and a server 20, wherein the terminal 10 and the server 20 are connected through wireless communication to realize data interaction.
A user selects a video to upload through the terminal 10 and uploads the video (the video to be detected) through the terminal 10 to the server 20 of the corresponding platform, so that the server 20 reviews the video to be detected, for example by identifying target icons in the video.
The server 20 may acquire an image to be identified from the video to be detected; crop a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area; input the image sub-block into a trained detection model to obtain a confidence that the image sub-block contains a target icon; when the confidence is detected to be within a preset confidence range, perform optical character recognition processing on the image sub-block to obtain an optical character recognition result; and determine a target recognition result of the video to be detected according to the result of the optical character recognition processing.
Here, icon identification may include processing such as obtaining the image to be identified from the video to be detected, determining the confidence, and performing optical character recognition processing.
The following describes the embodiments in detail. The order of description of the following embodiments is not intended to limit the preferred order of the embodiments.
Referring to fig. 2, fig. 2 is a schematic flow chart of steps of an icon identifying method according to an embodiment of the present application, and the specific flow is as follows:
101. Acquire an image to be identified from a video to be detected.
The video to be detected may be a video that a user requests to release to a target sharing platform and that the platform is responsible for reviewing and detecting. If the video to be detected was downloaded by the requesting user from another platform, that is, it is not the requesting user's original creation, it may carry the watermark, identifier, label, or mark of that platform; therefore, the video to be released needs to be reviewed.
The image to be identified may be the image corresponding to a certain video frame in the video to be detected. For example, when a user requests to release a short video to a target sharing platform, such as a video platform, and the platform detects the short video, a target video frame is acquired from the video and determined as the image to be identified, so that icon recognition, for example of a watermark, mark, or label, can subsequently be performed on the image.
In some embodiments, taking a video to be distributed as an example, an image corresponding to a video frame in the video needs to be checked or identified, and the step of acquiring the image to be identified from the video to be detected may include:
(1) Receiving a video to be detected;
(2) Extracting target image frames in a video to be detected;
(3) Determining the target image frame as the image to be identified.
The video to be detected may be a video that a user requests to release on a target platform; for example, video platform A needs to review the video to be detected.
To review the video to be detected, the embodiment of the present application identifies the images corresponding to video frames in the video. Since the video to be detected contains many frames, to reduce the detection workload, this embodiment selects a small number of frames as the images to be detected. Specifically, when the video to be detected is received, the target image frames may be extracted as follows: extract all video frames in the video to obtain a video frame set corresponding to the video; sort the video frames in the set according to their temporal order in the video to obtain a video frame sequence; and acquire the target image frames corresponding to a target time sequence in the video frame sequence. The target image frame corresponding to the target time sequence may be the frame at the first, middle, or last position of the sequence, or the three frames at the first, middle, and last positions respectively, which is not limited herein. This improves the efficiency of subsequent icon recognition.
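The frame selection described above can be sketched as an index computation over the sorted frame sequence. Choosing exactly the first, middle, and last frames is one plausible reading of the "target time sequence"; the embodiments do not fix this choice.

```python
def target_frame_indices(num_frames):
    """Select the first, middle, and last positions of a video frame
    sequence as target image frames (duplicates collapse for very
    short videos)."""
    if num_frames <= 0:
        return []
    return sorted({0, num_frames // 2, num_frames - 1})
```

A decoder such as OpenCV's `VideoCapture` could then seek to these indices rather than decoding every frame, which is the workload reduction the text describes.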
Further, after the target image frame is extracted from the video to be detected, the target image frame is determined as the image to be identified that the platform is to review.
102. Crop a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area.
The preset area may be a certain area or position in the image to be identified, such as the upper-left, lower-left, upper-right, lower-right, or middle area of the image.
Since a video is released through a platform, the watermark (icon) carried by the video is usually located in a corner region of the video picture, such as the upper-left or upper-right corner. To improve the efficiency of watermark (icon) detection, detection may be performed only on specific areas of the image to be identified. Specifically, after the image to be identified is obtained from the video to be detected, the embodiment of the present application preprocesses the preset area of the image to be identified to obtain the image sub-block corresponding to the preset area.
It should be noted that the preprocessing of the image to be identified may consist of cutting, capturing, or intercepting its preset area to obtain the corresponding image sub-block. The preprocessing process includes: obtaining the image size of the image to be identified; determining the size of the preset area according to a preprocessing rule, based on the image size; and intercepting the preset area according to the determined size to obtain the image sub-block. The preprocessing rule may include the position of the preset area, the size proportion of the preset area within the image to be identified, and the like. For example, if the preprocessing rule specifies that the preset areas include the upper-left, lower-left, upper-right, lower-right, and middle regions of the image to be identified, each with a size ratio of 1/5 of the image, then those regions are intercepted according to that size ratio to obtain the image sub-blocks corresponding to the respective areas. The above is merely an example and is not intended to be limiting.
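The crop computation can be sketched as follows. The five regions and the 1/5 size ratio follow the example above; the exact box arithmetic (integer truncation, corner anchoring) is an implementation assumption.

```python
def preset_region_boxes(width, height, ratio=0.2):
    """Compute (x, y, w, h) crop boxes for the five preset areas named
    in the text: the four corners plus the centre, each `ratio` of the
    image in each dimension."""
    rw, rh = int(width * ratio), int(height * ratio)
    return {
        "top_left":     (0, 0, rw, rh),
        "top_right":    (width - rw, 0, rw, rh),
        "bottom_left":  (0, height - rh, rw, rh),
        "bottom_right": (width - rw, height - rh, rw, rh),
        "center":       ((width - rw) // 2, (height - rh) // 2, rw, rh),
    }
```

Each box can then be handed to any image library's crop routine to produce the image sub-blocks.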
When the preset region of the image to be identified is preprocessed, only one region may be processed. Alternatively, multiple preset areas may be preprocessed simultaneously, for example the upper-left, lower-left, upper-right, lower-right, and middle areas, to obtain the image sub-blocks corresponding to those areas, that is, multiple image sub-blocks. Performing subsequent icon recognition on multiple image sub-blocks avoids missed detection of icons (watermark marks) and improves the accuracy of icon detection in videos.
In this way, cropping the image to be identified according to the preprocessing rule improves the accuracy of the image sub-block corresponding to the preset area: it ensures as far as possible that the target icon (watermark mark) to be detected falls within the image sub-block, while keeping the sub-block as small as possible, which reduces the image size for subsequent icon recognition and thus improves its efficiency.
Moreover, when the subsequent detection model performs icon recognition, only the image sub-blocks are recognized, which avoids full-image recognition of the image to be identified, reduces the amount of image area to be recognized, and improves icon recognition efficiency.
103. Input the image sub-block into the trained detection model to obtain the confidence that the image sub-block contains the target icon.
The detection model may include a feature extraction layer (Darknet-53), a convolutional network layer (Convolutional Set), a classification layer (sigmoid), and the like. The feature extraction layer extracts basic features of the image, the convolutional network layer performs convolution processing on the extracted features to obtain a prediction feature map, and the classification layer converts the prediction feature map into a prediction result.
The detection model may be an icon recognition model that is iteratively trained until convergence and is used to recognize or detect a target icon at a specific position in the image to be identified, thereby determining the confidence that the target icon is present at that position. Specifically, the trained detection model is obtained by joint training on sample image sub-blocks cropped from preset areas of sample images and on the sample confidences that those sub-blocks contain sample icons.
The confidence may be a confidence value for the event that an icon is contained in a certain region of the image to be identified, and reflects the degree of certainty or probability that the target icon is present in that region or position.
In order to perform icon detection on an image to be identified, the embodiment of the application inputs an image sub-block into a trained detection model to obtain a confidence value that the image sub-block contains a target icon.
In some embodiments, the step of inputting the image sub-block into the trained detection model to obtain the confidence that the image sub-block contains the target icon includes:
(1) Inputting the image sub-block into the trained detection model;
(2) Extracting features from the image sub-block through a feature extraction layer in the detection model to obtain image sub-features;
(3) Performing convolution processing on the image sub-features through a convolutional layer in the detection model to obtain a prediction feature map;
(4) Decoding the dimensions of the prediction feature map through a classification layer in the detection model to obtain the confidence that the image sub-block contains the target icon.
When the detection model receives an image sub-block, it extracts features from the sub-block through the feature extraction layer (Darknet-53) to obtain the image sub-features corresponding to the sub-block; performs convolution processing on the image sub-features through the convolutional network layer (Convolutional Set) to obtain a prediction feature map; and obtains the dimensions of the prediction feature map through the classification layer (sigmoid) and decodes them to obtain the confidence that the image sub-block contains the target icon.
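The final decoding step, applying the classification layer's sigmoid to the raw prediction feature map, can be sketched as below. Reducing the map to a single score by taking the maximum over cells is an assumption for illustration; the text does not specify how one confidence is derived from the map.

```python
import math

def decode_confidence(logit_map):
    """Apply the classification layer (sigmoid) to a raw prediction
    feature map (a 2-D grid of logits) and take the highest cell as
    the sub-block's icon confidence."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    return max(sigmoid(z) for row in logit_map for z in row)
```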
In some embodiments, the convolutional network layer includes a first convolutional network layer, a second convolutional network layer, and a third convolutional network layer, and the step of performing convolution processing on the image sub-features through the convolutional layers in the detection model to obtain a prediction feature map includes:
(3.1) Performing convolution processing on the image sub-features through the first convolutional network layer to obtain a first prediction feature map.
Specifically, the first convolutional network layer includes a 3×3 convolutional layer and a 1×1 convolutional layer. The image sub-features are convolved through the 3×3 convolutional layer and the 1×1 convolutional layer in turn to obtain the first prediction feature map.
(3.2) Performing convolution processing on the image sub-features through the second convolutional network layer to obtain a second prediction feature map.
Specifically, the second convolutional network layer includes a 1×1 convolutional layer, an upsampling layer, a fusion layer, a 3×3 convolutional layer, and a 1×1 convolutional layer. The image sub-features are convolved through the 1×1 convolutional layer to obtain a first convolution feature result; the first convolution feature result is upsampled through the upsampling layer to obtain a sampled first feature result; the sampled first feature result is fused, through the fusion layer, with the 26×26 feature result output by the feature extraction layer (Darknet-53) to obtain a first fusion feature; and the first fusion feature is convolved through the 3×3 convolutional layer and the 1×1 convolutional layer in turn to obtain the second prediction feature map.
(3.3) Carrying out convolution processing on the image sub-features through the third convolutional network layer to obtain a third prediction feature map.
Specifically, the third convolutional network layer comprises a 1×1 convolution layer, an upsampling layer, a fusion layer, a 3×3 convolution layer, and a 1×1 convolution layer. As in step (3.2), the image sub-features are convolved through a 1×1 convolution layer in the third convolutional network layer to obtain the first convolution feature result; the first convolution feature result is upsampled through the upsampling layer to obtain the sampled first feature result; and the sampled first feature result is fused, through the fusion layer, with the 26×26 feature result output by the feature extraction layer (Darknet-53) to obtain the first fusion feature.
Further, the first fusion feature is convolved through a 1×1 convolution layer to obtain a second convolution feature result; the second convolution feature result is upsampled through the upsampling layer to obtain a sampled second feature result; and the sampled second feature result is fused, through the fusion layer, with the 52×52 feature result output by the feature extraction layer (Darknet-53) to obtain a second fusion feature.
The second fusion feature is then convolved through the 3×3 convolution layer and the 1×1 convolution layer in turn to obtain the third prediction feature map.
(3.4) Determining the first prediction feature map, the second prediction feature map, and the third prediction feature map as prediction feature maps.
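The three-branch structure of steps (3.1) to (3.4) can be sketched as shape bookkeeping. The 13/26/52 grid sizes follow the Darknet-53 feature results named above for a 416×416 input; the channel counts are illustrative assumptions, not values from the specification.

```python
# Shape-bookkeeping sketch of the three prediction branches. Only tensor
# shapes (channels, height, width) are tracked, no actual convolution.

def conv(shape, out_channels):
    """A padded 3x3 (stride 1) or 1x1 convolution keeps H and W."""
    _, h, w = shape
    return (out_channels, h, w)

def upsample(shape):
    """2x nearest-neighbour upsampling doubles H and W."""
    c, h, w = shape
    return (c, h * 2, w * 2)

def fuse(shape_a, shape_b):
    """Channel-wise concatenation of two feature maps with equal H and W."""
    assert shape_a[1:] == shape_b[1:]
    return (shape_a[0] + shape_b[0], shape_a[1], shape_a[2])

# Feature results produced by the Darknet-53 backbone (channels assumed).
feat_13 = (1024, 13, 13)   # deepest output
feat_26 = (512, 26, 26)    # the 26x26 feature result of step (3.2)
feat_52 = (256, 52, 52)    # the 52x52 feature result of step (3.3)

# (3.1) first branch: 3x3 then 1x1 convolution on the deepest features.
p1 = conv(conv(feat_13, 1024), 255)

# (3.2) second branch: 1x1 conv, upsample, fuse with the 26x26 result,
# then 3x3 and 1x1 convolutions.
f1 = fuse(upsample(conv(feat_13, 256)), feat_26)
p2 = conv(conv(f1, 512), 255)

# (3.3) third branch: a further 1x1 conv + upsample on the first fusion
# feature, fused with the 52x52 result, then 3x3 and 1x1 convolutions.
f2 = fuse(upsample(conv(f1, 128)), feat_52)
p3 = conv(conv(f2, 256), 255)

# (3.4) three prediction feature maps at three spatial scales.
print(p1, p2, p3)   # -> (255, 13, 13) (255, 26, 26) (255, 52, 52)
```

The three scales let the head respond to icons of very different sizes, which matters because the watermark size is random in the synthesized training data.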
In some embodiments, before the step of inputting the image sub-block into the trained detection model, the method further comprises:
A. Acquiring a sample image, wherein a preset area in the sample image carries a sample icon.
The sample image is an image carrying a sample icon. It may be an image obtained directly, or an image corresponding to a video frame in a sample video, which is not limited herein. It should be noted that the preset area in the sample image carries the sample icon.
The preset area may be a certain area or position in the sample image, such as the upper-left, lower-left, upper-right, lower-right, or middle image area of the sample image.
The sample image may be a synthetic image. Because real sample data of watermarked videos is complex to obtain, the embodiment of the present application uses data synthesis to fit the real data, that is, to produce sample images containing sample icons. During data synthesis, the size and position of the sample icon (watermark mark) are randomized: the icon is placed in any one or more of the preset areas of the sample image, such as the upper-left, lower-left, upper-right, lower-right, or middle image areas, and the icon size is random but smaller than the preset area, without further limitation here. For example, sample icons or watermarks of existing sharing platforms or enterprises may be obtained and composited with pre-selected images, ensuring randomness of the icon's size and position, and random clipping or occlusion may be applied to the sample icon (watermark) to obtain the sample image. In addition, data enhancement processing such as blurring, occlusion, brightness and colour changes, and noise addition may be applied to the pre-selected image before synthesis, or to the synthesized sample image, so that further sample images can be acquired for model training.
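The synthesis step above can be sketched as pasting an icon patch of random size into a randomly chosen preset area. Images are represented as nested lists of pixel values purely for illustration; the region names, size bounds, and marker value are assumptions, not details from the specification.

```python
# Minimal data-synthesis sketch: paste a "sample icon" (pixels set to 1)
# into one of the five preset areas at random.
import random

def synthesize_sample(image, icon_h, icon_w, rng):
    """Paste an icon patch into a random preset area; return area and offset."""
    h, w = len(image), len(image[0])
    # The five candidate preset areas named in the text: corners and centre.
    anchors = {
        "upper_left":  (0, 0),
        "upper_right": (0, w - icon_w),
        "lower_left":  (h - icon_h, 0),
        "lower_right": (h - icon_h, w - icon_w),
        "middle":      ((h - icon_h) // 2, (w - icon_w) // 2),
    }
    area = rng.choice(sorted(anchors))
    top, left = anchors[area]
    for r in range(top, top + icon_h):
        for c in range(left, left + icon_w):
            image[r][c] = 1   # mark icon pixels
    return area, (top, left)

rng = random.Random(0)
img = [[0] * 64 for _ in range(64)]
# Icon size is random but smaller than the preset area (bound assumed).
ih, iw = rng.randint(4, 15), rng.randint(4, 15)
area, pos = synthesize_sample(img, ih, iw, rng)
print(area, pos, ih, iw)
```

Because both the area label and the offset are returned, the same routine can also emit the position ground truth needed for the sample confidence annotation.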
B. Cutting the sample image to obtain a sample image sub-block carrying the sample icon, and acquiring the sample confidence that the sample image sub-block contains the sample icon.
To improve the efficiency of subsequent model training, the embodiment of the present application cuts the preset area out of the sample image to obtain a sample image sub-block of the corresponding size. When the preset model is subsequently trained, this reduces the time the preset model spends identifying the icon (watermark mark) in the image, which speeds up iterative training and improves model training efficiency.
To train the preset model, the sample confidence that the preset area in the sample image contains the sample icon needs to be obtained. The sample confidence can be set by human annotation, and indicates the confidence or probability that the preset area in the sample image contains the sample icon.
C. Inputting the sample image sub-block into a preset model to obtain the prediction confidence that the sample image sub-block contains the sample icon.
D. Obtaining the confidence difference between the sample confidence and the prediction confidence.
E. Carrying out iterative training on the network parameters of the preset model according to the confidence difference until the confidence difference converges, to obtain the trained detection model.
Specifically, the preset model may comprise a feature extraction layer (Darknet-53), a convolutional network layer (Convolutional Set), and a classification layer (Sigmoid). Feature extraction is carried out on the sample image sub-block through the feature extraction layer to obtain sample image sub-features; convolution processing is carried out on the sample image sub-features through the convolutional network layer to obtain a sample prediction feature map; and the dimensions of the sample prediction feature map are obtained through the classification layer and decoded to obtain the prediction confidence that the sample image sub-block contains the sample icon.
During model training, the prediction confidence is the confidence or probability output by the preset model for the input sample image sub-block; it reflects how confident the preset model is that the preset area of the sample image contains the sample icon, and a certain difference exists between it and the actually set sample confidence.
To make the confidence output by the model when identifying a video more accurate, or closer to human-perception-level detection, the embodiment of the present application calculates the confidence difference between the sample confidence and the prediction confidence, and judges from this difference how far the current preset model is from the ideal target detection model. The network parameters of the preset model are adjusted accordingly and training is iterated until there is no difference, or only a very small difference, between the prediction confidence and the sample confidence, at which point the trained detection model is obtained. In this way, joint training of the preset model on the sample image sub-blocks and sample confidences is realized, and the trained detection model is obtained.
Training the model in this way yields the trained target model, that is, the detection model, and the image sub-blocks of the image to be identified are identified based on this detection model, improving the efficiency and accuracy of icon identification in the video.
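Steps C to E can be illustrated with a toy training loop. The real model has millions of parameters; here a single logit trained by gradient descent on the squared confidence difference stands in for the whole network, and the sample confidence, learning rate, and convergence tolerance are all assumed values.

```python
# Toy sketch of iterative training on the confidence difference:
# adjust a parameter until the predicted confidence matches the label.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

sample_confidence = 0.9   # human-annotated label (assumed value)
logit = 0.0               # the "network parameter"; initial prediction 0.5

for step in range(10000):
    predicted = sigmoid(logit)                 # step C: prediction confidence
    diff = predicted - sample_confidence       # step D: confidence difference
    # Step E: gradient of 0.5*diff^2 w.r.t. the logit (chain rule through
    # the sigmoid), applied with an assumed learning rate of 2.0.
    logit -= 2.0 * diff * predicted * (1 - predicted)
    if abs(diff) < 1e-3:                       # convergence criterion
        break

print(round(sigmoid(logit), 3))
```

The loop stops once the confidence difference converges below the tolerance, mirroring the "until the confidence difference converges" condition of step E.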
104. When the confidence is detected to be in the preset confidence range, performing optical character recognition processing on the image sub-blocks whose confidence is in the preset confidence range to obtain an optical character recognition result.
The preset confidence range may be a preset value range for the confidence, used to distinguish the case where the icon (watermark) still needs further verification, and to limit optical character recognition processing to the image sub-blocks whose confidence falls in this range.
Optical character recognition (OCR) is the process of determining the shape of a character by detecting dark and light patterns and then translating that shape into computer text through character recognition. In other words, the image sub-blocks or icons in an image are optically converted into corresponding image files, and the text in the image is converted by recognition software into a text format for further editing or other processing by word-processing software.
To improve the accuracy of icon (watermark mark) detection, after obtaining the confidence that the image sub-block contains the target icon, the embodiment of the present application determines according to this confidence whether to perform optical character recognition on the corresponding image sub-block. Specifically, a confidence value range, namely the preset confidence range, is set. An image sub-block whose confidence falls in this range is fairly likely to contain the target icon, but the icon or watermark cannot yet be determined directly. Therefore, to improve detection accuracy, optical character recognition is performed on the image sub-blocks whose confidence is in the preset confidence range, and the character information contained in the icon in the image sub-block is converted into text information to obtain the optical character recognition result.
For example, assume the confidence takes values in [0,1] and the preset confidence range is set to [0.5,0.9]. The image sub-blocks input to the detection model correspond to the upper-left, lower-left, upper-right, lower-right, and middle image areas of the image to be recognized. Suppose the sub-block in the upper-left area has the largest confidence of containing the target icon, say 0.8 (the confidences of the other sub-blocks may be 0 or less than 0.5). Because 0.8 lies in the preset confidence range [0.5,0.9], optical character recognition is performed on the upper-left sub-block to convert the character information contained in its icon into text information, obtaining the optical character recognition result.
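The gating in the example above can be sketched as a small filter over the per-region confidences; the region names and confidence values are the assumed example figures, not fixed by the method.

```python
# Confidence gating of step 104: only sub-blocks whose confidence lies
# inside the preset confidence range are sent on to OCR.

PRESET_RANGE = (0.5, 0.9)

def blocks_for_ocr(confidences, low=PRESET_RANGE[0], high=PRESET_RANGE[1]):
    """Return the regions whose confidence lies inside [low, high]."""
    return [region for region, c in confidences.items() if low <= c <= high]

confidences = {
    "upper_left": 0.8,    # likely contains the target icon -> verify via OCR
    "lower_left": 0.0,
    "upper_right": 0.3,
    "lower_right": 0.0,
    "middle": 0.1,
}
print(blocks_for_ocr(confidences))   # -> ['upper_left']
```

Only the upper-left sub-block survives the gate, so OCR effort is spent on a single small region rather than the whole frame.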
It should be noted that the optical character recognition result contains word information or text; whether the image sub-block contains the target icon, and hence whether the video to be detected carries the target icon (watermark mark), is then determined from this result.
105. And determining a target recognition result of the video to be detected according to the result of the optical character recognition processing.
The target recognition result is the result generated by performing icon recognition on the video to be detected. Specifically, when the image sub-block is detected to contain the target icon, the target recognition result may include the mark position information and identifier of the target icon in the image to be recognized or the video frame; when the image sub-block is detected not to contain the target icon, the target recognition result may state that the video to be detected does not carry the target icon, or take another form of expression, which is not limited herein.
In some embodiments, the step of determining the target recognition result of the video to be detected according to the result of the optical character recognition process includes:
(1) Recognizing the optical character recognition result according to part-of-speech information to obtain a part-of-speech result.
The part-of-speech information is the information according to which parts of speech are divided, used to split the recognized text into multiple word features or pieces of word information. For example, after the optical character recognition result is obtained, the characters or text in it are divided according to the part-of-speech information to obtain the part-of-speech result, which may comprise word information.
(2) When the part-of-speech result is detected to contain target word information from the preset word information list, determining the target word information as the target icon and acquiring the identifier of the target icon.
The preset word information list may be a list containing the text or word information corresponding to existing icons (watermarks), for example: "A", "A icon", "A game", "A video", "B icon", "B video", "B headline", and the like.
To identify whether an image sub-block whose confidence is in the preset confidence range contains the target icon, after the part-of-speech result is obtained, the embodiment of the present application detects whether the part-of-speech result contains word information matching the target word information in the preset word information list. When the part-of-speech result contains one or more pieces of target word information from the list, that target word information is determined to be the target icon, and the identifier of the target icon is acquired. For example, if the part-of-speech result contains the word information "A video", it is matched against the word features in the preset word information list; since "A video" in the list matches "A video" in the part-of-speech result, "A video" is determined to be the target icon, and the acquired identifier of the target icon may be "A video", "A", or the like.
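Steps (1) and (2) can be sketched as a lookup of the segmented OCR output against the preset list; the list entries and the identifier choice follow the examples above and are otherwise assumptions.

```python
# Match the part-of-speech result against the preset word information list.

PRESET_WORDS = {"A", "A icon", "A game", "A video", "B icon", "B video"}

def match_target_words(part_of_speech_result):
    """Return the target word information found in the preset list."""
    return [w for w in part_of_speech_result if w in PRESET_WORDS]

# Suppose OCR on the sub-block produced "A video", already segmented.
part_of_speech_result = ["A video"]
matched = match_target_words(part_of_speech_result)
# A non-empty match means the sub-block carries the target icon, and the
# matched word information doubles as the icon's identifier.
identifier = matched[0] if matched else None
print(identifier)   # -> A video
```

An empty match instead triggers the full-image fallback described below step (4).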
(3) And obtaining the mark position information corresponding to the target icon in the image sub-block.
The mark position information may be the position information, in the image to be identified (a video frame picture of the video to be detected), of the image sub-block carrying the target icon; this position information reflects where the target icon (watermark) is located in the video to be detected.
To learn the position of the target icon in the image to be identified, the embodiment of the present application determines the position information of the corresponding image sub-block in the image to be identified. Specifically, the position of the image sub-block corresponding to the confidence is identified in the image to be identified, and that position information is determined as the mark position information of the target icon.
(4) Determining the identifier of the target icon and the mark position information as the target recognition result.
In the embodiment of the present application, after the mark position information of the image sub-block corresponding to the confidence, namely the mark position information of the target icon, and the identifier of the target icon are obtained, the corresponding target recognition result is generated from the mark position information and the identifier of the target icon.
A confidence within the preset confidence range may arise because the model, when recognizing icon features similar to the target icon, cannot match them completely to the actual icon, or because the definition of the target icon is low, so that the confidence output by the detection model when detecting the target icon falls within the preset confidence range.
It is also possible that, because the target icon is large, the image sub-block does not cover the complete icon. The confidence that the corresponding image sub-block contains the target icon then falls within the preset confidence range, yet the part-of-speech result corresponding to the optical character recognition result contains no target word information from the preset word information list. In that case, optical character recognition, that is, full-image recognition, needs to be performed on the whole area of the image to be recognized to determine whether it contains the target icon.
Thus, in some implementations, embodiments of the application further include:
A. When the part-of-speech result is detected not to contain target word information from the preset word information list, performing optical character recognition processing on the image to be recognized to obtain an optical character recognition result corresponding to the image to be recognized, and recognizing that result according to the part-of-speech information to obtain a word information result corresponding to the image to be recognized.
The word information result may be the multiple pieces of word information corresponding to all the text in the image to be identified.
Because the image sub-blocks whose confidence is in the preset confidence range belong to specific areas of the image to be identified (preset areas such as the upper-left, lower-left, upper-right, lower-right, and middle areas), they only partially reflect whether the image contains target icon information. To improve the accuracy of icon identification, when such an image sub-block does not contain target word information from the preset word information list, the embodiment of the present application may perform optical character recognition on the image to be recognized corresponding to one or more video frames of the video to be detected, that is, full-area optical character recognition, to obtain the optical character recognition result corresponding to the image to be recognized. This result is then converted into word information to obtain the word information result for the full area of the image to be recognized, so that whether the target icon (watermark) is contained can subsequently be determined from the word information result.
It should be noted that the optical character recognition processing is performed on the whole area of the image to be recognized, where the whole area refers to all areas of the image, that is, any area of the whole image. For example, a whole image to be recognized is recognized to obtain all the optical character recognition results corresponding to it.
B. Matching the word information result with the target word information in the preset word information list.
C. When the number of matched pieces of target word information is greater than or equal to a preset word information quantity threshold, determining the matched target word information as the target icon in the image to be identified.
The preset word information quantity threshold may be a numerical value used to determine that the image to be identified contains the target icon. For example, the word information contained in the image to be recognized is matched against the preset word information list to obtain the number of matched pieces of target word information; when that number is greater than or equal to the threshold, the image to be recognized is determined to contain the target icon.
D. Acquiring the identifier of the target icon and the mark position information of the target icon in the image to be identified.
E. Determining the identifier of the target icon and the mark position information as the target recognition result.
To improve the accuracy of icon (watermark mark) recognition, the embodiment of the present application can perform full-image, that is, full-area, optical character recognition processing on the images to be recognized corresponding to video frames in the video to be detected to obtain full-area optical character recognition results, perform part-of-speech recognition on those results to obtain the word information result for the full area of the image to be recognized, match the word information result against the preset word information list to determine the number of matched pieces of target word information, and, when that number is greater than or equal to the preset word information quantity threshold, determine the matched target word information as the target icon in the image to be recognized, that is, determine that the image to be recognized contains the target icon (watermark mark).
For example, the preset word information list may further contain entries such as "come to A (application), focus on hot spots", "come to A (application), focus on event development", and "sweep the code using the latest version of A (application), add me as a friend". Because the tail video frame of the video to be detected may contain more related information, such as word information related to the target icon, when the part-of-speech result of the image sub-blocks whose confidence is in the preset confidence range contains no target word information from the preset word information list, full-area optical character recognition is performed on the image corresponding to the tail video frame to obtain its optical character recognition result, which is converted into a word information result containing one or more pieces of word information. Suppose the word information result matches several pieces of target word information in the preset word information list, say "A (application)", "come to A (application), focus on hot spots", and "sweep the code using the latest version of A (application), add me as a friend", so that the number of matched pieces of target word information is 3 and the preset word information quantity threshold is 2. Because the matched number is greater than the threshold, the image to be recognized is determined to contain the target icon; specifically, all the matched target word information, or the most frequently matched target word information, may be determined as the target icon.
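Steps B and C for the full-image case can be sketched as a threshold count; the phrases follow the translated examples above and the threshold of 2 is the assumed example value.

```python
# Count how many target phrases from the preset word information list
# appear in the full-area word information result, then compare against
# the preset word information quantity threshold.

PRESET_LIST = [
    "A (application)",
    "come to A (application), focus on hot spots",
    "sweep the code using the latest version of A (application), add me as a friend",
]
THRESHOLD = 2

def contains_target_icon(word_info_result, preset=PRESET_LIST, k=THRESHOLD):
    matched = [w for w in preset if w in word_info_result]
    return len(matched) >= k, matched

# Full-area OCR word information from a tail video frame (assumed content).
word_info_result = [
    "A (application)",
    "come to A (application), focus on hot spots",
    "sweep the code using the latest version of A (application), add me as a friend",
    "unrelated caption text",
]
found, matched = contains_target_icon(word_info_result)
print(found, len(matched))   # -> True 3
```

Requiring several independent matches rather than one makes the full-image fallback robust to a single coincidental phrase in ordinary captions.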
Further, to obtain a target recognition result containing the identifier and mark position information of the target icon, after determining that the image sub-block corresponding to the confidence contains the target icon, the embodiment of the present application obtains the mark position information of that image sub-block, namely the mark position information of the target icon, and obtains the identifier of the target icon, thereby generating the corresponding target recognition result from the mark position information and the identifier of the target icon.
In addition, in the embodiment of the present application, when the confidence output by the detection model that the image sub-block contains the target icon is greater than the preset confidence range, the confidence is high enough that the image sub-block corresponding to it can be directly determined to contain the target icon, that is, the video to be detected carries the target icon (watermark mark).
Wherein, in some implementations, the embodiment of the application further includes:
(1) When the confidence is detected to be greater than the preset confidence range, determining that the image sub-block corresponding to the confidence contains the target icon;
(2) Acquiring the mark position information of the image sub-block corresponding to the confidence and acquiring the identifier of the target icon;
(3) Determining the identifier of the target icon and the mark position information as the target recognition result of the video to be detected.
The identifier of the target icon may be the identifier or mark of a certain platform, representing the mark attached to the video to be detected after it was successfully published on the corresponding platform. For example, if video 1 was previously published on sharing platform A, platform A attaches an A icon (watermark mark) to a certain area of the picture of video 1, and the icon (watermark mark) carried by video 1 is the A icon.
The mark position information may be the position information, in the image to be identified (a video frame picture of the video to be detected), of the image sub-block carrying the target icon; this position information reflects where the target icon (watermark) is located in the video to be detected.
When the confidence output by the detection model is greater than the preset confidence range, the confidence is high enough to indicate that the corresponding image sub-block contains the target icon (watermark mark), so the image sub-block corresponding to the confidence can be directly determined to contain the target icon. For example, after video 1 was previously published on sharing platform A, platform A attached an A icon (watermark mark) to a certain area of the picture of video 1. When a user downloads video 1 from platform A and publishes it to sharing platform B, thereby carrying a video published on platform A over to platform B, video 1 contains the A icon. Platform B detects whether video 1 contains a target icon (watermark mark), specifically through the trained model; if the output confidence that the image sub-block corresponding to video 1 contains the A icon is 0.98, and the preset confidence range is [0.5,0.9], the embodiment of the present application can determine that the corresponding image sub-block contains the target icon.
Further, to obtain a target recognition result containing the identifier and mark position information of the target icon, after determining that the image sub-block corresponding to the confidence contains the target icon, the embodiment of the present application likewise obtains the mark position information of that image sub-block, namely the mark position information of the target icon, and obtains the identifier of the target icon, thereby generating the corresponding target recognition result from the mark position information and the identifier of the target icon.
In addition, in the embodiment of the present application, when the confidence output by the detection model that the image sub-block contains the target icon is smaller than the preset confidence range, the confidence is low enough that the image sub-block corresponding to it can be directly determined not to contain the target icon, that is, the video to be detected does not carry the target icon (watermark mark).
In some implementations, the present embodiments further include:
A. When the confidence is detected to be smaller than the preset confidence range, determining that the image to be identified does not carry the target icon;
B. Determining that the video to be detected does not carry the target icon as the target recognition result of the video to be detected.
It should be noted that when the confidence output by the detection model is smaller than the preset confidence range, the low confidence indicates that the corresponding image sub-block does not contain the target icon (watermark mark), so it can be directly determined that the image sub-block corresponding to the confidence does not contain the target icon, that is, the video to be detected does not carry the target icon.
For example, if video 2 was never published on any other sharing platform, no area of its picture contains a target icon (watermark mark). When a user publishes video 2 to sharing platform B, because video 2 contains no target icon, platform B detects the image sub-blocks corresponding to video 2, specifically through the trained model, and the detection of whether the sub-blocks carry a target icon (watermark mark) outputs a low confidence, for example 0.2. Because this confidence is smaller than the preset confidence range [0.5,0.9], the confidence that the corresponding image sub-block contains the target icon is low, so the embodiment of the present application can determine that the corresponding image sub-block does not contain the target icon, that is, that video 2 does not carry the target icon.
Further, when the image to be identified is determined not to carry the target icon, the target identification result of the video to be detected is that the video to be detected does not carry the target icon.
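The three confidence branches described above can be pulled together into one decision sketch: below the preset confidence range the video is judged not to carry the target icon, inside it OCR verification is required, and above it the icon is confirmed directly. The return labels are assumptions for illustration.

```python
# Three-branch decision over the detection model's confidence output.

PRESET_RANGE = (0.5, 0.9)

def classify(confidence, low=PRESET_RANGE[0], high=PRESET_RANGE[1]):
    if confidence < low:
        return "no_target_icon"          # e.g. the 0.2 case of video 2
    if confidence > high:
        return "target_icon_confirmed"   # e.g. the 0.98 case of video 1
    return "needs_ocr_verification"      # e.g. the 0.8 upper-left sub-block

print(classify(0.2), classify(0.98), classify(0.8))
```

Only the middle branch incurs the cost of OCR; the two extremes are resolved immediately from the confidence alone.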
As can be seen from the above, the embodiment of the present application acquires an image to be identified from the video to be detected, cuts the preset areas of the image to be identified to obtain the image sub-blocks corresponding to the preset areas, inputs the image sub-blocks into the trained detection model to obtain the confidence that each image sub-block contains the target icon, performs optical character recognition processing on the image sub-blocks whose confidence is in the preset confidence range to obtain the optical character recognition result, and determines the target recognition result of the video to be detected from the result of the optical character recognition processing. When the video to be detected is detected, the image to be identified corresponding to it is acquired and preprocessed into image sub-blocks, and target icon identification is performed on the image sub-blocks through the trained detection model; this avoids whole-image identification of the image to be identified, reduces the image area that must be identified, and improves icon identification efficiency. After the confidence that an image sub-block contains the target icon is obtained, whether to perform optical character recognition processing is determined from that confidence, which improves the accuracy of icon identification in the video.
According to the method described in the above embodiments, examples are described in further detail below.
The icon recognition method provided by the embodiment of the application is further described below by taking watermark (icon) recognition in a video as an example.
Referring to fig. 3, fig. 3 is a flowchart illustrating further steps of the icon recognition method according to the embodiment of the present application, fig. 4 is a schematic flow diagram of the icon recognition method according to the embodiment of the present application, fig. 5 is a schematic view of sample images carrying sample icons according to the embodiment of the present application, and fig. 6 is a schematic view of a scene of the icon recognition method according to the embodiment of the present application; for convenience of understanding, the embodiment of the present application will be described with reference to fig. 3, fig. 4, fig. 5 and fig. 6.
In the embodiment of the present application, description will be made from the viewpoint of an icon recognition apparatus, which may be integrated in a computer device such as a terminal or a server. When a processor on a terminal or a server executes a program corresponding to the icon identification method, the specific flow of the icon identification method is as follows:
201. and acquiring an image to be identified from the video to be detected.
The video to be detected may be a video that a user requests to release to a target sharing platform and that the target sharing platform is responsible for checking and detecting. If the video to be detected was downloaded by the requesting user from another platform, that is, it is not the requesting user's original creation, the video to be detected may carry watermarks, identifications, tags or marks of the corresponding platform, so the video to be distributed needs to be checked.
The image to be identified may be an image corresponding to a certain video frame in the video to be detected. For example, when a user requests to publish a short video to a target sharing platform, such as a video platform, the target sharing platform detects the short video by acquiring a target video frame in the video and determining that frame to be the image to be identified, so that icons in the image, such as watermarks, marks or tags, can be identified later.
In order to audit the video to be detected, the embodiment of the application identifies images corresponding to video frames in the video to be detected. Since the video to be detected includes many frames, in order to reduce the workload of detection, the present embodiment selects a small number of frames as images to be identified. Specifically, when a video to be detected is received, target image frames may be extracted as follows: extract all video frames in the video to be detected to obtain a video frame set corresponding to the video to be detected; order the video frames in the set according to their time sequence in the video to obtain a video frame sequence; and obtain the target image frames corresponding to target time sequence positions in the video frame sequence. The target image frame corresponding to a target time sequence position may be the frame at the first, middle or tail position of the video frame sequence, or may be the three frames at the first, middle and tail positions respectively, which is not limited herein.
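The frame-selection strategy above (first, middle and tail frames of the ordered sequence) can be sketched as follows. This is a minimal illustration: the function name is hypothetical, frames are represented as a plain list, and a real system would decode them with a video library.

```python
# Illustrative sketch: select the first, middle and tail frames of an ordered
# frame sequence as the images to be identified.
def select_target_frames(frames):
    """Return the first, middle and last frames of an ordered frame sequence."""
    if not frames:
        return []
    first = 0
    middle = len(frames) // 2
    last = len(frames) - 1
    # Deduplicate indices for very short videos (1 or 2 frames).
    indices = sorted(set([first, middle, last]))
    return [frames[i] for i in indices]

frames = [f"frame_{i}" for i in range(10)]
print(select_target_frames(frames))  # ['frame_0', 'frame_5', 'frame_9']
```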
202. Cutting a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area.
The preset area may be a certain area or position in the image to be identified, such as an image area of upper left, lower left, upper right, lower right, middle, etc. in the image to be identified.
Since a video released by a platform usually carries its watermark (icon) in a corner region of the video picture, such as the upper left or upper right corner, watermark (icon) detection may be performed only on specific areas of the image to be identified in order to improve detection efficiency. Specifically, after the corresponding image to be identified is obtained from the video to be detected, the embodiment of the application preprocesses the preset area of the image to be identified to obtain the image sub-block corresponding to the preset area.
It should be noted that the preprocessing method of the image to be identified may be that a preset area of the image to be identified is cut, captured or intercepted to obtain a corresponding image sub-block. The preprocessing process comprises the steps of obtaining the image size of an image to be recognized, determining the preset area size of the image to be recognized according to a preprocessing rule based on the size of the image to be recognized, and intercepting the preset area in the image to be recognized according to the preset area size to obtain an image sub-block. The preprocessing rule may include a preset area position of the image to be processed, a size proportion of the preset area in the image to be identified, and the like. For example, in the preprocessing rule, the preset area includes an area of upper left, lower left, upper right, lower right, middle, etc. of the image to be recognized, and the size ratio of the area in the image to be recognized is 1/5, and then the upper left, lower left, upper right, lower right, middle area in the image to be recognized is intercepted according to the size ratio, so as to obtain image sub-blocks corresponding to the areas. The above is merely an example and is not intended to be limiting herein.
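The cropping rule above can be sketched as follows, using the 1/5 size ratio from the example. The region names, the function name and the exact cropping arithmetic are illustrative assumptions, not the patent's specification.

```python
# Hedged sketch: cut the preset corner/middle regions out of an image to
# obtain image sub-blocks. The image is a nested list (rows of pixels).
def crop_preset_regions(image, ratio=1 / 5):
    h, w = len(image), len(image[0])
    rh, rw = max(1, int(h * ratio)), max(1, int(w * ratio))
    regions = {
        "upper_left":  (0, 0),
        "upper_right": (0, w - rw),
        "lower_left":  (h - rh, 0),
        "lower_right": (h - rh, w - rw),
        "middle":      ((h - rh) // 2, (w - rw) // 2),
    }
    # Slice out each preset region as its own sub-block.
    return {
        name: [row[x:x + rw] for row in image[y:y + rh]]
        for name, (y, x) in regions.items()
    }

# Toy "image" where each pixel stores its (row, col) coordinate.
image = [[(r, c) for c in range(100)] for r in range(50)]
blocks = crop_preset_regions(image)
print(len(blocks["upper_left"]), len(blocks["upper_left"][0]))  # 10 20
```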
In this way, the image to be identified is preprocessed according to the preprocessing rule, which improves the accuracy of the image sub-block corresponding to the preset area and ensures as far as possible that the target icon (watermark mark) to be detected is located inside the image sub-block. At the same time, the size of the image sub-block is kept as small as possible, which reduces the image size handled during subsequent icon recognition of the sub-block and improves its efficiency.
203. And inputting the image sub-blocks into the trained detection model to obtain the confidence that the image sub-blocks contain the target icons.
The trained detection model is obtained by joint training on sample image sub-blocks of a preset area in sample images and the sample confidences that those sub-blocks contain sample icons. The training process of the preset model to obtain the trained detection model includes: obtaining a sample image; cutting the sample image to obtain a sample image sub-block carrying a sample icon; obtaining the sample confidence that the sample icon is present in the sample image sub-block; inputting the sample image sub-block into the preset model to obtain a predicted confidence that the sample image sub-block contains the sample icon; obtaining the confidence difference between the sample confidence and the predicted confidence; and iteratively training the network parameters of the preset model according to the confidence difference until the confidence difference converges, yielding the trained detection model.
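The iterate-until-the-confidence-difference-converges loop above can be sketched with a toy stand-in model. This is only a minimal illustration of the training principle: a logistic model replaces the real YOLO-style detector, the synthetic features and labels are assumptions, and binary cross-entropy stands in for the unspecified confidence-difference loss.

```python
import numpy as np

# Toy sketch of the confidence-training loop: predicted confidence vs.
# sample confidence, with iterative parameter updates until the loss is low.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))            # toy sub-block features (assumption)
y = (X[:, 0] > 0).astype(float)         # toy sample confidences: 1 = has icon
w = np.zeros(8)                         # toy model parameters

for step in range(500):
    p = 1 / (1 + np.exp(-X @ w))        # predicted confidence (sigmoid)
    bce = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = X.T @ (p - y) / len(y)       # gradient of the confidence loss
    w -= 0.5 * grad                     # iterative training of parameters

print(float(bce))                       # final loss after convergence
```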
The sample image may be a synthetic image. Because real sample data of watermarked videos is complex to obtain, the embodiment of the application fits real data by data synthesis, that is, by generating sample images containing sample icons. During data synthesis, the size and position of the sample icon (watermark mark) can be randomized; for example, the icon is placed in any preset area of the sample image, such as one or several of the upper left, lower left, upper right, lower right and middle image areas, and the icon size is random but smaller than the preset area, the size of the icon not being limited herein.
For example, referring to fig. 5, when a sample image is synthesized by data synthesis, a sample icon or watermark of an existing sharing platform or enterprise can be obtained and composited with a preselected image, ensuring randomness of the size and position of the sample icon (watermark) and applying random clipping or occlusion to it, so as to obtain the sample image. As shown in fig. 5, each sample image contains one sample icon, and both the position of the icon and the icon itself differ between sample images: for example, the sample icon "A" in the top two scenery images is located in the upper left and upper right areas of the sample image respectively, the sample icon "B" in the video frame images corresponding to the two middle game images is located in the middle area, and the icon "C" in the sample images corresponding to the two prize-awarding-ceremony video frames is located in the lower left and lower right areas respectively. It should be noted that the icon positions shown in fig. 5 are only examples; in the actual training process, the positions of the sample icons in the synthesized sample images may be determined according to the actual situation.
In addition, data enhancement processing such as blurring, occlusion, brightness and color changes, and noise addition can be applied to the preselected image before synthesis, or to the synthesized sample image, so that further sample images can be obtained for training the model.
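The synthesis procedure above can be sketched as follows: paste an "icon" patch at a randomly chosen preset region with a random size smaller than the region, then apply a simple noise augmentation. The patch values, sizes and the noise scale are toy assumptions for illustration only.

```python
import numpy as np

# Hedged sketch of synthesizing one training sample with a randomly placed,
# randomly sized synthetic "icon", followed by a noise augmentation step.
def synthesize_sample(rng, h=64, w=64):
    image = rng.uniform(size=(h, w))
    side = int(rng.integers(4, h // 5 + 1))       # icon smaller than region
    corners = [(0, 0), (0, w - side), (h - side, 0),
               (h - side, w - side), ((h - side) // 2, (w - side) // 2)]
    y, x = corners[rng.integers(len(corners))]    # random preset region
    image[y:y + side, x:x + side] = 1.0           # the synthetic "icon"
    image += rng.normal(scale=0.01, size=image.shape)  # noise augmentation
    return image, (y, x, side)

rng = np.random.default_rng(1)
img, (y, x, side) = synthesize_sample(rng)
print(img.shape, side)
```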
The detection model may be a model including a feature extraction layer (Darknet-53), a convolutional network layer (Convolutional Set), a classification layer (sigmoid), and the like. When the detection model receives an image sub-block, feature extraction is carried out on the image sub-block through a feature extraction layer (Darknet-53) in the detection model to obtain an image sub-feature corresponding to the image sub-block, convolution processing is carried out on the image sub-feature through a convolution network layer (Convolutional Set) in the detection model to obtain a prediction feature map, the dimension of the prediction feature map is obtained through a classification layer (Sigmoid), and the dimension of the prediction feature map is decoded to obtain the confidence that the image sub-block contains a target icon.
Specifically, the convolutional network layer (Convolutional Set) includes a first convolutional network layer, a second convolutional network layer, and a third convolutional network layer.
The first convolutional network layer includes 3×3 convolution layers and 1×1 convolution layers. The image sub-features are convolved through the 3×3 and 1×1 convolution layers of the first convolutional network layer to obtain the first prediction feature map.
The second convolutional network layer includes a 1×1 convolution layer, an upsampling layer, a fusion layer, a 3×3 convolution layer and a 1×1 convolution layer. The image sub-features are convolved through the 1×1 convolution layer of the second convolutional network layer to obtain a first convolution feature result; the first convolution feature result is upsampled through the upsampling layer to obtain a sampled first feature result; the sampled first feature result is fused with the 26×26 feature result output by the feature extraction layer (Darknet-53) through the fusion layer to obtain a first fusion feature; and the first fusion feature is convolved through the 3×3 and 1×1 convolution layers to obtain the second prediction feature map.
The third convolutional network layer likewise includes a 1×1 convolution layer, an upsampling layer, a fusion layer, a 3×3 convolution layer and a 1×1 convolution layer. The first fusion feature is convolved through the 1×1 convolution layer to obtain a second convolution feature result; the second convolution feature result is upsampled through the upsampling layer to obtain a sampled second feature result; the sampled second feature result is fused with the 52×52 feature result output by the feature extraction layer (Darknet-53) through the fusion layer to obtain a second fusion feature; and the second fusion feature is convolved sequentially through the 3×3 and 1×1 convolution layers to obtain the third prediction feature map.
Further, the first prediction feature map, the second prediction feature map and the third prediction feature map are together determined as the prediction feature maps.
Specifically, the process of generating the confidence value by the classification layer (Sigmoid) is as follows:
The dimension of the prediction feature map is obtained as N×N×[3×(4+1+C)], where N is the size of the output feature map, there are 3 anchor boxes in total, each anchor has 4-dimensional prediction box values, namely tx, ty, tw and th, plus a 1-dimensional prediction box confidence and the C-dimensional number of object categories. The small-size feature map is used for detecting large objects and the large-size feature map is used for detecting small objects, which improves the detection performance of the model on targets of different sizes.
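The output-tensor dimension above can be computed concretely. As an illustration only, the grid sizes 13, 26 and 52 are the usual YOLO-style scales for a 416×416 input (the 26×26 and 52×52 fusion results appear earlier in the text), and C = 1 assumes a single icon class.

```python
# Sketch: the N x N x [3 x (4 + 1 + C)] head shape for the three scales.
def head_shape(n, num_classes):
    anchors = 3                       # 3 anchor boxes per grid cell
    per_anchor = 4 + 1 + num_classes  # tx, ty, tw, th + confidence + classes
    return (n, n, anchors * per_anchor)

for n in (13, 26, 52):
    print(head_shape(n, num_classes=1))  # (n, n, 18) for C = 1
```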
The process of decoding the dimension of the prediction feature map to obtain the confidence that the image sub-block contains the target icon may be as follows:
the prediction feature map is decoded through the sigmoid activation function to obtain the confidence; the decoded confidence value lies in the interval (0, 1), such as 0.1, 0.3, 0.5, 0.8, 0.95, and the like.
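The sigmoid decoding of a raw confidence logit can be shown in two lines; the function name is illustrative, but the mapping into (0, 1) is exactly the property the text relies on.

```python
import math

# Decode a raw confidence logit with the sigmoid activation: the result
# always falls in the open interval (0, 1).
def decode_confidence(logit):
    return 1 / (1 + math.exp(-logit))

for logit in (-2.0, 0.0, 3.0):
    print(decode_confidence(logit))
```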
In the embodiment of the application, the confidence level can be a confidence value corresponding to an icon event contained in a certain area in the image to be identified, and the confidence value is used for reflecting the confidence level or probability that the target icon is contained in the certain area or position in the image to be identified.
Referring to fig. 6, fig. 6 reflects the processing strategy for the target icon (watermark mark) when the confidence falls in different confidence ranges. In the embodiment of the application, two thresholds, thresh1 and thresh2, are set for the preset confidence range. Optical character recognition (OCR) is performed on the image sub-block or the image to be identified when the confidence output by the detection model is in [thresh1, thresh2]; the video to be detected is determined to contain the target icon (watermark mark) when the confidence output by the detection model is in [thresh2, 1.0]; and the video to be detected is determined not to contain the target icon (watermark mark) when the confidence output by the detection model is in [0, thresh1]. Specifically, steps 204-205 are performed when the confidence output by the detection model is within the preset confidence range, steps 206-208 are performed when the confidence is greater than the preset confidence range, and steps 209-210 are performed when the confidence is less than the preset confidence range.
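The three-way routing on the confidence can be sketched as follows, with the example thresholds 0.5 and 0.9 from the text. How the exact boundary values thresh1 and thresh2 are assigned is an assumption here, since the intervals in the text share their endpoints.

```python
# Sketch of routing the detection-model confidence to one of the three
# processing branches described above (steps 204-205, 206-208, 209-210).
def route(confidence, thresh1=0.5, thresh2=0.9):
    if confidence > thresh2:
        return "contains_target_icon"   # above the preset range: steps 206-208
    if confidence >= thresh1:
        return "run_ocr"                # inside the preset range: steps 204-205
    return "no_target_icon"             # below the preset range: steps 209-210

print(route(0.98), route(0.8), route(0.2))  # contains_target_icon run_ocr no_target_icon
```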
204. When the confidence coefficient is detected to be in the preset confidence coefficient range, performing optical character recognition processing on the image sub-blocks in the preset confidence coefficient range to obtain an optical character recognition result.
The preset confidence range may be a preset value range corresponding to the confidence, and is used for distinguishing the situation that the icon (watermark) is to be identified, and limiting the image sub-block corresponding to the confidence in the value range to perform optical character identification processing.
Wherein optical character recognition (Optical Character Recognition, OCR) may be a process of determining the shapes of characters by detecting dark and light patterns and then translating those shapes into computer text by character recognition; that is, the icons in the image sub-block or image are optically converted into a corresponding image file, and the text in the image is converted into text format by recognition software for further editing or other processing by word-processing software.
In order to improve accuracy in detecting icons (watermark marks), assume for example that the value range of the confidence is [0,1], the embodiment of the application sets the preset confidence range to [0.5,0.9], and the image sub-blocks input to the detection model include those corresponding to the upper left, lower left, upper right, lower right and middle areas of the image to be identified. If the sub-block of the upper left area has the largest confidence of containing the target icon, say 0.8 (the confidences of the other sub-blocks may be 0 or less than 0.5), then because the confidence of the upper-left sub-block is within the preset confidence range [0.5,0.9], optical character recognition is performed on that sub-block to convert the character information contained in its icon into text information, thereby obtaining the optical character recognition result.
It should be noted that, the optical character recognition result includes word information or text, and then it is determined whether the image sub-block includes the target icon or not through the optical character recognition result including the word information or text, so as to determine whether the video to be detected includes the target icon (watermark mark).
205. And determining a target recognition result of the video to be detected according to the result of the optical character recognition processing.
The target identification result is a result generated by carrying out icon identification on the video to be detected. When the image sub-block is detected to contain the target icon, the target identification result can comprise mark position information, identification and the like of the target icon in the image to be identified or the video frame picture.
In some embodiments, a target recognition result of a video to be detected is determined according to a result of optical character recognition processing, and the method includes the steps of recognizing the optical character recognition result according to part-of-speech information to obtain the part-of-speech result, determining the target word information as a target icon when detecting that the part-of-speech result contains target word information in a preset word information list, acquiring a mark of the target icon, acquiring mark position information corresponding to the target icon in an image sub-block, and determining the mark of the target icon and the mark position information as target recognition results.
The part-of-speech result is matched against word characteristics in the preset word information list. When word information in the preset word information list matches word information in the part-of-speech result — for example, "A video" in the preset word information list matches "A video" in the part-of-speech result — "A video" is determined to be the target icon, and the obtained identification corresponding to the target icon may be "A video", "A", and the like. Further, the position information of the image sub-block corresponding to the confidence is identified in the image to be identified, and that position information is determined to be the mark position information of the target icon.
In addition, the tail-frame video frame of the video to be detected may contain relatively more word information associated with the target icon, for example icons (watermark marks) of other platforms, user IDs, publicity phrases, and the like. In the embodiment of the application, when the part-of-speech result of an image sub-block whose confidence is within the preset confidence range does not contain target word information from the preset word information list, full-area optical character recognition is performed on the image corresponding to the tail-frame video frame of the video to be detected to obtain the optical character recognition result corresponding to the tail frame, and that result is converted into word information results containing one or more items of word information. When the word information results match a plurality of target word information items in the preset word information list — for example, three matched items such as "A (application)" and "Lai A (application)", so that the number of matched target word information items is 3 — and that number is greater than or equal to a preset threshold, for example 2, it is determined that the image contains the target icon. Further, the mark position information of the image sub-block corresponding to the confidence, namely the mark position information of the target icon, is obtained, and the identification of the target icon, such as "A (application)", is obtained, so that the corresponding target recognition result is generated according to the mark position information and the identification of the target icon.
That is, when the part-of-speech result is detected not to contain target word information from the preset word information list, full-area optical character recognition is performed on the image to be identified corresponding to the tail-frame video frame of the video to be detected to obtain the optical character recognition result for the full area; the word information in that result is identified to determine the number of corresponding word information items; and when that number is greater than or equal to a preset word-information-count threshold, it is determined that the image to be identified contains the target icon, and the target recognition result is generated according to the mark position information and the identification of the target icon. In this way, a video sharer is prevented from cutting away video frames other than the tail frame to remove the target icon (watermark mark), and identifying the image corresponding to the tail frame improves the accuracy of identifying icons (watermark marks) in the video.
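The word-matching rule with a count threshold can be sketched as follows. The function name, the substring-matching criterion and the sample words are illustrative assumptions; the patent does not specify the exact matching algorithm.

```python
# Hedged sketch: match OCR word results against a preset target-word list;
# if the number of matched items reaches the threshold, judge that the
# target icon is present (the tail-frame rule described above).
def contains_target_icon(ocr_words, target_words, count_threshold=2):
    matched = [w for w in ocr_words if any(t in w for t in target_words)]
    return len(matched) >= count_threshold, matched

ocr_words = ["A app", "user @A", "come to A for more"]
hit, matched = contains_target_icon(ocr_words, target_words=["A"])
print(hit, len(matched))  # True 3
```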
206. When the confidence coefficient is detected to be larger than the preset confidence coefficient range, determining that the image sub-block corresponding to the confidence coefficient contains the target icon.
The identification of the target icon may be the identifier or mark of a certain platform, representing the mark given to the video after it has been successfully issued by the corresponding platform. For example, when video 1 was previously released on the A sharing platform, the A sharing platform attaches an A icon (watermark mark) to a certain area of the picture of video 1, and the icon (watermark mark) carried by video 1 is the A icon.
207. And acquiring the mark position information of the image sub-block corresponding to the confidence coefficient, and acquiring the identification of the target icon.
208. And determining the identification of the target icon and the mark position information as a target identification result of the video to be detected.
Steps 206-208 are performed to generate a target recognition result of the video to be detected when the confidence level is detected to be greater than the preset confidence level range.
For example, when the confidence output by the detection model is greater than the preset confidence range, the confidence is high, which indicates that the corresponding image sub-block contains the target icon (watermark mark), so the image sub-block corresponding to that confidence can be directly determined to contain the target icon. For example, after video 1 has previously been published on the A sharing platform, the A sharing platform attaches an A icon (watermark mark) to a certain area of the picture of video 1. When a user downloads video 1 from the A sharing platform and publishes it to the B sharing platform, thereby carrying the video published on the A sharing platform over to the B sharing platform, the B sharing platform detects whether video 1 contains a target icon (watermark mark) because video 1 includes the A icon. Specifically, detection through the trained model outputs a confidence of 0.98 that the image sub-block corresponding to video 1 contains the A icon, and since 0.98 exceeds the preset confidence range [0.5,0.9], the embodiment of the application can determine that the corresponding image sub-block contains the target icon.
Further, the target identification result of the video to be detected is generated by acquiring the mark position information and the mark of the target icon.
209. And when the confidence coefficient is detected to be smaller than the preset confidence range, determining that the image to be identified does not carry the target icon.
It should be noted that, when the confidence coefficient of the output of the detection model is smaller than the preset confidence range, the confidence coefficient is lower, which indicates that the corresponding image sub-block does not include the target icon (watermark mark), so that it can be directly determined that the image sub-block corresponding to the confidence coefficient does not include the target icon, that is, the video to be detected does not carry the target icon.
210. And determining the target icon which is not carried with the target icon as a target recognition result of the video to be detected.
For example, when video 2 has not previously been distributed on any other sharing platform, no region in the picture of video 2 contains a target icon (watermark mark). When a user distributes video 2 to the B sharing platform, because video 2 does not contain any target icon, the B sharing platform detects the image sub-block corresponding to video 2, specifically through the trained model, and when checking whether the image sub-block carries the target icon (watermark mark), outputs a lower confidence, for example 0.2. Because this confidence is below the preset confidence range [0.5,0.9], it indicates that the image sub-block contains the target icon with low confidence, so the embodiment of the application can determine that the corresponding image sub-block does not contain the target icon, that is, that video 2 does not carry the target icon.
The steps 201 to 210 are executed to implement the flow shown in fig. 4, and specifically, the flow of the icon identifying method shown in fig. 4 is as follows:
301. And extracting the first, middle and last three frames of video frames from the video to be detected as images to be identified.
302. Cutting a preset area of the image to be identified to obtain a corresponding image sub-block.
303. Inputting the image sub-blocks into a detection model, and acquiring the confidence that the image sub-blocks contain the target icons through the detection model.
304. Whether the confidence coefficient is within a preset confidence coefficient range is determined, if the confidence coefficient is within the preset confidence coefficient range, the process 305 is executed, and if the confidence coefficient is not within the preset confidence coefficient range, the process 310 is executed.
305. And performing optical character recognition on the image sub-blocks in the preset confidence coefficient range to obtain an optical character recognition result corresponding to the image sub-blocks.
306. Detect whether the optical character recognition result corresponding to the image sub-block contains word information corresponding to the target icon; if the corresponding word information is detected, determine that the video contains the target icon (watermark mark), and if the corresponding word information is not detected, execute flow 307.
307. And carrying out full-image optical character recognition on the image to be recognized corresponding to the tail frame to obtain an optical character recognition result corresponding to the full image of the image to be recognized.
308. And detecting whether the optical character recognition result corresponding to the full graph of the image to be recognized contains word information corresponding to the target icon, wherein if the word information corresponding to the target icon is detected to be contained, executing a flow 309, and if the word information corresponding to the target icon is not detected to be contained, determining that the video does not contain the target icon (watermark mark).
309. Judge whether the video contains the target icon (watermark mark) according to the number of detected word information items: if the number of detected corresponding word information items is greater than or equal to the preset word-information-count threshold, determine that the video contains the target icon (watermark mark); if it is smaller than the threshold, determine that the video does not contain the target icon (watermark mark).
It should be noted that, for the specific implementation of the processes 301-309, reference is made to the previous embodiments, and details are not repeated here.
From the above, it can be seen that the embodiment of the application can acquire an image to be identified from a video to be detected, cut a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area, input the image sub-block into a trained detection model to obtain the confidence that the image sub-block contains a target icon, perform optical character recognition processing on the image sub-block when the confidence is detected to be within the preset confidence range to obtain an optical character recognition result, and determine the target recognition result of the video to be detected according to the result of the optical character recognition processing. When the video to be detected is examined, the image to be identified corresponding to the video to be detected is obtained, the image to be identified is preprocessed to obtain the image sub-block, and target icon recognition is carried out on the image sub-block through the trained detection model. This avoids whole-image recognition of the image to be identified, reduces the amount of image area to be recognized and improves icon recognition efficiency; and after the confidence that the image sub-block contains the target icon is obtained, whether optical character recognition processing is performed is determined according to that confidence, which improves accuracy when identifying icons in video.
In order to better implement the above method, an embodiment of the present application further provides an icon recognition apparatus, which may be integrated into a computer device such as a server or a terminal; the terminal may include a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in fig. 7, the icon recognition apparatus may include an acquisition unit 401, a preprocessing unit 402, an input unit 403, a recognition unit 404, and a determination unit 405.
An acquiring unit 401, configured to acquire an image to be identified from a video to be detected;
A preprocessing unit 402, configured to cut a preset area of an image to be identified, to obtain an image sub-block corresponding to the preset area;
An input unit 403, configured to input the image sub-block to the trained detection model, to obtain a confidence that the image sub-block includes the target icon;
The recognition unit 404 is configured to perform optical character recognition processing on the image sub-block when the confidence is detected to be in the preset confidence range, so as to obtain an optical character recognition result;
a determining unit 405, configured to determine a target recognition result of the video to be detected according to a result of the optical character recognition processing.
In some embodiments, the determining unit 405 is further specifically configured to: recognize the optical character recognition result according to part-of-speech information to obtain a part-of-speech result; when the part-of-speech result is detected to contain target word information in a preset word information list, determine the target word information as a target icon and acquire the identifier of the target icon; acquire the mark position information corresponding to the target icon in the image sub-block; and determine the identifier of the target icon and the mark position information as the target recognition result.
In some embodiments, the determining unit 405 is further configured to:
when the part-of-speech result is detected not to contain target word information in the preset word information list, performing optical character recognition processing on the whole region of the image to be recognized to obtain an optical character recognition result corresponding to the whole region, and recognizing the optical character recognition result corresponding to the whole region according to the part-of-speech information to obtain a word information result;
Matching the word information result with target word information in a preset word information list;
When the number of the matched target word information is greater than or equal to a preset word information number threshold, determining the matched target word information as a target icon in the image to be identified;
acquiring the identification of a target icon and acquiring the mark position information of the target icon in an image to be identified;
And determining the identification of the target icon and the mark position information as target identification results.
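The two-stage matching described above (first the image sub-block, then, on a miss, the whole region of the image to be identified) can be sketched as below. The function names, the lazy `whole_image_ocr` callable, and the default count threshold are illustrative assumptions:

```python
def identify_icon(subblock_words, whole_image_ocr, word_list, count_threshold=2):
    """Two-stage word matching sketch.

    subblock_words:  OCR word result from the image sub-block.
    whole_image_ocr: callable returning OCR words for the whole image region,
                     invoked only when the sub-block stage misses.
    word_list:       preset word information list.
    count_threshold: preset word information number threshold (assumed).
    """
    # Stage 1: check the sub-block OCR result against the preset word list.
    hits = [w for w in subblock_words if w in word_list]
    if hits:
        return {"icon": hits[0], "source": "sub-block"}
    # Stage 2: fall back to OCR over the whole region and count the matches.
    whole_hits = [w for w in whole_image_ocr() if w in word_list]
    if len(whole_hits) >= count_threshold:
        return {"icon": whole_hits[0], "source": "whole image"}
    return None  # no target icon found
```

Passing the whole-image OCR as a callable keeps the expensive full-region pass from running when the sub-block already yields a match.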
In some embodiments, the input unit 403 is further specifically configured to: input the image sub-block into the trained detection model; perform feature extraction on the image sub-block through a feature extraction layer in the detection model to obtain image sub-features; perform convolution processing on the image sub-features through a convolution layer in the detection model to obtain a prediction feature map; and decode the dimensions of the prediction feature map through a classification layer in the detection model to obtain the confidence that the image sub-block contains the target icon.
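A toy forward pass mirroring the three layers just named (feature extraction, convolution, classification) is sketched below. The layer shapes, the 2x2 kernel, and the sigmoid classifier are illustrative stand-ins, not the patented detection model:

```python
import math

def detect_confidence(subblock, conv_kernel, weight, bias):
    """Minimal feature-extraction -> convolution -> classification sketch.

    subblock:    2-D list of 8-bit pixel values (the cut image sub-block).
    conv_kernel: 2x2 kernel standing in for the convolution layer.
    weight/bias: parameters of the toy classification layer.
    """
    # "Feature extraction": normalize pixel values into [0, 1].
    feats = [[px / 255.0 for px in row] for row in subblock]
    # "Convolution": valid 2x2 convolution producing the prediction feature map.
    h, w = len(feats) - 1, len(feats[0]) - 1
    fmap = [[sum(feats[i + di][j + dj] * conv_kernel[di][dj]
                 for di in range(2) for dj in range(2))
             for j in range(w)] for i in range(h)]
    # "Classification layer": pool the feature map and squash to a confidence.
    pooled = sum(map(sum, fmap)) / (h * w)
    return 1.0 / (1.0 + math.exp(-(weight * pooled + bias)))
```

The sigmoid at the end guarantees the returned confidence lies in (0, 1), which is what makes the later comparison against a preset confidence range well defined.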
In some embodiments, the determining unit 405 is further specifically configured to: when the confidence is detected to be greater than the preset confidence range, determine that the image sub-block corresponding to the confidence contains the target icon; acquire the mark position information of that image sub-block and the identifier of the target icon; and determine the identifier of the target icon and the mark position information as the target recognition result of the video to be detected.
In some embodiments, the determining unit 405 is further configured to: when the confidence is detected to be less than the preset confidence range, determine that the image to be identified does not carry the target icon, and take this as the target recognition result of the video to be detected.
In some embodiments, the icon recognition device further comprises a training unit, specifically configured to:
Acquiring a sample image, wherein a preset area in the sample image carries a sample icon;
Cutting the sample image to obtain a sample image sub-block carrying the sample icon, and obtaining a sample confidence coefficient of the sample icon contained in the sample image sub-block;
inputting the sample image sub-block into a preset model to obtain a prediction confidence that the sample image sub-block contains a sample icon;
Obtaining a confidence coefficient difference between the sample confidence coefficient and the prediction confidence coefficient;
and carrying out iterative training on the network parameters of the preset model according to the confidence coefficient difference until the confidence coefficient difference converges, and obtaining a trained detection model.
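The training steps above can be sketched as the loop below: predict a confidence for each sample sub-block, measure the confidence difference against the sample confidence, and iterate until that difference converges. The one-parameter logistic model, the learning rate, and the convergence tolerance are illustrative assumptions standing in for the real preset model:

```python
import math

def train_detector(samples, lr=0.5, tol=1e-4, max_iter=10000):
    """Iterative training sketch on the confidence difference.

    samples: (feature, sample_confidence) pairs, where sample_confidence
             is the labeled confidence that the sub-block carries the icon.
    """
    w = 0.0                       # network parameter of the preset model
    prev_loss = float("inf")
    for _ in range(max_iter):
        loss, grad = 0.0, 0.0
        for x, sample_conf in samples:
            pred_conf = 1.0 / (1.0 + math.exp(-w * x))  # prediction confidence
            diff = pred_conf - sample_conf              # confidence difference
            loss += diff * diff
            grad += 2 * diff * pred_conf * (1 - pred_conf) * x
        if abs(prev_loss - loss) < tol:  # confidence difference has converged
            break
        prev_loss = loss
        w -= lr * grad                   # iterative parameter update
    return w
```

After training on one positive and one negative sample, the learned parameter pushes the predicted confidence above 0.5 for the positive feature.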
In some embodiments, the obtaining unit 401 is further configured to receive the video to be detected, extract a target image frame in the video to be detected, and determine the target image frame as the image to be identified.
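One simple way to pick the target image frames is uniform temporal sampling, sketched below. Sampling one frame per interval is an assumption for illustration; the embodiment only requires extracting target frames from the video to be detected:

```python
def target_frame_indices(total_frames, fps, interval_s=1.0):
    """Indices of target image frames to use as images to be identified.

    total_frames: frame count of the video to be detected.
    fps:          frame rate of the video.
    interval_s:   sampling interval in seconds (assumed value).
    """
    step = max(1, int(fps * interval_s))   # frames between two samples
    return list(range(0, total_frames, step))
```

A 100-frame clip at 25 fps sampled once per second yields frames 0, 25, 50, and 75.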
From the above, it can be seen that in the embodiment of the present application, an image to be identified may be obtained from a video to be detected through the obtaining unit 401; a preset area of the image to be identified is cut through the preprocessing unit 402 to obtain an image sub-block corresponding to the preset area; the image sub-block is input into a trained detection model through the input unit 403 to obtain the confidence that the image sub-block contains the target icon; the recognition unit 404 performs optical character recognition processing on the image sub-block when the confidence is detected to be in the preset confidence range, to obtain an optical character recognition result; and the determining unit 405 determines the target recognition result of the video to be detected according to the optical character recognition result. When the video to be detected is detected, the image to be identified corresponding to the video to be detected is obtained and preprocessed (cut) to obtain the image sub-block, and target icon recognition is performed on the image sub-block through the trained detection model. This avoids whole-image recognition of the image to be identified, reduces the amount of image area to be recognized, and improves icon recognition efficiency; further, after the confidence that the image sub-block contains the target icon is obtained, whether to perform optical character recognition processing on the image to be identified is determined according to the confidence, which improves the accuracy of icon recognition in the video.
The embodiment of the application also provides a computer device, as shown in fig. 8, which shows a schematic structural diagram of the computer device according to the embodiment of the application, specifically:
The computer device may include a processor 501 with one or more processing cores, a memory 502 with one or more computer-readable storage media, a power supply 503, and an input unit 504, among other components. Those skilled in the art will appreciate that the computer device structure shown in FIG. 8 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The processor 501 is the control center of the computer device; it uses various interfaces and lines to connect the various parts of the entire computer device, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 502 and invoking the data stored in the memory 502, thereby monitoring the computer device as a whole. Optionally, the processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interfaces, application programs, etc., and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules; the processor 501 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created according to the use of the computer device, etc. In addition, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The computer device further includes a power supply 503 for powering the various components, and preferably the power supply 503 may be logically coupled to the processor 501 via a power management system such that functions such as charge, discharge, and power consumption management are performed by the power management system. The power supply 503 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The computer device may also include an input unit 504, which input unit 504 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit or the like, which is not described herein. In particular, in the embodiment of the present application, the processor 501 in the computer device loads executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 501 executes the application programs stored in the memory 502, so as to implement various functions, as follows:
The method comprises: obtaining an image to be identified from a video to be detected; cutting a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area; inputting the image sub-block into a trained detection model to obtain the confidence that the image sub-block contains a target icon; when the confidence is detected to be in a preset confidence range, performing optical character recognition processing on the image sub-block to obtain an optical character recognition result; and determining the target recognition result of the video to be detected according to the result of the optical character recognition processing.
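The end-to-end flow that the processor executes can be sketched as the function below. The crop region, the confidence range bounds, and the injected `model_confidence` and `ocr` callables are illustrative assumptions, not the claimed implementation:

```python
def recognize_icon(frame, crop_region, model_confidence, ocr,
                   conf_range=(0.3, 0.9)):
    """Pipeline sketch: cut the preset area, score it with the detection
    model, and fall back to OCR only inside the preset confidence range."""
    x0, y0, x1, y1 = crop_region
    # Cut the preset area to obtain the image sub-block.
    subblock = [row[x0:x1] for row in frame[y0:y1]]
    conf = model_confidence(subblock)
    if conf > conf_range[1]:
        return "target icon present"     # confident detection, no OCR needed
    if conf < conf_range[0]:
        return "no target icon"          # confident rejection, no OCR needed
    # Ambiguous confidence: resolve with optical character recognition.
    return "target icon present" if ocr(subblock) else "no target icon"
```

Gating OCR on the confidence range is what keeps the expensive character recognition off the clear-cut cases.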
The specific implementation of each operation may be referred to the previous embodiments, and will not be described herein.
From the above, it can be seen that the embodiment of the present application can obtain an image to be identified from a video to be detected, cut a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area, input the image sub-block into a trained detection model to obtain the confidence that the image sub-block contains a target icon, perform optical character recognition processing on the image sub-block when the confidence is detected to be in a preset confidence range to obtain an optical character recognition result, and determine the target recognition result of the video to be detected according to the result of the optical character recognition processing. When the video to be detected is detected, the image to be identified corresponding to the video to be detected is obtained and cut to obtain the image sub-block, and target icon recognition is performed on the image sub-block through the trained detection model. This avoids whole-image recognition of the image to be identified, reduces the amount of image area to be recognized, and improves icon recognition efficiency; further, after the confidence that the image sub-block contains the target icon is obtained, whether to perform optical character recognition processing on the image to be identified is determined according to the confidence, which improves the accuracy of icon recognition in the video.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the icon recognition methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
The method comprises: obtaining an image to be identified from a video to be detected; cutting a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area; inputting the image sub-block into a trained detection model to obtain the confidence that the image sub-block contains a target icon; when the confidence is detected to be in a preset confidence range, performing optical character recognition processing on the image sub-block to obtain an optical character recognition result; and determining the target recognition result of the video to be detected according to the result of the optical character recognition processing.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The computer readable storage medium may include, among others, read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disks, and the like.
Because the instructions stored in the computer readable storage medium can execute the steps in any icon identification method provided by the embodiments of the present application, the beneficial effects that any icon identification method provided by the embodiments of the present application can achieve can be achieved, which are detailed in the previous embodiments and are not described herein.
The foregoing describes in detail an icon recognition method, apparatus, and computer-readable storage medium according to embodiments of the present application. Specific examples are set forth herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to aid in understanding the method and core concept of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the concept of the present application; in summary, the contents of this specification should not be construed as limiting the present application.

Claims (11)

1. An icon recognition method, comprising:
Acquiring an image to be identified from a video to be detected;
cutting a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area;
inputting the image sub-blocks into a trained detection model to obtain the confidence that the image sub-blocks contain target icons;
When the confidence coefficient is detected to be in a preset confidence coefficient range, performing optical character recognition processing on the image sub-blocks in the preset confidence coefficient range to obtain an optical character recognition result;
determining a target recognition result of the video to be detected according to a result of the optical character recognition processing, which comprises: recognizing the optical character recognition result according to part-of-speech information to obtain a part-of-speech result; when the part-of-speech result is detected to contain target word information in a preset word information list, determining the target word information as a target icon and acquiring an identifier of the target icon; acquiring mark position information corresponding to the target icon in the image sub-block; and determining the identifier of the target icon and the mark position information as the target recognition result.
2. The method as recited in claim 1, further comprising:
When the part-of-speech result is detected not to contain target word information in a preset word information list, performing optical character recognition processing on the image to be recognized to obtain an optical character recognition result corresponding to the image to be recognized;
Identifying an optical character identification result corresponding to the image to be identified according to the part-of-speech information to obtain a word information result corresponding to the image to be identified;
Matching the word information result with target word information in the preset word information list;
When the number of the matched target word information is greater than or equal to a preset word information number threshold, determining the matched target word information as a target icon in the image to be identified;
Acquiring the identification of the target icon and the mark position information of the target icon in the image to be identified;
And determining the identification of the target icon and the mark position information as a target identification result.
3. The method of claim 1, wherein inputting the image sub-block into the trained detection model yields a confidence that the image sub-block contains a target icon, comprising:
inputting the image sub-blocks into a trained detection model;
Extracting features of the image sub-blocks through the detection model to obtain image sub-features;
carrying out convolution processing on the image sub-features through the detection model to obtain a prediction feature map;
and decoding the dimension of the prediction feature map through a classification layer in the detection model to obtain the confidence that the image sub-block contains the target icon.
4. The method of claim 1, wherein before inputting the image sub-block into the trained detection model, further comprising:
acquiring a sample image, wherein a preset area in the sample image carries a sample icon;
Cutting the sample image to obtain a sample image sub-block carrying the sample icon, and obtaining a sample confidence coefficient of the sample icon contained in the sample image sub-block;
Inputting the sample image sub-block into a preset model to obtain a prediction confidence that the sample image sub-block contains the sample icon;
Acquiring a confidence coefficient difference between the sample confidence coefficient and the prediction confidence coefficient;
And carrying out iterative training on the network parameters of the preset model according to the confidence coefficient difference until the confidence coefficient difference converges, so as to obtain a trained detection model.
5. The method as recited in claim 1, further comprising:
When the confidence coefficient is detected to be larger than the preset confidence coefficient range, determining that the image sub-block corresponding to the confidence coefficient contains a target icon;
Acquiring the mark position information of the image sub-block corresponding to the confidence coefficient, and acquiring the mark of the target icon;
And determining the identification of the target icon and the mark position information as a target identification result of the video to be detected.
6. The method as recited in claim 1, further comprising:
When the confidence coefficient is detected to be smaller than the preset confidence coefficient range, determining that the image to be identified does not carry a target icon;
and determining that the target icon is not carried as the target recognition result of the video to be detected.
7. The method of claim 1, wherein the acquiring the image to be identified from the video to be detected comprises:
Receiving a video to be detected;
Extracting target image frames in the video to be detected;
and determining the target image frame as the image to be identified.
8. An icon recognition apparatus, comprising:
The acquisition unit is used for acquiring an image to be identified from the video to be detected;
The preprocessing unit is used for shearing a preset area of the image to be identified to obtain an image sub-block corresponding to the preset area;
The input unit is used for inputting the image sub-blocks into the trained detection model to obtain the confidence that the image sub-blocks contain target icons;
The recognition unit is used for carrying out optical character recognition processing on the image sub-blocks in the preset confidence coefficient range when the confidence coefficient is detected to be in the preset confidence coefficient range, so as to obtain an optical character recognition result;
The determining unit is used for determining a target recognition result of the video to be detected according to the result of the optical character recognition processing, which comprises: recognizing the optical character recognition result according to part-of-speech information to obtain a part-of-speech result; when the part-of-speech result is detected to contain target word information in a preset word information list, determining the target word information as a target icon and acquiring the identifier of the target icon; acquiring the mark position information corresponding to the target icon in the image sub-block; and determining the identifier of the target icon and the mark position information as the target recognition result.
9. A computer-readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor to perform the steps in the icon recognition method according to any one of claims 1 to 7.
10. A computer device comprising a processor and a memory, the memory storing an application program, the processor being configured to run the application program in the memory to perform the steps in the icon recognition method of any one of claims 1 to 7.
11. A computer program product, characterized in that it comprises computer instructions stored in a computer-readable storage medium, from which computer instructions are read by a processor of a computer device, the processor executing the computer instructions, causing the computer device to perform the steps of the icon recognition method according to any one of claims 1 to 7.
CN202110711205.0A 2021-06-25 2021-06-25 Icon recognition method, device, and computer-readable storage medium Active CN113822126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110711205.0A CN113822126B (en) 2021-06-25 2021-06-25 Icon recognition method, device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110711205.0A CN113822126B (en) 2021-06-25 2021-06-25 Icon recognition method, device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113822126A CN113822126A (en) 2021-12-21
CN113822126B true CN113822126B (en) 2025-08-05

Family

ID=78924052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110711205.0A Active CN113822126B (en) 2021-06-25 2021-06-25 Icon recognition method, device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113822126B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943633B (en) * 2022-05-31 2025-06-20 北京奇艺世纪科技有限公司 A watermark detection method and device, electronic device and storage medium
CN114842476A (en) * 2022-06-29 2022-08-02 北京百度网讯科技有限公司 Watermark detection method and device, model training method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN109299734A (en) * 2018-09-13 2019-02-01 北京字节跳动网络技术有限公司 Method, device and computer-readable storage medium for identifying infringing pictures
CN111914834A (en) * 2020-06-18 2020-11-10 绍兴埃瓦科技有限公司 Image recognition method and device, computer equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP5888222B2 (en) * 2012-12-18 2016-03-16 富士ゼロックス株式会社 Information processing apparatus and information processing program
US9148675B2 (en) * 2013-06-05 2015-09-29 Tveyes Inc. System for social media tag extraction
JP6419421B2 (en) * 2013-10-31 2018-11-07 株式会社東芝 Image display device, image display method, and program
CN110264442B (en) * 2019-05-16 2021-07-27 苏州科达科技股份有限公司 Integrity checking method of watermark, electronic equipment and readable storage medium
CN112966583A (en) * 2021-02-26 2021-06-15 深圳壹账通智能科技有限公司 Image processing method, image processing device, computer equipment and storage medium

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN109299734A (en) * 2018-09-13 2019-02-01 北京字节跳动网络技术有限公司 Method, device and computer-readable storage medium for identifying infringing pictures
CN111914834A (en) * 2020-06-18 2020-11-10 绍兴埃瓦科技有限公司 Image recognition method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113822126A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
CN114787813B (en) Context-sensitive avatar subtitles
Zhang et al. Viscode: Embedding information in visualization images using encoder-decoder network
CN118587623A (en) Instance-level scene recognition using visual language models
EP2402867A1 (en) A computer-implemented method, a computer program product and a computer system for image processing
WO2018177237A1 (en) Image processing method and device, and storage medium
KR20250137192A (en) Speech-based selection of augmented reality content for detected objects
JP7592160B2 (en) Method and device for training an image processing model, image processing method and device, electronic device, and computer program
CN112257729B (en) Image recognition method, device, equipment and storage medium
CN113822126B (en) Icon recognition method, device, and computer-readable storage medium
CN114372172B (en) Method, device, computer equipment and storage medium for generating video cover image
CN111723653B (en) Method and device for reading drawing book based on artificial intelligence
CN114140427B (en) Object detection method and device
CN112528978A (en) Face key point detection method and device, electronic equipment and storage medium
CN108681389B (en) Method and device for reading through reading device
CN119444897A (en) A method, device, equipment and medium for batch generation of picture materials
US20250131753A1 (en) Generating image difference captions via an image-text cross-modal neural network
CN119850789A (en) Image material generation method, device, medium and computing equipment
CN116863015A (en) Method, device, computer equipment and storage medium for generating text and graphics
CN111428569B (en) Visual recognition method and device for drawing book or teaching material based on artificial intelligence
CN116710978A (en) Select representative video frames with machine learning
CN118152609B (en) Image generation method, device and computer equipment
Mukherjee et al. Energy efficient face recognition in mobile-fog environment
CN117392730A (en) Training method and system for human face living body detection model
CN118898658A (en) Abstract background generated
US20240184860A1 (en) Methods and arrangements for providing impact imagery

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant