CN113486693A

CN113486693A - Video processing method and device

Info

Publication number: CN113486693A
Application number: CN202010942908.XA
Authority: CN
Inventors: 孟祥奇; 冯谨强; 高雪松; 陈维强
Original assignee: Qingdao Hisense Electronic Industry Holdings Co Ltd
Current assignee: Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2021-10-08

Abstract

The present application provides a video processing method and device to protect user privacy. In this application, for any frame of image in the collected video stream, the trained human body region segmentation model is used to segment the human body region in the image, so as to obtain a human body mask image that is displayed in different sub-regions of the human body, in which the trained human body region is obtained. The human body region segmentation model is trained based on images with human body sub-region annotations; based on the human body mask image, the skin color ratio of the target human sub-region is determined; when the skin color ratio of the target human sub-region is greater than the preset value, the target The sub-region of the human body is filled. Cut the human body region in the image into multiple human body sub-regions, and determine whether the human body sub-region is an exposed region according to the skin color ratio of the human sub-region. This area is filled to prevent leakage of user privacy.

Description

Video processing method and device

Technical Field

The application relates to the technical field of computers, and provides a video processing method and video processing equipment.

Background

Along with the popularization of smart homes, more and more smart devices are connected into the camera. The intelligent device shoots pictures or videos through the camera, the shot pictures or videos can be published to the network, and other users can watch the uploaded pictures or videos in the network.

When the camera captures a person image, if the user is exposed by wearing intentionally or unintentionally, the captured image also shows that the user is exposed by wearing. When the exposed image or video is distributed to a network, the appearance is not only influenced, but also the privacy of the user is leaked.

Disclosure of Invention

The embodiment of the application provides a video processing method and video processing equipment, which are used for protecting user privacy.

In a first aspect, an embodiment of the present application provides a method for video processing, where the method includes:

aiming at any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image which is displayed by human body subregion difference, wherein the trained human body region segmentation model is obtained by training according to the image with human body subregion marks;

determining the skin color proportion of a target human body subregion based on the human body mask image;

and when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion.

In the application, after a video stream is acquired, whether the image contains human body features or not is determined for any frame of image in the video stream, after the image contains the human body features, the image is input into a trained human body segmentation model, a human body region corresponding to the human body features in the image is segmented through the trained human body region segmentation model, and a human body mask image for displaying each human body subregion in a distinguishing mode is output. And determining the skin color proportion of the target human body subregion based on the human body mask image, determining the target human body subregion as an exposed region worn by the user when the skin color proportion of the target human body subregion is greater than a preset value, and filling the target human body subregion to shield the exposed region so as to protect the privacy of the user.

In one possible implementation manner, for any frame of image in a collected video stream, detecting a privacy area in a human body area through a trained privacy recognition model, and outputting position information of the privacy area, wherein the trained privacy recognition model is generated according to image training with a privacy area label;

and filling the privacy area according to the position information of the privacy area.

When filling the target human body subregion according to the skin color proportion, there can be pixel points that do not have the discernment simultaneously at the juncture of two adjacent human body subregions, if this pixel point is the pixel point in the privacy zone that exposes, then there is the condition that the privacy zone is unrecognized, finally lead to not filling the privacy zone and handle, or when the skin color proportion of target subregion is lower, but this target subregion is the privacy zone, still can not fill the processing to this target subregion, these two kinds of circumstances will lead to the privacy zone to expose, influence user privacy. Therefore, the method and the device further provide the steps that the image is input into the trained privacy recognition model to detect the privacy area, the privacy area is further filled, and the privacy of the user is guaranteed.

In one possible implementation, determining the skin color proportion of the target human body subregion based on the human body mask image includes:

determining a target human body subregion based on the human body mask image;

determining the number of skin color pixel points in a target human body subregion;

and determining the skin color proportion of the target human body subregion according to the number of the skin color pixel points and the total number of the pixel points in the target human body subregion.

In the application, whether a target human body subregion is exposed or not is mainly determined according to the human body complexion, and the image exists in the form of pixel points, so that whether skin color pixel points or non-skin color pixel points are determined according to all the pixel points in the target human body subregion, the number of the skin color pixel points and the number of the non-skin color pixel points are determined, and the ratio of the skin color pixel points to the total number of the pixel points in the target human body subregion is determined and is the complexion proportion of the target human body subregion. A technical scheme for determining the skin color proportion in the target sub-region is provided, so that the skin color proportion of the target sub-region is accurately determined, and whether filling processing is needed or not is further determined according to the skin color proportion, so that the privacy of a user is protected.

In one possible implementation, the pixel point is determined to be a skin color pixel point by the following method:

converting the RGB data into YUV data;

aiming at any pixel point in the target human body subregion, determining the tone of the pixel point according to the chrominance information in the YUV data;

and when the tone is within the preset range, determining the pixel points as skin color pixel points.

In the application, a technical scheme for determining whether the pixel point is a skin color pixel point is provided, and because skin color is mainly influenced by chromaticity, RGB is converted into a YUV color space with separated brightness and chromaticity, whether the pixel point is the skin color pixel point is determined according to the converted chromaticity information, and the skin color pixel point in the target human body subregion is accurately identified so as to determine the skin color proportion of the target human body subregion.

In one possible implementation, a filling process is performed, including:

the filling process is performed in such a manner that the pixel values of R, G, B three channels take the same value.

In the application, in order to make the filled image irreversible, a mode that R, G, B three-channel pixel values take the same value is adopted for filling, so that the problem that user privacy is leaked due to malicious restoration of the processed image is further prevented.

In a second aspect, an embodiment of the present application provides a video processing device configured to execute a method of video processing provided by an embodiment of the present application.

In a third aspect, an embodiment of the present application provides an apparatus for video processing, where the apparatus includes: camera and treater, wherein:

the camera is used for collecting video streams;

the processor is used for segmenting a human body region in the image through a trained human body region segmentation model aiming at any frame of image in the collected video stream so as to obtain a human body mask image which is displayed by distinguishing in a human body subregion; determining the skin color proportion of a target human body subregion based on the human body mask image; when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion by adopting a mode that pixel values of three channels of red R, green G and blue B take the same value; the trained human body region segmentation model is obtained by training according to the image with the human body subregion labels.

In a fourth aspect, an embodiment of the present application provides an apparatus for video processing, where the apparatus includes: camera, treater, display, wherein:

the camera is used for collecting video streams;

the processor is used for segmenting a human body region in the image through a trained human body region segmentation model aiming at any frame of image in the collected video stream so as to obtain a human body mask image which is displayed by distinguishing in a human body subregion; determining the skin color proportion of a target human body subregion based on the human body mask image; when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion by adopting a mode that the pixel values of three channels of red R, green G and blue B take the same value; the trained human body region segmentation model is obtained by training according to an image with human body subregion labels;

and the display is used for displaying the image subjected to the filling processing.

In a fifth aspect, an embodiment of the present application provides an apparatus for video processing, where the apparatus includes: the device comprises a segmentation module, a determination module and a processing module, wherein:

the segmentation module is used for segmenting a human body region in an image through a trained human body region segmentation model aiming at any frame of image in the acquired video stream so as to obtain a human body mask image which is displayed by human body subregion difference, wherein the trained human body region segmentation model is obtained by training according to the image with human body subregion marks;

the determining module is used for determining the skin color proportion of the target human body subregion based on the human body mask image;

and the processing module is used for filling the target human body subregion when the skin color proportion of the target human body subregion is larger than a preset value.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored, and when executed by a processor, the computer instructions implement a method for video processing provided by the embodiment of the present application.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.

Fig. 1 is a schematic view of a first application scenario provided in an embodiment of the present application;

fig. 2 is a schematic diagram of an intelligent device for video processing according to an embodiment of the present disclosure;

fig. 3 is a schematic diagram of a second application scenario provided in the embodiment of the present application;

fig. 4 is a schematic diagram of a central control device for video processing according to an embodiment of the present application;

fig. 5 is a schematic diagram of a third application scenario provided in the embodiment of the present application;

fig. 6 is a schematic diagram of another intelligent device for video processing according to an embodiment of the present application;

fig. 7 is a schematic diagram of a fourth application scenario provided in the embodiment of the present application;

fig. 8 is a flowchart of a method for video processing according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a human body region segmentation model according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a human body mask image according to an embodiment of the present application;

fig. 11 is a flowchart of an overall method of video processing according to an embodiment of the present application;

fig. 12 is a block diagram of a video processing apparatus according to an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solution and advantages of the present application more clearly and clearly understood, the technical solution in the embodiments of the present application will be described below in detail and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.

The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.

1. Artificial intelligence:

artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, collect knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

2. Edge end:

the edge terminal is an open platform integrating network, computing, storage and application core capabilities at one side close to an object or a data source, and provides nearest-end service nearby. The application program is initiated at the edge end to generate faster network service response, and the basic requirements of the industry in the aspects of real-time business, application intelligence, safety, privacy protection and the like are met.

3. Masking the image:

the mask is the outside of the selection frame (the inside of the selection frame is the selection area), and the mask image is a human body mask image in the application, and only the outline of a person is contained in the image at the moment.

The following briefly introduces the design concept of the embodiments of the present application.

With the development of internet technology, more and more intelligent devices begin to access cameras. The camera is used for gathering video, image etc. and smart machine publishes video, image etc. that the camera gathered.

However, in the process of capturing videos and images by the camera, the user wears the exposure intentionally or unintentionally, which has adverse effects on the user and causes privacy leakage of the user.

Therefore, the embodiment of the application provides a method and equipment for video processing, which are used for protecting the privacy of users. In the method, the user privacy exposure area in the video stream is identified based on the video stream collected by the camera, and the user privacy exposure area is filled.

Specifically, in the application, for any frame of image in a video stream acquired by a camera, a human body region in the image is segmented through a human body region segmentation model to obtain a human body mask image which is displayed by a human body subregion in a distinguishing manner, the skin color ratio of the target human body subregion is determined, and when the skin color ratio of the target human body subregion is greater than a preset value, the target human body subregion is an exposed user privacy region, and at the moment, the target human body subregion is filled, so that the user privacy is protected.

After introducing the design idea of the embodiment of the present application, an application scenario of the present application is briefly described.

Fig. 1 is a schematic diagram illustrating an application scenario of a first video processing. As shown in fig. 1, the application scenario includes a smart device 100 and a server 101.

The intelligent device 100 is a device including a camera in a security scene, and may also be an independent camera for acquiring various video images in the security scene; meanwhile, in order to prevent the privacy of the user from being revealed during the video transmission process, a processor for performing video processing is further provided in the smart device 100.

Fig. 2 illustrates a first intelligent device for video processing in the present application, where the intelligent device 100 includes a camera 1001, a processor 1002, and a data transmission module 1003. The camera 1001 is configured to capture a video image, the processor 1002 is configured to perform video processing on the video image captured by the camera 1001, and the data transmission module 1003 is configured to transmit the video image processed by the processor 1002 to the server 101.

In this application, when the processor 1002 performs video processing on a video image acquired by the camera 1001, the processor 1002 is specifically configured to:

aiming at any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image which is displayed by distinguishing in a human body subregion; determining the skin color proportion of a target human body subregion based on the human body mask image; when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion; the trained human body region segmentation model is obtained by training according to an image with human body subregion labels;

in one possible implementation, the processor 1002 is further configured to:

detecting a privacy area of a human body area in an image through a trained privacy recognition model aiming at any frame of image in an acquired video stream, and outputting position information of the privacy area; filling the privacy area according to the position information of the privacy area; wherein the trained privacy recognition model is generated by image training with privacy zone labels.

In one possible implementation, the processor 1002 is specifically configured to:

determining the target human body subregion based on the human body mask image; determining the number of skin color pixel points in the target human body subregion; and determining the skin color proportion of the target human body subregion according to the number of the skin color pixel points and the total number of the pixel points in the target human body subregion.

In one possible implementation, the processor 1002 determines that a pixel point is a skin color pixel point by:

converting the RGB data into YUV data; aiming at any pixel point in the target human body subregion, determining the color tone of the pixel point according to the chrominance information in the YUV data; and when the color tone is within a preset range, determining the pixel points as skin color pixel points.

In one possible implementation, the processor 1002 is specifically configured to perform the filling process in a manner that the pixel values of the R, G, B three channels take the same value.

The server 101 is configured to receive a video sent by the smart device 100, and distribute the video after receiving a video viewing demand. The servers 101 may be a group, multiple groups, or one or more types of servers.

The smart device 100 and the server 101 may be communicatively coupled via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks.

Fig. 3 is a schematic diagram illustrating an application scenario of the second video processing, and as shown in fig. 3, the application scenario includes a smart device 300, a central control device 301, and a server 302.

Wherein, this smart machine 300 is various equipment in the intelligence house, for example install the robot of sweeping the floor of camera, install the intelligent audio amplifier of camera, install the intelligent lock etc. of camera. The intelligent device 300 collects video images through a camera, transmits the collected video images to the central control device 301, and sends the video images to the server 302 through the central control device 301;

the central control device 301 is a device for controlling the intelligent device 300, and fig. 4 exemplarily shows a first central control device for video processing in this application, where the central control device 301 includes a detector 3010, a controller 3011, a communicator 3012, a processor 3013, and the like; wherein:

the detector 3010 is used to detect various instructions triggered by the user;

the controller 3011 is configured to control the corresponding smart device 300 according to various instructions triggered by a user;

the communicator 3012 is configured to connect to the smart device 300, receive the video image sent by the smart device 300, and connect to the server 302, and send the video image processed by the processor 3013 to the server 302;

the processor 3013 is configured to process the received video image transmitted by the smart device 300, so as to prevent privacy of the user from being revealed during transmission of the video image to the server 302.

In this application, when the processor 3013 performs video processing on a video image transmitted by the smart device 300, the processor 3013 is specifically configured to:

in one possible implementation, the processor 3013 is further configured to:

In one possible implementation, the processor 3013 is specifically configured to:

In one possible implementation, the processor 3013 determines that the pixel point is a skin color pixel point by:

In one possible implementation, the processor 3013 is specifically configured to perform the padding processing in a manner that the pixel values of the R, G, B three channels take the same value.

The server 302 is configured to receive a video sent by the central control device 301, and distribute the video after receiving a video viewing demand. The servers 302 may be a group or groups of servers, and may be one or more types of servers.

Fig. 5 is a schematic diagram illustrating an application scenario of the third video processing, as shown in fig. 5, the application scenario includes: the smart device 500 and the server 501, and the smart device 500 and the server 501 perform data communication by a plurality of communication methods. This may allow the smart device 500 to be communicatively coupled via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks.

The smart device 500 may be a mobile device with a function of capturing video images, such as a mobile phone, a computer, or a tablet.

Because the intelligent device becomes an important platform for collecting and releasing information by a user, the intelligent device comprises various application programs capable of collecting and/or releasing information, and the user can send one or a combination of characters, images and videos when releasing the information; and the published video information can be acquired in real time, such as scenes of goods selling live broadcast, game live broadcast and the like. When the intelligent device collects the video images, the video images exposed by the user wearing the intelligent device can be collected, at the moment, if the video images exposed by the user wearing the intelligent device are published, the privacy of the user is leaked, and at the moment, the video images are processed through a processor in the intelligent device.

A hardware configuration block diagram of the smart device 500 is exemplarily shown in fig. 6. As shown in fig. 5, the smart device 500 may include therein a tuning demodulator 510, a communicator 520, a detector 530, an external device interface 540, a controller 550, a memory 560, a user interface 565, a video processor 570, a display 575, an audio processor 580, an audio output interface 585, and a power supply 590.

The tuning demodulator 510 receives signals in a wired or wireless manner, may perform modulation and demodulation processing such as amplification, mixing, resonance, and the like, and is configured to demodulate audio and video signals carried in videos watched by a user from a plurality of wireless or wired signals.

The tuner demodulator 510 may receive signals in various ways, depending on the broadcasting system of the signals, such as: terrestrial broadcasting, cable broadcasting, satellite broadcasting, internet broadcasting, or the like; and according to different modulation types, a digital modulation mode or an analog modulation mode can be adopted; and the analog signal and the digital signal can be demodulated according to the different types of the received signals.

The communicator 520 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the smart device 500 transmits video data to an external device connected via the communicator 520, or browses and downloads video data from an external device connected via the communicator 520. The communicator 520 may include a network communication protocol module or a near field communication protocol module such as a WIFI module 521, a bluetooth communication protocol module 522, a wired ethernet communication protocol module 523, and the like, so that the communicator 520 may receive a control signal according to the control of the controller 550 and implement the control signal as a WIFI signal, a bluetooth signal, a radio frequency signal, and the like.

The detector 530 is a component of the smart device 500 for collecting signals of an external environment or interaction with the outside. The detector 530 may include a sound collector 531, such as a microphone, which may be used to receive the sound of the user, such as a voice signal of a control instruction of the user controlling the smart device 500; alternatively, ambient sounds may be collected for identifying the type of ambient scene, enabling the smart device 500 to adapt to ambient noise.

In some other exemplary embodiments, the detector 530 may further include an image collector 532, such as a camera, a video camera, etc., which may be used to collect the external environment scene; and for capturing video taken by the user.

In some other exemplary embodiments, the detector 530 may further include a light receiver for collecting the ambient light intensity to adapt to the display parameter variation of the smart device 500.

In some other exemplary embodiments, the detector 530 may further include a temperature sensor, such as by sensing an ambient temperature, and the smart device 500 may adaptively adjust a display color temperature of the image. For example, when the temperature is higher, the smart device 500 may be adjusted to display a cool color tone; when the temperature is in a low environment, the smart device 500 may be adjusted to display a warm color temperature tone of the image.

The external device interface 540 is a component for providing the controller 550 to control data transmission between the smart device 500 and an external device. The external device interface 540 may be connected to an external apparatus such as a set-top box, a game device, a notebook computer, etc. in a wired/wireless manner, and may receive data such as a video signal (e.g., moving image), an audio signal (e.g., music), etc. of the external apparatus.

The external device interface 540 may include: a High Definition Multimedia Interface (HDMI) terminal 541, a Composite Video Blanking Sync (CVBS) terminal 542, an analog or digital Component terminal 543, a Universal Serial Bus (USB) terminal 544, a Component terminal (not shown), a red, green, blue (RGB) terminal (not shown), and the like.

The controller 550 controls the operation of the smart device 500 and responds to the user's operations by running various software control programs (e.g., an operating system and various application programs) stored on the memory 560.

As shown in fig. 6, the controller 550 includes a Random Access Memory (RAM)551, a Read Only Memory (ROM)552, a graphics processor 553, a processor 554, a communication interface 555, and a communication bus 556. The RAM551, the ROM552, the graphics processor 553, and the processor 554 communication interface 555 are connected by a communication bus 556.

The ROM552 is used to store various system boot instructions. When the power-on signal is received, the smart device 500 starts to boot, and the processor 554 executes the system boot instruction in the ROM552 and copies the operating system stored in the memory 560 to the RAM551 to start running the boot operating system. After the boot of the operating system is completed, the processor 554 further copies various applications stored in the memory 560 to the RAM551, and then starts running and booting the various applications.

A graphics processor 553 for generating various graphic objects such as icons, operation menus, and user input instruction display graphics, etc. The graphic processor 553 may include an operator for performing an operation by receiving various interactive instructions input by a user, and then displaying various objects according to display attributes; and a renderer for generating various objects based on the operator and displaying the rendered result on the display 575.

A processor 554 for executing operating system and application program instructions stored in memory 560. And according to the received user input instruction, processing of various application programs, data and contents is executed so as to finally display and play various audio-video contents.

In some demonstrative embodiments, processor 554 may include a plurality of processors. The plurality of processors may include one main processor and a plurality of or one sub-processor. Processor 554 is configured to process video images captured by image capture device 532. Processor 554 is specifically configured to: aiming at any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image which is displayed by distinguishing in a human body subregion; determining the skin color proportion of a target human body subregion based on the human body mask image; when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion; the trained human body region segmentation model is obtained by training according to an image with human body subregion labels;

in one possible implementation, the processor 554 is further configured to:

In one possible implementation, the processor 554 is specifically configured to:

In one possible implementation, processor 554 determines that a pixel is a skin tone pixel by:

In one possible implementation, the processor 554 is specifically configured to perform the padding process in a manner that R, G, B pixel values of three channels take the same value.

Communication interface 555 may include a first interface through an nth interface. These interfaces may be network interfaces that are connected to external devices via a network.

The controller 550 may control the overall operation of the smart device 500. For example: in response to receiving a user input command for selecting a GUI object displayed on the display 575, the controller 550 may perform an operation related to the object selected by the user input command.

Where the object may be any one of the selectable objects, such as a hyperlink or an icon. The operation related to the selected object is, for example, an operation of displaying a link to a hyperlink page, document, image, or the like, or an operation of executing a program corresponding to the object. The user input command for selecting the GUI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the smart device 500 or a voice command corresponding to a voice spoken by the user.

The memory 560 is used to store various types of data, software programs, or applications that drive and control the operation of the smart device 500. The memory 560 may include volatile and/or nonvolatile memory. And the term "memory" includes the memory 560, the RAM551 and the ROM555 of the controller 550, or a memory card in the smart device 500.

In some embodiments, the memory 560 is specifically configured to store an operating program that drives the controller 550 in the smart device 500; storing various applications built into the smart device 500 and downloaded by the user from an external device; data such as visual effect images for configuring various GUIs provided by the display 575, various objects related to the GUIs, and selectors for selecting GUI objects are stored.

In some embodiments, memory 560 is specifically configured to store drivers for tuner demodulator 510, communicator 550, detector 530, external device interface 540, video processor 570, display 575, audio processor 580, etc., and related data, such as external data (e.g., audio-visual data) received from the external device interface or user data (e.g., key information, voice information, touch information, etc.) received by the user interface. In some embodiments, memory 560 specifically stores software and/or programs representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. Illustratively, the kernel may control or manage system resources, as well as functions implemented by other programs (e.g., middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to enable control or management of system resources.

The server 501 is configured to obtain a processed video issued by the smart device 500, and send the received video to other terminals after receiving a viewing demand. The servers 501 may be a group, a plurality of groups, or one or more types of servers. Other service data such as text data is provided through the server 501. It should be noted that, the three application scenes may further include an edge device, which is exemplified by the first scene, as shown in fig. 7, which is a schematic view of a fourth application scene provided in the embodiment of the present application, where the scene includes the smart device 100, the edge device 700, and the server 101, where the smart device 100 acquires a video image and transmits the acquired video image to the edge device 700, and the acquired video image is transmitted to the server 101 after being video-processed by the edge device 700, and a specific processing manner may refer to an implementation manner of processors in the three scenes, and is not described again in detail.

Similarly, for the second scenario, the central control device 201 may forward the video acquired from the intelligent device 200 to the edge device, and the edge device performs video processing on the video; for the third scenario, the smart device 500 may send the captured video to the edge device, and the edge device performs video processing. I.e. to ensure that the video containing the exposed area is not uploaded to the server.

In a possible application scenario, the technical solution provided in the embodiment of the present application may implement video processing by means of a deep learning technique in the field of AI (Artificial Intelligence).

The method of video processing provided by the exemplary embodiments of the present application is described below with reference to the accompanying drawings in conjunction with the application scenarios described above, it should be noted that the above application scenarios are only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect.

As shown in fig. 8, a flowchart of a method for video processing according to an embodiment of the present application includes the following steps:

step 800, aiming at any frame of image in the collected video stream, segmenting the human body region in the image through the trained human body region segmentation model to obtain a human body mask image which is displayed by the human body subregion in a distinguishing way.

In the application, the trained human body region segmentation model is obtained by training according to the image with the human body sub-region label.

In one possible implementation, the human segmentation model is trained by:

firstly, establishing training samples, wherein each training sample is an image containing the characteristics of the whole human body, namely the image contains the whole human body; dividing a human body region corresponding to human body characteristics into a plurality of human body subregions, and labeling each human body subregion to form a training sample for training a human body segmentation model;

for example, the body area is divided into 4 body sub-areas, which are a head, a hand, a trunk, and a leg; or divided into 5 human body sub-regions, head, hand, torso, thigh and shank.

And then, inputting the training sample into the human body segmentation model, training the human body segmentation model to obtain a trained human body segmentation model, and finally applying the trained human body segmentation model to the implementation of the video processing.

In the present application, the human body is divided into pixel-level divisions.

In the application, a deep learning algorithm is adopted to train a human body segmentation model, wherein a Mask-RCNN (Region-based Convolutional Neural Network) is mainly adopted to segment human body regions, and the segmentation accuracy of the method is higher than that of the traditional machine learning method.

Fig. 9 is a schematic diagram of a human body segmentation model provided in an embodiment of the present application, where the model is Mask-RCNN.

In the application, the Mask-RCNN has a branch for predicting each pixel type, a Network structure of FCN (full Convolutional neural Network) is adopted, an end-to-end Network is constructed by utilizing convolution and deconvolution, and finally each pixel is classified, so that a good segmentation effect is realized. The FCN can receive an input image with any size, the deconvolution layer is adopted to carry out upsampling on the feature map of the last volume base layer to restore the feature map to the same size of the input image, so that a prediction can be generated for each pixel, the spatial information in the original input image is kept, and finally the upsampled feature map is subjected to pixel classification.

After a human body region segmentation model is obtained through training, the human body region segmentation model obtained through training is applied to video processing, each frame of image in a video stream is input into the human body segmentation model, the model outputs a classified prediction label for each pixel point, namely the probability (between 0 and 1) that each pixel point is part of a human body, the probability sum of each pixel point being each class is 1, and finally the label with the maximum probability is selected as the final class of the pixel. Different types of pixels are set to different pixel values, and a segmentation mask of a human body region is obtained.

For example, the probability that one pixel point in the image is the head is 0.1, the probability of the hand is 0.8, the probability of the trunk is 0.1, and the probability of the leg is 0, at this time, the pixel point is determined to be the hand, at this time, the pixel point is set to be a pixel value corresponding to the hand, the same method is adopted to determine the category of the pixel point in the whole body area, the pixel value is set for the pixel point according to the category, and finally, the human body mask image which is displayed in a human body subregion in a distinguishing manner is obtained.

As shown in fig. 10, in the schematic diagram of a human body mask image provided in the embodiment of the present application, different regions corresponding to different letters in the diagram correspond to different categories, and represent different regions of a human body, that is, each region of the human body can be determined by the human body mask image.

For example, the region corresponding to the letter a in the human body mask image shown in fig. 10 represents the head, the region corresponding to the letter D represents the hand, the region corresponding to the letter B represents the torso, and the region corresponding to the letter C represents the leg, while the pixel values of the head, the hand, the torso, and the leg in the actual human body mask image are different, that is, different types are set to different pixel values.

Step 801, determining the skin color proportion of the target human body subregion based on the human body mask image.

Taking fig. 10 as an example for explanation, the human body sub-regions include four of a head, a hand, a trunk, and a leg, and if the skin color ratio is calculated for all of the four human body sub-regions, the calculation amount is relatively large. In addition, the region cannot be determined as an exposed region even if the head and hands are exposed too much in actual conditions, and the region can be determined as an exposed region if the leg and the trunk are exposed to a certain ratio.

In the application, the skin color ratio is the ratio between the number of skin color pixel points and the total pixel points in the target human body sub-region, and therefore whether the pixel points are skin color pixel points or not needs to be determined for each pixel point in the target human body sub-region. Namely, the number of skin color pixel points and the number of non-skin color pixel points in the target human body subregion are counted.

The distribution of skin tones in the color space is rather concentrated but affected by lighting and ethnic factors. In the present application, to reduce the influence of the illumination intensity on the skin color, the color space is converted from RGB to a color space with separated luminance and chrominance, i.e., YUV. The luminance component is discarded and the skin tones of different races do not vary much on the bicolor difference plane because the difference in skin tones is more manifested in luminance than chrominance. Firstly, converting image RGB data into YUV data, wherein the conversion formula is as follows:

Y＝0.299R+0.587G+0.114B

U＝-0.1687R-0.3313G+0.5B+128

V＝0.5R-0.4187G-0.0813B+128

in YUV space, U and V are two mutually orthogonal vectors on a plane. The chrominance information, i.e., the sum of U and V, is a two-dimensional vector. Each color corresponds to a chrominance signal vector, the saturation is Ch, the hue is represented by a phase angle theta, and at the moment, the hue theta of a pixel point is determined according to chrominance information U and V in YUV data through the following formula:

θ＝tan^-1(V/U)

in this application, whether adopting the scope of predetermineeing to detect the pixel and be the complexion pixel, when the tone of pixel satisfies following condition promptly, can regard this pixel as the complexion pixel:

105≤θ≤150

sequentially traversing each pixel point in the target human body subregion, and determining whether the tone of each pixel point accords with the formula range; if yes, determining the pixel point as a skin color pixel point, and adding 1 to a corresponding skin color pixel point number SkinNum counter; if not, determining the pixel point as a non-skin color pixel point, and adding 1 to a counter of the number unSkinNum corresponding to the corresponding non-skin color pixel point. The method is adopted to respectively complete the statistics of the pixel points of the trunk and the leg of the human body. And calculating the skin color proportion of the trunk and the skin color proportion of the legs by the following formulas:

ratio＝SkinNum/(SkinNum+UnSkinNum)

wherein, ratio is the skin color proportion of the target human body subregion, skinnnum is the sum of skin color pixel points in the target human body subregion, and Unskinnnum is the sum of non-skin color pixel points in the target human body subregion.

When determining whether the target human body subregion needs to be filled, determining whether the skin color proportion of the target human body subregion meets a preset condition, so that two numerical values, namely ratio, are preset in the application₁、ratio₂The range of values is 0 to 1. It is to be noted that the numerical value ratio₁、ratio₂Can be considered to adjust the setting, and ratio₁Ratio of less than or equal to₂。

The preset conditions are as follows:

when ratio is<ratio₁Then, determining the target human body subregion as a normal region;

when ratio is₁≤ratio≤ratio₂Then, the target is determinedThe human body subregion is a sexual feeling region; also belongs to the normal area;

when ratio is>ratio₂And then determining the target human body subregion as an exposure region.

And step 802, when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion.

In the application, when the skin color proportion of the target human body subregion is determined to be larger than the preset value, the target human body subregion is determined to be an exposed region, and in order to ensure the privacy of a user, filling processing is carried out on the target human body subregion.

In the present application, the filling process is performed using the same color in order to make the filled image irreversible. Firstly, determining pixel points in an exposure area based on a human body area segmentation mask, then taking a uniform fixed value for the pixel values of R, G, B three channels, filling the area into a mosaic, and blocking the exposure area.

In one possible implementation, the segmentation algorithm has a recognition inaccuracy problem at the edges of the detection region. If the pixel point is an exposed human privacy area, the problem that code printing of the exposed privacy area fails exists; or when the proportion of human skin color is low, only the privacy area may be exposed, and the algorithm may be determined to be normal in this case. Therefore, the privacy of the user is leaked, and for the problems of the two situations, a human body privacy area recognition algorithm, namely a privacy recognition model, is introduced into the application.

In the method, a privacy area of a human body area in an image is detected through a trained privacy recognition model aiming at any frame of image in an acquired video stream, wherein the trained privacy recognition model is generated according to image training with a privacy area label;

after a privacy area in the human body area is detected, determining and outputting position information of the privacy area;

It should be noted that the privacy recognition model is trained based on the convolutional neural network, and gives the corresponding confidence r and the position information (x) in real time_left，y_up) And (x)_right，y_down). After the privacy area is detected, the position information is recorded, and the filling processing is performed in real time in the same way as the filling processing of the exposed area, which is not described herein again.

In the method and the device, real-time coding filling processing is carried out on the privacy areas detected and identified and the marked exposure areas, live video exposure is prevented, attack leakage in the video stream transmission or storage process is prevented, and user privacy and platform safety are protected.

In a possible implementation manner, since the video stream includes a plurality of images, including images that do not include a portrait, at this time, if all the images in the video stream are input into the human body segmentation model and/or the privacy recognition model, the amount of calculation of the human body segmentation model and/or the privacy recognition model is increased, and therefore, before the images are input into the human body segmentation model and the privacy recognition model, whether the images include a portrait is detected, if the images include a portrait, the images are input into the human body segmentation model and/or the privacy recognition model, and otherwise, the images are directly saved and/or transmitted. As shown in fig. 11, an overall method flowchart of video processing provided in the embodiment of the present application includes the following steps:

step 1100, collecting video stream by a camera;

step 1101, inputting an image into a trained human body segmentation model aiming at any frame image in the acquired video stream, and acquiring a human body mask image which is displayed by distinguishing in a human body subregion;

step 1102, determining a skin color proportion of a target human body subregion based on the human body mask image;

1103, judging whether the skin color proportion is larger than a preset threshold value, if so, executing 1104, otherwise, executing 1107;

1104, filling the target human body subarea;

step 1105, detecting whether the human body region includes a privacy region through the trained privacy recognition model, if yes, executing step 1106, otherwise executing step 1107;

step 1106, performing filling processing on the privacy area;

step 1107, video transmission or save.

It should be noted that step 1101 and step 1105 may be executed simultaneously, or step 1101 may be executed first and then step 1105 may be executed, or step 1105 may be executed first and then step 1101 is executed, that is, it is ensured that both the privacy area and the exposure area in the finally saved or transmitted image are filled, so as to achieve the effect of protecting the privacy of the user.

Based on the same inventive concept, the embodiment of the present invention further provides a video processing apparatus, and as the apparatus corresponds to the video processing method of the embodiment of the present invention, and the principle of the apparatus for solving the problem is similar to the principle of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated. As shown in fig. 12, a block diagram of a video processing apparatus according to an embodiment of the present application is provided, where the apparatus includes: a segmentation module 1200, a determination module 1201, and a processing module 1202, wherein:

a segmentation module 1200, configured to segment a human body region in an image through a trained human body region segmentation model to obtain a human body mask image displayed differently in a human body subregion, where the trained human body region segmentation model is obtained by training an image with a human body subregion label;

a determining module 1201, configured to determine a skin color ratio of a target human body subregion based on the human body mask image;

and the processing module 1202 is configured to perform filling processing on the target human body sub-region when the skin color ratio of the target human body sub-region is greater than a preset value.

In a possible implementation manner, the apparatus further includes a detection module 1203, where:

a detection module 1203, configured to detect a privacy area of a human body area in an image through a trained privacy recognition model for any frame of image in an acquired video stream, and output location information of the privacy area, where the trained privacy recognition model is generated by image training with a privacy area label;

the processing module 1202 is configured to perform a filling process on the privacy area according to the location information of the privacy area.

In a possible implementation manner, the determining module 1201 is specifically configured to:

determining a target human body subregion based on the human body mask image;

In a possible implementation manner, the determining module 1201 determines that the pixel point is a skin color pixel point by:

converting the RGB data into YUV data;

In one possible implementation, the processing module 1202 is specifically configured to:

In one possible implementation, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method steps of video processing according to the present application.

In one possible implementation, the various aspects of the method for video processing provided herein may also be implemented in the form of a program product comprising program code for causing a computer device to perform the steps of the method for video processing according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. a method for video processing, characterized in that the method comprises:

For any frame of image in the collected video stream, the trained human body region segmentation model is used to segment the human body region in the image, so as to obtain a human body mask image that is displayed in different sub-regions of the human body, wherein the trained human body region is obtained. The human body region segmentation model is trained from images with human body sub-region annotations;

Based on the human body mask image, determine the skin color ratio of the target human body sub-region;

When the skin color ratio of the target human body sub-region is greater than a preset value, filling processing is performed on the target human body sub-region.

2. The method of claim 1, wherein the method further comprises:

For any frame of image in the collected video stream, use the trained privacy recognition model to detect the privacy area of the human body area in the image, and output the location information of the privacy area, wherein the trained privacy recognition model The model is generated by training images with private region annotations;

Filling the privacy area according to the location information of the privacy area.

3. The method according to claim 1, wherein, determining the skin color ratio of the target human body sub-region based on the human body mask image, comprising:

determining the target human sub-region based on the human body mask image;

Determine the number of skin color pixels in the sub-region of the target human body;

The skin color ratio of the target human sub-region is determined according to the number of the skin color pixels and the total number of pixels in the target human sub-region.

4. method as claimed in claim 3, is characterized in that, determining that pixel point is skin color pixel point by the following way:

Convert RGB data to YUV data;

For any pixel in the target body sub-region, determine the hue of the pixel according to the chromaticity information in the YUV data;

When the hue is within a preset range, the pixel is determined to be a skin color pixel.

5. The method according to claim 1 or 2, wherein the filling process comprises:

Filling is performed by adopting the way that the pixel values of the three channels R, G, and B take the same value.

6. A device for video processing, characterized in that the device is configured to execute the video processing method according to any one of claims 1-5.

7. A device for video processing, characterized in that the device comprises: a camera and a processor, wherein:

The camera is used to collect video streams;

The processor is configured to, for any frame of image in the collected video stream, segment the human body region in the image through the trained human body region segmentation model, so as to obtain a human body mask that is displayed differently by the human body sub-regions image; based on the human body mask image, determine the skin color ratio of the target human sub-region; when the skin color ratio of the target human sub-region is greater than the preset value, use red R, green G, blue for the target human sub-region Filling processing is performed in a manner that the pixel values of the three channels of B take the same value; wherein the trained human body region segmentation model is obtained by training based on images marked with human body sub-regions.

8. The apparatus of claim 7, wherein the processor is further configured to:

According to the location information of the privacy area, the privacy area is filled with the pixel values of the three channels of red R, green G, and blue B taking the same value.

9. A device for video processing, characterized in that the device comprises: a camera, a processor and a display, wherein:

The camera is used to collect video streams;

The processor is configured to, for any frame of image in the collected video stream, segment the human body region in the image through the trained human body region segmentation model, so as to obtain a human body mask that is displayed differently by the human body sub-regions image; based on the human body mask image, determine the skin color ratio of the target human sub-region; when the skin color ratio of the target human sub-region is greater than the preset value, use red R, green G, blue for the target human sub-region The pixel values of the three channels of B take the same value, and perform filling processing; wherein the trained human body region segmentation model is obtained by training according to the images marked with human body sub-regions;

The display is used for displaying the filled image.

10. The apparatus of claim 9, wherein the processor is further configured to: