
CN113486693A - Video processing method and device - Google Patents


Info

Publication number
CN113486693A
CN113486693A
Authority
CN
China
Prior art keywords
human body
region
image
sub
skin color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010942908.XA
Other languages
Chinese (zh)
Inventor
孟祥奇
冯谨强
高雪松
陈维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Hisense Electronic Industry Holdings Co Ltd
Original Assignee
Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Electronic Industry Holdings Co Ltd filed Critical Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority to CN202010942908.XA priority Critical patent/CN113486693A/en
Publication of CN113486693A publication Critical patent/CN113486693A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract


Figure 202010942908

The present application provides a video processing method and device for protecting user privacy. For any frame of image in a collected video stream, a trained human body region segmentation model segments the human body region in the image to obtain a human body mask image in which the human body sub-regions are displayed distinctly; the model is trained on images annotated with human body sub-regions. Based on the human body mask image, the skin color ratio of a target human body sub-region is determined; when that ratio is greater than a preset value, the target human body sub-region is filled. In other words, the human body region in the image is divided into multiple human body sub-regions, and a sub-region whose skin color ratio exceeds the preset value is treated as an exposed region and filled, preventing leakage of the user's privacy.


Description

Video processing method and device
Technical Field
The present application relates to the field of computer technology, and provides a video processing method and a video processing device.
Background
With the popularization of smart homes, more and more smart devices are equipped with cameras. A smart device shoots pictures or videos through its camera; the pictures or videos can be published to the network, where other users can watch them.
When the camera captures an image of a person whose clothing, intentionally or unintentionally, exposes the body, the captured image shows that exposure. If such an image or video is distributed to the network, it not only affects the user's appearance but also leaks the user's privacy.
Disclosure of Invention
The embodiment of the application provides a video processing method and video processing equipment, which are used for protecting user privacy.
In a first aspect, an embodiment of the present application provides a method for video processing, where the method includes:
for any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly, wherein the trained human body region segmentation model is obtained by training on images with human body sub-region annotations;
determining the skin color ratio of a target human body sub-region based on the human body mask image;
and when the skin color ratio of the target human body sub-region is greater than a preset value, filling the target human body sub-region.
In this application, after a video stream is acquired, it is determined for any frame in the stream whether the image contains human body features. If it does, the image is input into the trained human body region segmentation model, which segments the human body region corresponding to those features and outputs a human body mask image in which each human body sub-region is displayed distinctly. The skin color ratio of the target human body sub-region is then determined from the mask image; when that ratio is greater than a preset value, the target sub-region is judged to be an exposed area of the user's clothing and is filled, shielding the exposed area and protecting the user's privacy.
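The per-frame flow described above can be sketched as follows. This is a minimal NumPy sketch in which the segmentation model and the skin classifier are stand-in callables, and the 0.5 threshold and gray fill value are illustrative assumptions, not values fixed by the application:

```python
import numpy as np

def process_frame(image, segment, skin_map, threshold=0.5, gray=128):
    """Sketch of the described method: segment the human body into labelled
    sub-regions, compute each sub-region's skin-color ratio, and fill any
    sub-region whose ratio exceeds the preset value.

    segment(image)  -> H x W integer mask (0 = background, 1..N = sub-regions)
    skin_map(image) -> H x W boolean per-pixel skin classification
    threshold, gray -- illustrative values, not taken from the patent
    """
    mask = segment(image)
    skin = skin_map(image)
    out = image.copy()
    for label in np.unique(mask):
        if label == 0:                       # skip background
            continue
        region = (mask == label)
        ratio = (region & skin).sum() / region.sum()
        if ratio > threshold:                # exposed sub-region: fill it
            out[region] = gray               # R = G = B, flat irreversible fill
    return out

# toy 2x2 frame: sub-region 1 is fully skin, sub-region 2 has no skin
img = np.full((2, 2, 3), 200, dtype=np.uint8)
seg = lambda im: np.array([[1, 1], [2, 2]])
sk = lambda im: np.array([[True, True], [False, False]])
result = process_frame(img, seg, sk)
```

In this toy run, sub-region 1's skin ratio is 1.0, so its pixels are filled with the flat gray value, while sub-region 2 (ratio 0.0) is left untouched.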
In one possible implementation, for any frame of image in the collected video stream, a privacy area within the human body area is detected through a trained privacy recognition model, which outputs the position information of the privacy area, wherein the trained privacy recognition model is generated by training on images with privacy area labels;
and the privacy area is filled according to its position information.
When the target human body sub-region is filled according to its skin color ratio, unrecognized pixels may remain at the junction of two adjacent sub-regions. If such pixels belong to an exposed privacy area, that area goes unrecognized and is never filled; likewise, a target sub-region with a low skin color ratio may still be a privacy area, and would also not be filled. In both cases the privacy area stays exposed and the user's privacy is compromised. The application therefore additionally inputs the image into a trained privacy recognition model to detect the privacy area and fills it, further safeguarding the user's privacy.
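A minimal sketch of the position-based filling step. The box format is a hypothetical assumption: the application only states that the model outputs the privacy area's position information, so axis-aligned boxes as (x1, y1, x2, y2) pixel coordinates are assumed here for illustration:

```python
import numpy as np

def fill_private_boxes(image, boxes, gray=128):
    """Fill each detected privacy area with a flat gray block.

    boxes -- assumed list of (x1, y1, x2, y2) pixel coordinates; this format
             is an illustrative assumption, not fixed by the patent
    gray  -- illustrative fill value; the same value lands on all channels
    """
    out = image.copy()
    for x1, y1, x2, y2 in boxes:
        out[y1:y2, x1:x2] = gray     # broadcasts across all three channels
    return out

# toy 4x4 black frame with one detected box in the middle
frame = np.zeros((4, 4, 3), dtype=np.uint8)
covered = fill_private_boxes(frame, [(1, 1, 3, 3)])
```

The rectangle fill complements the skin-ratio path: it also covers pixels at sub-region junctions and low-skin-ratio regions the ratio test would miss.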
In one possible implementation, determining the skin color ratio of the target human body sub-region based on the human body mask image includes:
determining the target human body sub-region based on the human body mask image;
determining the number of skin color pixels in the target human body sub-region;
and determining the skin color ratio of the target human body sub-region from the number of skin color pixels and the total number of pixels in the target sub-region.
In this application, whether the target human body sub-region is exposed is determined mainly from skin color, and the image exists as pixels. Each pixel in the target sub-region is therefore classified as a skin color pixel or a non-skin-color pixel, the counts are taken, and the ratio of skin color pixels to the total number of pixels in the target sub-region is the sub-region's skin color ratio. This gives an accurate skin color ratio for the target sub-region, from which it is decided whether filling is needed to protect the user's privacy.
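The counting described above can be sketched as follows, assuming the mask assigns each human body sub-region its own integer label and a boolean skin map has already been computed per pixel (both representation choices are assumptions for illustration):

```python
import numpy as np

def skin_color_ratio(mask, skin_map, region_label):
    """Ratio of skin-color pixels to all pixels inside one sub-region.

    mask         -- 2-D integer array; each sub-region carries its own label
    skin_map     -- 2-D boolean array; True where a pixel was classified as skin
    region_label -- label of the target human body sub-region
    """
    region = (mask == region_label)          # pixels of the target sub-region
    total = int(region.sum())                # total pixel count in the region
    if total == 0:
        return 0.0
    skin = int((region & skin_map).sum())    # skin-color pixels in the region
    return skin / total

# toy 2x3 mask: region 1 is the left 2x2 block; 3 of its 4 pixels are skin
mask = np.array([[1, 1, 0],
                 [1, 1, 0]])
skin = np.array([[True, True,  False],
                 [True, False, False]])
ratio = skin_color_ratio(mask, skin, 1)      # 3 / 4 = 0.75
```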
In one possible implementation, a pixel is determined to be a skin color pixel as follows:
converting the RGB data into YUV data;
for any pixel in the target human body sub-region, determining the hue of the pixel from the chrominance information in the YUV data;
and when the hue falls within a preset range, determining the pixel to be a skin color pixel.
Because skin color is determined mainly by chrominance, the RGB data is converted into the YUV color space, which separates luminance from chrominance; whether a pixel is a skin color pixel is then decided from the converted chrominance information. This accurately identifies the skin color pixels in the target human body sub-region so that its skin color ratio can be determined.
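A sketch of the chrominance test, using the digital YCbCr form of the YUV conversion (BT.601 coefficients). The Cb/Cr bounds used here are commonly cited illustrative values for skin detection, not the preset range from this application:

```python
def is_skin_pixel(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one RGB pixel as skin from its chrominance.

    The Cb/Cr windows are illustrative defaults often used for skin
    detection; the patent only requires some preset hue range.
    """
    # BT.601 full-range RGB -> YCbCr; Cb and Cr carry the chrominance
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb_range[0] <= cb <= cb_range[1] and
            cr_range[0] <= cr <= cr_range[1])
```

For example, a typical skin tone such as (200, 150, 120) lands inside both chrominance windows, while a saturated blue like (0, 0, 255) falls far outside the Cb window.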
In one possible implementation, the filling process is performed as follows:
the pixel values of the three channels R, G and B are set to the same value.
Setting all three channel values equal makes the filled image irreversible, preventing the processed image from being maliciously restored and the user's privacy from being leaked.
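A minimal sketch of such a fill, assuming an RGB image held as an H×W×3 array; gray=128 is an illustrative choice, since any single value serves:

```python
import numpy as np

def fill_region(image, region_mask, gray=128):
    """Fill the masked pixels so that R = G = B = gray.

    Every covered pixel ends up with one flat value, so the original
    colors cannot be recovered from the output. gray=128 is illustrative.
    """
    out = image.copy()
    out[region_mask] = gray          # scalar broadcasts to all three channels
    return out

# toy 2x2 image; fill only the top-left pixel
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
m = np.array([[True, False], [False, False]])
filled = fill_region(img, m)
```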
In a second aspect, an embodiment of the present application provides a video processing device configured to execute a method of video processing provided by an embodiment of the present application.
In a third aspect, an embodiment of the present application provides an apparatus for video processing, where the apparatus includes a camera and a processor, wherein:
the camera is used for collecting video streams;
the processor is used for: for any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly; determining the skin color ratio of a target human body sub-region based on the human body mask image; and, when that ratio is greater than a preset value, filling the target sub-region by setting the pixel values of the three channels red R, green G and blue B to the same value; the trained human body region segmentation model is obtained by training on images with human body sub-region labels.
In a fourth aspect, an embodiment of the present application provides an apparatus for video processing, where the apparatus includes a camera, a processor and a display, wherein:
the camera is used for collecting video streams;
the processor is used for: for any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly; determining the skin color ratio of a target human body sub-region based on the human body mask image; and, when that ratio is greater than a preset value, filling the target sub-region by setting the pixel values of the three channels red R, green G and blue B to the same value; the trained human body region segmentation model is obtained by training on images with human body sub-region labels;
and the display is used for displaying the image subjected to the filling processing.
In a fifth aspect, an embodiment of the present application provides an apparatus for video processing, where the apparatus includes: the device comprises a segmentation module, a determination module and a processing module, wherein:
the segmentation module is used for: for any frame of image in the acquired video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly, wherein the trained model is obtained by training on images with human body sub-region annotations;
the determining module is used for determining the skin color ratio of the target human body sub-region based on the human body mask image;
and the processing module is used for filling the target human body sub-region when its skin color ratio is greater than a preset value.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored, and when executed by a processor, the computer instructions implement a method for video processing provided by the embodiment of the present application.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of a first application scenario provided in an embodiment of the present application;
fig. 2 is a schematic diagram of an intelligent device for video processing according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a second application scenario provided in the embodiment of the present application;
fig. 4 is a schematic diagram of a central control device for video processing according to an embodiment of the present application;
fig. 5 is a schematic diagram of a third application scenario provided in the embodiment of the present application;
fig. 6 is a schematic diagram of another intelligent device for video processing according to an embodiment of the present application;
fig. 7 is a schematic diagram of a fourth application scenario provided in the embodiment of the present application;
fig. 8 is a flowchart of a method for video processing according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a human body region segmentation model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a human body mask image according to an embodiment of the present application;
fig. 11 is a flowchart of an overall method of video processing according to an embodiment of the present application;
fig. 12 is a block diagram of a video processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solution and advantages of the present application clearer, the technical solution in the embodiments of the present application will be described completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
1. Artificial intelligence:
artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, collect knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, spanning both hardware-level and software-level techniques. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
2. Edge end:
The edge end is an open platform that integrates network, computing, storage and application capabilities on the side close to the object or data source, providing services at the nearest end. Running applications at the edge yields faster network responses and meets the industry's basic requirements for real-time business, application intelligence, security and privacy protection.
3. Mask image:
A mask covers the area outside a selection (the inside of the selection is the selected area). In this application, the mask image is a human body mask image that contains only the person's outline.
The following briefly introduces the design concept of the embodiments of the present application.
With the development of internet technology, more and more smart devices incorporate cameras. The camera collects videos and images, and the smart device publishes them.
However, when the camera captures video or images of a user whose clothing, intentionally or unintentionally, exposes the body, publishing them adversely affects the user and leaks the user's privacy.
Therefore, the embodiment of the application provides a method and equipment for video processing, which are used for protecting the privacy of users. In the method, the user privacy exposure area in the video stream is identified based on the video stream collected by the camera, and the user privacy exposure area is filled.
Specifically, in this application, for any frame of image in the video stream collected by the camera, the human body region in the image is segmented by the human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly, and the skin color ratio of the target human body sub-region is determined. When that ratio is greater than a preset value, the target sub-region is an exposed privacy region of the user and is filled, protecting the user's privacy.
After introducing the design idea of the embodiment of the present application, an application scenario of the present application is briefly described.
Fig. 1 is a schematic diagram illustrating an application scenario of a first video processing. As shown in fig. 1, the application scenario includes a smart device 100 and a server 101.
The smart device 100 is a device that includes a camera in a security scene, or an independent camera that captures video images in the security scene; to prevent the user's privacy from being revealed during video transmission, the smart device 100 also contains a processor for video processing.
Fig. 2 illustrates a first intelligent device for video processing in the present application, where the intelligent device 100 includes a camera 1001, a processor 1002, and a data transmission module 1003. The camera 1001 is configured to capture a video image, the processor 1002 is configured to perform video processing on the video image captured by the camera 1001, and the data transmission module 1003 is configured to transmit the video image processed by the processor 1002 to the server 101.
In this application, when the processor 1002 performs video processing on a video image acquired by the camera 1001, the processor 1002 is specifically configured to:
for any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly; determining the skin color ratio of a target human body sub-region based on the human body mask image; when that ratio is greater than a preset value, filling the target sub-region; the trained human body region segmentation model is obtained by training on images with human body sub-region labels.
in one possible implementation, the processor 1002 is further configured to:
for any frame of image in the collected video stream, detecting a privacy area of the human body area in the image through a trained privacy recognition model and outputting the position information of the privacy area; filling the privacy area according to its position information; wherein the trained privacy recognition model is generated by training on images with privacy area labels.
In one possible implementation, the processor 1002 is specifically configured to:
determining the target human body sub-region based on the human body mask image; determining the number of skin color pixels in the target sub-region; and determining the skin color ratio of the target sub-region from the number of skin color pixels and the total number of pixels in the target sub-region.
In one possible implementation, the processor 1002 determines that a pixel point is a skin color pixel point by:
converting the RGB data into YUV data; for any pixel in the target human body sub-region, determining the hue of the pixel from the chrominance information in the YUV data; and when the hue is within a preset range, determining the pixel to be a skin color pixel.
In one possible implementation, the processor 1002 is specifically configured to perform the filling process by setting the pixel values of the three channels R, G and B to the same value.
The server 101 is configured to receive a video sent by the smart device 100, and distribute the video after receiving a video viewing demand. The servers 101 may be a group, multiple groups, or one or more types of servers.
The smart device 100 and the server 101 may be communicatively coupled via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks.
Fig. 3 is a schematic diagram illustrating an application scenario of the second video processing, and as shown in fig. 3, the application scenario includes a smart device 300, a central control device 301, and a server 302.
The smart device 300 is any of various devices in a smart home, for example a sweeping robot with a camera, a smart speaker with a camera, or a smart lock with a camera. The smart device 300 collects video images through its camera, transmits them to the central control device 301, and the central control device 301 sends them to the server 302;
the central control device 301 is a device for controlling the intelligent device 300, and fig. 4 exemplarily shows a first central control device for video processing in this application, where the central control device 301 includes a detector 3010, a controller 3011, a communicator 3012, a processor 3013, and the like; wherein:
the detector 3010 is used to detect various instructions triggered by the user;
the controller 3011 is configured to control the corresponding smart device 300 according to various instructions triggered by a user;
the communicator 3012 is configured to connect to the smart device 300, receive the video image sent by the smart device 300, and connect to the server 302, and send the video image processed by the processor 3013 to the server 302;
the processor 3013 is configured to process the received video image transmitted by the smart device 300, so as to prevent privacy of the user from being revealed during transmission of the video image to the server 302.
In this application, when the processor 3013 performs video processing on a video image transmitted by the smart device 300, the processor 3013 is specifically configured to:
for any frame of image in the collected video stream, segmenting a human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly; determining the skin color ratio of a target human body sub-region based on the human body mask image; when that ratio is greater than a preset value, filling the target sub-region; the trained human body region segmentation model is obtained by training on images with human body sub-region labels.
in one possible implementation, the processor 3013 is further configured to:
for any frame of image in the collected video stream, detecting a privacy area of the human body area in the image through a trained privacy recognition model and outputting the position information of the privacy area; filling the privacy area according to its position information; wherein the trained privacy recognition model is generated by training on images with privacy area labels.
In one possible implementation, the processor 3013 is specifically configured to:
determining the target human body sub-region based on the human body mask image; determining the number of skin color pixels in the target sub-region; and determining the skin color ratio of the target sub-region from the number of skin color pixels and the total number of pixels in the target sub-region.
In one possible implementation, the processor 3013 determines that the pixel point is a skin color pixel point by:
convert the RGB data of the image into YUV data; determine, for any pixel point in the target human body sub-region, the hue of the pixel point according to the chrominance information in the YUV data; and when the hue is within a preset range, determine the pixel point to be a skin color pixel point.
In one possible implementation, the processor 3013 is specifically configured to perform the filling processing in a manner that the pixel values of the R, G, B three channels take the same value.
The server 302 is configured to receive the video sent by the central control device 301 and distribute the video after receiving a video viewing demand. The server 302 may be one or more groups of servers, and may be of one or more types.
Fig. 5 is a schematic diagram of a third application scenario of video processing. As shown in fig. 5, the application scenario includes a smart device 500 and a server 501, which perform data communication through a variety of communication methods. For example, the smart device 500 may be communicatively coupled via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks.
The smart device 500 may be a mobile device with a function of capturing video images, such as a mobile phone, a computer, or a tablet.
Because the smart device has become an important platform for users to collect and publish information, it contains various applications capable of collecting and/or publishing information; when publishing information, a user may send text, images, videos, or a combination thereof, and the published video information may be collected in real time, for example in scenes such as live-stream selling and game live streaming. When the smart device collects video images, it may capture images in which the user is exposed; if such images are published, the privacy of the user is leaked, so the video images are processed by a processor in the smart device.
A hardware configuration block diagram of the smart device 500 is exemplarily shown in fig. 6. As shown in fig. 6, the smart device 500 may include therein a tuner demodulator 510, a communicator 520, a detector 530, an external device interface 540, a controller 550, a memory 560, a user interface 565, a video processor 570, a display 575, an audio processor 580, an audio output interface 585, and a power supply 590.
The tuner demodulator 510 receives signals in a wired or wireless manner, may perform processing such as amplification, mixing, and resonance, and is configured to demodulate, from a plurality of wireless or wired signals, the audio and video signals carried in the video watched by the user.
The tuner demodulator 510 may receive signals in various ways, depending on the broadcasting system of the signals, such as: terrestrial broadcasting, cable broadcasting, satellite broadcasting, internet broadcasting, or the like; and according to different modulation types, a digital modulation mode or an analog modulation mode can be adopted; and the analog signal and the digital signal can be demodulated according to the different types of the received signals.
The communicator 520 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the smart device 500 transmits video data to an external device connected via the communicator 520, or browses and downloads video data from an external device connected via the communicator 520. The communicator 520 may include a network communication protocol module or a near field communication protocol module such as a WIFI module 521, a bluetooth communication protocol module 522, a wired ethernet communication protocol module 523, and the like, so that the communicator 520 may receive a control signal according to the control of the controller 550 and implement the control signal as a WIFI signal, a bluetooth signal, a radio frequency signal, and the like.
The detector 530 is a component of the smart device 500 for collecting signals of an external environment or interaction with the outside. The detector 530 may include a sound collector 531, such as a microphone, which may be used to receive the sound of the user, such as a voice signal of a control instruction of the user controlling the smart device 500; alternatively, ambient sounds may be collected for identifying the type of ambient scene, enabling the smart device 500 to adapt to ambient noise.
In some other exemplary embodiments, the detector 530 may further include an image collector 532, such as a camera, a video camera, etc., which may be used to collect the external environment scene; and for capturing video taken by the user.
In some other exemplary embodiments, the detector 530 may further include a light receiver for collecting the ambient light intensity to adapt to the display parameter variation of the smart device 500.
In some other exemplary embodiments, the detector 530 may further include a temperature sensor, such as by sensing an ambient temperature, and the smart device 500 may adaptively adjust a display color temperature of the image. For example, when the temperature is higher, the smart device 500 may be adjusted to display a cool color tone; when the temperature is in a low environment, the smart device 500 may be adjusted to display a warm color temperature tone of the image.
The external device interface 540 is a component for providing the controller 550 to control data transmission between the smart device 500 and an external device. The external device interface 540 may be connected to an external apparatus such as a set-top box, a game device, a notebook computer, etc. in a wired/wireless manner, and may receive data such as a video signal (e.g., moving image), an audio signal (e.g., music), etc. of the external apparatus.
The external device interface 540 may include: a High Definition Multimedia Interface (HDMI) terminal 541, a Composite Video Blanking Sync (CVBS) terminal 542, an analog or digital Component terminal 543, a Universal Serial Bus (USB) terminal 544, a Component terminal (not shown), a red, green, blue (RGB) terminal (not shown), and the like.
The controller 550 controls the operation of the smart device 500 and responds to the user's operations by running various software control programs (e.g., an operating system and various application programs) stored on the memory 560.
As shown in fig. 6, the controller 550 includes a Random Access Memory (RAM) 551, a Read Only Memory (ROM) 552, a graphics processor 553, a processor 554, a communication interface 555, and a communication bus 556. The RAM 551, the ROM 552, the graphics processor 553, the processor 554, and the communication interface 555 are connected by the communication bus 556.
The ROM552 is used to store various system boot instructions. When the power-on signal is received, the smart device 500 starts to boot, and the processor 554 executes the system boot instruction in the ROM552 and copies the operating system stored in the memory 560 to the RAM551 to start running the boot operating system. After the boot of the operating system is completed, the processor 554 further copies various applications stored in the memory 560 to the RAM551, and then starts running and booting the various applications.
A graphics processor 553 for generating various graphic objects such as icons, operation menus, and user input instruction display graphics, etc. The graphic processor 553 may include an operator for performing an operation by receiving various interactive instructions input by a user, and then displaying various objects according to display attributes; and a renderer for generating various objects based on the operator and displaying the rendered result on the display 575.
A processor 554 for executing operating system and application program instructions stored in memory 560. And according to the received user input instruction, processing of various application programs, data and contents is executed so as to finally display and play various audio-video contents.
In some exemplary embodiments, processor 554 may include a plurality of processors. The plurality of processors may include one main processor and one or more sub-processors. Processor 554 is configured to process video images captured by the image collector 532. Processor 554 is specifically configured to: for any frame of image in the collected video stream, segment the human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinguishably; determine the skin color ratio of a target human body sub-region based on the human body mask image; and when the skin color ratio of the target human body sub-region is greater than a preset value, perform filling processing on the target human body sub-region; wherein the trained human body region segmentation model is obtained by training with images carrying human body sub-region labels;
in one possible implementation, the processor 554 is further configured to:
detect, for any frame of image in the collected video stream, a privacy region of the human body region in the image through a trained privacy recognition model, and output position information of the privacy region; and perform filling processing on the privacy region according to the position information of the privacy region; wherein the trained privacy recognition model is generated by training with images carrying privacy region labels.
In one possible implementation, the processor 554 is specifically configured to:
determine the target human body sub-region based on the human body mask image; determine the number of skin color pixel points in the target human body sub-region; and determine the skin color ratio of the target human body sub-region from the number of skin color pixel points and the total number of pixel points in the target human body sub-region.
In one possible implementation, processor 554 determines that a pixel is a skin tone pixel by:
convert the RGB data of the image into YUV data; determine, for any pixel point in the target human body sub-region, the hue of the pixel point according to the chrominance information in the YUV data; and when the hue is within a preset range, determine the pixel point to be a skin color pixel point.
In one possible implementation, the processor 554 is specifically configured to perform the filling processing in a manner that the pixel values of the R, G, B three channels take the same value.
Communication interface 555 may include a first interface through an nth interface. These interfaces may be network interfaces that are connected to external devices via a network.
The controller 550 may control the overall operation of the smart device 500. For example: in response to receiving a user input command for selecting a GUI object displayed on the display 575, the controller 550 may perform an operation related to the object selected by the user input command.
Where the object may be any one of the selectable objects, such as a hyperlink or an icon. The operation related to the selected object is, for example, an operation of displaying a link to a hyperlink page, document, image, or the like, or an operation of executing a program corresponding to the object. The user input command for selecting the GUI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the smart device 500 or a voice command corresponding to a voice spoken by the user.
The memory 560 is used to store various types of data, software programs, or applications that drive and control the operation of the smart device 500. The memory 560 may include volatile and/or nonvolatile memory. The term "memory" includes the memory 560, the RAM 551 and the ROM 552 of the controller 550, and a memory card in the smart device 500.
In some embodiments, the memory 560 is specifically configured to store an operating program that drives the controller 550 in the smart device 500; storing various applications built into the smart device 500 and downloaded by the user from an external device; data such as visual effect images for configuring various GUIs provided by the display 575, various objects related to the GUIs, and selectors for selecting GUI objects are stored.
In some embodiments, memory 560 is specifically configured to store drivers for the tuner demodulator 510, the communicator 520, the detector 530, the external device interface 540, the video processor 570, the display 575, the audio processor 580, etc., and related data, such as external data (e.g., audio-visual data) received from the external device interface or user data (e.g., key information, voice information, touch information, etc.) received by the user interface. In some embodiments, memory 560 specifically stores software and/or programs representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. Illustratively, the kernel may control or manage system resources, as well as functions implemented by other programs (e.g., middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to enable control or management of system resources.
The server 501 is configured to obtain the processed video published by the smart device 500 and, after receiving a viewing demand, send the received video to other terminals. The server 501 may be one or more groups of servers and may be of one or more types. Other service data, such as text data, is also provided through the server 501. It should be noted that each of the three application scenarios may further include an edge device. Taking the first scenario as an example, fig. 7 is a schematic view of a fourth application scenario provided in an embodiment of the present application; the scenario includes the smart device 100, the edge device 700, and the server 101. The smart device 100 collects a video image and transmits it to the edge device 700, and the video image is transmitted to the server 101 after being video-processed by the edge device 700. For the specific processing manner, reference may be made to the implementations of the processors in the three scenarios, which are not described again here.
Similarly, for the second scenario, the central control device 201 may forward the video acquired from the intelligent device 200 to the edge device, and the edge device performs video processing on the video; for the third scenario, the smart device 500 may send the captured video to the edge device, and the edge device performs video processing. I.e. to ensure that the video containing the exposed area is not uploaded to the server.
In a possible application scenario, the technical solution provided in the embodiment of the present application may implement video processing by means of a deep learning technique in the field of AI (Artificial Intelligence).
The method of video processing provided by the exemplary embodiments of the present application is described below with reference to the accompanying drawings in conjunction with the application scenarios described above, it should be noted that the above application scenarios are only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect.
As shown in fig. 8, a flowchart of a method for video processing according to an embodiment of the present application includes the following steps:
step 800, aiming at any frame of image in the collected video stream, segmenting the human body region in the image through the trained human body region segmentation model to obtain a human body mask image which is displayed by the human body subregion in a distinguishing way.
In the application, the trained human body region segmentation model is obtained by training according to the image with the human body sub-region label.
In one possible implementation, the human segmentation model is trained by:
First, training samples are established, where each training sample is an image containing the features of a whole human body, that is, the image contains the whole human body. The human body region corresponding to the human body features is divided into a plurality of human body sub-regions, and each human body sub-region is labeled, forming a training sample for training the human body segmentation model.
for example, the body area is divided into 4 body sub-areas, which are a head, a hand, a trunk, and a leg; or divided into 5 human body sub-regions, head, hand, torso, thigh and shank.
Then, the training samples are input into the human body segmentation model, the model is trained to obtain a trained human body segmentation model, and finally the trained human body segmentation model is applied in the implementation of the video processing.
In the present application, the human body is segmented at the pixel level.
In the present application, a deep learning algorithm is adopted to train the human body segmentation model; Mask-RCNN (Mask Region-based Convolutional Neural Network) is mainly adopted to segment the human body region, and the segmentation accuracy of this method is higher than that of traditional machine learning methods.
Fig. 9 is a schematic diagram of a human body segmentation model provided in an embodiment of the present application, where the model is Mask-RCNN.
In the present application, the Mask-RCNN has a branch for predicting the class of each pixel, which adopts the network structure of an FCN (Fully Convolutional Network): an end-to-end network is constructed using convolution and deconvolution, and each pixel is finally classified, achieving a good segmentation effect. The FCN can accept an input image of any size; a deconvolution layer is used to upsample the feature map of the last convolutional layer to restore it to the same size as the input image, so that a prediction can be generated for each pixel while preserving the spatial information of the original input image, and finally pixel classification is performed on the upsampled feature map.
After the human body region segmentation model is obtained through training, it is applied to video processing: each frame of image in the video stream is input into the human body segmentation model, and the model outputs a classification prediction label for each pixel point, that is, the probability (between 0 and 1) that the pixel point belongs to each part of the human body; the probabilities of all classes for a pixel sum to 1, and the label with the maximum probability is selected as the final class of the pixel. Pixels of different classes are set to different pixel values, and a segmentation mask of the human body region is obtained.
For example, the probability that one pixel point in the image is the head is 0.1, the probability of the hand is 0.8, the probability of the trunk is 0.1, and the probability of the leg is 0, at this time, the pixel point is determined to be the hand, at this time, the pixel point is set to be a pixel value corresponding to the hand, the same method is adopted to determine the category of the pixel point in the whole body area, the pixel value is set for the pixel point according to the category, and finally, the human body mask image which is displayed in a human body subregion in a distinguishing manner is obtained.
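The per-pixel selection described above can be sketched as an argmax over the model's class probabilities, followed by a lookup of a display pixel value per class. The class order and the pixel values in `PIXEL_VALUES` are illustrative assumptions, not values specified in the text:

```python
import numpy as np

# Illustrative class-to-pixel-value mapping; the actual values and class
# order used by the model are not specified in the text.
PIXEL_VALUES = {0: 0, 1: 60, 2: 120, 3: 180, 4: 240}  # background, head, hand, torso, leg

def probabilities_to_mask(probs):
    """Select the most probable class per pixel and map it to a display value."""
    labels = np.argmax(probs, axis=-1)            # (H, W) array of class indices
    mask = np.zeros(labels.shape, dtype=np.uint8)
    for cls, value in PIXEL_VALUES.items():
        mask[labels == cls] = value
    return mask

# The example from the text: a pixel with head 0.1, hand 0.8, torso 0.1, leg 0
probs = np.array([[[0.0, 0.1, 0.8, 0.1, 0.0]]])   # shape (1, 1, 5), sums to 1
mask = probabilities_to_mask(probs)               # the pixel gets the hand value
```

Applying the same selection to every pixel of a full-size probability map yields the human body mask image shown in fig. 10.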
As shown in fig. 10, in the schematic diagram of a human body mask image provided in the embodiment of the present application, different regions corresponding to different letters in the diagram correspond to different categories, and represent different regions of a human body, that is, each region of the human body can be determined by the human body mask image.
For example, the region corresponding to the letter a in the human body mask image shown in fig. 10 represents the head, the region corresponding to the letter D represents the hand, the region corresponding to the letter B represents the torso, and the region corresponding to the letter C represents the leg, while the pixel values of the head, the hand, the torso, and the leg in the actual human body mask image are different, that is, different types are set to different pixel values.
Step 801, determining the skin color proportion of the target human body subregion based on the human body mask image.
Taking fig. 10 as an example, the human body sub-regions include four: head, hands, torso, and legs. If the skin color ratio were calculated for all four human body sub-regions, the computation would be relatively large. In addition, in actual conditions, the head and the hands cannot be determined to be exposed regions even if they are largely exposed, while the legs and the torso can be determined to be exposed regions when exposed beyond a certain ratio.
In the application, the skin color ratio is the ratio between the number of skin color pixel points and the total pixel points in the target human body sub-region, and therefore whether the pixel points are skin color pixel points or not needs to be determined for each pixel point in the target human body sub-region. Namely, the number of skin color pixel points and the number of non-skin color pixel points in the target human body subregion are counted.
The distribution of skin tones in the color space is rather concentrated, but it is affected by illumination and ethnicity. In the present application, to reduce the influence of illumination intensity on skin color, the color space is converted from RGB to a color space with separated luminance and chrominance, i.e., YUV. The luminance component is discarded; on the chrominance (UV) plane the skin tones of different ethnicities do not vary much, because differences in skin tone manifest more in luminance than in chrominance. First, the RGB data of the image is converted into YUV data using the following formulas:
Y=0.299R+0.587G+0.114B
U=-0.1687R-0.3313G+0.5B+128
V=0.5R-0.4187G-0.0813B+128
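The three conversion formulas above translate directly into a per-pixel helper; this is a minimal sketch, with the function name being a hypothetical choice:

```python
def rgb_to_yuv(r, g, b):
    """Per-pixel RGB -> YUV conversion using the formulas above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    v = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, u, v
```

For a neutral gray pixel such as (128, 128, 128), the chrominance coefficients cancel and both U and V come out at the 128 offset, i.e., zero chrominance.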
In the YUV space, U and V are two mutually orthogonal vectors in a plane. The chrominance information, i.e., U and V together, forms a two-dimensional vector; each color corresponds to a chrominance signal vector whose magnitude is the saturation Ch and whose hue is represented by the phase angle θ. The hue θ of a pixel point is determined from the chrominance information U and V in the YUV data by the following formula:
θ = tan⁻¹(V/U)
In the present application, a preset range is used to detect whether a pixel point is a skin color pixel point; that is, when the hue of the pixel point satisfies the following condition, the pixel point can be regarded as a skin color pixel point:
105 ≤ θ ≤ 150
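A sketch of the hue test follows. Note two assumptions not made explicit in the text: the +128 offsets added by the conversion formulas are subtracted before the angle is taken (otherwise U and V are non-negative and the 105-150 range would be unreachable), and a quadrant-aware arctangent (`atan2`) is used so that θ covers the full circle:

```python
import math

SKIN_HUE_MIN, SKIN_HUE_MAX = 105.0, 150.0  # preset hue range from the text

def hue_deg(u, v):
    """Hue angle theta of the chrominance vector, in degrees in [0, 360)."""
    # Assumption: re-center U and V by subtracting the +128 offset added by
    # the conversion formulas before computing theta = tan^-1(V/U).
    return math.degrees(math.atan2(v - 128.0, u - 128.0)) % 360.0

def is_skin_pixel(u, v):
    """A pixel point is a skin color pixel point when its hue is in range."""
    return SKIN_HUE_MIN <= hue_deg(u, v) <= SKIN_HUE_MAX
```

With this convention, a chrominance vector pointing into the second quadrant of the UV plane (negative centered U, positive centered V) lands in the 105-150 band.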
Each pixel point in the target human body sub-region is traversed in turn, and it is determined whether the hue of the pixel point falls within the above range. If so, the pixel point is determined to be a skin color pixel point, and the skin color pixel counter SkinNum is incremented by 1; if not, the pixel point is determined to be a non-skin color pixel point, and the non-skin color pixel counter UnSkinNum is incremented by 1. The statistics for the torso and the legs are completed separately in this manner, and the skin color ratio of the torso and that of the legs are calculated by the following formula:
ratio=SkinNum/(SkinNum+UnSkinNum)
wherein ratio is the skin color ratio of the target human body sub-region, SkinNum is the number of skin color pixel points in the target human body sub-region, and UnSkinNum is the number of non-skin color pixel points in the target human body sub-region.
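The traversal-and-count procedure reduces to the ratio formula above. In this sketch, `region_pixels` and `is_skin` are hypothetical stand-ins for the sub-region's pixel points and the hue test:

```python
def skin_color_ratio(region_pixels, is_skin):
    """ratio = SkinNum / (SkinNum + UnSkinNum) for one target sub-region."""
    skin_num = sum(1 for p in region_pixels if is_skin(p))
    total = len(region_pixels)          # SkinNum + UnSkinNum
    return skin_num / total if total else 0.0
```

The same function is called once for the torso pixels and once for the leg pixels.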
When determining whether the target human body sub-region needs filling processing, it is determined whether its skin color ratio satisfies a preset condition. To this end, two values, ratio1 and ratio2, are preset in the present application, each ranging from 0 to 1. It should be noted that ratio1 and ratio2 can be adjusted as needed, and ratio1 ≤ ratio2.
The preset conditions are as follows:
when ratio < ratio1, the target human body sub-region is determined to be a normal region;
when ratio1 ≤ ratio ≤ ratio2, the target human body sub-region is determined to be a sexy region, which also belongs to the normal region;
when ratio > ratio2, the target human body sub-region is determined to be an exposed region.
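The three-way classification can be sketched as follows; the default threshold values 0.3 and 0.7 are illustrative, not values given in the text:

```python
def classify_region(ratio, ratio1=0.3, ratio2=0.7):
    """Map a skin color ratio to a region label; ratio1 <= ratio2, both in
    [0, 1] and adjustable (0.3 and 0.7 are illustrative defaults)."""
    if ratio < ratio1:
        return "normal"
    if ratio <= ratio2:
        return "sexy"     # still treated as belonging to the normal region
    return "exposed"
```

Only the "exposed" outcome triggers the filling processing of the next step.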
And step 802, when the skin color proportion of the target human body subregion is larger than a preset value, filling the target human body subregion.
In the application, when the skin color proportion of the target human body subregion is determined to be larger than the preset value, the target human body subregion is determined to be an exposed region, and in order to ensure the privacy of a user, filling processing is carried out on the target human body subregion.
In the present application, the filling processing uses a single color so that the filled image is irreversible. First, the pixel points of the exposed region are determined based on the human body region segmentation mask; then the pixel values of the R, G, B three channels are set to one uniform fixed value, filling the region as a mosaic and blocking the exposed region.
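A minimal sketch of the same-value fill, assuming the region is given as a boolean mask over the image; the fill value 128 is an illustrative choice, not one from the text:

```python
import numpy as np

FILL_VALUE = 128  # illustrative fixed value shared by the R, G, B channels

def fill_region(image, region_mask, value=FILL_VALUE):
    """Overwrite R, G and B with one fixed value inside the masked region,
    so the original pixels cannot be recovered from the result."""
    out = image.copy()                    # (H, W, 3) uint8 image
    out[region_mask] = (value, value, value)
    return out

img = np.zeros((2, 2, 3), dtype=np.uint8)
m = np.zeros((2, 2), dtype=bool)
m[0, 0] = True                            # fill only the top-left pixel
filled = fill_region(img, m)
```

Because every masked pixel receives the identical constant, no information about the original region survives in the output.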
In one possible implementation, the segmentation algorithm has an inaccuracy problem at the edges of the detected region. If the edge pixels belong to an exposed human privacy region, coding of the exposed privacy region may fail; alternatively, when the proportion of exposed human skin is low, only the privacy region may be exposed, and the algorithm may judge the image to be normal in that case. Either situation leaks the privacy of the user, so for these two problems a human body privacy region recognition algorithm, i.e., a privacy recognition model, is introduced in the present application.
In the present application, for any frame of image in the collected video stream, a privacy region of the human body region in the image is detected through the trained privacy recognition model, where the trained privacy recognition model is generated by training with images carrying privacy region labels;
after a privacy area in the human body area is detected, determining and outputting position information of the privacy area;
and filling the privacy area according to the position information of the privacy area.
It should be noted that the privacy recognition model is trained based on a convolutional neural network and gives, in real time, the corresponding confidence r and the position information (x_left, y_up) and (x_right, y_down). After a privacy region is detected, its position information is recorded, and filling processing is performed in real time in the same way as the filling processing of the exposed region, which is not described again here.
In the present application, real-time coding and filling processing is performed on the detected privacy regions and the identified exposed regions, preventing live video exposure and preventing attacks or leakage during video stream transmission or storage, thereby protecting user privacy and platform security.
In one possible implementation, since the video stream includes a plurality of images, some of which do not contain a portrait, inputting all images of the video stream into the human body segmentation model and/or the privacy recognition model would increase their computation load. Therefore, before an image is input into the human body segmentation model and the privacy recognition model, it is detected whether the image contains a portrait; if so, the image is input into the human body segmentation model and/or the privacy recognition model, and otherwise the image is directly saved and/or transmitted. As shown in fig. 11, the overall method flowchart of video processing provided in the embodiment of the present application includes the following steps:
step 1100, collecting video stream by a camera;
step 1101, for any frame of image in the collected video stream, inputting the image into the trained human body segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinguishably;
step 1102, determining a skin color proportion of a target human body subregion based on the human body mask image;
step 1103, judging whether the skin color ratio is greater than a preset threshold; if so, executing step 1104, otherwise executing step 1107;
step 1104, performing filling processing on the target human body sub-region;
step 1105, detecting whether the human body region includes a privacy region through the trained privacy recognition model, if yes, executing step 1106, otherwise executing step 1107;
step 1106, performing filling processing on the privacy area;
step 1107, video transmission or save.
It should be noted that step 1101 and step 1105 may be executed simultaneously, or step 1101 may be executed first and then step 1105 may be executed, or step 1105 may be executed first and then step 1101 is executed, that is, it is ensured that both the privacy area and the exposure area in the finally saved or transmitted image are filled, so as to achieve the effect of protecting the privacy of the user.
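Steps 1100-1107 above can be sketched as one orchestration function. Every callable here is a hypothetical stand-in for a trained model or helper described in the text, and the threshold value is illustrative:

```python
def process_frame(frame, contains_person, segment_regions, skin_ratio,
                  detect_privacy, fill, threshold=0.7):
    """Hypothetical per-frame pipeline following the flow of fig. 11."""
    if not contains_person(frame):          # portrait check before the models
        return frame                        # step 1107: save or transmit as-is
    for region in segment_regions(frame):   # steps 1101-1102
        if skin_ratio(frame, region) > threshold:    # step 1103
            frame = fill(frame, region)     # step 1104: fill exposed region
    for region in detect_privacy(frame):    # step 1105
        frame = fill(frame, region)         # step 1106: fill privacy region
    return frame                            # step 1107: transmit or save
```

As the text notes, the segmentation branch and the privacy-detection branch may also run in parallel or in the opposite order; the only requirement is that both kinds of region are filled before the frame leaves the device.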
Based on the same inventive concept, the embodiment of the present invention further provides a video processing apparatus, and as the apparatus corresponds to the video processing method of the embodiment of the present invention, and the principle of the apparatus for solving the problem is similar to the principle of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated. As shown in fig. 12, a block diagram of a video processing apparatus according to an embodiment of the present application is provided, where the apparatus includes: a segmentation module 1200, a determination module 1201, and a processing module 1202, wherein:
a segmentation module 1200, configured to segment the human body region in an image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinguishably, where the trained human body region segmentation model is obtained by training with images carrying human body sub-region labels;
a determining module 1201, configured to determine a skin color ratio of a target human body subregion based on the human body mask image;
and the processing module 1202 is configured to perform filling processing on the target human body sub-region when the skin color ratio of the target human body sub-region is greater than a preset value.
In a possible implementation manner, the apparatus further includes a detection module 1203, where:
a detection module 1203, configured to detect a privacy area of a human body area in an image through a trained privacy recognition model for any frame of image in an acquired video stream, and output location information of the privacy area, where the trained privacy recognition model is generated by image training with a privacy area label;
the processing module 1202 is further configured to perform filling processing on the privacy region according to the location information of the privacy region.
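As a concrete illustration of this fill step, the sketch below assumes the privacy region's location information is an axis-aligned bounding box `(x, y, width, height)` in pixel coordinates; the function name, the nested-list image representation, and the default gray value are hypothetical choices for illustration, not details taken from the embodiment.

```python
def fill_privacy_region(image, location, gray=128):
    """Fill a detected privacy region with a flat gray, assuming its
    location information is a bounding box (x, y, width, height).

    image: H x W list of [r, g, b] lists, mutated in place.
    """
    x, y, w, h = location
    for row in image[y:y + h]:        # rows covered by the box
        for pixel in row[x:x + w]:    # columns covered by the box
            pixel[0] = pixel[1] = pixel[2] = gray
    return image
```

Python slicing keeps the fill safely clipped to the image bounds even when the box extends past an edge.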
In a possible implementation manner, the determining module 1201 is specifically configured to:
determining the target human body sub-region based on the human body mask image;
determining the number of skin color pixels in the target human body sub-region;
and determining the skin color ratio of the target human body sub-region according to the number of skin color pixels and the total number of pixels in the target human body sub-region.
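A minimal sketch of the ratio computation described above, assuming the sub-region's pixels have already been collected from the mask image; the function name, the `is_skin` predicate parameter, and the toy predicate used in the demonstration are all hypothetical, not part of the embodiment.

```python
def skin_color_ratio(sub_region_pixels, is_skin):
    """Fraction of skin-colored pixels among all pixels of a sub-region.

    sub_region_pixels: iterable of (r, g, b) tuples taken from the
    target human body sub-region; is_skin: per-pixel predicate.
    """
    pixels = list(sub_region_pixels)
    if not pixels:
        return 0.0  # an empty sub-region has no skin to measure
    skin_count = sum(1 for p in pixels if is_skin(p))
    return skin_count / len(pixels)

# Toy predicate: treat "reddish" pixels as skin, purely for demonstration.
demo_is_skin = lambda p: p[0] > p[1] > p[2]

ratio = skin_color_ratio(
    [(200, 150, 120), (10, 200, 30), (210, 160, 130), (0, 0, 255)],
    demo_is_skin)
# 2 of the 4 pixels satisfy the toy predicate, so ratio == 0.5
```

The resulting ratio is then compared against the preset value to decide whether the sub-region is filled.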
In a possible implementation manner, the determining module 1201 determines that a pixel is a skin color pixel as follows:
converting the RGB data of the pixel into YUV data;
for any pixel in the target human body sub-region, determining the hue of the pixel according to the chrominance information in the YUV data;
and when the hue is within a preset range, determining the pixel to be a skin color pixel.
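The chroma-based test can be sketched as follows. The BT.601 full-range conversion coefficients are standard, but the hue range below is purely illustrative — the embodiment's actual preset range is not disclosed.

```python
import math

def rgb_to_yuv(r, g, b):
    """Full-range BT.601 RGB -> YUV (YCbCr) conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0    # Cb (blue chroma)
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0     # Cr (red chroma)
    return y, u, v

def is_skin_pixel(r, g, b, hue_range=(105.0, 150.0)):
    """Classify a pixel as skin when the angle of its chroma vector
    (U-128, V-128), in degrees, falls inside a preset hue range."""
    _, u, v = rgb_to_yuv(r, g, b)
    hue = math.degrees(math.atan2(v - 128.0, u - 128.0)) % 360.0
    return hue_range[0] <= hue <= hue_range[1]
```

With these assumed thresholds, a typical skin tone such as (224, 172, 140) lands at a chroma hue of roughly 131°, while saturated green and blue fall well outside the range.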
In one possible implementation, the processing module 1202 is specifically configured to:
performing the filling in such a manner that the pixel values of the R, G, and B channels take the same value.
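Applied to a binary mask such as the human body mask image, that R = G = B fill can be sketched as below; the nested-list image type and the default gray of 128 are illustrative assumptions.

```python
def fill_region_gray(image, mask, gray=128):
    """Fill masked pixels so that R, G, and B share one value (a neutral gray).

    image: H x W list of [r, g, b] lists, mutated in place;
    mask:  H x W list of 0/1 flags marking the region to fill;
    gray:  the shared channel value (128 is an arbitrary demo choice).
    """
    for image_row, mask_row in zip(image, mask):
        for pixel, flag in zip(image_row, mask_row):
            if flag:
                pixel[0] = pixel[1] = pixel[2] = gray
    return image
```

Because all three channels share one value, the filled area carries no color information at all, which is the point of the privacy fill.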
In one possible implementation, the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the video processing method of the present application.
In one possible implementation, the various aspects of the video processing method provided herein may also be implemented as a program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to perform the steps of the video processing method according to the various exemplary embodiments of the present application described above in this specification.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A video processing method, comprising:

for any frame of image in a captured video stream, segmenting the human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly, wherein the trained human body region segmentation model is trained on images annotated with human body sub-regions;

determining a skin color ratio of a target human body sub-region based on the human body mask image; and

when the skin color ratio of the target human body sub-region is greater than a preset value, performing filling processing on the target human body sub-region.

2. The method according to claim 1, further comprising:

for any frame of image in the captured video stream, detecting a privacy region of the human body region in the image through a trained privacy recognition model, and outputting location information of the privacy region, wherein the trained privacy recognition model is generated by training on images annotated with privacy regions; and

performing filling processing on the privacy region according to the location information of the privacy region.

3. The method according to claim 1, wherein determining the skin color ratio of the target human body sub-region based on the human body mask image comprises:

determining the target human body sub-region based on the human body mask image;

determining the number of skin color pixels in the target human body sub-region; and

determining the skin color ratio of the target human body sub-region according to the number of skin color pixels and the total number of pixels in the target human body sub-region.

4. The method according to claim 3, wherein a pixel is determined to be a skin color pixel by:

converting RGB data into YUV data;

for any pixel in the target human body sub-region, determining the hue of the pixel according to chrominance information in the YUV data; and

when the hue is within a preset range, determining the pixel to be a skin color pixel.

5. The method according to claim 1 or 2, wherein the filling processing comprises:

performing the filling in such a manner that the pixel values of the R, G, and B channels take the same value.

6. A video processing device, configured to execute the video processing method according to any one of claims 1 to 5.

7. A video processing device, comprising a camera and a processor, wherein:

the camera is configured to capture a video stream; and

the processor is configured to, for any frame of image in the captured video stream, segment the human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly; determine a skin color ratio of a target human body sub-region based on the human body mask image; and when the skin color ratio of the target human body sub-region is greater than a preset value, fill the target human body sub-region in such a manner that the pixel values of the red (R), green (G), and blue (B) channels take the same value; wherein the trained human body region segmentation model is trained on images annotated with human body sub-regions.

8. The device according to claim 7, wherein the processor is further configured to:

for any frame of image in the captured video stream, detect a privacy region of the human body region in the image through a trained privacy recognition model, and output location information of the privacy region, wherein the trained privacy recognition model is generated by training on images annotated with privacy regions; and

according to the location information of the privacy region, fill the privacy region in such a manner that the pixel values of the red (R), green (G), and blue (B) channels take the same value.

9. A video processing device, comprising a camera, a processor, and a display, wherein:

the camera is configured to capture a video stream;

the processor is configured to, for any frame of image in the captured video stream, segment the human body region in the image through a trained human body region segmentation model to obtain a human body mask image in which human body sub-regions are displayed distinctly; determine a skin color ratio of a target human body sub-region based on the human body mask image; and when the skin color ratio of the target human body sub-region is greater than a preset value, fill the target human body sub-region in such a manner that the pixel values of the red (R), green (G), and blue (B) channels take the same value; wherein the trained human body region segmentation model is trained on images annotated with human body sub-regions; and

the display is configured to display the filled image.

10. The device according to claim 9, wherein the processor is further configured to:

for any frame of image in the captured video stream, detect a privacy region of the human body region in the image through a trained privacy recognition model, and output location information of the privacy region, wherein the trained privacy recognition model is generated by training on images annotated with privacy regions; and

according to the location information of the privacy region, fill the privacy region in such a manner that the pixel values of the red (R), green (G), and blue (B) channels take the same value.
CN202010942908.XA 2020-09-09 2020-09-09 Video processing method and device Pending CN113486693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010942908.XA CN113486693A (en) 2020-09-09 2020-09-09 Video processing method and device


Publications (1)

Publication Number Publication Date
CN113486693A true CN113486693A (en) 2021-10-08

Family

ID=77932619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010942908.XA Pending CN113486693A (en) 2020-09-09 2020-09-09 Video processing method and device

Country Status (1)

Country Link
CN (1) CN113486693A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114268813A (en) * 2021-12-31 2022-04-01 广州方硅信息技术有限公司 Live broadcast picture adjusting method and device and computer equipment
CN115830690A (en) * 2022-12-22 2023-03-21 云控智行(上海)汽车科技有限公司 A traffic image desensitization method and device
CN116049865A (en) * 2021-10-27 2023-05-02 海信集团控股股份有限公司 A privacy protection method, device, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1215618A2 (en) * 2000-12-14 2002-06-19 Eastman Kodak Company Image processing method for detecting human figures in a digital image
JP2017188771A (en) * 2016-04-05 2017-10-12 株式会社東芝 Image capturing system and image and video display method
CN107333055A (en) * 2017-06-12 2017-11-07 美的集团股份有限公司 Control method, control device, Intelligent mirror and computer-readable recording medium
CN109993212A (en) * 2019-03-06 2019-07-09 西安电子科技大学 Location privacy protection method in social network image sharing, social network platform
CN110334571A (en) * 2019-04-03 2019-10-15 复旦大学 A privacy protection method of millimeter wave image human body based on convolutional neural network
CN111640119A (en) * 2020-04-09 2020-09-08 北京邮电大学 Image processing method, processing device, electronic equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wang Jingzhong et al., "Research on a Filtering Algorithm for Objectionable Web Images Based on Proportion Features", Computer Engineering & Science, vol. 38, no. 3, 31 March 2016, page 515 *
Dong Hongyi (ed.), Deep Learning: PyTorch Object Detection in Practice, Beijing: China Machine Press, 31 March 2020, pages 135-138 *
Lu Ling et al., Image Object Segmentation Methods, Harbin Engineering University Press, 30 November 2016, page 131 *


Similar Documents

Publication Publication Date Title
CN111739027B (en) Image processing method, device, equipment and readable storage medium
US20200344411A1 (en) Context-aware image filtering
TWI556639B (en) Techniques for adding interactive features to videos
CN104866323B (en) Unlocking interface generation method and device and electronic equipment
CN113610720B (en) Video denoising method and device, computer readable medium and electronic device
CN113709519B (en) A method and device for determining the blocked area of live broadcast
CN113645476B (en) Picture processing method and device, electronic equipment and storage medium
CN113486693A (en) Video processing method and device
US20230351604A1 (en) Image cutting method and apparatus, computer device, and storage medium
US20250245879A1 (en) Method of image processing, electronic device and storage medium
CN107665482A (en) Realize the video data real-time processing method and device, computing device of double exposure
CN111107264A (en) Image processing method, image processing device, storage medium and terminal
WO2022088834A1 (en) Dynamic photograph album generation method, server, display terminal and readable storage medium
US20250349044A1 (en) Image processing method, electronic device and storage medium
CN113794831B (en) Video shooting method, device, electronic equipment and medium
CN115348469A (en) Picture display method and device, video processing equipment and storage medium
CN113487497B (en) Image processing method, device and electronic device
CN112165631B (en) Media resource processing method and device, storage medium and electronic equipment
CN110378973B (en) Image information processing method, device, and electronic device
CN112995539B (en) Mobile terminal and image processing method
CN113706371A (en) Special effect checking method and device and electronic equipment
CN117221742B (en) Video processing method, device, equipment and storage medium
CN113450367A (en) Image processing method and device
CN117119316B (en) Image processing method, electronic device and readable storage medium
CN113704548B (en) Application program detection method, device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 266555, No. 218, Bay Road, Qingdao economic and Technological Development Zone, Shandong

Applicant after: Hisense Group Holding Co.,Ltd.

Address before: 218 Qianwangang Road, Qingdao Economic and Technological Development Zone, Shandong Province

Applicant before: QINGDAO HISENSE ELECTRONIC INDUSTRY HOLDING Co.,Ltd.

Country or region before: China

RJ01 Rejection of invention patent application after publication

Application publication date: 20211008