
CN119296558A - Audio and video quality enhancement method, device, equipment and medium based on machine learning - Google Patents

Audio and video quality enhancement method, device, equipment and medium based on machine learning

Info

Publication number
CN119296558A
Authority
CN
China
Prior art keywords
audio
video
data
preset
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411815572.5A
Other languages
Chinese (zh)
Inventor
张海焦
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Haiwei Hengtai Intelligent Technology Co ltd
Original Assignee
Shenzhen Haiwei Hengtai Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Haiwei Hengtai Intelligent Technology Co ltd
Priority to CN202411815572.5A
Publication of CN119296558A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses an audio and video quality enhancement method, device, equipment and medium based on machine learning, relating to the technical field of audio and video quality enhancement, comprising the steps of determining audio and video data to be processed according to initial audio and video information to be processed and preset machine learning parameters if the initial audio and video information to be processed is received; determining audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprises audio processing data and video processing data, determining audio quality enhancement data according to the audio processing data and a preset first optimization rule when the audio and video processing data is the audio processing data, and determining video quality enhancement data according to the video processing data and a preset second optimization rule when the audio and video processing data is the video processing data. The playing effect of the audio and video is improved.

Description

Audio and video quality enhancement method, device, equipment and medium based on machine learning
Technical Field
The application relates to the technical field of audio and video quality enhancement, in particular to an audio and video quality enhancement method, device, equipment and medium based on machine learning.
Background
At present, along with the development of related technologies of audio and video, higher requirements are also put forward on an audio and video processing mode.
The traditional audio and video processing mode directly converts the initial audio and video into a transmittable electrical signal and converts that signal back into playable audio and video when playback is required. This mode has a significant defect: because the initial audio and video are converted directly, any deficiency in the initial audio and video carries through to the final output, so the audio and video playing effect is poor.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present application and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The application mainly aims to provide an audio and video quality enhancement method, device, equipment and medium based on machine learning, aiming at solving the technical problem of poor audio and video playing effect.
In order to achieve the above object, the present application provides a machine learning-based audio/video quality enhancement method, which includes:
If the initial audio and video information to be processed is received, determining the audio and video data to be processed according to the initial audio and video information and preset machine learning parameters;
Determining audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprise audio processing data and video processing data;
Determining audio quality enhancement data according to the audio processing data and a preset first optimization rule when the audio and video processing data are the audio processing data;
and determining video quality enhancement data according to the video processing data and a preset second optimization rule when the audio and video processing data are the video processing data.
In an embodiment, the preset machine learning parameters include a standard video parameter threshold and a standard audio parameter threshold, and the step of determining the audio-video data to be processed according to the initial audio-video information and the preset machine learning parameters includes:
When the initial audio-video information is audio information, detecting whether an audio parameter value in the audio information meets the standard audio parameter threshold value, and when the audio parameter value in the audio information does not meet the standard audio parameter threshold value, taking the audio data in the audio information as audio-video data to be processed;
When the initial audio and video information is video information, detecting whether a video parameter value in the video information meets the standard video parameter threshold, and when the video parameter value in the video information does not meet the standard video parameter threshold, taking video data in the video information as audio and video data to be processed.
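The screening step in this embodiment can be sketched as follows. This is a minimal illustration in Python, not the applicant's implementation. The names `MediaInfo`, `PresetParameters` and `select_data_to_process` are hypothetical, and the reading that an audio parameter "meets" the threshold when a noise level is at or below a limit, while a video parameter "meets" it when a contrast value is at or above a limit, is an assumption made only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaInfo:
    """Hypothetical container for initial audio/video information."""
    kind: str               # "audio" or "video"
    parameter_value: float  # assumed: a noise level (audio) or a contrast value (video)
    data: List[float]       # stand-in for the raw samples or frames


@dataclass
class PresetParameters:
    """Hypothetical preset machine learning parameters."""
    standard_audio_threshold: float  # assumed: highest acceptable noise level
    standard_video_threshold: float  # assumed: lowest acceptable contrast


def select_data_to_process(info: MediaInfo, preset: PresetParameters) -> Optional[List[float]]:
    """Return the data that needs enhancement, or None if the standard is met."""
    if info.kind == "audio":
        meets_standard = info.parameter_value <= preset.standard_audio_threshold
    elif info.kind == "video":
        meets_standard = info.parameter_value >= preset.standard_video_threshold
    else:
        raise ValueError(f"unknown media kind: {info.kind!r}")
    return None if meets_standard else info.data


preset = PresetParameters(standard_audio_threshold=0.2, standard_video_threshold=0.5)
noisy_audio = MediaInfo(kind="audio", parameter_value=0.6, data=[0.1, -0.3, 0.2])
clear_video = MediaInfo(kind="video", parameter_value=0.8, data=[1.0, 2.0])

print(select_data_to_process(noisy_audio, preset))  # the audio samples are selected
print(select_data_to_process(clear_video, preset))  # None: no enhancement needed
```

Returning `None` for data that already meet the standard keeps the "no enhancement needed" case explicit for the caller.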
In an embodiment, the preset machine learning database includes an audio coding transmission library and a video coding transmission library, and the step of determining audio/video processing data according to the audio/video data to be processed and the preset machine learning database includes:
Determining audio data in the audio-video data to be processed, determining a first coding transmission mode corresponding to the data characteristics of the audio data in the audio coding transmission library, and binding the first coding transmission mode and the audio data as audio processing data;
Determining video data in the audio and video data to be processed, determining a second coding transmission mode corresponding to the data characteristics of the video data in the video coding transmission library, and binding the second coding transmission mode and the video data as video processing data.
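As a rough sketch of the lookup and binding described above, each coding transmission library can be modelled as a mapping from a data characteristic to a coding transmission mode. The library contents, the characteristic labels, the mode names and the `ProcessingData` type below are all hypothetical placeholders; the application does not specify how characteristics are extracted or which modes exist.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProcessingData:
    """Data bound to the coding transmission mode chosen for it."""
    mode: str
    data: List[float]


# Hypothetical libraries: data characteristic -> coding transmission mode.
AUDIO_CODING_LIBRARY: Dict[str, str] = {"speech": "audio_mode_a", "music": "audio_mode_b"}
VIDEO_CODING_LIBRARY: Dict[str, str] = {"static_scene": "video_mode_a", "fast_motion": "video_mode_b"}


def bind_mode(characteristic: str, data: List[float], library: Dict[str, str]) -> ProcessingData:
    """Look up the mode for a data characteristic and bind it to the data."""
    try:
        mode = library[characteristic]
    except KeyError:
        raise LookupError(f"no coding transmission mode for {characteristic!r}") from None
    return ProcessingData(mode=mode, data=data)


audio_processing_data = bind_mode("speech", [0.1, 0.2], AUDIO_CODING_LIBRARY)
video_processing_data = bind_mode("fast_motion", [5.0, 6.0], VIDEO_CODING_LIBRARY)
print(audio_processing_data.mode, video_processing_data.mode)
```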
In an embodiment, the preset first optimization rule includes an audio noise reduction rule, an audio equalization rule, and an echo cancellation rule, and the step of determining audio quality enhancement data according to the audio processing data and the preset first optimization rule includes:
When the noise value of the audio processing data is larger than a preset audio noise reduction threshold value, noise reduction is carried out on the audio processing data based on the audio noise reduction rule to obtain noise reduction audio data, and when the balance value of the audio processing data is larger than a preset audio balance threshold value, audio balance is carried out on the noise reduction audio data based on the audio balance rule to obtain balance audio data;
when the echo value of the audio processing data is larger than a preset echo cancellation threshold value, echo cancellation is carried out on the balanced audio data based on the echo cancellation rule to obtain audio quality enhancement data;
And when the echo value of the audio processing data is smaller than or equal to a preset echo cancellation threshold value, taking the balanced audio data as audio quality enhancement data.
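The threshold-gated audio chain above can be illustrated with the following Python sketch. It assumes that a stage whose threshold is not exceeded simply passes its input through, which the text states only for echo cancellation. The stage functions are placeholders that record that they ran; `AudioLevels`, `tag` and `enhance_audio` are hypothetical names, and no real noise reduction, equalization or echo cancellation algorithm is implied.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[str]  # stand-in for audio data: a list recording the applied stages
Stage = Callable[[Audio], Audio]


@dataclass
class AudioLevels:
    """Used both for measured values and for the preset thresholds."""
    noise: float
    balance: float
    echo: float


def tag(label: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    return lambda audio: audio + [label]


def enhance_audio(audio: Audio, measured: AudioLevels, thresholds: AudioLevels,
                  denoise: Stage, equalize: Stage, cancel_echo: Stage) -> Audio:
    """Apply noise reduction, equalization and echo cancellation in order."""
    result = audio
    if measured.noise > thresholds.noise:
        result = denoise(result)      # noise reduction audio data
    if measured.balance > thresholds.balance:
        result = equalize(result)     # balanced audio data
    if measured.echo > thresholds.echo:
        result = cancel_echo(result)  # echo removed
    return result                     # audio quality enhancement data


thresholds = AudioLevels(noise=0.5, balance=0.5, echo=0.5)
measured = AudioLevels(noise=0.7, balance=0.1, echo=0.9)
result = enhance_audio(["raw"], measured, thresholds,
                       tag("denoised"), tag("equalized"), tag("echo_cancelled"))
print(result)  # ['raw', 'denoised', 'echo_cancelled']
```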
In an embodiment, the preset second optimization rule includes a video denoising rule, a contrast enhancement rule, a sharpening rule, and a resolution processing rule, and the step of determining video quality enhancement data according to the video processing data and the preset second optimization rule includes:
When the noise value of the video processing data is larger than a preset video denoising threshold value, denoising the video processing data based on the video denoising rule to obtain denoising video data;
When the contrast of the video processing data is smaller than or equal to a preset contrast enhancement threshold value, the noise reduction video data is used as video quality enhancement data;
and when the contrast of the video processing data is larger than a preset contrast enhancement threshold value, carrying out contrast adjustment on the noise reduction video data based on the contrast enhancement rule to obtain contrast adjustment data, and when the sharpening value of the contrast adjustment data is larger than a preset sharpening processing threshold value and the resolution of the contrast adjustment data is larger than a preset resolution processing threshold value, carrying out processing on the contrast adjustment data based on the sharpening processing rule and the resolution processing rule to obtain video quality enhancement data.
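The branching of the video chain can be sketched in Python as below, following the comparisons as this embodiment states them. Two points are assumptions: data whose noise value does not exceed the denoising threshold pass through unchanged, and contrast adjustment data that fail the sharpening/resolution condition are returned as they are. `Frame`, `VideoThresholds`, `mark` and `enhance_video` are hypothetical names, and the stages are placeholders that only record that they ran.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical video processing data with pre-measured quality values."""
    noise: float
    contrast: float
    sharpening: float
    resolution: float
    applied: Tuple[str, ...] = ()


@dataclass(frozen=True)
class VideoThresholds:
    noise: float
    contrast: float
    sharpening: float
    resolution: float


Stage = Callable[[Frame], Frame]


def mark(label: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    return lambda frame: replace(frame, applied=frame.applied + (label,))


def enhance_video(frame: Frame, thr: VideoThresholds, denoise: Stage,
                  adjust_contrast: Stage, sharpen: Stage, rescale: Stage) -> Frame:
    # Values of the incoming video processing data decide the first two branches.
    denoised = denoise(frame) if frame.noise > thr.noise else frame
    if frame.contrast <= thr.contrast:
        return denoised  # denoised data are the video quality enhancement data
    adjusted = adjust_contrast(denoised)
    # Values of the contrast adjustment data decide the last branch.
    if adjusted.sharpening > thr.sharpening and adjusted.resolution > thr.resolution:
        return rescale(sharpen(adjusted))
    return adjusted  # assumed fallback; the text does not cover this case


thr = VideoThresholds(noise=0.5, contrast=0.5, sharpening=0.5, resolution=0.5)
stages = (mark("denoise"), mark("contrast"), mark("sharpen"), mark("rescale"))
full = enhance_video(Frame(0.8, 0.9, 0.7, 0.6), thr, *stages)
print(full.applied)  # ('denoise', 'contrast', 'sharpen', 'rescale')
```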
In an embodiment, the machine learning-based audio/video quality enhancement method further includes:
Acquiring an initial training audio and video, and determining a playing score corresponding to the initial training audio and video, wherein the playing score comprises a first score value for direct playback;
determining an audio noise reduction threshold value, an audio equalization threshold value and an echo cancellation threshold value corresponding to the audio data under the first grading value, and taking the audio noise reduction threshold value, the audio equalization threshold value and the echo cancellation threshold value as the basis for starting a preset first optimization rule;
Determining a video denoising threshold value, a contrast enhancement threshold value, a sharpening threshold value and a resolution processing threshold value corresponding to the video data under the first grading value, and taking the video denoising threshold value, the contrast enhancement threshold value, the sharpening threshold value and the resolution processing threshold value as the basis for starting a preset second optimization rule.
In an embodiment, the playing score further includes a second score value for playback after transmission, and the step of determining the playing score corresponding to the initial training audio and video includes:
determining a first corresponding relation between the audio coding mode and the data characteristics of the audio data under the second scoring value, and taking the first corresponding relation as an audio coding transmission library in a preset machine learning database;
and determining a second corresponding relation between the video coding mode and the data characteristics of the video data under the second grading value, and taking the second corresponding relation as a video coding transmission library in a preset machine learning database.
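One way such a correspondence could be derived is sketched below. The application does not say how the relation is computed from the second score values, so the selection rule used here (for each data characteristic, keep the coding mode with the highest post-transmission score) is purely an assumption, as are the record layout, the characteristic labels and the mode names.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical training record:
# (data characteristic, coding mode, second score value after transmission)
TrainingRecord = Tuple[str, str, float]


def build_coding_library(records: Iterable[TrainingRecord]) -> Dict[str, str]:
    """Map each data characteristic to its best-scoring coding mode (assumed rule)."""
    best: Dict[str, Tuple[str, float]] = {}
    for characteristic, mode, score in records:
        current = best.get(characteristic)
        if current is None or score > current[1]:
            best[characteristic] = (mode, score)
    return {characteristic: mode for characteristic, (mode, _) in best.items()}


audio_records = [
    ("speech", "audio_mode_a", 3.1),
    ("speech", "audio_mode_b", 4.2),
    ("music", "audio_mode_a", 4.5),
    ("music", "audio_mode_b", 3.9),
]
audio_library = build_coding_library(audio_records)
print(audio_library)  # {'speech': 'audio_mode_b', 'music': 'audio_mode_a'}
```

The same function could be applied to video records to obtain a video coding transmission library.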
In addition, in order to achieve the above object, the present application also provides an audio and video quality enhancement device based on machine learning, where the audio and video quality enhancement device based on machine learning includes:
The information acquisition module is used for determining the audio and video data to be processed according to the initial audio and video information and preset machine learning parameters if the initial audio and video information to be processed is received;
the type judging module is used for determining audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprise audio processing data and video processing data;
The first enhancement module is used for determining audio quality enhancement data according to the audio processing data and a preset first optimization rule when the audio and video processing data are the audio processing data;
and the second enhancement module is used for determining video quality enhancement data according to the video processing data and a preset second optimization rule when the audio and video processing data are the video processing data.
In addition, in order to achieve the above object, the application also provides audio and video quality enhancement equipment based on machine learning, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is configured to implement the steps of the audio and video quality enhancement method based on machine learning as described above.
In addition, in order to achieve the above object, the present application also proposes a medium, which is a computer-readable storage medium, on which a computer program is stored, the computer program implementing the steps of the machine learning-based audio/video quality enhancement method as described above when being executed by a processor.
The embodiment of the application provides an audio and video quality enhancement method based on machine learning. If initial audio and video information to be processed is received, the audio and video data to be processed are determined according to the initial audio and video information and preset machine learning parameters. Audio and video processing data, comprising audio processing data and video processing data, are determined according to the audio and video data to be processed and a preset machine learning database. When the audio and video processing data are the audio processing data, audio quality enhancement data are determined according to the audio processing data and a preset first optimization rule; when the audio and video processing data are the video processing data, video quality enhancement data are determined according to the video processing data and a preset second optimization rule. This avoids the phenomenon in which the initial audio and video are directly converted into a transmittable electrical signal and a poor initial playing effect degrades the final playing effect. Because the quality enhancement data are determined by the first and/or second optimization rule according to the type of the processing data, the playing effect of the audio and video is improved.
Drawings
FIG. 1 is a flowchart of a first embodiment of the machine learning-based audio/video quality enhancement method of the present application;
FIG. 2 is a flow chart of an implementation of the machine learning-based audio/video quality enhancement method of the present application;
FIG. 3 is a flowchart of a second embodiment of the machine learning-based audio/video quality enhancement method of the present application;
FIG. 4 is a schematic block diagram of the machine learning-based audio/video quality enhancement device of the present application;
FIG. 5 is a schematic diagram of a hardware operating environment related to a device in the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
For a better understanding of the technical solution of the present application, the following detailed description will be given with reference to the drawings and the specific embodiments.
The existing audio and video processing mode directly converts the initial audio and video into a transmittable electrical signal and converts that signal back into playable audio and video when playback is required. In other words, audio and video captured in different environments or at different points in time are processed directly into electrical signals, transmitted to the area where they are to be played, and then processed into playable audio and video signals. The whole process does not consider how the actual environment, the point in time, or internal signals affect the audio and video at acquisition. As a result, a poor initial playing effect degrades the final playing effect: for example, video shot under strong light has low contrast and its content is unclear, and audio captured in a noisy environment carries heavy noise.
Therefore, in view of the defects of the above audio and video processing scheme, the audio and video quality enhancement method based on machine learning is provided. The method determines the audio and video data to be processed from the initial audio and video information and preset machine learning parameters, then determines audio and video processing data from the audio and video data to be processed and a preset machine learning database, and finally, depending on the type of the processing data, determines audio quality enhancement data from the audio processing data and a preset first optimization rule and/or determines video quality enhancement data from the video processing data and a preset second optimization rule. This avoids the phenomenon in which the initial audio and video are directly converted into a transmittable electrical signal and a poor initial playing effect degrades the final playing effect, so that the playing effect of the audio and video is improved.
It should be noted that, the execution body of the present embodiment may be a computing service device having functions of data processing, network communication, and program running, such as a tablet computer, a personal computer, a mobile phone, or a device, a controller, or the like capable of implementing the above functions. The present embodiment and the following embodiments will be described below with reference to a controller as an example.
Based on this, an embodiment of the present application provides an audio and video quality enhancement method based on machine learning, and referring to fig. 1, fig. 1 is a flowchart of a first embodiment of the audio and video quality enhancement method based on machine learning according to the present application.
Referring to fig. 1, the present application provides a machine learning-based audio/video quality enhancement method, and in a first embodiment of the machine learning-based audio/video quality enhancement method, the machine learning-based audio/video quality enhancement method includes:
Step S10, if initial audio and video information to be processed is received, determining audio and video data to be processed according to the initial audio and video information and preset machine learning parameters;
Step S20, determining audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprise audio processing data and video processing data;
In this embodiment, when the initial audio/video information to be processed is received, it is evaluated to determine whether it needs processing. The evaluation checks, based on the preset machine learning parameters, whether the initial audio/video information requires quality enhancement; if it does, the initial audio/video information is taken as the audio/video data to be processed. The audio/video data to be processed refers to audio/video data that are to undergo quality enhancement. The initial audio/video information refers to audio/video data as initially captured or obtained. The preset machine learning parameters refer to parameters, determined based on machine learning, that indicate when quality enhancement is required: for example, video data whose contrast is smaller than a certain value are determined to require quality enhancement, and audio data whose noise is larger than a certain value are determined to require quality enhancement.
After the audio and video data requiring quality enhancement are determined, audio and video processing data are determined according to the audio and video data to be processed and a preset machine learning database. The audio and video processing data comprise audio processing data and video processing data. The preset machine learning database is a database determined based on machine learning and includes at least an audio coding transmission library and a video coding transmission library. The audio coding transmission library defines how audio is coded and transmitted, and the video coding transmission library defines how video is coded and transmitted. The audio processing data are audio data after coding transmission processing, and the video processing data are video data after coding transmission processing. This avoids adverse effects arising in coding and transmission and so safeguards the playing effect of the audio and video.
Step S30, when the audio and video processing data are the audio processing data, determining audio quality enhancement data according to the audio processing data and a preset first optimization rule;
Step S40, determining video quality enhancement data according to the video processing data and a preset second optimization rule when the audio/video processing data is the video processing data.
In this embodiment, after the audio and video processing data is determined, the audio processing data and the video processing data in the audio and video processing data are respectively processed, that is, the audio processing data and the video processing data are respectively processed according to a certain processing rule, so as to obtain respective corresponding quality enhancement data, for example, the audio quality enhancement data is determined based on the audio processing data and a preset first optimization rule, and/or the video quality enhancement data is determined based on the video processing data and a preset second optimization rule. The audio quality enhancement data refers to audio data after processing the data, the video quality enhancement data refers to video data after processing the data, the preset first optimization rule refers to rules for processing the audio processing data, such as an audio noise reduction rule, an audio equalization rule and an echo cancellation rule, the preset second optimization rule refers to rules for processing the video processing data, such as a video denoising rule, a contrast enhancement rule, a sharpening processing rule and a resolution processing rule, and finally the audio and video data after quality enhancement can be obtained so as to play based on the audio and video data, thereby ensuring the playing effect of the audio and video data. 
It should be noted that the order of the rule-based processing and the processing based on the machine learning database may be exchanged. That is, the audio and video quality enhancement data may first be obtained directly from the audio and video data to be processed, and then encoded and transmitted according to the machine learning database to serve as the final audio and video quality enhancement data for subsequent playback. In this way, the playing effect of the audio and video data is improved from the aspects of both the data itself and its coding and transmission.
In an embodiment, referring to fig. 2, fig. 2 is a schematic flow chart of an implementation of the machine learning-based audio/video quality enhancement method of the present application. After an audio/video to be processed is obtained, it is determined whether the audio/video requires processing (quality enhancement processing); that is, the step of determining audio/video data to be processed according to the initial audio/video information and the preset machine learning parameters is performed. When processing is required, the transmission is improved and optimized first: audio/video processing data are determined based on the preset machine learning database, so that transmission is improved through better coding transmission. Once the transmission has been optimized, the data themselves are optimized: the audio/video data are processed based on their respective rules to remedy their defects, thereby improving the audio/video playing effect.
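The overall flow, including the option of exchanging the two processing stages, can be outlined as a small orchestration function. This Python sketch is illustrative only: `run_pipeline`, its parameters and the `coding_first` flag are hypothetical, the individual steps are passed in as placeholder callables, and returning unscreened data unchanged is an assumption about the case where no enhancement is required.

```python
from typing import Callable, List

Media = List[str]  # stand-in for audio/video data: a log of applied steps
Step = Callable[[Media], Media]


def run_pipeline(media: Media,
                 needs_enhancement: Callable[[Media], bool],
                 apply_coding_mode: Step,
                 apply_rules: Step,
                 coding_first: bool = True) -> Media:
    """Screen the input, then run the two processing stages in the chosen order."""
    if not needs_enhancement(media):
        return media  # assumed: data meeting the standard are left unchanged
    if coding_first:
        steps = [apply_coding_mode, apply_rules]  # transmission first, then the data
    else:
        steps = [apply_rules, apply_coding_mode]  # exchanged order
    for step in steps:
        media = step(media)
    return media


always = lambda media: True
coding = lambda media: media + ["coding_mode_bound"]
rules = lambda media: media + ["rules_applied"]

print(run_pipeline(["raw"], always, coding, rules))
# ['raw', 'coding_mode_bound', 'rules_applied']
print(run_pipeline(["raw"], always, coding, rules, coding_first=False))
# ['raw', 'rules_applied', 'coding_mode_bound']
```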
In this embodiment, an audio and video quality enhancement method based on machine learning is provided. If initial audio and video information to be processed is received, the audio and video data to be processed are determined according to the initial audio and video information and preset machine learning parameters. Audio and video processing data, comprising audio processing data and video processing data, are determined according to the audio and video data to be processed and a preset machine learning database. When the audio and video processing data are the audio processing data, audio quality enhancement data are determined according to the audio processing data and a preset first optimization rule; when the audio and video processing data are the video processing data, video quality enhancement data are determined according to the video processing data and a preset second optimization rule. This avoids the phenomenon in which the initial audio and video are directly converted into a transmittable electrical signal and a poor initial playing effect degrades the final playing effect. Because the quality enhancement data are determined by the first and/or second optimization rule according to the type of the processing data, the playing effect of the audio and video is improved.
Further, based on the first embodiment of the present application, a second embodiment of the machine learning based audio/video quality enhancement method of the present application is provided, in this embodiment, the preset machine learning parameters include a standard video parameter threshold and a standard audio parameter threshold, and the step of determining the audio/video data to be processed according to the initial audio/video information and the preset machine learning parameters includes:
Step S11, when the initial audio-video information is audio information, detecting whether an audio parameter value in the audio information meets the standard audio parameter threshold value, and when the audio parameter value in the audio information does not meet the standard audio parameter threshold value, taking the audio data in the audio information as audio-video data to be processed;
Step S12, when the initial audio/video information is video information, detecting whether a video parameter value in the video information meets the standard video parameter threshold, and when the video parameter value in the video information does not meet the standard video parameter threshold, taking the video data in the video information as the audio/video data to be processed.
In this embodiment, whether quality enhancement processing is required is determined based on the standard video parameter threshold and the standard audio parameter threshold in the preset machine learning parameters. The standard video parameter threshold is a user-defined parameter of video data for which quality enhancement processing is not required, and may be the contrast, illumination intensity, etc. of the video data; for example, when the contrast is outside a first range and the illumination intensity is outside a first value, it is determined that quality enhancement processing is required for the video data. The standard audio parameter threshold is a user-defined parameter of audio data for which quality enhancement processing is not required, and may be the noise, echo, etc. of the audio data; for example, when the noise is outside a second value or the echo is outside a third value, it is determined that quality enhancement processing is required for the audio data.
At this time, the initial audio-video information is divided into audio information containing audio data and video information containing video data for processing. When the initial audio-video information is audio information, it is detected whether the audio parameter value in the audio information meets the standard audio parameter threshold, so that when the audio parameter value does not meet the standard audio parameter threshold, the audio data in the audio information is used as the audio-video data to be processed. The audio parameter value refers to a relevant parameter of the audio data, such as a noise value or an echo value, and can be determined directly from the audio parameters; not meeting the standard audio parameter threshold means that the noise is outside the second value or the echo is outside the third value. When the initial audio-video information is video information, it is detected whether the video parameter value in the video information meets the standard video parameter threshold, so that when the video parameter value does not meet the standard video parameter threshold, the video data in the video information is used as the audio-video data to be processed. The video parameter value refers to a relevant parameter of the video data, such as contrast or illumination intensity, and can be determined directly from the video parameters; not meeting the standard video parameter threshold means that the contrast is outside the first range and the illumination intensity is outside the first value. In this way, it is determined whether quality enhancement processing is required, thereby ensuring the effect of subsequent audio and video playing.
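By way of illustration only, the determination of steps S11 and S12 may be sketched as follows. The dictionary layout, the function names `outside` and `select_data_to_process`, and all numeric ranges are assumptions introduced for this sketch and are not specified by the application; the "or" for audio and the "and" for video simply follow the wording of the examples above.

```python
# Hypothetical preset machine learning parameters: each entry is the
# acceptable (low, high) range for one parameter. All numbers are illustrative.
STANDARD_AUDIO = {"noise": (0.0, 0.2), "echo": (0.0, 0.1)}
STANDARD_VIDEO = {"contrast": (0.4, 0.8), "illumination": (50.0, 200.0)}


def outside(value, allowed):
    """Return True when value lies outside the closed range allowed = (low, high)."""
    low, high = allowed
    return value < low or value > high


def select_data_to_process(info):
    """Return the payload that needs quality enhancement, or None (steps S11/S12)."""
    params = info["params"]
    if info["kind"] == "audio":
        # S11: noise OR echo outside its standard range, as worded in the text.
        fails = (outside(params["noise"], STANDARD_AUDIO["noise"])
                 or outside(params["echo"], STANDARD_AUDIO["echo"]))
    elif info["kind"] == "video":
        # S12: contrast AND illumination outside their ranges, as worded in the text.
        fails = (outside(params["contrast"], STANDARD_VIDEO["contrast"])
                 and outside(params["illumination"], STANDARD_VIDEO["illumination"]))
    else:
        raise ValueError(f"unknown kind: {info['kind']!r}")
    return info["data"] if fails else None
```

In this sketch, information whose parameters all lie within the standard ranges yields `None`, meaning that no quality enhancement processing is triggered for it.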
Further, the preset machine learning database includes an audio coding transmission library and a video coding transmission library, and the step of determining audio/video processing data according to the audio/video data to be processed and the preset machine learning database includes:
Step S21, determining audio data in the audio-video data to be processed, determining a first coding transmission mode corresponding to the data characteristics of the audio data in the audio coding transmission library, and binding the first coding transmission mode and the audio data as audio processing data;
Step S22, determining video data in the audio-video data to be processed, determining a second coding transmission mode corresponding to the data characteristics of the video data in the video coding transmission library, and binding the second coding transmission mode and the video data as video processing data.
In this embodiment, when the audio-video data to be processed is to be encoded and transmitted, the audio data (data related to audio playing) and the video data in the audio-video data to be processed are determined respectively. On the one hand, a first coding transmission mode corresponding to the data characteristics of the audio data is determined in the audio coding transmission library, and the first coding transmission mode and the audio data are bound as audio processing data. The audio coding transmission library refers to a database defining the different coding and transmission modes corresponding to audio characteristics, the first coding transmission mode refers to the coding and transmission mode matched with the data characteristics of the audio data, and the audio characteristics refer to characteristics of the audio data, such as its wavelength and frequency. On the other hand, a second coding transmission mode corresponding to the data characteristics of the video data is determined in the video coding transmission library, and the second coding transmission mode and the video data are bound as video processing data. The video coding transmission library refers to a database defining the different coding and transmission modes corresponding to video characteristics, the second coding transmission mode refers to the coding and transmission mode matched with the data characteristics of the video data, and the video characteristics refer to characteristics of the video data. In this way, the audio-video data can be encoded and transmitted in a manner matched to its audio characteristics and video characteristics, so that coding and transmission are improved.
In an embodiment, on one hand, proper coding settings and algorithms are selected for audio and video features of different audio and video data, so as to improve compression efficiency and transmission quality of the audio and video data. Redundancy and distortion of audio and video data can be reduced through coding optimization, and data transmission efficiency and viewing experience are improved. On the other hand, advanced transmission technology and protocols such as real-time transport protocol (RTP) and real-time streaming protocol (RTSP) are utilized to improve the transmission speed and stability of audio and video data. The delay and the packet loss rate of the audio and video data can be reduced through transmission optimization, and the real-time performance and the reliability of the data are improved.
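By way of illustration only, the lookup and binding of steps S21 and S22 may be sketched as a keyed table. The feature labels (`low_band`, `fast_motion`, etc.), the codec labels, the class `CodingTransmissionMode` and the function `bind_mode` are hypothetical and serve only to show the shape of the data; how a data characteristic is reduced to a lookup key is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingTransmissionMode:
    codec: str      # hypothetical codec label, not a real codec name
    transport: str  # protocol label, e.g. "RTP" or "RTSP" as named in the text


# Hypothetical coding transmission libraries: data feature -> matched mode.
AUDIO_LIBRARY = {
    "low_band": CodingTransmissionMode("audio_codec_a", "RTP"),
    "high_band": CodingTransmissionMode("audio_codec_b", "RTP"),
}
VIDEO_LIBRARY = {
    "static_scene": CodingTransmissionMode("video_codec_a", "RTSP"),
    "fast_motion": CodingTransmissionMode("video_codec_b", "RTP"),
}


def bind_mode(data, feature_key, library):
    """Look up the mode matched to feature_key and bind it to data (S21/S22)."""
    if feature_key not in library:
        raise LookupError(f"no coding transmission mode for feature {feature_key!r}")
    return {"mode": library[feature_key], "data": data}


# Audio processing data and video processing data are produced the same way,
# each from its own library.
audio_processing_data = bind_mode(b"pcm", "low_band", AUDIO_LIBRARY)
video_processing_data = bind_mode(b"yuv", "fast_motion", VIDEO_LIBRARY)
```

Keeping the two libraries separate mirrors the text, in which the audio coding transmission library and the video coding transmission library are distinct parts of the preset machine learning database.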
Further, based on the first embodiment and/or the second embodiment of the present application, a third embodiment of the machine learning-based audio/video quality enhancement method of the present application is provided. In this embodiment, the preset first optimization rule includes an audio noise reduction rule, an audio equalization rule, and an echo cancellation rule, and the step of determining audio quality enhancement data according to the audio processing data and the preset first optimization rule includes:
Step S31, when the noise value of the audio processing data is larger than a preset audio noise reduction threshold, performing noise reduction on the audio processing data based on the audio noise reduction rule to obtain noise-reduced audio data, and when the equalization value of the audio processing data is larger than a preset audio equalization threshold, performing audio equalization on the noise-reduced audio data based on the audio equalization rule to obtain equalized audio data;
Step S32, when the echo value of the audio processing data is larger than a preset echo cancellation threshold, performing echo cancellation on the equalized audio data based on the echo cancellation rule to obtain audio quality enhancement data;
Step S33, when the echo value of the audio processing data is smaller than or equal to the preset echo cancellation threshold, taking the equalized audio data as audio quality enhancement data.
In this embodiment, the audio processing data is processed according to the audio noise reduction rule, the audio equalization rule and the echo cancellation rule. The audio noise reduction rule refers to a defined rule for reducing the noise of audio data and may be a common audio noise reduction method; the audio equalization rule refers to a defined rule for equalizing audio data and may be a common audio equalization method; and the echo cancellation rule refers to a defined rule for cancelling echo in audio data and may be a common echo cancellation method. When the noise value of the audio processing data is greater than the preset audio noise reduction threshold, noise reduction is performed on the audio processing data based on the audio noise reduction rule to obtain noise-reduced audio data, and when the equalization value of the audio processing data is greater than the preset audio equalization threshold, audio equalization is performed on the noise-reduced audio data based on the audio equalization rule to obtain equalized audio data. The preset audio noise reduction threshold is the optimal noise value of the audio processing data defined by the user based on machine learning; if the noise value exceeds the preset audio noise reduction threshold A, it is determined that noise reduction is required, and the noise-reduced audio data is the audio data after noise reduction. The preset audio equalization threshold is the optimal equalization value of the audio processing data defined by the user based on machine learning; if the equalization value exceeds the preset audio equalization threshold B, it is determined that audio equalization is required, and the equalized audio data is the audio data after noise reduction and audio equalization. It is worth noting that the noise value and the echo value of the audio processing data can be obtained in a conventional manner, which is not limited herein.
Finally, when the echo value of the audio processing data is smaller than or equal to the preset echo cancellation threshold, that is, when there is no echo influence, the equalized audio data can be used directly as the audio quality enhancement data. Otherwise, when the echo value of the audio processing data is larger than the preset echo cancellation threshold, echo cancellation is performed on the equalized audio data based on the echo cancellation rule to obtain the audio quality enhancement data. The preset echo cancellation threshold refers to the optimal echo value defined by the user based on machine learning; if the echo value exceeds the preset echo cancellation threshold C, it is determined that echo cancellation is required, and the audio quality enhancement data refers to the audio data after noise reduction, audio equalization and echo cancellation. It should be noted that the order of the three processing flows may be exchanged, which is not limited herein.
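By way of illustration only, the flow of steps S31 to S33 may be sketched as a chain of conditional stages. The function `enhance_audio`, the dictionary keys and the tracing stand-ins for the three rules are assumptions of this sketch; the actual rules may be any common noise reduction, equalization and echo cancellation methods. The sketch also assumes that a stage whose threshold is not exceeded passes its input through unchanged, and uses "equalization" for the value the text also calls the balance value.

```python
def enhance_audio(audio, metrics, thresholds, rules):
    """Apply the first optimization rule stage by stage (steps S31 to S33).

    metrics and thresholds are dicts keyed by "noise", "equalization" and
    "echo"; rules maps "denoise", "equalize" and "cancel_echo" to callables.
    """
    out = audio
    if metrics["noise"] > thresholds["noise"]:                # S31, threshold A
        out = rules["denoise"](out)
    if metrics["equalization"] > thresholds["equalization"]:  # S31, threshold B
        out = rules["equalize"](out)
    if metrics["echo"] > thresholds["echo"]:                  # S32, threshold C
        out = rules["cancel_echo"](out)
    return out                                                # S33 otherwise


def tracing_rule(name):
    """Stand-in for a real rule: records which stage ran instead of filtering."""
    return lambda steps: steps + [name]


RULES = {name: tracing_rule(name) for name in ("denoise", "equalize", "cancel_echo")}
THRESHOLDS = {"noise": 0.2, "equalization": 0.3, "echo": 0.1}  # illustrative values

applied = enhance_audio([], {"noise": 0.5, "equalization": 0.1, "echo": 0.3},
                        THRESHOLDS, RULES)
```

With the illustrative values shown, only the noise and echo thresholds are exceeded, so `applied` records the noise reduction and echo cancellation stages and skips equalization.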
In an embodiment, audio noise reduction removes noise from the audio by means of a filter or other methods, so that the noise of the audio data falls within the range required by the preset audio noise reduction threshold; the noise may come from environmental noise, device noise, etc. Through noise reduction processing, the audio becomes clearer and purer. Audio equalization refers to adjusting the frequency response characteristic of the audio so that the audio is balanced across different frequencies; it can improve the tone quality and listening experience of the audio, making the audio more pleasant and comfortable. In audio communication, echo is a common problem; echo cancellation technology removes echo components from the audio by analyzing the echo characteristics in the audio signal and cancelling or weakening them, which improves the clarity and call quality of the audio. The above processing thereby ensures the audio playing effect.
In an embodiment, the preset second optimization rule includes a video denoising rule, a contrast enhancement rule, a sharpening rule, and a resolution processing rule, and the step of determining video quality enhancement data according to the video processing data and the preset second optimization rule includes:
Step S41, when the noise value of the video processing data is larger than a preset video denoising threshold, denoising the video processing data based on the video denoising rule to obtain denoised video data;
Step S42, when the contrast of the video processing data is smaller than or equal to a preset contrast enhancement threshold, taking the denoised video data as video quality enhancement data;
Step S43, when the contrast of the video processing data is greater than the preset contrast enhancement threshold, performing contrast adjustment on the denoised video data based on the contrast enhancement rule to obtain contrast adjustment data, and when the sharpening value of the contrast adjustment data is greater than a preset sharpening threshold and the resolution of the contrast adjustment data is greater than a preset resolution processing threshold, processing the contrast adjustment data based on the sharpening rule and the resolution processing rule to obtain video quality enhancement data.
In this embodiment, the video processing data is processed according to the video denoising rule, the contrast enhancement rule, the sharpening rule and the resolution processing rule. The video denoising rule refers to a defined rule for denoising video data and may be a common video denoising method; the contrast enhancement rule refers to a defined rule for adjusting the contrast of video data and may be a common contrast adjustment method; the sharpening rule refers to a defined rule for sharpening video data and may be a common sharpening method; and the resolution processing rule refers to a defined rule for performing resolution processing on video data and may be a common resolution processing method. When the noise value of the video processing data is larger than the preset video denoising threshold, the video processing data is denoised based on the video denoising rule to obtain denoised video data, and subsequent processing is performed based on the contrast of the video processing data. The preset video denoising threshold is the optimal video noise value defined by the user based on machine learning; if the noise value exceeds the preset video denoising threshold D, it is determined that denoising is required, and the denoised video data is the video data after denoising. Here, contrast processing is defined to be performed after denoising, which ensures the accuracy of the data as a whole and thus improves the accuracy of the subsequent video data processing.
At this time, if the contrast of the video processing data is less than or equal to the preset contrast enhancement threshold, the denoised video data is used as the video quality enhancement data. The preset contrast enhancement threshold is the optimal contrast value defined by the user based on machine learning; if the contrast exceeds the preset contrast enhancement threshold E, it is determined that contrast adjustment, typically contrast enhancement, is required.
On the other hand, when the contrast of the video processing data is greater than the preset contrast enhancement threshold, contrast adjustment is performed on the denoised video data based on the contrast enhancement rule to obtain contrast adjustment data, and when the sharpening value of the contrast adjustment data is greater than the preset sharpening threshold and the resolution of the contrast adjustment data is greater than the preset resolution processing threshold, the contrast adjustment data is processed based on the sharpening rule and the resolution processing rule to obtain the video quality enhancement data. The preset sharpening threshold refers to the optimal sharpening value defined by the user based on machine learning; if the preset sharpening threshold is exceeded, it is determined that sharpening is required, and if the preset resolution processing threshold is exceeded, it is determined that resolution processing is required. The video quality enhancement data is then the video data after denoising, contrast adjustment, sharpening and resolution processing. It is worth noting that the noise value and the contrast of the video processing data, as well as the sharpening value and the resolution, can be obtained in a conventional manner, which is not limited herein.
In an embodiment, the video denoising rule may use a filter or other methods to reduce noise interference in the image. Noise is usually caused by electromagnetic interference, signal attenuation and other factors during signal transmission; with an advanced denoising algorithm and filter, the noise can be effectively suppressed and removed, so that the image becomes clearer and finer. The contrast enhancement rule refers to adjusting the brightness and contrast of the video to make the image more vivid; contrast refers to the degree of brightness variation in the image, and optimizing the contrast increases the dynamic range of the image and improves the presentation of its detail and texture. The sharpening rule makes the image sharper by increasing the contrast of edges and details; sharpening helps to highlight the detailed parts of the image, so that the picture is clearer. The resolution processing rule enhances low-resolution video to a higher resolution by means of advanced technologies such as deep learning; such super-resolution processing can recover more image detail and improve the clarity of the video. It should be noted that, before or after the video denoising rule is applied (and before the contrast is judged), the video data may be defogged, that is, hazy video may be processed to reduce the interference of haze and improve the clarity of the image; a defogging algorithm usually analyzes the fog component in the image and removes or weakens it, thereby recovering a clear image. Motion compensation processing may also be performed on the video data: for video with rapid motion, the motion track is analyzed by an algorithm and the image blur is corrected, so that the blur and distortion produced by motion are compensated and the picture becomes smoother and clearer. The above processing thereby ensures the video playing effect.
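By way of illustration only, the branching of steps S41 to S43 may be sketched as follows. The function `enhance_video`, the `measure` callback used to obtain the sharpening value and resolution of the contrast adjustment data, the tracing stand-ins for the four rules, and all numeric values are assumptions of this sketch and are not specified by the application. As in the steps above, a contrast at or below the threshold ends processing after denoising.

```python
def enhance_video(video, metrics, thresholds, rules, measure):
    """Apply the second optimization rule (steps S41 to S43).

    metrics describes the incoming video processing data; measure(data)
    returns the sharpening value and resolution of the contrast adjustment data.
    """
    out = video
    if metrics["noise"] > thresholds["noise"]:         # S41, threshold D
        out = rules["denoise"](out)
    if metrics["contrast"] <= thresholds["contrast"]:  # S42: stop here
        return out
    out = rules["adjust_contrast"](out)                # S43, threshold E exceeded
    adjusted = measure(out)
    if (adjusted["sharpening"] > thresholds["sharpening"]
            and adjusted["resolution"] > thresholds["resolution"]):
        out = rules["process_resolution"](rules["sharpen"](out))
    return out


def tracing_rule(name):
    """Stand-in for a real rule: records which stage ran instead of filtering."""
    return lambda steps: steps + [name]


RULES = {name: tracing_rule(name)
         for name in ("denoise", "adjust_contrast", "sharpen", "process_resolution")}
THRESHOLDS = {"noise": 0.2, "contrast": 0.5, "sharpening": 0.5, "resolution": 720}


def fixed_measure(_data):
    """Illustrative measurement of the contrast adjustment data."""
    return {"sharpening": 0.9, "resolution": 1080}


full = enhance_video([], {"noise": 0.4, "contrast": 0.7}, THRESHOLDS, RULES, fixed_measure)
early = enhance_video([], {"noise": 0.4, "contrast": 0.3}, THRESHOLDS, RULES, fixed_measure)
```

Here `full` passes through all four stages, whereas `early` contains only the denoising stage because its contrast does not exceed the contrast enhancement threshold.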
Further, based on the first embodiment, the second embodiment, and/or the third embodiment of the present application, a fourth embodiment of the machine learning-based audio/video quality enhancement method of the present application is provided, in this embodiment, referring to fig. 3, fig. 3 is a schematic flow diagram of the second embodiment of the machine learning-based audio/video quality enhancement method of the present application, where the machine learning-based audio/video quality enhancement method further includes:
Step S50, acquiring initial training audios and videos, and determining play scores corresponding to the initial training audios and videos, wherein the play scores comprise first score values for direct play;
Step S60, determining an audio noise reduction threshold value, an audio equalization threshold value and an echo cancellation threshold value corresponding to the audio data under the first grading value, and taking the audio noise reduction threshold value, the audio equalization threshold value and the echo cancellation threshold value as the basis for starting a preset first optimization rule;
Step S70, determining a video denoising threshold value, a contrast enhancement threshold value, a sharpening threshold value and a resolution processing threshold value corresponding to the video data under the first grading value, and taking the video denoising threshold value, the contrast enhancement threshold value, the sharpening threshold value and the resolution processing threshold value as the basis for starting a preset second optimization rule.
In this embodiment, in the machine learning stage, the basis for deciding whether subsequent audio and video processing is needed, and how to process the audio and video, is determined from a plurality of initial training audios and videos. The initial training audios and videos refer to a plurality of different pieces of audio and video data, or to a plurality of versions obtained by processing one piece of audio and video data, for example by applying different degrees of noise reduction, contrast, compiling and the like. The score given by a user to each piece of audio and video data and the score given to it by the user's controller are then collected, and the two score values are combined in different proportions to obtain the final play score. The play score includes the first score value for direct play, that is, the score obtained when the audio and video is played directly without transmission, from which the influence of the data itself on the playing effect can be determined.
The audio data and the video data are then processed in a targeted manner. For audio data, the audio noise reduction threshold, the audio equalization threshold and the echo cancellation threshold corresponding to the audio data under the first score value are determined and used as the basis for starting the preset first optimization rule; that is, the audio noise reduction threshold, the audio equalization threshold and the echo cancellation threshold corresponding to the audio data with the highest first score value are determined, so that it can subsequently be decided whether the preset first optimization rule needs to be started to optimize the audio data. For video data, the video denoising threshold, the contrast enhancement threshold, the sharpening threshold and the resolution processing threshold corresponding to the video data under the first score value are determined and used as the basis for starting the preset second optimization rule; that is, these four thresholds are each determined from the video data with the highest first score value, so that it can subsequently be decided whether the preset second optimization rule needs to be started to optimize the video data. Quality enhancement processing is then performed on the audio and video data based on these thresholds, so as to ensure the audio and video playing effect.
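By way of illustration only, combining the two score values "based on different proportions" may be read as a weighted sum. The function name `play_score` and the default weight are assumptions of this sketch; the application does not state the actual proportions.

```python
def play_score(user_score, controller_score, user_weight=0.6):
    """Combine the user's score and the controller's score into one play score.

    user_weight is a hypothetical proportion in [0, 1]; the controller's
    score receives the remaining proportion.
    """
    if not 0.0 <= user_weight <= 1.0:
        raise ValueError("user_weight must lie in [0, 1]")
    return user_weight * user_score + (1.0 - user_weight) * controller_score


# Equal proportions: the play score is the plain average of the two scores.
equal_mix = play_score(8.0, 6.0, user_weight=0.5)
```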
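By way of illustration only, steps S60 and S70 may be sketched as selecting the training sample with the highest first score value and adopting its measured parameters as thresholds. The sample layout, the key names and the function `derive_thresholds` are assumptions of this sketch; the application does not specify how the thresholds are computed from the scores.

```python
AUDIO_KEYS = ("noise", "equalization", "echo")
VIDEO_KEYS = ("noise", "contrast", "sharpening", "resolution")


def derive_thresholds(samples, keys):
    """Take the parameters of the sample with the highest first score value.

    Each sample is a dict with "first_score" (direct-play score) and
    "params" (measured parameter values); keys selects which parameters
    become thresholds.
    """
    if not samples:
        raise ValueError("at least one training sample is required")
    best = max(samples, key=lambda sample: sample["first_score"])
    return {key: best["params"][key] for key in keys}


# Illustrative training audio samples with made-up scores and parameters.
training_audio = [
    {"first_score": 6.5, "params": {"noise": 0.30, "equalization": 0.40, "echo": 0.20}},
    {"first_score": 9.0, "params": {"noise": 0.10, "equalization": 0.25, "echo": 0.05}},
    {"first_score": 7.8, "params": {"noise": 0.15, "equalization": 0.30, "echo": 0.10}},
]
audio_thresholds = derive_thresholds(training_audio, AUDIO_KEYS)
```

The same function applies to video samples by passing `VIDEO_KEYS`, which yields the four thresholds that serve as the basis for starting the preset second optimization rule.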
Further, the playing score further includes a second score value played after transmission, and after the step of determining the playing score corresponding to the initial training audio and video, the method includes:
Step S51, determining a first corresponding relation between the audio coding mode and the data characteristics of the audio data under the second score value, and taking the first corresponding relation as the audio coding transmission library in the preset machine learning database;
Step S52, determining a second corresponding relation between the video coding mode and the data characteristics of the video data under the second score value, and taking the second corresponding relation as the video coding transmission library in the preset machine learning database.
In this embodiment, in addition to the influence of the data itself on the playing effect, the influence of transmission and coding on audio/video playing is also considered. The play score further includes a second score value for play after transmission. For audio data, the first corresponding relation between the audio coding mode and the data characteristics of the audio data under the second score value is determined and used as the audio coding transmission library in the preset machine learning database; that is, the first corresponding relation is determined under the highest second score value, so that the audio coding mode and transmission mode uniquely corresponding to the data characteristics of different audio data can be determined, and the coding and transmission mode can be selected accordingly in subsequent selection to reduce the influence of coding and transmission on audio playing. For video data, the second corresponding relation between the video coding mode and the data characteristics of the video data under the second score value is determined and used as the video coding transmission library in the preset machine learning database; that is, the second corresponding relation is determined under the highest second score value, so that the video coding mode and transmission mode uniquely corresponding to the data characteristics of different video data can be determined, and the coding and transmission mode can be selected accordingly in subsequent selection to reduce the influence of coding and transmission on video playing.
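By way of illustration only, steps S51 and S52 may be sketched as keeping, for each data characteristic, the coding transmission mode that achieved the highest second score value. The record layout, the feature and mode labels and the function `build_library` are assumptions of this sketch.

```python
def build_library(records):
    """Build a coding transmission library from scored training records.

    records is an iterable of (feature, mode, second_score) tuples, where
    second_score is the score for play after transmission. For each feature,
    the mode with the highest second score is kept, so every feature maps
    to exactly one mode.
    """
    best = {}
    for feature, mode, second_score in records:
        if feature not in best or second_score > best[feature][1]:
            best[feature] = (mode, second_score)
    return {feature: mode for feature, (mode, _score) in best.items()}


# Illustrative audio training records with made-up labels and scores.
audio_records = [
    ("low_band", "mode_a", 6.0),
    ("low_band", "mode_b", 8.5),
    ("high_band", "mode_a", 7.0),
]
audio_coding_transmission_library = build_library(audio_records)
```

Because only the best-scoring mode is retained per characteristic, the resulting table gives the unique correspondence described above; when two modes tie, this sketch keeps the first one encountered.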
It should be noted that the foregoing examples are only for understanding the present application, and do not constitute a limitation of the machine learning-based audio/video quality enhancement method of the present application, and that many forms of simple transformation based on this technical concept are within the scope of the present application.
The application also provides an audio and video quality enhancement device based on machine learning, referring to fig. 4, the audio and video quality enhancement device based on machine learning comprises:
The information acquisition module 10 is configured to determine audio and video data to be processed according to the initial audio and video information and preset machine learning parameters if the initial audio and video information to be processed is received;
The type judging module 20 is configured to determine audio and video processing data according to the audio and video data to be processed and a preset machine learning database, where the audio and video processing data includes audio processing data and video processing data;
The first obstacle avoidance module 30 is configured to determine, when the audio and video processing data is the audio processing data, audio quality enhancement data according to the audio processing data and a preset first optimization rule;
and the second obstacle avoidance module 40 is configured to determine video quality enhancement data according to the video processing data and a preset second optimization rule when the audio and video processing data is the video processing data.
The audio and video quality enhancement device based on machine learning provided by the application can solve the technical problem of poor audio and video playing effect by adopting the audio and video quality enhancement method based on machine learning in the embodiment. Compared with the prior art, the audio and video quality enhancement device based on machine learning has the same beneficial effects as the audio and video quality enhancement method based on machine learning provided by the embodiment, and other technical features in the audio and video quality enhancement device based on machine learning are the same as the features disclosed by the method of the embodiment, and are not repeated herein.
The application provides an audio and video quality enhancement device (which can be a part of a cleaning robot) based on machine learning, comprising at least one processor and a memory in communication with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the audio and video quality enhancement method based on machine learning in the first embodiment. It should be noted that other devices shared with the cleaning robot, such as a power supply, may also be present in the machine learning-based audio/video quality enhancement device, and will not be described here.
Referring now to fig. 5, a schematic diagram of a controller suitable for implementing embodiments of the present application is shown. The controller in the embodiment of the present application may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player) or a car-mounted terminal (e.g., a car navigation terminal), and a fixed terminal such as a digital TV or a desktop computer. The controller shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in fig. 5, the controller may include a processing device 1001 (e.g., a central processing unit, a graphics processor, etc.) that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the controller. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. In general, the following devices may be connected to the I/O interface 1006: input devices 1007 such as a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 1008 including a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1003 including a magnetic tape, hard disk, etc.; and communication devices 1009. The communication device 1009 may allow the controller to communicate with other devices wirelessly or by wire to exchange data. While a controller having various devices is shown in the figure, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1009, installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the functions defined in the methods of the embodiments of the present application are performed.
The controller provided by the present application adopts the machine learning-based audio and video quality enhancement method of the above embodiments and can therefore solve the technical problem of poor audio and video playback. Compared with the prior art, the beneficial effects of the controller are the same as those of the machine learning-based audio and video quality enhancement method provided by the above embodiments, and its other technical features are the same as those disclosed for the method of the previous embodiment; they are not repeated here.
It is to be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The present application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon for performing the machine learning-based audio-video quality enhancement method in the above-described embodiments.
The computer-readable storage medium provided by the present application may be, for example, a USB flash drive; more generally, it may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, the computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to electrical wire, optical fiber cable, RF (radio frequency), or any suitable combination of the foregoing.
The computer readable storage medium may be included in the controller or may exist alone without being assembled into the controller.
The computer-readable storage medium carries one or more programs that, when executed by the controller, cause the controller to:
if initial audio and video information to be processed is received, determine the audio and video data to be processed according to the initial audio and video information and preset machine learning parameters;
determine audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprises audio processing data and video processing data;
when the audio and video processing data is the audio processing data, determine audio quality enhancement data according to the audio processing data and a preset first optimization rule;
and when the audio and video processing data is the video processing data, determine video quality enhancement data according to the video processing data and a preset second optimization rule.
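The four program steps above can be illustrated with a minimal Python sketch. It is offered only as one possible reading, not as the applicant's implementation. All names in it (`MediaInfo`, `STANDARD_THRESHOLDS`, `ENCODING_LIBRARY`, `select_pending`, `bind_encoding`, `enhance`) are hypothetical, the numeric thresholds are invented for illustration, and it assumes that a parameter value "meets" its standard threshold when it is greater than or equal to that threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class MediaInfo:
    kind: str        # "audio" or "video"
    quality: float   # hypothetical single parameter value measured on the input
    payload: bytes   # raw media data


# Hypothetical stand-ins for the "preset machine learning parameters".
STANDARD_THRESHOLDS = {"audio": 0.8, "video": 0.7}

# Hypothetical stand-in for the "preset machine learning database".
ENCODING_LIBRARY = {"audio": "audio-mode-1", "video": "video-mode-1"}

Rule = Callable[[bytes], bytes]


def select_pending(info: MediaInfo) -> Optional[MediaInfo]:
    """Step 1: keep the input only if it fails its standard threshold.

    Assumption: "fails" means the value is below the threshold.
    """
    if info.quality >= STANDARD_THRESHOLDS[info.kind]:
        return None
    return info


def bind_encoding(info: MediaInfo) -> Tuple[MediaInfo, str]:
    """Step 2: bind an encoding/transmission mode looked up by media type."""
    return info, ENCODING_LIBRARY[info.kind]


def enhance(info: MediaInfo, first_rule: Rule, second_rule: Rule) -> bytes:
    """Steps 3 and 4: apply the first rule to audio, the second rule to video."""
    pending = select_pending(info)
    if pending is None:
        # Input already meets the standard; return it unchanged.
        return info.payload
    media, _mode = bind_encoding(pending)
    rule = first_rule if media.kind == "audio" else second_rule
    return rule(media.payload)
```

In this sketch the two optimization rules are passed in as plain callables, so the dispatch logic stays independent of whatever noise reduction, equalization, or sharpening a real system would apply.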
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The readable storage medium provided by the present application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the machine learning-based audio and video quality enhancement method, and can therefore solve the technical problem of poor audio and video playback. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the present application are the same as those of the machine learning-based audio and video quality enhancement method provided by the above embodiments, and are not repeated here.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the machine learning based audio/video quality enhancement method as described above.
The computer program product provided by the application can solve the technical problem of poor playing effect of audio and video. Compared with the prior art, the beneficial effects of the computer program product provided by the application are the same as those of the machine learning-based audio/video quality enhancement method provided by the embodiment, and are not repeated here.
The foregoing describes only some embodiments of the present application and is not intended to limit its patent scope. Any equivalent structural change made using the description and the accompanying drawings under the technical concept of the present application, and any direct or indirect application in other related technical fields, falls within the patent protection scope of the present application.

Claims (10)

1. A method for enhancing audio and video quality based on machine learning, characterized in that the method comprises:
if initial audio and video information to be processed is received, determining audio and video data to be processed according to the initial audio and video information and preset machine learning parameters;
determining audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprises audio processing data and video processing data;
when the audio and video processing data is the audio processing data, determining audio quality enhancement data according to the audio processing data and a preset first optimization rule; and
when the audio and video processing data is the video processing data, determining video quality enhancement data according to the video processing data and a preset second optimization rule.

2. The method for enhancing audio and video quality based on machine learning according to claim 1, wherein the preset machine learning parameters comprise a standard video parameter threshold and a standard audio parameter threshold, and the step of determining the audio and video data to be processed according to the initial audio and video information and the preset machine learning parameters comprises:
when the initial audio and video information is audio information, detecting whether an audio parameter value in the audio information meets the standard audio parameter threshold, and when the audio parameter value does not meet the standard audio parameter threshold, using the audio data in the audio information as the audio and video data to be processed; and
when the initial audio and video information is video information, detecting whether a video parameter value in the video information meets the standard video parameter threshold, and when the video parameter value does not meet the standard video parameter threshold, using the video data in the video information as the audio and video data to be processed.

3. The method for enhancing audio and video quality based on machine learning according to claim 1, wherein the preset machine learning database comprises an audio encoding transmission library and a video encoding transmission library, and the step of determining the audio and video processing data according to the audio and video data to be processed and the preset machine learning database comprises:
determining the audio data in the audio and video data to be processed, determining in the audio encoding transmission library a first encoding transmission mode corresponding to the data features of the audio data, and binding the first encoding transmission mode to the audio data as the audio processing data; and
determining the video data in the audio and video data to be processed, determining in the video encoding transmission library a second encoding transmission mode corresponding to the data features of the video data, and binding the second encoding transmission mode to the video data as the video processing data.

4. The method for enhancing audio and video quality based on machine learning according to claim 1, wherein the preset first optimization rule comprises an audio noise reduction rule, an audio equalization rule, and an echo cancellation rule, and the step of determining the audio quality enhancement data according to the audio processing data and the preset first optimization rule comprises:
when the noise value of the audio processing data is greater than a preset audio noise reduction threshold, denoising the audio processing data based on the audio noise reduction rule to obtain noise-reduced audio data, and when the equalization value of the audio processing data is greater than a preset audio equalization threshold, equalizing the noise-reduced audio data based on the audio equalization rule to obtain equalized audio data;
when the echo value of the audio processing data is greater than a preset echo cancellation threshold, performing echo cancellation on the equalized audio data based on the echo cancellation rule to obtain the audio quality enhancement data; and
when the echo value of the audio processing data is less than or equal to the preset echo cancellation threshold, using the equalized audio data as the audio quality enhancement data.

5. The method for enhancing audio and video quality based on machine learning according to claim 1, wherein the preset second optimization rule comprises a video denoising rule, a contrast enhancement rule, a sharpening processing rule, and a resolution processing rule, and the step of determining the video quality enhancement data according to the video processing data and the preset second optimization rule comprises:
when the noise value of the video processing data is greater than a preset video denoising threshold, denoising the video processing data based on the video denoising rule to obtain noise-reduced video data;
when the contrast of the video processing data is less than or equal to a preset contrast enhancement threshold, using the noise-reduced video data as the video quality enhancement data; and
when the contrast of the video processing data is greater than the preset contrast enhancement threshold, adjusting the contrast of the noise-reduced video data based on the contrast enhancement rule to obtain contrast adjustment data, and when the sharpening value of the contrast adjustment data is greater than a preset sharpening processing threshold and the resolution of the contrast adjustment data is greater than a preset resolution processing threshold, processing the contrast adjustment data based on the sharpening processing rule and the resolution processing rule to obtain the video quality enhancement data.

6. The method for enhancing audio and video quality based on machine learning according to any one of claims 1 to 5, characterized in that the method further comprises:
acquiring initial training audio and video, and determining a playback score corresponding to the initial training audio and video, wherein the playback score comprises a first score value for direct playback;
determining an audio noise reduction threshold, an audio equalization threshold, and an echo cancellation threshold corresponding to the audio data under the first score value, and using the audio noise reduction threshold, the audio equalization threshold, and the echo cancellation threshold as the basis for triggering the preset first optimization rule; and
determining a video denoising threshold, a contrast enhancement threshold, a sharpening processing threshold, and a resolution processing threshold corresponding to the video data under the first score value, and using the video denoising threshold, the contrast enhancement threshold, the sharpening processing threshold, and the resolution processing threshold as the basis for triggering the preset second optimization rule.

7. The method for enhancing audio and video quality based on machine learning according to claim 6, wherein the playback score further comprises a second score value for playback after transmission, and after the step of determining the playback score corresponding to the initial training audio and video, the method further comprises:
determining a first correspondence between audio encoding modes and the data features of audio data under the second score value, and using the first correspondence as the audio encoding transmission library in the preset machine learning database; and
determining a second correspondence between video encoding modes and the data features of video data under the second score value, and using the second correspondence as the video encoding transmission library in the preset machine learning database.

8. A device for enhancing audio and video quality based on machine learning, characterized in that the device comprises:
an information acquisition module, configured to, if initial audio and video information to be processed is received, determine audio and video data to be processed according to the initial audio and video information and preset machine learning parameters;
a type determination module, configured to determine audio and video processing data according to the audio and video data to be processed and a preset machine learning database, wherein the audio and video processing data comprises audio processing data and video processing data;
a first enhancement module, configured to determine audio quality enhancement data according to the audio processing data and a preset first optimization rule when the audio and video processing data is the audio processing data; and
a second enhancement module, configured to determine video quality enhancement data according to the video processing data and a preset second optimization rule when the audio and video processing data is the video processing data.

9. Equipment for enhancing audio and video quality based on machine learning, characterized in that the equipment comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program is configured to implement the steps of the method for enhancing audio and video quality based on machine learning according to any one of claims 1 to 7.

10. A medium, characterized in that the medium is a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method for enhancing audio and video quality based on machine learning according to any one of claims 1 to 7 are implemented.
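For illustration only, the threshold-gated processing order recited in claims 4 and 5 might be sketched in Python as follows. This is a hedged reading, not the claimed implementation. The function names, the dictionary keys, and the `measure` callback are all hypothetical. The claims do not state what happens when a stage's threshold is not exceeded; the sketch assumes that stage is simply skipped and the data passes through unchanged.

```python
from typing import Any, Callable, Dict

Rule = Callable[[Any], Any]
Metrics = Dict[str, float]


def enhance_audio(data: Any, metrics: Metrics, thresholds: Metrics,
                  rules: Dict[str, Rule]) -> Any:
    """Claim 4 order: noise reduction, then equalization, then echo cancellation."""
    out = data
    if metrics["noise"] > thresholds["noise"]:
        out = rules["denoise"](out)
    if metrics["equalization"] > thresholds["equalization"]:
        out = rules["equalize"](out)
    if metrics["echo"] > thresholds["echo"]:
        out = rules["cancel_echo"](out)
    # Echo at or below the threshold: the equalized data is the result.
    return out


def enhance_video(data: Any, measure: Callable[[Any], Metrics],
                  thresholds: Metrics, rules: Dict[str, Rule]) -> Any:
    """Claim 5 order: denoise, contrast adjust, then sharpen and resolution."""
    metrics = measure(data)
    out = data
    if metrics["noise"] > thresholds["noise"]:
        out = rules["denoise"](out)
    if metrics["contrast"] <= thresholds["contrast"]:
        # Contrast at or below the threshold: the denoised data is the result.
        return out
    out = rules["contrast"](out)
    # Sharpening value and resolution are measured on the contrast-adjusted data.
    adjusted = measure(out)
    if (adjusted["sharpness"] > thresholds["sharpness"]
            and adjusted["resolution"] > thresholds["resolution"]):
        out = rules["resolution"](rules["sharpen"](out))
    return out
```

Passing the rules as callables keeps the gating logic separate from any particular signal-processing algorithm, which the claims leave unspecified.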
CN202411815572.5A 2024-12-11 2024-12-11 Audio and video quality enhancement method, device, equipment and medium based on machine learning Pending CN119296558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411815572.5A CN119296558A (en) 2024-12-11 2024-12-11 Audio and video quality enhancement method, device, equipment and medium based on machine learning

Publications (1)

Publication Number Publication Date
CN119296558A true CN119296558A (en) 2025-01-10

Family

ID=94165833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411815572.5A Pending CN119296558A (en) 2024-12-11 2024-12-11 Audio and video quality enhancement method, device, equipment and medium based on machine learning

Country Status (1)

Country Link
CN (1) CN119296558A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104094312A (en) * 2011-12-09 2014-10-08 英特尔公司 Control of video processing algorithms based on measured perceptual quality characteristics
US20170330579A1 (en) * 2015-05-12 2017-11-16 Tencent Technology (Shenzhen) Company Limited Method and device for improving audio processing performance
CN112672157A (en) * 2020-12-22 2021-04-16 广州博冠信息科技有限公司 Video encoding method, device, equipment and storage medium
CN112906463A (en) * 2021-01-15 2021-06-04 上海东普信息科技有限公司 Image-based fire detection method, device, equipment and storage medium
CN113747257A (en) * 2021-08-31 2021-12-03 安徽创变信息科技有限公司 Audio and video data acquisition method and system
CN117459716A (en) * 2023-10-23 2024-01-26 合肥联宝信息技术有限公司 Digital signal testing methods, devices, equipment and storage media
CN118573959A (en) * 2024-05-28 2024-08-30 重庆平可杰信息技术有限公司 Audio and video data acquisition method and system based on 5G terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination