CN104079974B - Audio/video processing method and system

Info

Publication number: CN104079974B
Application number: CN201410277025.6A
Authority: CN
Inventors: 林文富; 黄晓东
Original assignee: Vtron Technologies Ltd
Current assignee: Vtron Group Co Ltd
Priority date: 2014-06-19
Filing date: 2014-06-19
Publication date: 2017-08-25
Anticipated expiration: 2034-06-19
Also published as: CN104079974A

Abstract

The invention discloses an audio/video processing method and system. The method comprises the steps of: obtaining audio parameters corresponding to audio data and video parameters corresponding to video data; determining a pixel loading according to the audio parameters and the video parameters, wherein the pixel loading is the number of pixels in each video data frame that need to be filled with audio data; and filling the audio data into each video data frame of the video data according to the pixel loading. The scheme of the invention ensures synchronization of audio and video while reducing equipment cost and labor cost.

Description

Audio/video processing method and system

Technical field

The present invention relates to the field of multimedia technology, and in particular to an audio/video processing method and system.

Background art

At present, audio and video are usually processed separately in multimedia processing: video undergoes processing such as overlay, scaling and noise reduction, while audio undergoes processing such as filtering and delay. Because video processing involves a much larger data volume than audio processing, the video is generally output later than the audio once both have been processed, and the specific time difference depends on the complexity of the video processing.

To output audio and video synchronously, the usual approach is to use dedicated audio equipment, which is costly, and to adjust the audio output time manually according to the time difference, which is cumbersome to operate and incurs high labor cost.

Summary of the invention

An object of the present invention is to provide an audio/video processing method and system that ensure synchronization of audio and video while reducing equipment cost and labor cost.

The object of the present invention is achieved through the following technical solutions:

An audio/video processing method, comprising the following steps:

obtaining audio parameters corresponding to audio data and video parameters corresponding to video data;

determining, according to the audio parameters and the video parameters, the pixel loading, i.e. the number of pixels in each frame of video data that need to be filled with audio data;

filling the audio data into each video data frame of the video data according to the pixel loading.

An audio/video processing system, comprising:

an acquisition module, for obtaining audio parameters corresponding to audio data and video parameters corresponding to video data;

a processing module, for determining, according to the audio parameters and the video parameters, the pixel loading, i.e. the number of pixels in each frame of video data that need to be filled with audio data;

a filling module, for filling the audio data into each video data frame of the video data according to the pixel loading.

According to the scheme of the present invention described above, after the audio parameters corresponding to the audio data and the video parameters corresponding to the video data are obtained, the pixel loading is determined according to the audio parameters and the video parameters, and the audio data is filled into each video data frame of the video data based on the pixel loading. Because the audio data is filled into the video data frames based on the pixel loading, every video data frame carries the same amount of audio data; that is, the audio data is distributed evenly over the video data frames. During audio/video transmission the audio data and the video data are therefore transmitted synchronously, which ensures good synchronization between them. Moreover, since the scheme can be implemented without complex algorithms, equipment cost and labor cost are also reduced.

Brief description of the drawings

Fig. 1 is a flow diagram of an embodiment of the audio/video processing method of the present invention;

Fig. 2 is a detailed flow diagram of step S103 in Fig. 1 in one embodiment;

Fig. 3 is a schematic diagram of the filling mode for odd frames in one embodiment;

Fig. 4 is a schematic diagram of the filling mode for even frames in one embodiment;

Fig. 5 is a structural diagram of one embodiment of the audio/video processing system of the present invention;

Fig. 6 is a detailed structural diagram of the filling module in Fig. 5 in one embodiment;

Fig. 7 is a structural diagram of another embodiment of the audio/video processing system of the present invention;

Fig. 8 is a structural diagram of a third embodiment of the audio/video processing system of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described herein only explain the invention and do not limit its scope of protection.

In the following description, embodiments of the audio/video processing method of the present invention are described first, followed by embodiments of the audio/video processing system.

Fig. 1 is a flow diagram of an embodiment of the audio/video processing method of the present invention. As shown in Fig. 1, the audio/video processing method in this embodiment comprises the following steps:

Step S101: obtain audio parameters corresponding to audio data and video parameters corresponding to video data;

The audio parameters in this embodiment may include the sample rate, number of channels and sampling bit width of the audio, and the video parameters may include the resolution, frame rate and color depth of the video. Only some of these parameters may be included as needed; for example, the audio parameters may include the sample rate and number of channels of the audio, and the video parameters may include the frame rate of the video;

The audio parameters in this embodiment may also be any parameters from which the data volume of the audio data can be determined, and the video parameters may be any parameters from which the number of frames of the video data can be determined. The data volume so determined may be the data volume per unit time, the data volume of the audio data in an arbitrary time period, or the total data volume of the audio data; the number of frames so determined may be the number of frames per unit time, the number of frames of the video data in an arbitrary time period, or the total number of frames of the video data;

Preferably, however, the audio parameters include the sample rate, number of channels and sampling bit width of the audio, and the video parameters include the resolution, frame rate and color depth of the video. The main consideration is that audio data and video data carry these parameters when transmitted: each audio parameter can be obtained from the audio sampling chip or from the control data stream, and each video parameter can be obtained from the video processor. Obtaining the parameters is therefore real-time and convenient, which improves the efficiency of audio/video processing;

Step S102: determine the pixel loading according to the audio parameters and the video parameters, wherein the pixel loading is the number of pixels in each video data frame that need to be filled with audio data;

The pixel loading for each video frame can be determined according to P = (k × n)/f, where P is the number of pixels in each frame of video data that need to be filled with audio data, k is the sample rate of the audio, n is the number of audio channels, and f is the frame rate of the video. The invention is not limited to this way of determining the pixel loading; for example, it may also be determined by multiplying the ratio of the total data volume of the audio data to the total number of frames of the video data by the ratio of the video color depth to the sampling bit width;
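As a minimal sketch of the formula P = (k × n)/f, the following Python snippet computes the pixel loading for a given sample rate, channel count and frame rate. The function name `pixel_loading` is hypothetical, and returning an exact fraction is an assumption made here so that non-integer results stay visible; the description does not say how a non-integer P would be handled.

```python
from fractions import Fraction


def pixel_loading(sample_rate_hz: int, channels: int, frame_rate_hz: int) -> Fraction:
    """P = (k * n) / f: pixels per video frame that carry audio samples.

    k = audio sample rate, n = number of audio channels, f = video frame rate.
    Returned as an exact fraction (an assumption of this sketch).
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return Fraction(sample_rate_hz * channels, frame_rate_hz)


# 44.1 kHz stereo audio carried by 60 frames-per-second video.
print(pixel_loading(44_100, 2, 60))  # 1470
```

With these example values, each video frame would carry 1470 audio samples, one per pixel.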

Step S103: fill the audio data into each video data frame of the video data according to the pixel loading;

The audio data is filled into the pixels of each video data frame in units of a unit data size, the unit data size being the amount of audio data that one pixel can hold, generally the sampling bit width of the audio;

Once as many pixels in the current video data frame as the pixel loading have been filled with audio data, the remaining audio data is filled into the next video data frame, which likewise has as many pixels as the pixel loading filled with audio data, and so on. In this way every frame of video data has a number of pixels equal to the pixel loading filled with audio data, in order; that is, the audio data is distributed evenly over the video data frames of the video data;
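The even distribution of audio over successive frames can be sketched as a simple chunking of the sample stream. This is only an illustration: the function name `split_per_frame` is hypothetical, the samples are plain integers, and the pixel loading is assumed to be a whole number.

```python
def split_per_frame(samples: list, pixel_loading: int) -> list:
    """Cut the audio sample stream into consecutive chunks of `pixel_loading`
    samples; chunk i is the audio to embed in video frame i."""
    if pixel_loading <= 0:
        raise ValueError("pixel loading must be positive")
    return [samples[i:i + pixel_loading]
            for i in range(0, len(samples), pixel_loading)]


chunks = split_per_frame(list(range(7)), 3)
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Every full chunk has the same length, so every frame carries the same amount of audio; only the tail of the stream can be shorter.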

The sampling bit width of audio is generally smaller than the color depth of video, so a unit of audio data can be treated like a pixel in the video data. For example, if the sampling bit width is 16 bits and the video color depth is 24 bits, 16 bits of audio data can be placed into 24 bits of video data (the data of one pixel), and the remaining 8 bits of video data are simply discarded when the audio data is filled in. Since the pixel is occupied by 16 bits of audio data, the video data of every pixel filled with audio data is lost entirely. The above way of determining the pixel loading for each video frame is likewise based on the case where the sampling bit width is smaller than the video color depth and the remaining bits are discarded.
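A minimal sketch of placing one 16-bit sample into one 24-bit pixel is shown below. The description does not state which bits of the pixel hold the sample, so storing it in the low 16 bits and zeroing the other 8 is an assumption, as are the names `embed_sample`, `extract_sample` and `fill_frame`. Samples are treated as unsigned raw words and a frame as a flat list of pixel words.

```python
def embed_sample(sample: int, sample_bits: int = 16, color_depth: int = 24) -> int:
    """Return the pixel word that replaces the original pixel.

    Assumption: the sample sits in the low `sample_bits` bits; the other
    bits of the pixel are discarded (set to zero)."""
    if sample_bits > color_depth:
        raise ValueError("pixel is too small to hold one audio sample")
    if not 0 <= sample < (1 << sample_bits):
        raise ValueError("sample does not fit in the sampling bit width")
    return sample


def extract_sample(pixel: int, sample_bits: int = 16) -> int:
    """Recover the audio sample from a pixel written by embed_sample."""
    return pixel & ((1 << sample_bits) - 1)


def fill_frame(pixels: list, samples: list) -> list:
    """Overwrite the first len(samples) pixels of a flat frame with audio."""
    if len(samples) > len(pixels):
        raise ValueError("more samples than pixels in the frame")
    out = list(pixels)
    for i, s in enumerate(samples):
        out[i] = embed_sample(s)
    return out


frame = [0xFFFFFF] * 8
mixed = fill_frame(frame, [0x1234, 0xABCD])
print([hex(extract_sample(p)) for p in mixed[:2]])  # ['0x1234', '0xabcd']
```

The audio round-trips unchanged, while the original video values of the two overwritten pixels are gone, which matches the lossless-audio, lossy-video trade-off described above.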

The scheme of this embodiment processes the audio losslessly, embeds the audio data evenly into the frames of the video data, and processes the video data lossily, mainly for the following reasons. In the formula for audio data volume, data volume per second (bits) = sampling frequency × sampling resolution × number of channels. Taking stereo digital audio with a 44.1 kHz sample rate and 16-bit samples as an example, the data volume per second = 44.1k × 16 × 2 = 1411.2 kb, about 1.4 Mbps. The data volume of video is enormous by comparison: taking video data with a resolution of 1920 × 1080, a frame rate of 60 Hz and a color depth of 24 bits as an example, the data volume per second = 1920 × 1080 × 24 × 60 = 2,985,984,000 bits, about 2.78 Gbps when 1 G is counted as 2^30 (about 2.99 Gbps in decimal units). Comparing the two, 1.4 Mbps / 2.78 Gbps ≈ 0.0005, so the audio data per second is roughly five ten-thousandths of the video data. This analysis shows that the audio data of each second is essentially negligible compared with the video data. Moreover, from the standpoint of human perception, distortion of a few pixels in the video data has essentially no visual effect, whereas human hearing is very sensitive and the ear can perceive even a momentary change in amplitude.
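The data-rate comparison above can be reproduced with a few lines of arithmetic. The variable names are illustrative only; the figures are the ones used in the example (44.1 kHz, 16-bit, stereo audio; 1920 × 1080, 24-bit, 60 Hz video).

```python
# Audio: sampling frequency x sampling resolution x number of channels.
audio_bps = 44_100 * 16 * 2
# Video: width x height x color depth x frame rate.
video_bps = 1920 * 1080 * 24 * 60

ratio = audio_bps / video_bps

print(audio_bps)                     # 1411200
print(video_bps)                     # 2985984000
print(round(video_bps / 2**30, 2))   # 2.78 (when 1 G is counted as 2**30)
print(f"{ratio:.2%} of the video data rate")
```

Whichever unit convention is used for the video figure, the audio stream amounts to roughly five ten-thousandths of the video stream, which is why overwriting a small number of pixels per frame is sufficient.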

Accordingly, in the scheme of this embodiment, after the audio parameters corresponding to the audio data and the video parameters corresponding to the video data are obtained, the pixel loading is determined according to the audio parameters and the video parameters, and the audio data is filled into each video data frame of the video data based on the pixel loading. Because the audio data is filled into the video data frames based on the pixel loading, every video data frame carries the same amount of audio data; that is, the audio data is distributed evenly over the video data frames. During audio/video transmission the audio data and the video data are therefore transmitted synchronously, which ensures good synchronization between them. Since the scheme can be implemented without complex algorithms, equipment cost and labor cost are also reduced. In addition, because the audio is processed losslessly and the video data is processed lossily, the scheme achieves mixed transmission of audio and video without increasing bandwidth, which addresses the problem of limited transmission-medium bandwidth.

In addition, the video data of pixels filled with audio data is lost. If audio data were always filled into one particular position within the video data frames, for example always embedded in the first row, the video data at that position would be lost completely and could not be restored. Therefore, in one embodiment, when the audio data is filled into the video data frames of the video data, the positions of the pixels filled with audio data differ between two adjacent video data frames. For example, the audio data may be filled evenly into the video data frames in the order first position, second position, first position, second position, and so on: the audio data for the first video data frame is filled into the first position, the audio data for the second video data frame into the second position, the audio data for the third video data frame into the first position, and so on. The first position and the second position may each be any row, any column, or any block of x rows by y columns in the video data frame, where x and y are integers, provided that the first position and the second position are two different positions. The filling order is of course not limited to first position, second position, first position, second position; it may also be, for example, first position, second position, third position, first position, and so on, or first position, first position, second position, second position, first position, and so on; the possibilities are not listed exhaustively here;

With the filling scheme of this embodiment, a complete video image can still be seen visually when the back end displays the video in real time without any processing of the video data.

However, video data is read row by row in a serpentine order, so if the audio data is filled in the same order there is no need to search for the filling position. Therefore, in one embodiment, as shown in Fig. 2, filling the audio data into each video data frame of the video data according to the pixel loading may include the following steps:

Step S1031: divide the video data into odd frames and even frames in order;

The video data may be divided into odd frames and even frames by setting flags, or according to the order in which the video data frames are read;

Step S1032: when filling the audio data into each video data frame of the video data, if the video data frame currently being filled with audio data is an odd frame, start filling from the first row; if it is an even frame, start filling from the last row;

For an odd frame, filling starts from the first row, as shown in Fig. 3. The first row can hold b × N bits of audio data, where b is the sampling bit width of the audio and N is the number of pixels in a row. This row is filled from left to right. If audio data remains after the first row is full, i.e. N is smaller than the pixel loading, the second row is filled next, from right to left, i.e. in serpentine order. As noted above, audio data is very small relative to video data, so generally one or two rows suffice to hold the audio data that one frame of video data needs to carry;

For an even frame, as shown in Fig. 4, filling likewise proceeds in serpentine order and is not described further here.
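The serpentine filling order for odd and even frames can be sketched as a function that lists the pixel coordinates to overwrite. The odd-frame order (first row left to right, second row right to left) follows the description; the even-frame order is not spelled out in the text, so starting at the right end of the last row and working upward is an assumption of this sketch, as is the name `fill_positions`.

```python
def fill_positions(frame_number: int, width: int, height: int, count: int) -> list:
    """Return the (row, col) coordinates of the `count` pixels that carry
    audio in frame `frame_number` (1-based, so frame 1 is an odd frame)."""
    if count > width * height:
        raise ValueError("frame is too small for the requested pixel loading")
    odd = frame_number % 2 == 1
    positions = []
    for i in range(count):
        step, offset = divmod(i, width)        # which row of the walk, how far along it
        forward = step % 2 == 0                # direction alternates every row
        if odd:
            row = step                         # odd frames: start at the first row
            col = offset if forward else width - 1 - offset
        else:
            row = height - 1 - step            # even frames: start at the last row
            col = width - 1 - offset if forward else offset  # assumed direction
        positions.append((row, col))
    return positions


print(fill_positions(1, 4, 3, 6))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2)]
print(fill_positions(2, 4, 3, 6))
# [(2, 3), (2, 2), (2, 1), (2, 0), (1, 0), (1, 1)]
```

In a realistically sized frame the two sets of positions sit at opposite edges of the image, so adjacent frames never lose video data at the same pixel.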

With the scheme of this embodiment, the audio data is filled in either when reading of a frame of video data has just begun or when reading of a frame has just finished, which makes the filling convenient and easy to implement; at the same time, the back end can display the video without any processing of the video data.

To ensure that each pixel has enough capacity to hold the audio data, in one embodiment the audio parameters may include the sampling bit width and the video parameters may include the video color depth, and the audio/video processing method of the present invention may further include the step of: judging whether the video color depth is smaller than the sampling bit width, and if so, generating prompt information.
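The capacity check can be sketched as a small guard function. The function name `capacity_prompt` and the wording of the message are hypothetical; the description only says that prompt information is generated when the color depth is smaller than the sampling bit width.

```python
from typing import Optional


def capacity_prompt(color_depth_bits: int, sample_bits: int) -> Optional[str]:
    """Return a prompt message if one pixel cannot hold one audio sample,
    otherwise None."""
    if color_depth_bits < sample_bits:
        return (f"video color depth ({color_depth_bits} bits) is smaller than "
                f"the audio sampling bit width ({sample_bits} bits)")
    return None


print(capacity_prompt(24, 16))  # None
print(capacity_prompt(8, 16))
```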

In addition, when the display end receives the video data filled with audio data, i.e. the mixed data of the two, it generally performs two processes: first, extracting the audio data from the mixed data; second, if required, restoring the video data of the pixels that were filled with audio data. For extraction, it is only necessary to extract the audio data from the pixels holding audio data, store the extracted data in DDR (double data rate synchronous dynamic random-access memory), and timestamp the audio data of each second, which completes the extraction of the audio data. The video data can be restored while the audio data is being extracted. To prevent video data from being lost completely, the positions of the pixels filled with audio data generally differ between two adjacent video data frames; so when the current video data frame has audio data in the pixels at a given position, the pixels at the same position in the preceding frame and the following frame hold no audio data. The video data of the pixels used for audio data in the current video data frame can therefore be determined by averaging the video data of the preceding frame and the following frame. The scheme of this embodiment thus restores the video data of the pixels filled with audio data.

For example, suppose the audio data was filled in the manner described above, starting from the first row when the video data frame being filled is an odd frame and from the last row when it is an even frame. The mixed data is first converted from serial to parallel, and the video parameters and audio parameters are parsed out; a simple calculation on these parameters gives the number of pixels occupied by the embedded audio data in the odd and even video data frames. Every odd frame embeds audio data at the same position, while the next frame, an even frame, holds real video data at that position. Moreover, only one or two rows of video data are invalidated, which has little effect on viewing. The video data can therefore be restored by filling in from the preceding and following frames. The detailed process is as follows. For the first odd frame, the video data of the pixels filled with audio data is taken directly from the same position in the first even frame; from the second odd frame onward, it is determined by adding the data of the preceding and following even frames and taking the average. For the last even frame, the video data of the pixels filled with audio data is taken directly from the same position in the last odd frame; for the even frames from the first even frame onward, it is determined by adding the data of the preceding and following odd frames and taking the average.
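The restoration rule (average the neighbouring frames, or copy the only neighbour at the start or end of the sequence) can be sketched as follows. For brevity each frame is a flat list of single-channel integer pixel values and `positions` lists the indices that carried audio; both simplifications, the integer averaging, and the name `restore_frame` are assumptions of this sketch.

```python
def restore_frame(frames: list, index: int, positions: list) -> list:
    """Rebuild the pixels of frames[index] that were overwritten with audio,
    using the same positions in the preceding and following frames."""
    prev_f = frames[index - 1] if index > 0 else None
    next_f = frames[index + 1] if index + 1 < len(frames) else None
    if prev_f is None and next_f is None:
        raise ValueError("at least one neighbouring frame is required")
    restored = list(frames[index])
    for p in positions:
        if prev_f is None:            # first frame: copy from the next frame
            restored[p] = next_f[p]
        elif next_f is None:          # last frame: copy from the previous frame
            restored[p] = prev_f[p]
        else:                         # otherwise: average the two neighbours
            restored[p] = (prev_f[p] + next_f[p]) // 2
    return restored


# Frames 0 and 2 carried audio at index 0; frame 1 carried audio at index 2.
frames = [[0, 10, 10], [10, 10, 0], [0, 30, 30]]
print(restore_frame(frames, 0, [0]))  # [10, 10, 10]
print(restore_frame(frames, 1, [2]))  # [10, 10, 20]
print(restore_frame(frames, 2, [0]))  # [10, 30, 30]
```

This works only because adjacent frames use different fill positions, so the neighbouring frames always hold real video data where the current frame holds audio.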

In the audio/video processing method of the embodiments of the present invention, the audio data is filled into the video data before audio/video transmission to obtain mixed data. The manner in which the mixed data is transmitted to the display end is not limited: it may be serial transmission through high-speed parallel-to-serial and serial-to-parallel converters, or transmission by network compression.

Corresponding to the audio/video processing method of the present invention described above, the present invention also provides an audio/video processing system, embodiments of which are described in detail below. Fig. 5 shows a structural diagram of an embodiment of the audio/video processing system of the present invention. For ease of description, only the parts relevant to the present invention are shown in Fig. 5.

As shown in Fig. 5, the audio/video processing system in this embodiment includes:

an acquisition module 201, for obtaining audio parameters corresponding to audio data and video parameters corresponding to video data;

a processing module 202, for determining the pixel loading according to the audio parameters and the video parameters, wherein the pixel loading is the number of pixels in each video data frame that need to be filled with audio data;

a filling module 203, for filling the audio data into each video data frame of the video data according to the pixel loading.

In one embodiment, when the filling module 203 fills the audio data into the video data frames of the video data, the positions of the pixels filled with audio data may differ between two adjacent video data frames.

In one embodiment, as shown in Fig. 6, the filling module 203 may include:

a division unit 2031, for dividing the video data into odd frames and even frames in order;

a filling unit 2032, for, when the audio data is filled into each video data frame of the video data, starting the filling from the first row if the video data frame currently being filled with audio data is an odd frame, and starting the filling from the last row if it is an even frame.

In one embodiment, the audio parameters may include the sampling bit width and the video parameters may include the video color depth. As shown in Fig. 7, the audio/video processing system of this embodiment may further include a prompt module 204, which is used to judge whether the video color depth is smaller than the sampling bit width and, if so, to generate prompt information.

In one embodiment, the audio/video processing system of this embodiment may further include a restoration module 205, which is used to average the video data of the preceding frame and the following frame of the current video data frame to determine the video data of the pixels used for audio data in the current video data frame.

The audio/video processing system of the present invention corresponds one-to-one with the audio/video processing method of the present invention; the technical features and advantages set out in the embodiments of the method above apply equally to the embodiments of the system, as is hereby stated.

The embodiments described above represent only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention. The scope of protection of this patent shall therefore be determined by the appended claims.

Claims

1. An audio/video processing method, characterized by comprising the following steps:

obtaining audio parameters corresponding to audio data and video parameters corresponding to video data, the audio parameters including the sample rate of the audio and the number of audio channels, and the video parameters including the frame rate of the video;

determining a pixel loading according to P = (k × n)/f, wherein the pixel loading is the number of pixels in each video data frame that need to be filled with audio data, P is the number of pixels in each video data frame that need to be filled with audio data, k is the sample rate of the audio, n is the number of audio channels, and f is the frame rate of the video;

filling the audio data into each video data frame of the video data according to the pixel loading.

2. The audio/video processing method according to claim 1, characterized in that the positions of the pixels filled with audio data differ between two adjacent video data frames.

3. The audio/video processing method according to claim 1, characterized in that said filling the audio data into each video data frame of the video data according to the pixel loading comprises the steps of:

dividing the video data into odd frames and even frames in order;

when filling the audio data into each video data frame of the video data, starting the filling from the first row if the video data frame currently being filled with audio data is an odd frame, and starting the filling from the last row if the video data frame currently being filled with audio data is an even frame.

4. The audio/video processing method according to claim 1, characterized in that:

the audio parameters further include a sampling bit width, and the video parameters further include a video color depth;

the method further comprises the step of: judging whether the video color depth is smaller than the sampling bit width, and if so, generating prompt information.

5. The audio/video processing method according to any one of claims 2 to 4, characterized by further comprising the step of:

averaging the video data of the preceding frame and the following frame of the current video data frame to determine the video data of the pixels used for audio data in the current video data frame.

6. An audio/video processing system, characterized by comprising:

an acquisition module, for obtaining audio parameters corresponding to audio data and video parameters corresponding to video data, the audio parameters including the sample rate of the audio and the number of audio channels, and the video parameters including the frame rate of the video;

a processing module, for determining a pixel loading according to P = (k × n)/f, wherein the pixel loading is the number of pixels in each video data frame that need to be filled with audio data, P is the number of pixels in each video data frame that need to be filled with audio data, k is the sample rate of the audio, n is the number of audio channels, and f is the frame rate of the video;

a filling module, for filling the audio data into each video data frame of the video data according to the pixel loading.

7. The audio/video processing system according to claim 6, characterized in that, when the filling module fills the audio data into the video data frames of the video data, the positions of the pixels filled with audio data differ between two adjacent video data frames.

8. The audio/video processing system according to claim 6, characterized in that the filling module includes:

a division unit, for dividing the video data into odd frames and even frames in order;

a filling unit, for, when the audio data is filled into each video data frame of the video data, starting the filling from the first row if the video data frame currently being filled with audio data is an odd frame, and starting the filling from the last row if the video data frame currently being filled with audio data is an even frame.

9. The audio/video processing system according to claim 6, characterized in that:

the system further includes a prompt module, the prompt module being used to judge whether the video color depth is smaller than the sampling bit width and, if so, to generate prompt information.

10. The audio/video processing system according to any one of claims 7 to 9, characterized by further comprising:

a restoration module, for averaging the video data of the preceding frame and the following frame of the current video data frame to determine the video data of the pixels used for audio data in the current video data frame.