Audio/video encoding and decoding device and method
Technical field
The present invention relates to audio/video encoding and decoding, and in particular to an audio/video encoding and decoding device and method that reduce dependence on the user's system environment and improve the stability of software quality.
Background art
The prior art generally uses DirectShow or FFmpeg for audio/video encoding and decoding. DirectShow is a streaming-media toolkit released by Microsoft and based on COM (Component Object Model); it provides a streaming-media development framework for the Windows platform and can be used for audio/video encoding and decoding on Windows. FFmpeg is an open-source project developed by enthusiasts that integrates encoders and decoders for many audio and video formats as well as conversion between many file formats; it is written mainly in C and assembly language and is cross-platform.
Using DirectShow for audio/video encoding and decoding has the following problems:
1) High dependence on the user's system environment. DirectShow-based encoding and decoding relies on codec components from many third-party developers being present in the system. These components are installed along with certain players or special software packages, so the set of components is likely to differ from one system to another.
2) Uncertain software quality. Because the codec components come from different third-party developers, their quality is uneven: some files produce errors on some computers yet run without problems on others. In addition, Microsoft is preparing to phase out the DirectShow framework, which makes the future of this technology uncertain.
Using FFmpeg for audio/video encoding and decoding has the following problems:
Because FFmpeg adopts the GPL license, software that uses the project must open its source code. The FFmpeg architecture and code are large and complex and lack comments, so they are hard to understand, and fixing existing bugs or making necessary extensions is very difficult. FFmpeg also mainly targets the Linux platform; porting it to Windows requires extensive modification, which makes it inconvenient to use.
Summary of the invention
The technical problem to be solved by the present invention is the high dependence on the user's system environment and the uncertain software quality that result from using DirectShow in the prior art, and the difficulty of maintaining and extending FFmpeg. To address these shortcomings, the invention provides an audio/video encoding and decoding device and method that reduce dependence on the user's system environment and improve the stability of software quality.
The technical solution adopted by the present invention to solve this technical problem is an audio/video encoding and decoding device comprising:
A decoding module, configured to import a media file and call the corresponding decoder to decode the media file;
A playback module, configured to play the decoded audio and video data;
A conversion control module, configured to process the decoded audio and video data, including: scaling the video resolution and adding effects according to the encoding parameters; repeating and dropping frames according to the frame rate and timestamps; converting the audio sample rate; discarding audio data or padding it with silence according to the data volume and timestamps; and connecting multiple media files into one media file;
An encoding module, comprising an encoder and a media file synthesizer, configured to call the corresponding encoder and media file synthesizer to generate a multimedia file of the specified format and parameters.
In the present invention, the decoding module specifically comprises a multi-type decoding manager, an image decoder, a DirectShow decoder and a media file decoder:
The multi-type decoding manager is configured to send media files whose suffix is an image file suffix to the image decoder, to send media files whose suffix is wmv, wma, rmvb or rm to the DirectShow decoder, and to send media files with any other suffix to the media file decoder;
The image decoder is configured to return the YUV12 video data obtained by decoding to the multi-type decoding manager (in MPEG each pixel is stored in 12 bits, which is commonly abbreviated as YUV12);
The DirectShow decoder is configured to return the YUV12 video data and/or 16-bit PCM (Pulse Code Modulation) audio data obtained by decoding to the multi-type decoding manager;
The media file decoder is configured to return the YUV12 video data and/or 16-bit PCM audio data obtained by decoding to the multi-type decoding manager.
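As an illustration of the two raw data types exchanged between the decoders and the decoding manager, the following minimal sketch computes buffer sizes. The 12 bits per pixel and 16 bits per sample come from the text; the planar 4:2:0 layout (one full-resolution luma plane plus two quarter-resolution chroma planes) and the function names are assumptions made for the example.

```python
def yuv12_frame_bytes(width, height):
    """Bytes in one video frame at 12 bits per pixel.

    Assumes even width and height and a planar 4:2:0 layout (an assumption,
    not stated in the text): one luma plane plus two subsampled chroma planes.
    """
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def pcm16_bytes(seconds, sample_rate, channels):
    """Bytes of 16-bit PCM audio for the given duration."""
    return int(seconds * sample_rate) * channels * 2
```

For a 640x480 frame this gives the same result as 640*480*12/8, and one second of mono audio at 44100 Hz occupies 88200 bytes.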
In the present invention, the media file decoder comprises a decoding controller, a file separation manager, file separator plug-ins, a decoding plug-in manager, audio decoder plug-ins and video decoder plug-ins;
The decoding controller calls the file separation manager to obtain the media file information. The file separation manager calls the file separator plug-in that corresponds to the file suffix to obtain the file information, which comprises the number of audio and video streams, the playing time of the file and the parameters of each stream; these parameters comprise the bit rate and resolution of the video and the sample rate and bit rate of the audio;
When the playback module or the conversion control module calls the decoding controller to obtain the next YUV12 video data or 16-bit PCM audio data, the decoding controller, through the file separation manager, calls the corresponding file separator plug-in to separate some compressed video or audio data from the file in sequence and pass it to the decoding controller. The decoding controller passes the data to the decoding plug-in manager, which forwards it to an audio decoder plug-in or a video decoder plug-in according to the type of the compressed data, and the decoded YUV12 video data or 16-bit PCM audio data is returned to the decoding controller.
In the present invention, the conversion control module comprises a data processing module, a file connection module and a conversion module:
The data processing module applies the following to the decoded audio and video data sent by the multi-type decoding manager: video scaling, image overlay, effects, display aspect ratio conversion, deinterlacing, video timestamp smoothing, audio sample rate conversion and audio timestamp smoothing;
The file connection module performs file connection processing in response to file connection requests sent from the interface, including aligning the timestamps at the start of a file, aligning the timestamps at the end of a file, connecting the files, and calculating the video and audio timestamps;
The conversion module obtains audio and video data from the file connection module when file connection processing is required, or from the data processing module when it is not, and applies the conversion logic to this data: when the video timestamp is less than or equal to the audio timestamp, it calls the video interface of the encoding module to write the video data to the encoder for encoding; when the video timestamp is greater than the audio timestamp, it calls the audio interface of the encoding module to write the audio data to the encoder for encoding.
In the present invention, the encoding module further comprises an encoding manager, the encoder comprises audio encoder plug-ins and video encoder plug-ins, and the media file synthesizer comprises media file synthesizer plug-ins:
The conversion control module writes audio data to the encoding manager, which encodes it by calling the corresponding audio encoder plug-in. The conversion control module writes video data to the encoding manager, which encodes it by calling the corresponding video encoder plug-in. After encoding is finished, the encoding manager calls the corresponding media file synthesizer plug-in, which writes the encoded data to the media file according to the corresponding media file format standard.
In another aspect, the invention also discloses an audio/video encoding and decoding method comprising the following steps:
A. Import a media file and call the corresponding decoder to decode it;
B. Optionally play the decoded audio and video data, or process the decoded audio and video data, including: scaling the video resolution and adding effects according to the encoding parameters; repeating and dropping frames according to the frame rate and timestamps; converting the audio sample rate; discarding audio data or padding it with silence according to the data volume and timestamps; and connecting multiple media files into one media file;
C. Call the corresponding encoder and media file synthesizer to encode, generating a multimedia file of the specified format and parameters.
In the present invention, step A further comprises determining the type of the media file before decoding, specifically:
A1. If the file suffix is an image file suffix, including but not limited to bmp, jpg and png, the image decoder is called to decode the file and YUV12 video data is obtained.
A2. If the file suffix is wmv, wma, rmvb or rm, the DirectShow decoder is called to decode the file; video decoding yields YUV12 data and audio decoding yields 16-bit PCM audio data.
A3. If the file suffix is any other media file suffix, including but not limited to mpg, mp4 and ogg, the media file decoder is called to decode the file; video decoding yields YUV12 video data and audio decoding yields 16-bit PCM audio data. If decoding fails, the DirectShow decoder is used instead.
In the present invention, the decoding process in step A3 specifically comprises:
A31. Obtaining the file information: the decoding controller calls the file separation manager to obtain the media file information, and the file separation manager calls the file separator plug-in that corresponds to the file suffix to obtain the file information;
A32. Decoding: when the playback module or the conversion control module calls the decoding controller to obtain video data and/or audio data, the decoding controller, through the file separation manager, calls the corresponding file separator plug-in to separate some compressed video data and/or compressed audio data from the file in sequence and pass it to the decoding controller. The decoding controller passes the data to the decoding plug-in manager, which forwards it to a video decoder plug-in and/or an audio decoder plug-in according to the type of the compressed data, and the decoded YUV12 video data and/or 16-bit PCM audio data is returned to the decoding controller.
In the present invention, the processing of the decoded audio and video data in step B comprises:
B1. The data processing module applies video scaling, image overlay, effects, display aspect ratio conversion, deinterlacing, video timestamp smoothing, audio sample rate conversion and audio timestamp smoothing to the decoded audio and video data;
B2. The file connection module performs file connection processing in response to file connection requests sent from the interface, including aligning the timestamps at the start of a file, aligning the timestamps at the end of a file, connecting the files, and calculating the video and audio timestamps;
B3. The conversion module obtains audio and video data from the file connection module when file connection processing is required, or from the data processing module when it is not, and applies the conversion logic to this data: when the video timestamp is less than or equal to the audio timestamp, it calls the video interface of the encoding module to write the video data to the encoder for encoding; when the video timestamp is greater than the audio timestamp, it calls the audio interface of the encoding module to write the audio data to the encoder for encoding.
In the present invention, step C specifically comprises:
C1. The conversion control module writes audio data to the encoding manager, which encodes it by calling the corresponding audio encoder plug-in; the conversion control module writes video data to the encoding manager, which encodes it by calling the corresponding video encoder plug-in;
C2. After encoding is finished, the encoding manager calls the corresponding media file synthesizer plug-in, which writes the encoded data to the media file according to the corresponding media file format standard.
Implementing the audio/video encoding and decoding device and method provided by the invention has the following advantages over the prior art:
1) Dependence on the user's system environment is reduced and software quality becomes more predictable.
For the WMV, WMA, RM and RMVB formats, Microsoft and Real provide very stable and reliable components for encoding and decoding, and every Windows computer has these components, so DirectShow is used to encode and decode these formats. For other formats, audio/video separators and media file synthesizers are developed for each kind of media file according to the standard of its file format, and decoders and encoders are developed according to the standard of each coding format. In addition, encoders that are technically difficult to implement can be obtained by purchasing mature and stable commercial SDKs. The decoding platform is designed so that audio and video streams are handled separately and synchronized on the basis of timestamps. Dependence on the user's system environment is therefore reduced and software quality is kept under control.
2) Ease of maintenance.
Because the codecs and the media file separation and synthesis parts of the invention are mostly developed in-house according to open standards and are accompanied by detailed development documentation, maintainability is high and the related source code does not have to be published. The remaining part is implemented with mature and stable purchased commercial SDKs, for which bugs can be resolved through after-sales service.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the audio/video encoding and decoding method of the present invention;
Fig. 2 is a data flow diagram of the type determination applied to data entering the decoding process in Fig. 1;
Fig. 3 is a data flow diagram of the codec data processing in Fig. 2;
Fig. 4 is a data flow diagram of the calling process of the file connection processing in Fig. 1;
Fig. 5 is a data flow diagram of generating the multimedia file in Fig. 1.
Detailed description of the embodiments
To give the examiner a fuller understanding of the structural features and functions of the present invention, preferred embodiments are described in detail below with reference to the accompanying drawings:
As shown in Fig. 1, the audio/video encoding and decoding device according to the present invention mainly comprises four parts: a decoding module, a playback module, a conversion control module and an encoding module. In the audio/video encoding and decoding method according to the present invention, these parts work as follows.
Step S1: the decoding module imports a media file and calls the corresponding decoder to decode it. The decoding module specifically comprises a multi-type decoding manager, an image decoder, a DirectShow decoder and a media file decoder. In step S1, audio and video data entering the decoding process is handled as follows (see Fig. 2):
A1. If the file suffix is an image file suffix such as bmp, jpg or png, the multi-type decoding manager calls the image decoder, and YUV12 video data is obtained (in MPEG each pixel is stored in 12 bits, which is commonly abbreviated as YUV12).
A2. If the file suffix is wmv, wma, rmvb or rm, the multi-type decoding manager calls the DirectShow decoder; video decoding yields YUV12 video data and audio decoding yields 16-bit PCM (Pulse Code Modulation) audio data.
A3. If the file suffix is any other media file suffix, such as mpg, mp4 or ogg (but not limited to these formats), the multi-type decoding manager calls the media file decoder; video decoding yields YUV12 data and audio decoding yields 16-bit PCM audio data. If decoding fails, DirectShow decoding is used instead.
Specifically, the media file decoder comprises a decoding controller, a file separation manager, file separator plug-ins, a decoding plug-in manager, audio decoder plug-ins and video decoder plug-ins. Step A3 above further comprises (see Fig. 3):
A31. Obtaining the file information.
The decoding controller calls the file separation manager to obtain the media file information, including the number of audio and video streams, the playing time of the file and the parameters of each stream (such as the video bit rate and resolution and the audio sample rate and bit rate).
The file separation manager calls the file separator plug-in that corresponds to the file suffix to obtain the file information, such as the file type. Each file format has its own file separator plug-in; the plug-ins share a uniform interface so that new ones are easy to add, and each plug-in obtains the file information by parsing the relevant data in the file according to the standard for its format. For example, the file separator plug-in for the MP4 family reads specific data from the MP4 file according to ISO 14496. If this fails, the file separator plug-ins of other types are called, which solves the problem of files that cannot be played because their suffix is wrong.
A32. Video decoding.
When an external caller (the playback module or the conversion control module) calls the Forward interface of the decoding controller to obtain the next YUV12 video data, the decoding controller calls the GetBlock interface of the file separation manager to obtain some compressed video data. The file separation manager calls the GetBlock interface of the corresponding file separator plug-in, which separates some compressed video data from the file in sequence and passes it to the decoding controller.
The decoding controller calls the PutData interface of the decoding plug-in manager to pass the data on. According to the type of the compressed data, the decoding plug-in manager calls the PutData interface of the corresponding decoder plug-in and passes the compressed data to it. Each decoder plug-in is written according to the compression and decompression standard of a particular media format; for example, H264 compressed video is decoded as described in ISO/IEC MPEG-4 Part 10. The plug-ins share a uniform interface so that new ones are easy to add. The decoding controller then calls the GetNextFrame interface of the decoding plug-in manager to obtain the data of one video image; the decoding plug-in manager calls the GetNextFrame interface of the corresponding decoder plug-in, which decodes the input compressed data and produces the data of one video image. If the compressed video data supplied so far is not enough to produce one video image, the above process is repeated until one frame of YUV12 video data has been decoded.
A33. Audio decoding is similar to the video decoding of step A32 and is not described again here.
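Steps A31 and A32 can be sketched as follows. This is a simplified illustration, not the described implementation: the Python names mirror the Forward, GetBlock, PutData and GetNextFrame interfaces from the text, the manager layers are collapsed into direct calls, and the toy decoder that needs two blocks per frame is purely hypothetical.

```python
def probe_file_info(path, suffix, separators):
    """A31: try the separator matching the suffix first, then all others.

    separators maps a suffix to a callable(path) that returns a file-info
    dict, or None if the file is not in that separator's format.
    """
    ordered = [separators[suffix]] if suffix in separators else []
    ordered += [probe for key, probe in separators.items() if key != suffix]
    for probe in ordered:
        info = probe(path)
        if info is not None:
            return info
    return None


def forward(get_block, decoder):
    """A32: feed compressed blocks until the decoder yields one frame."""
    while True:
        block = get_block()              # GetBlock: next compressed block
        if block is None:
            return None                  # end of file, no further frame
        decoder.put_data(block)          # PutData
        frame = decoder.get_next_frame() # GetNextFrame
        if frame is not None:
            return frame


class TwoBlockDecoder:
    """Toy decoder plug-in: needs two compressed blocks to produce a frame."""

    def __init__(self):
        self._pending = []

    def put_data(self, block):
        self._pending.append(block)

    def get_next_frame(self):
        if len(self._pending) < 2:
            return None
        frame, self._pending = b"".join(self._pending[:2]), self._pending[2:]
        return frame
```

Trying the remaining separators after the suffix-matched one fails is what allows a file with a wrong suffix to be opened anyway.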
Step S2: the decoded audio and video data can optionally be played.
Step S3: the decoded audio and video data is processed. The video resolution is scaled and effects are added according to the encoding parameters, and frames are repeated or dropped according to the frame rate and timestamps. The audio sample rate is converted, audio data is discarded or padded with silence according to the data volume and timestamps, and multiple files are connected into one file.
Specifically, the conversion control module comprises a data processing module, a file connection module and a conversion module. Fig. 4 is a flow chart of the calling process of step S3 with file connection processing; file connection processing means converting multiple files into a single output file.
In Fig. 4, the data processing module obtains audio and video data, comprising YUV12 data and 16-bit PCM audio data, from the multi-type decoding manager. The main processing steps of the data processing module are:
1) Video scaling.
Using bilinear interpolation, the resolution of the decoded image is converted to the resolution required for encoding.
2) Image overlay.
Using an image overlay algorithm, other images (a watermark or a picture specified by the user) are superimposed on the decoded image.
3) Effects.
Existing effect algorithms, such as the anaglyph algorithm, colour to black-and-white conversion and inversion, are applied to the decoded image.
4) Display aspect ratio conversion.
Using bilinear interpolation, the image is converted to a 4:3 or 16:9 display format.
5) Deinterlacing.
The user selects one of the following deinterlacing modes:
A. Repeat odd lines: for a YUV12 video image, copy the data of each odd line to the even line, replacing the even line.
B. Repeat even lines: for a YUV12 video image, copy the data of each even line to the odd line, replacing the odd line.
C. Average the values of each pair of adjacent odd lines and replace the even line between them with the result.
D. Average the values of each pair of adjacent even lines and replace the odd line between them with the result.
6) Video timestamp smoothing.
According to the timestamps of the decoded images, images are repeated or discarded so that the output images follow the frame rate exactly.
7) Audio sample rate conversion.
Using linear interpolation, the sample rate of the decoded audio is converted to the sample rate required for encoding.
8) Audio timestamp smoothing.
According to the relationship between data volume and timestamps, silence is inserted or data is discarded so that the timestamps of the output data correspond to the data volume.
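Several of the processing steps above can be sketched in a few lines each. The functions below are minimal illustrations under stated assumptions, not the described implementation: a plane is a list of rows of 8-bit values, lines are counted from 1 so that index 0 is an odd line, timestamps are integer milliseconds with the output timeline starting at 0, and all function and parameter names are hypothetical.

```python
def scale_bilinear(plane, out_w, out_h):
    """Steps 1) and 4): resize one plane by bilinear interpolation."""
    in_h, in_w = len(plane), len(plane[0])
    out = []
    for y in range(out_h):
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1, wy = min(y0 + 1, in_h - 1), fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1, wx = min(x0 + 1, in_w - 1), fx - x0
            top = plane[y0][x0] * (1 - wx) + plane[y0][x1] * wx
            bottom = plane[y1][x0] * (1 - wx) + plane[y1][x1] * wx
            row.append(int(round(top * (1 - wy) + bottom * wy)))
        out.append(row)
    return out


def deinterlace(plane, average):
    """Step 5), modes A (average=False) and C (average=True)."""
    out = [list(row) for row in plane]
    for i in range(1, len(plane), 2):          # even lines, counted from 1
        above = plane[i - 1]
        below = plane[i + 1] if average and i + 1 < len(plane) else above
        out[i] = [(a + b) // 2 for a, b in zip(above, below)]
    return out


def smooth_video(frames, interval_ms):
    """Step 6): frames is a sorted list of (timestamp_ms, image)."""
    out, i, target = [], 0, 0
    while frames and target <= frames[-1][0]:
        while i + 1 < len(frames) and frames[i + 1][0] <= target:
            i += 1                             # skipped frames are dropped
        out.append((target, frames[i][1]))     # held frames are repeated
        target += interval_ms
    return out


def resample_linear(samples, src_rate, dst_rate):
    """Step 7): convert the sample rate of one channel."""
    out = []
    for j in range(len(samples) * dst_rate // src_rate):
        pos = j * src_rate / dst_rate
        i = int(pos)
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(int(round(samples[i] * (1 - (pos - i)) + nxt * (pos - i))))
    return out


def smooth_audio(chunk, ts_ms, frames_written, sample_rate, frame_bytes=2):
    """Step 8): pad with silence or drop data so volume matches timestamp.

    frame_bytes is channels * bytes per sample (2 for mono 16-bit PCM).
    """
    expected = ts_ms * sample_rate // 1000
    if expected > frames_written:
        return bytes((expected - frames_written) * frame_bytes) + chunk
    return chunk[(frames_written - expected) * frame_bytes:]
```

Modes B and D of step 5) follow the same pattern as the `deinterlace` sketch with the roles of odd and even lines swapped.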
In Fig. 4, the main processing steps of the file connection module are:
1) Aligning the timestamps at the start of the file.
The video start timestamp of a media file is not necessarily the same as its audio start timestamp, which can cause audio and video to go out of sync during connection, so the video and audio timestamps must be aligned. To align them, the smallest start timestamp among all the audio and video streams is taken as the start timestamp of the file, and every other stream is padded with black frames at the frame rate (video) or with silence by data volume (audio).
2) Aligning the timestamps at the end of the file.
The video end timestamp of a media file is not necessarily the same as its audio end timestamp, which can also cause audio and video to go out of sync during connection, so the video and audio end timestamps must be aligned. To align them, the largest end timestamp among all the audio and video streams is taken as the end timestamp of the file, and every other stream is padded with black frames at the frame rate (video) or with silence by data volume (audio).
3) Connecting the files.
After one file has been converted, the decoder for the next file is created, and the total duration of all previously converted files is added to the timestamps of the audio and video data obtained from the decoder.
4) Calculating the video and audio timestamps.
The start timestamp of the file is subtracted from each timestamp obtained from the decoder, and the total duration of all previously converted files is added. For example:
a. File A: the video stream starts at 0 seconds and ends at 5 seconds, with a frame rate of 10 frames per second; the audio stream starts at 1 second and ends at 6 seconds, with a sample rate of 44100 and 1 channel.
b. File B: the video stream starts at 2 seconds and ends at 7 seconds, with a frame rate of 10 frames per second; the audio stream starts at 3 seconds and ends at 7 seconds, with a sample rate of 44100 and 1 channel.
c. Connecting the two files:
First, the start of file A is aligned: according to 1), the audio stream must be padded with 1 second of silence. The data volume is 44100*16/8*1 = 88200 bytes.
Second, the end of file A is aligned: the video ends 1 second before the audio, so 10 black frames must be appended at the frame rate to fill the end of the file and align the video with the audio.
Third, the decoder for file B is created and the start of file B is aligned. The duration of file A is 6-0=6 seconds, so the timestamp of the first video frame of file B becomes 2-2+6=6.
The timestamps of the audio and video data obtained from the file connection module are therefore always continuous.
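The alignment and timestamp calculation in the example above can be reproduced with a short sketch. The function names and the dictionary layout are hypothetical; the numbers are those of files A and B, with times in seconds.

```python
def align_file(streams):
    """Return (file_start, file_end, padding) for one media file.

    streams maps a stream name to (start_s, end_s). The padding entry for
    each stream is (seconds to pad at the start, seconds to pad at the end).
    """
    file_start = min(start for start, _ in streams.values())
    file_end = max(end for _, end in streams.values())
    padding = {
        name: (start - file_start, file_end - end)
        for name, (start, end) in streams.items()
    }
    return file_start, file_end, padding


def output_timestamp(ts, file_start, converted_duration):
    """Timestamp in the connected output for a decoder timestamp ts."""
    return ts - file_start + converted_duration


file_a = {"video": (0, 5), "audio": (1, 6)}
file_b = {"video": (2, 7), "audio": (3, 7)}

start_a, end_a, pad_a = align_file(file_a)
silence_bytes = pad_a["audio"][0] * 44100 * 16 // 8 * 1   # 1 s, mono, 16-bit
black_frames = pad_a["video"][1] * 10                     # 1 s at 10 fps

start_b, end_b, pad_b = align_file(file_b)
first_video_b = output_timestamp(2, start_b, end_a - start_a)
```

With these inputs the sketch yields 88200 bytes of silence and 10 black frames for file A, and a timestamp of 6 seconds for the first video frame of file B, matching the figures in the example.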
In Fig. 4, the conversion processing of the conversion module is as follows:
If file connection processing is required, the conversion module obtains video and audio data from the file connection module; otherwise, it obtains audio and video data from the data processing module. The conversion logic is:
After obtaining one frame of video data and audio data, the conversion module compares the video timestamp with the audio timestamp. If the video timestamp is less than or equal to the audio timestamp, it calls the FCWriteVideoFrame interface of the encoding manager to write the video data to the encoder for encoding; if the video timestamp is greater than the audio timestamp, it calls the FCWriteAudioData interface of the encoding manager to write the audio data to the encoder for encoding. The conversion module then calls the GetFrame interface of the file connection module to obtain the next frame of data and repeats the process.
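The comparison rule can be sketched as a merge of two timestamp-ordered queues. This is an illustration only: the function name is hypothetical, the returned list stands in for the calls to FCWriteVideoFrame and FCWriteAudioData, and flushing whatever remains once one stream is exhausted is an assumption that the text does not spell out.

```python
def interleave(video, audio):
    """Return the order in which frames would be written to the encoder.

    video and audio are lists of (timestamp, payload), sorted by timestamp.
    """
    order = []
    v = a = 0
    while v < len(video) and a < len(audio):
        if video[v][0] <= audio[a][0]:
            order.append(("video", video[v][0]))   # FCWriteVideoFrame
            v += 1
        else:
            order.append(("audio", audio[a][0]))   # FCWriteAudioData
            a += 1
    # Assumption: data left in one stream is written out at the end.
    order += [("video", ts) for ts, _ in video[v:]]
    order += [("audio", ts) for ts, _ in audio[a:]]
    return order
```

Always writing the stream with the smaller timestamp keeps audio and video interleaved in time order in the output file.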
Step S4: the encoding module (comprising the encoder and the media file synthesizer) calls the corresponding encoder and media file synthesizer to encode and generate a multimedia file of the specified format and parameters.
Specifically, the encoding module further comprises an encoding manager, the encoder comprises audio encoder plug-ins and video encoder plug-ins, and the media file synthesizer comprises media file synthesizer plug-ins. Step S4 further comprises (see Fig. 5):
D1. The conversion control module calls the FCWriteAudioData interface of the encoding manager to write the 16-bit PCM audio data to the encoding manager;
D2. The encoding manager encodes the audio data by calling the corresponding audio encoder plug-in (most encoder plug-ins are implemented according to the specification of the corresponding coding format; some, such as the MPG encoder, are implemented with a purchased SDK, for example one provided by MainConcept);
D3. After encoding is finished, the encoding manager calls the corresponding media file synthesizer plug-in to write the encoded data to the media file according to the corresponding media file format standard;
D4. Video encoding is performed in the same way as audio encoding.
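Steps D1 to D4 can be sketched as a manager that routes raw data to plug-ins. This is a simplified illustration: the class and method names are hypothetical counterparts of the FCWriteAudioData and FCWriteVideoFrame interfaces, the plug-ins are plain callables, and buffering every packet until the end is a simplification made for the example.

```python
class EncodingManager:
    """Routes raw data to encoder plug-ins, then to a synthesizer plug-in."""

    def __init__(self, audio_encoder, video_encoder, synthesizer):
        self._audio_encoder = audio_encoder    # callable: PCM -> packet
        self._video_encoder = video_encoder    # callable: frame -> packet
        self._synthesizer = synthesizer        # callable: packets -> file data
        self._packets = []

    def write_audio_data(self, timestamp, pcm):
        # Counterpart of FCWriteAudioData (steps D1 and D2).
        self._packets.append(("audio", timestamp, self._audio_encoder(pcm)))

    def write_video_frame(self, timestamp, frame):
        # Counterpart of FCWriteVideoFrame (step D4).
        self._packets.append(("video", timestamp, self._video_encoder(frame)))

    def finish(self):
        # Step D3: hand the encoded packets to the synthesizer plug-in.
        return self._synthesizer(self._packets)
```

Because the manager only depends on three callables, supporting a new output format in this sketch means supplying different plug-ins rather than changing the manager.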
In summary, the technical solution of the present invention reduces dependence on the user's system environment and keeps software quality under control; and because the codecs and the media file separation and synthesis parts are mostly implemented according to published standards, maintainability is high.
The above are only preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various changes and variations to the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of its claims.