JP2005236719A

JP2005236719A - Device and method for multiplexing code

Info

Publication number: JP2005236719A
Application number: JP2004044064A
Authority: JP
Inventors: Yasunori Kaminuma; 康典上沼
Original assignee: NEC Engineering Ltd
Current assignee: NEC Engineering Ltd
Priority date: 2004-02-20
Filing date: 2004-02-20
Publication date: 2005-09-02

Abstract

PROBLEM TO BE SOLVED: To improve transmission efficiency when transmitting a multiplexed stream, and to make efficient use of the resources of a reproducing device, when encoded data such as sound and images are multiplexed.

SOLUTION: When generating, from a plurality of pieces of encoded data, a multiplexed stream that allows the reproducing device to reproduce each piece of encoded data in synchronization, the block size of each encoded data is specified through a decoder buffer size inputting part 109. The maximum number of frames that does not exceed the block size is read from one of the plurality of pieces of encoded data, and when frames having simultaneousness with that encoded data are read from the other, frames are read within a range where the other encoded data does not exceed the block size. When the frames having simultaneousness are multiplexed, frames whose time stamp value is smaller than the terminating time of the last frame of the one encoded data can be read.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a code multiplexing apparatus and a multiplexing method, and more particularly to an apparatus and method for efficiently multiplexing coded information such as audio and images.

In recent years, streaming distribution services using the Internet have become widespread. Streaming means playing back video or music recorded on a server sequentially while it is being downloaded, so that playback can start without waiting for the download to complete.

Multiplexing is an important technology in streaming services. As shown in FIG. 2, multiplexing must generate a multiplexed stream in which the pieces of media data to be played back at the same time are placed together, so that the playback device can play back multimedia data such as video and audio in synchronization. The multimedia data is normally compression-encoded. If the encoded media data has a fixed bit rate, multiplexing is relatively easy; however, in widely used encoding schemes such as MPEG the bit rate is variable, so multiplexing cannot be done easily.

As one technique for solving this, the code recording device and code multiplexing method described in Patent Document 1 compare a time stamp indicating the playback time of video data with a time stamp indicating the playback time of audio data, and support variable bit rates by multiplexing so that the respective data have simultaneity.

Patent Document 1: JP-A-10-320914

However, the conventional code multiplexing device described in Patent Document 1 subdivides the encoded data as finely as possible before multiplexing, which increases the amount of data other than encoded data, such as packet headers. As a result, transmission efficiency drops when transmitting a multiplexed stream that allows multimedia (audio and image) encoded data to be played back in synchronization. This is a particular problem for low-bandwidth communication terminals such as mobile phones, where even a small improvement in transmission efficiency is important.

Furthermore, because the encoded data is subdivided as finely as possible before multiplexing, as described above, a decoder input buffer (the temporary storage area placed ahead of the decoder) with a large capacity is left largely unused, so that the device's resources are wasted.

The present invention has been made in view of the above problems in conventional code multiplexing devices and the like. Its object is to provide a code multiplexing apparatus and a multiplexing method that, when multiplexing encoded data such as audio and images, improve transmission efficiency during multiplexed stream transmission and make use of the resources of the playback device without waste.

To achieve the above object, the present invention is a code multiplexing apparatus characterized by comprising a decoder buffer size input unit that specifies the block size of each encoded data when generating, from a plurality of encoded data, a multiplexed stream that allows the playback device to play back each encoded data in synchronization.

According to the present invention, the apparatus includes a decoder buffer size input unit that can specify the block size of each encoded data when generating a multiplexed stream that allows the playback device to play back encoded data such as audio and images in synchronization. Overhead other than the multiplexed encoded data can therefore be reduced, which improves transmission efficiency. In addition, because the input buffer size on the playback device side is taken into account, a multiplexed stream optimized to conform to the decoder buffer model can be generated, and the resources of the playback device can be used without waste. The plurality of encoded data may include audio data and image data.

The present invention is also a code multiplexing method characterized in that, when generating, from a plurality of encoded data, a multiplexed stream that allows the playback device to play back each encoded data in synchronization, the block size of each encoded data is specified; the maximum number of frames that does not exceed the block size is read from one of the plurality of encoded data; and, when frames having simultaneity with the one encoded data are read from the other of the plurality of encoded data, frames are read within a range in which the other encoded data does not exceed the block size. As described above, this reduces overhead other than the multiplexed encoded data, improves transmission efficiency, and allows the resources of the playback device to be used without waste.

In the code multiplexing method, when frames having simultaneity with the one encoded data are read from the other of the plurality of encoded data, frames whose time stamp value is smaller than the end time of the last frame of the one encoded data may be read, within a range in which the other encoded data does not exceed the block size. The plurality of encoded data may include audio data and image data.

According to the present invention, when multiplexing encoded data such as audio and images, it is possible to improve transmission efficiency during multiplexed stream transmission and to use the resources of the playback device without waste.

FIG. 1 shows an embodiment of a code multiplexing apparatus according to the present invention. The apparatus comprises an audio encoded file 101 storing audio code data, a video encoded file 102 storing video code data, an application 103, a multiplexing processing unit 104 that multiplexes the code data, and a multiplexed file 105 that stores the multiplexed stream. The multiplexing processing unit 104 includes an audio output buffer 106, a video output buffer 107, and a system information buffer 108, each of which is used as a temporary storage area when the multiplexed stream is output. The application 103 includes a decoder buffer size input unit 109.

Next, the operation of the code multiplexing apparatus having the above configuration will be described.

FIG. 3 shows the schematic structure of the MPEG-4 file format (hereinafter "MP4"), the format in which the present invention multiplexes a plurality of media data. MP4 is a file format standardized in MPEG-4 (ISO/IEC 14496) and is a general-purpose format for storing encoded data such as MPEG-4. The MP4 structure is expressed as a hierarchical set of basic data units called boxes (or atoms). Encoded audio and image data is stored in the "media data" box 301 in units called chunks.

A "track" box 302/303 is provided to store information about the media data, and one is required for each type of media. The "track" box 302/303 consists of, among others, a "chunk offset" box 304 indicating where the corresponding media is located in the file, a "sample to chunk" box 305 indicating how many samples each chunk contains, a "sample size" box 306 indicating the size of each sample, and a "time to sample" box 307 indicating the duration of each sample.
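As a rough illustration of how these four boxes relate to one another, the following Python sketch models the tables of one track and fills them from a list of chunks. The names `TrackTables` and `build_track_tables` and all sample values are hypothetical, and the sketch leaves out the box headers, version fields, and compact entry encodings that a real MP4 file carries.

```python
from dataclasses import dataclass, field


@dataclass
class TrackTables:
    """Simplified per-track metadata, one list per box described above."""
    chunk_offsets: list = field(default_factory=list)      # "chunk offset": file position of each chunk
    samples_per_chunk: list = field(default_factory=list)  # "sample to chunk": sample count of each chunk
    sample_sizes: list = field(default_factory=list)       # "sample size": byte size of each sample
    sample_durations: list = field(default_factory=list)   # "time to sample": duration of each sample


def build_track_tables(chunks):
    """chunks: list of (file_offset, [(size_bytes, duration_ms), ...])."""
    tables = TrackTables()
    for file_offset, samples in chunks:
        tables.chunk_offsets.append(file_offset)
        tables.samples_per_chunk.append(len(samples))
        for size_bytes, duration_ms in samples:
            tables.sample_sizes.append(size_bytes)
            tables.sample_durations.append(duration_ms)
    return tables


# Hypothetical audio track: two chunks of 64 ms frames at made-up file offsets.
audio = build_track_tables([
    (1000, [(200, 64), (210, 64)]),
    (5000, [(190, 64)]),
])
print(audio.samples_per_chunk)  # [2, 1]
```

Fewer, larger chunks mean fewer entries in the chunk-level tables, while the sample-level tables stay the same size.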

The present invention performs multiplexing with the decode input buffer size on the playback device side in mind. For this purpose, the decode input buffer size is specified to the application 103 in advance. The application 103 reads, from the audio encoded file 101 (or the video encoded file 102), the maximum number of frames that does not exceed the decode input buffer size. It then reads all frames having simultaneity from the other encoded file. If these do not exceed the decode input buffer size, they are multiplexed together with the encoded data read first. If they do exceed it, the maximum number of frames within the limit is multiplexed together with only those frames of the previously read encoded data that have simultaneity with them. The actual multiplexing is performed by the multiplexing processing unit 104, which appends all samples held in each output buffer to the multiplexed file 105 as a single chunk.
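The first step, reading the largest run of frames that still fits the decode input buffer, can be sketched as a greedy loop. This is a minimal Python illustration under stated assumptions, not the implementation of the apparatus: the function name `read_block` and the frame sizes are hypothetical, and "does not exceed" is taken to allow a total exactly equal to the buffer size.

```python
def read_block(frame_sizes, start, buffer_size):
    """Return the index one past the last frame that fits in buffer_size.

    frame_sizes: byte size of each frame in stream order.
    start: index of the first frame not yet multiplexed.
    A single frame larger than buffer_size is not handled here; in that
    case the function returns start unchanged.
    """
    total = 0
    end = start
    while end < len(frame_sizes) and total + frame_sizes[end] <= buffer_size:
        total += frame_sizes[end]
        end += 1
    return end


# Hypothetical video frame sizes in bytes, with a 3000-byte buffer limit.
video_sizes = [700, 400, 450, 500, 450, 400, 600, 300]
end = read_block(video_sizes, 0, 3000)
print(end)  # 6 -> the first six frames form the first block
```

Calling `read_block` again with `start=end` would pick up the next block from the frame at which the previous one stopped.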

Next, a specific example of the operation of the code multiplexing apparatus according to the present invention will be described.

Assume that the video sequence 401 shown in FIG. 4 is stored in the video encoded file 102 (see FIG. 1) and the audio sequence 402 in the audio encoded file 101. The figure shows about one second of stream. The frame period (duration) of each frame in the audio sequence 402 is fixed here, but it may be variable.

The decode input buffer size is specified using a user interface such as that shown in FIG. 9, and is processed and stored by the decoder buffer size input unit 109. Here, the decode input buffer sizes are assumed to be 3000 bytes for the video input buffer and 1500 bytes for the audio input buffer.

Next, the processing flow of the code multiplexing apparatus according to the present invention will be described, referring mainly to FIGS. 1, 4, and 7.

First, the maximum number of frames that does not exceed the video input buffer size, namely video frames 1 to 6, is read from the video encoded file 102 (701), and at the same time the size and time stamp of each frame are stored (702/703). Next, audio frames are read one at a time from the audio encoded file 101 (704), and reading continues as long as the following two conditions are satisfied.
(a) The time stamp (TS) value is smaller than the end time of the last video frame read earlier (here, video frame 6) (705).
(b) The total size of the audio frames read is smaller than the audio decode input buffer size (706).

Here, the time stamp of audio frame 7 (384 ms) is larger than the end time of video frame 6 (370 ms), so condition (a) is no longer satisfied and reading stops.
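Conditions (a) and (b) amount to a loop with two stop tests. The Python sketch below illustrates this with the values given in the description (a 370 ms video end time, 64 ms audio frames, a 1500-byte audio buffer). The function name `read_audio_frames`, the returned reason code, and the 200-byte frame sizes are hypothetical, and the size test is written so that a total equal to the buffer size is still accepted, which is an assumption.

```python
def read_audio_frames(audio, start, video_end_ms, audio_buffer_size):
    """Read audio frames while conditions (a) and (b) both hold.

    audio: list of (timestamp_ms, size_bytes) in stream order.
    Returns (end_index, reason), where reason names what stopped the
    loop: "a", "b", or "eof" when the stream ran out first.
    """
    total = 0
    i = start
    while i < len(audio):
        timestamp_ms, size = audio[i]
        if timestamp_ms >= video_end_ms:      # condition (a) no longer holds
            return i, "a"
        if total + size > audio_buffer_size:  # condition (b) no longer holds
            return i, "b"
        total += size
        i += 1
    return i, "eof"


# 64 ms audio frames as in the description; the 200-byte sizes are hypothetical.
audio = [(n * 64, 200) for n in range(16)]
end, reason = read_audio_frames(audio, 0, video_end_ms=370, audio_buffer_size=1500)
print(end, reason)  # 6 a -> audio frames 1 to 6 are kept; frame 7 (384 ms) stops the loop
```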

Next, video frames 1 to 6 and audio frames 1 to 6 are written to the video output buffer 107 and the audio output buffer 106, respectively (707/708), and a multiplexing instruction is issued to the multiplexing processing unit 104. On receiving the instruction, the multiplexing processing unit 104 outputs the data accumulated in the video output buffer 107 to the multiplexed file as one chunk and, likewise, the data accumulated in the audio output buffer 106 as one chunk (709/710). Header information such as the "track" box is held in the system information buffer 108 at this point (711); it is written to the file after all encoded data has been multiplexed and output, which yields the final multiplexed file (712).

Processing then returns to reading encoded data. The maximum number of frames that does not exceed the video input buffer size, namely video frames 7 to 13, is read from the video encoded file 102. Audio frames are then read one at a time from the audio encoded file 101 while the above conditions are satisfied. Here, the total size of audio frames 7 to 13 (1650 bytes) exceeds the audio input buffer size, so condition (b) is no longer satisfied and reading stops. However, because the end time of video frame 13 (900 ms) is later than the end time of audio frame 12 (768 ms), video frames 7 to 12 and audio frames 7 to 12 are written to the output buffers and a multiplexing instruction is issued to the multiplexing processing unit 104, while video frame 13 is held in the application 103. On receiving the instruction, the multiplexing processing unit 104 performs the same multiplexing as before.
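The hold-back of video frame 13 in this second pass can be sketched as a split of the video block at the end time of the audio block. The Python sketch below assumes that the test is exactly the end-time comparison described in this paragraph; the function name `split_video_block` is hypothetical, and only the 370 ms start of frame 7, the 900 ms end of frame 13, and the 768 ms audio end time come from the description, while every other time is a placeholder.

```python
def split_video_block(video, audio_end_ms):
    """Split a video block into frames to output now and frames to hold back.

    video: list of (frame_no, start_ms, end_ms) read for the current block.
    audio_end_ms: end time of the last audio frame that fit the audio buffer.
    Frames that end after the audio block ends are held for the next pass.
    """
    output = [f for f in video if f[2] <= audio_end_ms]
    held = [f for f in video if f[2] > audio_end_ms]
    return output, held


# Intermediate frame times below are hypothetical placeholders.
video_block = [
    (7, 370, 440), (8, 440, 500), (9, 500, 570), (10, 570, 640),
    (11, 640, 700), (12, 700, 760), (13, 760, 900),
]
output, held = split_video_block(video_block, audio_end_ms=768)
print([f[0] for f in output])  # [7, 8, 9, 10, 11, 12]
print([f[0] for f in held])    # [13]
```

The held frames would then become the start of the next video block, which matches frame 13 opening the third pass in the description.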

Following the same procedure, video frames 13 to 15 and audio frames 13 to 16 are read from the respective encoded files and multiplexed by the multiplexing processing unit 104. The multiplexed stream shown in FIG. 5 is obtained in this way.

By contrast, when the video sequence and audio sequence shown in FIG. 4 are multiplexed using the conventional technique, the multiplexed stream shown in FIG. 6 results.

The process by which the conventional technique generates the multiplexed stream of FIG. 6 is explained below.
First, video frame 1 is output to the multiplexed stream. Next, the audio frames having simultaneity with video frame 1 are output to the multiplexed stream. A frame having simultaneity is a frame of one encoded data whose time stamp (start time) falls within one frame period (duration) of the other encoded data. Here, the audio frames having simultaneity with video frame 1 (0 to 80 ms) are audio frame 1 (0 ms) and audio frame 2 (64 ms). Next, video frame 2 (80 ms) and video frame 3 (120 ms), which have simultaneity with audio frame 2 (64 ms to 128 ms), are output to the multiplexed stream. Repeating the same determination and multiplexing produces the multiplexed stream shown in FIG. 6.
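The simultaneity test used by the conventional technique can be written as a simple interval check. The Python sketch below is an illustration only: the function name `simultaneous_frames` is hypothetical, the interval is assumed to be half-open (start included, end excluded), and the 180 ms time stamp of video frame 4 is a made-up value. It does not track which frames have already been output, which a full multiplexer would need to do.

```python
def simultaneous_frames(period_start_ms, period_end_ms, other_timestamps_ms):
    """Return 1-based numbers of frames in the other stream whose time
    stamp falls inside [period_start_ms, period_end_ms)."""
    return [
        number
        for number, ts in enumerate(other_timestamps_ms, start=1)
        if period_start_ms <= ts < period_end_ms
    ]


audio_ts = [0, 64, 128, 192]  # 64 ms audio frames, as in the description
video_ts = [0, 80, 120, 180]  # frame 4 at 180 ms is a hypothetical value

print(simultaneous_frames(0, 80, audio_ts))    # [1, 2]
print(simultaneous_frames(64, 128, video_ts))  # [2, 3]
```

Because each check covers only one frame period, this scheme emits a new chunk every frame or two, which is the source of the header overhead discussed in the comparison.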

Here, the amounts of header information in the two cases are compared with reference to FIG. 8. Boxes whose data amount does not differ are omitted, and the comparison uses the total size of the "sample to chunk" box and the "chunk offset" box.

With the conventional technique, the total size of the "sample to chunk" box and the "chunk offset" box is 320 bytes (the chunk number, sample count, idx, and offset address are 4 bytes each). With the present invention the total is 72 bytes, a reduction of 248 bytes per roughly one second of encoded data. At a communication speed of 64000 bps (bit/sec), this corresponds to about 30 milliseconds of transmission time; when, for example, 60 seconds of data is multiplexed, the transfer time is shortened by about 1.8 seconds. In streaming playback, the less system (header) information is sent ahead of the encoded data, the shorter the wait before playback starts.
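The 72-byte figure and the time saving can be checked with a short calculation. The Python sketch below rests on two assumptions that the text does not state: box headers are excluded from the totals, and consecutive chunks with the same sample count share one "sample to chunk" entry, as the ISO base media file format permits. Under those assumptions the chunk layout of the worked example gives 72 bytes. The 320-byte prior-art total is taken as quoted, since it depends on the chunk layout of FIG. 6, and the function name `table_bytes` is hypothetical.

```python
STSC_ENTRY_BYTES = 12  # chunk number, sample count, idx: 4 bytes each
STCO_ENTRY_BYTES = 4   # one offset address per chunk


def table_bytes(samples_per_chunk):
    """Payload size of one track's sample-to-chunk and chunk-offset tables."""
    # A new sample-to-chunk entry is needed only when the count changes.
    runs = sum(
        1
        for i, count in enumerate(samples_per_chunk)
        if i == 0 or count != samples_per_chunk[i - 1]
    )
    return runs * STSC_ENTRY_BYTES + len(samples_per_chunk) * STCO_ENTRY_BYTES


# Chunk layout of the worked example: video 1-6, 7-12, 13-15; audio 1-6, 7-12, 13-16.
invention_bytes = table_bytes([6, 6, 3]) + table_bytes([6, 6, 4])
print(invention_bytes)  # 72

saved_bits = (320 - invention_bytes) * 8  # 320 bytes is the total quoted for the prior art
seconds_saved = saved_bits / 64000        # per roughly one second of data on a 64000 bps link
print(seconds_saved)                      # 0.031
print(round(seconds_saved * 60, 2))       # 1.86
```

The results, 31 ms per second of data and 1.86 s for 60 seconds, agree with the rounded values of about 30 milliseconds and about 1.8 seconds given in the description.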

FIG. 1 is a system configuration diagram of a code multiplexing apparatus according to the present invention.
FIG. 2 is a diagram showing a multiplexed stream input to a decoding device.
FIG. 3 is a schematic diagram of the MPEG-4 file format.
FIG. 4 is a diagram showing video and audio encoded sequences.
FIG. 5 is a diagram showing a multiplexed stream according to the present invention.
FIG. 6 is a diagram showing a multiplexed stream according to the prior art.
FIG. 7 is a multiplexing process flow diagram according to the present invention.
FIG. 8 is a diagram comparing the amount of header information between the present invention and the prior art.
FIG. 9 is a decoder buffer size input diagram according to the present invention.

Explanation of symbols

101 Audio encoded file
102 Video encoded file
103 Application
104 Multiplexing processing unit
105 Multiplexed file
106 Audio output buffer
107 Video output buffer
108 System information buffer
109 Decoder buffer size input unit
301 Media Data Box
302 Track Box (Video)
303 Track Box (Audio)
304 Chunk Offset Box
305 Sample to Chunk Box
306 Sample Size Box
307 Time to Sample Box
401 Video sequence
402 Audio sequence
701 Video frame read processing
702 Video frame size storage processing
703 Video frame time stamp storage processing
704 Audio frame read processing
705 Time stamp comparison processing
706 Audio data size comparison processing
707 Video output buffer write processing
708 Audio output buffer write processing
709 Multiplexed file write processing (from video output buffer)
710 Multiplexed file write processing (from audio output buffer)
711 System information output buffer write processing
712 Multiplexed file write processing (from system information output buffer)

Claims

1. A code multiplexing apparatus comprising a decoder buffer size input unit that designates a block size of each encoded data when generating a multiplexed stream for reproducing a plurality of encoded data in synchronization with each other on the playback device side.

2. The code multiplexing apparatus according to claim 1, wherein the plurality of encoded data includes audio data and image data.

3. A code multiplexing method comprising, when generating a multiplexed stream for reproducing a plurality of encoded data in synchronization with each other on the playback device side:
specifying the block size of each encoded data;
reading the maximum number of frames not exceeding the block size from one of the plurality of encoded data; and
when reading frames having simultaneity with the one encoded data from the other of the plurality of encoded data, reading frames within a range in which the other encoded data does not exceed the block size.

4. The code multiplexing method according to claim 3, wherein, when frames having simultaneity with the one encoded data are read from the other of the plurality of encoded data, frames having a time stamp value smaller than the end time of the last frame of the one encoded data are read, within a range in which the other encoded data does not exceed the block size.

5. The code multiplexing method according to claim 3, wherein the plurality of encoded data includes audio data and image data.