JP2006238003A

JP2006238003A - Demultiplexer

Info

Publication number: JP2006238003A
Application number: JP2005049053A
Authority: JP
Inventors: Tadamasa Toma; 正真遠間; Yoshinori Matsui; 義徳松井; Shinya Sumino; 眞也角野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-02-24
Filing date: 2005-02-24
Publication date: 2006-09-07

Abstract

Problem to be solved: When an MP4 file storing a stream from which data has been lost through reception errors such as packet loss is played back, the video display freezes or the audio goes silent in the section where data is missing.
Solution: In step 1104, when the playback duration of a sample exceeds a predetermined value, the data is judged to be discontinuous. Playback of any section in which both the video and the audio are discontinuous is skipped, so the section with missing data is not played.
Selected drawing: FIG. 10

Description

The present invention relates to a demultiplexing apparatus that separates, decodes, and plays back multiplexed data of encoded streams such as video and audio.

In recent years, as storage media and communication networks have grown in capacity and transmission technology has advanced, devices and services that handle encoded multimedia data such as video and audio have become widespread. In broadcasting, for example, digitally encoded media data has begun to replace conventional analog broadcasting. Current digital broadcasting targets fixed reception only, but broadcasting for mobile receivers such as mobile phones is planned. In communications, video distribution services for third-generation mobile phones have been launched, so both fixed and mobile terminals can now handle multimedia data. Against this background, it is expected that users will increasingly record content received by broadcast or over the Internet on memory cards such as SD (Secure Digital) cards or on optical discs such as DVD-RAM (Digital Versatile Disc Random Access Memory), and share that content between devices.

When media data is broadcast, stored, or distributed over a network, the header information needed to play back the media data is multiplexed with the media data itself. Standard multiplexing schemes have been specified separately for broadcasting and storage devices such as DVD, and for mobile use. Digital broadcasting and DVD use the MPEG-2 (Moving Picture Experts Group) system standard, standardized by ISO/IEC JTC1/SC29/WG11 (International Organization for Standardization / International Electrotechnical Commission). For mobile terminals, 3GPP (Third Generation Partnership Project), an international standardization body formed to standardize third-generation mobile communication systems, has adopted the MP4 file format standardized by ISO/IEC JTC1/SC29/WG11 in TS 26.234 (Transparent end-to-end packet switched streaming service), its specification for wireless video distribution.

As for video coding, MPEG-4 AVC (Advanced Video Coding) has been standardized as the successor to the currently widespread MPEG-2 Visual and MPEG-4 Visual. It is therefore expected that MPEG-4 AVC encoded video data will be multiplexed using the MPEG-2 system standard or the MP4 file format (hereinafter, MP4) and then broadcast, stored, or distributed.

The following outlines how encoded data is multiplexed in the MPEG-2 system and in MP4. Both use the access unit (AU) as the basic unit for handling encoded data, so the structure of the AU is described first. An AU is a unit containing the encoded data for one picture of video or one frame of audio, and AU data in MPEG-4 AVC has the structure shown in FIG. 1. In MPEG-4 AVC, in addition to the data essential for decoding a picture, AU data can include supplementary information called SEI (Supplemental Enhancement Information), which is not essential for decoding, as well as AU boundary information. All of this data is stored in NAL (Network Abstraction Layer) units. As shown in FIG. 1(a), a NAL unit consists of a header and a payload. The header is 1 byte long and includes, among other fields, a field indicating the type of data stored in the payload (hereinafter, the NAL unit type). A NAL unit type value is defined for each kind of data, such as slices and SEI, and the NAL unit type is referenced to determine what kind of data a NAL unit holds.

As shown in FIGS. 1(b) and 1(c), an AU stores NAL units such as header information and SEI in addition to the slice data for one picture. Because a NAL unit itself carries no information for identifying the boundaries of NAL unit data, boundary information can be added to the head of each NAL unit when an AU is stored. There are two kinds of boundary information: a start code prefix given by the 3 bytes 0x000001, as in FIG. 1(b) (hereinafter, the byte stream format), and the size of the NAL unit, as in FIG. 1(c) (hereinafter, the NAL size format). For the first NAL unit of an AU, and for NAL units with certain NAL unit type values, one or more zero_byte values (a single byte of value 0x00) must be added before the start code prefix. The MPEG-2 system uses the byte stream format, and MP4 uses the NAL size format.

Next, slices and header information are described in detail. Slices are of two kinds: IDR (Instantaneous Decoder Refresh) slices and all other slices. An IDR slice is intra-coded slice data, and header information such as the SPS (Sequence Parameter Set), described below, can be switched only at an IDR slice. When a picture contains an IDR slice, every other slice in that picture is also an IDR slice, so an AU containing IDR slices is hereinafter called an IDR AU. The unit made up of the AUs from one IDR AU up to the AU immediately before the next IDR AU is called a sequence. Decoding the slice data of an AU references only AUs within the same sequence, so random access is possible in units of sequences.

There are two kinds of header information, the SPS and the PPS (Picture Parameter Set). The SPS is header information that is fixed for a sequence, whereas the PPS can be switched picture by picture. A stream can hold multiple SPSs and PPSs, each distinguished by an index number, and one NAL unit stores one SPS or one PPS. The index numbers of the SPS and PPS referenced by a picture are obtained as follows. The index number of the PPS referenced by the picture is given in the header of the slice data. The PPS in turn gives the index number of the SPS it references, so the SPS index number for the picture is obtained by parsing the PPS that the picture references.
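The two boundary formats can be illustrated with a minimal sketch that converts byte stream format data into the NAL size format. It is an illustration only, not the method of this publication. The function names, the 4-byte length prefix, and the sample bytes are assumptions chosen for the example; the position of the NAL unit type in the low 5 bits of the header byte follows the H.264 NAL unit header layout.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"


def split_byte_stream(data: bytes) -> List[bytes]:
    """Split byte stream format data into NAL units, dropping boundary bytes."""
    units = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = data.find(START_CODE_PREFIX, start)
        end = len(data) if nxt == -1 else nxt
        # Trailing 0x00 bytes are treated as zero_byte padding that belongs to
        # the next start code (assumes a NAL unit never ends in 0x00).
        unit = data[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def to_nal_size_format(units: List[bytes], length_size: int = 4) -> bytes:
    """Prefix each NAL unit with its size as a big-endian integer.

    length_size=4 is an assumption for this sketch.
    """
    out = bytearray()
    for unit in units:
        out += len(unit).to_bytes(length_size, "big")
        out += unit
    return bytes(out)


def nal_unit_type(unit: bytes) -> int:
    """Return the NAL unit type held in the low 5 bits of the 1-byte header."""
    return unit[0] & 0x1F


# Hypothetical two-unit stream: the first unit is preceded by a zero_byte.
stream = b"\x00\x00\x00\x01\x67\xaa" + b"\x00\x00\x01\x65\xbb\xcc"
units = split_byte_stream(stream)
converted = to_nal_size_format(units)
```

Here the first start code carries a leading zero_byte, which the splitter discards along with the prefix, so only the NAL unit bytes are carried over into the size-prefixed output.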

Next, the method of multiplexing AU data with the MPEG-2 system for broadcasting is described. In the MPEG-2 system, encoded data is first multiplexed into PES (Packetized Elementary Stream) packets, and the PES packets are then multiplexed into TS (Transport Stream) packets. FIGS. 2(a) and 2(b) show the structures of the PES packet and the TS packet, respectively.

The payload of a PES packet stores access unit (AU) data. Items (1) to (3) in FIG. 2(a) show examples of storing AU data in the PES payload: one or more AUs may be stored together, as in (1) and (2), or AU data may be divided across packets, as in (3). The payload can also contain stuffing data in addition to AU data. The PES packet header begins with a 4-byte start code consisting of the 3-byte start code prefix 0x000001 followed by a 1-byte stream ID. The stream ID is an identification number indicating the kind of encoded data carried in the PES payload; for MPEG-4 AVC it can take any value from 0xE0 to 0xEF. The header can store the decoding time and display time of the first AU that starts in the payload, but not every PES packet carries this time information. When time information is needed for an AU whose decoding time or display time is not given by a PES packet header, the AU data is parsed to obtain the difference in decoding time or display time from the immediately preceding AU. The start position of a PES packet is detected by searching the payload data of TS packets for the 4-byte start code.

PES packet data is divided and stored in the payloads of TS packets, as shown in FIG. 2(b). A TS packet is a fixed-length packet of 188 bytes and consists of a 4-byte header, an adaptation field, and payload data. The adaptation field is present only when a specific flag in the header is set. The header contains an identification number called the PID, which indicates the kind of data the TS packet carries, and a counter called continuity_counter. continuity_counter is a 4-bit field that, among TS packets with the same PID, increases by 1 in transmission order and wraps around after reaching its maximum value.

The correspondence between the PID of a TS packet and the kind of data it carries is provided by program information sent separately in TS packets. On receiving TS packets, therefore, the receiver first obtains the PID of each packet and sorts the packets according to PID value. For example, if the program information obtained at the start of reception indicates that MPEG-4 AVC data is carried in TS packets whose PID is 32, the MPEG-4 AVC AU data can be obtained by collecting the TS packets whose PID is 32. A gap in the continuity_counter values of received TS packets indicates that packet loss occurred on the transmission path. To separate AU data from TS packets, PES packets are first separated from the TS payload data, and the AU data is then separated from those PES packets.
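The PID filtering and continuity_counter check described above can be sketched as follows. This is a simplified illustration. The function names and the make_packet helper are hypothetical, the header bit positions follow the standard TS packet layout, and the sketch assumes every packet of the target PID carries a payload and that no duplicate packets are sent.

```python
from typing import List, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> Tuple[int, int]:
    """Return (PID, continuity_counter) from the 4-byte TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit PID
    counter = packet[3] & 0x0F                   # 4-bit continuity_counter
    return pid, counter


def find_continuity_gaps(packets: List[bytes], target_pid: int) -> List[int]:
    """Return indices of packets whose counter does not follow the previous one."""
    gaps = []
    last = None
    for index, packet in enumerate(packets):
        pid, counter = parse_ts_header(packet)
        if pid != target_pid:
            continue  # packets of other PIDs have their own counters
        if last is not None and counter != (last + 1) % 16:
            gaps.append(index)
        last = counter
    return gaps


def make_packet(pid: int, counter: int) -> bytes:
    """Hypothetical helper: build a payload-only TS packet for testing."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (counter & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))


packets = [
    make_packet(32, 14),
    make_packet(33, 0),   # different PID, ignored
    make_packet(32, 15),
    make_packet(32, 0),   # wrap-around from 15 to 0 is not a gap
    make_packet(32, 3),   # counters 1 and 2 are missing
]
gaps = find_continuity_gaps(packets, target_pid=32)
```

Because the counter is only 4 bits wide, a loss of exactly 16 packets of one PID would go unnoticed by this check.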

Finally, the method of multiplexing AU data in MP4 is described. In MP4, per-sample header information and media data are managed in units of objects called Boxes. A sample is the basic unit for handling media data in MP4, and one sample corresponds to one AU. Samples are numbered in ascending order of decoding time, with the sample number increasing by 1 for each sample. FIG. 3(a) shows the structure of a Box, which consists of the following fields.

size: The size of the entire Box, including the size field.
type: The identifier of the Box, usually represented by four alphabetic characters. The field is 4 bytes long. A Box is located in an MP4 file by testing whether 4 consecutive bytes of data match the identifier in the type field.

version: The version number of the Box.
flags: Flag information set for each Box.
data: Header information or media data is stored here.

Because version and flags are not mandatory, some Boxes do not have these fields. Hereinafter, a Box is referred to by the identifier in its type field; for example, a Box whose type is 'moov' is called moov. FIG. 3(b) shows the Box structure of an MP4 file. An MP4 file is composed of ftyp, moov, mdat, and, where used, moof, with ftyp placed at the head of the file. ftyp contains information identifying the MP4 file, and mdat stores the media data. Each media stream in mdat is called a track, and each track is identified by a track ID. moov stores the header information for the samples in each track of mdat. Within moov, Boxes are arranged hierarchically as shown in FIG. 4(a), and the header information for each media track, such as audio or video, is stored in its own trak. Boxes are arranged hierarchically within trak as well, and information such as sample size, decoding time, display start time, and which samples are randomly accessible is stored in the Boxes within stbl (FIG. 4(b)). A randomly accessible sample is called a sync sample, and the list of sync sample numbers is given by stss within stbl.

In the above, the header information for all samples in a track is stored in moov. Alternatively, a track can be divided into fragments, with header information stored per fragment. The header information for each such fragment is given by moof. FIG. 5 shows an example of the structure of a fragmented MP4 file, in which the header information for the samples in mdat #1 is stored in moof #1.
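As a rough illustration of the Box structure, the sketch below walks the Boxes in a byte buffer by reading each size and type field in turn. Note that it follows the size fields rather than searching for a 4-byte type match as the text describes. The names iter_boxes and make_box are hypothetical, and the handling of size values 1 (64-bit size follows) and 0 (Box extends to the end of the data) is taken from the ISO base media file format rather than from this text.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each Box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The Box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed Box at offset %d" % offset)
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Hypothetical helper: build a Box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# Synthetic file: ftyp, then moov holding an empty trak, then mdat.
data = make_box("ftyp", b"isom") + make_box("moov", make_box("trak", b"")) + make_box("mdat", bytes(10))
top_level = list(iter_boxes(data))
moov_start, moov_end = top_level[1][1], top_level[1][2]
children = [name for name, _, _ in iter_boxes(data, moov_start, moov_end)]
```

Calling iter_boxes again on the payload range of a container such as moov lists its child Boxes, which mirrors the hierarchical arrangement described above.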

Consider now the case where broadcast data received on a mobile terminal is sent from a mobile phone as an e-mail attachment. 3GPP specifies that media data such as video and audio attached to e-mail is to be multiplexed in MP4, so attaching received broadcast data to e-mail requires converting the multiplexing format from TS to MP4. In the terrestrial digital broadcasting for mobile receivers that is standardized by ARIB (Association of Radio Industries and Businesses) and carried out in Japan (hereinafter, 1seg broadcasting), MPEG-4 AVC is used as the video coding scheme and MPEG-2 AAC as the audio coding scheme.

FIG. 6 shows how decoding time information is stored when broadcast data containing packet loss is received and the video and audio data in it are multiplexed into MP4. In this example, as shown in FIG. 6(a), the 52nd through 100th AUs of the video could not be received because of packet loss. The MP4 file therefore stores AU1 through AU51 and AU101 onward as samples, so that, as shown in FIG. 6(b), the 52nd sample holds the picture data of AU101. When the decoding time information for these samples is stored in moov, the table in stts, the Box that describes the difference in decoding time between successive samples, is as shown in FIG. 6(c). Samples 1 through 50 all have a decoding time difference of 100 ms and can share a single entry, but the difference in decoding time between sample 51 and sample 52 is 5000 ms (50 frame intervals of 100 ms each) because AU52 through AU100 were lost. A new entry is therefore created for sample 51 to indicate a difference of 5000 ms. The received stream is assumed here to have a fixed frame rate of 10 Hz.
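The stts table of this example can be reproduced with a short sketch that run-length encodes the decoding time differences of the stored samples. The function name build_stts is hypothetical, times are expressed directly in milliseconds, the stream is assumed to end at AU110, and the last sample is assumed to take the nominal 100 ms delta; none of these details come from FIG. 6.

```python
from typing import List, Tuple


def build_stts(decode_times_ms: List[int], last_delta_ms: int) -> List[Tuple[int, int]]:
    """Run-length encode decoding time differences as (sample_count, sample_delta)."""
    if not decode_times_ms:
        return []
    deltas = [b - a for a, b in zip(decode_times_ms, decode_times_ms[1:])]
    deltas.append(last_delta_ms)  # the final sample has no successor
    entries: List[Tuple[int, int]] = []
    for delta in deltas:
        if entries and entries[-1][1] == delta:
            count, _ = entries[-1]
            entries[-1] = (count + 1, delta)  # extend the current run
        else:
            entries.append((1, delta))        # start a new entry
    return entries


# AU1..AU51 and AU101..AU110 were received; AU52..AU100 were lost.
received_aus = list(range(1, 52)) + list(range(101, 111))
decode_times = [(au - 1) * 100 for au in received_aus]  # 10 Hz, so 100 ms per AU
stts = build_stts(decode_times, last_delta_ms=100)
```

The result holds one entry for samples 1 to 50, a single-sample entry of 5000 ms for sample 51, and a further entry of 100 ms for the samples after the gap.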

FIG. 7 is a block diagram of a conventional demultiplexing apparatus 100 that plays back MP4 files. The demultiplexing apparatus 100 separates the video and audio AUs from an input MP4 file, decodes them, and plays them back. It comprises header separation means 101, a header memory 102, an mdat memory 103, sample acquisition means 104, time information analysis means 105, and decoding and display means 106.

The header separation means 101 separates header data Hdat and mdat data Dmdat from the MP4 file FileIn, and inputs the header data Hdat to the header memory 102 and the mdat data Dmdat to the mdat memory 103. The header data Hdat contains the moov or moof data. The time information analysis means 105 obtains header time information THinf from the header memory 102, analyzes it, and determines the audio or video sample that satisfies the playback start time Tst supplied from outside. It then inputs the sample number SplNum of that sample to the sample acquisition means 104, and inputs time information Tinf, indicating the decoding time and display time of the sample with sample number SplNum, to the decoding and display means 106. When the decoding time and display time are equal, Tinf may contain only the decoding time. The header time information THinf refers to the per-sample decoding time and display time information stored in stts and ctts within moov, in trun within moof, and so on.

The sample number identifies a sample within each MP4 track and increases by 1 in decoding order from the first sample. After playback starts, therefore, SplNum increases by 1 at a time unless a new playback start time is supplied from outside, for example for seek playback. Next, the sample acquisition means 104 obtains the sample data splDat1 for sample number SplNum and inputs it to the decoding and display means 106. To do so, it analyzes access information AcsInf obtained from the header memory 102, identifies the storage position of the sample with the desired sample number, and reads splDat1 from the mdat memory 103. Finally, the decoding and display means 106 decodes splDat1 according to the decoding time and display time obtained from Tinf and outputs the result as output Out.

FIG. 8 is a flowchart showing the operation of the demultiplexing apparatus 100. First, step 1001 separates the header part and the data part of the MP4 file, and step 1002 determines the sample at which decoding starts. Step 1003 determines the decoding time, display time, and playback duration (Dur_in) of the sample from the time information in the header part. The playback duration is the time for which display continues in the case of video, and the time for which playback continues in the case of audio. Step 1004 then decodes and displays the sample according to the decoding time and display time obtained in step 1003, and continues playback for Dur_in. For audio, the decoding time and display time are identical. Steps 1003 and 1004 are repeated until the last sample that needs decoding has been decoded. This flowchart assumes that, once the decoding start sample is determined in step 1002, samples are played back sequentially in decoding order, so the sample number of the sample being decoded increases by 1 at a time. For trick play such as seek playback, the apparatus repeatedly obtains the sample number of the sample corresponding to the position specified by the trick-play operation and starts decoding from that sample.
Japanese Patent Laid-Open No. 2003-114845 (Section 6-18, FIG. 6)

The operation of the conventional demultiplexing apparatus 100 when playing back the MP4 file shown in FIG. 6 is as follows. In the stream of FIG. 6(a), the decoding time and display time are assumed to be equal. The time information analysis means 105 determines the decoding time, display time, and playback duration of each sample from the time information stored in the MP4 file header, such as stts. Because the difference in decoding time between sample 51 and sample 52 is 5000 ms, the playback duration of sample 51 is determined to be 5000 ms, and the decoded result of sample 51 remains on display for 5000 ms. As a result, sample 51, whose true playback duration is 100 ms, is displayed 4900 ms longer than it should be. Thus, when the conventional demultiplexing apparatus 100 plays back an MP4 file storing a stream like that of FIG. 6(b), playback freezes on one sample for a long time and playback quality suffers.
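The freeze can be seen by expanding an stts table into a per-sample timeline, as a conventional player would. The sketch below is an illustration under stated assumptions: the function name expand_stts is hypothetical, the stts values are those of the FIG. 6 example with times in milliseconds, and the count of 10 samples after the gap is made up for the example.

```python
from typing import List, Tuple


def expand_stts(entries: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (decode_time, duration) for each sample, in stts order."""
    timeline = []
    time = 0
    for sample_count, sample_delta in entries:
        for _ in range(sample_count):
            timeline.append((time, sample_delta))
            time += sample_delta
    return timeline


# stts of the example: samples 1-50 at 100 ms, sample 51 at 5000 ms, then 100 ms.
stts = [(50, 100), (1, 5000), (10, 100)]
timeline = expand_stts(stts)

NOMINAL_MS = 100                      # 10 Hz fixed frame rate
decode_time_51, duration_51 = timeline[50]  # sample numbers start at 1
excess_ms = duration_51 - NOMINAL_MS
```

Sample 51 starts at 5000 ms and is held for 5000 ms, so the next sample does not appear until 10000 ms.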

The present invention has been made to solve the above problem.
The demultiplexing apparatus according to claim 1 of the present invention separates, decodes, and plays back individual encoded data from data in which encoded data such as video and audio has been multiplexed into one or more packets. It comprises: separation means for separating a header and a payload from the multiplexed packets; determination means for analyzing the header, which includes at least information indicating the decoding time or display time of frames in each encoded data, and determining whether per-frame attribute information in each encoded data satisfies a predetermined condition; and correction means for correcting, when the predetermined condition is satisfied, the decoding time, display time, or playback duration of the frame indicated by the header. The multiplexed encoded data of the video or the audio includes a region from which frame data is missing.

The demultiplexing apparatus according to claim 2 of the present invention is the demultiplexing apparatus according to claim 1, wherein the attribute information is the playback duration DUR1 of a frame obtained from the header. When DUR1 exceeds a predetermined value, the determination means determines that a discontinuous section exists between that frame and the frame immediately following it in playback time order. When a discontinuous section is determined to exist, the correction means resets the playback duration of the frame to a duration DUR2 shorter than DUR1. A discontinuous section indicates that frame data is missing between the two consecutive frames.
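The determination and correction described for claim 2 might be sketched as follows. This is only one possible reading. The function name, the 1000 ms threshold, and the choice of the nominal 100 ms frame interval as DUR2 are assumptions made for illustration; the text above states only that DUR1 is compared with a predetermined value and that DUR2 is shorter than DUR1.

```python
from typing import List, Tuple


def correct_durations(durations_ms: List[int], threshold_ms: int,
                      dur2_ms: int) -> Tuple[List[int], List[int]]:
    """Replace any duration above the threshold with dur2_ms.

    Returns the corrected durations and the indices judged to precede a
    discontinuous section.
    """
    if dur2_ms > threshold_ms:
        # Guarantees DUR2 < DUR1 for every corrected frame.
        raise ValueError("dur2_ms must not exceed threshold_ms")
    corrected = []
    discontinuities = []
    for index, dur1 in enumerate(durations_ms):
        if dur1 > threshold_ms:
            discontinuities.append(index)  # gap follows this frame
            corrected.append(dur2_ms)
        else:
            corrected.append(dur1)
    return corrected, discontinuities


# Durations of the FIG. 6 example: sample 51 (index 50) carries 5000 ms.
durations = [100] * 50 + [5000] + [100] * 10
corrected, discontinuities = correct_durations(durations, threshold_ms=1000, dur2_ms=100)
```

With these assumed values, sample 51 is shown for 100 ms instead of 5000 ms and playback moves straight on to sample 52.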

The demultiplexing apparatus according to claim 3 of the present invention is the demultiplexing apparatus according to claim 1, wherein the attribute information indicates whether the discontinuous section exists between two frames that are consecutive in decoding order within each encoded data. The determination means determines from the attribute information whether the discontinuous section exists. When the discontinuous section is determined to exist, the correction means resets the playback duration of the earlier of the two consecutive frames in decoding order to a duration DUR2 shorter than the playback duration DUR1.

The demultiplexing apparatus according to claim 4 of the present invention is the demultiplexing apparatus according to claim 2 or claim 3, wherein the multiplexed data includes both video and audio encoded data. The determination means obtains the discontinuous sections for the video frames and for the audio frames, and determines whether a discontinuous section of the video and a discontinuous section of the audio overlap in playback time. When the discontinuous sections overlap, the correction means resets the playback duration of the frame, for both the video and the audio, to a duration DUR2 shorter than the playback duration DUR1.
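The overlap test between video and audio discontinuous sections can be sketched with simple interval arithmetic. The representation of a discontinuous section as a half-open interval (start_ms, end_ms) in playback time, the function names, and the sample values are all assumptions for illustration and do not come from the text.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms), end exclusive


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the part shared by two intervals, or None if they do not overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


def common_gaps(video_gaps: List[Interval], audio_gaps: List[Interval]) -> List[Interval]:
    """Return the sections in which both video and audio are discontinuous."""
    shared_sections = []
    for video_gap in video_gaps:
        for audio_gap in audio_gaps:
            shared = overlap(video_gap, audio_gap)
            if shared is not None:
                shared_sections.append(shared)
    return shared_sections


# Hypothetical gaps: video is missing 5100-10000 ms, audio 5120-9900 ms.
video_gaps = [(5100, 10000)]
audio_gaps = [(5120, 9900)]
skippable = common_gaps(video_gaps, audio_gaps)
```

Only the shared section is a candidate for skipping; where just one of the two media is missing, the other can still be played.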

The demultiplexing apparatus according to claim 5 of the present invention is the demultiplexing apparatus according to claim 1, further comprising determining means that, when the predetermined condition is satisfied, determines the frames to be decoded in the moving image such that the frame immediately following said frame in decoding order is a randomly accessible frame.

In the demultiplexing apparatus according to claim 6 of the present invention, the multiplexed data includes encoded data of both a moving image and audio, and when the predetermined condition is satisfied, the determining means selects, as the randomly accessible frame to be decoded next in the audio, a frame whose playback time is the same as, or immediately after, the playback time of the randomly accessible frame to be decoded next in the moving image.

The demultiplexing method according to claim 7 of the present invention is a demultiplexing method for separating, decoding, and playing back each piece of encoded data from data in which encoded data such as a moving image and audio is multiplexed into one or more packets, the method comprising: a separation step of separating a header and a payload from the multiplexed packets; a determination step of analyzing the header, which includes at least information indicating the decoding time or display time of frames in each piece of encoded data, and determining whether per-frame attribute information in each piece of encoded data satisfies a predetermined condition; and a correction step of correcting, when the predetermined condition is satisfied, the decoding time, display time, or playback time length of the frame indicated by the header, wherein the multiplexed encoded data of the moving image or the audio includes a region in which frame data is missing.

The demultiplexing method according to claim 8 of the present invention is the demultiplexing method according to claim 7, further comprising a determining step of determining, when the predetermined condition is satisfied, the frames to be decoded in the moving image such that the frame immediately following said frame in decoding order is a randomly accessible frame.

The recording medium according to claim 9 of the present invention causes execution of a demultiplexing method for separating, decoding, and playing back each piece of encoded data from data in which encoded data such as a moving image and audio is multiplexed into one or more packets, the method comprising: a separation step of separating a header and a payload from the multiplexed packets; a determination step of analyzing the header, which includes at least information indicating the decoding time or display time of frames in each piece of encoded data, and determining whether per-frame attribute information in each piece of encoded data satisfies a predetermined condition; and a correction step of correcting, when the predetermined condition is satisfied, the decoding time, display time, or playback time length of the frame indicated by the header, wherein the multiplexed encoded data of the moving image or the audio includes a region in which frame data is missing.

According to the demultiplexing apparatus of claim 1 of the present invention, when an MP4 file storing an encoded stream with partially missing data is played back, degradation of playback quality caused by the missing data, such as the moving image display freezing, can be eliminated.

According to the demultiplexing apparatus of claim 2 of the present invention, the same effect is obtained without extending the MP4 standard: when an MP4 file storing an encoded stream with partially missing data is played back, degradation of playback quality caused by the missing data, such as the moving image display freezing, can be eliminated.

According to the demultiplexing apparatus of claim 3 of the present invention, the playback operation can be switched after the position of the missing data has been identified, so degradation of playback quality caused by the missing data, such as the moving image display freezing, can be prevented more reliably.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(Embodiment 1)
First, a demultiplexing apparatus 1000 according to Embodiment 1 of the present invention will be described. The input is an MP4 file that stores one audio track and one moving image track. The audio is encoded with MPEG-2 AAC and the moving image with MPEG-4 AVC. The MP4 file may instead contain a plurality of audio or moving image tracks. Other encoding schemes may also be used, such as MPEG-4 AAC or AMR (Adaptive Multi-Rate) for audio, and MPEG-4 Visual, H.263, or VC-1 (an encoding scheme standardized by SMPTE) for moving images. Likewise, the multiplexing format is not limited to MP4 and may be, for example, ASF (Advanced Systems Format, a format developed by Microsoft) or QuickTime (a format developed by Apple).

FIG. 9 is a block diagram showing the configuration of the demultiplexing apparatus 1000. The demultiplexing apparatus 1000 includes header separation means 101, a header memory 102, an mdat memory 103, sample acquisition means 104, time information analysis means 1005, decoding display means 1002, and correction means 1001. It differs from the conventional demultiplexing apparatus 100 in that it includes the correction means 1001 and in the operation of the time information analysis means 1005 and the decoding display means 1002, so the following description focuses on these differences.

The time information analysis means 1005 inputs, to the correction means 1001, time information Tinf1 that includes at least the decoding time and display time of the sample whose sample number is SplNum (hereinafter written as sample [SplNum]) and the decoding time of sample [SplNum + i]. Here, sample [SplNum + i] is the sample immediately after sample [SplNum] in display order. Tinf1 also includes the playback time length Dur_in of sample [SplNum], calculated as the difference between the display times of sample [SplNum + i] and sample [SplNum]. Based on Tinf1, the correction means 1001 determines the decoding time, display time, and playback time length of sample [SplNum] and inputs them to the decoding display means 1002 as corrected time information ModTinf. The decoding display means 1002 decodes and plays back sample [SplNum] in accordance with ModTinf.

FIG. 10 is a flowchart showing the operation of the demultiplexing apparatus 1000. First, in step 1001, the header portion and the data portion of the MP4 file are separated, and in step 1002 the sample from which decoding starts is determined. Next, in step 1103, time information such as stts, ctts, or trun is used to obtain the decoding times and display times of sample [SplNum], which is the next sample to be decoded, and of sample [SplNum + i], which immediately follows sample [SplNum] in display order, together with the playback time length of sample [SplNum].

In step 1104, it is determined whether the playback time length Dur_in of sample [SplNum] needs to be corrected. If correction is needed, the process proceeds to step 1105; otherwise it proceeds to step 1107. In step 1105, the playback time length of sample [SplNum] is corrected to Dur_out, the display time of sample [SplNum + i] is corrected accordingly, and the process proceeds to step 1106. Because correcting the display time of sample [SplNum + i] means that the display times of all samples after sample [SplNum] must also be corrected, the decoding times of those samples are corrected as well so that decoding completes in time for the corrected display times. Normally, the decoding times of sample [SplNum] and the subsequent samples are advanced by a time equal to the difference between Dur_out and Dur_in, but the decoding times may be adjusted by any amount as long as each sample is decoded in time for its display time.

In step 1106, sample [SplNum] is decoded and displayed, and playback continues for the duration Dur_out. For a moving image, continuing playback means repeatedly displaying the decoded result; for audio, it means continuing playback over that section. For audio, however, if Dur_out is longer than the playback time length of one frame, predetermined processing is performed after the one frame has been played, such as outputting silence or pseudo-noise. In step 1107, sample [SplNum] is decoded and displayed, and display continues for the duration Dur_in. The processing from step 1103 to step 1107 is repeated in this way up to the last picture that needs to be decoded.

When the decoding time and display time of every sample are equal, the processing from step 1103 to step 1106 only needs to obtain decoding times, and the playback time length can be calculated as the difference between the decoding times of two consecutive samples. Whether the decoding time and display time are equal can be determined as follows. For samples in moov, check whether ctts, the Box that stores the difference between decoding time and display time, exists in moov, or whether the difference values given by the ctts entries are 0. For samples in moof, check whether the trun in the moof contains the field that stores the difference between a sample's decoding time and display time, and whether the value of that field is 0.
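The time shift performed in steps 1104 to 1105 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes decoding time equals display time, a single track, and a fixed replacement length `dur_out_ms`; the function name and parameters are hypothetical.

```python
def correct_display_times(times_ms, thres_ms, dur_out_ms):
    """Return display times with over-long sample durations shortened.

    times_ms   : display times of received samples, ascending, in ms
    thres_ms   : threshold above which Dur_in is treated as a gap
    dur_out_ms : corrected playback time length (assumed fixed here)
    """
    corrected = []
    shift = 0  # total time removed from the timeline so far
    for idx, t in enumerate(times_ms):
        # Every sample is moved earlier by the time skipped before it.
        corrected.append(t - shift)
        if idx + 1 < len(times_ms):
            dur_in = times_ms[idx + 1] - t  # step 1103
            if dur_in > thres_ms:  # step 1104
                shift += dur_in - dur_out_ms  # step 1105
    return corrected


# A 1000 ms gap after the sample at 400 ms is shortened to 100 ms.
print(correct_display_times([0, 100, 200, 300, 400, 1400, 1500], 500, 100))
```

Keeping a running `shift` reflects the point made in the text: once one sample is shortened, every later sample has to be moved by the same accumulated amount.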

FIGS. 11(a) and 11(b) show examples of calculating the playback time length Dur_in of moving image samples. In the figures, I, P, and B denote samples that store an intra-coded picture (I picture), a forward-predicted picture (P picture), and a bi-predictive picture (B picture), respectively, and the number attached to each type indicates display order. Samples drawn with dotted lines could not be obtained because of a reception error such as packet loss.

FIG. 11(a) is an example in which the decoding time and display time of each sample are equal, and the three pictures P5, P6, and P7 could not be obtained. Because decoding time and display time are equal in this track, the playback time length Dur_in of P4 is calculated as 400 ms by subtracting the decoding time of P4 from the decoding time of P8, which immediately follows it in decoding order.

FIG. 11(b) is an example in which some samples have different decoding and display times, and B5 could not be obtained. In this track, Dur_in equals the difference between the display times of samples that are consecutive in display order, so the Dur_in of B4 is calculated as 200 ms by subtracting the display time of B4 from the display time of P6. Note that the Dur_in of B4 is not the difference from the display time of P9, which is the next picture in decoding order. The decoding times of P9 and later pictures, which follow B4 in decoding order, must also be adjusted so that they are decoded in time for their display times. For example, if the display time of P6 is changed to 500 ms, the display time of B7 becomes 600 ms, so decoding of B7 must be complete by 600 ms.
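The calculation for both figures can be sketched as below. The sample lists are reconstructions, not data from the patent: display times are assumed to be 100 ms times the display-order number, and the decoding order in the second list is a plausible guess consistent with P9 following B4.

```python
def playback_durations(samples):
    """Map each sample name to its Dur_in in ms.

    samples: (name, display_time_ms) pairs for the samples actually
    received, in any order (for example decoding order).
    Dur_in is taken between neighbours in display order, so the last
    sample in display order has no entry.
    """
    by_display = sorted(samples, key=lambda s: s[1])
    return {
        cur[0]: nxt[1] - cur[1]
        for cur, nxt in zip(by_display, by_display[1:])
    }


# FIG. 11(a)-like case: P5, P6, P7 lost, decoding order == display order.
FIG_11A = [("I0", 0), ("P1", 100), ("P2", 200), ("P3", 300),
           ("P4", 400), ("P8", 800), ("P9", 900)]

# FIG. 11(b)-like case: B5 lost, listed in an assumed decoding order.
FIG_11B = [("I0", 0), ("P3", 300), ("B1", 100), ("B2", 200),
           ("P6", 600), ("B4", 400), ("P9", 900), ("B7", 700), ("B8", 800)]

print(playback_durations(FIG_11A)["P4"])  # gap left by P5 to P7
print(playback_durations(FIG_11B)["B4"])  # measured against P6, not P9
```

Sorting by display time before differencing is what prevents B4 from being compared with P9, its successor in decoding order.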

Next, the determination in step 1104 and the method of deciding the playback time length Dur_out in step 1105 will be described. In step 1104, the determination is made according to whether Dur_in exceeds a threshold THRES. THRES is defined per medium, such as moving image or audio, and the same value may be used for both. For example, for a 10 Hz moving image the playback time length of one frame is normally 100 ms; if it can be assumed that the playback time length of one frame never exceeds 500 ms even when the frame rate varies, THRES is set to 500 ms. In other words, when Dur_in exceeds 500 ms, the data is regarded as discontinuous because of data lost to a reception error such as packet loss. THRES can thus be decided from the frame rate of the moving image or audio.

The frame rate of the audio or moving image stored in the MP4 file is obtained from time information in the header, such as stts, or from parameters in the stream initialization information stored in the sample entry. For MPEG-4 AVC at a fixed frame rate, the frame rate can be obtained from parameters in the SPS stored in the sample entry. Even when the frame rate is variable, the maximum frame rate can be estimated from the SPS parameters, and THRES may be decided from that estimate. For audio, once the sampling frequency has been obtained from the header information, dividing the number of sample points in one frame by the sampling frequency gives the duration of one frame, and the frame rate is its reciprocal.

THRES may also be a predetermined value that does not depend on the frame rate. It may further be decided from the brand information of the MP4 file, or a Box storing information that indicates THRES may be stored separately in the MP4 file and referred to. THRES may be set per track or may be a value common to all tracks.
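A small sketch of deriving the threshold from the frame rate follows. The multiplier of 5 is only inferred from the 100 ms / 500 ms example in the text, and the function names are hypothetical. The 1024-sample frame used in the usage lines is the usual AAC frame length and serves only as an illustration.

```python
def video_frame_interval_ms(frame_rate_hz: float) -> float:
    """Playback time length of one video frame at a fixed frame rate."""
    return 1000.0 / frame_rate_hz


def audio_frame_interval_ms(samples_per_frame: int, sampling_hz: int) -> float:
    """Duration of one audio frame: sample points divided by sampling frequency."""
    return 1000.0 * samples_per_frame / sampling_hz


def threshold_ms(frame_interval_ms: float, margin: float = 5.0) -> float:
    """THRES as a multiple of the frame interval (margin is an assumption)."""
    return frame_interval_ms * margin


def is_discontinuous(dur_in_ms: float, thres_ms: float) -> bool:
    """Step 1104: treat the sample as followed by a gap if Dur_in > THRES."""
    return dur_in_ms > thres_ms


thres = threshold_ms(video_frame_interval_ms(10))
print(thres, is_discontinuous(600, thres), is_discontinuous(300, thres))
print(audio_frame_interval_ms(1024, 48000))
```

A per-track threshold falls out naturally from this shape: call `threshold_ms` once per track with that track's own frame interval.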

Whether to correct Dur_in may also be decided from information other than the threshold THRES, by determining whether the interval between sample [SplNum] and sample [SplNum + 1] is discontinuous because of, for example, data lost to a reception error. Discontinuity may be determined from either the sample data or the header information. From the sample data, for MPEG-4 AVC for example, it can be determined by whether sample [SplNum] contains a NAL unit with a specific NAL unit type that marks a discontinuity point. From the header information, a Box in stbl or trun can be defined to indicate whether two consecutive samples are discontinuous, and the determination can be made by referring to that Box.

The playback time length may also be corrected when Dur_in is exactly equal to the threshold THRES.

FIG. 12 is a flowchart showing the method of deciding the playback time length Dur_out in step 1105. The method ensures that AV synchronization is maintained in subsequent samples even after the playback time length has been corrected. It is assumed here that a moving image track always exists. First, in step 1201, it is determined whether audio exists; if it does, the process proceeds to step 1202, and if not, to step 1205. In step 1202, it is determined whether the discontinuous sections of the moving image and the audio overlap; if they do, the process proceeds to step 1203, and if not, to step 1204. In step 1203, the playback time length Dur_in is shortened by Dis_dur, the time length of the portion where the discontinuous sections overlap. As a result:

Dur_out (playback time length after correction) = Dur_in (playback time length before correction) - Dis_dur

Dis_dur is calculated as follows. Suppose the moving image has a discontinuous section between sample [V_SampleNum] and sample [V_SampleNum + i], and the audio has a discontinuous section between sample [A_SampleNum] and sample [A_SampleNum + i]. Then:

Dis_dur = MIN(display time of sample [V_SampleNum + i], display time of sample [A_SampleNum + i]) - MAX(display time of sample [V_SampleNum] + one moving image frame interval, display time of sample [A_SampleNum] + one audio frame interval)

Here, MAX(m, n) is the larger of m and n, and MIN(m, n) is the smaller of m and n; if the two values are equal, that value is used. In other words, playback is skipped over the section in which no track has anything to play.

In step 1204, the playback time length is not corrected:

Dur_out (playback time length after correction) = Dur_in (playback time length before correction)

In step 1205, Dur_out is set to one frame interval. If the frame rate is fixed, one frame interval is the playback time length of one frame at that frame rate. For example, at a frame rate of 10 Hz, one frame interval is 100 ms. Even if the frame rate is variable, the playback time length of each frame is an integer multiple of the frame interval, so the frame interval can still be applied. For example, the rate may normally be 10 Hz, but frames may be dropped because of bit rate or processing time constraints during encoding. The playback time length of one frame then becomes an integer multiple of the 100 ms frame interval, such as 200 ms or 300 ms, and in these cases too Dur_out may be set to one frame interval.

FIG. 13 shows cases in which the MP4 file contains one audio track and one moving image track. The decoding time and display time of the moving image are assumed to be equal.

In FIG. 13(a), the moving image is determined to be discontinuous between display times 400 ms and 600 ms, and the audio is determined to be discontinuous between 200 ms and 350 ms. In each section where one track is discontinuous, the data of the other track is continuous; for example, the moving image can still be played during the section in which the audio cannot. Dur_in is therefore not corrected. The display time of the fourth moving image sample V4 is 300 ms, and its Dur_in, the difference between the display times of the fourth and fifth moving image samples, is 300 ms.

In FIG. 13(b), the moving image is determined to be discontinuous between display times 400 ms and 600 ms, and the audio is determined to be discontinuous between 400 ms and 500 ms. In this case, both moving image and audio data can be determined to be missing from 400 ms to 500 ms. Dur_in is therefore corrected for the fourth moving image sample V4 and the eighth audio sample A8. Since the overlapping portion of the discontinuous sections is 100 ms:

For V4: Dur_out = Dur_in - 100 ms = (600 ms - 300 ms) - 100 ms = 200 ms

For A8: Dur_out = Dur_in - 100 ms = (500 ms - 350 ms) - 100 ms = 50 ms
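The decision flow of FIG. 12 and the FIG. 13 numbers can be checked with a short sketch. The function names are hypothetical. The 50 ms audio frame interval and the 150 ms display time used for the FIG. 13(a) audio sample are not stated in the text; they are inferred from the quoted section boundaries.

```python
def overlap_ms(v_cur, v_next, v_interval, a_cur, a_next, a_interval):
    """Dis_dur: length of the section where video and audio are both missing.

    *_cur  : display time of the last sample before the gap
    *_next : display time of the first sample after the gap
    Returns 0 when the two discontinuous sections do not overlap.
    """
    start = max(v_cur + v_interval, a_cur + a_interval)
    end = min(v_next, a_next)
    return max(0, end - start)


def decide_dur_out(dur_in, has_audio, dis_dur, frame_interval):
    """Steps 1201 to 1205 of FIG. 12 for one sample of one track."""
    if not has_audio:  # step 1201 -> step 1205
        return frame_interval
    if dis_dur > 0:  # step 1202 -> step 1203
        return dur_in - dis_dur
    return dur_in  # step 1204


# FIG. 13(b): video gap 400-600 ms, audio gap 400-500 ms.
dis = overlap_ms(300, 600, 100, 350, 500, 50)
print(dis, decide_dur_out(300, True, dis, 100), decide_dur_out(150, True, dis, 50))

# FIG. 13(a): video gap 400-600 ms, audio gap 200-350 ms, no overlap.
print(overlap_ms(300, 600, 100, 150, 350, 50))
```

Shortening both tracks by the same Dis_dur is what keeps them aligned after the skip: each track loses exactly the interval in which neither had data.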

Dur_in may also be corrected only when the section in which both moving image and audio data are missing exceeds a predetermined time length.

This is not limited to the case of one audio track and one moving image track: when there are two or more tracks, the playback time length may be shortened only where the discontinuous sections of those tracks overlap.

Alternatively, the roles may be reversed: an audio track may be assumed to always exist, and step 1201 may instead determine whether a moving image track exists.

Furthermore, the user may be shown that playback was skipped over a discontinuous section. FIG. 14 is an example that uses a seek bar indicating the portion played so far. As shown in FIG. 14(a), suppose an MP4 file storing 60 seconds of AV content is played, and playback is skipped from display time 20 seconds to 25 seconds because that range is a discontinuous section. FIG. 14(b) shows the seek bar when 20 seconds have elapsed since the start of playback; the end of the seek bar is at the 20-second position. FIG. 14(c) shows the seek bar when 25 seconds have elapsed. Because the 5 seconds from 20 to 25 seconds of the AV data in the MP4 file are skipped, the 25-second position of the content is played once 20 seconds have elapsed, and by an elapsed time of 25 seconds the content has been played up to the 30-second position. The end of the seek bar therefore indicates the 30-second position.

The user can tell that there was a discontinuous section from the seek bar moving discontinuously, but the skipped section may also be marked explicitly within the seek bar, as in FIG. 14(c). The seek bar may show only the current playback position instead of the portion played so far. Alternatively, the seek bar may show a total playback time that already accounts for the skips, so that the bar moves continuously during playback. In the example of FIG. 14, the total playback time would be set to 55 seconds in advance.
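The relationship between elapsed time and content position in FIG. 14 can be sketched as below. The function names and the list-of-intervals representation are assumptions made for illustration.

```python
def content_position(elapsed_s, skips):
    """Map elapsed playback time to a position in the original content.

    skips: sorted, non-overlapping (start_s, end_s) discontinuous
    sections expressed in content time.
    """
    pos = elapsed_s
    for start, end in skips:
        if pos >= start:
            pos += end - start  # jump over the skipped section
        else:
            break
    return pos


def total_playback_time(content_s, skips):
    """Total playback time once all skipped sections are removed."""
    return content_s - sum(end - start for start, end in skips)


skips = [(20, 25)]
print(content_position(10, skips))   # before the gap
print(content_position(20, skips))   # lands just after the gap
print(content_position(25, skips))
print(total_playback_time(60, skips))
```

A player that wants a discontinuously moving bar would draw `content_position`; one that wants a continuous bar would draw elapsed time against `total_playback_time`.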

Whether to skip to the resynchronization point by adjusting sample playback time lengths may also be made switchable. For example, the playback terminal may operate according to a preconfigured setting, or the user may be asked on the terminal screen, before playback starts, whether to perform this processing.

Skip processing is needed mainly when incomplete data is stored, for example when a track in the MP4 file contains a discontinuous section. Whether to perform skip processing may therefore be decided according to identification information indicating whether the file contains an incomplete track. For this purpose, the MP4 brand stored in ftyp or a similar Box can be used, as can information in a specific Box defined by an operational standard such as 3GPP or SDA (Secure Digital Association). For example, the decision is made according to whether a brand indicating that the MP4 file records broadcast data is present.
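As an illustration of the brand check, the sketch below reads the brands from an ftyp Box located at the start of a byte buffer. The ftyp layout (32-bit size, type, major brand, minor version, then compatible brands) follows the ISO base media file format; the brand `bcst` is a made-up placeholder, since the text does not name the actual broadcast-recording brand. Boxes that use a 64-bit size, or a size of 0, are not handled.

```python
import struct


def read_ftyp_brands(data: bytes):
    """Return (major_brand, compatible_brands) from a leading ftyp Box."""
    if len(data) < 16:
        raise ValueError("buffer too short for an ftyp Box")
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size < 16 or size > len(data):
        raise ValueError("no usable ftyp Box at the start of the buffer")
    major = data[8:12].decode("ascii")
    # Bytes 12..15 hold minor_version; compatible brands follow, 4 bytes each.
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size - 3, 4)]
    return major, compatible


def needs_skip_processing(data: bytes, incomplete_brands) -> bool:
    """True if any brand marks the file as possibly containing gaps."""
    major, compatible = read_ftyp_brands(data)
    return any(b in incomplete_brands for b in [major, *compatible])


# Hand-built 24-byte ftyp Box; "bcst" is a hypothetical brand.
box = struct.pack(">I4s4sI", 24, b"ftyp", b"isom", 0) + b"isom" + b"bcst"
print(read_ftyp_brands(box))
print(needs_skip_processing(box, {"bcst"}))
```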

When a mobile terminal receives a broadcast, reception can deteriorate, for example when the terminal moves into the shadow of a building, and data may not be received for a relatively long time, such as several seconds. The demultiplexing apparatus of this embodiment is particularly effective for playing back MP4 files recorded in such an environment.

(Embodiment 2)
In the demultiplexing apparatus 1000 according to Embodiment 1, after a discontinuous section is skipped, decoding resumes from the next sample in decoding order. For moving images in particular, however, there is no guarantee that decoding can resume from that sample, so the images after the skip may not be decoded correctly. For example, if the sample at which decoding resumes is a P picture that refers to a picture lost to a reception error, the pictures up to the next I picture may not be decoded correctly. The demultiplexing apparatus of the present embodiment resumes decoding after the skip from a randomly accessible sample, which guarantees that the samples from that point on can be decoded correctly and improves playback quality.

The demultiplexing apparatus of this embodiment differs from that of Embodiment 1 in the operations of the correction means 1001 and the sample acquisition means 104. Only the differences from the demultiplexing apparatus of Embodiment 1 are described below.

The correction means 1001 determines the playback time length Dur_out of sample [SplNum], determines the sample [SplNum+j] to be decoded next, and calculates the display time of sample [SplNum+j]. It also notifies the time information analysis means 1005 that the next sample to be decoded is sample [SplNum+j]. The sample acquisition means 104 inputs the sample data splDat1 of the sample determined by the correction means 1001 to the decoding display means 1002, and the decoding display means 1002 decodes and outputs the sample data splDat1 based on the decoding time and display time input from the correction means 1001.

Here, if sample [SplNum+i] and sample [SplNum+j] are the same, the sample decoded after sample [SplNum] is the same as the sample determined by the demultiplexing apparatus of Embodiment 1.

FIG. 15 is a flowchart showing the operation of the demultiplexing apparatus of this embodiment, which differs from the demultiplexing apparatus 1000 in the operations of step 1301 and step 1302. In step 1301, the playback time length of sample [SplNum] is corrected. In step 1302, the sample [SplNum+j] to be decoded after sample [SplNum] is determined, together with the display time of sample [SplNum+j]. The display time of sample [SplNum+j] is obtained by subtracting the following time length from the display time obtained from conventional MP4 header information such as stts and ctts, or trun.

{display time of sample [SplNum+j] - MAX(display time of sample [V_SampleNum] + one frame interval of the moving image, display time of sample [A_SampleNum] + one frame interval of the audio)}
Here, the discontinuous section is assumed to lie immediately after sample [V_SampleNum] for the moving image and immediately after sample [A_SampleNum] for the audio.
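As a worked illustration of the subtraction described above, the following sketch computes the skipped time length and the resulting display time of sample [SplNum+j]. The function and parameter names are assumptions introduced for illustration, and all times are assumed to share one unit (for example, ticks of the track timescale).

```python
def skipped_length(header_time_next, v_last_time, v_frame_interval,
                   a_last_time, a_frame_interval):
    """Time length to subtract from the header display time of sample [SplNum+j].

    v_last_time / a_last_time are the display times of sample [V_SampleNum]
    and sample [A_SampleNum], the last samples before the discontinuous section.
    """
    # Point at which the later of the two tracks runs out of data.
    resume_point = max(v_last_time + v_frame_interval,
                       a_last_time + a_frame_interval)
    return header_time_next - resume_point


def corrected_display_time(header_time_next, v_last_time, v_frame_interval,
                           a_last_time, a_frame_interval):
    """Display time of sample [SplNum+j] after the gap has been removed."""
    return header_time_next - skipped_length(
        header_time_next, v_last_time, v_frame_interval,
        a_last_time, a_frame_interval)
```

With these definitions the corrected display time reduces to the MAX term itself, so playback resumes exactly where the later of the two tracks ends.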

FIG. 16 is a flowchart showing the operation for determining sample [SplNum+j] in step 1302. In step 1401, it is determined whether sample [SplNum] is a sample in a fragment; if so, the process proceeds to step 1402, and otherwise to step 1405. In step 1402, it is determined whether mfra exists in the MP4 file. mfra is a Box that indicates the positions of the randomly accessible samples stored in the fragment part. If mfra exists, the process proceeds to step 1403; if not, to step 1404. In step 1403, among the randomly accessible samples indicated by mfra, the one that comes next after sample [SplNum] in decoding order is chosen as sample [SplNum+j]. mfra can indicate GDR (Gradual Decoder Refresh) type samples in addition to sync samples, but only sync samples may be selected as skip destinations. A GDR type sample is a sample for which a correct decoding result is obtained once a predetermined number of samples have been decoded after decoding starts from that sample; it differs from a sync sample in whether a correct decoding result is obtained from the very sample at which decoding starts. Whether a randomly accessible sample is a sync sample can be determined, for example, by referring to the flag information in trun that indicates whether each sample is a sync sample. In step 1404, the first sample of the immediately following fragment in decoding order is chosen as sample [SplNum+j]. Here, the immediately following fragment means, among the fragments that store samples of the media concerned, the fragment that is next in decoding order. When sample [SplNum] is a sample in moov, the process proceeds to step 1405, where the next sync sample in decoding order is chosen as sample [SplNum+j]. Sync samples in moov are indicated by stss. If moov contains no sync sample after sample [SplNum], the process skips to the first sample of the first fragment.
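The branching of FIG. 16 can be sketched as below. This is an illustrative model only: the `Sample` record, the `mfra` argument (a list of decoding-order indices of randomly accessible samples, or `None` when no mfra Box exists), and the convention that a sample's `index` equals its position in the list are all assumptions made for the example, not structures defined by the patent or by the MP4 format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    index: int                # position in decoding order (equals list position)
    fragment: Optional[int]   # fragment number, or None for samples described by moov
    is_sync: bool = False


def next_decode_sample(samples: List[Sample], cur: int,
                       mfra: Optional[List[int]] = None) -> Optional[int]:
    """Return the index of sample [SplNum+j], or None if no candidate remains."""
    later = [s for s in samples if s.index > cur]
    if samples[cur].fragment is not None:            # step 1401: inside a fragment
        if mfra is not None:                         # step 1402: mfra exists
            # Step 1403: next randomly accessible sample listed in mfra.
            candidates = [i for i in sorted(mfra) if i > cur]
            return candidates[0] if candidates else None
        # Step 1404: first sample of the immediately following fragment.
        for s in later:
            if s.fragment != samples[cur].fragment:
                return s.index
        return None
    # Step 1405: next sync sample within moov (as listed by stss).
    for s in later:
        if s.fragment is None and s.is_sync:
            return s.index
    # No sync sample left in moov: skip to the first sample of the first fragment.
    for s in later:
        if s.fragment is not None:
            return s.index
    return None
```

The sketch treats every mfra entry as a valid skip destination; restricting the candidates to sync samples, as the text allows, would only require filtering `mfra` by `is_sync`.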

Sample [SplNum+j] may also be determined such that, among the audio samples from sample [A_SampleNum] onward, there is always a sample whose decoding time precedes that of the moving-image decoding start sample [SplNum+j]. In this case, in step 1403, among the randomly accessible samples whose display time is equal to or later than that of sample [A_SampleNum+1], the one with the earliest display time is chosen as sample [SplNum+j]. In step 1404, among the first samples of fragments whose display time is equal to or later than that of sample [A_SampleNum+1], the one with the earliest display time is chosen as sample [SplNum+j]. In step 1405, among the sync samples whose display time is equal to or later than that of sample [A_SampleNum+1], the one with the earliest display time is chosen as sample [SplNum+j].

Alternatively, it may be ensured that, among the audio samples from sample [A_SampleNum] onward, there is always a sample whose decoding time is later than that of the moving-image decoding start sample [SplNum+j] and whose decoding time differs from that of sample [SplNum+j] by no more than a predetermined value. This guarantees that audio can also be played near the time at which playback of the moving image resumes.

When a moving image track exists, the decoding start sample of the moving image is determined by the method described above. FIG. 17 is a flowchart showing the operation for determining the decoding start sample of the audio. First, in step 1501, the decoding start sample of the moving image (VSPL_NEXT) is determined. Next, in step 1502, it is determined whether an audio track exists in the MP4 file; if so, the process proceeds to step 1503, and otherwise to step 1504. In step 1503, it is decided that audio decoding starts from the sample whose decoding time is the same as, or immediately after, that of the moving-image decoding start sample VSPL_NEXT, and the process proceeds to step 1504. In step 1504, it is decided that the moving image is decoded and played from VSPL_NEXT. When the decoding start sample of the moving image is a GDR type sample, display may start from the sample at which a correct decoding result is obtained; in that case the decoding start sample and the display start sample differ. Furthermore, the decoding start sample of the audio may be determined according to the sample at which display of the moving image starts.
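The decision flow of FIG. 17 can be sketched as follows, assuming each track is represented by a list of decoding times sorted in ascending order. The function names and the dictionary return value are illustrative choices, not part of the patent.

```python
import bisect


def audio_start_index(audio_dts, video_start_dts):
    """Step 1503: first audio sample whose decoding time is equal to or
    immediately after the decoding time of VSPL_NEXT (None if there is none)."""
    i = bisect.bisect_left(audio_dts, video_start_dts)
    return i if i < len(audio_dts) else None


def decide_start_samples(video_dts, vspl_next, audio_dts=None):
    """vspl_next is the video decoding start sample chosen in step 1501."""
    start = {"video": vspl_next}                     # step 1504
    if audio_dts is not None:                        # step 1502: audio track exists
        start["audio"] = audio_start_index(audio_dts, video_dts[vspl_next])
    return start
```

Because the decoding times are sorted, a binary search is enough to find the audio sample; the GDR variant mentioned in the text would pass the decoding time of the display start sample instead of that of VSPL_NEXT.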

When, among the samples from sample [A_SampleNum] onward, there is a sample whose decoding time precedes that of the moving-image decoding start sample [SplNum+j], decoding may start from the audio sample whose decoding time immediately precedes that of sample [SplNum+j].

For the audio track as well, decoding may start from the immediately following sync sample, independently of the decoding start sample of the moving image track. Since all audio samples are normally sync samples, decoding then starts from the immediately following sample.

FIG. 18 shows an example of determining sample [SplNum+j] for the same case as FIG. 13(b), except that only the first and seventh samples of the moving image are assumed to be sync samples. In FIG. 13(b), decoding of the moving image starts from V5, immediately after the discontinuous section, whereas in this embodiment decoding starts from V7, the first sync sample after the discontinuous section. Audio decoding likewise starts from A15, whose decoding time is equal to that of V7.

The description so far has covered the case in which data is lost due to packet loss or the like and samples consequently cannot be acquired, but the same method can be applied when playing samples from which part of the data is missing. FIG. 21 shows the operation for playing a track that contains samples with partially missing data. FIG. 21(a) shows an example of a stream stored in an MP4 file; the samples enclosed by dotted lines are samples with partially missing data (hereinafter called incomplete samples). In this example, the second through eleventh samples are incomplete. Whether a sample is incomplete may be indicated by including identification information in the sample data in mdat, or by separate information in header information such as moov or moof.

Incomplete samples may not be decoded normally, and a correct decoding result may not be obtained even with error handling such as concealment using the decoding result of the preceding sample, so playback quality degrades noticeably when incomplete samples continue for a long time. For a short time, on the other hand, error handling can be expected to maintain a certain level of playback quality, so it is effective to decide whether to play the samples of a section according to the time length of the section in which incomplete samples continue. FIG. 21(b) is a flowchart showing the playback operation when incomplete samples exist. First, in step 1601, it is determined whether incomplete samples continue for a certain time or longer; if so, the process proceeds to step 1602, and otherwise to step 1603. In step 1602, it is decided to skip playback of the section in which incomplete samples continue; in step 1603, it is decided to play the incomplete samples in the section as well. The playback operation may also be decided based on whether the sections of consecutive incomplete samples in the audio and video tracks overlap. The user may also decide whether incomplete samples are played. Furthermore, the playback operation may be decided on other conditions, for example whether at least a predetermined proportion of the samples within a given interval are incomplete, rather than whether incomplete samples are consecutive.
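The run-length test of FIG. 21(b) can be sketched as below. The representation of a track as a list of `(duration, is_incomplete)` pairs and the threshold parameter are assumptions made for the example; the patent only requires comparing the length of a run of incomplete samples against "a certain time".

```python
def plan_playback(samples, max_run_duration):
    """Decide, per sample, whether it is played (True) or skipped (False).

    samples: list of (duration, is_incomplete) pairs in playback order.
    A run of consecutive incomplete samples is skipped when its total
    duration is max_run_duration or longer (step 1601 -> step 1602);
    shorter runs are played and left to error concealment (step 1603).
    """
    play = [True] * len(samples)
    i = 0
    while i < len(samples):
        if not samples[i][1]:
            i += 1
            continue
        # Measure the run of incomplete samples starting at i.
        j, run = i, 0
        while j < len(samples) and samples[j][1]:
            run += samples[j][0]
            j += 1
        if run >= max_run_duration:
            for k in range(i, j):
                play[k] = False
        i = j
    return play
```

For the stream of FIG. 21(a), in which the second through eleventh samples are incomplete, the whole run is either skipped or played depending on how the threshold compares with the combined duration of those ten samples.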

(Embodiment 3)
A system using the multiplexing format conversion apparatus and the demultiplexing apparatus described in Embodiments 1 and 2 is described here.

FIG. 19 is a block diagram showing the overall configuration of a system that provides content distribution services by broadcasting and communication. The case of receiving broadcast data is described first. A mobile phone ex105, or a disc recorder ex104 such as a DVD recorder, receives a TS packet sequence in which digitally encoded media data is multiplexed. The mobile phone ex105 converts the received TS packet sequence into MP4 and records it on an SD card ex106. The recorded MP4 file can be viewed on the mobile phone ex105, the disc recorder ex104, or a personal computer (not shown) equipped with the demultiplexing apparatus according to the present invention. The MP4 file can also be attached to an e-mail and sent from the mobile phone ex105 via a radio base station ex107 to another mobile phone ex108 equipped with the demultiplexing apparatus according to the present invention, and viewed on the mobile phone ex108. Instead of being attached to e-mail, the file may also be downloaded or pseudo-streamed from the mobile phone ex105 to the mobile phone ex108 using protocols such as HTTP (Hypertext Transfer Protocol) and TCP (Transmission Control Protocol).

The disc recorder ex104 can likewise convert the received TS packet sequence into MP4 and record it on an SD card, an optical disc such as a DVD, or a hard disk. The recorded MP4 file may also be downloaded or pseudo-streamed to a mobile phone or a personal computer (not shown).

When the mobile phone ex105 or the disc recorder ex104 receives a TS packet sequence distributed from a content server ex102 via the Internet, the MP4 file can be used in the same way as when broadcast data is received.

The demultiplexing apparatus according to the present invention is not limited to TS; it can also be applied when data transmitted by a protocol such as RTP (Real-time Transport Protocol), which is used for streaming delivery over the Internet, is recorded as MP4.

(Embodiment 4)
By recording a program that implements the demultiplexing method of the demultiplexing apparatus described in each of the above embodiments on a storage medium such as a flexible disk, the processing described in each embodiment can easily be carried out on an independent computer system.

FIG. 20 is an explanatory diagram for the case in which the demultiplexing method of the demultiplexing apparatus of each of the above embodiments is carried out by a computer system using a program recorded on a recording medium such as a flexible disk.

FIG. 20(b) shows the external appearance of the flexible disk as seen from the front, its cross-sectional structure, and the flexible disk itself, and FIG. 20(a) shows an example of the physical format of the flexible disk, which is the recording medium body. The flexible disk FD is housed in a case F. On the surface of the disk, a plurality of tracks Tr are formed concentrically from the outer periphery toward the inner periphery, and each track is divided into 16 sectors Se in the angular direction. Accordingly, on a flexible disk storing the program, the program is recorded in an area allocated on the flexible disk FD.

FIG. 20(c) shows a configuration for recording and reproducing the program on the flexible disk FD. To record on the flexible disk FD the program that implements the multiplexing format conversion method of the multiplexing format conversion apparatus and the demultiplexing method of the demultiplexing apparatus, the program is written from the computer system Cs via a flexible disk drive. To build the demultiplexing method of the demultiplexing apparatus of each of the above embodiments in the computer system from the program on the flexible disk, the program is read from the flexible disk by the flexible disk drive and transferred to the computer system.

The above description uses a flexible disk as the recording medium, but an optical disc can be used in the same way. The recording medium is not limited to these; anything that can record a program, such as an IC card or a ROM cassette, can be used in the same way.

The demultiplexing apparatus according to the present invention can be applied to devices in general that play MP4 files recording streams received in an environment, such as broadcasting or communication, where data is lost due to packet loss. It is particularly effective in portable terminals and the like that can play content stored on recording media such as SD (Secure Digital) cards.

FIG. 1: Diagram showing the data structure of an AU in MPEG-4 AVC
FIG. 2: Diagram showing the data structures of a PES packet and a TS packet
FIG. 3: Diagram showing the Box structure of MP4
FIG. 4: Diagram showing the hierarchical structure of moov in MP4
FIG. 5: Example of the structure of fragmented MP4
FIG. 6: Diagram explaining an MP4 file that stores a stream from which part of the data is missing
FIG. 7: Block diagram showing the configuration of the conventional demultiplexing apparatus 100
FIG. 8: Flowchart showing the operation of the conventional demultiplexing apparatus 100
FIG. 9: Block diagram showing the overall configuration of the demultiplexing apparatus 1000 according to Embodiment 1 of the present invention
FIG. 10: Flowchart outlining the operation of the demultiplexing apparatus 1000 according to Embodiment 1 of the present invention
FIG. 11: Diagram showing an operation example of the demultiplexing apparatus 1000 according to Embodiment 1 of the present invention
FIG. 12: Diagram showing an operation example of determining whether to correct the playback time length of a sample in the demultiplexing apparatus 1000 according to Embodiment 1 of the present invention
FIG. 13: Diagram showing an operation example of determining the playback time length of a sample in the demultiplexing apparatus 1000 according to Embodiment 1 of the present invention
FIG. 14: Diagram showing an example of a method of notifying the user that playback of a discontinuous section has been skipped
FIG. 15: Flowchart outlining the operation of the demultiplexing apparatus according to Embodiment 2 of the present invention
FIG. 16: Flowchart showing the operation of determining the decoding start sample of the moving image after a discontinuous section in the demultiplexing apparatus according to Embodiment 2 of the present invention
FIG. 17: Flowchart showing the operation of determining the decoding start sample of the audio after a discontinuous section in the demultiplexing apparatus according to Embodiment 2 of the present invention
FIG. 18: Flowchart showing an operation example of the demultiplexing apparatus according to Embodiment 2 of the present invention
FIG. 19: Diagram showing a practical example of the demultiplexing apparatus according to Embodiment 1 and Embodiment 2 of the present invention
FIG. 20: Explanatory diagram of a storage medium for storing a program that implements, on a computer system, the demultiplexing method of the demultiplexing apparatus of each of the above embodiments
FIG. 21: Flowchart showing the operation for playing an MP4 file that contains incomplete samples

Explanation of symbols

101 Header separation means
102 Header memory
103 mdat memory
104 Sample acquisition means
105 Time information analysis means
106 Decoding display means

Claims

A demultiplexing apparatus that separates, decodes, and reproduces each piece of encoded data from data in which encoded data such as moving images and audio is multiplexed into one or more packets, the apparatus comprising:
separation means for separating a header and a payload from the multiplexed packet;
determination means for determining, by analyzing the header, which includes at least information indicating a decoding time or a display time of a frame in each piece of encoded data, whether attribute information for each frame in each piece of encoded data satisfies a predetermined condition; and
correction means for correcting, when the predetermined condition is satisfied, the decoding time, display time, or playback time length of the frame indicated by the header,
wherein the multiplexed moving image or audio encoded data includes a region in which frame data is missing.

The demultiplexing apparatus according to claim 1, wherein
the attribute information is a playback time length DUR1 of the frame acquired from the header,
the determination means determines that a discontinuous section exists between the frame and the frame immediately following it in playback time order when the playback time length DUR1 of the frame exceeds a predetermined value,
the correction means, when the discontinuous section is determined to exist, resets the playback time length of the frame to a time length DUR2 shorter than the playback time length DUR1, and
the discontinuous section indicates that frame data is missing between the two consecutive frames.

The demultiplexing apparatus according to claim 1, wherein
the attribute information indicates whether the discontinuous section exists between two frames that are consecutive in decoding order in each piece of encoded data,
the determination means determines whether the discontinuous section exists based on the attribute information, and
the correction means, when the discontinuous section is determined to exist, resets the playback time length of the earlier of the two consecutive frames in decoding order to a time length DUR2 shorter than the playback time length DUR1.

The demultiplexing apparatus according to claim 2 or 3, wherein
the multiplexed data includes both moving image and audio encoded data,
the determination means acquires the discontinuous sections of the frames of the moving image and of the audio, and determines whether the discontinuous section of the moving image and the discontinuous section of the audio overlap in playback time, and
the correction means, when the discontinuous sections overlap, resets the playback time length of the frame to a time length DUR2 shorter than the playback time length DUR1 for both the moving image and the audio.

The demultiplexing apparatus according to claim 1, further comprising
decision means for determining, when the predetermined condition is satisfied, the frame to be decoded in the moving image such that the frame immediately following the frame in decoding order is a randomly accessible frame.

The demultiplexing apparatus according to claim 1, wherein
the multiplexed data includes both moving image and audio encoded data, and
when the predetermined condition is satisfied, the randomly accessible frame to be decoded next in the audio is the frame whose playback time is the same as, or immediately after, the playback time of the randomly accessible frame to be decoded next in the moving image.

A demultiplexing method for separating, decoding, and reproducing each piece of encoded data from data in which encoded data such as moving images and audio is multiplexed into one or more packets, the method comprising:
a separation step of separating a header and a payload from the multiplexed packet;
a determination step of determining, by analyzing the header, which includes at least information indicating a decoding time or a display time of a frame in each piece of encoded data, whether attribute information for each frame in each piece of encoded data satisfies a predetermined condition; and
a correction step of correcting, when the predetermined condition is satisfied, the decoding time, display time, or playback time length of the frame indicated by the header,
wherein the multiplexed moving image or audio encoded data includes a region in which frame data is missing.

The demultiplexing method according to claim 7, further comprising
a decision step of determining, when the predetermined condition is satisfied, the frame to be decoded in the moving image such that the frame immediately following the frame in decoding order is a randomly accessible frame.

A demultiplexing method for separating, decoding, and reproducing each piece of encoded data from data in which encoded data such as moving images and audio is multiplexed into one or more packets, the method comprising:
a separation step of separating a header and a payload from the multiplexed packet;
a determination step of determining, by analyzing the header, which includes at least information indicating a decoding time or a display time of a frame in each piece of encoded data, whether attribute information for each frame in each piece of encoded data satisfies a predetermined condition; and
a correction step of correcting, when the predetermined condition is satisfied, the decoding time, display time, or playback time length of the frame indicated by the header,
wherein the multiplexed moving image or audio encoded data includes a region in which frame data is missing.