CN100420186C - Method and apparatus for playing stored voice in a network

Info

Publication number: CN100420186C
Application number: CNB2006100035788A
Authority: CN
Inventors: 刘利锋; 于锋; 刘廷永; 郑志彬
Original assignee: Huawei Technologies Co Ltd
Current assignee: Hangzhou Huawei Enterprises Communications Technologies Co Ltd
Priority date: 2006-02-15
Filing date: 2006-02-15
Publication date: 2008-09-17
Anticipated expiration: 2026-02-15
Also published as: CN1852110A

Abstract

The present invention discloses a method for playing stored voice in a network system. In the method, the same voice content is encoded in advance according to different speech coding types, and the resulting encoded voice data are stored on the network side. The network side subsequently interacts with a user client to learn the speech coding types that the client supports, selects the encoded voice data of the voice content that corresponds to a supported speech coding type, and plays that data to the user client. The present invention correspondingly provides a device for playing stored voice in a network system. The invention prevents real-time speech encoding from occupying the processing resources of the network system while stored voice is played, so that the network system can serve more users receiving played voice simultaneously.

Description

Method and device for playing stored voice in a network system

Technical field

The present invention relates to the technical field of voice playback, and in particular to a method and a device for playing stored voice in a network system.

Background technology

At present, the playing of stored voice is very widely used in many IP network environments. Stored-voice playback has been widely deployed in popular network applications such as IP-based voice communication systems (VoIP, Voice over IP), computer telephony integration (CTI, Computer Telephony Integration), IP call centers and Internet radio, so that each of these applications can play pre-stored voice files to network clients through the network system.

Most voice files are generally stored in advance on a network server in formats such as MP3, WAV or WMA, and the network system usually transmits the audio media stream using the Real-time Transport Protocol (RTP). To play stored voice efficiently in the network system, the pre-stored voice file must therefore first be speech-encoded, and the encoded speech data is then sent to the network client as the RTP payload. Commonly used speech coding schemes include G.711, G.723, G.728 and GSM; different schemes support different transmission rates and voice quality.

In the prior art, stored voice is usually played in a network system as follows:

The voice file is first stored in advance in the network system in the commonly used WAV wave-file format. When voice is to be played, the network system side selects a speech coding scheme that the receiving client supports, reads out the pre-stored WAV voice file and encodes it in real time according to the selected scheme, and finally encapsulates the encoded speech data in RTP format for transmission, thereby playing the pre-stored voice to the network client.

However, this way of playing stored voice in a network system usually has the following defect:

Whenever voice is to be played, the pre-stored voice file must be speech-encoded in real time, and in practical deployments real-time speech encoding consumes considerable processing resources of the network system. In Internet radio or IP call center applications in particular, a centralized server often has to provide a voice playing service to many network clients at once; the speech coding schemes supported by different clients may differ, and their playback positions are not synchronized with one another. Performing speech encoding in real time during playback therefore occupies a large amount of processing resources, which greatly reduces the number of users that the network system can guarantee to receive voice simultaneously.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and a device for playing stored voice in a network system that prevent real-time speech encoding from occupying the processing resources of the network system during the playing of stored voice, so that the network system can guarantee that more users receive the played voice simultaneously.

To solve the above problem, the technical solution proposed by the present invention is as follows:

A method for playing stored voice in a network system comprises the steps of:

encoding the same voice content into corresponding encoded voice data according to each of several different speech coding types;

storing the encoded voice data so obtained in advance in a file of a predefined format on the network side;

subsequently learning, by the network side, the speech coding type supported by a user client through interaction with the user client; and

selecting, in the predefined-format file and according to the speech coding type supported by the user client, the stored encoded voice data of the corresponding voice content, and playing it to the user client.

Preferably, the predefined format is the Resource Interchange File Format.

Preferably, the network side conducts session interaction with the user client through Session Initiation Protocol signaling, H.323 signaling or custom signaling, and thereby learns the speech coding type supported by the user client.

Preferably, during interaction with the user client, the network side learns the speech coding type supported by the user client by parsing the data packets sent by the user client.

Preferably, selecting, in the predefined-format file and according to the speech coding type supported by the user client, the stored encoded voice data of the corresponding voice content specifically comprises:

storing in advance on the network side a correspondence between each speech coding type and the address offset at which the encoded voice data of that speech coding type can be found;

looking up, by the network side, the corresponding address offset in the correspondence according to the speech coding type supported by the user client; and

locating, in the predefined-format file and according to the address offset so obtained, the stored encoded voice data of the speech coding type supported by the user client.

Preferably, the network side plays the encoded voice data to the user client by encapsulating it in RTP packets and sending them to the user client.

The speech coding type comprises at least one of the following speech coding types:

G.711 speech coding type;

G.723 speech coding type;

G.728 speech coding type;

GSM speech coding type.

A device for playing stored voice in a network system comprises:

a storage unit, configured to store in advance, in a file of a predefined format on the network side, the encoded voice data obtained by encoding the same voice content according to each of several different speech coding types;

an acquiring unit, configured to learn, on the network side, the speech coding type supported by a user client through interaction with the user client;

a selection unit, configured to select, from the encoded voice data stored by the storage unit and according to the speech coding type learned by the acquiring unit, the encoded voice data of the corresponding voice content; and

a playing unit, configured to play the encoded voice data selected by the selection unit to the user client.

Preferably, the selection unit specifically comprises:

a storage subunit, configured to store in advance a correspondence between each speech coding type and the address offset at which the encoded voice data of that speech coding type can be found;

an indexing subunit, configured to look up the corresponding address offset in the correspondence stored by the storage subunit according to the speech coding type learned by the acquiring unit; and

a lookup subunit, configured to locate, among the encoded voice data stored by the storage unit and according to the address offset obtained by the indexing subunit, the encoded voice data of the speech coding type supported by the user client.

Preferably, the playing unit specifically comprises:

an encapsulation subunit, configured to encapsulate the encoded voice data selected by the selection unit into RTP packets; and

a sending subunit, configured to send the RTP packets produced by the encapsulation subunit to the user client, thereby playing the encoded voice data to the user client.

The beneficial effects of the present invention are as follows:

By storing in advance on the network system side the different encodings of the same voice content, the technical solution of the present invention allows the network side subsequently to learn the speech coding type supported by the user client through interaction with it, locate the matching encoded voice data among the pre-stored data, and encapsulate that data directly into RTP payload packets for playing to the user client, without any real-time speech transcoding. The processing resources that real-time speech encoding would occupy are thus saved, so that the network system can guarantee that more users receive the played voice simultaneously, which increases user capacity.

Description of drawings

Fig. 1 is a flow chart of the main implementation principle of the method of the present invention for playing stored voice in a network system;

Fig. 2 is a processing flow chart of an embodiment in which the method of the present invention is implemented in an IP call center;

Fig. 3 is a block diagram of the main structure of the device of the present invention for playing stored voice in a network system;

Fig. 4 is a block diagram of the specific structure of the selection unit in the device of the present invention;

Fig. 5 is a block diagram of the specific structure of the playing unit in the device of the present invention.

Embodiment

The main purpose of the technical solution of the present invention is to avoid real-time speech encoding while a pre-stored voice file is played in a network system, so that real-time encoding does not occupy excessive processing resources and the number of users who can be guaranteed to receive the voice played by the network system increases.

To achieve this, the main design concept is first to define a storage format for voice files, then to convert all voice files to be played into this format, and subsequently to generate, quickly and on the basis of this format, the voice packets that are played over the network to the user's receiving terminal. The technical solution is applicable to any field in which stored voice files are converted into RTP packets for transmission.

The main implementation principle, the specific implementation and the corresponding beneficial effects of the technical solution are explained in detail below with reference to the drawings.

Referring to Fig. 1, which is a flow chart of the main implementation principle of the method of the present invention for playing stored voice in a network system, the main implementation process is as follows:

Step S10: the same voice content is first encoded into corresponding encoded voice data according to each of several different speech coding types, and the results are stored in advance on the network side. The encoded voice data of the different speech coding types may be placed in a file based on the Resource Interchange File Format (RIFF) framework and stored on the network side in advance.

That is, the RIFF file format is first extended to form the new voice file storage format defined by the present solution, and any number of encoded voice data that represent the same voice content but differ in speech coding format are placed in the extended RIFF file.

A RIFF file may consist of multiple chunks, and each basic chunk is defined as follows:

typedef unsigned long DWORD;
typedef unsigned char BYTE;
typedef DWORD FOURCC;    // Four-character code
typedef FOURCC CKID;     // Four-character-code chunk identifier
typedef DWORD CKSIZE;    // 32-bit unsigned size
typedef struct {         // Chunk structure
    CKID ckID;           // Chunk type identifier
    CKSIZE ckSize;       // Chunk size field (size of ckData)
    BYTE ckData[ckSize]; // Chunk data
} CK;
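The chunk layout above can be illustrated with a small reader. The following is a minimal sketch and not part of the disclosed solution: the names `ChunkHeader` and `read_chunk` are hypothetical, fixed-width types stand in for DWORD and BYTE, and the sketch relies on two properties of the general RIFF format, namely that ckSize is stored in little-endian byte order and that ckData is padded to an even number of bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parsed view of one chunk header; the names are illustrative only. */
typedef struct {
    char     id[5];   /* four-character code plus terminating NUL */
    uint32_t size;    /* ckSize: number of bytes in ckData */
} ChunkHeader;

/* Read the 8-byte chunk header at buf[pos]. Returns the offset of the next
 * chunk, or 0 if the header or its data would run past the buffer. */
static size_t read_chunk(const uint8_t *buf, size_t len, size_t pos,
                         ChunkHeader *out)
{
    if (pos > len || len - pos < 8) {
        return 0;
    }
    memcpy(out->id, buf + pos, 4);
    out->id[4] = '\0';
    /* RIFF stores sizes in little-endian byte order. */
    out->size = (uint32_t)buf[pos + 4]
              | ((uint32_t)buf[pos + 5] << 8)
              | ((uint32_t)buf[pos + 6] << 16)
              | ((uint32_t)buf[pos + 7] << 24);
    if (out->size > len - pos - 8) {
        return 0;
    }
    /* Chunk data is padded to an even number of bytes. */
    size_t padded = (size_t)out->size + (out->size & 1u);
    return pos + 8 + padded;
}
```

Calling `read_chunk` repeatedly, each time passing the previously returned offset, walks the chunks of a file held in memory one after another.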

A common RIFF voice file is defined as follows (taking the WAVE voice file as an example):

<WAVE-form> ->
RIFF('WAVE'
    <fmt-ck>              // Format
    [<fact-ck>]           // Fact chunk
    [<cue-ck>]            // Cue points
    [<playlist-ck>]       // Playlist
    [<assoc-data-list>]   // Associated data list
    <wave-data> )         // Wave data

Here a new type of chunk is defined in the RIFF file to describe the list of the various encoded voice data based on the same voice content that the present invention proposes. Its ckID equals "mct", and its ckData is represented by the mctArray[n] described below:

typedef unsigned long DWORD;
typedef unsigned char BYTE;
typedef struct {            // Speech coding format entry
    BYTE mctType;           // Speech coding type
    BYTE mctMinLenth;       // Minimum frame length
    BYTE mctReserved[2];    // Reserved
    DWORD mctLocation;      // Relative offset of the encoded voice data
} MCT;

mctType: the speech coding type, consistent with the payload type (PT, Payload Type) value of RTP;

mctMinLenth: the minimum frame length of this speech coding type;

mctLocation: the offset of each encoded voice data relative to the first encoded voice data; the offset of the first encoded voice data is 0;

mctReserved: reserved for subsequent extension.

MCT mctArray[n];

n: the number of speech coding types that this chunk supports.
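How the mctArray table maps a speech coding type to an offset can be sketched in a few lines of C. This is an illustration under stated assumptions rather than the disclosed implementation: the helper name `mct_find` is hypothetical, fixed-width types replace DWORD and BYTE, and mctMinLenth is assumed to be expressed in bytes, which the text does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the MCT entry described above, using fixed-width types. */
typedef struct {
    uint8_t  mctType;        /* speech coding type (RTP payload type value) */
    uint8_t  mctMinLenth;    /* minimum frame length (bytes assumed) */
    uint8_t  mctReserved[2]; /* reserved */
    uint32_t mctLocation;    /* offset relative to the first encoded data */
} MCT;

/* Hypothetical helper: return the entry whose mctType matches the
 * negotiated payload type, or NULL when the file holds no such encoding. */
static const MCT *mct_find(const MCT *table, size_t n, uint8_t payload_type)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].mctType == payload_type) {
            return &table[i];
        }
    }
    return NULL;
}
```

A linear scan is sufficient here because n is the number of speech coding types held in one file, which is expected to be small.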

The multi-encoding storage format proposed by the present invention, which holds the data of several speech coding types based on the same voice content, is specifically as follows:

<WAVE-form> -> RIFF('WAVE'
    <mct-ck>
    <fmt-ck>       // type 1 Format
    <wave-data>    // type 1 Wave data
    <fmt-ck>       // type 2 Format
    <wave-data>    // type 2 Wave data
    <fmt-ck>       // type n Format
    <wave-data>    // type n Wave data
)

As the extended RIFF file structure above shows, <mct-ck> holds all n MCT structures. Each following pair of <fmt-ck> and <wave-data> represents the encoded voice data of one speech coding type; these two chunks have the same meaning as defined for the wave file, and the RIFF file may contain multiple <fmt-ck> and <wave-data> pairs. With this extended format, any number of encoded voice data based on the same voice content can therefore be kept in a single RIFF file.
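Reading the n MCT structures back out of the mct chunk might look like the following sketch. The on-disk layout is an assumption, since the text gives only the C structure: each entry is taken to occupy 8 packed bytes in the field order shown earlier, with mctLocation in little-endian byte order as is usual for RIFF. The names `mct_decode` and `MCT_ENTRY_LEN` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  mctType;        /* speech coding type */
    uint8_t  mctMinLenth;    /* minimum frame length */
    uint8_t  mctReserved[2]; /* reserved */
    uint32_t mctLocation;    /* relative offset of the encoded voice data */
} MCT;

#define MCT_ENTRY_LEN 8u     /* assumed packed size of one entry on disk */

/* Decode the ckData of an mct chunk into at most max entries.
 * Returns the number of entries decoded. */
static size_t mct_decode(const uint8_t *ck_data, uint32_t ck_size,
                         MCT *out, size_t max)
{
    size_t n = ck_size / MCT_ENTRY_LEN;   /* whole entries only */
    if (n > max) {
        n = max;
    }
    for (size_t i = 0; i < n; i++) {
        const uint8_t *e = ck_data + i * MCT_ENTRY_LEN;
        out[i].mctType        = e[0];
        out[i].mctMinLenth    = e[1];
        out[i].mctReserved[0] = e[2];
        out[i].mctReserved[1] = e[3];
        out[i].mctLocation    = (uint32_t)e[4]
                              | ((uint32_t)e[5] << 8)
                              | ((uint32_t)e[6] << 16)
                              | ((uint32_t)e[7] << 24);
    }
    return n;
}
```

Decoding field by field, instead of casting the buffer to an MCT pointer, keeps the sketch independent of the host's byte order and structure padding.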

Step S20: the network side subsequently learns the speech coding type supported by the user client through interaction with the user client. The network side may conduct session interaction with the user client through Session Initiation Protocol (SIP) signaling, H.323 signaling or custom signaling in order to learn the supported speech coding type; alternatively, during interaction with the user client, it may learn the supported speech coding type by parsing the data packets sent by the user client.
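As one possible illustration of step S20, the sketch below extracts the offered RTP payload types from an SDP media line. SDP is not named in the text; it is assumed here because SIP sessions commonly carry their media description in SDP, where a line such as "m=audio 49170 RTP/AVP 0 8 4" lists the payload types the client offers. The function name `sdp_audio_payload_types` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Collect the RTP payload types listed on an SDP audio media line such as
 * "m=audio 49170 RTP/AVP 0 8 4". Returns the number stored in out. */
static size_t sdp_audio_payload_types(const char *line, uint8_t *out,
                                      size_t max)
{
    static const char prefix[] = "m=audio ";
    static const char proto[]  = " RTP/AVP ";

    if (strncmp(line, prefix, sizeof prefix - 1) != 0) {
        return 0;                       /* not an audio media line */
    }
    const char *p = strstr(line, proto);
    if (p == NULL) {
        return 0;                       /* not plain RTP/AVP */
    }
    p += sizeof proto - 1;

    size_t count = 0;
    while (count < max) {
        char *end;
        long v = strtol(p, &end, 10);   /* skips leading spaces itself */
        if (end == p || v < 0 || v > 127) {
            break;                      /* no more numbers, or not a valid PT */
        }
        out[count++] = (uint8_t)v;
        p = end;
    }
    return count;
}
```

Because mctType is stated to match the RTP payload type value, each value collected this way can be compared directly with the mctType field of the stored table.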

Step S30: the network side selects the encoded voice data of the corresponding voice content according to the speech coding type that it has learned the user client supports. The selection proceeds as follows:

As the extended RIFF file structure above shows, a correspondence between each speech coding type and the address offset at which the encoded voice data of that type can be found is stored in advance on the network side. The network side can therefore look up the corresponding address offset in this correspondence according to the speech coding type supported by the user client, and then use that offset to locate, in the stored data, the encoded voice data of the speech coding type that the user client supports.

The network side then plays the selected encoded voice data to the user client. It may do so by encapsulating the selected encoded voice data in RTP packets and sending them to the user client.
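The encapsulation step can be sketched as follows. The header layout is the fixed 12-byte RTP header (version 2, then payload type, sequence number, timestamp and SSRC in network byte order); the function name `rtp_pack` and the choice to leave the marker bit, padding, extension and CSRC list unused are assumptions of this sketch, not details taken from the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_LEN 12

/* Write a minimal RTP header (version 2, no padding, no extension, no CSRC,
 * marker bit clear) followed by the payload. Returns the packet length,
 * or 0 if the output buffer is too small. */
static size_t rtp_pack(uint8_t *out, size_t out_len, uint8_t payload_type,
                       uint16_t seq, uint32_t timestamp, uint32_t ssrc,
                       const uint8_t *payload, size_t payload_len)
{
    if (out_len < RTP_HEADER_LEN || payload_len > out_len - RTP_HEADER_LEN) {
        return 0;
    }
    out[0]  = 0x80;                          /* V=2, P=0, X=0, CC=0 */
    out[1]  = (uint8_t)(payload_type & 0x7F);/* M=0, 7-bit payload type */
    out[2]  = (uint8_t)(seq >> 8);           /* sequence number, big-endian */
    out[3]  = (uint8_t)(seq & 0xFF);
    out[4]  = (uint8_t)(timestamp >> 24);    /* timestamp, big-endian */
    out[5]  = (uint8_t)(timestamp >> 16);
    out[6]  = (uint8_t)(timestamp >> 8);
    out[7]  = (uint8_t)(timestamp);
    out[8]  = (uint8_t)(ssrc >> 24);         /* SSRC, big-endian */
    out[9]  = (uint8_t)(ssrc >> 16);
    out[10] = (uint8_t)(ssrc >> 8);
    out[11] = (uint8_t)(ssrc);
    memcpy(out + RTP_HEADER_LEN, payload, payload_len);
    return RTP_HEADER_LEN + payload_len;
}
```

Since the stored data is already encoded, the payload bytes are copied unchanged; the only per-packet work is filling in the header, which is the saving the solution aims for.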

The speech coding types mentioned in the above method of the present invention include at least one of the G.711 speech coding type, the G.723 speech coding type, the G.728 speech coding type, the GSM speech coding type and the like.

The main implementation of the above method is described in detail below, taking the playing of a stored voice file over the network of an IP call center as an example. Referring to Fig. 2, which is a processing flow chart of an embodiment in which the method of the present invention is implemented in an IP call center: in the IP call center network, each voice file to be played is encoded in advance according to the different speech coding types to produce different encoded voice data, and the resulting encoded voice data are saved on the network side in the RIFF file storage format proposed above. The complete voice playing flow is introduced below through this IP call center example.

Suppose the IP call center comprises two main parts, an IP phone terminal and an IP call center server, both of which support SIP. The voice prompt files of the IP call center (that is, the voice files to be played over the network) are kept on the server. Under these conditions, the specific implementation is as follows:

Step S100: when the IP call center is to play a voice file to a user client, it opens the corresponding RIFF voice file and navigates to the mct-ck field.

Step S110: the IP call center server conducts session negotiation with the IP phone terminal through SIP signaling and obtains the parameters required for the session between the IP phone terminal and the server, such as the media channel address and the speech coding type required for voice transmission. According to the negotiation result, it then finds, in the mct-ck field located above, the MCT structure whose speech coding type matches that of the IP phone terminal.

Step S120: according to the mctLocation field value in the MCT structure so found, the IP call center server navigates to the fmt-ck field and the wave-data field that correspond to the speech coding type supported by the IP phone terminal.

Step S130: from the wave-data field so located, the IP call center server reads data whose length is an integer multiple of the mctMinLenth field value and encapsulates it in an RTP payload packet.

Step S140: the server judges whether all data in the located wave-data field has been read. If so, it closes the opened RIFF voice file, and the IP call center server transmits the generated RTP payload packets to the IP phone terminal, thereby playing to the IP phone terminal the encoded voice data of the speech coding type that the terminal supports. Otherwise, it returns to step S130.
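The read loop of steps S130 and S140 can be sketched as below. The sketch assumes that mctMinLenth is a length in bytes and that a fixed number of frames is carried per packet; neither is stated in the text, and the names `next_payload_len` and `count_packets` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for step S130: size of the next RTP payload, chosen as
 * a whole number of minimum-length frames and never more than what remains. */
static size_t next_payload_len(size_t remaining, uint8_t min_frame_len,
                               size_t frames_per_packet)
{
    if (min_frame_len == 0 || frames_per_packet == 0) {
        return 0;
    }
    size_t want = (size_t)min_frame_len * frames_per_packet;
    if (want > remaining) {
        /* Round down to the largest whole number of frames still available. */
        want = (remaining / min_frame_len) * min_frame_len;
    }
    return want;
}

/* Walk a wave-data region of data_len bytes and count the packets produced,
 * mirroring the S130/S140 loop (read, encapsulate, check for end of data). */
static size_t count_packets(size_t data_len, uint8_t min_frame_len,
                            size_t frames_per_packet)
{
    size_t packets = 0;
    size_t pos = 0;
    while (pos < data_len) {
        size_t n = next_payload_len(data_len - pos, min_frame_len,
                                    frames_per_packet);
        if (n == 0) {
            break;  /* only a partial frame is left: nothing more to send */
        }
        pos += n;
        packets++;
    }
    return packets;
}
```

For example, with 33-byte frames and four frames per packet, a 330-byte region yields two full packets of 132 bytes and a final packet of 66 bytes.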

As the above implementation shows, the technical solution of the present invention performs no complex real-time speech encoding. While a voice file is played over the network, it only needs to find the corresponding encoded voice data in the extended RIFF file according to the speech coding type supported by the user client and encapsulate that data directly in RTP payload packets for playing, which greatly saves the processing resources of the network system.

In summary, the technical solution of the present invention stores the different encodings of the same voice content in the extended RIFF file. During network playback, the corresponding encoded voice data can then be read from the RIFF file according to the speech coding type supported by the user client and encapsulated directly in RTP payload packets for playing to the user client, without real-time speech transcoding. This greatly saves the processing resources of the playback server of the network system and increases user capacity. In addition, like other storage formats, the voice file storage format proposed here can be played by common voice players, such as the MediaPlayer voice player.

Based on the design principle of the method proposed above for playing stored voice in a network system, the present invention further proposes a device for playing stored voice in a network system. Referring to Fig. 3, which is a block diagram of the main structure of this device, the device mainly comprises a storage unit 10, an acquiring unit 20, a selection unit 30 and a playing unit 40, whose main functions are as follows:

The storage unit 10 is mainly configured to store in advance, in a file of a predefined format on the network side, the encoded voice data obtained by encoding the same voice content according to each of several different speech coding types;

The acquiring unit 20 is mainly configured to learn, on the network side, the speech coding type supported by the user client through interaction with the user client;

The selection unit 30 is mainly configured to select, from the encoded voice data stored by the storage unit 10 and according to the speech coding type learned by the acquiring unit 20, the encoded voice data of the corresponding voice content;

The playing unit 40 is mainly configured to play the encoded voice data selected by the selection unit 30 to the user client.

Referring to Fig. 4, which is a block diagram of the specific structure of the selection unit in the device of the present invention, the selection unit 30 mainly comprises a storage subunit 301, an indexing subunit 302 and a lookup subunit 303, whose main functions are as follows:

The storage subunit 301 is configured to store in advance a correspondence between each speech coding type and the address offset at which the encoded voice data of that speech coding type can be found;

The indexing subunit 302 is configured to look up the corresponding address offset in the correspondence stored by the storage subunit 301 according to the speech coding type learned by the acquiring unit 20;

The lookup subunit 303 is configured to locate, among the encoded voice data stored by the storage unit 10 and according to the address offset obtained by the indexing subunit 302, the encoded voice data of the speech coding type supported by the user client. The playing unit 40 then encapsulates the encoded voice data found by the lookup subunit 303 in RTP payload packets and plays it to the user client.

Referring to Fig. 5, which is a block diagram of the specific structure of the playing unit in the device of the present invention, the playing unit 40 specifically comprises an encapsulation subunit 401 and a sending subunit 402, whose functions are as follows:

The encapsulation subunit 401 is configured to encapsulate the encoded voice data selected by the selection unit 30 in RTP payload packets;

The sending subunit 402 is configured to send the RTP payload packets produced by the encapsulation subunit 401 to the user client, thereby playing the encoded voice data selected by the selection unit 30 to the user client.

The other technical implementation details of the device proposed here for playing stored voice in a network system are the same as or similar to those of the method proposed above. Reference is made to the description of those details in the method, and they are not repeated here.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims

1. A method for playing stored voice in a network system, characterized by comprising the steps of:

2. The method of claim 1, characterized in that the predefined format is the Resource Interchange File Format.

3. The method of claim 1, characterized in that the network side conducts session interaction with the user client through Session Initiation Protocol signaling, H.323 signaling or custom signaling, and thereby learns the speech coding type supported by the user client.

4. The method of claim 1, characterized in that, during interaction with the user client, the network side learns the speech coding type supported by the user client by parsing the data packets sent by the user client.

5. The method of claim 1, characterized in that selecting, in the predefined-format file and according to the speech coding type supported by the user client, the stored encoded voice data of the corresponding voice content specifically comprises:

6. The method of claim 1, characterized in that the network side plays the encoded voice data to the user client by encapsulating it in RTP packets and sending them to the user client.

7. The method of any one of claims 1 to 6, characterized in that the speech coding type comprises at least one of the following speech coding types:

G.711 speech coding type;

G.723 speech coding type;

G.728 speech coding type;

GSM speech coding type.

8. A device for playing stored voice in a network system, characterized by comprising:

9. The device of claim 8, characterized in that the selection unit specifically comprises:

10. The device of claim 8, characterized in that the playing unit specifically comprises: