WO2007037641A1

WO2007037641A1 - Optional encoding system and method for operating the system

Info

Publication number: WO2007037641A1
Application number: PCT/KR2006/003903
Authority: WO
Inventors: Yun Ho Jeon
Original assignee: Realnetworks Asia Pacific Co., Ltd.
Priority date: 2005-09-30
Filing date: 2006-09-28
Publication date: 2007-04-05
Also published as: CN101273405A; KR100757858B1; CN101273405B; KR20070036870A

Abstract

A variable encoding system and a method for operating the system are provided. The method includes receiving audio data from a predetermined server, encoding the audio data via a predetermined encoder, and providing a user terminal with the encoded audio data. By encoding audio data with a variable encoding scheme based on the characteristics of the data and transmitting it to a second user terminal via a wireless communication network, the system and method can increase the usage efficiency of the memory device of a mobile terminal that records audio data and reduce the load on the wireless communication network.

Description

OPTIONAL ENCODING SYSTEM AND METHOD FOR OPERATING THE SYSTEM

Technical Field

The present invention relates to a method and system for receiving audio data from a predetermined server, encoding the audio data via a predetermined encoder, and providing a user terminal with the encoded audio data. The encoder may be set variably based on a characteristic of the audio data; when the audio data contains more voice data than a predetermined ratio, the encoder may be a vocoder such as Qualcomm Code Excited Linear Prediction (QCELP), Enhanced Variable Rate Codec (EVRC), Adaptive Multi-Rate (AMR), or the like.

Background Art

With the development of the Internet, mobile terminals that store audio content and play it back on demand have come into wide use. For example, to download audio content to a mobile terminal and use it, as in a podcasting service, the audio content must first be downloaded to a computer terminal. The audio content downloaded to the computer terminal is then transmitted to a mobile terminal, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player or a mobile phone, in a form encoded with an audio compression technology such as MP3 or advanced audio coding (AAC). The mobile terminal then plays the compressed audio content by decoding it. The computer terminal may also download audio content, such as news broadcasts, from a server that provides the audio data in a predetermined cycle, encode the audio content, and provide it to the mobile terminal.

The mobile terminal includes a memory device in which the audio content may be recorded. However, mobile terminals currently in wide use generally have a memory capacity of only tens or hundreds of megabytes (MB), which may be insufficient for audio content encoded at a high bit rate. In practice, therefore, a technology that compresses or encodes the audio content recorded in the memory device as much as possible is required.

In particular, audio data received from a predetermined server has already been encoded by a particular method. A technology is therefore required that transcodes such audio data according to its characteristics before transmitting it to the mobile terminal, thereby increasing the memory efficiency of the mobile terminal and reducing the load on the transmission channel.

For example, audio data such as music is generally compressed at a bit rate greater than 128 Kbps, and even voice-centered content, for which sound quality is generally not a concern, requires a bit rate of at least 32 Kbps with general-purpose audio codecs. A vocoder optimized for human voice, such as Enhanced Variable Rate Codec (EVRC), can instead compress voice to a bit rate as low as 8 Kbps.
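The bit rates above translate directly into storage on the mobile terminal. As a rough illustration, assuming constant-bit-rate audio and ignoring container overhead, the helper functions below (hypothetical names) compute the size of one hour of audio at each rate and how many hours fit in an assumed 128 MB memory, a value chosen from within the "tens or hundreds of megabytes" range mentioned earlier.

```python
def encoded_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size in megabytes (10**6 bytes) of constant-bit-rate audio.

    Container and metadata overhead are ignored.
    """
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


def hours_that_fit(capacity_mb: float, bitrate_kbps: float) -> float:
    """Hours of constant-bit-rate audio that fit in capacity_mb megabytes."""
    return capacity_mb / encoded_size_mb(bitrate_kbps, 3600)


# 128 MB is an assumed capacity within the "tens or hundreds of MB" range.
for rate in (128, 32, 8):
    print(f"{rate:>3} kbps: {encoded_size_mb(rate, 3600):.1f} MB per hour, "
          f"{hours_that_fit(128, rate):.1f} h in 128 MB")
```

One hour takes about 57.6 MB at 128 Kbps, 14.4 MB at 32 Kbps, and 3.6 MB at 8 Kbps, so the same memory holds 16 times as much speech at the vocoder rate as at the music rate.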

However, in conventional rich site summary (RSS) or podcasting services, sound sources are generally provided uniformly in an encoding method such as MP3 or AAC, regardless of whether they contain voice data or music data. As a result, the mobile terminal often stores voice data compressed at a higher bit rate than necessary, and its memory is used inefficiently.

Disclosure of Invention

Technical Goals

An aspect of the present invention provides a method and system for increasing usage efficiency of a memory device of a mobile terminal.

An aspect of the present invention also provides a method and system for reducing the load on a wireless communication network by encoding audio data with a variable encoding system based on the characteristics of the data and transmitting the encoded audio data to a second user terminal via the wireless communication network.

Technical Solutions

According to an aspect of the present invention, there is provided a method of variably encoding audio data including: receiving the audio data from a predetermined server; determining whether voice data is contained in the audio data by analyzing a data format of the audio data; generating second audio data by encoding only a portion corresponding to the voice data among the audio data via a predetermined vocoder when the voice data is contained in the audio data, the second audio data comprising conversion information about the vocoder and the encoding; and transmitting the generated second audio data to a second user terminal, wherein the second user terminal decodes the second audio data based on the conversion information.

According to another aspect of the present invention, there is provided a system for variably encoding audio data including: a receiver receiving the audio data from a predetermined server; a converter determining whether voice data is contained in the audio data by analyzing a data format of the audio data, and generating second audio data by encoding only a portion corresponding to the voice data among the audio data via a predetermined vocoder when the voice data is contained in the audio data, the second audio data comprising conversion information about the vocoder and the encoding; and a transmitter transmitting the generated second audio data to a second user terminal, wherein the second user terminal decodes the second audio data based on conversion information.

Brief Description of Drawings

FIG. 1 is a diagram illustrating a network including a variable encoding system, a server and a second user terminal according to the present invention;

FIG. 2 is a flowchart illustrating an operation based on a method of variably encoding audio data according to the present invention;

FIG. 3 and FIG. 4 are diagrams illustrating examples of networks including a variable encoding system, a server and a second user terminal according to the present invention;

FIG. 5 is a diagram illustrating data formats of audio data and second audio data according to an exemplary embodiment of the present invention;

FIG. 6 is a block diagram illustrating an internal configuration of a variable encoding system according to an exemplary embodiment of the present invention; and

FIG. 7 is an internal block diagram of a general-purpose computer apparatus which can be adopted in implementing a variable encoding method according to the present invention.

Best Mode for Carrying Out the Invention

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.

FIG. 1 is a diagram illustrating a network including a variable encoding system, a web server, and a second user terminal according to the present invention.

Referring to FIG. 1, a variable encoding system 100 according to the present invention receives predetermined audio data from a web server 110. The web server 110 according to an embodiment of the present invention provides a podcasting service or a rich site summary (RSS) service, so the variable encoding system 100 receives the audio data from the web server 110 in a predetermined cycle. The audio data may include music data, voice data, or broadcasting data. After receiving the audio data, the variable encoding system 100 analyzes it and identifies whether it contains voice data. Conventional techniques may be used for this identification. For example, to identify whether audio data consists mainly of human voice, a method that checks whether the sound is interrupted (that is, falls silent) at a ratio greater than a predetermined ratio can be used. Whether the audio data contains voice data may also be determined by checking whether a predetermined pitch is detected in the audio data, or by examining the frequency content of the audio data to see whether its energy is concentrated in a predetermined band. A current mobile communication terminal controls its transmission band in real time via functions such as a voice activity detector (VAD), discontinuous transmission (DTX), or a variable rate codec (VRC). Unlike such a terminal, which must decide in real time whether voice data is present, the variable encoding system 100 according to the present invention has more time available for analysis and can therefore make this determination in comparatively greater detail.
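The first heuristic above, checking how often the sound cuts out, can be sketched as follows. This is a minimal illustration, not the detector of the invention: it assumes mono PCM samples normalized to [-1.0, 1.0] at 8 kHz, and the frame length (160 samples, i.e. 20 ms), silence threshold, and minimum silence ratio are hypothetical values. Speech tends to contain frequent short pauses between words, whereas most music is continuous. The pitch and frequency-band checks mentioned above are not shown.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS level of each complete, non-overlapping frame; a partial tail is dropped."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def silence_ratio(samples: Sequence[float], frame_len: int = 160,
                  silence_rms: float = 0.01) -> float:
    """Fraction of frames whose RMS level is below silence_rms."""
    levels = frame_rms(samples, frame_len)
    if not levels:
        return 0.0
    return sum(1 for r in levels if r < silence_rms) / len(levels)


def looks_like_voice(samples: Sequence[float], min_silence_ratio: float = 0.2) -> bool:
    """Hypothetical rule: classify as voice if the sound cuts out often enough."""
    return silence_ratio(samples) >= min_silence_ratio
```

A steady tone never falls silent and is classified as non-voice, while the same tone broken into bursts separated by gaps crosses the assumed threshold. A real detector would combine several such cues rather than rely on one.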

The variable encoding system 100 that receives the audio data from the server 110 determines whether the audio data contains voice data and, if it does, encodes the voice data via a predetermined vocoder. The variable encoding system 100 according to an embodiment of the present invention may use a vocoder such as Qualcomm Code Excited Linear Prediction (QCELP), Enhanced Variable Rate Codec (EVRC), Adaptive Multi-Rate (AMR), and the like. Encoding the audio data via the vocoder produces the second audio data. When EVRC is used for audio data including voice data, the second audio data may be encoded at a bit rate of about 8 Kbps. When the audio data contains music data or song data but no voice data, the variable encoding system 100 does not re-encode the audio data.

A second user terminal 120 receives the second audio data from the variable encoding system 100.

In an exemplary embodiment of the present invention, the variable encoding system 100 is a computer terminal that receives audio data from a podcasting service or a similar audio content service. In this case, the variable encoding system 100 receives the audio data from the server 110 via a wired/wireless Internet communication network, and either generates the second audio data by variable encoding or transmits the audio data as is to the second user terminal 120. The second user terminal 120 is a mobile terminal such as a mobile communication terminal, a Moving Picture Experts Group Audio Layer 3 (MP3) player, a PlayStation Portable (PSP), a portable multimedia player (PMP), a personal digital assistant (PDA), an electronic notebook, or the like, and the computer terminal transmits the second audio data when it is connected to the second user terminal 120. In another exemplary embodiment, the variable encoding system 100 is a predetermined independent server. In this case, the variable encoding system 100 receives the audio data from the server 110 via the wired/wireless communication network and either generates the second audio data from the audio data or transmits the audio data as is to the second user terminal 120. Here, the second user terminal 120 is a mobile communication terminal, and the variable encoding system 100 wirelessly transmits the second audio data to it via a data channel. The variable encoding system 100 according to the present invention can thus increase the memory efficiency of the second user terminal 120 and reduce the load on the transmission channel.
Specifically, the variable encoding system 100 according to the present invention can reduce the overall size of the audio data by re-encoding only the voice data at a lower bit rate when the audio data contains voice data partially or entirely.

FIG. 2 is a flowchart illustrating an operation based on a method of variably encoding audio data according to the present invention.

In operation 201, a server transmits predetermined audio data to the variable encoding system according to an embodiment of the present invention. The server is a system that provides a podcasting service or an RSS service. The variable encoding system checks the server in a predetermined cycle for an updated audio data list and requests transmission of the audio data when new audio data is available.

In operation 202, the variable encoding system receives the audio data from the server and analyzes its data format. The audio data includes data such as broadcasts, music, songs, voice, and the like. Each kind of audio data therefore has a particular nature that depends on its data format, and this characteristic may be determined by analyzing the frequency band, detecting pitch, checking whether the sound is interrupted, and the like. Conventional techniques may be used as is to determine the characteristic of the audio data.
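The polling in operation 201 amounts to comparing the server's current list against what has already been received. The sketch below assumes each audio item has a unique identifier string; fetching and parsing the feed itself (for example, an RSS document) is left out, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def find_new_items(fetched: Iterable[str], already_received: Set[str]) -> List[str]:
    """Items on the server's current list that have not been received yet,
    in the server's order and without duplicates."""
    new_items: List[str] = []
    for item in fetched:
        if item not in already_received and item not in new_items:
            new_items.append(item)
    return new_items


def poll_once(fetched: Iterable[str], already_received: Set[str]) -> List[str]:
    """One polling cycle: return the items to request, then mark them received."""
    new_items = find_new_items(fetched, already_received)
    already_received.update(new_items)
    return new_items
```

For simplicity this marks items as received as soon as they are requested; a more careful version would mark an item only after its download succeeds, and would run poll_once on a timer matching the predetermined cycle.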

In operation 203, the variable encoding system determines whether voice data is contained in the audio data based on the result of the data format analysis, for example by analyzing the frequency band, detecting pitch, and checking whether the sound is interrupted. The variable encoding system according to an embodiment of the present invention divides a single piece of audio data into predetermined portions and identifies which portions contain voice data. The index of each portion, together with whether that portion contains voice data, is recorded in a predetermined memory device. When the analysis in operation 203 shows that the audio data contains no voice data, the process branches to operation 206, and the variable encoding system transmits the audio data as is to the second user terminal. When the audio data contains voice data partially or entirely, the variable encoding system encodes only the portion corresponding to the voice data via a predetermined vocoder in operation 204 and generates the second audio data in operation 205. For example, when only the middle portion of the audio data corresponds to voice data, the variable encoding system encodes only that middle portion via the vocoder, and generates the second audio data either by inserting identification information, such as a predetermined flag or index information, at the beginning of the middle portion, or by recombining the data with conversion information such as vocoder information.
Specifically, when the audio data contains both voice data and music data, the second audio data has a different bit rate for each interval. For example, the portion corresponding to the voice data may be encoded at 8 Kbps and the portion corresponding to the music data at 128 Kbps.
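Operations 203 to 205, together with the mixed-bit-rate example above, can be illustrated with a small planning step. In this sketch each portion is assumed to have already been labeled voice or non-voice by an analysis such as the one described for FIG. 1. The Portion type, the function names, and the use of "MP3" as the original format of non-voice portions are assumptions; the 8 Kbps and 128 Kbps figures come from the example above. The resulting index table records, per portion, the codec and bit rate that would be stored as conversion information.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (codec name, bit rate in kbps). The 8/128 kbps figures follow the example above;
# "MP3" as the original format of non-voice portions is an assumption.
VOICE_CODEC: Tuple[str, int] = ("EVRC", 8)
ORIGINAL_CODEC: Tuple[str, int] = ("MP3", 128)


@dataclass
class Portion:
    index: int          # position of the portion within the audio data
    duration_s: float
    is_voice: bool      # result of the voice/non-voice analysis


def plan_encoding(portions: List[Portion]) -> Dict[int, Tuple[str, int]]:
    """Index table: voice portions go to the vocoder, others keep their encoding."""
    return {p.index: (VOICE_CODEC if p.is_voice else ORIGINAL_CODEC) for p in portions}


def estimated_size_bytes(portions: List[Portion],
                         plan: Dict[int, Tuple[str, int]]) -> float:
    """Size of the second audio data implied by the plan, ignoring headers."""
    return sum(plan[p.index][1] * 1000 * p.duration_s / 8 for p in portions)
```

For a hypothetical broadcast of one minute of music, two minutes of narration, and one minute of music, the plan reduces the estimated size from 3.84 MB (all at 128 Kbps) to 2.04 MB.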

The variable encoding system according to an exemplary embodiment of the present invention may encode the entire audio data at the bit rate used for voice data when the proportion of voice data in the audio data exceeds a predetermined ratio. The predetermined ratio may be set by a developer or an operator of the variable encoding system.
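The whole-file rule can be expressed as a threshold test on the share of voice in the audio data. This is a sketch under two assumptions not stated in the text: the share is measured by duration rather than by number of portions, and the default ratio of 0.7 is purely hypothetical, since the actual value is left to the developer or operator.

```python
from typing import Iterable, Tuple


def voice_share(portions: Iterable[Tuple[float, bool]]) -> float:
    """Fraction of total duration labeled as voice; portions are (duration_s, is_voice)."""
    portions = list(portions)
    total = sum(d for d, _ in portions)
    if total == 0:
        return 0.0
    return sum(d for d, is_voice in portions if is_voice) / total


def encode_whole_file_as_voice(portions: Iterable[Tuple[float, bool]],
                               ratio: float = 0.7) -> bool:
    """True if the voice share exceeds the operator-set ratio (0.7 is hypothetical)."""
    return voice_share(portions) > ratio
```

A file that is half narration stays on the per-portion path, while one that is 90% narration would be encoded entirely at the voice bit rate.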

In operation 206, the variable encoding system transmits the generated second audio data to the second user terminal. The variable encoding system according to an exemplary embodiment of the present invention may be embodied in a user's computer terminal, and the second user terminal may be a mobile terminal such as a mobile phone, a PDA, an electronic notebook, a PMP, a PSP, an MP3 player, and the like. This exemplary embodiment is described in detail with reference to FIG. 3. FIG. 3 is a diagram illustrating an example of a network including a variable encoding system, a server and a second user terminal according to the present invention.

Referring to FIG. 3, the variable encoding system 300 may be embodied on a computer terminal 310. Specifically, the variable encoding system 300 is a predetermined application program or hardware located in the computer terminal 310. A server 301 transmits the audio data to the computer terminal 310 via a network 302 in a predetermined cycle based on the podcasting service or the RSS service. The network 302 may be any wired/wireless network that provides the computer terminal 310 with Internet connectivity. After the computer terminal 310 receives the audio data via the network 302, the variable encoding system 300 determines whether the audio data contains voice data. When it does, the variable encoding system 300 generates the second audio data by encoding the audio data via the vocoder. When the second user terminal connects with the computer terminal 310, the computer terminal 310 transmits the second audio data generated by the variable encoding system 300 to the second user terminal. The second user terminal is a mobile terminal having a predetermined memory device, such as an MP3 player 304, a mobile communication terminal 305, a PlayStation 306, and the like.

The second user terminal connects with the variable encoding system 300 via a short-distance communication module such as a universal serial bus (USB) module, a recommended standard-232C (RS-232C) module, a Bluetooth module, and the like, and the variable encoding system 300 transmits the second audio data to the second user terminal upon detecting the connection.

In another exemplary embodiment of the present invention, the variable encoding system is a predetermined independent server and the second user terminal is a mobile communication terminal. This exemplary embodiment is described in detail with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of a network including a variable encoding system, a server and a second user terminal according to an embodiment of the present invention.

Referring to FIG. 4, the variable encoding system 400 receives predetermined audio data from a server 401 via a network 402. The network 402 should be interpreted broadly to include all wired/wireless communication networks.

As in the exemplary embodiment of FIG. 3, the variable encoding system 400 that receives the audio data identifies whether the audio data contains voice data and, when it does, generates the second audio data by encoding the audio data via the predetermined vocoder. The generated second audio data is then transmitted to the second user terminal via a network 403. The second user terminal is a mobile communication terminal 404, and the network 403 includes a wireless communication network including a predetermined communication provider system.

Specifically, the variable encoding system 400 requests the communication provider system to establish a channel with the mobile communication terminal 404. The communication provider system then sets up a wireless channel between the variable encoding system 400 and the mobile communication terminal 404, and the variable encoding system 400 wirelessly transmits the second audio data to the mobile communication terminal 404 via the wireless channel. Also, the mobile communication terminal 404 according to an exemplary embodiment of the present invention may query the variable encoding system 400 in a predetermined cycle as to whether second audio data is available for transmission, and request the variable encoding system 400 to transmit the second audio data when it is. The variable encoding system 400 according to the present invention can thus reduce the memory usage of the mobile communication terminal 404 by efficiently reducing the size of the audio data, and reduce the load on the transmission channel of the mobile communication network.

Referring to FIG. 2 again, in operation 207, the second user terminal decodes the second audio data based on the conversion information and plays it back to the user via a predetermined speaker device.

The variable encoding system according to an exemplary embodiment of the present invention maintains a user database recording user information about at least one user. The user information includes identification information of the second user terminal corresponding to the user; telephone number information is one example of such identification information. Specifically, to transmit the generated second audio data to the second user terminal, the variable encoding system reads and extracts the user information corresponding to the second user terminal from the user database and wirelessly transmits the second audio data to the second user terminal based on the identification information in that user information. In this instance, the second user terminal is a mobile communication terminal such as a mobile phone.

FIG. 5 is a diagram illustrating data formats of audio data and second audio data according to an exemplary embodiment of the present invention. Referring to reference numeral 501 in FIG. 5, the audio data according to an exemplary embodiment of the present invention is 'A.MP3'. 'A.MP3' includes a plurality of playlist entries, and the variable encoding system identifies whether voice data is contained in the audio data by analyzing each entry. For example, 'A.MP3' may be a radio broadcast that includes narration data of an announcer and music data. As a result of analyzing the playlist, the variable encoding system determines that 'A1' and 'A3' are music data and 'A2' and 'A4' are narration data of the announcer. The variable encoding system then encodes 'A1' and 'A3', which are determined to be music data, and encodes 'A2' and 'A4' by using the predetermined vocoder. Specifically, the variable encoding system analyzes the audio data of each playlist entry and, based on the analysis result, applies heterogeneous encoding to each entry. In this instance, the second user terminal must be able to play back each entry according to the playlist. Analyzing each entry separately, as in reference numeral 501, prevents audio data that contains a significant amount of voice data from being determined to be music or song data merely because music or song data appears at the beginning of the audio data.

Referring to reference numeral 502 in FIG. 5, the variable encoding system removes the playlist of reference numeral 501, inserts conversion information related to the encoding into each playlist entry, and recombines the entries into a single piece of audio data. In the case of reference numeral 502, predetermined software that can decode audio data encoded via a plurality of encoders is required. Since such software is a well-known, commonly used technology, a detailed description is omitted.
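Reference numeral 502 implies a container in which each recombined segment carries its own conversion information, so that the second user terminal can select the right decoder for each segment (operation 207). The patent does not specify this format; the layout below is a hypothetical one chosen for illustration: a 1-byte codec identifier, a 2-byte bit rate in Kbps, and a 4-byte payload length, all big-endian, followed by the encoded payload. The codec identifier values are likewise assumptions.

```python
import struct
from typing import Iterable, List, Tuple

# Hypothetical codec identifiers; not defined by the patent.
CODEC_IDS = {"MP3": 1, "EVRC": 2, "QCELP": 3, "AMR": 4}
CODEC_NAMES = {v: k for k, v in CODEC_IDS.items()}
HEADER = struct.Struct(">BHI")  # codec id (1 B), bit rate kbps (2 B), payload length (4 B)

Segment = Tuple[str, int, bytes]  # (codec name, bit rate in kbps, encoded payload)


def pack_segments(segments: Iterable[Segment]) -> bytes:
    """Concatenate segments, each preceded by its conversion-information header."""
    out = bytearray()
    for codec, kbps, payload in segments:
        out += HEADER.pack(CODEC_IDS[codec], kbps, len(payload))
        out += payload
    return bytes(out)


def unpack_segments(data: bytes) -> List[Segment]:
    """Inverse of pack_segments, as the second user terminal would parse it."""
    segments: List[Segment] = []
    offset = 0
    while offset < len(data):
        codec_id, kbps, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated segment payload")
        offset += length
        segments.append((CODEC_NAMES[codec_id], kbps, payload))
    return segments
```

On the terminal side, each unpacked payload would be handed to the decoder named in its header; the decoders themselves are outside the scope of this sketch.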

FIG. 6 is a block diagram illustrating an internal configuration of a variable encoding system according to an exemplary embodiment of the present invention.

Referring to FIG. 6, the variable encoding system 600 according to the present invention includes a receiver 601, a converter 602, and a transmitter 603.

The receiver 601 receives audio data from a predetermined server. The server is a general server that provides audio data such as voice, music, songs, broadcasts, and the like. The audio data may be either encoded data or unprocessed data.

The converter 602 determines whether voice data is contained in the audio data by analyzing the data format of the audio data received by the receiver 601, and generates second audio data by encoding the audio data via a predetermined vocoder when the voice data is contained in the audio data. The converter 602 according to an exemplary embodiment of the present invention divides the received audio data into a plurality of pieces based on a predetermined playlist and determines whether each piece is voice data. Each piece is then encoded separately according to this determination, and the pieces are combined into the second audio data. In this instance, the second audio data includes conversion information about the vocoder and the encoding.

The converter 602 according to an exemplary embodiment of the present invention converts the audio data into the second audio data via a particular encoder specified by a user command. The user may select the encoder used to generate the second audio data based on personal preference or on encoding errors. For example, depending on the memory capacity of the second user terminal, the user may set music data or song data to be encoded with the vocoder.
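The user command described above can be modeled as an optional override that takes precedence over the automatic voice/non-voice decision. This is a minimal sketch; the function name, the "keep" label for pass-through, and the default codec names are hypothetical.

```python
from typing import Optional


def choose_codec(is_voice: bool, user_override: Optional[str] = None,
                 voice_codec: str = "EVRC", default_codec: str = "keep") -> str:
    """Pick the encoder for one portion of the audio data.

    "keep" means the portion is passed through in its original encoding.
    A user override (e.g. forcing the vocoder for music when the terminal's
    memory is small) wins over the automatic decision.
    """
    if user_override is not None:
        return user_override
    return voice_codec if is_voice else default_codec
```

Keeping the override as a separate argument leaves the automatic classification untouched, so the same analysis result can be reused whether or not the user has expressed a preference.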

The transmitter 603 transmits the generated second audio data to the second user terminal. The variable encoding system 600 according to an exemplary embodiment of the present invention is included in a predetermined computer terminal in the form of an application program or hardware. In this case, the receiver 601 receives the audio data from a predetermined server via a wired/wireless Internet communication network, and the converter 602 determines whether the audio data contains voice data and, when it does, generates the second audio data by encoding the audio data via the vocoder. When the second user terminal is then connected via a short-distance communication module, such as a USB module, an RS-232C module, an ultra-wideband (UWB) module, a Bluetooth module, a wireless local area network (LAN) module, and the like, the transmitter 603 transmits the second audio data to the second user terminal.

In another exemplary embodiment of the present invention, the variable encoding system 600 is a predetermined independent server. In this case, the receiver 601 receives the audio data from the server via a wired/wireless communication network, the converter 602 generates the second audio data according to whether the audio data contains voice data, and the transmitter 603 wirelessly transmits the second audio data to the second user terminal. The second user terminal may be any predetermined communication terminal, such as a mobile communication terminal, a public switched telephone network (PSTN) terminal, a voice over Internet protocol (VoIP) terminal, a session initiation protocol (SIP) terminal, a media gateway control (Megaco) terminal, a personal digital assistant (PDA), a cellular phone, a personal communication service (PCS) phone, a hand-held personal computer (PC), a code division multiple access (CDMA)-2000 (1X, 3X) phone, a wideband CDMA (WCDMA) phone, a dual band/dual mode phone, a global system for mobile communication (GSM) phone, a mobile broadband system (MBS) phone, a satellite/terrestrial digital multimedia broadcasting (DMB) phone, and the like.

The variable encoding system 600 according to an exemplary embodiment of the present invention further includes a user database 604 and a database management unit 605.

The user database 604 maintains user information about at least one user. The user information includes identification information of the second user terminal corresponding to the user. The database management unit 605 reads and extracts the user information corresponding to the second user terminal by referring to the user database 604, and controls the transmitter 603 to wirelessly transmit the second audio data to the second user terminal based on the identification information corresponding to the user information.

For example, to wirelessly transmit the second audio data to the second user terminal, the transmitter 603 parses the user database 604 and reads and extracts the predetermined user information. The user information includes identification information of the second user terminal, such as telephone number information, and the transmitter 603 transmits the second audio data to the second user terminal based on that identification information.
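The lookup performed with the user database 604 and the database management unit 605 can be sketched with an in-memory table keyed by user. The record fields, class and method names, the example telephone number, and the transmit callback (standing in for the transmitter 603) are all hypothetical; a real system would use a persistent database and a carrier-specific delivery interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class UserRecord:
    user_id: str
    phone_number: str  # identification information of the second user terminal


class DatabaseManagementUnit:
    """Hypothetical stand-in for unit 605: looks up a user, then drives the transmitter."""

    def __init__(self, records: Iterable[UserRecord],
                 transmit: Callable[[str, bytes], None]) -> None:
        self._by_user: Dict[str, UserRecord] = {r.user_id: r for r in records}
        self._transmit = transmit  # plays the role of the transmitter 603

    def deliver(self, user_id: str, second_audio: bytes) -> str:
        """Send second_audio to the terminal registered for user_id; return its number."""
        record = self._by_user.get(user_id)
        if record is None:
            raise KeyError(f"no user record for {user_id!r}")
        self._transmit(record.phone_number, second_audio)
        return record.phone_number
```

Passing the transmitter in as a callback keeps the lookup logic independent of how the wireless delivery is actually performed.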

FIG. 7 is an internal block diagram of a general-purpose computer apparatus which can be adopted in implementing a variable encoding method according to the present invention.

A computer apparatus 700 includes at least one processor 710 connected to a main memory device including a RAM (Random Access Memory) 720 and a ROM (Read Only Memory) 730. The processor 710 is also known as a central processing unit (CPU). As is well known in the art, the ROM 730 transmits data and instructions to the CPU unidirectionally, and the RAM 720 is generally used for transmitting data and instructions bidirectionally. The RAM 720 and the ROM 730 may include any suitable form of computer-readable recording medium. A mass storage device 740 is bidirectionally connected to the processor 710 to provide additional data storage capacity and may be any of a number of computer-readable recording media. The mass storage device 740 is used for storing programs, data, and the like, and is an auxiliary memory device, such as a hard disk, that is generally slower than the main memory device. A particular mass storage device such as a CD-ROM 760 may also be used. The processor 710 is connected to at least one input/output interface 750 such as a video monitor, a track ball, a mouse, a keyboard, a microphone, a touch-screen type display, a card reader, a magnetic or paper tape reader, a voice or handwriting recognizer, a joystick, or other known computer input/output unit. The processor 710 may be connected to a wired or wireless communication network via a network interface 770, and the procedure of the described method can be performed via this network connection. The described devices and tools are well known to those skilled in the art of computer hardware and software.

The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the present invention.

Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Industrial Applicability

An aspect of the present invention provides a method and system for increasing the usage efficiency of the memory device of a mobile terminal that records audio data. An aspect of the present invention also provides a method and system for reducing the load on a wireless communication network by encoding audio data with a variable encoding system based on the characteristics of the audio data and transmitting the encoded audio data to a second user terminal via the wireless communication network.

Claims

1. A method of selectively encoding audio data, the method comprising: receiving the audio data from a predetermined server; determining whether voice data is contained in the audio data by analyzing a data format of the audio data; generating second audio data by encoding only a portion corresponding to the voice data among the audio data via a predetermined vocoder when the voice data is contained in the audio data, the second audio data comprising conversion information about the vocoder and the encoding; and transmitting the generated second audio data to a second user terminal, wherein the second user terminal decodes the second audio data based on the conversion information.

2. The method of claim 1, wherein a selective encoding system comprises a computer terminal, and when the second user terminal connects with the computer terminal, the computer terminal transmits the second audio data to the second user terminal.

3. The method of claim 1, further comprising: maintaining a user database which records user information about at least one user, the user information comprising identification information of the second user terminal corresponding to the user, wherein the transmitting comprises: reading and extracting user information corresponding to the second user terminal by referring to the user database; and wirelessly transmitting the second audio data to the second user terminal, based on identification information corresponding to the user information.

4. The method of claim 1, wherein the vocoder comprises at least one of Qualcomm Code Excited Linear Prediction (QCELP), Enhanced Variable Rate Codec (EVRC), and Adaptive Multi-Rate (AMR).

5. The method of claim 1, wherein the audio data is received from the server in a rich site summary (RSS) method.

6. The method of claim 1, wherein the second audio data is generated by dividing and encoding the audio data into a plurality of audio data via a heterogeneous vocoder.

7. A computer-readable recording medium storing a program for implementing the method according to any one of claims 1 through 6.

8. A system for selectively encoding audio data, the system comprising: a receiver receiving the audio data from a predetermined server; a converter determining whether voice data is contained in the audio data by analyzing a data format of the audio data, and generating second audio data by encoding only a portion corresponding to the voice data among the audio data via a predetermined vocoder when the voice data is contained in the audio data, the second audio data comprising conversion information about the vocoder and the encoding; and a transmitter transmitting the generated second audio data to a second user terminal, wherein the second user terminal decodes the second audio data based on conversion information.

9. The system of claim 8, further comprising: a user database recording user information about at least one user, the user information comprising identification information of the second user terminal corresponding to the user; and a database management unit reading and extracting the user information corresponding to the second user terminal by referring to the user database, and controlling the transmitter to wirelessly transmit the second audio data to the second user terminal, based on the identification information corresponding to the user information.