KR100660293B1

KR100660293B1 - Terminal voice menu moving system

Info

Publication number: KR100660293B1
Application number: KR1020040040070A
Authority: KR
Inventors: 김경민; 나동원; 엄봉수; 설원희; 이준우; 허재형
Original assignee: 에스케이 텔레콤주식회사
Priority date: 2004-06-02
Filing date: 2004-06-02
Publication date: 2006-12-20
Also published as: KR20050114943A

Abstract

Disclosed is a terminal voice menu moving system that adds a voice-recognition-based menu navigation function to applications on a mobile communication terminal so that various content can be provided to users. The system comprises: a mobile communication terminal that encodes and outputs a sound source uttered by a user; a back-end unit that receives and interprets the encoded sound source and outputs a corresponding command signal; and a content server that provides the data desired by the user according to the command signal.

Terminal application, speech recognition, Voice Menu Navigation, DSR (Distributed Speech Recognition), ASR (Automatic Speech Recognition)

Description

Terminal voice menu moving system {SYSTEM FOR NAVIGATING A VOICE MENU IN A MOBILE PHONE AND METHOD THEREOF}

FIG. 1 is a diagram showing the overall configuration of the terminal voice menu moving system according to the present invention.

FIG. 2 is a diagram for explaining the function of the DSR server in FIG. 1.

FIG. 3 is a diagram for explaining the function of the ASR server in FIG. 1.

FIG. 4 is a diagram for explaining the function of the WA server.

FIG. 5 is a diagram for explaining the function of the DSR API in FIG. 1.

FIGS. 6 to 8 are display screens showing the menu moving process according to the present invention.

FIG. 8 is a diagram showing the display screen of the N-best selection function according to the present invention.

FIG. 9 is a diagram showing a usage example to which the terminal voice menu moving system according to the present invention is applied.

<Description of Symbols for Main Drawings>

10: mobile communication terminal
11: DSR API
20: back-end unit
21: DSR server
22: ASR server
23: TTS server
24: VXML interpreter
30: content server

The present invention relates to a supplementary service of a mobile communication terminal, and more particularly to a terminal voice menu moving system that adds a voice-recognition-based menu navigation function to applications on the mobile communication terminal so that various content can be provided to users.

Recently, with advances in mobile communication technology and in speech recognition and speech processing technology, voice-recognition dialing services, in which a call is placed by speaking the name of the called party instead of keying in the telephone number, have come into wide use.

In addition, demand for supplementary services that use speech recognition to obtain information is growing daily.

Meanwhile, operation of a mobile communication terminal can be broadly divided into simple operations and menu operations for various services. With a conventional mobile communication terminal, to use a service such as a voice-recognition dialing service, a voice-recognition content service, a phone information upload service, or a phone information download service, the user connects by entering the service access number (for example, 15**-****) or a shortcut number (#**) through a key combination of two or more digits.

In this case, however, the user must memorize the number and then key it in digit by digit, which is cumbersome.

As another example, in a route guidance service that uses existing speech recognition technology, the user operates buttons to connect to the service, selects the desired menu, and only then speaks the destination. Because the user must perform many manual operations before the service can be used, using it while driving can cause considerable inconvenience and risk.

Even after connecting to a service, the user had to find the relevant menu on the terminal and operate the keys for each step. In addition, when using a voice-recognition dialing service, the user had to access the personal address book in the subscriber database through the web to register, edit, and delete telephone numbers one by one, so the address book could not be managed where no Internet access was available.

Accordingly, there is a need for a method that overcomes these drawbacks and lets users use supplementary services more simply and efficiently.

The present invention has been devised to solve the problems of the prior art described above. An object of the present invention is to provide a terminal voice menu moving system that can provide various content to users without constraints of time and place by adding a voice-recognition-based menu navigation function to applications on a mobile communication terminal.

Another object of the present invention is to provide a terminal voice menu moving system that applies speech recognition technology to a mobile communication terminal so that the user can perform all key operations of the terminal and receive various supplementary services by voice instead of operating buttons one by one.

To achieve the above objects, the terminal voice menu moving system according to the present invention comprises: a mobile communication terminal that encodes and outputs a sound source uttered by a user; a back-end unit that receives and interprets the encoded sound source and outputs a corresponding command signal; and a content server that provides the data desired by the user according to the command signal.

Before the embodiments of the present invention are described with reference to the accompanying drawings, the internal environment of the mobile communication terminal that is presupposed in order to implement the terminal voice menu moving system in its most preferred embodiment is described.

First, for voice menu navigation, a DSR (Distributed Speech Recognition) platform must be built, together with a terminal DSR API (Distributed Speech Recognition Application Programming Interface) that makes the platform easy to use. A speech recognition engine is embedded in the terminal and used for initial menu recognition, and a framework must be established that allows voice-recognition menu navigation to be added to any application. In addition, the current voice platform, which is limited to isolated-word recognition, must be upgraded to natural-language recognition, and grammars that can be applied directly to services must be built by tuning the recognition engine.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

As shown in the drawing, in the terminal voice menu moving system according to the present invention, the DSR (Distributed Speech Recognition) API (Application Programming Interface) 11 of the mobile communication terminal 10 communicates with the DSR server 21 of the back-end unit 20. The DSR server 21 works together with the ASR (Automatic Speech Recognition) server 22, the TTS (Text-To-Speech) server 23, and the VXML (Voice eXtensible Markup Language) interpreter server 24 to support services from the content server 30.

The devices of the voice menu moving system according to the present invention, configured as described above, are described with reference to FIGS. 2 to 5.

As shown therein, the DSR server 21 performs voice output and screen output simultaneously by linking VoiceXML with the terminal application.

The DSR server 21 also supports a DSR-based multimodal UI in which voice input and key input on the mobile communication terminal 10 can be performed simultaneously over a data channel. It also interworks closely with the mobile communication terminal 10. That is, the DSR server 21 allows a high-performance speech engine to be used and, because it uses the voice-call codec (EVRC), can be tuned easily without any hardware change to the mobile communication terminal 10.

The ASR server 22 implements and runs, as a service, a speech recognition engine that is provided in library form. The ASR server 22 is implemented as a server over TCP, and application programs that need speech recognition act as clients and use the service of the ASR server 22.

The ASR server 22 uses one or more TCP ports to serve clients and accepts multiple TCP socket connections, so that multiple clients can use the service at the same time.

The ASR server 22 provides the speech recognition service through virtual speech recognition channels. A channel is a unit of use of the ASR server 22 service that is separate from the TCP socket, and each channel is identified by a unique ID. The ASR server 22 allows multiple channels to be used over a single TCP socket, and the maximum number of available channels is limited by the ASR server 22.
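
The relationship between sockets and channels described above can be illustrated with a minimal Python sketch. The source does not disclose how the ASR server 22 manages channels, so the class name `ChannelRegistry`, its methods, and the socket labels are all hypothetical; the sketch only shows unique channel IDs, several channels per socket, and a server-side limit.

```python
import itertools


class ChannelRegistry:
    """Tracks virtual recognition channels independently of TCP sockets.

    Hypothetical illustration; not the patent's actual implementation.
    """

    def __init__(self, max_channels):
        self.max_channels = max_channels      # limit imposed by the server
        self._ids = itertools.count(1)        # source of unique channel IDs
        self._owner = {}                      # channel ID -> socket label

    def open(self, socket_label):
        """Open a channel for a socket; one socket may hold several channels."""
        if len(self._owner) >= self.max_channels:
            raise RuntimeError("no free recognition channel")
        channel_id = next(self._ids)
        self._owner[channel_id] = socket_label
        return channel_id

    def close(self, channel_id):
        """Release a channel so that another request can use the slot."""
        self._owner.pop(channel_id, None)

    def channels_of(self, socket_label):
        """Return the channel IDs currently held by one socket."""
        return sorted(cid for cid, s in self._owner.items() if s == socket_label)
```

In this sketch a client that needs two concurrent recognitions calls `open` twice with the same socket label and receives two distinct IDs, while a request beyond `max_channels` is refused until a channel is closed.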

Before finishing a requested recognition task and sending the result to the client, the ASR server 22 stores the result data on the server. The stored data is used for continuous tuning of the speech recognition engine.

The ASR server 22 is launched from a console window, and under normal conditions the program keeps running without terminating. The console window shows the current channel status, I/O, and speech recognition results.

FIG. 4 is a diagram for explaining the function of the WA server. The WA server is a middleware server that lets a CP (Contents Provider) remotely upgrade, on the mobile communication terminal, the grammar required for a service. Because it is already well known to those skilled in the art, a detailed description is omitted.

FIG. 5 is a diagram for explaining the function of the DSR API in FIG. 1. The DSR API 11 performs the following functions.

(1) Voice recording

When a recording request signal is received, the sound input from the microphone is recorded using the recording API of WIPI (Wireless Internet Platform for Interoperability). The recorded sound is encoded in EVRC by the terminal's DSP (Digital Signal Processor) and then stored.

(2) Voice start point and end point detection

The user presses the button once when starting to speak, and recording continues even after the button is released. Recording ends automatically when the user does not speak for a predetermined time or longer. The start point and end point of the captured sound are detected based on the zero crossing rate and the level in dB.

(3) Voice level display

While voice recording is in progress, a small icon indicating the volume of the voice being recorded is displayed at a specific position on the terminal screen.

(4) N-best screen generation

This function displays the candidates appropriately on the screen when the DSR server 21 returns several answers as its recognition result. On this screen, the user can select a candidate simply by saying its number or by pressing a DTMF (Dual Tone Multi Frequency) button directly.

(5) TCP communication with the DSR server

The EVRC-encoded sound is sent to the DSR server 21 over a TCP connection. The DSR server 21 returns the recognition result over the same TCP connection.
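
Of the functions listed above, start and end point detection from the zero crossing rate and the dB level lends itself to a short sketch. The source gives no thresholds or frame sizes, so the values below (160-sample frames, a -30 dB threshold, a zero crossing rate of 0.3, a 10 dB margin) and all function names are assumptions chosen only for illustration, and the input is assumed to be decoded samples in the range [-1, 1].

```python
import math


def frame_level_db(frame):
    """RMS level of one frame in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(samples, frame_len=160, db_threshold=-30.0,
                     zcr_threshold=0.3, weak_margin_db=10.0):
    """Return (start_sample, end_sample) of detected speech, or None."""
    speech_frames = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        level = frame_level_db(frame)
        loud = level >= db_threshold
        # Quieter frames with many zero crossings may be unvoiced consonants.
        noisy_consonant = (level >= db_threshold - weak_margin_db
                           and zero_crossing_rate(frame) >= zcr_threshold)
        if loud or noisy_consonant:
            speech_frames.append(i)
    if not speech_frames:
        return None
    return speech_frames[0], speech_frames[-1] + frame_len
```

Combining the two measures is a common design choice: the dB level catches voiced speech, while the zero crossing rate helps keep low-energy consonants at the edges of an utterance from being trimmed.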

Next, the menu moving process based on voice recognition using the devices configured as described above is described.

FIGS. 6 to 8 are display screens showing the menu moving process according to the present invention. FIG. 6 shows the basic interface process for voice recognition, FIG. 7 shows the voice command scheme used by the user, and FIG. 8 shows the N-best selection screen.

First, the basic interface process for voice recognition is described with reference to FIG. 6.

When voice recognition is available, the terminal application displays a lips-shaped icon on the screen. When the user presses the voice input button (the "Call" button), the volume of the voice being recorded is shown as an icon at the bottom of the screen.

Once voice recording is complete and while the recorded voice is being passed to the DSR server 21 for recognition, a "Recognizing voice" icon is placed at the bottom of the screen so that the user can follow the progress of recognition. Voice recording has a dedicated recognition hotkey and starts with a confirmation tone when the key is pressed from the off state to the on state. At this point, a progress bar showing the remaining recording time is displayed for about 2 to 3 seconds, and end point detection also proceeds.

If recognition succeeds, the corresponding menu is highlighted to give the user feedback on the recognition result. If the recognition result is uncertain, a small window is displayed at the bottom of the screen for N-best processing, and the user makes the final confirmation.

Next, the process of moving between menu items according to the user's voice command scheme is described with reference to FIG. 7.

Voice commands are divided into simple voice commands, in which the user reads a currently displayed menu item, and shortcut voice commands, which cut across several levels of the menu tree.

A simple voice command is issued when the user reads a menu item on the screen. For menu items made up of several words, various alternative labels (aliases) must be considered. For example, a user may shorten a menu such as "Find Nearby Facilities" to "Nearby Facilities", so several aliases are added to the grammar for the user's convenience.
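
As a rough illustration of how aliases might be attached to menu items, the following Python sketch maps every spoken form to one canonical menu label. The function names and the menu labels are hypothetical, and a real grammar would be loaded into the recognition engine rather than held in a dictionary.

```python
def build_alias_table(menu_aliases):
    """Map every spoken form (label or alias) to its canonical menu label."""
    table = {}
    for label, aliases in menu_aliases.items():
        for phrase in [label, *aliases]:
            table[phrase.strip().lower()] = label
    return table


def resolve_menu(utterance, table):
    """Return the menu label for a recognized utterance, or None."""
    return table.get(utterance.strip().lower())
```

For instance, a table built from `{"Find Nearby Facilities": ["Nearby Facilities"]}` resolves both the full label and the shortened form to the same menu item.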

A shortcut voice command is a function that, based on a collected corpus of utterance patterns from users of the terminal application, issues a command in one step without going through the menu levels. For example, for an utterance such as "Directions to Seoul City Hall", it selects "Route Guidance" and also selects the POI "Seoul City Hall", for the user's convenience.

To support shortcut voice commands, the speech recognition engine server must be capable of continuous speech recognition, and its grammar must be able to accommodate grammar structures in ABNF (Augmented Backus-Naur Form) or an equivalent format.
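
The effect of a shortcut voice command can be sketched in Python by splitting a recognized utterance into a menu and a POI. A regular expression stands in here for an ABNF rule of the form `command = "directions to " poi`; it is not ABNF itself, and the rule table and the function name `parse_shortcut` are assumptions made for illustration.

```python
import re

# Each entry pairs a target menu with a pattern that captures the POI slot.
SHORTCUT_RULES = [
    ("Route Guidance", re.compile(r"^directions to (?P<poi>.+)$", re.IGNORECASE)),
]


def parse_shortcut(utterance):
    """Return (menu, poi) when the utterance matches a shortcut rule, else None."""
    text = utterance.strip()
    for menu, pattern in SHORTCUT_RULES:
        match = pattern.match(text)
        if match:
            return menu, match.group("poi")
    return None
```

With this rule, "Directions to Seoul City Hall" yields the menu "Route Guidance" and the POI "Seoul City Hall" in a single step, whereas an utterance that only names a menu item falls through to simple-command handling.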

Meanwhile, voice recognition in a DSR architecture goes through a complex sequence of encoding, transmitting, decoding, recognizing, and returning the recognition result, and the limited transmission speed of the wireless Internet can lengthen the user's waiting time. Normally the whole sequence finishes within 1 to 2 seconds, but establishing the first connection inevitably incurs a connection delay.

Because such recognition delay can be a major reason for users to avoid the terminal's voice recognition application, a technique is applied in which at least the initial screen is recognized by the speech recognition engine embedded in the terminal, keeping the user's waiting time to a minimum.

Because of the terminal's limited specifications, the embedded speech recognition engine does not support large-vocabulary grammars or continuous speech recognition. Therefore, for the initial screen recognized on the terminal, only the menu items shown on the screen and a few aliases are included in the recognition vocabulary.

Some users may prefer to skip menus with shortcut voice commands even at the cost of some initial delay. Therefore, a settings user interface may also be provided for choosing whether the initial screen uses the speech recognition engine embedded in the terminal or the more sophisticated and richer recognition available through the DSR server.
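
The setting described above amounts to a small routing decision, sketched below in Python. The function name, the parameter names, and the assumption that every screen after the initial one is recognized by the DSR server are illustrative only; the source states just that at least the initial screen may use the embedded engine.

```python
def choose_engine(is_initial_screen, prefer_server_on_start):
    """Pick the recognizer for the current screen.

    Assumption: screens after the initial one always use the DSR server;
    only the initial screen may be handled by the embedded engine.
    """
    if is_initial_screen and not prefer_server_on_start:
        return "embedded"
    return "dsr_server"
```

A user who values shortcut voice commands over a fast first response would set `prefer_server_on_start` to true, so that even the first utterance goes to the DSR server.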

Because the system's speech recognition cannot achieve a 100% recognition rate, a user-friendly interface for recognition errors (out-of-vocabulary utterances) must be provided.

If none of the words in the recognition vocabulary matches the user's utterance with overwhelming accuracy, a function is needed that narrows the candidates to a list of words with relatively high accuracy (in the 60% range) and guides the user to speak again. This function is defined as the N-best Selection function.

In the N-best selection state, the number of words to be recognized is reduced, so a higher recognition rate can be expected even if the user repeats the same utterance. In addition, when the N-best screen appears, the user is likely to try a different form of input, such as pressing a button or saying a number like "number 1" or "number 2", so accurate recognition can be expected.

Furthermore, with the N-best selection function there is no need to set a long timeout for recognition errors, and recognition errors are determined as quickly as possible, which shortens the user's waiting time.
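
The N-best Selection logic can be sketched in Python as a three-way decision on recognizer confidence. The source mentions only that candidates have accuracy in the 60% range; the acceptance threshold of 0.85, the cap of four candidates, and the function names are assumptions introduced for this illustration.

```python
def decide(hypotheses, accept_at=0.85, candidate_at=0.60, max_candidates=4):
    """Classify a recognition result as accept, N-best, or reject.

    hypotheses: list of (word, confidence) pairs with confidence in [0, 1].
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    if ranked and ranked[0][1] >= accept_at:
        return "accept", [ranked[0][0]]
    candidates = [w for w, c in ranked if c >= candidate_at][:max_candidates]
    if candidates:
        return "n_best", candidates
    return "reject", []


def pick_candidate(candidates, number):
    """Resolve a spoken or keyed number ("1", "2", ...) to a candidate."""
    index = int(number) - 1
    return candidates[index] if 0 <= index < len(candidates) else None
```

Because the decision is made immediately from the confidence scores, no long timeout is needed before the N-best screen is shown, and the follow-up input is matched against only a handful of numbered candidates.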

As shown, the user presses the call button once and then uses all services by voice. Using content by voice in this way eliminates unnecessary operations, so the user can use the service conveniently and safely.

Although the present invention has been shown and described with respect to specific preferred embodiments, it is not limited to those embodiments, and anyone with ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed in the claims.

The greatest advantage of the present invention is that, with a single button press, the user can select all functions of the mobile phone and the menus of various content by voice. This means that the user can use various mobile phone services while doing other things.

In addition, according to the present invention, the user presses the call button once and then uses all services by voice. Using content by voice thus eliminates unnecessary operations, so the user can conveniently and safely use the various supplementary services provided by the carrier.

Claims

A mobile communication terminal for encoding a sound source uttered by a user with an enhanced variable rate codec (EVRC) and outputting the encoded sound source;

a back-end unit which receives and interprets the encoded sound source and then outputs a corresponding command signal; and

a content server which provides data desired by the user according to the command signal.

The system of claim 1, wherein the mobile communication terminal

is implemented with a Distributed Speech Recognition (DSR) application programming interface (API) for moving a voice menu, and comprises a built-in voice recognition engine for initial menu recognition.

The system of claim 2, wherein the DSR API

performs voice recording generation, voice start point and end point detection, voice level display, N-best screen generation, and TCP communication with the DSR server.

The system of claim 3, wherein the voice recording generation

records the sound source according to the user's recording request signal, and encodes and stores the recorded sound source.

The system of claim 3, wherein the voice start point and end point detection

is performed based on the zero crossing point and the decibel (dB) level.

The system of claim 3, wherein the voice level display

displays a predetermined icon representing the volume of the voice at a predetermined position on the screen during voice recording.

The system of claim 3, wherein the N-best screen generation

selectively displays the candidate words on the screen if the DSR server recognizes a plurality of answers.

The system of claim 3, wherein the N-best screen generation

allows the user to select a candidate word by saying only a predetermined number or by directly pressing a dual tone multi frequency (DTMF) button.

The system of claim 1, wherein the back-end unit comprises:

a Distributed Speech Recognition (DSR) server that performs voice output and screen output through interworking of VoiceXML and a terminal application program; and

an Automatic Speech Recognition (ASR) server that runs, as a service, a speech recognition engine provided in the form of a library.