KR102744896B1 - Display apparatus and controlling method thereof

Info

Publication number: KR102744896B1
Application number: KR1020230113974A
Authority: KR
Inventors: 김양수; 수라즈 싱 탄와르
Original assignee: 삼성전자주식회사
Priority date: 2017-05-12
Filing date: 2023-08-29
Publication date: 2024-12-20
Anticipated expiration: 2043-04-18
Also published as: KR102524675B1; KR20230130589A; KR20180124682A; KR20250002061A

Abstract

A display device is disclosed. The display device includes a display and a processor. The processor controls the display to display a UI screen including a plurality of text objects, controls the display to display a preset number together with each text object whose language differs from a predetermined language among the plurality of text objects, and, when the recognition result of a voice spoken by a user includes a displayed number, performs an operation related to the text object corresponding to that number.

Description

{DISPLAY APPARATUS AND CONTROLLING METHOD THEREOF}

The present disclosure relates to a display device and a control method thereof, and more particularly, to a display device that provides voice recognition control for content composed of various languages, and a control method thereof.

With the development of electronic technology, various types of display devices have been developed and distributed. In particular, electronic devices such as TVs, mobile phones, PCs, laptop PCs, and PDAs have come into wide use in most households.

Meanwhile, technologies using voice recognition have recently been developed to control display devices more conveniently and intuitively.

Conventional display devices controlled by a user's voice perform voice recognition using a voice recognition engine. Because a different voice recognition engine exists for each language, it was necessary to determine in advance which engine to use. Therefore, the system language of the display device was usually chosen as the language used for voice recognition.

However, if, for example, the hyperlink text displayed on the display device is in English while the system language of the display device is Korean, then even when the user utters speech corresponding to that hyperlink text, the speech is converted into Korean text by the Korean voice recognition engine, and as a result the hyperlink text cannot be selected.

Thus, in the related art, controlling the display device by voice was limited when the system language differed from the language actually displayed on the display device.

The present disclosure addresses the need described above, and an object of the present disclosure is to provide a display device that provides voice recognition control for content composed of various languages, and a control method thereof.

To achieve the above object, a display device according to an embodiment of the present disclosure includes a display and a processor configured to control the display to display a UI screen including a plurality of text objects, to control the display to display a preset number together with each text object whose language differs from a predetermined language among the plurality of text objects, and, when the recognition result of a voice spoken by a user includes a displayed number, to perform an operation related to the text object corresponding to the displayed number.
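As a rough illustration of the numbering scheme described above, the following Python sketch assigns consecutive numbers to the text objects flagged as differing from the predetermined language and resolves a spoken number back to its object. The names `assign_numbers` and `resolve_number`, the `differs` callback, and the assumption that the recognition result carries the number as digits are all hypothetical and are not taken from the disclosure.

```python
import re


def assign_numbers(text_objects, differs):
    """Map 1, 2, 3, ... to each text object for which differs(text) is true."""
    numbered = {}
    for text in text_objects:
        if differs(text):
            numbered[len(numbered) + 1] = text
    return numbered


def resolve_number(recognized, numbered):
    """Return the text object whose displayed number appears in the recognition result."""
    for token in re.findall(r"\d+", recognized):
        if int(token) in numbered:
            return numbered[int(token)]
    return None


# Hypothetical screen: Korean is the predetermined language, so the two
# non-Korean (here: pure ASCII) objects receive numbers.
numbered = assign_numbers(["뉴스", "Sports", "Weather"], lambda t: t.isascii())
selected = resolve_number("2번", numbered)  # selects "Weather"
```

How a real device decides that a text object "differs" is left open here; the callback only stands in for that decision.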

In this case, the processor may set the language selected in the settings menu of the display device as the predetermined language. Alternatively, the processor may set the language most frequently used in the plurality of text objects as the predetermined language.
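The two options above can be sketched as follows. This is only an illustration: the disclosure does not say how the language of a text object is detected, so the script-based heuristic in `detect_script` (Hangul syllables versus ASCII letters) and the function names are assumptions.

```python
from collections import Counter


def detect_script(text):
    """Crude guess: 'ko' for Hangul syllables, 'en' for ASCII letters, else 'other'."""
    counts = Counter()
    for ch in text:
        if "\uac00" <= ch <= "\ud7a3":
            counts["ko"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["en"] += 1
    if not counts:
        return "other"
    return counts.most_common(1)[0][0]


def pick_predetermined_language(text_objects, settings_language=None):
    """Prefer the settings-menu language; otherwise use the most common language on screen."""
    if settings_language is not None:
        return settings_language
    tally = Counter(detect_script(text) for text in text_objects)
    tally.pop("other", None)
    if not tally:
        return None
    return tally.most_common(1)[0][0]


language = pick_predetermined_language(["뉴스", "스포츠", "Weather"])  # "ko"
```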

Meanwhile, the UI screen may be a web page, and the processor may set the language corresponding to the language information of the web page as the predetermined language.
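The disclosure does not specify what form the web page's language information takes. One plausible source is the `lang` attribute of the page's `<html>` element; the sketch below reads it with Python's standard `html.parser` module. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def page_language(html_text):
    """Return the primary language subtag (e.g. 'ko' for 'ko-KR'), or None if absent."""
    finder = _LangFinder()
    finder.feed(html_text)
    if finder.lang:
        return finder.lang.split("-")[0].lower()
    return None


language = page_language('<html lang="ko-KR"><body></body></html>')  # "ko"
```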

Meanwhile, for a text object composed of two or more languages among the plurality of text objects, the processor may determine that the text object differs from the predetermined language if the proportion of the predetermined language in the text object is less than a preset ratio.
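A minimal sketch of this ratio test, assuming the proportion is measured per letter and that Korean is the predetermined language. The 0.5 threshold, the per-letter counting, and the helper names are illustrative assumptions; the disclosure only refers to a "preset ratio".

```python
def is_hangul(ch):
    """True for precomposed Hangul syllables (U+AC00 to U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def share_of_language(text, in_language):
    """Fraction of the letters in text that belong to the predetermined language."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if in_language(ch)) / len(letters)


def differs_from_predetermined(text, in_language=is_hangul, min_ratio=0.5):
    """A mixed-language object counts as 'different' when the share falls below min_ratio."""
    return share_of_language(text, in_language) < min_ratio


differs_from_predetermined("Samsung 뉴스")    # True: 2 of 9 letters are Hangul
differs_from_predetermined("오늘의 뉴스 TV")   # False: 5 of 7 letters are Hangul
```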

Meanwhile, the processor may control the display to display each preset number adjacent to the text object corresponding to that number.

Meanwhile, the display device according to the present disclosure may further include a communication unit that communicates with an external device, and the processor may control the display to display the preset numbers while a signal corresponding to selection of a specific button of the external device is being received.
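One simple way to model "display the numbers while the button signal is being received" is a flag that follows press and release events from the external device. The event shape (`button`, `pressed`) and the button name `"mic"` are assumptions made for this sketch.

```python
class NumberOverlay:
    """Tracks whether the number labels should currently be drawn."""

    def __init__(self, trigger_button="mic"):
        self.trigger_button = trigger_button
        self.visible = False

    def on_button_signal(self, button, pressed):
        # Only the designated button controls the overlay; other buttons are ignored.
        if button == self.trigger_button:
            self.visible = pressed


overlay = NumberOverlay()
overlay.on_button_signal("mic", True)    # button held: numbers shown
overlay.on_button_signal("mic", False)   # button released: numbers hidden
```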

In this case, the external device may include a microphone, the communication unit may receive a voice signal corresponding to a voice input through the microphone of the external device, and the processor may perform an operation related to the text object corresponding to a displayed number if the recognition result for the received voice signal includes that number.

In this case, if the recognition result for the received voice signal includes text corresponding to any one of the plurality of text objects, the processor may perform an operation related to that text object.

Meanwhile, the operation related to the text object may be an operation of displaying the web page at the URL address corresponding to the text object, or an operation of executing an application program corresponding to the text object.
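The two kinds of operation named above can be modeled as a small dispatch step. The `TextObject` fields and the action tuples in this sketch are hypothetical; they only show how a selected object might map to "open a URL" or "launch an application".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextObject:
    label: str
    url: Optional[str] = None      # set for hyperlink-like objects
    app_id: Optional[str] = None   # set for objects that launch an application


def action_for(obj: TextObject) -> Tuple[str, str]:
    """Decide which operation is associated with the selected text object."""
    if obj.url is not None:
        return ("open_url", obj.url)
    if obj.app_id is not None:
        return ("launch_app", obj.app_id)
    return ("none", "")


action = action_for(TextObject("Sports", url="https://example.com/sports"))
```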

Meanwhile, the plurality of text objects may be included in an execution screen of a first application. If the processor determines, while the execution screen of the first application is displayed, that no object corresponding to the recognition result of a voice spoken by the user is present on that execution screen, the processor may execute a second application different from the first application and perform an operation corresponding to the recognition result of the voice.

In this case, the second application may be an application that provides search results for a search term. If the processor determines, while the execution screen of the first application is displayed, that no object corresponding to the recognition result of a voice spoken by the user is present on that execution screen, the processor may execute the second application and provide search results using the text corresponding to the recognition result of the voice as the search term.
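The fallback described above can be sketched as a two-way decision: select an on-screen object when the utterance matches one, otherwise hand the recognized text to a search application. The case-insensitive substring match and the returned tuples are assumptions for illustration.

```python
def handle_utterance(recognized, on_screen_labels):
    """Return ('select', label) for a match on the current screen, else ('search', query)."""
    spoken = recognized.casefold()
    for label in on_screen_labels:
        if label.casefold() in spoken:
            return ("select", label)
    # No matching object in the first application's screen:
    # use the recognized text as the search term for the second application.
    return ("search", recognized)


handle_utterance("sports", ["Sports", "Weather"])      # ("select", "Sports")
handle_utterance("cat videos", ["Sports", "Weather"])  # ("search", "cat videos")
```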

Meanwhile, the display device according to the present disclosure may further include a communication unit that communicates with a server performing voice recognition for a plurality of different languages. The processor may control the communication unit to provide the server with a voice signal corresponding to the voice spoken by the user and with information on the predetermined language, and, if the voice recognition result received from the server includes a displayed number, may perform an operation related to the text object corresponding to that number.

In this case, if the voice recognition result received from the server includes text corresponding to any one of the plurality of text objects, the processor may perform an operation related to that text object.

Meanwhile, a control method of a display device according to an embodiment of the present disclosure includes displaying a plurality of text objects, displaying a preset number together with each text object whose language differs from a predetermined language among the plurality of text objects, and, if the recognition result of a voice spoken by a user includes a displayed number, performing an operation related to the text object corresponding to the displayed number.

In this case, the control method of the display device according to the present disclosure may further include setting the language most frequently used in the plurality of text objects as the predetermined language.

Meanwhile, the plurality of text objects may be included in a web page, and the control method of the display device according to the present disclosure may further include setting the language corresponding to the language information of the web page as the predetermined language.

Meanwhile, the control method of the display device according to the present disclosure may further include determining, for a text object composed of two or more languages among the plurality of text objects, that the text object differs from the predetermined language if the proportion of the predetermined language in the text object is less than a preset ratio.

Meanwhile, in the step of displaying the preset number, each preset number may be displayed adjacent to the text object corresponding to that number.

Meanwhile, in the step of displaying the preset number, the preset number may be displayed while a signal corresponding to selection of a specific button of an external device is being received from the external device.

Meanwhile, the step of performing an operation related to the text object may include displaying the web page at the URL address corresponding to the text object, or executing an application program corresponding to the text object.

Meanwhile, the plurality of text objects may be included in an execution screen of a first application, and the control method of the display device according to the present disclosure may further include, if it is determined while the execution screen of the first application is displayed that no object corresponding to the recognition result of a voice spoken by the user is present on that execution screen, executing a second application different from the first application and performing an operation corresponding to the recognition result of the voice.

Meanwhile, the control method of the display device according to the present disclosure may further include providing a voice signal corresponding to the voice spoken by the user and information on the predetermined language to a server that performs voice recognition for a plurality of different languages. In the step of performing an operation related to the text object, if the voice recognition result received from the server includes a displayed number, an operation related to the text object corresponding to that number may be performed.

Meanwhile, in a computer-readable recording medium storing a program for executing a control method of a display device according to an embodiment of the present disclosure, the control method includes controlling the display device to display a plurality of text objects, controlling the display device to display a preset number together with each text object whose language differs from a predetermined language among the plurality of text objects, and, if the recognition result of a voice spoken by a user includes a displayed number, performing an operation related to the text object corresponding to the displayed number.

FIGS. 1 and 2 are diagrams for explaining a voice command input method in a display device according to various embodiments of the present disclosure;
FIG. 3 is a diagram for explaining a voice recognition system according to an embodiment of the present disclosure;
FIG. 4 is a block diagram for explaining the configuration of a display device according to an embodiment of the present disclosure;
FIGS. 5 to 7 are diagrams for explaining methods of displaying numbers for object selection according to various embodiments of the present disclosure;
FIGS. 8 and 9 are diagrams for explaining a voice search method according to various embodiments of the present disclosure;
FIG. 10 is a block diagram for explaining the configuration of a display device according to another embodiment of the present disclosure; and
FIG. 11 is a flowchart for explaining a control method of a display device according to an embodiment of the present disclosure.

Before describing the present disclosure in detail, the conventions used in this specification and the drawings are explained.

First, the terms used in this specification and the claims are general terms selected in consideration of their functions in the various embodiments of the present disclosure. However, these terms may vary depending on the intention of a person skilled in the art, legal or technical interpretation, the emergence of new technologies, and the like. In addition, some terms were arbitrarily selected by the applicant. Such terms may be interpreted as defined in this specification, and where no specific definition is given, they may be interpreted based on the overall content of this specification and common technical knowledge in the art.

In addition, the same reference numerals or symbols in the drawings attached to this specification denote parts or components that perform substantially the same function. For convenience of explanation and understanding, the same reference numerals or symbols are used across different embodiments. That is, even if components having the same reference numeral appear in multiple drawings, the multiple drawings do not all represent a single embodiment.

In addition, terms including ordinal numbers such as "first" and "second" may be used in this specification and the claims to distinguish between components. These ordinals are used to distinguish identical or similar components from one another, and the meaning of a term should not be construed narrowly because of them. For example, the order of use or arrangement of a component combined with such an ordinal should not be limited by its number. Where necessary, the ordinals may be used interchangeably.

In this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "consist of" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, terms such as "module," "unit," and "part" refer to components that perform at least one function or operation, and such components may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of "modules," "units," "parts," and the like may be integrated into at least one module or chip and implemented as at least one processor, except where each needs to be implemented as separate, specific hardware.

In addition, in the embodiments of the present disclosure, when a part is said to be connected to another part, this includes not only a direct connection but also an indirect connection through another medium. Furthermore, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, the present disclosure is described in detail with reference to the attached drawings.

FIG. 1 is a diagram for explaining a display device according to an embodiment of the present disclosure that is controlled by voice recognition.

Referring to FIG. 1, the display device (100) may be a TV as shown, but this is only an example, and it may be implemented as any device having a display function, such as a smartphone, desktop PC, laptop, smart watch, navigation device, or refrigerator.

The display device (100) can perform an operation based on the recognition result of a voice spoken by the user. For example, if the user says "Change to channel 7," it can display the program on channel 7, and if the user says "Turn off the power," it can turn the power off. The display device (100) can also operate as if conversing with the user. For example, in response to the user asking "What is the name of the program currently being broadcast?", it can output the message "The title of the program you asked about is ○○○" as voice or text. If the user says "How is the weather today?", it can output the message "Please tell me the region you want" as voice or text, and if the user then says "Seoul," it can output the message "The temperature in Seoul is ○○" as voice or text.

As shown in FIG. 1, the display device (100) can receive the user's voice through a microphone connected to or included in the display device (100). Alternatively, the display device (100) can receive, from an external device, a voice signal corresponding to a voice input through the microphone of that external device. This is described with reference to FIG. 2.

FIG. 2 is a diagram for explaining a display system according to an embodiment of the present disclosure.

Referring to FIG. 2, the display system includes a display device (100) and an external device (200).

As described with reference to FIG. 1, the display device (100) is a device that operates according to voice recognition results.

FIG. 2 shows an example in which the external device (200) is implemented as a remote control, but it may also be implemented as an electronic device such as a smartphone, tablet PC, or smart watch.

The external device (200) is a device that includes a microphone and can transmit a voice signal corresponding to a voice input through the microphone to the display device (100). For example, the external device (200) can transmit the voice signal to the display device (100) using a wireless communication method such as infrared (IR), RF, Bluetooth, or Wi-Fi.

To save power, the microphone of the external device (200) may be activated only when a preset event occurs. For example, the microphone is activated while the microphone button (210) of the external device (200) is held down and is deactivated when the microphone button (210) is released. In other words, voice input can be received only while the microphone button (210) is pressed.
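The press-to-talk behavior of the microphone button (210) can be sketched as a small state holder that keeps audio only while the button is down. The class name, the byte-frame interface, and the choice to return the captured audio on release are assumptions for this illustration.

```python
class PushToTalkMic:
    """Buffers audio frames only while the microphone button is held down."""

    def __init__(self):
        self.active = False
        self.frames = []

    def press(self):
        self.active = True

    def feed(self, frame: bytes):
        # Frames arriving while the button is not pressed are dropped.
        if self.active:
            self.frames.append(frame)

    def release(self) -> bytes:
        """Deactivate the microphone and return everything captured during the press."""
        self.active = False
        captured = b"".join(self.frames)
        self.frames = []
        return captured


mic = PushToTalkMic()
mic.feed(b"ignored")   # button not pressed yet
mic.press()
mic.feed(b"ab")
mic.feed(b"cd")
voice_signal = mic.release()   # b"abcd"
```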

Voice recognition for a voice input through the microphone of the display device (100) or the microphone of the external device (200) may be performed by an external server. FIG. 3 is a diagram for explaining such an embodiment.

Referring to FIG. 3, the voice recognition system (2000) includes a display device (100) and a server (300).

As described with reference to FIG. 1, the display device (100) is a device that operates according to voice recognition results. As described above, the display device (100) can transmit to the server (300) a voice signal corresponding to a voice input through the microphone of the display device (100) or the microphone of the external device (200).

Together with the voice signal, the display device (100) can transmit to the server (300) information indicating the language on the basis of which the voice signal should be recognized (hereinafter, "language information"). Even for the same voice signal, the voice recognition result may differ depending on which language's voice recognition engine is used.

The server (300) can perform voice recognition for a plurality of different languages. The server (300) may include multiple voice recognition engines, one for each language. For example, the server (300) may include a Korean voice recognition engine, an English voice recognition engine, a Japanese voice recognition engine, and so on. When a voice signal and language information are received from the display device (100), the server (300) can perform voice recognition on the voice signal using the voice recognition engine corresponding to the language information.
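The engine selection on the server side amounts to a lookup keyed by the language information. In the sketch below the engines are stub functions that return fixed text, purely to show that the same signal can yield different results per language; the function name `recognize`, the language codes, and the stubs are hypothetical.

```python
def recognize(voice_signal: bytes, language: str, engines: dict) -> str:
    """Run the engine registered for the given language information."""
    try:
        engine = engines[language]
    except KeyError:
        raise ValueError(f"no voice recognition engine for {language!r}") from None
    return engine(voice_signal)


# Stub engines standing in for real per-language recognizers.
STUB_ENGINES = {
    "ko": lambda signal: "원",
    "en": lambda signal: "one",
}

same_signal = b"\x00\x01"
recognize(same_signal, "ko", STUB_ENGINES)  # "원"
recognize(same_signal, "en", STUB_ENGINES)  # "one"
```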

The server (300) then transmits the voice recognition result to the display device (100), and the display device (100) can perform an operation corresponding to the voice recognition result received from the server (300).

For example, if the text included in the voice recognition result received from the server (300) matches a text object displayed on the display device (100), the display device (100) can perform an operation related to that text object. For instance, if a web page contains a text object matching the text included in the voice recognition result, the display device (100) can display the web page at the URL address corresponding to that text object. However, this is only an example, and UI objects provided by various applications of the display device (100) can likewise be selected by voice recognition so that the corresponding operation is performed.
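A minimal sketch of the matching step for a web page, assuming the displayed text objects are available as a mapping from link text to URL. The case-insensitive containment test, the function name `find_link`, and the example.com URLs are illustrative assumptions.

```python
def find_link(recognized, links):
    """Return the URL of the first link whose text appears in the recognition result."""
    spoken = recognized.casefold()
    for label, url in links.items():
        if label.casefold() in spoken:
            return url
    return None


links = {
    "Sports": "https://example.com/sports",
    "Weather": "https://example.com/weather",
}
target = find_link("open weather", links)  # "https://example.com/weather"
```

When the recognition engine's language does not match the language of the link text, this lookup finds nothing, which is the limitation the numbered labels are meant to work around.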

Meanwhile, although FIG. 3 shows a single server (300), there may be a plurality of servers, each corresponding to one of a plurality of languages. For example, a server responsible for Korean voice recognition and a server responsible for English voice recognition may exist separately.

Meanwhile, in the example above, voice recognition is described as being performed in the server (300), which is a device separate from the display device (100). According to another example, however, the display device (100) may itself perform the function of the server (300). In other words, the display device (100) and the server (300) described above may be implemented as a single product.

FIG. 4 is a block diagram for explaining the configuration of the display device (100) according to an embodiment of the present disclosure.

The display device (100) includes a display (110) and a processor (120).

The display (110) may be implemented as, for example, an LCD (liquid crystal display), and in some cases as a CRT (cathode-ray tube), a PDP (plasma display panel), an OLED (organic light-emitting diode) display, a TOLED (transparent OLED) display, or the like. The display (110) may also be implemented as a touch screen capable of detecting a user's touch input.

프로세서(120)는 디스플레이장치(100)의 전반적인 동작을 제어하기 위한 구성이다. The processor (120) is configured to control the overall operation of the display device (100).

For example, the processor (120) may include a CPU, a RAM, a ROM, and a system bus. The ROM stores an instruction set for booting the system. Following the instructions stored in the ROM, the CPU copies the operating system (O/S) stored in the storage of the display device (100) into the RAM and executes it to boot the system. When booting is complete, the CPU can copy various applications stored in the storage into the RAM and execute them to perform various operations. Although the processor (120) is described above as including a single CPU, it may be implemented with a plurality of CPUs (or DSPs, SoCs, etc.).

프로세서(120)는 디스플레이(110)에 표시된 객체를 선택하기 위한 사용자 명령이 입력되면, 사용자 명령에 의해 선택된 객체와 연관된 동작을 수행할 수 있다. 여기서 객체는 선택이 가능한 어떠한 객체라도 될 수 있으며, 예를 들어, 하이퍼링크 또는 아이콘 등일 수 있다. 선택된 객체와 연관된 동작이란 예컨대 하이퍼링크에 연결된 페이지, 문서, 영상 등을 표시하는 동작, 아이콘에 대응하는 프로그램을 실행하는 동작 등일 수 있다.When a user command for selecting an object displayed on the display (110) is input, the processor (120) can perform an operation associated with the object selected by the user command. Here, the object can be any object that can be selected, for example, a hyperlink or an icon. The operation associated with the selected object can be, for example, an operation for displaying a page, document, image, etc. connected to the hyperlink, an operation for executing a program corresponding to an icon, etc.

객체를 선택하기 위한 사용자 명령은 예컨대, 디스플레이장치(100)와 연결된 다양한 입력 장치(ex. 마우스, 키보드, 터치패드 등)를 통해 입력되는 명령이거나, 사용자가 발화한 음성에 대응하는 음성 명령일 수 있다. A user command for selecting an object may be, for example, a command input through various input devices (e.g., mouse, keyboard, touchpad, etc.) connected to the display device (100), or a voice command corresponding to a voice spoken by the user.

도 4에 도시하진 않았지만 디스플레이장치(100)는 음성을 입력받기 위한 음성 수신부를 더 포함할 수 있다. 음성 수신부는 마이크를 포함하여 사용자가 발화한 음성을 직접 입력받아 음성 신호를 생성할 수 있고, 또는 외부 장치(200)로부터 전기적인 음성 신호를 수신할 수 있다. 후자의 경우 음성 수신부는 외부 장치(200)와 유선 또는 무선 통신을 수행하기 위한 통신부로 구현될 수 있다. 이와 같은 음성 수신부는 경우에 따라 디스플레이장치(100)에 포함되지 않을 수 있다. 예를 들어, 외부 장치(200)의 마이크를 통해 입력된 음성에 대응하는 음성 신호가 디스플레이장치(100)가 아닌 다른 장치를 거쳐 서버(300)로 전달되거나 혹은 외부 장치(200)로부터 직접적으로 서버(300)로 전달될 수 있고, 디스플레이장치(100)는 서버(300)로부터 음성 인식 결과만을 수신하는 형태로 구현될 수 있다.Although not shown in FIG. 4, the display device (100) may further include a voice receiving unit for receiving voice input. The voice receiving unit may include a microphone to directly receive a voice spoken by a user and generate a voice signal, or may receive an electric voice signal from an external device (200). In the latter case, the voice receiving unit may be implemented as a communication unit for performing wired or wireless communication with the external device (200). Such a voice receiving unit may not be included in the display device (100) depending on the case. For example, a voice signal corresponding to a voice input through a microphone of the external device (200) may be transmitted to the server (300) via a device other than the display device (100) or may be directly transmitted from the external device (200) to the server (300), and the display device (100) may be implemented in a form that receives only a voice recognition result from the server (300).

프로세서(120)는 디스플레이(110)에 표시된 텍스트 객체들 중, 기결정된 언어와 상이한 텍스트 객체에 대해선 숫자를 함께 표시하도록 디스플레이(110)를 제어할 수 있다. The processor (120) can control the display (110) to display numbers together with text objects that are different from a predetermined language among the text objects displayed on the display (110).

Here, the predetermined language means the language that is the basis of voice recognition (that is, the language of the voice recognition engine to be used for voice recognition). It can be set manually by the user or set automatically. In the manual case, for example, the language selected as the language in use (or system language) in a settings menu provided by the display device (100) can be set as the language that is the basis of voice recognition.

According to one embodiment in which the language that is the basis of voice recognition is set automatically, the processor (120) can identify the language used most in the text objects currently displayed on the display (110) and automatically set that language as the language that is the basis of voice recognition.

Specifically, the processor (120) can analyze the type of characters (e.g., Hangul or the Latin alphabet) included in each of the plurality of text objects currently displayed on the display (110), and set the language corresponding to the character type used most across those text objects as the language that is the basis of voice recognition.
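The character-type analysis described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the function names `script_of` and `dominant_language`, the two-language limitation (Hangul mapped to "ko", ASCII letters mapped to "en"), and the fallback default are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Optional


def script_of(ch: str) -> Optional[str]:
    """Map one character to a language code, or None for digits, spaces, etc.

    Assumption for this sketch: Hangul means Korean and ASCII letters mean English.
    """
    if "\uac00" <= ch <= "\ud7a3" or "\u1100" <= ch <= "\u11ff" or "\u3130" <= ch <= "\u318f":
        return "ko"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def dominant_language(text_objects: List[str], default: str = "en") -> str:
    """Return the language whose characters occur most often across all text objects."""
    counts: Counter = Counter()
    for text in text_objects:
        for ch in text:
            lang = script_of(ch)
            if lang is not None:
                counts[lang] += 1
    if not counts:
        return default  # no letters on screen: fall back to a default (assumption)
    return counts.most_common(1)[0][0]
```

For instance, a screen showing "뉴스 속보", "오늘의 날씨", and "Sports" contains nine Hangul characters and six Latin letters, so this sketch would pick Korean.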

또 다른 실시 예에 따르면, 프로세서(120)는 디스플레이(110)에 현재 표시된 객체들이 웹 페이지의 객체들이면, 해당 웹 페이지의 언어 정보에 대응되는 언어를 음성 인식의 기초가 되는 언어로서 설정할 수 있다. 웹 페이지의 언어 정보는 예컨대, HTML의 lang 속성에서 확인할 수 있다(예컨대, <html lang="en">).According to another embodiment, if the objects currently displayed on the display (110) are objects of a web page, the processor (120) may set a language corresponding to the language information of the web page as the language that serves as the basis for speech recognition. The language information of the web page can be checked, for example, in the lang attribute of HTML (e.g., <html lang="en">).
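As a rough illustration of reading the page language from the lang attribute, the sketch below uses Python's standard `html.parser` module. The class name `LangAttrParser`, the function `page_language`, and the choice to keep only the primary subtag (so that "ko-KR" becomes "ko") are assumptions of this example, not details taken from the disclosure.

```python
from html.parser import HTMLParser
from typing import Optional


class LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    # Keep only the primary language subtag, e.g. "ko-KR" -> "ko".
                    self.lang = value.split("-")[0].lower()


def page_language(html: str) -> Optional[str]:
    """Return the page's declared language, or None if no lang attribute is present."""
    parser = LangAttrParser()
    parser.feed(html)
    return parser.lang
```

When the page declares no language, a caller could fall back to the character-type analysis of the displayed text objects.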

음성 인식의 기초가 되는 언어가 설정되었으면, 프로세서(120)는 음성 인식의 기초가 되는 언어와 상이한 텍스트 객체에 대해서는 임의의 숫자가 함께 표시되도록 디스플레이(110)를 제어할 수 있다. 사용자는 디스플레이(110)에 표시된 임의의 숫자를 말함으로써 텍스트 객체를 선택할 수 있다. 또한, 이미지 객체 또한 음성으로 선택할 수 없기 때문에, 프로세서(120)는 이미지 객체에 대해서도 임의의 숫자가 함께 표시되도록 디스플레이(110)를 제어할 수 있다.Once the language that is the basis of voice recognition is set, the processor (120) can control the display (110) to display an arbitrary number together with a text object that is different from the language that is the basis of voice recognition. The user can select a text object by saying an arbitrary number displayed on the display (110). In addition, since an image object cannot be selected by voice, the processor (120) can control the display (110) to display an arbitrary number together with an image object as well.

프로세서(120)는 음성 인식의 기초가 되는 언어가 아닌 다른 언어로만 구성된 텍스트 객체에 대해선 음성 인식에 사용될 언어와 상이한 텍스트 객체라고 판단할 수 있다. 또한, 프로세서(120)는 2 이상의 언어로 구성된 텍스트 객체에 대해선, 음성 인식의 기초가 되는 언어의 포함 비율이 기설정된 비율 미만인 경우에 음성인식에 사용될 언어와 상이한 텍스트 객체라고 판단할 수 있다. 이에 대해서 도 5를 참고하여 좀 더 구체적으로 설명하도록 한다.The processor (120) can determine that a text object composed of only a language other than the language that is the basis of speech recognition is a text object different from the language to be used for speech recognition. In addition, the processor (120) can determine that a text object composed of two or more languages is a text object different from the language to be used for speech recognition if the inclusion ratio of the language that is the basis of speech recognition is less than a preset ratio. This will be described in more detail with reference to FIG. 5.

FIG. 5 illustrates a particular screen displayed on the display (110).

Referring to FIG. 5, a UI screen including a plurality of text objects (51 to 59) is displayed on the display (110). Assume that the language that is the basis of voice recognition is set to English. The processor (120) can control the display (110) so that arbitrary numbers (① to ⑥) are displayed together with the text objects (51 to 56) composed of a language other than English. The numbers (① to ⑥) can be displayed adjacent to the corresponding text objects (51 to 56). For the text objects (57, 58) composed of English, specific icons (57a, 58a) can be displayed nearby to inform the user that the text objects (57, 58) can be selected by speaking the text they contain. The icons (57a, 58a) may be rendered as "T" as shown in FIG. 5, but are not limited thereto and may take various forms, such as "Text".

2 이상의 언어로 구성된 텍스트 객체(59)에 대해선, 프로세서(120)는 영어의 포함 비율이 기설정된 비율(예컨대 50%) 미만인지 확인하여, 미만인 경우에 숫자를 함께 표시하도록 디스플레이(110)를 제어할 수 있다. 도 5에 도시한 텍스트 객체(59)는 한국어와 영어로 구성되어 있는데, 영어의 포함 비율이 기설정된 비율(예컨대 50%)을 넘으므로 숫자가 함께 표시되지 않는다. 대신 텍스트 객체에 포함된 텍스트를 발화함으로써 텍스트 객체가 선택이 가능함을 알리는 아이콘(59a)이 텍스트 객체(59)에 인접하여 표시될 수 있다.For a text object (59) composed of two or more languages, the processor (120) can check whether the inclusion ratio of English is less than a preset ratio (e.g., 50%) and, if so, control the display (110) to display numbers together. The text object (59) illustrated in FIG. 5 is composed of Korean and English, and since the inclusion ratio of English exceeds the preset ratio (e.g., 50%), numbers are not displayed together. Instead, an icon (59a) that notifies that the text object can be selected by speaking the text included in the text object can be displayed adjacent to the text object (59).
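The ratio test for mixed-language text objects can be sketched as below. This is a minimal sketch under stated assumptions: the ratio is computed per character (the disclosure does not say whether characters or words are counted), only Hangul and ASCII letters are recognized, and the names `script_of` and `needs_number_by_ratio` are hypothetical.

```python
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Assumption for this sketch: Hangul means "ko", ASCII letters mean "en"."""
    if "\uac00" <= ch <= "\ud7a3" or "\u1100" <= ch <= "\u11ff" or "\u3130" <= ch <= "\u318f":
        return "ko"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def needs_number_by_ratio(text: str, base_lang: str, threshold: float = 0.5) -> bool:
    """Return True if a number should be shown next to this text object.

    A number is shown when the share of base-language letters is below the
    preset ratio (50% here, following the example in the description).
    """
    scripts = [s for s in (script_of(ch) for ch in text) if s is not None]
    if not scripts:
        return True  # no letters at all: treated as not speakable (assumption)
    ratio = scripts.count(base_lang) / len(scripts)
    return ratio < threshold
```

With English as the base language, a label such as "음성 Voice recognition" (a made-up string for this example) has far more Latin letters than Hangul characters, so no number would be shown, which matches the behavior described for the text object (59).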

한편, 도 5에선 숫자가 "①"와 같은 형상인 것으로 도시되었으나, 숫자의 형상엔 제한이 없다. 예컨대 원형이 아닌 사각형 안에 "1"이 포함된 형태일 수도 있고, 단순히 "1"이라고만 표시될 수도 있다. 본 개시의 또 다른 실시 예에 따르면, 음성 인식의 기초가 되는 언어의 단어로 표시될 수 있는데, 음성 인식의 기초가 되는 언어가 영어라면 "one"이라고 표시될 수 있고, 음성 인식의 기초가 되는 언어가 스페인어라면 "uno"라고 표시될 수 있다. Meanwhile, in FIG. 5, the number is illustrated as having a shape like "①", but there is no limitation on the shape of the number. For example, it may be in a shape including "1" in a square instead of a circle, or it may simply be displayed as "1". According to another embodiment of the present disclosure, it may be displayed as a word of a language that is the basis of speech recognition. If the language that is the basis of speech recognition is English, it may be displayed as "one", and if the language that is the basis of speech recognition is Spanish, it may be displayed as "uno".

한편, 도 5에선 도시하지 않았으나, 숫자의 표시와 함께 "말하신 숫자에 대응하는 객체를 선택하실 수 있습니다"와 같이 숫자를 말할 것을 유도하는 문구가 추가적으로 디스플레이(110)에 표시될 수도 있다.Meanwhile, although not shown in FIG. 5, a phrase that encourages the user to say a number, such as “You can select an object corresponding to the number you said,” may additionally be displayed on the display (110) along with the number.

본 개시의 또 다른 실시 예에 따르면 프로세서(120)는 2 이상의 언어로 구성된 텍스트 객체에 대해선, 맨 앞의 단어의 언어가 음성인식에 사용될 언어와 다르면, 음성 인식의 기초가 되는 언어와 상이한 텍스트 객체라고 판단할 수 있다. 본 실시 예에 관해선 도 6을 참고하여 설명하도록 한다.According to another embodiment of the present disclosure, the processor (120) may determine that a text object composed of two or more languages is a text object whose language is different from the language that is the basis of speech recognition if the language of the first word is different from the language to be used for speech recognition. The present embodiment will be described with reference to FIG. 6.

FIG. 6 illustrates a particular screen displayed on the display (110).

도 6을 참고하면, 복수의 텍스트 객체(61 ~ 63)를 포함하는 UI 스크린이 디스플레이(110)에 표시되어 있다. 음성인식에 사용될 언어가 한국어로 설정되었다고 가정하도록 한다. 프로세서(120)는 2 이상의 언어로 구성된 텍스트 객체(61)에 대해선, 맨 앞의 단어 "AAA"의 언어가 음성 인식의 기초가 되는 언어인 한국어가 아닌 영어이므로 음성 인식의 기초가 되는 언어와 상이한 텍스트 객체라고 판단할 수 있다. 따라서, 프로세서(120)는 숫자(①)가 텍스트 객체(61)와 함께 표시되도록 디스플레이(110)를 제어할 수 있다.Referring to FIG. 6, a UI screen including a plurality of text objects (61 to 63) is displayed on the display (110). Let us assume that the language to be used for voice recognition is set to Korean. The processor (120) can determine that the text object (61) composed of two or more languages is a text object different from the language that is the basis of voice recognition because the language of the first word "AAA" is English, not Korean, which is the language that is the basis of voice recognition. Accordingly, the processor (120) can control the display (110) so that a number (①) is displayed together with the text object (61).

도 6을 참고하여 설명한 실시 예에 따르면, 2 이상의 언어로 구성된 텍스트 객체에 음성 인식의 기초가 되는 언어가 기 설정된 비율 이상으로 포함되어 있더라도 맨 앞 단어가 음성 인식의 기초가 되는 언어와 다르면 숫자를 표시한다. 반대로, 2 이상의 언어로 구성된 텍스트 객체에 음성 인식의 기초가 되는 언어가 기 설정된 비율 미만으로 포함되어 있더라도 맨 앞 단어가 음성 인식의 기초가 되는 언어와 같으면 숫자를 표시하지 않는다. 이는, 사용자가 텍스트 객체를 선택하기 위해 텍스트 객체의 가장 맨앞에 존재하는 단어를 말할 가능성이 높기 때문이다.According to an embodiment described with reference to FIG. 6, even if a text object composed of two or more languages includes a language that is the basis of speech recognition in a preset ratio or more, if the first word is different from the language that is the basis of speech recognition, a number is displayed. Conversely, even if a text object composed of two or more languages includes a language that is the basis of speech recognition in a ratio less than the preset ratio, if the first word is the same as the language that is the basis of speech recognition, a number is not displayed. This is because a user is likely to say the word that exists at the very front of a text object in order to select the text object.
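The first-word rule of this embodiment can be sketched as follows. The sketch assumes that the language of a word is decided by its first letter and that only Hangul and ASCII letters are recognized; the names `script_of` and `needs_number_by_first_word` are hypothetical and not taken from the disclosure.

```python
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Assumption for this sketch: Hangul means "ko", ASCII letters mean "en"."""
    if "\uac00" <= ch <= "\ud7a3" or "\u1100" <= ch <= "\u11ff" or "\u3130" <= ch <= "\u318f":
        return "ko"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def needs_number_by_first_word(text: str, base_lang: str) -> bool:
    """Return True if the first word is not in the base language.

    Words without any letters (e.g. "1." or "-") are skipped, and the language
    of a word is taken from its first letter (both are assumptions).
    """
    for word in text.split():
        for ch in word:
            lang = script_of(ch)
            if lang is not None:
                return lang != base_lang
    return True  # no letters at all: treated as not speakable (assumption)
```

Note that this rule ignores the overall language ratio: "AAA 가나다 라마바" gets a number when the base language is Korean even though most of its characters are Hangul.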

한편, 본 개시의 또 다른 실시 예에 따르면, 이미지 객체 또한 음성으로 선택할 수 없기 때문에, 이미지 객체에도 숫자가 표시될 수 있다. 본 실시 예에 대해선 이하 도 7을 참고하여 설명한다.Meanwhile, according to another embodiment of the present disclosure, since image objects cannot be selected by voice, numbers can also be displayed on image objects. This embodiment will be described below with reference to FIG. 7.

FIG. 7 illustrates a particular screen displayed on the display (110).

도 7을 참고하면, 제1 이미지 객체(71), 제2 이미지 객체(72), 제3 이미지 객체(74), 제1 텍스트 객체(73) 및 제2 텍스트 객체(75)가 디스플레이(110)에 표시되어 있다. 프로세서(120)는 제1 이미지 객체(71)와 함께 숫자(①)를 표시하도록 디스플레이(110)를 제어할 수 있다.Referring to FIG. 7, a first image object (71), a second image object (72), a third image object (74), a first text object (73), and a second text object (75) are displayed on a display (110). The processor (120) can control the display (110) to display a number (①) together with the first image object (71).

Meanwhile, according to another embodiment of the present disclosure, when a plurality of objects displayed on the display (110) have URL links, the processor (120) can compare the URL links of those objects. If some objects share the same URL link and none of them can be selected by voice recognition, the processor (120) can control the display (110) to display a number for only one of them. If any one of them can be selected by voice recognition, the processor (120) can control the display (110) not to display a number for those objects.

More specifically, when a plurality of objects that cannot be selected by voice recognition (that is, text objects in a language different from the language that is the basis of voice recognition, or image objects) are displayed on the display (110) and link to the same URL address, a number may be displayed on only one of them. Referring to FIG. 7, the second image object (72) cannot be selected by voice, and the first text object (73) is composed of English, which differs from Korean, the language that is the basis of voice recognition. Neither object can therefore be selected by voice, but because both lead to the same URL address when selected, the number (②) may be displayed only on the second image object (72). Alternatively, the number may be displayed on the first text object (73) instead of the second image object (72). This minimizes the number of numbers displayed on the display (110).

To minimize the number of numbers displayed on the display (110), according to another embodiment of the present disclosure, when a plurality of objects having the same URL address are displayed on the display (110) and any one of them is a text object in the language that is the basis of voice recognition, no number is displayed for any of them. Referring to FIG. 7, the processor (120) compares the URL address of the third image object (74) with the URL address of the second text object (75). If the two are determined to be the same and the second text object (75) is determined to be a text object in Korean, the language that is the basis of voice recognition, the processor (120) controls the display (110) not to display a number on the third image object (74).
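The URL-based rule for limiting the number of displayed numbers can be sketched as below. This is an illustrative sketch only: the `UiObject` fields, the choice to label the first object of each group, and the example.com URLs are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UiObject:
    obj_id: int             # reference numeral used in the figure
    url: str                # link target of the object
    voice_selectable: bool  # True for text in the base language


def assign_numbers(objects: List[UiObject]) -> Dict[int, int]:
    """Return {obj_id: displayed number} for the objects that get a number."""
    groups: Dict[str, List[UiObject]] = {}
    for obj in objects:
        groups.setdefault(obj.url, []).append(obj)

    labels: Dict[int, int] = {}
    next_number = 1
    for obj in objects:
        group = groups[obj.url]
        # If any object with this URL can be spoken, the whole group needs no number.
        if any(o.voice_selectable for o in group):
            continue
        # Otherwise only one object per URL is labeled (the first one, by assumption).
        if group[0] is obj:
            labels[obj.obj_id] = next_number
            next_number += 1
    return labels


# Objects of FIG. 7 with Korean as the base language; the URLs are made up.
FIG7 = [
    UiObject(71, "https://example.com/a", False),  # first image object
    UiObject(72, "https://example.com/b", False),  # second image object
    UiObject(73, "https://example.com/b", False),  # English text, same URL as 72
    UiObject(74, "https://example.com/c", False),  # third image object
    UiObject(75, "https://example.com/c", True),   # Korean text, same URL as 74
]
```

Applied to `FIG7`, the sketch labels only objects 71 and 72, which corresponds to the numbers ① and ② described for FIG. 7.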

사용자가 발화한 음성의 인식 결과가 디스플레이(110)에 표시된 특정 텍스트를 포함하면, 프로세서(120)는 해당 텍스트에 대응하는 텍스트 객체와 관련한 동작을 수행할 수 있다. 도 5를 참고하여 설명하자면, 사용자가 "Voice recognition"이라고 말하면, 프로세서(120)는 텍스트 객체(59)에 대응하는 URL 주소의 페이지를 표시하도록 디스플레이(110)를 제어할 수 있다.If the recognition result of the voice spoken by the user includes a specific text displayed on the display (110), the processor (120) can perform an operation related to a text object corresponding to the text. Referring to FIG. 5, if the user says “Voice recognition,” the processor (120) can control the display (110) to display a page of a URL address corresponding to the text object (59).

Meanwhile, according to one embodiment of the present disclosure, if the recognition result of the voice spoken by the user includes text that is common to two or more of the text objects displayed on the display (110), the processor (120) can display a number on each of those text objects and, when the user speaks one of the displayed numbers, perform an operation related to the text object corresponding to that number. Referring to FIG. 5, if the recognition result of the voice spoken by the user includes "Speech recognition", the processor (120) searches the text objects displayed on the screen for those that include "Speech recognition". If a plurality of text objects (57, 58) are found, the processor (120) can control the display (110) to display arbitrary numbers next to the text objects (57, 58). For example, the number ⑦ may be displayed next to the text object (57) and the number ⑧ next to the text object (58), and the user can select the text object (57) by speaking the number "7".

If the recognition result of the voice spoken by the user includes a number displayed on the display (110), the processor (120) can perform an operation related to the text object or image object corresponding to that number. Referring to FIG. 6, if the user says "일" (Korean for "one"), the processor (120) can control the display (110) to display the page of the URL address corresponding to the text object (61).
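The selection flow described above (a spoken number, a unique text match, or an ambiguous text match that triggers new numbers) can be sketched as follows. The function name `resolve_utterance`, the tuple return values, the whitespace tokenization, and the sample object texts are assumptions of this sketch; the disclosure does not specify how matching is implemented.

```python
from typing import Dict, Tuple


def resolve_utterance(recognized: str,
                      texts: Dict[int, str],
                      labels: Dict[str, int]) -> Tuple[str, object]:
    """Decide what a recognition result refers to.

    texts  : {object id: displayed text} for voice-selectable text objects
    labels : {spoken number: object id} for numbers currently on screen
    Returns ("select", object id), ("disambiguate", {new number: object id}),
    or ("none", None).
    """
    # 1) A displayed number in the utterance selects its object directly.
    for word in recognized.lower().split():
        if word in labels:
            return ("select", labels[word])

    # 2) Otherwise look for text objects containing the recognized text.
    needle = recognized.strip().lower()
    matches = [oid for oid, text in texts.items() if needle and needle in text.lower()]
    if len(matches) == 1:
        return ("select", matches[0])
    if len(matches) > 1:
        # 3) Ambiguous: hand out fresh numbers after the ones already shown.
        start = max((int(k) for k in labels if k.isdigit()), default=0) + 1
        return ("disambiguate", {str(start + i): oid for i, oid in enumerate(matches)})
    return ("none", None)


# Hypothetical contents for the FIG. 5 screen: numbers 1 to 6 already label objects 51 to 56.
LABELS = {str(n): 50 + n for n in range(1, 7)}
TEXTS = {57: "Speech recognition basics",
         58: "Speech recognition on TV",
         59: "음성 Voice recognition"}
```

With these sample values, saying "Speech recognition" yields the new numbers 7 and 8 for objects 57 and 58, and a follow-up utterance of "7" then selects object 57.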

The voice spoken by the user may be input through the microphone of the display device (100) or through the microphone of the external device (200). In the latter case, the display device (100) may include a communication unit for communicating with the external device (200) that includes the microphone, and the communication unit may receive a voice signal corresponding to the voice input through the microphone of the external device (200). If the recognition result for the voice signal received from the external device (200) through the communication unit includes a number displayed on the display (110), the processor (120) can perform an operation related to the text object corresponding to that number. Referring to FIG. 6, when the user says "일" (Korean for "one") into the microphone of the external device (200), the external device (200) transmits a voice signal to the display device (100), and the processor (120) can control the display (110) to display the page of the URL address corresponding to the text object (61) based on the voice recognition result for the received voice signal.

한편, 텍스트 또는 이미지 객체에 대응하여 표시된 숫자는 일정 기간 동안만 표시될 수 있다. 일 실시 예에 따르면, 프로세서(120)는 외부장치(200)에서 특정 버튼의 선택에 대응하는 신호가 수신되는 동안 숫자들을 표시하도록 디스플레이(110)를 제어할 수 있다. 즉, 외부장치(200)의 특정버튼을 사용자가 누르고 있는 동안에만 숫자가 표시될 수 있다. 여기서 특정 버튼은 예컨대, 도 2에서 설명한 외부장치(200)의 마이크 버튼(210)일 수 있다.Meanwhile, the numbers displayed in response to the text or image object may be displayed only for a certain period of time. According to one embodiment, the processor (120) may control the display (110) to display the numbers while a signal corresponding to the selection of a specific button is received from the external device (200). That is, the numbers may be displayed only while the user presses a specific button of the external device (200). Here, the specific button may be, for example, the microphone button (210) of the external device (200) described in FIG. 2.

또 다른 실시 예에 따르면, 프로세서(120)는 디스플레이장치(100)의 마이크를 통해 입력된 음성이 기 설정된 키워드(예컨대, "Hi TV")를 포함하면 숫자들을 표시하고, 디스플레이장치(100)의 마이크를 통해 음성이 미입력되는 상태로 기 설정된 시간이 경과하면 표시된 숫자들을 제거할 수 있다.According to another embodiment, the processor (120) may display numbers when a voice input through the microphone of the display device (100) includes a preset keyword (e.g., “Hi TV”), and remove the displayed numbers when a preset period of time has elapsed without a voice being input through the microphone of the display device (100).

Meanwhile, although the above-described embodiments describe numbers being displayed, the labels do not have to be numbers. Anything the user can see and read aloud (a word with or without meaning) may be used. For example, a, b, c, and so on may be displayed instead of 1, 2, 3, and so on.

According to another embodiment of the present disclosure, if a web page displayed on the display (110) has a search window, the user can easily perform a search by uttering the word to be searched for together with a specific keyword that triggers the search function. For example, if the web page displayed on the display (110) has a search window, simply saying "○○○ 검색" ("○○○ search") or "검색 ○○○" ("search ○○○") causes the search results for "○○○" to be displayed on the display (110).

To this end, the processor (120) can detect a search word input window in the web page displayed on the display (110). Specifically, the processor (120) can search the objects that make up the displayed web page for objects that accept input. In HTML, an input tag is such an object. An input tag has various attributes, among which the type attribute defines the nature of the input. If the type is "search", the object is clearly a search word input window.

다만, 타입이 "text"인 객체의 경우엔 검색어 입력창인지 여부를 바로 판단할 수 없다. 일반적인 입력 객체들도 텍스트 타입(text type)을 가지고 있기 때문에 해당 객체가 검색어 입력창인지 일반 입력 창인지 구분할 수 없기 때문이다. 따라서, 이 경우엔 검색어 입력창인지 여부를 판단하는 별도의 과정이 필요하다.However, in the case of objects whose type is "text", it is not possible to immediately determine whether it is a search input window. This is because general input objects also have a text type, so it is not possible to distinguish whether the object is a search input window or a general input window. Therefore, in this case, a separate process is required to determine whether it is a search input window.

타입이 "text"인 객체의 경우, 검색어 입력창인지 여부를 판단하기 위해, 해당 객체의 추가적인 속성(attributes)에 대한 정보를 참고하게 된다. title이나 aria-label 에 "검색" 키워드가 있는 경우 해당 객체를 검색어 입력창이라고 판단할 수 있다.For objects whose type is "text", information about additional attributes of the object is referenced to determine whether it is a search input box. If the title or aria-label contains the keyword "search", the object can be determined to be a search input box.
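The detection of search word input windows described above can be sketched with Python's standard `html.parser` module. This is a sketch under stated assumptions: the class name `SearchBoxFinder` is hypothetical, the English hint "search" is added alongside the "검색" keyword named in the description, and an input tag without a type attribute is treated as a text input.

```python
from html.parser import HTMLParser
from typing import Dict, List

# "검색" comes from the description; "search" is an extra hint assumed for this sketch.
SEARCH_HINTS = ("검색", "search")


class SearchBoxFinder(HTMLParser):
    """Collects the attributes of <input> tags that look like search boxes."""

    def __init__(self) -> None:
        super().__init__()
        self.search_inputs: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = {name: (value or "") for name, value in attrs}
        input_type = attr.get("type", "text").lower()
        if input_type == "search":
            # type="search" identifies a search box directly.
            self.search_inputs.append(attr)
        elif input_type == "text":
            # A plain text input needs a further hint in title or aria-label.
            hint_text = (attr.get("title", "") + " " + attr.get("aria-label", "")).lower()
            if any(hint in hint_text for hint in SEARCH_HINTS):
                self.search_inputs.append(attr)


def find_search_inputs(html: str) -> List[Dict[str, str]]:
    finder = SearchBoxFinder()
    finder.feed(html)
    return finder.search_inputs
```

A real browser would inspect the rendered DOM rather than re-parse the markup, so this only illustrates the attribute checks.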

그리고 프로세서(120)는 사용자가 발화한 음성의 인식 결과에 특정 키워드가 포함되어 있는지 판단한다. 여기서 특정 키워드는 "검색", "찾아" 등일 수 있다. 특정 키워드가 포함되어 있는 것으로 판단되면, 프로세서(120)는 사용자의 의도를 보다 정확히 판단하기 위해 상기 특정 키워드의 위치를 확인하다. 상기 특정 키워드의 앞 또는 뒤에 적어도 하나의 단어가 존재하는 경우라면 사용자의 의도가 그 적어도 하나의 단어를 검색하고자 하는 의도일 가능성이 높다. 만약 음성 인식 결과에 오직 "검색" 또는 "찾아"와 같은 특정 키워드만 포함된 경우라면 사용자가 검색하고자 하는 의도가 아닐 확률이 높다.And the processor (120) determines whether a specific keyword is included in the recognition result of the voice spoken by the user. Here, the specific keyword may be "search", "find", etc. If it is determined that the specific keyword is included, the processor (120) checks the location of the specific keyword in order to more accurately determine the user's intention. If at least one word exists before or after the specific keyword, it is highly likely that the user's intention is to search for the at least one word. If the voice recognition result only includes a specific keyword such as "search" or "find", it is highly likely that the user's intention is not to search.

이와 같은 사용자의 의도 판단 과정은 디스플레이장치(100)에서 수행될 수 있고, 서버(300)에서 수행되어 그 결과를 디스플레이장치(100)에 제공하는 것도 가능하다.This process of judging the user's intention can be performed on the display device (100), and can also be performed on the server (300) and the result provided to the display device (100).

When the user's intent to search has been determined, the processor (120) selects the words other than the specific keyword as the search word and performs a search by entering the selected search word into the search word input window detected in the manner described above. For example, as shown in FIG. 8, when a web page including a search word input window (810) is displayed on the display (110), the processor (120) detects the search word input window (810). When the user then says "강아지 검색" ("puppy search"), the processor (120) selects "강아지" ("puppy") as the search word from the voice recognition result and enters it into the detected search word input window (810) to perform the search.
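The intent check and search word selection can be sketched as below. The function name `extract_query` and the exact, whitespace-separated keyword matching are simplifying assumptions; inflected forms of the keywords, for example, are not handled here.

```python
from typing import Optional

# Trigger keywords named in the description.
SEARCH_KEYWORDS = ("검색", "찾아")


def extract_query(recognized: str) -> Optional[str]:
    """Return the search word if the utterance expresses a search intent, else None.

    A search intent is assumed only when a trigger keyword is present AND at
    least one other word appears before or after it.
    """
    words = recognized.split()
    rest = [w for w in words if w not in SEARCH_KEYWORDS]
    if len(rest) == len(words):
        return None  # no trigger keyword in the utterance
    if not rest:
        return None  # only the keyword was spoken: probably not a search
    return " ".join(rest)
```

For the utterance "강아지 검색" this returns "강아지", and the same result is obtained when the keyword comes first, as in "검색 강아지".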

한편, 디스플레이(110)에 표시된 웹 페이지에서 검색어 입력창을 검출하는 동작은 음성 인식 결과에 특정 키워드가 포함되어 있음이 판단된 이후에 수행될 수 있고, 또는 그 이전에 미리 수행되는 것도 가능하다.Meanwhile, the operation of detecting a search word input window in a web page displayed on the display (110) may be performed after it is determined that a specific keyword is included in the voice recognition result, or may be performed in advance.

FIG. 9 is a diagram for explaining another example of a search word input method, namely how a search is performed when one web page contains a plurality of search word input windows.

FIG. 9 illustrates a case in which one web page contains two search windows. The first search word input window (910) is for news search, and the second search word input window (920) is for stock search. Based on information about where objects are placed and information about the layout of the current screen, the processor (120) performs the search using the search word input window that is displayed at the time the user utters the voice including the search word. For example, if the user utters a voice including a search word and the specific keyword while the first search word input window (910) is displayed on the display (110), the processor (120) enters the search word into the first search word input window (910). If the page has been scrolled down so that the second search word input window (920) is displayed on the display (110) when the user utters a voice including a search word and the specific keyword, the processor (120) can enter the search word into the second search word input window (920). That is, when one web page contains multiple search word input windows, the search can be performed using the search word input window currently visible on the screen.
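Choosing the search word input window that is currently on screen can be sketched as a simple viewport test. The `Box` fields, the pixel coordinates, and the rule that a box must be fully inside the viewport are assumptions of this sketch; the disclosure only states that placement and layout information are used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    name: str    # e.g. "news" for window 910, "stock" for window 920
    top: int     # distance from the top of the page, in pixels
    height: int  # height of the input window, in pixels


def visible_search_box(boxes: List[Box], scroll_y: int, viewport_height: int) -> Optional[Box]:
    """Return the first search box that lies fully inside the visible area."""
    bottom_edge = scroll_y + viewport_height
    for box in boxes:
        if box.top >= scroll_y and box.top + box.height <= bottom_edge:
            return box
    return None


# Hypothetical layout for FIG. 9: news search near the top, stock search further down.
BOXES = [Box("news", top=100, height=40), Box("stock", top=1500, height=40)]
```

With a 1080-pixel viewport, the sketch picks the news box at scroll position 0 and the stock box after scrolling down to position 1000.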

Voice control is performed based on the screen shown on the display (110). That is, by default, the function corresponding to a voice command is performed using the application whose screen is being displayed on the display (110). However, if the input voice command does not match any object included in the currently displayed screen, or calls for a function that the application displaying the current screen does not provide, another application may be executed to perform the function corresponding to the voice command.

For example, if the currently running application is a web browsing application and the voice uttered by the user does not match any object in the web page displayed by the web browsing application, the processor (120) may execute another preset application to perform a search function corresponding to the uttered voice. Here, the other preset application is an application that provides a search function, for example, an application that uses the Google™ search engine to provide search results for the text corresponding to the voice, or an application that provides search results for VOD content corresponding to that text. Before executing such another application, the processor (120) may display a UI for obtaining the user's consent, such as "There are no results matching ○○○ on the current screen. Would you like to search for ○○○ on the Internet?", and may execute an Internet search application or the like to provide search results after the user's consent is entered through the UI.

The display device (100) may include a voice processing unit that processes the voice recognition result received from the server (300), and an application unit that executes applications installed on the display device (100). The voice processing unit provides the voice recognition result received from the server (300) to the application unit. If the recognition result is provided while a first application of the application unit is running and its screen is displayed on the display (110), the first application may perform the operations described above based on the voice recognition result provided by the voice processing unit. For example, it may search for a text or image object corresponding to a number included in the voice recognition result, search for a text object corresponding to a word included in the voice recognition result, or, when the voice recognition result includes "search", enter the keyword into the search window and execute the search. If the first application has no operation to perform using the voice recognition result, for example because there is no text or image object corresponding to the voice recognition result or there is no search window, the first application notifies the voice processing unit, and the voice processing unit may control the application unit to execute a second application capable of performing an operation related to the voice recognition result. For example, the second application is an application that provides search results for a specific search word. The application unit may execute the second application to provide search results that use the text included in the voice recognition result as the search word.
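The hand-off between the first and second applications can be sketched as follows. This is only an illustration under stated assumptions: each application is modeled as a function that returns a description of the operation it performed, or `None` when it has nothing to do, and the "search" keyword is assumed to appear as a prefix of the recognized text. The names `make_browser_app`, `search_app`, and `dispatch` are hypothetical.

```python
from typing import Callable, List, Optional

# A handler returns a description of what it did, or None when it has no
# operation to perform for the given recognition result.
Handler = Callable[[str], Optional[str]]


def make_browser_app(visible_labels: List[str], has_search_box: bool) -> Handler:
    """Build a stand-in for the first application (a web browsing application)."""
    def handle(text: str) -> Optional[str]:
        # Try to match the recognized text against objects on the screen.
        for label in visible_labels:
            if text and text.lower() in label.lower():
                return f"open:{label}"
        # Assumed convention: "search <keyword>" targets the page's search window.
        if has_search_box and text.lower().startswith("search "):
            return f"page-search:{text[len('search '):]}"
        return None  # corresponds to notifying the voice processing unit
    return handle


def search_app(text: str) -> Optional[str]:
    """Stand-in for the second application, which searches for the text."""
    return f"external-search:{text}"


def dispatch(text: str, first_app: Handler, second_app: Handler) -> Optional[str]:
    """Offer the result to the first application, then fall back to the second."""
    result = first_app(text)
    if result is None:
        result = second_app(text)
    return result
```

In this sketch the consent UI described earlier is omitted; a fuller version would ask the user before calling the second application.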

FIG. 10 is a block diagram illustrating the configuration of the display device (100) when it is implemented as a TV. In describing FIG. 10, descriptions of components that overlap with those described with reference to FIG. 4 are omitted.

Referring to FIG. 10, the display device (100) may be implemented as, for example, an analog TV, a digital TV, a 3D TV, a smart TV, an LED TV, an OLED TV, a plasma TV, a monitor, a curved TV having a screen with a fixed curvature, a flexible TV having a screen with a fixed curvature, a bended TV having a screen with a fixed curvature, and/or a curvature-variable TV in which the curvature of the current screen can be changed by a received user input, but is not limited thereto.

The display device (100) includes a display (110), a processor (120), a tuner (130), a communication unit (140), a microphone (150), an input/output unit (160), an audio output unit (170), and a storage unit (180).

The tuner (130) may tune to and select only the frequency of the channel that the display device (100) is to receive, from among many radio wave components, by amplifying, mixing, and resonating a broadcast signal received by wire or wirelessly. The broadcast signal may include video, audio, and additional data (e.g., an EPG (Electronic Program Guide)).

The tuner (130) may receive video, audio, and data in the frequency band corresponding to the channel number that corresponds to a user input.

The tuner (130) may receive broadcast signals from various sources, such as terrestrial broadcasting, cable broadcasting, or satellite broadcasting. The tuner (130) may also receive broadcast signals from sources such as analog broadcasting or digital broadcasting.

The tuner (130) may be implemented as an all-in-one unit integrated with the display device (100), or as a separate device having a tuner unit electrically connected to the display device (100) (e.g., a set-top box, or a tuner connected to the input/output unit (160)).

The communication unit (140) is a component that communicates with various types of external devices according to various types of communication methods. The communication unit (140) may be connected to an external device through a local area network (LAN) or the Internet, and may be connected to an external device by wireless communication (for example, Z-wave, 4LoWPAN, RFID, LTE D2D, BLE, GPRS, Weightless, EDGE, Zigbee, ANT+, NFC, IrDA, DECT, WLAN, Bluetooth, Wi-Fi, Wi-Fi Direct, GSM, UMTS, LTE, WiBRO, etc.). The communication unit (140) includes various communication chips such as a Wi-Fi chip (141), a Bluetooth chip (142), an NFC chip (143), and a wireless communication chip (144). The Wi-Fi chip (141), the Bluetooth chip (142), and the NFC chip (143) perform communication using the Wi-Fi method, the Bluetooth method, and the NFC method, respectively. The wireless communication chip (144) refers to a chip that performs communication according to various communication standards such as IEEE, Zigbee, 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), and LTE (Long Term Evolution). In addition, the communication unit (140) includes an optical receiving unit (145) capable of receiving a control signal (e.g., an IR pulse) from the external device (200).

The processor (120) may transmit a voice signal and language information (information on the language that serves as the basis of voice recognition) to the server (300) through the communication unit (140). When the server (300) transmits the result of the voice recognition it performed on the voice signal using the voice recognition engine of the language corresponding to the language information, the processor (120) may receive that result through the communication unit (140).

The microphone (150) may receive a voice uttered by a user and generate a voice signal corresponding to the received voice. The microphone (150) may be implemented integrally with the display device (100) or separately from it. A separate microphone (150) may be electrically connected to the display device (100).

If the display device (100) has no microphone, the display device (100) may receive, from the external device (200) through the communication unit (140), a voice signal corresponding to a voice input through the microphone of the external device (200). The communication unit (140) may receive the voice signal from the external device (200) by a communication method such as Wi-Fi or Bluetooth.

The input/output unit (160) is a component for connecting to an external device. The input/output unit (160) may include at least one of an HDMI (High-Definition Multimedia Interface) input port (161), a component input jack (162), and a USB port (163). In addition to those illustrated, the input/output unit (160) may include at least one of ports such as RGB, DVI, HDMI, DP, and Thunderbolt.

The audio output unit (170) is a component for outputting audio. For example, it may output audio included in a broadcast signal received through the tuner (130), audio input through the communication unit (140), the input/output unit (160), or the like, or audio included in an audio file stored in the storage unit (180). The audio output unit (170) may include a speaker (171) and a headphone output terminal (172).

The storage unit (180) may include various application programs, data, and software modules for driving and controlling the display device (100) under the control of the processor (120). For example, the storage unit (180) may include a web parsing module for parsing web content data received through the Internet, a JavaScript module, a graphics processing module, a voice recognition result processing module, and an input processing module.

When voice recognition is performed by the display device (100) itself rather than by the external server (300), the storage unit (180) may store a voice recognition module that includes voice recognition engines for various languages.

The storage unit (180) may store data for configuring the various UI screens provided on the display (110). In addition, the storage unit (180) may store data for generating control signals corresponding to various user interactions.

The storage unit (180) may be implemented as a nonvolatile memory, a volatile memory, a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. Meanwhile, the storage unit (180) may be implemented not only as a storage medium within the display device (100) but also as an external storage medium, for example, a micro SD card, a USB memory, or a web server accessed over a network.

The processor (120) controls the overall operation of the display device (100) and the signal flow between the internal components of the display device (100), and processes data.

The processor (120) includes a RAM (121), a ROM (122), a CPU (123), and a bus (124). The RAM (121), the ROM (122), the CPU (123), and the like may be connected to each other through the bus (124). The processor (120) may be implemented as an SoC (System on Chip).

The CPU (123) accesses the storage unit (180) and performs booting using the O/S stored in the storage unit (180). It then performs various operations using the various programs, content, data, and the like stored in the storage unit (180).

The ROM (122) stores a set of instructions for system booting and the like. When a turn-on command is input and power is supplied, the CPU (123) copies the O/S stored in the storage unit (180) to the RAM (121) according to the instructions stored in the ROM (122), and executes the O/S to boot the system. When booting is complete, the CPU (123) copies the various application programs stored in the storage unit (180) to the RAM (121) and executes the copied application programs to perform various operations.

The processor (120) may perform various operations using the modules stored in the storage unit (180). For example, the processor (120) may parse and process web content data received through the Internet and display the overall layout of the content and each object on the display (110).

When the voice recognition function is activated, the processor (120) may analyze the objects of the web content to find objects that can be controlled by voice, preprocess information such as the position of each object, the operation related to the object, and whether the object contains text, and store the preprocessing results in the storage unit (180).
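As a rough illustration of this preprocessing step, the sketch below turns parsed page objects into records holding the three pieces of information named above: position, related operation, and whether text is present. The record types `PageObject` and `VoiceTarget`, and the rule that an object is voice-controllable only if it has an associated action, are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageObject:
    kind: str         # e.g. "link", "image", "paragraph"
    x: int            # horizontal position on the page
    y: int            # vertical position on the page
    text: str = ""    # visible text, if any
    action: str = ""  # e.g. a target URL; empty when nothing can be triggered


@dataclass
class VoiceTarget:
    position: Tuple[int, int]
    action: str
    has_text: bool
    text: str


def preprocess(objects: List[PageObject]) -> List[VoiceTarget]:
    """Keep only objects that can be triggered and record what matching needs."""
    targets = []
    for obj in objects:
        if not obj.action:
            continue  # assumed rule: no action means not voice-controllable
        label = obj.text.strip()
        targets.append(VoiceTarget((obj.x, obj.y), obj.action, bool(label), label))
    return targets
```

Storing the result once, when voice recognition is activated, lets later matching work on a small list instead of re-parsing the page for every utterance.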

Based on the preprocessed object information, the processor (120) may control the display (110) so that the objects that can be controlled (selected) by voice are displayed in an identifiable manner. For example, the processor (120) may control the display (110) to display voice-controllable objects in a color different from that of other objects.

The processor (120) may recognize the voice input through the microphone (150) as text using a voice recognition engine. In this case, the processor (120) uses the voice recognition engine of the predetermined language (the language set as the basis of voice recognition). Alternatively, the processor (120) may send the voice signal and information about the language that is the basis of voice recognition to the server (300) and receive text from the server (300) as the voice recognition result.
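The two recognition paths can be sketched as a single function. This is a hypothetical arrangement: the disclosure presents on-device and server recognition as alternatives, whereas the sketch assumes the device tries a local engine for the predetermined language first and otherwise sends the voice signal together with the language information to a server. The function name `recognize` and both callable types are illustrative only.

```python
from typing import Callable, Dict

LocalEngine = Callable[[bytes], str]        # audio -> recognized text
ServerClient = Callable[[bytes, str], str]  # (audio, language) -> recognized text


def recognize(audio: bytes, language: str,
              local_engines: Dict[str, LocalEngine],
              send_to_server: ServerClient) -> str:
    """Recognize speech in the predetermined language, locally if possible."""
    engine = local_engines.get(language)
    if engine is not None:
        return engine(audio)
    # No local engine for this language: pass the audio and the language
    # information to the server, which picks the matching engine.
    return send_to_server(audio, language)
```

In both paths the language information travels with the audio, which is what allows the same utterance to be interpreted by the engine of the language set as the basis of voice recognition.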

The processor (120) may then search the preprocessed objects for an object corresponding to the voice recognition result, and indicate at the position of the found object that the object has been selected. For example, the processor (120) may control the display (110) to highlight the object selected by voice. Based on the preprocessed object information, the processor (120) may perform the operation related to the object corresponding to the voice recognition result and output the result through the display (110) or the audio output unit (170).

FIG. 11 is a flowchart for explaining a control method of the display device (100) according to an embodiment of the present disclosure. The flowchart illustrated in FIG. 11 may consist of operations processed in the display device (100) described in this specification. Therefore, even where details are omitted below, the description given of the display device (100) also applies to the flowchart illustrated in FIG. 11.

Referring to FIG. 11, the display device (100) first displays a UI screen including a plurality of text objects (S1110).

Then, for each text object among the displayed plurality of text objects that is in a language different from the predetermined language, a preset number is displayed together with the text object (S1120). Here, the predetermined language means the language determined in advance as the basis of voice recognition. The language that serves as the basis of voice recognition may be the language set as the default language, may be set manually by the user, or may be set automatically based on the language of the objects displayed on the display device (100). In the case of automatic setting, the language of the objects may be identified, for example, by applying OCR (optical character recognition) to the objects displayed on the display device (100).
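Step S1120 can be illustrated with a minimal sketch. It substitutes a much cruder test than real language identification: each label is classified by the Unicode script of its letters, so it can tell Hangul from Latin or Cyrillic but cannot distinguish two languages that share a script. The functions `dominant_script` and `assign_numbers` are hypothetical names used only for this example.

```python
import unicodedata
from typing import Dict, List


def dominant_script(text: str) -> str:
    """Return a coarse script name such as "LATIN" or "HANGUL" for the text."""
    counts: Dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            # The first word of a character's Unicode name is its script,
            # e.g. "LATIN SMALL LETTER A" or "HANGUL SYLLABLE GA".
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else ""


def assign_numbers(labels: List[str], expected_script: str) -> Dict[int, str]:
    """Give consecutive numbers to labels not written in the expected script."""
    numbered: Dict[int, str] = {}
    next_number = 1
    for label in labels:
        script = dominant_script(label)
        if script and script != expected_script:
            numbered[next_number] = label
            next_number += 1
    return numbered
```

For a screen whose recognition language uses the Latin script, labels such as "Sports" and "Weather" would be left alone, while labels in Hangul or Cyrillic would each receive a number the user can say instead.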

Then, if the recognition result of the voice uttered by the user includes a displayed number, the operation related to the text object corresponding to that number is performed (S1130).

The recognition result of the voice uttered by the user may be obtained through the display device's own voice recognition, or may be received by requesting voice recognition from an external server that performs voice recognition for a plurality of different languages. In the latter case, the display device (100) provides the external server with the voice signal corresponding to the uttered voice and with information about the language set as the basis of voice recognition, and, if the voice recognition result received from the external server includes a displayed number, may perform the operation related to the text object corresponding to that number.
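Step S1130 can be sketched as a lookup from the recognized text to the operation attached to a displayed number. The sketch assumes English as the language set for voice recognition, so a number may arrive either as digits or as an English number word; the small `NUMBER_WORDS` table and the function names are illustrative and cover only one through five.

```python
import re
from typing import Dict, Optional

# Assumed vocabulary for the recognition language (English in this sketch).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def spoken_number(text: str) -> Optional[int]:
    """Return the first number found in the recognized text, if any."""
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None


def action_for(text: str, numbered_actions: Dict[int, str]) -> Optional[str]:
    """Map a recognition result to the operation of the numbered object."""
    number = spoken_number(text)
    if number is None:
        return None
    return numbered_actions.get(number)
```

Because the user only has to say a number in the recognition language, an object labeled in another language can be selected without the engine having to recognize that language.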

For example, if the text object is hyperlink text within a web page, the web page at the URL address corresponding to the text object may be displayed, and if the text object is an icon for executing an application, that application may be executed.

Meanwhile, the UI screen including the plurality of text objects may be an execution screen of a first application. The execution screen of the first application may be any screen provided by the first application. If it is determined, while the execution screen of the first application is displayed, that no object corresponding to the recognition result of the voice uttered by the user is present on that execution screen, the display device may execute a second application different from the first application to perform an operation corresponding to the recognition result of the voice. Here, the first application may be a web browsing application, and the second application may be an application that performs a search across various sources, such as the Internet, data stored in the display device, VOD content, and channel information (e.g., EPG). For example, if no object corresponding to the voice recognition result exists on the currently displayed web page, the display device may execute another application to provide search results corresponding to the voice recognition result (e.g., Google search results, VOD search results, channel search results, etc.).

According to the various embodiments described above, objects composed in various languages can be controlled by voice, and voice search can be performed more easily.

Meanwhile, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device using software, hardware, or a combination thereof. In a hardware implementation, the embodiments described in the present disclosure may be implemented using at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions. In some cases, the embodiments described in this specification may be implemented by the processor (120) itself. In a software implementation, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described in this specification.

Meanwhile, computer instructions for performing the processing operations of the display device (100) according to the various embodiments of the present disclosure described above may be stored in a non-transitory computer-readable medium. When executed by the processor of a specific device, the computer instructions stored in the non-transitory computer-readable medium cause that device to perform the processing operations of the display device (100) according to the various embodiments described above.

A non-transitory computer-readable medium is not a medium that stores data for a short moment, such as a register, a cache, or a memory, but a medium that stores data semi-permanently and can be read by a device. Specific examples of non-transitory computer-readable media include CDs, DVDs, hard disks, Blu-ray discs, USB drives, memory cards, and ROMs.

Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described. Various modifications may be made by a person having ordinary skill in the art to which the present disclosure pertains without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or scope of the present disclosure.

100: Display device
110: Display
120: Processor

Claims

A display device comprising:
a display; and
a processor configured to:
control the display to display a first screen including a plurality of objects,
receive a voice signal corresponding to a voice input received using a microphone of the display device or of an external device,
process the voice signal to obtain a voice recognition result including text,
if the text corresponds to a single object among the plurality of objects, perform an operation related to the single object, and
if the text corresponds to at least two objects among the plurality of objects, control the display to display at least two icons,
wherein each of the at least two icons represents a number and is displayed adjacent to a corresponding object among the at least two objects, and
wherein the number is expressed as a word of the base language set for voice recognition.

The display device of claim 1,
further comprising a communication unit,
wherein the processor is configured to:
control the communication unit to transmit the voice signal to a server, and
receive the voice recognition result from the server using the communication unit.

The display device of claim 1,
wherein the processor is configured to
identify at least one object containing the text.

The display device of claim 3,
wherein the processor is configured to:
if one object is identified, perform an operation related to the identified object, and
if two objects are identified, control the display to display two icons, each adjacent to a respective one of the two identified objects.

The display device of claim 1,
wherein the processor is configured to
control the display to display the at least two icons based on information about the positions of the at least two objects.

The display device of claim 1,
wherein the voice input is a first voice input, the voice signal is a first voice signal, the voice recognition result is a first voice recognition result, the text is a first text, and the single object is a first single object, and
wherein the processor is configured to:
receive a second voice signal corresponding to a second voice input received using the microphone of the display device or of the external device,
process the second voice signal to obtain a second voice recognition result including a second text, and
if the second text corresponds to a second single object among the at least two objects, perform an operation related to the second single object.

The display device of claim 1,
wherein the processor is configured to
set the language selected as the language of use in a settings menu provided by the display device as the language that serves as the basis for voice recognition.

The display device of claim 1,
wherein the single object and the at least two objects are selectable based on the voice signal, and
wherein each of the single object and the at least two objects includes a text object composed in a preset language.

The display device of claim 1,
wherein the processor is configured to
control the display to display, together with the at least two icons, a phrase instructing the user to provide an additional voice input including the number.

The display device of claim 1,
wherein the processor controls the display to display search results based on the text included in the voice recognition result when it is determined that none of the plurality of objects corresponds to the voice recognition result.

A method for controlling a display device, the method comprising:
a step of displaying a first screen including a plurality of objects;
a step of receiving a voice signal corresponding to a voice input received using a microphone of the display device or an external device;
a step of processing the voice signal to obtain a voice recognition result including text;
a step of performing an action related to a single object if the text corresponds to the single object among the plurality of objects; and
a step of displaying at least two icons if the text corresponds to at least two objects among the plurality of objects,
wherein each of the at least two icons represents a number and is displayed adjacent to a corresponding object among the at least two objects, and
wherein the numbers are expressed as words of a base language set for voice recognition.
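
The branching in the method above (one match leads to an action, two or more matches lead to numbered icons) can be sketched as below. This is a minimal illustration, not the claimed implementation: the matching rule (case-insensitive substring), the NUMBER_WORDS table, and the names match_objects and handle_recognition are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical number-word tables keyed by base language code.
# A real device would cover more languages and larger counts.
NUMBER_WORDS: Dict[str, List[str]] = {
    "en": ["one", "two", "three", "four", "five"],
    "ko": ["일", "이", "삼", "사", "오"],
}


def match_objects(text: str, labels: List[str]) -> List[str]:
    """Return the labels containing the recognized text (assumed matching rule)."""
    needle = text.strip().lower()
    if not needle:
        return []
    return [label for label in labels if needle in label.lower()]


def handle_recognition(text: str, labels: List[str], base_language: str = "en") -> Tuple[str, object]:
    """Decide what to do with a recognition result.

    Returns ("action", label) for exactly one match, ("icons", pairs) for
    two or more matches, where pairs couples each number word of the base
    language with a matching label, and ("no_match", text) otherwise.
    """
    matches = match_objects(text, labels)
    if len(matches) == 1:
        return ("action", matches[0])
    if len(matches) >= 2:
        # zip stops at the shorter list, so matches beyond the table are dropped here.
        return ("icons", list(zip(NUMBER_WORDS[base_language], matches)))
    return ("no_match", text)
```

For a screen with the objects "News Today", "News Tonight", and "Sports", the utterance "sports" would yield a direct action, while "news" would yield the icon pairs ("one", "News Today") and ("two", "News Tonight") when the base language is English.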

The control method of claim 11, further comprising:
a step of transmitting the voice signal to a server; and
a step of receiving the voice recognition result from the server.

The control method of claim 11, further comprising:
a step of identifying at least one object containing the text.

The control method of claim 13,
wherein the step of performing an action related to the single object performs an action related to the identified object if one object is identified, and
wherein the step of displaying the at least two icons displays two icons, each adjacent to a respective one of the two identified objects, if two objects are identified.

The control method of claim 11,
wherein the step of displaying the at least two icons displays the at least two icons based on information about the positions of the at least two objects.

The control method of claim 11,
wherein the voice input is a first voice input, the voice signal is a first voice signal, the voice recognition result is a first voice recognition result, the text is a first text, and the single object is a first single object,
the control method further comprising:
a step of receiving a second voice signal corresponding to a second voice input received using the microphone of the display device or the external device;
a step of processing the second voice signal to obtain a second voice recognition result including a second text; and
a step of performing an action related to a second single object if the second text corresponds to the second single object among the at least two objects.

The control method of claim 11, further comprising:
a step of setting the language selected as the language of use in a settings menu provided by the display device as the base language for voice recognition.

The control method of claim 11,
wherein the single object and the at least two objects are selectable based on the voice signal, and
wherein each of the single object and the at least two objects includes a text object in a preset language.

The control method of claim 11, further comprising:
a step of displaying, together with the at least two icons, a phrase instructing the user to provide an additional voice input including the number.

The control method of claim 11, further comprising:
a step of displaying search results based on the text included in the voice recognition result if it is determined that none of the plurality of objects corresponds to the voice recognition result.

A non-transitory computer-readable recording medium storing computer instructions that, when executed by a processor of a display device, cause the display device to perform operations, the operations comprising:
a step of displaying a first screen including a plurality of objects;
a step of receiving a voice signal corresponding to a voice input received using a microphone of the display device or an external device;
a step of processing the voice signal to obtain a voice recognition result including text;
a step of performing an action related to a single object if the text corresponds to the single object among the plurality of objects; and
a step of displaying at least two icons if the text corresponds to at least two objects among the plurality of objects,
wherein each of the at least two icons represents a number and is displayed adjacent to a corresponding object among the at least two objects, and
wherein the numbers are expressed as words of a base language set for voice recognition.

The computer-readable recording medium of claim 21, wherein the operations further comprise:
a step of transmitting the voice signal to a server; and
a step of receiving the voice recognition result from the server.

The computer-readable recording medium of claim 21, wherein the operations further comprise:
a step of identifying at least one object containing the text.

The computer-readable recording medium of claim 23,
wherein the step of performing an action related to the single object performs an action related to the identified object if one object is identified, and
wherein the step of displaying the at least two icons displays two icons, each adjacent to a respective one of the two identified objects, if two objects are identified.

The computer-readable recording medium of claim 21,
wherein the step of displaying the at least two icons displays the at least two icons based on information about the positions of the at least two objects.

The computer-readable recording medium of claim 21,
wherein the voice input is a first voice input, the voice signal is a first voice signal, the voice recognition result is a first voice recognition result, the text is a first text, and the single object is a first single object, and
wherein the operations further comprise:
a step of receiving a second voice signal corresponding to a second voice input received using the microphone of the display device or the external device;
a step of processing the second voice signal to obtain a second voice recognition result including a second text; and
a step of performing an action related to a second single object if the second text corresponds to the second single object among the at least two objects.

The computer-readable recording medium of claim 21, wherein the operations further comprise:
a step of setting the language selected as the language of use in a settings menu provided by the display device as the base language for voice recognition.

The computer-readable recording medium of claim 21,
wherein the single object and the at least two objects are selectable based on the voice signal, and
wherein each of the single object and the at least two objects includes a text object in a preset language.

The computer-readable recording medium of claim 21, wherein the operations further comprise:
a step of displaying, together with the at least two icons, a phrase instructing the user to provide an additional voice input including the number.

The computer-readable recording medium of claim 21, wherein the operations further comprise:
a step of displaying search results based on the text included in the voice recognition result if it is determined that none of the plurality of objects corresponds to the voice recognition result.
