KR20250024214A

KR20250024214A - Real-time translation subtitle providing system that enables real-time sharing of translated subtitle information

Info

Publication number: KR20250024214A
Application number: KR1020230105407A
Authority: KR
Inventors: 백민호
Original assignee: (주)에어사운드
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2025-02-18
Anticipated expiration: 2043-08-11
Also published as: KR102815841B1

Abstract

The present invention relates to a real-time translation subtitle providing system that enables real-time sharing of translated subtitle information. More specifically, in an international conference hall or a live video streaming environment, a management server receives a speaker's voice information or the audio within a video, performs STT conversion and translation into a requested language, and provides the resulting subtitle information to the requesting user. At that user's request, the same subtitle information is also provided to other users, so the translated subtitles are shared in real time.
To this end, the present invention includes: a management server that receives the voice information of a speaker, or the audio of video data, for which subtitle provision request information has been received, generates text-form first subtitle information by STT (Speech-To-Text) conversion in a first language, which is the language of that voice information, and translates the first subtitle information into a requested second language to generate text-form second subtitle information; a host terminal that is connected to the management server for data communication, transmits the voice information of a specific speaker or the audio of specific video data to request subtitle provision, receives the generated first and second subtitle information from the management server, and optionally transmits subtitle sharing request information for that subtitle information to the management server in order to receive, from the management server, shared Web connection information for accessing the first and second subtitle information; and a client terminal that receives the shared Web connection information from the host terminal and is provided with the first and second subtitle information.

Description

{Real-time translation subtitle providing system that enables real-time sharing of translated subtitle information}

The present invention relates to a real-time translation subtitle providing system that enables real-time sharing of translated subtitle information. More specifically, in an international conference hall or a live video streaming environment, a management server receives a speaker's voice information or the audio within a video, performs STT conversion and translation into a requested language, and provides the resulting subtitle information to the requesting user. At that user's request, the same subtitle information is also provided to other users, so the translated subtitles are shared in real time.

In general, when a person who is not proficient in foreign languages attends an international conference or watches a video in which the speaker is a foreigner, communication often breaks down, causing inconvenience and difficulty.

Without the help of someone fluent in the foreign language, such a person must communicate by looking up dictionaries or phrasebooks on their own, which makes real-time response difficult and proper communication hard.

Recently, with advances in smartphone technology, automatic translators have been developed. For Korean and Japanese, whose language systems are similar, translation reliability is high enough for practical use. For English, whose language system differs from Korean, reliability is lower than for Japanese, but translators and translation programs that are practical for short sentences or simple complex sentences are widely known.

Meanwhile, speech recognition, which lets machines recognize human speech, is an area of extensive research. With advances in sampling techniques and in self-learning using neural networks, automatic speech recognition rates have improved, and the technology is in practical use in some fields.

Text-to-speech technology is also already widely used in voice output devices that build a database (DB) of the pronunciation combinations of individual characters, the pronunciations of words and sentences, and the like, and output text as speech from it.

In addition, interpretation services exist that use the smartphones people carry everywhere to convert other languages into the user's native language.

However, existing translation systems, offered in various forms, are not suited to one-to-many (1:N) conversations, and a separate network or similar infrastructure must be prepared in advance before translation information can be delivered reasonably smoothly.

Therefore, an improved system is needed that can translate in real time the speech of a speaker at an international conference, or the audio of live-streamed video data, deliver it to many users simultaneously, and let those users receive the translation information quickly and easily.

Korean Registered Patent No. 10-1753649

The present invention has been devised to solve the above problems. Its object is to provide a real-time translation subtitle providing system in which, in an international conference hall or live video streaming environment, a management server receives a speaker's voice information or the audio within a video, performs STT conversion and translation into a requested language, and provides the subtitle information to the requesting user, and, at that user's request, provides the same subtitle information to other users so that the translated subtitles can be shared in real time.

To achieve the above object, the present invention has the following features.

The present invention comprises: a management server that receives the voice information of a speaker, or the audio of video data, for which subtitle provision request information has been received, generates text-form first subtitle information by STT (Speech-To-Text) conversion in a first language, which is the language of that voice information, and generates text-form second subtitle information by performing trained-AI-based translation of the first subtitle information into a requested second language; a host terminal that is connected to the management server for data communication, transmits the voice information of a specific speaker or the audio of specific video data to request subtitle provision, receives the generated first and second subtitle information from the management server, and optionally transmits subtitle sharing request information for that subtitle information to the management server in order to receive, from the management server, shared Web connection information for accessing the first and second subtitle information; and a client terminal that receives the shared Web connection information from the host terminal, connects to the management server for data communication, and receives the first and second subtitle information from the management server.

Here, when the audio for which subtitle provision was requested belongs to video data, the host terminal overlays at least one of the first subtitle information and the second subtitle information received from the management server on the video playback screen.
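As a minimal sketch of the overlay choice described above, assuming the host application lets the viewer pick which subtitle track or tracks to show, the following selects and wraps the caption lines to draw over the video frame. `OverlayMode`, `overlay_lines`, and the 40-character wrap width are hypothetical; drawing the text onto the frame is left to the video player.

```python
import textwrap
from enum import Enum


class OverlayMode(Enum):
    """Which subtitle track(s) the host overlays (hypothetical setting)."""
    FIRST = "first"    # original-language transcript only
    SECOND = "second"  # translation only
    BOTH = "both"      # transcript above translation


def overlay_lines(first: str, second: str, mode: OverlayMode, width: int = 40) -> list[str]:
    """Return caption lines in top-to-bottom drawing order.

    The patent only requires that at least one of the two subtitle texts be
    overlaid; placing the translation below the original is an assumption.
    """
    selected: list[str] = []
    if mode in (OverlayMode.FIRST, OverlayMode.BOTH):
        selected.append(first)
    if mode in (OverlayMode.SECOND, OverlayMode.BOTH):
        selected.append(second)

    lines: list[str] = []
    for text in selected:
        # Wrap long captions to fit the video width; keep an empty line for
        # empty text so the caption area keeps a stable height.
        lines.extend(textwrap.wrap(text, width=width) or [""])
    return lines


print(overlay_lines("Good morning, everyone.", "여러분, 좋은 아침입니다.", OverlayMode.BOTH))
```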

In addition, the shared Web connection information is the URL (Uniform Resource Locator) information or QR code information of a Web page generated by the management server. The management server includes a sharing management unit that transmits the first and second subtitle information to any client terminal that has accessed the Web page through the shared Web connection information, and that receives conversation information from the host terminal and from at least one client terminal and displays it on the Web page, enabling two-way communication. When providing the conversation information received from each client terminal to the host terminal, the sharing management unit translates it into the language set by the host terminal.

In addition, the host terminal further includes a host conversion unit, which receives the voice information of a speaker or the audio of video data and performs STT (Speech-To-Text) conversion in the first language, the language of that voice information, to generate text-form first subtitle information; and a host translation unit, which performs trained-AI-based translation of the first subtitle information generated by the host conversion unit into a preset second language to generate text-form second subtitle information.

The system further includes an interpreter terminal that, in response to an interpretation request from the host terminal, connects to the management server for data communication, receives from the management server the requested speaker's voice information or the audio of video data together with the first subtitle information, and provides the management server with real-time interpretation voice information, that is, an interpretation of that voice and first subtitle information into the second language set when the interpretation was requested.

In addition, the management server performs STT conversion on the real-time interpretation voice information received from the interpreter terminal to generate second subtitle information, and provides it, together with the first subtitle information, to the requesting host terminal. When the subtitle sharing function is active, the management server also provides the first and second subtitle information to the client terminals connected through the shared Web connection information.
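A minimal sketch of the interpreter path, assuming callback-style delivery: the second subtitle comes from running STT on the human interpreter's speech rather than from machine translation. The function name, the callback types, and the `sharing_active` flag are hypothetical illustrations, not the patent's interfaces.

```python
from typing import Callable, Iterable

# Hypothetical callback types; the patent names no transport or STT engine.
SpeechToText = Callable[[bytes, str], str]   # (audio, language) -> transcript
SendPair = Callable[[str, str], None]        # (first subtitle, second subtitle) -> None


def handle_interpretation(
    interp_audio: bytes,
    first_subtitle: str,
    second_lang: str,
    stt: SpeechToText,
    send_to_host: SendPair,
    shared_clients: Iterable[SendPair],
    sharing_active: bool,
) -> str:
    """Turn a human interpreter's speech into second subtitle information.

    The interpreter already speaks the second language, so STT runs in
    second_lang and no machine-translation step is applied.
    """
    second_subtitle = stt(interp_audio, second_lang)
    send_to_host(first_subtitle, second_subtitle)
    if sharing_active:
        # Fan out to every client that opened the shared Web page.
        for send in shared_clients:
            send(first_subtitle, second_subtitle)
    return second_subtitle
```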

The system further includes a stenographer terminal that, in response to a stenography request from the host terminal, connects to the management server for data communication, receives the requested speaker's voice information or the audio of video data from the management server, and provides the management server with real-time stenography information, a text transcript of that voice information. The management server performs AI-based translation of the real-time stenography information received from the stenographer terminal to generate second subtitle information, and provides the real-time stenography information and the second subtitle information to the host terminal that requested stenography. When the subtitle sharing function is active, it also provides both to the client terminals connected through the shared Web connection information.
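The stenography path differs from the interpreter path in which step a human performs: here the stenographer's text takes the place of STT output, and only translation runs on the server. The sketch below assumes in-memory outboxes for the host and clients; `StenoSession` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Translator = Callable[[str, str, str], str]  # hypothetical (text, source lang, target lang) -> text


@dataclass
class StenoSession:
    first_lang: str                 # language the stenographer transcribes in
    second_lang: str                # language requested by the host
    translate: Translator
    sharing_active: bool = False
    host_outbox: list = field(default_factory=list)
    client_outboxes: list = field(default_factory=list)

    def on_steno_text(self, steno_text: str) -> tuple[str, str]:
        """The stenographer's text serves as the first subtitle; the server
        only translates it to produce the second subtitle."""
        second = self.translate(steno_text, self.first_lang, self.second_lang)
        pair = (steno_text, second)
        self.host_outbox.append(pair)
        if self.sharing_active:
            for outbox in self.client_outboxes:
                outbox.append(pair)
        return pair
```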

According to the present invention, translation information for a speaker's voice or for the audio of live-streamed video data can be provided in real time, and, in response to a sharing request from the host terminal, many other users can receive the real-time translation information quickly and easily without installing a separate application or logging in.

FIG. 1 is a drawing showing the schematic configuration of a real-time translation subtitle providing system according to one embodiment of the present invention.
FIG. 2 is a drawing showing the internal configuration of a real-time translation subtitle providing system according to one embodiment of the present invention.
FIG. 3 is a drawing showing the internal configuration of a host terminal according to another embodiment of the present invention.
FIG. 4 is a drawing showing the main display screen of a dedicated application on a host terminal according to one embodiment of the present invention.
FIG. 5 is a drawing showing first subtitle information or second subtitle information overlaid on video data and output on a host terminal according to one embodiment of the present invention.
FIGS. 6 and 7 are drawings showing the subtitle sharing request screen of a dedicated application on a host terminal according to one embodiment of the present invention.
FIG. 8 is a drawing showing the Web page display screen of a client terminal according to one embodiment of the present invention.
FIG. 9 is a drawing showing the internal configuration of a real-time translation subtitle providing system according to another embodiment of the present invention.
FIG. 10 is a drawing showing the internal configuration of a real-time translation subtitle providing system according to yet another embodiment of the present invention.
FIG. 11 is a drawing showing the internal configuration of a real-time translation subtitle providing system according to yet another embodiment of the present invention.

To explain the present invention, its operational advantages, and the objects achieved by practicing it, preferred embodiments of the present invention are illustrated and described below with reference to the drawings.

First, the terms used in this application serve only to describe specific embodiments and are not intended to limit the present invention; singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In describing the present invention, a detailed description of a related known configuration or function is omitted where it might obscure the gist of the present invention.

FIG. 1 is a drawing showing the schematic configuration of a real-time translation subtitle providing system according to an embodiment of the present invention; FIG. 2 is a drawing showing the internal configuration of that system; FIG. 3 is a drawing showing the internal configuration of a host terminal according to another embodiment of the present invention; and FIG. 4 is a drawing showing the main display screen of a dedicated application on a host terminal according to an embodiment of the present invention.

FIG. 5 is a drawing showing first subtitle information or second subtitle information overlaid on video data and output on a host terminal according to an embodiment of the present invention; FIGS. 6 and 7 are drawings showing the subtitle sharing request screen of a dedicated application on the host terminal according to an embodiment of the present invention; and FIG. 8 is a drawing showing the Web page display screen of a client terminal according to an embodiment of the present invention.

Referring to the drawings, a real-time interpretation service system (1000) according to an embodiment of the present invention consists of a management server (100), a host terminal (200), and a client terminal (300). The management server (100) receives the voice information of a speaker, or the audio of video data, for which subtitle provision request information has been received; generates text-form first subtitle information by STT (Speech-To-Text) conversion in a first language, which is the language of that voice information; and generates text-form second subtitle information by performing trained-AI-based translation of the first subtitle information into the requested second language. The host terminal (200) is connected to the management server (100) for data communication. It transmits the voice information of a specific speaker or the audio of specific video data to request subtitle provision, receives the generated first and second subtitle information from the management server (100), and optionally transmits subtitle sharing request information for that subtitle information to the management server (100) in order to receive shared Web connection information for accessing it. The client terminal (300) receives the shared Web connection information from the host terminal (200), connects to the management server (100) for data communication, and receives the first and second subtitle information from the management server (100).

Here, the management server (100) receives the voice information of a specific speaker, or the audio of video data, sent by the host terminal (200), performs STT (Speech-To-Text) conversion of that voice information in real time to generate text-form first subtitle information, and, from the first subtitle information, generates text-form second subtitle information by AI-based translation.

The first and second subtitle information generated in this way is transmitted in real time to the host terminal (200) that requested subtitle provision.

To this end, the management server (100) consists of a server communication unit (110), which enables data communication with the host terminal (200) and the client terminal (300), and a subtitle management unit (120), which, when the host terminal (200) requests subtitle provision through the server communication unit (110), generates the first and second subtitle information in real time from the voice information of the specific speaker or the audio of video data transmitted by the host terminal (200).

The subtitle management unit (120) includes an STT conversion unit (121), which performs STT (Speech-To-Text) conversion in the first language, the language of the voice information for which the host terminal (200) requested subtitles, to generate text-form first subtitle information; and an artificial intelligence translation unit (122), which performs AI-based translation of the first subtitle information to generate text-form second subtitle information.

The subtitle management unit (120) then transmits the first and second subtitle information generated by the STT conversion unit (121) and the artificial intelligence translation unit (122) in real time, through the server communication unit (110), to the host terminal (200) that requested subtitle provision.
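The server-side flow just described (STT conversion unit, then AI translation unit, then immediate delivery to the host) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the patent's implementation: the STT and translation engines are hypothetical pluggable callables, audio arrives as pre-segmented chunks, and `send` stands in for the server communication unit (110).

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical engine interfaces: the patent does not name any STT or MT product.
SpeechToText = Callable[[bytes, str], str]   # (audio chunk, source language) -> transcript
Translator = Callable[[str, str, str], str]  # (text, source lang, target lang) -> translation


@dataclass
class SubtitlePair:
    first: str   # first subtitle information (source-language transcript)
    second: str  # second subtitle information (translation)


class SubtitleManager:
    """Sketch of the subtitle management unit (120)."""

    def __init__(self, stt: SpeechToText, translate: Translator) -> None:
        self.stt = stt              # plays the role of the STT conversion unit (121)
        self.translate = translate  # plays the role of the AI translation unit (122)

    def process_chunk(self, audio: bytes, first_lang: str, second_lang: str) -> SubtitlePair:
        first = self.stt(audio, first_lang)
        second = self.translate(first, first_lang, second_lang)
        return SubtitlePair(first, second)


def serve_host(
    manager: SubtitleManager,
    chunks: Iterable[bytes],
    first_lang: str,
    second_lang: str,
    send: Callable[[SubtitlePair], None],
) -> None:
    """Push each pair as soon as its chunk is processed, standing in for
    real-time delivery through the server communication unit (110)."""
    for chunk in chunks:
        send(manager.process_chunk(chunk, first_lang, second_lang))


if __name__ == "__main__":
    # Toy engines for demonstration only: "STT" decodes bytes, "MT" tags the text.
    manager = SubtitleManager(
        stt=lambda audio, lang: audio.decode("utf-8"),
        translate=lambda text, src, dst: f"[{dst}] {text}",
    )
    serve_host(manager, [b"Hello", b"Thank you"], "en", "ko", print)
```

Keeping the engines as injected callables mirrors the patent's separation of units 121 and 122, and lets a real STT or translation service be swapped in without touching the delivery loop.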

The management server (100) also includes a sharing management unit (130). When the host terminal (200) sends a subtitle sharing request to the server communication unit (110), the sharing management unit (130) creates a Web page for subtitle sharing, transmits to the host terminal (200) the shared Web connection information for accessing that page, and transmits the first and second subtitle information to at least one client terminal (300) that accesses the page.

The shared Web connection information is preferably URL (Uniform Resource Locator) information or QR code information for accessing the Web page.
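A minimal sketch of how the sharing management unit (130) might issue a share link and fan subtitles out to the clients that open it. Assumptions: the base address `https://subtitles.example.com`, the `/share/<token>` path, and the class and method names are all hypothetical, and clients are modelled as in-process callbacks rather than real Web connections. An unguessable random token lets clients join without an app or login, matching the stated effect of the invention.

```python
import secrets
from typing import Callable
from urllib.parse import urljoin

Outbox = Callable[[tuple[str, str]], None]  # receives (first, second) subtitle pairs


class SharingManager:
    """Sketch of the sharing management unit (130): one share page per host request."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.owners: dict[str, str] = {}            # page token -> host id
        self.clients: dict[str, list[Outbox]] = {}  # page token -> connected clients

    def create_share_page(self, host_id: str) -> str:
        # An unguessable token acts as the access credential for the page.
        token = secrets.token_urlsafe(16)
        self.owners[token] = host_id
        self.clients[token] = []
        return urljoin(self.base_url, f"/share/{token}")

    def join(self, token: str, outbox: Outbox) -> None:
        if token not in self.clients:
            raise KeyError(f"unknown share page: {token}")
        self.clients[token].append(outbox)

    def broadcast(self, token: str, pair: tuple[str, str]) -> None:
        # Send the same first/second subtitle pair to every joined client.
        for outbox in self.clients.get(token, []):
            outbox(pair)
```

For the QR code variant, the same URL string could be encoded with a third-party package, for example `qrcode.make(url)` from the Python `qrcode` library.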

The sharing management unit (130) may also be configured to provide a chat function between the user of the host terminal (200) and the users of the client terminals (300) by receiving conversation information from the host terminal (200) and each client terminal (300) and displaying it on the Web page.

In this case, the sharing management unit (130) is preferably configured to translate the conversation information received from each client terminal (300) into the language set by the host terminal (200) before providing it to the host terminal (200).
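The host-side chat view described above can be sketched as a translation pass over the shared chat log: messages already in the host's language pass through, and all others are translated into it. `ChatMessage`, `host_view`, and the translator signature are hypothetical; the patent does not specify how message language is detected, so each message is assumed to carry its language tag.

```python
from dataclasses import dataclass
from typing import Callable

Translator = Callable[[str, str, str], str]  # hypothetical (text, src, dst) -> text


@dataclass(frozen=True)
class ChatMessage:
    sender: str
    lang: str   # language the message was written in (assumed known)
    text: str


def host_view(messages: list[ChatMessage], host_lang: str, translate: Translator) -> list[ChatMessage]:
    """Return the chat log as the host sees it: every message not already in
    the host's language is machine-translated into it."""
    view = []
    for msg in messages:
        if msg.lang == host_lang:
            view.append(msg)
        else:
            translated = translate(msg.text, msg.lang, host_lang)
            view.append(ChatMessage(msg.sender, host_lang, translated))
    return view
```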

The sharing management unit (130) may further include a TTS (Text-to-Speech) conversion unit (not shown) that, at the request of the host terminal (200) or a client terminal (300), synthesizes speech from the second subtitle information translated into the second language preset by each user and outputs it as voice information.

Accordingly, when the host terminal (200) or a client terminal (300) requests voice output, the sharing management unit (130) of the management server (100) provides that terminal with the first subtitle information, the second subtitle information, and second subtitle voice information obtained by speech synthesis of the second subtitle information.
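A minimal sketch of the voice-output response, assuming a hypothetical `synthesize(text, language)` TTS callable that returns audio bytes: synthesized speech is attached only for terminals that requested it, so text-only viewers do not trigger TTS at all. The payload and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Synthesizer = Callable[[str, str], bytes]  # hypothetical (text, language) -> audio bytes


@dataclass
class SubtitlePayload:
    first: str
    second: str
    second_audio: Optional[bytes] = None  # second subtitle voice information, if requested


def build_payload(first: str, second: str, second_lang: str,
                  voice_requested: bool, synthesize: Synthesizer) -> SubtitlePayload:
    """Bundle both subtitle texts and, only when voice output was requested,
    speech synthesized from the second subtitle in that user's language."""
    audio = synthesize(second, second_lang) if voice_requested else None
    return SubtitlePayload(first, second, audio)
```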

In addition, the STT conversion unit (121) and the artificial intelligence translation unit (122) of the management server (100) may be configured to assess, for each whole phrase or sentence, the accuracy of the first and second subtitle information they respectively generate.

Based on that assessment, they may be configured either not to generate first and second subtitle information for a particular phrase or sentence, or to apply separate distinguishing processing to the generated first and second subtitle information before transmitting it to the host terminal (200) or client terminal (300).

That is, if the accuracy is judged too low according to a preset criterion, the first and second subtitle information is not generated. If the accuracy is judged not too low but also not high according to a preset criterion, the first and second subtitle information is generated but given separate distinguishing processing, so that the user of the receiving host terminal (200) or client terminal (300) can tell that the phrase has low accuracy.
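The two preset criteria above amount to a three-way decision per phrase or sentence: drop, send with a low-accuracy mark, or send normally. The sketch assumes the STT or translation engine exposes a confidence score between 0 and 1; the threshold values 0.40 and 0.75 and all names are hypothetical, since the patent only says "preset criteria".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    DROP = "drop"        # accuracy too low: no subtitle is generated
    FLAGGED = "flagged"  # not too low but not high: send with a distinguishing mark
    NORMAL = "normal"    # high enough: send as-is


# Hypothetical thresholds standing in for the patent's "preset criteria".
REJECT_BELOW = 0.40
FLAG_BELOW = 0.75


@dataclass
class Segment:
    text: str
    confidence: float  # 0.0 to 1.0, assumed to be reported by the STT/MT engine


def classify(confidence: float, reject_below: float = REJECT_BELOW,
             flag_below: float = FLAG_BELOW) -> Decision:
    if confidence < reject_below:
        return Decision.DROP
    if confidence < flag_below:
        return Decision.FLAGGED
    return Decision.NORMAL


def gate(segment: Segment) -> Optional[dict]:
    """Return the message to transmit for one phrase, or None if it is dropped."""
    decision = classify(segment.confidence)
    if decision is Decision.DROP:
        return None
    return {"text": segment.text, "low_confidence": decision is Decision.FLAGGED}
```

The same gate can be applied independently to the first subtitle (STT confidence) and the second subtitle (translation confidence).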

In addition, the STT conversion unit (121) and the artificial intelligence translation unit (122) may be configured to generate multiple candidate first subtitle information and second subtitle information for each phrase or sentence during STT conversion or translation.

This is because, during speech recognition or translation, the STT conversion unit (121) and the artificial intelligence translation unit (122) may produce a single phrase or sentence as the first or second subtitle information, but in some cases may produce several phrases or sentences that are plausible recognition or translation results.

In this case, the STT conversion unit (121) and the artificial intelligence translation unit (122) rank the candidates by accuracy, order the phrases or sentences from highest to lowest rank, and send the highest-ranked phrase or sentence to the host terminal (200) or client terminal (300) with separate distinguishing processing applied.

When the host terminal (200) or client terminal (300) receives first or second subtitle information marked in this way, it can request the lower-ranked first or second subtitle information for that phrase or sentence from the management server (100). Upon such a request, the STT conversion unit (121) and the artificial intelligence translation unit (122) transmit to the requesting host terminal (200) or client terminal (300) the first and second subtitle information for that phrase or sentence, excluding the highest-ranked candidate, ordered by rank.
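The ranking and on-request alternatives described above can be sketched as a per-segment candidate store. This is a minimal sketch, assuming each candidate carries a model confidence used for ranking and that segments are identified by a string ID; the `has_alternatives` flag stands in for the "distinguishing processing" and is a hypothetical choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    text: str
    confidence: float  # assumed model score used for ranking


@dataclass
class CandidateStore:
    # segment_id -> candidates ordered from highest to lowest confidence
    ranked: Dict[str, List[Candidate]] = field(default_factory=dict)

    def publish(self, segment_id: str, candidates: List[Candidate]) -> dict:
        """Rank the candidates and return the top one as the subtitle to send."""
        if not candidates:
            raise ValueError("at least one candidate is required")
        ordered = sorted(candidates, key=lambda c: c.confidence, reverse=True)
        self.ranked[segment_id] = ordered
        return {
            "segment_id": segment_id,
            "text": ordered[0].text,
            # distinguishing mark: tells the terminal that alternatives exist
            "has_alternatives": len(ordered) > 1,
        }

    def alternatives(self, segment_id: str) -> List[str]:
        """Answer a terminal's request: all candidates except the top one, in rank order."""
        return [c.text for c in self.ranked.get(segment_id, [])[1:]]
```

Keeping the ranked list on the server means the terminal only pays for the alternatives when a user actually asks for them.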

In addition, the system may be configured to receive the user's selection for that phrase or sentence from the host terminal (200) or client terminal (300) that received the alternatives. This selection information indicates that the user of the host terminal (200) or client terminal (300) judged a lower-ranked first or second subtitle candidate to be a more accurate recognition or translation than the highest-ranked candidate chosen by the STT conversion unit (121) and the artificial intelligence translation unit (122).

When the STT conversion unit (121) and the artificial intelligence translation unit (122) receive this selection information, they can use it as training data to produce a higher-quality artificial intelligence model.
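One way to turn these user selections into training data is to log each case where the user preferred a lower-ranked candidate, paired with the model's original top choice. The sketch below assumes the stage labels `"stt"` and `"translation"` and an in-memory log; how the examples are later fed to model training is not specified in the source and is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingExample:
    segment_id: str
    stage: str         # "stt" or "translation" (assumed labels)
    model_top: str     # candidate the model ranked first
    user_choice: str   # lower-ranked candidate the user judged more accurate


class SelectionLog:
    """Collects user selections that disagree with the model's top-ranked candidate."""

    def __init__(self) -> None:
        self.examples: List[TrainingExample] = []

    def record(self, segment_id: str, stage: str,
               ranked: List[str], chosen_index: int) -> bool:
        if not 0 <= chosen_index < len(ranked):
            raise IndexError("chosen_index is out of range")
        if chosen_index == 0:
            return False  # user agreed with the model's top choice; nothing is logged
        self.examples.append(
            TrainingExample(segment_id, stage, ranked[0], ranked[chosen_index]))
        return True
```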

Meanwhile, the host terminal (200) transmits the voice information of a specific speaker, or the voice information of specific video data, to the management server (100) to make a subtitle provision request, and receives the first subtitle information and second subtitle information generated by the management server (100).

The user of the host terminal (200) may be the speaker giving a lecture or presentation, or another person.

The host terminal (200) includes a host communication unit (210) that enables data communication with the management server (100) and the client terminal (300); a user input unit (220) that receives input from the user; a screen management unit (230) that presents output to the user; and a host control unit (240). When the host control unit (240) receives a subtitle provision request from the user input unit (220), it transmits the voice information of a specific speaker or of specific video data to the management server (100) through the host communication unit (210), receives the first subtitle information and second subtitle information from the management server (100), and presents them to the user through the screen management unit (230).

As shown in FIGS. 6 and 7, when the host control unit (240) receives a subtitle sharing request from the user input unit (220), it transmits subtitle sharing request information to the management server (100) through the host communication unit (210) and receives shared Web connection information from the management server (100).

The host control unit (240) then transmits the shared Web connection information, through the host communication unit (210), to the client terminals (300) designated via the user input unit (220).
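The claims state that the shared Web connection information is the URL or QR code of a web page generated by the management server, and that clients join without logging in. A minimal server-side sketch of that step mints an unguessable token per sharing request and maps it back to the host's session when a client opens the link. The base URL, the `t` query parameter, and the class name are hypothetical; the same URL string could be rendered as a QR code with any QR library.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical base address of the management server's shared subtitle page.
BASE_URL = "https://subtitles.example.com/share"


class ShareManager:
    def __init__(self, base_url: str = BASE_URL) -> None:
        self.base_url = base_url
        self.sessions: Dict[str, str] = {}  # share token -> host session id

    def create_share_link(self, host_session_id: str) -> str:
        """Handle a subtitle sharing request: mint an unguessable token and return its URL."""
        token = secrets.token_urlsafe(16)
        self.sessions[token] = host_session_id
        return f"{self.base_url}?{urlencode({'t': token})}"

    def resolve(self, token: str) -> Optional[str]:
        """Called when a client opens the link; returns the host session it joins, or None."""
        return self.sessions.get(token)
```

Because the token itself grants access, possession of the link is what replaces a login; a real deployment would likely also expire tokens when the host ends sharing.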

As shown in FIG. 4, the screen management unit (230) includes a first subtitle display unit (231) that displays the first subtitle information, a second subtitle display unit (232) that displays the second subtitle information, a language setting unit (233) for setting the first language and the second language, and a subtitle sharing setting unit (234) for issuing a subtitle sharing request.

In addition, as shown in FIG. 5, when the voice information for which subtitles were requested comes from video data and overlay selection information is received from the user input unit (220), the screen management unit (230) can overlay at least one of the first subtitle information and the second subtitle information received from the management server (100) on the output screen of the video data.

According to one example of the present invention, the host terminal (200) may further include a host conversion unit (250) and a host translation unit (260). The host conversion unit (250) receives the voice information of a speaker or of video data and performs STT (Speech-To-Text) conversion in the first language, the language of that voice information, to generate text-form first subtitle information. The host translation unit (260) translates the first subtitle information generated by the host conversion unit (250) into a preset second language using trained artificial intelligence, generating text-form second subtitle information.

When a poor communication environment makes data communication with the management server (100) difficult, the host conversion unit (250) and the host translation unit (260) let the host terminal (200) itself recognize the voice information of the speaker or video data, perform STT conversion, and then perform artificial intelligence-based translation on the result, thereby generating the first subtitle information and second subtitle information locally.
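The fallback described above amounts to trying the server pipeline first and switching to the on-device units when the network fails. This sketch treats each pipeline as a function from audio bytes to a (first subtitle, second subtitle) pair; the function names are hypothetical, and it assumes the network client raises Python's built-in `ConnectionError` or `TimeoutError` on failure.

```python
from typing import Callable, Tuple

# A pipeline turns audio into (first subtitle, second subtitle).
Pipeline = Callable[[bytes], Tuple[str, str]]


def make_subtitles(audio: bytes, server: Pipeline, local: Pipeline) -> Tuple[str, str, str]:
    """Prefer the management server; fall back to on-device STT and translation on network errors."""
    try:
        first, second = server(audio)
        return first, second, "server"
    except (ConnectionError, TimeoutError):
        first, second = local(audio)
        return first, second, "local"
```

Returning the source label lets the host UI indicate when it is showing locally generated subtitles, which may be of lower quality than the server's.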

Meanwhile, the client terminal (300) receives the shared Web connection information from the host terminal (200), uses it to connect to the management server (100) for data communication, and receives the first subtitle information and second subtitle information from the management server (100).

The client terminal (300) includes a client communication unit (310) that enables data communication with the management server (100) and the host terminal (200); a user input unit (320) that receives input from the user; a screen management unit (330) that presents output to the user; and a client control unit (340). The client control unit (340) receives the shared Web connection information from the host terminal (200) and, when the user input unit (320) reports that the user has selected that link, connects to the management server (100) through the corresponding web page, receives the first subtitle information and second subtitle information from the management server (100), and presents them to the user through the screen management unit (330).

Accordingly, when the client control unit (340) receives shared Web connection information from the host terminal (200) and opens the corresponding web page, it is connected to the management server (100) as shown in FIG. 8, and the screen management unit (330) presents the first subtitle information and second subtitle information to the user.

As shown in FIG. 8, the user of the client terminal (300) may be offered a chat function with other client terminal users or the host terminal user, and a setting to display the first subtitle information and second subtitle information together or to display only the second subtitle information.
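The client-side display setting and the chat path can be sketched as two small functions. The display modes mirror the two options named above. The chat helper follows claim 3, which says chat messages from clients are translated into the language set by the host terminal; the `translate` callable and the language-code strings are assumptions standing in for the server's translation unit.

```python
from enum import Enum
from typing import Callable, List


class DisplayMode(Enum):
    BOTH = "both"           # original-language and translated lines together
    SECOND_ONLY = "second"  # translated line only


def render_lines(first: str, second: str, mode: DisplayMode) -> List[str]:
    """Return the subtitle lines a client page should show for one segment."""
    if mode is DisplayMode.BOTH:
        return [first, second]
    return [second]


def chat_for_host(message: str, sender_lang: str, host_lang: str,
                  translate: Callable[[str, str, str], str]) -> str:
    """Translate a client's chat message into the host's language when they differ."""
    if sender_lang == host_lang:
        return message
    return translate(message, sender_lang, host_lang)
```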

In this way, the real-time translation subtitle provision system according to one embodiment of the present invention allows subtitle sharing to be set on the host terminal (200). Multiple other client terminals (300) can then quickly and easily receive the first subtitle information and second subtitle information for the voice of a specific speaker or of real-time streaming video data, without a separate login procedure or the installation of a dedicated application.

FIG. 9 is a diagram showing the internal configuration of a real-time translation subtitle provision system according to another embodiment of the present invention.

Referring to the drawing, the real-time translation subtitle provision system (1000) according to this embodiment further includes an interpreter terminal (400). In response to an interpretation request from the host terminal (200), the interpreter terminal (400) connects to the management server (100) for data communication and receives from it the requested voice information of a specific speaker or of video data, together with the first subtitle information. It then provides the management server (100) with real-time interpretation voice information, that is, an interpretation of that voice and first subtitle information into the second language set at the time of the interpretation request.

The management server (100) then performs STT conversion on the real-time interpretation voice information received from the interpreter terminal (400) to generate second subtitle information, and provides it, together with the first subtitle information, to the host terminal (200) that requested the interpretation service.

In addition, as in the embodiment of the real-time translation subtitle provision system (1000) described above, if the subtitle sharing function is active at the request of the host terminal (200), the management server (100) also provides the first subtitle information and second subtitle information to the client terminals (300) connected through the shared Web connection information.
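The delivery rule in the interpreter embodiment (always to the requesting host, and to shared clients only while sharing is on) can be sketched as a per-session hub. Terminals are modeled as plain callables that accept a message dict, and `stt_second` is a hypothetical stand-in for the STT conversion of the interpreter's second-language speech.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Deliver = Callable[[dict], None]  # pushes one message to a terminal


@dataclass
class SessionHub:
    host: Deliver
    clients: List[Deliver] = field(default_factory=list)  # joined via the share link
    sharing: bool = False

    def publish(self, first: str, second: str) -> int:
        """Send a subtitle pair to the host and, while sharing is on, to every client."""
        msg = {"first": first, "second": second}
        self.host(msg)
        targets = self.clients if self.sharing else []
        for deliver in targets:
            deliver(msg)
        return 1 + len(targets)


def on_interpretation(hub: SessionHub, first: str, interp_audio: bytes,
                      stt_second: Callable[[bytes], str]) -> int:
    """Interpreter mode: STT of the interpreter's speech becomes the second subtitle."""
    return hub.publish(first, stt_second(interp_audio))
```

In a real server the callables would typically wrap WebSocket or server-sent-event connections. The return value (number of terminals reached) is included only to make the behavior easy to check.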

FIG. 10 is a diagram showing the internal configuration of a real-time translation subtitle provision system according to yet another embodiment of the present invention.

Referring to the drawing, the real-time translation subtitle provision system (1000) according to this embodiment further includes a stenographer terminal (500). In response to a stenography request from the host terminal (200), the stenographer terminal (500) connects to the management server (100) for data communication, receives the requested voice information of a specific speaker or of video data from the management server (100), and provides the management server (100) with real-time stenography information, that is, a text transcript of that voice information.

The management server (100) then performs artificial intelligence-based translation on the real-time stenography information received from the stenographer terminal (500) to generate second subtitle information, and provides the real-time stenography information and the second subtitle information to the host terminal (200) that requested stenography.

In addition, as in the embodiment described above, if the subtitle sharing function is active at the request of the host terminal (200), the management server (100) can also provide the real-time stenography information and second subtitle information to the client terminals (300) connected through the shared Web connection information.

The interpreter terminal (400) of the previous embodiment may also be included in this embodiment, so that when the host terminal (200) requests interpretation and stenography at the same time, interpretation through the interpreter terminal (400) and stenography through the stenographer terminal (500) are performed simultaneously.
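Across the embodiments, the first and second subtitle lines come from different sources. Machine STT and AI translation are the default. An interpreter supplies the second line through STT of the interpretation, and a stenographer supplies the first line, which is then machine-translated. The sketch below routes each line to its source. The source does not say which second line is used when interpretation and stenography run together; preferring the interpreter in that case is an assumption of this sketch, and all callables are hypothetical stand-ins for the server's units.

```python
from typing import Callable, Optional, Tuple


def build_pair(speaker_audio: bytes,
               stt_first: Callable[[bytes], str],
               stt_second: Callable[[bytes], str],
               translate: Callable[[str], str],
               interpreter_audio: Optional[bytes] = None,
               steno_text: Optional[str] = None) -> Tuple[str, str]:
    """Choose the source of each subtitle line for the current session mode."""
    # First subtitle: the stenographer's transcript if present, otherwise machine STT.
    first = steno_text if steno_text is not None else stt_first(speaker_audio)
    # Second subtitle: STT of the human interpretation if present
    # (assumed to take priority), otherwise AI translation of the first subtitle.
    if interpreter_audio is not None:
        second = stt_second(interpreter_audio)
    else:
        second = translate(first)
    return first, second
```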

FIG. 11 is a diagram showing the internal configuration of a real-time translation subtitle provision system according to yet another embodiment of the present invention.

Referring to the drawing, the real-time translation subtitle provision system (1000) according to this embodiment may further include an inspector terminal (600). At the request of the host terminal (200) or according to a preset configuration, the inspector terminal (600) reviews the results output by at least one of the STT conversion unit (121) and the artificial intelligence translation unit (122) of the management server (100), that is, the first subtitle information or the second subtitle information. If any part needs correction, it corrects that part and transmits the correction information to the management server (100).

Here, the STT conversion unit (121) evaluates the suitability of each speech recognition result against preset criteria. If the suitability does not exceed the threshold of a preset first suitability criterion, the unit treats the segment as unrecognizable and does not generate first subtitle information for that phrase or sentence.

If the suitability exceeds the first suitability criterion but does not exceed the threshold of a preset second suitability criterion, the unit judges the recognition accuracy to be low and may be configured to transmit the first subtitle information for that phrase or sentence with separate distinguishing processing applied.

Accordingly, the inspector terminal (600) generates correction information and transmits it to the management server (100) in real time in two cases: when the STT conversion unit (121) did not generate first subtitle information for a phrase or sentence because it failed to exceed the first suitability threshold, and when it generated first subtitle information with distinguishing processing because the phrase or sentence failed to exceed the second suitability threshold.

When the management server (100) receives correction information from the inspector terminal (600), it transmits the correction information to the host terminal (200) or client terminal (300) to which the first subtitle information was already provided, so that it replaces the existing first subtitle information for that phrase or sentence.
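The replacement behavior described above needs the server to remember what it has already sent, or withheld, for each segment so that an inspector's correction can be pushed as an update. The sketch below keys subtitles by a hypothetical segment ID and sends terminals `add` or `replace` messages. The message shape is an assumption. A correction for a segment that was withheld earlier is sent as `add`, because terminals never displayed it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Deliver = Callable[[dict], None]


@dataclass
class SubtitleLedger:
    # segment_id -> text already sent (None means the segment was withheld)
    sent: Dict[str, Optional[str]] = field(default_factory=dict)
    terminals: List[Deliver] = field(default_factory=list)

    def emit(self, segment_id: str, text: Optional[str], low_accuracy: bool = False) -> None:
        """Record an STT result and push it to terminals unless it was withheld."""
        self.sent[segment_id] = text
        if text is not None:
            self._broadcast({"op": "add", "segment_id": segment_id,
                             "text": text, "low_accuracy": low_accuracy})

    def apply_correction(self, segment_id: str, corrected: str) -> None:
        """Inspector correction: replace the sent text, or add it if it was withheld."""
        if segment_id not in self.sent:
            raise KeyError(segment_id)
        op = "add" if self.sent[segment_id] is None else "replace"
        self.sent[segment_id] = corrected
        self._broadcast({"op": op, "segment_id": segment_id,
                         "text": corrected, "low_accuracy": False})

    def _broadcast(self, msg: dict) -> None:
        for deliver in self.terminals:
            deliver(msg)
```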

Although the present invention has been described with reference to an embodiment shown in the drawings, this is merely illustrative, and those skilled in the art will understand that various modifications and other equivalent embodiments are possible.

Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

100: Management server
110: Server communication unit  120: Subtitle management unit
121: STT conversion unit  122: AI translation unit
130: Shared management unit
200: Host terminal
210: Host communication unit  220: User input unit
230: Screen management unit  231: First subtitle display unit
232: Second subtitle display unit  233: Language setting unit
234: Subtitle sharing setting unit  240: Host control unit
250: Host conversion unit  260: Host translation unit
300: Client terminal
310: Client communication unit  320: User input unit
330: Screen management unit  340: Client control unit
400: Interpreter terminal
500: Stenographer terminal
1000: Real-time translation subtitle provision system

Claims

A real-time translation subtitle provision system, comprising:
a management server that receives voice information of a speaker, or voice information of video data, for which subtitle provision request information has been received; generates text-form first subtitle information through STT (Speech-To-Text) conversion in a first language, which is the language of that voice information; and translates the generated first subtitle information into a requested second language using trained artificial intelligence to generate text-form second subtitle information;
a host terminal that is connected to the management server for data communication, transmits voice information of a specific speaker or of specific video data to make a subtitle provision request, receives the first subtitle information and the second subtitle information generated by the management server, and selectively transmits subtitle sharing request information for the received first and second subtitle information to the management server to receive, from the management server, shared Web connection information for providing the first subtitle information and the second subtitle information; and
a client terminal that receives the shared Web connection information from the host terminal, connects to the management server for data communication, and receives the first subtitle information and the second subtitle information from the management server.

The real-time translation subtitle provision system of claim 1,
wherein the host terminal,
when the voice information for which the subtitle provision request information was received is video data, overlays at least one of the first subtitle information and the second subtitle information received from the management server on the output screen of the video data.

The real-time translation subtitle provision system of claim 1,
wherein the shared Web connection information is
URL (Uniform Resource Locator) information or QR code information of a web page generated by the management server;
the management server
includes a shared management unit that transmits the first subtitle information and the second subtitle information to a client terminal that has accessed the web page through the shared Web connection information, and that enables two-way communication by receiving conversation information from the host terminal and at least one client terminal and outputting it on the web page; and
the shared management unit,
when providing conversation information received from each client terminal to the host terminal, translates it into the language set by the host terminal.

The real-time translation subtitle provision system of claim 1,
wherein the host terminal further includes
a host conversion unit that receives voice information of a speaker or of video data and performs STT (Speech-To-Text) conversion in a first language, which is the language of that voice information, to generate text-form first subtitle information; and a host translation unit that translates the first subtitle information generated by the host conversion unit into a preset second language using trained artificial intelligence to generate text-form second subtitle information.

The real-time translation subtitle provision system of claim 1, further comprising
an interpreter terminal that, in response to an interpretation request from the host terminal, connects to the management server for data communication, receives from the management server the requested voice information of a specific speaker or of video data together with the first subtitle information, and provides the management server with real-time interpretation voice information that interprets that voice information and first subtitle information into the second language set at the time of the interpretation request.

The real-time translation subtitle provision system of claim 5,
wherein the management server
performs STT conversion on the real-time interpretation voice information received from the interpreter terminal to generate second subtitle information, provides the generated second subtitle information together with the first subtitle information to the requesting host terminal, and, while the subtitle sharing function is active, provides the first subtitle information and the second subtitle information together to the client terminal connected through the shared Web connection information.

The real-time translation subtitle provision system of claim 1, further comprising
a stenographer terminal that, in response to a stenography request from the host terminal, connects to the management server for data communication, receives the requested voice information of a specific speaker or of video data from the management server, and provides the management server with real-time stenography information in text form for that voice information,
wherein the management server
translates the real-time stenography information received from the stenographer terminal using artificial intelligence to generate second subtitle information, provides the real-time stenography information and the second subtitle information to the host terminal that requested stenography, and, while the subtitle sharing function is active, provides the real-time stenography information and the second subtitle information together to the client terminal connected through the shared Web connection information.