KR102287431B1

KR102287431B1 - Apparatus for recording meeting and meeting recording system

Info

Publication number: KR102287431B1
Application number: KR1020200129301A
Authority: KR
Inventors: 장명근; 윤재선
Original assignee: 주식회사 셀바스에이아이
Priority date: 2020-10-07
Filing date: 2020-10-07
Publication date: 2021-08-09
Anticipated expiration: 2040-10-07

Abstract

The present invention relates to a meeting recording method. The method includes the steps of: obtaining audio data including a user's voice; converting the user's voice into text using the obtained audio data; extracting, from the text, a guide text corresponding to a predefined intent; searching for guide content corresponding to the guide text; and providing the retrieved guide content and the extracted text to the user's device. Accordingly, even if the meeting runs for a long time, the participants' burden of recording the meeting contents and writing minutes can be reduced.

Description

Conference Recording Apparatus and Conference Recording System

The present invention relates to a conference recording apparatus and a conference recording system.

In general, when a company or government agency holds a meeting, the meeting is audio-recorded and, at the same time, a designated recorder writes down what each speaker says as minutes. These minutes are used to understand the main points or the outcome of the meeting; the recorder transcribes every utterance of the participants as text and summarizes the meeting.

In addition, the recorder must collect the meeting materials distributed in print as well as those in document and video formats and enter them in the minutes. If no one is assigned to record the meeting, each participant must individually note the main points of the meeting along with the items that concern them.

Because this series of recording tasks is performed by people, participants' remarks can be missed while the meeting is being recorded, and productivity suffers until the minutes are completed.

The background art of the invention is described to facilitate understanding of the present invention. It should not be construed as an admission that the matters described in the background art exist as prior art.

Therefore, there is a need for a technology that can automatically and quickly record every utterance in a meeting and organize the meeting contents clearly.

Accordingly, the inventors of the present invention sought to develop a method of separating the utterances of multiple meeting participants and recording them as text, so that no utterance is omitted even when the meeting runs for a long time or some participants speak at the same time, as well as a server and a conference recording apparatus that perform the method.

As a result, the inventors of the present invention developed a method of identifying the gist of a meeting, that is, the intention behind holding it (the intent), from memos, documents, and some utterances produced by the meeting participants, and of providing related information that makes the meeting contents easier to understand based on the identified intent, as well as a server and a conference recording apparatus that perform the method.

Furthermore, the inventors of the present invention sought to provide, while a meeting proceeds according to a given intent, customized information that each participant needs to understand in detail during the meeting, tailored to the participant's job and qualifications.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the problems described above, a conference recording method according to an embodiment of the present invention is provided. The method includes: acquiring audio data containing a user's voice; converting the user's voice into text using the acquired audio data; extracting, from the text, guide text corresponding to a predefined intent; searching for guide content corresponding to the guide text; and providing the retrieved guide content and the extracted text to the user's device.
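As a non-limiting illustration, the method steps above might be sketched in Python as follows. The `GUIDE_DB` table, the `transcribe` stand-in, and every function name here are assumptions introduced for this example and do not appear in the specification; a real system would call a speech recognizer where `transcribe` simply decodes bytes.

```python
from dataclasses import dataclass

# Hypothetical guide database: intent -> {guide text -> guide content}.
GUIDE_DB = {
    "budget meeting guide": {
        "supplementary budget": "A budget drawn up to revise one already adopted.",
    },
}


@dataclass
class MeetingRecord:
    text: str             # full converted text
    guide_texts: list     # guide texts found in the converted text
    guide_contents: dict  # guide text -> retrieved guide content


def transcribe(audio_data: bytes) -> str:
    """Stand-in for speech-to-text: treats the bytes as UTF-8 text."""
    return audio_data.decode("utf-8")


def extract_guide_texts(text: str, intent: str) -> list:
    """Return the guide texts registered for the intent that occur in the text."""
    lowered = text.lower()
    return [guide for guide in GUIDE_DB.get(intent, {}) if guide in lowered]


def record_meeting(audio_data: bytes, intent: str) -> MeetingRecord:
    """Acquire -> convert -> extract guide text -> search guide content."""
    text = transcribe(audio_data)
    guide_texts = extract_guide_texts(text, intent)
    contents = {guide: GUIDE_DB[intent][guide] for guide in guide_texts}
    return MeetingRecord(text, guide_texts, contents)
```

The returned `MeetingRecord` corresponds to what would be provided to the user's device; keying the database by intent mirrors the feature in which guide content is searched in a database corresponding to the predefined intent.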

According to a feature of the present invention, the method may further include, before acquiring the audio data, receiving minutes written by the user from the user's device and extracting, from the received minutes, text for designating the intent.

According to another feature of the present invention, the method may further include, before acquiring the audio data, checking whether the intent is defined and, if the check shows that no intent is defined, converting the user's voice within a predetermined time from the beginning of the audio data into text and extracting, from the converted text, a word to be defined as the intent using any one of morphological analysis, named-entity analysis, and semantic role analysis.
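The fallback described above can be illustrated with a minimal sketch. It assumes the opening of the recording has already been converted into time-stamped text segments, and it substitutes a plain word-frequency count for the morphological, named-entity, or semantic role analysis named in the text; the stop-word list, the 60-second window, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would rely on linguistic analysis.
STOPWORDS = {"the", "a", "an", "we", "will", "to", "of", "and", "is",
             "today", "on", "for", "this", "in"}


def intent_candidate(segments, window_seconds=60.0):
    """Pick a candidate intent word from the opening of a meeting.

    segments: list of (start_time_in_seconds, text) pairs.
    Only segments starting within window_seconds are considered.
    Returns the most frequent non-stop-word, or None if there is none.
    """
    words = []
    for start, text in segments:
        if start <= window_seconds:
            words += [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(1)[0][0] if counts else None
```

Restricting the count to the opening window reflects the idea that the purpose of a meeting is usually stated at its start, so later digressions do not affect the chosen word.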

According to yet another feature of the present invention, converting the voice into text may further include separating the audio signals of the user and of users other than the user, using pre-stored voice information of the user or audio signals grouped by waveform.

According to yet another feature of the present invention, the method may further include, before acquiring the audio data, receiving the user's personal information for defining the intent, or the voice information.

According to yet another feature of the present invention, separating the audio signals may further include checking the number of microphone channels of the device that acquired the audio data and collecting the audio signals of two or more users for each identified channel.

According to yet another feature of the present invention, separating the audio signals may further include checking the number of microphone channels of the device that acquired the audio data and deleting, for each identified channel, the audio signals of other users while leaving the audio signal of the user.

According to yet another feature of the present invention, the audio signals of the other users may be collected from audio data acquired by another user's device adjacent to the user's device.

According to yet another feature of the present invention, providing to the user's device may further include displaying the separated audio signals of the user and the other user in a time-based graphical form.
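One possible reading of such a time-based display is a per-speaker timeline. The following sketch renders one as plain text; the segment format, the `render_timeline` name, and the character-cell layout are assumptions made for illustration and are not taken from the figures.

```python
def render_timeline(segments, total_seconds, width=20):
    """Render separated speech segments as one text row per speaker.

    segments: list of (speaker, start_seconds, end_seconds) tuples.
    Each row has `width` cells; '#' marks time in which the speaker talks.
    """
    rows = {}
    for speaker, start, end in segments:
        row = rows.setdefault(speaker, ["."] * width)
        first = int(start / total_seconds * width)
        # Always mark at least one cell so short utterances stay visible.
        last = max(first + 1, int(end / total_seconds * width))
        for i in range(first, min(last, width)):
            row[i] = "#"
    return "\n".join(f"{name}: {''.join(cells)}"
                     for name, cells in sorted(rows.items()))
```

For example, `render_timeline([("A", 0, 5), ("B", 5, 10)], total_seconds=10, width=10)` yields a row for speaker A marked in its first half and a row for speaker B marked in its second half.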

According to yet another feature of the present invention, searching for the guide content may be searching a database corresponding to the predefined intent for guide content corresponding to the guide text.

According to yet another feature of the present invention, the guide content may include at least one of a dictionary definition of the guide text and statutes, internal regulations, and guidelines related to the guide text.

According to yet another feature of the present invention, acquiring the audio data may be acquiring previously recorded audio data or audio data that is being recorded in real time.

According to yet another feature of the present invention, the method may further include, after providing to the user's device, generating minutes that summarize the audio data by combining any of the converted text, the guide text, and the guide content.

To solve the problems described above, a conference recording server according to another embodiment of the present invention is provided. The server includes a communication unit, a storage unit, and a processor operatively connected to the communication unit and the storage unit, wherein the processor is configured to: acquire audio data containing a user's voice; convert the user's voice into text using the acquired audio data; extract, from the text, guide text corresponding to a predefined intent; search for guide content corresponding to the guide text; and provide the retrieved guide content and the extracted text to the user's device.

To solve the problems described above, a conference recording method according to yet another embodiment of the present invention is provided. The method includes: acquiring audio data containing a user's voice from a multi-channel microphone; separating the acquired audio data into audio signals for each microphone channel; acquiring, using the separated audio signals, the audio signals of the user and of a conversation partner talking with the user; converting the user's voice into text using the acquired audio signals and extracting, from the converted text, guide text corresponding to a predefined intent; and transmitting at least one of the acquired audio data, the converted text, and the guide text to a conference recording server.

According to a feature of the present invention, the method may further include, before acquiring the audio data, storing the user's voice information and another user's voice information received from another conference recording apparatus adjacent to the conference recording apparatus.

According to another feature of the present invention, separating the audio signals may further include removing, from the audio signals separated by microphone channel, the audio signals of users other than the user according to the magnitude of the audio signals and the stored voice information of the user.

To solve the problems described above, a conference recording apparatus according to yet another embodiment of the present invention is provided. The apparatus includes an input unit, a storage unit, and a processor operatively connected to the input unit and the storage unit, wherein the processor is configured to: acquire audio data containing a user's voice from a multi-channel microphone; separate the acquired audio data into audio signals for each microphone channel; acquire, using the separated audio signals, the audio signals of the user and of a conversation partner talking with the user; convert the user's voice into text using the acquired audio signals and extract, from the converted text, guide text corresponding to a predefined intent; and transmit at least one of the acquired audio signals, the converted text, and the guide text to a conference recording server.

Specific details of other embodiments are included in the detailed description and the drawings.

The present invention can distinguish the voices of the participants in a meeting and automatically and quickly record all of their utterances. In particular, because the meeting contents are recorded automatically by the conference recording apparatus or the conference recording server that records the participants' voices, participants are relieved of the burden of recording the meeting and writing minutes even when the meeting runs for a long time.

In addition, the present invention identifies the intention behind the meeting (the intent) from content that the participants provide before the meeting begins or before the meeting contents are converted into text. It can thereby provide the participants, while the contents are being converted into text, with various content that helps them understand the meeting and communicate smoothly with one another.

For example, the present invention can provide meeting participants with dictionary definitions of certain words in the converted meeting text, as well as statutes, internal regulations, and guidelines related to the meeting contents, in various formats such as text, images, and video. It can also provide participants with checklists and document forms that can be consulted during a meeting for each intent.

Furthermore, the present invention can provide this content according to each participant's job and qualifications and can provide additional content based on keywords that each participant chooses, thereby raising each participant's understanding of the meeting and improving the quality of the meeting and of the minutes.

The effects of the present invention are not limited to those exemplified above, and further effects are included in the present invention.

FIGS. 1A and 1B are schematic diagrams of a conference recording system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a conference recording apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a conference recording server according to an embodiment of the present invention.
FIG. 4 is a schematic flowchart of a conference recording method performed by the conference recording apparatus according to an embodiment of the present invention.
FIG. 5 is a schematic flowchart of a conference recording method performed by the conference recording server according to an embodiment of the present invention.
FIGS. 6 and 7 are schematic diagrams illustrating an interface in which recorded meeting contents are transcribed according to an embodiment of the present invention.
FIGS. 8 and 9 are schematic diagrams illustrating, by way of example, an interface for providing meeting guide content according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims. In the description of the drawings, like reference numerals may be used for like components.

In this document, expressions such as "has," "may have," "includes," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.

In this document, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to any of the cases of (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

Expressions such as "first" and "second" used in this document may modify various components regardless of order and/or importance; they are used only to distinguish one component from another and do not limit the components. For example, a first user device and a second user device may denote different user devices regardless of order or importance. For example, without departing from the scope of rights described in this document, a first component may be renamed a second component, and similarly a second component may be renamed a first component.

When a component (e.g., a first component) is referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be connected to the other component directly or through yet another component (e.g., a third component). In contrast, when a component (e.g., a first component) is referred to as being "directly coupled" or "directly connected" to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between them.

The expression "configured to (or set to)" used in this document may be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of," depending on the situation. The term "configured to (or set to)" does not necessarily mean only "specifically designed to" in hardware. Instead, in some situations, the expression "a device configured to" may mean that the device is "capable of" operating together with other devices or parts. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing those operations, or a general-purpose processor (e.g., a CPU or an application processor) that can perform those operations by executing one or more software programs stored in a memory device.

The terms used in this document are used only to describe specific embodiments and may not be intended to limit the scope of other embodiments. A singular expression may include the plural unless the context clearly indicates otherwise. Terms used herein, including technical and scientific terms, may have the same meanings as commonly understood by a person of ordinary skill in the technical field described in this document. Among the terms used in this document, those defined in a general dictionary may be interpreted as having meanings identical or similar to their meanings in the context of the related art and, unless explicitly defined in this document, are not to be interpreted in an idealized or excessively formal sense. In some cases, even terms defined in this document cannot be interpreted to exclude the embodiments of this document.

The features of the various embodiments of the present invention may be partially or wholly coupled or combined with one another and, as those skilled in the art will fully understand, may be technically interlocked and operated in various ways. The embodiments may be practiced independently of one another or together in association.

For clarity of interpretation, the terms used in this specification are defined below.

FIGS. 1A and 1B are schematic diagrams of a conference recording system according to an embodiment of the present invention.

Referring to FIGS. 1A and 1B, the conference recording system 1000 may, depending on the makeup of the meeting participants, record a multi-party meeting as in FIG. 1A or a one-on-one meeting as in FIG. 1B. Specifically, in FIG. 1A, the conference recording system 1000 may include conference recording apparatuses 100-1, 100-2, 100-3, and 100-4 (hereinafter collectively denoted 100) provided to meeting participants A, B, C, and D; a conference recording server 200 that receives the audio data containing the users' voices acquired by the conference recording apparatuses 100 and converts it into text; and a display device 300 that outputs the converted text.

In FIG. 1B, the conference recording system 1000 may include a conference recording apparatus 100 placed between users E and F, who are talking with each other, and a conference recording server 200 that receives the audio data containing the users' voices acquired by the conference recording apparatus 100 and converts it into text.

The conference recording apparatus 100 acquires audio data containing a user's voice while a meeting is in progress and converts the user's voice into text. It may be any of various electronic devices capable of acquiring audio data, such as a smartphone, a tablet PC, a PC, or a microphone.

Depending on the embodiment, the conference recording apparatus 100 may be placed in front of or near a user to acquire audio data containing the user's voice and may separately acquire the audio signal of each of multiple users. For example, before the meeting begins, the conference recording apparatus 100 may acquire the audio signal of the user assigned to it and, on that basis, separately acquire the audio signals of the user, of the conversation partner talking with the user, or of users other than the user. As another example, the conference recording apparatus 100 may separately acquire the audio signals of two or more users by classifying and grouping audio signals according to the waveforms analyzed from audio data whose recording was completed when the meeting ended.

Specifically, the conference recording apparatus 100 may separately acquire the audio signal of each user by using a multi-channel microphone. In an embodiment of the present invention, the microphone with which the conference recording apparatus 100 acquires the user's voice may have multiple channels, and the conference recording apparatus 100 may separately acquire the users' audio signals by taking into account the differences in audio signal magnitude and in acquisition time that result from the direction in which each channel of the microphone is arranged.
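The two cues mentioned above, magnitude difference and acquisition time difference between channels, can be illustrated with a small sketch. It assumes two channels given as equal-rate sample lists and estimates the time difference with a brute-force cross-correlation; the function names and the `max_lag` search range are hypothetical, and the specification does not state which estimation technique the apparatus actually uses.

```python
def best_lag(ch_a, ch_b, max_lag):
    """Return the lag in samples by which ch_b trails ch_a.

    The lag is the shift that maximizes the cross-correlation of the two
    channels; a positive value means the sound reached channel A first.
    """
    def corr(lag):
        total = 0.0
        for i, a in enumerate(ch_a):
            j = i + lag
            if 0 <= j < len(ch_b):
                total += a * ch_b[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=corr)


def rms(channel):
    """Root-mean-square magnitude of a channel."""
    return (sum(x * x for x in channel) / len(channel)) ** 0.5


def compare_channels(ch_a, ch_b, max_lag=8):
    """Report which channel heard the sound louder and which heard it earlier."""
    lag = best_lag(ch_a, ch_b, max_lag)
    louder = "a" if rms(ch_a) >= rms(ch_b) else "b"
    earlier = "a" if lag > 0 else "b" if lag < 0 else louder
    return {"lag": lag, "louder": louder, "earlier": earlier}
```

A speaker sitting closer to the microphone of channel A would typically be both louder and earlier on that channel, so agreement of the two cues gives a simple basis for attributing a stretch of audio to that speaker.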

The conference recording apparatus 100 may also acquire another user's voice information, that is, an audio signal, from another conference recording apparatus 100 adjacent to it. In this case, the conference recording apparatus 100 may recognize or receive, together with the other user's voice information, information about the direction in which the other conference recording apparatus 100 is placed, and thereby attribute the audio signals acquired on each channel to individual users.

The conference recording apparatus 100 may distinguish the voice of one designated user, or distinguish the voices of multiple users, and convert them into text. When the conference recording apparatus 100 is to distinguish the voice of one designated user, it may remove the audio signals of users other than that user according to the user's pre-stored voice information (audio signal).
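As a hedged sketch of removing other users' signals on the basis of stored voice information, the following example compares each audio segment with an enrolled voiceprint and keeps only close matches. The feature vectors, the cosine-similarity measure, the 0.9 threshold, and the function names are assumptions made for illustration; the specification does not say how the stored voice information is compared.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keep_designated_user(segments, voiceprint, threshold=0.9):
    """Keep only segments whose features match the enrolled voiceprint.

    segments: list of dicts with a "features" vector (hypothetical format).
    Segments below the similarity threshold are treated as other users'
    speech and dropped.
    """
    return [s for s in segments
            if cosine(s["features"], voiceprint) >= threshold]
```

In practice the threshold trades off missed utterances of the designated user against leakage of other speakers, so it would need tuning on real recordings.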

According to an embodiment, the conference recording apparatus 100 may convert a user's voice into text and extract, from the converted text, guide text corresponding to a predefined intent. Here, the intent means the topic of a meeting that is to be recorded or whose recording has been completed, and the guide text means reference text used to provide the user with the information needed for that intent. For example, the intent may include “verbal abuse counseling guide”, “guide to investigating molestation in public places”, and “budget meeting guide”, and the guide text may include words and phrases, such as “how to ask for help” and “supplementary budget”, that are not defined among the participants during the meeting but need a supplementary explanation to be understood.

That is, along with acquiring the audio signal, the conference recording apparatus 100 may convert the user's voice into text and extract the guide text from it, and may transmit at least one of these three pieces of information (audio signal, text, guide text) to the conference recording server 200.

In addition, when the conference recording apparatus 100 is implemented as an electronic device that includes a visual output device, such as a smartphone, a tablet PC, or a PC, the conference recording apparatus 100 may output the users' spoken conversation converted into text together with the guide text. Conversely, when it does not include an output device, the conversation and the guide text may be output through a separate display device 300, as shown in FIG. 1A.

Meanwhile, although FIG. 1A shows a single display device 300, a display device 300 may be provided for each of the meeting participants (A), (B), (C), and (D), and the conference recording server 200 may provide different guide content according to the job and position of each participant.

The conference recording server 200 is a server that records, as text, meeting contents stored in the form of audio data, and may include various electronic devices such as a PC, a tablet PC, and a data server. According to an embodiment, the conference recording server 200 may provide a web or mobile application, program, widget, or the like for recording meetings and transcribing the recordings, which can be installed and executed on the conference recording apparatus 100.

According to an embodiment, the conference recording server 200 may receive, from the conference recording apparatus 100, the audio data containing the user's voice, the converted text, and the guide text, and may retrieve and provide guide content corresponding to the guide text. Here, guide content is a collective term for the information to be provided to the user according to the guide text. It includes the dictionary definition of the guide text and the laws, internal regulations, and guidelines related to the guide text, and may be provided in various formats such as text, images, and video.

In addition, the guide content is content retrieved from a database corresponding to the predefined intent. The conference recording server 200 may additionally check personal information of the user to whom the guide content will be provided, such as job, qualification, gender, and age, and, based on this, quickly retrieve guide content suited to that user.
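One way to picture this profile-aware retrieval is the sketch below. It assumes an in-memory dictionary keyed by intent and a single profile attribute (job role). The `GuideContent` class, the `GUIDE_DB` contents, and `find_guide_content` are hypothetical names introduced only for illustration, and the patent does not specify how the database or the matching is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class GuideContent:
    guide_text: str   # reference text this content explains
    body: str         # definition, regulation excerpt, guideline, etc.
    roles: set = field(default_factory=set)  # empty set = suitable for any role


# Hypothetical per-intent database (assumption, not from the patent).
GUIDE_DB = {
    "budget meeting": [
        GuideContent("supplementary budget",
                     "General definition of a supplementary budget."),
        GuideContent("supplementary budget",
                     "Accounting procedure for supplementary budgets.",
                     {"accounting"}),
    ],
}


def find_guide_content(intent, guide_text, role):
    """Return role-specific content if any exists, otherwise generic content."""
    matches = [c for c in GUIDE_DB.get(intent, []) if c.guide_text == guide_text]
    specific = [c for c in matches if role in c.roles]
    return specific or [c for c in matches if not c.roles]
```

Restricting the search to the database for one intent keeps the candidate set small, which is one plausible reading of how the profile check helps retrieval stay fast.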

다른 실시 예에 따라, 회의 녹음 기록 서버(200)는 회의 녹음 장치(100)로부터 사용자의 음성이 포함된 오디오 데이터만을 수신하고, 이를 텍스트 변환한 후, 가이드용 텍스트를 추출할 수 있다. According to another embodiment, the conference recording server 200 may receive only audio data including the user's voice from the conference recording apparatus 100 , convert it into text, and then extract the guide text.

Meanwhile, the conference recording server 200 may complete text conversion and guide content provision for a previously recorded meeting or for a meeting whose recording has just been completed in real time, and may generate minutes summarizing it. Specifically, the conference recording server 200 may generate minutes summarizing the recording (audio data) by combining information from any of the extracted text, the guide text, and the guide content. Here, the minutes may consist of a combination of various forms such as text, images, and video.
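As a rough illustration of combining these pieces into one record, the sketch below assembles plain-text minutes from speaker-labelled utterances and a mapping of guide text to guide content. It only concatenates the inputs and does not perform any summarization; the function name `build_minutes` and the output layout are assumptions made for this example.

```python
def build_minutes(intent, utterances, guide_contents):
    """Assemble plain-text minutes.

    utterances: list of (speaker, text) pairs in spoken order.
    guide_contents: dict mapping guide text to its explanatory content.
    """
    lines = [f"Minutes: {intent}", ""]
    for speaker, text in utterances:
        lines.append(f"{speaker}: {text}")
    if guide_contents:
        lines += ["", "Guide content:"]
        lines += [f"- {key}: {value}" for key, value in guide_contents.items()]
    return "\n".join(lines)


minutes = build_minutes(
    "budget meeting",
    [("A", "Let us review the supplementary budget.")],
    {"supplementary budget": "A budget added after the main budget is set."},
)
```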

지금까지 본 발명의 일 실시 예에 따른 회의 녹음 기록 시스템(1000)에 대하여 설명하였다. 본 발명에 따르면, 회의 녹음 기록 시스템(1000)을 통해 회의에 참석한 사용자들은 회의 내용을 직접 작성하는 수고를 들이지 않고, 편리하게 회의 내용을 기록할 수 있다. 또한, 회의록을 보게 될 사용자의 특성 별로 적합한 콘텐츠를 제공하여, 회의 내용의 이해를 도울 수 있다. So far, the conference recording and recording system 1000 according to an embodiment of the present invention has been described. According to the present invention, users attending a conference through the conference recording and recording system 1000 can conveniently record the conference contents without the effort of directly writing the conference contents. In addition, content suitable for each characteristic of the user who will view the minutes may be provided to help the understanding of the contents of the meeting.

이하에서는, 회의장에 배치되어 다자 간의 회의 내용을 녹음하는 회의 녹음 장치(100)에 대하여 구체적으로 설명하도록 한다. Hereinafter, the conference recording apparatus 100 arranged in a conference hall to record the contents of a multi-party conference will be described in detail.

FIG. 2 is a block diagram illustrating the configuration of a conference recording apparatus according to an embodiment of the present invention.

도 2를 참조하면, 회의 녹음 장치(100)는 입력부(110), 저장부(120), 통신부(130) 및 프로세서(140)를 포함할 수 있다. Referring to FIG. 2 , the conference recording apparatus 100 may include an input unit 110 , a storage unit 120 , a communication unit 130 , and a processor 140 .

입력부(110)는 사용자의 음성에 대한 오디오 데이터를 획득할 수 있다. 실시 예에 따라, 입력부(110)는 다채널의 마이크를 통해 사용자의 음성을 입력 받을 수 있으며, 둘 이상의 사용자 음성을 입력 받을 수도 있다. 입력부(110)가 획득한 오디오 데이터는 프로세서(140)에 의해 둘 이상의 오디오 신호로 분리될 수 있다. The input unit 110 may obtain audio data for the user's voice. According to an embodiment, the input unit 110 may receive a user's voice through a multi-channel microphone, or may receive two or more user's voices. The audio data obtained by the input unit 110 may be divided into two or more audio signals by the processor 140 .

저장부(120)는 회의 녹음 장치(100)를 소지한 사용자의 음성 정보(오디오 신호)를 저장할 수 있다. 구체적으로, 회의 녹음 장치(100)가 지정된 사용자만이 사용하는 장치일 경우, 저장부(120)는 사용자의 음성 정보와 사용자 개인 정보(예. 사용자의 직무, 자격, 성별, 연령)를 저장할 수 있으며, 사용자의 개인 정보는 가이드용 텍스트를 추출하기 위해 활용될 수 있다. The storage unit 120 may store voice information (audio signal) of a user who possesses the conference recording apparatus 100 . Specifically, when the conference recording device 100 is a device used only by a designated user, the storage unit 120 may store the user's voice information and user personal information (eg, the user's job, qualification, gender, age). In addition, the user's personal information may be used to extract the guide text.

In addition, the storage unit 120 may store candidate reference texts used to extract the guide text from the text converted by the processor 140. Here, the intent means the topic of a meeting that is to be recorded or whose recording has been completed, and the guide text means reference text used to provide the user with the information needed for that intent. For example, the intent may include “verbal abuse counseling guide”, “guide to investigating molestation in public places”, and “budget meeting guide”, and the guide text may include words and phrases, such as “how to ask for help” and “supplementary budget”, that are not defined among the participants during the meeting but need a supplementary explanation to be understood.

That is, because the storage unit 220 of the conference recording server 200 holds far more information than the storage unit 120 of the conference recording apparatus 100, once candidate reference texts have been extracted based on the information stored in the storage unit 120, the conference recording server 200 may then identify, among those candidates, the texts that will actually be used as reference texts for providing guide content.

In various embodiments, the storage unit 120 may include a volatile or non-volatile recording medium capable of storing various data, commands, and information. For example, the storage unit 120 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (SD, XD memory, etc.), RAM, SRAM, ROM, EEPROM, PROM, network storage, cloud, and a blockchain database.

또한, 저장부(120)는 회의 녹음 장치(100)의 동작을 위한 명령어들이 기록되어 있을 수 있다. 다양한 실시 예에서, 저장부(120)는 오디오 데이터로 구현된 회의 내용을 텍스트로 변환하고, 가이드용 텍스트를 추출하기 위한 어플리케이션(미도시)이 기록되어 있을 수 있다.Also, the storage unit 120 may record commands for the operation of the conference recording apparatus 100 . In various embodiments, an application (not shown) for converting meeting contents implemented as audio data into text and extracting text for a guide may be recorded in the storage unit 120 .

통신부(130)는 유/무선 네트워크를 통해 회의 녹음 기록 서버(200)와 데이터를 주고 받을 수 있다. 예를 들어, 통신부(130)는 오디오 데이터, 변환된 텍스트 및 추출된 가이드용 텍스트를 회의 녹음 기록 서버(200)로 송신할 수 있으며, 시각적인 출력 장치의 존재 여부에 따라, 회의 녹음 기록 서버(200)로부터 가이드용 텍스트에 대응되는 가이드 콘텐츠를 수신할 수 있다. The communication unit 130 may send and receive data to and from the conference recording server 200 through a wired/wireless network. For example, the communication unit 130 may transmit the audio data, the converted text, and the extracted guide text to the conference recording server 200, and according to the presence of a visual output device, the conference recording recording server ( 200) may receive guide content corresponding to the guide text.

In addition, the communication unit 130 may receive another user's voice information from another conference recording apparatus 100 adjacent to the conference recording apparatus 100 and store it in the storage unit 120. For example, the communication unit 130 may receive, over a short-range wireless network, the identification number, user information, and user voice information (audio signal) of the adjacent conference recording apparatus 100.

The processor 140 is operatively connected to the input unit 110, the storage unit 120, and the communication unit 130 and may control the overall operation of the conference recording apparatus 100. By running an application or program stored in the storage unit 120, it may execute various commands for converting the meeting contents into text and extracting the guide text.

프로세서(140)는 CPU(Central Processing Unit)나 AP(Application Processor)와 같은 연산 장치에 해당할 수 있다. 또한, 프로세서(140)는 다양한 연산 장치가 통합된 SoC(System on Chip)와 같은 통합 칩(Integrated Chip (IC))의 형태로 구현될 수 있다.The processor 140 may correspond to a computing device such as a central processing unit (CPU) or an application processor (AP). In addition, the processor 140 may be implemented in the form of an integrated chip (IC) such as a system on chip (SoC) in which various computing devices are integrated.

According to an embodiment, the processor 140 may separate the audio data obtained through the input unit 110 into per-channel audio signals, and use the separated signals to obtain the audio signals of the user and of the conversation partner talking with the user. For example, before the meeting begins, the processor 140 may obtain the audio signal of the user designated for the conference recording apparatus 100 and, based on it, separately obtain the audio signals of the user and of the conversation partner. As another example, the processor 140 may separately obtain the audio signals of two or more users by classifying and grouping audio signals according to the waveforms analyzed in audio data whose recording was completed when the meeting ended.

구체적으로, 프로세서(140)는 다채널 마이크의 배치 방향에 따른 오디오 신호의 크기 차이, 획득 시간 차이를 고려하여 여러 사용자들의 오디오 신호를 분리할 수 있으며, 특정 사용자에 대한 오디오 신호를 저장해 둔 경우, 오디오 신호 별로 사용자를 구분할 수 있다. Specifically, the processor 140 may separate the audio signals of multiple users in consideration of the difference in the size of the audio signal and the difference in acquisition time according to the arrangement direction of the multi-channel microphone, and if the audio signal for a specific user is stored, Users can be classified for each audio signal.

That is, the processor 140 may distinguish the voice of one designated user or the voices of multiple users and convert them into text. At the same time, the processor 140 may extract, from the converted text, the guide text corresponding to the predefined intent. Here, extraction may be understood as highlighting or tagging the guide text within the text-converted conversation between the user and the conversation partner.
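Extraction in the sense of tagging can be sketched as follows. The example assumes the candidate reference texts are plain phrases and marks each occurrence with `[[...]]`; the marker syntax and the function name `tag_guide_text` are hypothetical choices for illustration, not part of the patent.

```python
import re


def tag_guide_text(transcript, candidates):
    """Wrap each candidate phrase found in the transcript in [[...]] markers."""
    # Longest phrases first so a longer phrase wins over a phrase it contains.
    ordered = sorted(candidates, key=len, reverse=True)
    if not ordered:
        return transcript
    pattern = re.compile("|".join(re.escape(p) for p in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", transcript)


tagged = tag_guide_text(
    "We need to review the supplementary budget today.",
    ["supplementary budget", "budget"],
)
```

Tagging in place keeps the full conversation intact, so a display can highlight the guide text while still showing its surrounding context.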

지금까지 본 발명의 일 실시 예에 따른 회의 녹음 장치에 대하여 설명하였다. 본 발명에 따르면, 회의 녹음 장치(100)는 사람의 수기 입력 대신, 다수의 사용자들의 음성을 분리하고, 텍스트로 변환할 수 있어, 회의 과정에서 회의 내용을 기록하기 위한 번거로움을 줄이고, 회의 내용을 빠짐없이 기록으로 남길 수 있다. So far, a conference recording apparatus according to an embodiment of the present invention has been described. According to the present invention, the conference recording apparatus 100 can separate the voices of a plurality of users and convert them into text instead of human handwriting input, thereby reducing the hassle of recording the conference contents in the conference process, and the conference contents can be left on the record without omission.

이하에서는, 회의 내용을 기록하고, 회의 참석자에게 필요한 정보를 제공하는 회의 녹음 기록 서버(200)에 대하여 구체적으로 설명하도록 한다.Hereinafter, the conference recording server 200 that records the conference contents and provides necessary information to conference participants will be described in detail.

FIG. 3 is a block diagram illustrating the configuration of a conference recording server according to an embodiment of the present invention.

도 3을 참조하면, 회의 녹음 기록 서버(200)는 통신부(210), 저장부(220), 출력부(230) 및 프로세서(240)를 포함할 수 있다. Referring to FIG. 3 , the conference recording server 200 may include a communication unit 210 , a storage unit 220 , an output unit 230 , and a processor 240 .

The communication unit 210 may exchange data with the user's device through a wired/wireless network. For example, when the user's device is a microphone-type conference recording apparatus 100 that is installed in a conference room and does not include a visual output device, the communication unit 210 may receive, from that conference recording apparatus 100, audio data containing the user's voice or text converted from the user's voice, and may transmit the guide content obtained by analyzing the audio data to the display device 300 that is installed in the conference room and connected to the conference recording apparatus 100.

다른 예를 들어, 사용자의 디바이스가 시각적인 출력 장치를 포함하는 회의 녹음 장치(100)인 경우, 통신부(210)는 해당 회의 녹음 장치(100)로부터 사용자의 음성이 포함된 오디오 데이터 또는 사용자의 음성이 변환된 텍스트를 수신하고, 해당 회의 녹음 장치(100)로 가이드 콘텐츠를 송신할 수 있다. For another example, when the user's device is the conference recording apparatus 100 including a visual output device, the communication unit 210 may receive audio data including the user's voice or the user's voice from the corresponding conference recording apparatus 100 . The converted text may be received, and guide content may be transmitted to the corresponding conference recording apparatus 100 .

한편, 통신부(210)가 수신하는 오디오 데이터는 이미 녹음된 오디오 데이터이거나, 실시간으로 녹음 중인 오디오 데이터일 수 있다. Meanwhile, the audio data received by the communication unit 210 may be already recorded audio data or audio data being recorded in real time.

통신부(210)는 사용자의 디바이스로부터 사용자에 의해 작성된 회의록, 사용자에 의해 입력된 사용자 개인 정보 및 음성 정보를 수신할 수 있다. 여기서, 회의록은 회의가 진행되는 동안 사용자가 작성한 회의 관련 메모, 회의 진행에 사용된 문서일 수 있으며, 이는 회의 녹음 기록 서버(200)가 회의의 주제, 즉 인텐트를 지정하기 위해 활용될 수 있다. 아울러, 사용자 개인 정보는 사용자 별로 적합한 가이드 콘텐츠를 제공받기 위한 정보로서, 사용자의 직무, 자격, 성별, 연령 등을 포함하고, 음성 정보는 사용자의 목소리에 대한 오디오 신호로서, 회의 녹음 기록 서버(200)가 사용자 외 다른 사용자에 대한 오디오 신호를 분리하기 위해 사용될 수 있다. The communication unit 210 may receive the meeting minutes written by the user, user personal information and voice information input by the user from the user's device. Here, the meeting minutes may be a meeting-related memo written by the user during the meeting, or a document used for the meeting, which may be utilized by the meeting recording server 200 to designate the subject of the meeting, that is, the intent. . In addition, the user personal information is information for receiving suitable guide content for each user, and includes the user's job, qualification, gender, age, etc., and the voice information is an audio signal for the user's voice, and the conference recording server 200 ) can be used to separate audio signals for users other than the user.

또한, 통신부(210)는 오디오 신호를 분리하기 위해, 가이드 콘텐츠를 제공하는 대상이 되는 사용자 외에 오디오 데이터에 포함된 다른 사용자의 오디오 신호를 추가로 수신할 수 있다. 예를 들어, 통신부(210)는 회의실에 배치된 복수의 회의 녹음 장치(100)로부터 사용자들의 목소리에 대한 오디오 신호를 수신할 수 있다. Also, in order to separate the audio signal, the communication unit 210 may additionally receive an audio signal of another user included in the audio data in addition to the user who is a target of providing the guide content. For example, the communication unit 210 may receive audio signals for the voices of users from a plurality of conference recording apparatuses 100 arranged in a conference room.

The storage unit 220 may store various guide content that can be provided to the user for each intent. To this end, the storage unit 220 may include multiple databases corresponding to predefined intents. Here, the intent means the topic of a meeting that is being recorded or whose recording has been completed. For example, the intents may vary with the type of company and include “ㅇㅇ standard statute”, “ㅇㅇ internal company regulations”, “cultural content production”, “exhibition planning”, and “regular sale planning”.

Moreover, the storage unit 220 may store, for each intent, the items the user needs to check, that is, checklist information, and the document forms needed to conduct a meeting for that intent. For example, when the intent is a performance production meeting, the checklist may include the date and time, the venue, the upcoming schedule, and the like. Depending on the type of intent, the checklist may be selectively provided together with the guide content provided by the processor 240, and according to an embodiment, checklist items may be added or deleted by the user.

아울러, 저장부(220)는 인텐트 별로 인텐트와 유사한 단어, 즉 유의어를 저장할 수 있으며, 유의어는 가이드 콘텐츠를 검색하는 과정에서 데이터베이스를 확장하기 위해 사용될 수 있다. In addition, the storage 220 may store words similar to the intent, ie, synonyms for each intent, and the synonyms may be used to expand the database in the process of searching for guide content.
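The synonym-based expansion might look like the sketch below, which looks up guide content under the intent itself and under each stored synonym. The `SYNONYMS` and `GUIDE_DB` tables and both function names are hypothetical, and the in-memory dictionaries stand in for whatever database the storage unit 220 actually uses.

```python
# Hypothetical data: synonyms per intent and guide content per database key.
SYNONYMS = {"budget meeting": ["budget review", "budget planning"]}
GUIDE_DB = {
    "budget meeting": ["Definition of a supplementary budget"],
    "budget review": ["Internal rule on budget approval"],
}


def expand_intent(intent):
    """Return the intent followed by its stored synonyms, without duplicates."""
    terms = [intent] + SYNONYMS.get(intent, [])
    return list(dict.fromkeys(terms))


def search_guide_content(intent):
    """Collect guide content from every database keyed by the intent or a synonym."""
    results = []
    for term in expand_intent(intent):
        results.extend(GUIDE_DB.get(term, []))
    return results
```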

실시 예에 따라, 회의 녹음 장치(100)가 어느 한 명의 사용자에게 할당된 장치인 경우, 저장부(220)는 해당 회의 녹음 장치(100)와 사용자 정보를 매칭시켜 저장할 수 있다. 구체적으로, 사용자 정보는, 사용자 개인 정보와 음성 정보를 포함할 수 있다. According to an embodiment, when the conference recording apparatus 100 is a device assigned to one user, the storage 220 may match and store the corresponding conference recording apparatus 100 and user information. Specifically, the user information may include user personal information and voice information.

In various embodiments, the storage unit 220 may include a volatile or non-volatile recording medium capable of storing various data, commands, and information. For example, the storage unit 220 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (SD, XD memory, etc.), RAM, SRAM, ROM, EEPROM, PROM, network storage, cloud, and a blockchain database.

또한, 저장부(220)는 회의 녹음 기록 서버(200)의 동작을 위한 명령어들이 기록되어 있을 수 있다. 다양한 실시 예에서, 저장부(220)는 오디오 데이터로 구현된 회의 내용을 텍스트로 변환하고, 회의 참석자 별 가이드 콘텐츠를 제공하기 위한 어플리케이션(미도시)이 기록되어 있을 수 있다.Also, the storage 220 may record commands for the operation of the conference recording server 200 . In various embodiments, an application (not shown) for converting meeting contents implemented as audio data into text and providing guide contents for each meeting participant may be recorded in the storage 220 .

출력부(230)는 사용자의 음성을 텍스트로 변환하여 표시할 수 있다. 구체적으로, 출력부(230)는 오디오 데이터의 재생 그래픽을 표시하고, 이와 동시에 실시간으로 변환된 텍스트를 사용자 별로 구분하여 표시할 수 있다. 예를 들어, 프로세서(240)에 의해 분리된 사용자와 다른 사용자의 오디오 신호를 시간에 따른 그래픽 형태로 표시할 수 있으며, 이외에도 프로세서(240)에 의해 추출된 가이드용 텍스트를 별도의 강조 표시로, 가이드용 텍스트에 대응되는 가이드 콘텐츠를 함께 표시할 수 있다. The output unit 230 may convert the user's voice into text and display it. Specifically, the output unit 230 may display the reproduction graphic of the audio data, and at the same time, the real-time converted text may be displayed separately for each user. For example, the audio signal of the user and other users separated by the processor 240 can be displayed in graphic form according to time, and in addition, the text for the guide extracted by the processor 240 is displayed as a separate highlight, Guide content corresponding to the guide text may be displayed together.

실시 예에 따라, 사용자는 사용자의 디바이스(100) 또는 회의 녹음 기록 서버(200)와 직접 컨택하여, 출력부(230)에 표시된 오디오 신호 그래픽을 이동시키면서, 사용자의 음성과 변환된 텍스트를 확인할 수 있으며, 텍스트 변환에 오류가 있는 부분을 수정할 수 있다. According to an embodiment, the user can directly contact the user's device 100 or the conference recording server 200 to check the user's voice and the converted text while moving the audio signal graphic displayed on the output unit 230. And it is possible to correct the part where there is an error in text conversion.

프로세서(240)는 통신부(210), 저장부(220) 및 출력부(230)와 동작 가능하게 연결되어, 회의 녹음 기록 서버(200)의 전반적인 동작을 제어할 수 있으며, 저장부(220)에 저장된 어플리케이션 또는 프로그램을 구동하여, 회의 내용을 텍스트 변환하고, 가이드 콘텐츠를 제공하기 위한 다양한 명령들을 수행할 수 있다. The processor 240 is operatively connected to the communication unit 210 , the storage unit 220 , and the output unit 230 to control the overall operation of the conference recording server 200 , and to the storage unit 220 . By driving a stored application or program, various commands for converting meeting contents into text and providing guide contents may be performed.

프로세서(240)는 CPU(Central Processing Unit)나 AP(Application Processor)와 같은 연산 장치에 해당할 수 있다. 또한, 프로세서(240)는 다양한 연산 장치가 통합된 SoC(System on Chip)와 같은 통합 칩(Integrated Chip (IC))의 형태로 구현될 수 있다.The processor 240 may correspond to a computing device such as a central processing unit (CPU) or an application processor (AP). In addition, the processor 240 may be implemented in the form of an integrated chip (IC) such as a system on chip (SoC) in which various computing devices are integrated.

The processor 240 may convert the user's voice into text, and to this end may separate the audio signals of multiple users from a single piece of audio data. For example, with the voice information of the user who is to receive the guide content stored in advance, the processor 240 may separate the audio signals of that user and of other users. Specifically, the processor 240 may identify the multiple microphone channels of the user's device (conference recording apparatus 100) that acquired the audio data, and compare and analyze the audio signals channel by channel to separate the audio signals of the user and of other users. That is, because the audio data consists of audio signals acquired from multiple microphone channels, the processor 240 can separate the audio signals of multiple users by comparing the level and acquisition time of the audio signal on each channel. In addition, the processor 240 may convert into text the voice of only one designated user, in which case it may delete the audio signals of other users, keeping only the signal that matches the user's pre-stored audio signal.
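The acquisition-time comparison between channels can be illustrated by estimating the sample lag at which one channel best matches another. The sketch below uses a brute-force cross-correlation, which is a technique chosen here for illustration and is not named in the patent; `best_lag` and the toy signals are hypothetical.

```python
def best_lag(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means `other` received the sound later than `ref`.
    """
    def corr(lag):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=corr)


# Toy input (assumption): the second channel hears the same pulse
# three samples later and at 60% of the level.
pulse = [0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0] + [0.6 * s for s in pulse[:5]]
lag = best_lag(pulse, delayed, max_lag=4)
```

Here the estimated lag is 3 samples. A channel that consistently hears a speaker earlier and louder than the others is a reasonable indication that the speaker sits in that channel's direction.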

Moreover, when no user voice information has been stored in advance, the processor 240 may separate the audio signals of two or more users by classifying and grouping audio signals according to the waveforms analyzed in the fully recorded audio data.
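A minimal sketch of such grouping is shown below. It assumes each audio segment has already been reduced to a small feature vector describing its waveform and groups segments greedily by cosine similarity. The feature extraction step, the 0.9 threshold, and the names `cosine` and `group_segments` are all assumptions for illustration; the patent does not state which grouping method is used.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_segments(features, threshold=0.9):
    """Assign each segment to the first group whose representative is similar enough.

    Returns one group index per segment; a new group is opened when no
    existing representative reaches the threshold.
    """
    representatives = []
    labels = []
    for vec in features:
        for gid, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(gid)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


# Toy feature vectors (assumption): segments 0 and 2 resemble each other,
# as do segments 1 and 3.
features = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
labels = group_segments(features)
```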

Meanwhile, so that the processor 240 can separate the users' audio signals more easily, audio data may also be obtained from another user's device (another conference recording apparatus 100) adjacent to the user's device (conference recording apparatus 100) that acquired the audio data, together with information on the direction in which the other user's device is arranged. That is, the processor 240 may check the level and acquisition time of the audio signal on each microphone channel arranged in a different direction, and separate the audio signals by designated user accordingly.

According to an embodiment, the processor 240 may convert the user's voice into text using the obtained audio data and extract the guide text corresponding to the predefined intent. The guide text means reference text used to provide the user with the information needed for the intent. For example, the intent may include “verbal abuse counseling guide”, “guide to investigating molestation in public places”, and “budget meeting guide”, and the guide text may include words and phrases, such as “how to ask for help” and “supplementary budget”, that are not defined among the participants during the meeting but need a supplementary explanation to be understood.

한편, 프로세서(240)는 가이드용 텍스트를 추출하기 위해, 인텐트가 정의되어 있는지 확인할 수 있다. 여기서, 미리 정의된 인텐트는 사용자에 의해 입력 받거나, 프로세서(240)가 사용자가 업로드한 회의록을 통해 지정될 수 있다. 프로세서(240)는 사용자가 입력하거나 업로드한 파일이 존재하는지 확인하고, 가이드용 텍스트 추출 전에 인텐트를 지정할 수 있다. Meanwhile, the processor 240 may check whether an intent is defined in order to extract the guide text. Here, the predefined intent may be input by the user or may be designated by the processor 240 through meeting minutes uploaded by the user. The processor 240 may check whether a file input or uploaded by the user exists, and may designate an intent before extracting text for a guide.

For example, the processor 240 may designate the intent through minutes uploaded by the user. Specifically, the processor 240 may extract, from the minutes written by the user, text for designating the intent, and the audio data and minutes of previous meetings stored in the storage unit 220 may be used for this extraction.

As another example, when no intent has been designated because neither user-entered data nor meeting minutes exist, the processor 240 may designate the intent using the audio data. Specifically, the processor 240 may designate the intent using the user's voice in the opening portion of the audio data. More specifically, the processor 240 may convert the user's voice within a predetermined time (e.g., within 1 minute) into text and analyze the converted text to extract a word to be defined as the intent. That is, because the opening of a meeting contains utterances related to its topic, such as “We will start the meeting on Agenda A”, “We will start the investigation into Case B”, or “In relation to the marketing of product C...”, the processor 240 may convert the user's voice within the predetermined time into text and extract, through various analysis methods, a word to be defined as the intent. For example, the processor 240 may extract the word using morpheme analysis, named-entity analysis, and semantic role analysis.
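As an illustrative, non-limiting sketch, the designation of an intent from the opening of a recording could look as follows. The `Segment` type, the `infer_intent` function, and the regular-expression patterns are hypothetical names introduced here for illustration; the regular expressions merely stand in for the morpheme, named-entity, and semantic role analysis mentioned above, and the speech-to-text step is assumed to have already produced timestamped segments.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start_sec: float  # offset of the utterance from the start of the recording
    text: str         # speech-to-text output for that utterance


# Hypothetical opening patterns; a real system would rely on linguistic
# analysis rather than fixed regular expressions.
OPENING_PATTERNS = [
    re.compile(r"start the meeting on (?P<topic>.+)", re.IGNORECASE),
    re.compile(r"start the investigation (?:on|into) (?P<topic>.+)", re.IGNORECASE),
    re.compile(r"in relation to (?P<topic>.+)", re.IGNORECASE),
]


def infer_intent(segments, window_sec=60.0):
    """Return a topic phrase found within the first `window_sec` seconds, or None."""
    for seg in segments:
        if seg.start_sec > window_sec:
            continue  # skip rather than break: segments are not assumed sorted
        for pattern in OPENING_PATTERNS:
            match = pattern.search(seg.text)
            if match:
                return match.group("topic").strip(" .~")
    return None
```

Utterances that begin after the window are ignored, so a topic mentioned late in the meeting does not override the absence of an opening statement.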

Meanwhile, once an intent is designated, the processor 240 may provide the user's device with the document forms needed to conduct the meeting for that intent, before searching for and providing guide content. For example, when the user's device is an electronic device that includes an output device, the processor 240 may control the communication unit 210 to transmit documents related to the intent (a quotation, a written pledge, etc.).

The processor 240 may search for guide content corresponding to the guide text and provide it to the user's device together with the text converted in real time. Specifically, the processor 240 may search a database corresponding to the predefined intent for guide content corresponding to the guide text. For example, the processor 240 may project the guide text into the vector space of a neural-network-based language model corresponding to a specific intent category, calculate the cosine similarity between the guide text and each content item, and determine that content whose similarity is greater than or equal to a specified value is guide content to be provided to the user.
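The similarity-based selection described above can be sketched as follows, by way of example only. The sketch assumes that the guide text and each candidate content item have already been embedded as vectors by some language model; the function names, the content identifiers, and the default threshold of 0.8 are hypothetical and are not specified by this description.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_guide_content(guide_vec, candidates, threshold=0.8):
    """Return the ids of candidates scoring at or above `threshold`, best first.

    `candidates` is a list of (content_id, vector) pairs taken from the
    database that corresponds to the designated intent.
    """
    scored = [(cosine_similarity(guide_vec, vec), cid) for cid, vec in candidates]
    return [cid for score, cid in sorted(scored, reverse=True) if score >= threshold]
```

Restricting `candidates` to the database of the designated intent keeps the comparison set small, which is one way the intent can narrow the search.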

In addition, the guide content searched for by the processor 240 is a collective term for the information to be provided to the user according to the guide text. It includes dictionary definitions of the guide text as well as laws, internal regulations, and guidelines related to the guide text, and it may be provided in various formats such as text, images, and video.

Here, the processor 240 may check personal information of the user who is to receive the guide content, such as job, qualifications, gender, and age, and use it to quickly find guide content suitable for that user.

Depending on the embodiment, the processor 240 may search for and provide, along with the guide content, the items to be confirmed in the meeting according to the previously designated intent. For example, when the intent is a regular sale planning meeting, the checklist may include the date and time, the venue, and the sale prices.

After converting all audio signals in the audio data into text, the processor 240 may generate meeting minutes summarizing the audio data by combining information from any one of the text, the guide text, and the guide content. The minutes may include the text and the highlighted guide text together with a graphic for playing back the audio data, and may include guide content in various formats. Among the guide text and guide content, the processor 240 may use the text and content most similar to the intent as summary information for the audio data.
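By way of a minimal sketch, the assembly of minutes could be expressed as below. The `GuideItem` and `Minutes` types and the `build_minutes` function are hypothetical, and the similarity of each item to the intent is assumed to have been computed beforehand; the sketch only shows how the most similar item might be promoted to the summary.

```python
from dataclasses import dataclass, field


@dataclass
class GuideItem:
    guide_text: str    # phrase extracted from the transcript
    content: str       # guide content retrieved for that phrase
    similarity: float  # similarity to the intent, assumed precomputed


@dataclass
class Minutes:
    intent: str
    summary: str
    transcript: list = field(default_factory=list)
    guide_items: list = field(default_factory=list)


def build_minutes(intent, transcript, guide_items):
    """Assemble minutes; the summary comes from the item most similar to the intent."""
    best = max(guide_items, key=lambda item: item.similarity, default=None)
    summary = f"{best.guide_text}: {best.content}" if best else ""
    return Minutes(intent, summary, list(transcript), list(guide_items))
```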

The conference recording server 200 according to an embodiment of the present invention has been described above. According to the present invention, the conference recording server 200 can quickly convert meeting contents into text and provide additional information suited to the user who is to receive the minutes, so that the user can obtain personalized minutes.

Hereinafter, a method of recording a meeting with the conference recording apparatus 100 and documenting the meeting contents with the conference recording server 200 will be described.

FIG. 4 is a schematic flowchart of a conference recording method of the conference recording apparatus according to an embodiment of the present invention, and FIG. 5 is a schematic flowchart of a conference recording and documentation method of the conference recording server according to an embodiment of the present invention.

Referring to FIG. 4, the conference recording apparatus 100 obtains audio data including the user's voice from a multi-channel microphone (S110). Specifically, the conference recording apparatus 100 includes multi-channel microphones arranged in different directions, and through them it can obtain audio data that includes the voices of other users in addition to the user located in front of the conference recording apparatus 100.

After step S110, the conference recording apparatus 100 separates the obtained audio data into audio signals for each microphone channel (S120). For example, the conference recording apparatus 100 may separate the audio signals by considering the differences in amplitude and in acquisition time that result from the direction in which each microphone is arranged.
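The amplitude-based part of this separation can be illustrated with the following simplified sketch, which labels each frame with the microphone channel that picked it up most loudly. It is an assumption-laden simplification: the function names are hypothetical, only the amplitude difference is used, and the acquisition-time difference mentioned above is not modeled.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of a list of samples (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def dominant_channel_per_frame(channels, frame_len):
    """Return, for each full frame, the index of the loudest channel.

    `channels` holds one list of samples per microphone, all recorded
    over the same time span.
    """
    n_samples = min(len(ch) for ch in channels)
    labels = []
    for start in range(0, n_samples - frame_len + 1, frame_len):
        energies = [rms(ch[start:start + frame_len]) for ch in channels]
        labels.append(max(range(len(channels)), key=energies.__getitem__))
    return labels
```

Consecutive frames that share a label can then be grouped into one speaker turn for the microphone direction in question.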

After step S120, the conference recording apparatus 100 uses the separated audio signals to obtain the audio signals of the user and of the conversation partner talking with the user (S130). Here, if voice information (an audio signal) of a pre-designated user exists, the conference recording apparatus 100 may separate the conversation partner's audio signal on the basis of the user's voice information. In addition, the conference recording apparatus 100 may receive location information and another user's audio information (audio signal) from another adjacent conference recording apparatus 100 and obtain the audio signals of the user and the conversation partner on that basis.

Depending on the embodiment, the conference recording apparatus 100 may convert the separated audio signals into text and extract, from the converted text, guide text corresponding to a predefined intent. Here, the intent is the topic of a meeting that is to be recorded or has already been recorded, and the guide text is reference text for providing the user with necessary information according to the intent. The conference recording apparatus 100 may highlight the guide text in the converted text or attach a tag to it.
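One possible way to tag guide text in the converted text is sketched below. The `tag_guide_text` function and the `[[` and `]]` markers are hypothetical choices made for illustration; the description does not prescribe a tag format, and the guide phrases are assumed to have been extracted already.

```python
import re


def tag_guide_text(transcript, guide_phrases, open_tag="[[", close_tag="]]"):
    """Wrap each occurrence of a guide phrase in the transcript with marker tags."""
    if not guide_phrases:
        return transcript
    # Longest phrases first, so a phrase is not shadowed by one of its substrings.
    ordered = sorted(guide_phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", transcript)
```

A display layer could render the marked spans as highlights and use them as anchors for the corresponding guide content.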

After step S130, the conference recording apparatus 100 transmits the obtained audio signals to the conference recording server 200 (S140). Specifically, the conference recording apparatus 100 may transmit the converted text and the guide text to the conference recording server 200 together with the audio signals. In addition, the conference recording apparatus 100 may also transmit the user's personal information, such as job, qualifications, gender, and age, which can be used when preparing the minutes and providing individual guide content.
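As a hedged illustration of what the apparatus might send in step S140, the sketch below bundles the audio, the optional text, and the optional user profile into one JSON message. The field names, the `RecordingUpload` and `UserProfile` types, and the choice of JSON with base64-encoded audio are all assumptions made here; the description does not define a wire format.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    job: str
    qualification: str
    gender: str
    age: int


@dataclass
class RecordingUpload:
    audio_b64: str                                   # audio signal, base64-encoded
    transcript: str = ""                             # converted text, if produced on the device
    guide_texts: list = field(default_factory=list)  # guide text, if extracted on the device
    profile: Optional[UserProfile] = None            # optional personal information


def build_upload(audio_bytes, transcript="", guide_texts=(), profile=None):
    """Serialize one upload message as a JSON string."""
    upload = RecordingUpload(
        audio_b64=base64.b64encode(audio_bytes).decode("ascii"),
        transcript=transcript,
        guide_texts=list(guide_texts),
        profile=profile,
    )
    return json.dumps(asdict(upload), ensure_ascii=False)
```

Leaving `transcript`, `guide_texts`, and `profile` optional covers the case in which the apparatus only records and the server performs the remaining steps.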

Thereafter, if it is equipped with a visual output device, the conference recording apparatus 100 may receive guide content corresponding to the guide text from the conference recording server 200 and provide the guide content to the user through the output device.

Alternatively, the conference recording apparatus 100 may perform only the recording. In this case, the meeting contents are documented by the conference recording server 200, and content suitable for each user may be provided.

Hereinafter, the method by which the conference recording server 200 documents meeting contents will be described with reference to FIG. 5.

Referring to FIG. 5, the conference recording server 200 obtains audio data including the user's voice (S210). At this time, the conference recording server 200 may check the number of microphone channels of the user's device (the conference recording apparatus 100) that obtained the audio data and separate the audio signals of the user and of other users for each identified channel. In addition, the conference recording server 200 may obtain audio data from another user's device (another conference recording apparatus 100) adjacent to the user's device, together with information on the direction in which the other user's device is placed. That is, the conference recording server 200 may check the amplitude and acquisition time of the audio signal for each microphone channel arranged in a different direction and separate the audio signals by designated user accordingly.

Through this separation, the conference recording server 200 may separate the audio signals of each of a plurality of users, or may delete the audio signals of users other than the user whose audio signal is stored in advance.

After step S210, the conference recording server 200 converts the user's voice into text using the obtained audio data (S220) and extracts, from the text, guide text corresponding to a predefined intent (S230). Before extracting the guide text, however, the conference recording server 200 may check whether an intent is defined. Here, the predefined intent may be entered by the user or designated by the conference recording server 200 from meeting minutes uploaded by the user. The conference recording server 200 may check whether user input or an uploaded file exists and designate the intent before extracting the guide text.

For example, the conference recording server 200 may designate an intent from meeting minutes uploaded by the user. Specifically, the conference recording server 200 may extract text for designating the intent from the minutes written by the user, and the audio data and minutes of previous meetings stored in the conference recording server 200 may be used for this extraction.

As another example, when no intent has been designated because neither user-entered data nor meeting minutes exist, the conference recording server 200 may designate the intent using the audio data. Specifically, the conference recording server 200 may designate the intent using the user's voice in the opening portion of the audio data. More specifically, the conference recording server 200 may convert the user's voice within a predetermined time (e.g., within 1 minute) into text and analyze the converted text to extract a word to be defined as the intent. That is, because the opening of a meeting contains utterances related to its topic, such as “We will start the meeting on Agenda A”, “We will start the investigation into Case B”, or “In relation to the marketing of product C...”, the processor 240 of the conference recording server 200 may convert the user's voice within the predetermined time into text and extract, through various analysis methods, a word to be defined as the intent. For example, the conference recording server 200 may extract the word using morpheme analysis, named-entity analysis, and semantic role analysis.
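The intent check that precedes step S230 can be sketched, under stated assumptions, as a simple fallback chain. The `resolve_intent` function and its parameters are hypothetical. The description lists user input and uploaded minutes as sources without ranking them, so giving user input priority over uploaded minutes is an assumption of this sketch; the opening speech is used only when neither exists, as described above.

```python
from typing import Callable, Optional


def resolve_intent(
    user_input: Optional[str],
    uploaded_minutes: Optional[str],
    opening_transcript: Optional[str],
    extract_from_text: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Pick the intent source in order: explicit input, uploaded minutes, opening speech."""
    if user_input:
        return user_input
    for source in (uploaded_minutes, opening_transcript):
        if source:
            intent = extract_from_text(source)
            if intent:
                return intent
    return None  # no intent could be designated
```

The `extract_from_text` callable is left to the caller, so the same chain works whether the extraction relies on linguistic analysis or on a simpler keyword rule.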

After step S230, the conference recording server 200 searches for guide content corresponding to the guide text (S240). Specifically, the conference recording server 200 may search a database corresponding to the predefined intent for guide content corresponding to the guide text. For example, the conference recording server 200 may project the guide text into the vector space of a neural-network-based language model corresponding to a specific intent category, calculate the cosine similarity between the guide text and each content item, and determine that content whose similarity is greater than or equal to a specified value is guide content to be provided to the user.

In addition, the guide content searched for by the conference recording server 200 is a collective term for the information to be provided to the user according to the guide text. It includes dictionary definitions of the guide text as well as laws, internal regulations, and guidelines related to the guide text, and it may be provided in various formats such as text, images, and video.

Furthermore, the conference recording server 200 may check personal information of the user who is to receive the guide content, such as job, qualifications, gender, and age, and use it to quickly find guide content suitable for that user.

After step S240, the conference recording server 200 provides the searched guide content and the extracted text to the user's device (S250). In addition, the conference recording server 200 may generate meeting minutes summarizing the audio data by combining information from any one of the text, the guide text, and the guide content, and provide the minutes to the user's device. The minutes may include the text and the highlighted guide text together with a graphic for playing back the audio data, and may include guide content in various formats. Among the guide text and guide content, the conference recording server 200 may use the text and content most similar to the intent as summary information for the audio data.
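Steps S220 to S250 can be tied together as in the following minimal sketch, in which each step is passed in as a callable. The `run_pipeline` function, the `DeliveryResult` type, and the callable signatures are hypothetical; the sketch shows only the order of the steps and the data handed from one to the next, not how any individual step is implemented.

```python
from dataclasses import dataclass


@dataclass
class DeliveryResult:
    transcript: str      # text converted from the audio data
    guide_texts: list    # guide text extracted for the intent
    guide_content: dict  # guide text -> content retrieved for it


def run_pipeline(audio, intent, transcribe, extract_guide_texts, search_content, deliver):
    """Run S220 to S250 on audio data already obtained in S210."""
    transcript = transcribe(audio)                          # S220
    guide_texts = extract_guide_texts(transcript, intent)   # S230
    guide_content = {g: search_content(g, intent) for g in guide_texts}  # S240
    result = DeliveryResult(transcript, guide_texts, guide_content)
    deliver(result)                                         # S250
    return result
```

Passing the steps as callables keeps the sketch independent of any particular speech recognizer, extractor, or content database.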

The conference recording and documentation method according to an embodiment of the present invention has been described above. Hereinafter, interface screens for documenting a recorded meeting, and their results, will be described with reference to FIGS. 6 to 9.

FIGS. 6 and 7 are schematic diagrams illustrating an interface on which recorded meeting contents are documented according to an embodiment of the present invention, and FIGS. 8 and 9 are schematic diagrams illustrating examples of an interface for providing meeting guide content according to an embodiment of the present invention.

Referring to FIG. 6, when the conference recording apparatus 100 is a PC with a visual output device, the recorded contents may be displayed as follows while the meeting is in progress. Specifically, the conference recording apparatus 100 displays the audio signal 61 in graphic form together with the user information 61, and the meeting contents (conversation) converted into text by the conference recording apparatus 100 or the conference recording server 200 may be displayed below it, with the questioner (G) and the respondent (H) distinguished.

In addition, when text is being converted while the meeting is recorded in real time and no intent is defined, the conference recording apparatus 100 or the conference recording server 200 may designate the intent from the initial portion of the recording. Accordingly, based on the conversation content 63, “I came to give a victim statement about a sexual molestation case”, and the user information 61, the conference recording apparatus 100 or the conference recording server 200 may designate the intent “molestation case guide”.

Thereafter, when the guide text “① I had nowhere to go” is extracted from the meeting contents (conversation) by the conference recording apparatus 100 or the conference recording server 200, ② guide content 64 explaining protective shelters, which the questioner (G) can explain to the respondent (H), is provided by the conference recording server 200 and may be displayed on the conference recording apparatus 100.

Referring to FIG. 7, when the conference recording apparatus 100 is a PC with a visual output device, the recorded meeting contents may be displayed as follows. Specifically, the conference recording apparatus 100 may be provided with, and display, guide content 72 (e.g., the definition of NTTP training and a link to the host organization's site) corresponding to guide text 71 (e.g., 1. NTTP training) in the converted text. Also, for example, when “budget deliberation guide” is the predefined intent, guide content 73 (e.g., the names and titles of the 8 people in favor) corresponding to guide text (e.g., 2. supplementary budget proposal - 3. 8 in favor) may be provided and displayed. The guide content here may be information retrieved from the database corresponding to the intent, or information retrieved from meeting materials uploaded to the conference recording server 200 by meeting participants.

In addition, summarized minutes 74 may be provided to and displayed on the conference recording apparatus 100 below the area that provides the guide content, and the minutes may be generated using the text, the guide text, and the guide content.

Referring to FIG. 8, a user of the conference recording apparatus 100 may select the “Intent” item from the categories 81, which include statistics, permission management, intents, synonyms, and recognition rate, and review the recorded meeting contents by intent. Specifically, the user may select ① the classification item to choose the type of intent; accordingly, the molestation-related intents 82 and the meeting contents 83 recorded for each intent may be displayed on the conference recording apparatus 100 in title and subtitle form.

Finally, referring to FIG. 9, a user of the conference recording apparatus 100 may select the “Synonyms” item from the categories 91, which include statistics, permission management, intents, synonyms, and recognition rate, and add or delete keywords 92 to be added to an intent. Here, the keywords may be used to expand the per-intent database used to search for guide content.
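The keyword management shown in FIG. 9 could be backed by a structure such as the following, given purely as an illustrative sketch. The `IntentKeywordStore` class and its methods are hypothetical; keywords are normalized to lower case here as a design assumption, and how the stored keywords are then used to extend the per-intent database is left open.

```python
from collections import defaultdict


class IntentKeywordStore:
    """Keeps the set of user-managed keywords for each intent."""

    def __init__(self):
        self._keywords = defaultdict(set)

    def add(self, intent, keyword):
        """Register a keyword under an intent (duplicates are ignored)."""
        self._keywords[intent].add(keyword.strip().lower())

    def remove(self, intent, keyword):
        """Delete a keyword from an intent; unknown keywords are ignored."""
        self._keywords[intent].discard(keyword.strip().lower())

    def keywords(self, intent):
        """Return the intent's keywords in sorted order."""
        return sorted(self._keywords.get(intent, ()))
```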

The conference recording system 1000 and the conference recording and documentation method according to an embodiment of the present invention have been described above. According to the present invention, because the conference recording server 200 identifies the purpose (intent) of the meeting, it can provide meeting participants with various content for understanding the meeting while the meeting contents are being converted into text.

Although embodiments of the present invention have been described in detail with reference to the accompanying drawings, the present invention is not necessarily limited to these embodiments, and various modifications may be made without departing from the technical spirit of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

1000: conference recording system
100: conference recording apparatus
110: input unit
120: storage unit
130: communication unit
140: processor
200: conference recording server
210: communication unit
220: storage unit
230: output unit
240: processor
300: display device

Claims

A method of recording a meeting, comprising:
obtaining audio data including a user's voice;
converting the user's voice into text using the obtained audio data;
extracting, from the text, guide text that corresponds to a predefined intent and serves as reference text for providing necessary information to the user according to the intent;
searching for guide content that corresponds to the guide text and includes supplementary information that differs for each user; and
providing the searched guide content and the extracted text to the user's device,
wherein the providing comprises providing, to the user's device, a first screen on which user information and conversation contents converted into text are displayed, together with a second screen on which guide content that differs for each user is displayed.

The method of claim 1, further comprising,
prior to obtaining the audio data:
receiving meeting minutes prepared by the user from the user's device; and
extracting text for designating the intent from the received meeting minutes.

The method of claim 1, further comprising,
prior to obtaining the audio data:
checking whether the intent is defined; and
when the check shows that the intent is not defined, converting the user's voice within a predetermined time from the beginning of the audio data into text, and extracting a word to be defined as the intent from the converted text through any one of morpheme analysis, named-entity analysis, and semantic role analysis.

The method of claim 1,
wherein converting the voice into text comprises:
separating the audio signals of the user and of users other than the user by using pre-stored audio signals grouped according to the user's voice information or waveform.

The method of claim 4, further comprising,
prior to obtaining the audio data:
receiving the user's personal information or the voice information for defining the intent.

The method of claim 4,
wherein separating the audio signals comprises:
checking the number of microphone channels of the device that obtained the audio data, and collecting audio signals of two or more users for each identified channel.

The method of claim 4,
wherein separating the audio signals comprises:
checking the number of microphone channels of the device that obtained the audio data, and deleting, for each identified channel, the audio signals of users other than the user.

The method of claim 7,
wherein the audio signals of the other users
are collected through audio data obtained from another user's device adjacent to the user's device.

The method of claim 4,
wherein providing to the user's device further comprises:
displaying the separated audio signals of the user and the other users in graphic form over time.

The method of claim 1,
wherein searching for the guide content comprises:
searching a database corresponding to the intent for guide content corresponding to the guide text.

The method of claim 10,
wherein the guide content
comprises at least one of a dictionary definition of the guide text, laws related to the guide text, bylaws, and guidelines.
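
The intent-specific lookup in the two preceding claims can be sketched with in-memory dictionaries standing in for the databases. Everything here is hypothetical: the `GUIDE_DATABASES` layout, the intent names, and the placeholder entries, which are not real dictionary or legal content.

```python
# Hypothetical per-intent databases; all entries are placeholder text.
GUIDE_DATABASES = {
    "contract": {
        "penalty clause": [
            ("dictionary", "placeholder definition of 'penalty clause'"),
            ("bylaw", "placeholder reference to an internal bylaw"),
        ],
    },
    "hr": {
        "leave": [("guideline", "placeholder leave guideline")],
    },
}


def search_guide_content(intent, guide_text, databases=GUIDE_DATABASES):
    """Look up guide content for guide_text in the database selected by intent.

    Returns a list of (kind, content) pairs, or an empty list when nothing matches.
    """
    intent_db = databases.get(intent, {})
    return intent_db.get(guide_text.lower(), [])
```

Because the database is chosen by intent first, the same guide text can return different content, or none, under a different intent.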

The method of claim 1,
wherein obtaining the audio data comprises:
obtaining previously recorded audio data or audio data being recorded in real time.

The method of claim 1,
further comprising, after providing to the user's device:
generating meeting minutes summarizing the audio data by combining any one of the converted text, the guide text, and the searched guide content.
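
A rough sketch of the combining step, assuming the converted text is available as (speaker, text) pairs and the guide content as a mapping from guide text to content. It only assembles the pieces into one document and does not perform the summarization the claim refers to. `build_minutes` is a hypothetical name.

```python
def build_minutes(utterances, guide_notes):
    """Combine transcribed utterances and guide content into plain-text minutes.

    utterances: list of (speaker, text) pairs in speaking order.
    guide_notes: dict mapping guide text to its retrieved guide content.
    """
    lines = ["Meeting minutes", ""]
    for speaker, text in utterances:
        lines.append(f"{speaker}: {text}")
    if guide_notes:
        lines.append("")
        lines.append("Guide notes")
        for term, content in guide_notes.items():
            lines.append(f"- {term}: {content}")
    return "\n".join(lines)
```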

a communication unit;
a storage unit; and
a processor operatively connected to the communication unit and the storage unit,
wherein the processor is configured to:
obtain audio data including a user's voice;
convert the user's voice into text using the obtained audio data; extract, from the text, guide text that corresponds to a predefined intent and to a reference text for providing necessary information to the user according to the intent; search for guide content that corresponds to the guide text and includes supplementary information that differs for each user; and provide the searched guide content and the extracted text to the user's device, and
wherein the processor is further configured to provide, to the user's device, a first screen that displays user information and conversation contents converted into text, together with a second screen that displays guide content that differs for each user.

A method of recording a meeting, comprising:
obtaining audio data including a user's voice from a multi-channel microphone;
separating an audio signal for each channel of the microphone from the obtained audio data;
obtaining an audio signal of the user and of a conversation partner talking with the user by using the separated audio signal;
converting the user's voice into text using the obtained audio signal, and extracting, from the converted text, guide text that corresponds to a predefined intent and to a reference text for providing necessary information to the user according to the intent; and
transmitting at least one of the obtained audio data, the converted text, and the guide text to a conference recording server.
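
The extraction and transmission steps at the end of this claim can be sketched as below, assuming the reference text for an intent is a list of terms and the server accepts a JSON body. Both are assumptions; the claim defines neither a matching rule nor a wire format. `extract_guide_text` and `build_payload` are hypothetical names.

```python
import json
import re


def extract_guide_text(transcript, reference_terms):
    """Return the reference terms found in the transcript, in order of first appearance."""
    lowered = transcript.lower()
    found = []
    for term in reference_terms:
        # Whole-word, case-insensitive match of the reference term.
        match = re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        if match:
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]


def build_payload(transcript, guide_text):
    """Serialize the converted text and guide text for sending to the server."""
    return json.dumps({"text": transcript, "guide_text": guide_text}, ensure_ascii=False)
```

`ensure_ascii=False` keeps Korean text readable in the payload instead of escaping it.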

The method of claim 15,
further comprising, prior to obtaining the audio data:
storing the user's voice information and voice information of another user received from another conference recording device adjacent to the conference recording device.

The method of claim 16,
wherein separating the audio signal comprises:
removing, from the audio signal separated for each microphone channel, an audio signal of any user other than the user, according to the magnitude of the audio signal and the stored voice information of the user.
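
A minimal sketch of this removal step, assuming each segment already carries a flag saying whether it matched the stored voice information, and using root-mean-square amplitude as the measure of signal magnitude. The `matches_user` flag, the `min_rms` threshold, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def keep_user_segments(segments, min_rms=0.1):
    """Keep segments that match the user's stored voice and are loud enough.

    segments: list of dicts with keys 'samples' (list of floats) and
    'matches_user' (bool, assumed to come from a separate voice matcher).
    """
    return [
        seg for seg in segments
        if seg["matches_user"] and rms(seg["samples"]) >= min_rms
    ]
```

The amplitude check reflects the idea that the user's own voice is usually the loudest signal on the user's own microphone, while a neighbor's voice arrives attenuated.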

an input unit;
a storage unit; and
a processor operatively connected to the input unit and the storage unit,
wherein the processor is configured to:
obtain audio data including a user's voice from a multi-channel microphone; separate an audio signal for each channel of the microphone from the obtained audio data; obtain, by using the separated audio signal, an audio signal of the user and of a conversation partner talking with the user; convert the user's voice into text using the obtained audio signal; extract, from the converted text, guide text that corresponds to a predefined intent and to a reference text for providing necessary information to the user according to the intent; and transmit at least one of the obtained audio signal, the converted text, and the guide text to a conference recording server.