KR102301149B1

KR102301149B1 - Method, computer program and system for amplification of speech

Info

Publication number: KR102301149B1
Application number: KR1020190143002A
Authority: KR
Inventors: 홍성화; 문일준; 설혜윤
Original assignee: Samsung Life Public Welfare Foundation (사회복지법인 삼성생명공익재단); Sungkyunkwan University Research & Business Foundation (성균관대학교 산학협력단)
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2021-09-10
Anticipated expiration: 2039-11-08
Also published as: KR20210056183A

Abstract

A method for selectively amplifying a specific speaker's voice according to an embodiment of the present invention may include: identifying, while sequentially reproducing a plurality of candidate voices, a target voice that a user wants to hear from among the plurality of candidate voices based on the user's brain waves; extracting at least one characteristic value of the target voice; determining, based on the characteristic value, whether the target voice is included in an input sound and, when it is, amplifying only the target voice to generate an amplified voice; and outputting the amplified voice.

Description

Method, computer program and system for selective amplification of voice

The present invention relates to a method, computer program, and system for selectively amplifying voice, which identify, based on a user's brain waves, a target voice that the user wants to hear from among a plurality of candidate voices and amplify that target voice.

The number of patients with hearing loss is steadily increasing due to the widespread use of personal listening devices such as earphones and the rapid aging of the population.

Age-related hearing loss has several characteristic features. First, damage to the peripheral auditory organs makes high-frequency sounds difficult to hear. Second, damage to the central auditory system makes it difficult to understand another person's speech in spaces with background noise or reverberation, and a decline in temporal processing makes speech hard to understand when the other person talks quickly. In addition, the aging of cognitive functions such as working memory makes it difficult to understand long sentences or to follow a conversation among multiple speakers. There is therefore a need for an effective rehabilitation method that takes into account patients with hearing loss, particularly elderly patients.

The background art described above is technical information that the inventors possessed for deriving the present invention or acquired in the course of deriving it, and is not necessarily known art disclosed to the general public before the filing of the present application.

An object of the present invention is to improve hearing-loss rehabilitation by amplifying and providing only the target voice that a user with hearing loss wants to hear.

In the identifying of the target voice, a voice for which at least one physical quantity of the brain waves satisfies a predetermined condition while the user listens to the plurality of candidate voices may be identified as the target voice. Here, the physical quantity includes at least one of the frequency and the amplitude of the brain waves, and the predetermined condition includes at least one of: the condition that the voice is the one, among the plurality of candidate voices, for which the frequency of the user's brain waves is highest; and the condition that the voice is the one for which the amplitude of the user's brain waves is smallest.
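The selection rule above can be expressed as a short function. The sketch below is illustrative only and assumes that a frequency value and an amplitude value have already been computed for the EEG recorded during each candidate voice. The container `EegResponse`, the function `identify_target_voice`, and its `mode` parameter (which selects one or both of the conditions) are hypothetical names, and returning `None` when no single voice satisfies both conditions is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EegResponse:
    """EEG measures recorded while one candidate voice was playing (hypothetical container)."""
    voice_id: str
    frequency_hz: float  # brain-wave frequency observed during playback
    amplitude: float     # brain-wave amplitude observed during playback


def identify_target_voice(responses: Sequence[EegResponse], mode: str = "both") -> Optional[str]:
    """Apply the predetermined condition to per-candidate EEG measures.

    mode="frequency": the voice with the highest EEG frequency is the target.
    mode="amplitude": the voice with the smallest EEG amplitude is the target.
    mode="both": one voice must satisfy both conditions; otherwise None (assumption).
    """
    if not responses:
        raise ValueError("no candidate voices were played")
    highest_freq = max(responses, key=lambda r: r.frequency_hz)
    smallest_amp = min(responses, key=lambda r: r.amplitude)
    if mode == "frequency":
        return highest_freq.voice_id
    if mode == "amplitude":
        return smallest_amp.voice_id
    if mode == "both":
        return highest_freq.voice_id if highest_freq is smallest_amp else None
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with four candidates where one speaker elicits both the highest EEG frequency and the smallest EEG amplitude, all three modes return that speaker.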

The identifying of the target voice may include determining, using a trained artificial neural network, the target voice that the user wants to hear. The artificial neural network may be a network trained, based on training data including EEG data labeled with whether the user wants to listen, to output whether the user wants to listen in response to input EEG data.
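One way to picture the learned approach is the minimal sketch below, which trains a single-neuron (logistic) network by gradient descent on EEG feature vectors labeled 1 (wants to listen) or 0 (does not). It is a stand-in for whatever network architecture is actually used; the feature representation, the function names, and the hyperparameters `lr` and `epochs` are assumptions for illustration.

```python
import numpy as np


def train_listen_classifier(features, labels, lr=0.5, epochs=500):
    """Fit a one-neuron logistic network: EEG feature vector -> P(user wants to listen)."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid output for every training sample
        err = p - y                              # gradient of cross-entropy w.r.t. the logit
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def listen_probability(w, b, feature):
    """Predicted probability that the user wants to listen, for one EEG feature vector."""
    return float(1.0 / (1.0 + np.exp(-(np.dot(feature, w) + b))))


def pick_target(w, b, candidate_features):
    """Index of the candidate voice whose EEG response scores highest."""
    return int(np.argmax([listen_probability(w, b, f) for f in candidate_features]))
```

After training on labeled recordings, `pick_target` scores the EEG response to each candidate voice and returns the most likely target.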

The at least one characteristic value may include at least one of the gender of the speaker of the target voice, the pitch of the target voice, the frequency of the target voice, and the utterance pattern of the target voice.
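As an illustration of extracting one characteristic value, the pitch of a voiced frame can be estimated by autocorrelation, as in the sketch below. Autocorrelation is a common textbook technique chosen here as an assumption; the text does not specify how the pitch is measured, and the search range `fmin`/`fmax` (75 to 400 Hz, a typical range for adult speech) is likewise an assumed parameter.

```python
import numpy as np


def estimate_pitch_hz(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (pitch) of a voiced frame by autocorrelation."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                                     # remove DC offset
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]  # autocorrelation at lags 0..N-1
    lag_min = int(sample_rate / fmax)                    # shortest period considered
    lag_max = min(int(sample_rate / fmin), len(x) - 1)   # longest period considered
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / best_lag
```

The lag with the strongest self-similarity corresponds to one pitch period, so its reciprocal (scaled by the sample rate) gives the pitch in hertz.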

The input sound is sound input in real time and includes a plurality of spoken voices uttered by a plurality of speakers. The generating of the amplified voice may include: identifying an utterance characteristic value, which is the characteristic value of the voice of the speaker speaking at a first time point; determining whether the similarity between the utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity; and, when the threshold similarity is exceeded, adjusting at least one output characteristic value of the amplified voice.
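The threshold test can be sketched as follows. The text does not define the similarity measure; cosine similarity between feature vectors is used here as an assumption, and the vectors are assumed to have been scaled to comparable ranges beforehand (otherwise a raw value such as pitch in hertz would dominate). The threshold of 0.9 and the function names are placeholders.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two characteristic-value vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_target_speaker(utterance_features, target_features, threshold=0.9):
    """True if the current speaker's features exceed the threshold similarity to the target's."""
    return cosine_similarity(utterance_features, target_features) > threshold
```

When `is_target_speaker` returns True for the speaker active at the first time point, the output characteristic values would then be adjusted.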

The adjusting of the at least one output characteristic value may include: identifying the frequency band to which the voice of the speaker speaking at the first time point belongs; and setting, in the input sound, the degree of amplification of sound in the identified frequency band higher than the degree of amplification of sound in bands other than the identified frequency band.
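The band-selective amplification can be illustrated with a gain applied in the FFT domain, as below. This is a simplified sketch that processes a whole buffer at once; a real-time device would more likely work on short frames or use a filter bank. The band edges, the gain values, and the function name `boost_band` are assumptions.

```python
import numpy as np


def boost_band(signal, sample_rate, band, band_gain=4.0, other_gain=1.0):
    """Amplify components inside band=(low_hz, high_hz) by band_gain, others by other_gain."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    gains = np.where(in_band, band_gain, other_gain)   # per-bin amplification degree
    return np.fft.irfft(spectrum * gains, n=len(x))
```

Setting `band_gain` above `other_gain` corresponds to amplifying the identified band more strongly than the remaining bands.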

The generating of the amplified voice may include: detecting a second time point at which the voice of the speaker speaking at the first time point ends; and, from the second time point onward, setting the degree of amplification of sound in the identified frequency band equal to the degree of amplification of sound in bands other than the identified frequency band.
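A minimal sketch of detecting the second time point and reverting the gain is given below. Treating the end of speech as the point where short-frame RMS energy stays below a threshold for several consecutive frames is an assumed, simple voice-activity rule; `frame_s`, `rms_threshold`, and `hangover_frames` are illustrative parameters, and both function names are hypothetical.

```python
import numpy as np


def detect_speech_end(signal, sample_rate, start_s, frame_s=0.02,
                      rms_threshold=0.01, hangover_frames=5):
    """Time (s) after start_s where frame RMS stays below rms_threshold for
    hangover_frames consecutive frames; None if speech does not end in the buffer."""
    x = np.asarray(signal, dtype=float)
    frame = int(round(frame_s * sample_rate))
    quiet = 0
    for start in range(int(round(start_s * sample_rate)), len(x) - frame + 1, frame):
        rms = np.sqrt(np.mean(x[start:start + frame] ** 2))
        quiet = quiet + 1 if rms < rms_threshold else 0
        if quiet == hangover_frames:
            # The end point is the start of the first quiet frame in the run.
            return (start - (hangover_frames - 1) * frame) / sample_rate
    return None


def band_gain_at(t, t1, t2, band_gain=4.0, other_gain=1.0):
    """Gain for the identified band at time t: boosted in [t1, t2), equal to other bands otherwise."""
    if t2 is None:                      # speech still ongoing: keep boosting after t1
        return band_gain if t >= t1 else other_gain
    return band_gain if t1 <= t < t2 else other_gain
```

The hangover count keeps brief pauses inside an utterance from being mistaken for its end.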

The outputting of the amplified voice may include: outputting a cancelling sound that cancels the input sound; and outputting the amplified voice. Here, the cancelling sound may be a sound having the same amplitude as, and the opposite phase to, the input sound.
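The relation between the input sound, the cancelling sound, and the amplified voice can be checked numerically. The sketch is idealized: it assumes the cancelling sound reaches the ear perfectly time-aligned with the ambient sound, ignoring the processing latency and acoustic path differences that a practical active noise cancellation system must compensate for. The function names are hypothetical.

```python
import numpy as np


def device_output(input_sound, amplified_voice):
    """Signal sent to the earpiece: anti-phase copy of the input plus the amplified voice."""
    x = np.asarray(input_sound, dtype=float)
    cancelling = -x  # same amplitude, opposite (180-degree) phase
    return cancelling + np.asarray(amplified_voice, dtype=float)


def heard_at_ear(input_sound, amplified_voice):
    """Idealized superposition at the ear: ambient sound plus device output."""
    return np.asarray(input_sound, dtype=float) + device_output(input_sound, amplified_voice)
```

Because the ambient sound and its inverted copy sum to zero, only the amplified voice remains at the ear in this idealized model.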

A system for selectively amplifying the voice of a specific speaker according to an embodiment of the present invention may include: an EEG sensing device that detects a user's brain waves; a voice output unit that outputs a selectively amplified voice to the user's auditory organ; and a user terminal that generates the amplified voice based on the brain waves detected by the EEG sensing device and provides it to the voice output unit. Here, the user terminal may, while sequentially reproducing a plurality of candidate voices through the voice output unit, identify a target voice that the user wants to hear from among the plurality of candidate voices based on the user's brain waves detected by the EEG sensing device; extract at least one characteristic value of the target voice; determine, based on the characteristic value, whether the target voice is included in an input sound; when it is, amplify only the target voice to generate an amplified voice; and control the voice output unit to output the amplified voice.

The user terminal may identify, as the target voice, a voice for which at least one physical quantity of the brain waves satisfies a predetermined condition while the user listens to the plurality of candidate voices.

The physical quantity includes at least one of the frequency and the amplitude of the brain waves, and the predetermined condition includes at least one of: the condition that the voice is the one, among the plurality of candidate voices, for which the frequency of the user's brain waves is highest; and the condition that the voice is the one for which the amplitude of the user's brain waves is smallest.

The user terminal determines the target voice that the user wants to hear using a trained artificial neural network. The artificial neural network may be a network trained, based on training data including EEG data labeled with whether the user wants to listen, to output whether the user wants to listen in response to input EEG data.

The input sound is sound detected in real time by either the user terminal or the voice output unit and includes a plurality of spoken voices uttered by a plurality of speakers. The user terminal may identify an utterance characteristic value, which is the characteristic value of the voice of the speaker speaking at a first time point; determine whether the similarity between the utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity; and, when it does, adjust at least one output characteristic value of the voice output unit.

The user terminal may identify the frequency band to which the voice of the speaker speaking at the first time point belongs and may set, in the input sound, the degree of amplification of sound in the identified frequency band higher than that of sound in bands other than the identified frequency band.

The user terminal may detect a second time point at which the voice of the speaker speaking at the first time point ends and, from the second time point onward, set the degree of amplification of sound in the identified frequency band equal to that of sound in bands other than the identified frequency band.

The user terminal may control the voice output unit to output both a cancelling sound that cancels the input sound and the amplified voice. Here, the cancelling sound may be a sound having the same amplitude as, and the opposite phase to, the input sound.

According to the present invention, amplifying and providing only the target voice that a user with hearing loss wants to hear allows hearing-loss rehabilitation to be performed more effectively.

In addition, by providing only the target voice that the user wants to hear and excluding the remaining voices, the present invention can improve the effectiveness of hearing-loss rehabilitation.

FIG. 1 schematically shows the configuration of a system for selective amplification of voice according to an embodiment of the present invention.
FIG. 2 schematically illustrates the configuration of the voice output unit 100 according to an embodiment of the present invention.
FIG. 3 schematically illustrates the configuration of the user terminal 200 according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a process in which the control unit 212 identifies a target voice according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a process in which the control unit 212 identifies a target voice using a trained artificial neural network according to an optional embodiment of the present invention.
FIG. 6 illustrates voices uttered by a plurality of speakers over time.
FIG. 7 illustrates the frequency bands to which the voices of a plurality of speakers belong.
FIG. 8 illustrates the degree of amplification for each frequency band of an input sound.
FIGS. 9 and 10 are flowcharts illustrating a method for selectively amplifying voice performed by the user terminal 200 according to an embodiment of the present invention.

The following detailed description of the present invention refers to the accompanying drawings, which show by way of illustration specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present invention. It should be understood that the various embodiments of the present invention, although different from one another, need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein may be changed from one embodiment to another without departing from the spirit and scope of the present invention. It should also be understood that the location or arrangement of individual components within each embodiment may be changed without departing from the spirit and scope of the present invention. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the present invention should be taken as encompassing the scope of the claims and all equivalents thereto. In the drawings, like reference numerals refer to the same or similar elements throughout the various aspects.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice it.

FIG. 1 is a diagram schematically illustrating the configuration of a system for selective amplification of voice according to an embodiment of the present invention. FIG. 2 is a diagram schematically illustrating the configuration of the voice output unit 100 according to an embodiment of the present invention. FIG. 3 is a diagram schematically illustrating the configuration of the user terminal 200 according to an embodiment of the present invention. The following description refers to FIGS. 1 to 3 together.

The system for selectively amplifying voice according to an embodiment of the present invention may identify, based on the user's brain waves, a target voice that the user wants to hear, and may amplify and provide only the identified target voice. To this end, the system may include a voice output unit 100, a user terminal 200, and an EEG sensing device 300.

The voice output unit 100 according to an embodiment of the present invention may refer to any of various means that output sound for the user to hear, based on a control signal generated by the user terminal 200 or by the voice output unit 100 itself. To this end, the voice output unit 100 may include a communication unit 111, a control unit 112, a sound input unit 113, a sound output unit 114, and a memory 115.

The communication unit 111 according to an embodiment of the present invention may refer to a means that allows the voice output unit 100 to transmit data to and receive data from the user terminal 200 and/or the EEG sensing device 300. For example, the communication unit 111 may receive, from the user terminal 200, data on sound content to be output to the user's auditory organ through the sound output unit 114. The communication unit 111 may also provide data on the sound acquired by the sound input unit 113 to the user terminal 200. However, these are examples and the role of the communication unit 111 is not limited thereto; for example, the communication unit 111 may also provide the user terminal 200 with the user's operation signals for the voice output unit 100.

Meanwhile, the communication unit 111 according to an embodiment of the present invention may be connected to the user terminal 200 by wire or wirelessly to transmit and receive data. Since various well-known wired and wireless connection methods may be used, a detailed enumeration of them is omitted.

The control unit 112 according to an embodiment of the present invention may control the components of the voice output unit 100, that is, the communication unit 111, the sound input unit 113, the sound output unit 114, and the memory 115, according to a predetermined algorithm. For example, the control unit 112 may transmit the input sound acquired by the sound input unit 113 to the user terminal 200 through the communication unit 111, and may receive, through the communication unit 111, the amplified voice that the user terminal 200 generates based on the received input sound. The control unit 112 may also output the received amplified voice as sound through the sound output unit 114.

The sound input unit 113 according to an embodiment of the present invention may refer to any of various means that convert sound around the user into an electrical signal; conversely, the sound output unit 114 may refer to any of various means that output an electrical signal as sound.

For example, the sound output unit 114 may output the amplified voice generated by the user terminal 200 as sound and provide it to the user's auditory organ. The sound output unit 114 may also output various types of sound content provided by the user terminal 200 as sound and provide them to the user's auditory organ.

In an optional embodiment in which the method for selectively amplifying voice described in the present invention is performed by the voice output unit 100, the sound output unit 114 may output the amplified voice generated by the control unit 112 as sound.

The memory 115 according to an embodiment of the present invention may refer to a means for storing programs for the operations performed by the control unit 112.

In an optional embodiment in which the method for selectively amplifying voice is performed by the voice output unit 100, the memory 115 may also refer to a means for storing the target voice that the user wants to hear and/or the characteristic value of the target voice.

The memory 115 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive.

The EEG sensing device 300 according to an embodiment of the present invention may refer to a means that detects the user's brain waves and provides them to the user terminal 200 and/or the voice output unit 100.

The EEG sensing device 300 according to an embodiment of the present invention may be implemented using various known techniques. For example, the EEG sensing device 300 may be implemented non-invasively, as shown in FIG. 1, or invasively. The EEG sensing device 300 may also be implemented as an EEG induction type or an EEG recognition type. However, these methods are examples and the spirit of the present invention is not limited thereto; any means capable of detecting the user's brain waves may be used as the EEG sensing device 300 of the present invention.

The EEG sensing device 300 according to an embodiment of the present invention may be connected to the user terminal 200 and/or the voice output unit 100 by wire and/or wirelessly; a detailed enumeration of communication methods is omitted.

The user terminal 200 according to an embodiment of the present invention may identify, based on the user's brain waves detected by the EEG sensing device 300, a target voice that the user wants to hear, amplify only the identified target voice in the input sound, and provide it to the voice output unit 100. To this end, the user terminal 200 may include a communication unit 211, a control unit 212, a memory 213, and a display unit 214.

The communication unit 211 according to an embodiment of the present invention may refer to a means for transmitting data to and receiving data from the voice output unit 100 and the EEG sensing device 300 described above. For example, the communication unit 211 may receive the detected EEG data from the EEG sensing device 300. The communication unit 211 may also provide the generated amplified voice to the voice output unit 100.

Meanwhile, the communication unit 211 according to an embodiment of the present invention may be connected to the other components of the system by wire or wirelessly to transmit and receive data. Since various well-known wired and wireless connection methods may be used, a detailed enumeration of them is omitted.

The control unit 212 according to an embodiment of the present invention may identify, based on the user's brain waves, a target voice that the user wants to hear, amplify only the identified target voice in the input sound, and provide it to the voice output unit 100.

The control unit 212 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor by the memory 213 or the communication unit 211. For example, the control unit 212 may be configured to execute received instructions according to program code stored in a recording device such as the memory 213.

The memory 213 according to an embodiment of the present invention may refer to a means for storing programs for the operations performed by the control unit 212.

In an embodiment in which the method for selectively amplifying voice is performed by the user terminal 200, the memory 213 may also refer to a means for storing the target voice that the user wants to hear and/or the characteristic value of the target voice.

The memory 213 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive.

The display unit 214 according to an embodiment of the present invention may refer to a means for displaying information that the user needs to confirm during the selective amplification of voice. For example, the display unit 214 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. However, this is an example, and the spirit of the present invention is not limited thereto.

In another embodiment, the user terminal 200 may include more components than those described above. The user terminal 200 may be one of the portable terminals 201, 202, and 203, or the personal computer 204.

As noted in the optional embodiments above, the method for selectively amplifying voice according to the present invention may be performed either by the user terminal 200 or by the voice output unit 100.

However, for convenience of explanation, the following description assumes that the method for selectively amplifying voice is performed by the user terminal 200 and focuses on the operation of the control unit 212 of the user terminal 200.

The control unit 212 of the user terminal 200 according to an embodiment of the present invention may sequentially reproduce a plurality of candidate voices (that is, sound content corresponding to the plurality of candidate voices) through the voice output unit 100 and, based on the user's brain waves, identify from among the reproduced candidate voices a target voice that the user wants to hear. Here, the user's brain waves may be acquired by the EEG sensing device 300 and provided to the user terminal 200. In the present invention, "voice" may refer either to a speaker's voice itself or to sound content corresponding to the speaker's voice.
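Because the candidate voices are played one after another, the single continuous EEG recording has to be split into the portions that coincide with each candidate. The sketch below assumes that the user terminal keeps a playback schedule of (voice identifier, start time, end time) entries on the same clock as the EEG samples; the schedule format and the function name `segment_eeg` are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def segment_eeg(eeg: Sequence[float], eeg_rate: float,
                schedule: Sequence[Tuple[str, float, float]]) -> Dict[str, List[float]]:
    """Split one continuous EEG recording into per-candidate epochs.

    schedule: (voice_id, start_s, end_s) for each candidate voice, in playback order,
    with times measured on the same clock as the EEG recording (assumption).
    """
    epochs: Dict[str, List[float]] = {}
    for voice_id, start_s, end_s in schedule:
        first = int(round(start_s * eeg_rate))  # first EEG sample during this voice
        last = int(round(end_s * eeg_rate))     # one past the last sample
        epochs[voice_id] = list(eeg[first:last])
    return epochs
```

Each epoch can then be reduced to brain-wave measures for the corresponding candidate voice.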

FIG. 4 is a diagram for explaining a process in which the control unit 212 identifies a target voice according to an embodiment of the present invention.

For convenience of explanation, the following is assumed, as shown in FIG. 4:
- A plurality of candidate voices 410, including the voices 411, 412, 413, and 414 of four speakers, are output through the voice output unit 100.
- The user's brain waves 511, 512, 513, and 514, recorded while the user listens to each speaker's voice, are as shown.

While the user listens to the candidate voices 410, the control unit 212 according to an embodiment of the present invention may identify as the target voice the voice for which at least one physical quantity of the brain waves 510 satisfies a predetermined condition. For example, the control unit 212 may identify the target voice based on whether the frequency and/or amplitude of the brain waves 510 satisfies a predetermined condition.

The control unit 212 according to an embodiment of the present invention may identify as the target voice the candidate voice 410 that satisfies at least one of two conditions: its brain wave frequency is the highest, or its brain wave amplitude is the smallest. For example, in FIG. 4, the brain wave 512 has the highest frequency and the smallest amplitude of the four. The control unit 212 may therefore identify the voice 412 of speaker 2 as the target voice from among the voices 411, 412, 413, and 414.
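The selection rule above can be sketched as follows. The sketch assumes each candidate voice has already been reduced to one dominant EEG frequency and one EEG amplitude value. The `EegFeature` type, the `identify_target_voice` function, and the tie-breaking `prefer` parameter are hypothetical names chosen for illustration. The text only requires that at least one of the two conditions be met, so the policy for resolving a disagreement between them is also an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EegFeature:
    """Summary of the EEG recorded while one candidate voice was playing (hypothetical type)."""
    speaker: str
    frequency_hz: float  # dominant EEG frequency
    amplitude_uv: float  # EEG amplitude in microvolts


def identify_target_voice(features: Sequence[EegFeature], prefer: str = "frequency") -> str:
    """Pick the candidate with the highest EEG frequency and/or the smallest EEG amplitude."""
    if not features:
        raise ValueError("at least one candidate voice is required")
    highest_freq = max(features, key=lambda f: f.frequency_hz)
    smallest_amp = min(features, key=lambda f: f.amplitude_uv)
    if highest_freq is smallest_amp:
        # Both conditions point at the same candidate, as in the FIG. 4 example.
        return highest_freq.speaker
    # The conditions disagree: fall back to one of them (assumed policy, not from the text).
    return highest_freq.speaker if prefer == "frequency" else smallest_amp.speaker


# Illustrative values only; FIG. 4 gives no numbers.
candidates = [
    EegFeature("speaker 1", 9.0, 42.0),
    EegFeature("speaker 2", 21.0, 11.0),
    EegFeature("speaker 3", 12.0, 30.0),
    EegFeature("speaker 4", 10.0, 35.0),
]
print(identify_target_voice(candidates))  # speaker 2
```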

In an optional embodiment, the control unit 212 may determine the target voice that the user wants to hear by using a trained artificial neural network. The network is trained on training data that includes brain wave data labeled with whether the user wanted to listen. Given brain wave data as input, it outputs whether the user wants to listen.

FIG. 5 is a diagram for explaining a process in which the control unit 212 identifies a target voice using a trained artificial neural network according to an optional embodiment of the present invention. For convenience of description, the same assumptions as in FIG. 4 apply.

The control unit 212 according to an embodiment of the present invention may input each of the brain waves 511, 512, 513, and 514 into the trained artificial neural network 610. These are the brain waves for the voices 411, 412, 413, and 414 of the four speakers. For each brain wave, the control unit 212 can then check whether the corresponding voice is the target voice (621, 622, 623, 624).

Depending on how it is trained, the artificial neural network 610 may output one of two things. It may output in binary form whether each voice is the target voice (620), as shown in FIG. 5. Alternatively, it may output the probability that each voice is the target voice. In the probability embodiment, the control unit 212 may determine as the target voice a voice whose probability satisfies a predetermined condition.
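The sketch below shows one way the outputs of the network 610 might be turned into a decision. It assumes the network returns one probability per candidate voice.
- The single logistic unit `target_probability` only stands in for the trained network, and its weights are made up.
- The binary mode corresponds to comparing each probability against 0.5.
- "Highest probability, provided it reaches a threshold" is one assumed example of the "predetermined condition".

```python
import math
from typing import List, Optional, Sequence


def target_probability(eeg_features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Stand-in for the trained network 610: one logistic unit with hypothetical weights."""
    z = bias + sum(w * x for w, x in zip(weights, eeg_features))
    # Numerically stable sigmoid: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_decisions(probabilities: Sequence[float], cutoff: float = 0.5) -> List[bool]:
    """Binary mode: target / not target for each candidate, as in FIG. 5."""
    return [p >= cutoff for p in probabilities]


def select_by_probability(probabilities: Sequence[float], threshold: float = 0.7) -> Optional[int]:
    """Probability mode: index of the most likely candidate, or None if none is convincing."""
    if not probabilities:
        return None
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best if probabilities[best] >= threshold else None


probs = [0.12, 0.91, 0.30, 0.08]
print(binary_decisions(probs))       # [False, True, False, False]
print(select_by_probability(probs))  # 1
```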

The control unit 212 according to an embodiment of the present invention may extract at least one characteristic value of the target voice determined through the process described above. The characteristic value may be, for example, at least one of the following for the target voice:
- the gender of its speaker;
- its pitch;
- its frequency;
- its utterance pattern.

These characteristic values are only examples. Any method that quantifies the characteristics of the target voice may be used to extract characteristic values in the present invention. In an optional embodiment, the control unit 212 may extract the characteristic values of the target voice as a multi-dimensional vector.
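As one illustration of a quantifiable characteristic value, the sketch below estimates pitch with a plain autocorrelation search. It then packs the pitch, RMS level, and zero-crossing rate into a small feature vector. These three features, the `fmin`/`fmax` search range, and the function names are assumptions for illustration. Gender or utterance-pattern features, or a learned speaker embedding, could equally fill this role.

```python
import math
from typing import List, Sequence


def estimate_pitch(samples: Sequence[float], sample_rate: int,
                   fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Return the pitch in Hz as the lag with the strongest autocorrelation in [fmin, fmax]."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(samples) - 1, int(sample_rate / fmin))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def feature_vector(samples: Sequence[float], sample_rate: int) -> List[float]:
    """Characteristic-value vector: [pitch in Hz, RMS level, zero-crossing rate]."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(samples) - 1)
    return [estimate_pitch(samples, sample_rate), rms, zcr]


sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]  # 50 ms of a 200 Hz tone
print(round(feature_vector(tone, sr)[0]))  # 200
```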

Using the extracted characteristic values of the target voice, the control unit 212 according to an embodiment of the present invention may check whether the target voice is included in the input sound. When it is, the control unit 212 may amplify only the target voice to generate an amplified voice.

Here, the 'input sound' is sound input in real time. It may be detected, for example, by the user terminal 200 or by the sound input unit 113 of the voice output unit 100. As shown in FIG. 6, the input sound may include voices uttered by a plurality of speakers. In the example of FIG. 6, it includes voices uttered by three speakers: S1_V1, S2_V1, S3_V1, and S1_V2, each uttered in a different time period.

To check whether the target voice is included in the input sound, the control unit 212 according to an embodiment of the present invention may check utterance characteristic values at a plurality of time points. An utterance characteristic value is the characteristic value of the voice of the speaker speaking at that time point. For example, the control unit 212 may check the utterance characteristic value of the voice S2_V1 at time point t1.
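Checking characteristic values at a plurality of time points implies cutting the real-time input into short analysis frames. The sketch below assumes fixed-length frames with a fixed hop. `frames_with_times` and its parameters are hypothetical, and a real device would process a streaming buffer rather than a complete list. For speech, frames of roughly 20 to 30 ms are common.

```python
from typing import List, Sequence, Tuple


def frames_with_times(samples: Sequence[float], sample_rate: int,
                      frame_len: int, hop: int) -> List[Tuple[float, List[float]]]:
    """Split a signal into (possibly overlapping) frames tagged with their start time in seconds."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append((start / sample_rate, list(samples[start:start + frame_len])))
    return frames


# Toy example: 10 samples at 10 Hz, 4-sample frames, 2-sample hop.
for t, frame in frames_with_times(list(range(10)), 10, 4, 2):
    print(t, frame)
```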

The control unit 212 may also check whether the similarity between the identified utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity. If it does, the control unit 212 may adjust at least one of the output characteristic values of the amplified voice.
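The similarity test can be sketched with cosine similarity between the utterance characteristic vector and the target characteristic vector. Cosine similarity and the threshold of 0.95 are assumptions, because the text does not name a similarity measure. Raw features such as pitch in Hz and RMS level live on very different scales. The sketch therefore assumes the vectors have already been normalized, for example z-scored per feature.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_target_speaking(utterance: Sequence[float], target: Sequence[float],
                       threshold: float = 0.95) -> bool:
    """True when the similarity strictly exceeds the threshold, matching 'exceeds' in the text."""
    return cosine_similarity(utterance, target) > threshold


target = [1.2, -0.3, 0.8]  # normalized features of the target voice (illustrative values)
print(is_target_speaking([1.1, -0.2, 0.9], target))  # True
print(is_target_speaking([-0.9, 1.0, 0.1], target))  # False
```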

For example, suppose the voice 412 of speaker 2 was identified as the user's target voice through the process described with reference to FIGS. 4 and 5. If speaker 2 speaks between the first time point t1 and the second time point t2, the control unit 212 may determine that the similarity between the voice 412 and the voice S2_V1 exceeds the threshold similarity. Accordingly, the control unit 212 may adjust at least one of the output characteristic values of the amplified voice.

In more detail, the control unit 212 according to an embodiment of the present invention adjusts the output characteristic values of the amplified voice as follows. First, it may identify the frequency band of the voice of the speaker speaking at the first time point t1. For example, as shown in FIG. 7, the control unit 212 may identify the frequency band S2_f (that is, f1 to f2) of the voice of speaker 2, who is speaking at t1. The control unit 212 may also identify the frequency bands of speakers speaking at other time points. For example, it may identify the frequency band S1_f of speaker 1's voice and the frequency band S3_f of speaker 3's voice.
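One simple way to find the band f1 to f2 that a speaker's voice occupies is to take the magnitude spectrum of a frame. The sketch then keeps the span of bins whose magnitude reaches a fixed fraction of the peak. The plain O(n²) DFT, the 10 % relative threshold, and the function names are assumptions chosen to keep the sketch dependency-free. A practical implementation would use an FFT and average over several frames.

```python
import cmath
import math
from typing import List, Optional, Sequence, Tuple


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n/2 (plain DFT, for illustration only)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def speech_band(samples: Sequence[float], sample_rate: int,
                rel_threshold: float = 0.1) -> Optional[Tuple[float, float]]:
    """Return (f1, f2): lowest and highest frequency whose magnitude is >= rel_threshold * peak."""
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0:
        return None  # silent frame: no band to report
    strong = [k for k, m in enumerate(mags) if m >= rel_threshold * peak]
    bin_hz = sample_rate / len(samples)
    return strong[0] * bin_hz, strong[-1] * bin_hz


sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr) + math.sin(2 * math.pi * 300 * i / sr)
         for i in range(400)]
print(speech_band(frame, sr))  # (200.0, 300.0)
```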

Next, as shown in FIG. 8, the control unit 212 may set a higher amplification level for sound in the identified frequency band S2_f of the input sound than for sound in the other bands.
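Setting a higher amplification level inside S2_f than outside it amounts to a frequency-dependent gain. The sketch below applies such a gain to a single frame in three steps: transform to the frequency domain, scale each bin, and transform back. The gain values in dB, the O(n²) DFT, and `apply_band_gain` are illustrative assumptions. A hearing device would more likely use a filter bank or overlap-add FFT processing.

```python
import cmath
import math
from typing import List, Sequence


def apply_band_gain(frame: Sequence[float], sample_rate: int, f1: float, f2: float,
                    boost_db: float, base_db: float = 0.0) -> List[float]:
    """Scale spectral content in [f1, f2] by boost_db and everything else by base_db."""
    n = len(frame)
    bin_hz = sample_rate / n
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
                for k in range(n)]
    boost, base = 10 ** (boost_db / 20), 10 ** (base_db / 20)
    for k in range(n):
        freq = min(k, n - k) * bin_hz  # bins above n/2 mirror the negative frequencies
        spectrum[k] *= boost if f1 <= freq <= f2 else base
    # Inverse DFT. The input is real and the gains are symmetric, so the imaginary part is ~0.
    return [(sum(s * cmath.exp(2j * math.pi * k * i / n) for k, s in enumerate(spectrum)) / n).real
            for i in range(n)]


sr, n = 8000, 80
voice = [math.sin(2 * math.pi * 200 * i / sr) for i in range(n)]   # inside S2_f
other = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(n)]  # outside S2_f
out = apply_band_gain([v + o for v, o in zip(voice, other)], sr, 150, 250, 20 * math.log10(2))
# out is approximately 2 * voice + other: the in-band tone is doubled, the rest is unchanged.
```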

The control unit 212 according to an embodiment of the present invention may apply this higher amplification level to the band S2_f only between the first time point t1 and the second time point t2. The first time point t1 may be when a speaker (in particular, the speaker of the target voice) starts speaking. The second time point t2 may be when the voice (or utterance) of the speaker who started speaking at t1 ends.

From the second time point t2 onward, the control unit 212 according to an embodiment of the present invention may set the amplification level of the identified frequency band S2_f equal to that of the other bands.
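The timing rule is to boost only from t1 until t2 and then return to a flat gain. It can be tracked with a small state machine driven by a per-frame match decision. `BandBoostGate` is a hypothetical helper. It assumes the frame-level "target is speaking" flag comes from the similarity test described above. It returns only the in-band gain, since the out-of-band gain stays at `base_db` throughout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BandBoostGate:
    """Tracks the target utterance interval [t1, t2) and reports the in-band gain in dB."""
    boost_db: float
    base_db: float = 0.0
    t1: Optional[float] = None  # start of the current target utterance; None when idle

    def in_band_gain_db(self, t: float, target_speaking: bool) -> float:
        if target_speaking and self.t1 is None:
            self.t1 = t      # first time point t1: the target speaker starts talking
        elif not target_speaking and self.t1 is not None:
            self.t1 = None   # second time point t2: utterance over, gains equalized again
        return self.boost_db if self.t1 is not None else self.base_db


gate = BandBoostGate(boost_db=6.0)
speaking = [False, True, True, True, False, False]
print([gate.in_band_gain_db(0.02 * i, s) for i, s in enumerate(speaking)])
# [0.0, 6.0, 6.0, 6.0, 0.0, 0.0]
```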

The control unit 212 according to an embodiment of the present invention may output the amplified voice generated by the process described above, for example through the voice output unit 100.

In an optional embodiment, the control unit 212 may output the amplified voice together with a cancellation sound that cancels the input sound. The cancellation sound may have the same amplitude as the input sound but the opposite phase. In this way, the present invention can deliver only the amplified voice to the user, clearly.
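In the idealized case, the cancellation sound is simply the input sound with its sign flipped. It sums to silence with the input sound that leaks to the ear, so only the amplified voice remains. This sketch ignores the acoustic path delay and frequency response that a real active-noise-cancellation system must compensate for. All function names are hypothetical.

```python
from typing import List, Sequence


def cancellation_sound(input_sound: Sequence[float]) -> List[float]:
    """Same amplitude, opposite phase: flip the sign of every sample."""
    return [-x for x in input_sound]


def device_output(amplified_voice: Sequence[float], input_sound: Sequence[float]) -> List[float]:
    """What the device plays: the amplified voice plus the cancellation sound."""
    return [a + c for a, c in zip(amplified_voice, cancellation_sound(input_sound))]


def heard_by_user(input_sound: Sequence[float], played: Sequence[float]) -> List[float]:
    """Ideal superposition at the ear of the leaked input sound and the device output."""
    return [x + p for x, p in zip(input_sound, played)]


ambient = [0.3, -0.5, 0.8, -0.1]
amplified = [0.6, -1.0, 0.4, 0.2]
print(heard_by_user(ambient, device_output(amplified, ambient)))  # ~ amplified
```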

FIGS. 9 and 10 are flowcharts for explaining a method for selectively amplifying voice performed by the user terminal 200 according to an embodiment of the present invention. The description below refers to FIGS. 1 to 8 together but omits details already covered there.

The user terminal 200 according to an embodiment of the present invention may reproduce a plurality of candidate voices in time series through the voice output unit 100. Here, reproducing a candidate voice means playing sound content corresponding to it. Based on the user's brain waves, the user terminal 200 may then identify, among the reproduced candidate voices, a target voice that the user wants to hear (S910). The user's brain waves may be acquired by the brain wave detection device 300 and provided to the user terminal 200. In the present invention, 'voice' may refer either to the speaker's voice itself or to sound content corresponding to the speaker's voice.

Referring again to FIG. 4, this section describes how the user terminal 200 according to an embodiment of the present invention identifies a target voice. For convenience of explanation, the following is assumed, as shown in FIG. 4:
- A plurality of candidate voices 410, including the voices 411, 412, 413, and 414 of four speakers, are output through the voice output unit 100.
- The user's brain waves 511, 512, 513, and 514, recorded while the user listens to each speaker's voice, are as shown.

While the user listens to the candidate voices 410, the user terminal 200 according to an embodiment of the present invention may identify as the target voice the voice for which at least one physical quantity of the brain waves 510 satisfies a predetermined condition. For example, the user terminal 200 may identify the target voice based on whether the frequency and/or amplitude of the brain waves 510 satisfies a predetermined condition.

The user terminal 200 according to an embodiment of the present invention may identify as the target voice the candidate voice 410 that satisfies at least one of two conditions: its brain wave frequency is the highest, or its brain wave amplitude is the smallest. For example, in FIG. 4, the brain wave 512 has the highest frequency and the smallest amplitude of the four. The user terminal 200 may therefore identify the voice 412 of speaker 2 as the target voice from among the voices 411, 412, 413, and 414.

In an optional embodiment, the user terminal 200 may determine the target voice that the user wants to hear by using a trained artificial neural network. The network is trained on training data that includes brain wave data labeled with whether the user wanted to listen. Given brain wave data as input, it outputs whether the user wants to listen.

FIG. 5 is a diagram for explaining a process in which the user terminal 200 identifies a target voice using a trained artificial neural network according to an optional embodiment of the present invention. For convenience of description, the same assumptions as in FIG. 4 apply.

The user terminal 200 according to an embodiment of the present invention may input each of the brain waves 511, 512, 513, and 514 into the trained artificial neural network 610. These are the brain waves for the voices 411, 412, 413, and 414 of the four speakers. For each brain wave, the user terminal 200 can then check whether the corresponding voice is the target voice (621, 622, 623, 624).

Depending on how it is trained, the artificial neural network 610 may output one of two things. It may output in binary form whether each voice is the target voice (620), as shown in FIG. 5. Alternatively, it may output the probability that each voice is the target voice. In the probability embodiment, the user terminal 200 may determine as the target voice a voice whose probability satisfies a predetermined condition.

The user terminal 200 according to an embodiment of the present invention may extract at least one characteristic value of the target voice determined through the process described above (S920). The characteristic value may be, for example, at least one of the following for the target voice:
- the gender of its speaker;
- its pitch;
- its frequency;
- its utterance pattern.

These characteristic values are only examples. Any method that quantifies the characteristics of the target voice may be used to extract characteristic values in the present invention. In an optional embodiment, the user terminal 200 may extract the characteristic values of the target voice as a multi-dimensional vector.

Using the extracted characteristic values of the target voice, the user terminal 200 according to an embodiment of the present invention may check whether the target voice is included in the input sound. When it is, the user terminal 200 may amplify only the target voice to generate an amplified voice (S930).

To check whether the target voice is included in the input sound, the user terminal 200 according to an embodiment of the present invention may check utterance characteristic values at a plurality of time points (S931). An utterance characteristic value is the characteristic value of the voice of the speaker speaking at that time point. For example, the user terminal 200 may check the utterance characteristic value of the voice S2_V1 at time point t1.

The user terminal 200 may also check whether the similarity between the identified utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity (S932). If it does, the user terminal 200 may adjust at least one of the output characteristic values of the amplified voice (S933).

For example, suppose the voice 412 of speaker 2 was identified as the user's target voice through the process described with reference to FIGS. 4 and 5. If speaker 2 speaks between the first time point t1 and the second time point t2, the user terminal 200 may determine that the similarity between the voice 412 and the voice S2_V1 exceeds the threshold similarity. Accordingly, the user terminal 200 may adjust at least one of the output characteristic values of the amplified voice.

In more detail, the user terminal 200 according to an embodiment of the present invention adjusts the output characteristic values of the amplified voice as follows. First, it may identify the frequency band of the voice of the speaker speaking at the first time point t1. For example, as shown in FIG. 7, the user terminal 200 may identify the frequency band S2_f (that is, f1 to f2) of the voice of speaker 2, who is speaking at t1. The user terminal 200 may also identify the frequency bands of speakers speaking at other time points. For example, it may identify the frequency band S1_f of speaker 1's voice and the frequency band S3_f of speaker 3's voice.

Next, as shown in FIG. 8, the user terminal 200 may set a higher amplification level for sound in the identified frequency band S2_f of the input sound than for sound in the other bands.

The user terminal 200 according to an embodiment of the present invention may apply this higher amplification level to the band S2_f only between the first time point t1 and the second time point t2. The first time point t1 may be when a speaker (in particular, the speaker of the target voice) starts speaking. The second time point t2 may be when the voice (or utterance) of the speaker who started speaking at t1 ends.

From the second time point t2 onward, the user terminal 200 according to an embodiment of the present invention may set the amplification level of the identified frequency band S2_f equal to that of the other bands.

The user terminal 200 according to an embodiment of the present invention may output the amplified voice generated by the process described above (S940), for example through the voice output unit 100.

In an optional embodiment, the user terminal 200 may output the amplified voice together with a cancellation sound that cancels the input sound. The cancellation sound may have the same amplitude as the input sound but the opposite phase. In this way, the present invention can deliver only the amplified voice to the user, clearly.
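Putting steps S910 to S940 together, the flow of FIGS. 9 and 10 can be sketched as a single function that takes the individual operations as parameters. The following are assumptions for illustration:
- the parameter names;
- the use of a per-candidate EEG score in place of the frequency/amplitude condition or the neural network;
- the frame-by-frame structure.

In a fuller version, `amplify` would apply the band-limited gain for S2_f rather than scaling the whole frame. The toy callables in the usage example are placeholders, not real feature extractors.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]


def selective_amplification(
    eeg_scores: Sequence[float],        # one score per candidate voice (higher = more wanted)
    candidate_voices: Sequence[Signal],
    input_frames: Sequence[Signal],
    extract: Callable[[Signal], List[float]],
    similarity: Callable[[List[float], List[float]], float],
    amplify: Callable[[Signal], List[float]],
    threshold: float,
) -> List[List[float]]:
    # S910: identify the target voice from the EEG responses.
    target = max(range(len(eeg_scores)), key=lambda i: eeg_scores[i])
    # S920: extract the target voice's characteristic values.
    target_features = extract(candidate_voices[target])
    output = []
    for frame in input_frames:
        # S931: characteristic values of whoever is speaking in this frame.
        features = extract(frame)
        # S932/S933: adjust the output only when the frame matches the target voice.
        if similarity(features, target_features) > threshold:
            output.append(amplify(frame))
        else:
            output.append(list(frame))
    # S940: the caller plays these frames through the voice output unit.
    return output


frames = selective_amplification(
    eeg_scores=[0.2, 0.8],
    candidate_voices=[[1.0, 1.0], [2.0, 2.0]],
    input_frames=[[2.0, 2.0], [1.0, 1.0]],
    extract=lambda s: [sum(s) / len(s)],
    similarity=lambda a, b: 1.0 if a == b else 0.0,
    amplify=lambda s: [2 * x for x in s],
    threshold=0.5,
)
print(frames)  # [[4.0, 4.0], [1.0, 1.0]]
```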

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatuses and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers. Examples include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described as being used. However, those of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these. It may configure a processing device to operate as desired, or command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave. This allows them to be interpreted by the processing device or to provide it with instructions or data. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include:
- magnetic media such as hard disks, floppy disks, and magnetic tapes;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code, such as that produced by a compiler. They also include high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

The embodiments have been described above with reference to a limited number of embodiments and drawings. However, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may still be achieved in the following cases:
- the described techniques are performed in an order different from the described method;
- the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method;
- those components are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

100: voice output unit
111: communication unit
112: control unit
113: sound input unit
114: sound output unit
115: memory
200: user terminal
211: communication unit
212: control unit
213: memory
214: display unit
300: brain wave sensing device

Claims

1. A method for selectively amplifying a specific speaker's voice, the method comprising:
identifying, while sequentially playing a plurality of candidate voices, a target voice that a user wants to hear from among the plurality of candidate voices based on the user's brain waves, wherein a candidate voice for which at least one physical quantity of the brain waves measured while the user listens to it satisfies a predetermined condition is identified as the target voice;
extracting at least one characteristic value of the target voice;
determining, based on the characteristic value, whether the target voice is included in an input sound and, when the target voice is included, amplifying only the target voice to generate an amplified voice; and
outputting the amplified voice,
wherein the physical quantity
includes at least one of a frequency of the brain waves and an amplitude of the brain waves, and
the predetermined condition
includes at least one of: a condition that the frequency of the user's brain waves is the highest among the plurality of candidate voices; and
a condition that the amplitude of the user's brain waves is the smallest among the plurality of candidate voices.
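The EEG-based selection rule in the claim above can be sketched as follows. This is a minimal illustration, not the patentee's implementation: it assumes one EEG segment is recorded per candidate voice, takes the "frequency of the brain waves" to be the dominant (largest-magnitude) DFT frequency of that segment and the "amplitude" to be its RMS value, and uses hypothetical helper names (`dominant_frequency`, `rms_amplitude`, `pick_target`).

```python
import cmath
import math


def dominant_frequency(samples, fs):
    """Frequency (Hz) of the largest non-DC bin of a plain DFT (assumed definition)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove DC so bin 0 never wins
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        acc = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n


def rms_amplitude(samples):
    """Root-mean-square amplitude of the DC-removed segment (assumed definition)."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


def pick_target(eeg_by_candidate, fs, rule="max_frequency"):
    """Return the candidate whose EEG segment satisfies the chosen condition."""
    if rule == "max_frequency":
        return max(eeg_by_candidate, key=lambda c: dominant_frequency(eeg_by_candidate[c], fs))
    if rule == "min_amplitude":
        return min(eeg_by_candidate, key=lambda c: rms_amplitude(eeg_by_candidate[c]))
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    fs = 128  # hypothetical EEG sampling rate (Hz); one-second segments

    def tone(freq, amp):
        return [amp * math.sin(2 * math.pi * freq * t / fs) for t in range(fs)]

    eeg = {"voice A": tone(10, 0.5), "voice B": tone(20, 2.0), "voice C": tone(8, 1.0)}
    print(pick_target(eeg, fs, "max_frequency"))  # voice B
    print(pick_target(eeg, fs, "min_amplitude"))  # voice A
```

The two rules can disagree, as in the demo, which is why the claim lets either condition (or both) be used.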

2-3. (Deleted)

4. The method according to claim 1, wherein
the identifying of the target voice comprises
determining the target voice that the user wants to hear by using a trained artificial neural network, and
the artificial neural network
is a neural network trained, based on training data including EEG data labeled with the user's listening intent, to output whether the user wants to listen in response to input EEG data.
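A trained classifier of the kind described above can be illustrated with the smallest possible network: a single sigmoid neuron (logistic regression) trained by batch gradient descent. This is a sketch under assumptions: each EEG recording is assumed to have already been reduced to a short, pre-scaled feature vector, label 1 marks "user wants to listen", and the names `train_listening_classifier` and `predict` are hypothetical. A practical system would likely use a larger network and real EEG features.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_listening_classifier(features, labels, lr=0.5, epochs=2000, seed=0):
    """Fit one sigmoid neuron to (EEG feature vector, listening label) pairs."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of binary cross-entropy w.r.t. the pre-activation
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Return True if the model says the user wants to listen."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= threshold


if __name__ == "__main__":
    # Hypothetical pre-scaled features: [relative band power, normalized amplitude]
    X = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]
    y = [1, 1, 0, 0]
    w, b = train_listening_classifier(X, y)
    print(predict(w, b, [0.85, 0.25]))  # True
    print(predict(w, b, [0.15, 0.8]))   # False
```

The fixed seed makes training reproducible; the stable `sigmoid` avoids overflow in `math.exp` if the weights grow large.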

5. The method according to claim 1, wherein
the at least one characteristic value
includes at least one of a gender of a speaker of the target voice, a pitch of the target voice, a frequency of the target voice, and an utterance pattern of the target voice.
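Of the characteristic values listed, pitch is the easiest to illustrate. The sketch below estimates the fundamental frequency of a voiced frame by picking the autocorrelation peak within a plausible speech range. It is one common textbook approach, not necessarily the one the patent uses; the 60 to 400 Hz search range and the function name `estimate_pitch` are assumptions, and real speech would need framing, voicing detection, and a more robust estimator.

```python
import math


def estimate_pitch(samples, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]
    min_lag = max(1, int(fs / fmax))      # shortest period considered (samples)
    max_lag = min(int(fs / fmin), n - 1)  # longest period considered (samples)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: the shrinking overlap slightly favors
        # shorter lags, which helps avoid picking a multiple of the true period.
        r = sum(x[t] * x[t + lag] for t in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(800)]
    print(estimate_pitch(frame, fs))  # 200.0
```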

6. The method according to claim 1, wherein
the input sound
is a sound input in real time that includes a plurality of voices uttered by a plurality of speakers, and
the generating of the amplified voice comprises:
checking an utterance characteristic value, which is a characteristic value of the voice of a speaker speaking at a first time point;
determining whether a similarity between the utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity; and
adjusting at least one output characteristic value of the amplified voice when the predetermined threshold similarity is exceeded.
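The threshold test in the claim above can be sketched with cosine similarity between feature vectors. The choice of cosine similarity, the 0.95 default threshold, and the idea that each utterance is summarized as a fixed-length numeric vector (for example, scaled pitch and spectral statistics) are assumptions for illustration; the claim does not specify the similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_target_speaker(utterance_features, target_features, threshold=0.95):
    """True when the similarity strictly exceeds the threshold, as the claim requires."""
    return cosine_similarity(utterance_features, target_features) > threshold


if __name__ == "__main__":
    print(is_target_speaker([1.0, 2.0, 3.0], [1.0, 2.0, 3.1]))  # True
    print(is_target_speaker([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # False
```

Cosine similarity ignores overall scale, so the features would need to be normalized consistently before comparison for the threshold to be meaningful.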

7. The method of claim 6, wherein
the adjusting of the at least one output characteristic value comprises:
identifying a frequency band to which the voice of the speaker speaking at the first time point belongs; and
setting an amplification level of sound in the identified frequency band of the input sound higher than an amplification level of sound in bands other than the identified frequency band.

8. The method of claim 7, wherein
the generating of the amplified voice further comprises:
detecting a second time point at which the voice of the speaker who spoke at the first time point ends; and
setting, from the second time point, the amplification level of sound in the identified frequency band equal to the amplification level of sound in bands other than the identified frequency band.
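The band-selective gain and its reset at the second time point can be sketched on a single audio frame in the frequency domain. This is an illustrative assumption rather than the patent's processing chain: it uses a plain O(n²) DFT to stay self-contained (a real device would use an FFT or a filter bank), linear gains `boost` and `base`, and a hypothetical `process_frame` helper that is told the frame's time and the speaker's start and end times.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def process_frame(frame, fs, band, frame_time, t_start, t_end, boost=2.0, base=1.0):
    """Apply `boost` inside `band` while the target speaker talks
    (t_start <= frame_time < t_end); otherwise apply `base` to every bin."""
    n = len(frame)
    spectrum = dft(frame)
    active = t_start <= frame_time < t_end
    out = []
    for k, value in enumerate(spectrum):
        freq = min(k, n - k) * fs / n  # bin k and its mirror n-k share one |frequency|
        in_band = band[0] <= freq <= band[1]
        out.append(value * (boost if active and in_band else base))
    return idft(out)


if __name__ == "__main__":
    fs, n = 8000, 80  # 100 Hz bin spacing
    frame = [math.sin(2 * math.pi * 200 * t / fs) + math.sin(2 * math.pi * 1000 * t / fs)
             for t in range(n)]
    during = process_frame(frame, fs, (150.0, 300.0), frame_time=0.5, t_start=0.0, t_end=1.0)
    after = process_frame(frame, fs, (150.0, 300.0), frame_time=1.0, t_start=0.0, t_end=1.0)
```

While the speaker is active the 200 Hz component is doubled and the 1000 Hz component is untouched; from `t_end` onward every bin gets the same gain, which matches the "equal amplification" step.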

9. The method according to claim 1, wherein
the outputting of the amplified voice comprises:
outputting a canceling sound for canceling the input sound; and
outputting the amplified voice.

10. The method of claim 9, wherein
the canceling sound
has the same amplitude as the input sound and is opposite in phase to the input sound.
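The canceling sound described above is the input sound with its sign inverted, that is, the same amplitude shifted 180 degrees in phase. The sketch below shows only the idealized arithmetic; it assumes sample-aligned, zero-latency playback and a perfectly flat acoustic path, which real active noise cancellation does not have, and the function names are hypothetical.

```python
def canceling_sound(input_sound):
    """Same amplitude, opposite phase: negate every sample."""
    return [-s for s in input_sound]


def device_output(input_sound, amplified_voice):
    """What the voice output unit plays: canceling sound plus amplified voice."""
    return [c + a for c, a in zip(canceling_sound(input_sound), amplified_voice)]


def heard_at_ear(input_sound, played):
    """Idealized superposition of the leaked input sound and the played signal."""
    return [i + p for i, p in zip(input_sound, played)]


if __name__ == "__main__":
    ambient = [0.3, -0.7, 0.5, 0.1]
    voice = [0.6, 0.2, -0.4, 0.0]
    # Equals `voice` up to floating-point rounding: the ambient sound cancels out.
    print(heard_at_ear(ambient, device_output(ambient, voice)))
```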

11. A computer program stored in a medium to execute, by using a computer, the method of any one of claims 1 and 4 to 10.

12. A selective amplification system for selectively amplifying a specific speaker's voice, the system comprising:
a brain wave sensing device configured to detect a user's brain waves;
a voice output unit configured to selectively output an amplified voice to the user's auditory organ; and
a user terminal configured to generate the amplified voice based on the brain waves detected by the brain wave sensing device and provide it to the voice output unit,
wherein the user terminal
identifies, while sequentially playing a plurality of candidate voices through the voice output unit, a target voice that the user wants to hear from among the plurality of candidate voices based on the user's brain waves detected by the brain wave sensing device, a candidate voice for which at least one physical quantity of the brain waves measured while the user listens to it satisfies a predetermined condition being identified as the target voice;
extracts at least one characteristic value of the target voice;
determines, based on the characteristic value, whether the target voice is included in an input sound and, when the target voice is included, amplifies only the target voice to generate the amplified voice; and
controls the voice output unit to output the amplified voice,
wherein the physical quantity
includes at least one of a frequency of the brain waves and an amplitude of the brain waves, and
the predetermined condition
includes at least one of: a condition that the frequency of the user's brain waves is the highest among the plurality of candidate voices; and
a condition that the amplitude of the user's brain waves is the smallest among the plurality of candidate voices.

13-14. (Deleted)

15. The system of claim 12, wherein
the user terminal
determines the target voice that the user wants to hear by using a trained artificial neural network, and
the artificial neural network
is a neural network trained, based on training data including EEG data labeled with the user's listening intent, to output whether the user wants to listen in response to input EEG data.

16. The system of claim 12, wherein
the at least one characteristic value
includes at least one of a gender of a speaker of the target voice, a pitch of the target voice, a frequency of the target voice, and an utterance pattern of the target voice.

17. The system of claim 12, wherein
the input sound
is a sound sensed in real time by either the user terminal or the voice output unit and includes a plurality of voices uttered by a plurality of speakers, and
the user terminal
checks an utterance characteristic value, which is a characteristic value of the voice of a speaker speaking at a first time point, determines whether a similarity between the utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity, and adjusts at least one output characteristic value of the voice output unit when the predetermined threshold similarity is exceeded.

18. The system of claim 17, wherein
the user terminal
identifies a frequency band to which the voice of the speaker speaking at the first time point belongs, and sets an amplification level of sound in the identified frequency band of the input sound higher than an amplification level of sound in bands other than the identified frequency band.

19. The system of claim 18, wherein
the user terminal
detects a second time point at which the voice of the speaker who spoke at the first time point ends, and sets, from the second time point, the amplification level of sound in the identified frequency band equal to the amplification level of sound in bands other than the identified frequency band.

20. The system of claim 12, wherein
the user terminal
controls the voice output unit to output both a canceling sound for canceling the input sound and the amplified voice.

21. The system of claim 20, wherein
the canceling sound
has the same amplitude as the input sound and is opposite in phase to the input sound.