
WO2016187910A1 - Voice-to-text conversion method and device, and storage medium - Google Patents

Info

Publication number
WO2016187910A1
WO2016187910A1 (PCT/CN2015/081688)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
user
text
voice information
microphones
Prior art date
Application number
PCT/CN2015/081688
Other languages
French (fr)
Chinese (zh)
Inventor
吴建明
Original Assignee
西安中兴新软件有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西安中兴新软件有限责任公司 filed Critical 西安中兴新软件有限责任公司
Publication of WO2016187910A1 publication Critical patent/WO2016187910A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • the present invention relates to information conversion technology, and in particular, to a voice text conversion method and device, and a storage medium.
  • with an improved microphone signal-to-noise ratio and a sensible design layout, the high-definition recording level of a professional voice recorder can be achieved on a mobile phone.
  • ADC Analog-to-Digital Converter
  • recording quality is assured, the voice-to-text engine achieves a high recognition rate, and recording-to-text conversion has reached a fully commercial level.
  • at present, the voice-to-text function of mobile phones is rudimentary: it can only roughly convert a stretch of speech into text, and hardware or software limitations keep the recognition rate low. The speaker cannot be identified, so when several people speak at once the converted text cannot be labelled by speaker. A long recording, such as a conference, a class lecture, or a group discussion, can only be converted into one undifferentiated block of text, with no structure and no way to separate the voices; this falls far short of a high-quality, efficient design and degrades human-machine interactivity.
  • current mobile phones install voice-to-text applications (APPs) that collect voice through a microphone, upload it to the cloud over the network, and convert it to text with a cloud engine.
  • APP voice-to-text application
  • in practice the recognition rate is low, the pickup distance is short, the conversion quality is mediocre, and the user experience is poor.
  • in summary, the voice-to-text function in current mobile phones can only handle a single voice, requires a connection to a cloud server, has a low recognition rate, cannot recognize and separate several people speaking at once, and cannot classify the converted text by speaker.
  • an embodiment of the present invention provides a voice text conversion method and device, and a storage medium.
  • the voice information corresponding to each user is converted into corresponding text information.
  • before the voice information collected by the microphones is analyzed and processed, the method further includes:
  • the voice information collected by the microphones is analyzed and processed to obtain sound source characteristic parameters of each user, including:
  • the sound source characteristic parameters of each user are calculated according to the time difference in which the respective microphones receive the concurrent speech.
  • the method further includes:
  • the text information corresponding to each user is displayed by category.
  • the method further includes:
  • the text information corresponding to one or more users is displayed in categories.
  • the information collecting unit is configured to collect voice information of one or more users by using two or more microphones;
  • the voice analyzing unit is configured to analyze and process the voice information collected by the microphones to obtain sound source characteristic parameters of each user, and to classify the collected voice information according to the sound source characteristic parameters of each user, obtaining the voice information corresponding to each user;
  • the voice text conversion unit is configured to convert the voice information corresponding to each user into corresponding text information.
  • the device further includes:
  • the noise filtering unit is configured to filter out background noise in the voice information collected by the microphones.
  • the voice analysis unit includes:
  • the analyzing subunit is configured to analyze the voice information collected by the microphones to obtain a time difference between the received voices of the microphones;
  • the calculating subunit is configured to calculate a sound source characteristic parameter of each user according to a time difference in which the respective microphones receive the concurrent speech.
  • the device further includes:
  • the display unit is configured to display the text information corresponding to each user separately.
  • the device further includes:
  • the display unit is configured to display, according to the selected user identifier, text information corresponding to one or more users respectively.
  • a storage medium storing a computer program configured to perform the aforementioned method for converting a voice text.
  • the voice text conversion device has high-performance hardware, including N (N ≥ 2) sensibly laid-out high-SNR microphones forming a microphone array, a high-performance ADC, and a high-performance digital signal processor (DSP).
  • the device can collect high-definition voice information. While collecting, it distinguishes each user's spoken content by computing sound source characteristic parameters such as the user's angle and distance; when another person speaks at the same time, a different set of source parameters is computed to tell them apart, so the voice information of each user is separated according to its sound source characteristic parameters.
  • when converting voice to text, a local voice engine converts each user's voice information into the corresponding text without connecting to the cloud, thereby solving the problem of converting voice into text classified by user when several people speak at the same time.
  • FIG. 1 is a schematic flowchart of a voice text conversion method according to an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of a voice collection scenario according to an embodiment of the present invention;
  • FIG. 3 is a first schematic diagram of a classified text conversion interface according to an embodiment of the present invention;
  • FIG. 4 is a second schematic diagram of a classified text conversion interface according to an embodiment of the present invention;
  • FIG. 5 is a third schematic diagram of a classified text conversion interface according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a voice text conversion device according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a voice text conversion method according to an embodiment of the present invention.
  • the voice text conversion method in this example is applied to a voice text conversion device.
  • the voice text conversion method includes the following steps:
  • Step 101 Acquire voice information of more than one user by using two or more microphones.
  • the voice text conversion device may be an electronic device such as a mobile phone, a tablet computer, or a notebook computer.
  • the voice text conversion device has high-performance hardware, including N (N ≥ 2) sensibly laid-out high-SNR microphones forming a microphone array, a high-performance ADC, and a high-performance digital signal processor (DSP).
  • N (N ≥ 2)
  • ADC Analog-to-Digital Converter
  • DSP Digital Signal Processor
  • when more than one user speaks to the voice text conversion device at the same time, all of its microphones (two or more) start up and capture the users' voice information. Thus, for each microphone, the collected signal is a mixture of several users' voices.
  • the example of the present invention aims to separate the voice information of different users so that each user's voice information can be converted to text separately.
  • Step 102 Perform analysis processing on the voice information collected by the microphones to obtain sound source characteristic parameters of each user.
  • the background noise in the voice information collected by the microphones is filtered out before the analysis and processing of the voice information collected by the microphones.
  • the background noise in the speech information is filtered out.
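As a toy illustration of this pre-filtering step (the patent does not specify a filtering algorithm, so this simple energy gate is an illustrative stand-in): estimate the noise floor from leading frames assumed to contain only background noise, then zero out any frame whose energy stays near that floor.

```python
import math

def frame_rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def gate_noise(samples, frame_len=4, noise_frames=2, margin=2.0):
    """Suppress frames whose energy is close to the estimated noise floor.

    The first `noise_frames` frames are assumed to hold background noise
    only; any frame below `margin` times that RMS is zeroed out.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    noise_rms = max(frame_rms(f) for f in frames[:noise_frames])
    out = []
    for f in frames:
        if frame_rms(f) <= margin * noise_rms:
            out.extend([0.0] * len(f))   # treated as background noise
        else:
            out.extend(f)                # kept as speech
    return out

# Quiet hiss followed by a louder "speech" burst.
signal = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015, 0.01, -0.02,
          0.8, -0.7, 0.9, -0.85, 0.75, -0.8, 0.85, -0.9]
cleaned = gate_noise(signal)
print(cleaned[:8])   # noise frames zeroed
print(cleaned[8:])   # speech frames kept unchanged
```

A real device would use the DSP for spectral noise suppression; the gate above only conveys the idea of removing background noise before the time-difference analysis.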
  • the voice information collected by each microphone is analyzed to obtain the time difference with which each microphone receives the same speech; from these time differences, the sound source characteristic parameters of each user are calculated.
  • here, concurrent speech refers to the same utterance as received by the different microphones.
  • for example, user A says a “hello” voice.
  • the voice text conversion device has two microphones; since microphone 1 and microphone 2 are at different positions, there is a time difference between the moment microphone 1 receives the “hello” voice and the moment microphone 2 receives it.
  • the two “hello” voices in microphone 1 and microphone 2 are concurrent speech.
  • assuming the position coordinates of user A are (x1, y1), the positions of microphone 1 and microphone 2 and the analyzed time difference of the concurrent speech are known, so the position of user A can be calculated, and this position determines A's sound source characteristic parameters.
  • the sound source characteristic parameters may be parameters such as the user's angle and distance relative to the microphones, which can be characterized by the user's position coordinates.
  • similarly, user B says a “pretty” voice.
  • the voice text conversion device has two microphones; since microphone 1 and microphone 2 are at different positions, they receive the “pretty” voice at different moments, with a time difference.
  • the two “pretty” voices in microphone 1 and microphone 2 are concurrent speech. Assuming the position coordinates of user B are (x2, y2), the positions of microphone 1 and microphone 2 and the analyzed time difference of the concurrent speech are known, so the position of user B can be calculated to determine B's sound source characteristic parameters.
  • Step 103 Classify the collected voice information according to the sound source characteristic parameters of each user, and obtain voice information corresponding to each user.
  • different users are at different physical locations, so their sound source characteristic parameters differ. The voice information of multiple users can therefore be classified according to these parameters, yielding the voice information corresponding to each user.
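A minimal sketch of this classification step, assuming each speech segment has already been tagged with an estimated source angle. The patent does not specify a grouping algorithm; the nearest-cluster rule below is an illustrative stand-in.

```python
def classify_segments(segments, tolerance=5.0):
    """Group speech segments by their estimated source angle.

    Each segment is (angle_degrees, payload); a segment within `tolerance`
    degrees of an existing speaker cluster joins that speaker, otherwise
    it starts a new speaker.
    """
    speakers = []    # representative angle per speaker (first one seen)
    by_speaker = {}  # speaker index -> list of payloads
    for angle, payload in segments:
        for idx, ref in enumerate(speakers):
            if abs(angle - ref) <= tolerance:
                by_speaker[idx].append(payload)
                break
        else:
            speakers.append(angle)
            by_speaker[len(speakers) - 1] = [payload]
    return by_speaker

# Interleaved segments from two directions (~30 deg and ~-40 deg).
mixed = [(29.5, "hello"), (-40.2, "pretty"), (30.4, "how are you"),
         (-39.1, "thanks")]
print(classify_segments(mixed))
# {0: ['hello', 'how are you'], 1: ['pretty', 'thanks']}
```

Each resulting group corresponds to one user's voice information and can then be fed to the voice-to-text step separately.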
  • Step 104 Convert the voice information corresponding to each user into corresponding text information.
  • the voice information corresponding to each user may be converted into corresponding text information by using a local voice engine.
  • the text information corresponding to each user is displayed in a classified manner.
  • the text information corresponding to one or more users is displayed in a category.
  • the technical solution of the embodiment of the present invention uses a local voice engine, with no cloud connection, to convert each user's voice information into the corresponding text, thereby solving the problem of converting voice into text classified by user in scenarios where several people speak at the same time.
  • the voice text conversion method of the embodiment of the present invention is described below step by step in combination with specific application scenarios.
  • the device includes microphone 1 and microphone 2; suppose A and B hold a discussion in turn, or A, B, and C speak alternately.
  • the voice information conversion device of the embodiment of the present invention sequentially passes the collected voice information through the information collection unit, the voice analysis unit, and the voice text conversion unit.
  • the device can separate the voice and text of the three persons A, B, and C, and the user can choose to generate the voice and text of A, of B, or of C alone.
  • the classification processing text result shown in FIG. 3 is formed.
  • in a conference speech or keynote scenario where A is the presenter, the technical solution of the embodiment of the present invention can retain only the presenter A's voice, convert only A's voice into text, and remove the voices of B and C.
  • the classification processing text result shown in FIG. 4 is formed.
  • in the question-and-answer part of a meeting, A as the presenter may need to interact with other members while speaking; the interaction between presenter A and questioner B can then be collected and converted to text in chronological order.
  • the classification processing text result shown in FIG. 5 is formed.
  • FIG. 6 is a schematic structural diagram of a voice text conversion device according to an embodiment of the present invention. As shown in FIG. 6, the device includes:
  • the information collecting unit 61 is configured to collect voice information of one or more users by using two or more microphones;
  • the voice analyzing unit 62 is configured to analyze and process the voice information collected by the microphones to obtain sound source characteristic parameters of each user, and to classify the collected voice information according to the sound source characteristic parameters of each user, obtaining the voice information corresponding to each user;
  • the voice text conversion unit 63 is configured to convert the voice information corresponding to each user into corresponding text information.
  • the device further includes:
  • the noise filtering unit 64 is configured to filter out background noise in the voice information collected by the respective microphones.
  • the voice analyzing unit 62 includes:
  • the analyzing sub-unit 621 is configured to analyze the voice information collected by the microphones to obtain a time difference between the received voices of the microphones;
  • the calculating sub-unit 622 is configured to calculate a sound source characteristic parameter of each user according to a time difference that the respective microphones receive the concurrent speech;
  • the classification sub-unit 623 is configured to classify the collected voice information according to the sound source characteristic parameters of the users, and obtain voice information corresponding to each user.
  • the device further includes:
  • the display unit 65 is configured to classify and display the text information corresponding to each of the users.
  • the display unit 65 is further configured to display, according to the selected user identifier, text information corresponding to one or more users.
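The three main units (information collecting unit 61, voice analyzing unit 62, voice text conversion unit 63) can be sketched as a toy pipeline. The recognizer below is a trivial stand-in for the local voice engine, and the per-segment source identifiers are assumed to have been produced by the analysis described above; none of these names come from the patent.

```python
class VoiceToTextDevice:
    """Toy pipeline mirroring Fig. 6: collect -> analyze/classify -> convert."""

    def __init__(self, recognizer):
        self.recognizer = recognizer  # callable: audio segment -> text

    def collect(self, mic_streams):
        # Information collecting unit 61: merge per-microphone captures.
        return [seg for stream in mic_streams for seg in stream]

    def analyse(self, segments):
        # Voice analyzing unit 62: group segments by source identifier
        # (here the sound source parameter is precomputed per segment).
        per_user = {}
        for source_id, audio in segments:
            per_user.setdefault(source_id, []).append(audio)
        return per_user

    def convert(self, per_user):
        # Voice text conversion unit 63: run the "local engine" per user.
        return {user: [self.recognizer(a) for a in audio]
                for user, audio in per_user.items()}

device = VoiceToTextDevice(recognizer=str.upper)  # fake local engine
streams = [[("A", "hello"), ("B", "pretty")], [("A", "bye")]]
result = device.convert(device.analyse(device.collect(streams)))
print(result)  # {'A': ['HELLO', 'BYE'], 'B': ['PRETTY']}
```

The display unit 65 would then render `result` either fully classified by user or filtered to the selected user identifiers.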
  • the embodiment of the invention further describes a storage medium in which a computer program is stored, the computer program being configured to execute the voice text conversion method of the foregoing embodiments.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit;
  • the unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments.
  • the foregoing storage medium includes media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the integrated unit of the present invention described above may, if implemented in the form of a software function module and sold or used as a standalone product, be stored in a computer readable storage medium.
  • based on such an understanding, the technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product stored in a storage medium and including a plurality of instructions.
  • a computer device (which may be a personal computer, server, or network device, etc.) is caused to perform all or part of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a removable storage device, a read only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program codes.
  • the invention separates the voice information of each user according to different sound source characteristic parameters.
  • when converting voice to text, a local voice engine converts each user's voice information into the corresponding text without connecting to the cloud, thereby solving the problem of converting voice into text classified by user when several people speak at the same time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice-to-text conversion method and device. The method comprises: using two or more microphones to acquire voice information about one or more users (101); analysing and processing the voice information acquired by the various microphones to obtain sound source feature parameters of various users (102); classifying the acquired voice information according to the sound source feature parameters of the various users to obtain voice information respectively corresponding to the various users (103); and converting the voice information respectively corresponding to the various users into corresponding text information (104).

Description

Voice text conversion method and device, and storage medium

Technical Field

The present invention relates to information conversion technology, and in particular to a voice text conversion method and device, and a storage medium.

Background

As smart terminals, mobile phones are becoming ever more intelligent, and the demand for human-computer interaction keeps growing. Voice, as a basic medium of human-computer interaction, plays an irreplaceable role. With a new generation of voice-enabled phones, the owner can control all kinds of phone operations entirely by voice command, such as making calls, reading and writing text messages, and opening applications; tapping the deeper potential of voice is bound to become a trend in voice products.

With the improved performance of the phone's recording-chip analog-to-digital converter (ADC) and a higher microphone signal-to-noise ratio, a sensible design layout now allows a mobile phone to reach the high-definition recording level of a professional voice recorder. Recording quality is assured, the accompanying voice-to-text engine achieves a high recognition rate, and recording-to-text conversion has reached a fully commercial level.

At present, the voice-to-text function of mobile phones is rudimentary: it can only roughly convert a stretch of speech into text, and hardware or software limitations keep the recognition rate low. The speaker cannot be identified, so when several people speak at once the converted text cannot be labelled by speaker. A long recording, such as a conference, a class lecture, or a group discussion, can only be converted into one undifferentiated block of text, with no structure and no way to separate the voices; this falls far short of a high-quality, efficient design and degrades human-machine interactivity.

Moreover, current mobile phones install voice-to-text applications (APPs) that collect voice through a microphone, upload it to the cloud over the network, and convert it to text with a cloud engine. In practice the recognition rate is low, the pickup distance is short, the conversion quality is mediocre, and the user experience is poor.

In summary, the voice-to-text function in current mobile phones can only handle a single voice, requires a connection to a cloud server, has a low recognition rate, cannot recognize and separate several people speaking at once, and cannot classify the converted text by speaker.
Summary of the Invention

To solve the above technical problem, embodiments of the present invention provide a voice text conversion method and device, and a storage medium.

The voice text conversion method provided by an embodiment of the present invention includes:

collecting voice information of more than one user by using two or more microphones;

analyzing and processing the voice information collected by the microphones to obtain sound source characteristic parameters of each user;

classifying the collected voice information according to the sound source characteristic parameters of each user to obtain the voice information corresponding to each user; and

converting the voice information corresponding to each user into corresponding text information.

In an embodiment of the present invention, before the voice information collected by the microphones is analyzed and processed, the method further includes:

filtering out background noise in the voice information collected by the microphones.

In an embodiment of the present invention, analyzing and processing the voice information collected by the microphones to obtain the sound source characteristic parameters of each user includes:

analyzing the voice information collected by each microphone to obtain the time differences with which the microphones receive concurrent speech; and

calculating the sound source characteristic parameters of each user according to the time differences with which the microphones receive the concurrent speech.

In an embodiment of the present invention, after the voice information corresponding to each user is converted into corresponding text information, the method further includes:

displaying the text information corresponding to each user by category.

In an embodiment of the present invention, after the voice information corresponding to each user is converted into corresponding text information, the method further includes:

displaying, by category and according to the selected user identifier, the text information corresponding to one or more users.
The voice text conversion device provided by an embodiment of the present invention includes:

an information collecting unit configured to collect voice information of more than one user by using two or more microphones;

a voice analyzing unit configured to analyze and process the voice information collected by the microphones to obtain sound source characteristic parameters of each user, and to classify the collected voice information according to the sound source characteristic parameters of each user, obtaining the voice information corresponding to each user; and

a voice text conversion unit configured to convert the voice information corresponding to each user into corresponding text information.

In an embodiment of the present invention, the device further includes:

a noise filtering unit configured to filter out background noise in the voice information collected by the microphones.

In an embodiment of the present invention, the voice analyzing unit includes:

an analyzing subunit configured to analyze the voice information collected by the microphones to obtain the time differences with which the microphones receive concurrent speech; and

a calculating subunit configured to calculate the sound source characteristic parameters of each user according to the time differences with which the microphones receive the concurrent speech.

In an embodiment of the present invention, the device further includes:

a display unit configured to display the text information corresponding to each user by category.

In an embodiment of the present invention, the device further includes:

a display unit configured to display, by category and according to the selected user identifier, the text information corresponding to one or more users.

An embodiment of the present invention also provides a storage medium storing a computer program configured to perform the foregoing voice text conversion method.
In the technical solution of the embodiments of the present invention, the voice text conversion device has high-performance hardware, including N (N ≥ 2) sensibly laid-out high-SNR microphones forming a microphone array, a high-performance ADC, and a high-performance digital signal processor (DSP). The device can collect high-definition voice information. While collecting voice information, it distinguishes each user's spoken content by computing sound source characteristic parameters such as the user's angle and distance; when another person speaks at the same time, a different set of source parameters is computed to tell them apart, so the voice information of each user is separated according to its sound source characteristic parameters. When converting voice to text, a local voice engine converts each user's voice information into the corresponding text by category, with no cloud connection, thereby solving the problem of converting voice into text classified by user in scenarios where several people speak at the same time.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a voice text conversion method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a voice collection scenario according to an embodiment of the present invention;

FIG. 3 is a first schematic diagram of a classified text conversion interface according to an embodiment of the present invention;

FIG. 4 is a second schematic diagram of a classified text conversion interface according to an embodiment of the present invention;

FIG. 5 is a third schematic diagram of a classified text conversion interface according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a voice text conversion device according to an embodiment of the present invention.
DETAILED DESCRIPTION
To provide a more detailed understanding of the features and technical content of the embodiments of the present invention, their implementation is described in detail below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present invention.
FIG. 1 is a schematic flowchart of a voice-to-text conversion method according to an embodiment of the present invention. The method in this example is applied to a voice-to-text conversion device. As shown in FIG. 1, the method includes the following steps:
Step 101: Collect voice information of one or more users by using two or more microphones.
In the embodiments of the present invention, the voice-to-text conversion device may be an electronic device such as a mobile phone, a tablet computer, or a notebook computer.
In the embodiments of the present invention, the voice-to-text conversion device has high-performance hardware, including: N (N ≥ 2) reasonably laid-out, high signal-to-noise-ratio microphones forming a microphone array; a high-performance analog-to-digital converter (ADC); and a high-performance digital signal processor (DSP).
In the embodiments of the present invention, when one or more users input voice information to the voice-to-text conversion device at the same time, the two or more microphones in the device are all activated and collect the voice information of the one or more users. Thus, for each microphone, the collected voice information is a mixture of the voices of multiple users. The examples of the present invention aim to separate the voice information of the different users, so that each user's voice information can be converted to text individually.
Step 102: Analyze and process the voice information collected by each microphone to obtain the sound source characteristic parameters of each user.
In the embodiments of the present invention, before the voice information collected by the microphones is analyzed and processed, the background noise in the collected voice information is filtered out. Here, the filtering is performed in order to eliminate non-speech (non-human-voice) noise.
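The embodiment does not specify how the background noise is removed; one common choice is spectral subtraction, sketched below with NumPy. The function name, the use of a separate noise-only sample, and the subtraction strength are illustrative assumptions, not part of the disclosed method.

```python
import numpy as np

def filter_background_noise(signal, noise_sample, strength=1.0):
    """Suppress stationary background noise by spectral subtraction.

    A noise-only sample (e.g. a pause before anyone speaks) supplies the
    noise magnitude spectrum, which is subtracted from the magnitude
    spectrum of the noisy signal; the original phase is kept. This is an
    illustrative choice -- the embodiment only states that background
    noise is filtered out before analysis.
    """
    spectrum = np.fft.rfft(signal)
    noise_mag = np.abs(np.fft.rfft(noise_sample, n=len(signal)))
    cleaned_mag = np.maximum(np.abs(spectrum) - strength * noise_mag, 0.0)
    cleaned = cleaned_mag * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(cleaned, n=len(signal))
```

In practice the noise estimate would be refreshed whenever no user is speaking, so that slowly changing room noise keeps being tracked.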
In the embodiments of the present invention, the voice information collected by the microphones is analyzed to obtain the time differences at which the microphones receive the concurrent speech; the sound source characteristic parameters of each user are then calculated from these time differences.
Specifically, concurrent speech refers to the same speech arriving at different microphones. For example, user A says "hello", and the voice-to-text conversion device has two microphones. Because microphone 1 and microphone 2 are at different positions, microphone 1 and microphone 2 receive the "hello" speech at different moments, i.e., with a time difference. Here, the two "hello" signals at microphone 1 and microphone 2 are concurrent speech. Assuming that the position coordinates of user A are (x1, y1), and given the positions of microphone 1 and microphone 2 and the time difference obtained from analyzing the concurrent speech, the position of user A can be calculated, and the sound source characteristic parameters can then be determined. Here, the sound source characteristic parameters may be parameters such as the user's angle and distance relative to the microphones, which can be represented by the user's position coordinates. Similarly, user B says "pretty"; because microphone 1 and microphone 2 are at different positions, they receive the "pretty" speech at different moments, i.e., with a time difference. Here, the two "pretty" signals at microphone 1 and microphone 2 are concurrent speech. Assuming that the position coordinates of user B are (x2, y2), and given the positions of microphone 1 and microphone 2 and the time difference obtained from analyzing the concurrent speech, the position of user B can be calculated, and the sound source characteristic parameters can then be determined.
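The time-difference computation of step 102 can be sketched as follows, under a far-field assumption for a two-microphone pair: the delay between the channels fixes the direction of arrival. Recovering a full position such as (x1, y1), as in the example above, would need additional microphones or distance information beyond what this sketch shows. The cross-correlation estimator and the 343 m/s sound speed are standard choices, not values taken from the disclosure.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C (assumed)

def concurrent_speech_delay(mic1, mic2, sample_rate):
    """Time difference (seconds) between the two channels' copies of the
    same ('concurrent') speech, estimated by cross-correlation.
    Positive means microphone 1 heard the speech later."""
    corr = np.correlate(mic1, mic2, mode="full")
    lag = int(np.argmax(corr)) - (len(mic2) - 1)
    return lag / sample_rate

def arrival_angle_deg(delay, mic_spacing):
    """Far-field direction of the speaker relative to the broadside of
    the microphone pair, derived from the path difference c * delay."""
    sin_theta = np.clip(SPEED_OF_SOUND * delay / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

For the example above, user A's "hello" delay yields one angle and user B's "pretty" delay a different one; those per-user angles can serve as the sound source characteristic parameters used in step 103.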
Step 103: Classify the collected voice information according to the sound source characteristic parameters of each user to obtain the voice information corresponding to each user.
In the embodiments of the present invention, different users are at different geographic positions, so the sound source characteristic parameters of different users differ. Therefore, the mixed voice information of multiple users can be classified according to the sound source characteristic parameters, thereby obtaining the voice information corresponding to each user.
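A minimal sketch of step 103's classification, assuming each speech segment has already been tagged with an estimated direction-of-arrival angle as its sound source characteristic parameter. Grouping by a fixed angular tolerance is an illustrative simplification; the embodiment does not say how close two parameter values must be to count as the same user.

```python
def classify_by_source(segments, angle_tolerance=10.0):
    """Group speech segments by speaker.

    `segments` is a chronological list of (angle_deg, payload) pairs,
    where payload is the audio (or text) of one segment. A segment whose
    angle falls within `angle_tolerance` of an already-seen speaker is
    assigned to that speaker; otherwise a new speaker is created.
    Returns {speaker_index: [payload, ...]}.
    """
    speaker_angles = []   # representative angle of each discovered speaker
    per_speaker = {}
    for angle, payload in segments:
        for idx, ref in enumerate(speaker_angles):
            if abs(angle - ref) <= angle_tolerance:
                per_speaker[idx].append(payload)
                break
        else:  # no existing speaker matched: register a new one
            speaker_angles.append(angle)
            per_speaker[len(speaker_angles) - 1] = [payload]
    return per_speaker
```

In the three-person conference of FIG. 2, A, B, and C would each occupy a distinct angle, so their interleaved segments end up in three separate groups.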
Step 104: Convert the voice information corresponding to each user into corresponding text information.
In the embodiments of the present invention, the voice information corresponding to each user may be converted into corresponding text information by a local speech engine.
In the embodiments of the present invention, after the voice information corresponding to each user is converted into corresponding text information, the text information corresponding to each user is displayed by category.
Alternatively, the text information corresponding to one or more users is displayed by category according to the selected user identifier(s).
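The two display modes just described (all users, or only selected user identifiers) can be sketched together. The `(user_id, text)` utterance format and the rendering as `user: text` lines are illustrative assumptions, not part of the disclosed interface.

```python
def render_transcript(utterances, selected_users=None):
    """Render a classified transcript from chronological (user_id, text)
    utterances. With selected_users=None every user's text is shown;
    with a set of user identifiers, only those users' lines are kept --
    e.g. keeping only presenter A while suppressing B and C."""
    lines = [
        f"{user}: {text}"
        for user, text in utterances
        if selected_users is None or user in selected_users
    ]
    return "\n".join(lines)
```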
When converting voice to text, the technical solutions of the embodiments of the present invention use a local speech engine, without connecting to the cloud, to convert each user's voice information into corresponding text by category, thereby solving the problem of converting voice into text classified by user in a scenario where multiple people speak at the same time.
The voice-to-text conversion method of the embodiments of the present invention is further described below with reference to specific application scenarios.
Referring to FIG. 2, in a multi-person conference scenario with three or more participants, taking three persons A, B, and C as an example, the device (a mobile phone) includes microphone 1 and microphone 2. A and B may discuss alternately, or A, B, and C may speak alternately. Using the voice-to-text conversion device of the embodiments of the present invention, the collected voice information passes in turn through the information collection unit, the voice analysis unit, and the voice-to-text conversion unit. The device can separate the voice and text of A, B, and C, and the user can choose to generate the voice and text of A, B, or C, forming the classified text result shown in FIG. 3.
Referring to FIG. 2, in a conference lecture or keynote scenario where A is the presenter, when the transcription should keep A as the presenter and suppress the voices of B and C, the technical solutions of the embodiments of the present invention can retain only the voice of presenter A, converting only A's voice into text and discarding the voices of B and C, forming the classified text result shown in FIG. 4.
Referring to FIG. 2, in a question-and-answer session of a conference, presenter A may need to interact with other members while speaking. In this case, the interaction between presenter A and questioner B can be collected and converted to text in chronological order, forming the classified text result shown in FIG. 5.
FIG. 6 is a schematic diagram of the structure of a voice-to-text conversion device according to an embodiment of the present invention. As shown in FIG. 6, the device includes:
an information collection unit 61, configured to collect voice information of one or more users by using two or more microphones;
a voice analysis unit 62, configured to analyze and process the voice information collected by the microphones to obtain the sound source characteristic parameters of each user, and to classify the collected voice information according to the sound source characteristic parameters of each user to obtain the voice information corresponding to each user; and
a voice-to-text conversion unit 63, configured to convert the voice information corresponding to each user into corresponding text information.
In an embodiment of the present invention, the device further includes:
a noise filtering unit 64, configured to filter out background noise in the voice information collected by the microphones.
In an embodiment of the present invention, the voice analysis unit 62 includes:
an analysis subunit 621, configured to analyze the voice information collected by the microphones to obtain the time differences at which the microphones receive concurrent speech;
a calculation subunit 622, configured to calculate the sound source characteristic parameters of each user according to the time differences at which the microphones receive the concurrent speech; and
a classification subunit 623, configured to classify the collected voice information according to the sound source characteristic parameters of each user to obtain the voice information corresponding to each user.
In an embodiment of the present invention, the device further includes:
a display unit 65, configured to display the text information corresponding to each user by category.
The display unit 65 is further configured to display, by category and according to the selected user identifier, the text information corresponding to one or more users.
Those skilled in the art should understand that the functions implemented by the units and subunits of the voice-to-text conversion device shown in FIG. 6 can be understood with reference to the foregoing description of the voice-to-text conversion method.
An embodiment of the present invention further describes a storage medium storing a computer program, the computer program being configured to execute the voice-to-text conversion method of the foregoing embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function; in actual implementation there may be other divisions, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be performed by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
INDUSTRIAL APPLICABILITY
The present invention separates the voice information of each user according to their different sound source characteristic parameters. When converting voice to text, a local speech engine converts each user's voice information into corresponding text by category, without connecting to the cloud, thereby solving the problem of converting voice into text classified by user in a scenario where multiple people speak at the same time.

Claims (11)

  1. A voice-to-text conversion method, the method comprising:
    collecting voice information of one or more users by using two or more microphones;
    analyzing and processing the voice information collected by the microphones to obtain sound source characteristic parameters of each user;
    classifying the collected voice information according to the sound source characteristic parameters of each user to obtain voice information corresponding to each user; and
    converting the voice information corresponding to each user into corresponding text information.
  2. The voice-to-text conversion method according to claim 1, wherein before the analyzing and processing of the voice information collected by the microphones, the method further comprises:
    filtering out background noise in the voice information collected by the microphones.
  3. The voice-to-text conversion method according to claim 1, wherein the analyzing and processing of the voice information collected by the microphones to obtain the sound source characteristic parameters of each user comprises:
    analyzing the voice information collected by the microphones to obtain time differences at which the microphones receive concurrent speech; and
    calculating the sound source characteristic parameters of each user according to the time differences at which the microphones receive the concurrent speech.
  4. The voice-to-text conversion method according to any one of claims 1 to 3, wherein after the converting of the voice information corresponding to each user into corresponding text information, the method further comprises:
    displaying the text information corresponding to each user by category.
  5. The voice-to-text conversion method according to any one of claims 1 to 3, wherein after the converting of the voice information corresponding to each user into corresponding text information, the method further comprises:
    displaying, by category and according to a selected user identifier, text information corresponding to one or more users.
  6. A voice-to-text conversion device, the device comprising:
    an information collection unit, configured to collect voice information of one or more users by using two or more microphones;
    a voice analysis unit, configured to analyze and process the voice information collected by the microphones to obtain sound source characteristic parameters of each user, and to classify the collected voice information according to the sound source characteristic parameters of each user to obtain voice information corresponding to each user; and
    a voice-to-text conversion unit, configured to convert the voice information corresponding to each user into corresponding text information.
  7. The voice-to-text conversion device according to claim 6, wherein the device further comprises:
    a noise filtering unit, configured to filter out background noise in the voice information collected by the microphones.
  8. The voice-to-text conversion device according to claim 6, wherein the voice analysis unit comprises:
    an analysis subunit, configured to analyze the voice information collected by the microphones to obtain time differences at which the microphones receive concurrent speech; and
    a calculation subunit, configured to calculate the sound source characteristic parameters of each user according to the time differences at which the microphones receive the concurrent speech.
  9. The voice-to-text conversion device according to any one of claims 6 to 8, wherein the device further comprises:
    a display unit, configured to display the text information corresponding to each user by category.
  10. The voice-to-text conversion device according to any one of claims 6 to 8, wherein the device further comprises:
    a display unit, configured to display, by category and according to a selected user identifier, text information corresponding to one or more users.
  11. A storage medium storing a computer program, wherein the computer program is configured to execute the voice-to-text conversion method according to any one of claims 1 to 5.
PCT/CN2015/081688 2015-05-22 2015-06-17 Voice-to-text conversion method and device, and storage medium WO2016187910A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510266912.8A CN106297794A (en) 2015-05-22 2015-05-22 The conversion method of a kind of language and characters and equipment
CN201510266912.8 2015-05-22

Publications (1)

Publication Number Publication Date
WO2016187910A1 true WO2016187910A1 (en) 2016-12-01

Family

ID=57392481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/081688 WO2016187910A1 (en) 2015-05-22 2015-06-17 Voice-to-text conversion method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN106297794A (en)
WO (1) WO2016187910A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106653042A (en) * 2016-12-13 2017-05-10 安徽声讯信息技术有限公司 Smart phone having voice stenography transliteration function

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527623B (en) * 2017-08-07 2021-02-09 广州视源电子科技股份有限公司 Screen transmission method and device, electronic equipment and computer readable storage medium
CN107910006A (en) * 2017-12-06 2018-04-13 广州宝镜智能科技有限公司 Audio recognition method, device and multiple source speech differentiation identifying system
CN108053828A (en) * 2017-12-25 2018-05-18 无锡小天鹅股份有限公司 Determine the method, apparatus and household electrical appliance of control instruction
CN108847225B (en) * 2018-06-04 2021-01-12 上海智蕙林医疗科技有限公司 Robot for multi-person voice service in airport and method thereof
CN110875056B (en) * 2018-08-30 2024-04-02 阿里巴巴集团控股有限公司 Speech transcription device, system, method and electronic device
CN110648665A (en) * 2019-09-09 2020-01-03 北京左医科技有限公司 Session process recording system and method
CN110941737B (en) * 2019-12-06 2023-01-20 广州国音智能科技有限公司 Single-machine voice storage method, device and equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010037195A1 (en) * 2000-04-26 2001-11-01 Alejandro Acero Sound source separation using convolutional mixing and a priori sound source knowledge
CN1815556A (en) * 2005-02-01 2006-08-09 松下电器产业株式会社 Method and system capable of operating and controlling vehicle using voice instruction
US20090150146A1 (en) * 2007-12-11 2009-06-11 Electronics & Telecommunications Research Institute Microphone array based speech recognition system and target speech extracting method of the system
US20100070274A1 (en) * 2008-09-12 2010-03-18 Electronics And Telecommunications Research Institute Apparatus and method for speech recognition based on sound source separation and sound source identification
CN101882370A (en) * 2010-06-30 2010-11-10 中山大学 A voice recognition remote control
CN102074230A (en) * 2009-11-20 2011-05-25 索尼公司 Speech recognition device, speech recognition method, and program
CN104464750A (en) * 2014-10-24 2015-03-25 东南大学 Voice separation method based on binaural sound source localization

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009104332A1 (en) * 2008-02-19 2009-08-27 日本電気株式会社 Speech segmentation system, speech segmentation method, and speech segmentation program
JP5534413B2 (en) * 2010-02-12 2014-07-02 Necカシオモバイルコミュニケーションズ株式会社 Information processing apparatus and program
CN102592596A (en) * 2011-01-12 2012-07-18 鸿富锦精密工业(深圳)有限公司 Voice and character converting device and method
CN102509548B (en) * 2011-10-09 2013-06-12 清华大学 Audio indexing method based on multi-distance sound sensor
JP5791081B2 (en) * 2012-07-19 2015-10-07 日本電信電話株式会社 Sound source separation localization apparatus, method, and program
TWI502583B (en) * 2013-04-11 2015-10-01 Wistron Corp Apparatus and method for voice processing



Also Published As

Publication number Publication date
CN106297794A (en) 2017-01-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15892991; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15892991; Country of ref document: EP; Kind code of ref document: A1)