CN213694055U

CN213694055U - Voice acquisition equipment

Info

Publication number: CN213694055U
Application number: CN202023183752.6U
Authority: CN
Inventors: 邹凯文
Original assignee: Shanghai Shencong Semiconductor Co ltd
Current assignee: Shencong Semiconductor Jiangsu Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-07-13
Anticipated expiration: 2030-12-25

Abstract

The utility model discloses a voice acquisition device that addresses the heavy workload and low efficiency of collecting and labeling audio manually. An input device records the personal information of the speaker in advance; a display shows the entries to be recorded and the display mode of the entries; an audio collector collects the speech that the speaker utters in response to the entry shown on the display, at a set sampling frequency, a set sampling bit depth and a set number of channels; an audio processor recognizes the speech collected by the audio collector and compares the recognized speech with the entry shown on the display; and a memory automatically stores the audio files transmitted by the audio processor and names them in the form entry-personal information. The device improves the efficiency of voice collection and labeling, reduces manual work, and saves time and cost.

Description

Voice acquisition equipment

Technical Field

The utility model belongs to the technical field of voice acquisition, and in particular relates to a voice acquisition device.

Background

Sound is a wave generated by the vibration of an object. When the object vibrates, the surrounding air is continuously compressed and relaxed, and the disturbance spreads outward as a sound wave. The frequency range that a person can hear is 20 Hz to 20 kHz. The three elements of audible sound are loudness, pitch and timbre. Loudness is the intensity of the sound and depends on the amplitude of the wave; pitch is related to the frequency of the sound, with a high frequency giving a high pitch and a low frequency giving a low pitch; timbre is determined by the overtones mixed with the fundamental tone. Each fundamental tone has its own natural frequency and overtones of different strengths, so each sound has a particular timbre.

Audio technologies include audio acquisition (analog-to-digital conversion into digital signals that a computer can process), speech encoding and decoding, text-to-speech conversion, music synthesis, speech recognition and understanding, audio data transmission, audio and video synchronization, audio effects and editing, and the like. Two methods are commonly used to implement computer speech output: recording/playback and text-to-speech conversion.

There are three common methods for collecting audio data: directly acquiring existing audio, capturing and clipping sound with audio processing software, and recording sound with a microphone.

For microphone recording, the common approach at present is for a person to read entries from a sheet of paper, pronouncing one entry and then saving and naming one audio file at a time, which is extremely inefficient. Alternatively, all the entries are read aloud in one pass, and the audio is then cut and labeled manually. Both approaches require a large amount of labor and time, are inefficient, and cannot meet users' needs.

Summary of the Utility Model

The utility model aims to provide a voice acquisition device that solves the problem of the heavy workload and low efficiency of collecting and labeling audio manually.

To solve the above problem, the technical solution of the utility model is as follows:

A voice acquisition device, comprising:

an input device, used for recording the personal information of a speaker in advance, the personal information comprising gender, age and region;

a display, used for displaying the entries to be recorded and the display mode of the entries;

an audio collector, used for collecting the speech that the speaker utters in response to the entry shown on the display, at a set sampling frequency, a set sampling bit depth and a set number of channels;

an audio processor, used for recognizing the speech collected by the audio collector and comparing the recognized speech with the entry shown on the display; and

a memory, used for automatically storing the audio files transmitted by the audio processor and naming them in the form entry-personal information.

According to an embodiment of the utility model, the input device is a touch screen with an input method or personal information options.

According to an embodiment of the utility model, the input device is a keyboard.

According to an embodiment of the utility model, the display is provided with a data input interface, which is used for importing the entries to be recorded.

According to an embodiment of the utility model, the display is provided with an entry list selection key and a display mode selection key.

According to an embodiment of the utility model, the audio processor includes a pause detection part and an entry comparison part;

the pause detection part is used for detecting whether the speech collected by the audio collector has paused for a preset duration, and if so, stopping speech collection and performing speech recognition;

the entry comparison part is used for comparing the speech recognized by the pause detection part with the entry shown on the display and judging whether the two are consistent; if so, the audio is labeled and transmitted to the memory; if not, the speech is discarded.

By adopting the above technical solution, the utility model has the following advantages and positive effects compared with the prior art:

1) The voice acquisition device in an embodiment of the utility model addresses the heavy workload and low efficiency of collecting and labeling audio manually. The input device records the personal information of the speaker in advance; the display shows the entries to be recorded and the display mode of the entries; the audio collector collects the speech that the speaker utters in response to the entry shown on the display, at the set sampling frequency, sampling bit depth and number of channels; the audio processor recognizes the speech collected by the audio collector and compares the recognized speech with the entry shown on the display; and the memory automatically stores the audio files transmitted by the audio processor and names them in the form entry-personal information. This improves the efficiency of voice collection and labeling, reduces manual work, and saves time and cost.

2) In the voice acquisition device of an embodiment of the utility model, the memory names audio files automatically according to the personal information of the speaker, so the files do not need to be named manually one by one. This greatly reduces labor cost and also makes later screening of the audio files easier.

Drawings

Fig. 1 is a schematic diagram of a voice acquisition device in an embodiment of the utility model.

Description of reference numerals:

1: an input device; 2: a display; 3: an audio collector; 4: an audio processor; 5: a memory.

Detailed Description

The voice acquisition device of the utility model is described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the utility model will become clearer from the following description and the appended claims.

This embodiment provides a voice acquisition device. Referring to Fig. 1, the voice acquisition device includes:

The input device 1, used for recording the personal information of the speaker in advance; the personal information includes name, gender, age and region. The personal information is later used to name the audio files, which makes subsequent screening or retrieval of the audio files easier. In practical applications, the input device 1 may be a touch screen with an input method or personal information options, or it may be a keyboard.

The display 2, used for displaying the entries to be recorded and the display mode of the entries. The display 2 prompts the user with the entry that needs to be recorded. The display 2 is provided with a data input interface for importing the entries to be recorded, and is also provided with an entry list selection key and a display mode selection key. Entries are imported through the data input interface; the user selects the entries to be recorded with the entry list selection key and uses the display mode selection key to choose between sequential display and random display. In addition, function keys for adding, deleting, searching or modifying entries may be provided on the display 2 as actually needed.
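The two display modes can be illustrated with a short Python sketch. The function name `order_entries`, the mode strings and the optional `seed` parameter are assumptions made for illustration; the source only states that entries are shown either in sequence or at random.

```python
import random
from typing import List, Optional


def order_entries(entries: List[str], mode: str, seed: Optional[int] = None) -> List[str]:
    """Return the entries in the order in which they would be prompted.

    mode is "sequential" or "random" (hypothetical labels for the two
    display modes named in the text). seed is only there to make a random
    order reproducible in tests.
    """
    if mode == "sequential":
        return list(entries)
    if mode == "random":
        shuffled = list(entries)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown display mode: {mode!r}")
```

Both modes return a new list, so the imported entry list itself is left unchanged and can be reused for another speaker.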

The audio collector 3, used for collecting the speech that the speaker utters in response to the entry shown on the display 2, at the set sampling frequency, sampling bit depth and number of channels. The sampling frequency of the audio collector 3 may be set to 16 kHz, the sampling bit depth to 16 bits (high-fidelity sound quality), and the number of channels to one (mono). The sampling frequency, sampling bit depth and number of channels may of course be set to other values as needed.
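As an illustration of these three settings, the following Python sketch wraps raw PCM samples in a WAV container using the standard-library `wave` module. The WAV format and the function name `pcm_to_wav_bytes` are assumptions; the source does not say which file format the device uses.

```python
import io
import wave

# Example values from the text: 16 kHz, 16-bit samples, one channel.
SAMPLE_RATE_HZ = 16000
SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
NUM_CHANNELS = 1        # mono


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = SAMPLE_RATE_HZ,
    sample_width: int = SAMPLE_WIDTH_BYTES,
    channels: int = NUM_CHANNELS,
) -> bytes:
    """Wrap raw PCM data in a WAV header that records the three settings."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

With these example values, one second of audio occupies 16000 × 2 × 1 = 32000 bytes of sample data.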

The audio processor 4, used for recognizing the speech collected by the audio collector 3 and comparing the recognized speech with the entry shown on the display 2. The audio processor 4 comprises a pause detection part and an entry comparison part. The pause detection part detects whether the speech collected by the audio collector 3 has paused for a preset duration and, if so, stops speech collection and performs speech recognition. The pause detection part judges whether the speech is silent (paused) from the energy of each frame of speech data: a frame whose energy is relatively small is judged to be silent. If the silence lasts for the preset duration (for example, 2 s), speech collection stops and recognition of the collected speech begins.
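A minimal Python sketch of energy-based pause detection follows, assuming 16-bit mono PCM frames of 20 ms each. The frame length, the energy threshold and the function names are illustrative assumptions; the source only says that a frame with relatively small energy counts as silent and gives 2 s as an example duration.

```python
import array
from typing import Iterable


def frame_energy(frame: bytes) -> float:
    """Mean squared amplitude of one frame of 16-bit mono PCM samples."""
    samples = array.array("h", frame)  # native-endian signed 16-bit values
    if not samples:
        return 0.0
    return sum(sample * sample for sample in samples) / len(samples)


def pause_detected(
    frames: Iterable[bytes],
    frame_ms: int = 20,               # assumed frame length
    energy_threshold: float = 1.0e4,  # assumed threshold, tune per microphone
    pause_ms: int = 2000,             # the 2 s example from the text
) -> bool:
    """Return True once consecutive low-energy frames span pause_ms."""
    silent_ms = 0
    for frame in frames:
        if frame_energy(frame) < energy_threshold:
            silent_ms += frame_ms
            if silent_ms >= pause_ms:
                return True
        else:
            silent_ms = 0  # speech resumed, so restart the silence timer
    return False
```

Resetting the counter on every loud frame means that only uninterrupted silence counts toward the preset duration, so short gaps between words do not end the recording.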

The entry comparison part compares the speech recognized by the pause detection part with the entry shown on the display 2 and judges whether the two are consistent. If they are, the audio is labeled and transmitted to the memory 5, and the display 2 shows the next entry in order; if not, the speech is discarded and the display 2 shows the same entry again.
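The comparison step can be sketched in Python as follows. The normalization (Unicode folding, case folding, dropping spaces and punctuation) and the action labels are assumptions; the source does not specify how consistency between the recognized speech and the entry is judged.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold case and width, then keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def matches_entry(recognized: str, entry: str) -> bool:
    """True if the recognized text is consistent with the prompted entry."""
    return normalize(recognized) == normalize(entry)


def next_action(recognized: str, entry: str) -> str:
    """Map the comparison result to the two outcomes described in the text."""
    if matches_entry(recognized, entry):
        return "label_store_and_show_next"  # hypothetical action label
    return "discard_and_repeat"             # hypothetical action label
```

Comparing normalized text rather than raw strings keeps a correct reading from being discarded only because the recognizer added punctuation or different capitalization.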

The memory 5, used for automatically storing the audio files transmitted by the audio processor 4 and naming them in the form entry-personal information. For example, if the speaker is male, 18 years old and from Hangzhou, Zhejiang, and the entry being recorded is "turn on the lighting" (dakaizhaoming), the audio file is named dakaizhaoming-Y18-zhe A-X, where in the gender position X denotes female and Y denotes male, the number is the age, "zhe A" denotes Hangzhou, Zhejiang, and the trailing X is the number of times the entry has been recorded.
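The naming rule can be written as a short Python sketch. The function name, the parameter names and the gender strings "female" and "male" are assumptions for illustration; the code letters, the field order and the hyphen separators follow the example in the text. No file extension is added because the source does not give one.

```python
def audio_file_name(entry_id: str, gender: str, age: int, region_code: str, take: int) -> str:
    """Build a name of the form entry-personal information.

    entry_id    romanized entry text, e.g. "dakaizhaoming"
    gender      "female" or "male" (mapped to X / Y as in the text)
    age         speaker age in years
    region_code region code as written in the text, e.g. "zhe A"
    take        how many times this entry has been recorded
    """
    gender_codes = {"female": "X", "male": "Y"}
    if gender not in gender_codes:
        raise ValueError(f"unsupported gender value: {gender!r}")
    return f"{entry_id}-{gender_codes[gender]}{age}-{region_code}-{take}"


# Example from the text, first recording of the entry:
# audio_file_name("dakaizhaoming", "male", 18, "zhe A", 1)
# returns "dakaizhaoming-Y18-zhe A-1"
```

Because every field sits at a fixed position between hyphens, files can later be screened by gender, age or region from the name alone.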

The operation of the speech acquisition device is briefly described as follows:

First, the speaker's personal information, such as gender, age and region, is entered. The speaker then reads aloud the entry shown on the display, and the audio collector and the audio processor collect and evaluate the speech, mainly determining whether the speaker has paused and whether the spoken entry matches the entry prompted on the display. When the audio processor determines that the speaker has paused, it stops recording and runs speech recognition on the recording, then checks whether the recorded content is consistent with the prompted entry. If it is, the recording is saved and named according to the naming rule using the speaker's recorded gender, age and region, and the display shows the next entry once saving is complete. If the recorded content is not consistent with the prompted entry, the audio is discarded and the display shows the same entry again. The audio processor also checks whether the entry on the display is the last one; if so, recording ends.

On the one hand, the voice acquisition device in this embodiment achieves efficient voice collection, so that collection efficiency is greatly improved; on the other hand, the collected audio files are named automatically according to the personal information of the speaker and do not need to be named manually one by one, so that labor cost is greatly reduced.
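The operating sequence above can be sketched as a Python loop. The callables `capture`, `recognize` and `store` are hypothetical stand-ins for the audio collector, the speech recognition step and the memory. The retry cap is an addition made only so that the sketch always terminates; the source simply repeats the entry until it is read correctly.

```python
from typing import Callable, List


def run_session(
    entries: List[str],
    capture: Callable[[str], bytes],      # records until a pause is detected
    recognize: Callable[[bytes], str],    # returns the recognized text
    store: Callable[[str, bytes], None],  # saves and names the audio file
    max_attempts_per_entry: int = 5,      # assumption, not in the source
) -> int:
    """Prompt each entry in turn and return the number of discarded takes."""
    discarded = 0
    for entry in entries:
        for _attempt in range(max_attempts_per_entry):
            audio = capture(entry)
            if recognize(audio) == entry:
                store(entry, audio)  # consistent: keep it, move to next entry
                break
            discarded += 1           # inconsistent: discard, repeat the entry
        else:
            raise RuntimeError(f"entry {entry!r} was not read correctly")
    return discarded  # the loop ends after the last entry
```

Passing the three steps in as callables keeps the control flow separate from any particular microphone, recognizer or storage back end.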

The embodiments of the utility model have been described in detail with reference to the accompanying drawings, but the utility model is not limited to the above embodiments. Even if various changes are made to the utility model, the changes remain within its scope of protection as long as they fall within the scope of the claims and their equivalents.

Claims

1. A speech acquisition device, comprising:

2. The speech acquisition device of claim 1, wherein the input device is a touch screen with an input method or personal information options.

3. The speech acquisition device of claim 1, wherein the input device is a keyboard.

4. The speech acquisition device of claim 1, wherein the display is provided with a data input interface for importing the entries to be recorded.

5. The speech acquisition device of claim 4, wherein the display is provided with an entry list selection key and a display mode selection key.

6. The speech acquisition device of claim 1 wherein the audio processor comprises a pause detection element and an entry comparison element;