CN207302623U

CN207302623U - A remote speech processing system

Info

Publication number: CN207302623U
Application number: CN201720914569.8U
Authority: CN
Inventors: 王玮; 谈冰; 崔芳; 朱胜强; 苏文畅; 王兆育; 殷丹丹
Original assignee: Anhui Hear Technology Co Ltd
Current assignee: Anhui Hear Technology Co Ltd
Priority date: 2017-07-26
Filing date: 2017-07-26
Publication date: 2018-05-01
Anticipated expiration: 2027-07-26

Abstract

The utility model discloses a remote speech processing system. The system includes: an electronic terminal, configured to send, over a network, multimedia data containing speech together with a speech-processing instruction corresponding to that multimedia data; and a remote server, connected to the electronic terminal over the network, configured to receive the multimedia data and the speech-processing instruction, generate, according to the instruction, a text-processing result corresponding to the speech in the multimedia data, and return the text-processing result to the electronic terminal. Through the interaction between the electronic terminal and the remote server, the embodiments of the utility model reduce manual transcription, improve the efficiency of organizing multimedia data content, and display the organized content in real time.

Description

A remote speech processing system

Technical field

The embodiments of the utility model relate to speech equipment technology, and in particular to a remote speech processing system.

Background art

In industries such as media, education, and stenography, workers currently need to transcribe real-time or recorded (non-real-time) multimedia data into written form, and large volumes of multimedia content generally take a long time to organize.

Existing products on the market cannot conveniently and promptly convert multimedia data into manuscript form. When users organize a manuscript from a recording, they often have to play the recording back manually and write the manuscript; once it is complete, they must distinguish the roles (speakers) in the text and then proofread it again, which consumes a great deal of labor. Existing subtitle products also require timecodes to be entered and adjusted manually: users need more than three times the duration of the multimedia data to produce its subtitles, and producing bilingual subtitles takes still more time for translation.

Multimedia data is usually organized by hand, which not only consumes a large amount of labor but is also inefficient, and the value of the preliminary work cannot be reflected.

Summary of the utility model

The utility model provides a remote speech processing system that reduces manual transcription, improves the efficiency of organizing multimedia data content, and displays the organized content in real time.

An embodiment of the utility model provides a remote speech processing system, which includes:

an electronic terminal, configured to send, over a network, multimedia data containing speech and a speech-processing instruction corresponding to the multimedia data; and

a remote server, connected to the electronic terminal over the network, configured to receive the multimedia data and the speech-processing instruction, generate, according to the speech-processing instruction, a text-processing result corresponding to the speech in the multimedia data, and return the text-processing result to the electronic terminal.

Through the interaction between the electronic terminal and the remote server, the embodiment of the utility model reduces manual transcription, improves the efficiency of organizing multimedia data content, and displays the organized content in real time.

Brief description of the drawings

Fig. 1 is a structural diagram of the remote speech processing system provided in Embodiment 1 of the utility model;

Fig. 2 is a structural diagram of the electronic terminal in the remote speech processing system provided in Embodiment 1 of the utility model.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the utility model clearer, specific embodiments of the utility model are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein merely explain the utility model and do not limit it.

Embodiment 1

Fig. 1 is a structural diagram of the remote speech processing system provided in Embodiment 1 of the utility model. This embodiment is applicable where multimedia data content needs to be organized efficiently and displayed in real time.

As shown in Fig. 1, the system comprises an electronic terminal 110 and a remote server 120, wherein:

The electronic terminal 110 is configured to send, over a network, multimedia data containing speech and a speech-processing instruction corresponding to the multimedia data.

The electronic terminal may be, but is not limited to, a mobile terminal (for example, a tablet computer or a smartphone) or a wearable device (for example, a smartwatch or a fitness band).

The network used by the system may be a public network, a local area network, or another form of private network.

The multimedia data may be audio data and/or video data (here meaning video data that contains speech).

The remote server 120 is connected to the electronic terminal 110 over the network and is configured to receive the multimedia data and the speech-processing instruction, generate, according to the speech-processing instruction, a text-processing result corresponding to the speech in the multimedia data, and return the text-processing result to the electronic terminal 110.

Working principle of the remote speech processing system:

Using the electronic terminal 110, a user sends multimedia data containing speech, together with the corresponding speech-processing instruction, over the network to the remote server 120. The remote server 120 receives the multimedia data and the speech-processing instruction, generates, according to the instruction, a text-processing result corresponding to the speech in the multimedia data, and returns the text-processing result to the electronic terminal 110.
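The exchange above can be pictured as one message that carries both the media and its instruction. This is a minimal sketch assuming a JSON envelope with base64-encoded media; the field names `media` and `instruction` and all three helper functions are hypothetical, since the patent specifies neither a wire format nor a transport protocol.

```python
import base64
import json


def encode_request(media: bytes, instruction: dict) -> str:
    """Terminal side: pack media and its speech-processing instruction (hypothetical format)."""
    return json.dumps({
        "media": base64.b64encode(media).decode("ascii"),  # binary audio/video as text
        "instruction": instruction,
    })


def decode_request(message: str) -> tuple[bytes, dict]:
    """Server side: unpack the message sent by the terminal."""
    payload = json.loads(message)
    return base64.b64decode(payload["media"]), payload["instruction"]


def encode_result(result: dict) -> str:
    """Server side: serialize the text-processing result returned to the terminal."""
    return json.dumps(result, ensure_ascii=False)  # keep non-ASCII transcripts readable
```

For example, `decode_request(encode_request(b"\x00\x01", {"task": "transcribe"}))` yields the original bytes and instruction dictionary, so the server sees exactly what the terminal sent.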

Building on the above technical solution, the electronic terminal 110 is further configured to receive the text-processing result returned by the remote server 120. The electronic terminal 110 may also include a display screen for displaying the text-processing result.

Besides acting as the sender of the multimedia data and the recipient of the text-processing result, the electronic terminal 110 can provide richer functions. As shown in Fig. 2, the electronic terminal 110 is further provided with a microphone 111 for capturing audio data as the multimedia data. The electronic terminal 110 is then both the producer and the sender of the audio data, which allows the audio data to be generated locally, transmitted quickly, and processed quickly, and increases the speed at which the speech in the audio data is processed.

Similarly, the electronic terminal 110 may further be provided with a camera 112, which works together with the microphone 111 to capture video data containing speech as the multimedia data. The electronic terminal 110 is then both the producer and the sender of the video data, which allows the video data to be generated locally, transmitted quickly, and processed quickly, and increases the speed at which the speech in the video data is processed.

The remote server 120 may contain several built-in speech-processing modules, each implementing a corresponding speech-processing instruction. For example, a speech recognition module is provided in the remote server 120 to recognize the speech in the multimedia data as text and generate a first text-processing result.
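One way for the server to hold several built-in modules, each answering one kind of speech-processing instruction, is a registry keyed by instruction name. The sketch below is an assumption rather than the patent's design: the instruction name `"recognize"`, the `register`/`handle` helpers, and the placeholder recognizer (which simply decodes bytes as text) are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: instruction name -> module that implements it.
MODULES: Dict[str, Callable[[bytes], str]] = {}


def register(instruction: str) -> Callable:
    """Decorator that adds a speech-processing module to the server."""
    def wrap(fn: Callable[[bytes], str]) -> Callable[[bytes], str]:
        MODULES[instruction] = fn
        return fn
    return wrap


@register("recognize")
def speech_recognition_module(media: bytes) -> str:
    # Placeholder for a real recognizer: treats the bytes as already-decoded text.
    return media.decode("utf-8")


def handle(instruction: str, media: bytes) -> str:
    """Route a request to the module named by its instruction."""
    try:
        module = MODULES[instruction]
    except KeyError:
        raise ValueError(f"no module for instruction {instruction!r}") from None
    return module(media)
```

Adding a new capability then means registering one more function, without changing `handle`.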

The speech recognition module may directly convert the speech in the multimedia data into the first text-processing result. It may also automatically distinguish the different roles (speakers) in the speech before producing text, so that the text is separated by role according to voiceprint. Speech in multimedia data has different role characteristics in different scenarios. For example, multimedia data produced during an interview mainly contains the reporter and several different interviewees; multimedia data produced during teaching mainly contains the teacher and the students, covering both the lecture and the interactive sessions; and multimedia data produced during conversation stenography mainly contains the person conducting the conversation and the person being spoken with. When the speech recognition module recognizes the speech in the multimedia data, it can determine the voiceprint features of the speech at each moment and attribute speech with the same voiceprint features to the same role. During recognition, the same role label can be attached to all recognition results belonging to the same role. In this way the speech recognition module can convert all the speech in, for example, multimedia data produced by a reporter and interviewees into a text-processing result word by word and sentence by sentence, in chronological order.
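The role-separation step above amounts to grouping recognized segments by voiceprint similarity and giving each group one label. A minimal sketch, assuming each segment already carries a non-zero speaker-embedding vector from some upstream voiceprint model; the greedy cosine-similarity matching and the 0.8 threshold are illustrative choices, not taken from the patent, and the `Segment` type and `label_roles` function are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float             # seconds on the media timeline
    end: float
    voiceprint: list[float]  # speaker embedding, assumed non-zero
    text: str                # recognized text for this segment


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_roles(segments: list[Segment], threshold: float = 0.8) -> list[tuple[str, Segment]]:
    """Give segments with matching voiceprints the same role label, in time order."""
    roles: list[list[float]] = []  # first voiceprint seen for each role
    labeled = []
    for seg in sorted(segments, key=lambda s: s.start):
        scores = [cosine(seg.voiceprint, ref) for ref in roles]
        if scores and max(scores) >= threshold:
            idx = scores.index(max(scores))  # closest existing role
        else:
            roles.append(seg.voiceprint)     # unseen voice: open a new role
            idx = len(roles) - 1
        labeled.append((f"Role {idx + 1}", seg))
    return labeled
```

A production system would more likely cluster embeddings jointly or update role centroids over time; the greedy version only shows how "same voiceprint, same role label" can be applied in chronological order.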

On the basis of the first text-processing result obtained by the speech recognition module, the remote server 120 may further include a built-in translation module for translating the text in the first text-processing result into the language specified in the speech-processing instruction, to generate a second text-processing result.

The translation module translates the text in the first text-processing result into the language specified in the speech-processing instruction in real time and without adjusting the timecodes, generating the second text-processing result and improving text-processing efficiency.
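Because translation replaces only the text of each sentence, the timecodes of the first result can be carried over unchanged into the second. A minimal sketch, assuming sentence-level captions with start and end times; the toy phrase table `TOY_ZH_EN` is a hypothetical stand-in for a real machine-translation engine, and `Caption` and the helpers are likewise illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Caption:
    start: float  # seconds on the media timeline
    end: float
    text: str


# Toy phrase table standing in for a real machine-translation engine (assumption).
TOY_ZH_EN = {"你好": "Hello", "谢谢": "Thank you"}


def translate_text(text: str, table: dict[str, str]) -> str:
    return table.get(text, text)  # fall back to the source text if no entry exists


def translate_captions(captions: list[Caption], table: dict[str, str]) -> list[Caption]:
    """Build the second result: identical timecodes, translated text."""
    return [replace(c, text=translate_text(c.text, table)) for c in captions]
```

Since `start` and `end` are copied as-is, the translated track stays aligned with the media without any manual timecode adjustment.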

The presentation form of each text-processing result is determined by the speech-processing instruction; for example, the first text-processing result may be a text file or a subtitle file. Likewise, the second text-processing result may be a text file or a subtitle file.

A text file records the text obtained by speech recognition, or the text derived by translating that recognized text; users generally open and read it on its own.

A subtitle file records the same text content as a text file, but the content is organized sentence by sentence, with each sentence mapped to the timeline of the multimedia data, so that it can be displayed in sync with playback. If a piece of video data has both a first text-processing result (subtitle file) and a second text-processing result (subtitle file), the electronic terminal 110 can display subtitles in several languages simultaneously according to the user's selection.
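A subtitle file can be illustrated with the widely used SubRip (SRT) format, in which each sentence is numbered and bound to a start and end timecode; the patent does not name a subtitle format, so SRT is an assumption here. The sketch also shows one possible way a player could pick, at playback time `t`, the caption from each language track the user selected, which is how multilingual subtitles might be shown together. The `Caption` type and all helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    start: float  # seconds
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: list[Caption]) -> str:
    """Serialize sentence-level captions as SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(captions, start=1)
    ]
    return "\n".join(blocks)


def active_lines(tracks: dict[str, list[Caption]], t: float, selected: list[str]) -> dict[str, str]:
    """Return the caption visible at playback time t for each selected language."""
    shown = {}
    for lang in selected:
        for c in tracks.get(lang, []):
            if c.start <= t < c.end:
                shown[lang] = c.text
                break
    return shown
```

For example, `srt_timestamp(3661.5)` gives `"01:01:01,500"`, and with a Chinese track and its translated English track both selected, `active_lines` returns one line per language for the current moment.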

In the technical solution of this embodiment, the interaction between the electronic terminal and the remote server reduces manual transcription, improves the efficiency of organizing multimedia data content, and displays the organized content in real time.

Note that the above describes only the preferred embodiments of the utility model and the technical principles applied. Those skilled in the art will understand that the utility model is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the utility model. Therefore, although the utility model has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its concept; the scope of the utility model is determined by the appended claims.

Claims

1. A remote speech processing system, characterized in that it comprises:

an electronic terminal, configured to send, over a network, multimedia data containing speech and a speech-processing instruction corresponding to the multimedia data; and

a remote server, connected to the electronic terminal over the network, configured to receive the multimedia data and the speech-processing instruction, generate, according to the speech-processing instruction, a text-processing result corresponding to the speech in the multimedia data, and return the text-processing result to the electronic terminal.
2. The remote speech processing system according to claim 1, characterized in that the electronic terminal is further configured to receive the text-processing result returned by the remote server.
3. The remote speech processing system according to claim 2, characterized in that the electronic terminal further comprises a display screen for displaying the text-processing result.
4. The remote speech processing system according to claim 1, characterized in that the electronic terminal further comprises a microphone for capturing audio data as the multimedia data.
5. The remote speech processing system according to claim 4, characterized in that the electronic terminal further comprises a camera for capturing, together with the microphone, video data containing speech as the multimedia data.
6. The remote speech processing system according to claim 1, characterized in that the remote server comprises a speech recognition module for recognizing the speech in the multimedia data as text to generate a first text-processing result.
7. The remote speech processing system according to claim 6, characterized in that the remote server further comprises a translation module for translating the text in the first text-processing result into the language specified in the speech-processing instruction, to generate a second text-processing result.
8. The remote speech processing system according to claim 6 or 7, characterized in that the first text-processing result is a text file or a subtitle file.
9. The remote speech processing system according to claim 7, characterized in that the second text-processing result is a text file or a subtitle file.