CN106340291A

CN106340291A - Bilingual subtitle making method and system

Info

Publication number: CN106340291A
Application number: CN201610860011.6A
Authority: CN
Inventors: 王金龙; 丁小响
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2016-09-27
Filing date: 2016-09-27
Publication date: 2017-01-18

Abstract

The invention is suitable for the technical field of computers, and provides a bilingual subtitle making method and a bilingual subtitle making system, wherein the method comprises the following steps: the method comprises the steps of receiving an audio and video file input by a user, extracting audio in the audio and video file, dividing the audio into a plurality of audio segments, recording time information of each audio segment, carrying out voice recognition on the plurality of audio segments to generate a subtitle text belonging to a first language, translating the subtitle text belonging to the first language into a subtitle text belonging to a second language, and outputting the subtitle text belonging to the first language and the subtitle text belonging to the second language according to the time information. The method and the device realize the automatic addition of the bilingual subtitles to the non-subtitle audio and video under the condition of no text draft, solve the problems of low efficiency, long time consumption and labor consumption of the bilingual subtitle production in the prior art, and reduce the production cost of the bilingual subtitles.

Description

A method and system for making bilingual subtitles

技术领域technical field

本发明属于计算机技术领域，尤其涉及一种双语字幕制作方法及系统。The invention belongs to the technical field of computers, and in particular relates to a method and system for making bilingual subtitles.

背景技术Background technique

随着网络视频技术的发展，音乐、电台等音频节目以及电视剧、电影、综艺、网络直播等视频节目成为人们业余生活必不可少的部分，其中来自韩国、日本、美国的电视剧、电影、综艺等节目备受欢迎。人们既要欣赏这些原汁原味的音视频节目，又要很好地理解这些节目的语言意思，就离不开双语字幕的帮助。With the development of network video technology, audio programs such as music and radio stations, as well as video programs such as TV dramas, movies, variety shows, and live webcasts have become an indispensable part of people's spare time life. Among them, TV dramas, movies, variety shows, etc. The program is very popular. People not only want to appreciate these original audio and video programs, but also understand the language meaning of these programs well, and they cannot do without the help of bilingual subtitles.

然而，现有的双语字幕制作技术是通过字幕人员听音频记录字幕文本，人工翻译，使用字幕制作软件，手动添加时间轴，才得到最后的字幕文件。双语字幕制作的现有技术存在效率低、耗时长、耗费人力的缺点，无法满足用户随时为无字幕视频生成字幕的需求。However, in the existing bilingual subtitle production technology, subtitle personnel listen to the audio and record the subtitle text, manually translate, use subtitle production software, and manually add a time axis to obtain the final subtitle file. The existing technology of bilingual subtitle production has the disadvantages of low efficiency, long time consumption, and manpower consumption, and cannot meet the needs of users to generate subtitles for unsubtitled videos at any time.

发明内容Contents of the invention

本发明的目的在于提供一种双语字幕制作方法及系统，旨在解决由于现有技术无法提供一种高效的双语字幕制作方法，导致双语字幕制作效率低、耗费时间和人力。The purpose of the present invention is to provide a method and system for making bilingual subtitles, aiming at solving the problem of low efficiency, time-consuming and manpower-consuming bilingual subtitle production due to the inability to provide an efficient bilingual subtitle production method in the prior art.

一方面，本发明提供了一种双语字幕制作方法，所述方法包括下述步骤：On the one hand, the present invention provides a kind of bilingual subtitle production method, described method comprises the following steps:

接收用户输入的音视频文件，提取所述音视频文件中的音频；Receive the audio and video files input by the user, and extract the audio in the audio and video files;

将所述音频分割为多个音频段，记录每个音频段的时间信息；The audio is divided into multiple audio segments, and the time information of each audio segment is recorded;

对所述多个音频段进行语音识别，生成属于第一语种的字幕文本；performing speech recognition on the plurality of audio segments to generate subtitle text belonging to the first language;

将所述属于第一语种的字幕文本翻译为属于第二语种的字幕文本；Translating the subtitle text belonging to the first language into subtitle text belonging to the second language;

根据所述时间信息，输出所述属于第一语种的字幕文本和属于第二语种的字幕文本。According to the time information, the subtitle text belonging to the first language and the subtitle text belonging to the second language are output.

另一方面，本发明提供了一种双语字幕制作系统，所述系统包括：On the other hand, the present invention provides a kind of bilingual subtitle production system, and described system comprises:

音频获取单元，用于接收用户输入的音视频文件，提取所述音视频文件中的音频；The audio acquisition unit is used to receive the audio and video files input by the user, and extract the audio in the audio and video files;

音频分割单元，用于将所述音频分割为多个音频段，记录每个音频段的时间信息；an audio segmentation unit, configured to divide the audio into multiple audio segments, and record the time information of each audio segment;

语音识别单元，用于对所述多个音频段进行语音识别，生成属于第一语种的字幕文本；A speech recognition unit, configured to perform speech recognition on the plurality of audio segments to generate subtitle text belonging to the first language;

文本翻译单元，用于将所述属于第一语种的字幕文本翻译为属于第二语种的字幕文本；以及a text translation unit for translating the subtitle text belonging to the first language into subtitle text belonging to the second language; and

双语字幕生成单元，用于根据所述时间信息，输出所述属于第一语种的字幕文本和属于第二语种的字幕文本。The bilingual subtitle generating unit is configured to output the subtitle text belonging to the first language and the subtitle text belonging to the second language according to the time information.

本发明在接收用户输入的音视频文件后，提取音视频文件中的音频，将音频分割为多个音频段，记录每个音频段的时间信息，对多个音频段进行语音识别，生成属于第一语种的字幕文本，将属于第一语种的字幕文本翻译为属于第二语种的字幕文本，根据时间信息，输出属于第一语种的字幕文本和属于第二语种的字幕文本，从而提高了双语字幕的生成效率，降低了双语字幕的制作成本，实现为无字幕视频自动、快速地提供双语字幕。After receiving the audio and video files input by the user, the present invention extracts the audio in the audio and video files, divides the audio into multiple audio segments, records the time information of each audio segment, performs speech recognition on the multiple audio segments, and generates For the subtitle text in one language, translate the subtitle text belonging to the first language into the subtitle text belonging to the second language, and output the subtitle text belonging to the first language and the subtitle text belonging to the second language according to the time information, thereby improving the bilingual subtitle text. The production efficiency of bilingual subtitles is reduced, and bilingual subtitles can be automatically and quickly provided for videos without subtitles.

附图说明Description of drawings

图1是本发明实施例一提供的双语字幕制作方法的实现流程图；Fig. 1 is the implementation flowchart of the bilingual subtitle production method that the embodiment of the present invention provides;

图2是本发明实施例二提供的双语字幕制作方法中将音频分割为多个音频段，记录每个音频段的时间信息步骤的实现流程图；Fig. 2 is the implementation flowchart of dividing the audio into a plurality of audio segments and recording the time information steps of each audio segment in the bilingual subtitle production method provided by Embodiment 2 of the present invention;

图3是本发明实施例三提供的双语字幕制作方法中对多个音频段进行语音识别，生成属于第一语种的字幕文本步骤的实现流程图；Fig. 3 is the implementation flowchart of the step of performing speech recognition on multiple audio segments and generating subtitle text belonging to the first language in the bilingual subtitle production method provided by Embodiment 3 of the present invention;

图4是本发明实施例四提供的双语字幕制作方法中将属于第一语种的字幕文本翻译为属于第二语种的字幕文本的步骤的实现流程图；4 is a flow chart of the steps of translating the subtitle text belonging to the first language into the subtitle text belonging to the second language in the bilingual subtitle production method provided by Embodiment 4 of the present invention;

图5是本发明实施例五提供的双语字幕制作系统的结构图；以及FIG. 5 is a structural diagram of a bilingual subtitle production system provided by Embodiment 5 of the present invention; and

图6是本发明实施例六提供的双语字幕制作系统的结构图；FIG. 6 is a structural diagram of a bilingual subtitle production system provided by Embodiment 6 of the present invention;

具体实施方式detailed description

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

以下结合具体实施例对本发明的具体实现进行详细描述：The specific realization of the present invention is described in detail below in conjunction with specific embodiment:

实施例一：Embodiment one:

图1示出了本发明实施例一提供的双语字幕制作方法的实现流程，为了便于说明，仅示出了与本发明实施例相关的部分，详述如下：Fig. 1 shows the implementation process of the bilingual subtitle production method provided by Embodiment 1 of the present invention. For the convenience of description, only the parts relevant to the embodiment of the present invention are shown, and the details are as follows:

在步骤S101中，接收用户输入的音视频文件，提取音视频文件中的音频。In step S101, an audio and video file input by a user is received, and audio in the audio and video file is extracted.

本发明实施例适用于支持音视频播放的计算机、智能手机等音视频播放设备，其中，音视频文件为无字幕的音频或者视频，当音视频文件为视频文件时，需分离出该视频文件的音频流，将音频流保存为音频文件，方便后续的分割和识别操作。The embodiment of the present invention is applicable to audio and video playback devices such as computers and smart phones that support audio and video playback, wherein the audio and video files are audio or video without subtitles, and when the audio and video files are video files, the video files need to be separated Audio stream, save the audio stream as an audio file to facilitate subsequent segmentation and recognition operations.

优选地，提取出音视频文件中的音频后可对该音频进行预处理，预处理方式包括音频信号去噪、分贝值设置等，以去除音频中的噪音干扰和弱化背景音，从而使得音频中的发音更为清晰，得到更适于语音识别的音频。Preferably, after extracting the audio in the audio and video file, the audio can be preprocessed. The preprocessing method includes audio signal denoising, decibel value setting, etc., to remove noise interference and weaken the background sound in the audio, so that in the audio The pronunciation is clearer, resulting in audio that is more suitable for speech recognition.

在步骤S102中，将音频分割为多个音频段，记录每个音频段的时间信息。In step S102, the audio is divided into multiple audio segments, and the time information of each audio segment is recorded.

在本发明实施例中，预先设定语音停顿时间间隔阈值和播放该音频的音视频播放设备的显示屏幕的宽度阈值，依据这两个阈值对音频进行分割，具体分割方法如实施例二所述，在此不再赘述。分割后得到多个音频段，其中，每个音频段的时间信息包括播放整段音频时该音频段的开始时间和结束时间，开始时间和结束时间用于计算每个音频段的持续时间间隔以及用于制作字幕文本的时间轴。In the embodiment of the present invention, the voice pause time interval threshold and the width threshold of the display screen of the audio and video playback device playing the audio are preset, and the audio is segmented according to these two thresholds. The specific segmentation method is as described in Embodiment 2 , which will not be repeated here. Multiple audio segments are obtained after division, wherein the time information of each audio segment includes the start time and end time of the audio segment when the entire audio segment is played, and the start time and end time are used to calculate the duration interval of each audio segment and A timeline for subtitle text.

在步骤S103中，对多个音频段进行语音识别，生成属于第一语种的字幕文本。In step S103, voice recognition is performed on a plurality of audio segments to generate subtitle text belonging to the first language.

在本发明实施例中，根据音频段对应的语音特征，识别出每个音频段对应的字幕段，进而得到以第一语种表示的字幕文本，在这里第一语种即为音频段的原有语种。In the embodiment of the present invention, according to the voice features corresponding to the audio segment, the subtitle segment corresponding to each audio segment is identified, and then the subtitle text represented by the first language is obtained, where the first language is the original language of the audio segment .

在步骤S104中，将属于第一语种的字幕文本翻译为属于第二语种的字幕文本。In step S104, the subtitle text belonging to the first language is translated into the subtitle text belonging to the second language.

在本发明实施例中，预先设定双字幕中另一字幕所属语种，为了便于描述，在这里将该语种称为第二语种。在得到属于第一语种的字幕文本后，可通过能够实现多语言互译的翻译程序或系统，将第一语种的字幕文本翻译为第二语种的字幕文本。In the embodiment of the present invention, the language of the other subtitle in the double subtitle is preset, and for the convenience of description, this language is referred to as the second language here. After the subtitle text in the first language is obtained, the subtitle text in the first language can be translated into the subtitle text in the second language through a translation program or system capable of inter-translation among multiple languages.

在步骤S105中，根据时间信息，输出属于第一语种的字幕文本和属于第二语种的字幕文本。In step S105, according to the time information, the subtitle text belonging to the first language and the subtitle text belonging to the second language are output.

在本发明实施例中，时间信息是指每个音频段的时间信息，可包括每个音频段在完整音频播放过程中对应的开始时间和结束时间，根据开始时间和结束时间，分别为属于第一语种的字幕文本和属于第二语种的字幕文本建立时间轴，时间轴用于字幕显示和视频播放两者间的校准，根据属于第一语种的字幕文本和属于第二语种的字幕文本的时间轴的对应关系，同步输出两种语种的字幕内容，最终得到双语字幕，从而实现双语字幕的自动生成或制作，提高了双语字幕的制作效率。In the embodiment of the present invention, the time information refers to the time information of each audio segment, which may include the corresponding start time and end time of each audio segment in the complete audio playback process. According to the start time and end time, respectively belong to the first The subtitle text of one language and the subtitle text of the second language establish a time axis, and the time axis is used for calibration between subtitle display and video playback, according to the time of the subtitle text of the first language and the subtitle text of the second language The corresponding relationship between axes, synchronously output subtitle content in two languages, and finally get bilingual subtitles, so as to realize the automatic generation or production of bilingual subtitles, and improve the production efficiency of bilingual subtitles.

实施例二：Embodiment two:

图2示出了本发明实施例二提供的双语字幕制作方法中将音频分割为多个音频段，记录每个音频段的时间信息的实现流程，为了便于说明，仅示出了与本发明实施例相关的部分，详述如下：Figure 2 shows the implementation process of dividing the audio into multiple audio segments and recording the time information of each audio segment in the bilingual subtitle production method provided by Embodiment 2 of the present invention. The relevant parts of the example are described in detail below:

在步骤S201中，根据音频的语音停顿，对音频进行初步分割。In step S201, the audio is preliminarily segmented according to the speech pause of the audio.

在本发明实施例中，预先设定用于分割音频的语音停顿时间间隔阈值。由于人进行说话时通常在两句话之间会有短暂的停顿，因此，可检测音频中语音停顿处或者静音处，计算停顿或静音持续的时间间隔，当该时间间隔大于预先设定的语音停顿时间间隔阈值时，对音频分割。从而实现依据语音停顿对音频进行分割的操作，可有效提高后续对音频段进行语音识别的准确率。In the embodiment of the present invention, a speech pause time interval threshold for segmenting audio is preset. Since people usually have a short pause between two sentences when speaking, it is possible to detect the pause or silence in the audio, and calculate the time interval of the pause or silence. When the time interval is greater than the preset voice Split the audio when the pause interval threshold is reached. In this way, the operation of segmenting the audio according to the pause of speech can be realized, which can effectively improve the accuracy of the subsequent speech recognition of the audio segment.

在步骤S202中，根据显示屏幕的宽度，对初步分割后的音频进行二次分割和合并，以得到多个音频段，并记录每个音频端的时间信息。In step S202, according to the width of the display screen, the pre-segmented audio is divided and merged twice to obtain multiple audio segments, and the time information of each audio end is recorded.

在本发明实施例中，音频经初步分割后得到持续时间间隔长短不一的音频段。有些持续时间间隔过长的音频段超出播放该视频的音视频显示设备的显示屏幕的宽度，无法完整显示。有些持续时间间隔过短的音频段单独显示时播放时间太短而不利于观看。因此，可预先设置播放该音频的音视频播放设备的显示屏幕的宽度阈值，根据显示屏幕的宽度阈值设置最长音频时间间隔阈值和最短音频时间间隔阈值。遍历初步分割后得到的音频段，对持续时间间隔大于最长音频时间间隔阈值的音频段进行二次分割，为避免二次分割后得到持续时间间隔小于最短音频时间间隔的音频段，二次分割可采用平均分割的方法。对初步分割后持续时间间隔小于最短音频时间间隔阈值的音频段，查询该音频段相邻音频段的持续时间间隔，将该音频段和持续时间间隔较短的相邻音频段合并。最终，经过初步分割、二次分割、合并这三步操作得到多个音频段，可有效提高后续对音频段进行语音识别的准确率。In the embodiment of the present invention, after the audio is initially divided, audio segments with different duration intervals are obtained. Some audio segments with long duration intervals exceed the width of the display screen of the audio and video display device playing the video, and cannot be completely displayed. Some audio segments with short duration intervals are too short to be viewed individually. Therefore, the width threshold of the display screen of the audio and video playback device playing the audio can be preset, and the longest audio time interval threshold and the shortest audio time interval threshold can be set according to the width threshold of the display screen. Traverse the audio segments obtained after the initial segmentation, and perform secondary segmentation on the audio segments whose duration interval is greater than the threshold of the longest audio time interval. The average split method can be used. For an audio segment whose duration interval is smaller than the shortest audio time interval threshold after preliminary segmentation, query the duration interval of adjacent audio segments of the audio segment, and merge the audio segment with the adjacent audio segment with a shorter duration interval. Finally, multiple audio segments are obtained through the three-step operation of primary segmentation, secondary segmentation, and merging, which can effectively improve the accuracy of subsequent speech recognition of audio segments.

音频段的时间信息包括该音频段在完整音频播放过程中对应的开始时间以及结束时间，此时间信息用于分割操作中计算音频段的持续时间间隔以及用于后续字幕文本时间轴的制作。The time information of the audio segment includes the corresponding start time and end time of the audio segment during the entire audio playback process. This time information is used to calculate the duration interval of the audio segment in the segmentation operation and for the production of the subsequent subtitle text time axis.

优选地，为了避免分割出现错误以及减少分割过程中的误差，在初步分割以及二次分割过程中，可只对音频段的开始时间和结束时间进行标记，并不进行实际的分割行为，这样，在首次标记后可进一步对标记点不断进行调整，从而可对分割后的音频段再次进行多次调整，调整完成后再进行实际的分割操作。Preferably, in order to avoid segmentation errors and reduce errors in the segmentation process, in the preliminary segmentation and secondary segmentation process, only the start time and end time of the audio segment can be marked, and the actual segmentation behavior is not performed. Like this, After the first marking, the marking points can be further adjusted continuously, so that the divided audio segment can be adjusted multiple times again, and the actual division operation can be performed after the adjustment is completed.

实施例三：Embodiment three:

图3示出了本发明实施例三提供的双语字幕制作方法中对多个音频段进行语音识别，生成属于第一语种的字幕文本步骤的实现流程，为了便于说明，仅示出了与本发明实施例相关的部分，详述如下：Fig. 3 shows the implementation process of performing voice recognition on multiple audio segments and generating subtitle texts belonging to the first language in the bilingual subtitle production method provided by Embodiment 3 of the present invention. The relevant part of the embodiment is described in detail as follows:

在步骤S301中，在预先构建的高频语音识别库中对待识别的音频段进行匹配。In step S301, the audio segment to be recognized is matched in the pre-built high-frequency speech recognition library.

在本发明实施例中，提取待识别音频段的语音特征，将该语音特征输入到预先构建的高频语音识别库，和高频语音识别库中的语音特征进行匹配，得到语音识别结果。预先构建的高频语音识别库收集了常见的语音和该语音对应的识别结果，从而提高语音识别的准确率和效率。In the embodiment of the present invention, the speech features of the audio segment to be recognized are extracted, the speech features are input into the pre-built high-frequency speech recognition library, and the speech features in the high-frequency speech recognition library are matched to obtain the speech recognition result. The pre-built high-frequency speech recognition library collects common speech and the corresponding recognition results of the speech, thereby improving the accuracy and efficiency of speech recognition.

在步骤S302中，当对待识别的音频段匹配成功时，获取待识别的音频段对应的属于第一语种的字幕文本。In step S302, when the audio segment to be identified is successfully matched, the subtitle text in the first language corresponding to the audio segment to be identified is acquired.

在本发明实施例中，当对待识别的音频段匹配成功时，以第一语种记录语音识别的结果，得到属于第一语种的字幕文本。当待识别的音频段匹配不成功时，可将待识别语音段输入预设的语音识别系统进行识别。优选地，预设的语音识别系统可以为优同科技语音识别系统、科大讯飞语音识别系统，从而可有效提高识别结果的准确率。In the embodiment of the present invention, when the audio segment to be recognized is successfully matched, the speech recognition result is recorded in the first language, and the subtitle text belonging to the first language is obtained. When the audio segment to be recognized is unsuccessfully matched, the speech segment to be recognized can be input into a preset voice recognition system for recognition. Preferably, the preset voice recognition system can be the voice recognition system of Youtong Technology and the voice recognition system of HKUST Xunfei, so that the accuracy of the recognition result can be effectively improved.

实施例四：Embodiment four:

图4示出了本发明实施例四提供的双语字幕制作方法中将属于第一语种的字幕文本翻译为属于第二语种的字幕文本的步骤的实现流程，为了便于说明，仅示出了与本发明实施例相关的部分，详述如下：Fig. 4 shows the implementation flow of the step of translating the subtitle text belonging to the first language into the subtitle text belonging to the second language in the bilingual subtitle production method provided by Embodiment 4 of the present invention. The parts related to the embodiments of the invention are described in detail as follows:

在步骤S401中，将属于第一语种的字幕文本划分为多个文本段。In step S401, the subtitle text belonging to the first language is divided into multiple text segments.

在步骤S402中，在预先构建的高频词翻译库中对待翻译的文本段进行匹配。In step S402, the text segment to be translated is matched in the pre-built high-frequency word translation database.

在本发明实施例中，对属于第一语种的字幕文本进行划分，得到便于翻译的文本段，将待翻译的文本段输入到预先构建的高频词翻译库中，得到翻译结果。预先构建的高频词翻译库中收集了常见的词汇或短语，以及这些词汇和短语对应的翻译结果，从而可有效提高文本翻译的准确率和效率。In the embodiment of the present invention, the subtitle text belonging to the first language is divided to obtain text segments that are easy to translate, and the text segments to be translated are input into a pre-built high-frequency word translation database to obtain translation results. The pre-built high-frequency word translation database collects common words or phrases, as well as the corresponding translation results of these words and phrases, which can effectively improve the accuracy and efficiency of text translation.

在步骤S403中，当对待翻译的文本段匹配成功时，获取待翻译的文本段对应的属于第二语种的字幕文本。In step S403, when the text segment to be translated is successfully matched, the subtitle text corresponding to the text segment to be translated and belonging to the second language is acquired.

在本发明实施例中，当对待翻译的文本段匹配成功时，以第二语种记录翻译结果，得到属于第二语种的字幕文本。当待翻译的文本段未匹配成功时，可将待翻译文本输入预设的翻译系统。优选地，预设的翻译系统可以为google在线翻译系统、有道在线翻译系统等，从而可有效提高翻译结果的准确率。In the embodiment of the present invention, when the text segment to be translated is successfully matched, the translation result is recorded in the second language, and the subtitle text belonging to the second language is obtained. When the text segment to be translated is not successfully matched, the text to be translated can be input into a preset translation system. Preferably, the preset translation system can be google online translation system, Youdao online translation system, etc., which can effectively improve the accuracy of translation results.

本领域普通技术人员可以理解实现上述实施例方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成，所述的程序可以存储于一计算机可读取存储介质中，所述的存储介质，如ROM/RAM、磁盘、光盘等。Those of ordinary skill in the art can understand that all or part of the steps in the method of the above-mentioned embodiments can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable storage medium, and the storage Media such as ROM/RAM, magnetic disk, optical disk, etc.

实施例五：Embodiment five:

图5示出了本发明实施例五提供的双语字幕制作系统的结构，为了便于说明，仅出示与本发明实施例有关的部分。FIG. 5 shows the structure of the bilingual subtitle production system provided by the fifth embodiment of the present invention. For the convenience of description, only the parts related to the embodiment of the present invention are shown.

在本发明实施例中，双语字幕制作系统包括音频获取单元51、音频分割单元52、音频识别单元53、文本翻译单元54、双语字幕生成单元55，其中：In an embodiment of the present invention, the bilingual subtitle production system includes an audio acquisition unit 51, an audio segmentation unit 52, an audio recognition unit 53, a text translation unit 54, and a bilingual subtitle generation unit 55, wherein:

音频获取单元51，用于接收用户输入的音视频文件，提取音视频文件中的音频。The audio acquisition unit 51 is configured to receive audio and video files input by the user, and extract audio from the audio and video files.

音频分割单元52，用于将音频分割为多个音频段，记录每个音频段的时间信息。The audio segmentation unit 52 is configured to divide the audio into multiple audio segments, and record the time information of each audio segment.

语音识别单元53，用于对多个音频段进行语音识别，生成属于第一语种的字幕文本。The voice recognition unit 53 is configured to perform voice recognition on a plurality of audio segments to generate subtitle text belonging to the first language.

文本翻译单元54，用于将属于第一语种的字幕文本翻译为属于第二语种的字幕文本。The text translation unit 54 is configured to translate the subtitle text belonging to the first language into the subtitle text belonging to the second language.

双语字幕生成单元55，用于根据时间信息，输出属于第一语种的字幕文本和属于第二语种的字幕文本。The bilingual subtitle generation unit 55 is configured to output the subtitle text belonging to the first language and the subtitle text belonging to the second language according to the time information.

在本发明实施例中，双语字幕制作系统的各单元可由相应的硬件或软件单元实现，各单元可以为独立的软、硬件单元，也可以集成为计算机设备或系统的一个软、硬件单元，在此不用以限制本发明。双语字幕制作系统的各单元的具体实施方式可参考实施例一对应步骤的描述，在此不再赘述。In the embodiment of the present invention, each unit of the bilingual subtitle production system can be realized by corresponding hardware or software units, and each unit can be an independent software and hardware unit, or can be integrated into a software and hardware unit of a computer device or system. This is not intended to limit the invention. For the specific implementation manner of each unit of the bilingual subtitle production system, reference may be made to the description of the corresponding steps in Embodiment 1, which will not be repeated here.

实施例六：Embodiment six:

图6示出了本发明实施例六提供的双语字幕制作系统的结构，为了便于说明，仅示出了与本发明实施例相关的部分，该双语字幕制作系统包括：Figure 6 shows the structure of the bilingual subtitle production system provided by Embodiment 6 of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The bilingual subtitle production system includes:

音频获取单元61，用于接收用户输入的音视频文件，提取音视频文件中的音频；Audio acquisition unit 61, for receiving the audio-video file input by the user, extracting the audio in the audio-video file;

音频分割单元62，用于将音频分割为多个音频段，记录每个音频段的时间信息；Audio segmentation unit 62, for audio is divided into a plurality of audio segments, record the time information of each audio segment;

语音识别单元63，用于对多个音频段进行语音识别，生成属于第一语种的字幕文本；Speech recognition unit 63, for carrying out speech recognition to a plurality of audio segments, generating the subtitle text belonging to the first language;

文本翻译单元64，用于将属于第一语种的字幕文本翻译为属于第二语种的字幕文本；以及A text translation unit 64, configured to translate the subtitle text belonging to the first language into the subtitle text belonging to the second language; and

双语字幕生成单元65，用于根据时间信息，输出属于第一语种的字幕文本和属于第二语种的字幕文本。The bilingual subtitle generating unit 65 is configured to output subtitle text belonging to the first language and subtitle text belonging to the second language according to the time information.

在本发明实施例中，优选地，音频分割单元62包括：In the embodiment of the present invention, preferably, the audio segmentation unit 62 includes:

初步处理单元621，用于根据音频的语音停顿，对音频进行初步分割；以及A preliminary processing unit 621, configured to perform preliminary segmentation on the audio according to the speech pause of the audio; and

二次处理单元622，用于根据显示屏幕的宽度，对初步分割后的音频进行二次分割和合并。The secondary processing unit 622 is configured to perform secondary segmentation and merging on the pre-segmented audio according to the width of the display screen.

优选地，语音识别单元63包括：Preferably, the voice recognition unit 63 includes:

语音匹配单元631，用于在预先构建的高频语音库中对待识别的音频进行匹配；以及Voice matching unit 631, for matching the audio to be recognized in the pre-built high-frequency voice library; and

第一文本生成单元632，用于当待识别语音匹配成功时，获取待识别的音频段对应的属于第一语种的字幕文本。The first text generating unit 632 is configured to obtain subtitle text corresponding to the audio segment to be recognized and belonging to the first language when the speech to be recognized is successfully matched.

优选地，文本翻译单元64包括：Preferably, the text translation unit 64 includes:

文本划分单元641，用于将属于第一语种的字幕文本划分为多个文本段；A text division unit 641, configured to divide the subtitle text belonging to the first language into a plurality of text segments;

文本匹配单元642，用于在预先构建的高频词翻译库中对待翻译的文本段进行匹配；以及A text matching unit 642, configured to match the text segment to be translated in the pre-built high-frequency word translation database; and

第二文本生成单元643，用于当对待翻译的文本段匹配成功时，获取待翻译的文本段对应的属于第二语种的字幕文本。The second text generation unit 643 is configured to obtain subtitle text in the second language corresponding to the text segment to be translated when the text segment to be translated is successfully matched.

优选地，双语字幕生成单元65包括：Preferably, the bilingual subtitle generation unit 65 includes:

时间轴制作单元651，用于根据时间信息构建属于第一语种的字幕文本的第一时间轴、属于第二语种的字幕文本的第二时间轴；以及A time axis creation unit 651, configured to construct a first time axis of the subtitle text belonging to the first language and a second time axis of the subtitle text belonging to the second language according to the time information; and

双语字幕生成子单元652，用于根据第一时间轴和第二时间轴的对应关系，同步输出属于第一语种的字幕文本和属于第二语种的字幕文本，以得到双语字幕。The bilingual subtitle generating subunit 652 is configured to synchronously output the subtitle text belonging to the first language and the subtitle text belonging to the second language according to the corresponding relationship between the first time axis and the second time axis, so as to obtain bilingual subtitles.

在本发明实施例中，根据开始时间和结束时间，分别为属于第一语种的字幕文件和属于第二语种的字幕文本建立时间轴，进而根据属于第一语种的字幕文本和属于第二语种的字母文本的时间轴的对应关系，同步输出两种语种的字幕内容，以得到双语字幕，从而提高了双语字幕的生成效率，降低了双语字幕的制作成本，实现为无字幕视频自动、快速地提供双语字幕。In the embodiment of the present invention, according to the start time and the end time, a time axis is respectively established for the subtitle file belonging to the first language and the subtitle text belonging to the second language, and then according to the subtitle text belonging to the first language and the subtitle text belonging to the second language The corresponding relationship between the time axis of the alphabetic text, synchronously output subtitle content in two languages to obtain bilingual subtitles, thereby improving the generation efficiency of bilingual subtitles, reducing the production cost of bilingual subtitles, and realizing automatic and fast provision of subtitle-free videos Bilingual subtitles.

在本发明实施例中，双语字幕系统装置的各单元可由相应的硬件或软件单元实现，各单元可以为独立的软、硬件单元，也可以集成为一个软、硬件单元，在此不用以限制本发明。双语字幕制作系统的各单元的具体实施方式可参考前述实施例对应步骤的描述，在此不再赘述。In the embodiment of the present invention, each unit of the bilingual subtitle system device can be realized by a corresponding hardware or software unit, and each unit can be an independent software and hardware unit, or can be integrated into a software and hardware unit, which is not intended to limit this invention. For the specific implementation manner of each unit of the bilingual subtitle production system, reference may be made to the description of the corresponding steps in the foregoing embodiments, and details are not repeated here.

以上所述仅为本发明的较佳实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等，均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention should be included in the protection of the present invention. within range.

Claims

1. a kind of bilingual subtitles preparation method is it is characterised in that methods described comprises the steps:

The audio-video document of receiving user's input, extracts the audio frequency in described audio-video document；

Described audio segmentation is multiple audio sections, records the temporal information of each audio section；

The plurality of audio section is carried out with speech recognition, generates the captioned test belonging to the first languages；

The described captioned test belonging to the first languages is translated as belonging to the captioned test of the second languages；

According to described temporal information, belong to the captioned test of the first languages described in output and belong to the captioned test of the second languages.

2. the method for claim 1, it is characterised in that being multiple audio sections by described audio segmentation, records each sound The step of the temporal information of frequency range, comprising:

According to the speech pause of described audio frequency, primary segmentation is carried out to described audio frequency；

According to the width of display screen, secondary splitting and merging are carried out to the described audio frequency after primary segmentation, described many to obtain Individual audio section, and record the temporal information of each audio section.

3. it is characterised in that carrying out speech recognition to the plurality of audio section, generation belongs to the method for claim 1 The step of the captioned test of the first languages, comprising:

The High frequency speech identification storehouse building in advance is mated to audio section to be identified；

When the match is successful to described audio section to be identified, obtain that described audio section to be identified is corresponding belongs to the first languages Captioned test.

4. the method for claim 1 is it is characterised in that be translated as belonging to the described captioned test belonging to the first languages The step of the captioned test of the second languages, comprising:

The described captioned test belonging to the first languages is divided into multiple text chunks；

The high frequency words translation storehouse building in advance is mated to text chunk to be translated；

When the match is successful to described text chunk to be translated, obtain that described text chunk to be translated is corresponding belongs to the second languages Captioned test.

5. the method for claim 1 is it is characterised in that according to described temporal information, belong to the first languages described in output Captioned test and belong to the second languages captioned test step, comprising:

Belong to the very first time axle of the captioned test of the first languages according to described temporal information builds, belong to the second languages Second time shaft of captioned test；

According to the corresponding relation of described very first time axle and described second time shaft, belong to the word of the first languages described in synchronism output Curtain text and the captioned test belonging to the second languages, to obtain bilingual subtitles.

6. a kind of bilingual subtitles manufacturing system is it is characterised in that described system includes:

Audio frequency acquiring unit, for the audio-video document of receiving user's input, extracts the audio frequency in described audio-video document；

Audio segmentation unit, for being multiple audio sections by described audio segmentation, records the temporal information of each audio section；

Voice recognition unit, for the plurality of audio section is carried out with speech recognition, generates the captioned test belonging to the first languages；

Text translation unit, for being translated as belonging to the captions literary composition of the second languages by the described captioned test belonging to the first languages This；And

Bilingual subtitles signal generating unit, for according to described temporal information, belonging to captioned test and the genus of the first languages described in output Captioned test in the second languages.

7. system as claimed in claim 6 is it is characterised in that described audio segmentation unit includes:

Preliminary treatment unit, for the speech pause according to audio frequency, carries out primary segmentation to described audio frequency；And

After-treatment unit, for the width according to display screen, the described audio frequency after primary segmentation is carried out secondary splitting and Merge, to obtain the plurality of audio section, and record the temporal information of each audio section.

8. system as claimed in claim 6 is it is characterised in that described voice recognition unit includes:

Voice match unit, for mating to audio section to be identified in the High frequency speech identification storehouse building in advance；With And

First text generation unit, for, when successful to described Audio Matching to be identified, obtaining described audio frequency to be identified The corresponding captioned test belonging to the first languages of section.

9. system as claimed in claim 6 is it is characterised in that described text translation unit, comprising:

Text division unit, for being divided into multiple text chunks by the described captioned test belonging to the first languages；

Text matches unit, for mating to text chunk to be translated in the high frequency words translation storehouse building in advance；And

Second text generation unit, for when the match is successful to described text chunk to be translated, obtaining described literary composition to be translated This section of corresponding captioned test belonging to the second languages.

10. system as claimed in claim 6 is it is characterised in that described bilingual subtitles signal generating unit includes:

Time shaft production unit, for according to described temporal information build described in belong to the first languages captioned test first when Countershaft, belong to the second languages captioned test the second time shaft；And

Bilingual subtitles generate subelement, for the corresponding relation according to described very first time axle and described second time shaft, synchronous The captioned test belonging to the first languages described in output and the captioned test belonging to the second languages, to obtain bilingual subtitles.