
CN116933806A - Concurrent translation system and concurrent translation terminal - Google Patents


Info

Publication number
CN116933806A
CN116933806A (application CN202311024945.2A)
Authority
CN
China
Prior art keywords
translation
quality
module
voice
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311024945.2A
Other languages
Chinese (zh)
Other versions
CN116933806B (en)
Inventor
黄发洋
李艳雄
席艺涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Yilian Technology Co ltd
Original Assignee
Ningbo Yilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Yilian Technology Co ltd filed Critical Ningbo Yilian Technology Co ltd
Priority to CN202311024945.2A
Publication of CN116933806A
Application granted
Publication of CN116933806B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823: Errors, e.g. transmission errors
    • H04L43/0829: Packet loss

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Machine Translation (AREA)

Abstract


The invention discloses a simultaneous interpretation system and a simultaneous interpretation terminal, relating to the technical field of translation systems. A quality evaluation module comprehensively analyzes understanding data and translation data based on a quality analysis model to evaluate whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, a regulation and control module wakes up a secondary optimization module, which selects other translators to re-translate the acquired content multiple times. If the re-translation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice; if the translation quality is qualified, the translation data is sent to a text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data directly to the text synthesis module. When performing video-conference translation, the translation system of the invention can evaluate translation quality in real time and act on the result, effectively ensuring translation accuracy and thereby the stable progress of the conference.

Description

Concurrent translation system and concurrent translation terminal
Technical Field
The application relates to the technical field of translation systems, in particular to a concurrent translation system and a concurrent translation terminal.
Background
The simultaneous interpretation system, also called a "concurrent translation system", is a technical tool specially designed for real-time interpretation. It aims to deliver instant and accurate translation in settings such as lectures, conferences, and negotiations, so that people with different language backgrounds can communicate and understand each other effectively.
By the middle of the 20th century, multilingual communication had become increasingly common at international conferences and similar events, and language barriers had become a significant problem. Traditional consecutive interpretation takes a long time and easily interrupts communication; the simultaneous interpretation system emerged to solve this problem.
The prior art has the following defects:
When an emergency occurs (such as a serious incident at an enterprise) and the enterprise holds an emergency video conference, interpreters cannot always be arranged in time, and a concurrent translation system is needed for auxiliary translation. However, when a conventional concurrent translation system performs real-time video-conference translation, it does not evaluate translation quality, which easily leads to inaccurate or wrong translation results and disrupts the conference.
Disclosure of Invention
The application aims to provide a concurrent translation system and a concurrent translation terminal, which are used for solving the defects in the background technology.
In order to achieve the above object, the present application provides the following technical solution: a concurrent translation system comprising a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
Voice input module: converts the speech input of the presenter into digitized speech data;
Voice recognition module: converts the voice data into text form;
Semantic understanding module: performs semantic analysis and understanding on the recognized text to obtain the presenter's intention and expressed content;
Translation module: converts text in the source language into text in the target language, performing language translation;
Quality evaluation module: comprehensively analyzes the understanding data and the translation data based on the quality analysis model, and evaluates whether the current translation quality is qualified;
Regulation and control module: wakes up the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when the evaluation result is that the current translation quality is qualified;
Secondary optimization module: repeatedly re-translates the acquired content; if the re-translation quality is unqualified more than twice in a row, prompts the presenter to re-input the voice, and if the translation quality is qualified, sends the translation data to the text synthesis module;
Text synthesis module: converts the translated target-language text into voice data;
Voice output module: delivers the synthesized voice data to listeners through an audio output device;
User interface module: displays prompt information to the user.
Preferably, the understanding data comprises a voice correct recognition rate, and the translation data comprises a translation result similarity index, word level matching degree and a network packet loss rate during translation.
Preferably, the quality analysis model is established through the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the correct recognition rate of the voice, the similarity index of the translation result, the word level matching degree and the network packet loss rate during translation, and the calculation expression is as follows:
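The formula itself appears to have been dropped during extraction (it was likely rendered as an image in the original). Given that all four coefficients are positive and a higher zlx indicates better quality, one plausible reconstruction, an assumption rather than the patent's verbatim formula, is a weighted combination in which the packet loss rate enters negatively:

```latex
zlx = \alpha \cdot zqy + \beta \cdot xsf + \gamma \cdot jpf - \delta \cdot dbw
```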
where zqy is the voice correct recognition rate, xsf is the translation-result similarity index, jpf is the word-level matching degree, dbw is the network packet loss rate during translation, and α, β, γ, δ are the respective proportionality coefficients of these four quantities, all greater than 0;
after the value of the quality coefficient zlx is obtained, the value of the quality coefficient zlx is compared with a quality threshold value, and the establishment of a quality analysis model is completed.
Preferably, after the quality evaluation module obtains the voice correct recognition rate, the translation-result similarity index, the word-level matching degree and the network packet loss rate during translation, evaluating whether the current translation quality is qualified by analyzing these four quantities with the quality analysis model comprises the following steps:
substituting the correct recognition rate of the voice, the similarity index of the translation result, the word level matching degree and the network packet loss rate during translation into a quality coefficient calculation formula to calculate and obtain a quality coefficient zlx value;
if the quality coefficient zlx value is greater than or equal to the quality threshold, the current translation quality is evaluated as qualified;
if the quality coefficient zlx value is less than the quality threshold, the current translation quality is evaluated as unqualified.
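The evaluation flow above can be sketched in code. The weighted-sum form of zlx, the coefficient values, and the threshold below are all illustrative assumptions; the patent's extracted text does not reproduce the actual formula or values.

```python
# Sketch of the quality-analysis model: combine the four metrics into a
# quality coefficient zlx and compare it with a threshold. The weighted-sum
# form (with packet loss dbw counting against quality), the coefficients,
# and QUALITY_THRESHOLD are hypothetical placeholders, not patent values.

ALPHA, BETA, GAMMA, DELTA = 0.3, 0.3, 0.3, 0.1  # assumed proportionality coefficients
QUALITY_THRESHOLD = 0.6                          # assumed quality threshold

def quality_coefficient(zqy, xsf, jpf, dbw):
    """Combine recognition rate, similarity, word matching, and packet loss."""
    return ALPHA * zqy + BETA * xsf + GAMMA * jpf - DELTA * dbw

def is_qualified(zqy, xsf, jpf, dbw):
    """Translation quality is qualified when zlx meets the threshold."""
    return quality_coefficient(zqy, xsf, jpf, dbw) >= QUALITY_THRESHOLD

print(is_qualified(zqy=0.95, xsf=0.8, jpf=0.85, dbw=0.02))  # high-quality case
print(is_qualified(zqy=0.40, xsf=0.3, jpf=0.35, dbw=0.30))  # low-quality case
```

In a real deployment the coefficients would be tuned so that the threshold separates acceptable from unacceptable conference translations.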
Preferably, the calculation expression of the voice correct recognition rate is:
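The expression itself is missing from the extracted text. Based on the symbol definitions that follow, a plausible reconstruction (an assumption only) is the ratio of correctly recognized words to all words processed, recognized or not:

```latex
zqy = \frac{zq}{lj + cw}
```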
where zq is the number of correctly recognized words, lj is the number of words recognized and understood, and cw is the number of words that could not be recognized.
Preferably, the calculation expression of the translation result similarity index is:
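The expression is missing from the extracted text. Given the symbols defined below (candidate n-grams c_i, reference n-grams r_j, M reference texts, N candidate n-grams), one plausible reconstruction, offered only as an assumption, averages the pairwise n-gram similarities:

```latex
xsf = \frac{1}{M \cdot N} \sum_{i=1}^{N} \sum_{j=1}^{M} \mathrm{sim}(c_i, r_j)
```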
where c_i represents the i-th n-gram in the candidate text c, r_j represents the j-th n-gram in the reference text, and sim(c_i, r_j) represents the similarity between the two n-grams, typically calculated using BLEU or another n-gram similarity measure; M is the number of reference texts and N is the total number of n-grams in the candidate text.
Preferably, the word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
where P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter that balances the exact match rate and the recall rate.
Preferably, the calculation expression of the F1 score F is:
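The expression is missing from the extracted text, but the F1 score has a standard definition as the harmonic mean of precision and recall, which is almost certainly what was intended here:

```latex
F = \frac{2 \cdot P \cdot R}{P + R}
```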
where P represents the exact match rate and R the recall rate; the exact match rate is the ratio of the number of correctly matched words to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words to the total number of words in the reference translation.
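The word-level matching degree can be sketched as follows, using the patent's jpf = (1 − τ)·P + τ·R·F with F taken as the standard F1 score. Whitespace tokenization and the default τ = 0.5 are simplifying assumptions for this sketch, not values given by the patent.

```python
# Illustrative computation of the word-level matching degree jpf.
from collections import Counter

def word_level_match(candidate: str, reference: str, tau: float = 0.5) -> float:
    cand = candidate.split()
    ref = reference.split()
    # Number of word-level matches, counted as a multiset intersection.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    p = overlap / len(cand) if cand else 0.0   # exact match rate P
    r = overlap / len(ref) if ref else 0.0     # recall rate R
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return (1 - tau) * p + tau * r * f1

print(word_level_match("the cat sat on the mat", "the cat sat on a mat"))
```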
Preferably, the calculation expression of the network packet loss rate during translation is:
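The expression is missing from the extracted text; from the symbol definitions below it is almost certainly the ratio of lost packets to sent packets:

```latex
dbw = \frac{dsb}{zfb}
```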
where dsb is the number of lost packets and zfb is the total number of packets sent.
The application also provides a concurrent translation terminal employing the above concurrent translation system.
In the technical scheme, the application has the technical effects and advantages that:
1. In the present application, the semantic understanding module performs semantic analysis and understanding on the recognized text to obtain the presenter's intention and expressed content, and the translation module converts source-language text into target-language text, realizing the language translation function. The quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is unqualified, the regulation and control module wakes up the secondary optimization module, which selects other translators to re-translate the acquired content multiple times; if the re-translation quality is unqualified more than twice in a row, the presenter is prompted to re-input the voice, and if the translation quality is qualified, the translation data is sent to the text synthesis module. When the evaluation result is qualified, the regulation and control module sends the translation data to the text synthesis module. When performing video-conference translation, the translation system can therefore evaluate translation quality in real time and act on the result, effectively ensuring translation accuracy and the stable progress of the conference.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present application; a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of a system according to the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example 1: referring to fig. 1, the concurrent translation system of this embodiment includes a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module, and a user interface module;
Voice input module: this module converts the presenter's speech input into digitized speech data, typically captured by a microphone or other speech input device; the speech data is sent to the speech recognition module.
And a voice recognition module: the module converts the voice data into a text form, namely converts the voice data of a presenter into corresponding characters, and sends the recognition text to the semantic understanding module;
Preprocessing: the collected audio signal may contain noise, echo, and other interference, and must be preprocessed to improve recognition accuracy; preprocessing may include denoising, audio enhancement, etc.;
Feature extraction: converting the audio signal into a mathematical feature representation is a key step in speech recognition; the audio signal is typically converted into a series of feature vectors using techniques such as mel-frequency cepstral coefficients (MFCCs);
Acoustic model: the acoustic model is an important component of speech recognition; it is a trained model that maps feature vectors to text units at the phoneme or subword level; common acoustic models include hidden Markov models (HMMs) and deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs);
Decoding: in the decoding stage, the speech recognition system uses the probability distribution generated by the acoustic model, assisted by a language model, to find the most probable text sequence; decoding typically uses techniques such as the Viterbi algorithm;
Post-processing: the decoded text sequence may contain erroneous or unnatural parts, and a post-processing step can further optimize the recognition result, e.g. spelling correction and grammar correction;
Text output: finally, the speech recognition module outputs the decoded text sequence as a textual representation of the presenter's speech content.
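The stages above compose into a simple pipeline. The sketch below shows only the control flow; every stage body is a placeholder standing in for real denoising, MFCC extraction, acoustic-model decoding, and correction components, and all function names are illustrative rather than taken from the patent.

```python
# Skeleton of the speech-recognition pipeline: audio in, text out.

def preprocess(audio: str) -> str:
    return audio                          # denoising, audio enhancement

def extract_features(audio: str) -> list:
    return [audio]                        # e.g. MFCC feature vectors

def decode(features: list) -> str:
    return " ".join(map(str, features))   # acoustic + language model search

def postprocess(text: str) -> str:
    return text.strip()                   # spelling / grammar correction

def recognize(audio: str) -> str:
    """Compose the stages into the module's audio-to-text path."""
    return postprocess(decode(extract_features(preprocess(audio))))

print(recognize("raw-audio"))
```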
Semantic understanding module: the module performs semantic analysis and understanding on the identified text to acquire the intention of a speaker and expressed content, the acquired content is sent to the translation module and the secondary optimization module, and understanding data is sent to the quality evaluation module;
Lexical analysis: dividing the recognized text into words or phrases and determining the part of speech, morphology, and basic attributes of each word;
Syntactic analysis: analyzing sentence structure and determining the grammatical relations and hierarchical structure among words; this helps to understand subject-predicate relations, modifier relations, etc.;
Semantic role labeling: marking each word in a sentence with a semantic role, such as subject, action, or object; this helps capture the semantic relations and logical structure of the sentence;
Named entity recognition: identifying named entities in the text, such as person names, place names, and organization names, to help understand the specific information in the sentence;
Dependency analysis: analyzing the dependency relations between words and determining how each word relates to the others in the sentence; this helps in understanding sentence structure and meaning;
Semantic parsing: converting sentences into semantic representations and modeling the relation between the words and the sentence meaning; this helps capture the semantic information of the sentence;
Intent analysis: deducing the presenter's intention and purpose from the semantic representation of the sentence; this may involve operations, actions, or requests extracted from the sentence;
Sentiment analysis: in some cases the semantic understanding module also needs sentiment analysis to determine the emotional color expressed in the sentence, so as to better understand the presenter's attitude.
And a translation module: the module converts the text of the source language into the text of the target language to realize the translation function of the language, and the module can use machine translation technology such as statistical machine translation or neural machine translation, and the translation data is sent to the quality evaluation module;
Preprocessing: before machine translation, the source-language text must be preprocessed, including word segmentation, punctuation removal, lowercase conversion, etc.; these steps provide better input data for the translation model;
Feature extraction (for SMT): in statistical machine translation, the source-language text must be converted into a feature-vector representation; this typically involves vocabularies, phrase tables, and language models;
Encoding (for NMT): in neural machine translation, the source-language text is encoded into a continuous vector representation, for example using a recurrent neural network (RNN) or a Transformer encoder;
Decoding: decoding converts the feature vectors or encoded representation into target-language text; statistical machine translation may use phrase translation tables and language models for decoding, while neural machine translation uses a decoder to generate the target-language text;
Translation result generation: during decoding, target-language text is generated at the word, phrase, or subword level;
Post-processing: the generated target-language text may need post-processing, such as re-segmentation and case handling, to obtain a more natural translation result;
Target-language text output: finally, the translation module outputs the generated target-language text as the translation result.
The quality evaluation module: and comprehensively analyzing the understanding data and the translation data based on the quality analysis model, evaluating whether the current translation quality is qualified or not, and transmitting an evaluation result to the regulation and control module.
And a regulation and control module: and when the evaluation result is that the current translation quality is not qualified, waking up the secondary optimization module, and when the evaluation result is that the current translation quality is qualified, transmitting translation data to the text synthesis module.
Secondary optimization module: selects other translators to re-translate the acquired content multiple times; if the re-translation quality is unqualified more than twice in a row, prompts the presenter to re-input the voice, and if the translation quality is qualified, sends the translation data to the text synthesis module and the prompt information to the user interface module.
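The regulation and secondary-optimization behaviour described above amounts to a bounded retry loop. The sketch below is an assumption about that control flow; the translator callables and the evaluator are hypothetical stand-ins for the system's alternative translators and quality evaluation module.

```python
# On a failed quality check, alternative translators re-translate the
# content; after more than two consecutive failures the caller should
# prompt the speaker to re-input the voice (signalled here by None).

MAX_CONSECUTIVE_FAILURES = 2

def secondary_optimize(content, translators, evaluate):
    """Return the first qualified re-translation, or None if the speaker
    must re-input the voice."""
    failures = 0
    for translate in translators:
        result = translate(content)
        if evaluate(result):
            return result              # qualified: forward to text synthesis
        failures += 1
        if failures > MAX_CONSECUTIVE_FAILURES:
            return None                # unqualified more than twice in a row
    return None

# Toy stand-ins for translators and the quality evaluation module.
bad = lambda c: "bad:" + c
good = lambda c: "good:" + c
evaluate = lambda r: r.startswith("good")
print(secondary_optimize("hello", [bad, good], evaluate))
```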
And a text synthesis module: the module converts the translated target language text into speech data for delivery to a listener, typically using text-to-speech synthesis techniques;
Text analysis: first, the translated target-language text is analyzed to understand its content, tone, and emotion; this helps determine the proper pronunciation, intonation, and speaking rate;
Speech synthesis engine selection: a suitable speech synthesis engine is selected; different engines generate natural, fluent speech based on different technologies and models;
Acoustic model generation: the selected speech synthesis engine uses an acoustic model that captures the mapping between text and sound; the acoustic model may be statistical or neural-network based;
Pronunciation rules and speech library: text-to-speech synthesis must take pronunciation rules and a speech library into account to ensure that the synthesized speech is pronounced accurately and naturally; pronunciation rules may cover specific vocabulary, phonemes, accents, etc.;
Synthesis parameter setting: synthesis parameters such as speaking rate, pitch, and emotion are set; these parameters can be adjusted to the specific scene so that the synthesized voice better meets the audience's needs;
Speech synthesis generation: the target-language translation text is input into the speech synthesis engine, which generates the corresponding speech according to the acoustic model, pronunciation rules, and parameters;
Post-processing: the synthesized speech may need post-processing, such as audio smoothing and denoising, to improve its quality and naturalness;
Voice data output: finally, the text synthesis module outputs the synthesized voice data, which can be transmitted to listeners to play the translated text as speech.
And the voice output module is used for: the module delivers the synthesized voice data to the listener through a speaker or other audio output device so that the listener can hear the translation result;
Audio transmission: the synthesized voice data is sent to a speaker or other audio output device through a suitable audio transmission channel, wired or wireless, such as an audio cable, Bluetooth, or Wi-Fi;
Audio playback device: an appropriate playback device, such as a speaker or headphones, is selected to ensure the listener can hear the resulting speech;
Audio playback control: starting, pausing, and stopping of audio playback are controlled to ensure the synthesized voice is played at the proper time;
Volume control: the audio volume is controlled so that the translated speech reaches the audience at a suitable level, neither too loud nor too quiet;
Sound quality optimization: in some special scenarios the sound quality may need optimization, such as noise removal or timbre adjustment, to provide a better listening experience.
A user interface module: the prompt information is displayed to the user, the module provides a friendly interface for the user so that the user can operate and control the functions of the system, and the user interface can be a graphical interface, a voice interaction interface or other forms;
Interface design: a user-friendly interface is designed, considering layout, colors, icons, and other elements, so that users can intuitively understand its functions and operations;
Interaction design: the way users interact with the interface is designed, including interactive elements such as buttons, text boxes, and sliders, so that users can conveniently operate the system;
Graphical interface: if a graphical interface is adopted, its visual presentation and user interaction must be implemented; users can interact with the system by clicking buttons, entering text, and so on;
Voice interaction interface: in some scenarios users may prefer to interact through speech; the voice interaction interface receives the user's voice commands, recognizes the user's intention, and executes the corresponding operations;
Feedback and prompts: the interface must provide timely feedback and prompts, informing the user whether the system is processing, whether an operation succeeded, and so on;
Function control: the interface allows the user to control the system's functions, such as starting speech recognition, starting translation, and adjusting the volume;
Language selection: in a multilingual environment, the interface may provide a language-selection function so that the user can choose the source and target languages;
Setting options: configurable options are provided so that the user can adjust system parameters to their own needs.
According to the present application, the semantic understanding module performs semantic analysis and understanding on the recognized text to obtain the presenter's intention and expressed content, and the translation module converts source-language text into target-language text, realizing the language translation function. The quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is unqualified, the regulation and control module wakes up the secondary optimization module, which selects other translators to re-translate the acquired content multiple times; if the re-translation quality is unqualified more than twice in a row, the presenter is prompted to re-input the voice, and if the translation quality is qualified, the translation data is sent to the text synthesis module. When the evaluation result is qualified, the regulation and control module sends the translation data to the text synthesis module. When performing video-conference translation, the translation system can therefore evaluate translation quality in real time and act on the result, effectively ensuring translation accuracy and the stable progress of the conference.
Example 2: the quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model, evaluates whether the current translation quality is qualified or not, and sends an evaluation result to the regulation and control module.
The understanding data comprise voice correct recognition rate, and the translation data comprise translation result similarity indexes, word level matching degree and network packet loss rate during translation;
the mass analysis model establishment comprises the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the correct recognition rate of the voice, the similarity index of the translation result, the word level matching degree and the network packet loss rate during translation, and the calculation expression is as follows:
in the formula, zqy is the correct recognition rate of voice, xsf is the similarity index of the translation result, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the correct recognition rate of voice, the similarity index of the translation result, the word level matching degree and the proportionality coefficient of the network packet loss rate during translation, respectively, and α, β, γ and δ are all larger than 0.
After the value of the quality coefficient zlx is obtained, the value of the quality coefficient zlx is compared with a quality threshold value, and the establishment of a quality analysis model is completed.
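The printed expression for the quality coefficient is not reproduced in this text (it appears as an image in the original patent), but the surrounding definitions fix its ingredients: four metrics, four positive coefficients α, β, γ, δ, and a threshold comparison where a larger zlx means better quality. A minimal sketch under the assumption of a weighted linear combination in which the packet loss rate is penalized (the exact functional form and the coefficient values are assumptions, not taken from the patent):

```python
def quality_coefficient(zqy, xsf, jpf, dbw,
                        alpha=0.3, beta=0.3, gamma=0.3, delta=0.1):
    """Assumed form of zlx: positive metrics weighted up, packet loss weighted down.

    zqy: voice correct recognition rate
    xsf: translation result similarity index
    jpf: word level matching degree
    dbw: network packet loss rate during translation
    """
    return alpha * zqy + beta * xsf + gamma * jpf - delta * dbw


def is_qualified(zlx, quality_threshold=0.75):
    # Per the patent: zlx >= threshold -> current translation quality qualified
    return zlx >= quality_threshold
```

With the illustrative coefficients above, `quality_coefficient(0.95, 0.88, 0.90, 0.02)` yields 0.817, which clears the assumed threshold of 0.75.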
After the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, the quality evaluation module analyzes the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation based on the quality analysis model, and the evaluation on whether the current translation quality is qualified comprises the following steps:
substituting the correct recognition rate of the voice, the similarity index of the translation result, the word level matching degree and the network packet loss rate during translation into a quality coefficient calculation formula to calculate and obtain a quality coefficient zlx value;
if the quality coefficient zlx value is more than or equal to the quality threshold value, evaluating that the current translation quality is qualified;
and if the quality coefficient zlx value is less than the quality threshold value, evaluating that the current translation quality is unqualified.
According to the application, after the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation are obtained through the quality evaluation module, whether the current translation quality is qualified or not is evaluated based on the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, the analysis is more comprehensive, and the data processing efficiency is effectively improved.
Further, in the present application:
the calculation expression of the correct recognition rate of the voice is as follows:
in the formula, zq voice correctly recognizes the word number, lj is the voice understanding recognition word number, cw is the voice unrecognizable word number, and the greater the voice correct recognition rate is, the higher the voice recognition accuracy of the translation system to the speaker is, the translation system is shown:
1) More accurate translation basis: the translation system will accurately convert the speaker's words into text, which provides accurate input for subsequent translation steps;
2) More accurate translation: accurate speech recognition can help the translation system to better understand the intent and content of the presenter, thereby generating more accurate translation results;
3) Reducing misunderstandings and ambiguities: high voice recognition accuracy reduces the misunderstandings and ambiguities caused by incorrect recognition results, so the translated content accurately conveys the presenter's meaning;
4) Translation efficiency is improved: high accuracy speech recognition can reduce the work of correction and revision of the translator, thereby improving the translation efficiency.
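The expression for the voice correct recognition rate is likewise rendered as an image in the original patent, so only the variable definitions survive in this text. A hedged sketch, assuming the rate is the ratio of correctly recognized words to all words processed (recognized plus unrecognized):

```python
def speech_recognition_rate(zq, lj, cw):
    """Assumed form of zqy: correctly recognized words over total words.

    zq: number of correctly recognized words
    lj: number of words recognized through speech understanding
    cw: number of unrecognized words
    """
    total = lj + cw
    return 0.0 if total == 0 else zq / total
```

For example, with 95 correctly recognized words out of 98 recognized and 2 unrecognized, the assumed rate is 95/100 = 0.95.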
The calculation expression of the translation result similarity index is as follows:
in the method, in the process of the application,representing the ith n-gram, + in candidate text c>Represents the j-th n-gram in the reference text,representation->And->Similarity between N-grams is typically calculated using the BLEU or other similarity measure between N-grams, M being the number of reference texts and N being the total N-gram number in the candidate texts;
the specific logic is as follows: for each ofCalculating the sum of the similarity between the candidate text and the n-gram in all the reference texts, summing and averaging the similarity, and dividing the sum by the total number of n-gram in the candidate texts;
the larger the translation result similarity index, the higher the similarity between the translation output and the reference texts, i.e. the better the consistency between the candidate translation and the multiple reference translations.
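The averaging logic just described can be sketched as follows. The patent leaves the per-pair n-gram similarity open ("BLEU or other similarity measure"); here a simple 0/1 exact-match similarity stands in for it, so the function names and the choice of bigrams are illustrative assumptions:

```python
def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def similarity_index(candidate, references, n=2):
    """Assumed xsf: for each candidate n-gram, average its similarity over the
    M references, then divide the summed averages by N candidate n-grams.
    sim() here is exact match (1.0/0.0), a stand-in for a BLEU-style measure."""
    cand = ngrams(candidate.split(), n)
    if not cand or not references:
        return 0.0
    ref_ngrams = [set(ngrams(r.split(), n)) for r in references]
    total = 0.0
    for g in cand:
        per_ref = [1.0 if g in r else 0.0 for r in ref_ngrams]
        total += sum(per_ref) / len(references)  # average over M references
    return total / len(cand)                     # divide by N candidate n-grams
```

A candidate identical to its single reference scores 1.0; one sharing no bigrams scores 0.0.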
The word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
wherein P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter for balancing the exact match rate and the recall rate;
wherein, the calculation expression of the F1 score F is as follows:
P represents the exact match rate and R represents the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation;
τ is a parameter for balancing the exact match rate and the recall rate, typically ranging from 0 to 1. When τ is 0, only the exact match rate is considered; when τ is 1, only the recall rate is considered. By adjusting the value of τ, the relative importance of the exact match rate and the recall rate can be balanced according to specific requirements and scenarios;
the greater the word level matching degree, the higher the translation quality of the translation system, and the better the translation system is in terms of word matching, fluency, semantic consistency and the like.
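The F1 expression itself is not printed in this text (it is an image in the original patent); the sketch below assumes the standard harmonic-mean form F = 2PR/(P+R), which is consistent with the definitions of P and R given above, and combines it with the jpf formula stated in the text:

```python
def f1(P, R):
    """Standard F1 score (assumed; the patent's printed formula is not in this text)."""
    return 0.0 if P + R == 0 else 2 * P * R / (P + R)


def word_level_match(P, R, tau=0.5):
    """jpf = (1 - tau) * P + tau * R * F, as stated in the patent text.

    tau = 0 -> only the exact match rate counts;
    tau = 1 -> only the recall term R * F counts.
    """
    F = f1(P, R)
    return (1 - tau) * P + tau * R * F
```

For P = R = 0.8 and τ = 0.5, F is 0.8 and jpf is 0.5·0.8 + 0.5·0.8·0.8 = 0.72.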
The calculation expression of the network packet loss rate during translation is as follows:
in the formula, dsb is the number of lost data packets, zfb is the total number of data packets sent, the number of lost data packets refers to the number of data packets which fail to reach a destination in the transmission process, the total number of data packets sent in the transmission process refers to the total number of data packets sent in the transmission process, the network packet loss rate during translation represents the proportion of the lost data packets in the data transmission process, and the high packet loss rate can cause interruption and distortion of voice transmission, so that the translation quality is affected.
And when the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes up the secondary optimization module, and when the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module.
The secondary optimization module selects other translators to re-translate the acquired content multiple times. If the re-translation quality is unqualified more than twice in succession, the presenter is prompted to re-input the voice; if the translation quality is qualified, the translation data is sent to the text synthesis module and the prompt information is sent to the user interface module.
The secondary optimization module selects other translators to re-translate the acquired content multiple times and evaluates whether the translation quality is qualified based on the quality analysis model. If the re-translation quality is unqualified more than twice in succession, a voice input error or network interference is likely present, so the presenter needs to be prompted to re-input the voice.
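The secondary optimization flow described above can be sketched as a simple retry loop. The function and callback names are illustrative assumptions; `evaluate` stands in for the quality analysis model's qualified/unqualified verdict:

```python
def secondary_optimize(content, translators, evaluate, max_failures=2):
    """Sketch of the secondary optimization module.

    Tries alternative translators in turn; a qualified result is forwarded to
    text synthesis, while more than `max_failures` consecutive unqualified
    results trigger a prompt for the presenter to re-input the voice.
    """
    failures = 0
    for translate in translators:
        result = translate(content)
        if evaluate(result):                 # quality analysis model: qualified
            return ("send_to_text_synthesis", result)
        failures += 1
        if failures > max_failures:          # unqualified more than twice in a row
            break
    return ("prompt_reinput", None)
```

If the second translator in the pool produces a qualified result, the loop short-circuits and hands that result to text synthesis; three consecutive failures fall through to the re-input prompt.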
The concurrent translation terminal is used for operating the concurrent translation system.
All of the above formulas are dimensionless forms computed on numerical values; they were obtained by software simulation over a large amount of collected data so as to reflect the latest real situation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the application disclosed above are intended only to assist in the explanation of the application. The preferred embodiments are not intended to be exhaustive or to limit the application to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A simultaneous translation system, characterized by: the system comprises a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
a voice input module: converting speech input of a presenter into digitized speech data;
and a voice recognition module: converting the voice data into a text form;
semantic understanding module: semantic analysis and understanding are carried out on the identified text, and the intention of a lecturer and expressed content are obtained;
and a translation module: converting the text of the source language into the text of the target language, and performing language translation;
the quality evaluation module: comprehensively analyzing the understanding data and the translation data based on the quality analysis model, and evaluating whether the current translation quality is qualified;
and a regulation and control module: when the evaluation result is that the current translation quality is not qualified, waking up the secondary optimization module, and when the evaluation result is that the current translation quality is qualified, transmitting translation data to the text synthesis module;
and a secondary optimization module: repeatedly translating the acquired content, prompting a speaker to re-input voice if the re-translating quality is continuously more than twice and is unqualified, and transmitting translation data to a text synthesis module if the translating quality is qualified;
and a text synthesis module: converting the translated target language text into voice data;
and the voice output module is used for: delivering the synthesized voice data to a listener through an audio output device;
a user interface module: and displaying the prompt information to a user.
2. The concurrent translation system according to claim 1, wherein: the understanding data comprise voice correct recognition rate, and the translation data comprise translation result similarity indexes, word level matching degree and network packet loss rate during translation.
3. A concurrent translation system according to claim 2 wherein: the mass analysis model establishment comprises the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the correct recognition rate of the voice, the similarity index of the translation result, the word level matching degree and the network packet loss rate during translation, and the calculation expression is as follows:
wherein zqy is a voice correct recognition rate, xsf is a translation result similarity index, jpf is word level matching degree, dbw is a network packet loss rate during translation, and α, β, γ, δ are respectively the voice correct recognition rate, the translation result similarity index, the word level matching degree, and the proportionality coefficient of the network packet loss rate during translation, and α, β, γ, δ are all greater than 0;
after the value of the quality coefficient zlx is obtained, the value of the quality coefficient zlx is compared with a quality threshold value, and the establishment of a quality analysis model is completed.
4. A concurrent translation system according to claim 3, wherein: after the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, the quality evaluation module analyzes the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation based on the quality analysis model, and the evaluation on whether the current translation quality is qualified comprises the following steps:
substituting the correct recognition rate of the voice, the similarity index of the translation result, the word level matching degree and the network packet loss rate during translation into a quality coefficient calculation formula to calculate and obtain a quality coefficient zlx value;
if the quality coefficient zlx value is more than or equal to the quality threshold value, evaluating that the current translation quality is qualified;
and if the quality coefficient zlx value is less than the quality threshold value, evaluating that the current translation quality is unqualified.
5. The concurrent translation system according to claim 4, wherein: the calculation expression of the voice correct recognition rate is as follows:
where zq is the number of correctly recognized words, lj is the number of words recognized through speech understanding, and cw is the number of unrecognized words.
6. The concurrent translation system according to claim 5, wherein: the calculation expression of the translation result similarity index is as follows:
in the method, in the process of the application,representing the ith n-gram, + in candidate text c>Represents the j-th n-gram in the reference text,representation->And->Between (a) and (b)Similarity, typically calculated using BLEU or other similarity measure between N-grams, M is the number of reference texts and N is the total N-gram number in the candidate texts.
7. The concurrent translation system according to claim 6, wherein: the word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
where P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter that balances the exact match rate and the recall rate.
8. The concurrent translation system according to claim 7, wherein: the calculation expression of the F1 score F is as follows:
where P represents the exact match rate and R represents the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation.
9. The concurrent translation system according to claim 8, wherein: the calculation expression of the network packet loss rate during translation is as follows:
where dsb is the number of packets lost and zfb is the total number of packets sent.
10. A simultaneous translation terminal, characterized in that: the terminal runs a concurrent translation system as claimed in any one of claims 1 to 9.
CN202311024945.2A 2023-08-15 2023-08-15 A simultaneous translation system and simultaneous translation terminal Active CN116933806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311024945.2A CN116933806B (en) 2023-08-15 2023-08-15 A simultaneous translation system and simultaneous translation terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311024945.2A CN116933806B (en) 2023-08-15 2023-08-15 A simultaneous translation system and simultaneous translation terminal

Publications (2)

Publication Number Publication Date
CN116933806A true CN116933806A (en) 2023-10-24
CN116933806B CN116933806B (en) 2025-04-15

Family

ID=88375390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311024945.2A Active CN116933806B (en) 2023-08-15 2023-08-15 A simultaneous translation system and simultaneous translation terminal

Country Status (1)

Country Link
CN (1) CN116933806B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275455A (en) * 2023-11-22 2023-12-22 深圳市阳日电子有限公司 A voice cloning method for translation headphones
CN118378640A (en) * 2024-06-26 2024-07-23 临沂大学 A language translation method and system based on big data
CN118468898A (en) * 2024-07-11 2024-08-09 北京蜂巢世纪科技有限公司 Translation method, translation device, wearable device, terminal device and readable storage medium
CN118586408A (en) * 2024-08-02 2024-09-03 临沂大学 A corpus-based folk vocabulary translation system and method
CN119204030A (en) * 2024-11-25 2024-12-27 临沂大学 A speech translation method and device for resolving speech ambiguity
CN119380721A (en) * 2024-11-20 2025-01-28 深圳市聚云物联有限公司 A global intercom translation system and intercom translation terminal based on public network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739867A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for scoring interpretation quality by using computer
CN106486125A (en) * 2016-09-29 2017-03-08 安徽声讯信息技术有限公司 A kind of simultaneous interpretation system based on speech recognition technology
US20170132217A1 (en) * 2015-11-06 2017-05-11 Samsung Electronics Co., Ltd. Apparatus and method for evaluating quality of automatic translation and for constructing distributed representation model
US20180373706A1 (en) * 2017-06-21 2018-12-27 Sap Se Assessing translation quality
CN110401671A (en) * 2019-08-06 2019-11-01 董玉霞 Terminal is translated in a kind of simultaneous interpretation translation system and simultaneous interpretation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739867A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for scoring interpretation quality by using computer
US20170132217A1 (en) * 2015-11-06 2017-05-11 Samsung Electronics Co., Ltd. Apparatus and method for evaluating quality of automatic translation and for constructing distributed representation model
CN106486125A (en) * 2016-09-29 2017-03-08 安徽声讯信息技术有限公司 A kind of simultaneous interpretation system based on speech recognition technology
US20180373706A1 (en) * 2017-06-21 2018-12-27 Sap Se Assessing translation quality
CN110401671A (en) * 2019-08-06 2019-11-01 董玉霞 Terminal is translated in a kind of simultaneous interpretation translation system and simultaneous interpretation

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275455A (en) * 2023-11-22 2023-12-22 深圳市阳日电子有限公司 A voice cloning method for translation headphones
CN117275455B (en) * 2023-11-22 2024-02-13 深圳市阳日电子有限公司 Sound cloning method for translation earphone
CN118378640A (en) * 2024-06-26 2024-07-23 临沂大学 A language translation method and system based on big data
CN118468898A (en) * 2024-07-11 2024-08-09 北京蜂巢世纪科技有限公司 Translation method, translation device, wearable device, terminal device and readable storage medium
CN118468898B (en) * 2024-07-11 2024-10-22 北京蜂巢世纪科技有限公司 Translation method, translation device, wearable device, terminal device and readable storage medium
CN118586408A (en) * 2024-08-02 2024-09-03 临沂大学 A corpus-based folk vocabulary translation system and method
CN119380721A (en) * 2024-11-20 2025-01-28 深圳市聚云物联有限公司 A global intercom translation system and intercom translation terminal based on public network
CN119380721B (en) * 2024-11-20 2026-01-09 深圳市聚云物联有限公司 A global walkie-talkie translation system and terminal based on a public network
CN119204030A (en) * 2024-11-25 2024-12-27 临沂大学 A speech translation method and device for resolving speech ambiguity

Also Published As

Publication number Publication date
CN116933806B (en) 2025-04-15

Similar Documents

Publication Publication Date Title
CN116933806B (en) A simultaneous translation system and simultaneous translation terminal
US12327091B2 (en) System and method for direct speech translation system
WO2019165748A1 (en) Speech translation method and apparatus
KR20170103209A (en) Simultaneous interpretation system for generating a synthesized voice similar to the native talker's voice and method thereof
JP6471074B2 (en) Machine translation apparatus, method and program
US11295730B1 (en) Using phonetic variants in a local context to improve natural language understanding
CN110600013B (en) Non-parallel corpus voice conversion data augmentation model training method and device
CN101154221A (en) Means for performing input speech translation processing
CN111489752B (en) Speech output method, device, electronic equipment and computer-readable storage medium
US20240274122A1 (en) Speech translation with performance characteristics
JP6580281B1 (en) Translation apparatus, translation method, and translation program
KR102062524B1 (en) Voice recognition and translation method and, apparatus and server therefor
AU2022203531B1 (en) Real-time speech-to-speech generation (rssg) apparatus, method and a system therefore
JP2022111977A (en) Voice recognition system and method
KR102637025B1 (en) Multilingual rescoring models for automatic speech recognition
US9218807B2 (en) Calibration of a speech recognition engine using validated text
TWI467566B (en) Polyglot speech synthesis method
CN119920244B (en) An intelligent real-time language synchronous translation system and terminal thereof
US20240420680A1 (en) Simultaneous and multimodal rendering of abridged and non-abridged translations
WO2024167660A1 (en) Speech translation with performance characteristics
JP2001117752A (en) Information processing apparatus, information processing method, and recording medium
CN114519358A (en) Translation quality evaluation method and device, electronic equipment and storage medium
KR102891902B1 (en) LEARNING METHOD OF TTS(Text-To-Speech) MODEL, TTS DEVICE AND METHOD OF PROVIDING TTS SERVICE USING TTS DEVICE
US20250336396A1 (en) Transcription generation
Kurapati et al. Improving Multilingual Speech Recognition for Cognitive Voice Interfaces Using Real Code-Switching Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant