
CN117119108A - Simultaneous interpretation system, method, apparatus, and readable storage medium - Google Patents


Info

Publication number
CN117119108A
CN117119108A CN202310919856.8A CN202310919856A CN117119108A CN 117119108 A CN117119108 A CN 117119108A CN 202310919856 A CN202310919856 A CN 202310919856A CN 117119108 A CN117119108 A CN 117119108A
Authority
CN
China
Prior art keywords
audio data
simultaneous interpretation
target
target audio
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310919856.8A
Other languages
Chinese (zh)
Inventor
王恒
郭远林
黄梓华
何永华
欧喜鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou DSPPA Audio Co Ltd
Original Assignee
Guangzhou DSPPA Audio Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou DSPPA Audio Co Ltd
Priority to CN202310919856.8A
Publication of CN117119108A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a simultaneous interpretation system, method, device and readable storage medium. The simultaneous interpretation system provided by the embodiment of the application comprises: a simultaneous interpretation server, a cloud platform server, at least one simultaneous interpretation host, at least one conference host and at least one microphone. The simultaneous interpretation server is connected with the cloud platform server; the cloud platform server is connected with each simultaneous interpretation host; each simultaneous interpretation host is connected with one conference host, and each conference host is connected with at least one microphone. The system can therefore be connected with at least two conference hosts at the same time and can interpret the audio data collected by a plurality of conference hosts in parallel, which provides simultaneous interpretation for multiple participants in a conference, shortens interpretation delay, offers interpretation in different languages to multiple participants, and improves the quality and efficiency of the conference's simultaneous interpretation service.

Description

Simultaneous interpretation system, method, apparatus, and readable storage medium
Technical Field
The present application relates to the field of simultaneous interpretation technologies, and in particular, to a simultaneous interpretation system, method, apparatus, and readable storage medium.
Background
With the development of science and technology, simultaneous interpretation technology has matured and given rise to many simultaneous interpretation products. Based on intelligent speech and language technology, common products can provide real-time translation between different languages across a full range of scenarios, offer standard and enterprise services for scenarios such as conferences, exhibitions and offices, and build a multi-level service ecosystem. Through these products, integrated services are provided, such as real-time multilingual translation with on-screen display, multilingual simultaneous interpretation, multilingual speech synthesis, conference transcription, and conference record sharing. For example, some common simultaneous interpretation products support closed captioning, live streaming on mobile terminals, custom optimization, and an enhanced interpretation service that more closely resembles a lecture.
However, when a conventional simultaneous interpretation product is used in a multi-person conference, it can generally serve only one speaker; other participants can hardly use the service, and audio multiplexing is not possible. The number of languages such a product can interpret is also limited, so the simultaneous interpretation needs of a multi-person conference cannot be met.
Disclosure of Invention
The present application aims to address at least one of the above drawbacks. Accordingly, it provides a simultaneous interpretation system, method, apparatus and readable storage medium to solve the technical defect that prior-art simultaneous interpretation services cannot meet the needs of a multi-person conference.
A simultaneous interpretation system, comprising: a simultaneous interpretation server, a cloud platform server, at least one simultaneous interpretation host, at least one conference host and at least one microphone;
wherein,
the simultaneous interpretation server is connected with the cloud platform server;
the cloud platform server is connected with each simultaneous interpretation host;
each simultaneous interpretation host is connected with one conference host, and each conference host is connected with at least one microphone;
each conference host collects target audio data of a speaker through each microphone connected with the conference host;
each conference host transmits the target audio data to a simultaneous interpretation host connected with the conference host;
each simultaneous interpretation host transmits the target audio data to the cloud platform server;
after receiving the target audio data, the cloud platform server initiates a target access request to the simultaneous interpretation server, wherein the target access request comprises target information;
the simultaneous interpretation server receives the target access request, analyzes the target access request, returns response information to the cloud platform server, and establishes a target connection channel with the cloud platform server;
the cloud platform server sends the target audio data to the simultaneous interpretation server through the target connection channel;
after receiving the target audio data, the simultaneous interpretation server processes the target audio data and then sends a translation result corresponding to the target audio data to the cloud platform server;
after receiving the translation result of the target audio data, the cloud platform server sends the translation result to a simultaneous interpretation host corresponding to the translation result;
after receiving the translation result of the target audio data, the simultaneous interpretation host corresponding to the target audio data displays the text translation result in the translation result of the target audio data, and forwards the audio translation result in the translation result of the target audio data to a conference host connected with the simultaneous interpretation host;
and the conference host corresponding to the target audio data plays the audio translation result in the translation results of the target audio data.
Preferably, the cloud platform server includes a target component, based on which each simultaneous interpretation host establishes two sockets with the target component in the cloud platform server, wherein one socket is used for transmitting text information, and the other socket is used for transmitting an audio stream in a first target format.
Preferably, when a thread in the cloud platform server reads and writes data from a socket channel of each conference host, if no data is found to be readable, the thread processes other tasks.
Preferably, the process of sending the target audio data to the simultaneous interpretation server by the cloud platform server through the target connection channel includes:
the cloud platform server sends simultaneous interpretation data packets corresponding to the target audio data to the simultaneous interpretation server through the target connection channel;
the simultaneous interpretation data packet comprises a start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data and speech synthesis result configuration information of the target audio data.
Preferably, the process in which the simultaneous interpretation server, after receiving the target audio data, processes the target audio data and sends the corresponding translation result to the cloud platform server includes:
after receiving the simultaneous interpretation data packet corresponding to the target audio data, the simultaneous interpretation server identifies the start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data and the speech synthesis result configuration information of the target audio data;
the simultaneous interpretation server obtains a text translation result of the target audio data and an audio translation result in a second target format according to the start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data and the speech synthesis result configuration information of the target audio data;
and the simultaneous interpretation server sends the text translation result of the target audio data and the audio translation result in the second target format to the cloud platform server.
Preferably, the process of receiving the target access request and analyzing the target access request by the simultaneous interpretation server, returning response information to the cloud platform server, and establishing a target connection channel with the cloud platform server includes:
the simultaneous interpretation server receives the target access request;
The simultaneous interpretation server analyzes target information in the target access request;
the simultaneous interpretation server upgrades the current connection mode with the cloud platform server into a target connection mode according to the target information;
and the simultaneous interpretation server returns response information about the connection mode with the cloud platform server to the cloud platform server, and establishes a target connection channel with the cloud platform server according to the target connection mode.
Preferably, the conference host supports at least two different language interpretations, the conference host providing at least two language modes for user selection.
A simultaneous interpretation method applied to the simultaneous interpretation system described in any of the foregoing, the method comprising:
collecting target audio data of a speaker using a microphone;
transmitting the target audio data to a conference host connected to the microphone;
transmitting the target audio data to a simultaneous interpretation host connected with the conference host through the conference host;
transmitting the target audio data to a cloud platform server connected with the simultaneous interpretation host through the simultaneous interpretation host;
invoking the cloud platform server to initiate a target access request to the simultaneous interpretation server, wherein the target access request comprises target information;
receiving the target access request through the simultaneous interpretation server, analyzing the target access request, returning response information to the cloud platform server, and establishing a target connection channel between the simultaneous interpretation server and the cloud platform server so that the cloud platform server can send the target audio data to the simultaneous interpretation server through the target connection channel;
processing the target audio data by using the simultaneous interpretation server, and sending the translation result corresponding to the target audio data to the cloud platform server;
receiving, through the cloud platform server, the translation result of the target audio data sent by the simultaneous interpretation server, and sending the translation result to the simultaneous interpretation host corresponding to the target audio data;
receiving the translation result of the target audio data by using the simultaneous interpretation host corresponding to the target audio data, displaying the text translation result in the translation result of the target audio data, and forwarding the audio translation result in the translation result of the target audio data to the conference host connected with that simultaneous interpretation host;
and playing the audio translation result in the translation result of the target audio data by using the conference host corresponding to the target audio data.
A simultaneous interpretation device, comprising: one or more processors, and memory;
the memory has stored therein computer readable instructions which, when executed by the one or more processors, implement the steps of the simultaneous interpretation method as described in the foregoing description.
A readable storage medium: the readable storage medium has stored therein computer readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the simultaneous interpretation method as described in the preceding description.
According to the above technical solution, when simultaneous interpretation in different languages needs to be provided to multiple participants in a multi-person conference, the embodiment of the application can provide a simultaneous interpretation system, which can comprise: a simultaneous interpretation server, a cloud platform server, at least one simultaneous interpretation host, at least one conference host and at least one microphone. The simultaneous interpretation server can be connected with the cloud platform server; the cloud platform server can be connected with each simultaneous interpretation host; each simultaneous interpretation host can be connected with one conference host, and each conference host can be connected with at least one microphone. In actual use, each conference host can collect target audio data of different speakers through the microphones connected with it and then transmit the collected target audio data to the simultaneous interpretation host connected with it. After receiving the target audio data from its conference host, each simultaneous interpretation host can transmit the target audio data to the cloud platform server. After receiving the target audio data from a simultaneous interpretation host connected with it, the cloud platform server can initiate a target access request, which can include target information, to the simultaneous interpretation server. Using the target information, the simultaneous interpretation server can analyze the target access request when it is received, return response information to the cloud platform server, and establish a target connection channel with the cloud platform server, so that the cloud platform server can send the received target audio data to the simultaneous interpretation server through the target connection channel. After receiving the target audio data, the simultaneous interpretation server can process it to obtain the corresponding translation result and send that result to the cloud platform server. After receiving the translation result, the cloud platform server can send it to the corresponding simultaneous interpretation host. That simultaneous interpretation host can then display the text translation result contained in the translation result and forward the audio translation result to the conference host connected with it, so that the conference host corresponding to the target audio data can play the audio translation result.
According to the above technical solution, the simultaneous interpretation system provided by the embodiment of the application can be connected with at least two conference hosts at the same time and can interpret the audio data collected by a plurality of conference hosts in parallel. It thereby provides simultaneous interpretation for multiple participants in a conference, shortens interpretation delay, offers interpretation in different languages to multiple participants, and improves the quality and efficiency of the conference's simultaneous interpretation service.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a system architecture for implementing a multi-user simultaneous interpretation service according to an embodiment of the present application;
Fig. 2 is a schematic diagram illustrating the connection between a cloud platform server and a simultaneous interpretation server according to an embodiment of the present application;
Fig. 3 is a connection schematic diagram of a netty component according to an embodiment of the present application;
Fig. 4 is a flowchart of a method for implementing simultaneous interpretation according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a simultaneous interpretation device according to an embodiment of the present application;
Fig. 6 is a block diagram of a hardware structure of a simultaneous interpretation device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In view of the fact that most current simultaneous interpretation schemes are difficult to adapt to complex and changing business requirements, the inventors studied a simultaneous interpretation scheme that can be connected with at least two conference hosts at the same time and can interpret the audio data collected by a plurality of conference hosts in parallel. The scheme provides simultaneous interpretation for multiple participants in a conference, shortens interpretation delay, offers interpretation in different languages to multiple participants, and improves the quality and efficiency of the conference's simultaneous interpretation service.
The methods provided by embodiments of the present application may be used in a number of general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor devices, distributed computing environments that include any of the above systems or devices, and the like.
The embodiment of the application provides a simultaneous interpretation scheme which can be applied to various simultaneous interpretation systems or conference interpretation systems, and can also be applied to various computer terminals or intelligent terminals, wherein an execution subject can be a processor or a server of the computer terminal or the intelligent terminal.
An optional system architecture for implementing a multi-person simultaneous interpretation service according to an embodiment of the present application is described below with reference to Fig. 1. As shown in Fig. 1, the simultaneous interpretation system may include: a simultaneous interpretation server, a cloud platform server, at least one simultaneous interpretation host, at least one conference host and at least one microphone;
wherein,
the simultaneous interpretation server can be connected with the cloud platform server (Fig. 2 illustrates a schematic diagram of the connection between the cloud platform server and the simultaneous interpretation server);
the cloud platform server can be connected with each simultaneous interpretation host;
each simultaneous interpretation host can be connected with one conference host, and each conference host can be connected with at least one microphone;
each conference host can collect target audio data of different speakers through each microphone connected with the conference host;
each conference host may pass the target audio data to the simultaneous interpretation host connected thereto;
each simultaneous interpretation host can transmit target audio data to the cloud platform server;
the cloud platform server can initiate a target access request to the simultaneous interpretation server after receiving the target audio data;
wherein the target access request may include target information.
For example, the target access request may be an HTTP access request that carries the special header "Upgrade: websocket", which indicates that the request asks to upgrade the connection from HTTP to WebSocket.
An HTTP access request is a request message sent from a client to a server.
The request line of an HTTP access request generally includes the method for requesting the resource, the identifier of the resource, and the protocol version used.
WebSocket is an application layer protocol defined on top of the TCP/IP protocol stack. The URI of a WebSocket server usually starts with "ws" or "wss"; the default port for ws is TCP port 80, and the default port for wss is 443.
The WebSocket protocol is defined in RFC 6455 and is divided into two parts: a handshake phase and a data communication phase.
WebSocket borrows the HTTP protocol to complete part of the handshake.
The initial handshake uses the HTTP protocol; after the handshake is completed, the connection switches to the WebSocket protocol and is completely separated from the HTTP protocol.
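As an illustration of the handshake phase described above, the following Python sketch builds an upgrade request of the kind the cloud platform server might send and derives the Sec-WebSocket-Accept value that RFC 6455 requires in the server's 101 response. The function names, host name and path are hypothetical and do not come from the application; only the header names, the version number 13 and the GUID are taken from RFC 6455.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Build the HTTP/1.1 request that asks the server to switch to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def compute_accept(key: str) -> str:
    """Return base64(SHA-1(key + GUID)), the value the server must echo back."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(key: str) -> str:
    """Build the 101 response that completes the handshake."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {compute_accept(key)}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"
```

After the 101 response is received and the accept value is verified, both sides stop speaking HTTP on this connection and exchange WebSocket frames instead.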
When the WebSocket connection is used for establishing communication, a client actively initiates a connection request, and a server passively monitors.
For example, in the embodiment of the present application, the cloud platform server initiates a connection request to the simultaneous interpretation server actively, and the simultaneous interpretation server monitors passively.
Once the connection is established, the communication is in "full duplex" mode, i.e., both the server and the client are free to send data at any time.
Therefore, the WebSocket connection is particularly suitable for service scenarios in which the server actively pushes real-time data to the client; the interaction is no longer limited to a "request-reply" mode.
The data is communicated in frames, which may carry text data or binary data. The WebSocket connection offers relatively high and stable data transmission efficiency.
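To make the frame-based exchange concrete, the following Python sketch encodes a single, unfragmented WebSocket frame following the layout in RFC 6455. It is only an illustration, not the application's implementation; the function name encode_frame and the idea of sending 320-byte PCM chunks are assumptions. Per RFC 6455, frames sent by the client (here, the cloud platform server) are masked with a 4-byte key, while frames sent by the server are not.

```python
import struct
from typing import Optional

OPCODE_TEXT = 0x1    # payload is UTF-8 text
OPCODE_BINARY = 0x2  # payload is arbitrary bytes, e.g. audio


def encode_frame(payload: bytes, opcode: int,
                 mask_key: Optional[bytes] = None) -> bytes:
    """Encode one final (FIN=1) frame; mask it when a 4-byte key is given."""
    first = 0x80 | opcode                      # FIN bit plus opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n < 126:                                # length fits in 7 bits
        header = struct.pack("!BB", first, mask_bit | n)
    elif n < 65536:                            # 16-bit extended length
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:                                      # 64-bit extended length
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if mask_key is None:
        return header + payload
    if len(mask_key) != 4:
        raise ValueError("mask_key must be exactly 4 bytes")
    # XOR every payload byte with the masking key, cycling over its 4 bytes.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked
```

A text frame would carry information such as a recognized sentence, while a binary frame would carry a chunk of audio samples.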
Therefore, the WebSocket connection can be established between the cloud platform server and the simultaneous interpretation server, so that data can be efficiently transmitted between the cloud platform server and the simultaneous interpretation server.
After receiving the target access request, the simultaneous interpretation server can analyze the target access request, and after analyzing the target access request, returns response information to the cloud platform server, and can establish a target connection channel with the cloud platform server;
after the cloud platform server establishes a target connection channel with the simultaneous interpretation server, the cloud platform server can further efficiently send target audio data to the simultaneous interpretation server through the target connection channel;
the simultaneous interpretation server can process the target audio data after receiving the target audio data, so that a translation result of the target audio data can be obtained, and after the translation result of the target audio data is obtained, the translation result corresponding to the target audio data can be sent to the cloud platform server;
after receiving the translation result of the target audio data, the cloud platform server may forward the translation result to the corresponding simultaneous interpretation host, wherein the translation result of the target audio data may include a text translation result and an audio translation result.
After receiving the translation result of the target audio data, the simultaneous interpretation host corresponding to the target audio data can display text translation results in the translation result of the target audio data and forward the audio translation result in the translation result of the target audio data to a conference host connected with the simultaneous interpretation host; so that the conference host corresponding to the target audio data can play the audio translation result among the translation results of the target audio data.
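The return path described above can be sketched as a small in-memory model in Python. All class names, the host_id field and the registry dictionary are hypothetical and serve only to illustrate the routing; the application does not specify how a translation result is matched to its simultaneous interpretation host.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranslationResult:
    host_id: str   # hypothetical key identifying the originating host
    text: str      # text translation result
    audio: bytes   # audio translation result


@dataclass
class ConferenceHost:
    played: List[bytes] = field(default_factory=list)

    def play(self, audio: bytes) -> None:
        self.played.append(audio)  # stand-in for real audio playback


@dataclass
class InterpretationHost:
    conference_host: ConferenceHost
    displayed: List[str] = field(default_factory=list)

    def on_result(self, result: TranslationResult) -> None:
        self.displayed.append(result.text)       # show the text translation
        self.conference_host.play(result.audio)  # forward audio for playback


class CloudPlatform:
    def __init__(self) -> None:
        self.hosts: Dict[str, InterpretationHost] = {}

    def register(self, host_id: str, host: InterpretationHost) -> None:
        self.hosts[host_id] = host

    def dispatch(self, result: TranslationResult) -> None:
        # Route the result only to the host that supplied the audio.
        self.hosts[result.host_id].on_result(result)
```

Because each result is delivered only to the host it belongs to, several conference rooms can be served in parallel without their translations mixing.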
In the actual application process, the cloud platform server can comprise a target component, and the target component may be a netty component.
The netty component is an asynchronous, event-driven network application framework that can be used to rapidly develop high-performance, highly reliable network IO programs, and is a widely used NIO framework.
The netty component provides a unified API for various transport types, covering both blocking and non-blocking sockets; its flexible and extensible event model allows a clear separation of concerns; and its thread model is highly customizable (a single thread, or one or more thread pools).
Compared with a traditional IO program, the netty component offers higher performance, higher throughput and lower latency, helps reduce resource consumption, and minimizes unnecessary memory copying.
The netty component also provides complete SSL/TLS and StartTLS support, which helps improve security.
Based on these advantages, the embodiment of the application can use the netty component to independently create threads with higher performance, higher security and easier management for the connections of a plurality of conference hosts.
Based on this, each simultaneous interpretation host can establish two sockets with the target component in the cloud platform server, wherein one socket is used for transmitting text information, and the other socket is used for transmitting the audio stream in the first target format.
Wherein,
the audio stream of the first target format may be an audio stream of a pcm format;
by Socket, it is meant an abstraction of endpoints that communicate bi-directionally between application processes on different hosts in a network. A socket is the end of the network where processes communicate and may provide a mechanism for application layer processes to exchange data using network protocols.
In terms of the position of a Socket, the Socket is connected with an application process in an upper mode, and a lower network protocol stack is an interface for an application program to communicate through a network protocol and an interface for the application program to interact with the network protocol stack.
For example, the simultaneous interpretation host can establish two TCP sockets with the Netty component in the cloud platform server, one for transmitting text information and the other for transmitting the PCM-format audio stream.
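The two-socket arrangement can be illustrated with a minimal sketch. It uses Python's standard `socket` module on the loopback interface rather than Netty, purely for brevity. The helper names `start_listener` and `send_over`, the OS-assigned ports, and the payloads are assumptions made for illustration and do not come from the application.

```python
import socket
import threading

def start_listener(received, label):
    """Open a loopback TCP listener that stores everything one client sends under `label`."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port (assumption)
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        chunks = []
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:  # empty read: the client closed its side
                    break
                chunks.append(data)
        received[label] = b"".join(chunks)
        srv.close()

    thread = threading.Thread(target=serve)
    thread.start()
    return port, thread

def send_over(port, payload):
    """Host side: connect one TCP socket and send a payload over it."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)

received = {}
text_port, text_thread = start_listener(received, "text")
audio_port, audio_thread = start_listener(received, "audio")

send_over(text_port, "hello".encode("utf-8"))  # socket 1: text information
send_over(audio_port, bytes(320))              # socket 2: raw PCM bytes (silence here)

text_thread.join()
audio_thread.join()
```

Keeping text and audio on separate connections means a large audio chunk never delays a short text message, which is one plausible reason for the split.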
Further, when a thread in the cloud platform server reads and writes data from the socket channels of each conference host, if the cloud platform server finds that no data is readable, the thread of the cloud platform server can process other tasks first.
For example, FIG. 3 illustrates a connection diagram of the Netty component.
As shown in FIG. 3, because the Netty IO thread NioEventLoop aggregates a multiplexer (Selector), it can handle hundreds to thousands of conference host connections simultaneously.
When a thread reads or writes data on a conference host Socket channel and finds that no data is available, the thread can perform other tasks.
In practice, a thread typically uses the idle time of non-blocking IO to perform IO operations on other channels, so that a single thread can manage multiple input and output channels.
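The non-blocking behaviour described above can be sketched with Python's standard `selectors` module, which plays a role comparable to the Selector inside NioEventLoop. This is an illustrative stand-in, not Netty code: `socket.socketpair()` substitutes for real conference host connections, and the names `serve_ready_channels`, `inbox` and `host-1` to `host-3` are hypothetical.

```python
import selectors
import socket

def serve_ready_channels(selector, inbox, timeout):
    """Read from every channel that has data; return the number of channels served."""
    handled = 0
    for key, _ in selector.select(timeout=timeout):
        data = key.fileobj.recv(4096)
        if data:
            inbox.setdefault(key.data, []).append(data)
            handled += 1
    return handled

selector = selectors.DefaultSelector()
inbox = {}
hosts = {}
for name in ("host-1", "host-2", "host-3"):
    local, remote = socket.socketpair()  # stands in for one conference host connection
    local.setblocking(False)
    selector.register(local, selectors.EVENT_READ, data=name)
    hosts[name] = (local, remote)

# Nothing has been sent yet: a zero timeout returns at once, leaving the thread free.
idle_polls = serve_ready_channels(selector, inbox, timeout=0)

# One host sends audio; the same single thread picks it up on the next pass.
hosts["host-2"][1].sendall(b"pcm-chunk")
busy_polls = serve_ready_channels(selector, inbox, timeout=1.0)

for local, remote in hosts.values():
    selector.unregister(local)
    local.close()
    remote.close()
selector.close()
```

A single thread watches all three channels here; adding more hosts only adds more registrations, not more threads.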
Further, as can be seen from the above description, in the system provided by the embodiment of the present application, the cloud platform server may send the target audio data to the simultaneous interpretation server through the target connection channel. This process is described below and may include:
the cloud platform server sends a simultaneous interpretation data packet corresponding to the target audio data to the simultaneous interpretation server through the target connection channel;
wherein the simultaneous interpretation data packet may include a start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data, and speech synthesis result configuration information of the target audio data.
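The contents of such a data packet can be sketched as follows. The application names the four kinds of information but does not specify a wire format, so the JSON encoding, the `StartFrame` class and every field name below (`frame_type`, `language`, `sample_rate_hz`, `tts_config`) are assumptions chosen for illustration only.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class StartFrame:
    language: str              # language of the target audio data
    sample_rate_hz: int        # audio sampling rate of the target audio data
    tts_config: dict           # speech synthesis result configuration (hypothetical shape)
    frame_type: str = "start"  # marks this message as the start frame

def encode_start_frame(frame):
    """Serialize the start frame to UTF-8 JSON bytes (assumed wire format)."""
    return json.dumps(asdict(frame), sort_keys=True).encode("utf-8")

def decode_start_frame(raw):
    """Parse bytes back into a StartFrame, rejecting anything that is not a start frame."""
    fields = json.loads(raw.decode("utf-8"))
    if fields.get("frame_type") != "start":
        raise ValueError("not a start frame")
    return StartFrame(**fields)

frame = StartFrame(language="zh", sample_rate_hz=16000, tts_config={"format": "mp3"})
wire_bytes = encode_start_frame(frame)
decoded = decode_start_frame(wire_bytes)
```

Sending the configuration once in a start frame lets the subsequent audio frames carry only raw samples.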
Further, after receiving the target audio data, the simultaneous interpretation server in the system provided by the embodiment of the application can process the target audio data and then send the translation result corresponding to the target audio data to the cloud platform server. This process is described below and may include the following steps:
Step S101, after receiving the simultaneous interpretation data packet corresponding to the target audio data, the simultaneous interpretation server identifies the start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data and the speech synthesis result configuration information of the target audio data;
Step S102, the simultaneous interpretation server can call back a text translation result of the target audio data and an audio translation result in a second target format according to the start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data and the speech synthesis result configuration information of the target audio data;
wherein the audio translation result in the second target format may be an audio translation result in MP3 format.
For example, after receiving the PCM-format audio stream sent by the conference host, the cloud platform server may send it to the simultaneous interpretation server. After processing by the simultaneous interpretation server, the text translation result of the PCM-format audio stream and the audio translation result in MP3 format can each be called back through overloaded onMessage methods.
Step S103, the simultaneous interpretation server may send the text translation result of the target audio data and the audio translation result in the second target format to the cloud platform server.
For example, the cloud platform server can transmit the text translation callback result and the MP3-format audio translation result through the TCP Socket connection; finally, the text is displayed in the software on the simultaneous interpretation host and the audio is played on the conference host.
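The idea of receiving text and audio results through overloaded onMessage callbacks can be approximated in Python with `functools.singledispatch`, which selects a handler by argument type. This is only an analogy to method overloading in Java. The `results` dictionary and the sample payloads are hypothetical, and the bytes shown are placeholders rather than valid MP3 data.

```python
from functools import singledispatch

results = {"text": [], "audio": []}

@singledispatch
def on_message(message):
    """Fallback for message types that have no registered handler."""
    raise TypeError(f"unsupported message type: {type(message).__name__}")

@on_message.register
def _(message: str):
    # Text translation result: would be shown on the simultaneous interpretation host.
    results["text"].append(message)

@on_message.register
def _(message: bytes):
    # Audio translation result: would be forwarded to the conference host for playback.
    results["audio"].append(message)

on_message("Hello, everyone")      # text callback
on_message(b"\x00\x01\x02\x03")    # binary callback (placeholder bytes)
```

Dispatching on type keeps the two result paths separate, mirroring the two sockets used between the simultaneous interpretation host and the cloud platform server.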
Further, the simultaneous interpretation server of the system provided by the embodiment of the application can also receive the target access request, parse it, return response information to the cloud platform server, and establish a target connection channel with the cloud platform server. This process is described below and may include the following steps:
Step S201, the simultaneous interpretation server receives a target access request.
The target access request may be an HTTP access request.
Step S202, the simultaneous interpretation server parses the target information in the target access request.
The target information may be the special "Upgrade: WebSocket" information contained in the HTTP access request, which indicates that the request asks to upgrade from HTTP to WebSocket.
Step S203, the simultaneous interpretation server upgrades the current connection mode with the cloud platform server to a target connection mode according to the target information.
The target connection mode may be a WebSocket connection mode.
Step S204, the simultaneous interpretation server returns response information about the connection mode with the cloud platform server to the cloud platform server, and establishes a target connection channel with the cloud platform server according to the target connection mode.
For example, as shown in FIG. 2, the cloud platform server initiates an HTTP access request to the simultaneous interpretation server. The request header includes the special "Upgrade: WebSocket" information, which indicates that the request asks to upgrade from HTTP to WebSocket. After parsing the HTTP access request, the simultaneous interpretation server may return response information to the cloud platform server and establish a WebSocket connection channel with the cloud platform server.
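The server side of this upgrade can be sketched as below. The accept-key computation (SHA-1 over the client key concatenated with a fixed GUID, then Base64) follows the WebSocket protocol specification, RFC 6455. The function names, the request path `/interpret` and the host `example.invalid` are hypothetical, and a production server would validate more headers than this sketch does.

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID defined by RFC 6455

def parse_headers(request):
    """Return the request headers as a dict with lower-cased names."""
    headers = {}
    for line in request.split("\r\n")[1:]:  # skip the request line
        if not line:                        # blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

def build_upgrade_response(request):
    """Return a 101 response if the request asks for a WebSocket upgrade, else None."""
    headers = parse_headers(request)
    if headers.get("upgrade", "").lower() != "websocket":
        return None
    key = headers["sec-websocket-key"]
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    accept = base64.b64encode(digest).decode("ascii")
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n\r\n"
    )

request = (
    "GET /interpret HTTP/1.1\r\n"
    "Host: example.invalid\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Sec-WebSocket-Version: 13\r\n\r\n"
)
response = build_upgrade_response(request)
plain_response = build_upgrade_response("GET / HTTP/1.1\r\nHost: example.invalid\r\n\r\n")
```

Once the 101 response has been sent, both sides keep the same TCP connection open and exchange WebSocket frames over it, which is what makes it suitable as a long-lived channel for streaming audio.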
Furthermore, the conference host of the system provided by the embodiment of the application can support interpretation in at least two different languages; each conference host can provide at least two language modes, and users can select a language mode according to personal preference.
Compared with the prior art, the scheme provided by the embodiment of the application has the following advantages: simultaneous interpretation is fast, and the delay of the interpretation result can be kept within 2 to 3 seconds; different serial ports and multithreading are used, so that multi-person conferences and the concurrent operation of up to 10 conference hosts can be supported; the interpretation result can also be displayed directly on the conference host; and interpretation in multiple languages is supported, with the language selectable through interface operations.
Next, a simultaneous interpretation method that can be applied to the simultaneous interpretation system described above is described with reference to fig. 4, and as shown in fig. 4, the method may include the following steps:
Step S301, collecting target audio data of a speaker by using a microphone.
Step S302, transmitting the target audio data to a conference host connected to the microphone.
Step S303, transmitting, by the conference host, the target audio data to the simultaneous interpretation host connected to the conference host.
Step S304, transmitting, by the simultaneous interpretation host, the target audio data to the cloud platform server connected to the simultaneous interpretation host.
Step S305, invoking the cloud platform server to initiate a target access request to the simultaneous interpretation server, where the target access request includes target information.
Step S306, receiving the target access request through the simultaneous interpretation server, parsing the target access request, returning response information to the cloud platform server, and establishing a target connection channel between the simultaneous interpretation server and the cloud platform server, so that the cloud platform server can send the target audio data to the simultaneous interpretation server through the target connection channel.
Step S307, processing the target audio data by using the simultaneous interpretation server, and sending the translation result produced by the simultaneous interpretation server for the target audio data to the cloud platform server.
Step S308, receiving, through the cloud platform server, the translation result of the target audio data sent by the simultaneous interpretation server, and sending the translation result to the simultaneous interpretation host corresponding to the target audio data.
Step S309, receiving the translation result of the target audio data by the simultaneous interpretation host corresponding to the target audio data, displaying the text translation result in the translation result of the target audio data, and forwarding the audio translation result in the translation result of the target audio data to the conference host connected thereto.
Step S310, playing the audio translation result in the translation result of the target audio data by using the conference host corresponding to the target audio data.
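The routing part of steps S304 to S309, in which each result is returned to the simultaneous interpretation host that supplied the audio, can be sketched in memory as follows. All class and function names are hypothetical, and `fake_interpretation_server` is a stand-in that performs no real translation: it only returns a placeholder text and placeholder audio bytes.

```python
from dataclasses import dataclass, field

@dataclass
class InterpretationHost:
    host_id: str
    displayed_text: list = field(default_factory=list)   # text shown on this host
    forwarded_audio: list = field(default_factory=list)  # audio passed on to its conference host

    def receive_result(self, text, audio):
        self.displayed_text.append(text)
        self.forwarded_audio.append(audio)

def fake_interpretation_server(pcm):
    """Placeholder for the simultaneous interpretation server (no real translation)."""
    return f"translated {len(pcm)} bytes", b"mp3:" + pcm[:4]

class CloudPlatform:
    def __init__(self):
        self.hosts = {}

    def register(self, host):
        self.hosts[host.host_id] = host

    def handle_audio(self, host_id, pcm):
        # Forward the audio, then return the result to the host it came from.
        text, audio = fake_interpretation_server(pcm)
        self.hosts[host_id].receive_result(text, audio)

platform = CloudPlatform()
host_a = InterpretationHost("A")
host_b = InterpretationHost("B")
platform.register(host_a)
platform.register(host_b)

platform.handle_audio("A", b"\x00\x01" * 8)  # 16 bytes of placeholder PCM from host A
```

Keying the result on the originating host identifier is what lets several conference hosts share one cloud platform server without receiving each other's translations.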
According to the technical scheme, the simultaneous interpretation system provided by the embodiment of the application can be connected with at least two conference hosts at the same time, can synchronously and simultaneously interpret audio data collected by a plurality of conference hosts, can effectively realize simultaneous interpretation service of multiple persons in a conference, can effectively shorten simultaneous interpretation time delay, can effectively provide simultaneous interpretation service of different languages for multiple persons, and effectively improves simultaneous interpretation service quality and efficiency of the conference.
The specific process flow of the simultaneous interpretation method may be described with reference to the simultaneous interpretation system section, and will not be described herein.
The simultaneous interpretation device provided by the embodiment of the present application is described below, and the simultaneous interpretation device described below and the simultaneous interpretation method described above may be referred to correspondingly.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a simultaneous interpretation device according to an embodiment of the present application.
As shown in fig. 5, the simultaneous interpretation apparatus may include:
a first processing unit 101 for collecting target audio data of a speaker using a microphone;
a second processing unit 102, configured to transmit the target audio data to a conference host connected to the microphone;
a third processing unit 103, configured to transmit, by the conference host, the target audio data to a simultaneous interpretation host connected to the conference host;
a fourth processing unit 104, configured to transmit, by the simultaneous interpretation host, the target audio data to a cloud platform server connected to the simultaneous interpretation host;
a fifth processing unit 105, configured to invoke the cloud platform server to initiate a target access request to the simultaneous interpretation server, where the target access request includes target information;
a sixth processing unit 106, configured to receive the target access request through the simultaneous interpretation server, parse the target access request, then return response information to the cloud platform server, and establish a target connection channel between the simultaneous interpretation server and the cloud platform server, so that the cloud platform server sends the target audio data to the simultaneous interpretation server through the target connection channel;
a seventh processing unit 107, configured to process the target audio data with the simultaneous interpretation server, and send the translation result produced by the simultaneous interpretation server for the target audio data to the cloud platform server;
an eighth processing unit 108, configured to receive, by using the cloud platform server, a translation result of the target audio data sent by the simultaneous interpretation server, and send the translation result to a simultaneous interpretation host corresponding to the target audio data;
a ninth processing unit 109, configured to receive a translation result of the target audio data by using a simultaneous interpretation host corresponding to the target audio data, display a text translation result in the translation result of the target audio data, and forward the audio translation result in the translation result of the target audio data to a conference host connected thereto;
a tenth processing unit 110, configured to play the audio translation result in the translation result of the target audio data by using the conference host corresponding to the target audio data.
As can be seen from the above technical solution, when simultaneous interpretation services in different languages need to be provided to multiple people in a multi-person conference, the embodiment of the present application may provide a simultaneous interpretation device, which may include: a simultaneous interpretation server, a cloud platform server, at least one simultaneous interpretation host, at least one conference host and at least one microphone. The simultaneous interpretation server can be connected with the cloud platform server; the cloud platform server can be connected with each simultaneous interpretation host; each simultaneous interpretation host may be connected to one conference host, and each conference host may be connected to at least one microphone. In actual application, each conference host can collect target audio data of different speakers through the microphones connected to it, and can then transmit the collected target audio data to the simultaneous interpretation host connected to it. After receiving the target audio data sent by its conference host, each simultaneous interpretation host can transmit the target audio data to the cloud platform server. After receiving the target audio data sent by a simultaneous interpretation host connected to it, the cloud platform server can initiate a target access request to the simultaneous interpretation server, where the target access request can include target information. Using the target information, the simultaneous interpretation server can parse the target access request when it is received, return response information to the cloud platform server, and establish a target connection channel with the cloud platform server, so that the cloud platform server can send the received target audio data to the simultaneous interpretation server through the target connection channel. After receiving the target audio data sent by the cloud platform server, the simultaneous interpretation server can process it to obtain the corresponding translation result and send that translation result to the cloud platform server. After receiving the translation result of the target audio data, the cloud platform server can send it to the corresponding simultaneous interpretation host. After receiving the translation result, the simultaneous interpretation host corresponding to the target audio data can display the text translation result on the simultaneous interpretation host and forward the audio translation result to the conference host connected to it, so that the conference host corresponding to the target audio data can play the audio translation result.
According to the technical scheme, the simultaneous interpretation device provided by the embodiment of the application can be connected with at least two conference hosts at the same time, can synchronously and simultaneously interpret audio data collected by a plurality of conference hosts, can effectively realize simultaneous interpretation service of multiple persons in a conference, can effectively shorten simultaneous interpretation time delay, can effectively provide simultaneous interpretation service of different languages for multiple persons, and effectively improves simultaneous interpretation service quality and efficiency of the conference.
The specific process flow of each unit included in the simultaneous interpretation device may be described with reference to the simultaneous interpretation method section, and will not be repeated here.
The simultaneous interpretation device provided by the embodiment of the application can be applied to simultaneous interpretation equipment, for example a terminal such as a mobile phone or a computer. Optionally, fig. 6 shows a block diagram of a hardware structure of the simultaneous interpretation equipment. Referring to fig. 6, the hardware structure of the simultaneous interpretation equipment may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4.
In the embodiment of the present application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete communication with each other through the communication bus 4.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, etc.;
the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program, and the processor can invoke the program stored in the memory, the program being used to implement each processing flow of the terminal in the simultaneous interpretation scheme.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being used to implement each processing flow of the terminal in the simultaneous interpretation scheme.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. The various embodiments may be combined with one another. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A simultaneous interpretation system, the system comprising: the system comprises a simultaneous interpretation server, a cloud platform server, at least one simultaneous interpretation host, at least one conference host and at least one microphone;
wherein,
the simultaneous interpretation server is connected with the cloud platform server;
the cloud platform server is connected with each simultaneous interpretation host;
Each simultaneous interpretation host is connected with one conference host, and each conference host is connected with at least one microphone;
each conference host collects target audio data of a speaker through each microphone connected with the conference host;
each conference host transmits the target audio data to a simultaneous interpretation host connected with the conference host;
each simultaneous interpretation host transmits the target audio data to the cloud platform server;
after receiving the target audio data, the cloud platform server initiates a target access request to the simultaneous interpretation server, wherein the target access request comprises target information;
the simultaneous interpretation server receives the target access request, analyzes the target access request, returns response information to the cloud platform server, and establishes a target connection channel with the cloud platform server;
the cloud platform server sends the target audio data to the simultaneous interpretation server through the target connection channel;
after receiving the target audio data, the simultaneous interpretation server processes the target audio data and then sends a translation result corresponding to the target audio data to the cloud platform server;
After receiving the translation result of the target audio data, the cloud platform server sends the translation result to a simultaneous interpretation host corresponding to the translation result;
after receiving the translation result of the target audio data, the simultaneous interpretation host corresponding to the target audio data displays the text translation result in the translation result of the target audio data, and forwards the audio translation result in the translation result of the target audio data to a conference host connected with the simultaneous interpretation host;
and the conference host corresponding to the target audio data plays the audio translation result in the translation results of the target audio data.
2. The system of claim 1, wherein the cloud platform server includes a target component, based on which each of the simultaneous interpretation hosts establishes two sockets with the target component in the cloud platform server, wherein one socket is used for transmitting text information and the other socket is used for transmitting an audio stream in a first target format.
3. The system of claim 1, wherein,
when a thread in the cloud platform server reads or writes data on the socket channel of each conference host, if no data is found to be readable, the thread processes other tasks.
4. The system of claim 1, wherein the process of the cloud platform server sending the target audio data to the simultaneous interpretation server over the target connection channel comprises:
the cloud platform server sends simultaneous interpretation data packets corresponding to the target audio data to the simultaneous interpretation server through the target connection channel;
the simultaneous interpretation data packet comprises a start frame of the target audio data, languages of the target audio data, an audio sampling rate of the target audio data and voice synthesis result configuration information of the target audio data.
5. The system of claim 4, wherein the process of processing the target audio data after the simultaneous interpretation server receives the target audio data and transmitting the translation result corresponding to the target audio data to the cloud platform server comprises:
after receiving the simultaneous interpretation data packet corresponding to the target audio data, the simultaneous interpretation server identifies a start frame of the target audio data, languages of the target audio data, an audio sampling rate of the target audio data and configuration information of a voice synthesis result of the target audio data;
the simultaneous interpretation server calls back a text translation result of the target audio data and an audio translation result in a second target format according to the start frame of the target audio data, the language of the target audio data, the audio sampling rate of the target audio data and the voice synthesis result configuration information of the target audio data;
and the simultaneous interpretation server sends the text translation result of the target audio data and the audio translation result in the second target format to the cloud platform server.
6. The system of claim 1, wherein the process of the simultaneous interpretation server receiving the target access request and analyzing the target access request, returning response information to the cloud platform server and establishing a target connection channel with the cloud platform server, comprises:
the simultaneous interpretation server receives the target access request;
the simultaneous interpretation server analyzes target information in the target access request;
the simultaneous interpretation server upgrades the current connection mode with the cloud platform server into a target connection mode according to the target information;
and the simultaneous interpretation server returns response information about the connection mode with the cloud platform server to the cloud platform server, and establishes a target connection channel with the cloud platform server according to the target connection mode.
7. The system of any of claims 1-6, wherein the conference host supports at least two different language interpretations, the conference host providing at least two language modes for selection by a user.
8. A simultaneous interpretation method, applied to the simultaneous interpretation system as claimed in any one of claims 1 to 7, the method comprising:
collecting target audio data of a speaker using a microphone;
transmitting the target audio data to a conference host connected to the microphone;
transmitting the target audio data to a simultaneous interpretation host connected with the conference host through the conference host;
transmitting the target audio data to a cloud platform server connected with the simultaneous interpretation host through the simultaneous interpretation host;
invoking the cloud platform server to initiate a target access request to the simultaneous interpretation server, wherein the target access request comprises target information;
receiving the target access request through the simultaneous interpretation server, analyzing the target access request, returning response information to the cloud platform server, and establishing a target connection channel for the simultaneous interpretation server and the cloud platform server so that the cloud platform server can send the target audio data to the simultaneous interpretation server through the target connection channel;
processing the target audio data by using the simultaneous interpretation server, and sending the translation result produced by the simultaneous interpretation server for the target audio data to the cloud platform server;
receiving a translation result of the target audio data sent by the simultaneous interpretation server through the cloud platform server, and sending the translation result to a simultaneous interpretation host corresponding to the target audio data;
receiving a translation result of the target audio data by using a simultaneous interpretation host corresponding to the target audio data, displaying a text translation result in the translation result of the target audio data, and forwarding the audio translation result in the translation result of the target audio data to a conference host connected thereto;
and playing the audio translation result in the translation results of the target audio data by utilizing the conference host corresponding to the target audio data.
9. A simultaneous interpretation device, comprising: one or more processors, and memory;
stored in the memory are computer readable instructions which, when executed by the one or more processors, implement the steps of the simultaneous interpretation method as claimed in claim 8.
10. A readable storage medium, characterized by: the readable storage medium has stored therein computer readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the simultaneous interpretation method as claimed in claim 8.
CN202310919856.8A 2023-07-25 2023-07-25 Simultaneous interpretation system, method, apparatus, and readable storage medium Pending CN117119108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310919856.8A CN117119108A (en) 2023-07-25 2023-07-25 Simultaneous interpretation system, method, apparatus, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310919856.8A CN117119108A (en) 2023-07-25 2023-07-25 Simultaneous interpretation system, method, apparatus, and readable storage medium

Publications (1)

Publication Number Publication Date
CN117119108A true CN117119108A (en) 2023-11-24

Family

ID=88804630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310919856.8A Pending CN117119108A (en) 2023-07-25 2023-07-25 Simultaneous interpretation system, method, apparatus, and readable storage medium

Country Status (1)

Country Link
CN (1) CN117119108A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination