
WO2023219556A1 - A system and method to manage a plurality of language audio streams - Google Patents


Info

Publication number
WO2023219556A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
stream
streams
user device
translated
Application number
PCT/SG2022/050321
Other languages
French (fr)
Inventor
Peng SONG
Original Assignee
Song Peng
Application filed by Song Peng filed Critical Song Peng
Priority to PCT/SG2022/050321 priority Critical patent/WO2023219556A1/en
Publication of WO2023219556A1 publication Critical patent/WO2023219556A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/165Management of the audio stream, e.g. setting of volume, audio stream path
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • H04N21/4852End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • H04N21/4856End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/152Multipoint control units therefor
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4396Processing of audio elementary streams by muting the audio signal

Definitions

  • the user device 320 of any of the examples herein may be a laptop computer or a desktop computer, being configured with a capability to access the internet and/or download and operate web browsers, while being connectable to the communications network 350.
  • the user device 320 should be able to run the graphical user interfaces of methods 100/200 when the methods are being carried out.
  • the user device 320 includes the following components in electronic communication via a bus 411: non-volatile memory 403; random access memory (RAM); and a transceiver component 405 that includes a transceiver(s).
  • software 409 is stored in the non-volatile memory 403 to enable the user device 320 to operate a web browser. Once the user device 320 is able to operate the web browser, plug-ins can then be enabled to enable the carrying out of the methods 100/200.
  • FIG 4 is not intended to be a hardware diagram; thus many of the components depicted in FIG 4 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 4.
  • the central server 360 is a hardware and software suite comprising preprogrammed logic, algorithms and other means of processing incoming information, in order to send out information which is useful to the objective of the system 300 in which the central server 360 resides.
  • hardware which can be used by the central server 360 will be described briefly herein.
  • the central server 360 can broadly comprise a database which stores pertinent information, and processes information packets from the user devices 320.
  • the central administrator, which provides the translation service for online meetings/events, runs the central server 360.
  • the central server 360 can be operated from a commercially hosted service such as, for example, Amazon Web Services, Facebook Cloud and so forth.
  • the central server 360 is represented in a form as shown in FIG 5.
  • the central server 360 is in communication with a communications network 350, as shown in FIG 3.
  • the central server 360 is able to communicate with the user devices 320, the content generators 380 and/or other processing devices, as required, over the communications network 350.
  • the user devices 320 communicate via a direct communication channel (LAN or WIFI) with the central server 360.
  • the components of the central server 360 can be configured in a variety of ways.
  • the components can be implemented entirely by software to be executed on standard computer server hardware, which may comprise one hardware unit or different computer hardware units distributed over various locations, some of which may require the communications network 350 for communication.
  • the central server 360 is a commercially available computer system based on a 32 bit or a 64 bit Intel architecture, and the processes and/or methods executed or performed by the central server 360 are implemented in the form of programming instructions of one or more software components or modules 502 stored on non-volatile computer-readable storage 503 associated with the central server 360.
  • the server 360 includes at least one or more of the following standard, commercially available computer components, all interconnected by a bus 505: random access memory (RAM); and a central processing unit (CPU).
  • FIG 5 is not intended to be a hardware diagram; thus many of the components depicted in FIG 5 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 5.
  • each content generator 380 should be capable of providing at least one audio recording stream, so as to ensure that there is content to be translated.
  • each content generator 380 should use a device which is configured to carry out at least the following tasks:
  • the system 300 enables the methods 100/200 to be carried out in a desired manner as described in earlier paragraphs. However, it should also be noted that the methods 100/200 need not be carried out only using the system 300, although the system 300 is able to enable the same advantages.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a system and method to manage a plurality of language audio streams. The system and method broadly provide users with an interface to optimise the output from a plurality of language audio streams. Optimising the output includes being able to calibrate respective volume levels of the plurality of language audio streams. This provides advantages such as, for example, discerning verbal cues of a speaker/performer, auditory hearing comfort for the user, and reduction of destructive interference between the multiple audio streams, leading to better understanding by the user.

Description

A SYSTEM AND METHOD TO MANAGE A PLURALITY OF LANGUAGE AUDIO STREAMS
Field of the Invention
The present invention relates to a system and method to manage a plurality of language audio streams, particularly during a live translation instance for a video stream.
Background
The world has recently embraced online meetings and events as a viable alternative to in-person meetings and events. The Covid-19 pandemic has substantially hastened the digitization of events and meetings, and correspondingly, the audience reach is no longer constrained by geography, with language typically being the only barrier to effective communication.
Currently, many online meetings and event platforms are based on WebRTC protocols to enable browser-based versions of the platforms. However, many of such platforms do not include built-in simultaneous translation capabilities to facilitate multilingual communication.
In many instances, machine implemented translations are less desirable than human translations, for example, in relation to contextual nuances. Thus, the need for human simultaneous interpretation for online meetings has increased substantially.
However, a system and method to provide a desirable experience and functionality for users when delivering human translations on WebRTC platforms is currently lacking.
Summary
In a first aspect, there is provided a system to manage a plurality of language audio streams, the system comprising at least one data processing device configured to: transmit, from a content generator, an original language audio stream; receive, at the user device, the original language audio stream and at least two translated language streams; activate, at the user device, a browser extension; select, at the user device, an “on” state for a first translated language stream; toggle, at the user device, an “off” state for at least a second translated language stream; and adjust, at the user device, a volume level of the original language stream and the first translated language stream.
In a second aspect, there is provided a data processor implemented method to manage a plurality of language audio streams, the method comprising: transmitting, from a content generator, an original language audio stream; receiving, at the user device, the original language audio stream and at least two translated language streams; activating, at the user device, a browser extension; selecting, at the user device, an “on” state for a first translated language stream; toggling, at the user device, an “off” state for at least a second translated language stream; and adjusting, at the user device, a volume level of the original language stream and the first translated language stream.
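The selecting, toggling and volume-adjusting steps recited in the two aspects above can be sketched as plain stream-state logic. The following is a minimal illustrative sketch, not the patented implementation; the class name, method names and channel labels are all assumptions introduced for illustration.

```javascript
// Hypothetical sketch of the stream-selection logic recited above; all
// identifiers are illustrative, not taken from the patent.
class LanguageStreamManager {
  constructor(translatedChannels) {
    // Every translated language stream starts in the "off" state.
    this.states = new Map(translatedChannels.map((c) => [c, "off"]));
    // Independent volume levels for the floor stream and the active translation.
    this.volumes = { floor: 1.0, translation: 1.0 };
  }

  // Selecting a channel switches it "on" and toggles every other channel
  // "off", so at most one translated stream is consumable at a time.
  select(channel) {
    if (!this.states.has(channel)) throw new Error(`unknown channel: ${channel}`);
    for (const c of this.states.keys()) {
      this.states.set(c, c === channel ? "on" : "off");
    }
  }

  // Channels in the "off" state remain available for selection.
  selectable() {
    return [...this.states].filter(([, s]) => s === "off").map(([c]) => c);
  }

  setVolume(stream, level) {
    // Clamp to a 0..1 range, as a volume slider in a GUI would.
    this.volumes[stream] = Math.min(1, Math.max(0, level));
  }
}
```

In a browser extension these states and levels would be bound to the graphical user interface described later; here the logic is kept pure so the on/off invariant is easy to see.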
It will be appreciated that the broad forms of the invention and their respective features can be used in conjunction, interchangeably and/or independently, and reference to separate broad forms is not intended to be limiting.
Brief Description of the Drawings
A non-limiting example of the present invention will now be described with reference to the accompanying drawings, in which:
FIG 1 provides a schematic view of a first embodiment of the present invention;
FIG 2 provides a schematic view of a second embodiment of the present invention;
FIG 3 provides a schematic view of an example of a system for managing a plurality of language audio streams;
FIG 4 is a schematic diagram showing components of an example user device of the system shown in FIG 3; and
FIG 5 is a schematic diagram showing components of an example central server shown in FIG 3.
Detailed Description
The present invention provides a system and method to manage a plurality of language audio streams. The system and method broadly provide users with an interface to optimise the output from a plurality of language audio streams. Optimising the output includes being able to calibrate respective volume levels of the plurality of language audio streams. This provides advantages such as, for example, discerning verbal cues of a speaker/performer, auditory hearing comfort for the user, and reduction of destructive interference between the multiple audio streams, leading to better understanding by the user.
In the present method, browser extension (add-on) based approaches are implemented to enable any online meeting and event platform on desktop and laptop computers (either Windows or Mac) with on-demand delivery of simultaneous interpretation by human interpreters. There are typically two types of video conference or live streaming platforms that can be enabled in browsers (Chrome, Firefox or Microsoft Edge, etc.), namely those which rely on:
1) a hidden media stream that cannot be controlled, such as Zoom, Webex, and so forth; and
2) an exposed media stream that can be controlled, such as Teams, Google Meet, YouTube Live, and so forth.
The method and system of the present invention are applicable to both types of platforms, correspondingly offering the same advantages.
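The two platform categories above lend themselves to a simple lookup that decides which handling path applies. The mapping and function name below are illustrative assumptions for a sketch, grounded only in the platform examples named in this description.

```javascript
// Illustrative mapping of the two platform categories described above; the
// platform list and function name are assumptions, not part of the patent.
const PLATFORM_STREAM_TYPES = {
  zoom: "hidden",          // media stream cannot be controlled in the browser
  webex: "hidden",
  teams: "exposed",        // media stream can be controlled in the browser
  "google meet": "exposed",
  "youtube live": "exposed",
};

// Returns which handling path a given platform would take: "hidden" (the
// first embodiment) or "exposed" (the second embodiment). Unknown platforms
// default to the hidden path, since opening a separate translation tab works
// for both types.
function streamTypeFor(platform) {
  return PLATFORM_STREAM_TYPES[platform.toLowerCase()] ?? "hidden";
}
```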
An example of a broad overview of a first embodiment of the present invention, which is applicable to the instance of hidden media streams from video conference/live streaming platforms, is shown in FIG 1.
A method 100 for managing a plurality of audio language streams is provided. As a language stream of the meeting/event is hidden, there is provided a browser extension (add-on) to open a new tab with interpretation audio streams.
At step 105, there are various language streams for consumption. In this example, the original language (floor) stream and three other language streams are available for selection by the user. Even though it is envisaged that the translation for each language stream is carried out by human translators, the present invention does not preclude the use of machine translators for any of the language streams.
Subsequently, at step 110, the user is able to toggle a desired language stream between an “on” state and an “off” state. In the “on” state, that language stream will be consumable by the user. In the “off” state, that language stream will not be consumable by the user. However, the state of the language stream is dependent on a selection of the user at step 115. At step 115, the desired language stream in the “on” state will be the language stream being consumed and not available for selection, while the other language streams in the “off” state which are not being consumed are then available for selection. FIG 1 shows the plurality of language streams being “channel 1”, “channel 2” and “channel 3”. It should be appreciated that the plurality of language streams can be indicated differently as well.
It should be appreciated that the method 100 enables seamless real-time switching of channels between floor (original meeting audio) and a plurality of translations.
During use of the method 100, for example, a translation streaming URL can be provided to a browser extension, which enables opening of a new tab. This new tab can include a graphical user interface that is configured to provide, for example:
- an audio volume display;
- a channel/language selection list; and
- an exit selector.
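The tab-opening step described above can be sketched as follows. The URL shape, query parameter and function names are hypothetical assumptions; the sketch only illustrates how a browser extension could pass a translation streaming URL and a chosen channel into a new tab.

```javascript
// Hypothetical sketch of how a browser extension could open the translation
// tab of method 100. The "channel" query parameter is an illustrative choice.
function buildTranslationTabUrl(streamingUrl, channel) {
  const url = new URL(streamingUrl);
  url.searchParams.set("channel", channel);
  return url.toString();
}

function openTranslationTab(streamingUrl, channel) {
  const tabUrl = buildTranslationTabUrl(streamingUrl, channel);
  // In a real extension this would run in the background script; the guard
  // lets the pure URL logic above be exercised outside a browser as well.
  if (typeof chrome !== "undefined" && chrome.tabs) {
    chrome.tabs.create({ url: tabUrl });
  }
  return tabUrl;
}
```

The graphical user interface listed above (volume display, channel list, exit selector) would then live in the page loaded into that tab.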
Once the channel/language is selected from the graphical user interface in the tab, the user then returns to the meeting/event video, and proceeds to appreciate the meeting/event in the desired language.
For the purpose of illustration, it is assumed that the method 100 can be performed at least in part amongst one or more data processing devices such as, for example, a laptop, a desktop computer, a central server, or the like. Typically, the central server will be configured to carry out a majority of the processing tasks given the processing load required by the method 100.
Referring to FIG 2, there is shown an example of a second embodiment of the present invention which is applicable to the instance of exposed media streams from video conference/live streaming platforms.
A method 200 for managing a plurality of audio language streams is provided. As a language stream of the meeting/event is exposed, there is also provided a browser extension (add-on) to open a new tab with translation audio streams, and the browser extension being configured to provide a graphical user interface.
At step 205, there are various language streams available for consumption. In this example, the original language (floor) stream and three other language streams are available for selection by the user. Even though it is envisaged that the translation for each language stream is carried out by human translators, the present invention does not preclude the use of machine translators for any of the language streams.
Subsequently, at step 210, the user is able to toggle a desired language stream between an “on” state and an “off” state. In the “on” state, that language stream will be consumable by the user. In the “off” state, that language stream will not be consumable by the user. However, the state of the language stream is dependent on a selection of the user at step 215.
At step 215, the desired language stream in the “on” state will be the language stream being consumed and not available for selection, while the other language streams in the “off” state which are not being consumed are then available for selection. FIG 2 shows the plurality of language streams being “channel 1”, “channel 2” and “channel 3”. It should be appreciated that the plurality of language streams can be indicated differently as well.
At step 220, the exposed language streams enable volume control of the respective streams. This allows the user to listen to an actual language of the meeting/event together with a desired translation language, each at a different respective volume level.
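The concurrent-volume behaviour of step 220 can be sketched as two independent levels, one for the floor audio and one for the chosen translation. This is an illustrative sketch with assumed names; in a browser the resulting values would drive the `volume` property of two audio elements (or two Web Audio gain nodes), but the logic is kept pure here.

```javascript
// Illustrative sketch of step 220: independent, clamped volume levels for
// the floor stream and the selected translation stream (names assumed).
function mixLevels({ floor, translation }) {
  const clamp = (v) => Math.min(1, Math.max(0, v));
  return { floor: clamp(floor), translation: clamp(translation) };
}

// For example, a user may keep the floor audible at a low level to catch the
// speaker's verbal cues while listening to the translation at full volume.
const levels = mixLevels({ floor: 0.2, translation: 1.0 });
```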
It should be appreciated that the method 200 enables seamless concurrent real-time consumption of channels between floor (original meeting audio) and a plurality of translations.
During use of the method 200, for example, translated language streams are transmitted (via WebRTC) separately from the meeting stream, and shown in a semitransparent movable pop-up frame window floating on top of the meeting/event video, with a graphical user interface that is configured to provide, for example:
- a language/channel selection list;
- language audio penetration functionality;
- volume control functionality for all language streams;
- refresh connector toggle; and
- an exit selector.
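The state behind the pop-up frame’s controls listed above can be sketched as follows. The shape and names (PopupState, connectorNonce, and so forth) are illustrative assumptions, not from the source: the “penetration” flag models floor audio passing through alongside the translation, and the refresh connector toggle is modelled as a counter that a hypothetical WebRTC layer would watch in order to tear down and re-establish its streams.

```typescript
interface PopupState {
  channels: string[];              // language/channel selection list
  activeChannel: string | null;    // translated stream currently consumed, if any
  penetration: boolean;            // floor audio audible together with translation
  volumes: Record<string, number>; // per-stream volume level, 0.0 to 1.0
  connectorNonce: number;          // bumped by the refresh connector toggle
  exited: boolean;                 // set by the exit selector
}

function initialPopupState(channels: string[]): PopupState {
  // Every stream, including the floor, starts at full volume.
  const volumes: Record<string, number> = { floor: 1.0 };
  for (const c of channels) volumes[c] = 1.0;
  return {
    channels,
    activeChannel: null,
    penetration: false,
    volumes,
    connectorNonce: 0,
    exited: false,
  };
}

// The refresh connector toggle bumps the nonce without touching other state.
function refreshConnector(s: PopupState): PopupState {
  return { ...s, connectorNonce: s.connectorNonce + 1 };
}
```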
Once the channel/language is selected from the graphical user interface in the tab, the user then returns to the meeting/event video and proceeds to appreciate the meeting/event in the desired language and, if desired, concurrently with the original language of the meeting/event. Consuming the meeting/event video in the original language concurrently with a translated language provides advantages such as, for example, discerning the verbal cues of a speaker/performer, auditory comfort for the user, and reduction of destructive interference between the multiple audio streams, leading to better understanding by the user.
It should be appreciated that the flexibility provided by the volume control of the respective language audio streams is a long-desired feature in relation to online meeting/event videos. This capability contributes substantially towards the appreciation of online meeting/event videos, and can also enhance user engagement. This is important in an era when online live selling and online auctions are growing in popularity. During online live selling and online auctions, enhanced user engagement leads directly to the beneficial consequence of improved sales and increased revenue.
For the purpose of illustration, it is assumed that the method 200 can be performed at least in part amongst one or more data processing devices such as, for example, a laptop, a desktop computer, a central server, or the like. Typically, the central server will be configured to carry out a majority of the processing tasks given the processing load required by the method 200.
An example of a system 300 to manage a plurality of language audio streams will now be described with reference to FIG 3.
In this example, the system 300 includes one or more user devices 320, a communications network 350, one or more content generators 380 (for example, a broadcaster, a live auctioneer, an event provider and so forth, who need not be based at the same physical location), and a central server 360 (e.g. a central administrator providing a translation service for online meetings/events). The one or more user devices 320 and the one or more content generators 380 communicate with the central server 360 via the communications network 350. The communications network 350 can be of any appropriate form, such as the Internet and/or a number of local area networks (LANs). Further details of the respective components of the system 300 will be provided in a following portion of the description. It will be appreciated that the configuration shown in FIG 3 is for the purpose of illustration only and is not limiting.
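The topology of FIG 3 can be sketched as a relay: content generators publish language streams to the central server, and user devices subscribe to the channels they want. The in-memory CentralRelay class below is a hypothetical stand-in for the central server 360; a real deployment would route chunks over the communications network 350 (for example via WebRTC, as mentioned for the method 200), and the names used here are assumptions.

```typescript
type StreamChunk = { channel: string; payload: string };

class CentralRelay {
  // Maps each channel name to the callbacks of its subscribed user devices.
  private subscribers: Map<string, Array<(chunk: StreamChunk) => void>> =
    new Map();

  // A user device registers interest in one channel (floor or a translation).
  subscribe(channel: string, onChunk: (chunk: StreamChunk) => void): void {
    const list = this.subscribers.get(channel) ?? [];
    list.push(onChunk);
    this.subscribers.set(channel, list);
  }

  // A content generator (or translator) publishes audio for its channel;
  // the return value is the number of user devices that received the chunk.
  publish(chunk: StreamChunk): number {
    const list = this.subscribers.get(chunk.channel) ?? [];
    for (const cb of list) cb(chunk);
    return list.length;
  }
}
```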
User Device 320
The user device 320 of any of the examples herein may be a laptop computer or a desktop computer, being configured with a capability to access the internet and/or download and operate web browsers, while being connectable to the communications network 350. The user device 320 should be able to run the graphical user interfaces of methods 100/200 when the methods are being carried out.
An exemplary embodiment of the user device 320 is shown in FIG 4. As shown, the user device 320 includes the following components in electronic communication via a bus 411:
1. a display 402;
2. non-volatile memory 403;
3. random access memory ("RAM") 404;
4. data processor(s) 401;
5. a transceiver component 405 that includes one or more transceivers;
6. an image capture module 410; and
7. input controls 407.
In some embodiments, software 409 is stored in the non-volatile memory 403 to enable the user device 320 to operate a web browser. Once the user device 320 is able to operate the web browser, plug-ins can then be installed to enable the carrying out of the methods 100/200.
Although the components depicted in FIG 4 represent physical components, FIG 4 is not intended to be a hardware diagram; thus many of the components depicted in FIG 4 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 4.
Central Server 360
The central server 360 is a hardware and software suite comprising pre-programmed logic, algorithms and other means of processing incoming information, in order to send out information which is useful to the objective of the system 300 in which the central server 360 resides. For the sake of illustration, hardware which can be used by the central server 360 will be described briefly herein.
The central server 360 can broadly comprise a database which stores pertinent information, and processes information packets from the user devices 320. In some embodiments, the central administrator providing a translation service for online meetings/events runs the central server 360. The central server 360 can be operated from a commercially hosted service such as, for example, Amazon Web Services, Alibaba Cloud and so forth.
In one possible embodiment, the central server 360 is represented in a form as shown in FIG 5.
The central server 360 is in communication with a communications network 350, as shown in FIG 3. The central server 360 is able to communicate with the user devices 320, the content generators 380 and/or other processing devices, as required, over the communications network 350. In some instances, the user devices 320 communicate via a direct communication channel (LAN or Wi-Fi) with the central server 360.
The components of the central server 360 can be configured in a variety of ways. The components can be implemented entirely by software to be executed on standard computer server hardware, which may comprise one hardware unit or different computer hardware units distributed over various locations, some of which may require the communications network 350 for communication.
In the example shown in FIG 5, the central server 360 is a commercially available computer system based on a 32-bit or a 64-bit Intel architecture, and the processes and/or methods executed or performed by the central server 360 are implemented in the form of programming instructions of one or more software components or modules 502 stored on non-volatile computer-readable storage 503 associated with the central server 360.
The server 360 includes at least the following standard, commercially available computer components, all interconnected by a bus 505:
1. random access memory (RAM) 506; and
2. at least one central processing unit (CPU) 507.
Although the components depicted in FIG 5 represent physical components, FIG 5 is not intended to be a hardware diagram; thus many of the components depicted in FIG 5 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 5.
Content Generator 380
It should be appreciated that each content generator 380 should be capable of providing at least one audio recording stream, so as to ensure that there is content to be translated. Typically, each content generator 380 should use a device which is configured to carry out at least the following tasks:
- record an audio stream; and
- connect to the communications network 350.
It should be appreciated that a capability of capturing a video stream can be optional for each content generator 380.
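The minimum capability check implied above can be sketched as follows: a content generator’s device must record an audio stream and connect to the communications network 350, while video capture remains optional. The DeviceCapabilities shape and function name are illustrative assumptions, not from the source.

```typescript
interface DeviceCapabilities {
  audioCapture: boolean;   // can record an audio stream
  networkAccess: boolean;  // can connect to the communications network 350
  videoCapture?: boolean;  // optional, per the note above
}

// A device qualifies as a content generator 380 when it satisfies both
// mandatory tasks; video capture does not affect eligibility.
function canActAsContentGenerator(caps: DeviceCapabilities): boolean {
  return caps.audioCapture && caps.networkAccess;
}
```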
It should be appreciated that the system 300 enables the methods 100/200 to be carried out in a desired manner as described in earlier paragraphs. However, it should also be noted that the methods 100/200 need not be carried out only using the system 300. Other systems may also be used to enable the methods 100/200.
As also mentioned for the method 200, the system 300 enables the same advantages. For example, the flexibility provided by the volume control of the respective language audio streams is a long-desired feature in relation to online meeting/event videos. This capability contributes substantially towards the appreciation of online meeting/event videos, and can also enhance user engagement. This is critical in an era when online live selling and online auctions are growing in popularity. During online live selling and online auctions, enhanced user engagement leads directly to the beneficial consequence of improved sales and increased revenue.
Throughout this specification and claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated integer or group of integers or steps but not the exclusion of any other integer or group of integers.
Persons skilled in the art will appreciate that numerous variations and modifications will become apparent. All such variations and modifications which become apparent to persons skilled in the art should be considered to fall within the spirit and scope of the invention as broadly described hereinbefore.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:
1. A system to manage a plurality of language audio streams, the system comprising at least one data processing device configured to:
transmit, from a content generator, an original language audio stream;
receive, at a user device, the original language audio stream and at least two translated language streams;
activate, at the user device, a browser extension;
select, at the user device, an “on” state for a first translated language stream;
toggle, at the user device, an “off” state for at least a second translated language stream; and
adjust, at the user device, a volume level of the original language stream and the first translated language stream.
2. The system of claim 1, wherein the original language audio stream is obtained from a video stream.
3. The system of claim 2, wherein the at least two translated language streams are transmitted separately from the video stream.
4. The system of any of claims 1-3, wherein the browser extension is configured to provide a graphical user interface.
5. The system of any of claims 1 to 4, wherein the at least two translated language streams are human generated.
6. The system of any of claims 1 to 4, wherein the at least two translated language streams are machine generated.
7. A data processor implemented method to manage a plurality of language audio streams, the method comprising:
transmitting, from a content generator, an original language audio stream;
receiving, at a user device, the original language audio stream and at least two translated language streams;
activating, at the user device, a browser extension;
selecting, at the user device, an “on” state for a first translated language stream;
toggling, at the user device, an “off” state for at least a second translated language stream; and
adjusting, at the user device, a volume level of the original language stream and the first translated language stream.
8. The method of claim 7, wherein the original language audio stream is obtained from a video stream.
9. The method of claim 8, wherein the at least two translated language streams are transmitted separately from the video stream.
10. The method of any of claims 7-9, wherein the browser extension is configured to provide a graphical user interface.
11. The method of any of claims 7 to 10, wherein the at least two translated language streams are human generated.
12. The method of any of claims 7 to 10, wherein the at least two translated language streams are machine generated.
PCT/SG2022/050321 2022-05-13 2022-05-13 A system and method to manage a plurality of language audio streams WO2023219556A1 (en)
Publication Number: WO2023219556A1; Publication Date: 2023-11-16.