WO2023219556A1 - A system and method to manage a plurality of language audio streams - Google Patents
- Publication number
- WO2023219556A1 (PCT/SG2022/050321)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8106—Monomedia components thereof involving special audio data, e.g. different tracks for different languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4852—End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4856—End-user interface for client configuration for language selection, e.g. for the menu or subtitles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/152—Multipoint control units therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4396—Processing of audio elementary streams by muting the audio signal
Definitions
- The system 300 includes one or more user devices 320, a communications network 350, one or more content generators 380 (for example, a broadcaster, a live auctioneer, an event provider and so forth, who may not be based at the same physical location), and a central server 360 (e.g. a central administrator providing a translation service for online meetings/events).
- The one or more user devices 320 and the one or more content generators 380 communicate with the central server 360 via the communications network 350.
- The communications network 350 can be of any appropriate form, such as the Internet and/or a number of local area networks (LANs). Further details of respective components of the system 300 are provided in a following portion of the description. It will be appreciated that the configuration shown in FIG 3 is not limiting and is for the purpose of illustration only.
- The user device 320 of any of the examples herein may be a laptop computer or a desktop computer, configured with a capability to access the internet and/or download and operate web browsers, while being connectable to the communications network 350.
- The user device 320 should be able to run the graphical user interfaces of methods 100/200 when the methods are being carried out.
- The user device 320 includes the following components in electronic communication via a bus 411:
- non-volatile memory 403;
- random access memory (RAM);
- a transceiver component 405 that includes a transceiver(s);
- Software 409 is stored in the non-volatile memory 403 to enable the user device 320 to operate a web browser. Once the user device 320 is able to operate the web browser, plug-ins can then be enabled to carry out the methods 100/200.
- FIG 4 is not intended to be a hardware diagram; thus many of the components depicted in FIG 4 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 4.
- The central server 360 is a hardware and software suite comprising pre-programmed logic, algorithms and other means of processing incoming information, in order to send out information which is useful to the objective of the system 300 in which the central server 360 resides.
- Hardware which can be used by the central server 360 will be described briefly herein.
- The central server 360 can broadly comprise a database which stores pertinent information, and processes information packets from the user devices 320.
- The central administrator providing a translation service for online meetings/events runs the central server 360.
- The central server 360 can be operated from a commercially hosted service such as, for example, Amazon Web Services, Facebook Cloud and so forth.
- The central server 360 is represented in a form as shown in FIG 5.
- The central server 360 is in communication with the communications network 350, as shown in FIG 3.
- The central server 360 is able to communicate with the user devices 320, the content generators 380 and/or other processing devices, as required, over the communications network 350.
- The user devices 320 can also communicate via a direct communication channel (LAN or Wi-Fi) with the central server 360.
- The components of the central server 360 can be configured in a variety of ways.
- The components can be implemented entirely by software to be executed on standard computer server hardware, which may comprise one hardware unit or different computer hardware units distributed over various locations, some of which may require the communications network 350 for communication.
- The central server 360 is a commercially available computer system based on a 32-bit or a 64-bit Intel architecture, and the processes and/or methods executed or performed by the central server 360 are implemented in the form of programming instructions of one or more software components or modules 502 stored on non-volatile computer-readable storage 503 associated with the central server 360.
- The server 360 includes one or more of the following standard, commercially available computer components, all interconnected by a bus 505:
- random access memory (RAM);
- a central processing unit (CPU);
- FIG 5 is not intended to be a hardware diagram; thus many of the components depicted in FIG 5 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 5.
- Each content generator 380 should be capable of providing at least one audio recording stream, so as to ensure that there is content to be translated.
- Each content generator 380 should use a device which is configured to carry out at least the following tasks:
- System 300 enables the methods 100/200 to be carried out in a desired manner as described in earlier paragraphs. However, it should also be noted that the methods 100/200 need not be carried out only using the system 300.
- The system 300 is able to enable the same advantages.
- The flexibility provided by the volume control of the respective language audio streams is a long desired feature in relation to online meeting/event videos. The capability of doing so contributes substantially towards the appreciation of online meeting/event videos, and this can also enhance user engagement.
- This is critical in an era when online live selling and online auctions are growing in popularity. During online live selling and online auctions, enhanced user engagement leads directly to improved sales and increased revenue.
Abstract
The present invention provides a system and method to manage a plurality of language audio streams. The system and method broadly provide users with an interface to optimise the output to users from a plurality of language audio streams. Optimising the output includes being able to calibrate respective volume levels of the plurality of language audio streams. This provides advantages such as, for example, discerning verbal cues of a speaker/performer, auditory hearing comfort for a user, and reduction of destructive interference between the multiple streams of audio, leading to better understanding by the user.
Description
A SYSTEM AND METHOD TO MANAGE A PLURALITY OF LANGUAGE AUDIO STREAMS
Field of the Invention
The present invention relates to a system, and method to manage a plurality of language audio streams, particularly during a live translation instance for a video stream.
Background
The world has recently embraced online meetings and events as a viable alternative to in-person meetings and events. The COVID-19 pandemic has substantially hastened the digitization of events and meetings, and correspondingly, the audience reach is no longer constrained by geography, with language typically being the only barrier to effective communication.
Currently, many online meeting and event platforms are based on WebRTC protocols to enable browser-based versions of the platforms. However, many such platforms do not include built-in simultaneous translation capabilities to facilitate multilingual communication.
In many instances, machine implemented translations are less desirable than human translations, for example, in relation to contextual nuances. Thus, the need for human simultaneous interpretation for online meetings has increased substantially.
However, a system and method that provides a desirable experience and functionality for users when providing human translations on WebRTC platforms is currently lacking.
Summary
In a first aspect, there is provided a system to manage a plurality of language audio streams, the system comprising at least one data processing device configured to: transmit, from a content generator, an original language audio stream; receive, at the user device, the original language audio stream and at least two translated language streams; activate, at the user device, a browser extension; select, at the user device, an “on” state for a first translated language stream; toggle, at the user device, an “off” state for at least a second translated language stream; and adjust, at the user device, a volume level of the original language stream and the first translated language stream.
In a second aspect, there is provided a data processor implemented method to manage a plurality of language audio streams, the method comprising: transmitting, from a content generator, an original language audio stream; receiving, at the user device, the original language audio stream and at least two translated language streams; activating, at the user device, a browser extension; selecting, at the user device, an “on” state for a first translated language stream; toggling, at the user device, an “off” state for at least a second translated language stream; and adjusting, at the user device, a volume level of the original language stream and the first translated language stream.
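The claimed select/toggle/adjust steps can be illustrated with a minimal state model. The sketch below is an assumption introduced for this description, not the patented implementation: the `StreamManager` name, the channel labels, and the clamping of volume levels to the 0..1 range are all invented for illustration.

```typescript
// Minimal sketch of the claimed stream management: one original (floor)
// stream plus several translated streams, of which at most one is "on".
type StreamState = "on" | "off";

class StreamManager {
  private states = new Map<string, StreamState>();
  floorVolume = 1.0;        // volume level of the original language stream
  translationVolume = 1.0;  // volume level of the selected translated stream

  constructor(translatedStreams: string[]) {
    for (const name of translatedStreams) this.states.set(name, "off");
  }

  // Selecting a stream turns it "on" and toggles every other stream "off",
  // mirroring the select/toggle steps of the claimed method.
  select(name: string): void {
    if (!this.states.has(name)) throw new Error(`unknown stream: ${name}`);
    for (const key of this.states.keys()) {
      this.states.set(key, key === name ? "on" : "off");
    }
  }

  stateOf(name: string): StreamState | undefined {
    return this.states.get(name);
  }

  // Streams in the "off" state remain available for selection.
  selectable(): string[] {
    return [...this.states.entries()]
      .filter(([, s]) => s === "off")
      .map(([name]) => name);
  }

  // Adjust both volume levels, clamped to the usual 0..1 range.
  setVolumes(floor: number, translation: number): void {
    this.floorVolume = Math.min(1, Math.max(0, floor));
    this.translationVolume = Math.min(1, Math.max(0, translation));
  }
}
```

For example, `new StreamManager(["channel 1", "channel 2", "channel 3"]).select("channel 1")` leaves channels 2 and 3 in the "off" state and available for selection.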
It will be appreciated that the broad forms of the invention and their respective features can be used in conjunction, interchangeably and/or independently, and reference to separate broad forms is not intended to be limiting.
Brief Description of the Drawings
A non-limiting example of the present invention will now be described with reference to the accompanying drawings, in which:
FIG 1 provides a schematic view of a first embodiment of the present invention;
FIG 2 provides a schematic view of a second embodiment of the present invention;
FIG 3 provides a schematic view of an example of a system for managing a plurality of language audio streams;
FIG 4 is a schematic diagram showing components of an example user device of the system shown in FIG 3; and
FIG 5 is a schematic diagram showing components of an example central server shown in FIG 3.
Detailed Description
The present invention provides a system and method to manage a plurality of language audio streams. The system and method broadly provide users with an interface to optimise the output to users from a plurality of language audio streams. Optimising the output includes being able to calibrate respective volume levels of the plurality of language audio streams. This provides advantages such as, for example, discerning verbal cues of a speaker/performer, auditory hearing comfort for a user, and reduction of destructive interference between the multiple streams of audio, leading to better understanding by the user.
In the present method, browser extension (add-on) based approaches are implemented to enable any online meeting and event platform on desktop and laptop computers (either Windows or Mac) with on-demand delivery of simultaneous interpretation by human interpreters.
There are typically two types of video conference or live streaming platforms that can be enabled in browsers (Chrome, Firefox or Microsoft Edge, etc.), namely those which rely on:
1) a hidden media stream that cannot be controlled, such as Zoom, Webex, and so forth; and
2) an exposed media stream that can be controlled, such as Teams, Google Meet, YouTube Live, and so forth.
The method and system of the present invention are applicable to both types of platforms, correspondingly offering the same advantages.
FIG 1 shows a broad overview of a first embodiment of the present invention, which is applicable to the instance of hidden media streams from video conference/live streaming platforms.
A method 100 for managing a plurality of audio language streams is provided. As a language stream of the meeting/event is hidden, there is provided a browser extension (add-on) to open a new tab with interpretation audio streams.
At step 105, there are various language streams for consumption. In this example, the original language (floor) stream and three other language streams are available for selection by the user. Even though it is envisaged that the translation for each language stream is carried out by human translators, the present invention does not preclude the use of machine translators for any of the language streams.
Subsequently, at step 110, the user is able to toggle a desired language stream between an “on” state and an “off” state. In the “on” state, that language stream will be consumable by the user. In the “off” state, that language stream will not be consumable by the user. However, the state of the language stream is dependent on a selection of the user at step 115.
At step 115, the desired language stream in the “on” state will be the language stream being consumed and not available for selection, while the other language streams in the “off” state which are not being consumed are then available for selection. FIG 1 shows the plurality of language streams being “channel 1”, “channel 2” and “channel 3”. It should be appreciated that the plurality of language streams can be indicated differently as well.
It should be appreciated that the method 100 enables seamless real-time switching of channels between floor (original meeting audio) and a plurality of translations.
During use of the method 100, for example, a translation streaming URL can be provided to a browser extension, which enables opening of a new tab. This new tab can include a graphical user interface that is configured to provide, for example:
- audio volume display,
- a channel/language selection list; and
- an exit selector.
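For hidden-media platforms, constructing the translation streaming URL that the extension opens in the new tab can be sketched as follows. This is a hedged illustration: the endpoint, the `event` and `channel` query parameters, and the `buildTranslationUrl` helper are invented for this example and are not specified by the source; only the commented-out `chrome.tabs.create` call reflects the standard WebExtensions API for opening a tab.

```typescript
// Hedged sketch: build the translation streaming URL handed to the
// browser extension. The base URL and query parameter names are
// illustrative assumptions; the patent does not specify them.
function buildTranslationUrl(
  base: string,     // the translation service endpoint (assumed)
  eventId: string,  // identifier of the meeting/event (assumed)
  channel: string,  // selected language channel (assumed)
): string {
  const url = new URL(base);
  url.searchParams.set("event", eventId);
  url.searchParams.set("channel", channel);
  return url.toString();
}

// Inside an actual extension, the new tab would typically be opened with
// the standard chrome.tabs API, roughly:
//   chrome.tabs.create({ url: buildTranslationUrl(base, eventId, channel) });
```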
Once the channel/language is selected from the graphical user interface in the tab, the user then returns to the meeting/event video, and proceeds to appreciate the meeting/event in the desired language.
For the purpose of illustration, it is assumed that the method 100 can be performed at least in part amongst one or more data processing devices such as, for example, a laptop, a desktop computer, a central server, or the like. Typically, the central server will be configured to carry out a majority of the processing tasks given the processing load required by the method 100.
Referring to FIG 2, there is shown an example of a second embodiment of the present invention which is applicable to the instance of exposed media streams from video conference/live streaming platforms.
A method 200 for managing a plurality of audio language streams is provided. As a language stream of the meeting/event is exposed, there is also provided a browser extension (add-on) to open a new tab with translation audio streams, with the browser extension being configured to provide a graphical user interface.
At step 205, there are various language streams available for consumption. In this example, the original language (floor) stream and three other language streams are available for selection by the user. Even though it is envisaged that the translation for each language stream is carried out by human translators, the present invention does not preclude the use of machine translators for any of the language streams.
Subsequently, at step 210, the user is able to toggle a desired language stream between an “on” state and an “off” state. In the “on” state, that language stream will be consumable by the user. In the “off” state, that language stream will not be consumable by the user. However, the state of the language stream is dependent on a selection of the user at step 215.
At step 215, the desired language stream in the “on” state will be the language stream being consumed and not available for selection, while the other language streams in the “off” state which are not being consumed are then available for selection. FIG 2 shows the plurality of language streams being “channel 1”, “channel 2” and “channel 3”. It should be appreciated that the plurality of language streams can be indicated differently as well.
At step 220, the exposed language streams enable volume control of the respective streams. This allows the user to listen to the actual language of the meeting/event together with a desired translation language, each at a different respective volume level.
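In a browser, per-stream volume levels of this kind would typically be applied through `HTMLMediaElement.volume` or a Web Audio `GainNode`. The sketch below keeps those browser calls in comments and shows only the pure gain computation; the `mixLevels` helper and its clamping behaviour are assumptions for illustration, not the patented implementation.

```typescript
// Hedged sketch of per-stream volume levels for concurrent listening:
// e.g. the floor stream kept low in the background while the translated
// stream is dominant. Values are clamped to the valid 0..1 gain range.
interface MixLevels {
  floor: number;       // gain for the original (floor) language stream
  translation: number; // gain for the selected translated stream
}

function clampGain(v: number): number {
  return Math.min(1, Math.max(0, v));
}

function mixLevels(floor: number, translation: number): MixLevels {
  return { floor: clampGain(floor), translation: clampGain(translation) };
}

// In a browser these levels would be applied to the audio elements or
// the Web Audio graph, for example:
//   floorAudioElement.volume = levels.floor;             // HTMLMediaElement
//   translationGainNode.gain.value = levels.translation; // Web Audio API
```

For instance, `mixLevels(0.25, 1.0)` keeps the floor audio quietly in the background under a full-volume translation.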
It should be appreciated that the method 200 enables seamless concurrent real-time consumption of channels between floor (original meeting audio) and a plurality of translations.
During use of the method 200, for example, translated language streams are transmitted (via WebRTC) separately from the meeting stream, and shown in a semitransparent movable pop-up frame window floating on top of the meeting/event video, with a graphical user interface that is configured to provide, for example:
- a language/channel selection list;
- language audio penetration functionality;
- volume control functionality for all language streams;
- refresh connector toggle; and
- an exit selector.
Once the channel/language is selected from the graphical user interface in the tab, the user then returns to the meeting/event video, and proceeds to appreciate the meeting/event in the desired language, and if desired, concurrently with an original language of the meeting/event. Consuming the meeting/event video with the original language concurrently with a translated language provides advantages such as, for example, discerning verbal cues of a speaker/performer, auditory comfort for the user, and a reduction of destructive interference between the multiple audio streams, leading to better understanding by the user.
It should be appreciated that the flexibility provided by the volume control of the respective language audio streams is a long-desired feature in relation to online meeting/event videos. The capability of doing so contributes substantially towards the appreciation of online meeting/event videos, and this can also enhance user
engagement. This is important in an era when online live selling and online auctions are growing in popularity. During online live selling and online auctions, when user engagement is enhanced, it leads to a direct beneficial consequence of improved sales and increased revenue.
For the purpose of illustration, it is assumed that the method 200 can be performed at least in part amongst one or more data processing devices such as, for example, a laptop, a desktop computer, a central server, or the like. Typically, the central server will be configured to carry out a majority of the processing tasks given the processing load required by the method 200.
An example of a system 300 to manage a plurality of language audio streams will now be described with reference to FIG 3.
In this example, the system 300 includes one or more user devices 320, a communications network 350, one or more content generators 380 (for example, a broadcaster, a live auctioneer, an event provider and so forth, who may not be based at the same physical location), and a central server 360 (e.g. a central administrator for providing a translation service for online meetings/events). The one or more user devices 320 and the one or more content generators 380 communicate with the central server 360 via the communications network 350. The communications network 350 can be of any appropriate form, such as the Internet and/or a number of local area networks (LANs). Further details of respective components of the system 300 will be provided in a following portion of the description. It will be appreciated that the configuration shown in FIG 3 is not limiting and for the purpose of illustration only.
User Device 320
The user device 320 of any of the examples herein may be a laptop computer or a desktop computer, being configured with a capability to access the internet and/or download and operate web browsers, while being connectable to the communications
network 350. The user device 320 should be able to run the graphical user interfaces of methods 100/200 when the methods are being carried out.
An exemplary embodiment of the user device 320 is shown in FIG 4. As shown, the user device 320 includes the following components in electronic communication via a bus 411:
1. a display 402;
2. non-volatile memory 403;
3. random access memory ("RAM") 404;
4. data processor(s) 401;
5. a transceiver component 405 that includes one or more transceivers;
6. an image capture module 410; and
7. input controls 407.
In some embodiments, software 409 is stored in the non-volatile memory 403 to enable the user device 320 to operate a web browser. Once the user device 320 is able to operate the web browser, plug-ins can then be enabled to enable the carrying out of the methods 100/200.
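For illustration, such a browser extension could be declared along the following lines. This is a minimal Manifest V3 sketch; every field value (names, match pattern, file names) is an assumption for illustration and not part of the disclosure:

```json
{
  "manifest_version": 3,
  "name": "Translation audio channels (illustrative)",
  "version": "0.1",
  "permissions": ["tabs"],
  "action": { "default_popup": "popup.html" },
  "content_scripts": [
    {
      "matches": ["https://meet.example.com/*"],
      "js": ["overlay.js"]
    }
  ]
}
```

Here `popup.html` would host the channel-selection interface of methods 100/200, and `overlay.js` would render the semi-transparent movable pop-up frame over the meeting/event video.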
Although the components depicted in FIG 4 represent physical components, FIG 4 is not intended to be a hardware diagram; thus many of the components depicted in FIG 4 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 4.
Central Server 360
The central server 360 is a hardware and software suite comprising pre-programmed logic, algorithms and other means of processing incoming information,
in order to send out information which is useful to the objective of the system 300 in which the central server 360 resides. For the sake of illustration, hardware which can be used by the central server 360 will be described briefly herein.
The central server 360 can broadly comprise a database which stores pertinent information, and can process information packets from the user devices 320. In some embodiments, the central administrator for providing a translation service for online meetings/events runs the central server 360. The central server 360 can be operated from a commercially hosted service such as, for example, Amazon Web Services, Alibaba Cloud and so forth.
In one possible embodiment, the central server 360 is represented in a form as shown in FIG 5.
The central server 360 is in communication with a communications network 350, as shown in FIG 3. The central server 360 is able to communicate with the user devices 320, the content generators 380 and/or other processing devices, as required, over the communications network 350. In some instances, the user devices 320 communicate via a direct communication channel (LAN or Wi-Fi) with the central server 360.
The components of the central server 360 can be configured in a variety of ways. The components can be implemented entirely by software to be executed on standard computer server hardware, which may comprise one hardware unit or different computer hardware units distributed over various locations, some of which may require the communications network 350 for communication.
In the example shown in FIG 5, the central server 360 is a commercially available computer system based on a 32-bit or 64-bit Intel architecture, and the processes and/or methods executed or performed by the central server 360 are implemented in the form of programming instructions of one or more software components or modules
502 stored on non-volatile computer-readable storage 503 associated with the central server 360.
The server 360 includes at least the following standard, commercially available computer components, all interconnected by a bus 505:
1. random access memory (RAM) 506; and
2. at least one central processing unit (CPU) 507.
Although the components depicted in FIG 5 represent physical components, FIG 5 is not intended to be a hardware diagram; thus many of the components depicted in FIG 5 may be realized by common constructs or distributed among additional physical components. Moreover, it is contemplated that other existing and yet-to-be developed physical components and architectures may be utilized to implement the functional components described with reference to FIG 5.
Content Generator 380
It should be appreciated that each content generator 380 should be capable of providing at least one audio recording stream, so as to ensure that there is content to be translated. Typically, each content generator 380 should use a device which is configured to carry out at least the following tasks:
- record an audio stream; and
- connect to the communications network 350.
It should be appreciated that a capability of capturing a video stream can be optional for each content generator 380.
It should be appreciated that the system 300 enables the methods 100/200 to be carried out in a desired manner as described in earlier paragraphs. However, it should
also be noted that the methods 100/200 need not be carried out only using the system
300. Other systems may also be used to enable the methods 100/200.
As also mentioned for the method 200, the system 300 is able to enable the same advantages. For example, the flexibility provided by the volume control of the respective language audio streams is a long-desired feature in relation to online meeting/event videos. The capability of doing so contributes substantially towards the appreciation of online meeting/event videos, and this can also enhance user engagement. This is critical in an era when online live selling and online auctions are growing in popularity. During online live selling and online auctions, when user engagement is enhanced, it leads to a direct beneficial consequence of improved sales and increased revenue.
Throughout this specification and claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated integer or group of integers or steps but not the exclusion of any other integer or group of integers.
Persons skilled in the art will appreciate that numerous variations and modifications will become apparent. All such variations and modifications which become apparent to persons skilled in the art should be considered to fall within the spirit and scope of the invention as broadly described hereinbefore.
Claims
1. A system to manage a plurality of language audio streams, the system comprising at least one data processing device configured to: transmit, from a content generator, an original language audio stream; receive, at a user device, the original language audio stream and at least two translated language streams; activate, at the user device, a browser extension; select, at the user device, an “on” state for a first translated language stream; toggle, at the user device, an “off” state for at least a second translated language stream; and adjust, at the user device, a volume level of the original language stream and the first translated language stream.
2. The system of claim 1, wherein the original language audio stream is obtained from a video stream.
3. The system of claim 2, wherein the at least two translated language streams are transmitted separately from the video stream.
4. The system of any of claims 1-3, wherein the browser extension is configured to provide a graphical user interface.
5. The system of any of claims 1 to 4, wherein the at least two translated language streams are human generated.
6. The system of any of claims 1 to 4, wherein the at least two translated language streams are machine generated.
7. A data processor implemented method to manage a plurality of language audio streams, the method comprising:
transmitting, from a content generator, an original language audio stream; receiving, at a user device, the original language audio stream and at least two translated language streams; activating, at the user device, a browser extension; selecting, at the user device, an “on” state for a first translated language stream; toggling, at the user device, an “off” state for at least a second translated language stream; and adjusting, at the user device, a volume level of the original language stream and the first translated language stream.
8. The method of claim 7, wherein the original language audio stream is obtained from a video stream.
9. The method of claim 8, wherein the at least two translated language streams are transmitted separately from the video stream.
10. The method of any of claims 7-9, wherein the browser extension is configured to provide a graphical user interface.
11. The method of any of claims 7 to 10, wherein the at least two translated language streams are human generated.
12. The method of any of claims 7 to 10, wherein the at least two translated language streams are machine generated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2022/050321 WO2023219556A1 (en) | 2022-05-13 | 2022-05-13 | A system and method to manage a plurality of language audio streams |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023219556A1 true WO2023219556A1 (en) | 2023-11-16 |
Family
ID=88730681
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023219556A1 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010044726A1 (en) * | 2000-05-18 | 2001-11-22 | Hui Li | Method and receiver for providing audio translation data on demand |
US20170011740A1 (en) * | 2011-08-31 | 2017-01-12 | Google Inc. | Text transcript generation from a communication session |
US20180276203A1 (en) * | 2013-11-08 | 2018-09-27 | Google Llc | User interface for realtime language translation |
CN110166729A (en) * | 2019-05-30 | 2019-08-23 | 上海赛连信息科技有限公司 | Cloud video-meeting method, device, system, medium and calculating equipment |
US20190364303A1 (en) * | 2018-05-22 | 2019-11-28 | Beijing Baidu Netcom Science Technology Co., Ltd. | Live broadcast processing method, apparatus, device, and storage medium |
CN112188241A (en) * | 2020-10-09 | 2021-01-05 | 上海网达软件股份有限公司 | Method and system for real-time subtitle generation of live stream |
US20210042477A1 (en) * | 2017-06-14 | 2021-02-11 | Microsoft Technology Licensing, Llc | Customized transcribed conversations |
CN112735430A (en) * | 2020-12-28 | 2021-04-30 | 传神语联网网络科技股份有限公司 | Multilingual online simultaneous interpretation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22941816 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |