EP3111627B1 - Perceptual continuity using change blindness in conferencing - Google Patents
- Publication number
- EP3111627B1 (application EP15712202.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio
- event
- audio input
- stream
- input streams
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/563—User guidance or feature selection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. Transmission Power Control [TPC] or power classes
- H04W52/02—Power saving arrangements
- H04W52/0209—Power saving arrangements in terminal devices
- H04W52/0225—Power saving arrangements in terminal devices using monitoring of external events, e.g. the presence of a signal
- H04W52/0229—Power saving arrangements in terminal devices using monitoring of external events, e.g. the presence of a signal where the received signal is a wanted signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/14—Delay circuits; Timers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2420/00—Details of connection covered by H04R, not provided for in its groups
- H04R2420/01—Input selection or mixing for amplifiers or loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Definitions
- the present invention relates to the field of audio teleconferencing, and, in particular, discloses the utilisation of change blindness mechanisms to mask changes in teleconferencing.
- Video and audio teleconferencing systems where multiple parties interact remotely to carry out a conference are an important resource.
- the distributed server resource is responsible for appropriately mixing uplinked audio signals together from each conference participant and downlinking the audio signals for playback by each audio output device.
- a mixer receives a respective 'uplink stream' from each of the telephone endpoints, which carries an audio signal captured by that telephone endpoint, and sends a respective 'downlink stream' to each of the telephone endpoints.
- each telephone endpoint receives a downlink stream which is able to carry a mixture of the respective audio signals captured by the other telephone endpoints. Accordingly, when two or more participants in a telephone conference speak at the same time, the other participant(s) can hear both participants speaking.
- United States Patent No. US 6,976,055 B1 discloses a method and apparatus for conducting a transfer of a conference call.
- a media gateway receives a message to transfer a conference call from a first call resource to a second call resource. If the message indicates a change in the number of clients participating in the conference call, the media gateway simultaneously transfers the conference call and plays a prompt to the clients indicating the change. If the message does not indicate the change, the media gateway transfers the conference call in response to detecting a period of silence.
- it is known (and usually desirable) for the mixer to employ an adaptive approach whereby it changes the mixing in response to perceiving certain variations in one or more of the audio signals. For example, an audio signal may be omitted from the mixture in response to determining that it contains no speech (i.e. only background noise). But changing the mixing at the wrong time may lead to disconcerting artefacts being heard by the participants.
- Various methods, devices, apparatus and systems disclosed herein may provide an improved form of audio conferencing mixing.
- in an audio conferencing mixing system of the type taking a plurality of audio input streams of input audio information of conference participants, including mixing transition events, and outputting a plurality of audio output streams including output audio information, there is provided a method of mixing the audio output streams so as to reduce the detectability of the mixing transition events, as recited in claim 1.
- the mixing transition events can include changes in the audio input stream encoding which would be noticeable to a listening participant when listening in isolation.
- the masking trigger can include at least one of: the onset or cessation of speech; a predetermined change in speech characteristics, or the onset of simultaneous speech by a predetermined number of participants.
- the scheduling can comprise delaying the occurrence of the transition event until the masking trigger occurs.
- the masking trigger can comprise the utterance of predetermined text by at least one of the conference participants.
- the presence of an increase in volume and/or predetermined spectral flux in one of the audio input streams can be indicative of a masking trigger in the one of the audio input streams.
- the onset or cessation of speech can be denoted by a change in value of a voice activity flag in one of the audio input streams.
- the masking event can be determined by an auditory scene analysis of the series of audio input streams.
- the audio input streams can include at least one CTX (continuous transmission) audio input stream and at least one DTX (discontinuous transmission) audio input stream.
- an audio conferencing mixing system as recited in claim 7 and a computer readable medium as recited in claim 8.
- Various embodiments disclosed herein may have particular application where the system and server are able to integrate spatial and/or more continuous audio signals into the mixer and the presented scene.
- the embodiments may be of use where there is a desire for scalability and therefore lower computational complexity and/or bandwidth usage.
- the embodiments may also be of value in the absence of system constraints, where the use is primarily to achieve a degree of perceptual scene complexity reduction, which must also occur by changing the presence and contribution of different participant audio signals to the mix.
- the actions and changes to presented scenes are due to incoming control signals from other factors or user control input.
- the use of the embodiments may lessen the impact of activities such as sound stream or object termination, level adjustment, changes to spatial render properties, changes to processing, or any other change that would normally result in a sudden change to a perceived property of the audio stream that would be unexpected and therefore problematic for achieving the goal of perceptual continuity.
- the preferred embodiment operates in an environment for audio teleconferencing (with or without an associated video stream).
- FIG. 1: An exemplary audio teleconferencing system 1 is illustrated in Fig. 1.
- a series of conference participants collectively provide audio input and output.
- a first participant 2 uses a pair of headphones 5 and input microphone 3 interconnected to computer 6 for conference participation.
- the computer 6 provides uplink 8 and downlink 7 connections over a network 9, with mixer 11.
- a second group of participants e.g. 20 use an audio device 21 which provides audio output including spatialization information.
- the audio device 21 also provides internal computational and communication abilities and includes uplink 23 and downlink 24 channels which interconnect via network 25, 26 with mixer 11. Additional participants can also be interconnected to the mixer via other means.
- the arrangement of Fig. 1 includes a plurality of conference participants 2 utilising DTX endpoints, exemplified by the binaural headset 5 with boom microphone 3.
- Each of said plurality of DTX endpoints asserts 10 a DTX uplink stream 8 to the teleconferencing mixer 11, typically via a network 9.
- the mixer produces a downlink stream 7 for each DTX endpoint, which is transmitted back to the endpoint 2 over the network 9 to be heard by the participant 2.
- Each of a plurality of CTX endpoints captures the speech 27 of a further plurality of conference participants 20. Non-trivial background noise may also be captured by such devices.
- Each of the said plurality of CTX endpoints asserts a CTX uplink stream 26 to the mixer 11, typically via a network 25. Without loss of generality, network 25 may be the same network as that used by the DTX endpoints.
- the mixer 11 produces a downlink stream 23 for each CTX endpoint, which is transmitted back to the endpoint 21 over the network 25 for playback to a plurality of participants 20.
- each of the participant endpoints sends an uplink audio stream to a teleconferencing mixer and receives a downlink stream therefrom.
- the uplinks and downlinks may be encoded digitally and transmitted via a suitable packet-switched network, such as a voice over internet protocol (VoIP) network, or they may travel over a circuit-switched network, such as the public switched telephone network (PSTN). Either way, it is the mixer's 11 responsibility to produce a downlink audio stream to send back to each endpoint such that, in general, each participant hears every other participant except himself.
- One class of endpoint in such a system employs discontinuous transmission (DTX) on the uplink.
- Such an endpoint attempts to maximise intelligibility while minimising the use of network resources by one or more of: employing microphone placements close to the talkers' mouths; noise suppression signal processing which removes background noise; and only sending the uplink stream when human speech is present.
- This strategy can result in less aberrant noise being heard by the listener, but it can also result in a less natural-sounding experience, firstly because noise suppression signal processing typically results in the introduction of disturbing dynamic artefacts when the background noise is non-stationary, secondly because the noise suppression affects the equalisation of the speech and thirdly because the binary transmit/don't transmit decision, based on imperfect information from an associated voice activity detector (VAD), will sometimes lead to speech being cut off and at other times lead to residual noise being transmitted as speech.
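- As a rough illustration of the binary transmit/don't transmit decision described above, the following Python sketch gates uplink frames with a toy energy-based VAD and a short hangover. The frame size, threshold and hangover length are assumed values for illustration only, not figures taken from this disclosure.
```python
# Illustrative sketch only: a toy DTX uplink gate driven by a simple
# energy-based VAD with a hangover period. Thresholds and frame sizes
# are arbitrary assumptions, not values from the patent.
import numpy as np

ENERGY_THRESHOLD = 1e-3  # assumed VAD energy threshold
HANGOVER_FRAMES = 10     # keep transmitting briefly after speech stops

def dtx_uplink(frames):
    """Yield (frame, transmitted) pairs; frames judged as non-speech would be dropped."""
    hangover = 0
    for frame in frames:
        energy = float(np.mean(frame ** 2))
        if energy > ENERGY_THRESHOLD:
            hangover = HANGOVER_FRAMES        # speech detected: reset hangover
        speech = hangover > 0
        hangover = max(0, hangover - 1)
        yield frame, speech                   # only 'speech' frames would be sent

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = [rng.normal(0, 0.005, 320) for _ in range(20)]   # background noise frames
    talk = [rng.normal(0, 0.2, 320) for _ in range(20)]      # crude stand-in for speech
    sent = sum(tx for _, tx in dtx_uplink(noise + talk + noise))
    print(f"{sent} of {len(noise + talk + noise)} frames would be transmitted")
```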
- a second class of endpoint employs continuous transmission (CTX) on the uplink. That is, they send an audio stream regardless of whether the VAD (if present) determines that speech is present or not.
- a CTX device may employ multiple microphones to retain spatial diversity to allow binaural release from masking.
- the designer of a CTX device may also seek to limit the amount of noise suppression processing that the device performs in order to minimise the potential for disturbing dynamic artefacts and spectral colouration.
- a DTX device seeks to remove, suppress or otherwise avoid transmitting anything it deems not to constitute human speech, whereas a CTX device seeks to be transparent, transmitting everything in the most perceptually continuous and relevant manner possible.
- since a DTX endpoint's uplink is substantially silent when no speech is detected, a mixer 11 may be able to freely discard its uplink stream when speech is not detected without perceptual consequence for the listener.
- when forming a downlink mix that contains a CTX stream, the mixer must be careful in how it applies mixing transitions to the stream. For example, discarding a CTX stream when talk is not detected may be readily noticed by a listener because the background noise associated with that stream may be heard to turn off, especially if no other CTX stream is present to mask the transition. The listener may be left wondering whether the system has failed, or whether the CTX endpoint has disconnected from the conference. The goal of providing a natural listening experience would not be met in such a case.
- the goal of a teleconferencing mixer 11 is to allow each participant to hear the speech from every other participant, but not from himself.
- if many CTX streams, each containing background noise, are heard simultaneously by a listener, the total background noise power heard may increase to a point where it is distracting or detrimental to intelligibility.
- when multiple uplink streams all contain talk at the same time, the result may be too cacophonous to facilitate useful communication. It may be better to let only the two or three most perceptually relevant streams through in this case.
- European Patent Publication No. EP 1 855 455 to Enbom discloses one such methodology, and International Patent Application No. PCT/US2013/061658, filed 25 September 2013, discloses a second way of achieving this.
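- The idea of letting only the most perceptually relevant uplinks through can be illustrated with the following sketch, which ranks active streams by a deliberately crude per-frame loudness estimate and keeps the top few. The limit and the loudness measure are assumptions; this is not the method of the publications cited above.
```python
# Minimal sketch (not the cited methods): rank uplinks by a crude loudness
# estimate and let only the top N into the downlink mix.
import numpy as np

MAX_ACTIVE_STREAMS = 3   # assumed limit on simultaneous talkers in the mix

def select_streams(uplink_frames):
    """uplink_frames: dict of endpoint id -> current audio frame (numpy array).
    Returns the ids of the frames to include in the downlink mix."""
    loudness = {eid: float(np.sqrt(np.mean(f ** 2))) for eid, f in uplink_frames.items()}
    ranked = sorted(loudness, key=loudness.get, reverse=True)
    return ranked[:MAX_ACTIVE_STREAMS]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    frames = {f"endpoint_{i}": rng.normal(0, 0.05 * (i + 1), 320) for i in range(5)}
    print(select_streams(frames))   # the three loudest uplinks
```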
- a teleconferencing mixer 11 may furthermore be able to effect other kinds of mixing transitions. For example, it may be able to lower the coding bitrate or audio fidelity of an uplink stream in a downlink mix, or (when the uplink contains spatial diversity from multiple microphones) it may be able to adjust the spatial fidelity with which an uplink stream is heard in a downlink mix.
- the mixer may furthermore be able to affect the perceived position or region in space from which a stream appears to the listener to emanate.
- the mixer may make such transitions dynamically, based on the behaviour of the participants and endpoints in the conference and some types of transitions may be noticeable or disconcerting to a listener when applied carelessly.
- the preferred embodiments include a novel class of methods for handling mixing transitions at a teleconferencing server in such a way that the transition is not readily noticeable by a listener, thereby preserving perceptual continuity and naturalness in the listening experience. To do so, use is made of the phenomenon of selective attention in human auditory scene analysis.
- a method is provided for making mixing transitions in a teleconferencing mixer that would otherwise be immediately noticed, but which go unnoticed because they are synchronised to coincide with some other event which captures the listener's attention - for example, the joining of a new participant to the conference or the onset of speech from a participant who has not talked for some time.
- the preferred embodiment thereby provides a class of methods for improving the perceived continuity in a downlink audio stream, making use of the concepts of selective attention and change blindness.
- Each method of the class can be implemented in a teleconferencing mixer.
- the teleconferencing mixer may reside in one or more central servers. In other embodiments the teleconferencing mixer may reside in one or more of the endpoints.
- an uplink stream is received from each endpoint.
- the mixer produces a downlink stream for each endpoint.
- examples of mixing techniques a mixer may employ to form a downlink stream from a plurality of uplink streams include mixing: decoding, or partially decoding, the uplink streams, summing together the decoded (or partially decoded) audio signals and re-encoding a downlink stream.
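- A minimal sketch of the mixing technique just mentioned is given below: each endpoint's downlink is the gain-scaled sum of every other endpoint's decoded uplink, then re-encoded, so that each participant hears everyone except himself. The codec calls are stood in by identity functions and all names are illustrative assumptions.
```python
# Hedged sketch of the basic mixing step: for each endpoint, sum the (decoded,
# gain-scaled) audio of every other endpoint and re-encode the result as its
# downlink. Codec calls are stood in by identity functions.
import numpy as np

def decode(frame):      # placeholder for a real audio decoder
    return frame

def encode(frame):      # placeholder for a real audio encoder
    return frame

def build_downlinks(uplinks, gains):
    """uplinks: dict endpoint -> encoded frame; gains: dict endpoint -> linear gain.
    Returns dict endpoint -> encoded downlink frame (mix of all other endpoints)."""
    decoded = {eid: decode(f) for eid, f in uplinks.items()}
    downlinks = {}
    for listener in uplinks:
        mix = np.zeros_like(next(iter(decoded.values())))
        for talker, frame in decoded.items():
            if talker != listener:                  # never mix a participant back to himself
                mix = mix + gains.get(talker, 1.0) * frame
        downlinks[listener] = encode(mix)
    return downlinks

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    ups = {e: rng.normal(0, 0.1, 320) for e in ("X", "Y", "Z")}
    outs = build_downlinks(ups, {"X": 1.0, "Y": 1.0, "Z": 0.5})
    print({e: round(float(np.std(v)), 4) for e, v in outs.items()})
```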
- a transition is any change to the downlink audio stream which would be audible and noticeable to a listening participant at a downlink endpoint if effected in isolation (that is, without any masking event).
- examples of mixing transitions include: Turning on or off or fading in or out an uplink stream in a mixed downlink stream; Beginning or ceasing forwarding of an uplink stream as a component of a downlink stream; Changing the spatial fidelity or representation of an uplink stream in a downlink stream; Changing the audio quality (for example, by means of adjusting the coding bitrate) of an uplink stream as a component of a downlink stream; Changing the perceived position of an uplink stream in a downlink stream's spatial scene, when the downlink is capable of spatial or positional audio rendering; Dropping or raising the gain of a particular uplink stream in a downlink mix by a step change; Switching or crossfading from the actual uplink audio stream to a synthetically generated noise field designed to be spectrally and (where applicable) spatially similar to the noise present in the uplink stream.
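- The last transition listed above, switching from the real uplink audio to a spectrally similar synthetic noise field, might be sketched as follows. The single-frame spectral matching and the equal-power crossfade are simplifying assumptions; a real implementation would track the noise spectrum over time and handle any spatial representation.
```python
# Illustrative sketch of one listed transition: crossfading from the real uplink
# audio to synthetic noise shaped to match its recent magnitude spectrum.
import numpy as np

def synth_matched_noise(reference, length, rng):
    """Shape white noise to the magnitude spectrum of `reference` and match its level."""
    spectrum = np.abs(np.fft.rfft(reference, n=length))
    white = np.fft.rfft(rng.normal(0, 1, length))
    shaped = np.fft.irfft(white * spectrum, n=length)
    return shaped * (np.std(reference) / (np.std(shaped) + 1e-12))

def crossfade(a, b):
    """Equal-power crossfade from a to b over their (equal) length."""
    t = np.linspace(0.0, 1.0, len(a))
    return np.cos(0.5 * np.pi * t) * a + np.sin(0.5 * np.pi * t) * b

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    uplink_tail = rng.normal(0, 0.01, 1024)            # recent background noise from an uplink
    fake = synth_matched_noise(uplink_tail, 1024, rng)
    out = crossfade(uplink_tail, fake)
    print(round(float(np.std(out)), 5))
```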
- a trigger is an event derived from the state of the conference.
- examples of triggers include: 1) A VAD flag on an uplink stream signalling the onset, or cessation of speech on that uplink.
- the VAD may be implemented in the sending client with the result included in metadata in the uplink stream.
- the VAD may be implemented in the mixer and make its speech presence determination based on the encoded or decoded audio included in the uplink stream; 2) A heuristic derived from VAD information.
- a fade-out transition can be triggered on a downlink; 3) The onset or cessation of a talkburst from an endpoint with a DTX uplink; 4) A maximum number of simultaneous talkers is exceeded.
- a verbosity metric or heuristic can be used, including simple measures such as power spectrum analysis of each channel.
- a more complex measure of verbosity is described in International Patent Application No. PCT/US2013/061658, filed 25 September 2013.
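- Two of the simpler triggers mentioned above can be sketched as follows; the talker limit, the decay constant and the power-based verbosity score are assumptions for illustration and are not the measures of the cited application.
```python
# Toy trigger heuristics (assumed values, not the cited verbosity measure):
# raise a trigger when more than MAX_TALKERS uplinks are simultaneously
# voice-active, and keep a leaky power-based "verbosity" score per uplink.
MAX_TALKERS = 3          # assumed ceiling on simultaneous talkers
FORGETTING = 0.95        # assumed decay for the verbosity accumulator

def too_many_talkers(vad_flags):
    """vad_flags: dict endpoint -> bool for the current frame."""
    return sum(vad_flags.values()) > MAX_TALKERS

def update_verbosity(verbosity, frame_power):
    """Leaky accumulation of per-endpoint frame power as a crude verbosity measure."""
    for eid, p in frame_power.items():
        verbosity[eid] = FORGETTING * verbosity.get(eid, 0.0) + (1 - FORGETTING) * p
    return verbosity

if __name__ == "__main__":
    flags = {"A": True, "B": True, "C": True, "D": True, "E": False}
    print(too_many_talkers(flags))                      # True: a trigger would be raised
    v = update_verbosity({}, {"A": 0.04, "B": 0.01})
    print({k: round(x, 4) for k, x in v.items()})
```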
- a key aspect of the preferred embodiment is that the mixer waits until a suitable masking event occurs before applying any transition that results from that trigger.
- a masking event is any event that may capture a listener's attention or otherwise mask a transition.
- examples of masking events include: 1) A VAD on an uplink stream signalling the onset, or cessation of speech on that uplink. The onset of speech on a particular uplink may be especially valuable as a masking event if speech hasn't been present on that uplink for some time. Therefore, onset events may be graded or gated based on length of time since last speech was detected on the uplink.
- the masking events are binary. That is, an event either is, or is not, present. A pending transition will simply be made upon assertion of a masking event.
- events can be graded according to an event magnitude, which is an estimate of how effectively the event will capture a listener's attention. This magnitude is used to control how a transition is made. For example, a large magnitude event might cause a fade transition to occur over a short period of time, while a small magnitude event might cause a fade transition to occur over a long period of time.
- a mixer that wants to attenuate an uplink in a downlink mix in a series of step gain-change transitions as the result of a trigger. In this case, the amount of attenuation applied in each transition could be a function of the corresponding event magnitude.
- examples of properties upon which an event magnitude could be based include: the volume level of speech in an uplink; the volume level at the onset of speech in a talkburst; the magnitude of an event in a Dolby Volume-style event detector; the confidence that a particular word, syllable or phrase has been detected in an uplink stream; and the time elapsed at the start of a talkburst since the end of the previous talkburst on an uplink.
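- A hedged sketch of this grading idea follows: a normalised event magnitude is mapped to a fade duration and to a per-event step attenuation, with larger, more attention-grabbing events permitting faster or larger changes. The endpoint values are illustrative assumptions.
```python
# Sketch of event-magnitude grading: big events allow short fades and large
# gain steps, small events force gentle ones. Values are assumptions.
def fade_time_seconds(magnitude, fast=0.25, slow=4.0):
    """Large-magnitude events permit short fades; small ones force long, gentle fades."""
    m = min(max(magnitude, 0.0), 1.0)
    return slow + (fast - slow) * m

def step_attenuation_db(magnitude, max_step_db=6.0):
    """Attenuation applied per masking event when a gain change is made in steps."""
    return max_step_db * min(max(magnitude, 0.0), 1.0)

if __name__ == "__main__":
    for m in (0.1, 0.5, 0.9):
        print(m, round(fade_time_seconds(m), 2), "s,", round(step_attenuation_db(m), 1), "dB")
```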
- FIG. 2 illustrates schematically one form of teleconferencing mixer 11.
- a plurality of uplink streams, some DTX (31, 32) and some CTX (33), are asserted to the mixer 11.
- Each of the uplink streams passes through an unpack unit 35, 36, 37.
- the unpack unit unpacks the uplink stream, extracts the VAD flag 38 and the audio information 40 from the uplink stream, and identifies masking events 39 as described below.
- the mixer produces a plurality of downlink streams 42, 43, 44. Shown in the figure is the mixing apparatus 46 associated with downlink 43. Not shown is the similar apparatus which exists for each of the other downlinks 42, 44.
- the mixing control unit 47 for this downlink operates on the VAD and masking event signals produced by the unpack units 35, 37 associated with the other uplinks and produces a gain for each of the uplinks other than uplink 32, because downlink 43 will be heard by the same endpoint Y that generated uplink 32. These gains are used to scale 48, 49 and mix 50 the audio from the uplinks to produce a final audio stream suitable for repacking and encoding 51 back through the downlink 43.
- the masking event output (e.g. 39) of the corresponding unpack unit is asserted for a short period (for example 20 ms) when the corresponding VAD signal transitions from low (no speech detected) to high (speech detected) after being low for a period exceeding a threshold ΔTevent, which for example could be set to 10 seconds.
- the behavior of the control unit 47 with respect to DTX uplinks is to set the corresponding gain to 1 whenever the associated VAD signal is high. That is, DTX endpoints are mixed into the downlink whenever they are sending speech.
- the behavior of the control unit with respect to CTX endpoints 33 is to deassert an internal trigger signal whenever the amount of time that has elapsed since the VAD flag of the corresponding uplink was high exceeds the threshold ΔTtrigger, which for example could be set at 60 seconds.
- the trigger signal is asserted whenever the corresponding VAD flag is high.
- the control unit waits until the masking event signal corresponding to any of the other endpoints is asserted before applying a transition, which in the case of this preferred embodiment involves slewing the gain of the CTX endpoint down from 1 to 0 over an amount of time ΔTtransition, which for example could be set to 3 seconds.
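- One possible reading of the control behaviour described above is sketched below: DTX gains follow the VAD directly, a CTX uplink arms a fade-out after ΔTtrigger of silence, and the fade itself is deferred until a masking event (a speech onset after ΔTevent of silence on another uplink) occurs, then slewed over ΔTtransition. The class and function names are invented, and the logic is an illustrative interpretation rather than the claimed implementation.
```python
# Minimal per-downlink control sketch, illustrative only. Constants follow the
# example values given in the text; everything else is an assumption.
DT_EVENT = 10.0       # silence needed before a speech onset counts as a masking event (s)
DT_TRIGGER = 60.0     # CTX silence needed before a fade-out is armed (s)
DT_TRANSITION = 3.0   # fade-out slew time (s)
FRAME = 0.02          # frame period (s)

class CtxFadeControl:
    def __init__(self):
        self.silence = 0.0        # seconds since this CTX uplink's VAD was last high
        self.armed = False        # fade-out armed, waiting for a masking event
        self.fading = False
        self.gain = 1.0

    def step(self, ctx_vad, masking_event):
        """Advance one frame. `ctx_vad`: VAD flag of the CTX uplink being considered.
        `masking_event`: True if any *other* uplink asserts a masking event now."""
        if ctx_vad:
            self.silence, self.armed, self.fading, self.gain = 0.0, False, False, 1.0
        else:
            self.silence += FRAME
            if self.silence > DT_TRIGGER:
                self.armed = True
        if self.armed and masking_event:
            self.fading = True                      # the transition is deferred until now
        if self.fading:
            self.gain = max(0.0, self.gain - FRAME / DT_TRANSITION)
        return self.gain

def masking_event(onset, silence_before):
    """A speech onset masks a transition only after a long enough quiet spell."""
    return onset and silence_before > DT_EVENT

if __name__ == "__main__":
    ctl = CtxFadeControl()
    # 70 s of CTX silence, then a masking event from another endpoint at t = 70 s.
    for n in range(int(75.0 / FRAME)):
        t = n * FRAME
        ev = masking_event(onset=(abs(t - 70.0) < FRAME / 2), silence_before=30.0)
        g = ctl.step(ctx_vad=False, masking_event=ev)
    print(round(g, 2))   # gain has slewed to 0 after the deferred fade began
```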
- Fig. 3 illustrates an example of a timeline 60 of operations for the embodiment described above.
- the sequence starts with CTX endpoint Z finishing a talkburst 61.
- two talkbursts 62, 63 are detected from endpoint X.
- Y now talks for a time and three talkbursts 64, 65 and 66 are detected after which X talks again 67.
- a trigger event 63 occurs because no speech has been detected from CTX endpoint Z for a period exceeding ΔTtrigger.
- the control unit is now in a state where it will begin a transition upon the next instance of a masking event, instead of fading out immediately 71.
- a masking event 68 occurs when Y begins talking 64, but this is of no significance to the control unit, because it occurs before the trigger and because the mixer is currently servicing the downlink for the endpoint Y in question.
- X recommences talking 67 after a period of silence 70 longer than ΔTevent.
- a second event 69 is signalled, upon receipt of which the control unit starts the transition, fading uplink Z out 72 in downlink Y over a number of seconds.
- Fig. 4A is a block diagram that shows examples of elements of a system for determining events from audio waveforms.
- the types and numbers of components shown in Fig. 4A are merely shown by way of example. Alternative implementations may include more, fewer and/or different components.
- the system 400 may, for example, be an instance of, or a component of, a teleconferencing mixer such as the teleconferencing mixer 11 shown in Fig. 2 and described above.
- the system 400 may be a component of a teleconferencing server, e.g., a line card.
- the functionality of the system 400 may be implemented, at least in part, by one or more telephone endpoints.
- the system 400 may be implemented, at least in part, by a control system that may include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
- the system 400 may be implemented according to instructions (e.g., software) stored on one or more non-transitory media.
- non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc.
- the system 400 includes a feature extractor 401 and an event detector 402.
- the feature extractor 401 is shown receiving input waveforms 403.
- the waveforms 403 may correspond to speech and/or background noise.
- the waveforms 403 may vary according to the particular implementation. For example, if the feature extractor 401 is implemented in a teleconferencing mixer, a teleconferencing server, or a similar device, the waveforms 403 may be unpacked and decoded waveforms from an uplink stream. However, if the feature extractor 401 is implemented in a telephone endpoint, the waveforms 403 may be raw microphone signals or pre-processed microphone signals.
- the feature extractor 401 is capable of analysing input waveforms 403 and producing output corresponding to one or more types of features 404. Some examples are shown in Fig. 4B and are described below.
- the event detector 402 is capable of analysing the features 404 and producing output corresponding to one or more types of events 405.
- the events 405 may be masking events as disclosed elsewhere herein. Accordingly, in some examples the events 405 may correspond with the onset of speech, the cessation of speech, the presence of particular syllables, words or classes of speech, changes in the volume level, spectral flux, or other such heuristics, and/or criteria determined according to auditory scene analysis.
- the output of the event detector 402 may be "binary," indicating only whether an event is, or is not, present. However, in some examples, the output of the event detector 402 also may indicate an event magnitude, e.g., as described above.
- Fig. 4B shows examples of input waveforms and corresponding features and events that may be generated by a system such as that shown in Fig. 4A .
- the feature extractor 401 is capable of analysing input waveforms 403 and producing output corresponding to changes in level and changes in the pitch. Accordingly, in the example shown in Fig. 4B the features 404a correspond with changes in the level of the waveforms 403, whereas the features 404b correspond with changes in the pitch of the waveforms 403.
- the event detector 402 has detected events 405a-405d at times t 1 -t 4 , which correspond with the waveform portions 403a-403d, respectively.
- the output of the event detector 402 indicates an event magnitude, which is indicated by the length of the lines shown in Fig. 4B corresponding with the events 405a-405d.
- the event 405a has a smaller magnitude than the event 405b.
- the event detector 402 has detected the events 405a-405d at times corresponding with significant changes (e.g., changes that are at or above predetermined thresholds) in both the level and the pitch of the waveforms 403.
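- In the spirit of Figs. 4A and 4B, the sketch below derives a frame level and a crude pitch proxy, flags an event when both change sharply between frames, and grades the event by the size of the change. The thresholds, the zero-crossing pitch proxy and the grading formula are assumptions.
```python
# Rough per-frame feature extractor and event detector: track level (dB RMS)
# and a zero-crossing-rate "pitch" proxy, and flag graded events when both
# change sharply. Thresholds are illustrative assumptions.
import numpy as np

LEVEL_STEP_DB = 6.0    # assumed minimum level jump
PITCH_STEP = 0.05      # assumed minimum jump in zero-crossing rate

def features(frame):
    level_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)   # crude pitch/brightness proxy
    return level_db, zcr

def detect_events(frames):
    events, prev = [], None
    for i, frame in enumerate(frames):
        cur = features(frame)
        if prev is not None:
            d_level, d_pitch = abs(cur[0] - prev[0]), abs(cur[1] - prev[1])
            if d_level >= LEVEL_STEP_DB and d_pitch >= PITCH_STEP:
                magnitude = min(1.0, d_level / 20.0 + d_pitch)   # ad-hoc grading
                events.append((i, round(magnitude, 2)))
        prev = cur
    return events

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    quiet = [rng.normal(0, 0.01, 320) for _ in range(5)]
    loud = [0.3 * np.sin(2 * np.pi * 200 * np.arange(320) / 8000) + rng.normal(0, 0.01, 320)
            for _ in range(5)]
    print(detect_events(quiet + loud))   # an event at the quiet-to-loud boundary
```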
- Fig. 5A is a block diagram that shows examples of elements of an alternative system for determining events from audio waveforms.
- the types and numbers of components shown in Fig. 5A are merely shown by way of example. Alternative implementations may include more, fewer and/or different components.
- the system 500 may, for example, be an instance of, or a component of, a teleconferencing mixer such as the teleconferencing mixer 11 shown in Fig. 2 and described above.
- the system 500 may be a component of a teleconferencing server, e.g., a line card.
- the functionality of the system 500 may be implemented, at least in part, by one or more telephone endpoints.
- the system 500 may be implemented, at least in part, by a control system that may include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
- the system 500 may be implemented according to instructions (e.g., software) stored on one or more non-transitory media.
- non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc.
- the system 500 includes a feature extractor 401 and an event detector 402.
- the feature extractor 401 is capable of functioning as a voice activity detector (VAD).
- the features output by the feature extractor 401 include VAD results 504.
- the event detector 402 is capable of detecting the events 505 according to a different methodology than that described above with reference to Figs. 4A and 4B .
- Fig. 5B shows examples of features that may be extracted and events that may be detected by a system such as that shown in Fig. 5A .
- the same input waveforms 403 shown in Figure 4B are input to the feature extractor 401.
- the feature extractor 401 determines that the waveform portion 403a does not correspond to speech, but instead corresponds to background noise. Therefore, a negative VAD result is output at time t 1 .
- the feature extractor 401 outputs a VAD result 504a corresponding to the waveform portions 403b and 403c, beginning at time t 2 and extending to time t 3 .
- the feature extractor 401 outputs a VAD result 504b, beginning at time t 4 , that corresponds to the waveform portion 403d.
- the event detector 402 is capable of determining events that correspond with an onset of speech after a predetermined time interval of non-speech.
- the predetermined time interval of non-speech may vary according to the implementation.
- the predetermined time interval of non-speech may be 2 seconds, 3 seconds, 5 seconds, 10 seconds, 15 seconds, 20 seconds, 30 seconds, 60 seconds, etc.
- the predetermined time interval of non-speech may correspond with ⁇ T trigger , which is described above with reference to Fig. 2 .
- the event detector 402 detects only a single event 505. In this instance, the event detector 402 outputs binary events. According to this example, the event detector 402 does not detect an event at time t 1 because the feature extractor 401 has determined that the waveform portion 403a does not correspond to speech and therefore no VAD result was output at time t 1 . In this implementation, the event detector 402 detects an event 505 at time t 2 , corresponding with the beginning of the VAD result 504a, because this feature corresponds to an onset of speech after a predetermined time interval of non-speech. In this example, the predetermined time interval of non-speech is greater than the time interval between time t 3 and time t 4 .
- the event detector 402 does not detect an event at time t 4 , corresponding with the beginning of the VAD result 504b, because this feature corresponds to an onset of speech after a time interval of non-speech that is shorter than the predetermined time interval of non-speech.
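- A minimal sketch of this style of detector, operating on per-frame VAD decisions, is given below. The frame period and the quiet interval are assumed values chosen only to make the example run.
```python
# Sketch of a Fig. 5A/5B style detector: given per-frame VAD decisions, emit a
# binary event only when speech begins after at least MIN_SILENCE seconds
# without speech. Both constants are assumptions.
FRAME = 0.02          # seconds per frame
MIN_SILENCE = 10.0    # required quiet interval before an onset counts

def onset_events(vad_frames):
    events, silence, prev = [], MIN_SILENCE + 1.0, False
    for i, vad in enumerate(vad_frames):
        if vad and not prev and silence >= MIN_SILENCE:
            events.append(i)                   # onset after a long enough quiet spell
        silence = 0.0 if vad else silence + FRAME
        prev = vad
    return events

if __name__ == "__main__":
    quiet = [False] * int(12 / FRAME)
    burst = [True] * int(1 / FRAME)
    short_gap = [False] * int(2 / FRAME)
    timeline = quiet + burst + short_gap + burst
    print(onset_events(timeline))   # only the first onset qualifies; the second gap is too short
```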
- Figs. 6A-6C show different system topologies for implementing feature extractors and event detectors.
- the types and numbers of components shown in Figs. 6A-6C are merely shown by way of example. Alternative implementations may include more, fewer and/or different components.
- the systems 600A-600C may be implemented, at least in part, by control systems that may include one or more general purpose single- or multi-chip processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistor logic, and/or discrete hardware components.
- the systems 600A-600C may be implemented according to instructions (e.g., software) stored on one or more non-transitory media.
- non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc.
- the systems 600A-600C include telephone endpoints 601A-601C and unpack units 603A-603C.
- Each of the unpack units 603A-603C may, in some implementations, have functionality similar to one of the unpack units 35-37 that are described above with reference to Fig. 2 .
- the unpack units 603A-603C may be part of a teleconferencing mixer such as those disclosed elsewhere herein.
- the telephone endpoints 601A-601C may include one or more microphones (not shown) for converting sound into input waveforms.
- the telephone endpoint 601A includes a feature extractor 401A and the unpack unit 603A includes an event detector 402A.
- the feature extractor 401A is capable of VAD functionality.
- the feature extractor 401A is capable of receiving the input waveforms 610A and outputting VAD results 504A to the multiplexer 606A.
- the audio encoder 604A is capable of encoding the input waveforms 610A and outputting encoded audio data 607A to the multiplexer 606A.
- the multiplexer 606A is capable of combining the VAD results 504A and the encoded audio data 607A.
- the telephone endpoint 601A is capable of outputting an uplink stream 605A to the network 602.
- the unpack unit 603A includes a demultiplexer 609A that is capable of receiving the uplink stream 605A and of separating the VAD results 504A from the encoded audio data 607A.
- the demultiplexer 609A is capable of outputting the VAD results 504A to the event detector 402A, which is capable of detecting and outputting the events 405A.
- the demultiplexer 609A is capable of outputting the encoded audio data 607A to the decoder 608A, which is capable of decoding the audio data 607A and outputting decoded audio data 613A.
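- The split of Fig. 6A might be illustrated with the toy framing below, in which a one-byte VAD flag is multiplexed ahead of the (here trivially "encoded") audio payload and separated again at the unpack unit so the flag can drive event detection. Real systems would carry such metadata in codec or transport headers; the byte layout here is invented purely for illustration.
```python
# Toy uplink framing: [flag byte][16-bit PCM payload]. Illustrative only.
import struct
import numpy as np

def pack_uplink(vad_flag, audio_frame):
    payload = (np.clip(audio_frame, -1, 1) * 32767).astype("<i2").tobytes()
    return struct.pack("<B", int(vad_flag)) + payload

def unpack_uplink(packet):
    vad_flag = bool(struct.unpack_from("<B", packet, 0)[0])
    audio = np.frombuffer(packet[1:], dtype="<i2").astype(np.float32) / 32767.0
    return vad_flag, audio

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    frame = rng.normal(0, 0.1, 320)
    packet = pack_uplink(True, frame)
    flag, decoded = unpack_uplink(packet)
    print(flag, len(decoded), round(float(np.max(np.abs(decoded - np.clip(frame, -1, 1)))), 4))
```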
- the system 600B includes a telephone endpoint 601B and an unpack unit 603B.
- the telephone endpoint 601B includes an audio encoder 604B that is capable of encoding the input waveforms 610B and outputting encoded audio data 607B, which is provided in the uplink stream 605B to the network 602.
- the unpack unit 603B includes a decoder 608B, which is capable of decoding the uplink stream 605B and outputting decoded audio data 613B.
- the unpack unit 603B includes a feature extractor 401B, which is capable of receiving the decoded audio data 613B and extracting the features 404.
- the feature extractor 401B is capable of outputting the features 404 to the event detector 402B, which is capable of detecting and outputting the events 405B.
- the telephone endpoint 601C includes a feature extractor 401C and an event detector 402C.
- the feature extractor 401C is capable of VAD functionality.
- the feature extractor 401C is capable of receiving the input waveforms 610C and outputting VAD results 504C to the multiplexer 606C and to the event detector 402C.
- the audio encoder 604C is capable of encoding the input waveforms 610C and outputting encoded audio data 607C to the multiplexer 606C.
- the event detector 402C is capable of detecting events 405C, based on the VAD results 504C, and of outputting the events 405C to the multiplexer 606C.
- the multiplexer 606C is capable of combining the VAD results 504C, the events 405C and the encoded audio data 607C, all of which are provided to the network 602 in the uplink stream 605C.
- the unpack unit 603C includes a demultiplexer 609C that is capable of receiving the uplink stream 605C and of separating the VAD results 504C and the events 405C from the encoded audio data 607C.
- the demultiplexer 609C is capable of outputting the encoded audio data 607C to the decoder 608C, which is capable of decoding the encoded audio data 607C and outputting decoded audio data 613C.
- the preferred embodiments provide a method and system for masking audio conference transitions by monitoring the audio environment for a suitable trigger and delaying the transitions until such time as the trigger occurs.
- any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others.
- the term comprising, when used in the claims should not be interpreted as being limitative to the means or elements or steps listed thereafter.
- the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B.
- Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
- exemplary is used in the sense of providing examples, as opposed to indicating quality. That is, an "exemplary embodiment" is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.
- an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic or optical signal, or any suitable combination thereof.
- a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
- the program code may execute entirely on the user's computer as a stand-alone software package, or partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephonic Communication Services (AREA)
- Mobile Radio Communication Systems (AREA)
- Telephone Function (AREA)
Description
- This application claims the benefit of priority to United States Provisional Patent Application No. 61/946,030, filed 28 February 2014.
- The present invention relates to the field of audio teleconferencing, and, in particular, discloses the utilisation of change blindness mechanisms to mask changes in teleconferencing.
- Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.
- Video and audio teleconferencing systems where multiple parties interact remotely to carry out a conference are an important resource.
- Many systems are known. Most rely on a central or distributed server resource to ensure each participant is able to hear and/or see the other participants using, for example, dedicated teleconferencing devices, standard computer resources with audio input/output facilities or Smart Phone type devices. The distributed server resource is responsible for appropriately mixing uplinked audio signals together from each conference participant and downlinking the audio signals for playback by each audio output device.
- By way of background, in a typical (known) teleconferencing system a mixer receives a respective 'uplink stream' from each of the telephone endpoints, which carries an audio signal captured by that telephone endpoint, and sends a respective 'downlink stream' to each of the telephone endpoints. Thus each telephone endpoint receives a downlink stream which is able to carry a mixture of the respective audio signals captured by the other telephone endpoints. Accordingly, when two or more participants in a telephone conference speak at the same time, the other participant(s) can hear both participants speaking.
- United States Patent No. US 6,976,055 B1 discloses a method and apparatus for conducting a transfer of a conference call. Therein, a media gateway receives a message to transfer a conference call from a first call resource to a second call resource. If the message indicates a change in the number of clients participating in the conference call, the media gateway simultaneously transfers the conference call and plays a prompt to the clients indicating the change. If the message does not indicate the change, the media gateway transfers the conference call in response to detecting a period of silence.
- It is known (and usually desirable) for the mixer to employ an adaptive approach whereby it changes the mixing in response to perceiving certain variations in one or more of the audio signals. For example, an audio signal may be omitted from the mixture in response to determining that it contains no speech (i.e. only background noise). But changing the mixing at the wrong time may lead to disconcerting artefacts being heard by the participants.
- Various methods, devices, apparatus and systems disclosed herein may provide an improved form of audio conferencing mixing.
- In accordance with a first aspect of the present disclosure, there is provided in an audio conferencing mixing system of the type taking a plurality of audio input streams of input audio information of conference participants, including mixing transition events and outputting a plurality of audio output streams including output audio information, a method of mixing the audio output streams so as to reduce the detectability of the mixing transition events, as recited in claim 1.
- The mixing transition events can include changes in the audio input stream encoding which would be noticeable to a listening participant when listening in isolation.
- Preferably, the masking trigger can include at least one of: the onset or cessation of speech; a predetermined change in speech characteristics, or the onset of simultaneous speech by a predetermined number of participants. The scheduling can comprise delaying the occurrence of the transition event until the masking trigger occurs.
- In some embodiments, the masking trigger can comprise the utterance of predetermined text by at least one of the conference participants. In some embodiments, the presence of an increase in volume and/or predetermined spectral flux in one of the audio input streams can be indicative of a masking trigger in the one of the audio input streams. The onset or cessation of speech can be denoted by a change in value of a voice activity flag in one of the audio input streams. In some embodiments, the masking event can be determined by an auditory scene analysis of the series of audio input streams.
- The audio input streams can include at least one CTX (continuous transmission) audio input stream and at least one DTX (discontinuous transmission) audio input stream.
- In accordance with respective further aspects of the present disclosure, there is provided an audio conferencing mixing system as recited in claim 7 and a computer readable medium as recited in claim 8.
- Various embodiments disclosed herein may have particular application where the system and server are able to integrate spatial and/or more continuous audio signals into the mixer and the presented scene. Specifically, the embodiments may be of use where there is a desire for scalability and therefore lower computational complexity and/or bandwidth usage. The embodiments may also be of value in the absence of system constraints, where the use is primarily to achieve a degree of perceptual scene complexity reduction, which must also occur by changing the presence and contribution of different participant audio signals to the mix. Furthermore, there is a case for using such a system where the actions and changes to presented scenes are due to incoming control signals from other factors or user control input.
In such cases, the use of the embodiments may lessen the impact of activities such as sound stream or object termination, level adjustment, changes to spatial render properties, changes to processing, or any other change that would normally result in a sudden change to a perceived property of the audio stream that would be unexpected and therefore problematic for achieving the goal of perceptual continuity.
- Exemplary embodiments will now be described, by way of example only, with reference to the accompanying drawings in which:
- Fig. 1 illustrates schematically one form of adaptive mixing arrangement of the preferred embodiment;
- Fig. 2 illustrates the teleconference mixer of the preferred embodiment;
- Fig. 3 illustrates a timeline of an example sequence of operations of the teleconference mixer of the preferred embodiment;
- Fig. 4A is a block diagram that shows examples of elements of a system for determining events from audio waveforms;
- Fig. 4B shows examples of input waveforms and corresponding features and events that may be generated by a system such as that shown in Fig. 4A;
- Fig. 5A is a block diagram that shows examples of elements of an alternative system for determining events from audio waveforms;
- Fig. 5B shows examples of features that may be extracted and events that may be detected by a system such as that shown in Fig. 5A; and
- Figs. 6A-6C show different system topologies for implementing feature extractors and event detectors.
- The preferred embodiment operates in an environment for audio teleconferencing (with or without an associated video stream).
- An exemplary audio teleconferencing system is illustrated 1 in
Fig. 1 . In this arrangement a series of conference participants collectively provide audio input and output. For example, in the arrangement 1, afirst participant 2 uses a pair ofheadphones 5 and input microphone 3 interconnected to computer 6 for conference participation. The computer 6 provides uplink 8 and downlink 7 connections over a network 9, withmixer 11. - A second group of participants e.g. 20 use an
audio device 21 which provides audio output including spatialization information. Theaudio device 21 also provides internal computational and communication abilities and includesuplink 23 and downlink 24 channels which interconnect vianetwork mixer 11. Additional participants can also be interconnected to the mixer via other means. - The arrangement of
Fig. 1 includes a plurality ofconference participants 2 utilising DTX endpoints, exemplified by thebinaural headset 5 with boom microphone 3. Each of said plurality of DTX endpoints asserts 10 a DTX uplink stream 8 to theteleconferencing mixer 11, typically via a network 9. The mixer produces a downlink stream 7 for each DTX endpoint, which is transmitted back to theendpoint 2 over the network 9 to be heard by theparticipant 2. - Each of a plurality of CTX endpoints, exemplified by
speakerphone device 21, captures the speech 27 of a further plurality ofconference participants 20. Non-trivial background noise may also be captured by such devices. Each of the said plurality of CTX endpoints asserts aCTX uplink stream 26 to themixer 11, typically via anetwork 25. Without loss of generality,network 25 may be the same network as that used by the DTX endpoints. Themixer 11 produces adownlink stream 23 for each CTX endpoint, which is transmitted back to theendpoint 21 over thenetwork 25 for playback to a plurality ofparticipants 20. - In the teleconferencing system, each of the participant endpoints send an uplink audio stream to a teleconferencing mixer and receives a downlink stream therefrom. In such a system, the uplinks and downlinks may be encoded digitally and transmitted via a suitable packet-switched network, such as a voice over internet protocol (VoIP) network, or they may travel over a circuit-switched network, such as the public switched telephone network (PSTN). Either way, it is the mixer's 11 responsibility to produce a downlink audio stream to send back to each endpoint such that, in general, each participant hears every other participant except himself.
- One class of endpoint in such a system employs discontinuous transmission (DTX) on the uplink. Such an endpoint attempts to maximise intelligibility while minimising the use of network resources by one of more of: employing microphone placements close to the talkers' mouths; noise suppression signal processing which remove background noise; only sending the uplink stream when human speech is present.
- This strategy can result in less aberrant noise being heard by the listener, but it can also result in a less natural-sounding experience, firstly because noise suppression signal processing typically results in the introduction of disturbing dynamic artefacts when the background noise is non-stationary, secondly because the noise suppression affects the equalisation of the speech and thirdly because the binary transmit/don't transmit decision, based on imperfect information from an associated voice activity detector (VAD), will sometimes lead to speech being cut off and at other times lead to residual noise being transmitted as speech.
- A second class of endpoint employs continuous transmission (CTX) on the uplink. That is, they send an audio stream regardless of whether the VAD (if present) determines that speech is present or not. Here the intention is often to maximise the naturalness of the listening experience and allow a remote listener to perform aspects of speech localisation or spatialization, just as if he or she were present in person. Accordingly, a CTX device may employ multiple microphones to retain spatial diversity to allow binaural release from masking. The designer of a CTX device may also seek to limit the amount of noise suppression processing that the device performs in order to minimise the potential for disturbing dynamic artefacts and spectral colouration.
- Generally, a DTX device seeks to remove, suppress or otherwise avoid transmitting anything it deems not to constitute human speech, whereas a CTX device seeks to be transparent, transmitting everything using the most perceptually continuously and relevantly manner possible.
- It is important to bear these intents in mind when designing a teleconferencing mixer. Since a DTX endpoint's uplink is substantially silent when no speech is detected, a
mixer 11 may be able to freely discard its uplink stream when speech is not detected without perceptual consequence for the listener. However, when forming a downlink mix that contains a CTX stream, the mixer must be careful in how it applies mixing transitions to the stream. For example, discarding a CTX stream when talk is not detected may be readily noticed by a listener because the background noise associated with that stream may be heard to turn off, especially if no other CTX stream is present to mask the transition. The listener may be left wondering whether the system has failed, or whether the CTX endpoint has disconnected from the conference. The goal of providing a natural listening experience would not be met in such a case. - Generally, the goal of a
teleconferencing mixer 11 is to allow each participant to hear the speech from every other participant, but not from himself. There are, however, some nuances to this goal. For example, if many CTX streams, each containing background noise, are heard simultaneously by a listener, the total background noise power heard may increase to a point where it is distracting or detrimental to intelligibility. Consider as a further example where multiple uplink streams all talking at the same time. The result may be too cacophonous to facilitate useful communication. It may be better to let only the two or three most perceptually relevant streams through in this case. Many authors, including the present authors, have proposed methods for achieving this. For example, European Patent Publication No.EP 1 855 455 to Enbom , et al. discloses one such methodology and International Patent Application No.PCT/US2013/061658 filed 25 September 2013 , also discloses a second way of achieving this. - In addition, from the ability to simply allow or mute an uplink in a downlink mix dynamically, a
teleconferencing mixer 11 may furthermore be able to effect other kinds of mixing transitions. For example, it may be able to lower the coding bitrate or audio fidelity of an uplink stream in a downlink mix, or (when the uplink contains spatial diversity from multiple microphones) it may be able to adjust the spatial fidelity with which an uplink stream is heard in a downlink mix. If the downlink is presented to the listener using a spatial audio system, such as one that renders over an array of speakers or performs virtualisation over headphones using head-related transfer functions (HRTFs) or the like, the mixer may furthermore be able to affect the perceived position or region in space from which a stream appears to the listener to emanate. - Regardless of exactly which mixing transitions are available to a mixer in a particular teleconferencing system, the mixer may make such transitions dynamically, based on the behaviour of the participants and endpoints in the conference, and some types of transitions may be noticeable or disconcerting to a listener when applied carelessly. The preferred embodiments include a novel class of methods for handling mixing transitions at a teleconferencing server in such a way that the transition is not readily noticeable by a listener, thereby preserving perceptual continuity and naturalness in the listening experience. To do so, use is made of the phenomenon of selective attention in human auditory scene analysis.
- The phenomenon of selective attention can perhaps be most immediately understood by analogy to the concepts of change blindness or inattentional blindness in visual perception studies. For example, inattentional blindness is well illustrated by Simons and Chabris' famous "invisible gorilla" experiment (Most, SB; Simons, DJ; Scholl, BJ; Jimenez, R; Clifford, E; Chabris, CF (January 2001). "How not to be seen: the contribution of similarity and selective ignoring to sustained inattentional blindness". Psychol Sci 12 (1): 9-17. doi:10.1111/1467-9280.00303. PMID 11294235; see also www.invisiblegorilla.com), in which viewers of a video of a basketball match, when told to count the number of times the ball is passed, fail to notice a person in a gorilla suit walk into the centre of the screen and wave. The would-be gorilla is highly visible and would in other circumstances be immediately noticed, but often escapes the viewer's notice completely because their attention is diverted elsewhere.
- In the preferred embodiment, a method is provided for making mixing transitions in a teleconferencing mixer that would otherwise be immediately noticed, but which go unnoticed because they are synchronised to coincide with some other event which captures the listener's attention - for example, the joining of a new participant to the conference or the onset of speech from a participant who has not talked for some time.
- The preferred embodiment thereby provides a class of methods for improving the perceived continuity in a downlink audio stream, making use of the concepts of selective attention and change blindness. Each method of the class can be implemented in a teleconferencing mixer. In some embodiments the teleconferencing mixer may reside in one or more central servers. In other embodiments the teleconferencing mixer may reside in one or more of the endpoints.
- As is known in the art of teleconferencing facilities, for each conference hosted by the
mixer 11, an uplink stream is received from each endpoint. The mixer produces a downlink stream for each endpoint. Without loss of generality, examples of mixing techniques a mixer may employ to form a downlink stream from a plurality of uplink streams include the following (a brief illustrative sketch of these operations is given after this list): - Mixing: Decoding, or partially decoding, uplink streams, summing together the decoded, or partially decoded, audio signals and reencoding a downlink stream.
- Transcoding: Decoding an uplink and reencoding to form a component of a downlink.
- Forwarding: Copying all or part of the encoded information in an uplink stream into a downlink stream.
- Metadata adjustment: Adding, removing or modifying metadata associated with an uplink stream so as to alter the manner in which it will be rendered to the participants listening at the downlink endpoint.
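- As a rough illustration of the "Mixing" and "Forwarding" techniques listed above, the following Python sketch forms a downlink from decoded uplinks. The `decode` and `encode` functions are trivial placeholders standing in for a real audio codec, and the gain values are arbitrary; nothing here is prescribed by the patent.

```python
import numpy as np

# Placeholder "codec": in a real system these would be calls into an
# audio codec; here decoding simply returns the stored samples.
def decode(packet: np.ndarray) -> np.ndarray:
    return packet.astype(np.float32)

def encode(samples: np.ndarray) -> np.ndarray:
    return np.clip(samples, -1.0, 1.0)

def mix_downlink(uplink_packets, gains):
    """Decode each uplink packet, apply its mixing gain, sum, and reencode."""
    mixed = None
    for packet, gain in zip(uplink_packets, gains):
        pcm = decode(packet) * gain
        mixed = pcm if mixed is None else mixed + pcm
    return encode(mixed)

def forward_downlink(uplink_packet):
    """Forwarding: copy the encoded uplink data into the downlink unchanged."""
    return uplink_packet

# Example: mix two uplinks for a downlink that excludes the listener's own uplink.
a = np.random.uniform(-0.1, 0.1, 160)   # 20 ms at 8 kHz from endpoint X (assumed)
b = np.random.uniform(-0.1, 0.1, 160)   # from endpoint Z (assumed)
downlink_for_Y = mix_downlink([a, b], gains=[1.0, 1.0])
```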
- From time to time, the mixer performs transitions when mixing a particular downlink stream. A transition is any change to the downlink audio stream which would be audible and noticeable to a listening participant at a downlink endpoint if effected in isolation (that is, without any masking event). Without loss of generality, examples of mixing transitions include:
- Turning on or off, or fading in or out, an uplink stream in a mixed downlink stream;
- Beginning or ceasing forwarding of an uplink stream as a component of a downlink stream;
- Changing the spatial fidelity or representation of an uplink stream in a downlink stream;
- Changing the audio quality (for example, by means of adjusting the coding bitrate) of an uplink stream as a component of a downlink stream;
- Changing the perceived position of an uplink stream in a downlink stream's spatial scene, when the downlink is capable of spatial or positional audio rendering;
- Dropping or raising the gain of a particular uplink stream in a downlink mix by a step change;
- Switching or crossfading from the actual uplink audio stream to a synthetically generated noise field designed to be spectrally and (where applicable) spatially similar to the noise present in the uplink stream (see the sketch after this list).
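- The last transition in this list can be sketched as follows: estimate an average magnitude spectrum from frames already classified as background noise, synthesise a noise field with that spectrum by overlap-adding random-phase frames, and crossfade from the uplink audio to the synthetic noise. The frame size, windowing and rough level normalisation below are assumptions; a practical implementation would also match the noise level and, for multichannel uplinks, its spatial character.

```python
import numpy as np

FRAME = 512
HOP = FRAME // 2
WINDOW = np.hanning(FRAME)

def noise_magnitude_profile(noise_frames):
    """Average magnitude spectrum over frames classified as non-speech."""
    specs = [np.abs(np.fft.rfft(WINDOW * f)) for f in noise_frames]
    return np.mean(specs, axis=0)

def synth_matched_noise(magnitude, num_samples):
    """Overlap-add random-phase frames shaped by the estimated magnitude."""
    out = np.zeros(num_samples + FRAME)
    for start in range(0, num_samples, HOP):
        phase = np.exp(2j * np.pi * np.random.rand(magnitude.size))
        frame = np.fft.irfft(magnitude * phase, FRAME)
        out[start:start + FRAME] += WINDOW * frame
    # Rough level normalisation; a real implementation would calibrate this.
    return out[:num_samples] / 1.5

def crossfade(uplink_audio, matched_noise, fade_samples):
    """Linearly crossfade from the uplink audio to the synthetic noise."""
    ramp = np.linspace(0.0, 1.0, fade_samples)
    out = matched_noise.copy()
    out[:fade_samples] = (1.0 - ramp) * uplink_audio[:fade_samples] + ramp * matched_noise[:fade_samples]
    return out
```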
- The mixer performs one or more transitions in response to a trigger. A trigger is an event derived from the state of the conference. Without loss of generality, examples of triggers include: 1) A VAD flag on an uplink stream signalling the onset or cessation of speech on that uplink. The VAD may be implemented in the sending client with the result included in metadata in the uplink stream. Alternatively, the VAD may be implemented in the mixer and make its speech presence determination based on the encoded or decoded audio included in the uplink stream; 2) A heuristic derived from VAD information. For example, if a verbosity metric is employed by the mixer and crosses below a threshold for a certain uplink endpoint, a fade-out transition can be triggered on a downlink (a sketch of such a heuristic follows this paragraph); 3) The onset or cessation of talkburst transmission from an endpoint with a DTX uplink; 4) A maximum number of simultaneous talkers being exceeded.
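- One plausible form of such a verbosity heuristic, sketched below with invented constants, is a leaky accumulator driven by the VAD flag: verbosity rises while speech is flagged, decays during silence, and a fade-out trigger is raised when it crosses below a threshold. The cited application describes a more sophisticated measure; this is only an illustration.

```python
class VerbosityTrigger:
    """Leaky accumulator of VAD activity for one uplink.

    Verbosity rises while speech is flagged and decays during silence.
    A trigger is reported when verbosity crosses below the threshold.
    """

    def __init__(self, rise=0.05, decay=0.01, threshold=0.2):
        self.rise = rise
        self.decay = decay
        self.threshold = threshold
        self.verbosity = 1.0

    def update(self, vad_flag: bool) -> bool:
        if vad_flag:
            self.verbosity = min(1.0, self.verbosity + self.rise)
        else:
            self.verbosity = max(0.0, self.verbosity - self.decay)
        # Trigger (e.g. start a fade-out) once the talker has gone quiet enough.
        return self.verbosity < self.threshold

# Example: 200 silent 20 ms frames eventually push verbosity below the threshold.
trig = VerbosityTrigger()
fired = [trig.update(False) for _ in range(200)]
print(fired.index(True))  # first frame at which a fade-out would be triggered
```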
- Different forms of characterisation of the verbosity metric or heuristic can be used, including simple measures such as power spectrum analysis of each channel. One more complex measure of verbosity is described in International Patent Application No. PCT/US2013/061658 filed 25 September 2013. - Upon assertion of a trigger, a key aspect of the preferred embodiment is that the mixer waits until a suitable masking event occurs before applying any transition that results from that trigger. A masking event is any event that may capture a listener's attention or otherwise mask a transition. Without loss of generality, examples of masking events include: 1) A VAD on an uplink stream signalling the onset or cessation of speech on that uplink. The onset of speech on a particular uplink may be especially valuable as a masking event if speech has not been present on that uplink for some time. Therefore, onset events may be graded or gated based on the length of time since speech was last detected on the uplink. 2) The presence of particular syllables, words or classes of speech as determined by a speech recognition or other classification algorithm implemented on the uplink endpoint (with the result embedded in the uplink stream) or on the mixer. 3) Jumps in the volume level, spectral flux, or other such heuristics based on the audio available in the uplink stream, or based on the microphone signal from which it was derived. 4) Events signalled using existing auditory scene analysis-based techniques such as those employed in products such as Dolby Volume and as outlined in
U.S. Patent 8,396,574 and U.S. Patent 8,428,270. - In one class of embodiments, the masking events are binary. That is, an event either is, or is not, present. A pending transition will simply be made upon assertion of a masking event. In a second class of embodiments, events can be graded according to an event magnitude, which is an estimate of how effectively the event will capture a listener's attention. This magnitude is used to control how a transition is made. For example, a large magnitude event might cause a fade transition to occur over a short period of time, while a small magnitude event might cause a fade transition to occur over a long period of time. Consider, as a further example, a mixer that wants to attenuate an uplink in a downlink mix in a series of step gain-change transitions as the result of a trigger. In this case, the amount of attenuation applied in each transition could be a function of the corresponding event magnitude.
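- As a concrete, purely illustrative mapping from event magnitude to transition parameters, the following sketch scales a fade duration and a step attenuation with a magnitude assumed to be normalised to the range 0 to 1. The specific ranges are assumptions made for the sake of the example.

```python
def fade_duration_for_event(magnitude: float,
                            min_fade_s: float = 0.5,
                            max_fade_s: float = 5.0) -> float:
    """Large-magnitude events permit a fast fade; weak events get a slow one."""
    m = min(max(magnitude, 0.0), 1.0)
    return max_fade_s - m * (max_fade_s - min_fade_s)

def attenuation_step_for_event(magnitude: float,
                               max_step_db: float = 6.0) -> float:
    """Apply a gain step (in dB) proportional to the event magnitude."""
    return -max_step_db * min(max(magnitude, 0.0), 1.0)

print(fade_duration_for_event(0.9))      # strong event: roughly a 0.95 s fade
print(attenuation_step_for_event(0.3))   # weak event: a -1.8 dB step
```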
- Without loss of generality, examples of properties upon which an event magnitude could be based include: the volume level of speech in an uplink; the volume level at the onset of speech in a talkburst; the magnitude of an event in a Dolby Volume-style event detector; the confidence that a particular word, syllable or phrase has been detected in an uplink stream; and the time elapsed at the start of a talkburst since the end of the previous talkburst on an uplink.
- Whilst the mixer manages teleconference calls on demand,
Fig. 2 illustrates schematically one form of teleconferencing mixer 11. A plurality of uplink streams, some DTX (31, 32), some CTX (33), are asserted to the mixer 11. Each of the uplink streams passes through an unpack unit (35, 36, 37), which serves to extract a VAD signal 38 and audio information 40 from the uplink stream, and to identify masking events 39 as described below. The mixer produces a plurality of downlink streams 42, 43, 44. Shown in the figure is the mixing apparatus 46 associated with downlink 43. Not shown is similar apparatus which exists for each of the other downlinks. The control unit 47 of the mixer 11 for this downlink operates on the VAD and masking event signals produced by the unpack units to derive a gain for each uplink to be included in the mix; no gain is derived for uplink 32, because downlink 43 will be heard by the same endpoint Y that generated uplink 32. These gains are used to scale and mix 50 the audio from the uplinks to produce a final audio stream suitable for repacking and encoding 51 back through the downlink 43. - In this preferred embodiment, the masking event output (e.g. 39) of the corresponding unpack unit is asserted for a short period (for example 20 ms) when the corresponding VAD signal transitions from low (no speech detected) to high (speech detected) after being low for a period exceeding a threshold ΔTevent, which for example could be set to 10 seconds.
- The behaviour of the control unit 47 with respect to DTX uplinks (e.g. 31) is to set the corresponding gain to 1 whenever the associated VAD signal is high. That is, DTX endpoints are mixed into the downlink whenever they are sending speech. The behaviour of the control unit with respect to CTX endpoints 33 is to deassert an internal trigger signal whenever the amount of time that has elapsed since the VAD flag of the corresponding uplink was high exceeds the threshold ΔTtrigger, which for example could be set at 60 seconds. The trigger signal is asserted whenever the corresponding VAD flag is high. When the trigger signal is deasserted, the control unit waits until the masking event signal corresponding to any of the other endpoints is asserted before applying a transition, which in the case of this preferred embodiment involves slewing the gain of the CTX endpoint down from 1 to 0 over an amount of time ΔTtransition, which for example could be set to 3 seconds.
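- A minimal sketch of this control-unit behaviour for a single CTX uplink in a single downlink mix might look as follows, evaluated once per processing frame. The class name, the 20 ms frame period and the default values (corresponding to ΔTtrigger and ΔTtransition above) are illustrative assumptions rather than a definitive implementation.

```python
class CtxGainControl:
    """Gain control for one CTX uplink within one downlink mix.

    The uplink is held at unit gain while recent speech is present.
    Once silence exceeds dt_trigger, the control waits for a masking
    event from any other endpoint and then slews the gain to zero
    over dt_transition.
    """

    def __init__(self, frame_s=0.02, dt_trigger=60.0, dt_transition=3.0):
        self.frame_s = frame_s
        self.dt_trigger = dt_trigger
        self.dt_transition = dt_transition
        self.silence_s = 0.0
        self.gain = 1.0
        self.fading = False

    def step(self, vad_flag: bool, masking_event_elsewhere: bool) -> float:
        if vad_flag:
            # Speech on this CTX uplink: keep (or restore) it in the mix.
            self.silence_s = 0.0
            self.fading = False
            self.gain = 1.0
        else:
            self.silence_s += self.frame_s
            trigger_deasserted = self.silence_s > self.dt_trigger
            if trigger_deasserted and not self.fading and masking_event_elsewhere:
                self.fading = True  # start the transition under cover of the event
            if self.fading and self.gain > 0.0:
                self.gain = max(0.0, self.gain - self.frame_s / self.dt_transition)
        return self.gain
```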
- Fig. 3 illustrates an example of a timeline 60 of operations for the embodiment described above. The sequence starts with CTX endpoint Z finishing a talkburst 61. After this, two talkbursts 62, 63 are detected from endpoint X. Y now talks for a time and three talkbursts are detected. The trigger event 63 then occurs because no speech has been detected from CTX endpoint Z for a period exceeding ΔTtrigger. The control unit is now in a state where it will begin a transition upon the next instance of a masking event, instead of fading out immediately 71. A masking event 68 occurs when Y begins talking 64, but this is of no significance to the control unit, because it occurs before the trigger and because the mixer is currently servicing the downlink for the endpoint Y in question. When X recommences talking 67 after a period of silence 70 longer than ΔTevent, a second event is signalled 69, upon receipt of which the control unit starts the transition, fading uplink Z out 72 in downlink Y over a number of seconds.
- Fig. 4A is a block diagram that shows examples of elements of a system for determining events from audio waveforms. The types and numbers of components shown in Fig. 4A are merely shown by way of example. Alternative implementations may include more, fewer and/or different components. The system 400 may, for example, be an instance of, or a component of, a teleconferencing mixer such as the teleconferencing mixer 11 shown in Fig. 2 and described above. In some implementations, the system 400 may be a component of a teleconferencing server, e.g., a line card. However, as described in more detail below with reference to Figs. 6A-6C, in some implementations the functionality of the system 400 may be implemented, at least in part, by one or more telephone endpoints. The system 400 may be implemented, at least in part, by a control system that may include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components. In some implementations, the system 400 may be implemented according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. - In the example shown in Fig. 4A, the system 400 includes a feature extractor 401 and an event detector 402. Here, the feature extractor 401 is shown receiving input waveforms 403. In some examples, the waveforms 403 may correspond to speech and/or background noise. The waveforms 403 may vary according to the particular implementation. For example, if the feature extractor 401 is implemented in a teleconferencing mixer, a teleconferencing server, or a similar device, the waveforms 403 may be unpacked and decoded waveforms from an uplink stream. However, if the feature extractor 401 is implemented in a telephone endpoint, the waveforms 403 may be raw microphone signals or pre-processed microphone signals. - In this implementation, the
feature extractor 401 is capable of analysing input waveforms 403 and producing output corresponding to one or more types of features 404. Some examples are shown in Fig. 4B and are described below. - In this example, the event detector 402 is capable of analysing the features 404 and producing output corresponding to one or more types of events 405. In some implementations, the events 405 may be masking events as disclosed elsewhere herein. Accordingly, in some examples the events 405 may correspond with the onset of speech, the cessation of speech, the presence of particular syllables, words or classes of speech, changes in the volume level, spectral flux, or other such heuristics, and/or criteria determined according to auditory scene analysis. In some implementations, the output of the event detector 402 may be "binary," indicating only whether an event is, or is not, present. However, in some examples, the output of the event detector 402 also may indicate an event magnitude, e.g., as described above.
- Fig. 4B shows examples of input waveforms and corresponding features and events that may be generated by a system such as that shown in Fig. 4A. In this example, the feature extractor 401 is capable of analysing input waveforms 403 and producing output corresponding to changes in level and changes in the pitch. Accordingly, in the example shown in Fig. 4B the features 404a correspond with changes in the level of the waveforms 403, whereas the features 404b correspond with changes in the pitch of the waveforms 403. - In this example, the event detector 402 has detected events 405a-405d at times t1-t4, which correspond with the waveform portions 403a-403d, respectively. According to this example, the output of the event detector 402 indicates an event magnitude, which is indicated by the length of the lines shown in Fig. 4B corresponding with the events 405a-405d. For example, the event 405a has a smaller magnitude than the event 405b. In this example, the event detector 402 has detected the events 405a-405d at times corresponding with significant changes (e.g., changes that are at or above predetermined thresholds) in both the level and the pitch of the waveforms 403.
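- The following sketch, under assumed thresholds and a deliberately crude autocorrelation pitch estimate, illustrates the idea of emitting a graded event only where both the level and the pitch change significantly between consecutive frames. It is not the detector used in Dolby Volume or in the cited patents; every constant is an assumption for illustration.

```python
import numpy as np

def frame_features(frame, sample_rate=16000):
    """Return (level_db, pitch_hz) for one frame of audio.

    Assumes frames of at least ~20 ms at 16 kHz so the autocorrelation
    search range (60-400 Hz) fits inside the frame.
    """
    level_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sample_rate / 400), int(sample_rate / 60)
    pitch_hz = sample_rate / (lo + int(np.argmax(corr[lo:hi])))
    return level_db, pitch_hz

def detect_events(frames, level_thresh_db=6.0, pitch_thresh_hz=40.0):
    """Emit (frame_index, magnitude) wherever both level and pitch jump."""
    events = []
    prev = None
    for i, frame in enumerate(frames):
        level, pitch = frame_features(frame)
        if prev is not None:
            d_level = abs(level - prev[0])
            d_pitch = abs(pitch - prev[1])
            if d_level >= level_thresh_db and d_pitch >= pitch_thresh_hz:
                # Crude normalisation of the combined change into [0, 1].
                magnitude = min(1.0, 0.5 * (d_level / 12.0 + d_pitch / 200.0))
                events.append((i, magnitude))
        prev = (level, pitch)
    return events
```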
- Fig. 5A is a block diagram that shows examples of elements of an alternative system for determining events from audio waveforms. The types and numbers of components shown in Fig. 5A are merely shown by way of example. Alternative implementations may include more, fewer and/or different components. The system 500 may, for example, be an instance of, or a component of, a teleconferencing mixer such as the teleconferencing mixer 11 shown in Fig. 2 and described above. In some implementations, the system 500 may be a component of a teleconferencing server, e.g., a line card. However, as described in more detail below with reference to Figs. 6A-6C, in some implementations the functionality of the system 500 may be implemented, at least in part, by one or more telephone endpoints. The system 500 may be implemented, at least in part, by a control system that may include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components. In some implementations, the system 500 may be implemented according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. - In the example shown in Fig. 5A, the system 500 includes a feature extractor 401 and an event detector 402. In this implementation, the feature extractor 401 is capable of functioning as a voice activity detector (VAD). Accordingly, in this example the features output by the feature extractor 401 include VAD results 504. As described below with reference to Fig. 5B, in this example the event detector 402 is capable of detecting the events 505 according to a different methodology than that described above with reference to Figs. 4A and 4B.
- Fig. 5B shows examples of features that may be extracted and events that may be detected by a system such as that shown in Fig. 5A. In the example shown in Fig. 5B, the same input waveforms 403 shown in Figure 4B are input to the feature extractor 401. In this implementation, the feature extractor 401 determines that the waveform portion 403a does not correspond to speech, but instead corresponds to background noise. Therefore, a negative VAD result is output at time t1. Here, the feature extractor 401 outputs a VAD result 504a corresponding to the waveform portions 403b and 403c. The feature extractor 401 also outputs a VAD result 504b, beginning at time t4, that corresponds to the waveform portion 403d. - In this example, the
event detector 402 is capable of determining events that correspond with an onset of speech after a predetermined time interval of non-speech. The predetermined time interval of non-speech may vary according to the implementation. For example, in some implementations the predetermined time interval of non-speech may be 2 seconds, 3 seconds, 5 seconds, 10 seconds, 15 seconds, 20 seconds, 30 seconds, 60 seconds, etc. According to some implementations, the predetermined time interval of non-speech may correspond with ΔTtrigger, which is described above with reference to Fig. 2. - In this example, the event detector 402 detects only a single event 505. In this instance, the event detector 402 outputs binary events. According to this example, the event detector 402 does not detect an event at time t1 because the feature extractor 401 has determined that the waveform portion 403a does not correspond to speech and therefore no VAD result was output at time t1. In this implementation, the event detector 402 detects an event 505 at time t2, corresponding with the beginning of the VAD result 504a, because this feature corresponds to an onset of speech after a predetermined time interval of non-speech. In this example, the predetermined time interval of non-speech is greater than the time interval between time t3 and time t4. Therefore, the event detector 402 does not detect an event at time t4, corresponding with the beginning of the VAD result 504b, because this feature corresponds to an onset of speech after a time interval of non-speech that is shorter than the predetermined time interval of non-speech.
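- In this VAD-driven style, the event detector reduces to a few lines: emit a binary event on a rising VAD edge only if the preceding silence lasted at least the predetermined interval. The 10-second default below is one of the example values mentioned above; the fixed frame period and class name are assumptions made for this sketch.

```python
class OnsetAfterSilenceDetector:
    """Binary event detector: speech onset after a long enough silence."""

    def __init__(self, frame_s=0.02, min_silence_s=10.0):
        self.frame_s = frame_s
        self.min_silence_s = min_silence_s
        self.silence_s = float("inf")  # treat the start of the call as silence
        self.prev_vad = False

    def step(self, vad_flag: bool) -> bool:
        # Event only on a rising edge preceded by enough non-speech.
        event = (vad_flag and not self.prev_vad
                 and self.silence_s >= self.min_silence_s)
        if vad_flag:
            self.silence_s = 0.0
        else:
            self.silence_s += self.frame_s
        self.prev_vad = vad_flag
        return event
```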
- Figs. 6A-6C show different system topologies for implementing feature extractors and event detectors. The types and numbers of components shown in Figs. 6A-6C are merely shown by way of example. Alternative implementations may include more, fewer and/or different components. The systems 600A-600B may be implemented, at least in part, by control systems that may include one or more general purpose single- or multi-chip processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistor logic, and/or discrete hardware components. In some implementations, the systems 600A-600B may be implemented according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. - In the examples shown in Figs. 6A-6C, the systems 600A-600C include telephone endpoints 601A-601C and unpack units 603A-603C. Each of the unpack units 603A-603C may, in some implementations, have functionality similar to one of the unpack units 35-37 that are described above with reference to Fig. 2. The unpack units 603A-603C may be part of a teleconferencing mixer such as those disclosed elsewhere herein. The telephone endpoints 601A-601C may include one or more microphones (not shown) for converting sound into input waveforms. - According to the implementation shown in
Fig. 6A, the telephone endpoint 601A includes a feature extractor 401A and the unpack unit 603A includes an event detector 402A. In this example, the feature extractor 401A is capable of VAD functionality. Accordingly, the feature extractor 401A is capable of receiving the input waveforms 610A and outputting VAD results 504A to the multiplexer 606A. In this implementation, the audio encoder 604A is capable of encoding the input waveforms 610A and outputting encoded audio data 607A to the multiplexer 606A. Here, the multiplexer 606A is capable of combining the VAD results 504A and the encoded audio data 607A. The telephone endpoint 601A is capable of outputting an uplink stream 605A to the network 602. - In the example shown in Fig. 6A, the unpack unit 603A includes a demultiplexer 609A that is capable of receiving the uplink stream 605A and of separating the VAD results 504A from the encoded audio data 607A. In this implementation, the demultiplexer 609A is capable of outputting the VAD results 504A to the event detector 402A, which is capable of detecting and outputting the events 405A. Here, the demultiplexer 609A is capable of outputting the encoded audio data 607A to the decoder 608A, which is capable of decoding the audio data 607A and outputting decoded audio data 613A. - In the example shown in
Fig. 6B, the system 600B includes a telephone endpoint 601B and an unpack unit 603B. According to this implementation, the telephone endpoint 601B includes an audio encoder 604B that is capable of encoding the input waveforms 610B and outputting encoded audio data 607B, which is provided in the uplink stream 605B to the network 602. - In the example shown in Fig. 6B, the unpack unit 603B includes a decoder 608B, which is capable of decoding the uplink stream 605B and outputting decoded audio data 613B. In this implementation, the unpack unit 603B includes a feature extractor 401B, which is capable of receiving the decoded audio data 613B and extracting the features 404. In this example, the feature extractor 401B is capable of outputting the features 404 to the event detector 402B, which is capable of detecting and outputting the events 405B. - According to the implementation shown in
Fig. 6C, the telephone endpoint 601C includes a feature extractor 401C and an event detector 402C. In this example, the feature extractor 401C is capable of VAD functionality. Accordingly, the feature extractor 401C is capable of receiving the input waveforms 610C and outputting VAD results 504C to the multiplexer 606C and to the event detector 402C. In this implementation, the audio encoder 604C is capable of encoding the input waveforms 610C and outputting encoded audio data 607C to the multiplexer 606C. In this example, the event detector 402C is capable of detecting events 405C, based on the VAD results 504C, and of outputting the events 405C to the multiplexer 606C. Here, the multiplexer 606C is capable of combining the VAD results 504C, the events 405C and the encoded audio data 607C, all of which are provided to the network 602 in the uplink stream 605C.
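- To make the multiplexing concrete, the toy frame format below packs a VAD flag, an event flag and the encoded audio payload into each uplink frame so that an unpack unit can recover the metadata without touching the audio. The one-byte flag field and length prefix are invented for this sketch and are not a format defined by the patent.

```python
import struct

def pack_uplink_frame(vad_flag: bool, event_flag: bool, encoded_audio: bytes) -> bytes:
    """Multiplex metadata and encoded audio into one uplink frame.

    Header: 1 flag byte (bit0 = VAD, bit1 = event) + 2-byte payload length.
    """
    flags = (1 if vad_flag else 0) | (2 if event_flag else 0)
    return struct.pack("!BH", flags, len(encoded_audio)) + encoded_audio

def unpack_uplink_frame(frame: bytes):
    """Demultiplex an uplink frame back into (vad_flag, event_flag, audio)."""
    flags, length = struct.unpack("!BH", frame[:3])
    return bool(flags & 1), bool(flags & 2), frame[3:3 + length]

frame = pack_uplink_frame(True, False, b"\x01\x02\x03")
print(unpack_uplink_frame(frame))  # (True, False, b'\x01\x02\x03')
```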
- In the example shown in Fig. 6C, the unpack unit 603C includes a demultiplexer 609C that is capable of receiving the uplink stream 605C and of separating the VAD results 504C and the events 405C from the encoded audio data 607C. In this implementation, the demultiplexer 609C is capable of outputting the encoded audio data 607C to the decoder 608C, which is capable of decoding the encoded audio data 607C and outputting decoded audio data 613C. - It will therefore be evident that the preferred embodiments provide a method and system for masking audio conference transitions by monitoring the audio environment for a suitable masking event and delaying the transitions until such time as that event occurs.
- Reference throughout this specification to "one embodiment", "some embodiments" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment", "in some embodiments" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
- In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
- As used herein, the term "exemplary" is used in the sense of providing examples, as opposed to indicating quality. That is, an "exemplary embodiment" is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.
- It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, FIG., or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
- Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
- Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
- Any combination of one or more computer readable mediums may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic or optical signal, or any suitable combination thereof.
- A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer as a stand-alone software package, or partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
- The scope of the invention is limited by the appended claims. The embodiments which do not fall within the scope of the claims are to be interpreted as examples.
Claims (9)
- A method of mixing audio input streams (31, 32, 33) for an audio conferencing mixing system (11) of the type taking the plurality of audio input streams (31, 32, 33) of input audio information of conference participants, including mixing transition events and outputting a plurality of audio output streams including output audio information, the method including the steps of:
a) determining that a transition event is to occur, wherein the transition event is a change to the mixing of the audio input streams (31, 32, 33), the change comprising turning on/off or fading in/out, in at least one of the audio output streams, an audio input stream of the plurality of audio input streams (31, 32, 33);
b) determining that a masking event is to occur in one of the audio input streams (31, 32, 33), wherein the masking event is an event that may capture the listening participant's attention such that, if it occurs when the transition event occurs, the transition event is not noticeable to the listening participant; and
c) scheduling the transition event to substantially occur when the masking event occurs, wherein said scheduling the transition event comprises delaying the occurrence of the transition event until the masking event occurs.
- A method as claimed in Claim 1 wherein said masking event includes the onset or cessation of speech in one of said audio input streams.
- A method as claimed in claim 2, wherein said onset or cessation of speech is denoted by a change in value of a voice activity flag in one of said audio input streams.
- A method as claimed in any preceding claim wherein the masking event comprises the utterance of predetermined text by at least one of the conference participants.
- A method as claimed in any preceding claim wherein the presence of an increase in volume and/or predetermined spectral flux in one of the audio input streams is indicative of a masking event in said one of the audio input streams.
- A method as claimed in any preceding claim wherein the masking event is determined by an auditory scene analysis of the series of audio input streams.
- A method as claimed in any preceding claim wherein the audio input streams include at least one CTX, continuous transmission, audio input stream and at least one DTX, discontinuous transmission, audio input stream.
- An audio conferencing mixing system (11) configured to perform the method of any one of the preceding claims.
- A computer-readable medium carrying computer-interpretable instructions which, when executed by a processor of an apparatus for use in an audio conferencing mixing system, the apparatus being configured to receive a plurality of audio input streams and to produce at least one audio output stream based on the audio input streams, cause the apparatus to carry out the method of any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201461946030P | 2014-02-28 | 2014-02-28 | |
PCT/US2015/016100 WO2015130509A1 (en) | 2014-02-28 | 2015-02-17 | Perceptual continuity using change blindness in conferencing |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3111627A1 EP3111627A1 (en) | 2017-01-04 |
EP3111627B1 true EP3111627B1 (en) | 2018-07-04 |
Family
ID=52737385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP15712202.9A Active EP3111627B1 (en) | 2014-02-28 | 2015-02-17 | Perceptual continuity using change blindness in conferencing |
Country Status (5)
Country | Link |
---|---|
US (1) | US9876913B2 (en) |
EP (1) | EP3111627B1 (en) |
JP (1) | JP6224850B2 (en) |
CN (1) | CN106031141B (en) |
WO (1) | WO2015130509A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3111626B1 (en) | 2014-02-28 | 2021-09-22 | Dolby Laboratories Licensing Corporation | Perceptually continuous mixing in a teleconference |
EP3254455B1 (en) * | 2015-02-03 | 2019-12-18 | Dolby Laboratories Licensing Corporation | Selective conference digest |
US10771631B2 (en) * | 2016-08-03 | 2020-09-08 | Dolby Laboratories Licensing Corporation | State-based endpoint conference interaction |
US10237654B1 (en) | 2017-02-09 | 2019-03-19 | Hm Electronics, Inc. | Spatial low-crosstalk headset |
US10511806B2 (en) * | 2017-09-30 | 2019-12-17 | International Business Machines Corporation | Mitigating effects of distracting sounds in an audio transmission of a conversation between participants |
CN107888771B (en) * | 2017-11-08 | 2021-06-15 | 陕西中联电科电子有限公司 | Multi-voice fusion communication method based on android platform |
WO2020023856A1 (en) | 2018-07-27 | 2020-01-30 | Dolby Laboratories Licensing Corporation | Forced gap insertion for pervasive listening |
JP7562554B2 (en) | 2019-04-03 | 2024-10-07 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Scalable Audio Scene Media Server |
Family Cites Families (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6570606B1 (en) | 1998-05-29 | 2003-05-27 | 3Com Corporation | Method and apparatus for controlling transmission of media signals over a data network in response to triggering events at participating stations |
US7006616B1 (en) | 1999-05-21 | 2006-02-28 | Terayon Communication Systems, Inc. | Teleconferencing bridge with EdgePoint mixing |
US6650745B1 (en) | 1999-06-10 | 2003-11-18 | Avaya Technologies Corp. | Method and apparatus for dynamically exchanging data among participants to a conference call |
US6850496B1 (en) | 2000-06-09 | 2005-02-01 | Cisco Technology, Inc. | Virtual conference room for voice conferencing |
US6976055B1 (en) * | 2001-01-18 | 2005-12-13 | Cisco Technology, Inc. | Apparatus and method for conducting a transfer of a conference call |
US7298834B1 (en) | 2002-11-22 | 2007-11-20 | 3Com Corporation | System and method for large capacity conference calls |
NO318401B1 (en) * | 2003-03-10 | 2005-03-14 | Tandberg Telecom As | An audio echo cancellation system and method for providing an echo muted output signal from an echo added signal |
US20050122389A1 (en) | 2003-11-26 | 2005-06-09 | Kai Miao | Multi-conference stream mixing |
US7985138B2 (en) | 2004-02-17 | 2011-07-26 | International Business Machines Corporation | SIP based VoIP multiplayer network games |
CN1859511A (en) | 2005-04-30 | 2006-11-08 | 华为技术有限公司 | Telephone conference voice mixing method |
JP4787328B2 (en) | 2005-10-31 | 2011-10-05 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | Method and apparatus for capturing audio during a conference call |
WO2007084254A2 (en) | 2005-11-29 | 2007-07-26 | Dilithium Networks Pty Ltd. | Method and apparatus of voice mixing for conferencing amongst diverse networks |
US7379450B2 (en) | 2006-03-10 | 2008-05-27 | International Business Machines Corporation | System and method for peer-to-peer multi-party voice-over-IP services |
US20070263824A1 (en) | 2006-04-18 | 2007-11-15 | Cisco Technology, Inc. | Network resource optimization in a video conference |
NO345590B1 (en) | 2006-04-27 | 2021-05-03 | Dolby Laboratories Licensing Corp | Audio amplification control using specific volume-based hearing event detection |
ATE527810T1 (en) | 2006-05-11 | 2011-10-15 | Global Ip Solutions Gips Ab | SOUND MIXING |
JP2008034979A (en) * | 2006-07-26 | 2008-02-14 | Yamaha Corp | Voice communication device and voice communication system |
CN101502089B (en) | 2006-07-28 | 2013-07-03 | 西门子企业通讯有限责任两合公司 | Method for carrying out an audio conference, audio conference device, and method for switching between encoders |
JP4582238B2 (en) | 2006-08-30 | 2010-11-17 | 日本電気株式会社 | Audio mixing method and multipoint conference server and program using the method |
JP4709734B2 (en) * | 2006-12-01 | 2011-06-22 | 日本電信電話株式会社 | Speaker selection device, speaker selection method, speaker selection program, and recording medium recording the same |
US8218460B2 (en) | 2006-12-27 | 2012-07-10 | Laura Laaksonen | Network entity, method and computer program product for mixing signals during a conference session |
US20080159507A1 (en) | 2006-12-27 | 2008-07-03 | Nokia Corporation | Distributed teleconference multichannel architecture, system, method, and computer program product |
US20080252637A1 (en) | 2007-04-14 | 2008-10-16 | Philipp Christian Berndt | Virtual reality-based teleconferencing |
RU2438197C2 (en) | 2007-07-13 | 2011-12-27 | Долби Лэборетериз Лайсенсинг Корпорейшн | Audio signal processing using auditory scene analysis and spectral skewness |
US8073125B2 (en) | 2007-09-25 | 2011-12-06 | Microsoft Corporation | Spatial audio conferencing |
AU2009221444B2 (en) * | 2008-03-04 | 2012-06-14 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Mixing of input data streams and generation of an output data stream therefrom |
US8265252B2 (en) | 2008-04-11 | 2012-09-11 | Palo Alto Research Center Incorporated | System and method for facilitating cognitive processing of simultaneous remote voice conversations |
US20090316870A1 (en) * | 2008-06-19 | 2009-12-24 | Motorola, Inc. | Devices and Methods for Performing N-Way Mute for N-Way Voice Over Internet Protocol (VOIP) Calls |
US9449614B2 (en) * | 2009-08-14 | 2016-09-20 | Skype | Controlling multi-party communications |
US8577057B2 (en) * | 2010-11-02 | 2013-11-05 | Robert Bosch Gmbh | Digital dual microphone module with intelligent cross fading |
JP5458027B2 (en) | 2011-01-11 | 2014-04-02 | 日本電信電話株式会社 | Next speaker guidance device, next speaker guidance method, and next speaker guidance program |
PL3594943T3 (en) * | 2011-04-20 | 2024-07-29 | Panasonic Holdings Corporation | Device and method for execution of huffman coding |
EP2862165B1 (en) | 2012-06-14 | 2017-03-08 | Dolby International AB | Smooth configuration switching for multichannel audio rendering based on a variable number of received channels |
EP2898506B1 (en) | 2012-09-21 | 2018-01-17 | Dolby Laboratories Licensing Corporation | Layered approach to spatial audio coding |
US9628630B2 (en) | 2012-09-27 | 2017-04-18 | Dolby Laboratories Licensing Corporation | Method for improving perceptual continuity in a spatial teleconferencing system |
CN104050969A (en) | 2013-03-14 | 2014-09-17 | 杜比实验室特许公司 | Space comfortable noise |
US20140278380A1 (en) | 2013-03-14 | 2014-09-18 | Dolby Laboratories Licensing Corporation | Spectral and Spatial Modification of Noise Captured During Teleconferencing |
-
2015
- 2015-02-17 US US15/121,859 patent/US9876913B2/en active Active
- 2015-02-17 WO PCT/US2015/016100 patent/WO2015130509A1/en active Application Filing
- 2015-02-17 JP JP2016553857A patent/JP6224850B2/en active Active
- 2015-02-17 CN CN201580010641.8A patent/CN106031141B/en active Active
- 2015-02-17 EP EP15712202.9A patent/EP3111627B1/en active Active
Non-Patent Citations (1)
Title |
---|
None * |
Also Published As
Publication number | Publication date |
---|---|
US9876913B2 (en) | 2018-01-23 |
EP3111627A1 (en) | 2017-01-04 |
JP2017510179A (en) | 2017-04-06 |
US20170078488A1 (en) | 2017-03-16 |
WO2015130509A1 (en) | 2015-09-03 |
CN106031141B (en) | 2017-12-29 |
JP6224850B2 (en) | 2017-11-01 |
CN106031141A (en) | 2016-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3111627B1 (en) | Perceptual continuity using change blindness in conferencing | |
US10009475B2 (en) | Perceptually continuous mixing in a teleconference | |
US9628630B2 (en) | Method for improving perceptual continuity in a spatial teleconferencing system | |
CN110024029B (en) | audio signal processing | |
US9491299B2 (en) | Teleconferencing using monophonic audio mixed with positional metadata | |
EP2896126A1 (en) | Long term monitoring of transmission and voice activity patterns for regulating gain control | |
US8553520B2 (en) | System and method for echo suppression in web browser-based communication | |
EP2779161A1 (en) | Spectral and spatial modification of noise captured during teleconferencing | |
EP3262851B1 (en) | Techniques for sharing stereo sound between multiple users | |
US9031836B2 (en) | Method and apparatus for automatic communications system intelligibility testing and optimization | |
KR101597768B1 (en) | Interactive multiparty communication system and method using stereophonic sound | |
EP3900315B1 (en) | Microphone control based on speech direction | |
CN119585793A (en) | Intelligent voice or conversation enhancement | |
EP4354841A1 (en) | Conference calls | |
JP4402644B2 (en) | Utterance suppression device, utterance suppression method, and utterance suppression device program | |
CN113812136A (en) | Scalable Voice Scene Media Server | |
JP2007096555A (en) | Voice conference system, terminal, talker priority level control method used therefor, and program thereof | |
Färber et al. | High-Definition Audio for Group-to-Group Communication | |
Albrecht et al. | Continuous Mobile Communication with Acoustic Co-Location Detection | |
GB2538527A (en) | Signal processing device for processing an audio waveform for playback through a speaker |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20160928 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAX | Request for extension of the european patent (deleted) | ||
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20180126 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAJ | Information related to disapproval of communication of intention to grant by the applicant or resumption of examination proceedings by the epo deleted |
Free format text: ORIGINAL CODE: EPIDOSDIGR1 |
|
GRAL | Information related to payment of fee for publishing/printing deleted |
Free format text: ORIGINAL CODE: EPIDOSDIGR3 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
GRAR | Information related to intention to grant a patent recorded |
Free format text: ORIGINAL CODE: EPIDOSNIGR71 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
INTC | Intention to grant announced (deleted) | ||
INTG | Intention to grant announced |
Effective date: 20180523 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 1015736 Country of ref document: AT Kind code of ref document: T Effective date: 20180715 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602015013011 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20180704 |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG4D |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1015736 Country of ref document: AT Kind code of ref document: T Effective date: 20180704 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20181004 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20181004 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20181005 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20181104 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602015013011 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
26N | No opposition filed |
Effective date: 20190405 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190217 Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20190228 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190228 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190228 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190217 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190228 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190217 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20181104 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20150217 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20180704 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20230513 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20240123 Year of fee payment: 10 Ref country code: GB Payment date: 20240123 Year of fee payment: 10 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20240123 Year of fee payment: 10 |