US20240038250A1 - Method and system for triggering events - Google Patents

Info

Publication number
US20240038250A1
US20240038250A1 (Application No. US18/144,589)
Authority
US
United States
Prior art keywords
fingerprint
audio signal
peaks
peak
trigger point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/144,589
Inventor
James Andrew NESFIELD
Daniel Jones
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sonos Experience Ltd
Original Assignee
Sonos Experience Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sonos Experience Ltd
Priority to US18/144,589
Publication of US20240038250A1
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval characterised by using metadata automatically derived from the content
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The present invention relates to a method of triggering an event. The method includes receiving a signal stream, detecting a trigger point within the signal stream using a fingerprint associated with the trigger point and triggering an event associated with the detected trigger point.

Description

    FIELD OF INVENTION
  • The present invention is in the field of signal processing. More particularly, but not exclusively, the present invention relates to processing signals to trigger events.
  • BACKGROUND
  • Signals, such as audio signals, can be processed to analyse various qualities of the signal.
  • For example, for streaming audio signals, Shazam technology analyses the audio signal to form a fingerprint of the audio signal. This fingerprint is then compared to a database of audio fingerprints to identify which music track the audio signal originates from.
  • The Shazam technology is optimised for hardware with sufficient compute to calculate fingerprints of the streaming audio signal and optimised for identifying one music track out of millions.
  • It would be desirable if a system could be developed which could be used on lower cost hardware to optimally analyse streaming signals to trigger events.
  • It is an object of the present invention to provide a method and system for triggering events which overcomes the disadvantages of the prior art, or at least provides a useful alternative.
  • SUMMARY OF INVENTION
  • According to a first aspect of the invention there is provided a method of triggering an event, including:
      • a) receiving a signal stream;
      • b) detecting a trigger point within the signal stream using a fingerprint associated with the trigger point; and
      • c) triggering an event associated with the detected trigger point.
  • Other aspects of the invention are described within the claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
  • FIG. 1 : shows a block diagram illustrating a system in accordance with an embodiment of the invention;
  • FIG. 2 : shows a flow diagram illustrating a method in accordance with an embodiment of the invention;
  • FIG. 3 : shows a flow diagram illustrating a method in accordance with an embodiment of the invention;
  • FIG. 4 : shows a block diagram illustrating a system in accordance with an embodiment of the invention;
  • FIG. 5 : shows a diagram illustrating a function for determining peak elevation using neighbour reference points in accordance with an embodiment of the invention;
  • FIG. 6 : shows a diagram illustrating a 1D slice of a magnitude spectrum buffer in accordance with an embodiment of the invention;
  • FIG. 7 : shows a diagram illustrating peaks located using a system in accordance with an embodiment of the invention;
  • FIG. 8 : shows a diagram illustrating peaks located using a system in accordance with another embodiment of the invention; and
  • FIG. 9 : shows a diagram illustrating examination of a magnitude spectral history buffer using a fingerprint in accordance with an embodiment of the invention.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • The present invention provides a method and system for triggering events.
  • The inventors have discovered that fingerprints can be used to analyse a streamed signal, such as an audio signal, directly. This can enable the triggering of events at specific time locations, or trigger points, within the streamed signal.
  • In FIG. 1 , a system 100 in accordance with an embodiment of the invention is shown.
  • One or more processors 101 are shown. The one or more processors 101 may be configured to receive a signal stream from a signal receiver 102, to detect a trigger point within the signal stream using a fingerprint associated with the trigger point, and to trigger an event associated with the trigger point. Triggering the event may result in generation of one or more event instructions. The event instructions may control one or more apparatus via one or more controllers 103.
  • The one or more processors 101 may detect the trigger point from a set of trigger points. The one or more processors 101 may detect the trigger point by comparing one or more of a set of fingerprints associated with the trigger points to the signal stream. The signal stream may be processed for the comparison. In one embodiment, each frame of the signal stream is processed via a Fast Fourier Transform (FFT) and at least some of the frames are compared with at least one fingerprint associated with a trigger point of the set of trigger points.
  • The system 100 may include a signal receiver 102 configured to receive an analogue signal, such as an audio signal, from a signal transmitter 104 and to provide the signal as a signal stream to the one or more processors 101.
  • The signal receiver 102 may be an input device such as a microphone, a camera, a radio frequency receiver or any analogue sensor.
  • The system 100 may include one or more controllers 103. The controllers 103 may be configured to receive event instructions from the one or more processors 101 to control one or more apparatus. For example, the one or more apparatus may include mechanical apparatus, transmitters, audio output, displays, or data storage.
  • The system 100 may include a signal transmitter 104. The signal transmitter 104 may correspond to the signal receiver 102 (e.g. a speaker for a microphone), and may be an audio speaker, a light transmission apparatus, a radio frequency transmitter, or any analogue transmitter.
  • In one embodiment, the audio signals may be in the audible range, inaudible range, or include components in both the audible and inaudible range.
  • Referring to FIG. 2 , a method 200 for triggering events in accordance with an embodiment of the invention will be described.
  • In step 201, a signal stream is received. The signal stream may be received via a signal receiver (e.g. receiver 102). The signal may be audio. The audio signal may correspond to an audio track. The signal receiver may be a microphone.
  • The signal stream may be received in real-time via the signal receiver.
  • The audio signal may be formed of audible components, inaudible components, or a combination of audible and inaudible components.
  • In step 202, a trigger point is detected within the signal stream using a fingerprint associated with the trigger point. The trigger point may be one of a set of trigger points. A fingerprint for each of a plurality of trigger points may be compared against the signal stream to detect a trigger point. A trigger point may be detected when its fingerprint matches the signal stream beyond a predefined or dynamic threshold, is the closest match within the set of trigger points, or is the closest match within a subset of the set of trigger points.
  • In one embodiment, the trigger points and associated fingerprints may be created as described in relation to FIG. 3 .
  • In one embodiment, the trigger points and associated fingerprints are predefined, and not generated from an existing signal (such as an audio track). The signal (e.g. the audio signal) may be generated for transmission based upon the fingerprint information. For example, a broadcast device may synthesise new audio with notes at the corresponding time/frequency offsets of the fingerprint.
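  • As a purely illustrative sketch (not taken from the patent), such synthesis could place a short windowed tone burst at each peak's time/frequency offset; ‘FFT_SIZE’, ‘SAMPLE_RATE’ and the (frame, bin) peak representation below are assumptions:

    import numpy as np

    FFT_SIZE = 256        # assumed FFT frame length in samples
    SAMPLE_RATE = 44100   # assumed sample rate in Hz

    def synthesise_from_peaks(peaks, num_frames):
        # peaks: iterable of (frame, bin) fingerprint coordinates,
        # all assumed to lie within the first num_frames frames
        out = np.zeros(num_frames * FFT_SIZE)
        t = np.arange(FFT_SIZE) / SAMPLE_RATE
        envelope = np.hanning(FFT_SIZE)                 # short windowed burst
        for frame, bin_idx in peaks:
            freq = bin_idx * SAMPLE_RATE / FFT_SIZE     # bin centre frequency
            start = frame * FFT_SIZE
            out[start:start + FFT_SIZE] += envelope * np.sin(2 * np.pi * freq * t)
        return out / max(1.0, float(np.max(np.abs(out))))   # normalise to [-1, 1]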
  • The fingerprint may be formed of a set of peaks within a 2D coordinate space of a magnitude spectrum.
  • Spectral magnitudes may be calculated from the signal stream to form a rolling buffer (with a size T forming a spectral frame within the buffer).
  • A trigger point may be detected within the signal by iterating over at least some of the peaks within the set of peaks for the fingerprint and examining the corresponding coordinate in the spectral frame in the buffer. A confidence level may be calculated for each peak examination by measuring properties such as the ratio between the peak's intensity and the mean intensity of its neighbouring bins. An overall confidence interval may be calculated for the set of peaks by taking, for example, the mean of the individual peak confidences.
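  • A minimal sketch of such a confidence measure follows, assuming a dB magnitude buffer indexed as [bin, frame], peaks away from the buffer edges, and an illustrative 6 dB normalisation:

    import numpy as np

    def peak_confidence(mag_db, x, y):
        # ratio-style test: how far the cell at (x=frame, y=bin) rises
        # above the mean of its immediate neighbouring cells
        neighbours = [mag_db[y - 1, x], mag_db[y + 1, x],
                      mag_db[y, x - 1], mag_db[y, x + 1]]
        elevation = mag_db[y, x] - float(np.mean(neighbours))   # dB above local mean
        return float(np.clip(elevation / 6.0, 0.0, 1.0))        # 6 dB -> full confidence

    def fingerprint_confidence(mag_db, peaks):
        # overall confidence: mean of the individual peak confidences
        return float(np.mean([peak_confidence(mag_db, x, y) for x, y in peaks]))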
  • In one embodiment, fingerprints for multiple trigger points may be used to examine the spectral frame. In this case, the fingerprint with the highest confidence interval may be identified.
  • Where the confidence interval for an entire fingerprint exceeds a threshold (and is the highest confidence where multiple fingerprints are used), the trigger point associated with the fingerprint is detected within the audio stream.
  • In step 203, an event associated with the trigger point is triggered when the trigger point is detected.
  • The event may result in a controller (e.g. controller 103) actuating, for example, play-back of audio corresponding to the trigger point, generation of mechanical movement in time with the audio signal, manifestation of electronic game-play coordinated to audio signal, display of related material synchronised in time to the signal (such as subtitles), generation of any time synchronisation action, or any other action.
  • Steps 201, 202 and 203 may be performed by one or more processors (e.g. processor(s) 101).
  • Referring to FIG. 3 , a method 300 for creating trigger points for an audio track in accordance with an embodiment of the invention will be described.
  • In step 301, one or more trigger points are defined for the audio track at time locations within the audio track. Each trigger point may be associated with a timing offset from the start of the audio track.
  • In step 302, an associated fingerprint is generated at the time location for each trigger point.
  • The associated fingerprint may be generated from peaks identified within an FFT of the audio track at the time location. The peaks may be local magnitude maxima of the 2D coordinate space created by each FFT block and FFT bin.
  • In step 303, an event is associated with each trigger point. The one or more trigger points with each associated fingerprint and event may then be used by the method described in relation to FIG. 2 .
  • Referring to FIGS. 4 to 9 , a method of creating and detecting trigger points will be described in accordance with an embodiment of the invention.
  • This embodiment of the invention performs real-time recognition of pre-determined audio segments, triggering an event when a segment is recognised in the incoming audio stream. The objective is to be able to respond in real-time, with the following key properties:
      • minimal latency (less than 50 ms)
      • high reliability (99% recognition rate in typical acoustic environments over a distance of 1 m)
      • low false-positive rate (events should rarely be triggered at the wrong moment, if ever)
  • To perform audio triggering, two phases are involved.
      • 1. Audio fingerprinting (offline, non real-time): An input media file, plus an index of ‘FINGERPRINT_COUNT’ unique trigger timestamps (all within the duration of the media file), are used to generate ‘FINGERPRINT_COUNT’ audio “fingerprints”. Each fingerprint characterises the ‘FINGERPRINT_WIDTH’ frames of audio leading up to its corresponding timestamp in the trigger index, where ‘FINGERPRINT_WIDTH’ is the fixed duration of a fingerprint.
      • 2. Audio recognition (online, real-time): The set of fingerprints produced by Phase 1 are fed into a separate audio recognition system. This system listens to a live audio stream and attempts to recognise fingerprints from its database within the stream. When a fingerprint is recognised, the corresponding trigger is generated.
  • Spectral Peaks
  • Both phases utilise the concept of a “spectral peak”. Given a 2D segment of an acoustic spectrum over some period of time, a spectral peak is a 2D point in space (where the X-axis is time, delineated in FFT frames, and the Y-axis is frequency, delineated in FFT bins) which has some degree of “elevation” over the surrounding cells.
  • The elevation of a peak is the difference between its decibel magnitude and the mean magnitude of some selection of peaks around it. A peak with a greater elevation is substantially louder than the spectrogram cells around it, meaning it is perceptually prominent. A short, sharp burst of a narrow frequency band (for example, striking a glockenspiel) would result in a peak of high elevation.
  • Peaks may be used to characterise audio segments because they have fixed linear relationships in the time and frequency domains, and because they may be robust to background noise. When a given audio recording is played back over a reasonable-quality speaker, the resultant output will typically demonstrate peaks of roughly similar elevation. Even when background noise is present, peaks at the original locations will still mostly be evident.
  • This recognition system may require that peaks be identically distributed in the original fingerprint and the audio stream to recognise effectively. Therefore, it assumes that the audio will not be altered before playback. For example, if it is played at a lower speed or pitch, the recognition may fail. However, this system should remain robust to processes such as distortion, filtering effects from acoustic transducers, and degradation from compression codecs, all of which do not heavily affect the linear relationships between peaks in time and frequency.
  • In this algorithm, ‘NEIGHBOURHOOD_WIDTH’ and ‘NEIGHBOURHOOD_HEIGHT’ are used to define the minimum spacing between peaks. ‘NEIGHBOURHOOD_WIDTH’ is measured in FFT frames, and ‘NEIGHBOURHOOD_HEIGHT’ in FFT bins.
  • Different algorithms are used to determine and match peaks in the fingerprinting and recognition phases. In the fingerprinting phase, the algorithm may be optimised for precision, to ensure that the best-possible peaks are selected. In the recognition phase, the algorithm may be optimised for speed and efficiency.
  • An overview flow chart of the fingerprinter and recogniser (“scanner”) is shown in FIG. 4 .
  • Peak Elevation Function
  • The key characteristic of a peak is its elevation. In a time-unlimited system, this would typically be determined by taking the mean magnitude of all the cells in the peak's neighbourhood, and calculating the ratio of the peak's magnitude vs the mean surrounding magnitude.
  • However, this may require a lot of calculations (up to ‘(NEIGHBOURHOOD_WIDTH*2+1)×(NEIGHBOURHOOD_HEIGHT*2+1)−1’: 120 calculations for a width and height of 5). In one embodiment, to enhance efficiency, more economical neighbourhood functions have been found effective, determining a peak's elevation from fewer reference points.
  • One such elevation function makes use of 7 reference points around the peak: the 4 corners of its Moore neighbourhood, and the top, bottom and left edges. The right edge is omitted as it may be artificially amplified by note tails and reverberation from the peak's acoustic energy. This is shown in FIG. 5 .
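  • A minimal sketch of such a seven-point elevation function follows; the buffer layout (bins x frames) and the neighbourhood sizes are illustrative assumptions:

    import numpy as np

    NEIGHBOURHOOD_WIDTH = 5    # FFT frames from centre to edge (illustrative)
    NEIGHBOURHOOD_HEIGHT = 5   # FFT bins from centre to edge (illustrative)

    def elevation_edges(mag_db, x, y, w=NEIGHBOURHOOD_WIDTH, h=NEIGHBOURHOOD_HEIGHT):
        # elevation (dB) of cell (x=frame, y=bin) over seven reference points:
        # the four Moore-neighbourhood corners plus the top, bottom and left
        # edge midpoints; the right edge is omitted, as described above
        refs = [mag_db[y - h, x - w], mag_db[y + h, x - w],   # left corners
                mag_db[y - h, x + w], mag_db[y + h, x + w],   # right corners
                mag_db[y - h, x],                             # top edge midpoint
                mag_db[y + h, x],                             # bottom edge midpoint
                mag_db[y, x - w]]                             # left edge midpoint
        return float(mag_db[y, x] - np.mean(refs))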
  • Phase 1: Audio Fingerprinting
  • The audio fingerprinting phase is designed to identify and prioritise the peaks that uniquely and reliably characterise each segment of audio. It is typically performed offline prior to any recognition process, meaning it may not need to be designed for efficiency.
  • The fingerprinting phase proceeds as follows:
      • The fingerprinter is given a media file (for example, a mono uncompressed audio file) and an index of times at which trigger events are to occur.
      • The fingerprinter opens the media file and proceeds to read the contents in FFT-sized buffers (typically 256 frames). For each buffer, an FFT is performed and the magnitude spectrum is derived. This magnitude spectrum is added to a rolling buffer.
      • Each frame, the last ‘NEIGHBOURHOOD_WIDTH*2+1’ frames of the rolling buffer are inspected to find peaks (see the sketch following this list). This is done by iterating over each cell in the 1D line (shown in FIG. 6 ) running down the centre of the buffer, and checking whether it is a local maximum in the Moore neighbourhood of size ‘NEIGHBOURHOOD_WIDTH×NEIGHBOURHOOD_HEIGHT’. If it is the maximum, the elevation function is applied to determine the peak-candidate's elevation versus nearby cells (see “Peak Elevation Function” above). If the peak-candidate's elevation exceeds a threshold in decibels (‘FINGERPRINT_PEAK_THRESHOLD’, typically around 12 dB), it is classified as a peak and added to another rolling buffer of width ‘FINGERPRINT_WIDTH’ as shown in FIG. 7 .
      • Note that the 1D-slice part of the above step may be performed for efficiency. An alternative approach would be to store a rolling spectrogram buffer of width ‘FINGERPRINT_WIDTH’, and do a brute-force search of the buffer for peaks as shown in FIG. 8 .
      • When the fingerprinter reaches a frame whose timestamp corresponds to a trigger timestamp, it collects the peaks for the previous ‘FINGERPRINT_WIDTH’ spectral frames. These peaks are ordered by their elevation (descending), and any peaks beyond the first ‘MAX_PEAK_COUNT’ peaks are rejected. This ensures that each fingerprint has a maximum number of peaks. The prioritisation is imposed because peaks with a higher elevation are proportionately more likely to survive acoustic playback and be detected by the decoder. This allows “weaker” peaks to be omitted, improving efficiency later.
      • When the entire set of fingerprints has been collected, it is serialised to disk as a “fingerprint set”. A fingerprint set is characterised by:
        • the number of fingerprints it contains;
        • for each fingerprint, its width, its ID number, the number of peaks it contains, and each peak's X/Y coordinate, with each peak ordered by magnitude (descending).
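  • The following is a hedged sketch of the fingerprinting loop above, reusing elevation_edges() and the neighbourhood constants from the earlier sketch; the ‘FINGERPRINT_WIDTH’ and ‘MAX_PEAK_COUNT’ values are illustrative, and the code is an approximation rather than the patent's reference implementation:

    import numpy as np
    from dataclasses import dataclass, field

    FFT_SIZE = 256                      # "typically 256 frames" per buffer
    FINGERPRINT_WIDTH = 64              # spectral frames per fingerprint (assumed)
    FINGERPRINT_PEAK_THRESHOLD = 12.0   # dB, "typically around 12 dB"
    MAX_PEAK_COUNT = 32                 # assumed cap on peaks per fingerprint

    @dataclass
    class Fingerprint:
        fingerprint_id: int
        width: int
        peaks: list = field(default_factory=list)   # (x, y), strongest first

    def fingerprint_track(samples, trigger_frames):
        # samples: mono float array; trigger_frames: set of frame indices
        window = np.hanning(FFT_SIZE)
        history, found, fingerprints = [], [], []
        for f in range(len(samples) // FFT_SIZE):
            frame = samples[f * FFT_SIZE:(f + 1) * FFT_SIZE] * window
            history.append(20 * np.log10(np.abs(np.fft.rfft(frame)) + 1e-12))
            if f >= 2 * NEIGHBOURHOOD_WIDTH:
                # inspect the 1D line running down the centre of the buffer
                buf = np.stack(history[-(2 * NEIGHBOURHOOD_WIDTH + 1):], axis=1)
                centre = f - NEIGHBOURHOOD_WIDTH
                for y in range(NEIGHBOURHOOD_HEIGHT, buf.shape[0] - NEIGHBOURHOOD_HEIGHT):
                    region = buf[y - NEIGHBOURHOOD_HEIGHT:y + NEIGHBOURHOOD_HEIGHT + 1, :]
                    if buf[y, NEIGHBOURHOOD_WIDTH] < region.max():
                        continue                    # not the local maximum
                    elev = elevation_edges(buf, NEIGHBOURHOOD_WIDTH, y)
                    if elev >= FINGERPRINT_PEAK_THRESHOLD:
                        found.append((centre, y, elev))
            if f in trigger_frames:
                # collect peaks from the last FINGERPRINT_WIDTH frames,
                # strongest elevation first, capped at MAX_PEAK_COUNT
                recent = sorted((p for p in found if f - FINGERPRINT_WIDTH <= p[0] < f),
                                key=lambda p: p[2], reverse=True)[:MAX_PEAK_COUNT]
                peaks = [(x - (f - FINGERPRINT_WIDTH), y) for x, y, _ in recent]
                fingerprints.append(Fingerprint(len(fingerprints), FINGERPRINT_WIDTH, peaks))
        return fingerprints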
  • Phase 2: Audio Recognition
  • The audio recogniser takes a set of fingerprints and a stream of audio (usually real-time), and attempts to recognise fingerprints within the stream.
  • It functions as follows:
      • Audio is read in FFT block-sized chunks. An FFT is performed to obtain the magnitude spectrum, which is appended to a rolling buffer of width ‘FINGERPRINT_WIDTH’. This acts as an acoustic spectrum history, containing the most recent frames of precisely the same width as a fingerprint.
      • Each frame, the scanner iterates over every available fingerprint. Each peak's (x, y) coordinates are examined within the spectral history buffer, and its elevation calculated according to the same elevation function as the fingerprinter (e.g., for the EDGES elevation function, comparing its value to the mean of 7 points around it). If the decibel difference between the peak's magnitude and the surrounding background magnitude exceeds a fixed ‘CANDIDATE_PEAK_THRESHOLD’, the peak is classified as a match. The ‘CANDIDATE_PEAK_THRESHOLD’ is measured in decibels, similar to the ‘FINGERPRINT_PEAK_THRESHOLD’ used to select peaks for the fingerprint. However, the ‘CANDIDATE_PEAK_THRESHOLD’ is substantially lower, as background noise and filtering may reduce a peak's prominence in real-world playback. FIG. 9 shows the examination of a fingerprint formed from the peaks of FIG. 7 against the spectral history buffer. Peak matches are shown at 901, 902, 903, and 904, and a peak non-match is shown at 905.
      • After all the peaks have been classified, a confidence level is determined for the fingerprint as a whole by calculating ‘PEAKS_MATCHED/TOTAL_NUM_PEAKS’ (i.e., the proportion of peaks matched). If this exceeds a fixed ‘CANDIDATE_CONFIDENCE_THRESHOLD’ (usually around 0.7), the fingerprint is classified as matching (a sketch of this scan follows the list).
      • Matching then continues in case another fingerprint is matched at a higher confidence.
      • ‘WAIT_FOR_PEAK’ operation: The fingerprint with the highest confidence is selected as a match. Matches can sometimes occur one or two frames earlier than expected (with confidence levels rising quickly to a peak and then dropping again). To ensure a match isn't triggered too soon, the match is recorded. The following frame, if the confidence level drops, the match is triggered. If the confidence level rises, the match is recorded again to be checked the following frame. Matches are thus always triggered with one frame of delay, after the peak value.
      • Once a fingerprint match has been triggered, a trigger cannot then occur again for a fixed number of frames, to prevent accidental re-triggers. Optionally, the specific trigger ID can be disabled automatically for some period.
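  • A hedged sketch of this per-frame scan, including the one-frame ‘WAIT_FOR_PEAK’ delay, follows; it reuses elevation_edges(), the neighbourhood constants and the Fingerprint type from the earlier sketches, and the threshold values are illustrative:

    CANDIDATE_PEAK_THRESHOLD = 6.0        # dB; "substantially lower" than 12 dB
    CANDIDATE_CONFIDENCE_THRESHOLD = 0.7  # "usually around 0.7"

    def peak_in_bounds(history, x, y):
        # only examine peaks whose neighbourhood fits inside the buffer
        bins, frames = history.shape
        return (NEIGHBOURHOOD_WIDTH <= x < frames - NEIGHBOURHOOD_WIDTH and
                NEIGHBOURHOOD_HEIGHT <= y < bins - NEIGHBOURHOOD_HEIGHT)

    def scan_frame(history, fingerprints, pending):
        # history: dB magnitude buffer of shape (bins, FINGERPRINT_WIDTH);
        # pending: (fingerprint_id, confidence) recorded last frame, or None
        best_id, best_conf = None, 0.0
        for fp in fingerprints:
            matched = sum(1 for x, y in fp.peaks
                          if peak_in_bounds(history, x, y) and
                          elevation_edges(history, x, y) >= CANDIDATE_PEAK_THRESHOLD)
            conf = matched / max(1, len(fp.peaks))   # PEAKS_MATCHED / TOTAL_NUM_PEAKS
            if conf >= CANDIDATE_CONFIDENCE_THRESHOLD and conf > best_conf:
                best_id, best_conf = fp.fingerprint_id, conf
        if pending is not None and best_conf < pending[1]:
            return pending[0], None                  # confidence has peaked: trigger
        if best_id is not None:
            return None, (best_id, best_conf)        # record match, re-check next frame
        return None, None

  • In this sketch a trigger always fires one frame after the confidence peak, mirroring the ‘WAIT_FOR_PEAK’ behaviour described above.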
  • Audio Recognition: Additional Optimisations
  • Additional optimisations may be used to further reduce the CPU footprint of audio recognition, with minimal impact on recognition success rate:
  • Early Rejection
  • As a given fingerprint's peaks are ordered by magnitude, descending, the first peaks are the most prominent and thus most likely to be successfully recognised in real-world playback.
  • For efficiency, an algorithm may be used that first inspects a small proportion (e.g. 20%) of a fingerprint's peaks (see the sketch below). The mean confidence of these peaks is calculated, and compared to a threshold value that is lower than ‘CANDIDATE_CONFIDENCE_THRESHOLD’ (e.g. ‘0.8’ of the overall candidate threshold). If the mean confidence of this initial sample falls below the minimum threshold, it is unlikely that the fingerprint as a whole will be a match, and the rest of the peaks are not inspected.
  • If the mean confidence is above the threshold, the remainder of the peaks are inspected as normal, and the fingerprint as a whole either accepted or rejected.
  • The values of ‘CHIRP_FINGERPRINT_SCANNER_EARLY_REJECT_PROPORTION’ and ‘CHIRP_FINGERPRINT_SCANNER_EARLY_REJECT_LOWER_THRESHOLD’ are selected to minimise the number of peaks that must be inspected on average, whilst minimising the number of actual matches that are missed.
  • Note that this technique can also function with an unordered set of peaks.
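  • A sketch of the early-rejection test under these assumptions (binary per-peak confidence, the helpers from the earlier sketches, and illustrative constant values) follows:

    EARLY_REJECT_PROPORTION = 0.2                                   # e.g. 20% of peaks
    EARLY_REJECT_LOWER_THRESHOLD = 0.8 * CANDIDATE_CONFIDENCE_THRESHOLD

    def peak_conf(history, x, y):
        # binary per-peak confidence: 1.0 if the peak clears the candidate
        # threshold at its (x, y) coordinate, else 0.0
        return float(peak_in_bounds(history, x, y) and
                     elevation_edges(history, x, y) >= CANDIDATE_PEAK_THRESHOLD)

    def fingerprint_matches(history, fp):
        # inspect the strongest peaks first; they are ordered by magnitude
        n = max(1, int(len(fp.peaks) * EARLY_REJECT_PROPORTION))
        confs = [peak_conf(history, x, y) for x, y in fp.peaks[:n]]
        if sum(confs) / n < EARLY_REJECT_LOWER_THRESHOLD:
            return False                 # early reject: skip the remaining peaks
        confs += [peak_conf(history, x, y) for x, y in fp.peaks[n:]]
        return sum(confs) / len(fp.peaks) >= CANDIDATE_CONFIDENCE_THRESHOLD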
  • Disable Below Minimal Threshold
  • As acoustic spectra tend to be correlated over time, it is unlikely that a match (with a confidence of 0.7, as above, corresponding to 70% of the peaks matching) will arise immediately after a spectral block with a match rate as low as 0.01.
  • Therefore, a feature may be used to temporarily disable a fingerprint if its confidence is below some minimal threshold, removing it from the match pool for the next ‘CHIRP_FINGERPRINT_DISABLED_DURATION’ frames.
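  • A minimal sketch of this disabling feature follows; the duration and minimal-threshold values are illustrative, and the dictionary-based bookkeeping is an assumption:

    CHIRP_FINGERPRINT_DISABLED_DURATION = 100   # frames (illustrative value)
    MINIMAL_CONFIDENCE = 0.01                   # "a low value such as 0.01"

    disabled_until = {}                         # fingerprint_id -> re-enable frame

    def active_fingerprints(fingerprints, frame_index):
        # the match pool for this frame: everything not currently disabled
        return [fp for fp in fingerprints
                if disabled_until.get(fp.fingerprint_id, 0) <= frame_index]

    def maybe_disable(fp, confidence, frame_index):
        # bench a fingerprint whose confidence is negligible this frame
        if confidence < MINIMAL_CONFIDENCE:
            disabled_until[fp.fingerprint_id] = (frame_index
                                                 + CHIRP_FINGERPRINT_DISABLED_DURATION)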
  • A potential advantage of some embodiments of the present invention is that trigger points within streamed signals can be used to trigger events. This may provide various functionality such as synchronisation of events with the streamed signal. Furthermore, by detecting the trigger point using the fingerprint rather than calculating fingerprints from the streamed signal, computation is reduced, enabling deployment on lower-cost hardware.
  • While the present invention has been illustrated by the description of the embodiments thereof, and while the embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, representative apparatus and method, and illustrative examples shown and described. Accordingly, departures may be made from such details without departure from the spirit or scope of applicant's general inventive concept.

Claims (23)

1. A system comprising:
memory; and
a processor operatively coupled to the memory and configured to:
define one or more trigger points at one or more time locations within an audio signal;
generate, based on the audio signal, a fingerprint for each trigger point; and
associate an event with each trigger point.
2. The system of claim 1, wherein each trigger point is associated with a timing offset from a start of the audio signal.
3. The system of claim 1, wherein the fingerprint is generated from a plurality of peaks identified within a Fast Fourier Transform (FFT) of the audio signal at the time location of the fingerprint.
4. The system of claim 3, wherein each of the plurality of peaks is a local magnitude maximum of a 2D coordinate space created by each FFT block.
5. The system of claim 3, wherein each of the plurality of peaks in the fingerprint is separated from other peaks in the fingerprint by a minimum spacing.
6. The system of claim 5, wherein the minimum spacing includes a minimum width and a minimum height.
7. The system of claim 1, wherein a plurality of trigger points are defined, a plurality of fingerprints corresponding to the plurality of trigger points are generated, and each of the fingerprints characterizes a different frame of the audio signal, each frame having a fixed duration of a same width.
8. The system of claim 1, wherein the audio signal comprises a mono uncompressed audio.
9. The system of claim 1, wherein each fingerprint comprises a plurality of peaks, and generating the fingerprint comprises:
reading the audio signal in Fast Fourier Transform (FFT) sized buffers for performing FFT on the audio signal;
deriving a magnitude spectrum from the FFT of the audio signal;
adding the magnitude spectrum to a rolling buffer;
identifying peak-candidates for each frame of the rolling buffer;
based on identifying a local maxima of the peak-candidate in a one-dimensional slice of the rolling buffer, applying an elevation function to determine the peak-candidate's elevation versus elevations of nearby cells; and
based on the elevation of the peak-candidate's elevation exceeding a threshold, classifying the peak-candidate as a peak of the fingerprint.
10. The system of claim 1, wherein each fingerprint comprises a fingerprint width, fingerprint identification number, a number of peaks in the fingerprint, and coordinates of each peak in the fingerprint.
11. The system of claim 1, wherein each fingerprint comprises a number of peaks in the fingerprint and coordinates of each peak in the fingerprint.
12. The system of claim 1, further comprising:
a first device comprising the memory and the processor; and
a second device,
wherein the processor of the first device is further configured to transmit, to the second device, fingerprint information corresponding to the generated fingerprint,
wherein the second device is configured to:
identify a trigger point within an audio signal corresponding to a fingerprint detected in the audio signal based on the fingerprint information; and
trigger an event associated with the fingerprint, a timing of the event synchronized to the identified trigger point within the audio signal.
13. The system of claim 1, further comprising:
a first device comprising the memory and the processor; and
a second device,
wherein the processor of the first device is further configured to transmit, to the second device, fingerprint information corresponding to the generated fingerprint and audio signal,
wherein the second device is configured to:
receive the audio signal from the first device;
detect a trigger point within the audio signal, wherein detecting the trigger point within the audio signal includes:
processing the audio signal to provide a plurality of frames, and
identifying, in the plurality of frames, a frame including a fingerprint associated with the trigger point based on the fingerprint information; and
trigger an event associated with the fingerprint in the first frame, a timing of the event synchronized to the detected trigger point within the audio signal.
14. The system of claim 13, wherein the second device comprises a microphone and the second device receives the audio signal via the microphone.
15. A method performed by a device comprising a processor and memory, the method comprising:
defining one or more trigger points at one or more time locations within an audio signal;
generating, based on the audio signal, a fingerprint for each trigger point; and
associating an event with each trigger point.
16. The method of claim 15, wherein the fingerprint is generated from a plurality of peaks identified within a Fast Fourier Transform (FFT) of the audio track at the time location of the fingerprint.
17. The method of claim 16, wherein each of the plurality of peaks is a local magnitude maximum of a 2D coordinate space created by each FFT block.
18. The method of claim 16, wherein each of the plurality of peaks in the fingerprint is separated from other peaks in the fingerprint by a minimum spacing.
19. The method of claim 15, wherein each fingerprint comprises a plurality of peaks and generating the fingerprint comprises:
reading the audio signal in Fast Fourier Transform (FFT) sized buffers for performing FFT on the audio signal;
deriving a magnitude spectrum from the FFT of the audio signal;
adding the magnitude spectrum to a rolling buffer;
identifying peak-candidates for each frame of the rolling buffer;
based on identifying a local maxima of the identified peak-candidate in a one-dimensional slice of the rolling buffer, applying an elevation function to determine the peak-candidate's elevation versus elevations of nearby cells; and
based on the elevation of the peak-candidate's elevation exceeding a threshold, classifying the peak-candidate as a peak of the fingerprint.
20. The method of claim 15, wherein each fingerprint comprises a fingerprint width, fingerprint identification number, a number of peaks in the fingerprint, and coordinates of each peak in the fingerprint.
21. A non-transitory computer readable medium having stored therein computer-readable instructions that, when executed by one or more processors, cause an apparatus connected to the one or more processors to:
define one or more trigger points at one or more time locations within an audio signal;
generate, based on the audio signal, a fingerprint for each trigger point; and
associate an event with each trigger point.
22. An apparatus comprising:
memory; and
a processor operatively coupled to the memory and configured to control the apparatus to:
define a plurality of trigger points within an audio signal, each of the trigger points corresponding to a different time location within the audio signal;
generate, based on peaks identified within a Fast Fourier Transform (FFT) of the audio signal, a fingerprint for each trigger point, wherein each fingerprint comprises a plurality of peaks identified within the FFT of the audio signal at the time location corresponding to the trigger point; and
associate an event with each trigger point.
23. The apparatus of claim 22, wherein each fingerprint comprises a fingerprint width, fingerprint identification number, a number of peaks in the fingerprint, and coordinates of each peak in the fingerprint.
US18/144,589 2017-05-16 2023-05-08 Method and system for triggering events Pending US20240038250A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/144,589 US20240038250A1 (en) 2017-05-16 2023-05-08 Method and system for triggering events

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
GB1709583.7 2017-05-16
GB1709583.7A GB2565751B (en) 2017-06-15 2017-06-15 A method and system for triggering events
PCT/GB2018/051645 WO2018229497A1 (en) 2017-06-15 2018-06-14 A method and system for triggering events
US201916623160A 2019-12-16 2019-12-16
US18/144,589 US20240038250A1 (en) 2017-05-16 2023-05-08 Method and system for triggering events

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US16/623,160 Continuation US11682405B2 (en) 2017-06-15 2018-06-14 Method and system for triggering events
PCT/GB2018/051645 Continuation WO2018229497A1 (en) 2017-05-16 2018-06-14 A method and system for triggering events

Publications (1)

Publication Number Publication Date
US20240038250A1 true US20240038250A1 (en) 2024-02-01

Family

ID=59462299

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/623,160 Active 2038-09-27 US11682405B2 (en) 2017-06-15 2018-06-14 Method and system for triggering events
US18/144,589 Pending US20240038250A1 (en) 2017-05-16 2023-05-08 Method and system for triggering events

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US16/623,160 Active 2038-09-27 US11682405B2 (en) 2017-06-15 2018-06-14 Method and system for triggering events

Country Status (4)

Country Link
US (2) US11682405B2 (en)
EP (2) EP4390923A1 (en)
GB (1) GB2565751B (en)
WO (1) WO2018229497A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201617408D0 (en) 2016-10-13 2016-11-30 Asio Ltd A method and system for acoustic communication of data
GB201617409D0 (en) 2016-10-13 2016-11-30 Asio Ltd A method and system for acoustic communication of data
GB201704636D0 (en) 2017-03-23 2017-05-10 Asio Ltd A method and system for authenticating a device
GB2565751B (en) * 2017-06-15 2022-05-04 Sonos Experience Ltd A method and system for triggering events
GB2570634A (en) 2017-12-20 2019-08-07 Asio Ltd A method and system for improved acoustic transmission of data
US11988784B2 (en) 2020-08-31 2024-05-21 Sonos, Inc. Detecting an audio signal with a microphone to determine presence of a playback device
TWI790682B (en) * 2021-07-13 2023-01-21 宏碁股份有限公司 Processing method of sound watermark and speech communication system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11599915B1 (en) * 2011-10-25 2023-03-07 Auddia Inc. Apparatus, system, and method for audio based browser cookies

Family Cites Families (136)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE2511195C3 (en) 1975-03-14 1978-08-17 Basf Ag, 6700 Ludwigshafen Process for filtering and filters for carrying out the process
US4045616A (en) 1975-05-23 1977-08-30 Time Data Corporation Vocoder system
US4101885A (en) 1975-08-26 1978-07-18 Emil Blum Musical chime device
JPS55149545A (en) 1979-05-10 1980-11-20 Nec Corp Receiver for individual selection and call-out
JPH0778750B2 (en) 1985-12-24 1995-08-23 日本電気株式会社 Highly reliable computer system
US6133849A (en) 1996-02-20 2000-10-17 Unity Wireless Systems Corporation Control signal coding and detection in the audible and inaudible ranges
US7058726B1 (en) 1996-07-08 2006-06-06 Internet Number Corporation Method and systems for accessing information on a network using message aliasing functions having shadow callback functions
JP4372848B2 (en) 1996-07-08 2009-11-25 インターネットナンバー株式会社 Internet access method and system
US6766300B1 (en) 1996-11-07 2004-07-20 Creative Technology Ltd. Method and apparatus for transient detection and non-distortion time scaling
JPH11110319A (en) 1997-10-08 1999-04-23 Sony Corp Transmitter, receiver, recording device and reproducing device
US20100030838A1 (en) 1998-08-27 2010-02-04 Beepcard Ltd. Method to use acoustic signals for computer communications
US7379901B1 (en) 1998-09-11 2008-05-27 Lv Partners, L.P. Accessing a vendor web site using personal account information retrieved from a credit card company web site
US6829646B1 (en) 1999-10-13 2004-12-07 L. V. Partners, L.P. Presentation of web page content based upon computer video resolutions
WO2001015021A2 (en) 1999-08-24 2001-03-01 Digitalconvergence.:Com Inc. Method and apparatus for establishing connection to a remote location on a computer network
JP4792613B2 (en) 1999-09-29 2011-10-12 ソニー株式会社 Information processing apparatus and method, and recording medium
JP2001320337A (en) 2000-05-10 2001-11-16 Nippon Telegr & Teleph Corp <Ntt> Method and device for transmitting acoustic signal and storage medium
AU2295701A (en) 1999-12-30 2001-07-16 Digimarc Corporation Watermark-based personal audio appliance
US6737957B1 (en) 2000-02-16 2004-05-18 Verance Corporation Remote control signaling using audio watermarks
US6532477B1 (en) 2000-02-23 2003-03-11 Sun Microsystems, Inc. Method and apparatus for generating an audio signature for a data item
WO2001093473A2 (en) 2000-05-31 2001-12-06 Optinetix (Israel) Ltd. Systems and methods for distributing information through broadcast media
US6990453B2 (en) * 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
EP1400043A2 (en) 2000-10-20 2004-03-24 Koninklijke Philips Electronics N.V. Method and arrangement for enabling disintermediation, and receiver for use thereby
CN1199383C (en) 2000-10-20 2005-04-27 皇家菲利浦电子有限公司 Rendering device and arrangement
CN101820474B (en) 2000-11-30 2013-11-06 因特拉松尼克斯有限公司 Communication system
AU2211102A (en) 2000-11-30 2002-06-11 Scient Generics Ltd Acoustic communication system
GB0029799D0 (en) 2000-12-07 2001-01-17 Hewlett Packard Co Sound links
GB0029800D0 (en) 2000-12-07 2001-01-17 Hewlett Packard Co Sound links
GB2369955B (en) 2000-12-07 2004-01-07 Hewlett Packard Co Encoding of hyperlinks in sound signals
GB2369995B (en) 2000-12-15 2004-06-16 Lush Ltd Henna product
SE521693C3 (en) 2001-03-30 2004-02-04 Ericsson Telefon Ab L M A method and apparatus for noise suppression
US20030195745A1 (en) 2001-04-02 2003-10-16 Zinser, Richard L. LPC-to-MELP transcoder
US7516325B2 (en) 2001-04-06 2009-04-07 Certicom Corp. Device authentication in a PKI
WO2003001173A1 (en) 2001-06-22 2003-01-03 Rti Tech Pte Ltd A noise-stripping device
US7792279B2 (en) 2002-01-18 2010-09-07 At&T Intellectual Property I, L.P. Distinguishing audio alerts
US7533735B2 (en) 2002-02-15 2009-05-19 Qualcomm Corporation Digital authentication over acoustic channel
US7966497B2 (en) 2002-02-15 2011-06-21 Qualcomm Incorporated System and method for acoustic two factor authentication
US20030212549A1 (en) 2002-05-10 2003-11-13 Jack Steentra Wireless communication using sound
US7764716B2 (en) 2002-06-21 2010-07-27 Disney Enterprises, Inc. System and method for wirelessly transmitting and receiving digital data using acoustical tones
US7103541B2 (en) 2002-06-27 2006-09-05 Microsoft Corporation Microphone array signal enhancement using mixture models
JP2004139525A (en) 2002-10-21 2004-05-13 Nec Corp System and method for providing personal information
US20040264713A1 (en) 2003-06-27 2004-12-30 Robert Grzesek Adaptive audio communication code
US7706548B2 (en) 2003-08-29 2010-04-27 International Business Machines Corporation Method and apparatus for computer communication using audio signals
EP1806739B1 (en) 2004-10-28 2012-08-15 Fujitsu Ltd. Noise suppressor
US20060167841A1 (en) 2004-11-18 2006-07-27 International Business Machines Corporation Method and system for a unique naming scheme for content management systems
US7403743B2 (en) 2004-12-31 2008-07-22 Sony Ericsson Mobile Communications Ab System and method to unlock hidden multimedia content
AU2005201813B2 (en) 2005-04-29 2011-03-24 Phonak Ag Sound processing with frequency transposition
EP1899960A2 (en) 2005-05-26 2008-03-19 LG Electronics Inc. Method of encoding and decoding an audio signal
US20060287004A1 (en) 2005-06-17 2006-12-21 Fuqua Walter B SIM card cash transactions
US9344802B2 (en) 2005-06-28 2016-05-17 Field System, Inc. Information providing system
JP3822224B1 (en) 2005-06-28 2006-09-13 株式会社フィールドシステム Information provision system
US7516074B2 (en) 2005-09-01 2009-04-07 Auditude, Inc. Extraction and matching of characteristic fingerprints from audio signals
US7721958B2 (en) 2005-09-21 2010-05-25 Alcatel Lucent Coinless vending system, method, and computer readable medium using an audio code collector and validator
JP4899416B2 (en) 2005-10-27 2012-03-21 大日本印刷株式会社 Network connection device
TWI330355B (en) 2005-12-05 2010-09-11 Qualcomm Inc Systems, methods, and apparatus for detection of tonal components
EP1793228A1 (en) 2005-12-05 2007-06-06 F. Hoffmann-La Roche AG Method to give acoustically an information in an analytical system
JP2007195105A (en) 2006-01-23 2007-08-02 Hidejiro Kasagi Information acquisition support system and information acquisition method by portable information terminal using sound information
US9135339B2 (en) 2006-02-13 2015-09-15 International Business Machines Corporation Invoking an audio hyperlink
US20070192675A1 (en) 2006-02-13 2007-08-16 Bodin William K Invoking an audio hyperlink embedded in a markup document
US8249350B2 (en) 2006-06-30 2012-08-21 University Of Geneva Brand protection and product authentication using portable devices
DE602006006664D1 (en) 2006-07-10 2009-06-18 Harman Becker Automotive Sys Reduction of background noise in hands-free systems
US20080011825A1 (en) 2006-07-12 2008-01-17 Giordano Claeton J Transactions using handheld electronic devices based on unobtrusive provisioning of the devices
US7656942B2 (en) 2006-07-20 2010-02-02 Hewlett-Packard Development Company, L.P. Denoising signals containing impulse noise
JP4107613B2 (en) 2006-09-04 2008-06-25 International Business Machines Corporation Low cost filter coefficient determination method in dereverberation
US20080112885A1 (en) 2006-09-06 2008-05-15 Innurvation, Inc. System and Method for Acoustic Data Transmission
US8036767B2 (en) 2006-09-20 2011-10-11 Harman International Industries, Incorporated System for extracting and changing the reverberant content of an audio input signal
WO2008102373A2 (en) 2007-02-23 2008-08-28 Ravikiran Sureshbabu Pasupulet A method and system for close range communication using concentric arcs model
US8732020B2 (en) 2007-03-28 2014-05-20 AT&T Intellectual Property I, L.P. Method and apparatus for fulfilling purchases
US20080262928A1 (en) 2007-04-18 2008-10-23 Oliver Michaelis Method and apparatus for distribution and personalization of e-coupons
US7944847B2 (en) 2007-06-25 2011-05-17 Efj, Inc. Voting comparator method, apparatus, and system using a limited number of digital signal processor modules to process a larger number of analog audio streams without affecting the quality of the voted audio stream
US8223959B2 (en) 2007-07-31 2012-07-17 Hewlett-Packard Development Company, L.P. Echo cancellation in which sound source signals are spatially distributed to all speaker devices
US8923509B2 (en) 2007-10-23 2014-12-30 Cisco Technology, Inc. Controlling echo in a wideband voice conference
EP2227915B1 (en) 2007-12-07 2019-05-15 Cirrus Logic International Semiconductor Limited Entrainment resistant feedback cancellation
US8175979B2 (en) 2008-04-02 2012-05-08 International Business Machines Corporation Method and system for anonymous electronic transactions using a mobile device
US20100088390A1 (en) 2008-10-03 2010-04-08 Microsoft Corporation Data sharing proxy for mobile devices
US8508357B2 (en) 2008-11-26 2013-08-13 The Nielsen Company (Us), Llc Methods and apparatus to encode and decode audio for shopper location and advertisement presentation tracking
US8972496B2 (en) 2008-12-10 2015-03-03 Amazon Technologies, Inc. Content sharing
US20100223138A1 (en) 2009-03-02 2010-09-02 First Data Corporation Systems, methods and apparatus for marketing by communicating tones to a mobile device
US8782530B2 (en) 2009-03-25 2014-07-15 Sap Ag Method and system for providing a user interface in a computer
US8320852B2 (en) 2009-04-21 2012-11-27 Samsung Electronics Co., Ltd. Method and apparatus to transmit signals in a communication system
EP2334111B1 (en) 2009-12-14 2012-08-01 Research In Motion Limited Authentication of mobile devices over voice channels
US8886531B2 (en) * 2010-01-13 2014-11-11 Rovi Technologies Corporation Apparatus and method for generating an audio fingerprint and using a two-stage query
EP2552038A4 (en) 2010-03-26 2015-09-16 Field System Inc Sending device
WO2011129725A1 (en) 2010-04-12 2011-10-20 Telefonaktiebolaget L M Ericsson (Publ) Method and arrangement for noise cancellation in a speech encoder
EP3418917B1 (en) * 2010-05-04 2022-08-17 Apple Inc. Methods and systems for synchronizing media
US8661515B2 (en) 2010-05-10 2014-02-25 Intel Corporation Audible authentication for wireless network enrollment
US10360278B2 (en) 2010-06-15 2019-07-23 Nintendo Of America Inc. System and method for accessing online content
GB2484140B (en) 2010-10-01 2017-07-12 Asio Ltd Data communication system
WO2012098579A1 (en) 2011-01-19 2012-07-26 Mitsubishi Electric Corporation Noise suppression device
US20120214416A1 (en) 2011-02-23 2012-08-23 Jonathan Douglas Kent Methods and apparatuses for communication between devices
US9270807B2 (en) 2011-02-23 2016-02-23 Digimarc Corporation Audio localization using audio signal encoding and recognition
JP5994136B2 (en) * 2011-06-15 2016-09-21 Field System, Inc. Authentication system and authentication method
JP6147744B2 (en) 2011-07-29 2017-06-14 DTS LLC Adaptive speech intelligibility processing system and method
JP5751110B2 (en) 2011-09-22 2015-07-22 Fujitsu Limited Reverberation suppression apparatus, reverberation suppression method, and reverberation suppression program
US20130275126A1 (en) 2011-10-11 2013-10-17 Robert Schiff Lee Methods and systems to modify a speech signal while preserving aural distinctions between speech sounds
US9143402B2 (en) 2012-02-24 2015-09-22 Qualcomm Incorporated Sensor based configuration and control of network devices
US8995903B2 (en) 2012-07-25 2015-03-31 Gopro, Inc. Credential transfer management camera network
US8930005B2 (en) 2012-08-07 2015-01-06 Sonos, Inc. Acoustic signatures in a playback system
US9357385B2 (en) 2012-08-20 2016-05-31 Qualcomm Incorporated Configuration of a new enrollee device for use in a communication network
US9088336B2 (en) 2012-09-06 2015-07-21 Imagination Technologies Limited Systems and methods of echo and noise cancellation in voice communication
US20140074469A1 (en) 2012-09-11 2014-03-13 Sergey Zhidkov Apparatus and Method for Generating Signatures of Acoustic Signal and Apparatus for Acoustic Signal Identification
US9401153B2 (en) * 2012-10-15 2016-07-26 Digimarc Corporation Multi-mode audio recognition and auxiliary data encoding and decoding
US9305559B2 (en) * 2012-10-15 2016-04-05 Digimarc Corporation Audio watermark encoding with reversing polarity and pairwise embedding
US9473580B2 (en) 2012-12-06 2016-10-18 Cisco Technology, Inc. System and associated methodology for proximity detection and device association using ultrasound
US9318116B2 (en) 2012-12-14 2016-04-19 Disney Enterprises, Inc. Acoustic data transmission based on groups of audio receivers
US20140172429A1 (en) * 2012-12-14 2014-06-19 Microsoft Corporation Local recognition of content
US20140258110A1 (en) 2013-03-11 2014-09-11 Digimarc Corporation Methods and arrangements for smartphone payments and transactions
EP2952012B1 (en) 2013-03-07 2018-07-18 Apple Inc. Room and program responsive loudspeaker system
US20150004935A1 (en) 2013-06-26 2015-01-01 Nokia Corporation Method and apparatus for generating access codes based on information embedded in various signals
CN106409310B (en) 2013-08-06 2019-11-19 Huawei Technologies Co., Ltd. Audio signal classification method and apparatus
KR101475862B1 (en) 2013-09-24 2014-12-23 Powervoice Co., Ltd. Encoding apparatus and method for encoding sound code, decoding apparatus and method for decoding the sound code
US9226119B2 (en) 2013-11-20 2015-12-29 Qualcomm Incorporated Using sensor data to provide information for proximally-relevant group communications
US9722984B2 (en) 2014-01-30 2017-08-01 Netiq Corporation Proximity-based authentication
US20150248879A1 (en) 2014-02-28 2015-09-03 Texas Instruments Incorporated Method and system for configuring an active noise cancellation unit
KR102166423B1 (en) * 2014-03-05 2020-10-15 Samsung Electronics Co., Ltd. Display device, server and method of controlling the display device
KR102139997B1 (en) 2014-03-21 2020-08-12 SK Planet Co., Ltd. Method for reinforcing security of beacon device, system and apparatus thereof
US20150371529A1 (en) 2014-06-24 2015-12-24 Bose Corporation Audio Systems and Related Methods and Devices
US9578511B2 (en) 2014-06-30 2017-02-21 Libre Wireless Technologies, Inc. Systems and techniques for wireless device configuration
US9270811B1 (en) 2014-09-23 2016-02-23 Amazon Technologies, Inc. Visual options for audio menu
US9947318B2 (en) 2014-10-03 2018-04-17 2236008 Ontario Inc. System and method for processing an audio signal captured from a microphone
US9118401B1 (en) 2014-10-28 2015-08-25 Harris Corporation Method of adaptive interference mitigation in wide band spectrum
JP6704929B2 (en) 2014-12-10 2020-06-03 Kyndi, Inc. Apparatus and method for combinatorial hypermap-based data representation and operation
CN105790852B (en) 2014-12-19 2019-02-22 Beijing Qihoo Technology Co., Ltd. Method and system for data transmission based on multi-frequency sound waves
WO2016145235A1 (en) 2015-03-12 2016-09-15 Startimes Communication Network Technology Co. Ltd. Location based services audio system
US10530767B2 (en) 2015-03-23 2020-01-07 Telefonaktiebolaget Lm Ericsson (Publ) Methods and user device and authenticator device for authentication of the user device
JP6940414B2 (en) * 2015-04-20 2021-09-29 ResMed Sensor Technologies Limited Human detection and identification from characteristic signals
US10186251B1 (en) 2015-08-06 2019-01-22 Oben, Inc. Voice conversion using deep neural network with intermediate voice training
US11233582B2 (en) 2016-03-25 2022-01-25 Lisnr, Inc. Local tone generation
US10236031B1 (en) * 2016-04-05 2019-03-19 Digimarc Corporation Timeline reconstruction using dynamic path estimation from detections in audio-video signals
US10236006B1 (en) * 2016-08-05 2019-03-19 Digimarc Corporation Digital watermarks adapted to compensate for time scaling, pitch shifting and mixing
GB201617408D0 (en) 2016-10-13 2016-11-30 Asio Ltd A method and system for acoustic communication of data
CN106921650B (en) 2016-12-21 2021-01-19 Advanced New Technologies Co., Ltd. Cross-device login method, system and device
GB2565751B (en) * 2017-06-15 2022-05-04 Sonos Experience Ltd A method and system for triggering events
CN110622444B (en) 2017-06-13 2023-03-24 Apple Inc. Robust ultrasound communication signal format
EP3416407B1 (en) 2017-06-13 2020-04-08 Nxp B.V. Signal processor
US11488590B2 (en) * 2018-05-09 2022-11-01 Staton Techiya Llc Methods and systems for processing, storing, and publishing data collected by an in-ear device
US11514777B2 (en) 2018-10-02 2022-11-29 Sonos, Inc. Methods and devices for transferring data using sound signals

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11599915B1 (en) * 2011-10-25 2023-03-07 Auddia Inc. Apparatus, system, and method for audio based browser cookies

Also Published As

Publication number Publication date
EP4390923A1 (en) 2024-06-26
GB201709583D0 (en) 2017-08-02
WO2018229497A1 (en) 2018-12-20
GB2565751A (en) 2019-02-27
GB2565751B (en) 2022-05-04
US11682405B2 (en) 2023-06-20
US20210098008A1 (en) 2021-04-01
EP3639262A1 (en) 2020-04-22

Similar Documents

Publication Title
US20240038250A1 (en) Method and system for triggering events
US11336952B2 (en) Media content identification on mobile devices
CN110430425B (en) Video fluency determination method and device, electronic equipment and medium
KR101101384B1 (en) Parameterized Time Characterization
US11736762B2 (en) Media content identification on mobile devices
US11653062B2 (en) Methods and apparatus to determine audio source impact on an audience of media
US9374629B2 (en) Methods and apparatus to classify audio
KR102212225B1 (en) Apparatus and Method for correcting Audio data
US9792898B2 (en) Concurrent segmentation of multiple similar vocalizations
CN107609149B (en) Video positioning method and device
AU2024200622A1 (en) Methods and apparatus to fingerprint an audio signal via exponential normalization
CN109997186B (en) Apparatus and method for classifying acoustic environments
US20160163354A1 (en) Programme Control
JP2012185195A (en) Audio data feature extraction method, audio data collation method, audio data feature extraction program, audio data collation program, audio data feature extraction device, audio data collation device, and audio data collation system
US10701459B2 (en) Audio-video content control
WO2014098498A1 (en) Audio correction apparatus, and audio correction method thereof
CN111314536B (en) Method and equipment for detecting listening module of terminal equipment
US20160080863A1 (en) Feedback suppression test filter correlation
US20220199074A1 (en) A dialog detector
Vozarikova et al. Study of audio spectrum flatness for acoustic events recognition
EP2148327A1 (en) A method and a device and a system for determining the location of distortion in an audio signal

Legal Events

Code Title Description
STPP Information on status: patent application and granting procedure in general. Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP Information on status: patent application and granting procedure in general. Free format text: NON FINAL ACTION MAILED
STPP Information on status: patent application and granting procedure in general. Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP Information on status: patent application and granting procedure in general. Free format text: FINAL REJECTION MAILED