Jeong et al., 2017 - Google Patents

Audio Event Detection Using Multiple-Input Convolutional Neural Network.

Document ID: 12089201817674036143
Author: Jeong I; Lee S; Han Y; Lee K
Publication year: 2017
Publication venue: DCASE

Snippet

This paper describes the model and training framework from our submission for DCASE 2017 task 3: sound event detection in real life audio. Extending the basic convolutional neural network architecture, we use both short- and long-term audio signals simultaneously …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search

Similar Documents

Publication	Publication Date	Title
Jeong et al.	2017	Audio Event Detection Using Multiple-Input Convolutional Neural Network.
Hu et al.	2022	MM-DFN: Multimodal dynamic fusion network for emotion recognition in conversations
Qin et al.	2022	Simple attention module based speaker verification with iterative noisy label detection
Xu et al.	2017	Convolutional gated recurrent neural network incorporating spatial features for audio tagging
Senthilkumar et al.	2022	Speech emotion recognition based on Bi-directional LSTM architecture and deep belief networks
Shum et al.	2013	Unsupervised methods for speaker diarization: An integrated and iterative approach
Yella et al.	2014	Artificial neural network features for speaker diarization
CN111508526B (en)	2022-07-01	Method and device for detecting audio beat information and storage medium
US11238289B1 (en)	2022-02-01	Automatic lie detection method and apparatus for interactive scenarios, device and medium
Vinals et al.	2018	Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge.
Kwon et al.	2022	Multi-scale speaker embedding-based graph attention networks for speaker diarisation
Kinoshita et al.	2022	Tight integration of neural-and clustering-based diarization through deep unfolding of infinite gaussian mixture model
CN114023354A (en)	2022-02-08	Guidance type acoustic event detection model training method based on focusing loss function
Pereira et al.	2021	Using deep autoencoders for in-vehicle audio anomaly detection
CN111785284A (en)	2020-10-16	Method, device and equipment for recognizing text-independent voiceprint based on phoneme assistance
Naranjo-Alcazar et al.	2019	On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
Kumar et al.	2020	Designing neural speaker embeddings with meta learning
de Velasco Vázquez et al.	2019	Can spontaneous emotions be detected from speech on TV political debates?
Kwon et al.	2021	Look who’s not talking
Jallet et al.	2017	Acoustic scene classification using convolutional recurrent neural networks
Valenti et al.	2017	A neural network approach for sound event detection in real life audio
Liu et al.	2018	Learning salient features for speech emotion recognition using CNN
Yu et al.	2024	Efficient feature extraction and late fusion strategy for audiovisual emotional mimicry intensity estimation
Rakowski et al.	2019	Frequency-aware CNN for open set acoustic scene classification
Churaev et al.	2022	Multi-user facial emotion recognition in video based on user-dependent neural network adaptation