Audio Event Detection Using Multiple-Input Convolutional Neural Network
Jeong et al., 2017
- Document ID
- 12089201817674036143
- Author
- Jeong I
- Lee S
- Han Y
- Lee K
- Publication year
- 2017
- Publication venue
- DCASE
Snippet
This paper describes the model and training framework from our submission for DCASE 2017 Task 3: sound event detection in real life audio. Extending the basic convolutional neural network architecture, we use both short- and long-term audio signals simultaneously …
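The idea of feeding short- and long-term audio context to a network at the same time can be illustrated with a minimal sketch. It only shows how two context windows of different lengths might be cut around the same frame of a feature matrix. The function names `context_window` and `paired_inputs`, the window sizes, and the zero padding at the edges are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def context_window(features, center, half_width):
    """Cut frames [center - half_width, center + half_width] from a
    (n_frames, n_bands) feature matrix, zero-padding past the edges.

    Hypothetical helper: the padding strategy is an assumption.
    """
    n_frames, n_bands = features.shape
    window = np.zeros((2 * half_width + 1, n_bands), dtype=features.dtype)
    start = center - half_width
    stop = center + half_width + 1
    # Clip the requested range to the frames that actually exist.
    src_start = max(start, 0)
    src_stop = min(stop, n_frames)
    window[src_start - start:src_stop - start] = features[src_start:src_stop]
    return window


def paired_inputs(features, center, short_half=2, long_half=10):
    """Return a (short, long) pair of windows centered on the same frame.

    The half-widths are placeholder values, not the paper's settings.
    """
    return (
        context_window(features, center, short_half),
        context_window(features, center, long_half),
    )


# Toy feature matrix: 50 frames, 4 frequency bands.
features = np.arange(50 * 4, dtype=np.float32).reshape(50, 4)
short_in, long_in = paired_inputs(features, center=1)
print(short_in.shape, long_in.shape)
```

In a multiple-input model, each window would typically go to its own convolutional branch before the branches are merged. How the submission combines them is not stated in the snippet, so that step is left out here.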
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
Similar Documents
Publication | Title |
---|---|
Jeong et al. | Audio Event Detection Using Multiple-Input Convolutional Neural Network. |
Hu et al. | MM-DFN: Multimodal dynamic fusion network for emotion recognition in conversations |
Qin et al. | Simple attention module based speaker verification with iterative noisy label detection |
Xu et al. | Convolutional gated recurrent neural network incorporating spatial features for audio tagging |
Senthilkumar et al. | Speech emotion recognition based on Bi-directional LSTM architecture and deep belief networks |
Shum et al. | Unsupervised methods for speaker diarization: An integrated and iterative approach |
Yella et al. | Artificial neural network features for speaker diarization |
CN111508526B (en) | Method and device for detecting audio beat information and storage medium |
US11238289B1 (en) | Automatic lie detection method and apparatus for interactive scenarios, device and medium |
Vinals et al. | Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge. |
Kwon et al. | Multi-scale speaker embedding-based graph attention networks for speaker diarisation |
Kinoshita et al. | Tight integration of neural- and clustering-based diarization through deep unfolding of infinite gaussian mixture model |
CN114023354A (en) | Guidance type acoustic event detection model training method based on focusing loss function |
Pereira et al. | Using deep autoencoders for in-vehicle audio anomaly detection |
CN111785284A (en) | Method, device and equipment for recognizing text-independent voiceprint based on phoneme assistance |
Naranjo-Alcazar et al. | On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification |
Kumar et al. | Designing neural speaker embeddings with meta learning |
de Velasco Vázquez et al. | Can spontaneous emotions be detected from speech on TV political debates? |
Kwon et al. | Look who’s not talking |
Jallet et al. | Acoustic scene classification using convolutional recurrent neural networks |
Valenti et al. | A neural network approach for sound event detection in real life audio |
Liu et al. | Learning salient features for speech emotion recognition using CNN |
Yu et al. | Efficient feature extraction and late fusion strategy for audiovisual emotional mimicry intensity estimation |
Rakowski et al. | Frequency-aware CNN for open set acoustic scene classification |
Churaev et al. | Multi-user facial emotion recognition in video based on user-dependent neural network adaptation |