
Jeong et al., 2017 - Google Patents

Audio Event Detection Using Multiple-Input Convolutional Neural Network.

Document ID
12089201817674036143
Author
Jeong I
Lee S
Han Y
Lee K
Publication year
2017
Publication venue
DCASE

Snippet

This paper describes the model and training framework from our submission for DCASE 2017 task 3: sound event detection in real life audio. Extending the basic convolutional neural network architecture, we use both short- and long-term audio signals simultaneously …
Continue reading at www.researchgate.net (PDF)
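
The snippet describes a multiple-input convolutional network that consumes short- and long-term views of the audio simultaneously. Below is a minimal sketch of that idea in PyTorch; the branch layout, layer sizes, mel-bin count, and class count are illustrative assumptions rather than the paper's configuration.

    # Minimal sketch of a two-input CNN for frame-wise sound event detection.
    # All hyperparameters are assumptions, not taken from Jeong et al. (2017).
    import torch
    import torch.nn as nn

    class MultiInputCNN(nn.Module):
        def __init__(self, n_mels=40, n_classes=6):
            super().__init__()
            def branch():
                return nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d((2, 1)),             # pool over frequency only
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d((2, 1)),
                )
            self.short_branch = branch()              # short-term context view
            self.long_branch = branch()               # long-term context view
            feat = 32 * (n_mels // 4)                 # channels x pooled mel bins
            self.classifier = nn.Sequential(
                nn.Linear(2 * feat, 128), nn.ReLU(),
                nn.Linear(128, n_classes),
            )

        def forward(self, x_short, x_long):
            # Each input: (batch, 1, n_mels, n_frames); both views share the
            # same frame grid here for simplicity.
            def frames(h):
                b, c, f, t = h.shape
                return h.permute(0, 3, 1, 2).reshape(b, t, c * f)
            h = torch.cat([frames(self.short_branch(x_short)),
                           frames(self.long_branch(x_long))], dim=-1)
            return self.classifier(h)                 # (batch, n_frames, n_classes)

    model = MultiInputCNN()
    short_view = torch.randn(2, 1, 40, 100)           # e.g. 40 mel bins, 100 frames
    long_view = torch.randn(2, 1, 40, 100)
    print(model(short_view, long_view).shape)         # torch.Size([2, 100, 6])

Pooling only along the frequency axis keeps the frame resolution, so per-frame event activity can be read directly from the output.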

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search

Similar Documents

Jeong et al. Audio Event Detection Using Multiple-Input Convolutional Neural Network.
Hu et al. MM-DFN: Multimodal dynamic fusion network for emotion recognition in conversations
Qin et al. Simple attention module based speaker verification with iterative noisy label detection
Xu et al. Convolutional gated recurrent neural network incorporating spatial features for audio tagging
Senthilkumar et al. Speech emotion recognition based on Bi-directional LSTM architecture and deep belief networks
Shum et al. Unsupervised methods for speaker diarization: An integrated and iterative approach
Yella et al. Artificial neural network features for speaker diarization
CN111508526B (en) Method and device for detecting audio beat information and storage medium
US11238289B1 (en) Automatic lie detection method and apparatus for interactive scenarios, device and medium
Vinals et al. Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge.
Kwon et al. Multi-scale speaker embedding-based graph attention networks for speaker diarisation
Kinoshita et al. Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model
CN114023354A (en) Guidance type acoustic event detection model training method based on focusing loss function
Pereira et al. Using deep autoencoders for in-vehicle audio anomaly detection
CN111785284A (en) Method, device and equipment for recognizing text-independent voiceprint based on phoneme assistance
Naranjo-Alcazar et al. On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
Kumar et al. Designing neural speaker embeddings with meta learning
de Velasco Vázquez et al. Can spontaneous emotions be detected from speech on TV political debates?
Kwon et al. Look who’s not talking
Jallet et al. Acoustic scene classification using convolutional recurrent neural networks
Valenti et al. A neural network approach for sound event detection in real life audio
Liu et al. Learning salient features for speech emotion recognition using CNN
Yu et al. Efficient feature extraction and late fusion strategy for audiovisual emotional mimicry intensity estimation
Rakowski et al. Frequency-aware CNN for open set acoustic scene classification
Churaev et al. Multi-user facial emotion recognition in video based on user-dependent neural network adaptation