Rakowski et al., 2019 - Google Patents

Frequency-aware CNN for open set acoustic scene classification

Document ID: 16523754012036764331
Author: Rakowski A; Kosmider M
Publication year: 2019
Publication venue: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA

Snippet

This report describes the systems used for Task 1c of the DCASE 2019 Challenge: Open Set Acoustic Scene Classification. The main system consists of a 5-layer convolutional neural network that preserves the location of features on the frequency axis. This is in contrast to …
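
The snippet is truncated before it says what the network is contrasted with, so the following is only a toy illustration of what "preserving the location of features on the frequency axis" can mean, not the authors' architecture. It assumes, purely for illustration, that the alternative is pooling over both time and frequency. The function names `pool_time_only` and `pool_global` and the toy inputs are hypothetical.

```python
# Illustrative sketch only: a toy "spectrogram" is a list of frequency rows,
# each row holding values over time frames. Nothing here is taken from the paper.

def pool_time_only(spec):
    """Average over the time axis, keeping one value per frequency bin."""
    return [sum(row) / len(row) for row in spec]

def pool_global(spec):
    """Average over both axes, discarding frequency position."""
    values = [v for row in spec for v in row]
    return sum(values) / len(values)

# Two toy inputs with the same total energy in different frequency bins.
low_band = [[1.0] * 4, [0.0] * 4, [0.0] * 4]
high_band = [[0.0] * 4, [0.0] * 4, [1.0] * 4]

# Time-only pooling keeps the two inputs distinguishable by frequency bin.
print(pool_time_only(low_band))   # [1.0, 0.0, 0.0]
print(pool_time_only(high_band))  # [0.0, 0.0, 1.0]

# Global pooling maps both inputs to the same single value.
print(pool_global(low_band) == pool_global(high_band))  # True
```

In this toy setup, the time-only variant yields a feature vector indexed by frequency bin, while the global variant cannot tell a low-band input from a high-band one.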

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6256—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
            - G06K9/6261—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation partitioning the feature space
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
        - G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/08—Learning methods
            - G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2207/00—Indexing scheme for image analysis or image enhancement
        - G06T2207/10—Image acquisition modality
          - G06T2207/10016—Video; Image sequence
          - G06T2207/10024—Color image
        - G06T2207/20—Special algorithmic details
      - G06T7/00—Image analysis
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building

Similar Documents

Publication	Publication Date	Title
CN110600054B (en)	2021-09-21	Sound scene classification method based on network model fusion
Ding et al.	2020	Autospeech: Neural architecture search for speaker recognition
Jeong et al.	2017	Audio Event Detection Using Multiple-Input Convolutional Neural Network.
JP6557783B2 (en)	2019-08-07	Cascade neural network with scale-dependent pooling for object detection
US12067989B2 (en)	2024-08-20	Combined learning method and apparatus using deepening neural network based feature enhancement and modified loss function for speaker recognition robust to noisy environments
CN110033756B (en)	2021-03-16	Language identification method and device, electronic equipment and storage medium
WO2018052587A1 (en)	2018-03-22	Method and system for cell image segmentation using multi-stage convolutional neural networks
CN112183107A (en)	2021-01-05	Audio processing method and device
Rakowski et al.	2019	Frequency-aware CNN for open set acoustic scene classification
Jeong et al.	2018	Audio tagging system for dcase 2018: focusing on label noise, data augmentation and its efficient learning
CN113762149B (en)	2025-03-04	Human behavior recognition system and method based on feature fusion of segmented attention
CN113345466B (en)	2024-03-01	Main speaker voice detection method, device and equipment based on multi-microphone scene
CN108877812B (en)	2021-04-02	A voiceprint recognition method, device and storage medium
Naranjo-Alcazar et al.	2019	On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
Perez-Castanos et al.	2019	Cnn depth analysis with different channel inputs for acoustic scene classification
Jallet et al.	2017	Acoustic scene classification using convolutional recurrent neural networks
CN114882914B (en)	2024-06-18	Aliasing processing method, device and storage medium
Bursuc et al.	2021	Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification
JP5626221B2 (en)	2014-11-19	Acoustic image segment classification apparatus and method
CN113392728B (en)	2022-06-10	Target detection method based on SSA sharpening attention mechanism
Mustafa et al.	2020	Enhancing CNN-based Image Steganalysis on GPUs.
CN119562032A (en)	2025-03-04	Audio, video and image fusion monitoring and analysis system and method based on deep learning in rail scenarios
CN111144497B (en)	2023-04-28	Image saliency prediction method under multi-task deep network based on aesthetic analysis
CN111429937A (en)	2020-07-17	Voice separation method, model training method and electronic equipment
Phan et al.	2018	Beyond equal-length snippets: How long is sufficient to recognize an audio scene?