Rakowski et al., 2019 - Google Patents
Frequency-aware CNN for open set acoustic scene classificationRakowski et al., 2019
View PDF- Document ID
- 16523754012036764331
- Author
- Rakowski A
- Kosmider M
- Publication year
- Publication venue
- Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA
External Links
Snippet
This report describes systems used for Task 1c of the DCASE 2019 Challenge-Open Set Acoustic Scene Classification. The main system consists of a 5-layer convolutional neural network which preserves the location of features on the frequency axis. This is in contrast to …
- 238000011176 pooling 0 abstract description 10
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6256—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6261—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600054B (en) | Sound scene classification method based on network model fusion | |
Ding et al. | Autospeech: Neural architecture search for speaker recognition | |
Jeong et al. | Audio Event Detection Using Multiple-Input Convolutional Neural Network. | |
JP6557783B2 (en) | Cascade neural network with scale-dependent pooling for object detection | |
US12067989B2 (en) | Combined learning method and apparatus using deepening neural network based feature enhancement and modified loss function for speaker recognition robust to noisy environments | |
CN110033756B (en) | Language identification method and device, electronic equipment and storage medium | |
WO2018052587A1 (en) | Method and system for cell image segmentation using multi-stage convolutional neural networks | |
CN112183107A (en) | Audio processing method and device | |
Rakowski et al. | Frequency-aware CNN for open set acoustic scene classification | |
Jeong et al. | Audio tagging system for dcase 2018: focusing on label noise, data augmentation and its efficient learning | |
CN113762149B (en) | Human behavior recognition system and method based on feature fusion of segmented attention | |
CN113345466B (en) | Main speaker voice detection method, device and equipment based on multi-microphone scene | |
CN108877812B (en) | A voiceprint recognition method, device and storage medium | |
Naranjo-Alcazar et al. | On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification | |
Perez-Castanos et al. | Cnn depth analysis with different channel inputs for acoustic scene classification | |
Jallet et al. | Acoustic scene classification using convolutional recurrent neural networks | |
CN114882914B (en) | Aliasing processing method, device and storage medium | |
Bursuc et al. | Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification | |
JP5626221B2 (en) | Acoustic image segment classification apparatus and method | |
CN113392728B (en) | Target detection method based on SSA sharpening attention mechanism | |
Mustafa et al. | Enhancing CNN-based Image Steganalysis on GPUs. | |
CN119562032A (en) | Audio, video and image fusion monitoring and analysis system and method based on deep learning in rail scenarios | |
CN111144497B (en) | Image saliency prediction method under multi-task deep network based on aesthetic analysis | |
CN111429937A (en) | Voice separation method, model training method and electronic equipment | |
Phan et al. | Beyond equal-length snippets: How long is sufficient to recognize an audio scene? |