Audio Classification
141 papers with code • 21 benchmarks • 36 datasets
Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
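To make the task concrete, here is a minimal, self-contained sketch (not taken from any paper below): it distinguishes a tone-like clip from a noise-like clip by comparing normalized spectral band energies to class centroids. The feature choice and nearest-centroid rule are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 8000                    # sample rate (Hz); clips are 1 second long
t = np.arange(sr) / sr

def band_energies(signal, n_bands=8):
    """Summarize a clip by the energy in equal-width frequency bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, n_bands)
    feats = np.array([(b ** 2).sum() for b in bands])
    return feats / feats.sum()   # normalize so clips of different loudness compare

# Two toy classes: a 440 Hz tone (music-like) and white noise (environment-like)
tone = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(sr)
centroids = {"tone": band_energies(tone), "noise": band_energies(noise)}

def classify(clip):
    """Assign the clip to the class whose centroid is nearest in feature space."""
    feats = band_energies(clip)
    return min(centroids, key=lambda c: np.linalg.norm(feats - centroids[c]))

test_clip = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)
print(classify(test_clip))   # a noisy tone should still fall in the "tone" class
```

Real systems replace the hand-built features with learned ones (CNNs or Transformers over spectrograms), but the pipeline shape is the same: signal, features, class decision.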
Most implemented papers
Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.
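The sentence above describes the standard convolutional layer, which this paper builds on. A minimal sketch of that description in one dimension (the kernel and input values are illustrative, not from the paper):

```python
import numpy as np

def conv1d_relu(x, w, b):
    """A plain 1-D convolutional layer as described: for each patch of the
    input, apply an affine function (dot product with weights plus bias),
    then a non-linearity (here, ReLU)."""
    k = len(w)
    patches = np.lib.stride_tricks.sliding_window_view(x, k)  # all length-k patches
    affine = patches @ w + b            # affine function of each patch
    return np.maximum(affine, 0.0)      # non-linearity

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
w = np.array([0.5, -0.5])               # toy kernel
out = conv1d_relu(x, w, b=0.0)
print(out)                              # → [0.  1.5 0.  0. ]
```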
AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark
Explainable Artificial Intelligence (XAI) is targeted at understanding how models perform feature selection and derive their classification decisions.
Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
Modern deep neural networks are well known to be brittle in the face of unknown data instances and recognition of the latter remains a challenge.
Rethinking CNN Models for Audio Classification
We also show that, even when pretrained model weights are used for initialization, performance varies across different runs of the same model.
SSAST: Self-Supervised Audio Spectrogram Transformer
However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.
Specifying Weight Priors in Bayesian Deep Neural Networks with Empirical Bayes
We propose MOdel Priors with Empirical Bayes using DNN (MOPED) method to choose informed weight priors in Bayesian neural networks.
Π-nets: Deep Polynomial Neural Networks
Deep Convolutional Neural Networks (DCNNs) are currently the method of choice for both generative and discriminative learning in computer vision and machine learning.
Generalised Interpretable Shapelets for Irregular Time Series
The shapelet transform is a form of feature extraction for time series, in which a time series is described by its similarity to each of a collection of 'shapelets'.
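A minimal sketch of the classic shapelet transform described above, for regularly sampled series (the paper's contribution, generalising shapelets to irregular time series, is not reproduced here; the shapelets and series are toy values):

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Similarity of a series to one shapelet: the minimum Euclidean distance
    between the shapelet and any same-length window of the series."""
    k = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, k)
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

def shapelet_transform(series, shapelets):
    """Describe a series by its distance to each shapelet in a collection.
    The resulting vector can feed any standard classifier."""
    return np.array([shapelet_distance(series, s) for s in shapelets])

series = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0])
shapelets = [np.array([1.0, 0.0, -1.0]),   # matches a falling pattern in the series
             np.array([0.0, 0.0, 0.0])]    # a flat reference shapelet
feats = shapelet_transform(series, shapelets)
print(feats)   # → [0. 1.]
```

The first distance is 0 because the falling pattern occurs exactly in the series; interpretability comes from reading each feature as "how closely does this shape appear anywhere in the signal".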
CRNNs for Urban Sound Tagging with spatiotemporal context
This paper describes CRNNs we used to participate in Task 5 of the DCASE 2020 challenge.
Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast
It is useful as a pre-processing step to index, store, and modify audio recordings, radio broadcasts and TV programmes.