Audio Classification
141 papers with code • 21 benchmarks • 36 datasets
Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
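To make the task concrete, here is a minimal, self-contained sketch (not taken from any paper below): it distinguishes a tone-like clip from a noise-like clip by comparing normalized spectral band energies to class centroids. The feature choice and nearest-centroid rule are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 8000                    # sample rate (Hz); clips are 1 second long
t = np.arange(sr) / sr

def band_energies(signal, n_bands=8):
    """Summarize a clip by the energy in equal-width frequency bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, n_bands)
    feats = np.array([(b ** 2).sum() for b in bands])
    return feats / feats.sum()   # normalize so clips of different loudness compare

# Two toy classes: a 440 Hz tone (music-like) and white noise (environment-like)
tone = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(sr)
centroids = {"tone": band_energies(tone), "noise": band_energies(noise)}

def classify(clip):
    """Assign the clip to the class whose centroid is nearest in feature space."""
    feats = band_energies(clip)
    return min(centroids, key=lambda c: np.linalg.norm(feats - centroids[c]))

test_clip = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)
print(classify(test_clip))   # a noisy tone should still fall in the "tone" class
```

Real systems replace the hand-built features with learned ones (CNNs or Transformers over spectrograms), but the pipeline shape is the same: signal, features, class decision.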
Most implemented papers
Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.
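The sentence above describes the standard convolutional layer, which this paper builds on. A minimal sketch of that description in one dimension (the kernel and input values are illustrative, not from the paper):

```python
import numpy as np

def conv1d_relu(x, w, b):
    """A plain 1-D convolutional layer as described: for each patch of the
    input, apply an affine function (dot product with weights plus bias),
    then a non-linearity (here, ReLU)."""
    k = len(w)
    patches = np.lib.stride_tricks.sliding_window_view(x, k)  # all length-k patches
    affine = patches @ w + b            # affine function of each patch
    return np.maximum(affine, 0.0)      # non-linearity

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
w = np.array([0.5, -0.5])               # toy kernel
out = conv1d_relu(x, w, b=0.0)
print(out)                              # → [0.  1.5 0.  0. ]
```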
AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark
Explainable Artificial Intelligence (XAI) is targeted at understanding how models perform feature selection and derive their classification decisions.
Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
Modern deep neural networks are well known to be brittle in the face of unknown data instances and recognition of the latter remains a challenge.
Rethinking CNN Models for Audio Classification
We also show that, even when pretrained model weights are used for initialization, performance varies across different runs of the same model.
SSAST: Self-Supervised Audio Spectrogram Transformer
However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.
Specifying Weight Priors in Bayesian Deep Neural Networks with Empirical Bayes
We propose MOdel Priors with Empirical Bayes using DNN (MOPED) method to choose informed weight priors in Bayesian neural networks.
Π-nets: Deep Polynomial Neural Networks
Deep Convolutional Neural Networks (DCNNs) are currently the method of choice for both generative and discriminative learning in computer vision and machine learning.
Generalised Interpretable Shapelets for Irregular Time Series
The shapelet transform is a form of feature extraction for time series, in which a time series is described by its similarity to each of a collection of 'shapelets'.
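A minimal sketch of the classic shapelet transform described above, for regularly sampled series (the paper's contribution, generalising shapelets to irregular time series, is not reproduced here; the shapelets and series are toy values):

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Similarity of a series to one shapelet: the minimum Euclidean distance
    between the shapelet and any same-length window of the series."""
    k = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, k)
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

def shapelet_transform(series, shapelets):
    """Describe a series by its distance to each shapelet in a collection.
    The resulting vector can feed any standard classifier."""
    return np.array([shapelet_distance(series, s) for s in shapelets])

series = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0])
shapelets = [np.array([1.0, 0.0, -1.0]),   # matches a falling pattern in the series
             np.array([0.0, 0.0, 0.0])]    # a flat reference shapelet
feats = shapelet_transform(series, shapelets)
print(feats)   # → [0. 1.]
```

The first distance is 0 because the falling pattern occurs exactly in the series; interpretability comes from reading each feature as "how closely does this shape appear anywhere in the signal".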
CRNNs for Urban Sound Tagging with spatiotemporal context
This paper describes CRNNs we used to participate in Task 5 of the DCASE 2020 challenge.
Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast
It is useful as a pre-processing step to index, store, and modify audio recordings, radio broadcasts and TV programmes.