Deep Learning for Audio Processing
Advanced Models and Techniques
  Generative Models for Audio
    Autoregressive Models
      WaveNet Architecture
        Dilated Causal Convolutions
        Conditioning Mechanisms
          Global and Local Conditioning
      SampleRNN
        Hierarchical Structure
        Multi-scale Generation
      WaveRNN
        Efficient Autoregressive Generation
        Sparse Generation
      Parallel WaveNet
        Probability Density Distillation
        Fast Parallel Generation
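The dilated causal convolution that WaveNet builds on can be sketched in a few lines of numpy. This is an illustrative toy, not WaveNet's actual implementation; the function name, the 2-tap filter, and the toy signal are invented for this example:

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """Causal 1-D convolution: output at time t sees only x[t], x[t-d], x[t-2d], ...
    x: (T,) signal, w: (K,) filter taps, dilation d >= 1."""
    T, K = len(x), len(w)
    # Left-pad so the output has length T and never looks into the future.
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            y[t] += w[k] * xp[pad + t - k * dilation]
    return y

x = np.arange(8, dtype=float)                 # toy "waveform"
w = np.array([1.0, 1.0])                      # 2-tap filter
y1 = dilated_causal_conv1d(x, w, dilation=1)  # y1[t] = x[t] + x[t-1]
y2 = dilated_causal_conv1d(x, w, dilation=2)  # y2[t] = x[t] + x[t-2]
```

Stacking such layers with dilations 1, 2, 4, 8, ... is what gives WaveNet a receptive field that grows exponentially with depth while each layer stays cheap.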
    Generative Adversarial Networks
      GAN Fundamentals for Audio
        Generator and Discriminator Design
        Adversarial Loss Functions
      WaveGAN
        Raw Audio Generation
        1D Convolutional Architecture
      SpecGAN
        Spectrogram Generation
        Post-processing to Audio
      MelGAN
        Mel-spectrogram Conditioning
        Efficient Vocoding
      HiFi-GAN
        High-fidelity Audio Generation
        Multi-scale Discriminators
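The adversarial loss functions listed above can be illustrated with the standard non-saturating GAN objective in numpy. This is a generic sketch, not the loss of any particular audio GAN (models like MelGAN and HiFi-GAN additionally use feature-matching and spectrogram reconstruction terms):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(d_real_logits, d_fake_logits):
    """Binary cross-entropy: push D(real) -> 1 and D(fake) -> 0."""
    eps = 1e-12
    real_term = -np.log(sigmoid(d_real_logits) + eps)
    fake_term = -np.log(1.0 - sigmoid(d_fake_logits) + eps)
    return float(np.mean(real_term) + np.mean(fake_term))

def generator_loss(d_fake_logits):
    """Non-saturating generator loss: push D(fake) -> 1."""
    eps = 1e-12
    return float(np.mean(-np.log(sigmoid(d_fake_logits) + eps)))

# A discriminator that confidently separates real from fake has a low loss;
# one that is fooled (logits near 0 for both) has a high loss.
good_d = discriminator_loss(np.array([4.0]), np.array([-4.0]))
fooled_d = discriminator_loss(np.array([0.0]), np.array([0.0]))
```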
    Variational Autoencoders
      VAE Fundamentals
        Encoder-Decoder Architecture
        Latent Space Modeling
        Variational Inference
      β-VAE for Audio
        Disentangled Representations
        Controllable Generation
      VQ-VAE for Audio
        Vector Quantization
        Discrete Latent Representations
      Hierarchical VAEs
        Multi-level Latent Variables
        Structured Generation
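Two pieces of the VAE machinery above have simple closed forms: the reparameterization trick and the KL term of the ELBO for a diagonal Gaussian posterior. A minimal numpy sketch (the toy values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps keeps sampling differentiable w.r.t. (mu, log_var)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return float(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var)))

mu, log_var = np.zeros(4), np.zeros(4)            # posterior equals the prior
kl_zero = kl_to_standard_normal(mu, log_var)      # -> 0
kl_shift = kl_to_standard_normal(mu + 1.0, log_var)  # shifted mean costs KL
z = reparameterize(mu, log_var)
```

β-VAE multiplies this KL term by a weight β > 1, trading reconstruction quality for more disentangled latents.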
    Flow-based Models
      Normalizing Flows
        Invertible Transformations
        Exact Likelihood Computation
      WaveGlow
        Flow-based Vocoding
        Parallel Generation
      FloWaveNet
        Flow and WaveNet Combination
        High-quality Synthesis
    Diffusion Models
      Denoising Diffusion Probabilistic Models
        Forward and Reverse Processes
        Noise Scheduling
      DiffWave
        Diffusion for Audio Generation
        Unconditional and Conditional Generation
      Grad-TTS
        Diffusion for Text-to-Speech
        Score-based Generation
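The forward process and noise schedule of a DDPM have a convenient closed form: any noising step can be sampled directly from the clean signal. A numpy sketch with an illustrative linear schedule (the specific schedule values and the toy signal are assumptions, not taken from DiffWave or Grad-TTS):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule beta_t.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)   # cumulative signal-retention factor

def q_sample(x0, t):
    """Closed form of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))   # toy "audio" frame
x_early = q_sample(x0, 5)                    # mostly signal
x_late = q_sample(x0, T - 1)                 # mostly noise
```

The reverse process trains a network to undo these steps one at a time, which is what makes generation slow and motivates fast samplers.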
  Audio Source Separation
    Problem Formulation
      Mixing Models
        Linear Instantaneous Mixing
        Convolutive Mixing
        Non-linear Mixing
      Source Separation Objectives
        Perfect Reconstruction
        Perceptual Quality
        Source Isolation
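The linear instantaneous mixing model is worth seeing concretely: the observed channels are x = A s for an unknown mixing matrix A. A numpy sketch with invented toy sources and mixing matrix:

```python
import numpy as np

# Two sources, two microphones: x = A @ s (linear, instantaneous mixing).
T = 1000
s = np.vstack([np.sin(0.05 * np.arange(T)),                # source 1: sine
               np.sign(np.sin(0.013 * np.arange(T)))])     # source 2: square
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])    # unknown to the algorithm in a real problem
x = A @ s                     # observed mixtures, shape (2, T)

# With A known (an oracle), inversion recovers the sources exactly;
# separation algorithms must estimate this unmixing from x alone.
s_hat = np.linalg.inv(A) @ x
```

Convolutive mixing replaces each scalar entry of A with a filter (modelling room reverberation), which is why frequency-domain formulations are common in practice.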
    Classical Separation Methods
      Independent Component Analysis
        Statistical Independence
        Non-Gaussian Sources
      Non-negative Matrix Factorization
        Parts-based Decomposition
        Sparsity Constraints
      Principal Component Analysis
        Dimensionality Reduction
        Orthogonal Components
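NMF's parts-based decomposition can be demonstrated with the classic Lee-Seung multiplicative updates, which keep both factors non-negative. The toy "spectrogram" below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, k, n_iter=500, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W @ H under the Frobenius norm.
    Non-negativity is preserved because every update multiplies by a ratio
    of non-negative quantities."""
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Toy "magnitude spectrogram" built from two non-negative spectral parts.
true_W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
true_H = rng.random((2, 50))
V = true_W @ true_H
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In audio separation, the columns of W act as spectral templates (e.g. one per note or instrument) and the rows of H as their activations over time.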
    Deep Learning Separation
      Supervised Separation
        Mask Estimation
        Direct Signal Estimation
        Permutation Problem
      Deep Clustering
        Embedding-based Separation
        K-means Clustering
      Permutation Invariant Training
        Utterance-level PIT
        Scale-invariant Training
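The permutation problem and its PIT solution are easy to show in code: score every assignment of model outputs to reference sources and train on the best one. A numpy sketch using plain MSE (real systems typically use scale-invariant SNR instead):

```python
import numpy as np
from itertools import permutations

def pit_mse(estimates, targets):
    """Utterance-level permutation invariant training loss:
    evaluate every output-to-source assignment, keep the cheapest."""
    n = estimates.shape[0]
    best, best_perm = np.inf, None
    for perm in permutations(range(n)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm

targets = np.array([[1.0, 1.0, 1.0],
                    [-1.0, -1.0, -1.0]])
# The model produced the right sources in the wrong order.
estimates = targets[::-1].copy()
loss, perm = pit_mse(estimates, targets)   # PIT finds the swap, loss is 0
```

The factorial cost in the number of sources is why PIT is practical mainly for two or three speakers.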
    Music Source Separation
      Vocal Separation
        Lead Vocal Extraction
        Harmony Separation
      Instrument Separation
        Drum Separation
        Bass Separation
        Harmonic-Percussive Separation
      Stem Separation
        Multi-track Separation
        Professional Audio Applications
    Speech Separation
      Single-channel Speech Separation
        Monaural Source Separation
        Deep Learning Approaches
      Multi-channel Speech Separation
        Beamforming Integration
        Spatial Information Utilization
      Speaker-independent Separation
        Universal Separation Models
        Adaptation Techniques
  Self-Supervised and Unsupervised Learning
    Contrastive Learning
      Contrastive Predictive Coding
        Predictive Coding Framework
        Negative Sampling
      SimCLR for Audio
        Data Augmentation Strategies
        Contrastive Loss Functions
      Audio Representation Learning
        Temporal Contrastive Learning
        Cross-modal Contrastive Learning
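The contrastive loss underlying CPC- and SimCLR-style training is InfoNCE: each anchor must identify its own positive among all positives in the batch. A numpy sketch (batch size, temperature, and the toy embeddings are illustrative assumptions):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: cosine similarities between all anchor/positive pairs,
    softmax cross-entropy with the matching pair as the label."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature              # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # The "label" for row i is column i: its own positive.
    return float(-np.mean(np.log(np.diag(probs) + 1e-12)))

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
# Positives that are slight perturbations of the anchors -> low loss;
# unrelated positives -> high loss.
aligned = info_nce(z, z + 0.01 * rng.standard_normal((8, 16)))
random_pairs = info_nce(z, rng.standard_normal((8, 16)))
```

In audio SSL the anchor/positive pairs come from the pretext task: two augmentations of the same clip (SimCLR-style) or a context vector and a future frame (CPC).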
    Masked Language Modeling for Audio
      Masked Acoustic Modeling
        Random Masking Strategies
        Reconstruction Objectives
      wav2vec 2.0
        Quantized Representations
        Contrastive Learning
      HuBERT
        Hidden Unit BERT
        Iterative Refinement
    Pretext Tasks
      Temporal Order Prediction
        Sequence Order Learning
        Shuffle Detection
      Speed Prediction
        Playback Speed Classification
        Temporal Dynamics Learning
      Rotation Prediction
        Spectrogram Rotation
        Spatial Relationship Learning
    Multi-modal Self-supervision
      Audio-Visual Learning
        Cross-modal Correspondence
        Synchronization Learning
      Audio-Text Learning
        Speech-Text Alignment
        Semantic Correspondence
  Transfer Learning and Pre-trained Models
    Transfer Learning Strategies
      Feature Extraction
        Frozen Pre-trained Features
        Feature Concatenation
      Fine-tuning
        Layer-wise Learning Rates
        Gradual Unfreezing
      Domain Adaptation
        Domain Shift Handling
        Adversarial Domain Adaptation
    Pre-trained Audio Models
      VGGish
        YouTube-8M Pre-training
        Audio Event Classification
      PANNs
        AudioSet Pre-training
        Large-scale Audio Recognition
      wav2vec Models
        Self-supervised Pre-training
        Speech Representation Learning
      CLAP
        Contrastive Language-Audio Pre-training
        Cross-modal Representations
    Cross-domain Transfer
      Image-to-Audio Transfer
        CNN Architecture Transfer
        Spectrogram as Image
      Speech-to-Music Transfer
        Domain Adaptation Techniques
        Feature Space Alignment
      Language-to-Audio Transfer
        Transformer Architecture Transfer
        Sequence Modeling Transfer
  Few-shot and Zero-shot Learning
    Meta-learning for Audio
      Model-Agnostic Meta-Learning
      Prototypical Networks
    Zero-shot Audio Classification
      Semantic Embeddings
      Attribute-based Classification
    Few-shot Adaptation
      Rapid Adaptation Techniques
      Support Set Utilization
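Prototypical networks, listed under meta-learning above, reduce few-shot classification to nearest-prototype search: each class prototype is the mean of its support embeddings. A numpy sketch of one 2-way, 2-shot episode with invented 2-D embeddings:

```python
import numpy as np

def prototypical_classify(support, support_labels, queries):
    """Each class is represented by the mean ("prototype") of its support
    embeddings; a query takes the label of the nearest prototype."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    # Squared Euclidean distance from every query to every prototype.
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]

# 2-way, 2-shot toy episode in a 2-D embedding space.
support = np.array([[0.0, 0.1], [0.1, 0.0],    # class 0 near the origin
                    [5.0, 5.1], [5.1, 5.0]])   # class 1 near (5, 5)
support_labels = np.array([0, 0, 1, 1])
queries = np.array([[0.2, 0.2], [4.8, 5.2]])
pred = prototypical_classify(support, support_labels, queries)
```

In the audio setting the embeddings would come from a pre-trained encoder, and only the support set (not the encoder) changes per new class.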