Advanced Topics and Applications
Speaker Recognition
  Speaker Identification
    Closed-set Identification
    Open-set Identification
    Text-dependent Systems
    Text-independent Systems
  Speaker Verification
    Authentication Applications
    Threshold Selection
    Score Normalization
    Anti-spoofing Measures
  Speaker Embedding Techniques
    i-vector Extraction
      Total Variability Space
      Factor Analysis
      Probabilistic Linear Discriminant Analysis (PLDA)
    x-vector Systems
      Time Delay Neural Networks (TDNNs)
      Statistics Pooling
      Deep Speaker Embeddings
  Channel Compensation
    Intersession Variability
    Channel Adaptation
    Domain Mismatch Handling
Speech Enhancement
  Noise Reduction Techniques
    Spectral Subtraction
    Wiener Filtering
    Minimum Mean Square Error (MMSE) Estimation
    Kalman Filtering
  Deep Learning Enhancement
    Denoising Autoencoders
    Recurrent Neural Networks
    Generative Adversarial Networks
    Mask Estimation Networks
  Multi-channel Enhancement
    Beamforming Techniques
    Microphone Array Processing
    Blind Source Separation
  Acoustic Echo Cancellation
    Adaptive Filtering
    Double-talk Detection
    Nonlinear Echo Cancellation
  Dereverberation
    Room Impulse Response Modeling
    Inverse Filtering
    Statistical Reverberation Models
Spoken Language Understanding
  Intent Recognition
    Classification Approaches
    Feature Engineering
    Neural Architectures
    Multi-intent Handling
  Slot Filling
    Sequence Labeling
    Named Entity Recognition
    Conditional Random Fields
    Neural Sequence Models
  Joint Intent and Slot Modeling
    Multi-task Learning
    Attention-based Joint Models
    End-to-end SLU Systems
  Dialogue State Tracking
    Belief State Representation
    State Update Mechanisms
    Neural Dialogue State Trackers
Speech Emotion Recognition
  Emotional Speech Databases
    Acted vs. Natural Emotions
    Annotation Schemes
    Cross-cultural Considerations
  Acoustic Feature Analysis
    Prosodic Features
    Spectral Features
    Voice Quality Features
    Temporal Dynamics
  Emotion Classification Models
    Traditional Machine Learning
    Deep Learning Approaches
    Multimodal Fusion
  Continuous Emotion Recognition
    Dimensional Emotion Models
    Temporal Modeling
    Real-time Processing
Voice Conversion
  Parallel Voice Conversion
    Dynamic Time Warping
    Gaussian Mixture Models
    Statistical Parameter Mapping
  Non-parallel Voice Conversion
    CycleGAN-based Approaches
    Variational Autoencoders
    StarGAN-VC
  Real-time Voice Conversion
    Low-latency Processing
    Streaming Algorithms
    Hardware Implementations
Multilingual Speech Processing
  Cross-lingual Speech Recognition
    Multilingual Acoustic Models
    Language Identification
    Code-switching Handling
  Speech Translation
    Cascade Systems
    End-to-end Speech Translation
    Simultaneous Translation
  Multilingual TTS
    Language-specific Modeling
    Cross-lingual Voice Cloning
    Accent Modeling