8. Deep Reinforcement Learning
8.1. Introduction to Deep RL
8.1.1. Neural Networks in RL
8.1.2. Representation Learning
8.1.3. End-to-End Learning
8.2. Deep Q-Networks (DQN)
8.2.1. Neural Network Q-Function Approximation
8.2.2. Network Architecture Design
8.2.3. Loss Function Definition
8.2.4. Training Procedures
8.3. Stabilizing Deep RL
8.3.1. Experience Replay
8.3.1.1. Replay Buffer Implementation
8.3.1.2. Breaking Sample Correlations
8.3.1.3. Batch Sampling Strategies
8.3.2. Fixed Q-Targets
8.3.2.1. Target Network Updates
8.3.2.2. Stabilizing Training
8.3.2.3. Update Frequencies
8.3.3. Gradient Clipping
8.3.4. Reward Clipping
8.4. DQN Improvements
8.4.1. Double DQN
8.4.1.1. Overestimation Bias Problem
8.4.1.2. Double Estimation Solution
8.4.1.3. Performance Improvements
8.4.2. Dueling DQN
8.4.2.1. Value and Advantage Decomposition
8.4.2.2. Network Architecture
8.4.2.3. Aggregation Methods
8.4.3. Prioritized Experience Replay
8.4.3.1. TD-Error Based Prioritization
8.4.3.2. Importance Sampling Corrections
8.4.3.3. Implementation Details
8.4.4. Rainbow DQN
8.4.4.1. Combining Multiple Improvements
8.4.4.2. Distributional RL
8.4.4.3. Noisy Networks
8.5. Deep RL for Continuous Actions
8.5.1. Challenges with Continuous Actions
8.5.2. Deep Deterministic Policy Gradient (DDPG)
8.5.2.1. Actor-Critic Architecture
8.5.2.2. Deterministic Policy Gradients
8.5.2.3. Exploration Strategies
8.5.3. Twin Delayed DDPG (TD3)
8.5.3.1. Addressing Overestimation
8.5.3.2. Delayed Policy Updates
8.5.3.3. Target Policy Smoothing