9. Policy Gradient Methods
  Introduction to Policy-Based Methods
    Direct Policy Optimization
    Advantages over Value-Based Methods
    Policy Parameterization
  Policy Gradient Theorem
    Mathematical Derivation
    Score Function Estimator
    Gradient Estimation
  REINFORCE Algorithm
    Monte Carlo Policy Gradient
    Algorithm Implementation
    Variance Issues
  Baseline Methods
    State-Value Baselines
    Advantage Estimation
  Actor-Critic Methods
    Combining Policy and Value Learning
    Actor Network (Policy)
    Critic Network (Value Function)
    Training Procedures
  Advanced Actor-Critic Methods
    Advantage Actor-Critic (A2C)
      Advantage Function Estimation
      Synchronous Updates
    Asynchronous Advantage Actor-Critic (A3C)
      Parallel Training
      Asynchronous Updates
      Exploration Benefits
    Generalized Advantage Estimation (GAE)
      Bias-Variance Trade-off
      λ-Return Estimation
  Trust Region Methods
    Trust Region Policy Optimization (TRPO)
      Monotonic Improvement
      KL Divergence Constraints
      Natural Policy Gradients
    Proximal Policy Optimization (PPO)
      Clipped Surrogate Objective
      Practical Implementation
      Performance Characteristics
  Continuous Action Spaces
    Gaussian Policies
    Policy Parameterization
    Exploration in Continuous Spaces