1. Foundations of Reinforcement Learning
2. Mathematical Foundations
3. Markov Decision Processes
4. Dynamic Programming
5. Monte Carlo Methods
6. Temporal-Difference Learning
7. Function Approximation
8. Deep Reinforcement Learning
9. Policy Gradient Methods
10. Advanced Topics
11. Implementation and Practical Considerations
12. Applications and Case Studies
Temporal-Difference Learning
Introduction to TD Learning
Bootstrapping Concept
Sample-Based Updates
Online Learning Capability
TD Prediction
TD(0) Algorithm
Update Rule
Learning Rate Effects
Convergence Properties
Bias-Variance Properties
Biased but Lower Variance
Comparison with Monte Carlo
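The TD(0) update rule, V(s) ← V(s) + α[r + γV(s′) − V(s)], can be made concrete with a small sketch. Everything environment-specific here is an illustrative assumption: a 5-state random walk that starts in the middle, moves left or right at random, and pays +1 only for exiting on the right.

```python
import random

# Hypothetical 5-state random walk (an assumption, not from the source):
# states 0..4, exit left pays 0, exit right pays +1, no discounting.
N_STATES = 5
GAMMA = 1.0
ALPHA = 0.1

def step(s):
    """One transition of the walk; returns (next_state, reward, done)."""
    s2 = s + random.choice([-1, 1])
    if s2 < 0:
        return None, 0.0, True
    if s2 >= N_STATES:
        return None, 1.0, True
    return s2, 0.0, False

def td0_prediction(episodes=5000):
    V = [0.0] * N_STATES
    for _ in range(episodes):
        s, done = N_STATES // 2, False        # start in the middle
        while not done:
            s2, r, done = step(s)
            # TD(0) target bootstraps from the current estimate V(s')
            target = r + (0.0 if done else GAMMA * V[s2])
            V[s] += ALPHA * (target - V[s])   # V(s) += α[r + γV(s') − V(s)]
            s = s2
    return V

random.seed(0)
print(td0_prediction())
```

For this classic walk the true values are 1/6, 2/6, …, 5/6, so the learned estimates should climb roughly linearly from left to right; the learning rate α controls how much residual noise remains around those values.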
TD Control Methods
SARSA (On-Policy TD Control)
Algorithm Description
Update Rule
Convergence Properties
Exploration Strategies
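A minimal on-policy SARSA sketch. The environment is a hypothetical 1-D corridor (states 0..5, start at 0, goal at state 5, reward −1 per step, actions left/right) chosen purely for illustration; the ε-greedy behavior policy doubles as the exploration strategy.

```python
import random

# Hypothetical corridor environment (an illustrative assumption):
# states 0..5, goal at 5, reward -1 per step, actions left/right.
N, GOAL = 6, 5
ACTIONS = [-1, 1]                     # left, right
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1

def eps_greedy(Q, s):
    """Epsilon-greedy action selection: the exploration strategy."""
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

def sarsa(episodes=500):
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = 0
        a = eps_greedy(Q, s)
        while s != GOAL:
            s2 = min(max(s + ACTIONS[a], 0), N - 1)
            r = -1.0
            if s2 == GOAL:
                # Terminal transition: the target is the reward alone
                Q[s][a] += ALPHA * (r - Q[s][a])
            else:
                a2 = eps_greedy(Q, s2)   # the action the agent will actually take
                # On-policy update: bootstrap from Q(s', a') of the behavior policy
                Q[s][a] += ALPHA * (r + GAMMA * Q[s2][a2] - Q[s][a])
                a = a2
            s = s2
    return Q

random.seed(1)
Q = sarsa()
print(["right" if q[1] > q[0] else "left" for q in Q[:GOAL]])
```

Because the next action a′ is sampled from the same ε-greedy policy being evaluated, SARSA's value estimates account for the cost of its own exploration.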
Q-Learning (Off-Policy TD Control)
Algorithm Description
Off-Policy Nature
Convergence Guarantees
Exploration Independence
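Q-learning differs from SARSA only in its target: it bootstraps from max_a′ Q(s′, a′) rather than from the action actually taken, which is what makes it off-policy. A sketch on a hypothetical 1-D corridor (states 0..5, goal at 5, reward −1 per step; the environment is an illustrative assumption):

```python
import random

# Hypothetical corridor (an assumption): states 0..5, goal at 5,
# reward -1 per step, two actions (left/right).
N, GOAL = 6, 5
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1

def q_learning(episodes=500):
    Q = [[0.0, 0.0] for _ in range(N)]    # Q[s][0] = left, Q[s][1] = right
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Behavior policy: eps-greedy, used only to generate experience
            if random.random() < EPS:
                a = random.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = min(max(s + (-1 if a == 0 else 1), 0), N - 1)
            r = -1.0
            # Off-policy target: max over next actions, so Q learns about
            # the greedy policy regardless of which action is taken next
            target = r if s2 == GOAL else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

random.seed(2)
Q = q_learning()
print([round(max(q), 2) for q in Q])
```

The max in the target is the exploration independence in the outline: the learned values converge toward Q* for the greedy policy even though the data is generated by an exploratory one.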
Expected SARSA
Algorithm Description
Reduced Variance
Performance Characteristics
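Expected SARSA replaces the sampled next-action value with its expectation under the current policy, which removes the variance introduced by sampling a′. A sketch of just the target computation, assuming an ε-greedy policy (the argument names and values are illustrative):

```python
# Expected SARSA target under an eps-greedy policy. Q_next holds the
# action values at the next state s'; all inputs here are assumptions
# chosen for illustration.
def expected_sarsa_target(Q_next, r, gamma=0.95, eps=0.1):
    n = len(Q_next)
    greedy = max(range(n), key=lambda a: Q_next[a])
    # eps-greedy probabilities: eps/n on every action, plus (1-eps) on the greedy one
    probs = [eps / n + (1.0 - eps if a == greedy else 0.0) for a in range(n)]
    # Expectation over next actions instead of a single sampled a'
    expected_q = sum(p * q for p, q in zip(probs, Q_next))
    return r + gamma * expected_q

print(expected_sarsa_target([0.0, 2.0], r=-1.0))
```

With Q_next = [0.0, 2.0], the greedy action gets probability 0.95 and the other 0.05, so the target is −1 + 0.95 × (0.05·0 + 0.95·2) = 0.805, regardless of which action the agent happens to sample next.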
Multi-Step TD Methods
n-Step TD Prediction
n-Step Returns
Bias-Variance Trade-off
Parameter Selection
n-Step SARSA
n-Step Q-Learning
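The n-step return sums the next n rewards and then bootstraps: G = R₁ + γR₂ + … + γⁿ⁻¹Rₙ + γⁿV(Sₙ). A prediction sketch with n = 3, assuming a hypothetical 5-state random walk (exit right pays +1; the environment and parameters are illustrative):

```python
import random

# Hypothetical 5-state random walk (an assumption): states 0..4,
# exit left pays 0, exit right pays +1, no discounting.
N_STATES, GAMMA, ALPHA = 5, 1.0, 0.1

def step(s):
    s2 = s + random.choice([-1, 1])
    if s2 < 0:
        return None, 0.0, True
    if s2 >= N_STATES:
        return None, 1.0, True
    return s2, 0.0, False

def n_step_td(episodes=3000, n=3):
    V = [0.0] * N_STATES
    for _ in range(episodes):
        states, rewards = [N_STATES // 2], [0.0]   # rewards[0] is padding
        T, t = None, 0                             # T = terminal time, once known
        while True:
            if T is None:
                s2, r, done = step(states[t])
                rewards.append(r)
                states.append(s2)
                if done:
                    T = t + 1
            tau = t - n + 1                        # time whose state gets updated
            if tau >= 0:
                hi = tau + n if T is None else min(tau + n, T)
                # n-step return: discounted rewards, then a bootstrapped tail
                G = sum(GAMMA ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, hi + 1))
                if T is None or tau + n < T:
                    G += GAMMA ** n * V[states[tau + n]]
                V[states[tau]] += ALPHA * (G - V[states[tau]])
            if T is not None and tau == T - 1:
                break
            t += 1
    return V

random.seed(4)
print(n_step_td())
```

Larger n pushes the target toward the full Monte Carlo return (less bias from the bootstrap, more variance from the sampled rewards); n = 1 recovers TD(0).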
TD(λ) Methods
Eligibility Traces
Trace Decay
Credit Assignment
TD(λ) Prediction
SARSA(λ)
Q(λ)
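Eligibility traces let a single TD error update every recently visited state, with credit decaying by γλ per step. A TD(λ) prediction sketch with accumulating traces, assuming a hypothetical 5-state random walk and λ = 0.8 (both illustrative choices):

```python
import random

# Hypothetical 5-state random walk (an assumption): states 0..4,
# exit left pays 0, exit right pays +1, no discounting.
N_STATES, GAMMA, ALPHA, LAM = 5, 1.0, 0.1, 0.8

def step(s):
    s2 = s + random.choice([-1, 1])
    if s2 < 0:
        return None, 0.0, True
    if s2 >= N_STATES:
        return None, 1.0, True
    return s2, 0.0, False

def td_lambda(episodes=3000):
    V = [0.0] * N_STATES
    for _ in range(episodes):
        e = [0.0] * N_STATES               # eligibility traces, reset per episode
        s, done = N_STATES // 2, False
        while not done:
            s2, r, done = step(s)
            # One-step TD error for the current transition
            delta = r + (0.0 if done else GAMMA * V[s2]) - V[s]
            e[s] += 1.0                    # accumulating trace marks the visit
            for i in range(N_STATES):
                V[i] += ALPHA * delta * e[i]   # credit all eligible states
                e[i] *= GAMMA * LAM            # trace decay each step
            s = s2
    return V

random.seed(5)
print(td_lambda())
```

λ interpolates between the one-step and Monte Carlo extremes: λ = 0 reduces to TD(0), while λ = 1 (with these traces) assigns credit like a full-return method.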
Comparing Learning Methods
Monte Carlo vs TD
Bias-Variance Analysis
Sample Efficiency
Computational Requirements
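The bias-variance contrast can be made empirical. On a hypothetical 5-state random walk, compare the spread of full Monte Carlo returns against one-step TD targets built from a fixed value estimate; here the known true values are plugged in, so only the variance difference shows (with an inaccurate estimate the TD target would also carry bias). The environment is an illustrative assumption.

```python
import random
import statistics

# Hypothetical 5-state random walk (an assumption): exit right pays +1.
N_STATES, GAMMA = 5, 1.0
TRUE_V = [i / 6 for i in range(1, 6)]   # known values for this walk

def step(s):
    s2 = s + random.choice([-1, 1])
    if s2 < 0:
        return None, 0.0, True
    if s2 >= N_STATES:
        return None, 1.0, True
    return s2, 0.0, False

def mc_return(s):
    """Full Monte Carlo return: unbiased, but high variance."""
    G, done = 0.0, False
    while not done:
        s, r, done = step(s)
        G += r
    return G

def td_target(s):
    """One-step TD target from a fixed V: low variance (one transition sampled)."""
    s2, r, done = step(s)
    return r + (0.0 if done else GAMMA * TRUE_V[s2])

random.seed(6)
mc = [mc_return(2) for _ in range(2000)]
td = [td_target(2) for _ in range(2000)]
print(statistics.variance(mc), statistics.variance(td))
```

From the center state the MC return is a 0/1 outcome (variance 0.25), while the TD target only varies over the two neighboring values (variance 1/36), which is why TD methods are typically more sample-efficient despite their bootstrap bias; computationally, TD also updates per step rather than waiting for episode ends.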