Spark Architecture and Execution
Job Execution Model
  Job Lifecycle
    Job Definition
    Job Submission
    Job Completion
  Stage Creation
    Stage Boundaries
    Shuffle Dependencies
    Stage Scheduling
  Task Management
    Task Creation
    Task Assignment
    Task Execution
    Task Recovery
Directed Acyclic Graph
  DAG Construction
    Transformation Graph Building
    Dependency Analysis
  DAG Scheduler
    Stage Division Logic
    Optimization Strategies
    Fault Recovery Planning
Task Scheduling
  Task Scheduler Components
    Task Assignment Algorithms
    Locality Preferences
    Resource Allocation
  Execution Flow
    Task Serialization
    Result Collection
    Failure Handling
Cluster Deployment Options
  Standalone Cluster Mode
    Master-Worker Architecture
    Resource Management
    Configuration Options
  YARN Integration
    Resource Manager Integration
    Container Management
    Security Features
  Mesos Integration
    Framework Registration
    Resource Offers
    Fine-Grained vs Coarse-Grained
  Kubernetes Integration
    Pod Management
    Dynamic Allocation
    Container Orchestration
Deployment Modes
  Client Mode
    Driver Location
    Network Requirements
    Use Case Scenarios
  Cluster Mode
    Driver Deployment
    Resource Isolation
    Production Considerations