Performance Tuning and Optimization
Monitoring and Debugging
  Spark Web UI
    Jobs Tab Analysis
    Stages Tab Insights
    Storage Tab Monitoring
    Environment Tab Review
    Executors Tab Metrics
    SQL Tab Query Plans
  Logging and Metrics
    Log Level Configuration
    Custom Metrics
    External Monitoring Integration
  Performance Profiling
    CPU Profiling
    Memory Profiling
    I/O Analysis
Memory Management Optimization
  Memory Architecture
    Execution Memory
    Storage Memory
    Unified Memory Manager
  Garbage Collection Tuning
    GC Algorithm Selection
    GC Parameter Tuning
    Memory Pressure Management
  Memory Overhead Optimization
    Off-Heap Storage
    Memory Fraction Tuning
    Spill Management
Serialization Optimization
  Serialization Formats
    Java Serialization
    Kryo Serialization
    Custom Serializers
  Serialization Configuration
    Kryo Registration
    Buffer Size Tuning
    Compression Options
Data Layout and Partitioning
  Partitioning Strategies
    Hash Partitioning
    Range Partitioning
    Custom Partitioning
  Partition Management
    Optimal Partition Count
    Partition Size Guidelines
    Repartitioning vs Coalescing
  Data Skew Handling
    Skew Detection
    Salting Techniques
    Broadcast Joins
Caching and Persistence
  Storage Levels
    Memory-Only Storage
    Memory-and-Disk Storage
    Disk-Only Storage
    Serialized Storage
    Replication Options
  Caching Strategies
    When to Cache
    Cache Eviction
    Cache Monitoring
  Persistence Best Practices
    Checkpoint Usage
    Lineage Management
Join Optimization
  Join Strategies
    Broadcast Hash Join
    Shuffle Hash Join
    Sort-Merge Join
    Bucket Join
  Join Hints
    Broadcast Hints
    Shuffle Hints
    Merge Hints
  Skewed Join Handling
    Skew Detection
    Adaptive Query Execution
Common Performance Issues
  Data Skew Problems
    Identification Methods
    Mitigation Strategies
  Shuffle Optimization
    Shuffle Partitions Tuning
    Shuffle Service Configuration
  Small Files Problem
    File Consolidation
    Compaction Strategies
  Resource Contention
    CPU Bottlenecks
    Memory Bottlenecks
    I/O Bottlenecks
  Task Scheduling Issues
    Straggler Tasks
    Task Locality
    Dynamic Allocation