Structured Streaming
Streaming Fundamentals
Streaming Model
Unbounded Table Concept
Micro-Batch Processing
Continuous Processing
Time Semantics
Processing Time
Event Time
Ingestion Time
Watermarking
Late Data Handling
Watermark Configuration
State Cleanup
Output Modes
Append Mode
Complete Mode
Update Mode
Streaming API Components
Input Sources
File Source
Directory Monitoring
File Format Support
Kafka Source
Topic Subscription
Offset Management
Consumer Configuration
Socket Source
TCP Connection
Text Stream Processing
Rate Source
Synthetic Data Generation
Testing Applications
Output Sinks
File Sink
Partitioning Strategies
File Format Options
Kafka Sink
Producer Configuration
Serialization Options
Console Sink
Debug Output
Development Testing
Foreach Sink
Custom Output Logic
External System Integration
Query Management
Query Lifecycle
Trigger Configuration
Processing Time Triggers
Once Triggers
Continuous Triggers
Query Monitoring
Windowing Operations
Window Types
Tumbling Windows
Fixed-Size Windows
Non-Overlapping Intervals
Sliding Windows
Overlapping Windows
Slide Duration Configuration
Session Windows
Gap-Based Grouping
Dynamic Window Sizing
Window Functions
Aggregation in Windows
Window Specifications
Time-Based Grouping
State Management
Stateful Operations
State Store Implementation
State Partitioning
State Evolution
Checkpointing
Checkpoint Configuration
Recovery Mechanisms
Checkpoint Storage
Fault Tolerance
Exactly-Once Semantics
At-Least-Once Processing
Failure Recovery
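The outline above covers window types alongside watermarking and late-data handling. As a conceptual aid only, here is a pure-Python toy model that needs no Spark. It shows how an event time maps to tumbling or sliding windows, and how a watermark decides which windows are final and which events count as late. The watermark is the maximum event time seen minus a delay threshold. The names `assign_windows` and `WindowedCounter` are hypothetical and are not part of the Spark API. In Spark itself, these ideas are expressed with `withWatermark` and the `window` function on a streaming DataFrame.

The model rests on several simplifying assumptions:

- Windows are aligned to time 0, with no start offset.
- The watermark advances only between batches.
- Events older than the current watermark are always dropped. Spark only guarantees that data within the threshold is kept, and says older data *may* be dropped.
- A window is emitted, append-style, once the watermark reaches its end.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Window = Tuple[int, int]  # half-open interval [start, end), in seconds


def assign_windows(t: int, duration: int, slide: int) -> List[Window]:
    """Return every window [start, end) that contains event time t.

    Window starts are aligned to multiples of `slide` (offset 0 assumed).
    slide == duration -> tumbling windows (exactly one match);
    slide <  duration -> sliding windows (overlapping matches).
    """
    if duration <= 0 or slide <= 0:
        raise ValueError("duration and slide must be positive")
    start = t - (t % slide)  # latest aligned start at or before t
    windows = []
    while start > t - duration:  # this window still covers t
        windows.append((start, start + duration))
        start -= slide
    return sorted(windows)


class WindowedCounter:
    """Hypothetical toy model of an event-time windowed count with a watermark.

    Illustration only; this is not a Spark class.
    """

    def __init__(self, duration: int, slide: int, delay: int) -> None:
        self.duration = duration
        self.slide = slide
        self.delay = delay  # watermark delay threshold
        self.max_event_time: Optional[int] = None
        self.counts: Dict[Window, int] = {}  # per-window state
        self.dropped = 0

    @property
    def watermark(self) -> Optional[int]:
        # watermark = max event time seen so far minus the delay threshold
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def process_batch(self, event_times: Sequence[int]) -> List[Tuple[Window, int]]:
        wm = self.watermark  # watermark from earlier batches governs this batch
        for t in event_times:
            if wm is not None and t < wm:
                self.dropped += 1  # late event: this model discards it
                continue
            for w in assign_windows(t, self.duration, self.slide):
                self.counts[w] = self.counts.get(w, 0) + 1
        # advance the watermark only after the whole batch is processed
        if event_times:
            batch_max = max(event_times)
            if self.max_event_time is None or batch_max > self.max_event_time:
                self.max_event_time = batch_max
        wm = self.watermark
        # append-style output: emit windows the watermark has reached, then clean up state
        finalized = sorted(
            (w, c) for w, c in self.counts.items() if wm is not None and w[1] <= wm
        )
        for w, _ in finalized:
            del self.counts[w]
        return finalized


print(assign_windows(7, 10, 5))    # [(0, 10), (5, 15)]  (sliding)
print(assign_windows(7, 10, 10))   # [(0, 10)]           (tumbling)
counter = WindowedCounter(duration=10, slide=10, delay=5)
print(counter.process_batch([1, 3, 12]))  # [] (watermark is now 7)
print(counter.process_batch([4, 8, 21]))  # [((0, 10), 3)] (event 4 was late)
print(counter.dropped, counter.watermark)  # 1 16
```

In the second batch, event 4 falls below the watermark of 7 carried over from the first batch, so it is dropped. Event 8 is still counted. Once the watermark moves to 16, window [0, 10) is final and its state is removed. State cleanup of this kind is what keeps stateful streaming queries from growing without bound.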