Real-Time Analytics and Stream Processing
1. Introduction to Stream Processing
2. Fundamental Concepts
3. System Architectures for Real-Time Data
4. Core Components of Streaming Pipelines
5. Stream Processing Frameworks and Technologies
6. Data Formats and Serialization
7. Algorithms and Analytics on Streams
8. State Management and Fault Tolerance
9. Real-World Applications and Use Cases
10. Operationalizing Streaming Systems
11. Advanced Topics and Future Trends
Stream Processing Frameworks and Technologies
Apache Flink
Architecture and Core Concepts
JobManager and TaskManager Roles
Job Coordination
Task Execution
Resource Management
Event Time and Watermarks in Flink
Built-in Watermark Support
Custom Watermark Strategies
DataStream API
Stream Transformations
Map and FlatMap Operations
Filter and KeyBy Operations
Windowing and State Management
Window Operators
State APIs
Table API and SQL
Declarative Stream Processing
SQL Query Support
Table Abstractions
Integration with Batch Processing
Unified APIs
Catalog Integration
State Management and Checkpointing
Consistent State Snapshots
Distributed Snapshots
Exactly-Once Guarantees
Recovery from Failures
Automatic Restart Strategies
State Restoration
FlinkCEP for Complex Event Processing
Pattern Definition and Detection
Pattern API
Event Sequence Matching
CEP Use Cases
Fraud Detection
System Monitoring
Apache Spark Streaming
Discretized Streams
Micro-Batch Model
RDD-Based Processing
Batch Interval Configuration
Fault Tolerance in D-Streams
RDD Lineage
Checkpoint Recovery
Structured Streaming
Unified Batch and Streaming API
DataFrame and Dataset APIs
Catalyst Optimizer
Event Time and Watermark Support
Built-in Time Handling
Late Data Management
Spark Ecosystem Integration
DataFrames and SQL
Spark SQL Integration
Catalog Support
Machine Learning Integration
MLlib Streaming
Model Serving
Apache Kafka Ecosystem
Kafka as Distributed Log
Partitioning and Replication
Topic Partitioning
Replica Management
Log Retention and Compaction
Time-Based Retention
Key-Based Compaction
Kafka Connect for Integration
Source and Sink Connectors
Database Connectors
File System Connectors
Integrating with External Systems
Schema Registry Integration
Transformation Capabilities
Kafka Streams for Stream Processing
Stream Transformations and Aggregations
KStream and KTable APIs
Stateful Operations
State Stores and Fault Tolerance
Local State Stores
Changelog Topics
ksqlDB for Streaming SQL
Declarative Stream Processing
SQL-Based Stream Processing
Materialized Views
Real-Time Querying
Interactive Queries
Push and Pull Queries
Additional Frameworks
Apache Storm
Topology-Based Processing
DAG Processing Model
Real-Time Guarantees
Spouts and Bolts
Data Source Components
Processing Logic Components
Apache Samza
Partitioned State Management
Local State Storage
Fault Tolerance
Kafka Integration
Native Kafka Support
Stream Partitioning
Cloud-Native Solutions
Google Cloud Dataflow
Apache Beam Runtime
Unified Batch and Stream Model
Amazon Kinesis
Shard-Based Scaling
Kinesis Data Analytics
Azure Stream Analytics
SQL-Based Processing
IoT Integration