6. Modern Data Storage Architectures
6.1. Data Lake Concepts
6.1.1. Centralized Storage Architecture
6.1.1.1. Raw Data Preservation
6.1.1.2. Multi-Format Support
6.1.1.3. Scalable Storage Solutions
6.1.2. Schema-on-Read vs. Schema-on-Write
6.1.2.1. Flexibility vs. Structure Trade-offs
6.1.2.2. Data Discovery Challenges
6.1.2.3. Query Performance Implications
6.1.3. Data Lake Zones
6.1.3.1. Raw Data Zone
6.1.3.2. Processed Data Zone
6.1.3.3. Curated Data Zone
6.1.3.4. Sandbox Zone
6.2. Big Data File Formats
6.2.1. Apache Parquet
6.2.1.1. Columnar Storage Benefits
6.2.1.2. Compression Efficiency
6.2.1.3. Query Performance Optimization
6.2.2. Apache Avro
6.2.2.1. Schema Evolution Support
6.2.2.2. Serialization Efficiency
6.2.2.3. Cross-Language Compatibility
6.2.3. ORC Format
6.2.3.1. Optimized Row Columnar Structure
6.2.3.2. Hive Integration
6.2.3.3. Compression and Indexing
6.2.4. Traditional Formats
6.2.4.1. CSV File Handling
6.2.4.2. JSON Data Processing
6.2.4.3. XML Data Management
6.2.5. Compression Techniques
6.2.5.1. Compression Algorithm Selection
6.2.5.2. Storage vs. Processing Trade-offs
6.2.5.3. Compression Ratio Analysis
6.3. Data Lakehouse Architecture
6.3.1. Unified Storage and Processing
6.3.1.1. ACID Transaction Support
6.3.1.2. Schema Enforcement Options
6.3.1.3. Time Travel Capabilities
6.3.2. Delta Lake Implementation
6.3.2.1. Versioning and Rollback
6.3.2.2. Concurrent Read/Write Operations
6.3.2.3. Data Quality Enforcement
6.3.3. Apache Iceberg Features
6.3.3.1. Table Format Specifications
6.3.3.2. Partition Evolution
6.3.3.3. Hidden Partitioning
6.4. Data Mesh Principles
6.4.1. Domain-Oriented Data Ownership
6.4.1.1. Business Domain Alignment
6.4.1.2. Decentralized Data Management
6.4.1.3. Domain Team Responsibilities
6.4.2. Data as a Product
6.4.2.1. Product Thinking for Data
6.4.2.2. Data Product Lifecycle
6.4.2.3. Consumer-Centric Design
6.4.3. Self-Serve Data Infrastructure
6.4.3.1. Platform Abstraction
6.4.3.2. Developer Experience
6.4.3.3. Automated Data Operations
6.4.4. Federated Computational Governance
6.4.4.1. Global Standards and Policies
6.4.4.2. Local Implementation Flexibility
6.4.4.3. Automated Compliance Monitoring