Apache Spark
1. Introduction to Apache Spark
2. Core Spark Concepts
3. Spark Architecture and Execution
4. Spark SQL and Structured APIs
5. Structured Streaming
6. Machine Learning with MLlib
7. Graph Processing with GraphX
8. Performance Tuning and Optimization
Machine Learning with MLlib
MLlib Overview
Library Architecture
RDD-Based API
DataFrame-Based API
Pipeline Integration
Language Support
Scala Implementation
Java Bindings
Python Integration
R Interface
Comparison with Other Libraries
Scikit-learn Integration
TensorFlow Compatibility
Distributed vs Single-Machine
ML Pipeline Framework
Pipeline Components
Transformer Interface
Feature Transformation
Model Application
Estimator Interface
Model Training
Parameter Learning
Pipeline Construction
Stage Composition
Parameter Passing
Model Selection
Cross-Validation
K-Fold Validation
Train-Validation Split
Parameter Grid Search
Hyperparameter Tuning
Grid Construction
Model Persistence
Model Saving
Model Loading
Version Management
Feature Engineering
Feature Extraction
Text Processing
Tokenization
Stop Word Removal
N-Gram Generation
Hashing Features
HashingTF
Feature Hashing Benefits
Word Embeddings
Word2Vec Implementation
Vector Representations
Feature Transformation
Scaling Operations
StandardScaler
MinMaxScaler
MaxAbsScaler
Encoding Operations
OneHotEncoder
StringIndexer
IndexToString
Mathematical Transformations
Polynomial Features
Interaction Features
Feature Selection
Statistical Selection
ChiSqSelector
Correlation Analysis
Dimensionality Reduction
VectorSlicer
Feature Importance
Supervised Learning
Classification Algorithms
Linear Models
Logistic Regression
Linear SVM
Tree-Based Models
Decision Trees
Random Forest
Gradient-Boosted Trees
Neural Networks
Multilayer Perceptron
Deep Learning Integration
Ensemble Methods
Voting Classifiers
Stacking
Regression Algorithms
Linear Regression
Ordinary Least Squares
Ridge Regression
Lasso Regression
Tree-Based Regression
Decision Tree Regression
Random Forest Regression
Gradient-Boosted Regression
Generalized Linear Models
Poisson Regression
Gamma Regression
Unsupervised Learning
Clustering Algorithms
Partitioning Methods
K-Means Clustering
K-Means++
Bisecting K-Means
Probabilistic Models
Gaussian Mixture Models
Expectation-Maximization
Topic Modeling
Latent Dirichlet Allocation
Topic Discovery
Dimensionality Reduction
Principal Component Analysis
Variance Explanation
Component Selection
Singular Value Decomposition
Matrix Factorization
Low-Rank Approximation
Model Evaluation and Metrics
Classification Metrics
Accuracy Measures
Precision and Recall
F1 Score Calculation
ROC and AUC Analysis
Confusion Matrix
Regression Metrics
Error Measures
Mean Squared Error
Root Mean Squared Error
Mean Absolute Error
Goodness of Fit
R-squared
Adjusted R-squared
Clustering Evaluation
Internal Metrics
Silhouette Analysis
Within-Cluster Sum of Squares
External Metrics
Adjusted Rand Index
Normalized Mutual Information