Python for Data Science

  1. Introduction to Machine Learning with Scikit-learn
    1. Machine Learning Fundamentals
      1. Core Concepts
        1. What is Machine Learning
        2. Types of Machine Learning Problems
          1. Supervised Learning
            1. Classification Problems
            2. Regression Problems
          2. Unsupervised Learning
            1. Clustering
            2. Dimensionality Reduction
            3. Association Rules
          3. Semi-supervised Learning
          4. Reinforcement Learning
        3. Machine Learning Workflow
          1. Problem Definition
          2. Data Collection and Exploration
          3. Data Preprocessing
          4. Model Selection
          5. Training and Validation
          6. Model Evaluation
          7. Deployment and Monitoring
      2. Key Terminology
        1. Features and Target Variables
        2. Training and Test Sets
        3. Overfitting and Underfitting
        4. Bias-Variance Tradeoff
        5. Cross-validation
        6. Hyperparameters
      3. Scikit-learn Ecosystem
        1. Library Overview
        2. Integration with NumPy and Pandas
        3. Consistent API Design
        4. Community and Documentation
    2. Scikit-learn API and Design Patterns
      1. Core API Components
        1. Estimator Objects
          1. Estimator Interface
          2. Parameter Setting
          3. State Management
        2. Predictor Interface
          1. predict() Method
          2. predict_proba() Method
          3. decision_function() Method
        3. Transformer Interface
          1. fit() Method
          2. transform() Method
          3. fit_transform() Method
      2. API Consistency
        1. Method Naming Conventions
        2. Parameter Conventions
        3. Return Value Patterns
      3. Estimator Types
        1. Classifiers
        2. Regressors
        3. Clusterers
        4. Transformers
    3. Data Preprocessing and Feature Engineering
      1. Data Preparation Workflow
        1. Data Quality Assessment
        2. Missing Data Strategies
        3. Outlier Detection and Treatment
      2. Feature Scaling and Normalization
        1. Why Scaling Matters
        2. Standardization
          1. StandardScaler
          2. Z-score Normalization
          3. Robust Scaling
        3. Normalization
          1. MinMaxScaler
          2. Range Scaling
          3. Unit Vector Scaling
        4. When to Apply Scaling
      3. Categorical Data Encoding
        1. Label Encoding
          1. LabelEncoder
          2. Ordinal Encoding
        2. One-Hot Encoding
          1. OneHotEncoder
          2. get_dummies() Alternative
          3. Handling High Cardinality
        3. Target Encoding
        4. Binary Encoding
      4. Feature Selection
        1. Filter Methods
          1. Statistical Tests
          2. Correlation Analysis
        2. Wrapper Methods
          1. Recursive Feature Elimination
          2. Forward/Backward Selection
        3. Embedded Methods
          1. L1 Regularization
          2. Tree-based Feature Importance
      5. Feature Creation
        1. Polynomial Features
        2. Interaction Terms
        3. Domain-specific Features
      6. Data Splitting
        1. Train-Test Split
          1. train_test_split() Function
          2. Stratification
          3. Random State
        2. Train-Validation-Test Split
        3. Time Series Splitting
      7. Pipeline Construction
        1. Pipeline Concept
        2. Creating Pipelines
          1. Pipeline Class
          2. make_pipeline() Function
        3. Pipeline Benefits
          1. Code Organization
          2. Parameter Tuning
          3. Avoiding Data Leakage
        4. Column Transformers
          1. ColumnTransformer Class
          2. Heterogeneous Data Processing
    4. Supervised Learning Algorithms
      1. Linear Models
        1. Linear Regression
          1. Ordinary Least Squares
          2. Assumptions and Limitations
          3. Coefficient Interpretation
        2. Regularized Linear Models
          1. Ridge Regression
            1. L2 Regularization
            2. Alpha Parameter
          2. Lasso Regression
            1. L1 Regularization
            2. Feature Selection Properties
          3. Elastic Net
            1. Combined L1 and L2
            2. l1_ratio Parameter
        3. Logistic Regression
          1. Binary Classification
          2. Multiclass Extensions
          3. Probability Interpretation
          4. Regularization Options
      2. Tree-Based Models
        1. Decision Trees
          1. Tree Construction Algorithm
          2. Splitting Criteria
            1. Gini Impurity
            2. Entropy
            3. Mean Squared Error
          3. Tree Pruning
          4. Hyperparameters
            1. max_depth
            2. min_samples_split
            3. min_samples_leaf
        2. Ensemble Methods
          1. Random Forest
            1. Bootstrap Aggregating
            2. Feature Randomness
            3. Out-of-bag Error
            4. Feature Importance
          2. Gradient Boosting
            1. Boosting Concept
            2. GradientBoostingClassifier
            3. GradientBoostingRegressor
            4. Learning Rate and Estimators
          3. Extra Trees
            1. Extremely Randomized Trees
            2. Differences from Random Forest
      3. Support Vector Machines
        1. SVM Concepts
          1. Maximum Margin Principle
          2. Support Vectors
          3. Kernel Trick
        2. SVM for Classification
          1. Linear SVM
          2. Non-linear SVM
          3. Kernel Functions
            1. RBF Kernel
            2. Polynomial Kernel
            3. Sigmoid Kernel
        3. SVM for Regression
          1. Support Vector Regression
          2. Epsilon Parameter
        4. SVM Hyperparameters
          1. C Parameter
          2. Gamma Parameter
          3. Kernel Selection
      4. Instance-Based Learning
        1. K-Nearest Neighbors
          1. KNN Algorithm
          2. Distance Metrics
            1. Euclidean Distance
            2. Manhattan Distance
            3. Minkowski Distance
          3. KNN for Classification
          4. KNN for Regression
          5. Choosing K
          6. Curse of Dimensionality
      5. Naive Bayes
        1. Bayes' Theorem
        2. Naive Assumption
        3. Gaussian Naive Bayes
        4. Multinomial Naive Bayes
        5. Bernoulli Naive Bayes
                                                                                                                                                                                                                                                                                        2. Unsupervised Learning Algorithms
                                                                                                                                                                                                                                                                                          1. Clustering Algorithms
                                                                                                                                                                                                                                                                                            1. K-Means Clustering
                                                                                                                                                                                                                                                                                              1. Algorithm Steps
                                                                                                                                                                                                                                                                                                1. Centroid Initialization
                                                                                                                                                                                                                                                                                                  1. Choosing Number of Clusters
                                                                                                                                                                                                                                                                                                    1. Elbow Method
                                                                                                                                                                                                                                                                                                      1. Silhouette Analysis
                                                                                                                                                                                                                                                                                                      2. K-Means Limitations
                                                                                                                                                                                                                                                                                                      3. Hierarchical Clustering
                                                                                                                                                                                                                                                                                                        1. Agglomerative Clustering
                                                                                                                                                                                                                                                                                                          1. Linkage Criteria
                                                                                                                                                                                                                                                                                                            1. Single Linkage
                                                                                                                                                                                                                                                                                                              1. Complete Linkage
                                                                                                                                                                                                                                                                                                                1. Average Linkage
                                                                                                                                                                                                                                                                                                                  1. Ward Linkage
                                                                                                                                                                                                                                                                                                                  2. Dendrogram Interpretation
                                                                                                                                                                                                                                                                                                                  3. Density-Based Clustering
                                                                                                                                                                                                                                                                                                                    1. DBSCAN Algorithm
                                                                                                                                                                                                                                                                                                                      1. Core Points and Noise
                                                                                                                                                                                                                                                                                                                        1. Epsilon and MinPts Parameters
                                                                                                                                                                                                                                                                                                                          1. Handling Irregular Shapes
                                                                                                                                                                                                                                                                                                                          2. Other Clustering Methods
                                                                                                                                                                                                                                                                                                                            1. Mean Shift
                                                                                                                                                                                                                                                                                                                              1. Spectral Clustering
                                                                                                                                                                                                                                                                                                                                1. Gaussian Mixture Models
                                                                                                                                                                                                                                                                                                                              2. Dimensionality Reduction
                                                                                                                                                                                                                                                                                                                                1. Principal Component Analysis (PCA)
                                                                                                                                                                                                                                                                                                                                  1. PCA Concept
                                                                                                                                                                                                                                                                                                                                    1. Eigenvalues and Eigenvectors
                                                                                                                                                                                                                                                                                                                                      1. Explained Variance
                                                                                                                                                                                                                                                                                                                                        1. Number of Components Selection
                                                                                                                                                                                                                                                                                                                                          1. PCA for Visualization
                                                                                                                                                                                                                                                                                                                                            1. PCA Limitations
                                                                                                                                                                                                                                                                                                                                            2. Other Dimensionality Reduction Techniques
                                                                                                                                                                                                                                                                                                                                              1. Linear Discriminant Analysis (LDA)
                                                                                                                                                                                                                                                                                                                                                1. t-SNE
                                                                                                                                                                                                                                                                                                                                                  1. UMAP
                                                                                                                                                                                                                                                                                                                                                    1. Factor Analysis
                                                                                                                                                                                                                                                                                                                                                2. Model Evaluation and Selection
                                                                                                                                                                                                                                                                                                                                                  1. Evaluation Metrics
                                                                                                                                                                                                                                                                                                                                                    1. Regression Metrics
                                                                                                                                                                                                                                                                                                                                                      1. Mean Absolute Error (MAE)
                                                                                                                                                                                                                                                                                                                                                        1. Mean Squared Error (MSE)
                                                                                                                                                                                                                                                                                                                                                          1. Root Mean Squared Error (RMSE)
                                                                                                                                                                                                                                                                                                                                                            1. R-squared (Coefficient of Determination)
                                                                                                                                                                                                                                                                                                                                                              1. Adjusted R-squared
                                                                                                                                                                                                                                                                                                                                                                1. Mean Absolute Percentage Error (MAPE)
                                                                                                                                                                                                                                                                                                                                                                2. Classification Metrics
                                                                                                                                                                                                                                                                                                                                                                  1. Accuracy
                                                                                                                                                                                                                                                                                                                                                                    1. Precision
                                                                                                                                                                                                                                                                                                                                                                      1. Recall (Sensitivity)
                                                                                                                                                                                                                                                                                                                                                                        1. F1-Score
                                                                                                                                                                                                                                                                                                                                                                          1. Specificity
                                                                                                                                                                                                                                                                                                                                                                            1. Confusion Matrix
                                                                                                                                                                                                                                                                                                                                                                              1. True/False Positives/Negatives
                                                                                                                                                                                                                                                                                                                                                                                1. Matrix Interpretation
                                                                                                                                                                                                                                                                                                                                                                                2. ROC Curve
                                                                                                                                                                                                                                                                                                                                                                                  1. True Positive Rate
                                                                                                                                                                                                                                                                                                                                                                                    1. False Positive Rate
                                                                                                                                                                                                                                                                                                                                                                                      1. AUC (Area Under Curve)
                                                                                                                                                                                                                                                                                                                                                                                      2. Precision-Recall Curve
                                                                                                                                                                                                                                                                                                                                                                                        1. Multi-class Metrics
                                                                                                                                                                                                                                                                                                                                                                                          1. Macro Averaging
                                                                                                                                                                                                                                                                                                                                                                                            1. Micro Averaging
                                                                                                                                                                                                                                                                                                                                                                                              1. Weighted Averaging
                                                                                                                                                                                                                                                                                                                                                                                          2. Cross-Validation
                                                                                                                                                                                                                                                                                                                                                                                            1. Cross-Validation Concept
                                                                                                                                                                                                                                                                                                                                                                                              1. K-Fold Cross-Validation
                                                                                                                                                                                                                                                                                                                                                                                                1. Fold Selection
                                                                                                                                                                                                                                                                                                                                                                                                  1. Stratified K-Fold
                                                                                                                                                                                                                                                                                                                                                                                                    1. Repeated K-Fold
                                                                                                                                                                                                                                                                                                                                                                                                    2. Leave-One-Out Cross-Validation
                                                                                                                                                                                                                                                                                                                                                                                                      1. Time Series Cross-Validation
                                                                                                                                                                                                                                                                                                                                                                                                        1. Cross-Validation Scoring
                                                                                                                                                                                                                                                                                                                                                                                                        2. Model Selection Strategies
                                                                                                                                                                                                                                                                                                                                                                                                          1. Hyperparameter Tuning
                                                                                                                                                                                                                                                                                                                                                                                                            1. Grid Search
                                                                                                                                                                                                                                                                                                                                                                                                              1. GridSearchCV
                                                                                                                                                                                                                                                                                                                                                                                                                1. Parameter Grids
                                                                                                                                                                                                                                                                                                                                                                                                                  1. Exhaustive Search
                                                                                                                                                                                                                                                                                                                                                                                                                  2. Random Search
                                                                                                                                                                                                                                                                                                                                                                                                                    1. RandomizedSearchCV
                                                                                                                                                                                                                                                                                                                                                                                                                      1. Sampling Distributions
                                                                                                                                                                                                                                                                                                                                                                                                                        1. Efficiency Benefits
                                                                                                                                                                                                                                                                                                                                                                                                                        2. Bayesian Optimization
                                                                                                                                                                                                                                                                                                                                                                                                                        3. Model Comparison
                                                                                                                                                                                                                                                                                                                                                                                                                          1. Cross-Validation Scores
                                                                                                                                                                                                                                                                                                                                                                                                                            1. Statistical Significance
                                                                                                                                                                                                                                                                                                                                                                                                                              1. Learning Curves
                                                                                                                                                                                                                                                                                                                                                                                                                                1. Validation Curves
                                                                                                                                                                                                                                                                                                                                                                                                                              2. Model Diagnostics
                                                                                                                                                                                                                                                                                                                                                                                                                                1. Overfitting Detection
                                                                                                                                                                                                                                                                                                                                                                                                                                  1. Training vs Validation Performance
                                                                                                                                                                                                                                                                                                                                                                                                                                    1. Learning Curves
                                                                                                                                                                                                                                                                                                                                                                                                                                    2. Underfitting Detection
                                                                                                                                                                                                                                                                                                                                                                                                                                      1. Model Complexity Analysis
                                                                                                                                                                                                                                                                                                                                                                                                                                      2. Bias-Variance Analysis
                                                                                                                                                                                                                                                                                                                                                                                                                                        1. Residual Analysis
                                                                                                                                                                                                                                                                                                                                                                                                                                          1. Residual Plots
                                                                                                                                                                                                                                                                                                                                                                                                                                            1. Normality Tests
                                                                                                                                                                                                                                                                                                                                                                                                                                              1. Homoscedasticity