Computational Linguistics

  1. Evaluation Methodologies
    1. Evaluation Paradigms
      1. Intrinsic vs. Extrinsic Evaluation
        1. Task-Specific Metrics
        2. End-to-End Evaluation
        3. Human Evaluation
      2. Automatic vs. Manual Evaluation
        1. Scalability Considerations
        2. Quality Trade-offs
        3. Hybrid Approaches
    2. Experimental Design
      1. Hypothesis Formation
        1. Research Questions
        2. Variable Identification
        3. Control Conditions
      2. Data Splitting
        1. Training-Validation-Test Splits
        2. Cross-Validation
        3. Temporal Splits
      3. Baseline Establishment
        1. Simple Baselines
        2. State-of-the-Art Comparisons
        3. Human Performance
    3. Statistical Analysis
      1. Significance Testing
        1. Null Hypothesis Testing
        2. Type I and Type II Errors
        3. Multiple Comparisons
      2. Effect Size Measurement
        1. Cohen's d
        2. Practical Significance
        3. Confidence Intervals
      3. Bootstrap Methods
        1. Resampling Techniques
        2. Confidence Estimation
        3. Bias Correction
    4. Evaluation Metrics
      1. Classification Metrics
        1. Accuracy and Error Rate
        2. Precision and Recall
        3. F-Measure Variants
        4. ROC and AUC
      2. Ranking Metrics
        1. Mean Average Precision
        2. Normalized Discounted Cumulative Gain
        3. Reciprocal Rank
      3. Generation Metrics
        1. BLEU and ROUGE
        2. METEOR
        3. Human Evaluation Protocols
      4. Correlation Metrics
        1. Pearson Correlation
        2. Spearman Rank Correlation
        3. Kendall's Tau
    5. Reproducibility and Reliability
      1. Experimental Reproducibility
        1. Code and Data Sharing
        2. Documentation Standards
        3. Version Control
      2. Result Reliability
        1. Multiple Runs
        2. Statistical Power
        3. Replication Studies
    6. Ethical Considerations
      1. Bias in Evaluation
      2. Fairness Metrics
      3. Privacy Concerns