Category: Regression analysis

Smoothing spline

Smoothing splines are function estimates, $\hat{f}(x)$, obtained from a set of noisy observations $y_i$ of the target $f(x_i)$, in order to balance a measure of goodness of fit of $\hat{f}(x_i)$ to $y_i$ with a derivative-based measure of the smoo

Calibration (statistics)

There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. "Calibration" can mean * a reverse process to regression, where instead of a

Polynomial regression

In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial

Interaction cost

Interaction cost can comprise work, costs, and other expenses, required to complete a task or interaction. This applies to several categories, including: * Economy: the interaction cost of a purchase

Scatterplot smoothing

In statistics, several scatterplot smoothing methods are available to fit a function through the points of a scatterplot to best represent the relationship between the variables. Scatterplots may be s

Principal component regression

In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). More specifically, PCR is used for estimating the unknown re

Cross-sectional regression

In statistics and econometrics, a cross-sectional regression is a type of regression in which the explained and explanatory variables are all associated with the same single period or point in time. T

Fractional model

In applied statistics, fractional models are, to some extent, related to binary response models. However, instead of estimating the probability of being in one bin of a dichotomous variable, the fract

G-prior

In statistics, the g-prior is an objective prior for the regression coefficients of a multiple regression. It was introduced by Arnold Zellner. It is a key tool in Bayes and empirical Bayes variable se

Canonical analysis

In statistics, canonical analysis (from Ancient Greek: κανων bar, measuring rod, ruler) belongs to the family of regression methods for data analysis. Regression analysis quantifies a relationship bet

Smearing retransformation

The Smearing retransformation is used in regression analysis, after estimating the logarithm of a variable. Estimating the logarithm of a variable instead of the variable itself is a common technique

Outline of regression analysis

The following outline is provided as an overview of and topical guide to regression analysis: Regression analysis – use of statistical techniques for learning about the relationship between one or mor

Elastic net regularization

In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso

Unit-weighted regression

In statistics, unit-weighted regression is a simplified and robust version (Wainer & Thissen, 1976) of multiple regression analysis where only the intercept term is estimated. That is, it fits a model

Conjoint analysis

Conjoint analysis is a survey-based statistical technique used in market research that helps determine how people value different attributes (feature, function, benefits) that make up an individual pr

Sliced inverse regression

Sliced inverse regression (or SIR) is a tool for dimensionality reduction in the field of multivariate statistics. In statistics, regression analysis is a method of studying the relationship between a

Linear predictor function

In statistics and in machine learning, a linear predictor function is a linear function (linear combination) of a set of coefficients and explanatory variables (independent variables), whose value is

Frisch–Waugh–Lovell theorem

In econometrics, the Frisch–Waugh–Lovell (FWL) theorem is named after the econometricians Ragnar Frisch, Frederick V. Waugh, and Michael C. Lovell. The Frisch–Waugh–Lovell theorem states that if the r

Policy capturing

Policy capturing or "the PC technique" is a statistical method used in social psychology to quantify the relationship between a person's judgement and the information that was used to make that judgem

Limited dependent variable

A limited dependent variable is a variable whose range of possible values is "restricted in some important way." In econometrics, the term is often used when estimation of the relationship between the l

Ridge regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including

Nonhomogeneous Gaussian regression

Non-homogeneous Gaussian regression (NGR) is a type of statistical regression analysis used in the atmospheric sciences as a way to convert ensemble forecasts into probabilistic forecasts. Relative to

Blinder–Oaxaca decomposition

The Blinder–Oaxaca decomposition is a statistical method that explains the difference in the means of a dependent variable between two groups by decomposing the gap into that part that is due to diffe

Regression toward the mean

In statistics, regression toward the mean (also called reversion to the mean, and reversion to mediocrity) is the fact that if one sample of a random variable is extreme, the next sampling of the same

Causal inference

Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and infere

Homoscedasticity and heteroscedasticity

In statistics, a sequence (or a vector) of random variables is homoscedastic (/ˌhoʊmoʊskəˈdæstɪk/) if all its random variables have the same finite variance. This is also known as homogeneity of varia

Contrast (statistics)

In statistics, particularly in analysis of variance and linear regression, a contrast is a linear combination of variables (parameters or statistics) whose coefficients add up to zero, allowing compar

Polygenic score

In genetics, a polygenic score (PGS), also called a polygenic risk score (PRS), polygenic index (PGI), genetic risk score, or genome-wide score, is a number that summarizes the estimated effect of man

Non-linear mixed-effects modeling software

Nonlinear mixed-effects models are a special case of regression analysis for which a range of different software solutions are available. The statistical properties of nonlinear mixed-effects models m

C+-probability

In statistics, a c+-probability is the probability that a contrast variable obtains a positive value. Using a replication probability, the c+-probability is defined as follows: if we get a random draw

Working–Hotelling procedure

In statistics, particularly regression analysis, the Working–Hotelling procedure, named after Holbrook Working and Harold Hotelling, is a method of simultaneous estimation in linear regression models.

Antecedent variable

In statistics and social sciences, an antecedent variable is a variable that can help to explain the apparent relationship (or part of the relationship) between other variables that are nominally in a

Standardized coefficient

In statistics, standardized (regression) coefficients, also called beta coefficients or beta weights, are the estimates resulting from a regression analysis where the underlying data have been standar

General regression neural network

Generalized regression neural network (GRNN) is a variation of radial basis neural networks. GRNN was suggested by D.F. Specht in 1991. GRNN can be used for regression, prediction, and classification.

Generated regressor

In least squares estimation problems, sometimes one or more regressors specified in the model are not observable. One way to circumvent this issue is to estimate or generate regressors from observable

Projection pursuit regression

In statistics, projection pursuit regression (PPR) is a statistical model, developed by Jerome H. Friedman, that extends additive models. This model adapts the additive models in that it

Propensity score matching

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other interventio

Omitted-variable bias

In statistics, omitted-variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables. The bias results in the model attributing the effect of the missing variables to t

Structural break

In econometrics and statistics, a structural break is an unexpected change over time in the parameters of regression models, which can lead to huge forecasting errors and unreliability of the model in

Identifiability analysis

Identifiability analysis is a group of methods found in mathematical statistics that are used to determine how well the parameters of a model are estimated by the quantity and quality of experimental

Simple linear regression

In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one depend
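As a rough illustration of a model with a single explanatory variable, the sketch below fits a straight line to paired observations. It assumes the usual ordinary least squares criterion, which the description above does not state; the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope * x (assumed criterion)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the same function returns the line that minimizes the sum of squared vertical distances to the points.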

Multinomial probit

In statistics and econometrics, the multinomial probit model is a generalization of the probit model used when there are several possible categories that the dependent variable can fall into. As such,

Lasso (statistics)

In statistics and machine learning, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularizatio

Deming regression

In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model which tries to find the line of best fit for a two-dimensional dataset. It differs from the simple line

Instrumental variables estimation

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or wh

Suppressor variable

A suppressor variable is a variable that increases the predictive validity of another variable when included in a regression equation. Suppression can occur when a single causal variable is related to

Commonality analysis

Commonality analysis is a statistical technique within multiple linear regression that decomposes a model's R² statistic (i.e., explained variance) by all independent variables on a dependent variable

Difference in differences

Difference in differences (DID or DD) is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using obse

Explained variation

In statistics, explained variation measures the proportion to which a mathematical model accounts for the variation (dispersion) of a given data set. Often, variation is quantified as variance; then,

Optimal design

In the design of experiments, optimal designs (or optimum designs) are a class of experimental designs that are optimal with respect to some statistical criterion. The creation of this field of statis

Simalto

SIMALTO – SImultaneous Multi-Attribute Trade Off – is a survey based statistical technique used in market research that helps determine how people prioritise and value alternative product and/or servi

Moderation (statistics)

In statistics and regression analysis, moderation (also known as effect modification) occurs when the relationship between two variables depends on a third variable. The third variable is referred to

DeFries–Fulker regression

In behavioural genetics, DeFries–Fulker (DF) regression, also sometimes called DeFries–Fulker extremes analysis, is a type of multiple regression analysis designed for estimating the magnitude of gene

Functional regression

Functional regression is a version of regression analysis when responses or covariates include functional data. Functional regression models can be classified into four types depending on whether the

Generalized estimating equation

In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timep

Errors and residuals

In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "true val

Prediction interval

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has alr

Function approximation

In general, a function approximation problem asks us to select a function among a well-defined class that closely matches ("approximates") a target function in a task-specific way. The need for functi

Heckman correction

The Heckman correction is a statistical technique to correct bias from non-randomly selected samples or otherwise incidentally truncated dependent variables, a pervasive issue in quantitative social s

Quantile regression

Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable across values of t

Coefficient of multiple correlation

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the

Idempotent matrix

In linear algebra, an idempotent matrix is a matrix which, when multiplied by itself, yields itself. That is, the matrix $A$ is idempotent if and only if $A^2 = A$. For this product $A^2$ to be defined, $A$ must necessarily
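The defining property, that a matrix multiplied by itself yields itself, can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul` and `is_idempotent` and the tolerance `tol` are illustrative assumptions, not part of any library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_idempotent(a, tol=1e-12):
    """Return True if a is square and a times a equals a within tol."""
    n = len(a)
    # A non-square matrix cannot be multiplied by itself.
    if any(len(row) != n for row in a):
        return False
    squared = matmul(a, a)
    return all(
        abs(squared[i][j] - a[i][j]) <= tol for i in range(n) for j in range(n)
    )


print(is_idempotent([[3, -6], [1, -2]]))  # this matrix squares to itself
print(is_idempotent([[1, 1], [0, 1]]))    # its square is [[1, 2], [0, 1]]
```

The tolerance only matters for floating-point entries; for integer matrices the comparison is exact.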

Principle of marginality

In statistics, the principle of marginality is the fact that the average (or main) effects of variables in an analysis are marginal to their interaction effect; that is, the main effect of one explana

Underfitting

No description available.

Interaction (statistics)

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the effect of one causal variable on an outcome depends on t

Multicollinearity

In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree

Quantile regression averaging

Quantile Regression Averaging (QRA) is a forecast combination approach to the computation of prediction intervals. It involves applying quantile regression to the point forecasts of a small number of

Design matrix

In statistics and in particular in regression analysis, a design matrix, also known as model matrix or regressor matrix and often denoted by X, is a matrix of values of explanatory variables of a set

Symbolic regression

Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity

Heteroskedasticity-consistent standard errors

The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedas

Sobel test

In statistics, the Sobel test is a method of testing the significance of a mediation effect. The test is based on the work of Michael E. Sobel, a statistics professor at Columbia University in New Yor

Projection matrix

In statistics, the projection matrix $P$, sometimes also called the influence matrix or hat matrix $H$, maps the vector of response values (dependent variable values) to the vector of fitted values (or pred

Virtual sensing

Virtual sensing techniques, also called soft sensing, proxy sensing, inferential sensing, or surrogate sensing, are used to provide feasible and economical alternatives to costly or impractical physic

Meta-regression

Meta-regression is defined to be a meta-analysis that uses regression analysis to combine, compare, and synthesize research findings from multiple studies while adjusting for the effects of available

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'l

Binary regression

In statistics, specifically regression analysis, a binary regression estimates a relationship between one or more explanatory variables and a single output binary variable. Generally the probability o

Target function

No description available.

Under-fitting

No description available.

Guess value

In mathematical modeling, a guess value is more commonly called a starting value or initial value. These are necessary for most optimization problems which use search algorithms, because those algorit

Regression discontinuity design

In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that aims to determine th

Endogeneity with an exponential regression function

No description available.

Line fitting

Line fitting is the process of constructing a straight line that has the best fit to a series of data points. Several methods exist, considering: * Vertical distance: Simple linear regression * Resi

Moderated mediation

In statistics, moderation and mediation can occur together in the same model. Moderated mediation, also known as conditional indirect effects, occurs when the treatment effect of an independent variab

Mean and predicted response

In linear regression, mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of t

Radial basis function network

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear c

Nonlinear regression

In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one o

Pyrrho's lemma

In statistics, Pyrrho's lemma is the result that if one adds just one extra variable as a regressor from a suitable set to a linear regression model, one can get any desired outcome in terms of the co

Component analysis (statistics)

Component analysis is the analysis of two or more independent variables which comprise a treatment modality. It is also known as a dismantling study. The chief purpose of the component analysis is to

Dependent and independent variables

Dependent and independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their value

Interval predictor model

In regression analysis, an interval predictor model (IPM) is an approach to regression where bounds on the function to be approximated are obtained. This differs from other techniques in machine learni

Linkage disequilibrium score regression

In statistical genetics, linkage disequilibrium score regression (LDSR or LDSC) is a technique that aims to quantify the separate contributions of polygenic effects and various confounding factors, su

Haseman–Elston regression

In statistical genetics, Haseman–Elston (HE) regression is a form of statistical regression originally proposed for linkage analysis of quantitative traits for sibling pairs. It was first developed by

Knockoffs (statistics)

In statistics, the knockoff filter, or simply knockoffs, is a framework for variable selection. It was originally introduced for linear regression by Rina Barber and Emmanuel Candès, and later general

Curve fitting

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either inte