Appropriate Model:
| Assumption | Hypothesis test (to check the assumption) | Remedy in case of violation |
| --- | --- | --- |
| Independence (random pattern) | Runs test, Bartlett's test, LBQ test | Gapping, Batching, (Linear) Regression, Time Series (ARIMA) |
| Normal distribution | Normality test | Transform the data |
Batching:
-> DEF: A second approach to “remove” the autocorrelation is batching:
- This is a model-free approach.
- The sequence of data is organized into sequential, non-overlapping batches, and the average of the values within each batch is used.
⚠PROBLEM: define an appropriate value of b (batch size)
-> Empirical Approach (see the sketch below):
1. Initialize b = 1
2. Compute the autocorrelation coefficient at the first lag (of the batch means)
3. If the coefficient is smaller than 0.1, go to step 5
4. Set b = 2b and go to step 2
5. End
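A minimal sketch of this empirical procedure, assuming a one-dimensional array of data from a stationary process; the 0.1 threshold and the doubling rule follow the steps above, and all names are illustrative:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation coefficient at lag 1."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(xc[:-1] * xc[1:]) / np.sum(xc ** 2)

def choose_batch_size(data, threshold=0.1):
    """Double the batch size b until the lag-1 autocorrelation
    of the batch means drops below the threshold."""
    data = np.asarray(data, dtype=float)
    b = 1
    while True:
        n_batches = len(data) // b
        if n_batches < 2:
            raise ValueError("not enough data to form at least 2 batches")
        # non-overlapping sequential batches, one mean per batch
        batch_means = data[:n_batches * b].reshape(n_batches, b).mean(axis=1)
        if lag1_autocorr(batch_means) < threshold:
            return b, batch_means
        b *= 2
```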
Gapping vs Batching:
- Both approaches are applicable to stationary processes (constant mean)
- Both approaches induce a loss of information
-> These approaches circumvent the autocorrelation issue instead of actually modeling it.
❓ How can we “identify” the appropriate model in case of nonrandom data?
- Regression
- ARIMA
Regression:
-> MODEL:
-> IDEA:
-> OBJ: minimizing the Sum of Squared Errors (Minimum Mean Squared Error -MSE- approach)
- SIMPLE LINEAR REGRESSION: a first-order model with a single regressor, Y = β0 + β1·x + ε
Identification of the Model:
- Assume the structure:
📌We assume models that are linear in the unknown parameters:
- LINEAR IN X: a function Y = f(X) is said to be linear in X if X appears with a power or index of 1 only.
- LINEAR IN THE PARAMETERS: a function is said to be linear in a parameter, say β1, if β1 appears with a power of 1 only and is not multiplied or divided by any other parameter (e.g., β1·β2 or β2/β1).
- Estimation of the unknown parameters by minimizing SSE:
=> Estimated Model: ŷ = b0 + b1·x
⚠Pay Attention:
-> If the true model is equal to the assumed one:
- Min Variance Estimators (among all the unbiased estimators)
-> PROBLEMS:
- A strong relationship among variables does not imply a causal relationship
- The identified relationship is valid only in the explored interval of x; pay attention to extrapolation
- Correlation does not imply causation
How to Estimate b0 and b1:
=> Defining Sxx = Σ(x_i − x̄)² and Sxy = Σ(x_i − x̄)(y_i − ȳ), we can compute b1 = Sxy/Sxx and b0 = ȳ − b1·x̄
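A small sketch of these closed-form least-squares estimates (the arrays x and y are made-up illustrations):

```python
import numpy as np

def simple_linear_regression(x, y):
    """Closed-form least-squares estimates for y = b0 + b1*x + error."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b1 = sxy / sxx                      # slope
    b0 = y.mean() - b1 * x.mean()       # intercept
    residuals = y - (b0 + b1 * x)
    sse = np.sum(residuals ** 2)        # error sum of squares
    sigma2_hat = sse / (len(x) - 2)     # estimate of the error variance
    return b0, b1, sigma2_hat

# example usage with made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(simple_linear_regression(x, y))
```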
Multiple Linear Regression
-> DEF: regression with more than one regressor.
-> MODEL: Y = β0 + β1·x1 + β2·x2 + … + βp·xp + ε
K = p + 1 (number of estimated parameters: p regressors plus the intercept)
b = (XᵀX)⁻¹Xᵀy: exists if the regressors are linearly independent (no column of X is a linear combination of the other columns)
Error Sum of Squares: SSE = Σ(y_i − ŷ_i)²
📌 In general, σ̂ = √MSE is the standard deviation of the residuals.
Mean Squared Error (MSE): MSE = SSE / (n − K)
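A minimal sketch of the multiple-regression quantities above via the normal equations (hypothetical data; the first column of X is the intercept, so K = p + 1):

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of a multiple linear regression.
    X: n x K design matrix (first column of ones for the intercept)."""
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) < X.shape[1]:
        raise ValueError("regressors are not linearly independent")
    beta_hat = np.linalg.solve(XtX, X.T @ y)   # (X'X)^-1 X'y
    residuals = y - X @ beta_hat
    sse = residuals @ residuals                 # error sum of squares
    mse = sse / (X.shape[0] - X.shape[1])       # SSE / (n - K)
    return beta_hat, sse, mse

# hypothetical example: two regressors plus intercept
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=50)
X = np.column_stack([np.ones(50), x1, x2])
print(fit_mlr(X, y))
```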
Confidence Interval:
-> DEF: interval which is expected to typically contain the parameter being estimated.
- Represents the long-run proportion of CIs that theoretically contain the true value of the parameter.
-> GIVEN:
-> For a given confidence level (1 − α):
Prediction Interval
-> DEF: range of values that is likely to contain the value of a single new observation given specified settings of the predictors.
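A hedged sketch of how both intervals could be obtained for a fitted linear model, assuming the statsmodels package is available (the data and the new predictor settings are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=40)

X = sm.add_constant(x)                 # add the intercept column
res = sm.OLS(y, X).fit()

# intervals at new settings of the predictor
X_new = sm.add_constant(np.array([2.0, 5.0, 8.0]))
frame = res.get_prediction(X_new).summary_frame(alpha=0.05)
print(frame[['mean',
             'mean_ci_lower', 'mean_ci_upper',   # confidence interval (mean response)
             'obs_ci_lower', 'obs_ci_upper']])   # prediction interval (new observation)
```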
R² (Coefficient of Determination):
-> DEF: measure of the percentage of variability observed in the data that is explained by the estimated regression model.
-> CHAR:
- Never decreases when new regressors are included.
Adjusted R²:
-> DEF: trade-off between reduction of the SSE and reduction of n-K
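A short sketch of both quantities computed from SSE and SST, for a model with K estimated parameters (y and y_hat are hypothetical):

```python
import numpy as np

def r2_and_adjusted(y, y_hat, K):
    """R^2 and adjusted R^2 for a model with K estimated parameters."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)          # residual variability
    sst = np.sum((y - y.mean()) ** 2)       # total variability
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - K)) / (sst / (n - 1))
    return r2, r2_adj
```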
Stepwise Regression
-> Methods to search the “best” model:
- Forward Selection
- Backward Elimination
- Stepwise Selection
-> CHAR:
- The more significant a variable is, the smaller the p-value associated with its parameter (and the larger the corresponding test statistic)
- If the associated test statistic is significantly greater than zero (p-value < alpha-to-enter): the associated regressor should be included in the model
- If the associated test statistic is not significantly greater than zero (p-value > alpha-to-remove): the associated regressor should be removed from the model
Forward Selection:
-> DEF: sequential procedure (one variable is added at a time). At each step, the variable that provides the best contribution to the "fitting" is selected. Once a variable is added, it cannot be removed in the subsequent steps (see the sketch below).
Backward Elimination:
-> DEF: sequential procedure (one variable is removed at a time). Starting from a model that contains all the possible variables, at each step we remove the variable that is "least useful" to explain the data variability. Once a variable is removed, it is never reincluded in the following steps.
Stepwise Selection:
-> DEF: combines forward & backward. We start as a forward selection, but each time a variable is added, a backward step is carried out to check whether a variable has to be removed. The procedure stops when no regressor has to be included in the model and no regressor has to be removed.
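A rough sketch of p-value-based forward selection, assuming statsmodels and a pandas DataFrame of candidate regressors; the alpha_to_enter threshold and the data are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y, alpha_to_enter=0.05):
    """Add one regressor at a time: at each step include the candidate whose
    coefficient has the smallest p-value, as long as it is below alpha_to_enter."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# hypothetical data: x1 and x2 matter, x3 is pure noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(80, 3)), columns=["x1", "x2", "x3"])
y = 2.0 * X["x1"] - 1.0 * X["x2"] + rng.normal(scale=0.5, size=80)
print(forward_selection(X, y))   # expected to select x1 and x2
```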
Model Checking:
-> DEF: more systematic criteria for choosing an “optimal” member in the path of models produced by forward or backward stepwise selection.
Mallow’s Cp, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), adjusted R², and Cross-Validation (CV).
Qualitative Predictors:
-> DEF: categorical predictors, also called factor variables.
Extension of the Linear Model:
-> DEF: removing the additive assumptions: interactions and nonlinearity.
INTERACTION EFFECT: refers to the role of a variable in an estimated model, and its effect on the dependent variable. A variable that has an interaction effect will have a different effect on the dependent variable, depending on the level of some third variable.
- Also known as synergy effect in marketing.
Hierarchy:
-> HIERARCHY PRINCIPLE: if we include an interaction in a model, we should also include the main effects, even if the p-value associated with their coefficients are not significant.
- Interactions are hard to interpret in a model without main effects.
- If the model has no main-effect terms, the interaction terms still implicitly contain the main effects.
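A minimal sketch of a model with an interaction (synergy) term built by hand; x1 and x2 are made-up, and both main effects are kept, in line with the hierarchy principle:

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
# assumed true model with a synergy term, for illustration only
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.2, size=100)

# design matrix: intercept, both main effects, and the interaction x1*x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # estimates for [intercept, x1, x2, x1:x2]
```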
Cubic Spline
-> DEF: a piecewise cubic function that interpolates a set of data points and guarantees smoothness at the data points.
-> Define a set of knots ξ1 < ξ2 < … < ξK.
-> We want the function f in the model to:
- Be a cubic polynomial between every pair of consecutive knots;
- Be continuous at each knot;
- Have continuous first and second derivatives at each knot.
=> We can write f in terms of K + 3 basis functions:
f(x) = β0 + β1·b1(x) + … + β_{K+3}·b_{K+3}(x), with b1(x) = x, b2(x) = x², b3(x) = x³, and b_{3+k}(x) = (x − ξk)³₊ for k = 1, …, K
Where (x − ξk)³₊ = (x − ξk)³ if x > ξk, and 0 otherwise (truncated power basis).
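A small sketch of this truncated power basis (the knot locations are assumed for illustration); fitting the spline then reduces to least squares on these columns plus an intercept:

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Columns: x, x^2, x^3, and (x - knot)^3_+ for each knot (K + 3 functions)."""
    x = np.asarray(x, dtype=float)
    cols = [x, x ** 2, x ** 3]
    for k in knots:
        cols.append(np.where(x > k, (x - k) ** 3, 0.0))
    return np.column_stack(cols)

x = np.linspace(0, 10, 200)
B = truncated_power_basis(x, knots=[2.5, 5.0, 7.5])   # K = 3 knots -> 6 basis functions
```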
Cross-Validation:
-> DEF: any of various similar model-validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.
- Model Selection: estimating the performance of different models in order to choose the best one.
- Model Assessment: having chosen a final model, estimating its prediction error (generalization error) on new data.
-> Data-rich situation => divide the dataset into three parts:
- Training Set: fit the models;
- Validation Set: used to estimate prediction error for model selection;
- Test Set: used for assessment of the error of the final chosen model.
K-Fold Cross Validation:
-> DEF: uses part of the available data to fit the model, and a different part to test it. Split the data into K roughly equal-sized parts. For the k-th part (k = 1, …, K), we fit the model to the other K−1 parts of the data and calculate the prediction error of the fitted model when predicting the k-th part. We do this for k = 1, 2, …, K and combine the K estimates of prediction error.
📌K = N => leave-one-out cross-validation.
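A sketch of K-fold cross-validation for a regression model, assuming scikit-learn (K = 5; the model and data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=120)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
errors = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(np.mean(errors))   # combined estimate of the prediction error
```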
Time Series Modeling via ARIMA:
Autocorrelation with linear model:
Autoregressive Model:
-> DEF: what is observed at time t is given by an intercept, plus a coefficient multiplying the previous observation, plus some noise (first-order case: x_t = δ + φ·x_{t-1} + ε_t)
Time Series & Stationarity:
-> Let x_t, t = 1, 2, …, be a discrete time series =>
- STRICTLY (or STRONGLY) STATIONARY: a time series is said to be strictly stationary if its properties do not depend on modifications of the time origin:
-> The joint distribution of (x_{t1}, …, x_{tn}) coincides with the joint distribution of (x_{t1+k}, …, x_{tn+k}) for any time shift k.
- WEAKLY STATIONARY OF ORDER f: if all the moments of the series up to order f depend only on the time differences between the time series data.
ARMA, Autoregressive Moving Average model:
-> DEF: tool for understanding and, perhaps, predicting future values of a time series.
- AR Part: involves regressing the variable on its own lagged (i.e., past) values.
-> ARMA process is stationary if its AR part is stationary
- MA Part: involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past.
-> ARMA process is invertible if its MA part is invertible
- p: order (degree) of the AR part.
- q: order of the MA part (the number of lagged random errors included).
-> AIM: provide a parsimonious description of a (weakly) stationary stochastic process in terms of two polynomials, one for the autoregression (AR) and the second for the moving average (MA).
Model:
-> General linear stochastic model:
Moments:
Mean:
Autocovariance & Autocorrelation (obtained by multiplying by x_{t-k} and taking expectations):
AR Model:
-> DEF: the notation AR(p) refers to the autoregressive model of order p: x_t = δ + φ1·x_{t-1} + … + φp·x_{t-p} + ε_t
Stationarity:
- The AR(p) process is stationary <=> the polynomial A(B) is stable (all roots of A(B) = 0 lie outside the unit circle)
Backward Shift Operator:
-> DEF: operates on an element of a time series to produce the previous element: B·x_t = x_{t-1} (and in general B^k·x_t = x_{t-k})
Moments of AR(p) Process:
Mean:
Autocovariance & Autocorrelation:
Variance:
-> The same finite-difference equation of the original AR(p) process also applies to the autocovariance & autocorrelation functions:
-> For an AR(1) process, the autocorrelation function (ACF) "geometrically decays" (if time were continuous, the decay would be exponential): ρ(k) = φ1^k
📌Sample AutoCorrelation Function (SACF)
Identification of the order p of an AR(p) Process:
-> Use the Partial AutoCorrelation Function (PACF): for an AR(p) process it cuts off (drops to zero) after lag p
- This is a general result, true for every stationary AR(p) process
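A hedged sketch of this identification idea, assuming statsmodels: simulate an AR(2) series and check that the sample PACF cuts off after lag 2 while the SACF decays:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

# simulate a stationary AR(2): x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + eps_t
ar = np.array([1, -0.6, 0.3])   # A(B) coefficients, in the sign convention of ArmaProcess
ma = np.array([1])
x = ArmaProcess(ar, ma).generate_sample(nsample=1000)

sacf = acf(x, nlags=10)          # decays "geometrically" for an AR process
spacf = pacf(x, nlags=10)        # should cut off after lag p = 2
print(np.round(sacf, 2))
print(np.round(spacf, 2))
```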
Moving Average Models, MA(q):
-> DEF: approach for modeling univariate time series. The observation at time t can be estimated as a weighted combination of the random shock at time t (ε_t) and its previous q shocks.
- The output variable is cross-correlated with a random variable different from itself (the random shocks).
- It is always stationary (contrary to AR).
- It is different from the moving average (smoothing) of a series.
- Since the process can be represented by a weighted average of random shocks ε_t, it is referred to as a moving average process:
- The sum of weights is not necessarily = 1.
Moments of MA(q) Process:
Mean:
Variance:
Autocovariance:
Autocorrelation:
Special Cases:
ARIMA (p,d,q), AutoRegressive Integrated Moving Average model:
-> DEF: given time series data x_t, where t is an integer index and the x_t are real numbers, an ARIMA(p,d,q) model is given by:
- Generalization of an autoregressive moving average model.
- Most industrial processes are not stationary: when no control action is applied, the process mean tends to depart from the target.
Non-Stationary ARIMA model:
-> It exhibits a stationary / non-stationary behaviour that depends on the AR term of the model (roots of the A(B) polynomial). Particularly:
- Roots lie strictly outside the unit circle in the complex plane: stationarity
-> the polynomial A(B) is stable.
- Roots lie strictly inside the unit circle in the complex plane: "explosive" non-stationarity
- Roots lie on the unit circle in the complex plane: "homogeneous" non-stationarity.
Case 2:
-> The process can be transformed into a stationary one by applying the difference operator (nabla): ∇x_t = x_t − x_{t-1} = (1 − B)·x_t
Development of an ARIMA (p,d,q) model:
-> Iterative procedure:
-> GOAL: finding values of p,d,q from a given time series .
- One has to use the SACF (Sample Auto Correlation Function) and the SPACF (Sample Partial Autocorrelation Function)
Identification:
1. d parameter:
- Non-stationarity (the I term): inferred when the SACF does not exhibit an exponential decay (e.g., it decays linearly).
- If the process seems to be non-stationary: apply the difference operator.
- If the resulting time series still exhibits a non-stationary behaviour, iterative application of the difference operator is required.
-> Pay attention to "overdifferencing". A simple way to detect it consists of computing the sample variance of the d-times differenced series for increasing values of d:
-> Choose d such that the variance of the differenced series is minimized
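A small sketch of this overdifferencing check (x is a hypothetical one-dimensional series): difference the series d times and pick the d with the smallest sample variance:

```python
import numpy as np

def choose_d(x, max_d=3):
    """Variance of the d-times differenced series, for d = 0..max_d.
    Overdifferencing shows up as an increase in the variance."""
    x = np.asarray(x, dtype=float)
    variances = {}
    for d in range(max_d + 1):
        xd = np.diff(x, n=d)            # n=0 returns the series unchanged
        variances[d] = np.var(xd, ddof=1)
    return min(variances, key=variances.get), variances
```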
2.(p, q) Parameters:
-> Check the SACF and SPACF patterns, reminding that:
- AR(p): the SACF shows “exponential decay” whereas the SPACF is used to choose the degree p
- MA(q): the SACF is used to choose the degree q whereas the SPACF shows “exponential decay”
- ARMA(p,q) resembles an AR(p) after q lags.
-> If the model fitted to the residuals is correct, such a model can be combined with the original ARIMA(p,d,q) model, as follows:
=> The result is an overall ARIMA model of higher order (⚠Parsimony Principle⚠: prefer the simplest model that adequately fits the data)
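A hedged sketch of one pass of the fitting and diagnostic-checking loop with statsmodels, assuming the orders (p, d, q) were suggested by the SACF/SPACF inspection above; the series is simulated for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=300))          # hypothetical non-stationary series (random walk)

res = ARIMA(x, order=(1, 1, 0)).fit()        # candidate ARIMA(p=1, d=1, q=0)
print(res.summary())

# diagnostic check: the residuals should behave as white noise (LBQ test)
print(acorr_ljungbox(res.resid, lags=[10]))
```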
PCA, Principal Component Analysis:
-> DEF: unsupervised procedure that provides a way to visualize high dimensional data, summarizing the most important information.
- Sequence of projections of the data, mutually uncorrelated & ordered in variance.
- The first PC is the direction of the line that minimizes the total squared distance from each point to its orthogonal projection onto the line.
-> IDEA: find a new orthonormal reference system that makes it possible to maximize the variability of the original data and, simultaneously, to reduce the number of variables needed to describe the process.
- SCORE: the projection of the i-th sample in the direction of the corresponding eigenvector (loading vector).
- LOADINGS: the coefficients of the eigenvectors (weights multiplying the original variables to compute the PCs).
-> If the variables in X have very different orders of magnitude, we can work with the standardized variables (compute PCA on the correlation matrix).
Multivariate Random Variables:
Variable:
Number of random variables: p
Expected Value:
Variance-Covariance Matrix:
Remember:
-> Product between a constant vector a and a random vector x: E[aᵀx] = aᵀμ and Var(aᵀx) = aᵀΣa
Steps:
- Look for a linear function a1ᵀx of the elements of x having maximum variance;
- Look for a linear function a2ᵀx uncorrelated with a1ᵀx having maximum variance, and so on.
-> Let Σ be the covariance matrix of X. Let X be a data matrix with n samples and p variables. From each variable, we subtract the mean of the column (we center the variables).
-> We can find up to p PCs, but in general most of the variability in X can be accounted for by m PCs, where m < p.
-> Consider the k-th linear combination akᵀx. The vector ak maximises Var(akᵀx) = akᵀΣak, subject to the normalization constraint that the sum of squares of the elements of ak equals 1 (akᵀak = 1).
📌We could use cross-validation: train on part of the data and select the number of principal components that performs best (assuming we have a lot of data).
Reconstruct Original Information via PCA:
-> Going back to the original system: X̂ = T_m·P_mᵀ (first m scores times the transposed loadings), then add back the column means.
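A minimal sketch of PCA via the eigendecomposition of the covariance matrix, including the reconstruction in the original system with the first m components (X is a hypothetical n × p data matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data, n=100, p=4

mean = X.mean(axis=0)
Xc = X - mean                                  # center the variables
S = np.cov(Xc, rowvar=False)                   # p x p covariance matrix

eigval, eigvec = np.linalg.eigh(S)             # eigh returns eigenvalues in ascending order
order = np.argsort(eigval)[::-1]
eigval, P = eigval[order], eigvec[:, order]    # loadings P, ordered by explained variance

T = Xc @ P                                     # scores: projections onto the PCs
m = 2                                          # keep the first m PCs
X_hat = T[:, :m] @ P[:, :m].T + mean           # reconstruction in the original system
print(np.mean((X - X_hat) ** 2))               # reconstruction error
```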