Appropriate Model:
Assumptions | Hypothesis Test (to check the assumption) | Remedy in case of violation |
“Independence” (random pattern) | – Runs test – Bartlett’s test – LBQ test | – Gapping – Batching – (Linear) regression – Time series (ARIMA) |
Normal Distribution | Normality test | Transform Data |
Runs Test:
-> OBJ: understand if data are random.
-> DEF: the test classifies each data point as lying above (+) or below (-) a reference line.
- Usually the reference line is the overall mean of the observed data
- RUN: sequence of successive, equal symbols that precedes a different symbol.
- No specific distributional assumption is required (non-parametric test)
-> CHAR NON RANDOM:
- Mean of process is not constant
- Systematic pattern (the process mean is not the best predictor of future data)
- Dispersion around mean value is not constant
-> Correlations:
- POSITIVELY CORRELATED: nearby observations are similar, but distant observations are different (stationary meandering)
- NEGATIVELY CORRELATED: every time something large is observed, something small follows (oscillating)
Test:
-> If runs are too long (too few runs) or too short (too many runs) => we cannot regard the process as random.
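-> If the process is random => the number of runs Y observed on a large number of samples will be (approximately) normally distributed with mean $E(Y) = \frac{2 n_+ n_-}{n_+ + n_-} + 1$ and standard deviation $\sigma_Y = \sqrt{\frac{2 n_+ n_- (2 n_+ n_- - n_+ - n_-)}{(n_+ + n_-)^2 (n_+ + n_- - 1)}}$, where $n_+$ and $n_-$ are the numbers of observations above and below the reference line.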
-> TEST:
- H0: process is random
- H1: process is not random
- Define α;
- Compute the p-value: if the null hypothesis is true, it is the percentage of times we would see a difference between the number of runs actually observed and its expected value that is equal to or greater than the one observed this time.
- It measures how unusual/surprising the observed data are when the null hypothesis is true.
- Reject the null hypothesis if p-value < α.
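The steps of the runs test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `runs_test` is hypothetical, points exactly on the reference line are dropped, and the normal approximation uses the standard Wald-Wolfowitz mean and variance of the number of runs, so it is meant for reasonably large samples.

```python
import math

def runs_test(data, reference=None):
    """Two-sided runs test about a reference line (default: the sample mean).

    Returns (observed runs, expected runs, z statistic, p-value).
    """
    if reference is None:
        reference = sum(data) / len(data)
    # Classify points as above (True) or below (False) the line.
    # Assumption: points exactly on the line are discarded.
    signs = [x > reference for x in data if x != reference]
    n_plus = sum(signs)
    n_minus = len(signs) - n_plus
    if n_plus == 0 or n_minus == 0:
        raise ValueError("need observations on both sides of the reference line")
    # A new run starts at every change of symbol.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n_plus + n_minus
    expected = 2 * n_plus * n_minus / n + 1
    variance = 2 * n_plus * n_minus * (2 * n_plus * n_minus - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("sample too small for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    # Two-sided p-value under the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return runs, expected, z, p_value

# Usage: a strictly alternating series has far more runs than expected.
runs, expected, z, p_value = runs_test([1, 2] * 5)
print(runs, expected, round(z, 3), round(p_value, 4))
```

A positive z (too many runs) points to oscillation, while a negative z (too few runs) points to meandering or a drifting mean.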
Bartlett’s Test:
-> OBJ: test homoscedasticity, that is, if multiple samples are from populations with equal variances.
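As a rough illustration of the equal-variance check, the sketch below computes the usual Bartlett statistic for k samples using only the standard library. The function name `bartlett_statistic` is hypothetical; the statistic is compared with a chi-square critical value with k - 1 degrees of freedom that the reader supplies from a table, and every sample is assumed to have non-zero variance and at least two observations.

```python
import math
from statistics import variance

def bartlett_statistic(*samples):
    """Bartlett's statistic for equality of variances across k samples."""
    k = len(samples)
    if k < 2:
        raise ValueError("need at least two samples")
    sizes = [len(s) for s in samples]
    total = sum(sizes)
    # Sample variances (divisor n_i - 1) of each group.
    variances = [variance(s) for s in samples]
    # Pooled variance: weighted average of the group variances.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction

# Usage: compare with the chi-square critical value for k - 1 degrees of freedom
# (3.841 at the 5% level when k = 2).
stat = bartlett_statistic([1.0, 2.0, 3.0], [0.0, 10.0, 20.0])
print(stat > 3.841)
```

The statistic is zero when all sample variances coincide and grows as they diverge. Bartlett's test is known to be sensitive to departures from normality.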
Autocorrelation:
Autocorrelation: correlation between the elements of a series and others from the same series separated from them by a given interval.
- Degree of correlation of the same variable between two successive time intervals.
Correlation: relationship between two variables, whereas autocorrelation measures the relationship of a variable with lagged values of itself.
Lagging of one variable: create a second variable such that its observation at time t equals the observation of the same time series at time t - k (lag k)
- Measure of linear association
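The lagging idea can be made concrete with a short sketch. It assumes the common sample estimator $r_k = \sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x}) / \sum_{t=1}^{n} (x_t - \bar{x})^2$; the function name `autocorrelation` is hypothetical and not taken from any library.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag k (0 < k < n)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    # Pair each observation at time t with the one at time t - lag.
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator

# Usage: build the lagged variable explicitly, then compute r_1 and r_2.
series = [1.0, -1.0] * 10
lag = 1
current = series[lag:]    # x_t
lagged = series[:-lag]    # x_{t-k}
print(list(zip(current, lagged))[:3])
print(autocorrelation(series, 1), autocorrelation(series, 2))
```

For this oscillating series the lag-1 value is strongly negative and the lag-2 value strongly positive, which matches the "negatively correlated" pattern described for the runs test.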
Correlation & Independence:
Test:
-> Hypothesis: H0: $\rho_k = 0$ (no autocorrelation at lag k) vs H1: $\rho_k \neq 0$.
Bonferroni’s Inequality
-> DEF: a correction for the chance of committing a Type I error.
- When conducting multiple analyses on the same dependent variable, the chance of committing a Type I error increases, thus increasing the likelihood of obtaining a significant result by pure chance.
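=> The probability of rejecting at least one null hypothesis when they are all true (family error rate α), with N tests each run at level α', is bounded by $\alpha \le N \alpha'$;
for independent tests: $\alpha = 1 - (1 - \alpha')^N$.
-> We can build intervals to “constrain” the family error rate α (the overall Type I error)
- For each of the N tests to be performed (using the same set of data) choose: $\alpha' = \alpha / N$.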
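A small numeric sketch shows why the correction matters. It assumes N independent tests, each run at the same per-test level; the function names `family_error_rate` and `bonferroni_alpha` are hypothetical.

```python
def family_error_rate(alpha_single, n_tests):
    """P(at least one false rejection) for n independent tests at level alpha_single."""
    return 1 - (1 - alpha_single) ** n_tests

def bonferroni_alpha(alpha_family, n_tests):
    """Per-test level that keeps the family error rate at or below alpha_family."""
    return alpha_family / n_tests

# Usage: 10 tests at 5% each versus 10 tests at the Bonferroni-corrected level.
uncorrected = family_error_rate(0.05, 10)
corrected = family_error_rate(bonferroni_alpha(0.05, 10), 10)
print(round(uncorrected, 4), round(corrected, 4))
```

Without correction the family error rate is far above the nominal 5%. With the per-test level α/N it stays just below it, which is why the Bonferroni choice is slightly conservative.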
Ljung Box Pierce (LBQ) Test:
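- Alternative to running a separate test on each lag: a single test on the first L autocorrelations taken together.
Test:
-> Statistic: $Q = n(n+2) \sum_{k=1}^{L} \frac{r_k^2}{n-k}$, where n is the number of observations and $r_k$ is the sample autocorrelation at lag k; under H0, Q is approximately $\chi^2$ distributed with L degrees of freedom.
-> HP: H0: $\rho_1 = \dots = \rho_L = 0$ vs H1: at least one $\rho_k \neq 0$.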
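As a hedged sketch, the statistic can be computed with the standard library alone. It assumes the usual Ljung-Box form $Q = n(n+2) \sum_{k=1}^{L} r_k^2 / (n-k)$ with the sample autocorrelations $r_k$; the function names are hypothetical, and the chi-square critical value (L degrees of freedom) must be supplied by the reader from a table.

```python
def sample_acf(series, lag):
    """Sample autocorrelation r_k of `series` at lag k."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / sum(d * d for d in dev)

def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag (requires max_lag < n)."""
    n = len(series)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < len(series)")
    return n * (n + 2) * sum(
        sample_acf(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# Usage: reject the null hypothesis of no autocorrelation up to lag L when Q
# exceeds the chi-square critical value with L degrees of freedom
# (5.991 at the 5% level for L = 2).
q = ljung_box_q([1.0, -1.0] * 10, max_lag=2)
print(q > 5.991)
```

Because Q pools the first L lags into one statistic, a single test replaces L separate tests and no per-lag correction of α is needed.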