# Why resampling?

  • These methods refit a model of interest by sampling from the training set, in order to obtain additional information about the fitted model
  • Can be computationally expensive, because they involve fitting the same statistical method multiple times using different subsets of the training data (though modern computing power makes this less of a limitation)

# Cross-validation and the bootstrap

We discuss two resampling methods: cross-validation and the bootstrap. They provide

  • estimates of test-set prediction error
  • cross-validation can be used to select the appropriate level of flexibility

    A hyperparameter is a parameter whose value is used to control the learning process and cannot be inferred while fitting the model to the training set

    • Model hyperparameters
    • Algorithm hyperparameters
  • The bootstrap helps us obtain the standard error of our parameter estimates; it is also used to conduct ensemble learning (e.g., bagging)

# Validation-set approach

We randomly divide the available set of samples into two parts: a training set and a validation (hold-out) set

  • The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set
  • The resulting validation-set error provides an estimate of the test error; this estimate can be used to select the best model and to give an idea of the test error of the final chosen model (see the sketch after this list)
    In plain terms: randomly split the sample into a training set and a validation set, fit candidate models on the training set, and predict the validation observations; the resulting error estimates the test error
  • Drawbacks
    • The validation estimate of the test error can be highly variable
    • only a subset of the observations (those included in the training set) are used to fit the model
    • the validation-set error may therefore overestimate the test error for the model fit on the entire data set, since statistical methods tend to perform worse when trained on fewer observations
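
A minimal sketch of the validation-set approach, using scikit-learn and simulated data (the linear model and the 50/50 split are illustrative assumptions, not part of the notes):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulated data; any (X, y) sample would do
X = rng.normal(size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)

# Randomly divide into a training set and a validation (hold-out) set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Fit on the training set, then predict the validation responses
model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(val_mse)  # validation-set estimate of the test error
```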

# Training versus testing-set performance

Recall the distinction between the test error and the training error:

  • Test error: the average error that results from using a statistical learning method to predict the response on a new observation
  • Training error: can be easily calculated by applying the statistical learning method to the observations used in its training; it typically underestimates the test error

# More on prediction-error estimates

The best solution: a large designated test set. Often not available

  1. make a mathematical adjustment to the training error rate in order to estimate the test error rate

    These include the $C_p$ statistic, AIC, and BIC

  2. a class of methods that estimate the test error by holding out a subset of the training observations from the fitting process and then applying the statistical learning method to those held-out observations

# Leave-One-Out Cross-Validation (LOOCV)

A single observation $(x_1, y_1)$ is used for the validation set, and the remaining observations make up the training set; this is done in turn for each observation $i = 1, \dots, n$

$CV_{(n)}=\frac{1}{n}\sum_{i=1}^n MSE_i$, where $MSE_i=(y_i-\hat{y}_i)^2$
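
A minimal LOOCV sketch with scikit-learn's LeaveOneOut splitter; the linear model and simulated data are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)

# Each of the n splits holds out a single observation (x_i, y_i)
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(), scoring="neg_mean_squared_error",
)
cv_n = -scores.mean()  # CV_(n): average of the n MSE_i values
print(cv_n)
```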

# K-fold cross-validation (Widely used approach for estimating test error)

The idea is to randomly divide the data into $k$ equal-sized parts.

  1. We leave out part $i$ and fit the model to the other $k-1$ parts
  2. We obtain predictions for the left-out $i$th part; this is done in turn for each part $i = 1, 2, \dots, k$
  3. The results are combined

# The details

Let the $k$ parts be $C_1, C_2, \dots, C_k$.
There are $n_i$ observations in part $i$.

$CV_{(k)}=\sum_{i=1}^k \frac{n_i}{n} MSE_i$

  • $MSE_i=\frac{1}{n_i}\sum_{j\in C_i}(y_j-\hat{y}_j)^2$, where $\hat{y}_j$ is the fit for observation $j$, obtained with part $i$ removed
  • Setting $k=n$ yields $n$-fold or leave-one-out cross-validation (LOOCV)

    One typically performs $k$-fold CV using $k=5$ or $k=10$; a sketch with $k=5$ follows
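
A sketch of $k$-fold CV with $k=5$; the model and data are again illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)

# Randomly divide the data into k = 5 parts; each part is held out
# once while the model is fit to the other k - 1 parts
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y, cv=kf, scoring="neg_mean_squared_error"
)
cv_k = -scores.mean()  # combine the per-fold MSE_i into CV_(k)
print(cv_k)
```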

# A nice special case

With least-squares linear or polynomial regression, a shortcut makes the cost of LOOCV the same as that of a single model fit:

$CV_{(n)}=\frac{1}{n}\sum_{i=1}^n \left(\frac{y_i-\hat{y}_i}{1-h_i}\right)^2$

Where

  • $\hat{y}_i$ is the $i$th fitted value from the original least-squares fit
  • $h_i$ is the leverage
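
A sketch verifying the shortcut against brute-force LOOCV for a simple least-squares fit (simulated data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Single least-squares fit on the full data
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Leverages h_i: diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

# LOOCV via the shortcut formula: no refitting needed
cv_shortcut = np.mean((resid / (1.0 - h)) ** 2)

# Brute-force LOOCV for comparison: refit n times
errs = []
for i in range(n):
    mask = np.arange(n) != i
    b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    errs.append((y[i] - X[i] @ b) ** 2)

print(cv_shortcut, np.mean(errs))  # the two estimates agree
```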

LOOCV is sometimes useful but typically doesn’t shake up the data enough

  • still time-consuming
  • The estimates from each fold are highly correlated and hence their average can have a high variance

# Bias-variance trade-off for cross-validation

  • Since each training set is only $(k-1)/k$ as big as the original training set, the estimates of prediction error will typically be biased upward
  • This bias is minimized when $k=n$ (LOOCV), but this estimate has high variance; $k=5$ or $k=10$ provides a good compromise

# The Bootstrap

The bootstrap is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method
For example, it can provide

  • an estimate of the standard error of a coefficient
  • a confidence interval for that coefficient

# Bootstrap of $\alpha$

  • Denoting the first bootstrap data set by $Z^{*1}$, we use $Z^{*1}$ to produce a new bootstrap estimate for $\alpha$, which we call $\hat{\alpha}^{*1}$
  • This procedure is repeated $B$ times for some large value of $B$ (say 1000), in order to produce $B$ different bootstrap data sets, $Z^{*1}, Z^{*2}, \dots, Z^{*B}$, and $B$ corresponding estimates,

    $\hat{\alpha}^{*1}, \hat{\alpha}^{*2}, \dots, \hat{\alpha}^{*B}$

  • We estimate the standard error of these bootstrap estimates using the formula

    $SE_B(\hat{\alpha})=\sqrt{\frac{1}{B-1}\sum_{r=1}^B\left(\hat{\alpha}^{*r}-\frac{1}{B}\sum_{r'=1}^B\hat{\alpha}^{*r'}\right)^2}$
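
A minimal sketch of this procedure; since the notes don't define $\alpha$, the sample mean stands in as a hypothetical estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Original data set Z
Z = rng.normal(loc=5.0, scale=2.0, size=100)

def alpha_hat(z):
    return z.mean()  # stand-in estimator; alpha is not defined in the notes

B = 1000  # number of bootstrap data sets
boot = np.empty(B)
for r in range(B):
    # Z*r: draw n observations from Z with replacement
    Z_star = rng.choice(Z, size=Z.size, replace=True)
    boot[r] = alpha_hat(Z_star)

# SE_B: sample standard deviation of the B bootstrap estimates
se_boot = boot.std(ddof=1)
print(se_boot)  # close to the true SE, 2 / sqrt(100) = 0.2
```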

# Other uses of the bootstrap

  • Primarily used to obtain standard errors of an estimate, the bootstrap also provides approximate confidence intervals for a population parameter

# Bootstrap for confidence interval

  1. Generate $n$ “bootstrap sample” data points $(x_i^*, y_i^*)$ by sampling with replacement
  2. Fit a linear regression using $(x_i^*, y_i^*)$
  3. Evaluate the regression line on a fixed $x$-grid
  4. Repeat steps 1-3 $B$ times and collect the values from step 3
  5. For each point in the $x$-grid, calculate the confidence interval from the collected values (e.g., using empirical percentiles); a sketch follows
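
A sketch of steps 1-5 with simulated data; the grid, $B$, and the 95% percentile interval are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (x, y) data; any sample would serve
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.5 + 0.8 * x + rng.normal(scale=2.0, size=n)

x_grid = np.linspace(0, 10, 50)  # fixed x-grid for step 3
B = 1000
lines = np.empty((B, x_grid.size))

for b in range(B):
    # Step 1: resample n (x_i*, y_i*) pairs with replacement
    idx = rng.integers(0, n, size=n)
    # Step 2: fit a linear regression to the bootstrap sample
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    # Step 3: evaluate the fitted line on the grid
    lines[b] = intercept + slope * x_grid

# Step 5: pointwise 95% interval from the empirical percentiles
lower = np.percentile(lines, 2.5, axis=0)
upper = np.percentile(lines, 97.5, axis=0)
print(lower[:3], upper[:3])
```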

# The bootstrap in general

In more complex data situations, for example

  • if the data is a time series, we can’t simply sample the observations with replacement (they are not i.i.d.)
  • We can instead create blocks of consecutive observations and sample those with replacement; we then paste the sampled blocks together to obtain a bootstrap data set (see the sketch below)
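
A sketch of this block bootstrap; the series and the block length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy time series (a random walk); real use would supply actual data
series = np.cumsum(rng.normal(size=120))

block_len = 10  # assumed block length; a tuning choice
n_blocks = series.size // block_len

# Create consecutive, non-overlapping blocks of observations
blocks = series[: n_blocks * block_len].reshape(n_blocks, block_len)

# Sample blocks with replacement, then paste them together
chosen = rng.integers(0, n_blocks, size=n_blocks)
boot_series = blocks[chosen].reshape(-1)
print(boot_series[:5])
```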

# Can the bootstrap estimate prediction error?

In cross-validation, each validation fold is distinct from the folds used for training: there is no overlap. This is crucial for its success.

  • To estimate prediction error using the bootstrap,
    • we could think about using each bootstrap data set as our training sample and the original sample as our validation sample
    • But each bootstrap sample has significant overlap with the original data: about two-thirds of the original data points appear in each bootstrap sample, causing the bootstrap to underestimate the true prediction error
  • Can partly fix this problem by only using predictions for those observations that did not (by chance) occur in the current bootstrap sample (a sketch of this out-of-bag idea follows this list)
  • In the end, cross-validation provides a simpler, more attractive approach for estimating prediction error
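
A sketch of the out-of-bag fix mentioned above: for each bootstrap fit, predictions are kept only for observations that the resampling happened to leave out (model and data are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

B = 200
sq_errs = [[] for _ in range(n)]  # per-observation squared errors

for b in range(B):
    idx = rng.integers(0, n, size=n)       # bootstrap training sample
    oob = np.setdiff1d(np.arange(n), idx)  # observations left out by chance
    model = LinearRegression().fit(X[idx], y[idx])
    for i, p in zip(oob, model.predict(X[oob])):
        sq_errs[i].append((y[i] - p) ** 2)

# Average each observation's out-of-bag errors, then average over observations
oob_mse = np.mean([np.mean(e) for e in sq_errs if e])
print(oob_mse)
```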