# Notation
- $X$: predictor variable (feature)
  - We can refer to the input vector collectively as $X = (X_1, X_2, \ldots, X_p)^T$
  - Vectors are represented as column vectors
- $Y$: response variable (target)
  - Usually a scalar; if we have $n$ observations, we can represent them as a vector $\mathbf{y} = (y_1, \ldots, y_n)^T$
- We write our model as
  $$Y = f(X) + \varepsilon,$$
  where $\varepsilon$ captures measurement errors and other discrepancies and has a mean of zero.
# How to estimate $f$?

The distribution that generates the data is unknown, so we must estimate $f$ from observations.
- Choose a model $f_\theta$
  - Parametric: assumes a specific form for $f$, estimating a fixed set of parameters $\theta$ by fitting or training.
  - Non-parametric: does not assume a specific form, but needs a larger amount of data.
- Choose a quality measure
  - Loss function: measures how well a model fits the data.
  - Common loss functions:
    - Mean Squared Error (MSE), for regression problems: $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{f}(x_i))^2$
    - Cross-Entropy Loss, for classification problems
- Optimize (fitting)
  - Find the best parameters $\hat{\theta}$
  - Use calculus to find a closed-form solution, gradient descent, the expectation-maximization (EM) algorithm, etc.
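As a minimal sketch of two of these optimization routes (Python with NumPy; the simulated data, the "true" coefficients 2 and 3, the learning rate, and the iteration count are all illustrative assumptions), the snippet below fits a one-feature linear model both by the closed-form normal equations and by gradient descent on the MSE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 2 + 3x + noise (coefficients chosen arbitrarily for illustration)
n = 200
x = rng.uniform(-1, 1, n)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column

# Closed-form solution via the normal equations: beta = (X^T X)^{-1} X^T y
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the MSE loss
beta = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = (2 / n) * X.T @ (X @ beta - y)  # gradient of the mean squared error
    beta -= lr * grad

print(beta_closed, beta)  # the two estimates should agree closely
```

Because the MSE of a linear model is convex, both routes reach the same estimate; gradient descent is the fallback when no closed form exists.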
# Nearest Neighbor Averaging
Typically we have few (if any) data points at any given $x$, so we can use the average of nearest neighbors to estimate $f(x)$.

We cannot compute $f(x) = E(Y \mid X = x)$ directly, but we can let $\hat{f}(x) = \mathrm{Ave}(Y \mid X \in \mathcal{N}(x))$, where $\mathcal{N}(x)$ is some neighborhood of $x$.

Curse of dimensionality: as the number of features increases, the volume of the feature space grows exponentially, making data points sparse. This sparsity makes it difficult to find nearby neighbors, reducing the effectiveness of nearest-neighbor methods.
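A minimal sketch of this neighborhood average in one dimension (Python with NumPy; the helper name `knn_average`, the true function $\sin(x)$, and all constants are assumptions made for illustration):

```python
import numpy as np

def knn_average(x0, x_train, y_train, k=5):
    """Estimate f(x0) by averaging the responses of the k nearest training points."""
    dist = np.abs(x_train - x0)        # distances to x0 in 1-D
    nearest = np.argsort(dist)[:k]     # indices of the k closest neighbors
    return y_train[nearest].mean()

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 10, 500)
y_train = np.sin(x_train) + rng.normal(0, 0.2, 500)  # true f(x) = sin(x) plus noise

est = knn_average(5.0, x_train, y_train, k=20)
print(est)  # should be close to sin(5.0)
```

In one dimension with 500 points this works well; as the curse of dimensionality suggests, the same 500 points spread over many features would leave every neighborhood nearly empty.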
# Parametric and structured models
The linear model is an important example of a parametric model:
$$f_L(X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
- Specified in terms of $p + 1$ parameters: $\beta_0, \beta_1, \ldots, \beta_p$
- We estimate the parameters by fitting the model to the training data.
- It often serves as a good and interpretable approximation to the true function $f$.
# Some tradeoffs
- Prediction accuracy vs. interpretability
  - Linear models are more interpretable; thin-plate splines are not
- Good fit vs. overfit
- Parsimony vs. black box
  - Prefer a simpler model over a black-box predictor
# Assessing model accuracy
Suppose we fit a model $\hat{f}(x)$ using training data $\mathrm{Tr} = \{x_i, y_i\}_{i=1}^{n}$, and we have a separate test data set $\mathrm{Te} = \{x_i, y_i\}_{i=1}^{m}$.
- Training MSE: $\mathrm{MSE}_{\mathrm{Tr}} = \mathrm{Ave}_{i \in \mathrm{Tr}}\left[(y_i - \hat{f}(x_i))^2\right]$
- Test MSE: $\mathrm{MSE}_{\mathrm{Te}} = \mathrm{Ave}_{i \in \mathrm{Te}}\left[(y_i - \hat{f}(x_i))^2\right]$
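To see both quantities in action, here is a sketch (assuming NumPy; the true function, noise level, sample sizes, and polynomial degrees are all made up) that fits polynomials of increasing flexibility and reports both MSEs:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return np.sin(2 * x)              # the (normally unknown) true function

def simulate(n):
    x = rng.uniform(0, 3, n)
    return x, f(x) + rng.normal(0, 0.3, n)

x_tr, y_tr = simulate(50)             # training set Tr
x_te, y_te = simulate(1000)           # test set Te

mse_tr_by_deg, mse_te_by_deg = {}, {}
for degree in [1, 3, 5, 10]:          # increasing flexibility
    coefs = np.polyfit(x_tr, y_tr, degree)
    mse_tr_by_deg[degree] = np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)
    mse_te_by_deg[degree] = np.mean((y_te - np.polyval(coefs, x_te)) ** 2)
    print(f"degree={degree:2d}  MSE_Tr={mse_tr_by_deg[degree]:.3f}  "
          f"MSE_Te={mse_te_by_deg[degree]:.3f}")
```

Typically the training MSE keeps shrinking as the degree grows, while the test MSE bottoms out at a moderate degree and then tends to rise again.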
Figure notes (three simulated examples):
- A true relationship that is more complex than linear
- A cubic relationship with small $\varepsilon$: even when overfitting stays close to the black curve, the upturn in test MSE only becomes visible at a very high polynomial degree
- A roughly linear relationship with moderate $\varepsilon$
Interpretation of the curves:
- Black curve: true function $f$
- Red curve: $\mathrm{MSE}_{\mathrm{Te}}$ (test MSE)
- Grey curve: $\mathrm{MSE}_{\mathrm{Tr}}$ (training MSE)
- Orange, blue, green curves: fits of different flexibility
As model flexibility increases:
- $\mathrm{MSE}_{\mathrm{Tr}}$ always decreases
- $\mathrm{MSE}_{\mathrm{Te}}$ decreases initially, then increases (U-shape)
# Bias-Variance Tradeoff
Suppose we have fit a model $\hat{f}(x)$ to some training data $\mathrm{Tr}$, and let $(x_0, y_0)$ be a test observation drawn from the population.
The expected test MSE at a point $x_0$ can be decomposed into three fundamental components:
$$E\left[(y_0 - \hat{f}(x_0))^2\right] = \mathrm{Var}(\hat{f}(x_0)) + \left[\mathrm{Bias}(\hat{f}(x_0))\right]^2 + \mathrm{Var}(\varepsilon)$$
Note that $\mathrm{Bias}(\hat{f}(x_0)) = E[\hat{f}(x_0)] - f(x_0)$.
| Quantity | As model flexibility increases |
|---|---|
| Complexity | Increases |
| Bias | Decreases |
| Variance | Increases |
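The bias and variance rows can be estimated empirically: refit the model on many independent training sets and examine the predictions at one fixed test point $x_0$ (a sketch in Python with NumPy; the true function, noise level, and all constants are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return np.sin(2 * x)              # true function (known only because we simulate)

x0, sigma = 1.5, 0.3                  # fixed test point and noise standard deviation
n_train, n_sims = 50, 500

results = {}
for degree in [1, 10]:                # low vs. high flexibility
    preds = np.empty(n_sims)
    for s in range(n_sims):           # a fresh training set for every simulation
        x = rng.uniform(0, 3, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        preds[s] = np.polyval(np.polyfit(x, y, degree), x0)
    bias2 = (preds.mean() - f(x0)) ** 2   # squared bias of f_hat(x0)
    var = preds.var()                     # variance of f_hat(x0) across training sets
    results[degree] = (bias2, var)
    print(f"degree={degree:2d}  bias^2={bias2:.4f}  variance={var:.4f}")
```

The rigid degree-1 fit shows large squared bias and small variance; the flexible degree-10 fit reverses that, matching the table.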

# Tradeoff for three examples
To obtain a good fitted model, select the one with the smallest test MSE.



