# Linear Regression
Linear regression is a simple supervised learning method that assumes the dependence of $Y$ on $X_1, X_2, \dots, X_p$ is linear.
Although it may look overly simplistic, linear regression is extremely useful both conceptually and in practice.
# Simple Linear Regression
We assume a model: $Y = \beta_0 + \beta_1 X + \epsilon$, where $\beta_0$ and $\beta_1$ are unknown constants representing the intercept and slope, and $\epsilon$ is the error term, assumed to be i.i.d. and normally distributed (these assumptions are often summarized by the acronym LINE: Linearity, Independence, Normality, Equal variance).
# Estimation of the parameters
- Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction of $Y$ when $X = x_i$. Then the residual is defined as $e_i = y_i - \hat{y}_i$.
- Residual sum of squares: $\mathrm{RSS} = e_1^2 + e_2^2 + \dots + e_n^2 = \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$. The least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are the values that minimize the RSS.
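As a concrete illustration, here is a minimal NumPy sketch (synthetic data, illustrative variable names) of the closed-form least squares estimates that minimize the RSS:

```python
import numpy as np

# Synthetic data from a known line y = 2 + 3x + noise (illustrative values).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=100)

# Closed-form least squares estimates that minimize the RSS.
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

residuals = y - (beta0_hat + beta1_hat * x)
rss = np.sum(residuals ** 2)
print(beta0_hat, beta1_hat, rss)
```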
# Assessing the Accuracy

- Generate data points (black dots) from the red line (the true relationship)
- Fit a linear regression to those data points to obtain the blue line (the least squares estimate)
- Repeat steps 1 and 2 many times; the spread of the resulting fitted lines corresponds to the confidence interval
To quantify this variability:
The standard error of an estimate reflects how much it varies under repeated sampling:
$$\mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]$$
where $\sigma^2 = \mathrm{Var}(\epsilon)$.
These standard errors can be used to compute confidence intervals; a 95% confidence interval for $\beta_1$ is approximately $\hat{\beta}_1 \pm 2 \cdot \mathrm{SE}(\hat{\beta}_1)$.
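Continuing the sketch above (reusing `x`, `x_bar`, `rss`, and the fitted coefficients), the standard errors and an approximate 95% confidence interval can be computed directly:

```python
import numpy as np

n = len(x)
sigma2_hat = rss / (n - 2)                      # estimate of sigma^2 = Var(epsilon)
sxx = np.sum((x - x_bar) ** 2)

se_beta1 = np.sqrt(sigma2_hat / sxx)
se_beta0 = np.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / sxx))

# Approximate 95% confidence interval for the slope.
ci_beta1 = (beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1)
print(se_beta0, se_beta1, ci_beta1)
```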
# Hypothesis Testing
Standard errors can also be used to perform hypothesis tests. The most common one involves
testing the null hypothesis $H_0$: there is no relationship between $X$ and $Y$ (i.e., $\beta_1 = 0$)
vs. the alternative hypothesis $H_A$: there is some relationship between $X$ and $Y$ (i.e., $\beta_1 \neq 0$).
To test $H_0$, we compute the t-statistic $t = \hat{\beta}_1 / \mathrm{SE}(\hat{\beta}_1)$, which follows a t-distribution with $n - 2$ degrees of freedom under $H_0$, and report the corresponding p-value.
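A small continuation of the same sketch, computing the t-statistic and two-sided p-value for $H_0: \beta_1 = 0$ (using `scipy.stats` for the t-distribution):

```python
import numpy as np
from scipy import stats

t_stat = beta1_hat / se_beta1                        # from the previous block
p_value = 2 * stats.t.sf(np.abs(t_stat), df=n - 2)   # two-sided p-value
print(t_stat, p_value)
```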
# Overall Accuracy of the Model
- The Residual Standard Error (RSE) (smaller is better): $\mathrm{RSE} = \sqrt{\frac{1}{n-2}\mathrm{RSS}}$
- R-squared, the fraction of variance explained (larger is better): $R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$,
where $\mathrm{TSS} = \sum_{i=1}^{n}(y_i - \bar{y})^2$ is the total sum of squares.
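Continuing the same sketch, both accuracy measures are one-liners:

```python
import numpy as np

rse = np.sqrt(rss / (n - 2))           # residual standard error
tss = np.sum((y - y_bar) ** 2)         # total sum of squares
r_squared = 1 - rss / tss
print(rse, r_squared)
```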
# Multiple Linear Regression
Here our model is: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \epsilon$
We estimate the coefficients by minimizing the residual sum of squares: $\mathrm{RSS} = \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \dots - \hat{\beta}_p x_{ip}\right)^2$
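A minimal sketch of fitting a multiple linear regression by ordinary least squares on synthetic data (all names illustrative); `np.linalg.lstsq` finds the coefficients that minimize the RSS:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))                     # p = 3 synthetic predictors
beta_true = np.array([1.0, -2.0, 0.5])
y = 0.7 + X @ beta_true + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])           # add an intercept column
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_hat) ** 2)
print(beta_hat, rss)
```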
Is at least one of the predictors $X_1, X_2, \dots, X_p$ useful in predicting the response?
We can use the F-statistic $F = \dfrac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}$
Note that
- if the linear model assumptions hold, $E\left[\mathrm{RSS}/(n - p - 1)\right] = \sigma^2$
- if $H_0$ (all coefficients are zero) holds, $E\left[(\mathrm{TSS} - \mathrm{RSS})/p\right] = \sigma^2$, so we expect $F$ to be close to 1 when $H_0$ is true and greater than 1 otherwise
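Continuing the multiple regression sketch above, the overall F-statistic and its p-value:

```python
import numpy as np
from scipy import stats

tss = np.sum((y - y.mean()) ** 2)
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)       # P(F_{p, n-p-1} > f_stat)
print(f_stat, p_value)
```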
Is only a subset of the predictors useful?
To examine whether a particular subset of $q$ coefficients is zero, i.e., $H_0: \beta_{p-q+1} = \beta_{p-q+2} = \dots = \beta_p = 0$, we can use the following F-statistic: $F = \dfrac{(\mathrm{RSS}_0 - \mathrm{RSS})/q}{\mathrm{RSS}/(n - p - 1)}$,
where we fit a second model that uses all the variables except those last $q$ to get $\mathrm{RSS}_0$.
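Continuing the same sketch, a partial F-test for dropping the last $q$ predictors (here $q = 1$, purely illustrative):

```python
import numpy as np
from scipy import stats

q = 1
X1_reduced = X1[:, :-q]                              # drop the last q predictors
beta_red, *_ = np.linalg.lstsq(X1_reduced, y, rcond=None)
rss0 = np.sum((y - X1_reduced @ beta_red) ** 2)      # RSS of the reduced model

f_subset = ((rss0 - rss) / q) / (rss / (n - p - 1))
print(f_subset, stats.f.sf(f_subset, q, n - p - 1))
```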
Deciding on the important variables
- If $p > n$: there is no unique least squares solution, and the F-test cannot be used
- If the overall F-test rejects $H_0$: at least one predictor is associated with the response
- The most direct approach is all subsets or best subset regression: we try every possible combination of predictors and pick the one that minimizes the RSS (see the sketch after this list). However, since the number of combinations, $2^p$, grows exponentially with $p$, the computation quickly becomes very expensive
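A minimal sketch of best subset selection over the synthetic `X`, `y` above; for each model size it keeps the subset with the smallest RSS, and the $2^p$ blow-up shows up as the nested loop over all subsets:

```python
from itertools import combinations
import numpy as np

def best_subsets(X, y):
    """For each model size k, return the predictor subset with the smallest RSS."""
    n, p = X.shape
    best = {}
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):     # 2^p - 1 subsets in total
            Xk = np.column_stack([np.ones(n), X[:, subset]])
            beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
            rss = np.sum((y - Xk @ beta) ** 2)
            if k not in best or rss < best[k][1]:
                best[k] = (subset, rss)
    return best

print(best_subsets(X, y))
```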
How well does the model fit the data?
- It can be shown that in this case $R^2 = \mathrm{Cor}(Y, \hat{Y})^2$, the square of the correlation between the response and the fitted values
- Compute the RSE; when it is small compared to the range of the response, the model fits the data well
- How close $\hat{Y}$ is to the true regression function $f(X)$ can be quantified by a confidence interval
Given a set of predictor values, what response value should we predict, and how accurate is our prediction?
A prediction interval (PI) is always wider than a confidence interval (CI), because it also accounts for the error of an individual prediction (the irreducible error $\epsilon$).
For multiple linear regression, see here.
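A self-contained sketch (simple linear regression, synthetic data, illustrative names) showing why the prediction interval is always wider: it adds the irreducible error term inside the square root:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=100)
n, x_bar = len(x), x.mean()

b1 = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
b0 = y.mean() - b1 * x_bar
rse = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x0 = 5.0                                             # new prediction point
y0_hat = b0 + b1 * x0
t_crit = stats.t.ppf(0.975, df=n - 2)
h0 = 1.0 / n + (x0 - x_bar) ** 2 / np.sum((x - x_bar) ** 2)

ci = y0_hat + np.array([-1, 1]) * t_crit * rse * np.sqrt(h0)        # 95% CI for f(x0)
pi = y0_hat + np.array([-1, 1]) * t_crit * rse * np.sqrt(1.0 + h0)  # 95% PI for y0
print(ci, pi)                                        # the PI is the wider interval
```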
# Other Considerations in the Regression Model
# Qualitative Predictors
- They take on a discrete set of values
- Also called categorical predictors or factor variables
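A minimal pandas sketch of turning a qualitative predictor into dummy variables (the data and column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "balance": [500.0, 1200.0, 300.0, 900.0],
    "region":  ["East", "West", "South", "West"],    # factor with 3 levels
})

# drop_first=True creates k-1 dummy columns for a k-level factor,
# with the dropped level acting as the baseline.
X = pd.get_dummies(df, columns=["region"], drop_first=True)
print(X)
```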
# Potential Problems
When we fit a linear regression model to a particular data set, many problems may arise, including:
- Model assumptions (the letters below refer to the LINE assumptions)
- Non-linearity of the data -> L
- The linear regression model assumes that there is a straight-line relationship between the predictors and the response
- Residual plots are a useful graphical tool for identifying non-linearity
- The red line is a smooth fit to the residuals, which is displayed in order to make it easier to identify any trends
The residuals exhibit a clear U-shape in the left panel, which provides a strong indication of non-linearity in the data
The right-hand panel displays the residual plot from a model that contains a quadratic term.
Conclusion: there appears to be little pattern in the residuals, suggesting that the quadratic term improves the fit to the data.
- Correlated errors -> I
An important assumption of the linear regression model is that the error terms, $\epsilon_1, \epsilon_2, \dots, \epsilon_n$, are uncorrelated. What does this mean?
- $\epsilon_i$ being positive provides little or no information about the sign of $\epsilon_{i+1}$
- As an extreme example, suppose we accidentally doubled our data and ignored the fact that we had done so.
- Our standard errors: computed as if the sample size were $2n$
- Our estimated parameters: the same
- Our confidence intervals: narrower by a factor of $\sqrt{2}$
Plots of residuals from simulated time series data sets generated with differing levels of correlation $\rho$ between error terms at adjacent time points:
In the top panel, we see the residuals from a linear regression fit to data generated with uncorrelated errors.
The center panel illustrates a moderate case in which the residuals have a correlation of 0.5.
The residuals in the bottom panel show a clear pattern: adjacent residuals tend to take on similar values.
- Non-constant variance of error terms -> E
It is often the case that the variances of the error terms are non-constant.
e.g., as the value of the response increases, the variance of the error terms also increases.
This phenomenon is called heteroscedasticity.
- Data collection issues
- Outliers (unusual $y_i$)
- An outlier is a point for which $y_i$ is far from the value predicted by the model
- Outliers can arise for a variety of reasons, such as incorrect recording of an observation during data collection
- Observations whose studentized residuals are greater than 3 in absolute value are possible outliers
- It can cause other problems, e.g., inflating the RSE: since the RSE ($\hat{\sigma}$) is used to compute all confidence intervals and p-values, such a dramatic increase caused by a single data point can have implications for the interpretation of the final model
- If we believe that an outlier has occurred, one solution is to simply remove the observation. However, care should be taken, since an outlier may instead indicate a deficiency with the model
- High leverage points (unusual $x_i$)
- For a simple linear regression, $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{i'=1}^{n}(x_{i'} - \bar{x})^2}$
- In general, the leverage statistic $h_i$ is always between $1/n$ and $1$, and the average leverage over all observations is $(p+1)/n$
- If a given observation has a leverage statistic substantially larger than $(p+1)/n$, then that observation has high leverage
- A point whose removal would greatly change the regression equation is an influential observation
- An influential point typically has high leverage, but high-leverage points are not necessarily influential
- Cook's distance can be used to measure the influence of observation $i$: $D_i = \dfrac{\sum_{j=1}^{n}\left(\hat{y}_j - \hat{y}_{j(i)}\right)^2}{(p+1)\,\hat{\sigma}^2}$,
where $\hat{y}_{j(i)}$ is the fitted value for observation $j$ obtained when observation $i$ is excluded from the fit
- Collinearity
Collinearity refers to the situation in which two or more predictor variables are closely related to one another
- A better way to assess multicollinearity is the Variance Inflation Factor (VIF): $\mathrm{VIF}(\hat{\beta}_j) = \dfrac{1}{1 - R^2_{X_j \mid X_{-j}}}$,
where $R^2_{X_j \mid X_{-j}}$ is the R-squared obtained by regressing $X_j$ against all the other predictors
A VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity (a sketch computing VIF and the other diagnostics appears at the end of this list)
- When faced with the problem of collinearity, we can:
- Drop one of the problematic variables since it is redundant
- Combine the collinear variables into a single predictor
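A self-contained NumPy sketch of the diagnostics above, computed on synthetic data with two deliberately collinear predictors: leverage, studentized residuals, Cook's distance, and VIF (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.normal(size=(n, p))
X[:, 2] = 0.95 * X[:, 1] + 0.05 * rng.normal(size=n)   # make X3 nearly collinear with X2
y = 1.0 + X @ np.array([2.0, -1.0, 0.0]) + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])
H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T                # hat matrix
leverage = np.diag(H)                                   # h_i; its average is (p+1)/n

beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ beta_hat
sigma2_hat = np.sum(resid ** 2) / (n - p - 1)
stud_resid = resid / np.sqrt(sigma2_hat * (1 - leverage))           # studentized residuals
cooks_d = stud_resid ** 2 * leverage / ((1 - leverage) * (p + 1))   # Cook's distance

# VIF for each predictor: regress X_j on the other predictors, then 1 / (1 - R^2).
vif = []
for j in range(p):
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    r2 = 1 - np.sum((X[:, j] - others @ coef) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    vif.append(1.0 / (1 - r2))

print(leverage.mean(), np.abs(stud_resid).max(), cooks_d.max(), vif)
```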
# Linear Regression vs. K-Nearest Neighbors
We consider one of the simplest and best-known non-parametric methods: K-nearest neighbors regression (KNN regression).
- Given a value for $K$ and a prediction point $x_0$, KNN regression first identifies the $K$ training observations that are closest to $x_0$, represented by $\mathcal{N}_0$
- It then estimates $f(x_0)$ using the average of all the training responses in $\mathcal{N}_0$. In other words, $\hat{f}(x_0) = \frac{1}{K}\sum_{x_i \in \mathcal{N}_0} y_i$
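A minimal sketch of KNN regression with a single predictor (synthetic data, illustrative names):

```python
import numpy as np

def knn_predict(x_train, y_train, x0, K=3):
    """Average the responses of the K training points closest to x0."""
    dist = np.abs(x_train - x0)              # distances to the query point
    nearest = np.argsort(dist)[:K]           # indices of the K nearest neighbors
    return y_train[nearest].mean()

rng = np.random.default_rng(4)
x_train = rng.uniform(0, 10, 50)
y_train = np.sin(x_train) + rng.normal(scale=0.2, size=50)
print(knn_predict(x_train, y_train, x0=2.5, K=5))
```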
# Generalizations of the Linear Model
Methods that expand the scope of linear models and how they are fit:
- Classification problems: logistic regression, support vector machines
- Non-linearity: kernel smoothing, splines and generalized additive model
- Interactions: Tree-based methods, bagging, random forests and boosting (these also capture non-linearities)
- Regularized fitting: Ridge regression and lasso





