Statistics (5)


Hypothesis Testing

  • Null hypothesis: $H_0$
  • Alternative hypothesis: $H_1$
  • Type I error: rejecting $H_0$ when $H_0$ is true
    Significance level: $\alpha = P(\text{Type I error})$
  • Type II error: accepting $H_0$ when $H_0$ is false
    Power: $1 - \beta = P(\text{Reject } H_0 \mid H_1 \text{ is true})$, where $\beta = P(\text{Type II error})$

    Each entry below is the probability of the decision (row) given the true state (column):
    $$\begin{array}{c|cc} & H_0 \text{ true} & H_1 \text{ true} \\ \hline \text{accept } H_0 & 1 - \alpha & \beta \\ \text{reject } H_0 & \alpha & 1 - \beta \end{array}$$

Example: Normal Distribution

Let $X_1, X_2, \dots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $\mu$ is unknown and $\sigma^2$ is known.
Goal: find a level-$\alpha$ test for $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$.

  1. Under the null hypothesis,
    $$\begin{aligned} X_1, X_2, \dots, X_n \overset{\text{iid}}{\sim} N(\mu_0, \sigma^2) &\Rightarrow \bar{X}_n \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \\[10pt] &\Rightarrow \frac{\sqrt{n}(\bar{X}_n - \mu_0)}{\sigma} \overset{H_0}{\sim} N(0, 1) \end{aligned}$$
  2. There exists a constant $c > 0$ such that, with $Z \sim N(0, 1)$,
    $$\begin{aligned} \alpha &= P(|\bar{X}_n - \mu_0| > c \mid H_0) \\[10pt] &= P\left(\frac{|\bar{X}_n - \mu_0|}{\sigma/\sqrt{n}} > \frac{c}{\sigma/\sqrt{n}} \,\middle|\, H_0\right) \\[10pt] &= P\left(Z > \frac{c}{\sigma/\sqrt{n}}\right) + P\left(Z < -\frac{c}{\sigma/\sqrt{n}}\right) \end{aligned}$$
  3. Therefore we can choose $\displaystyle c = Z\left(\frac{\alpha}{2}\right) \frac{\sigma}{\sqrt{n}}$, where $Z(\alpha/2)$ is the upper $\alpha/2$ quantile of $N(0, 1)$, that is, $P(Z > Z(\alpha/2)) = \alpha/2$. The level-$\alpha$ test is then (1 means reject $H_0$, 0 means do not reject)
    $$\phi = \begin{cases} 1 & \text{if } |\bar{X}_n - \mu_0| > \displaystyle Z\left(\frac{\alpha}{2}\right) \frac{\sigma}{\sqrt{n}} \\[10pt] 0 & \text{if } |\bar{X}_n - \mu_0| \leq \displaystyle Z\left(\frac{\alpha}{2}\right) \frac{\sigma}{\sqrt{n}} \end{cases}$$
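The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function name `z_test`, the sample values, and the choice $\sigma = 0.5$ are all made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided level-alpha z-test of H0: mu = mu0 when sigma is known.

    Returns the standardized statistic, the critical value Z(alpha/2),
    and True if H0 is rejected.
    """
    n = len(sample)
    # Standardized statistic: sqrt(n) * (sample mean - mu0) / sigma ~ N(0, 1) under H0
    z = sqrt(n) * (mean(sample) - mu0) / sigma
    # Upper alpha/2 quantile of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # |z| > Z(alpha/2) is equivalent to |mean - mu0| > Z(alpha/2) * sigma / sqrt(n)
    return z, z_crit, abs(z) > z_crit


# Hypothetical data; sigma = 0.5 is assumed known for the sake of the example
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2, 5.5, 5.9]
z, z_crit, reject = z_test(sample, mu0=5.0, sigma=0.5)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Comparing the standardized statistic with $Z(\alpha/2)$ is the same rule as comparing $|\bar{X}_n - \mu_0|$ with $c$; it just divides both sides by $\sigma/\sqrt{n}$.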

Student’s t-distribution

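Observed significance level (p-value)

Let $t^*$ be the observed value of the test statistic $T$. Then
$$\text{p-value} = 2\min\{P(T \geq t^* \mid H_0), P(T \leq t^* \mid H_0)\} = P(|T| \geq |t^*| \mid H_0),$$
where the second equality holds when the null distribution of $T$ is symmetric about 0, as it is for the normal and $t$ distributions.

The p-value is the probability, when the null hypothesis is true, of observing a test statistic at least as extreme as the observed value $t^*$.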
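As a small numerical check of the two-sided p-value formula, the sketch below assumes the null distribution of the test statistic is standard normal (the Python standard library has no $t$ distribution). The function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(t_star, null_dist=NormalDist()):
    """p-value = 2 * min{P(T >= t*), P(T <= t*)} under the null distribution.

    null_dist is assumed to be continuous, so P(T >= t*) = 1 - cdf(t*).
    """
    upper_tail = 1 - null_dist.cdf(t_star)
    lower_tail = null_dist.cdf(t_star)
    return 2 * min(upper_tail, lower_tail)


p = two_sided_p_value(1.96)
# For a null distribution symmetric about 0, this equals P(|T| >= |t*|)
p_abs = 2 * (1 - NormalDist().cdf(abs(1.96)))
print(f"p-value = {p:.4f}, P(|T| >= |t*|) = {p_abs:.4f}")
```

A level-$\alpha$ test rejects $H_0$ exactly when the p-value is below $\alpha$, so an observed statistic near 1.96 sits right at the boundary for $\alpha = 0.05$.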

t-test

$X_1, X_2, \dots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are both unknown.

  1. Hypotheses: $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$.
    Since $\sigma$ is unknown, replace it with the sample standard deviation $S$. Under $H_0$, $\displaystyle T = \frac{\bar{X}_n - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$. For a level-$\alpha$ test, we want to find $c$ such that
    $$\begin{aligned} \alpha &= P\left(\frac{|\bar{X}_n - \mu_0|}{S/\sqrt{n}} > c \,\middle|\, H_0\right) \\[15pt] &= P(|t_{n-1}| > c) \\[10pt] &= P(t_{n-1} > c) + P(t_{n-1} < -c) \end{aligned}$$

  2. Choose $\displaystyle c = t_{n-1}\left(\frac{\alpha}{2}\right)$, the upper $\alpha/2$ quantile of the $t_{n-1}$ distribution. With $T = (\bar{X}_n - \mu_0)/(S/\sqrt{n})$, the level-$\alpha$ test is (1 means reject $H_0$, 0 means do not reject)
    $$\phi = \begin{cases} 1 &\displaystyle \text{if } |T| > t_{n-1}\left(\frac{\alpha}{2}\right) \\[10pt] 0 &\displaystyle \text{if } |T| \leq t_{n-1}\left(\frac{\alpha}{2}\right) \end{cases}$$
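The one-sample t-test can be sketched in the same way. Because the Python standard library has no $t$ quantile function, this example assumes the critical value is read from a $t$ table: for $n = 10$ and $\alpha = 0.05$, $t_9(0.025) \approx 2.262$. The function name `one_sample_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0, t_crit):
    """Two-sided one-sample t-test of H0: mu = mu0 with unknown sigma.

    t_crit is the table value t_{n-1}(alpha/2), supplied by the caller.
    Returns the t statistic and True if H0 is rejected.
    """
    n = len(sample)
    # stdev() is the sample standard deviation S (divides by n - 1)
    t_stat = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t_stat, abs(t_stat) > t_crit


# Hypothetical data with n = 10, so the statistic has 9 degrees of freedom
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2, 5.5, 5.9]
T_CRIT_DF9 = 2.262  # t_9(0.025) from a standard t table (alpha = 0.05)
t_stat, reject = one_sample_t(sample, mu0=5.0, t_crit=T_CRIT_DF9)
print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

The only changes from the known-variance case are that $S$ replaces $\sigma$ and the critical value comes from $t_{n-1}$ instead of $N(0, 1)$, which makes the rejection threshold larger for small $n$.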

Two-sample t-test

$X_1, X_2, \dots, X_n \overset{\text{iid}}{\sim} N(\mu_1, \sigma^2)$, $Y_1, Y_2, \dots, Y_m \overset{\text{iid}}{\sim} N(\mu_2, \sigma^2)$

  1. Assumptions:

    • Both samples have the same variance $\sigma^2$
    • $X_i$ and $Y_j$ are independent
    • $\mu_1$, $\mu_2$, and $\sigma^2$ are all unknown
  2. Hypotheses: $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$. If $H_0$ is true, then $\bar{X}_n \approx \bar{Y}_m$.

  3. Find the distributions of $\bar{X}_n$ and $\bar{Y}_m$:
    $$\begin{cases} \bar{X}_n \sim N\left(\mu_1, \frac{\sigma^2}{n}\right) \\[5pt] \bar{Y}_m \sim N\left(\mu_2, \frac{\sigma^2}{m}\right) \end{cases}$$
    $$\begin{aligned} &\Rightarrow \bar{X}_n - \bar{Y}_m \sim N\left(\mu_1 - \mu_2, \sigma^2[(1/n) + (1/m)]\right) \\[5pt] &\Rightarrow \frac{(\bar{X}_n - \bar{Y}_m) - (\mu_1 - \mu_2)}{\sigma\sqrt{(1/n) + (1/m)}} \sim N(0, 1) \end{aligned}$$

  4. Let $S_1^2$ and $S_2^2$ be the sample variances of the $X_i$ and the $Y_j$, and define the pooled variance $\displaystyle S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}$. Since the two samples are independent,
    $$\frac{(n-1)S_1^2}{\sigma^2} + \frac{(m-1)S_2^2}{\sigma^2} \sim \chi^2(n+m-2)$$
    $$\begin{aligned} &\Rightarrow E\left(\frac{(n-1)S_1^2}{\sigma^2}\right) + E\left(\frac{(m-1)S_2^2}{\sigma^2}\right) = n + m - 2 \\[10pt] &\Rightarrow E(S_p^2) = \sigma^2 \end{aligned}$$
    Because $S_p^2$ is independent of $\bar{X}_n - \bar{Y}_m$, replacing $\sigma$ with $S_p$ gives $\displaystyle \frac{(\bar{X}_n - \bar{Y}_m) - (\mu_1 - \mu_2)}{S_p\sqrt{(1/n) + (1/m)}} \sim t(n+m-2)$

  5. Under the null hypothesis ($\mu_1 = \mu_2$), let $\displaystyle T = \frac{\bar{X}_n - \bar{Y}_m}{S_p\sqrt{(1/n) + (1/m)}}$. We can choose $\displaystyle c = t_{n+m-2}\left(\frac{\alpha}{2}\right)$, so that (1 means reject $H_0$, 0 means do not reject)
    $$\phi = \begin{cases} 1 & \displaystyle \text{if } |T| > t_{n+m-2}\left(\frac{\alpha}{2}\right) \\[10pt] 0 & \displaystyle \text{if } |T| \leq t_{n+m-2}\left(\frac{\alpha}{2}\right) \end{cases}$$
    is a level-$\alpha$ test of the null hypothesis.
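The pooled two-sample procedure can be sketched as follows. As before, this is only an illustration under stated assumptions: the critical value is taken from a $t$ table ($t_8(0.025) \approx 2.306$ for $n = m = 5$, $\alpha = 0.05$), and the function name `pooled_t` and both samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, t_crit):
    """Two-sided pooled two-sample t-test of H0: mu1 = mu2 (equal variances).

    t_crit is the table value t_{n+m-2}(alpha/2), supplied by the caller.
    Returns the t statistic, the pooled variance, and True if H0 is rejected.
    """
    n, m = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    # Under H0, mu1 - mu2 = 0, so the numerator is just the difference in means
    t_stat = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))
    return t_stat, sp2, abs(t_stat) > t_crit


# Hypothetical samples with n = m = 5, so there are 8 degrees of freedom
x = [5.1, 4.9, 5.6, 5.8, 6.0]
y = [4.8, 5.0, 4.7, 5.2, 4.6]
T_CRIT_DF8 = 2.306  # t_8(0.025) from a standard t table (alpha = 0.05)
t_stat, sp2, reject = pooled_t(x, y, t_crit=T_CRIT_DF8)
print(f"T = {t_stat:.3f}, pooled variance = {sp2:.4f}, reject H0: {reject}")
```

This sketch relies on the equal-variance assumption stated for the two-sample test; if the variances differ, the pooled statistic no longer follows $t(n+m-2)$.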