Statistics (IV)


The method of moments

Let $X \sim F$ and let $\mu_k = E(X^k)$; $\mu_k$ is called the $k$-th moment of $F$.
The $k$-th sample moment is defined as $\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$, where $X_1, X_2, \ldots, X_n$ is a sample from $F$.
As $n \rightarrow \infty$, the sample moment $\hat{\mu}_k$ converges to the population moment $\mu_k$. The method of moments therefore estimates an unknown parameter by equating sample moments with the corresponding population moments (expressed in terms of the parameter) and solving.

θ^\hat{\theta}θ\theta 的估計量。
E(θ^)=θE(\hat{\theta}) = \theta,則 θ^\hat{\theta} 稱為 θ\theta 的不偏估計量(unbiased estimator)

  • The standard deviation of an estimator is called its standard error:
    $SE(\hat{\theta}) = \sqrt{Var(\hat{\theta})}$
  • The distribution of $\hat{\theta}$ is called its sampling distribution
    • Example: for an i.i.d. sample $X_1, \ldots, X_n$ from $Poisson(\lambda)$ with $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i$, we have $E(\hat{\lambda}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \lambda$. Therefore $\hat{\lambda}$ is an unbiased estimator of $\lambda$.
    • $Var(\hat{\lambda}) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) = \frac{\lambda}{n}$. Therefore the standard error of $\hat{\lambda}$ is $\sqrt{\lambda/n}$.
  • Mean squared error (MSE):
    $$\begin{aligned} MSE(\hat{\theta}) &= E[(\hat{\theta} - \theta)^2] \\ &= E[(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2] \\ &= E[(\hat{\theta} - E(\hat{\theta}))^2] + 2E[(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)] + E[(E(\hat{\theta}) - \theta)^2] \\ &= Var(\hat{\theta}) + 2(E(\hat{\theta}) - \theta)\,E[\hat{\theta} - E(\hat{\theta})] + (E(\hat{\theta}) - \theta)^2 \\ &= Var(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2 \geq 0 \end{aligned}$$
    The cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$; the remaining second term is the squared bias.
    If $\hat{\theta}$ is an unbiased estimator of $\theta$, then $MSE(\hat{\theta}) = Var(\hat{\theta})$.
    If $\hat{\theta}_1$ and $\hat{\theta}_2$ are estimators of $\theta$ and $MSE(\hat{\theta}_1) \leq MSE(\hat{\theta}_2)$, then $\hat{\theta}_1$ is the better estimator (in the MSE sense).

Maximum likelihood estimation (MLE)

Suppose $X_1, X_2, \ldots, X_n$ have joint probability density function $f(x_1, x_2, \ldots, x_n \mid \theta) = f_{\theta}(x_1, x_2, \ldots, x_n)$, where $\theta$ is an unknown parameter.

  • For fixed observed values $x_1, x_2, \ldots, x_n$, the likelihood function is defined as $L(\theta) = f_{\theta}(x_1, x_2, \ldots, x_n)$
    If $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.), then $L(\theta) = f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$

  • The maximum likelihood estimator (MLE) $\hat{\theta}$ is the value of $\theta$ that maximizes the likelihood function: $\hat{\theta} = \arg\max_{\theta} L(\theta)$
    In practice one usually maximizes $\log L(\theta)$ instead: $\hat{\theta} = \arg\max_{\theta} \log L(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(x_i \mid \theta)$
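As an illustration of maximizing the log-likelihood (my own example, assuming i.i.d. $Exponential(\lambda)$ data with density $f(x \mid \lambda) = \lambda e^{-\lambda x}$), the sketch below compares a crude grid search over $\log L(\lambda)$ with the closed-form MLE $\hat{\lambda} = 1/\bar{x}$:

```python
import math
import random

random.seed(2)

# Simulated i.i.d. data from Exponential(lam_true), f(x|lam) = lam*exp(-lam*x)
n, lam_true = 10_000, 1.5
xs = [random.expovariate(lam_true) for _ in range(n)]
s = sum(xs)

def loglik(lam):
    # log L(lam) = sum_i log f(x_i|lam) = n*log(lam) - lam*sum(x_i)
    return n * math.log(lam) - lam * s

# Crude grid search for the maximizer of the log-likelihood
grid = [0.5 + 0.001 * i for i in range(3001)]
lam_grid = max(grid, key=loglik)

# Closed form: d/dlam log L = n/lam - sum(x_i) = 0  =>  lam_hat = 1/xbar
lam_closed = n / s
print(lam_grid, lam_closed)
```

The two maximizers agree up to the grid resolution, and both sit near the true rate 1.5.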

Large-sample theory for maximum likelihood estimation

X1,X2,...,XnX_1, X_2, ..., X_n 是來自分佈 f(xθ)f(x| \theta) 的樣本,θ0\theta_0θ\theta 的真值

Consistency

Under suitable regularity conditions, $\hat{\theta} \xrightarrow{P} \theta_0$ as $n \rightarrow \infty$; we say the maximum likelihood estimator $\hat{\theta}$ is consistent.

Proof idea:
We want to maximize $\frac{1}{n} \ell(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(x_i \mid \theta)$.
By the law of large numbers, as $n \rightarrow \infty$, $\frac{1}{n} \ell(\theta) \rightarrow \int \log f(x \mid \theta)\, f(x \mid \theta_0)\, dx = E_{\theta_0}[\log f(X \mid \theta)]$.
In other words, for large enough $n$, the $\theta$ maximizing $\ell(\theta)$ will be close to the $\theta$ maximizing $E[\log f(X \mid \theta)]$.
To maximize $E[\log f(X \mid \theta)]$, differentiate with respect to $\theta$:
$$\begin{aligned} \frac{\partial}{\partial \theta} \int \log f(x \mid \theta)\, f(x \mid \theta_0)\, dx &= \int \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, f(x \mid \theta_0)\, dx \\ &= \int \frac{\partial f(x \mid \theta)/\partial \theta}{f(x \mid \theta)}\, f(x \mid \theta_0)\, dx \end{aligned}$$
At $\theta = \theta_0$ the densities cancel and
$$\int \frac{\partial}{\partial \theta} f(x \mid \theta) \Big|_{\theta = \theta_0} dx = \frac{\partial}{\partial \theta} \int f(x \mid \theta)\, dx\, \Big|_{\theta = \theta_0} = \frac{\partial}{\partial \theta} \cdot 1 = 0,$$
so $\theta_0$ is a stationary point of $E[\log f(X \mid \theta)]$.

Fisher information

Under suitable regularity conditions, the Fisher information of $\theta$ is defined as $I(\theta) = E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right] = -E\left[ \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right]$

Proof:
Differentiating $\int f(x \mid \theta)\, dx = 1$ once gives
$$0 = \frac{\partial}{\partial \theta} \int f(x \mid \theta)\, dx = \int \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx = \int \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, f(x \mid \theta)\, dx$$
Differentiating once more,
$$\begin{aligned} 0 &= \int \frac{\partial}{\partial \theta} \left( \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, f(x \mid \theta) \right) dx \\ &= \int \frac{\partial^2}{\partial \theta^2} \log f(x \mid \theta)\, f(x \mid \theta)\, dx + \int \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx \\ &= E\left( \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right) + E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right] \end{aligned}$$
where the last step uses $\frac{\partial}{\partial \theta} f = \left( \frac{\partial}{\partial \theta} \log f \right) f$. Rearranging gives $-E\left( \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right) = E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right]$.
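The two expressions for $I(\theta)$ can be compared by Monte Carlo. A sketch of mine assuming the $Poisson(\lambda)$ model, where the score is $k/\lambda - 1$, the second derivative of $\log f$ is $-k/\lambda^2$, and analytically $I(\lambda) = 1/\lambda$:

```python
import math
import random

random.seed(4)

lam = 3.0  # Poisson rate; analytically I(lam) = 1/lam

def poisson(lam, rng=random):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

ks = [poisson(lam) for _ in range(200_000)]

# For Poisson: log f(k|lam) = -lam + k*log(lam) - log(k!), so the score is
# k/lam - 1 and the second derivative of log f is -k/lam**2.
score_sq = sum((k / lam - 1.0) ** 2 for k in ks) / len(ks)
neg_hess = sum(k / lam ** 2 for k in ks) / len(ks)

# Both Monte Carlo averages should approximate I(lam) = 1/lam
print(score_sq, neg_hess, 1 / lam)
```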

Asymptotic normality

Under suitable regularity conditions, $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})$ as $n \rightarrow \infty$.
We say $\hat{\theta}$ is an asymptotically normal estimator of $\theta_0$.

Proof idea:
Taylor-expand $\ell'$ around $\theta_0$: $\ell'(\theta) = \ell'(\theta_0) + (\theta - \theta_0) \ell''(\theta_0) + \cdots$
Substituting $\theta = \hat{\theta}$ and using $\ell'(\hat{\theta}) = 0$:
$$\begin{aligned} 0 = \ell'(\hat{\theta}) &\approx \ell'(\theta_0) + (\hat{\theta} - \theta_0) \ell''(\theta_0) \\ \Rightarrow\ (\hat{\theta} - \theta_0) &\approx -\frac{\ell'(\theta_0)}{\ell''(\theta_0)} \\ \Rightarrow\ \sqrt{n}(\hat{\theta} - \theta_0) &\approx \frac{\frac{1}{\sqrt{n}} \ell'(\theta_0)}{-\frac{1}{n} \ell''(\theta_0)} \xrightarrow{d} N(0, I(\theta_0)^{-1}) \text{ as } n \rightarrow \infty \end{aligned}$$
since by the central limit theorem $\frac{1}{\sqrt{n}} \ell'(\theta_0) \xrightarrow{d} N(0, I(\theta_0))$, and by the law of large numbers $-\frac{1}{n} \ell''(\theta_0) \xrightarrow{P} I(\theta_0)$.

Confidence interval

For $0 < \alpha < 1$, let $z(\alpha)$ be the constant satisfying $P(Z > z(\alpha)) = \alpha$, where $Z \sim N(0, 1)$.
Because the standard normal distribution is symmetric, $P(-z(\alpha) < Z < z(\alpha)) = 1 - 2\alpha$, and therefore $P(-z(\frac{\alpha}{2}) < Z < z(\frac{\alpha}{2})) = 1 - \alpha$.
Combined with the asymptotic normality of the MLE, this yields the approximate $100(1-\alpha)\%$ confidence interval $\hat{\theta} \pm z(\frac{\alpha}{2}) / \sqrt{nI(\hat{\theta})}$.
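A sketch of an approximate MLE-based confidence interval (my own example), using Python's `statistics.NormalDist().inv_cdf` to compute $z(\alpha/2)$ in the $Exponential(\lambda)$ model, where $I(\lambda) = 1/\lambda^2$ and hence $1/\sqrt{nI(\hat{\lambda})} = \hat{\lambda}/\sqrt{n}$; the empirical coverage should come out close to $1 - \alpha$:

```python
import math
import random
from statistics import NormalDist

random.seed(6)

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # z(alpha/2): P(Z > z) = alpha/2

# Coverage check in the Exponential(lam0) model: the MLE is 1/xbar,
# I(lam) = 1/lam**2, so the standard error is mle/sqrt(n).
lam0, n, reps = 2.0, 500, 2_000
covered = 0
for _ in range(reps):
    xs = [random.expovariate(lam0) for _ in range(n)]
    mle = n / sum(xs)
    se = mle / math.sqrt(n)
    if mle - z * se < lam0 < mle + z * se:
        covered += 1

print(z, covered / reps)  # z near 1.96, empirical coverage near 0.95
```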

The Cramér–Rao inequality (Cramér–Rao lower bound)

Suppose $X_1, X_2, \ldots, X_n$ is a sample from the distribution $f(x \mid \theta)$, where $\theta$ is an unknown parameter, and let $T = t(X_1, X_2, \ldots, X_n)$ be an unbiased estimator of $\theta$. Under suitable smoothness conditions, $Var(T) \geq \frac{1}{nI(\theta)}$.

Proof:
Let $Z = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(X_i \mid \theta) = \sum_{i=1}^{n} \frac{\partial f(X_i \mid \theta)/\partial \theta}{f(X_i \mid \theta)}$.
From the Fisher information results above, $E(Z) = 0$ and, by independence, $Var(Z) = nI(\theta)$.
Note that $\rho(Z, T)^2 = \frac{Cov(Z, T)^2}{Var(Z) Var(T)} \leq 1$, so it suffices to show $Cov(Z, T) = 1$:
$$\begin{aligned} Cov(Z, T) &= E(ZT) - E(Z)E(T) \\ &= E\left( \left[ \sum_{i=1}^{n} \frac{\partial f(X_i \mid \theta)/\partial \theta}{f(X_i \mid \theta)} \right] t(X_1, X_2, \ldots, X_n) \right) \\ &= \int \cdots \int t(x_1, \ldots, x_n) \left[ \sum_{i=1}^{n} \frac{\partial f(x_i \mid \theta)/\partial \theta}{f(x_i \mid \theta)} \right] \left( \prod_{i=1}^{n} f(x_i \mid \theta) \right) dx_1 \cdots dx_n \\ &= \int \cdots \int t(x_1, \ldots, x_n)\, \frac{\partial}{\partial \theta} \prod_{i=1}^{n} f(x_i \mid \theta)\, dx_1 \cdots dx_n \\ &= \frac{\partial}{\partial \theta} \int \cdots \int t(x_1, \ldots, x_n) \prod_{i=1}^{n} f(x_i \mid \theta)\, dx_1 \cdots dx_n \\ &= \frac{\partial}{\partial \theta} E(T) = \frac{\partial}{\partial \theta} \theta = 1 \end{aligned}$$
Therefore $Var(T) \geq \frac{Cov(Z, T)^2}{Var(Z)} = \frac{1}{nI(\theta)}$.
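The bound can be checked numerically. In the $Bernoulli(p)$ model, $I(p) = \frac{1}{p(1-p)}$ and the sample mean is unbiased with $Var = \frac{p(1-p)}{n}$, so it attains the Cramér–Rao bound exactly; a Monte Carlo sketch of mine with illustrative parameters:

```python
import random

random.seed(7)

p, n, reps = 0.3, 50, 40_000

# T = sample mean of n Bernoulli(p) draws, an unbiased estimator of p
ts = []
for _ in range(reps):
    ts.append(sum(1 if random.random() < p else 0 for _ in range(n)) / n)

mean_t = sum(ts) / reps
var_t = sum((t - mean_t) ** 2 for t in ts) / reps

# Cramer-Rao bound: I(p) = 1/(p*(1-p)), so Var(T) >= p*(1-p)/n;
# the sample mean attains the bound, so the two values nearly coincide.
bound = p * (1 - p) / n
print(var_t, bound)
```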