Statistics (IV)


The method of moments

Let $X \sim F$ and let $\mu_k = E(X^k)$; $\mu_k$ is called the $k$-th moment of $F$.
The $k$-th sample moment is defined as $\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$, where $X_1, X_2, \ldots, X_n$ is a sample from $F$.
As $n \rightarrow \infty$, the sample moment $\hat{\mu}_k$ converges to the population moment $\mu_k$. The method of moments therefore estimates an unknown parameter by equating sample moments with the corresponding population moments (expressed in terms of the parameter) and solving.

θ^\hat{\theta}θ\theta 的估計量。
E(θ^)=θE(\hat{\theta}) = \theta,則 θ^\hat{\theta} 稱為 θ\theta 的不偏估計量(unbiased estimator)

  • The standard deviation of an estimator is called its standard error:
    $SE(\hat{\theta}) = \sqrt{Var(\hat{\theta})}$
  • The distribution of $\hat{\theta}$ is called its sampling distribution
    • Example: for an i.i.d. sample $X_1, \ldots, X_n$ from $Poisson(\lambda)$ with $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i$, we have $E(\hat{\lambda}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \lambda$. Therefore $\hat{\lambda}$ is an unbiased estimator of $\lambda$.
    • $Var(\hat{\lambda}) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) = \frac{\lambda}{n}$. Therefore the standard error of $\hat{\lambda}$ is $\sqrt{\lambda/n}$.
  • Mean squared error (MSE):
    $$\begin{aligned} MSE(\hat{\theta}) &= E[(\hat{\theta} - \theta)^2] \\ &= E[(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2] \\ &= E[(\hat{\theta} - E(\hat{\theta}))^2] + 2E[(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)] + E[(E(\hat{\theta}) - \theta)^2] \\ &= Var(\hat{\theta}) + 2(E(\hat{\theta}) - \theta)\,E[\hat{\theta} - E(\hat{\theta})] + (E(\hat{\theta}) - \theta)^2 \\ &= Var(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2 \geq 0 \end{aligned}$$
    The cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$; the remaining second term is the squared bias.
    If $\hat{\theta}$ is an unbiased estimator of $\theta$, then $MSE(\hat{\theta}) = Var(\hat{\theta})$.
    If $\hat{\theta}_1$ and $\hat{\theta}_2$ are estimators of $\theta$ and $MSE(\hat{\theta}_1) \leq MSE(\hat{\theta}_2)$, then $\hat{\theta}_1$ is the better estimator (in the MSE sense).

Maximum likelihood estimation (MLE)

Suppose $X_1, X_2, \ldots, X_n$ have joint probability density function $f(x_1, x_2, \ldots, x_n \mid \theta) = f_{\theta}(x_1, x_2, \ldots, x_n)$, where $\theta$ is an unknown parameter.

  • For fixed observed values $x_1, x_2, \ldots, x_n$, the likelihood function is defined as $L(\theta) = f_{\theta}(x_1, x_2, \ldots, x_n)$
    If $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.), then $L(\theta) = f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$

  • The maximum likelihood estimator (MLE) $\hat{\theta}$ is the value of $\theta$ that maximizes the likelihood function: $\hat{\theta} = \arg\max_{\theta} L(\theta)$
    In practice one usually maximizes $\log L(\theta)$ instead: $\hat{\theta} = \arg\max_{\theta} \log L(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(x_i \mid \theta)$
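As an illustration of maximizing the log-likelihood (my own example, assuming i.i.d. $Exponential(\lambda)$ data with density $f(x \mid \lambda) = \lambda e^{-\lambda x}$), the sketch below compares a crude grid search over $\log L(\lambda)$ with the closed-form MLE $\hat{\lambda} = 1/\bar{x}$:

```python
import math
import random

random.seed(2)

# Simulated i.i.d. data from Exponential(lam_true), f(x|lam) = lam*exp(-lam*x)
n, lam_true = 10_000, 1.5
xs = [random.expovariate(lam_true) for _ in range(n)]
s = sum(xs)

def loglik(lam):
    # log L(lam) = sum_i log f(x_i|lam) = n*log(lam) - lam*sum(x_i)
    return n * math.log(lam) - lam * s

# Crude grid search for the maximizer of the log-likelihood
grid = [0.5 + 0.001 * i for i in range(3001)]
lam_grid = max(grid, key=loglik)

# Closed form: d/dlam log L = n/lam - sum(x_i) = 0  =>  lam_hat = 1/xbar
lam_closed = n / s
print(lam_grid, lam_closed)
```

The two maximizers agree up to the grid resolution, and both sit near the true rate 1.5.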

Large-sample theory for maximum likelihood estimation

X1,X2,...,XnX_1, X_2, ..., X_n 是來自分佈 f(xθ)f(x| \theta) 的樣本,θ0\theta_0θ\theta 的真值

Consistency

Under suitable regularity conditions, $\hat{\theta} \xrightarrow{P} \theta_0$ as $n \rightarrow \infty$; we say the maximum likelihood estimator $\hat{\theta}$ is consistent.

Proof idea:
We want to maximize $\frac{1}{n} \ell(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(x_i \mid \theta)$.
By the law of large numbers, as $n \rightarrow \infty$, $\frac{1}{n} \ell(\theta) \rightarrow \int \log f(x \mid \theta)\, f(x \mid \theta_0)\, dx = E_{\theta_0}[\log f(X \mid \theta)]$.
In other words, for large enough $n$, the $\theta$ maximizing $\ell(\theta)$ will be close to the $\theta$ maximizing $E[\log f(X \mid \theta)]$.
To maximize $E[\log f(X \mid \theta)]$, differentiate with respect to $\theta$:
$$\begin{aligned} \frac{\partial}{\partial \theta} \int \log f(x \mid \theta)\, f(x \mid \theta_0)\, dx &= \int \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, f(x \mid \theta_0)\, dx \\ &= \int \frac{\partial f(x \mid \theta)/\partial \theta}{f(x \mid \theta)}\, f(x \mid \theta_0)\, dx \end{aligned}$$
At $\theta = \theta_0$ the densities cancel and
$$\int \frac{\partial}{\partial \theta} f(x \mid \theta) \Big|_{\theta = \theta_0} dx = \frac{\partial}{\partial \theta} \int f(x \mid \theta)\, dx\, \Big|_{\theta = \theta_0} = \frac{\partial}{\partial \theta} \cdot 1 = 0,$$
so $\theta_0$ is a stationary point of $E[\log f(X \mid \theta)]$.

Fisher information

Under suitable regularity conditions, the Fisher information of $\theta$ is defined as $I(\theta) = E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right] = -E\left[ \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right]$

Proof:
Differentiating $\int f(x \mid \theta)\, dx = 1$ once gives
$$0 = \frac{\partial}{\partial \theta} \int f(x \mid \theta)\, dx = \int \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx = \int \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, f(x \mid \theta)\, dx$$
Differentiating once more,
$$\begin{aligned} 0 &= \int \frac{\partial}{\partial \theta} \left( \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, f(x \mid \theta) \right) dx \\ &= \int \frac{\partial^2}{\partial \theta^2} \log f(x \mid \theta)\, f(x \mid \theta)\, dx + \int \frac{\partial}{\partial \theta} \log f(x \mid \theta)\, \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx \\ &= E\left( \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right) + E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right] \end{aligned}$$
where the last step uses $\frac{\partial}{\partial \theta} f = \left( \frac{\partial}{\partial \theta} \log f \right) f$. Rearranging gives $-E\left( \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right) = E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right]$.
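The two expressions for $I(\theta)$ can be compared by Monte Carlo. A sketch of mine assuming the $Poisson(\lambda)$ model, where the score is $k/\lambda - 1$, the second derivative of $\log f$ is $-k/\lambda^2$, and analytically $I(\lambda) = 1/\lambda$:

```python
import math
import random

random.seed(4)

lam = 3.0  # Poisson rate; analytically I(lam) = 1/lam

def poisson(lam, rng=random):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

ks = [poisson(lam) for _ in range(200_000)]

# For Poisson: log f(k|lam) = -lam + k*log(lam) - log(k!), so the score is
# k/lam - 1 and the second derivative of log f is -k/lam**2.
score_sq = sum((k / lam - 1.0) ** 2 for k in ks) / len(ks)
neg_hess = sum(k / lam ** 2 for k in ks) / len(ks)

# Both Monte Carlo averages should approximate I(lam) = 1/lam
print(score_sq, neg_hess, 1 / lam)
```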

Asymptotic normality

Under suitable regularity conditions, $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})$ as $n \rightarrow \infty$.
We say $\hat{\theta}$ is an asymptotically normal estimator of $\theta_0$.

Proof idea:
Taylor-expand $\ell'$ around $\theta_0$: $\ell'(\theta) = \ell'(\theta_0) + (\theta - \theta_0) \ell''(\theta_0) + \cdots$
Substituting $\theta = \hat{\theta}$ and using $\ell'(\hat{\theta}) = 0$:
$$\begin{aligned} 0 = \ell'(\hat{\theta}) &\approx \ell'(\theta_0) + (\hat{\theta} - \theta_0) \ell''(\theta_0) \\ \Rightarrow\ (\hat{\theta} - \theta_0) &\approx -\frac{\ell'(\theta_0)}{\ell''(\theta_0)} \\ \Rightarrow\ \sqrt{n}(\hat{\theta} - \theta_0) &\approx \frac{\frac{1}{\sqrt{n}} \ell'(\theta_0)}{-\frac{1}{n} \ell''(\theta_0)} \xrightarrow{d} N(0, I(\theta_0)^{-1}) \text{ as } n \rightarrow \infty \end{aligned}$$
since by the central limit theorem $\frac{1}{\sqrt{n}} \ell'(\theta_0) \xrightarrow{d} N(0, I(\theta_0))$, and by the law of large numbers $-\frac{1}{n} \ell''(\theta_0) \xrightarrow{P} I(\theta_0)$.

Confidence interval

For $0 < \alpha < 1$, let $z(\alpha)$ be the constant satisfying $P(Z > z(\alpha)) = \alpha$, where $Z \sim N(0, 1)$.
Because the standard normal distribution is symmetric, $P(-z(\alpha) < Z < z(\alpha)) = 1 - 2\alpha$, and therefore $P(-z(\frac{\alpha}{2}) < Z < z(\frac{\alpha}{2})) = 1 - \alpha$.
Combined with the asymptotic normality of the MLE, this yields the approximate $100(1-\alpha)\%$ confidence interval $\hat{\theta} \pm z(\frac{\alpha}{2}) / \sqrt{nI(\hat{\theta})}$.
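A sketch of an approximate MLE-based confidence interval (my own example), using Python's `statistics.NormalDist().inv_cdf` to compute $z(\alpha/2)$ in the $Exponential(\lambda)$ model, where $I(\lambda) = 1/\lambda^2$ and hence $1/\sqrt{nI(\hat{\lambda})} = \hat{\lambda}/\sqrt{n}$; the empirical coverage should come out close to $1 - \alpha$:

```python
import math
import random
from statistics import NormalDist

random.seed(6)

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # z(alpha/2): P(Z > z) = alpha/2

# Coverage check in the Exponential(lam0) model: the MLE is 1/xbar,
# I(lam) = 1/lam**2, so the standard error is mle/sqrt(n).
lam0, n, reps = 2.0, 500, 2_000
covered = 0
for _ in range(reps):
    xs = [random.expovariate(lam0) for _ in range(n)]
    mle = n / sum(xs)
    se = mle / math.sqrt(n)
    if mle - z * se < lam0 < mle + z * se:
        covered += 1

print(z, covered / reps)  # z near 1.96, empirical coverage near 0.95
```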

The Cramér–Rao inequality (Cramér–Rao lower bound)

Suppose $X_1, X_2, \ldots, X_n$ is a sample from the distribution $f(x \mid \theta)$, where $\theta$ is an unknown parameter, and let $T = t(X_1, X_2, \ldots, X_n)$ be an unbiased estimator of $\theta$. Under suitable smoothness conditions, $Var(T) \geq \frac{1}{nI(\theta)}$.

Proof:
Let $Z = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(X_i \mid \theta) = \sum_{i=1}^{n} \frac{\partial f(X_i \mid \theta)/\partial \theta}{f(X_i \mid \theta)}$.
From the Fisher information results above, $E(Z) = 0$ and, by independence, $Var(Z) = nI(\theta)$.
Note that $\rho(Z, T)^2 = \frac{Cov(Z, T)^2}{Var(Z) Var(T)} \leq 1$, so it suffices to show $Cov(Z, T) = 1$:
$$\begin{aligned} Cov(Z, T) &= E(ZT) - E(Z)E(T) \\ &= E\left( \left[ \sum_{i=1}^{n} \frac{\partial f(X_i \mid \theta)/\partial \theta}{f(X_i \mid \theta)} \right] t(X_1, X_2, \ldots, X_n) \right) \\ &= \int \cdots \int t(x_1, \ldots, x_n) \left[ \sum_{i=1}^{n} \frac{\partial f(x_i \mid \theta)/\partial \theta}{f(x_i \mid \theta)} \right] \left( \prod_{i=1}^{n} f(x_i \mid \theta) \right) dx_1 \cdots dx_n \\ &= \int \cdots \int t(x_1, \ldots, x_n)\, \frac{\partial}{\partial \theta} \prod_{i=1}^{n} f(x_i \mid \theta)\, dx_1 \cdots dx_n \\ &= \frac{\partial}{\partial \theta} \int \cdots \int t(x_1, \ldots, x_n) \prod_{i=1}^{n} f(x_i \mid \theta)\, dx_1 \cdots dx_n \\ &= \frac{\partial}{\partial \theta} E(T) = \frac{\partial}{\partial \theta} \theta = 1 \end{aligned}$$
Therefore $Var(T) \geq \frac{Cov(Z, T)^2}{Var(Z)} = \frac{1}{nI(\theta)}$.
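The bound can be checked numerically. In the $Bernoulli(p)$ model, $I(p) = \frac{1}{p(1-p)}$ and the sample mean is unbiased with $Var = \frac{p(1-p)}{n}$, so it attains the Cramér–Rao bound exactly; a Monte Carlo sketch of mine with illustrative parameters:

```python
import random

random.seed(7)

p, n, reps = 0.3, 50, 40_000

# T = sample mean of n Bernoulli(p) draws, an unbiased estimator of p
ts = []
for _ in range(reps):
    ts.append(sum(1 if random.random() < p else 0 for _ in range(n)) / n)

mean_t = sum(ts) / reps
var_t = sum((t - mean_t) ** 2 for t in ts) / reps

# Cramer-Rao bound: I(p) = 1/(p*(1-p)), so Var(T) >= p*(1-p)/n;
# the sample mean attains the bound, so the two values nearly coincide.
bound = p * (1 - p) / n
print(var_t, bound)
```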