The method of moments
Let $X \sim F$. The $k$-th moment of $F$ is defined as $\mu_k = E(X^k)$.
The $k$-th sample moment is defined as $\displaystyle \hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$, where $X_1, X_2, \ldots, X_n$ is a sample from $F$.
As $n \rightarrow \infty$, the sample moment $\hat{\mu}_k$ converges to the population moment $\mu_k$ (by the law of large numbers). The method of moments therefore estimates parameters by equating sample moments to the corresponding population moments and solving for the unknown parameters.
Let $\hat{\theta}$ be an estimator of $\theta$.
If $E(\hat{\theta}) = \theta$, then $\hat{\theta}$ is called an unbiased estimator of $\theta$.
The standard deviation of an estimator is called its standard error:
$SE(\hat{\theta}) = \sqrt{Var(\hat{\theta})}$
The distribution of $\hat{\theta}$ is called its sampling distribution.
Example: let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample with $E(X_i) = \lambda$ and $Var(X_i) = \lambda$ (e.g., Poisson($\lambda$)). Matching the first moment gives the estimator $\displaystyle \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then $\displaystyle E(\hat{\lambda}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \lambda$, so $\hat{\lambda}$ is an unbiased estimator of $\lambda$.
$\displaystyle Var(\hat{\lambda}) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) = \frac{\lambda}{n}$, so the standard error of $\hat{\lambda}$ is $\sqrt{\lambda/n}$.
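A minimal simulation sketch of this example, assuming the Poisson model (the parameter values below are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 4.0, 50, 100_000

# `reps` independent samples of size n; the method-of-moments estimate
# of lambda is the sample mean of each row.
lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print(lam_hat.mean())    # close to 4.0: E(lam_hat) = lambda (unbiased)
print(lam_hat.std())     # close to the standard error sqrt(lambda/n)
print(np.sqrt(lam / n))  # theoretical standard error = 0.2828...
```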
Mean squared error (MSE):
$$\displaystyle \begin{aligned} MSE(\hat{\theta}) &= E[(\hat{\theta} - \theta)^2] \\ &= E[(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2] \\ &= E[(\hat{\theta} - E(\hat{\theta}))^2] + 2E[(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)] + E[(E(\hat{\theta}) - \theta)^2] \\ &= Var(\hat{\theta}) + 2(E(\hat{\theta}) - \theta)\,E[\hat{\theta} - E(\hat{\theta})] + (E(\hat{\theta}) - \theta)^2 \\ &= Var(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2 \geq 0 \end{aligned}$$
The cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$, so the MSE decomposes into variance plus squared bias.
If $\hat{\theta}$ is an unbiased estimator of $\theta$, then $MSE(\hat{\theta}) = Var(\hat{\theta})$.
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are both estimators of $\theta$ and $MSE(\hat{\theta}_1) \leq MSE(\hat{\theta}_2)$, then $\hat{\theta}_1$ is the better estimator (in the MSE sense).
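A sketch of such a comparison by Monte Carlo, assuming the Poisson setup from the earlier example (where both the sample mean and the unbiased sample variance estimate $\lambda$, since a Poisson's mean and variance are equal):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 4.0, 50, 100_000
x = rng.poisson(lam, size=(reps, n))

# Two unbiased estimators of lambda:
est1 = x.mean(axis=1)         # sample mean
est2 = x.var(axis=1, ddof=1)  # unbiased sample variance

print(np.mean((est1 - lam) ** 2))  # smaller MSE: est1 is the better estimator
print(np.mean((est2 - lam) ** 2))
```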
Maximum likelihood estimation (MLE)
Suppose $X_1, X_2, \ldots, X_n$ have joint density $f(x_1, x_2, \ldots, x_n \mid \theta) = f_{\theta}(x_1, x_2, \ldots, x_n)$, where $\theta$ is an unknown parameter.
For fixed observed values $x_1, x_2, \ldots, x_n$, the likelihood function is defined as $\displaystyle L(\theta) = f_{\theta}(x_1, x_2, \ldots, x_n)$.
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.), then $\displaystyle L(\theta) = f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$.
The maximum likelihood estimator (MLE) $\hat{\theta}$ is the value of $\theta$ that maximizes the likelihood: $\displaystyle \hat{\theta} = \arg\max_{\theta} L(\theta)$.
In practice one usually maximizes the log-likelihood instead: $\displaystyle \hat{\theta} = \arg\max_{\theta} \log L(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(x_i \mid \theta)$.
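A minimal numerical sketch of maximizing the log-likelihood, assuming an i.i.d. Poisson sample (chosen so the optimizer's answer can be checked against the closed-form MLE $\hat{\lambda} = \bar{x}$):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(2)
x = rng.poisson(4.0, size=200)  # observed i.i.d. Poisson sample

def neg_log_lik(lam):
    # -log L(lambda) = -sum_i log f(x_i | lambda)
    return -poisson.logpmf(x, lam).sum()

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())  # the numerical MLE matches the closed form x-bar
```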
Large-sample theory for maximum likelihood estimation
Let $X_1, X_2, \ldots, X_n$ be a sample from a distribution $f(x \mid \theta)$, and let $\theta_0$ denote the true value of $\theta$.
Consistency
Under certain regularity conditions, $\hat{\theta} \xrightarrow{P} \theta_0$ as $n \rightarrow \infty$; we say the MLE $\hat{\theta}$ is consistent.
Proof idea:
We want to maximize $\displaystyle \frac{1}{n} \ell(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(x_i \mid \theta)$.
By the law of large numbers, as $n \rightarrow \infty$,
$$\frac{1}{n} \ell(\theta) \rightarrow \int \log f(x \mid \theta) \, f(x \mid \theta_0) \, dx = E_{\theta_0}[\log f(X \mid \theta)].$$
In other words, for large enough $n$, the $\theta$ that maximizes $\ell(\theta)$ will be close to the $\theta$ that maximizes $E_{\theta_0}[\log f(X \mid \theta)]$. To maximize $E_{\theta_0}[\log f(X \mid \theta)]$, differentiate with respect to $\theta$:
$$\begin{aligned} \frac{\partial}{\partial \theta} \int \log f(x \mid \theta) \, f(x \mid \theta_0) \, dx &= \int \frac{\partial}{\partial \theta} \log f(x \mid \theta) \, f(x \mid \theta_0) \, dx \\ &= \int \left( \frac{\partial}{\partial \theta} f(x \mid \theta) \Big/ f(x \mid \theta) \right) f(x \mid \theta_0) \, dx \end{aligned}$$
At $\theta = \theta_0$ this derivative becomes
$$\int \frac{\partial}{\partial \theta} f(x \mid \theta) \Big|_{\theta = \theta_0} dx = \frac{\partial}{\partial \theta} \int f(x \mid \theta) \, dx \, \Big|_{\theta = \theta_0} = \frac{\partial}{\partial \theta} \cdot 1 = 0,$$
so $\theta_0$ is a stationary point of $E_{\theta_0}[\log f(X \mid \theta)]$ (and in fact its maximizer, by Jensen's inequality).
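A quick simulation sketch of consistency, again assuming the Poisson model (where the MLE is the sample mean):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0 = 4.0  # true parameter value

# The MLE settles onto theta0 as the sample size grows.
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(n, rng.poisson(theta0, size=n).mean())
```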
Fisher information
Under suitable conditions, the Fisher information of $\theta$ is defined as $\displaystyle I(\theta) = E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right] = -E\left[ \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right]$
Proof:
Starting from $\int f(x \mid \theta) \, dx = 1$ and differentiating under the integral sign,
$$0 = \frac{\partial}{\partial \theta} \int f(x \mid \theta) \, dx = \int \frac{\partial}{\partial \theta} f(x \mid \theta) \, dx = \int \frac{\partial}{\partial \theta} \log f(x \mid \theta) \, f(x \mid \theta) \, dx,$$
using $\frac{\partial}{\partial \theta} f = \left( \frac{\partial}{\partial \theta} \log f \right) f$. Differentiating once more,
$$\begin{aligned} 0 &= \int \frac{\partial}{\partial \theta} \left( \frac{\partial}{\partial \theta} \log f(x \mid \theta) \, f(x \mid \theta) \right) dx \\ &= \int \frac{\partial^2}{\partial \theta^2} \log f(x \mid \theta) \, f(x \mid \theta) \, dx + \int \frac{\partial}{\partial \theta} \log f(x \mid \theta) \, \frac{\partial}{\partial \theta} f(x \mid \theta) \, dx \\ &= E\left[ \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right] + E\left[ \left( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \right)^2 \right], \end{aligned}$$
which rearranges to the claimed identity.
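A worked example, continuing the Poisson model used earlier: since $\log f(x \mid \lambda) = x \log \lambda - \lambda - \log x!$,
$$\frac{\partial}{\partial \lambda} \log f(x \mid \lambda) = \frac{x}{\lambda} - 1, \qquad \frac{\partial^2}{\partial \lambda^2} \log f(x \mid \lambda) = -\frac{x}{\lambda^2},$$
so $\displaystyle I(\lambda) = -E\left[ -\frac{X}{\lambda^2} \right] = \frac{E[X]}{\lambda^2} = \frac{1}{\lambda}$.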
Asymptotic normality
Under suitable conditions, $\displaystyle \sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})$ as $n \rightarrow \infty$;
we say $\hat{\theta}$ is an asymptotically normal estimator of $\theta_0$.
Proof idea:
Taylor-expand $\ell'$ around $\theta_0$: $\ell'(\theta) = \ell'(\theta_0) + (\theta - \theta_0) \ell''(\theta_0) + \cdots$. Taking $\theta = \hat{\theta}$ (which satisfies $\ell'(\hat{\theta}) = 0$):
$$\begin{aligned} 0 = \ell'(\hat{\theta}) &\approx \ell'(\theta_0) + (\hat{\theta} - \theta_0) \ell''(\theta_0) \\ \Rightarrow \quad (\hat{\theta} - \theta_0) &\approx -\frac{\ell'(\theta_0)}{\ell''(\theta_0)} \\ \Rightarrow \quad \sqrt{n}(\hat{\theta} - \theta_0) &\approx \frac{1}{\sqrt{n}} \ell'(\theta_0) \Big/ \left( -\frac{1}{n} \ell''(\theta_0) \right) \xrightarrow{d} N(0, I(\theta_0)^{-1}) \text{ as } n \rightarrow \infty, \end{aligned}$$
since $\frac{1}{\sqrt{n}} \ell'(\theta_0) \xrightarrow{d} N(0, I(\theta_0))$ by the central limit theorem (each term $\frac{\partial}{\partial \theta} \log f(X_i \mid \theta_0)$ has mean $0$ and variance $I(\theta_0)$), and $-\frac{1}{n} \ell''(\theta_0) \xrightarrow{P} I(\theta_0)$ by the law of large numbers.
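A simulation sketch for the Poisson case, where the worked example above gives $I(\lambda_0)^{-1} = \lambda_0$:

```python
import numpy as np

rng = np.random.default_rng(4)
lam0, n, reps = 4.0, 200, 20_000

# sqrt(n) * (MLE - theta0) over many replications; the Poisson MLE is the sample mean.
z = np.sqrt(n) * (rng.poisson(lam0, size=(reps, n)).mean(axis=1) - lam0)

print(z.mean(), z.var())  # approx. 0 and approx. 4.0 = 1/I(lam0) = lam0
```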
Confidence intervals
For $0 < \alpha < 1$, let $z(\alpha)$ be the constant satisfying $P(Z > z(\alpha)) = \alpha$, where $Z \sim N(0, 1)$.
Because the standard normal distribution is symmetric, $P(-z(\alpha) < Z < z(\alpha)) = 1 - 2\alpha$. Therefore $\displaystyle P\left(-z\left(\tfrac{\alpha}{2}\right) < Z < z\left(\tfrac{\alpha}{2}\right)\right) = 1 - \alpha$.
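A sketch of how these quantiles yield an interval, combining them with the asymptotic normality result above (this construction is standard but not spelled out in the notes): since $\sqrt{n I(\theta_0)} \, (\hat{\theta} - \theta_0)$ is approximately $N(0, 1)$,
$$P\left( \hat{\theta} - z\left(\tfrac{\alpha}{2}\right) \frac{1}{\sqrt{n I(\hat{\theta})}} < \theta_0 < \hat{\theta} + z\left(\tfrac{\alpha}{2}\right) \frac{1}{\sqrt{n I(\hat{\theta})}} \right) \approx 1 - \alpha,$$
so $\hat{\theta} \pm z(\alpha/2) \big/ \sqrt{n I(\hat{\theta})}$ is an approximate $100(1-\alpha)\%$ confidence interval for $\theta_0$. For the Poisson example, with $I(\lambda) = 1/\lambda$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.poisson(4.0, size=200)

lam_hat = x.mean()              # Poisson MLE
se = np.sqrt(lam_hat / len(x))  # 1 / sqrt(n * I(lam_hat)), since I(lam) = 1/lam
z = norm.ppf(1 - 0.05 / 2)      # z(alpha/2) for alpha = 0.05
print(lam_hat - z * se, lam_hat + z * se)  # approximate 95% CI for lambda
```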
The Cramér-Rao theorem
Suppose $X_1, X_2, \ldots, X_n$ is a sample from a distribution $f(x \mid \theta)$ with unknown parameter $\theta$.
Let $T = t(X_1, X_2, \ldots, X_n)$ be an unbiased estimator of $\theta$. Under certain smoothness conditions: $\displaystyle Var(T) \geq \frac{1}{n I(\theta)}$
Proof:
Let $\displaystyle Z = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(X_i \mid \theta) = \sum_{i=1}^{n} \left( \frac{\partial}{\partial \theta} f(X_i \mid \theta) \Big/ f(X_i \mid \theta) \right)$.
From the Fisher information section, each term has mean $0$ and variance $I(\theta)$, so $E(Z) = 0$ and, by independence, $Var(Z) = n I(\theta)$.
Note that $\displaystyle \rho(Z, T)^2 = \frac{Cov(Z, T)^2}{Var(Z) \, Var(T)} \leq 1$. Now compute $Cov(Z, T)$, using $E(Z) = 0$:
$$\begin{aligned} Cov(Z, T) &= E(ZT) - E(Z)E(T) \\ &= E\left( \left[ \sum_{i=1}^{n} \frac{\frac{\partial}{\partial \theta} f(X_i \mid \theta)}{f(X_i \mid \theta)} \right] t(X_1, X_2, \ldots, X_n) \right) \\ &= \int \!\cdots\! \int t(x_1, x_2, \ldots, x_n) \left[ \sum_{i=1}^{n} \frac{\frac{\partial}{\partial \theta} f(x_i \mid \theta)}{f(x_i \mid \theta)} \right] \left( \prod_{i=1}^{n} f(x_i \mid \theta) \right) dx_1 \, dx_2 \cdots dx_n \\ &= \int \!\cdots\! \int t(x_1, x_2, \ldots, x_n) \, \frac{\partial}{\partial \theta} \prod_{i=1}^{n} f(x_i \mid \theta) \, dx_1 \, dx_2 \cdots dx_n \\ &= \frac{\partial}{\partial \theta} \int \!\cdots\! \int t(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} f(x_i \mid \theta) \, dx_1 \, dx_2 \cdots dx_n \\ &= \frac{\partial}{\partial \theta} E(T) = \frac{\partial}{\partial \theta} \theta = 1 \end{aligned}$$
Therefore $1 = Cov(Z, T)^2 \leq Var(Z) \, Var(T) = n I(\theta) \, Var(T)$, which gives $\displaystyle Var(T) \geq \frac{1}{n I(\theta)}$.
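A quick check against the earlier Poisson example: there $I(\lambda) = 1/\lambda$, so the bound reads $Var(T) \geq \lambda/n$, and $\hat{\lambda} = \bar{X}$ attains it exactly since $Var(\hat{\lambda}) = \lambda/n$. The sample mean is therefore an efficient estimator of $\lambda$.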