Xirius-ESTIMATIONTHEORY4-STA209229.pdf
Xirius AI
This document, "Xirius-ESTIMATIONTHEORY4-STA209229.pdf," provides a comprehensive introduction to Estimation Theory, specifically focusing on Point Estimation. It is designed for students in courses STA209/229, laying the groundwork for understanding how to infer unknown population parameters from sample data.
The document begins by defining fundamental concepts such as parameters, statistics, estimators, and estimates, clarifying the distinction between point and interval estimation. The core of the material then delves into the crucial properties that characterize "good" estimators: unbiasedness, efficiency, consistency, and sufficiency. Each property is explained with its mathematical definition, implications, and illustrative examples. Finally, the document presents two widely used methods for constructing point estimators: the Method of Moments (MOM) and the Method of Maximum Likelihood (MLE), detailing their principles, steps, and applications with various probability distributions.
Overall, this PDF serves as a vital resource for understanding the theoretical underpinnings and practical techniques of point estimation, equipping students with the knowledge to evaluate and construct effective statistical estimators. It emphasizes both the theoretical properties an estimator should possess and the practical methodologies for deriving them, making it a complete guide to the basics of estimation theory.
MAIN TOPICS AND CONCEPTS
Estimation theory is a branch of statistics concerned with estimating the values of unknown population parameters based on observed sample data. It forms the foundation for making inferences about a population when only a sample is available. The document distinguishes between two main types of estimation:
* Point Estimation: Involves calculating a single value (an estimate) from the sample data to represent the unknown population parameter.
* Interval Estimation: Involves calculating an interval (a range of values) within which the unknown parameter is likely to lie, along with a level of confidence.
This document primarily focuses on point estimation.
Properties of Good Estimators
For an estimator to be considered "good," it should possess certain desirable properties. The document details four key properties:
Unbiasedness
* Detailed explanation: An estimator $\hat{\theta}$ is said to be an unbiased estimator of a parameter $\theta$ if its expected value is equal to the true value of the parameter. This means that, on average, the estimator will hit the true parameter value. If the expected value is not equal to the parameter, the estimator is biased. The bias is the difference between the expected value of the estimator and the true parameter.
* Important formulas/equations:
* An estimator $\hat{\theta}$ is unbiased for $\theta$ if: $E(\hat{\theta}) = \theta$
* The bias of an estimator is: $B(\hat{\theta}) = E(\hat{\theta}) - \theta$
* Mean Squared Error (MSE): A measure of the overall quality of an estimator, which accounts for both its variance and its bias. It is defined as: $MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = Var(\hat{\theta}) + [B(\hat{\theta})]^2$
* Examples:
* The sample mean $\bar{X}$ is an unbiased estimator for the population mean $\mu$.
* The sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ is an unbiased estimator for the population variance $\sigma^2$.
* The estimator $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$ is a biased estimator for $\sigma^2$, with $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$.
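The bias of the $\frac{1}{n}$ estimator can be checked by exact enumeration rather than simulation. The sketch below, assuming a Bernoulli(0.5) population and $n = 2$ (illustrative choices, not from the document), averages both variance estimators over all equally likely samples:

```python
from itertools import product
from statistics import mean

def s2_unbiased(xs):
    # divides by n - 1
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def s2_biased(xs):
    # divides by n
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

# X ~ Bernoulli(0.5), so sigma^2 = 0.25. Enumerate all 2^n equally
# likely samples of size n = 2 and average each estimator exactly.
n = 2
samples = list(product([0, 1], repeat=n))
E_unbiased = mean(s2_unbiased(s) for s in samples)
E_biased = mean(s2_biased(s) for s in samples)
print(E_unbiased)  # 0.25  = sigma^2          (unbiased)
print(E_biased)    # 0.125 = (n-1)/n * sigma^2 (biased)
```

Since $(n-1)/n = 1/2$ here, the biased estimator's expectation is exactly half of $\sigma^2 = 0.25$, matching the formula $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$.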
Efficiency
* Detailed explanation: An estimator is considered efficient if it has the smallest possible variance among all unbiased estimators. A lower variance indicates that the estimator's values are more tightly clustered around its expected value, leading to more precise estimates.
* Important formulas/equations:
* Relative Efficiency: For two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of $\theta$, the relative efficiency of $\hat{\theta}_1$ to $\hat{\theta}_2$ is: $Eff(\hat{\theta}_1, \hat{\theta}_2) = \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}$. If $Eff > 1$, $\hat{\theta}_1$ is more efficient.
* Cramer-Rao Lower Bound (CRLB): A theoretical lower bound for the variance of any unbiased estimator. For an unbiased estimator $\hat{\theta}$ of $\theta$, its variance must satisfy: $Var(\hat{\theta}) \ge \frac{1}{E[(\frac{\partial \ln L}{\partial \theta})^2]} = \frac{1}{-E[\frac{\partial^2 \ln L}{\partial \theta^2}]}$, where $L$ is the likelihood function.
* An efficient estimator is an unbiased estimator whose variance attains the CRLB.
* Example: For a normal distribution $N(\mu, \sigma^2)$, the sample mean $\bar{X}$ is an efficient estimator for $\mu$ because its variance $Var(\bar{X}) = \sigma^2/n$ achieves the CRLB for $\mu$.
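Relative efficiency can also be illustrated numerically. The sketch below (sample size, replication count, and seed are arbitrary choices) simulates normal samples and compares the variance of the sample median to that of the sample mean; for large $n$ the ratio approaches $\pi/2 \approx 1.57$:

```python
import random
from statistics import mean, median, pvariance

random.seed(0)
n, reps = 25, 4000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(mean(sample))
    medians.append(median(sample))

var_mean = pvariance(means)      # approx sigma^2 / n = 0.04
var_median = pvariance(medians)  # larger: the median is less efficient
print(var_mean, var_median, var_median / var_mean)
```

The ratio $Var(\text{median})/Var(\bar{X})$ comes out well above 1, confirming that $\bar{X}$ is the more efficient estimator of $\mu$ for normal data.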
Consistency
* Detailed explanation: An estimator $\hat{\theta}_n$ (based on a sample of size $n$) is consistent for a parameter $\theta$ if it converges in probability to $\theta$ as the sample size $n$ approaches infinity. This means that as more data is collected, the estimator becomes increasingly accurate and closer to the true parameter value.
* Important formulas/equations:
* An estimator $\hat{\theta}_n$ is consistent for $\theta$ if: $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \epsilon) = 1$ for any $\epsilon > 0$.
* A sufficient condition for consistency is that the estimator is asymptotically unbiased ($\lim_{n \to \infty} E(\hat{\theta}_n) = \theta$) and its variance approaches zero as $n \to \infty$ ($\lim_{n \to \infty} Var(\hat{\theta}_n) = 0$).
* Examples:
* The sample mean $\bar{X}$ is a consistent estimator for the population mean $\mu$ because it is unbiased and its variance $\sigma^2/n$ tends to zero as $n \to \infty$.
* The sample median is also a consistent estimator for the mean of a normal distribution, though typically less efficient than the sample mean.
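The consistency of $\bar{X}$ can be seen by tracking the running mean of one simulated data stream. A minimal sketch, with $\mu = 3$, $\sigma = 2$, and the seed chosen arbitrarily:

```python
import random
from statistics import mean

random.seed(42)
mu, sigma = 3.0, 2.0
data = [random.gauss(mu, sigma) for _ in range(100_000)]

# The estimation error |xbar - mu| shrinks as the sample grows,
# consistent with Var(Xbar) = sigma^2 / n -> 0.
for n in (10, 100, 1_000, 100_000):
    xbar = mean(data[:n])
    print(n, xbar, abs(xbar - mu))
```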
Sufficiency
* Detailed explanation: A statistic $T = T(X_1, ..., X_n)$ is sufficient for a parameter $\theta$ if it captures all the information about $\theta$ that is contained in the sample. In other words, once the value of the sufficient statistic is known, no additional information about $\theta$ can be obtained from the original sample data.
* Important formulas/equations:
* Factorization Theorem (Fisher-Neyman Criterion): A statistic $T$ is sufficient for $\theta$ if and only if the joint probability density function (or probability mass function) of the sample can be factored into two parts: $f(x_1, ..., x_n; \theta) = g(T(x_1, ..., x_n); \theta) h(x_1, ..., x_n)$, where $g$ depends on $\theta$ only through $T$, and $h$ does not depend on $\theta$.
* Minimal Sufficient Statistic: A sufficient statistic that is a function of every other sufficient statistic. It represents the most compressed form of the data that still retains all information about the parameter.
* Examples:
* For a Bernoulli distribution with parameter $p$, the sum of the observations $T = \sum X_i$ (number of successes) is a sufficient statistic for $p$.
* For a normal distribution $N(\mu, \sigma^2)$ with known $\sigma^2$, the sum of observations $\sum X_i$ (or equivalently, $\bar{X}$) is sufficient for $\mu$.
* For a normal distribution $N(\mu, \sigma^2)$ with both $\mu$ and $\sigma^2$ unknown, the statistics $(\sum X_i, \sum X_i^2)$ are jointly sufficient for $(\mu, \sigma^2)$.
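The Bernoulli example can be verified directly from the definition: the conditional probability of a particular sample arrangement given $T = \sum X_i$ must not depend on $p$. A small sketch (the sample and the two $p$ values are arbitrary illustrations):

```python
from math import comb

def cond_prob(x, p):
    # P(X1..Xn = x | T = t) for iid Bernoulli(p):
    # joint prob p^t (1-p)^(n-t) divided by P(T = t) = C(n,t) p^t (1-p)^(n-t),
    # so the powers of p cancel and the answer is 1 / C(n, t).
    n, t = len(x), sum(x)
    joint = p ** t * (1 - p) ** (n - t)
    marginal = comb(n, t) * p ** t * (1 - p) ** (n - t)
    return joint / marginal

x = [1, 0, 1, 1, 0]
# both values equal 1 / C(5, 3) = 0.1, independent of p
print(cond_prob(x, 0.3), cond_prob(x, 0.8))
```

Because the conditional distribution of the sample given $T$ is free of $p$, knowing $T$ exhausts the sample's information about $p$, which is exactly the definition of sufficiency.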
Methods of Estimation
The document describes two primary methods for constructing point estimators:
Method of Moments (MOM)
* Detailed explanation: The Method of Moments is a technique for estimating parameters by equating sample moments to their corresponding population moments. The number of moments used is equal to the number of parameters to be estimated.
* Key points:
* Population Moments: Theoretical moments of the probability distribution (e.g., $k$-th raw moment $\mu'_k = E(X^k)$).
* Sample Moments: Empirical moments calculated from the observed sample data (e.g., $k$-th raw sample moment $m'_k = \frac{1}{n}\sum X_i^k$).
* Steps:
1. Determine the number of parameters to be estimated.
2. Express the first $k$ population moments in terms of the unknown parameters.
3. Calculate the first $k$ sample moments from the given data.
4. Equate the population moments to their corresponding sample moments.
5. Solve the resulting system of equations for the unknown parameters.
* Properties of MOM Estimators: MOM estimators are generally consistent and asymptotically normal, but they are not necessarily unbiased or efficient in finite samples.
* Examples:
* For a normal distribution $N(\mu, \sigma^2)$, the MOM estimator for $\mu$ is $\hat{\mu}_{MOM} = \bar{X}$.
* For a Poisson distribution with parameter $\lambda$, the MOM estimator for $\lambda$ is $\hat{\lambda}_{MOM} = \bar{X}$.
* For a Gamma distribution with parameters $\alpha$ and $\beta$, the MOM estimators are $\hat{\alpha}_{MOM} = \frac{\bar{X}^2}{S^2}$ and $\hat{\beta}_{MOM} = \frac{S^2}{\bar{X}}$ (where $S^2$ is the unbiased sample variance).
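The Gamma case can be coded directly. A minimal sketch (the data values are made up for illustration; `variance` uses the unbiased $n-1$ denominator, matching the document's convention):

```python
from statistics import mean, variance

def gamma_mom(xs):
    # Matching E(X) = alpha*beta and Var(X) = alpha*beta^2 to their
    # sample counterparts gives alpha = xbar^2 / s2 and beta = s2 / xbar.
    xbar, s2 = mean(xs), variance(xs)  # variance() uses the n-1 denominator
    return xbar * xbar / s2, s2 / xbar

data = [2.1, 3.8, 1.4, 5.0, 2.7, 4.2, 3.3, 2.9]  # hypothetical observations
alpha_hat, beta_hat = gamma_mom(data)
print(alpha_hat, beta_hat)
# sanity check: the fitted mean alpha*beta reproduces the sample mean
print(alpha_hat * beta_hat, mean(data))
```

By construction $\hat{\alpha}_{MOM}\hat{\beta}_{MOM} = \bar{X}$, so the fitted distribution reproduces the sample mean exactly; this is the moment-matching principle in action.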
Method of Maximum Likelihood (MLE)
* Detailed explanation: The Method of Maximum Likelihood is a powerful and widely used technique that chooses the parameter values that maximize the likelihood of observing the given sample data. It seeks the parameter values that make the observed data "most probable."
* Key points:
* Likelihood Function: For a random sample $X_1, ..., X_n$ from a distribution with PDF/PMF $f(x; \theta)$, the likelihood function is $L(\theta; x_1, ..., x_n) = \prod_{i=1}^n f(x_i; \theta)$. It is treated as a function of the parameter $\theta$ for fixed sample values.
* Often, it's easier to maximize the natural logarithm of the likelihood function (log-likelihood), $\ln L(\theta)$, because it converts products into sums, simplifying differentiation.
* Steps:
1. Write down the likelihood function $L(\theta)$.
2. Take the natural logarithm of the likelihood function, $\ln L(\theta)$.
3. Differentiate $\ln L(\theta)$ with respect to the parameter(s) $\theta$ and set the derivative(s) to zero (this is known as the score equation).
4. Solve the resulting equation(s) for $\theta$. The solution(s) represent the Maximum Likelihood Estimator(s) (MLEs), denoted as $\hat{\theta}_{MLE}$.
5. (Optional) Verify that the solution corresponds to a maximum by checking the second derivative.
* Properties of MLEs: MLEs are highly desirable estimators. They are often consistent, asymptotically unbiased, and asymptotically efficient (meaning they attain the CRLB for large samples). They also possess the invariance property: if $\hat{\theta}$ is the MLE for $\theta$, then $g(\hat{\theta})$ is the MLE for $g(\theta)$.
* Examples:
* For a Bernoulli distribution with parameter $p$, the MLE for $p$ is $\hat{p}_{MLE} = \bar{X}$.
* For a Poisson distribution with parameter $\lambda$, the MLE for $\lambda$ is $\hat{\lambda}_{MLE} = \bar{X}$.
* For a normal distribution $N(\mu, \sigma^2)$ with known $\sigma^2$, the MLE for $\mu$ is $\hat{\mu}_{MLE} = \bar{X}$.
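The Poisson example can be confirmed numerically: the log-likelihood, evaluated over a grid of candidate $\lambda$ values, peaks at the closed-form solution $\bar{X}$. A sketch assuming toy count data:

```python
from math import log, lgamma
from statistics import mean

def poisson_loglik(lam, xs):
    # ln L(lambda) = sum_i [ x_i ln(lambda) - lambda - ln(x_i!) ]
    return sum(x * log(lam) - lam - lgamma(x + 1) for x in xs)

data = [2, 0, 3, 1, 4, 2, 1, 3]  # hypothetical Poisson counts

# Closed form: solving the score equation d(ln L)/d(lambda) = 0
# gives lambda_hat = xbar.
lam_closed = mean(data)

# Numerical check: maximize ln L over a fine grid of lambda values.
grid = [k / 1000 for k in range(1, 10_001)]
lam_grid = max(grid, key=lambda lam: poisson_loglik(lam, data))
print(lam_closed, lam_grid)  # both 2.0 for this data
```

The grid maximizer agrees with $\bar{X}$, illustrating that the analytic and numerical routes to the MLE coincide.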
KEY DEFINITIONS AND TERMS
* Parameter: A numerical characteristic of a population that is typically unknown and needs to be estimated (e.g., population mean $\mu$, population variance $\sigma^2$, population proportion $p$).
* Statistic: A function of the sample observations that does not depend on any unknown parameters. It is a random variable whose value can be calculated from a sample (e.g., sample mean $\bar{X}$, sample variance $S^2$).
* Estimator: A statistic used to estimate an unknown population parameter. It is a rule or formula that tells us how to calculate an estimate from the sample data (e.g., $\bar{X}$ is an estimator for $\mu$).
* Estimate: The specific numerical value obtained from an estimator for a given sample. It is the actual value calculated from a particular set of sample data (e.g., if $\bar{X}=10$ for a sample, then 10 is the estimate of $\mu$).
* Unbiased Estimator: An estimator whose expected value is equal to the true value of the parameter it is estimating. On average, it neither overestimates nor underestimates the parameter.
* Biased Estimator: An estimator whose expected value is not equal to the true value of the parameter. It systematically overestimates or underestimates the parameter.
* Mean Squared Error (MSE): A measure of the overall quality of an estimator, quantifying the average squared difference between the estimator and the true parameter. It combines both the variance of the estimator and its squared bias.
* Efficient Estimator: An unbiased estimator that has the smallest possible variance among all unbiased estimators for a given parameter. It provides the most precise estimates.
* Consistent Estimator: An estimator that converges in probability to the true parameter value as the sample size increases indefinitely. It becomes more accurate with more data.
* Sufficient Statistic: A statistic that summarizes all the information about an unknown parameter contained in a sample. Once the sufficient statistic is known, no further information about the parameter can be extracted from the original sample.
* Likelihood Function: A function of the unknown parameter(s) that expresses the probability (or probability density) of observing the given sample data for different possible values of the parameter(s). It is maximized to find the Maximum Likelihood Estimator.
IMPORTANT EXAMPLES AND APPLICATIONS
- Estimating Population Mean ($\mu$):
* Sample Mean ($\bar{X}$): This is a consistently used example throughout the document. It is shown to be an unbiased estimator for $\mu$ ($E(\bar{X}) = \mu$). For a normal distribution, it is also an efficient estimator, achieving the Cramer-Rao Lower Bound. Furthermore, it is a consistent estimator, as its variance $\sigma^2/n$ tends to zero as $n \to \infty$. Both the Method of Moments and the Method of Maximum Likelihood yield $\bar{X}$ as the estimator for $\mu$ under various distributional assumptions (e.g., Normal, Poisson).
- Estimating Population Variance ($\sigma^2$):
* Unbiased Sample Variance ($S^2$): The estimator $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$ is a crucial example of an unbiased estimator for $\sigma^2$. The document provides a detailed derivation of $E(S^2) = \sigma^2$.
* Biased Sample Variance ($\hat{\sigma}^2$): The estimator $\hat{\sigma}^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$ is presented as a biased estimator for $\sigma^2$, with $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$. This highlights the importance of the $(n-1)$ denominator for unbiasedness.
- Estimating Bernoulli Parameter ($p$):