STAT201/211 - Summary

Xirius-Topics131-STAT201211.pdf Xirius AI

This document, "Xirius-Topics131-STAT201211.pdf," is a comprehensive educational module for a statistics course (STAT201/211) focusing on Sampling Distributions. It serves as a foundational text for understanding how sample statistics can be used to make inferences about population parameters. The document systematically introduces the concept of a sampling distribution as the probability distribution of a statistic, emphasizing its critical role in inferential statistics.

The module delves into the sampling distributions of various key statistics, including sample means, sample proportions, sample variances, and the differences or ratios of these statistics from two independent samples. For each statistic, it outlines its properties (mean, variance, standard error) and the conditions under which its sampling distribution can be approximated by known theoretical distributions such as the Normal, t, Chi-Square, and F distributions. A significant portion is dedicated to the Central Limit Theorem (CLT), explaining its power in allowing the use of the normal distribution for sample means even when the population distribution is not normal, provided the sample size is sufficiently large.

Through detailed explanations, specific formulas (including their derivations where applicable), and illustrative examples, the document equips students with the knowledge to calculate probabilities related to sample statistics. It covers the necessary standardization techniques (e.g., Z-scores, t-scores, Chi-Square values, F-ratios) and the appropriate degrees of freedom for each distribution. This material is essential for subsequent topics in inferential statistics, such as hypothesis testing and confidence interval estimation, by providing the theoretical basis for evaluating the reliability of sample-based conclusions about populations.

MAIN TOPICS AND CONCEPTS

Sampling Distribution of Sample Means ($\bar{X}$)

This section introduces the sampling distribution of the sample mean, $\bar{X}$, which is a crucial concept for estimating the population mean, $\mu$.

Definition: The sampling distribution of $\bar{X}$ is the probability distribution of the sample means computed from all possible samples of a given size $n$ drawn from a population.
Properties:

- Mean of the Sample Means: The expected value of the sample mean is equal to the population mean: $E(\bar{X}) = \mu$. This indicates that $\bar{X}$ is an unbiased estimator of $\mu$.

- Variance of the Sample Means: The variance of the sample mean is the population variance divided by the sample size: $Var(\bar{X}) = \frac{\sigma^2}{n}$.

- Standard Error of the Mean: The standard deviation of the sample mean is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$. This measures the variability of sample means around the population mean.
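Both properties follow from writing $\bar{X}$ as a scaled sum. The derivation below is a short sketch that assumes $X_1, \dots, X_n$ are independent draws sharing the mean $\mu$ and variance $\sigma^2$; the summary does not state this assumption explicitly.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\
Var(\bar{X}) &= Var\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
              = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i)
              = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{aligned}
```

Independence is what lets the variance of the sum split into a sum of variances in the second line.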

Central Limit Theorem (CLT): This fundamental theorem states:

- If the population from which the samples are drawn is normally distributed, then the sampling distribution of $\bar{X}$ is exactly normal for any sample size $n$.

- If the population is not normally distributed, the sampling distribution of $\bar{X}$ will be approximately normal if the sample size $n$ is sufficiently large (typically $n \ge 30$).
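The CLT can be checked empirically with a small simulation. The sketch below is illustrative only: the exponential population, the seed, and the number of replications are assumptions chosen for the demonstration, not values from the module.

```python
import random
import statistics

def simulate_sample_means(n, num_samples=5000, rate=1.0, seed=0):
    """Draw `num_samples` samples of size `n` from an Exponential(rate)
    population and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(num_samples)
    ]

# Exponential(rate=1) is strongly right-skewed, with mu = 1 and sigma = 1.
n = 30
means = simulate_sample_means(n)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_se = 1.0 / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {theoretical_se:.3f})")
```

Even though the population is far from normal, the simulated sample means should center near $\mu$ with a spread close to $\sigma/\sqrt{n}$; plotting a histogram of `means` would show the roughly bell-shaped form.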

Standardization: To use the standard normal distribution table, the sample mean $\bar{X}$ is standardized using the Z-score formula:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Cases for Population Variance ($\sigma^2$):

- $\sigma^2$ is known: If the population variance is known, the Z-statistic above is used. The sampling distribution of $\bar{X}$ is normal, either exactly (if the population is normal) or approximately (if $n \ge 30$, by the CLT).

- $\sigma^2$ is unknown: If the population variance is unknown, it is estimated by the sample variance $s^2$. In this case, if the population is normal, the statistic follows a t-distribution:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

This t-distribution has $df = n-1$ degrees of freedom. For large sample sizes ($n \ge 30$), the t-distribution approximates the standard normal distribution.
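As a minimal sketch of the unknown-variance case, the t statistic and its degrees of freedom can be computed directly from raw data. The function name `t_statistic` and the measurements below are hypothetical and do not come from the module.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, df) for t = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements, used only to illustrate the formula.
heights = [168.0, 172.0, 171.0, 169.0, 175.0, 173.0, 170.0, 174.0]
t, df = t_statistic(heights, mu0=170.0)
print(f"t = {t:.3f} with df = {df}")
```

The resulting value would then be compared against a t table with the reported degrees of freedom, since Python's standard library has no t-distribution CDF.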

Example: If $\mu=170$, $\sigma=10$, and $n=100$, find $P(\bar{X} > 172)$.

$Z = \frac{172 - 170}{10/\sqrt{100}} = \frac{2}{1} = 2$. $P(Z > 2) = 1 - P(Z \le 2) = 1 - 0.9772 = 0.0228$.
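The same calculation can be reproduced without a table. This is a small sketch using `statistics.NormalDist` from Python's standard library; the helper name `prob_sample_mean_exceeds` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(x, mu, sigma, n):
    """Return (z, P(X_bar > x)) when X_bar ~ Normal(mu, sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    z = (x - mu) / standard_error
    return z, 1.0 - NormalDist().cdf(z)

z, p = prob_sample_mean_exceeds(172, mu=170, sigma=10, n=100)
print(f"Z = {z:.2f}, P(X_bar > 172) = {p:.4f}")
```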

Sampling Distribution of Sample Proportions ($\hat{p}$)

This section focuses on the sampling distribution of the sample proportion, $\hat{p}$, used to estimate the population proportion, $p$.

Definition: The sample proportion $\hat{p} = X/n$, where $X$ is the number of "successes" in a sample of size $n$.
Properties:

- Mean of the Sample Proportions: $E(\hat{p}) = p$. $\hat{p}$ is an unbiased estimator of $p$.

- Variance of the Sample Proportions: $Var(\hat{p}) = \frac{p(1-p)}{n}$.

- Standard Error of the Proportion: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.

Normal Approximation: The sampling distribution of $\hat{p}$ can be approximated by a normal distribution if the sample size is sufficiently large, specifically when $np \ge 5$ and $n(1-p) \ge 5$ (some texts use $np \ge 10$ and $n(1-p) \ge 10$).
Standardization: The Z-score for sample proportions is:

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

Example: If $p=0.2$, $n=100$, find $P(\hat{p} > 0.25)$.

Conditions: $np = 20 \ge 5$, $n(1-p) = 80 \ge 5$.

$Z = \frac{0.25 - 0.2}{\sqrt{0.2(0.8)/100}} = \frac{0.05}{\sqrt{0.16/100}} = \frac{0.05}{0.04} = 1.25$.

$P(Z > 1.25) = 1 - P(Z \le 1.25) = 1 - 0.8944 = 0.1056$.
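The steps of this example (check the conditions, standardize, take the upper tail) can be sketched as a single function. The function name and the `threshold` parameter are assumptions made for illustration; the default of 5 mirrors the rule quoted in this section.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_exceeds(p_hat, p, n, threshold=5):
    """Return (z, P(sample proportion > p_hat)) under the normal approximation."""
    # Refuse to approximate when the expected counts are too small.
    if n * p < threshold or n * (1 - p) < threshold:
        raise ValueError("normal approximation conditions not met")
    standard_error = sqrt(p * (1 - p) / n)
    z = (p_hat - p) / standard_error
    return z, 1.0 - NormalDist().cdf(z)

z, prob = prob_sample_proportion_exceeds(0.25, p=0.2, n=100)
print(f"Z = {z:.2f}, P(p_hat > 0.25) = {prob:.4f}")
```

Passing `threshold=10` applies the stricter rule that some texts prefer.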

Sampling Distribution of Sample Variances ($s^2$)

This section describes the sampling distribution of the sample variance, $s^2$, used to estimate the population variance, $\sigma^2$.

Assumptions: The population from which the sample is drawn must be normally distributed.
Statistic: The statistic used for the sampling distribution of $s^2$ is the Chi-Square ($\chi^2$) statistic:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

This statistic follows a Chi-Square distribution with $df = n-1$ degrees of freedom.

Properties of Chi-Square Distribution:

- It is a non-negative distribution.

- It is positively skewed (skewed to the right).

- Its shape depends on its degrees of freedom ($df$).

- Mean: $E(\chi^2) = df$.

- Variance: $Var(\chi^2) = 2 \times df$.
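Combining the mean and variance of the Chi-Square distribution with the statistic $\chi^2 = (n-1)s^2/\sigma^2$ gives the mean and variance of $s^2$ itself. The following is a short worked step under the same normal-population assumption; it is derived here from the stated properties rather than taken from the module.

```latex
\begin{aligned}
E\!\left(\frac{(n-1)s^2}{\sigma^2}\right) = n-1
  &\;\Longrightarrow\; E(s^2) = \sigma^2, \\
Var\!\left(\frac{(n-1)s^2}{\sigma^2}\right) = 2(n-1)
  &\;\Longrightarrow\; \frac{(n-1)^2}{\sigma^4}\,Var(s^2) = 2(n-1)
  \;\Longrightarrow\; Var(s^2) = \frac{2\sigma^4}{n-1}.
\end{aligned}
```

The first line shows that $s^2$ is an unbiased estimator of $\sigma^2$; the second shows that its variability shrinks as $n$ grows.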

Example: If $\sigma^2 = 100$, $n=20$, find $P(s^2 > 150)$.

$\chi^2 = \frac{(20-1) \times 150}{100} = \frac{19 \times 150}{100} = 28.5$.

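With $df = 19$, $P(\chi^2 > 28.5) \approx 0.078$ (interpolated from the Chi-Square table, where the critical values $\chi^2_{0.10,\,19} = 27.204$ and $\chi^2_{0.05,\,19} = 30.144$ bracket 28.5, so the probability lies between 0.05 and 0.10).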
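A table lookup can be cross-checked numerically. The sketch below uses only Python's standard library and evaluates the Chi-Square upper tail through the series expansion of the regularized lower incomplete gamma function; the function name `chi2_upper_tail` is hypothetical, and the computed value may differ slightly from a table-based figure in the third decimal place.

```python
import math

def chi2_upper_tail(x, df, max_terms=500):
    """P(Chi2_df > x), computed as 1 - P(a, y) with a = df/2 and y = x/2, where
    P(a, y) = y^a e^(-y) / Gamma(a+1) * sum_{k>=0} y^k / ((a+1)...(a+k))."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, max_terms):
        term *= y / (a + k)
        total += term
        if term < 1e-15 * total:  # series has converged
            break
    # Work in logs to avoid overflow in y^a and Gamma(a+1).
    log_prefactor = a * math.log(y) - y - math.lgamma(a + 1.0)
    return 1.0 - math.exp(log_prefactor) * total

n, sigma2, s2 = 20, 100.0, 150.0
chi2_value = (n - 1) * s2 / sigma2
p = chi2_upper_tail(chi2_value, df=n - 1)
print(f"chi2 = {chi2_value:.1f}, P(s^2 > 150) = {p:.4f}")
```

The plain series is adequate for moderate values like these; for very large `x` relative to `df`, a continued-fraction form or a statistics library would be the more robust choice.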

Sampling Distribution of the Difference Between Two Sample Means ($\bar{X}_1 - \bar{X}_2$)

This section covers the sampling distribution of the difference between two independent sample means, used to compare two population means, $\mu_1 - \mu_2$.

Assumptions: Independent random samples are drawn from two populations.
Properties:

- Mean of the Difference: $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$.

- Variance of the Difference: $Var(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.

- Standard Error of the Difference: $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
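The variance of the difference is a sum, not a difference, of the two variances. A brief derivation, relying on the independence of the two samples so that the covariance term vanishes:

```latex
\begin{aligned}
Var(\bar{X}_1 - \bar{X}_2)
  &= Var(\bar{X}_1) + Var(\bar{X}_2) - 2\,Cov(\bar{X}_1, \bar{X}_2) \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2 \cdot 0
   = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.
\end{aligned}
```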

Normal Approximation:

- If both populations are normally distributed, the sampling distribution of $\bar{X}_1 - \bar{X}_2$ is exactly normal.

- If the populations are not normal but both sample sizes are large ($n_1 \ge 30$ and $n_2 \ge 30$), the distribution is approximately normal by the CLT.

Standardization: The Z-score for the difference of means is:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Example: If $\mu_1=170, \sigma_1=10, n_1=100$ and $\mu_2=165, \sigma_2=8, n_2=80$, find $P(\bar{X}_1 - \bar{X}_2 > 8)$.

$\mu_1 - \mu_2 = 5$. $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{10^2}{100} + \frac{8^2}{80}} = \sqrt{1 + 0.8} = \sqrt{1.8} \approx 1.3416$.

$Z = \frac{8 - 5}{1.3416} = \frac{3}{1.3416} \approx 2.236$.

$P(Z > 2.236) = 1 - P(Z \le 2.236) = 1 - 0.9873 = 0.0127$.
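This two-sample calculation can also be reproduced in a few lines. The sketch assumes known population standard deviations, as in the example; the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_difference_exceeds(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Return (z, P(X_bar1 - X_bar2 > d)) for two independent samples."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (d - (mu1 - mu2)) / standard_error
    return z, 1.0 - NormalDist().cdf(z)

z, p = prob_mean_difference_exceeds(
    8, mu1=170, sigma1=10, n1=100, mu2=165, sigma2=8, n2=80
)
print(f"Z = {z:.3f}, P(X_bar1 - X_bar2 > 8) = {p:.4f}")
```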

Sampling Distribution of the Difference Between Two Sample Proportions ($\hat{p}_1 - \hat{p}_2$)

This section deals with the sampling distribution of the difference between two independent sample proportions, used to compare two population proportions, $p_1 - p_2$.

Assumptions: Independent random samples are drawn from two binomial populations.
Properties:

- Mean of the Difference: $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$.

- Variance of the Difference: $Var(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$.

- Standard Error of the Difference: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.
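As a small numerical illustration of these properties, the mean and standard error of $\hat{p}_1 - \hat{p}_2$ can be computed as follows. The proportions and sample sizes are hypothetical, and the function name is an assumption made for this sketch.

```python
from math import sqrt

def proportion_difference_summary(p1, n1, p2, n2):
    """Return (mean, standard error) of p_hat1 - p_hat2 for independent samples."""
    mean_diff = p1 - p2
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return mean_diff, sqrt(variance)

# Hypothetical population proportions and sample sizes.
mean_diff, se_diff = proportion_difference_summary(p1=0.5, n1=100, p2=0.4, n2=150)
print(f"E(p_hat1 - p_hat2) = {mean_diff:.2f}, standard error = {se_diff:.4f}")
```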

Normal Approximation: The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal if all conditions for normal approximation of individual proportions are met: $n_1 p_1 \ge 5, n_1(1-p_1) \ge 5, n_2 p_2 \ge 5, n_2(1-p_2) \ge 5$.
Standardization: The Z-score for the difference
