



1. Review binomial random variable

  • If \(Y\) = number of successes in \(n\) independent trials, each with probability of success \(\pi\), then \(Y\) is a binomial random variable.
    • \(Y \sim B(n, \pi)\)
    • PMF of \(Y\):

\[ p_{Y}(y)=\left(\begin{array}{l}n \\ y\end{array}\right) \pi^{y}(1-\pi)^{n-y}, \quad y=0,1, \ldots, n \]

    • Expected value of \(Y\):

\[ E(Y)=\sum_{y=0}^{n} y \times\left(\begin{array}{l}n \\ y\end{array}\right) \pi^{y}(1-\pi)^{n-y}=n \pi \]
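
  • A quick numerical check in R (a minimal sketch; n = 15 and p = 0.3 are arbitrary illustrative values):
n <- 15
p <- 0.3
sum(0:n * dbinom(0:n, size = n, prob = p))   # expectation computed directly from the PMF
n * p                                        # closed form E(Y) = n * pi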



2. Sample Proportion

  • The sample proportion, \(P = Y/n\), is a random variable. Its expected value is:

\[ E(P)=E\left(\frac{Y}{n}\right)=\frac{E(Y)}{n}=\frac{n \pi}{n}=\pi \]

  • This is an example of the method of moments.


Method of Moments

A way of estimating parameters, based on matching a moment of the data-generating distribution with the corresponding moment of the empirical distribution.
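
For the sample proportion, the moment being matched is the mean: set the observed count \(y\) equal to the first moment of \(Y\), \(E(Y)=n\pi\), and solve for \(\pi\):

\[ y \stackrel{\text { set }}{=} n \pi \quad \Longrightarrow \quad \hat{\pi}=\frac{y}{n}=p \]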


2.1 The Maximum Likelihood Estimator (MLE)

  • The maximum likelihood estimator (MLE) is the parameter value that maximizes the likelihood function.


2.2 The Likelihood Function

  • The likelihood function is the probability mass function (PMF) or probability density function (PDF) evaluated at the data \(X_1,...,X_n\), viewed as a function of the parameter.
    • Assume we have a set of discrete independent and identically distributed (i.i.d.) random variables, \(X_1,...,X_n\), whose distribution depends on a parameter \(\theta\).
    • Denote the PMF of each \(X_i\) as \(p(x \mid \theta)\).
    • The likelihood is then

\[ L(\theta)=\prod_{i=1}^{n} p\left(x_{i} \mid \theta\right) \]

  • But it is often more useful to compute the log-likelihood function:

\[ l(\theta)=\sum_{i=1}^{n} \log p\left(x_{i} \mid \theta\right) \]

  • View the above as a function of the parameter.
    • Thus, it is important to define the parameter space or the set of values that a parameter can take.


2.3 Example: Likelihood of a Binomial Sample

Suppose that we want to test whether a coin is fair, i.e., whether the probabilities that it lands on “heads” or “tails” are the same. We can flip the coin a few times, say \(n=15\), and see how many times it gives “heads” (\(x=1\)) or “tails” (\(x=0\)). Then \(Y=X_1+...+X_n\) is a binomial random variable, \(Y \sim B(n, \pi)\).

  • More formally, we have a series of i.i.d. Bernoulli random variables (which you can think of as a Binomial with \(n=1\)), \(X_1,...,X_n\), such that

\[ X_{i} \sim B(1, \pi) \]

  • We don’t know \(\pi\), since we don’t know whether the coin is fair, but we do know the observed values \(x_i\) because we performed the experiment.
    • We can compute the likelihood function of the \(n\) Bernoulli trials:

\[ L(\pi)=\prod_{i=1}^{n} \pi^{x_{i}}(1-\pi)^{1-x_{i}}, \quad x_{i} \in\{0,1\} \]

  • And the log-likelihood function is

\[ \begin{aligned} l(\pi) &=\sum_{i=1}^{n}\left[x_{i} \log \pi+\left(1-x_{i}\right) \log (1-\pi)\right] \\ &=n \log (1-\pi)+(\log \pi-\log (1-\pi)) \sum_{i=1}^{n} x_{i} \end{aligned} \]

  • R
library(magrittr)  # provides the %>% pipe used below
set.seed(1)
(x <- rbinom(15, size = 1, prob = .5))
##  [1] 0 0 1 1 0 1 1 1 1 0 0 0 1 0 1
loglik <- function(pi, data) {
    # log-likelihood of i.i.d. Bernoulli(pi) observations
    sum(log(dbinom(data, size = 1, prob = pi)))
}

loglik(pi = .5, data = x) %>% round(3)
## [1] -10.397
loglik(pi = .4, data = x) %>% round(3)
## [1] -10.906
  • Plot the log-likelihood over a range of \(\pi\) values
pis <- seq(0.1, 0.9, by = 0.01)
ll <- sapply(pis, loglik, data = x)
plot(pis, ll, type = 'l', col = 2, lwd = 2,
     xlab = expression(pi),
     ylab = 'log-likelihood')

  • The maximum likelihood estimator (MLE) is the parameter value that maximizes the likelihood (or log-likelihood) function.
    • Take the derivative of the log-likelihood function with respect to the parameter, \(\theta\).
    • Find the value of the parameter for which the derivative = 0.

\[ \begin{aligned} \frac{d l(\pi)}{d \pi} &=\frac{1}{\pi(1-\pi)} \sum_{i=1}^{n} x_{i}-\frac{n}{1-\pi} \\ &=\frac{\sum_{i=1}^{n} x_{i}-n \pi}{\pi(1-\pi)} \stackrel{\text { set }}{=} 0 \\ \Rightarrow \hat{\pi} &=\frac{1}{n} \sum_{i=1}^{n} x_{i} \end{aligned} \]

  • R
plot(pis, ll, type = 'l', col = 2, lwd = 2,
     xlab = expression(pi),
     ylab = 'log-likelihood')
abline(v = mean(x), lty = 2, lwd = 2)  # vertical line at the MLE, mean(x)
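
  • As a numerical check (a sketch, not part of the derivation above), the log-likelihood can also be maximized with base R's optimize(); the result should agree with the closed-form MLE, mean(x).
opt <- optimize(loglik, interval = c(0.001, 0.999), data = x, maximum = TRUE)
opt$maximum   # numerical maximizer of the log-likelihood
mean(x)       # closed-form MLE: the sample mean of the Bernoulli trials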



3. Properties of the estimators

3.1 Properties of estimators: Bias

  • An estimator, \(\hat{\theta}\), is a random variable for which we can compute mean and variance.
  • The bias of an estimator is defined as:

\[ \operatorname{Bias}=E[\hat{\theta}]-\theta \]

  • An estimator is unbiased when

\[ E[\hat{\theta}]=\theta \]


3.2 Mean Squared Error (MSE)

  • When we compare two estimators, we care about both bias and variance.
  • We may prefer a biased estimator over an unbiased one if the biased estimator has a smaller variance.
  • Bias-variance tradeoff
  • A measure to compare two estimators is the mean squared error (MSE), which combines bias and variance.

\[ M S E[\hat{\theta}]=E\left[(\hat{\theta}-\theta)^{2}\right]=\operatorname{Var}(\hat{\theta})+\operatorname{Bias}(\hat{\theta})^{2} \]


3.3 Consistency

  • An estimator, \(\hat{\theta}\), is consistent if, as \(n\) goes to infinity, it converges in probability to the true parameter value \(\theta\); that is, condition (1) below holds for any \(\varepsilon>0\).
    • There are two ways to establish consistency: directly from the definition (1), or by showing that the MSE goes to zero (2), which is sufficient for (1).

\[ \begin{array}{c}1) \lim _{n \rightarrow \infty} \operatorname{Pr}\left(\left|\hat{\theta}_{n}-\theta\right|>\varepsilon\right)=0 \\ 2) \lim _{n \rightarrow \infty} M S E\left(\hat{\theta}_{n}\right)=0\end{array} \]


3.4 Example: sample proportion

  • We know that \(E(P)=\pi\), so \(P\) is an unbiased estimator of \(\pi\).
  • The MSE of the sample proportion \(P\) follows directly.
  • Recall that \(\operatorname{Var}(P)=\frac{\pi(1-\pi)}{n}\), so

\[ M S E(P)=\operatorname{Var}(P)+\operatorname{Bias}(P)^{2}=\frac{\pi(1-\pi)}{n}+0 \]

  • This implies that an estimator of the population proportion based on a larger sample size will have a smaller MSE than one based on a smaller sample size.
  • To determine whether the sample proportion is a consistent estimator we note that as \(n\) goes to infinity,

\[ M S E\left(P_{n}\right)=\frac{\pi(1-\pi)}{n} \rightarrow 0 \]

  • Thus, the sample proportion is consistent.
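
  • A small simulation sketch (the true proportion, sample sizes, and number of replications are arbitrary illustrative choices) shows the empirical MSE of \(P\) tracking \(\pi(1-\pi)/n\) and shrinking toward 0 as \(n\) grows.
set.seed(2)
p_true <- 0.3
for (n in c(10, 100, 1000)) {
    p_hat <- rbinom(10000, size = n, prob = p_true) / n   # 10,000 simulated sample proportions
    cat("n =", n,
        "empirical MSE =", signif(mean((p_hat - p_true)^2), 3),
        "theoretical =", signif(p_true * (1 - p_true) / n, 3), "\n")
}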



4. Why Confidence Intervals?

4.1 Confidence Intervals

  • Statistics is not only about estimating the unknown quantities in the population, but also about estimating the uncertainty of the estimates.
  • Once we have an estimate of our parameter of interest, \(\hat{\theta}\), we want to construct a range of plausible values for the true value of \(\theta\).
  • We want to be confident, say at the 95% level, that the true parameter lies within a certain range (or interval).


Quantiles of the sampling distribution

  • Use the quantiles of the sampling distribution of \(\hat{\theta}\): the estimator falls between them with the stated probability, and this statement can then be inverted to give an interval for the parameter.

\[\operatorname{Pr}\left(q_{0.025} \leq \hat{\theta} \leq q_{0.975}\right)=0.95\]

  • We often know the exact or approximate distribution of \(\hat{\theta}\), so we can compute the quantiles to obtain the interval.


4.2 Exact 95% CI for the sample proportion

  • \(Y \sim B(n, \pi)\)
  • Use the quantiles of the sampling distribution of \(P\) to compute the confidence interval.

\[ \begin{aligned} 0.95 &=\operatorname{Pr}\left(q_{0.025} \leq P \leq q_{0.975}\right) \\ &=\operatorname{Pr}\left(q_{0.025} \leq \frac{Y}{n} \leq q_{0.975}\right) \\ &=\operatorname{Pr}\left(n q_{0.025} \leq Y \leq n q_{0.975}\right) \end{aligned} \]

  • R: the quantiles of \(Y\) are qbinom(0.025, n, $\pi$) and qbinom(0.975, n, $\pi$); dividing by \(n\) gives the corresponding quantiles of \(P\).
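    • For example (a sketch; n = 15 and \(\pi\) = 0.5 are assumed here purely for illustration):
n <- 15
prob <- 0.5
c(qbinom(0.025, n, prob), qbinom(0.975, n, prob))       # central quantile range for Y
c(qbinom(0.025, n, prob), qbinom(0.975, n, prob)) / n   # corresponding range for P = Y/n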

  • Determine the lower limit of the confidence interval, \(P_L\), and the upper limit, \(P_U\), by solving the equations below.

  • Upper limit:

\[ \sum_{k=0}^{y}\left(\begin{array}{l}n \\ k\end{array}\right) p_{U}^{k}\left(1-p_{U}\right)^{n-k}=\frac{0.05}{2}=0.025 \]

  • Lower limit:

\[ \sum_{k=0}^{y-1}\left(\begin{array}{l}n \\ k\end{array}\right) p_{L}^{k}\left(1-p_{L}\right)^{n-k}=1-\frac{0.05}{2}=0.975 \]

  • The interval (\(P_L\), \(P_U\)) is an exact 100(1-\(\alpha\))% CI for \(\pi\).
    • \(\alpha\) = 0.05
    • \(y\) = observed number of successes
    • \(n\) = number of trials
    • \(f_u\) = \(F(y, p_u, n) - \alpha/2\)
    • \(f_l\) = \(F(y-1, p_l, n) - (1-\alpha/2)\)


F is the cumulative distribution function (CDF) of the binomial distribution.


  • Find the value of \(p_u\) that corresponds to \(f_u=0\) and the value of \(p_l\) that corresponds to \(f_l=0\) using R.
ciLimits <- function(y, n, alpha) {
    # lower-limit equation: F(y - 1, n, p) - (1 - alpha/2) = 0
    fl <- function(p) {
        pbinom(y - 1, n, p) - (1 - alpha/2)
    }
    # upper-limit equation: F(y, n, p) - alpha/2 = 0
    fu <- function(p) {
        pbinom(y, n, p) - alpha/2
    }
    pl <- uniroot(fl, c(0.01, 0.99))   # solve fl(p) = 0 for the lower limit
    pu <- uniroot(fu, c(0.01, 0.99))   # solve fu(p) = 0 for the upper limit
    return(c(pl$root, pu$root))
}


4.3 Example: Number of heads in 15 coin flips

Suppose we are interested in determining whether a coin is fair and we flip it 15 times, observing 4 heads.

  • Point estimate: \(p=y/n=4/15=0.267\)
  • CI:

ciLimits(y = 4, n = 15, alpha = 0.10) %>% round(3)
## [1] 0.097 0.511
  • Conclusion: The 90% confidence interval is (0.097, 0.511). Since the interval contains 0.5, the results are consistent with a fair coin.
  • R binom.test(y, n)
    • \(y\) = observed number of successes
    • \(n\) = number of trials.
binom.test(x = 4, n = 15, conf.level = 0.90)
## 
##  Exact binomial test
## 
## data:  4 and 15
## number of successes = 4, number of trials = 15, p-value = 0.1185
## alternative hypothesis: true probability of success is not equal to 0.5
## 90 percent confidence interval:
##  0.09665833 0.51075189
## sample estimates:
## probability of success 
##              0.2666667


4.4 Interpretation of Confidence Interval

  • Be careful in interpreting the result.
  • \(\pi\) is a parameter, not a random variable
  • The randomness comes from \(P\), the sample proportion, which means that the boundaries of the interval are random.
    • Correct: there is a 95% chance that the (random) interval contains \(\pi\); the probability statement is about the interval-construction procedure.
    • Incorrect: there is a 95% chance that \(\pi\) is in the computed interval.
      • This is incorrect because the population proportion is a parameter, not a random variable. The value of the population proportion does not vary from sample to sample; only \(P\) and the CI vary from sample to sample (see the simulation sketch below).
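
  • A simulation sketch of the coverage interpretation (the true \(\pi\), sample size, and number of replications are arbitrary illustrative values): generate many samples, build a 95% CI from each, and check how often the fixed \(\pi\) is captured.
set.seed(3)
p_true <- 0.3; n <- 50; reps <- 2000
covered <- replicate(reps, {
    y  <- rbinom(1, size = n, prob = p_true)             # one simulated sample
    ci <- binom.test(y, n, conf.level = 0.95)$conf.int   # exact 95% CI from that sample
    ci[1] <= p_true && p_true <= ci[2]                   # does the interval cover p_true?
})
mean(covered)   # proportion of intervals covering p_true; close to (at least) 0.95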


4.5 One-sided Confidence Interval

  • There are two types of intervals: one-sided and two-sided.
    • A two-sided interval has both a lower and an upper bound. It usually assigns half of \(\alpha\) to the lower side and half to the upper side.
    • A one-sided interval is used when we are only interested in either the lower or the upper bound. In this case, all of \(\alpha\) goes to one side.
  • e.g. Suppose we wanted to know whether a coin was biased so that the probability of heads is greater than 0.50. As before, we flip the coin 15 times and observe 4 heads. In this case, we would be interested in having a lower bound for our confidence interval so we can see if the lower bound is greater than 0.50.
binom.test(4,15,conf.level=0.90, alternative="greater")
## 
##  Exact binomial test
## 
## data:  4 and 15
## number of successes = 4, number of trials = 15, p-value = 0.9824
## alternative hypothesis: true probability of success is greater than 0.5
## 90 percent confidence interval:
##  0.1217687 1.0000000
## sample estimates:
## probability of success 
##              0.2666667
    • Conclusion: We can see that this interval contains 0.50, which is consistent with the coin being fair; there is no evidence that the probability of heads is greater than 0.50.
  • e.g. Suppose we wanted to know whether a coin was biased so that the probability of heads is less than 0.50. As before, we flip the coin 15 times and observe 4 heads. In this case, we would be interested in having an upper bound for our confidence interval so we can see if the upper bound is less than 0.50.
binom.test(4,15,conf.level=0.90, alternative="less")
## 
##  Exact binomial test
## 
## data:  4 and 15
## number of successes = 4, number of trials = 15, p-value = 0.05923
## alternative hypothesis: true probability of success is less than 0.5
## 90 percent confidence interval:
##  0.0000000 0.4639709
## sample estimates:
## probability of success 
##              0.2666667
    • Conclusion: We can see that this interval does NOT contain 0.50, which is inconsistent with the coin being fair; there is evidence that the probability of heads is less than 0.50.



5. Normal distribution approximation of a binomial distribution

  • When \(n\), \(n \pi\), and \(n(1-\pi)\) are sufficiently large, then the binomial distribution is well approximated by the normal distribution.
\[ B(n, \pi) \approx N(n \pi, \sqrt{n \pi(1-\pi)}) \]
    • Here the normal distribution is written in terms of its mean \(n\pi\) and standard deviation \(\sqrt{n \pi(1-\pi)}\).


plot(0:500, dbinom(0:500, 500, 0.4),
     type = "h",
     col = "blue",
     xlab = "number of successes",
     ylab = "probability",
     main = " Binomial comparison to Normal")
lines(0:500, dnorm(0:500, 200, sqrt(500 * 0.4 * 0.6)), lwd = 4, col = "red")

plot(180:220, dbinom(180:220, 500, 0.4), type = "h", col = "blue", xlab = "number of successes", 
    ylab = "probability", main = " Binomial comparison to Normal")
lines(180:220, dnorm(180:220, 200, sqrt(500 * 0.4 * 0.6)), lwd = 4, col = "red")

  • The closer \(\pi\) is to 0.5, the better the normal approximation will be.
    • If \(n \le 50\), the approximation will not be good
    • Rule of thumb: \(n \pi(1-\pi) \ge 10\) (see the quick check below)
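
  • A quick check of the rule of thumb for the two settings used in these notes (a minimal sketch):
15 * 0.5 * (1 - 0.5)    # coin example, n = 15, pi = 0.5: 3.75, below 10, so the approximation is poor
500 * 0.4 * (1 - 0.4)   # plotting example above, n = 500, pi = 0.4: 120, well above 10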

5.1 Implications for the confidence interval for the sample proportion

  • If \(n \pi(1-\pi) \ge 10\) (i.e., sufficiently large), the sampling distribution of the sample proportion can be approximated by a normal distribution with mean \(\pi\) and standard deviation \(\sqrt{\pi(1-\pi) / n}\).
  • So for large \(n\) (i.e., \(n \pi(1-\pi) \ge 10\)), an approximate 100(1-\(\alpha\))% confidence interval for \(\pi\), can be determined with a normal distribution.

5.2 Example: Approximation of 95% Confidence Interval for the population proportion

\[ 0.95=\operatorname{Pr}\left(q_{0.025} \leq \frac{p-\pi}{\sqrt{p(1-p) / n}} \leq q_{0.975}\right) \]

  • The quantity given above has a standard normal distribution.
  • R
qnorm(0.025)
## [1] -1.959964
qnorm(0.975)
## [1] 1.959964

\[ 0.95=\operatorname{Pr}\left(p-q_{0.975} \sqrt{p(1-p) / n} \leq \pi \leq p+q_{0.975} \sqrt{p(1-p) / n}\right) \]

\[ 0.95=\operatorname{Pr}(p-1.96 \sqrt{p(1-p) / n} \leq \pi \leq p+1.96 \sqrt{p(1-p) / n}) \]

  • e.g. Suppose we have a random sample of 500 individuals from a population and that 212 of them are obese (i.e., have a BMI > 30).
  • (1) What is an estimate for the proportion of obese individuals in this population?
binom.test(212, 500, conf.level = 0.90)
## 
##  Exact binomial test
## 
## data:  212 and 500
## number of successes = 212, number of trials = 500, p-value = 0.0007798
## alternative hypothesis: true probability of success is not equal to 0.5
## 90 percent confidence interval:
##  0.3870533 0.4616173
## sample estimates:
## probability of success 
##                  0.424
  • (2) What is the 90% confidence interval obtained from a normal approximation?
(lower <- 0.424 - 1.645 * sqrt(0.424 * 0.576 / 500)) %>% round(3)
## [1] 0.388
(upper <- 0.424 + 1.645 * sqrt(0.424 * 0.576 / 500)) %>% round(3)
## [1] 0.46
(test <- prop.test(212, 500, conf.level = 0.90))
## 
##  1-sample proportions test with continuity correction
## 
## data:  212 out of 500, null probability 0.5
## X-squared = 11.25, df = 1, p-value = 0.0007962
## alternative hypothesis: true p is not equal to 0.5
## 90 percent confidence interval:
##  0.3871687 0.4616718
## sample estimates:
##     p 
## 0.424
  • Using prop.test(), the 90% CI is (0.387, 0.462).


The reason the R function gives more accurate estimates is that it uses a continuity correction. Specifically, the binomial distribution is discrete and the normal distribution is continuous, so a correction is needed to assign the area under the curve (AUC) to each mass point of the binomial.
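
A minimal sketch of the continuity correction (the cut-point 190 and the \(B(500, 0.4)\) setting are chosen purely for illustration): the normal approximation to a binomial CDF is more accurate when the cut-point is shifted by 0.5.

pbinom(190, size = 500, prob = 0.4)                   # exact binomial P(Y <= 190)
pnorm(190, mean = 200, sd = sqrt(500 * 0.4 * 0.6))    # normal approximation, no correction
pnorm(190.5, mean = 200, sd = sqrt(500 * 0.4 * 0.6))  # with the continuity correction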



