



1. Today’s data

  • There was a study to compare the proportion of patients who were not abusing substances one year after the end of a treatment program. Nationwide, it is known that the rate of relapse after completing a substance abuse program is 50%. A team of researchers has tested a new substance abuse treatment program and found that, one year after completion of the program, 31 of the 50 individuals treated were not abusing substances. The researchers want to know whether their treatment program appears better than the existing programs.

  • If the new treatment were not better than the existing treatments, the researchers would have expected 50% of the patients to have relapsed by one year post treatment.

    • Our estimate for the success of the new program would be \(p = 31/50 = 0.62\).
    • Is this value far enough from 0.50 to indicate that the new treatment works better than the available treatments?
  • We can answer this by calculating a confidence interval (perhaps a 95% confidence interval) for the population proportion, \(\pi\), and determining whether the interval contains 0.50.

    • If the interval contains 0.50, we would have no evidence that this treatment works because random variability from sample to sample could explain why we got 0.62 when in reality the population proportion is equal to 0.50. We could also use another approach.
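    • For instance, a minimal sketch in R of this interval-based check, using binom.test() (whose conf.int component holds an exact Clopper-Pearson confidence interval):
# exact 95% confidence interval for the population proportion pi
binom.test(31, 50)$conf.int
# if 0.50 falls inside the printed interval, random variability alone could explain 0.62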



2. Sampling distribution of the sample proportion

  • The sample proportion, \(P=Y/n\), is a random variable.
  • It is a rescaling of the binomial random variable \(Y∼B(n, \pi)\).
  • If the new treatment had the same success rate as the standard treatment in preventing relapse within one year of the completion of treatment, the number of patients who did not relapse, \(Y\), would have a binomial distribution, \(Y∼B(50, 0.5)\).
  • The sampling distribution for the sample proportion \(P\) is a rescaling of the distribution of \(Y\): divide \(Y\) by 50 (or multiply \(Y\) by 1/50).
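  • Because \(P = Y/50\) takes the values \(0/50, 1/50, \ldots, 50/50\) with the same probabilities as \(Y = 0, 1, \ldots, 50\), the exact PMF of \(P\) under this assumption can be written down directly with dbinom(); a small sketch:
# exact sampling distribution of P = Y/50 when Y ~ B(50, 0.5)
pVals  <- (0:50) / 50                            # possible values of the sample proportion
pProbs <- dbinom(0:50, size = 50, prob = 0.5)    # Pr(P = k/50) = Pr(Y = k)
plot(pVals, pProbs, type = "h",
     xlab = "sample proportion", ylab = "probability")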



2.1 Simulation of the sampling distribution of the sample proportion

  • Use rbinom().
  • Simulate the sampling distribution with 5000 repeats of obtaining a random sample of 50 patients who are treated with the standard treatment for substance abuse (so with \(\pi = 0.5\) as the probability of no relapse within one year of completing treatment).
  • Create a vector of 5000 sample proportions, each determined from a random sample of 50 individuals from a population with a probability of success of 0.50.
sampProps <- rbinom(5000, size = 50, prob = 0.5) / 50
  • Plot these data and indicate where our observed sample proportion of 0.62 lies within this sampling distribution.
hist(sampProps, 
     xlab = "proportion of patients who did not relapse",
     ylab = "probability",
     main = "PMF for proportion of patients who did not relapse")

abline(v = 0.62, col = 2, lwd = 4)

  • The researchers claim that their treatment program for substance abuse is better than standard treatment programs.
    • Evidence that supports their claim would be values of the sample proportion that are greater than 0.50.
    • So, looking at the plot above, supporting values would be 0.62 or greater; the probability of getting a value of 0.62 or more is shaded in the histogram below.
h <- hist(sampProps, breaks = 20, plot = FALSE)
ccat <- cut(h$breaks, c(-Inf, 0.61, Inf))

plot(h, col = c("white", "indianred")[ccat],
     xlab = "proportion of patients who did not relapse", 
     ylab = "probability", 
     main = "PMF for proportion of patients who did not relapse")
abline(v = 0.62, col = 2, lwd = 4)

  • The probability of the shaded region above can be calculated exactly (a simulation-based check appears at the end of this section):

\[ p\text{-value} = \operatorname{Pr}(T \geq t) = 1 - \operatorname{Pr}(T < t) \]

  • In R (the %>% pipe used below requires the magrittr or dplyr package):
library(magrittr)
(result <- (1 - pbinom(30, 50, 0.5)) %>% round(3))
## [1] 0.059
  • The probability that the researchers would observe a sample proportion of 0.62 or greater if their treatment only worked as well as standard treatments is 0.059. At this point, there are two choices:
    • OPTION 1. The new treatment works as well as the standard treatment and the researchers got a relatively unlikely result (i.e. one that would happen 0.059 of the time).
    • OPTION 2. The assumption that the new treatment has the same success rate as the standard treatment is not correct.
  • Different people may arrive at different conclusions.
    • The chance of getting a sample proportion of 0.62 or greater, if the new treatment had the same success rate as standard therapy, is 0.059, i.e., about 6 in 100 times.
    • Some people would think that is a relatively rare event and so would be likely to believe OPTION 2 and conclude that the new treatment has a better success rate than the standard treatments.
    • Other people may believe that there is not enough information to conclude that the new treatment has a better success rate than standard treatment.
  • Can use this approach to determine whether there is evidence to support scientific claims.
    • Specifically, we can (1) determine the sampling distribution of the sample statistic under a reasonable assumption, and (2) determine whether the sample proportion we obtained is unusual under that assumption.
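  • As a sanity check on the exact calculation above, a Monte Carlo sketch: the fraction of simulated sample proportions at or above 0.62 should be close to 0.059.
# Monte Carlo approximation of Pr(P >= 0.62) when pi = 0.5
sampProps <- rbinom(5000, size = 50, prob = 0.5) / 50
mean(sampProps >= 0.62)   # close to the exact value of about 0.059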



3. Hypothesis testing

  • Key concepts:
    • Hypothesis testing: a formal statistical procedure that quantifies the evidence that supports one of two mutually exclusive hypotheses.
    • Null hypothesis (i.e., \(H_0\)): represents the status quo, or the expected result.
      • Status quo depends on the experimental design.
      • (Single group) the status quo would be based on historical values.
      • (Two groups) the status quo would be that the two groups are the same.
    • Alternative hypothesis (i.e., \(H_a\)): represents the research hypothesis, what we really believe is true.
    • Test statistic: a numerical measure that quantifies the effect of interest.
    • Significance level: the threshold for how unlikely a result must be, under chance alone, before we conclude that the effect is not solely driven by chance.


3.1 Test statistic

  • Test statistic: a function of the data whose value determines the result of the test. The test statistic is a random variable.
    • Denote with \(X_1, ..., X_n\) a simple random sample from a distribution \(F_{\theta}\).
    • The function \(T(X_1, ..., X_n)\) is the test statistic.
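    • For the running substance abuse example, the observed value of the test statistic is simply the sample proportion; a minimal sketch:
# observed test statistic t = T(y1, ..., yn) = y/n for the substance abuse data
y <- 31; n <- 50
(t_obs <- y / n)   # 0.62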


3.2 The null distribution

  • Null distribution: the distribution of the test statistic if the null hypothesis is true.
  • Can think of the null distribution as the reference distribution: we determine where the observed value of the test statistic is within the sampling distribution under the null hypothesis.


3.3 \(p\)-value

  • the probability, under the null hypothesis, of observing a value as unlikely or more unlikely than the value of the test statistic observed in the sample.
  • a measure of the distance from the observed statistic to the value of the parameter specified by the null hypothesis.
  • the probability that chance alone would produce a test statistic as extreme as the one observed.
  • (For \(p\)-value calculation) need to know (or estimate) the CDF of the test statistic under the null distribution.
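  • In the substance abuse example, the CDF under the null distribution is available through pbinom(); a minimal sketch:
# F(y) = Pr(Y <= y) under the null distribution Y ~ B(50, 0.5)
pbinom(30, size = 50, prob = 0.5)       # Pr(Y <= 30)
1 - pbinom(30, size = 50, prob = 0.5)   # Pr(Y >= 31), the one-sided p-value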


\(p\)-value is NOT the probability that the null hypothesis is true.


3.4 Significance level

  • We define a result as statistically significant if the value of the test statistic is unlikely to occur by chance if the null hypothesis is true.
    • “unlikely” can be defined as significance level, \(\alpha\).
  • Should be set a priori, meaning at the time the study is designed.
  • Common expression: “a result is statistically significant at level \(\alpha\) if the \(p\)-value < \(\alpha\).”


3.5 Steps of Hypothesis Testing

  • <1> Decide on the effect that you want to study. Select a suitable test statistic to measure the effect.
  • <2> Set up a null hypothesis, which is a simple, tractable model that allows you to compute the null distribution.
  • <3> State the alternative hypothesis. This will be either one-sided or two-sided.
  • <4> Do the experiment, collect the data, and compute the observed value of the test statistic.
  • <5> Make a decision: reject the null hypothesis if the \(p\)-value < \(\alpha\), significance level.
  • <6> Write a conclusion using words in context of the study that generated the data. This should contain an indication of whether the result is statistically significant (report a \(p\)-value) and should give an estimate of the result (including a confidence interval).



4. Hypothesis test example

  • The researchers want to know if their substance abuse treatment program has a greater proportion of patients who do not relapse within the first year after the completion of treatment than the currently available standard treatment.
    • \(\alpha\)=0.05
    • \(H_0: \pi=0.5\)
      • This is the null hypothesis because the status quo is the currently used standard treatment program, which has a historical value of 0.50 for the proportion of patients who do not relapse within the first year after treatment.
    • \(H_a: \pi>0.5\)
      • This is the alternative hypothesis because the researchers believe that their new treatment program is better than currently used standard.
      • The new treatment program would be deemed better if the proportion of patients who do not relapse within the first year after the completion of treatment is greater than the proportion for the current standard treatment.
    • The estimator for the population proportion, \(\pi\), is the sample proportion, and its sampling distribution is a rescaled binomial PMF.
    • Test-statistic: \(T(Y)=Y/50\)
    • Observed value of the statistic is \(t=0.62\).
    • \(T\) has a rescaled binomial PMF and, under \(H_0: \pi=0.5\), the null distribution of the test statistic is a rescaled \(B(50, 0.5)\) PMF with a rescaling value of \(1/50\) (meaning we divide the binomial random variable, \(Y\), by 50).
    • This is a one-sided test: the researchers want to reject \(H_0\) in favor of \(\pi>0.5\).
    • \(p\)-value = Pr\((T \ge t)\) = 1 - Pr\((T < t)\), where the probability is computed from the CDF of the rescaled binomial distribution.
      • \(p\)-value can be calculated by using pbinom() in R.
      • Pr\((Y \ge 31)\) = 1 - Pr\((Y < 31)\) = 1 - Pr\((Y \le 30)\)
(result <- (1 - pbinom(30, 50, 0.5)) %>% round(3))
## [1] 0.059
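  • Equivalently, as a check, summing the null PMF over the region \(Y \ge 31\) gives the same value (a small sketch):
# p-value as a direct sum of the binomial PMF over the region Y >= 31
sum(dbinom(31:50, size = 50, prob = 0.5))   # approximately 0.059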
  • In this case, \(p\)-value = 0.059 is greater than the significance level, \(\alpha=0.05\).
  • We do not have enough evidence to reject the null hypothesis \(H_0\).
  • Conclusion: there is not enough evidence to conclude that the proportion of patients who do not relapse within the first year after treatment completion with the new treatment program is greater than 0.50 (i.e., better than the standard treatment program).
# null distribution of Y ~ B(50, 0.5); the test statistic T = Y/50 is just a rescaling
plot(0:50, dbinom(0:50, 50, 0.5), col = "black", type = "h", lwd = 2,
     xlab = "T", ylab = "PMF", main = "P-value")
# highlight the region Y >= 31, whose total probability is the p-value
points(31:50, dbinom(31:50, 50, 0.5), type = "h", lwd = 6, col = "indianred")

  • The sum of the probabilities of the red region is the \(p\)-value.



5. Key Points

  • (\(p\)-value < \(\alpha\)) the data show strong evidence against the null hypothesis, hence we reject the null hypothesis.
  • (\(p\)-value \(\ge\) \(\alpha\)) the data do NOT show strong evidence against the null hypothesis. The evidence is consistent with the null hypothesis. Thus, we fail to reject the null hypothesis.


  • Note that we never accept the null hypothesis. (There is no guarantee that the null hypothesis is true!)
  • All we can conclude if \(p\)-value > \(\alpha\) is that there is no evidence to reject the null hypothesis in favor of the alternative.
  • This does not mean that there is evidence in favor of the null hypothesis.
  • (\(p\)-value > \(\alpha\)) This does not mean the null hypothesis is true. No evidence against the null hypothesis is not evidence for the null hypothesis.



6. Why do we do an indirect test?

  • The problem arises because the researchers believe that their treatment for substance abuse is better than the standard treatment.

  • This means they believe that the proportion of patients who have not relapsed within one year of treatment completion is greater than 0.5.

  • But to calculate a probability that this is true, based on our test-statistic, we need to know the sampling distribution under this hypothesis.

  • However, it is only stated that the population proportion, \(\pi\), is believed to be greater than 0.50; no specific value is given.

  • Hence, we do not know which rescaled binomial random variable and distribution would be the one for the sampling distribution.

  • This is why we take an indirect approach.

  • If we assume the status quo (i.e., the null hypothesis) is true, then we can compute the sampling distribution.

  • But that leaves us with the indirect approach: we assume the null hypothesis is true and then assess the amount of evidence against this assumption on the basis of a \(p\)-value.

  • If the \(p\)-value is really small, we can either believe:

    • <1> we got a rare or unusual sample, or
    • <2> the null hypothesis is not true.
  • If the \(p\)-value is “small enough”, we conclude <2>, the null hypothesis is not true.

    • “small enough” means below the significance level, \(\alpha\).


Court example

  • \(H_0\): The person on trial is innocent.
  • \(H_a\): The prosecution believes the person is guilty.
  • Data: The evidence is presented.
  • Reject \(H_0\): The jury will convict if they believe the individual is guilty “beyond reasonable doubt,” which corresponds to \(p\)-value < \(\alpha\).


6.1 Convention

  • It is the convention when performing tests of significance to do a two-sided test.


6.2 \(p\)-value and confidence interval

  • It is usually expected that the significance level \(\alpha\) you use for hypothesis testing in an analysis is the same one you use in computing a confidence interval.
  • (One-sided test) you need to report the corresponding one-sided confidence interval. There are two choices, as sketched below.
    • The population parameter is greater than the value hypothesized in the null hypothesis (i.e., \(H_a: \pi>0.5\)).
      • CI: (Lower Limit, 1)
    • The population parameter is less than the value hypothesized in the null hypothesis (i.e., \(H_a: \pi<0.5\)).
      • CI: (0, Upper Limit)
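  • A small sketch of how binom.test() reports these one-sided intervals for the substance abuse data (the alternative argument determines which limit is free):
# one-sided 95% confidence intervals for pi
binom.test(31, 50, p = 0.5, alternative = "greater")$conf.int   # (Lower Limit, 1)
binom.test(31, 50, p = 0.5, alternative = "less")$conf.int      # (0, Upper Limit)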


  • NEVER just report a \(p\)-value alone.
  • ALWAYS give the estimate of the population parameter and corresponding confidence interval in addition.



7. Test for the proportion of one population using R

  • Let \(Y\) be a binomial random variable, \(Y \sim B(n, \pi)\). Suppose that we observe \(Y=y\). To test
    • \(H_0: \pi= \pi_0\)
    • \(H_a: \pi \ne \pi_0\)
  • We compute the probability of observing a value as or more extreme than the observed \(y\).
  • This is also called the exact binomial test.


7.1 Substance abuse treatment program

  • R: binom.test()
binom.test(31, 50, p = 0.5, alternative = 'greater')
## 
##  Exact binomial test
## 
## data:  31 and 50
## number of successes = 31, number of trials = 50, p-value = 0.05946
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.4939593 1.0000000
## sample estimates:
## probability of success 
##                   0.62



8. Complete example

The goal was to lose at least 5 pounds by week 6 of the study. A person was classified as ‘positive’ if s/he had lost 5 or more pounds by week 6 of the study, or classified as ‘negative’ if s/he had not lost 5 pounds by week 6. There were a total of 35 individuals enrolled on the study. The data are in an Excel spreadsheet in a comma delimited format. You are asked to analyze the data to determine whether the diet is successful. The investigators would consider the diet to be successful if more than 60% of the 35 individuals are considered ‘positive’, using a significance level of \(\alpha\) = 0.01. It is known that with the best diet to date, 60% of participants lose 5 pounds by week 6. For this study, patients were weighed at baseline and then at week 6. If at week 6 an individual weighed at least 5 pounds less than at baseline, they were labelled ‘positive’; otherwise they were labelled ‘negative.’ The data are in a .csv file called dietData.csv.

dietData <- read.csv('dietData.csv', header = T)
dim(dietData)
## [1] 35  2
names(dietData)
## [1] "id" "x"
dietData %>% head
##   id        x
## 1  1 positive
## 2  2 positive
## 3  3 positive
## 4  4 positive
## 5  5 positive
## 6  6 positive
summary(dietData)
##        id            x            
##  Min.   : 1.0   Length:35         
##  1st Qu.: 9.5   Class :character  
##  Median :18.0   Mode  :character  
##  Mean   :18.0                     
##  3rd Qu.:26.5                     
##  Max.   :35.0
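  • Before testing, it can help to tabulate the outcome variable (a quick sketch; the ‘positive’ count is the y used below):
# count how many individuals were labelled 'positive' vs 'negative'
table(dietData$x)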
  • To perform a hypothesis test, it is useful to explicitly state (for yourself) what the null hypothesis and alternative hypotheses are.
    • \(H_0: \pi=0.60\); the proportion of individuals who lose 5 or more pounds by week 6 is equal to 0.60
    • \(H_a:\pi\ne0.60\); the proportion of individuals who lose 5 or more pounds by week 6 is not equal to 0.60
y <- sum(dietData$x == "positive")   # number of individuals labelled 'positive'
testResults <- binom.test(y, 35, p = 0.6)
testResults$conf.int
## [1] 0.6635017 0.9343782
## attr(,"conf.level")
## [1] 0.95
  • \(p\)-value: 0.005
  • 95% confidence interval for \(\pi\) is 0.664 to 0.934
  • Based on the \(p\)-value, we would reject the null hypothesis because the \(p\)-value < 0.01.
  • Conclusion: It appears as though the new diet has a higher proportion of individuals who have lost 5 or more pounds at week 6 compared to baseline (\(p\)-value = 0.005). For the new diet, the proportion of patients who lost 5 or more pounds at week 6 compared to baseline was 0.829 (95% confidence interval: 0.664 to 0.934).
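  • A small sketch of pulling these reported quantities out of the binom.test() result, whose components include estimate, p.value, and conf.int:
# quantities used in the written conclusion
testResults$estimate               # sample proportion (about 0.829)
testResults$p.value                # p-value (about 0.005)
round(testResults$conf.int, 3)     # 95% confidence interval (about 0.664 to 0.934)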



