In studies involving time-to-event data, subjects often face multiple possible outcomes, each of which competes with the others. This situation is common in medical research. In a cancer study, for example, a patient might experience recurrence, death from the disease, or death from other causes.
Each of these outcomes represents a “competing risk,” and the presence of multiple potential outcomes can complicate the analysis. The challenge arises because the occurrence of one event may alter the probability of observing other events. For instance, if a patient experiences a recurrence, this could increase their likelihood of dying, which means that the times to recurrence and death are not independent.
When faced with multiple possible outcomes, there are two main statistical approaches to analyze the data:
Cause-Specific Hazard: This method focuses on the rate of a particular event among subjects who are still event-free, that is, who have not yet experienced the event of interest or any competing event. It answers the question, “Among those who are still event-free, what is the rate of the event of interest?”
Subdistribution Hazard: This approach looks at the rate of an event among subjects who have not yet experienced that event, keeping those who have already had a competing event in the risk set. Because of this, it relates directly to the cumulative incidence of the event in the presence of competing risks.
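The two definitions can be written more formally. The notation below is an assumption introduced here for illustration and does not appear in the original text: $T$ is the time to the first event, $D$ is the type of that event, and $k$ is the event of interest.

$$
\lambda_k^{\text{cs}}(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}
$$

$$
\lambda_k^{\text{sd}}(t) = \lim_{\Delta t \to 0} \frac{P\bigl(t \le T < t + \Delta t,\ D = k \mid T \ge t \ \text{or}\ (T < t,\ D \ne k)\bigr)}{\Delta t}
$$

The only difference is the conditioning set: the subdistribution hazard keeps subjects who have already had a competing event at risk. That is what ties it to the cumulative incidence function $F_k(t) = P(T \le t,\ D = k)$:

$$
F_k(t) = 1 - \exp\!\left(-\int_0^t \lambda_k^{\text{sd}}(u)\,du\right)
$$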
Each approach highlights different aspects of the data, and the choice between them should be driven by the research question. For example, to understand the effect of a treatment on a specific event, the cause-specific hazard might be more informative. On the other hand, if the goal is to evaluate the overall impact of a treatment, including the possibility of competing events, the subdistribution hazard might be preferable.
In an ideal scenario where events are independent (which is rarely the case), the cause-specific hazard approach provides an unbiased estimate. However, when events are dependent, the results can vary depending on the chosen method.
The cumulative incidence calculated using traditional methods, such as 1 minus the Kaplan-Meier estimate with competing events treated as censored, tends to overestimate the actual cumulative incidence. The extent of this overestimation depends on the rates of the events and their dependence. Therefore, it’s crucial to carefully select the appropriate method based on the specific research question and the nature of the data.
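To see where the overestimation comes from, here is a minimal base R sketch on a made-up dataset of five subjects. The data frame toy and all variable names are hypothetical. It compares 1 minus Kaplan-Meier, with the competing event treated as censored, against a hand-computed Aalen-Johansen estimate. The shortcut used for the risk-set sizes assumes distinct, sorted event times.

```r
# Hypothetical toy data: 0 = censored, 1 = event of interest, 2 = competing event
toy <- data.frame(
  time   = c(1, 2, 3, 4, 5),
  status = c(1, 2, 1, 0, 1)
)
toy <- toy[order(toy$time), ]

# Number at risk just before each time (valid because times are distinct)
n_at_risk <- rev(seq_len(nrow(toy)))

# Naive approach: 1 - Kaplan-Meier, treating the competing event as censored
d_interest <- as.numeric(toy$status == 1)
naive_cif  <- 1 - cumprod(1 - d_interest / n_at_risk)

# Aalen-Johansen: weight each event of interest by the probability of
# still being free of ANY event just before that time
d_any       <- as.numeric(toy$status != 0)
surv_any    <- cumprod(1 - d_any / n_at_risk)
surv_before <- c(1, head(surv_any, -1))
aj_cif      <- cumsum(surv_before * d_interest / n_at_risk)

data.frame(time = toy$time, naive_cif, aj_cif)
```

In this toy example the naive estimate is never below the Aalen-Johansen estimate, because subjects removed by the competing event are treated as if they could still experience the event of interest later.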
For instance, to demonstrate that a covariate is influencing the event of interest, cause-specific hazards are often preferred. On the other hand, if the goal is to establish overall benefit, for example when building predictive models or assessing health economics, subdistribution hazards might be more suitable.
The Melanoma Dataset
To illustrate these concepts, we’ll use the Melanoma dataset from the {MASS} package. This dataset includes variables such as:
time: Survival time in days (potentially censored)
status: Event status (1 = died from melanoma, 2 = alive, 3 = dead from other causes)
sex: 1 = male, 0 = female
age: Age in years
year: Year of operation
thickness: Tumor thickness in mm
ulcer: 1 = presence of ulceration, 0 = absence
Let’s start by loading the data and recoding the status variable for clarity:
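# Load the packages used in this tutorial
library(dplyr)
library(survival)
library(tidycmprsk)
library(ggsurvfit)
library(gtsummary)

# Load the Melanoma dataset
data(Melanoma, package = "MASS")
# Recode the status variable so that 0 = alive (censored) is the first factor level
Melanoma <-
  Melanoma %>%
  mutate(
    status = as.factor(recode(status, `2` = 0, `1` = 1, `3` = 2))
  )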
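A quick way to confirm that a recoding like this did what was intended is to cross-tabulate the original and recoded values. The base R sketch below works on a separate copy of the data so that it does not interfere with the recoded Melanoma object. The names raw, lookup, and status_cr are introduced here for illustration only.

```r
# Fresh copy with the original MASS coding:
# 1 = died from melanoma, 2 = alive, 3 = dead from other causes
raw <- MASS::Melanoma

# Base R equivalent of the recoding: look up each original code in a named vector
lookup    <- c("1" = 1, "2" = 0, "3" = 2)
status_cr <- factor(unname(lookup[as.character(raw$status)]), levels = c(0, 1, 2))

# Each original code should map to exactly one recoded level
table(original = raw$status, recoded = status_cr)
```

Each row of the cross-tabulation should have a single non-zero cell, and the first factor level (0) should hold the censored patients.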
Now, the status variable is recoded as:
status: 0 = alive, 1 = died from melanoma, 2 = dead from other causes
Let’s take a look at the first six records of the dataset:
head(Melanoma)
## time status sex age year thickness ulcer
## 1 10 2 1 76 1972 6.76 1
## 2 30 2 1 56 1968 0.65 0
## 3 35 0 1 41 1977 1.34 0
## 4 99 2 0 71 1968 2.90 0
## 5 185 1 1 52 1965 12.08 1
## 6 204 1 1 28 1971 4.84 1
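To estimate the cumulative incidence of an event in the presence of competing risks, we can use the cuminc() function from the {tidycmprsk} package. This function provides a non-parametric estimate of the cumulative incidence for each event type, taking into account the competing risks. It expects the status variable to be a factor whose first level indicates censoring, which is why status was converted with as.factor() above. The output below contains one block per event type, in factor-level order: first death from melanoma (status 1), then death from other causes (status 2).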
cuminc(Surv(time, status) ~ 1, data = Melanoma)
##
## time n.risk estimate std.error 95% CI
## 1,000 171 0.127 0.023 0.086, 0.177
## 2,000 103 0.230 0.030 0.174, 0.291
## 3,000 54 0.310 0.037 0.239, 0.383
## 4,000 13 0.339 0.041 0.260, 0.419
## 5,000 1 0.339 0.041 0.260, 0.419
##
## time n.risk estimate std.error 95% CI
## 1,000 171 0.034 0.013 0.015, 0.066
## 2,000 103 0.050 0.016 0.026, 0.087
## 3,000 54 0.058 0.017 0.030, 0.099
## 4,000 13 0.106 0.032 0.053, 0.179
## 5,000 1 0.106 0.032 0.053, 0.179
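As a cross-check, similar estimates can be obtained from the Aalen-Johansen estimator in survfit() from the {survival} package. This is a sketch that assumes both functions implement the same non-parametric estimator. The object mel and the column status_cr are names introduced here for illustration.

```r
library(survival)

# Work on a fresh copy with the original MASS coding:
# 1 = died from melanoma, 2 = alive (censored), 3 = dead from other causes
mel <- MASS::Melanoma
mel$status_cr <- factor(
  mel$status,
  levels = c(2, 1, 3),
  labels = c("censored", "melanoma", "other")
)

# With a factor status, survfit() treats the first level as censoring and
# returns Aalen-Johansen state probabilities for the remaining levels
aj <- survfit(Surv(time, status_cr) ~ 1, data = mel)

# Probability of being in each state at 1,000 and 2,000 days
aj_probs <- summary(aj, times = c(1000, 2000))$pstate
colnames(aj_probs) <- aj$states
round(aj_probs, 3)
```

The melanoma and other columns should be close to the estimate columns of the two blocks above, and each row should sum to 1 because every patient is either event-free or in one of the two death states.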
We can visualize the cumulative incidence using the ggcuminc() function from the {ggsurvfit} package. The plot below shows the cumulative incidence of death due to melanoma:
cuminc(Surv(time, status) ~ 1, data = Melanoma) %>%
  ggcuminc() +
  labs(
    x = "Days"
  ) +
  add_confidence_interval() +
  add_risktable()
To include both event types in the plot, specify the outcomes in the ggcuminc(outcome=) argument:
cuminc(Surv(time, status) ~ 1, data = Melanoma) %>%
  ggcuminc(outcome = c("1", "2")) +
  ylim(c(0, 1)) +
  labs(
    x = "Days"
  )
Next, let’s examine the cumulative incidence of death due to melanoma, stratified by ulceration status. We’ll use the tbl_cuminc() function from the {tidycmprsk} package to create a table of the cumulative incidence at 5 years (times = 1826.25 days, i.e. 5 × 365.25), and we’ll add Gray’s test to compare the groups.
cuminc(Surv(time, status) ~ ulcer, data = Melanoma) %>%
  tbl_cuminc(
    times = 1826.25,
    label_header = "**{time/365.25}-year cuminc**"
  ) %>%
  add_p()
Characteristic | 5-year cuminc | p-value¹ |
---|---|---|
ulcer | | <0.001 |
0 | 9.1% (4.6%, 15%) | |
1 | 39% (29%, 49%) | |

¹ Gray’s Test
We can also plot the cumulative incidence of death due to melanoma according to ulceration status:
cuminc(Surv(time, status) ~ ulcer, data = Melanoma) %>%
  ggcuminc() +
  labs(
    x = "Days"
  ) +
  add_confidence_interval() +
  add_risktable()
There are two main approaches to competing risks regression:
Cause-Specific Hazards: This approach estimates the instantaneous rate of the event of interest in individuals who are currently event-free. It’s typically done using Cox regression with the coxph() function from the {survival} package, treating competing events as censored.
Subdistribution Hazards: This method estimates the instantaneous rate of the event of interest among individuals who have not yet experienced it, including those who have already had a competing event. It can be estimated using Fine-Gray regression with the crr() function from the {tidycmprsk} package.
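The practical difference between the two approaches lies in who counts as being at risk. The base R sketch below illustrates this on a small made-up dataset without censoring. The data frame toy and its columns are hypothetical. With censoring, Fine-Gray regression additionally applies weights to subjects who had a competing event, which this sketch leaves out.

```r
# Hypothetical toy data without censoring:
# 1 = event of interest, 2 = competing event
toy <- data.frame(
  id     = 1:5,
  time   = c(1, 2, 3, 4, 5),
  status = c(1, 2, 1, 2, 1)
)

# Times at which the event of interest occurs
event_times <- toy$time[toy$status == 1]

risk_sets <- data.frame(
  time = event_times,
  # Cause-specific: only subjects who are still event-free
  n_cause_specific = sapply(event_times, function(t) sum(toy$time >= t)),
  # Subdistribution: event-free subjects PLUS those who already
  # had a competing event before t
  n_subdistribution = sapply(event_times, function(t) {
    sum(toy$time >= t | (toy$time < t & toy$status == 2))
  })
)
risk_sets
```

The subdistribution risk set shrinks more slowly because subjects with a competing event never leave it, which is why the two hazards, and the hazard ratios built from them, answer different questions.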
Let’s explore the effect of age and sex on the hazard of death due to melanoma, with death from other causes as a competing event. We’ll start with the subdistribution hazards approach:
crr(Surv(time, status) ~ sex + age, data = Melanoma)
##
## Variable Coef SE HR 95% CI p-value
## sex 0.588 0.272 1.80 1.06, 3.07 0.030
## age 0.013 0.009 1.01 0.99, 1.03 0.18
We can generate a table of the results using the tbl_regression() function from the {gtsummary} package, setting exp = TRUE to obtain hazard ratios:
crr(Surv(time, status) ~ sex + age, data = Melanoma) %>%
  tbl_regression(exp = TRUE)
Characteristic | HR¹ | 95% CI¹ | p-value |
---|---|---|---|
sex | 1.80 | 1.06, 3.07 | 0.030 |
age | 1.01 | 0.99, 1.03 | 0.2 |

¹ HR = Hazard Ratio, CI = Confidence Interval
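To make the link between the coefficient output and the hazard ratio table explicit, the sketch below redoes the arithmetic for sex by hand in base R. It uses the rounded coefficient and standard error printed by crr() and assumes a Wald-type (normal approximation) interval, so small differences from the table in the last digit are expected.

```r
# Rounded values taken from the crr() output for sex
coef_sex <- 0.588
se_sex   <- 0.272

# Hazard ratio: exponentiate the coefficient
hr <- exp(coef_sex)

# 95% Wald confidence interval, computed on the log scale and then exponentiated
ci <- exp(coef_sex + c(-1, 1) * qnorm(0.975) * se_sex)

# Two-sided Wald p-value
p_value <- 2 * pnorm(-abs(coef_sex / se_sex))

round(c(HR = hr, lower = ci[1], upper = ci[2], p = p_value), 3)
```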
Our analysis reveals that male sex (1 = male, 0 = female) is significantly associated with an increased subdistribution hazard of death due to melanoma (HR 1.80, 95% CI 1.06 to 3.07), while age does not show a significant association (p = 0.18).
If we were to use the cause-specific hazards approach, we would first need to censor patients who died from other causes, then apply Cox regression:
coxph(
  Surv(time, ifelse(status == 1, 1, 0)) ~ sex + age,
  data = Melanoma
) %>%
  tbl_regression(exp = TRUE)
Characteristic | HR¹ | 95% CI¹ | p-value |
---|---|---|---|
sex | 1.82 | 1.08, 3.07 | 0.025 |
age | 1.02 | 1.00, 1.03 | 0.056 |

¹ HR = Hazard Ratio, CI = Confidence Interval
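Both the cause-specific and subdistribution hazards approaches give similar results in this case, which is plausible given that death from other causes is relatively uncommon in this dataset (its estimated cumulative incidence stays at or below about 11% throughout follow-up).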
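In a cause-specific analysis it is common to fit a separate model for each event type. As a hedged sketch of what that could look like for the competing event (death from other causes), the code below censors everyone who did not die from other causes. It works on a fresh copy of the data with the original {MASS} coding. The names mel_raw and fit_other are introduced here for illustration.

```r
library(survival)

# Fresh copy with the original coding:
# 1 = died from melanoma, 2 = alive, 3 = dead from other causes
mel_raw <- MASS::Melanoma

# Cause-specific Cox model for death from other causes;
# melanoma deaths and survivors are both treated as censored
fit_other <- coxph(Surv(time, status == 3) ~ sex + age, data = mel_raw)

# Hazard ratios with 95% confidence intervals
exp(cbind(HR = coef(fit_other), confint(fit_other)))
```

With few deaths from other causes in this dataset, the resulting estimates are likely to be imprecise and should be read with caution.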
Competing risks analysis is a crucial tool in survival analysis, allowing researchers to account for multiple potential outcomes that can influence the event of interest. Understanding the differences between cause-specific and subdistribution hazards, and when to apply each, is essential for accurate data interpretation and decision-making in clinical research.
By carefully selecting the appropriate method, you can ensure that your analysis provides meaningful insights into the event of interest, considering the complexities introduced by competing risks.
For further reading and in-depth understanding, consider referring to the following references: