In studies involving time-to-event data, subjects often face multiple possible outcomes, each of which competes with the others. This situation is common in medical research: in a cancer study, for example, a patient might experience recurrence, death from the disease, or death from other causes.
Each of these outcomes represents a “competing risk,” and the presence of multiple potential outcomes can complicate the analysis. The challenge arises because the occurrence of one event may alter the probability of observing other events. For instance, if a patient experiences a recurrence, this could increase their likelihood of dying, which means that the times to recurrence and death are not independent.
When faced with multiple possible outcomes, there are two main statistical approaches to analyze the data:
Cause-Specific Hazard: This method focuses on the rate of a particular event among subjects who are still at risk, that is, those who have not yet experienced the event of interest or any competing event. It answers the question, “Among those who haven’t experienced a competing event, what is the rate of the event of interest?”
Subdistribution Hazard: This approach looks at the rate of an event while keeping subjects who have already experienced a competing event in the risk set. Because it is tied directly to the cumulative incidence of the event, it provides insight into the overall impact of an event in the presence of other competing risks.
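The two definitions above can be written more formally. The notation here is an assumption introduced for illustration and does not appear elsewhere in this text: T is the time to the first event, D is the type of that event, and k is the event of interest.

```latex
\begin{align*}
\lambda_k^{\mathrm{cs}}(t) &= \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}, \\
\lambda_k^{\mathrm{sd}}(t) &= \lim_{\Delta t \to 0}
  \frac{P\bigl(t \le T < t + \Delta t,\ D = k \mid T \ge t \ \text{or}\ (T < t,\ D \ne k)\bigr)}{\Delta t}, \\
F_k(t) &= P(T \le t,\ D = k)
  = \int_0^t S(u)\, \lambda_k^{\mathrm{cs}}(u)\, du
  = 1 - \exp\Bigl(-\int_0^t \lambda_k^{\mathrm{sd}}(u)\, du\Bigr),
\end{align*}
```

Here S(u) is the probability of remaining free of every event type up to time u, so it depends on all cause-specific hazards. The cumulative incidence F_k(t) is therefore a function of every cause-specific hazard at once, but of the subdistribution hazard for event k alone.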
Each approach highlights different aspects of the data, and the choice between them should be driven by the research question. For example, to understand the effect of a treatment on a specific event, the cause-specific hazard might be more informative. On the other hand, if the goal is to evaluate the overall impact of a treatment, including the possibility of competing events, the subdistribution hazard might be preferable.
In an ideal scenario where events are independent (which is rarely the case), the cause-specific hazard approach provides an unbiased estimate. However, when events are dependent, the results can vary depending on the chosen method.
The cumulative incidence calculated using traditional methods, such as 1 minus the Kaplan-Meier estimate with competing events treated as censored, tends to overestimate the actual cumulative incidence. The extent of this overestimation depends on the rates of the events and their dependence. Therefore, it’s crucial to carefully select the appropriate method based on the specific research question and the nature of the data.
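The overestimation can be seen in a small simulation. The sketch below is illustrative only: the data are simulated, and the event rates, sample size, and variable names are all assumptions chosen for the example. It computes the naive 1 minus Kaplan-Meier estimate and the standard non-parametric cumulative incidence estimate (often called the Aalen-Johansen estimator) by hand in base R.

```r
# Simulated data (hypothetical): two independent latent event times plus censoring
set.seed(42)
n  <- 500
t1 <- rexp(n, rate = 0.10)   # latent time to the event of interest
t2 <- rexp(n, rate = 0.15)   # latent time to the competing event
cz <- runif(n, 0, 15)        # censoring time

time   <- pmin(t1, t2, cz)
status <- ifelse(time == t1, 1L, ifelse(time == t2, 2L, 0L))  # 0 = censored

# Sort by time; the times are continuous, so there are no ties
ord     <- order(time)
time    <- time[ord]
status  <- status[ord]
at_risk <- n:1               # number at risk just before each ordered time

# Naive approach: treat competing events as censored, then take 1 - KM
km_naive  <- cumprod(1 - (status == 1) / at_risk)
naive_inc <- 1 - km_naive

# Cumulative incidence: overall event-free survival just before each time,
# multiplied by the cause-1 hazard increment at that time, then accumulated
surv_all  <- cumprod(1 - (status != 0) / at_risk)
surv_prev <- c(1, head(surv_all, -1))
cif_1     <- cumsum(surv_prev * (status == 1) / at_risk)

# Compare the two estimates at a few time points
eval_times <- c(2, 5, 10)
idx <- sapply(eval_times, function(t) max(which(time <= t)))
comparison <- data.frame(time = eval_times,
                         naive = naive_inc[idx],
                         cif = cif_1[idx])
print(round(comparison, 3))
```

The naive estimate is never smaller than the cumulative incidence estimate, because it weights each event by a survival curve that ignores the subjects removed by the competing event.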
For instance, to demonstrate that a covariate is influencing the event of interest, cause-specific hazards are often preferred. On the other hand, subdistribution hazards might be more suitable when the goal is to establish overall benefit, for example when building predictive models or assessing health economics.
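As a rough sketch of how the two approaches look in practice, the code below fits both kinds of model to the Melanoma data from {MASS}, which is introduced in the next section. The choice of covariates (sex and age), the helper column `status_f`, and the object names are assumptions made for illustration. The cause-specific model uses `coxph()` from {survival}; the subdistribution model uses the Fine-Gray regression provided by `crr()` in {tidycmprsk}.

```r
library(survival)     # coxph(), Surv()
library(tidycmprsk)   # crr() for Fine-Gray regression

data(Melanoma, package = "MASS")

# Event status as a factor whose first level is the censoring code,
# which is the form tidycmprsk expects (status_f is a hypothetical name)
mel <- Melanoma
mel$status_f <- factor(mel$status, levels = c(2, 1, 3),
                       labels = c("censored", "melanoma", "other"))

# Cause-specific hazard: deaths from other causes are treated as censored
cs_fit <- coxph(Surv(time, status == 1) ~ sex + age, data = mel)

# Subdistribution hazard (Fine-Gray) for the first event level, "melanoma"
fg_fit <- tidycmprsk::crr(Surv(time, status_f) ~ sex + age, data = mel)

# Hazard ratios from the two models, side by side
hr <- cbind(cause_specific  = exp(coef(cs_fit)),
            subdistribution = exp(coef(fg_fit)))
print(round(hr, 3))
```

The two columns answer different questions, so they need not agree: the first describes the rate of melanoma death among patients still alive, the second describes the effect on the cumulative incidence of melanoma death.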
Melanoma Dataset
To illustrate these concepts, we’ll use the Melanoma dataset from the {MASS} package. This dataset includes variables such as:
time: Survival time in days (potentially censored)
status: Event status (1 = died from melanoma, 2 = alive, 3 = dead from other causes)
sex: 1 = male, 0 = female
age: Age in years
year: Year of operation
thickness: Tumor thickness in mm
ulcer: 1 = presence of ulceration, 0 = absence
Let’s start by loading the data and recoding the status variable for clarity:
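# Load the packages used in this section
library(dplyr)       # mutate(), recode(), %>%
library(ggplot2)     # labs(), ylim()
library(tidycmprsk)  # cuminc(), Surv()
library(ggsurvfit)   # ggcuminc() and plot helpers
# Load the Melanoma dataset
data(Melanoma, package = "MASS")
# Recode the status variable so that 0 is the censoring code
Melanoma <-
  Melanoma %>%
  mutate(
    status = as.factor(recode(status, `2` = 0, `1` = 1, `3` = 2))
  )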
Now, the status variable is recoded as:
status: 0 = alive, 1 = died from melanoma, 2 = dead from other causes
Let’s take a look at the first six records of the dataset:
head(Melanoma)
##   time status sex age year thickness ulcer
## 1   10      2   1  76 1972      6.76     1
## 2   30      2   1  56 1968      0.65     0
## 3   35      0   1  41 1977      1.34     0
## 4   99      2   0  71 1968      2.90     0
## 5  185      1   1  52 1965     12.08     1
## 6  204      1   1  28 1971      4.84     1
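To estimate the cumulative incidence of an event in the presence of competing risks, we can use the cuminc() function from the {tidycmprsk} package. This function provides a non-parametric estimate of the cumulative incidence for each event type, taking into account the competing risks. The output contains one table per event type: the first is for death from melanoma (status 1) and the second is for death from other causes (status 2).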
cuminc(Surv(time, status) ~ 1, data = Melanoma)
##
## time    n.risk   estimate   std.error   95% CI
## 1,000   171      0.127      0.023       0.086, 0.177
## 2,000   103      0.230      0.030       0.174, 0.291
## 3,000   54       0.310      0.037       0.239, 0.383
## 4,000   13       0.339      0.041       0.260, 0.419
## 5,000   1        0.339      0.041       0.260, 0.419
##
## time    n.risk   estimate   std.error   95% CI
## 1,000   171      0.034      0.013       0.015, 0.066
## 2,000   103      0.050      0.016       0.026, 0.087
## 3,000   54       0.058      0.017       0.030, 0.099
## 4,000   13       0.106      0.032       0.053, 0.179
## 5,000   1        0.106      0.032       0.053, 0.179
We can visualize the cumulative incidence using the ggcuminc() function from the {ggsurvfit} package. By default only the first event type is drawn, so the plot below shows the cumulative incidence of death due to melanoma:
cuminc(Surv(time, status) ~ 1, data = Melanoma) %>%
  ggcuminc() +
  labs(
    x = "Days"
  ) +
  add_confidence_interval() +
  add_risktable()
To include both event types in the plot, specify the outcomes in the ggcuminc(outcome=) argument:
cuminc(Surv(time, status) ~ 1, data = Melanoma) %>%
  ggcuminc(outcome = c("1", "2")) +
  ylim(c(0, 1)) +
  labs(
    x = "Days"
  )