Introduction to Competing Risk Analysis

In studies involving time-to-event data, subjects often face multiple possible outcomes, each of which competes with the others. This situation is common in medical research, where patients might experience various events such as recurrence, death from the disease, or death from other causes. For example, in a study on cancer, potential events could include:

  • Recurrence of cancer
  • Death due to cancer
  • Death from unrelated causes
  • Response to treatment

Each of these outcomes represents a “competing risk,” and the presence of multiple potential outcomes complicates the analysis. The core difficulty is that some events prevent others from being observed: a patient who dies from an unrelated cause can no longer experience a recurrence or die of cancer. Events can also be dependent. For instance, a recurrence may increase a patient’s risk of dying, so the times to recurrence and to death are not independent.
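A minimal sketch of how such data end up being recorded, restricted to the two death outcomes (which are mutually exclusive) plus the end of follow-up. The latent times are hypothetical values chosen for illustration: each is the time at which that event would happen if nothing else happened first, and only the earliest one is ever observed.

```r
# Hypothetical latent times (days) at which each event would occur if nothing
# else happened first; Inf means the event would never occur
latent <- data.frame(
  cancer_death = c(700, 1200, Inf),
  other_death  = c(Inf, 300, 500),
  censor       = c(2000, 2000, 2000)  # administrative end of follow-up
)

# Only the earliest latent time is observed ...
obs_time <- apply(latent, 1, min)

# ... together with which event it was; the later latent times are never seen
obs_cause <- colnames(latent)[apply(latent, 1, which.min)]

observed <- data.frame(time = obs_time, cause = obs_cause)
observed
```

Patient 2, for example, would have died of cancer at day 1,200, but that event is never recorded because death from another cause at day 300 comes first.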


Approaches to Competing Risk Analysis

When faced with multiple possible outcomes, there are two main statistical approaches to analyze the data:

  1. Cause-Specific Hazard: This method focuses on the instantaneous rate of a particular event among subjects who are still event-free, i.e., who have experienced neither the event of interest nor any competing event. Subjects who experience a competing event leave the risk set, much as if they had been censored. It answers the question, “Among those who have not yet experienced any event, what is the rate of the event of interest?”

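  2. Subdistribution Hazard: This approach also models the rate of the event of interest, but subjects who experience a competing event remain in the risk set, even though they can no longer experience the event of interest. As a result, the subdistribution hazard relates directly to the cumulative incidence of the event and describes its overall occurrence in the presence of competing risks.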
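The difference between the two approaches shows up in who counts as “at risk.” A minimal sketch on the Melanoma data described later in this section (status recoded to 0 = alive, 1 = melanoma death, 2 = other death), counting the risk set for melanoma death at an arbitrary time point under each definition. The subdistribution count ignores the censoring weights that the Fine-Gray model actually uses, so treat it as an illustration of the definition only.

```r
mel <- MASS::Melanoma

# Recode MASS status (1 = melanoma death, 2 = alive, 3 = other death)
# to 0 = alive (censored), 1 = melanoma death, 2 = other death
mel$event <- c(1, 0, 2)[mel$status]

t0 <- 2000  # time point of interest, in days (arbitrary choice)

# Cause-specific risk set: subjects still under observation and event-free at t0
cs_at_risk <- sum(mel$time >= t0)

# Subdistribution risk set: the same subjects plus those who already died of
# another cause (they stay in the risk set even though they cannot die of melanoma)
sd_at_risk <- cs_at_risk + sum(mel$time < t0 & mel$event == 2)

c(cause_specific = cs_at_risk, subdistribution = sd_at_risk)
```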

Each approach highlights different aspects of the data, and the choice between them should be driven by the research question. For example, to understand the effect of a treatment on a specific event, the cause-specific hazard might be more informative. On the other hand, if the goal is to evaluate the overall impact of a treatment, including the possibility of competing events, the subdistribution hazard might be preferable.


Key Considerations in Competing Risk Analysis

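In an ideal scenario where the event times are independent (which is rarely the case), the cause-specific hazard approach provides an unbiased estimate of the hazard the event of interest would have if the competing events were absent. When events are dependent, however, that interpretation no longer holds, and the results can vary depending on the chosen method.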
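A small simulation can make the independence case concrete. Assuming two independent exponential latent times with hypothetical hazards of 0.10 and 0.05 per unit time, the cause-specific hazard of event 1, estimated as the number of type-1 events divided by the total observed follow-up time, recovers the hazard that event 1 would have without the competing event.

```r
set.seed(42)
n       <- 100000
lambda1 <- 0.10   # hypothetical hazard of the event of interest
lambda2 <- 0.05   # hypothetical hazard of the competing event

# Independent latent times; only the earlier one is observed
t1    <- rexp(n, rate = lambda1)
t2    <- rexp(n, rate = lambda2)
time  <- pmin(t1, t2)
cause <- ifelse(t1 < t2, 1, 2)

# With a constant hazard, the maximum-likelihood estimate of the
# cause-specific hazard is (number of type-1 events) / (total time at risk)
cs_hazard_hat <- sum(cause == 1) / sum(time)
cs_hazard_hat  # should be close to lambda1 = 0.10
```

If t1 and t2 were generated with some dependence instead, the cause-specific hazard would generally no longer equal the marginal hazard of t1, and observed data alone cannot tell the two situations apart.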

Estimating the cumulative incidence with traditional methods, such as 1 minus the Kaplan-Meier estimate with competing events treated as censored, overestimates the actual cumulative incidence, because it describes a hypothetical world in which subjects who experienced a competing event could still go on to experience the event of interest. The extent of this overestimation depends on the rates of the events and their dependence. Therefore, it’s crucial to select the method based on the specific research question and the nature of the data.
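A minimal sketch of this overestimation on the Melanoma data, using the {survival} package (an assumption; the rest of this section uses {tidycmprsk}). It compares 1 minus the Kaplan-Meier estimate for melanoma death, with other-cause deaths treated as censored, against the Aalen-Johansen cumulative incidence at 2,000 days, a time point chosen arbitrarily.

```r
library(survival)

mel <- MASS::Melanoma

# 0 = alive (censored), 1 = melanoma death, 2 = other death, as a factor
mel$event <- factor(c(1, 0, 2)[mel$status], levels = 0:2,
                    labels = c("censored", "melanoma", "other"))
t0 <- 2000  # days

# Naive approach: Kaplan-Meier for melanoma death, other deaths censored
km <- survfit(Surv(time, event == "melanoma") ~ 1, data = mel)
naive_cif <- 1 - km$surv[max(which(km$time <= t0))]

# Aalen-Johansen: a factor status whose first level is censoring makes
# survfit() fit a multi-state (competing risks) model
aj <- survfit(Surv(time, event) ~ 1, data = mel)
aj_cif <- aj$pstate[max(which(aj$time <= t0)), aj$states == "melanoma"]

c(one_minus_km = naive_cif, aalen_johansen = aj_cif)
```

The gap between the two numbers is driven by the deaths from other causes that occur before 2,000 days.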

For instance, to demonstrate that a covariate influences the event of interest, cause-specific hazards are often preferred. If instead the goal is to establish overall benefit, build predictive models, or inform health-economic assessments, subdistribution hazards may be more suitable.
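Both kinds of regression model can be fitted with the {survival} package. A minimal sketch, using ulceration as the covariate purely for illustration: a cause-specific Cox model treats other-cause deaths as censored, while survival::finegray() expands the data with censoring weights so that coxph() fits a Fine-Gray subdistribution hazard model. The {tidycmprsk} package also offers crr() for Fine-Gray regression; the survival route is shown here as one alternative.

```r
library(survival)

mel <- MASS::Melanoma
mel$event <- factor(c(1, 0, 2)[mel$status], levels = 0:2,
                    labels = c("censored", "melanoma", "other"))

# Cause-specific hazard: deaths from other causes are treated as censored
cs_fit <- coxph(Surv(time, event == "melanoma") ~ ulcer, data = mel)

# Subdistribution hazard (Fine-Gray): finegray() builds a weighted,
# counting-process data set in which other-cause deaths stay in the risk set
fg_data <- finegray(Surv(time, event) ~ ulcer, data = mel, etype = "melanoma")
fg_fit  <- coxph(Surv(fgstart, fgstop, fgstatus) ~ ulcer,
                 weights = fgwt, data = fg_data)

# Hazard ratios for ulceration under each approach
exp(c(cause_specific  = unname(coef(cs_fit)),
      subdistribution = unname(coef(fg_fit))))
```

The two hazard ratios answer different questions: the first describes the rate of melanoma death among patients still alive, the second the effect of ulceration on the cumulative incidence of melanoma death.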


The Melanoma Dataset

To illustrate these concepts, we’ll use the Melanoma dataset from the {MASS} package. This dataset includes variables such as:

  • time: Survival time in days (potentially censored)
  • status: Event status (1 = died from melanoma, 2 = alive, 3 = dead from other causes)
  • sex: 1 = male, 0 = female
  • age: Age in years
  • year: Year of operation
  • thickness: Tumor thickness in mm
  • ulcer: 1 = presence of ulceration, 0 = absence

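Let’s start by loading the packages and the data, then recode the status variable. Besides making the codes easier to read, this puts censoring at 0, so that it becomes the first level of the resulting factor, which is how cuminc() identifies censored observations: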

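# Load the packages used in this section
library(dplyr)       # mutate(), recode(), %>%
library(survival)    # Surv()
library(tidycmprsk)  # cuminc()
library(ggsurvfit)   # ggcuminc(), add_confidence_interval(), add_risktable()
library(ggplot2)     # labs(), ylim()

# Load the Melanoma dataset
data(Melanoma, package = "MASS")

# Recode status: 0 = alive (censored), 1 = died from melanoma,
# 2 = dead from other causes; the factor's first level marks censoring
Melanoma <- 
  Melanoma %>% 
  mutate(
    status = as.factor(recode(status, `2` = 0, `1` = 1, `3` = 2))
  )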

The status variable is now a factor coded as:

  • status: 0 = alive (censored), 1 = died from melanoma, 2 = dead from other causes
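A quick cross-tabulation can confirm the recoding. The sketch below rebuilds the new codes in base R from a fresh copy of the data, as an equivalent alternative to the dplyr recode() call that leaves the recoded Melanoma object untouched; every original code should land in exactly one cell.

```r
mel <- MASS::Melanoma

# Base-R equivalent of the recode: MASS codes 1/2/3 map to 1/0/2
new_status <- factor(c(1, 0, 2)[mel$status], levels = 0:2)

# Each original code should map to exactly one new code
recode_check <- table(original = mel$status, recoded = new_status)
recode_check
```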

Let’s take a look at the first six records of the dataset:

head(Melanoma)
##   time status sex age year thickness ulcer
## 1   10      2   1  76 1972      6.76     1
## 2   30      2   1  56 1968      0.65     0
## 3   35      0   1  41 1977      1.34     0
## 4   99      2   0  71 1968      2.90     0
## 5  185      1   1  52 1965     12.08     1
## 6  204      1   1  28 1971      4.84     1


Estimating Cumulative Incidence in the Presence of Competing Risks

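To estimate the cumulative incidence of each event in the presence of competing risks, we can use the cuminc() function from the {tidycmprsk} package. It provides a non-parametric estimate of the cumulative incidence for each event type, accounting for the competing events. The printed output contains one table per event type: the first for status 1 (death from melanoma) and the second for status 2 (death from other causes).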

cuminc(Surv(time, status) ~ 1, data = Melanoma)
## 
## time    n.risk   estimate   std.error   95% CI          
## 1,000   171      0.127      0.023       0.086, 0.177    
## 2,000   103      0.230      0.030       0.174, 0.291    
## 3,000   54       0.310      0.037       0.239, 0.383    
## 4,000   13       0.339      0.041       0.260, 0.419    
## 5,000   1        0.339      0.041       0.260, 0.419    
## 
## time    n.risk   estimate   std.error   95% CI          
## 1,000   171      0.034      0.013       0.015, 0.066    
## 2,000   103      0.050      0.016       0.026, 0.087    
## 3,000   54       0.058      0.017       0.030, 0.099    
## 4,000   13       0.106      0.032       0.053, 0.179    
## 5,000   1        0.106      0.032       0.053, 0.179
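The estimates in these tables can be reproduced by hand. A minimal sketch, assuming the standard nonparametric (Aalen-Johansen) estimator: at each event time, the all-cause Kaplan-Meier survival just before that time is multiplied by the fraction of subjects at risk who experience the event of interest, and these increments are summed. The helper cif_manual() is hypothetical, written only for this illustration, and its result is cross-checked against survival::survfit().

```r
library(survival)

mel    <- MASS::Melanoma
status <- c(1, 0, 2)[mel$status]  # 0 = censored, 1 = melanoma, 2 = other

# Cumulative incidence of `cause` at time t:
#   CIF(t) = sum over event times t_j <= t of S(t_j-) * d_kj / n_j
# where S is the all-cause Kaplan-Meier survival just before t_j
cif_manual <- function(time, status, cause, t) {
  event_times <- sort(unique(time[status != 0 & time <= t]))
  surv_prev <- 1  # all-cause survival just before the current event time
  cif <- 0
  for (tj in event_times) {
    n_j  <- sum(time >= tj)                    # number at risk at tj
    d_kj <- sum(time == tj & status == cause)  # events of the cause of interest
    d_j  <- sum(time == tj & status != 0)      # events of any cause
    cif  <- cif + surv_prev * d_kj / n_j
    surv_prev <- surv_prev * (1 - d_j / n_j)
  }
  cif
}

cif_1000 <- cif_manual(mel$time, status, cause = 1, t = 1000)

# Cross-check against the Aalen-Johansen estimate from survfit()
event_f <- factor(status, levels = 0:2, labels = c("censored", "melanoma", "other"))
aj <- survfit(Surv(mel$time, event_f) ~ 1)
aj_1000 <- aj$pstate[max(which(aj$time <= 1000)), aj$states == "melanoma"]

c(manual = cif_1000, survfit = aj_1000)
```

Up to rounding, cif_1000 should also agree with the 1,000-day estimate in the first cuminc() table.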

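We can visualize the cumulative incidence using the ggcuminc() function from the {ggsurvfit} package. By default, ggcuminc() plots only the first event type, so the code below shows the cumulative incidence of death due to melanoma, with a confidence band and a risk table added: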

cuminc(Surv(time, status) ~ 1, data = Melanoma) %>% 
  ggcuminc() + 
  labs(
    x = "Days"
  ) + 
  add_confidence_interval() +
  add_risktable()

To include both event types in the plot, specify the outcomes in the ggcuminc(outcome=) argument:

cuminc(Surv(time, status) ~ 1, data = Melanoma) %>% 
  ggcuminc(outcome = c("1", "2")) +
  ylim(c(0, 1)) + 
  labs(
    x = "Days"
  )