by Kristian Brock. Documentation is hosted at


escalation provides a grammar for dose-finding clinical trials.

It starts by providing functions to use dose-escalation methodologies like the continual reassessment method (CRM) via get_dfcrm(), the Bayesian optimal interval design (BOIN) via get_boin(), and the perennial 3+3 via get_three_plus_three().

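These functions fetch model fitting code. Largely, this is imported from existing R packages. These approaches can then be augmented with extra behaviours to specialise the dose selection process. For example, we can add behaviours to prevent skipping doses, or to stop when we reach a certain sample size. escalation supports the following behaviours:

  • dont_skip_doses() to avoid skipping untested doses;
  • stop_at_n() to stop at a total sample size;
  • stop_when_n_at_dose() to stop when enough patients have been treated at a dose;
  • stop_when_tox_ci_covered() to stop when the toxicity interval at a dose lies within a range;
  • stop_when_too_toxic() to stop and recommend no dose when a dose is too toxic;
  • demand_n_at_dose() to delay stopping until enough patients have been treated at a dose;
  • try_rescue_dose() to delay stopping until a rescue dose has been tried;
  • select_dose_by_cibp() to select doses by the CIBP criterion.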

Each of these functions overrides the way doses are selected or when a design decides to stop the trial. The behaviours can be flexibly combined using the %>% operator from the tidyverse.

These models are then fit to trial outcomes to produce dose recommendations. No matter how the dose selection behaviours were combined, the resulting model fit supports a standard interface. The two most important methods are recommended_dose() to get the current dose selection, and continue() to learn whether the model advocates continuing patient recruitment.

Having defined this nomenclature for combining dose selection behaviours and providing a standard interface for the resulting analyses, it is simple to run simulations or calculate dose-pathways for future cohorts of patients.

escalation provides an object-oriented approach to dose-escalation clinical trials in R. See Usage


Describing outcomes in dose-finding trials

escalation uses a succinct syntax for describing dose-finding outcomes, described in Brock (2019) for the phase I setting and in Brock et al. (2017) for the phase I/II setting.

In a phase I trial, we use the letters:

  • T to show that toxicity occurred in a patient;
  • N to show that toxicity did not occur in a patient.

In a joint phase I/II trial, like those supported by EffTox, where we have coincident efficacy and toxicity outcomes, the relevant letters are:

  • T to show that toxicity without efficacy occurred in a patient;
  • E to show that efficacy without toxicity occurred in a patient;
  • N to show that neither occurred;
  • B to show that both occurred.

These outcome letters are appended to integer dose-levels to show the outcomes of patients in cohorts. To show that a cohort of three patients was given dose 2, that the first two patients were without toxicity, but the third patient experienced toxicity, we would use the outcome string:

2NNT

If that cohort was followed by another cohort of three at the same dose, all of whom were without toxicity, the overall outcome string would be:

2NNT 2NNN

And so on. These strings are used in the escalation package to make it easy to fit models to observed outcomes. There are many examples below.
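To make the notation concrete, here is a minimal base-R sketch that splits a phase I outcome string into one row per patient. The helper name parse_outcome_string is hypothetical and is not part of escalation; it only illustrates how the strings are read.

```r
# Hypothetical helper (not part of escalation): parse a phase I outcome
# string such as "2NNT 2NNN" into one row per patient.
parse_outcome_string <- function(outcomes) {
  cohorts <- strsplit(trimws(outcomes), "\\s+")[[1]]
  rows <- lapply(seq_along(cohorts), function(i) {
    # Split each cohort into its leading dose-level and trailing letters
    m <- regmatches(cohorts[i], regexec("^([0-9]+)([NT]+)$", cohorts[i]))[[1]]
    if (length(m) != 3) stop("Cannot parse cohort: ", cohorts[i])
    outcome_letters <- strsplit(m[3], "")[[1]]
    data.frame(
      cohort = i,
      dose = as.integer(m[2]),
      tox = as.integer(outcome_letters == "T")  # 1 = toxicity, 0 = none
    )
  })
  do.call(rbind, rows)
}

parsed <- parse_outcome_string("2NNT 2NNN")
print(parsed)
```

Each row is one patient, with the cohort number, the dose given, and a 0/1 toxicity flag.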

Dose selectors

A core class in the escalation package is the selector. It encapsulates the notion that a general dose-escalation design is able to recommend doses, keep track of how many patients have been treated at what doses, what toxicity outcomes have been seen, and whether a trial should continue. This general interface is true of model-based methods like the CRM and rule-based methods like the 3+3. Irrespective of the particular approach used, the interface is consistent.

In this tutorial, we will demonstrate each of the types of selector implemented in the package and how they can be combined to tailor behaviour.

To begin, let us load escalation:

library(escalation)


At the core of the dose selection process is an algorithm or a model that selects doses in response to outcomes. The functions that create selectors capable of performing this core role are get_dfcrm(), get_boin(), get_three_plus_three(), and follow_path(). The last two are implemented natively in escalation. We look at each now.


The continual reassessment method (O’Quigley, Pepe, and Fisher 1990) is implemented in the dfcrm package by Cheung (2013). The very least information we need to provide is a dose-toxicity skeleton, and our target toxicity level. The skeleton represents our prior beliefs on the probabilities of toxicity at each of the doses under investigation. The model iteratively seeks a dose with toxicity probability close to the target.

For illustration, let us say we have

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

We create a dose-selection model using:

model <- get_dfcrm(skeleton = skeleton, target = target)

and we can fit this to outcomes using code like:

The fit object will tell you the dose recommended by the CRM model to be administered next. Depending on your preference for classic R or tidyverse R, you might run:
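As a minimal sketch, assuming the illustrative outcome string '2NN' (two patients treated at dose 2 without toxicity; the outcomes used originally are not shown here), the model can be fit and queried in either style:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25
model <- get_dfcrm(skeleton = skeleton, target = target)

# Assumed outcomes, for illustration only
fit <- model %>% fit('2NN')

# Classic R
dose_classic <- recommended_dose(fit)
# tidyverse R
dose_piped <- fit %>% recommended_dose()

print(dose_classic)
print(dose_piped)
```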


Either way, you get the same answer. The model advocates skipping straight to dose 4. Clinicians are unlikely to feel comfortable with this. We can respecify the model to expressly not skip doses in escalation. We will do that later on.

For now, let us return to our model fit. We can ask whether the trial should keep going:

Naturally it wants to continue because dfcrm does not implement any stopping rules. Again, we will add various stopping behaviours in sections below.

The CRM-fitting function in dfcrm accepts many arguments to customise the model form and these are passed onwards by the get_dfcrm function via the ... parameter. For example, to use the one-parameter logit model in dfcrm (rather than the default empiric model) with the intercept term fixed to take the value 4, we can specify:
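A sketch of such a call, assuming the extra arguments are forwarded unchanged to the crm function in dfcrm, which takes model and intcpt arguments:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

# intcpt and model are passed through ... to dfcrm
model <- get_dfcrm(skeleton = skeleton, target = target,
                   intcpt = 4, model = 'logistic')
```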

intcpt and model are the argument names, and 'logistic' the value, chosen by the authors of dfcrm.


escalation also implements the BOIN dose-finding design by Liu and Yuan (2015) via the BOIN package (Yuan and Liu 2018).

In contrast to CRM, BOIN does not require a dose-toxicity skeleton. In its simplest case, it requires merely the number of doses under investigation and our target toxicity level. Continuing with our example above:

target <- 0.25

model <- get_boin(num_doses = 5, target = target)

As before, we can fit the model to some observed outcomes:

and ask the recommended dose:

The BOIN dose selector natively implements stopping rules, as described by Liu & Yuan. For instance, if the bottom dose is too toxic, the design will advise the trial halts:

Notice in this scenario that the recommended dose is NA:
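A sketch of such a scenario, assuming the illustrative outcomes '2NTN 1TTT', in which all three patients at the lowest dose experience toxicity (the outcomes used originally are not shown here):

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

target <- 0.25
model <- get_boin(num_doses = 5, target = target)

# Assumed outcomes, for illustration only: heavy toxicity at dose 1
fit <- model %>% fit('2NTN 1TTT')

keep_going <- fit %>% continue()
next_dose <- fit %>% recommended_dose()

print(keep_going)
print(next_dose)
```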

This clarifies that no dose should be recommended for further study. In this setting, this is because all doses are considered too toxic. This is distinct from scenarios where a design advocates stopping a trial and recommending a dose for further study. We will encounter situations like that below.

Since escalation provides many flexible options for stopping, we have made it possible to suppress BOIN’s native stopping rule via use_stopping_rule = FALSE.

Similar to the method described above, extra parameters are passed to the get.boundary function in the BOIN package to customise the escalation procedure. For instance, the boundaries that guide changes in dose are set to be 60% and 140% of the target toxicity rate, by default. To instead use 30% and 170%, we could run:
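A sketch of that call, assuming the boundaries are set through the p.saf and p.tox arguments of get.boundary and expressed as multiples of the target:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

target <- 0.25

# p.saf and p.tox are forwarded through ... to BOIN's get.boundary
model <- get_boin(num_doses = 5, target = target,
                  p.saf = 0.3 * target, p.tox = 1.7 * target)

model %>%
  fit('1NNN 2NNT') %>%
  recommended_dose()
```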

To observe the effect of the change, note that the default values suppress escalation in this scenario:

get_boin(num_doses = 5, target = target) %>% 
  fit('1NNN 2NNT') %>% 
  recommended_dose()

The parameter names p.saf and p.tox were chosen by the authors of the BOIN package.


The 3+3 method is an old method for dose-escalation that uses fixed cohorts of three and pre-specified rules to govern dose-selection (Korn et al. 1994; Le Tourneau, Lee, and Siu 2009).

To create a 3+3 design, we need no more information than the number of doses under investigation:

As usual, we can fit the model to some outcomes and learn the recommended dose:
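For example, a minimal sketch assuming the illustrative outcomes '1NNN 2NTN', that is, no toxicity in three patients at dose 1 followed by one toxicity in three patients at dose 2:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

model <- get_three_plus_three(num_doses = 5)

# Assumed outcomes, for illustration only
fit <- model %>% fit('1NNN 2NTN')

next_dose <- fit %>% recommended_dose()
keep_going <- fit %>% continue()

print(next_dose)
print(keep_going)
```

Under the usual 3+3 rules, one toxicity in three patients leads to a further cohort at the same dose.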

Korn et al. (1994) described a variant of 3+3 that permits deescalation to ensure that six patients are treated at a dose before it is recommended. To use that option in our model, we could have run:

model <- get_three_plus_three(num_doses = 5, allow_deescalate = TRUE)

The model would then advocate deescalation if at least two toxicities are seen at a dose and the dose below has fewer than 6 treated patients:


The final dose selector in this section is not really a model at all, so much as a pre-specified path to follow. Let us say that we would like to escalate through the doses in the absence of toxicity, treating two patients at each of the first two doses, and three at the other doses. We can specify such a path in escalation using:

model <- follow_path('1NN 2NN 3NNN 4NNN 5NNN')

When fit to data, the method just returns whatever comes next in the sequence:

When the outcomes diverge from the pre-specified path, however, this selector does not know what to do:
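Both situations can be sketched as follows, assuming illustrative outcomes that first agree with the path and then depart from it with a toxicity at dose 2:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

model <- follow_path('1NN 2NN 3NNN 4NNN 5NNN')

# Outcomes so far agree with the path, so the next step of the path is returned
on_path <- model %>% fit('1NN') %>% recommended_dose()

# A toxicity at dose 2 departs from the path, so no dose can be proposed
off_path <- model %>% fit('1NN 2NT') %>% recommended_dose()

print(on_path)
print(off_path)
```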

That rather seems to limit its value. The point of this class is that we sometimes want to specify what is occasionally referred to as an initial escalation plan. When trial outcomes diverge from the initial plan, another method takes over. This is a perfect opportunity to show how different selectors can be joined together. Let us say that we wish to follow the initial plan described above, but when the first toxicity event is seen, we want a CRM model to take over. We simply join the functions together using the pipe operator from magrittr:

model <- follow_path('1NN 2NN 3NNN 4NNN 5NNN') %>% 
  get_dfcrm(skeleton = skeleton, target = target)

Now, when trial outcomes diverge from the path, the CRM model analyses all of the outcomes and recommends the next dose:

This concludes our look at the core dose-selecting classes. We now turn our attention to the ways in which these methods can be adapted using extra behaviours.


We saw in the CRM example above that the design undesirably wanted to skip straight to a high dose, without trying some of the lower doses. A simple and very common constraint to impose in dose-finding trials is to avoid skipping untested doses.

Resuming our CRM example, we suppress the skipping of untested doses in escalation with:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  dont_skip_doses(when_escalating = TRUE)

We then fit the model as before:

This time, however, the model advocates dose 3. Previously, it wanted to go straight to dose 4.

We prevented skipping doses in escalation. We could have prevented skipping doses in deescalation with:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  dont_skip_doses(when_deescalating = TRUE)

or both with:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  dont_skip_doses(when_escalating = TRUE, when_deescalating = TRUE)


Let us now investigate some methods that facilitate stopping. The simplest condition on which to stop is when the total sample size reaches some pre-specified level. For instance, we might want to treat a maximum of 15 patients and then stop. To do this, we call the stop_at_n function and append it onto the end of a core dose selector, like this:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  stop_at_n(n = 15)

When this design has seen fewer than 15 patients, it will select doses and advocate that the trial continues. For instance:

fit <- model %>% fit('1NNN 2TNN 2NNN 3NNN')
fit %>% continue()

The design advocates continuing at dose:

In contrast, once 15 patients are seen,

fit <- model %>% fit('1NNN 2TNN 2NNN 3NNN 3NTN')
fit %>% continue()

the design advocates stopping. It is important to note that, even though the design has stopped, it still recommends that a dose be studied at the next trial phase:
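Using the 15-patient outcomes above, both the stopping decision and the final recommendation can be queried from the same fit:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_at_n(n = 15)

fit <- model %>% fit('1NNN 2TNN 2NNN 3NNN 3NTN')

keep_going <- fit %>% continue()          # should the trial continue?
final_dose <- fit %>% recommended_dose()  # dose carried forward

print(keep_going)
print(final_dose)
```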

This is in contrast to the scenario where a trial is stopped because all doses are inappropriate. In this scenario, the dose recommendation would be NA. We will encounter this in examples below.


Another common approach is to stop a dose-finding experiment when a given number of patients have been treated at a particular dose.

Continuing with our CRM model, to stop when nine patients have been treated at the dose that is about to be recommended again, we use:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  stop_when_n_at_dose(n = 9, dose = 'recommended')

We can observe how this alters the dose-selection model. Here we see six patients treated at dose 2:

fit <- model %>% fit('1NNN 2TNN 2NTN')

The model recommends that dose 2 should be given to more patients:

If the next cohort results in dose 2 being recommended yet again, i.e. to bring the total number of patients at dose 2 to nine or more, the model stops:

fit <- model %>% fit('1NNN 2TNN 2NTN 2NNN')
fit %>% continue()

In this scenario, dose 2 is the final recommended dose and the trial stops gracefully at a pre-specified stopping rule.

This behaviour can also be configured to stop when any dose has been given n times:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  stop_when_n_at_dose(n = 9, dose = 'any')

or when a particular dose-level has been given n times:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  stop_when_n_at_dose(n = 9, dose = 3)

Naturally, you can combine this behaviour with other behaviours. The following model stops the trial when nine patients have been evaluated at the recommended dose or when 21 patients have been treated in total, whichever occurs first:

model <- get_dfcrm(skeleton = skeleton, target = target) %>% 
  stop_when_n_at_dose(n = 9, dose = 'recommended') %>% 
  stop_at_n(n = 21)


The two stopping mechanisms above scrutinise the number of patients treated. In many situations, this will be valuable. However, in other situations, we might want to stop when a threshold amount of statistical information is obtained. One way to achieve this is to stop when the confidence interval or credible interval for the probability of toxicity at a dose is covered by a specified range.

For instance, we know that the BOIN design seeks a target toxicity level, and we have used a target of 25% in our examples. We might say that we are sure enough about the recommended dose when the associated 90% credible interval (because BOIN is a Bayesian design) of the toxicity probability falls in the region 10% - 40%.

model <- get_boin(target = target, num_doses = 5) %>%
  stop_when_tox_ci_covered(dose = 'recommended', lower = 0.10, upper = 0.4)

Say that we observe the following trial path:

fit <- model %>% 
  fit('1NNN 2NTN 2TNN 2NNN 2NNT 2NTN 2NNN 2TNN')

The design recommends dose 2 and it also advocates stopping:

This is because the lower bound of the 90% interval for the probability of toxicity at dose 2 is at least 10%:

and the upper bound is no more than 40%:
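The two bounds of a 90% interval are the 5% and 95% quantiles, which can be inspected with prob_tox_quantile. A sketch using the model and outcomes above:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

target <- 0.25
model <- get_boin(target = target, num_doses = 5) %>%
  stop_when_tox_ci_covered(dose = 'recommended', lower = 0.10, upper = 0.4)

fit <- model %>%
  fit('1NNN 2NTN 2TNN 2NNN 2NNT 2NTN 2NNN 2TNN')

# One value per dose; element 2 is the bound for dose 2
lower <- fit %>% prob_tox_quantile(p = 0.05)
upper <- fit %>% prob_tox_quantile(p = 0.95)

print(lower[2])
print(upper[2])
```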

It may be interesting to note that our CRM model would not stop in this scenario:

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_when_tox_ci_covered(dose = 'recommended', lower = 0.10, upper = 0.4)

fit <- model %>% 
  fit('1NNN 2NTN 2TNN 2NNN 2NNT 2NTN 2NNN 2TNN')

fit %>% continue()

This is because the lower bound of the 90% CI falls slightly outside the sought range:
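The lower bound can be checked in the same way, here as a sketch that reads the 5% quantile at whichever dose the CRM recommends:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_when_tox_ci_covered(dose = 'recommended', lower = 0.10, upper = 0.4)

fit <- model %>%
  fit('1NNN 2NTN 2TNN 2NNN 2NNT 2NTN 2NNN 2TNN')

lower <- fit %>% prob_tox_quantile(p = 0.05)  # one value per dose
rec <- fit %>% recommended_dose()

print(lower[rec])
```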

As before, we can specify dose = 'recommended', dose = 'any', or a particular numerical dose-level with dose = 3, for example.

It should be appreciated that this approach only works when the underlying model provides a way of calculating quantiles and uncertainty intervals. The 3+3 lacks a statistical foundation and does not offer quantiles:

get_three_plus_three(num_doses = 5) %>% 
  fit('1NNN 2NTN') %>% 
  prob_tox_quantile(p = 0.05)


The stopping rules considered so far stop a trial and recommend a dose once some critical threshold of information is obtained. We will naturally also want to stop if all doses are too toxic.

We saw above that some model-based dose-finding approaches can calculate quantiles. We can take this idea further and advocate stopping when there is sufficient evidence that the toxicity probability at some dose exceeds a critical threshold. In such circumstances, no dose will be recommended because all doses of the treatment will be deemed to be excessively toxic.

Let us set up a rule to stop and recommend no dose if the probability of toxicity at the lowest dose is too high:

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.35, confidence = 0.7)

The above example stops when at least 70% of the posterior probability mass for the probability of toxicity at dose 1 lies above 35%. With an isolated toxicity incidence at dose 1, the model advocates continuing at dose 1:

This is because the probability that the toxicity rate exceeds 35% is less than 70%:

However, with material additional toxicity at dose 1, the design now advocates stopping:

Furthermore, no dose is recommended:

This is because we are now at least 70% sure that the lowest dose is too toxic:
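The sequence above can be sketched as follows. The outcome strings are assumptions, chosen to mirror those used in the BOIN example that follows, and prob_tox_exceeds is used to read off the probability that each dose's toxicity rate exceeds 35%:

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.35, confidence = 0.7)

# Assumed outcomes: a single toxicity at dose 1
fit_one <- model %>% fit('1NTN')
p_one <- prob_tox_exceeds(fit_one, 0.35)[1]
print(continue(fit_one))
print(p_one)

# Assumed outcomes: substantial further toxicity at dose 1
fit_more <- model %>% fit('1NTN 1TTT')
p_more <- prob_tox_exceeds(fit_more, 0.35)[1]
print(continue(fit_more))
print(recommended_dose(fit_more))
print(p_more)
```

The rule compares the dose 1 probability with the confidence value of 0.7 to decide whether to stop.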

Once again, we can specify dose = 'recommended', dose = 'any', or a particular numerical dose-level with dose = 3, for example. We also require that the underlying model supports the calculation of quantiles. BOIN supports this functionality:

model <- get_boin(target = target, num_doses = 5) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.35, confidence = 0.7)

fit <- model %>% fit('1NTN 1TTT')
fit %>% continue()

but a non-statistical method like 3+3 does not.


We have looked at many behaviours that provide stopping. We can also look at some behaviours that delay stopping.

We might want to guarantee that we treat at least n patients at a dose before we permit a dose-finding trial to stop. For instance, we might not feel comfortable recommending a dose for the next phase of study if it has only been evaluated in a small number of patients.

It makes sense for this behaviour to be used with a design that would otherwise stop. Let us say that we would normally like to stop after 18 patients have been treated. However, we will also demand that at least 6 patients be treated at the recommended dose before stopping is allowed, irrespective of the overall sample size. We specify:

model <- get_boin(target = target, num_doses = 5) %>% 
  stop_at_n(n = 18) %>% 
  demand_n_at_dose(n = 6, dose = 'recommended')

In the following situation:

fit <- model %>% fit('1NNN 2NNT 3NTN 3NNN 4TTN 3NTT')
fit %>% continue()

the design advocates continuing at dose 2 even though 18 patients have been evaluated. This is because the demand_n_at_dose function is overriding the stopping behaviour of stop_at_n. It is requesting that the trial continue at dose 2 instead of stopping with only three patients treated at the nominal recommended dose.

It is important to recognise that the order of the functions matters. If we flip the order of the constraints in the example above, the outcome is different:

model <- get_boin(target = target, num_doses = 5) %>% 
  demand_n_at_dose(n = 6, dose = 'recommended') %>% 
  stop_at_n(n = 18)

fit <- model %>% fit('1NNN 2NNT 3NTN 3NNN 4TTN 3NTT')
fit %>% continue()

Now the stop_at_n constraint overrides the action of demand_n_at_dose to halt the trial when n=18, even though only three patients have been evaluated at dose 2. It overrides because it comes later in the decision chain. Users should be aware that commands that come later take precedence.

Once again, we can specify dose = 'recommended', dose = 'any', or a particular numerical dose-level with dose = 3, for example.

In summary, the demand_n_at_dose function delays stopping in a scenario when a dose is being selected.


In contrast to demand_n_at_dose, the try_rescue_dose function delays stopping in a scenario where no dose is going to be selected. It overrides a decision to stop and recommend no dose when fewer than n patients have been evaluated at a given dose. Thus, it provides a facility to ensure that some “rescue” dose has been tried before stopping is allowed.

This is another function where effective demonstration requires a design that would normally stop. Let us say that we will stop if we are 80% sure that the toxicity rate at the lowest dose exceeds 35%. But before we stop, we want to ensure that at least two patients have been evaluated at the lowest dose. We write:

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.35, confidence = 0.8) %>%
  try_rescue_dose(dose = 1, n = 2)

Then, even when this design sees some major toxicity at dose 2:

the design will not advocate stopping, even though the posterior confidence that the tox rate at dose 1 exceeds 35% is greater than 80%:

Once two patients are seen at dose 1, stopping can be countenanced. If those two patients tolerate treatment at dose 1:

then stopping is not advocated because the posterior belief is now that dose 1 is not excessively toxic:

However, if even one of those patients at dose 1 experiences toxicity:

then the trial stops and no dose is recommended.
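The walk-through above can be sketched as follows. The three outcome strings are assumptions chosen to match the narrative (toxicity in all of a first cohort at dose 2, then two patients at dose 1 with or without toxicity); the outcomes used originally are not shown here.

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.35, confidence = 0.8) %>%
  try_rescue_dose(dose = 1, n = 2)

# Assumed: major toxicity at dose 2, nobody yet treated at dose 1
fit_a <- model %>% fit('2TTT')
print(continue(fit_a))
print(prob_tox_exceeds(fit_a, 0.35)[1])

# Assumed: two patients then tolerate dose 1
fit_b <- model %>% fit('2TTT 1NN')
print(continue(fit_b))

# Assumed: one of the two patients at dose 1 has toxicity
fit_c <- model %>% fit('2TTT 1NT')
print(continue(fit_c))
print(recommended_dose(fit_c))
```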

The try_rescue_dose function allows researchers to rescue situations where otherwise sensible stopping criteria may prove too sensitive to chance events in very small sample sizes.


The select_dose_by_cibp function implements the convex infinite bounds penalisation (CIBP) criterion of Mozgunov and Jaki (2020) that adjusts the way doses are selected in CRM trials. Their method is mindful of the uncertainty in the estimates of the probability of toxicity and uses an asymmetry parameter, 0 < a < 2, to penalise escalation to risky doses. The method alters the way doses are selected but not when the trial should stop. For a < 1, the criterion penalises toxic doses more heavily, making escalation decisions more conservative.

To add the behaviour to a dose-finding design, we run:

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  select_dose_by_cibp(a = 0.3)

The model is then fit to outcomes in the usual way:
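For instance, a minimal sketch assuming the illustrative outcomes '1NTN' (the outcomes used originally are not shown here):

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  select_dose_by_cibp(a = 0.3)

# Assumed outcomes, for illustration only
fit <- model %>% fit('1NTN')

cibp_dose <- fit %>% recommended_dose()
keep_going <- fit %>% continue()

print(cibp_dose)
print(keep_going)
```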

Simulation and dose-paths

We have described at length above the flexible methods that escalation provides to specify dose-escalation designs and tailor trial behaviour. Once designs are specified, we can investigate their operating characteristics by simulation using the simulate_trials function. We can also exhaustively calculate dose recommendations for future cohorts using the get_dose_paths function. Each of these topics is covered in a full vignette. Please check them out.
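A brief sketch of both functions follows. The true toxicity probabilities, the number of simulated trials, the starting dose and the cohort sizes are all assumptions made for illustration; the vignettes remain the authoritative guide.

```r
library(escalation)
library(magrittr)  # provides the %>% pipe

skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)
target <- 0.25

# A stopping rule is included so that each simulated trial terminates
model <- get_dfcrm(skeleton = skeleton, target = target) %>%
  dont_skip_doses(when_escalating = TRUE) %>%
  stop_at_n(n = 12)

# Assumed true toxicity probabilities at the five doses
true_prob_tox <- c(0.05, 0.10, 0.20, 0.35, 0.50)

set.seed(123)
sims <- model %>%
  simulate_trials(num_sims = 10, true_prob_tox = true_prob_tox)
print(sims)

# Dose recommendations for every outcome of two future cohorts of three,
# assuming the trial starts at dose 1
paths <- model %>%
  get_dose_paths(cohort_sizes = c(3, 3), next_dose = 1)
print(paths)
```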

Future Plans

I plan to add model-fitting functions for:

  • CRM and EffTox via trialr
  • CRM via bcrm
  • EWOC via ewoc
  • mTPI once I discover how that design is implemented in R.

I want to investigate adding some further stopping functions like those researched by Zohar and Chevret (2001).

Finally, I will investigate adding time-to-event versions of the designs presented here, the so-called TITE designs. These will require a different approach to simulation because cohorts no longer apply.

Getting help

This package is in its infancy. If you want help using it, please contact me.

If you have found a bug, please drop me a line and also log it here:


Brock, Kristian. 2019. “trialr: Bayesian Clinical Trial Designs in R and Stan.” *arXiv E-Prints*, June, arXiv:1907.00161.
Brock, Kristian, Lucinda Billingham, Mhairi Copland, Shamyla Siddique, Mirjana Sirovica, and Christina Yap. 2017. “Implementing the EffTox Dose-Finding Design in the Matchpoint Trial.” *BMC Medical Research Methodology* 17 (1): 112.
Cheung, Ken. 2013. *Dfcrm: Dose-Finding by the Continual Reassessment Method*.
Korn, Edward L., Douglas Midthune, T. Timothy Chen, Lawrence V. Rubinstein, Michaele C. Christian, and Richard M. Simon. 1994. “A Comparison of Two Phase I Trial Designs.” *Statistics in Medicine* 13 (18): 1799–1806.
Le Tourneau, Christophe, J. Jack Lee, and Lillian L. Siu. 2009. “Dose Escalation Methods in Phase I Cancer Clinical Trials.” *Journal of the National Cancer Institute* 101 (10): 708–20.
Liu, Suyu, and Ying Yuan. 2015. “Bayesian Optimal Interval Designs for Phase I Clinical Trials.” *Journal of the Royal Statistical Society: Series C (Applied Statistics)* 64 (3): 507–23.
Mozgunov, Pavel, and Thomas Jaki. 2020. “Improving Safety of the Continual Reassessment Method via a Modified Allocation Rule.” *Statistics in Medicine* 39 (7): 906–22.
O’Quigley, J, M Pepe, and L Fisher. 1990. “Continual Reassessment Method: A Practical Design for Phase 1 Clinical Trials in Cancer.” *Biometrics* 46 (1): 33–48.
Yuan, Ying, and Suyu Liu. 2018. *BOIN: Bayesian Optimal Interval (Boin) Design for Single-Agent and Drug-Combination Phase I Clinical Trials*.
Zohar, Sarah, and Sylvie Chevret. 2001. “The Continual Reassessment Method: Comparison of Bayesian Stopping Rules for Dose-Ranging Studies.” *Statistics in Medicine* 20 (19): 2827–43.