This R Markdown document shows how to analyse and interpret multi-arm designs for testing proportions with rpact.

This vignette provides examples of how to analyse a trial with
multiple arms and a binary endpoint. It shows how to calculate the
conditional power at a given stage and to select/deselect treatment
arms. For designs with multiple arms, rpact enables the
analysis using the **closed combination testing
principle**. For a description of the methodology, please refer to
Part III of the book “Group Sequential
and Confirmatory Adaptive Designs in Clinical Trials” (Wassmer and
Brannath, 2016).

Suppose the trial was conducted as a multi-arm multi-stage trial with three active treatment arms and a control arm when the trial started. At the interim stages, it should be possible to de-select treatment arms if the treatment effect is too small to show significance, assuming a reasonable sample size, at the end of the trial. This should hold true even if a certain sample size increase was taken into account. The endpoint is the failure rate, and it is intended to test each active arm against control, i.e., to test the hypotheses \[ H_{0i}:\pi_{\text{arm}i} = \pi_\text{control} \qquad\text{against} \qquad H_{1i}:\pi_{\text{arm}i} < \pi_\text{control}\;, \;i = 1,2,3\,,\] in the many-to-one comparisons setting. That is, it is intended to show that the failure rate is smaller in the active arms than in control, and so the power is directed towards negative values of \(\pi_{\text{arm}i} - \pi_\text{control}\).

**First, load the rpact package**

```
library(rpact)
packageVersion("rpact") # version should be 3.0 or later
```

`## [1] '3.3.2'`

In rpact, we first have to select the combination test with the corresponding stopping boundaries to be used in the closed testing procedure. We choose a design with critical values within the Wang & Tsiatis \(\Delta\)-class of boundaries with \(\Delta = 0.25\). Planning two interim stages and a final stage, assuming equally sized stages, the design is defined through

```
designIN <- getDesignInverseNormal(
  kMax = 3,
  alpha = 0.025,
  typeOfDesign = "WT",
  deltaWT = 0.25
)
kable(summary(designIN))
```

**Sequential analysis with a maximum of 3 looks (inverse normal
combination test design)**

Wang & Tsiatis Delta class design (deltaWT = 0.25), one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0544, ASN H1 0.8202, ASN H01 0.9966, ASN H0 1.0489.

| Stage | 1 | 2 | 3 |
|---|---|---|---|
| Information rate | 33.3% | 66.7% | 100% |
| Efficacy boundary (z-value scale) | 2.741 | 2.305 | 2.083 |
| Stage levels | 0.0031 | 0.0106 | 0.0186 |
| Cumulative alpha spent | 0.0031 | 0.0124 | 0.0250 |
| Overall power | 0.1400 | 0.5262 | 0.8000 |

This definition fixes the weights in the combination test, which are the same over the three stages. This is a reasonable choice even if the actual amount of information turns out not to be the same over the stages (see Wassmer, 2010).
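As a quick sanity check, the fixed per-stage weight of 0.577 that rpact reports (see the "Fixed weight" row of the analysis output below) can be reproduced by hand: with three equally sized stages, each weight is the square root of the stage's information increment, \(\sqrt{1/3}\). A minimal sketch in base R:

```r
# Fixed inverse normal combination weights for kMax = 3 equally sized
# stages: each weight is the square root of the information increment 1/3
weights <- sqrt(rep(1 / 3, 3))
round(weights, 3) # 0.577 0.577 0.577
```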

In each active treatment arm and the control arm, subjects were randomized such that around 40 subjects per arm would be observed. Assume that the following actual sample sizes and failure counts in the control arm and the three active treatment arms were obtained for the first stage of the trial:

| Arm | n | Failures |
|---|---|---|
| Active 1 | 42 | 7 |
| Active 2 | 39 | 8 |
| Active 3 | 38 | 14 |
| Control | 41 | 18 |

These data are defined as an rpact dataset with the function `getDataset()` for later use in `getAnalysisResults()`:

```
dataRates <- getDataset(
  events1      = 7,
  events2      = 8,
  events3      = 14,
  events4      = 18,
  sampleSizes1 = 42,
  sampleSizes2 = 39,
  sampleSizes3 = 38,
  sampleSizes4 = 41
)
```

That is, you can use the `getDataset()` function in the usual way and simply extend it to the multiple treatment arms situation. Note that the arm with the highest index **always refers to the control group**. For the control group, it is **mandatory to enter values for all stages**. As we will see below, it is possible to omit the information for de-selected active arms.
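For illustration, if treatment arm 3 were de-selected after the first stage, a two-stage dataset could be specified as follows. The stage-two counts here are purely hypothetical, invented for this sketch; `NA` marks the entries of the omitted arm:

```r
library(rpact)

# Hypothetical two-stage data: arm 3 was de-selected after stage 1,
# so its stage-two entries are omitted via NA (stage-two numbers invented)
dataRatesTwoStages <- getDataset(
  events1      = c(7, 10),
  events2      = c(8, 11),
  events3      = c(14, NA),
  events4      = c(18, 16),
  sampleSizes1 = c(42, 40),
  sampleSizes2 = c(39, 41),
  sampleSizes3 = c(38, NA),
  sampleSizes4 = c(41, 40)
)
```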

Using

```
results <- getAnalysisResults(
  design = designIN,
  dataInput = dataRates,
  directionUpper = FALSE
)
kable(summary(results))
```

one obtains the test results for the first stage of this trial (note the `directionUpper = FALSE` specification, which yields small \(p\)-values for negative test statistics):

**Multi-arm analysis results for a binary endpoint (3 active
arms vs. control)**

Sequential analysis with 3 looks (inverse normal combination test design). The results were calculated using a multi-arm test for rates (one-sided), Dunnett intersection test, normal approximation test. H0: pi(i) - pi(control) = 0 against H1: pi(i) - pi(control) < 0.

| Stage | 1 | 2 | 3 |
|---|---|---|---|
| Fixed weight | 0.577 | 0.577 | 0.577 |
| Efficacy boundary (z-value scale) | 2.741 | 2.305 | 2.083 |
| Cumulative alpha spent | 0.0031 | 0.0124 | 0.0250 |
| Stage level | 0.0031 | 0.0106 | 0.0186 |
| Cumulative effect size (1) | -0.272 | | |
| Cumulative effect size (2) | -0.234 | | |
| Cumulative effect size (3) | -0.071 | | |
| Cumulative treatment rate (1) | 0.167 | | |
| Cumulative treatment rate (2) | 0.205 | | |
| Cumulative treatment rate (3) | 0.368 | | |
| Cumulative control rate | 0.439 | | |
| Stage-wise test statistic (1) | -2.704 | | |
| Stage-wise test statistic (2) | -2.233 | | |
| Stage-wise test statistic (3) | -0.639 | | |
| Stage-wise p-value (1) | 0.0034 | | |
| Stage-wise p-value (2) | 0.0128 | | |
| Stage-wise p-value (3) | 0.2615 | | |
| Adjusted stage-wise p-value (1, 2, 3) | 0.0095 | | |
| Adjusted stage-wise p-value (1, 2) | 0.0066 | | |
| Adjusted stage-wise p-value (1, 3) | 0.0066 | | |
| Adjusted stage-wise p-value (2, 3) | 0.0239 | | |
| Adjusted stage-wise p-value (1) | 0.0034 | | |
| Adjusted stage-wise p-value (2) | 0.0128 | | |
| Adjusted stage-wise p-value (3) | 0.2615 | | |
| Overall adjusted test statistic (1, 2, 3) | 2.346 | | |
| Overall adjusted test statistic (1, 2) | 2.480 | | |
| Overall adjusted test statistic (1, 3) | 2.480 | | |
| Overall adjusted test statistic (2, 3) | 1.980 | | |
| Overall adjusted test statistic (1) | 2.704 | | |
| Overall adjusted test statistic (2) | 2.233 | | |
| Overall adjusted test statistic (3) | 0.639 | | |
| Test action: reject (1) | FALSE | | |
| Test action: reject (2) | FALSE | | |
| Test action: reject (3) | FALSE | | |
| Conditional rejection probability (1) | 0.2647 | | |
| Conditional rejection probability (2) | 0.1708 | | |
| Conditional rejection probability (3) | 0.0202 | | |
| 95% repeated confidence interval (1) | [-0.541; 0.038] | | |
| 95% repeated confidence interval (2) | [-0.514; 0.089] | | |
| 95% repeated confidence interval (3) | [-0.384; 0.259] | | |
| Repeated p-value (1) | 0.0519 | | |
| Repeated p-value (2) | 0.0948 | | |
| Repeated p-value (3) | 0.4568 | | |

Legend:

- *(i)*: results of treatment arm i vs. control arm
- *(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

First of all, at the first interim no hypothesis can be rejected with
the closed combination test. This is seen from the
`Test action: reject (i)` variable. It is remarkable,
however, that the \(p\)-value for the
comparison of treatment arm 1 against control (p = 0.0034) is quite
small, and even the \(p\)-value for the
global intersection (p(1, 2, 3) = 0.0095) is not too far from showing
significance. It is important to know that, by default, the
**Dunnett many-to-one comparison test** for binary data is
used as the test for the intersection hypotheses, and the
**approximate pairwise score test** (which is the signed
square root of the \(\chi^2\) test) is
used for the calculation of the separate \(p\)-values. Note that in this presentation
the intersection tests for the whole closed system of hypotheses are
provided, such that the closed test can be completely reproduced.
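If a different intersection test is preferred, it can be selected through the `intersectionTest` argument of `getAnalysisResults()`; recent rpact versions offer choices such as `"Dunnett"` (the default here), `"Bonferroni"`, and `"Simes"`. A sketch, reusing the design and dataset defined above:

```r
library(rpact)

# Re-run the first-stage analysis with the Simes intersection test
# instead of the default Dunnett test (designIN and dataRates as above)
resultsSimes <- getAnalysisResults(
  design = designIN,
  dataInput = dataRates,
  directionUpper = FALSE,
  intersectionTest = "Simes"
)
kable(summary(resultsSimes))
```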

The repeated \(p\)-values (0.0519,
0.0948, and 0.4568, respectively) precisely correspond with the test
decisions, meaning that a repeated \(p\)-value is smaller than or equal to the
overall significance level (0.025) if and only if the corresponding
hypothesis can be rejected at the considered stage. This direct
correspondence is not generally true for the repeated confidence
intervals (i.e., they can contain the value zero although the null
hypothesis can be rejected), but it is true for the situation at hand.
The repeated confidence intervals can be displayed with the
`plot()` command:

`plot(results, type = 2)`