We’ve made a large effort to make Tplyr tables flexible, but not everything can (or, in some cases, we think should) be handled during table construction itself. To address this, Tplyr has several post-processing functions that help put finishing touches on your data to help with presentation.
Some output formats don’t handle text wrapping elegantly. For some formats, the only concern is the width the text reaches before a word wraps. Other formats, such as LaTeX output rendered to PDF (depending on the rendering engine), may wrap at white space fine, but a word whose character length is longer than the allotted width may print wider than the table cell.
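To address this, in Tplyr we’ve added the function str_indent_wrap(). This is largely built on top of the function stringr::str_wrap() to preserve as much efficiency as possible, but two issues common in clinical outputs have been addressed:
- Preceding indentation of a string is preserved when the text wraps onto new lines.
- Words that exceed the specified wrapping width are wrapped with hyphenation.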
As a post-processing function, note that this function works on a tibble or data.frame object, and not a Tplyr table.
Let’s look at an example.
dat <- tibble(
  row_label1 = c("RENAL AND URINARY DISORDERS", " NEPHROLITHIASIS"),
  var1_Placebo = c(" 5 (50.0%)", " 3 (30.0%)")
)

dat %>%
  mutate(
    row_label1 = str_indent_wrap(row_label1, width = 10)
  )
#> # A tibble: 2 × 2
#> row_label1 var1_Placebo
#> <chr> <chr>
#> 1 "RENAL AND\nURINARY\nDISORDERS" " 5 (50.0%)"
#> 2 " NEPHROLIT-\n HIASIS" " 3 (30.0%)"
Note: We’re viewing the data frame output here because HTML based outputs eliminate duplicate white spaces, which makes it difficult to see things like padded indentation
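To make the indentation point concrete, here is a minimal base R sketch of wrapping while keeping the leading spaces on every line. It is not how str_indent_wrap() is implemented: wrap_keep_indent() is a hypothetical helper written for this illustration, and unlike str_indent_wrap() it does not hyphenate words that are longer than the width.

```r
# Hypothetical helper for illustration only; not part of Tplyr.
wrap_keep_indent <- function(x, width = 10) {
  vapply(x, function(s) {
    indent <- sub("^( *).*$", "\\1", s)   # leading spaces of the string
    body <- sub("^ +", "", s)             # text without the indentation
    lines <- strwrap(body, width = width) # wrap at white space only
    # Re-apply the indentation to every wrapped line
    paste0(indent, lines, collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
}

wrapped <- wrap_keep_indent(c("  ACUTE KIDNEY INJURY", "  NEPHROLITHIASIS"), width = 10)
wrapped
```

The long single word in the second element is left unbroken here, which is the case the hyphenation in str_indent_wrap() is meant to handle.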
Row masking is the process of blanking repeated row values within a data frame to give the appearance of grouping variables. Some table packages, such as gt, will handle this for you. Other packages, like huxtable, have options like merging cells, but row masking may be a simpler approach. Furthermore, this is a common approach in clinical tables when data validation is done on an output data frame.
dat <- tplyr_table(adsl, TRT01P) %>%
  add_layer(
    group_count(RACE, by = "Race n (%)")
  ) %>%
  build() %>%
  select(1:3)

kable(dat)
row_label1 | row_label2 | var1_Placebo |
---|---|---|
Race n (%) | AMERICAN INDIAN OR ALASKA NATIVE | 0 ( 0.0%) |
Race n (%) | BLACK OR AFRICAN AMERICAN | 8 ( 9.3%) |
Race n (%) | WHITE | 78 ( 90.7%) |
In this example, note that “Race n (%)” is duplicated for each row. We can blank this out using apply_row_masks().
dat %>%
  apply_row_masks() %>%
  kable()
row_label1 | row_label2 | var1_Placebo |
---|---|---|
Race n (%) | AMERICAN INDIAN OR ALASKA NATIVE | 0 ( 0.0%) |
 | BLACK OR AFRICAN AMERICAN | 8 ( 9.3%) |
 | WHITE | 78 ( 90.7%) |
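The masking idea itself can be sketched in a few lines of base R. This is an illustration of the concept, not the code behind apply_row_masks(); mask_repeats() is a hypothetical helper that blanks a value whenever it matches the value directly above it.

```r
# Hypothetical helper for illustration only; not part of Tplyr.
mask_repeats <- function(x) {
  # TRUE where an element equals the element immediately before it
  repeated <- c(FALSE, x[-1] == x[-length(x)])
  x[repeated] <- ""
  x
}

labels <- c("Race n (%)", "Race n (%)", "Race n (%)", "Age (years)", "Age (years)")
masked <- mask_repeats(labels)
masked
```

Because each value is only compared with the row above it, the result depends on the row order of the data.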
A second feature of apply_row_masks() is the ability to apply row breaks between different groups of data, for example, different layers of a table.
dat <- tplyr_table(adsl, TRT01P) %>%
  add_layer(
    group_count(RACE, by = "Race n (%)")
  ) %>%
  add_layer(
    group_desc(AGE, by = "Age (years)")
  ) %>%
  build()

dat %>%
  apply_row_masks(row_breaks = TRUE) %>%
  kable()
row_label1 | row_label2 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 | ord_break |
---|---|---|---|---|---|---|---|---|
Race n (%) | AMERICAN INDIAN OR ALASKA NATIVE | 0 ( 0.0%) | 1 ( 1.2%) | 0 ( 0.0%) | 1 | 1 | 1 | 1 |
 | BLACK OR AFRICAN AMERICAN | 8 ( 9.3%) | 9 ( 10.7%) | 6 ( 7.1%) | 1 | 1 | 2 | 1 |
 | WHITE | 78 ( 90.7%) | 74 ( 88.1%) | 78 ( 92.9%) | 1 | 1 | 3 | 1 |
 |  |  |  |  | 1 | NA | NA | 2 |
Age (years) | n | 86 | 84 | 84 | 2 | 1 | 1 | 1 |
 | Mean (SD) | 76.3 ( 8.59) | 75.9 ( 7.89) | 77.4 ( 8.29) | 2 | 1 | 2 | 1 |
 | Median | 76.0 | 76.0 | 77.5 | 2 | 1 | 3 | 1 |
 | Q1, Q3 | 69.0, 81.0 | 70.0, 80.0 | 71.0, 82.0 | 2 | 1 | 4 | 1 |
 | Min, Max | 52, 89 | 56, 88 | 51, 88 | 2 | 1 | 5 | 1 |
 | Missing | 0 | 0 | 0 | 2 | 1 | 6 | 1 |
 |  |  |  |  | 2 | NA | NA | 2 |
The row breaks are inserted as blank rows. Additionally, when row breaks are inserted you’ll have the additional variable ord_break added to the data frame, where the value is 1 for table data rows and 2 for the newly added break rows. Character variables will have blank values (i.e. "") and the numeric sorting values will be NA.
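The shape of the break rows described above can be reproduced with a small base R sketch. This is an assumption-laden illustration rather than Tplyr's implementation: insert_breaks() is a hypothetical helper, and it simply appends one blank row after each group of the grouping column.

```r
# Hypothetical helper for illustration only; not part of Tplyr.
insert_breaks <- function(df, group_col = "ord_layer_index") {
  groups <- split(df, df[[group_col]])
  with_breaks <- lapply(groups, function(g) {
    g$ord_break <- 1                      # 1 marks table data rows
    brk <- g[1, , drop = FALSE]           # template for the break row
    is_chr <- vapply(brk, is.character, logical(1))
    is_num <- vapply(brk, is.numeric, logical(1)) & names(brk) != group_col
    brk[is_chr] <- ""                     # character columns become blank
    brk[is_num] <- NA_real_               # numeric sort columns become NA
    brk$ord_break <- 2                    # 2 marks the inserted break row
    rbind(g, brk)
  })
  out <- do.call(rbind, with_breaks)
  rownames(out) <- NULL
  out
}

df <- data.frame(
  row_label1 = c("a", "b", "c"),
  ord_layer_index = c(1, 1, 2),
  ord_layer_1 = c(1, 2, 1)
)
out <- insert_breaks(df)
out
```

Keeping ord_break in the output means the break rows can be filtered out again, or kept in place when the data are re-sorted.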
There are a few considerations when using apply_row_masks():
- This function is order dependent, so make sure your data are sorted before submitting them to apply_row_masks().
- When inserting row breaks, by default the Tplyr variable ord_layer_index is used. You can submit other variables via the ellipsis parameter (...) if you’d like to use a different variable grouping to insert rows.

In some circumstances, like add_total_row(), Tplyr lets you specify special formats separate from those in set_format_strings(). But within the table body there’s no other way to set specific, conditional formats based on the table data itself. To address this, we’ve added the post-processing function apply_conditional_format() to allow you to set conditional formats on result cells.
apply_conditional_format() operates on a character vector, so it can generally be used within the context of dplyr::mutate() like any other character modifying function. It will make a 1:1 replacement of values, so it will return a character vector of equal length. Let’s look at two examples.
string <- c(" 0 (0.0%)", " 8 (9.3%)", "78 (90.7%)")

apply_conditional_format(string, 2, x == 0, " 0 ", full_string=TRUE)
#> [1] " 0 " " 8 (9.3%)" "78 (90.7%)"
apply_conditional_format(string, 2, x == 0, "")
#> [1] " 0 " " 8 (9.3%)" "78 (90.7%)"
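The two examples above achieve the same result, but they work slightly differently. Let’s walk through the syntax:
- The first parameter (string) is the input character vector.
- The second parameter (format_group) selects the format group. Format groups here are the individual numbers within a result cell. For example, in the string 8 (9.3%), there are two format groups. The value of the first is 8. The value of the second is 9.3. This controls the number that we’ll use to establish our condition.
- The third parameter (condition) is the condition that triggers the replacement. Write it using the variable name x, which takes on the value of the number from your chosen format group. This condition should be a filter condition, so it must return a boolean vector of TRUE/FALSE.
- The fourth parameter (replacement) is the string used as the replacement value.

Finally, the last parameter is full_string, and this is the difference between the first and second examples. apply_conditional_format() can do two types of replacements. The first example is a full string replacement. In this case, whatever value you provide as the replacement is used verbatim when the condition evaluates to TRUE. If full_string is set to FALSE, then only the format group specified is replaced.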
For more context, let’s look at a third example.
apply_conditional_format(string, 2, x < 1, "(<1%)")
#> [1] " 0 (<1%)" " 8 (9.3%)" "78 (90.7%)"
In this example we target the percent field using format group 2, and now our replacement text is (<1%). If full_string uses its default value of FALSE, only the format group specified is replaced. apply_conditional_format() establishes the width of the format group targeted, and the replacement text will right align within that targeted portion of the string to ensure that the alignment of the string is preserved.
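The right alignment can be pictured with base R alone. The sketch below only illustrates the padding idea and is not taken from Tplyr's source; right_align() is a hypothetical helper, and the target string is an assumed example of a percent field that is 7 characters wide.

```r
# Hypothetical helper for illustration only; not part of Tplyr.
right_align <- function(replacement, target) {
  # Pad the replacement on the left so it is as wide as the text it replaces
  formatC(replacement, width = nchar(target))
}

target <- "( 0.0%)"   # assumed format group text, 7 characters wide
aligned <- right_align("(<1%)", target)
aligned
#> [1] "  (<1%)"
```

Since the padded replacement has the same width as the original field, the overall width of the result string is unchanged.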
For an example within a Tplyr result data frame, let’s look at the example dataset from earlier. Let’s say that we have n (%) values within the first count layer to conditionally format. Using some fancy dplyr code, and apply_conditional_format(), we can make it happen.
dat_new <- dat %>%
  mutate(
    across(starts_with('var'), # Apply to variables that start with `var`
      ~ if_else(
        ord_layer_index == 1, # Target the count layer
        apply_conditional_format(
          string = ., # This is dplyr::across syntax
          format_group = 2, # The percent field
          condition = x == 0, # Our condition
          replacement = "" # Replacement value
        ),
        .
      )
    )
  )

kable(dat_new)
row_label1 | row_label2 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 |
---|---|---|---|---|---|---|---|
Race n (%) | AMERICAN INDIAN OR ALASKA NATIVE | 0 | 1 ( 1.2%) | 0 | 1 | 1 | 1 |
Race n (%) | BLACK OR AFRICAN AMERICAN | 8 ( 9.3%) | 9 ( 10.7%) | 6 ( 7.1%) | 1 | 1 | 2 |
Race n (%) | WHITE | 78 ( 90.7%) | 74 ( 88.1%) | 78 ( 92.9%) | 1 | 1 | 3 |
Age (years) | n | 86 | 84 | 84 | 2 | 1 | 1 |
Age (years) | Mean (SD) | 76.3 ( 8.59) | 75.9 ( 7.89) | 77.4 ( 8.29) | 2 | 1 | 2 |
Age (years) | Median | 76.0 | 76.0 | 77.5 | 2 | 1 | 3 |
Age (years) | Q1, Q3 | 69.0, 81.0 | 70.0, 80.0 | 71.0, 82.0 | 2 | 1 | 4 |
Age (years) | Min, Max | 52, 89 | 56, 88 | 51, 88 | 2 | 1 | 5 |
Age (years) | Missing | 0 | 0 | 0 | 2 | 1 | 6 |
The syntax here gets a bit complicated, but by using dplyr::across() we can apply the same function across each of the result variables, which in this case are the variables whose names start with var. The function here is using a purrr style anonymous function for simplicity. There are a couple ways you can do this in R. Referencing the documentation of purrr::map():
- A formula, e.g. ~ . + 1. You must use . to refer to the first argument. Only recommended if you require backward compatibility with older versions of R.
- An anonymous function, e.g. \(x) x + 1 or function(x) x + 1.

Within that function, we’re additionally using if_else() to only apply this function on the first layer by using the ord_layer_index variable. All this together, we’re effectively running the apply_conditional_format() function only on the first layer, and running it across all of the variables that start with var.
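As a quick side-by-side of the function styles mentioned above, the following sketch runs the same operation three ways with purrr::map_dbl(). It assumes purrr is installed and, for the \(x) shorthand, R 4.1 or later.

```r
library(purrr)

values <- c(1, 2, 3)

# Formula style: `.` refers to the first argument
from_formula <- map_dbl(values, ~ . + 1)

# Shorthand anonymous function (R >= 4.1)
from_lambda <- map_dbl(values, \(x) x + 1)

# Classic anonymous function
from_function <- map_dbl(values, function(x) x + 1)

from_formula
#> [1] 2 3 4
```

All three calls return the same vector, so the choice between them is a matter of readability and the R version you need to support.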
When Tplyr outputs a result using set_format_strings() and f_str(), the results are concatenated together within a single result cell. For example, within a count layer in a Tplyr table, there’s no way to directly output the n and pct values in separate columns. If this is a result you want, then in a post-processing step you can use the function str_extract_fmt_group().
str_extract_fmt_group() allows you to reach within a result string and extract an individual format group. Consider this example:
string <- c(" 5 (5.8%)", " 8 (9.3%)", "78 (90.7%)")

# Get the n counts
str_extract_fmt_group(string, 1)
#> [1] " 5" " 8" "78"
# Get the pct counts
str_extract_fmt_group(string, 2)
#> [1] "(5.8%)" "(9.3%)" "(90.7%)"
In the first call to str_extract_fmt_group(), we target the n counts. The first format group from each string is extracted, preserving the allotted width of that portion of the string. Similarly, in the second call we extract all the percent values, including the surrounding parentheses.
In practice, str_extract_fmt_group() can then be used to separate format groups into their own columns.
dat <- tplyr_table(adsl, TRT01P) %>%
  add_layer(
    group_count(RACE)
  ) %>%
  build()

dat %>%
  mutate(
    across(starts_with('var'),
      ~ str_extract_fmt_group(., 1),
      .names = "{.col}_n"),
    across(starts_with('var'),
      ~ str_extract_fmt_group(., 2),
      .names = "{.col}_pct")
  ) %>%
  select(row_label1, var1_Placebo_n, var1_Placebo_pct) %>%
  kable()
row_label1 | var1_Placebo_n | var1_Placebo_pct |
---|---|---|
AMERICAN INDIAN OR ALASKA NATIVE | 0 | ( 0.0%) |
BLACK OR AFRICAN AMERICAN | 8 | ( 9.3%) |
WHITE | 78 | ( 90.7%) |
For the sake of display, in this output I only select the Placebo column, but note that we were able to dynamically separate out the n results from the pct results. In some cases, functions such as tidyr::separate() could also be used to get a result like this, but str_extract_fmt_group() specifically targets the expected formatting of format groups, without having to craft a specific expression that may get confused over things like spaces in unexpected places.
In very much the same vein as str_extract_fmt_group(), the function str_extract_num() allows you to target a format group and extract the number from within. This can be used in any circumstance where you may want to pull a number out of a result cell, but probably the best example of this would be for a highly specific sort sequence.
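To show what pulling a number out of a result cell amounts to, here is a rough base R approximation using a regular expression. It is a sketch under stated assumptions, not the logic inside str_extract_num(): nth_number() is a hypothetical helper that treats every run of digits (with an optional decimal part) as one number.

```r
# Hypothetical helper for illustration only; not part of Tplyr.
nth_number <- function(x, n) {
  # Find every number-like token in each string
  matches <- regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))
  # Keep the n-th token as a numeric value, or NA if there are fewer than n
  vapply(
    matches,
    function(m) if (length(m) >= n) as.numeric(m[n]) else NA_real_,
    numeric(1)
  )
}

string <- c(" 5 (5.8%)", " 8 (9.3%)", "78 (90.7%)")
pcts <- nth_number(string, 2)
pcts
#> [1]  5.8  9.3 90.7
```

A hand-written pattern like this has to be adjusted whenever the cell format changes, which is the kind of maintenance a format-group-aware function avoids.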
Consider an adverse event table. In vignette("sort") we go over circumstances where you may want to sort by the descending occurrence of a result. We’ve received questions about how to establish tie breakers in this scenario, where ties should be broken by sorting on descending occurrence of an adverse event within the high dose group, then the low dose group, and finally the placebo group.
Tplyr doesn’t allow you to output these order variables by default, but getting these numbers is quite simple with str_extract_num(). Let’s consider a simplified scenario:
dat <- tplyr_table(adae, TRTA) %>%
  set_pop_data(adsl) %>%
  set_pop_treat_var(TRT01A) %>%
  add_layer(
    group_count(AEDECOD) %>%
      set_format_strings(f_str("xx (XX.x%) [A]", distinct_n, distinct_pct, n)) %>%
      set_distinct_by(USUBJID)
  ) %>%
  build()

dat %>%
  head() %>%
  kable()
row_label1 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 |
---|---|---|---|---|---|
ACTINIC KERATOSIS | 0 (0.0%) [0] | 1 (1.2%) [1] | 0 (0.0%) [0] | 1 | 1 |
ALOPECIA | 1 (1.2%) [1] | 0 (0.0%) [0] | 0 (0.0%) [0] | 1 | 2 |
BLISTER | 0 (0.0%) [0] | 1 (1.2%) [2] | 5 (6.0%) [8] | 1 | 3 |
COLD SWEAT | 1 (1.2%) [3] | 0 (0.0%) [0] | 0 (0.0%) [0] | 1 | 4 |
DERMATITIS ATOPIC | 1 (1.2%) [1] | 0 (0.0%) [0] | 0 (0.0%) [0] | 1 | 5 |
DERMATITIS CONTACT | 0 (0.0%) [0] | 0 (0.0%) [0] | 1 (1.2%) [2] | 1 | 6 |
Given this data, let’s say we want to sort by descending occurrence of the event, using the number of subjects. That would be the first format group. And then we want to sort using high dose, then low dose, then placebo. Let’s create the order variables.
dat_ord <- dat %>%
  mutate(
    across(starts_with('var1'),
      ~ str_extract_num(., 1),
      .names = "{.col}_ord")
  )

dat_ord %>%
  head() %>%
  select(row_label1, matches('^var.*ord$'))
#> # A tibble: 6 × 4
#> row_label1 var1_Placebo_ord `var1_Xanomeline High Dose_ord` var1_Xan…¹
#> <chr> <dbl> <dbl> <dbl>
#> 1 ACTINIC KERATOSIS 0 1 0
#> 2 ALOPECIA 1 0 0
#> 3 BLISTER 0 1 5
#> 4 COLD SWEAT 1 0 0
#> 5 DERMATITIS ATOPIC 1 0 0
#> 6 DERMATITIS CONTACT 0 0 1
#> # … with abbreviated variable name ¹`var1_Xanomeline Low Dose_ord`
Now we effectively have additional order variables necessary to do the sort sequence desired.
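One way to finish the sort is dplyr::arrange() with desc() on each order variable in priority order. The sketch below is self-contained: instead of building the Tplyr table, it retypes four rows of the order values printed above into a small tibble (named ord_example here purely for illustration).

```r
library(dplyr)

# Four rows copied by hand from the printed order variables above
ord_example <- tibble(
  row_label1 = c("ACTINIC KERATOSIS", "ALOPECIA", "BLISTER", "DERMATITIS CONTACT"),
  var1_Placebo_ord = c(0, 1, 0, 0),
  `var1_Xanomeline High Dose_ord` = c(1, 0, 1, 0),
  `var1_Xanomeline Low Dose_ord` = c(0, 0, 5, 1)
)

# High dose first, then low dose, then placebo, each descending
sorted <- ord_example %>%
  arrange(
    desc(`var1_Xanomeline High Dose_ord`),
    desc(`var1_Xanomeline Low Dose_ord`),
    desc(var1_Placebo_ord)
  )

sorted$row_label1
```

BLISTER and ACTINIC KERATOSIS tie on the high dose count, so the low dose count decides their order.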
The last post-processing function worth mentioning isn’t necessarily meant for post-processing data from Tplyr itself. We understand that Tplyr can’t produce every single summary you’d need for a clinical trial, and we never intended it to be able to do this. But we built Tplyr to try to work effectively with other packages and tools. Tplyr’s string formatting tools work quite well, so we’ve externalized this capability using the function apply_formats().
As a basic example, let’s look at the mtcars data.
mtcars %>%
  mutate(
    new_column = apply_formats("xx (xx.x)", gear, mpg)
  ) %>%
  select(gear, mpg, new_column) %>%
  head() %>%
  kable()
 | gear | mpg | new_column |
---|---|---|---|
Mazda RX4 | 4 | 21.0 | 4 (21.0) |
Mazda RX4 Wag | 4 | 21.0 | 4 (21.0) |
Datsun 710 | 4 | 22.8 | 4 (22.8) |
Hornet 4 Drive | 3 | 21.4 | 3 (21.4) |
Hornet Sportabout | 3 | 18.7 | 3 (18.7) |
Valiant | 3 | 18.1 | 3 (18.1) |
Here we were able to leverage the string formatting available in f_str(), but apply it generically in another data frame within dplyr::mutate(). This allows you to format other data, outside of Tplyr, but still bring some of the quality of life that Tplyr has to offer.
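For readers more familiar with base R, the format string "xx (xx.x)" can be loosely compared to a sprintf() template, where the number of x characters sets the field width and decimal precision. This is only an analogy for reading the format string, not how apply_formats() is implemented.

```r
# Rough base R analogy for the "xx (xx.x)" format string:
# "xx"   -> width 2, no decimals
# "xx.x" -> width 4, one decimal
formatted <- sprintf("%2.0f (%4.1f)", 4, 21.0)
formatted
#> [1] " 4 (21.0)"
```

The advantage of the f_str() style is that the template looks like the output it produces, so widths and decimals can be read directly from the string.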