mathml: Translate R expressions to MathML and LaTeX/MathJax

Matthias Gondan and Irene Alfarone (Department of Psychology, Universität Innsbruck, Austria)

2023-01-14

This R package translates R objects to suitable elements in MathML or LaTeX, thereby allowing for a pretty mathematical representation of R objects and functions in data analyses, scientific reports and interactive web content. In the RMarkdown document rendering language, R code and mathematical content already exist side-by-side. The present package allows to use the same R objects for both data analysis and typesetting in documents or web content. This tightens the link between the statistical analysis and its verbal description or symbolic representation, which is another step towards reproducible science. User-defined hooks allow to extend the package by mapping specific variables or functions to new MathML and LaTeX entities. A few working examples illustrate the use of the package in a scientific report.

Introduction

The R extension of the markdown language (Xie, Dervieux, and Riederer 2020) enables reproducible statistical reports with nice typesetting in HTML, Microsoft Word, and Latex. Since recently (R Core Team 2022, version 4.2), R's manual pages include support for mathematical expressions (Sarkar and Hornik 2022; Viechtbauer 2022), which already is a big improvement. However, rules for the mapping of built-in language elements to their mathematical representation are still lacking. So far, R expressions such as pbinom(k, N, p) are printed as they are; pretty mathematical formulae such as \(P_{\mathrm{Bi}}(X \le k; N, p)\) require explicit Latex commands, that is, P_{\mathrm{Bi}}\left(X \le k; N, p\right). Except for minimalistic use cases, these commands are tedious to type in and their source code is hard to read.

The present R package defines a set of rules for the automatic translation of R expressions to mathematical output in RMarkdown documents (Xie, Dervieux, and Riederer 2020) and ShinyApp webpages (Chang et al. 2022). The translation is done by an embedded Prolog interpreter that maps nested expressions recursively to MathML and LaTeX/MathJax, respectively. User-defined hooks enable to extend the set of rules, for example, to represent specific R elements by custom mathematical signs.

The main feature of the package is that the same R expressions and equations can be used for both mathematical typesetting and calculations. This saves time and reduces mistakes, as will be illustrated below.

The paper is organized as follows. Section 2 presents the technical background of the package, including the two main classes of Prolog rules for translating R objects to mathematical expressions. Section 3 illustrates the main features of the mathml package, potential issues and workarounds using examples from the day-to-day perspective of a user. Section 4 present a case study for the use of the package in a scientific report. Section 5 concludes with a discussion and ideas for further development.

Background

The translation of R expressions to mathematical output is achieved through a Prolog interpreter. Prolog is a classical logic programming language with many applications in expert systems, computer linguistics and symbolic artificial intelligence. The main strength of Prolog is its concise representation of facts and rules for the representation of knowledge and grammar, as well as its efficient built-in search engine for closed world domains. As it is well-known, R is a statistical programming language for data analysis and statistical modeling which is widely used in academia and industry. Besides the core library, a lot of packages have been developed for all kinds of statistical problems, including statistics-based artificial intelligence tools such as neural networks for machine learning and deep learning. Whereas Prolog is weak in statistical computation, but strong in symbolic manipulation, the converse may be said for the R language. The rolog package (Gondan 2022) bridges this gap by providing an interface to a SWI-Prolog distribution in R. The communication between the two systems is mainly in the form of queries from R to Prolog, but two predicates allow Prolog to ring back and evaluate terms in R.

For a first illustration of the mathml package, we consider the binomial probability.

term <- quote(pbinom(k, N, p))
term
## pbinom(k, N, p)

The term is quoted to avoid its immediate evaluation (which would raise an error anyway since the variables k, N, p have not yet been defined). Experienced readers will remember that the quoted expression above is a short form for

term <- call("pbinom", as.name("k"), as.name("N"), as.name("p"))

As is seen from the output above, the variable term is not assigned the result of the calculation, but an R “call” (see, e.g., Wickham 2019, for details on “non-standard evaluation”). This call can eventually be evaluated with eval(),

k <- 10
N <- 22
p <- 0.4
eval(term)
## [1] 0.77195

The R package mathml can now be used to render the call in MathML, that is the dialect for mathematical elements on HTML webpages or in MathJax/LaTeX, as shown below (some of the curly braces are not really needed in this simple example, but are necessary in edge cases).

library(mathml)
mathjax(term)
## [1] "${P}_{\\mathrm{Bi}}{\\left({{X}{\\le}{k}}{{;}{{N}{{,}{p}}}}\\right)}$"

We can include the output in a RMarkdown document by specifying results='asis' in the R code chunk, as is done in the next example. The R function mathout() is a wrapper that invokes mathml() for HTML output and mathjax() for LaTeX output.

mathout(term)

PBi(Xk;N,p)

At the Prolog end, a predicate math/2 translates the call pbinom(K, N, Pi) into a “function” fn/2 with the name P_Bi, one argument X =< K, and the two parameters N and Pi.

math(pbinom(K, N, Pi), M)
 => M = fn(subscript('P', "Bi"), (['X' =< K] ; [N, Pi])).

Thus, the predicate math/2 could be considered a “macro” that translates a mathematical element (here, pbinom(K, N, Pi)) to a different mathematical element, namely fn(Name, (Args ; Pars)). The low-level predicate ml/3 is used to convert these basic elements to MathML.

ml(Flags, fn(Name, (Args ; Pars)), M)
 => ml(Flags, Name, N),
    ml(Flags, paren(list(op(;), [list(op(','), Args), list(op(','), Pars)])), X),
    M = mrow([N, mo(&(af)), X]).

The relevant rule for ml/3 builds the MathML entity mrow([N, mo(&(af)), X]), with N representing the name of the function and X its arguments and parameters, enclosed in parentheses. A corresponding rule jax/3 does the same for MathJax/LaTeX. A list of flags can be used for context-sensitive translation (see, e.g., the errors below).

Package mathml in practice

mathml is an R package for pretty mathematical representation of R functions and objects in data analysis, scientific reports and interactive web content. The currently supported features are listed below, roughly following the order proposed by (Murrell and Ihaka 2000).

Basic elements

mathml handles the basic elements of everyday mathematical expressions, such as numbers, Latin and Greek letters, multi-letter identifiers, accents, subscripts, and superscripts.

term <- quote(1 + -2L + a + abc + "a" + phi + Phi + varphi + hat(b)[i, j]^2L)
mathout(term)

1.00+-2+a+abc+a+φ+Φ+ϕ+b^ij2

term <- quote(NaN + NA + TRUE + FALSE + Inf + (-Inf))
mathout(term)

nan+na+T+F++(-)

An expression such as 1 + -2 may be considered unsatisfactory from an aesthetical perspective. It is correct R syntax, though, and is reproduced accordingly, without the parentheses. Parentheses around negated numbers or symbols can be added as shown for -Inf.

Note that an R function hat() does not exist in base R, it is provided by the package for convenience and points to the identity function.

Operators and parentheses

Arithmetic operators and parentheses are translated as they are, as illustrated below.

term <- quote(a - ((b + c)) - d*e + f*(g + h) + i/j + k^(l + m) + (n*o)^{p + q})
mathout(term)

a-[(b+c)]-de+f(g+h)+i/j+k(l+m)+(no)p+q

term <- quote(dot(a, b) + frac(1L, nodot(c, d + e)) + dfrac(1L, times(g, h)))
mathout(term)

ab+1c(d+e)+1g×h

For multiplications involving only numbers and symbols, the multiplication sign is omitted. This heuristic does not always produce the desired result; therefore, mathml defines alternative multiplication functions dot(), nodot(), and times() that calculate the product and produce the respective multiplication signs. Similarly, frac() and dfrac() can be used for small and large fractions.

For standard operators with known precedence, mathml is generally able to detect if parentheses are needed; for example, parentheses are automatically placed around g + h in the nodot-example. However, we note unecessary parentheses around l + m above. Thes parentheses are a consequence of quote(a^(b + c)) actually producing a nested R call of the form '^'(a, (b + c)) instead of '^'(a, b + c):

term <- quote(a^(b + c))
paste(term)
## [1] "^"       "a"       "(b + c)"

For the present purpose, this feature is unfortunate because extra parentheses around b + c are not needed. The preferred result is obtained by using the functional form quote('^'(k, l + m)) of the power, or curly braces as a workaround (see p + q above).

Custom operators

Whereas in standard infix operators, the parentheses typically follow the rules for precedence, undesirable results may be obtained in custom operators.

term <- quote(mean(X) %+-% 1.96 * s / sqrt(N))
mathout(term)

(X¯±1.96)s/N

term <- quote('%+-%'(mean(X), 1.96 * s / sqrt(N)))
term <- quote(mean(X) %+-% {1.96 * s / sqrt(N)})   # the same
mathout(term)

X¯±1.96s/N

The example is a reminder that it is not possible to define the precedence of custom operators in R, and that expressions with such operators are evaluated strictly from left to right. As in the previous example, the solution is to either use the functional form of the operator or a curly brace to enforce the correct operator precedence.

More operators are shown in Table 1, including the suggestions by Murrell and Ihaka (2000) for graphical annotations and arrows in R figures.

Table 1. Custom operators in mathml
Operator Output Operator Output Operator Arrow
A %*% B A×B A != B AB A %<->% B A↔︎B
A %.% B AB A ~ B AB A %->% B AB
A %x% B AB A %~~% B AB A %<-% B AB
A %/% B A/B A %==% B AB A %up% B AB
A %% B mod(A,B) A %=~% B AB A %down% B AB
A & B AB A %prop% B AB A %<=>% B AB
A | B AB A %in% B AB A %=>% B AB
xor(A, B) AB intersect(A, B) AB A %<=% B AB
!A ¬A union(A, B) AB A %dblup% B AB
A == B A=B crossprod(A, B) AT×B A %dbldown% B AB
A <- B A=B is.null(A) A=

Builtin functions

There is support for most functions from package base, with adequate use and omission of parentheses.

term <- quote(sin(x) + sin(x)^2L + cos(pi/2L) + tan(2L*pi) * expm1(x))
mathout(term)

sinx+sin2x+cos(π/2)+tan(2π)(expx-1)

term <- quote(choose(N, k) + abs(x) + sqrt(x) + floor(x) + ceiling(x))
mathout(term)

(Nk)+|x|+x+x+x

A few more examples are shown in Table 2, including functions from stats.

Table 2. R functions from base and stats
Function Output Function Output
sin(x) sinx dbinom(k, N, pi) PBi(X=k;N,π)
cosh(x) coshx pbinom(k, N, pi) PBi(Xk;N,π)
tanpi(alpha) tan(απ) qbinom(p, N, pi) argmink[PBi(Xk;N,π)>p]
asinh(alpha) sinh-1α dpois(k, lambda) PPo(X=k;λ)
log(p) logp ppois(k, lambda) PPo(Xk;λ)
log1p(x) log(1+x) qpois(p, lambda) argmaxk[PPo(Xk;λ)>p]
logb(x, e) logex dexp(x, lambda) fExp(x;λ)
exp(x) expx pexp(x, lambda) FExp(x;λ)
expm1(x) expx-1 qexp(p, lambda) FExp-1(p;λ)
choose(n, k) (nk) dnorm(x, mu, sigma) φ(x;μ,σ)
lchoose(n, k) log(nk) pnorm(x, mu, sigma) Φ(x;μ,σ)
factorial(n) n! qnorm(alpha/2L) Φ-1(α/2)
lfactorial(n) logn! 1L - pchisq(x, 1L) 1-Fχ2(1df)(x)
sqrt(x) x qchisq(1L - alpha, 1L) Fχ2(1df)-1(1-α)
mean(X) X¯ pt(t, N - 1L) P(Tt;N-1df)
abs(x) |x| qt(alpha/2L, N - 1L) Tα/2(N-1df)

Custom functions

this part is incomprehensible for the naive reader

For self-written functions, matters are a bit more complicated. The name of a self-written function is not transparent to R, since only the function body is represented, as illustrated by the redefinition of sign below. The function body is generally a nested R “call” of the form '{'(L), with L being a list of commands. mathml provides limited support for typical control structures.

sgn <- function(x)
{
  if(x == 0L) return(0L)
  if(x < 0L) return(-1L)
  if(x > 0L) return(1L)
}

mathout(sgn)

{0,ifx=0-1,ifx<01,ifx>0

mathout(call("<-", quote(sgn(x)), sgn))

sgnx={0,ifx=0-1,ifx<01,ifx>0

As shown above, we can still display functions in the form head(x) = body if we embed the object to be shown into a call such as head() <- body.

Indices and powers

Indices in square brackets are rendered as subscripts, powers are rendered as superscript. Moreover, mathml defines the functions subscript(), superscript() and subsuperscript() that simply return their first argument. The other arguments serve as decorations, for example, for summation and product signs.

term <- quote(S[Y]^2L <- frac(1L, N) * sum(Y[i] - mean(Y))^2L)
mathout(term)

SY2=1N(Yi-Y¯)2

term <- quote(log(subsupscript(prod(L[i]), i==1L, N)) <- 
          subsupscript(sum(log(L[i])), i==1L, N))
mathout(term)

log(i=1NLi)=i=1NlogLi

Ringing back to R

R's integrate function takes a number of arguments, the most important ones being the function to integrate, and the lower and the upper bound of the integration.

term <- quote(integrate(sin, 0L, 2L*pi))
mathout(term)

02πsinxdx

eval(term)

2.221501e-16 with absolute error < 4.4e-14

For mathematical typesetting in the form of \(\int f(x)\, dx\), mathml needs to find out the name of the integration variable. For that purpose, the underlying Prolog bridge provides a predicate r_eval/2 that calls R from Prolog. In the example above, this predicate evaluates formalArgs(args(sin)), which returns the names of the arguments of sin, namely, x.

Note that in the example above, the quoted term is an abbreviation for call("integrate", quote(sin), ...), with sin being an R symbol, not a function. R’s integrate() can handle both. However, as already mentioned in the previous subsection, mathml would be unable to find out the function name, it needs the symbol.

Names and order of arguments

One of R’s great features is the possibility to refer to function arguments by their names, not only by their position in the list of arguments. At the other end, most Prolog-powered handlers for the R calls are rather rigid, for example, integrate/3 accepts exactly three arguments in a particular order and without names, that is, integrate(lower=0L, upper=2L*pi, sin), would not print the desired result.

To “canonicalize” function calls with named arguments and arguments in unusual order, mathml provides an auxilliary R function canonical(f, drop) that reorders the argument list of calls to known R functions and, if drop=TRUE (which is the default), also removes the names of the arguments.

term <- quote(integrate(lower=0L, upper=2L*pi, sin))
canonical(term)
## integrate(sin, 0L, 2L * pi)
mathout(canonical(term))

02πsinxdx

This function can be used to feed mixtures of partially named and positional arguments into the renderer. For details, see the R function match.call().

Matrices and Vectors

Of course, mathml also supports matrices and vectors.

v <- 1:3
mathout(call("t", v))

(123)T

A <- matrix(data=11:16, nrow=2, ncol=3)
B <- matrix(data=21:26, nrow=2, ncol=3)
term <- call("+", A, B)
mathout(term)

(111315121416)+(212325222426)

Note that the seemingly more convient term <- quote(A + B) yields \(A + B\) in the output—instead of the desired matrix representation.

Abbreviations

We consider the \(t\)-statistic for independent samples with equal variance. To avoid clutter in the equation, the pooled variance \(s^2_{\mathrm{pool}}\) is abbreviated, and a comment is given with the expression for \(s^2_{\mathrm{pool}}\). For this purpose, mathml provides a function denote(abbr, expr, info), with expr actually being evaluated, abbr being rendered, plus a comment of the form “with expr denoting info”. The hooks are described in the next section.

hook(m_A, mean(X)["A"]) ; hook(s2_A, s["A"]^2L) ;
hook(n_A, n["A"])
hook(m_B, mean(X)["B"]) ; hook(s2_B, s["B"]^2L)
hook(n_B, n["B"]) ; hook(s2_p, s["pool"]^2L)

term <- quote(t <- dfrac(m_A - m_B, 
    sqrt(denote(s2_p, frac((n_A - 1L)*s2_A + (n_B - 1L)*s2_B, n_A + n_B - 2L),
                "the pooled variance.") * (frac(1L, n_A) + frac(1L, n_B)))))
mathout(term)

t=X¯A-X¯Bspool2(1nA+1nB), with spool2=(nA-1)sA2+(nB-1)sB2nA+nB-2 denoting the pooled variance.

The term is evaluated below. print() is needed because the return value of the assignment in the term t <- dfrac(...) is invisible.

m_A <- 1.5; s2_A <- 2.4^2; n_A <- 27; m_B <- 3.9; s2_B <- 2.8^2; n_B <- 20
print(eval(term))
## [1] -3.157427

Context-dependent rendering

Consider an educational scenario in which we want to highlight a certain element of a term, for example, that a student has forgotten to subtract the null hypothesis in a \(t\)-ratio:

t <- quote(dfrac(omit_right(mean(D) - mu[0L]), s / sqrt(N)))
mathout(t, flags=list(error='highlight'))

D¯-μ0s/N

mathout(t, flags=list(error='fix'))

D¯-μ0s/N

The R function omit_right(a + b) uses non-standard evaluation techniques (e.g., Wickham 2019) to return only the left part an operation, and cancels the right part. This may not always be desired, for example, when illustrating how to fix the mistake.

For this purpose, the functions mathml() or mathjax() have an optional argument flags which is a list with named elements. In this example, we use this argument to tell mathml how to render such erroneous expressions using the flag error which can be asis, highlight, fix, or ignore. For more examples, see Table 3.

Table 3. Highlighting elements of a term
Operation error = asis highlight fix ignore
omit_left(a + b) b a+b a+b a+b
omit_right(a + b) a a+b a+b a+b
list(quote(a), quote(omit(b))) a ab ab ab
add_left(a + b) a+b a+b a+b b
add_right(a + b) a+b a+b a+b a
list(quote(a), quote(add(b))) ab ab ab a
instead(a, b) + c a+c ainstead ofb+c b+c b+c

Customizing the package

In typical R functions, variable names are typically longer than just single letters, which may yield unsatisfactory results in the mathematical output.

term <- quote(pbinom(successes, Ntotal, prob))
mathout(term)

PBi(Xsuccesses;Ntotal,prob)

hook(successes, k)
hook(quote(Ntotal), quote(N), quote=FALSE)
hook(prob, pi)
mathout(term)

PBi(Xk;N,π)

To improve the situation, mathml provides a simple hook that can be used to replace elements (e.g., verbose variable names) of the code by concise mathematical symbols, as illustrated in the example. To simplify notation, the quote flag of hook() defaults to TRUE, and hook() uses non-standard evaluation to unpack its arguments. If quote is FALSE, as shown above, the user has to provide the quoted expressions. Care should be taken to avoid recursive hooks such as hook(s, s["A"]) that endlessly replace the \(s\) from \(s_{\mathrm{A}}\) as in \(s_{\mathrm{A}_{\mathrm{A}_{\mathrm{A}\cdots}}}\).

Further customizations require the assertion of custom Prolog rules math/2, ml/3, jax/3, as shown in the introduction.

A case study

This case study describes a model by Schwarz (1994) from mathematical Psychology using the features of package mathml. Schwarz presents a new explanation of redundancy gains that occur when observers respond to stimuli of different sources, and the same information is presented on two or more channels. In Schwarz’ model, decision-making builds on a process of noisy accumulation of information over time (e.g., Ratcliff et al. 2016). In redundant stimuli, the model assumes a superposition of channel-specific diffusion processes that eventually reach an absorbing barrier to elicit the response. For a detailed description the reader may refer to the original article.

Schwarz’ (1994) model refers to two stimuli A and B, presented either alone or in combination (AB, redundant stimuli), with the redundant stimuli being presented either simultaneously or with onset asynchrony \(\tau\). The channel activation is described as a two-dimensional Wiener process with drifts \(\mu_i\), variances \(\sigma^2_i\), and initial conditions \(X_i(t = 0) = 0, i = \mathrm{A, B}\). The channels may be correlated with \(\rho_{\mathrm{AB}}\), but we assume \(\rho_{\mathrm{AB}} = 0\) for simplicity.

A response is elicited when the process reaches an absorbing barrier \(c > 0\) for the first time. In single-target trials, the first passages at \(c\) are expected at

ED_single <- function(c, mu)
  dfrac(c, mu)

# display as E(D; mu), c is a scaling parameter
hook(ED_single(.C, .Mu), E(`;`(D, .Mu)))
mathout(call("=", quote(ED_single(c, mu)), ED_single))

E(D;μ)=cμ

In redundant stimuli, the activation of the channel-specific diffusion processes adds up, \(X_{\mathrm{AB}}(t) = X_{\mathrm A}(t) + X_{\mathrm B}(t)\), hence the name, superposition. \(X_{\mathrm{AB}}(t)\) is again a Wiener process with drift \(\mu_{\mathrm A} + \mu_{\mathrm B}\) and variance \(\sigma^2_{\mathrm A} + \sigma^2_{\mathrm B}\). For the expected first-passage time, we have

hook(mu_A, mu["A"])
hook(mu_B, mu["B"])
hook(sigma_A, sigma["A"])
hook(sigma_B, sigma["B"])
hook(mu_M, mu["M"])
hook(M, overline(X))

mathout(call("=", quote(E(D["AB"])), quote(ED_single(c, mu_A + mu_B))))

E(DAB)=E(D;μA+μB)

For asynchronous stimuli, Schwarz [-Schwarz (1994); Eq. 10] derived the expected first-passage time as a function of the stimulus onset asyncrony \(\tau\),

ED_async <- function(tau, c, mu_A, sigma_A, mu_B)
{ dfrac(c, mu_A) + (dfrac(1L, mu_A) - dfrac(1L, mu_A + mu_B)) *
    ((mu_A*tau - c) * pnorm(dfrac(c - mu_A*tau, sqrt(sigma_A^2L*tau)))
      - (mu_A*tau + c) * exp(dfrac(2L*c*mu_A, sigma_A^2L))
        * pnorm(dfrac(-c - mu_A*tau, sqrt(sigma_A^2L*tau))))
}

hook(ED_async(.Tau, .C, .MA, .SA, .MB), E(`;`(D[.Tau], `,`(.MA, .SA, .MB))))
mathout(call("=", quote(ED_async(tau, c, mu_A, sigma_A, mu_B)), ED_async))

E(Dτ;μA,σA,μB)=cμA+(1μA-1μA+μB)[(μAτ-c)Φ(c-μAτσA2τ)-(μAτ+c)exp2cμAσA2Φ(-c-μAτσA2τ)]

For negative onset asynchrony (i.e., B before A), the parameters need to be switched.

ED <- function(tau, c, mu_A, sigma_A, mu_B, sigma_B)
{
  if(tau == Inf) return(ED_single(c, mu_A)) ;
  if(tau == -Inf) return(ED_single(c, mu_B)) ;
  if(tau == 0L) return(ED_single(c, mu_A + mu_B)) ;
  if(tau > 0L) return(ED_async(tau, c, mu_A, sigma_A, mu_B)) ;
  if(tau < 0L) return(ED_async(abs(tau), c, mu_B, sigma_B, mu_A))
}

hook(ED(.Tau, .C, .MA, .SA, .MB, .SB), E(`;`(D[.Tau], `,`(.MA, .SA, .MB, .SB))))
mathout(call("=", quote(ED(tau, c, mu_A, sigma_A, mu_B, sigma_B)), ED))

E(Dτ;μA,σA,μB,σB)={E(D;μA),ifτ=E(D;μB),ifτ=-E(D;μA+μB),ifτ=0E(Dτ;μA,σA,μB),ifτ>0E(D|τ|;μB,σB,μA),ifτ<0

The observable response time is assumed to be the sum of \(D\), the time employed to reach the threshold for the decision, and a residual \(M\) denoting other processes such a motor preparation and execution. Correspondingly, the expected response time amounts to

ET <- function(tau, c, mu_A, sigma_A, mu_B, sigma_B, mu_M)
  ED(tau, c, mu_A, sigma_A, mu_B, sigma_B) + mu_M

hook(ET(.Tau, .C, .MA, .SA, .MB, .SB, .MM), 
     E(`;`(T[.Tau], `,`(.MA, .SA, .MB, .SB, .MM))))
mathout(call("=", quote(E(T[tau])), ET))

E(Tτ)=E(Dτ;μA,σA,μB,σB)+μM

Schwarz (1994) applied the model to data from a redundant signals task with 13 onset asynchronies \(0, \pm33, \pm67, \pm100, \pm133, \pm167, \pm\infty\) ms, where \(0\) refers to the synchronous condition, and \(\pm\infty\) to the single-target presentations. Each condition was replicated 400 times. The observed mean response times and their standard deviations are given in Table 4 below.

Table 4. Miller data
\(\tau\) \(m\) \(s\) \(n\)
- 231 56 400
-167 234 58 400
-133 230 40 400
-100 227 40 400
-67 228 32 400
-33 221 28 400
0 217 28 400
33 238 28 400
67 263 26 400
100 277 30 400
133 298 32 400
167 316 34 400
348 92 400

Assuming that the model is correct, the observable mean reaction times follow an approximate Normal distribution around the model prediction \(E(T)\). We can, therefore, use a standard goodness-of-fit measure by \(z\)-standardisation.

z <- function(m, s, n, tau, c, mu_A, sigma_A, mu_B, sigma_B, mu_M)
  dfrac(m - denote(E, ET(tau, c, mu_A, sigma_A, mu_B, sigma_B, mu_M),
              "the expected mean response time"),
    s / sqrt(n))

mathout(call("=", quote(z[tau]), z))

zτ=m-Es/n, with E=E(Tτ;μA,σA,μB,σB,μM) denoting the expected mean response time

The overall goodness-of-fit is the sum of the squared \(z\)-statistics for each onset asynchrony. Assuming again that the architecture of the model is correct, but the parameters are adjusted to the data, it follows a \(\chi^2(8~\mathrm{df})\)-distribution.

zv <- Vectorize(z, vectorize.args = c('m', 's', 'n', 'tau'))

gof <- function(par, tau, m, s, n)
  sum(zv(m, s, n, tau, c=100L, mu_A=par["mu_A"], sigma_A=par["sigma_A"], 
         mu_B=par["mu_B"], sigma_B=par["sigma_B"], mu_M=par["mu_M"])^2L)

hook(zv(.M, .S, .N, .Tau, .C, .MA, .SA, .MB, .SB, .MM), z[.Tau])
mathout(call("=", quote(X["8 df"]^2L), gof))

X8 df2=zτ2

The degrees of freedom result from the 13 predicted mean response times, minus 5 free parameters \(\theta = \langle\mu_{\mathrm A}, \sigma_{\mathrm A}, \mu_{\mathrm B}, \sigma_{\mathrm B}, \mu_{\mathrm M}\rangle\) (\(c\) is a redundant scaling parameter).

θ^=argmingof(θ)

The best fitting parameter values and their confidence intervals are given in Table 5 below.

Table 5. Model fit
Parameter Estimate Lo Up
μA 0.53 0.51 0.56
σA 4.51 3.89 5.12
μB 1.38 1.24 1.51
σB 13.24 7.45 19.03
μM 161.73 157.54 165.91

The statistic goodness-of-fit value is \(X^2(8) = 30.54\), \(p = .0002\), indicating some lack of fit. Given the large trial numbers in the original study, some lack of fit is expected. For more detail, especially on fitting the observed standard deviations, the reader is referred to the original paper (Schwarz 1994).

Conclusions

This package expands the possibilities of R by allowing calculations and graphical rendering of the same terms. Building on current features of R and existing packages for displaying formulas in R (Murrell and Ihaka 2000; Allaire et al. 2018)), the mathml package bridges the gap between computational needs, presentation of results, and their reproducibility. Two possible modes for typesetting are available to export R Markdown documents in different formats or ShinyApp webpages: MathML and MathJax. A further, not-yet-mentioned, use of mathml is related to presentations made using RMarkdown. By adopting mathml researchers or teachers can use the RMarkdown to both conduct analysis and show results, smoothing the process and uniforming the graphical output.

The case study presented above shows that mathml can help to improve data analyses and statistical reports from an aesthetical perspective, as well as regarding reproducibility of research. Furthermore, the package may also allow for a better detection of possible mistakes in R programs.

Similar to most programming languages (Green 1977), R code is notoriously hard to read, and the poor legibility of the language is one of the main sources of mistakes. For illustration, we consider again Equation 10 in Schwarz (1994).

f1 <- function(tau)
{ dfrac(c, mu["A"]) + (dfrac(1L, mu["A"]) - dfrac(1L, mu["A"] + mu["B"]) * 
    ((mu["A"]*tau - c) * pnorm(dfrac(c - mu["A"]*tau, sqrt(sigma["A"]^2L*tau)))
      - (mu["A"]*tau + c) * exp(dfrac(2L*mu["A"]*tau, sigma["A"]^2L))
        * pnorm(dfrac(-c - mu["A"]*tau, sqrt(sigma["A"]^2L*tau)))))
}

mathout(f1)

cμA+{1μA-1μA+μB[(μAτ-c)Φ(c-μAτσA2τ)-(μAτ+c)exp2μAτσA2Φ(-c-μAτσA2τ)]}

The first version has a wrong parenthesis, which is barely visible in the code, whereas the wrong curly brace in the mathematical representation is immediately obvious. The correct version is shown below for comparison.

f2 <- function(tau)
{ dfrac(c, mu["A"]) + (dfrac(1L, mu["A"]) - dfrac(1L, mu["A"] + mu["B"])) * 
    ((mu["A"]*tau - c) * pnorm(dfrac(c - mu["A"]*tau, sqrt(sigma["A"]^2L*tau)))
      - (mu["A"]*tau + c) * exp(dfrac(2L*mu["A"]*tau, sigma["A"]^2L))
        * pnorm(dfrac(-c - mu["A"]*tau, sqrt(sigma["A"]^2L*tau))))
}

mathout(f2)

cμA+(1μA-1μA+μB)[(μAτ-c)Φ(c-μAτσA2τ)-(μAτ+c)exp2μAτσA2Φ(-c-μAτσA2τ)]

As the reader may know from own experience, missed parentheses are frequent causes of wrong results and errors that are hard to locate in programming code. This particular example shows that mathematical rendering can help to substantially reduce the amount of careless errors in programming.

In its current version mathml has some limitations. For example, it is currently not possible to use the functions of mathml for writing intratextual formulas, here, the user has to adopt the usual LateX notation. However, we do not consider this to be a serious limitation, since the most productive use of mathml still comes from more complicated calculations.

Also: In long equations, we did not yet find a convenient way to insert linebreaks. This is mostly because it is not properly supported by MathML and LaTeX renderers. In its current stage, the LaTeX package breqn (Robertson et al. 2021) is mostly a proof of concept.

mathml is available for R version (…) and later, and can be easily installed using the usual install.packages("mathml"). The source code of the package is found at https://github.com/mgondan/mathml.

References

Allaire, JJ, Rich Iannone, Alison Presmanes Hill, and Yihui Xie. 2018. “Distill for r Markdown.” https://rstudio.github.io/distill/.
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2022. Shiny: Web Application Framework for r. https://CRAN.R-project.org/package=shiny.
Gondan, Matthias. 2022. Rolog: Query ’SWI’-’Prolog’ from r. https://github.com/mgondan/rolog.
Green, T. R. G. 1977. “Conditional Program Statements and Their Comprehensibility to Professional Programmers.” Journal of Occupational Psychology 50: 93–109.
Murrell, Paul, and Ross Ihaka. 2000. “An Approach to Providing Mathematical Annotation in Plots.” Journal of Computational and Graphical Statistics 9: 582–99. https://www.jstor.org/stable/1390947.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ratcliff, Roger, Philip L. Smith, Scott D. Brown, and Gail McKoon. 2016. “Diffusion Decision Model: Current Issues and History.” Trends in Cognitive Sciences 20: 260–81.
Robertson, Will, Joseph Wright, Frank Mittelbach, and Ulrike Fischer. 2021. Breqn: Automatic Line Breaking of Displayed Equations. https://www.ctan.org/pkg/breqn.
Sarkar, Deepayan, and Kurt Hornik. 2022. Enhancements to HTML Documentation. https://blog.r-project.org/2022/04/08/enhancements-to-html-documentation/index.html.
Schwarz, Wolf. 1994. “Diffusion, Superposition, and the Redundant-Targets Effect.” Journal of Mathematical Psychology 38: 504–20.
Viechtbauer, Wolfgang. 2022. Mathjaxr: Using ’Mathjax’ in Rd Files. https://CRAN.R-project.org/package=mathjaxr.
Wickham, H. 2019. Advanced R. Cambridge: Chapman; Hall/CRC.
Xie, Y., C. Dervieux, and E. Riederer. 2020. R Markdown Cookbook. Cambridge: Chapman; Hall/CRC.