In this vignette we explain how to use functions which compute asymptotic timings of different git versions of a package (useful for determining when a difference in performance started to happen).
atime_versions function
In this vignette we show you how to compare asymptotic timings of an R expression which uses different versions of a package. Let us begin by cloning the binsegRcpp package,
old.opt <- options(width=100)
pkg.path <- tempfile()
dir.create(pkg.path)
git2r::clone("https://github.com/tdhock/binsegRcpp", pkg.path)
#> cloning into 'C:\Users\th798\AppData\Local\Temp\RtmpwJth9N\file9546274db6'...
#> Receiving objects: 1% (13/1258), 63 kb
#> Receiving objects: 11% (139/1258), 63 kb
#> Receiving objects: 21% (265/1258), 127 kb
#> Receiving objects: 31% (390/1258), 127 kb
#> Receiving objects: 41% (516/1258), 183 kb
#> Receiving objects: 51% (642/1258), 183 kb
#> Receiving objects: 61% (768/1258), 183 kb
#> Receiving objects: 71% (894/1258), 239 kb
#> Receiving objects: 81% (1019/1258), 239 kb
#> Receiving objects: 91% (1145/1258), 239 kb
#> Receiving objects: 100% (1258/1258), 251 kb, done.
#> Local: master C:/Users/th798/AppData/Local/Temp/RtmpwJth9N/file9546274db6
#> Remote: master @ origin (https://github.com/tdhock/binsegRcpp)
#> Head: [977f385] 2022-08-24: rm rcppdeepstate yaml action
Next, to satisfy the CRAN requirement that packages must not be installed to the default library, we create a temporary library directory and put it first on the library path,
tmp.lib.path <- tempfile()
dir.create(tmp.lib.path)
lib.path.vec <- c(tmp.lib.path, .libPaths())
.libPaths(lib.path.vec)
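As a small illustration of the library path mechanism used above, the sketch below creates a throwaway library, checks that it comes first on the path, and then restores the previous setting. The names demo.lib, old.lib.paths and first.lib are made up for this example and are not part of atime.

```r
# Remember the current library search path so it can be restored.
old.lib.paths <- .libPaths()
# Create a throwaway library directory (demo.lib is a hypothetical name).
demo.lib <- tempfile()
dir.create(demo.lib)
# Put the new directory first; .libPaths() only keeps directories that exist.
.libPaths(c(demo.lib, old.lib.paths))
# The first element is the default install location for install.packages().
first.lib <- .libPaths()[1]
print(basename(first.lib) == basename(demo.lib))
# Restore the previous search path and delete the throwaway directory.
.libPaths(old.lib.paths)
unlink(demo.lib, recursive=TRUE)
```

Comparing with basename avoids false mismatches, because .libPaths() may return a normalized form of the path.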
Next, we define a helper function run.atime.versions that will run atime_versions, which is a simple way to compare different git versions of a function:
run.atime.versions <- function(PKG.PATH, LIB.PATH){
  if(!missing(LIB.PATH)).libPaths(LIB.PATH)
  atime::atime_versions(
    pkg.path=PKG.PATH,
    N=2^seq(2, 20),
    setup={
      max.segs <- as.integer(N/2)
      data.vec <- 1:N
    },
    expr=binsegRcpp::binseg_normal(data.vec, max.segs),
    cv="908b77c411bc7f4fcbcf53759245e738ae724c3e",
    "rm unord map"="dcd0808f52b0b9858352106cc7852e36d7f5b15d",
    "mvl_construct"="5942af606641428315b0e63c7da331c4cd44c091")
}
Here is an explanation of the arguments specified above:
pkg.path is the path to the git repository containing the R package,
N is a numeric vector of data sizes,
setup is an R expression which will be run to create data for each size N,
expr is an R expression which will be timed for each package version. Under the hood, a different R package is created for each package version, with package names like Package.SHA, for example binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e. This expr must contain a double or triple colon package name prefix, like binsegRcpp::binseg_normal above, which will be translated to several different version-specific expressions, like binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e::binseg_normal.
each remaining named argument (cv, "rm unord map", "mvl_construct") identifies one version to time: the name is the label which appears in the expr.name column of the results, and the value is the git commit SHA.
Note that in your own code you do not have to create a helper function like run.atime.versions above. We do it in the package vignette code in order to run the different versions of the code in a separate R process, using callr::r. This allows us to avoid CRAN warnings about unexpected files found in the package check directory, by safely removing the installed packages after having run the example code. For a more typical usage see example(atime_versions, package="atime").
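To make the prefix translation described above more concrete, here is a minimal base R sketch which rewrites the package name in double or triple colon calls. The function rename_pkg_prefix is hypothetical and written only for illustration; it is not the code that atime uses internally, and it does not handle calls with empty arguments such as x[,1].

```r
# Hypothetical helper (not part of atime): walk an unevaluated expression
# and replace old.pkg with new.pkg in every old.pkg::fun or old.pkg:::fun.
rename_pkg_prefix <- function(e, old.pkg, new.pkg){
  if(is.call(e)){
    # A colon call has the name `::` or `:::` as its first element.
    is.colon <- is.name(e[[1]]) &&
      as.character(e[[1]]) %in% c("::", ":::")
    if(is.colon && identical(as.character(e[[2]]), old.pkg)){
      e[[2]] <- as.name(new.pkg)
    }else{
      # Otherwise recurse into the function position and each argument.
      for(i in seq_along(e)){
        if(!is.null(e[[i]])){
          e[[i]] <- rename_pkg_prefix(e[[i]], old.pkg, new.pkg)
        }
      }
    }
  }
  e
}

orig.expr <- quote(binsegRcpp::binseg_normal(data.vec, max.segs))
sha <- "908b77c411bc7f4fcbcf53759245e738ae724c3e"
new.expr <- rename_pkg_prefix(
  orig.expr, "binsegRcpp", paste0("binsegRcpp.", sha))
print(new.expr)
```

The rewritten expression is still unevaluated, so nothing is looked up in the version-specific package until the expression is actually run.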
In the code block below we compute the timings,
atime.ver.list <- if(requireNamespace("callr")){
  requireNamespace("atime")
  callr::r(run.atime.versions, list(pkg.path, lib.path.vec))
}else{
  run.atime.versions(pkg.path)
}
#> Loading required namespace: callr
names(atime.ver.list$measurements)
#> [1] "N" "expr.name" "min" "median" "itr/sec" "gc/sec" "n_itr" "n_gc"
#> [9] "result" "memory" "time" "gc" "kilobytes" "q25" "q75" "max"
#> [17] "mean" "sd"
atime.ver.list$measurements[, .(N, expr.name, min, median, max, kilobytes)]
#> N expr.name min median max kilobytes
#> <num> <char> <num> <num> <num> <num>
#> 1: 4 cv 0.0002704 0.00028050 0.0003828 9287.15625
#> 2: 4 rm unord map 0.0011429 0.00121880 0.0013492 2676.03906
#> 3: 4 mvl_construct 0.0007904 0.00088055 0.0046422 278.95312
#> 4: 8 cv 0.0002772 0.00028570 0.0003860 18.74219
#> 5: 8 rm unord map 0.0011047 0.00117040 0.0056885 70.00781
#> 6: 8 mvl_construct 0.0007495 0.00078750 0.0010035 67.48438
#> 7: 16 cv 0.0002791 0.00028660 0.0003750 18.92188
#> 8: 16 rm unord map 0.0011254 0.00117590 0.0014362 70.18750
#> 9: 16 mvl_construct 0.0007927 0.00082335 0.0010313 67.66406
#> 10: 32 cv 0.0002924 0.00030190 0.0003840 19.64062
#> 11: 32 rm unord map 0.0011198 0.00114925 0.0038412 72.82031
#> 12: 32 mvl_construct 0.0008042 0.00080810 0.0010458 70.29688
#> 13: 64 cv 0.0002989 0.00030825 0.0003990 25.38281
#> 14: 64 rm unord map 0.0011732 0.00126680 0.0020850 88.72656
#> 15: 64 mvl_construct 0.0008978 0.00093855 0.0035149 86.20312
#> 16: 128 cv 0.0003013 0.00033235 0.0004542 43.35156
#> 17: 128 rm unord map 0.0011266 0.00114870 0.0013636 120.67969
#> 18: 128 mvl_construct 0.0010265 0.00109995 0.0012513 117.26562
#> 19: 256 cv 0.0003379 0.00036305 0.0004166 64.85156
#> 20: 256 rm unord map 0.0011712 0.00122605 0.0015247 165.67969
#> 21: 256 mvl_construct 0.0013660 0.00148645 0.0017355 161.51562
#> 22: 512 cv 0.0004085 0.00043350 0.0007886 107.85156
#> 23: 512 rm unord map 0.0013379 0.00138370 0.0015692 255.67969
#> 24: 512 mvl_construct 0.0020461 0.00213430 0.0023014 250.01562
#> 25: 1024 cv 0.0004806 0.00052625 0.0007722 193.85156
#> 26: 1024 rm unord map 0.0015276 0.00159040 0.0048540 435.67969
#> 27: 1024 mvl_construct 0.0034157 0.00346345 0.0041241 427.01562
#> 28: 2048 cv 0.0007000 0.00074945 0.0008686 365.85156
#> 29: 2048 rm unord map 0.0020499 0.00221235 0.0024546 795.67969
#> 30: 2048 mvl_construct 0.0064843 0.00654775 0.0068192 781.01562
#> 31: 4096 cv 0.0011759 0.00121870 0.0013393 709.85156
#> 32: 4096 rm unord map 0.0037065 0.00405860 0.0055938 1515.67969
#> 33: 4096 mvl_construct 0.0138652 0.01413085 0.0169543 1489.01562
#> 34: 8192 cv 0.0020957 0.00246470 0.0026336 1397.85156
#> 35: 8192 rm unord map 0.0064265 0.00695385 0.0131263 2955.67969
#> 36: 16384 cv 0.0047889 0.00500260 0.0108923 2773.85156
#> 37: 16384 rm unord map 0.0113848 0.01187885 0.0173260 5835.67969
#> 38: 32768 cv 0.0096927 0.00994195 0.0148679 5525.85156
#> 39: 65536 cv 0.0178206 0.02179605 0.0313384 11032.03125
#> N expr.name min median max kilobytes
The result is a list with a measurements data table that contains measurements of time in seconds (min, median, max) and memory usage (kilobytes) for every version (expr.name) and data size (N). A more convenient version of the data for plotting can be obtained via the code below:
best.ver.list <- atime::references_best(atime.ver.list)
names(best.ver.list$measurements)
#> [1] "unit" "N" "expr.name" "min" "median" "itr/sec" "gc/sec"
#> [8] "n_itr" "n_gc" "result" "memory" "time" "gc" "kilobytes"
#> [15] "q25" "q75" "max" "mean" "sd" "fun.name" "fun.latex"
#> [22] "expr.class" "expr.latex" "empirical"
best.ver.list$measurements[, .(N, expr.name, unit, empirical)]
#> N expr.name unit empirical
#> <num> <char> <char> <num>
#> 1: 4 cv kilobytes 9.287156e+03
#> 2: 8 cv kilobytes 1.874219e+01
#> 3: 16 cv kilobytes 1.892188e+01
#> 4: 32 cv kilobytes 1.964063e+01
#> 5: 64 cv kilobytes 2.538281e+01
#> 6: 128 cv kilobytes 4.335156e+01
#> 7: 256 cv kilobytes 6.485156e+01
#> 8: 512 cv kilobytes 1.078516e+02
#> 9: 1024 cv kilobytes 1.938516e+02
#> 10: 2048 cv kilobytes 3.658516e+02
#> 11: 4096 cv kilobytes 7.098516e+02
#> 12: 8192 cv kilobytes 1.397852e+03
#> 13: 16384 cv kilobytes 2.773852e+03
#> 14: 32768 cv kilobytes 5.525852e+03
#> 15: 65536 cv kilobytes 1.103203e+04
#> 16: 4 rm unord map kilobytes 2.676039e+03
#> 17: 8 rm unord map kilobytes 7.000781e+01
#> 18: 16 rm unord map kilobytes 7.018750e+01
#> 19: 32 rm unord map kilobytes 7.282031e+01
#> 20: 64 rm unord map kilobytes 8.872656e+01
#> 21: 128 rm unord map kilobytes 1.206797e+02
#> 22: 256 rm unord map kilobytes 1.656797e+02
#> 23: 512 rm unord map kilobytes 2.556797e+02
#> 24: 1024 rm unord map kilobytes 4.356797e+02
#> 25: 2048 rm unord map kilobytes 7.956797e+02
#> 26: 4096 rm unord map kilobytes 1.515680e+03
#> 27: 8192 rm unord map kilobytes 2.955680e+03
#> 28: 16384 rm unord map kilobytes 5.835680e+03
#> 29: 4 mvl_construct kilobytes 2.789531e+02
#> 30: 8 mvl_construct kilobytes 6.748437e+01
#> 31: 16 mvl_construct kilobytes 6.766406e+01
#> 32: 32 mvl_construct kilobytes 7.029687e+01
#> 33: 64 mvl_construct kilobytes 8.620313e+01
#> 34: 128 mvl_construct kilobytes 1.172656e+02
#> 35: 256 mvl_construct kilobytes 1.615156e+02
#> 36: 512 mvl_construct kilobytes 2.500156e+02
#> 37: 1024 mvl_construct kilobytes 4.270156e+02
#> 38: 2048 mvl_construct kilobytes 7.810156e+02
#> 39: 4096 mvl_construct kilobytes 1.489016e+03
#> 40: 4 cv seconds 2.805000e-04
#> 41: 8 cv seconds 2.857000e-04
#> 42: 16 cv seconds 2.866000e-04
#> 43: 32 cv seconds 3.019000e-04
#> 44: 64 cv seconds 3.082500e-04
#> 45: 128 cv seconds 3.323500e-04
#> 46: 256 cv seconds 3.630500e-04
#> 47: 512 cv seconds 4.335000e-04
#> 48: 1024 cv seconds 5.262500e-04
#> 49: 2048 cv seconds 7.494500e-04
#> 50: 4096 cv seconds 1.218700e-03
#> 51: 8192 cv seconds 2.464700e-03
#> 52: 16384 cv seconds 5.002600e-03
#> 53: 32768 cv seconds 9.941950e-03
#> 54: 65536 cv seconds 2.179605e-02
#> 55: 4 rm unord map seconds 1.218800e-03
#> 56: 8 rm unord map seconds 1.170400e-03
#> 57: 16 rm unord map seconds 1.175900e-03
#> 58: 32 rm unord map seconds 1.149250e-03
#> 59: 64 rm unord map seconds 1.266800e-03
#> 60: 128 rm unord map seconds 1.148700e-03
#> 61: 256 rm unord map seconds 1.226050e-03
#> 62: 512 rm unord map seconds 1.383700e-03
#> 63: 1024 rm unord map seconds 1.590400e-03
#> 64: 2048 rm unord map seconds 2.212350e-03
#> 65: 4096 rm unord map seconds 4.058600e-03
#> 66: 8192 rm unord map seconds 6.953850e-03
#> 67: 16384 rm unord map seconds 1.187885e-02
#> 68: 4 mvl_construct seconds 8.805500e-04
#> 69: 8 mvl_construct seconds 7.875000e-04
#> 70: 16 mvl_construct seconds 8.233500e-04
#> 71: 32 mvl_construct seconds 8.081000e-04
#> 72: 64 mvl_construct seconds 9.385500e-04
#> 73: 128 mvl_construct seconds 1.099950e-03
#> 74: 256 mvl_construct seconds 1.486450e-03
#> 75: 512 mvl_construct seconds 2.134300e-03
#> 76: 1024 mvl_construct seconds 3.463450e-03
#> 77: 2048 mvl_construct seconds 6.547750e-03
#> 78: 4096 mvl_construct seconds 1.413085e-02
#> N expr.name unit empirical
The data table above is a tall/long version of the same data, which can be plotted using the code below:
if(require(ggplot2)){
  # Horizontal reference line at the time limit, shown in the seconds panel only.
  hline.df <- with(atime.ver.list, data.frame(seconds.limit, unit="seconds"))
  gg <- ggplot()+
    theme_bw()+
    facet_grid(unit ~ ., scales="free")+
    geom_hline(aes(
      yintercept=seconds.limit),
      color="grey",
      data=hline.df)+
    geom_line(aes(
      N, empirical, color=expr.name),
      data=best.ver.list$measurements)+
    geom_ribbon(aes(
      N, ymin=min, ymax=max, fill=expr.name),
      data=best.ver.list$measurements[unit=="seconds"],
      alpha=0.5)+
    scale_x_log10()+
    scale_y_log10("median line, min/max band")
  if(require(directlabels)){
    # Label each curve directly instead of using a legend.
    gg+
      directlabels::geom_dl(aes(
        N, empirical, color=expr.name, label=expr.name),
        method="right.polygons",
        data=best.ver.list$measurements)+
      theme(legend.position="none")+
      coord_cartesian(xlim=c(1,2e7))
  }else{
    gg
  }
}
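On log-log axes like the ones used in the plot code above, the slope of a timing curve indicates how the time grows with N. As a rough worked example, the sketch below computes that slope between the two largest data sizes of each version, using median times copied by hand from the first measurements table printed earlier. The object names last.two and slope.by.version are made up for this example, and a slope from only two noisy timings should be read as a coarse indication, not as a result.

```r
# Median seconds at the two largest N of each version,
# copied from the measurements table printed earlier in this vignette.
last.two <- data.frame(
  expr.name=rep(c("cv", "rm unord map", "mvl_construct"), each=2),
  N=c(32768, 65536, 8192, 16384, 2048, 4096),
  median=c(
    0.00994195, 0.02179605,
    0.00695385, 0.01187885,
    0.00654775, 0.01413085))
# Slope of log(median) versus log(N) between the two sizes:
# a value near 1 suggests roughly linear growth, near 2 quadratic growth.
slope.by.version <- sapply(
  split(last.two, last.two$expr.name),
  function(d)diff(log(d$median))/diff(log(d$N)))
print(round(slope.by.version, 2))
```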
atime_versions_exprs with atime
What if you want to compare different versions of one R package with another R package? Continuing the example above, we can get a list of expressions, one for each version of the package, via the code below:
(ver.list <- atime::atime_versions_exprs(
  pkg.path=pkg.path,
  expr=binsegRcpp::binseg_normal(data.vec, max.segs),
  cv="908b77c411bc7f4fcbcf53759245e738ae724c3e",
  "rm unord map"="dcd0808f52b0b9858352106cc7852e36d7f5b15d",
  "mvl_construct"="5942af606641428315b0e63c7da331c4cd44c091"))
#> $cv
#> binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e::binseg_normal(data.vec,
#> max.segs)
#>
#> $`rm unord map`
#> binsegRcpp.dcd0808f52b0b9858352106cc7852e36d7f5b15d::binseg_normal(data.vec,
#> max.segs)
#>
#> $mvl_construct
#> binsegRcpp.5942af606641428315b0e63c7da331c4cd44c091::binseg_normal(data.vec,
#> max.segs)
The ver.list created above can be augmented with other expressions, such as the following alternative implementation of binary segmentation from the changepoint package,
expr.list <- c(ver.list, if(requireNamespace("changepoint")){
  list(changepoint=substitute(changepoint::cpt.mean(
    data.vec, penalty="Manual", pen.value=0, method="BinSeg",
    Q=max.segs-1)))
})
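The code above relies on two base R behaviors: an expression can be captured without being evaluated, and c() ignores the NULL which if() returns when its condition is false and there is no else branch. The sketch below illustrates both with toy expressions. It uses quote(), which at top level captures code in the same way as the substitute() call above; the names base.list, with.extra and without.extra are made up for this example.

```r
# A named list holding one unevaluated expression.
base.list <- list(first=quote(sum(data.vec)))
# When the condition is TRUE, the extra expression is appended.
with.extra <- c(base.list, if(TRUE)list(second=quote(mean(data.vec))))
# When it is FALSE, if() returns NULL and c() leaves the list unchanged.
without.extra <- c(base.list, if(FALSE)list(second=quote(mean(data.vec))))
print(names(with.extra))
print(names(without.extra))
# Captured expressions are only evaluated on request,
# with whatever data are supplied at that time.
first.value <- eval(with.extra$first, list(data.vec=1:4))
second.value <- eval(with.extra$second, list(data.vec=1:4))
print(c(first.value, second.value))
```

This is why expr.list still works when the changepoint package is not installed: it then contains only the expressions from ver.list.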
The expr.list created above can be provided as an argument to the atime function as in the code below,
run.atime <- function(ELIST, LIB.PATH){
  if(!missing(LIB.PATH)).libPaths(LIB.PATH)
  atime::atime(
    N=2^seq(2, 20),
    setup={
      max.segs <- as.integer(N/2)
      data.vec <- 1:N
    },
    expr.list=ELIST)
}
atime.list <- if(requireNamespace("callr")){
  requireNamespace("atime")
  callr::r(run.atime, list(expr.list, lib.path.vec))
}else{
  run.atime(expr.list)
}
Again, note that in the code above we defined a helper function, run.atime, and used callr::r, to avoid CRAN issues. For a more typical usage see example(atime_versions_exprs, package="atime").
atime.list$measurements[, .(N, expr.name, median, kilobytes)]
#> N expr.name median kilobytes
#> <num> <char> <num> <num>
#> 1: 4 cv 0.00027335 9319.429688
#> 2: 4 rm unord map 0.00113670 2783.750000
#> 3: 4 mvl_construct 0.00080920 278.765625
#> 4: 4 changepoint 0.00038040 6463.679688
#> 5: 8 cv 0.00053845 18.742188
#> 6: 8 rm unord map 0.00116285 70.007812
#> 7: 8 mvl_construct 0.00088920 67.484375
#> 8: 8 changepoint 0.00044910 33.375000
#> 9: 16 cv 0.00031435 18.921875
#> 10: 16 rm unord map 0.00109740 70.187500
#> 11: 16 mvl_construct 0.00079900 67.664062
#> 12: 16 changepoint 0.00054340 3.523438
#> 13: 32 cv 0.00033250 19.640625
#> 14: 32 rm unord map 0.00120460 72.820312
#> 15: 32 mvl_construct 0.00091615 70.296875
#> 16: 32 changepoint 0.00085755 10.750000
#> 17: 64 cv 0.00029965 25.382812
#> 18: 64 rm unord map 0.00193415 88.726562
#> 19: 64 mvl_construct 0.00091755 86.203125
#> 20: 64 changepoint 0.00074250 52.734375
#> 21: 128 cv 0.00058440 43.351562
#> 22: 128 rm unord map 0.00127300 120.679688
#> 23: 128 mvl_construct 0.00122245 117.265625
#> 24: 128 changepoint 0.00113980 195.000000
#> 25: 256 cv 0.00034975 64.851562
#> 26: 256 rm unord map 0.00122055 165.679688
#> 27: 256 mvl_construct 0.00141035 161.515625
#> 28: 256 changepoint 0.00218125 704.359375
#> 29: 512 cv 0.00040850 107.851562
#> 30: 512 rm unord map 0.00135290 255.679688
#> 31: 512 mvl_construct 0.00225600 250.015625
#> 32: 512 changepoint 0.00668260 2633.656250
#> 33: 1024 cv 0.00051640 193.851562
#> 34: 1024 rm unord map 0.00164260 435.679688
#> 35: 1024 mvl_construct 0.00364840 429.195312
#> 36: 1024 changepoint 0.03029920 10144.984375
#> 37: 2048 cv 0.00070935 365.851562
#> 38: 2048 rm unord map 0.00211185 795.679688
#> 39: 2048 mvl_construct 0.00668765 781.015625
#> 40: 4096 cv 0.00134405 709.851562
#> 41: 4096 rm unord map 0.00402655 1515.679688
#> 42: 4096 mvl_construct 0.01358965 1489.015625
#> 43: 8192 cv 0.00266665 1397.851562
#> 44: 8192 rm unord map 0.00686065 2955.679688
#> 45: 16384 cv 0.00522300 2773.851562
#> 46: 16384 rm unord map 0.01224850 5835.679688
#> 47: 32768 cv 0.01022960 5533.359375
#> N expr.name median kilobytes
The results above show that timings were computed for the three different versions of the binsegRcpp code, along with the changepoint code. These data can be plotted via the default method as in the code below,
refs.best <- atime::references_best(atime.list)
plot(refs.best)
Below we remove the installed packages, in order to avoid CRAN warnings:
atime::atime_versions_remove("binsegRcpp")
#> [1] 0
options(old.opt)
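The value 0 printed by the removal step above looks like the status code of a file deletion. As a rough sketch of that kind of clean-up, the code below deletes directories named like Package.SHA from a stand-in library directory using base R. This is an assumption about the general approach, not atime's actual implementation, and the names fake.lib, fake.pkgs, status and remaining are made up for this example.

```r
# Stand-in for a library directory, so that nothing real is deleted.
fake.lib <- tempfile()
dir.create(fake.lib)
# Two version-specific package directories and one unrelated directory.
fake.pkgs <- file.path(fake.lib, c(
  "binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e",
  "binsegRcpp.dcd0808f52b0b9858352106cc7852e36d7f5b15d",
  "otherPkg"))
for(pkg.dir in fake.pkgs)dir.create(pkg.dir)
# unlink() expands the wildcard and returns 0 on success, 1 on failure.
status <- unlink(file.path(fake.lib, "binsegRcpp.*"), recursive=TRUE)
remaining <- dir(fake.lib)
print(status)
print(remaining)
# Remove the stand-in library itself.
unlink(fake.lib, recursive=TRUE)
```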