library(dataset)
In this case, we will create the dataset from a file. We add the
mandatory Title
attribute to the dataset, and a mandatory
data frame identifier dataset_id
. Because the
iris
dataset does not have a clear identifier, the
dataset()
constructor will use the row names as unique
observation identifiers. We do not define a dimension
column, and apart from the Species
attribute, we define all
variables as measurements
.
In a tidy dataset, we only apply one unit of measure. In this case
this is millimeters. We add this as an optional parameter to the
dataset
.
The .f
parameter names the function that will read in
the dataset. Any function can be used, but it must be installed on the
user’s computer. The ...
ellipsis must contain all
parameters that are necessary to make the call to the file reading
function successful. In this case, we will use a singe parameter, the
path to the file.
<- file.path(tempdir(), "iris.csv")
temp_path write.csv(iris, file = temp_path, row.names = F)
file.exists(temp_path)
<- read_dataset(
iris_ds dataset_id = "iris_dataset",
obs_id = NULL,
dimensions = NULL,
measures = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
attributes = "Species",
Title = "Iris Dataset",
unit = list(code = "MM", label = "millimeter"),
.f = "utils::read.csv", file = temp_path )
That attributes of the dataset contain the creation date and the function call that created the dataset. The related items contain the bibliographic reference to the creator function. Because we used a local file, the entire path is recorded for reproducibility. A similar function will be defined for internet URLs.
print(attributes(iris_ds))
Let’s apply a very simple function (method) on this dataset, the summary function (in this case, the summary method of the base R data.frame class.)
<- use_function(dataset = iris_ds, .f = "summary") iris_summary
The new dataset, iris_summary
shows that it was created
by a read.csv
call and them the summary
method
created its current form and contents.
::kable(attr(iris_summary, "Date")) knitr
The dataset
class maintains the entire processing
history and all the bibliographic references of the functions that
contributed to its creation. [There version will be recorded, too.] This
creates the full reproducibility of the dataset
class.