- case_sub_criteria and recurrence_sub_criteria in episodes() led to incorrect results. Resolved.
- merge_ids() - shrink and expand.
- plot.
- format.
- true(). Predefined logical test for use with sub_criteria().
- false(). Predefined logical test for use with sub_criteria().
- links() - batched. Specify if all record pairs are created or compared at once ("no") or in batches ("yes").
- links() - repeats_allowed. Specify if record-pairs with duplicate elements should be created.
- links() - permutations_allowed. Specify if permutations of the same record-pair should be created.
- links() - ignore_same_source. Specify if record-pairs from different datasets should be created.
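The pairing options above can be pictured with a small base R sketch. pair_sketch() is a hypothetical helper written for illustration only, not part of diyar. It assumes that a pair with "duplicate elements" is a record paired with itself, and that a permutation is the same pair in reverse order, e.g. (2, 1) for (1, 2).

```r
# Hypothetical helper for illustration only; this is not diyar's make_pairs().
# It enumerates record-pairs (by position) for n records.
pair_sketch <- function(n, repeats_allowed = FALSE, permutations_allowed = FALSE) {
  # Start from every ordered (x, y) combination of record positions
  pairs <- expand.grid(x = seq_len(n), y = seq_len(n))
  if (!repeats_allowed) {
    # Drop pairs whose two elements are the same record, e.g. (2, 2)
    pairs <- pairs[pairs$x != pairs$y, ]
  }
  if (!permutations_allowed) {
    # Keep one orientation per pair, e.g. (1, 2) but not (2, 1)
    pairs <- pairs[pairs$x <= pairs$y, ]
  }
  rownames(pairs) <- NULL
  pairs
}

pair_sketch(3)              # 3 pairs: (1, 2), (1, 3) and (2, 3)
pair_sketch(3, TRUE, TRUE)  # all 9 ordered pairs, including (1, 1)
```

With both options off, n records give n(n - 1)/2 pairs, so the count grows quadratically with the size of the dataset.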
- eval_sub_criteria() - depth. First order of recursion.
- sets() and make_sets(). Create permutations of record-sets.
- links() - When shrink is TRUE, records in a record-group must meet every listed match criterion and sub_criteria. For example, if pid_cri is 3, then the record must have matched another record on the first three match criteria.
- links() - pid@iteration now tracks when a record was dealt with instead of when it was assigned to a record-group. For example, a record can be closed (matched or not matched) at iteration 1 but assigned to a record-group at iteration 5.
- make_pairs() - x.* and y.* values in the output are now swapped.
- sub_criteria can now export any data created by match_func. To do this, match_func must export a list, where the first element is a logical object. See an example below.

library(diyar)
val <- rep(month.abb[1:5], 2); val
#> [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
# match_func returns a list: the logical test first, then any data to export
match_and_export <- function(x, y){
  output <- list(x == y,
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val,
  match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#> [1] 1 0 0 0 0 1 0 0 0 0
#>
#> $mf.0.1
#> x_val y_val is_match
#> 1 Jan Jan TRUE
#> 2 Feb Jan FALSE
#> 3 Mar Jan FALSE
#> 4 Apr Jan FALSE
#> 5 May Jan FALSE
#> 6 Jan Jan TRUE
#> 7 Feb Jan FALSE
#> 8 Mar Jan FALSE
#> 9 Apr Jan FALSE
#> 10 May Jan FALSE
- links() can now export any data created within a sub_criteria. To do this, the sub_criteria must be created as described above. See an example below.

val <- 1:5
# Match when x is at most 1 greater than y, and export the working data
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match,
                 data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val,
  match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder",
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#>
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#> x_val y_val diff is_match
#> 1 5 1 4 FALSE
#> 2 4 1 3 FALSE
#> 3 3 1 2 FALSE
#> 4 2 1 1 TRUE
#> 5 1 1 0 TRUE
#>
#>
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#> x_val y_val diff is_match
#> 1 5 3 2 FALSE
#> 2 4 3 1 TRUE
#> 3 3 3 0 TRUE
#>
#>
#> $export$cri.1$iteration.3
#> $export$cri.1$iteration.3$mf.0.1
#> x_val y_val diff is_match
#> 1 5 5 0 TRUE
- summary.epid() - Incorrect count for ‘by episode type’. Resolved.
- episodes() - Incorrect results in some instances with skip_order. Resolved.
- make_ids() - Did not capture all records that should be in a record-group when matches are recursive. Resolved.
- make_pairs() - Incorrect record-pairs in some instances. Resolved.
- eval_sub_criteria() - When the output of match_func was of length one, it was not recycled. Resolved.
- reverse_number_line() - Incorrect results in some instances. Resolved.
- links() - Incorrect iteration (pids slot) for non-matches. Resolved.
- links() and episodes() - Timing for each iteration was incorrect. Resolved.
- overlap_method_names(). Overlap methods for the corresponding overlap method codes.
- *with_report options for display.
- "chain" overlap method split into "x_chain_y" and "y_chain_x". "chain" will continue to be supported as a keyword for "x_chain_y" OR "y_chain_x" methods.
- "across" overlap method split into "x_across_y" and "y_across_x". "across" will continue to be supported as a keyword for "x_across_y" OR "y_across_x" methods.
- "inbetween" overlap method split into "x_inbetween_y" and "y_inbetween_x". "inbetween" will continue to be supported as a keyword for "x_inbetween_y" OR "y_inbetween_x" methods.
- overlaps().
- overlap_method_names().
- make_batch_pairs() (internal) created invalid record pairs. Resolved.
- reframe(). Modify the attributes of a sub_criteria object.
- link_records(). Record linkage by creating all record pairs as opposed to batches as with links().
- make_pairs(). Create every combination of record-pairs for a given dataset.
- make_pairs_wf_source(). Create record-pairs from different sources only.
- make_ids(). Convert an edge list to a group identifier.
- merge_ids(). Merge two group identifiers.
- attrs(). Pass a set of attributes to one instance of match_funcs or equal_funcs.
- episodes_wf_splits(), episodes() and links(). Reduced processing times.
- display argument. "progress_with_report", "stats_with_report" and "none_with_report" create a d_report, a status report of the analysis over its run time.
- eval_sub_criteria(). Record-pairs are no longer created in the function. Therefore, the index_record and sn arguments have been replaced with x_pos and y_pos.
- link_records() and links_wf_probabilistic(). The cmp_threshold argument has been renamed to attr_threshold.
- show_labels argument in schema(). Two new options, "wind_nm" and "length", replace "length_label".
- wind_id list in episodes(..., data_link = "XX"). Resolved.
- link_id in links(..., recursive = TRUE). Resolved.
- iteration not recorded in some situations with episodes(). Resolved.
- skip_order ends an open episode. Resolved.
- NA in dist_wind_index and dist_epid_index when sn is supplied. Resolved.
- overlap_method_codes() - overlap method codes were not recycled properly. Resolved.
- delink(). Unlink identifiers.
- episodes_wf_splits(). Wrapper function of episodes(). Better optimised for handling datasets with many duplicate records.
- combi(). Numeric codes for unique combinations of vectors.
- attr_eval(). Recursive evaluation of a function on each attribute of a sub_criteria.
- case_nm values - Case_CR and Recurrence_CR, which are Case and Recurrence without a sub-criteria match.
- schema.epid.
- eval_sub_criteria with 1 result.
- links_wf_probabilistic(). Probabilistic record linkage.
- partitions(). Split events into sections in time.
- schema(). Plot schema diagrams for pid, epid, pane and number_line objects.
- encode() and decode(). Encode and decode slot values to minimise memory usage.
- episodes() - case_sub_criteria and recurrence_sub_criteria. Additional matching conditions for temporal links.
- episodes() - case_length_total and recurrence_length_total. Number of temporal links required for a window/episode.
- links() - recursive. Control if matches can spawn new matches.
- links() - check_duplicates. Control the checking of logical tests on duplicate values. If FALSE, results are recycled for the duplicates.
- as.data.frame and as.list S3 methods for the pid, number_line, epid and pane objects.
- episode_type in episodes() - “recursive”. For recursive episodes where every linked event can be used as a subsequent index event.
- recurrence_from_last renamed to reference_event and given two new options.
- episodes() and links(). Speed improvements.
- epid_interval or pane_interval with POSIXct objects is now “GMT”.
- number_line_sequence() - splits number_line objects. Also available as a seq method.
- epid_total, pid_total and pane_total slots are populated by default. No need to use group_stats to get these.
- to_df() - Removed. Use as.data.frame() instead.
- to_s4() - Now an internal function. It’s no longer exported.
- compress_number_line() - Now an internal function. It’s no longer exported. Use episodes() instead.
- sub_criteria() - produces a sub_criteria object. Nested “AND” and “OR” conditions are now possible.
- case_overlap_methods, recurrence_overlap_methods and overlap_methods now take integer codes for different combinations of overlap methods. See overlap_methods$options for the full list. Character inputs are still supported.
- "Single-record" was wrong in links summary output. Resolved.
- Inf in number_line objects.
- case_length or recurrence_length for the same event.
- overlap_methods for the corresponding case_length and recurrence_length.
- links() to replace record_group().
- sub_criteria(). The new way of supplying a sub_criteria in links().
- exact_match(), range_match() and range_match_legacy(). Predefined logical tests for use with sub_criteria(). User-defined tests can also be used. See ?sub_criteria.
- custom_sort() for nested sorting.
- epid_lengths() to show the required case_length or recurrence_length for an analysis. Useful in confirming the required case_length or recurrence_length for episode tracking.
- epid_windows(). Shows the period a date will overlap with given a particular case_length or recurrence_length. Useful in confirming the required case_length or recurrence_length for episode tracking.
- strata in links(). Useful for stratified data linkage. As in stratified episode tracking, a record with a missing strata (NA_character_) is skipped during data linkage.
- data_links in links(). Unlink record groups that do not include records from certain data sources.
- listr(). Format atomic vectors as a written list.
- combns(). An extension of combn to generate permutations not ordinarily captured by combn.
- iteration slot for pid and epid objects.
- overlap_method - reverse()
- number_line() - l and r must have the same length or be of length 1.
- episodes() - case_nm differentiates between duplicates of "Case" ("Duplicate_C") and "Recurrent" events ("Duplicate_R").
- episodes(). "Case").
- episode_type - simultaneously track both "fixed" and "rolling" episodes.
- skip_if_b4_lengths - simultaneously track episodes where events before a cut-off range are both skipped and not skipped.
- episode_unit - simultaneously track episodes by different units of time.
- case_for_recurrence - simultaneously track "rolling" episodes with and without an additional case window for recurrent events.
- recurrence_from_last - simultaneously track "rolling" episodes with reference windows calculated from the first and last event of the previous window.
- strata. Options must be the same in each strata.
- from_last - simultaneously track episodes in both directions of time - past to present and present to past.
- episodes_max - simultaneously track different numbers of episodes within the dataset.
- include_overlap_method - "overlap" and "none" will not be combined with other methods.
- "overlap" - mutually inclusive with the other methods, so their inclusion is not necessary.
- "none" - mutually exclusive and prioritised over the other methods (including "overlap"), so their inclusion is not necessary.
- NA_real_) or periods (number_line(NA_real_, NA_real_)) case_length and recurrence_length. This ensures that the event does not become an index case; however, it can still be part of a different episode. For reference, an event with a missing strata (NA_character_) can neither become an index case nor be part of any episode.
- fixed_episodes, rolling_episodes and episode_group - include_index_period didn’t work in certain situations. Corrected.
- fixed_episodes, rolling_episodes and episode_group - dist_from_wind was wrong in certain situations. Corrected.
- record_group() - strata. Perform record linkage separately within subsets of a dataset.
- overlap(), compress_number_line(), fixed_episodes(), rolling_episodes() and episode_group() - overlap_methods and methods. These replace overlap_method and method respectively. Use different sets of methods within the same dataset when grouping episodes or collapsing number_line objects. overlap_method and method only permit 1 method per dataset.
- epid objects - win_nm. Shows the type of window each event belongs to, i.e. case or recurrence window.
- epid objects - win_id. Unique ID for each window. The ID is the sn of the reference event for each window.
- epid objects updated to reflect this.
- epid objects - dist_from_wind. Shows the duration of each event from its window’s reference event.
- epid objects - dist_from_epid. Shows the duration of each event from its episode’s reference event.
- episode_group() and rolling_episodes() - recurrence_from_last. Determine if reference events should be the first or last event from the previous window.
- episode_group() and rolling_episodes() - case_for_recurrence. Determine if recurrent events should have their own case windows or not.
- episode_group(), fixed_episodes() and rolling_episodes() - data_links. Unlink episodes that do not include records from certain data_source(s).
- episode_group(), fixed_episodes() and rolling_episodes() - case_length and recurrence_length arguments. You can now use a range (number_line object).
- episode_group(), fixed_episodes() and rolling_episodes() - include_index_period. If TRUE, overlaps with the index event or period are grouped together even if they are outside the cut-off range (case_length or recurrence_length).
- pid objects - link_id. Shows the record (sn slot) to which every record in the dataset has matched.
- invert_number_line(). Invert the left and/or right points to the opposite end of the number line.
- left_point(x)<-, right_point(x)<-, start_point(x)<- and end_point(x)<-
- overlap() renamed to overlaps(). overlap() is now a convenience overlap_method to capture ANY kind of overlap.
- "none" is another convenience overlap_method for NO kind of overlap.
- expand_number_line() - new options for point; "left" and "right".
- compress_number_line() - the compressed number_line object inherits the direction of the widest number_line among an overlapping group of number_line objects.
- overlap_methods have been changed so that each pair of number_line objects can only overlap in one way. E.g. "chain" and "aligns_end" used to be possible, but this is now considered a "chain" overlap only.
- "aligns_start" and "aligns_end" used to be possible, but this is now considered an "exact" overlap.
- number_line_sequence() - Output is now a list.
- number_line_sequence() - now works across multiple number_line objects.
- to_df() - can now change number_line objects to data.frames. to_s4() can do the reverse.
- epid objects are the default outputs for fixed_episodes(), rolling_episodes() and episode_group().
- pid objects are the default outputs for record_group().
- case_nm for events that were skipped due to rolls_max or episodes_max is now "Skipped".
- episode_group() and record_group() - sn can be negative numbers but must still be unique.
- episode_group() and record_group(). Runs just a little bit faster …
- x and y to have the same lengths in overlap functions.
- episode_group - case_length and recurrence_length arguments. Now accept negative numbers.
- end_point() of the first period.
- number_line_width(), both will be collapsed if the second one is within some days (or any other episode_unit) before the start_point() of the first period.
- case_nm wasn’t right for rolling episodes. Resolved.
- episode_group(), fixed_episodes() and rolling_episodes() - optimized to take less time when working with large datasets.
- episode_group(), fixed_episodes() and rolling_episodes() - date argument now supports numeric values.
- compress_number_line() - the output (gid slot) is now a group identifier just like in epid objects (epid_interval).
- pid S4 object class for results of record_group(). This will replace the current default (data.frame) in the next major release.
- epid S4 object class for results of episode_group(), fixed_episodes() and rolling_episodes(). This will replace the current default (data.frame) in the next release.
- to_s4() and to_s4 argument in record_group(), episode_group(), fixed_episodes() and rolling_episodes(). Changes their output from a data.frame (current default) to epid or pid objects.
- to_df() changes epid or pid objects to a data.frame.
- deduplicate argument from fixed_episodes() and rolling_episodes() added to episode_group().
- fixed_episodes() and rolling_episodes() are now wrapper functions of episode_group(). Functionality remains the same but now includes all arguments available to episode_group().
- fixed_episodes() and rolling_episodes() from number_line to data.frame, pending the change to epid objects.
- pid_cri column returned in record_group is now numeric. 0 indicates no match.
- criteria multiple times record_group()
- number_line objects can now be used as criteria in record_group().
- episode_unit in episode_group()
- bi_direction in episode_group()
- fixed_episodes() and rolling_episodes() - Group records into fixed or rolling episodes of events or periods of events.
- episode_group() - A more comprehensive implementation of fixed_episodes() and rolling_episodes(), with additional features such as user-defined case assignment.
- record_group() - Multistage deterministic linkage that addresses missing data.
- number_line S4 object.
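To make "multistage deterministic linkage that addresses missing data" more concrete, here is a simplified base R sketch. stage_link_sketch() is hypothetical and is not record_group()'s algorithm. It assumes criteria are applied in priority order, that a missing value (NA) only skips a record at that stage so a later criterion can still link it, and, as a simplification, that a record linked at an earlier stage is not revisited. Following the pid_cri convention noted above, 0 marks a record with no match.

```r
# Hypothetical, simplified sketch; not the record_group() algorithm.
# criteria: list of equal-length vectors, in priority order (stage 1 first).
stage_link_sketch <- function(criteria) {
  n <- length(criteria[[1]])
  pid <- rep(NA_integer_, n)   # group id; NA = not linked yet
  pid_cri <- rep(0L, n)        # stage that linked the record; 0 = no match
  for (k in seq_along(criteria)) {
    cri <- criteria[[k]]
    # Only unlinked records with a non-missing value take part in this stage,
    # so a missing value defers the record to a later criterion
    open <- is.na(pid) & !is.na(cri)
    for (v in unique(cri[open])) {
      members <- which(open & cri == v)
      if (length(members) > 1) {
        pid[members] <- members[1]  # group id = position of the first member
        pid_cri[members] <- k
      }
    }
  }
  # Records never matched keep their own position as a singleton group
  unmatched <- is.na(pid)
  pid[unmatched] <- which(unmatched)
  data.frame(pid = pid, pid_cri = pid_cri)
}

stage_link_sketch(list(
  id   = c("A", NA, "A", NA, "C"),   # stage 1: records 1 and 3 link
  name = c("x", "y", "z", "y", "w")  # stage 2: records 2 and 4 link
))
```

Records 2 and 4 have no value for the first criterion, yet they are still linked at the second stage; record 5 matches nothing and keeps pid_cri 0.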
- record_group()
- fixed_episodes(), rolling_episodes() and episode_group()
- fixed_episodes() and rolling_episodes()
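The difference between the fixed and rolling episodes produced by fixed_episodes() and rolling_episodes() can be sketched in a few lines of base R. episode_sketch() is a hypothetical simplification, not the diyar implementation. It assumes a single window length (no separate case and recurrence lengths), no missing dates, and numbers episodes 1, 2, 3, ... instead of using the index event's sn. In a fixed episode the window is measured from the index event only; in a rolling episode it is re-measured from each newly linked event.

```r
# Hypothetical simplification; not diyar's episodes()/fixed_episodes().
# date: numeric or Date values without NAs; case_length: window length.
episode_sketch <- function(date, case_length, rolling = FALSE) {
  epid <- integer(length(date))
  ref <- NA          # reference point the current window is measured from
  current <- 0L
  for (i in order(date)) {
    if (is.na(ref) || date[i] > ref + case_length) {
      # Outside the window: this event starts a new episode as its index event
      current <- current + 1L
      ref <- date[i]
    } else if (rolling) {
      # Rolling: re-measure the window from the latest linked event
      ref <- date[i]
    }
    epid[i] <- current
  }
  epid
}

episode_sketch(c(1, 8, 15, 30), 10)                  # 1 1 2 3
episode_sketch(c(1, 8, 15, 30), 10, rolling = TRUE)  # 1 1 1 2
```

With a 10-unit window, the event at 15 falls outside the fixed window that starts at 1, but inside the rolling window re-measured from 8.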