Editing II: Using TidyRez
edit_tidyRez.Rmd
Getting started
This tutorial is about TidyRez, the subset of functions derived from
dplyr
in Tidyverse. We will use the file saved at the end
of vignette("edit_easyEdit")
. As always, you don’t have to
have read that tutorial beforehand, though it may be helpful if you are
new to rezonateR.
library(rezonateR)
path = system.file("extdata", "rez007_edit2.Rdata", package = "rezonateR", mustWork = T)
rez007 = rez_load(path)
#> Loading rezrObj ...
Unlike most other tutorials, this one will be very brief, because it
assumes that you have knowledge of dplyr
, the R package
containing functions like dplyr::mutate()
and
dplyr::select()
. If you are not familiar with
dplyr
beforehand, I suggest using a tutorial like this
first, and then coming back to this page.
Why TidyRez?
In general, TidyRez functions are called by adding rez_
in front of a dplyr
function name, such a
rez_group_by()
or rez_mutate()
. You might
wonder why you’d want to use TidyRez instead of plain
dplyr
. The main reason is that TidyRez functions allow you
to keep and/or update your field access values, inNodeMap
values, and updateFunctions
. Using base R or classic
dplyr
functions with rezrDF
s will result in
reload()
fails (unless you add those attributes back
yourself).
Thus, TidyRez functions that add or change columns, such as
rez_mutate()
or rez_left_join()
. will give you
the option to change the field access value of that field through the
fieldaccess
field. updateFunction
s are
automatically added if you choose auto
or
foreign
. Other TidyRez functions do not differ
substantially from classic dplyr
; they only allow you to
keep your field access labels, updateFunction
s, and
inNodeMap
values.
To see the power of TidyRez, let’s try creating an emancipated
rezrDF
with only a subset of the original columns with
rez_select()
. Here, we take trackDF$refexpr
,
the table of referential expressions. We then damage one of the
auto
fields using a classic dplyr
function. As
you can see here, the emancipated rezrDF
can still be
updated using the rezrObj
, effectively overriding the
damage:
refTable = rez007$trackDF$default %>% rez_select(id, token, chain, name, text, tokenOrderLast)
print("Before:")
#> [1] "Before:"
head(refTable %>% select(id, tokenOrderLast))
#> # A tibble: 6 × 2
#> id tokenOrderLast
#> <chr> <dbl>
#> 1 1096E4AFFFE65 1
#> 2 92F20ACA5F06 3
#> 3 7E5BB65072C 8
#> 4 1F74D2B049FA4 9
#> 5 2485C4F740FC0 2
#> 6 1BF2260B4AB78 5
refTable = refTable %>% mutate(tokenSeqLast = 1) #Damage refTable with a classic dplyr function
print("After:")
#> [1] "After:"
refTable = refTable %>% reload(rez007)
head(refTable %>% select(id, tokenOrderLast))
#> # A tibble: 6 × 2
#> id tokenOrderLast
#> <chr> <dbl>
#> 1 1096E4AFFFE65 1
#> 2 92F20ACA5F06 3
#> 3 7E5BB65072C 8
#> 4 1F74D2B049FA4 9
#> 5 2485C4F740FC0 2
#> 6 1BF2260B4AB78 5
A warning is in order: TidyRez only updates the current
table. If other tables have references to the table you’re editing,
they will not be updated. You must bear this in mind when using
rez_select()
and rez_rename()
. No problems
will arise if you use these functions on emancipated rezrDFs. However,
if you use these functions on rezrDFs within rezrObj
s, you
should manually update any fields in other rezrDF
s that
refer to the field you’ve deleted or added. I plan to add a rename
feature to EasyEdit in the near future that will update references from
other rezrDFs.
Another implication is that if you use rez_mutate()
and
create an auto field, any references to other tables will not work.
Thus, the EasyEdit function addFieldForeign()
should be
used instead.
What functions are available?
A few dplyr functions are completely safe to use in rezonateR, mostly
those that focus on selecting rows of a table, such as
dplyr::filter()
, dplyr::arrange()
or
dplyr::slice()
. Currently implemented TidyRez functions
include:
rez_add_row()
for adding new entries (not recommended;addRow()
is better)rez_mutate()
for adding and editing columnsrez_rename()
for renaming columnsrez_bind_rows()
for combining rezrDFs verticallyrez_group_split()
for splitting rezrDFs verticallyrez_group_by()
andrez_ungroup()
for groupingrez_select()
for selecting certain columns inside a rezrDFrez_left_join()
for left joins
Potential future additions include rez_bind_cols()
and
rez_outer_join()
; suggestions for others are welcome if you
have a use for them. The functions rez_dfop()
and
rez_validate_fieldchange()
are used behind the scenes by
TidyRez functions; if you want to create your own, please look through
the documentation for these functions (and sent in a pull request when
you’re done with it!).
rez_left_join()
: A special case
Most of the TidyRez functions’ syntax deviate from dplyr
only minimally in ways that you can read about in the documentation.
However, rez_left_join()
is a notable exception.
Firstly, by default, if no suffix is specified, the suffixes are
c("", "_lower")
. That is, if you are joining two
data.frames, both with a column called name
, then the left
data.frame’s column will still be called
name
’in the new data.frame but the right data.frame's column will get called
name_lower. Because
_lower`
is not a very informative name, it’s best to supply your own
suffixes.
In addition to a fieldaccess
field, as we’ve mentioned
before, you will also need:
-
rezrObj
which is self-explanatory -
fkey
- the name of the field in the first rezrDF that corresponds to IDs of the second rezrDF. If this is not specified, it will be guessed from theby
argument ofdplyr::left_join()
. -
df2key
- the name of the field in the second rezrDF that corresponds tofkey
. If this is not specified, it will be guessed from theby
argument ofdplyr::left_join()
. - An
df2Address
field a string that tellsrez_left_join()
how to find the source rezrDF from the rezrObj next time.
Addresses are mostly used by rezonateR
under the hood,
but in the case of rez_left_join()
, you do need to be able
to use it. If the source rezrDF
doesn’t belong to a layer,
e.g. tokenDF
, then the DF name is the address. If the
source rezrDF
belongs to a layer, put a ‘/’ between the
table and the layer, e.g. 'trackDF/refexpr'
. Don’t forget
that default layers are also layers!
As an example, we’ll take averageWordLength
in the
unitDF
, which we added in
vignette("edit_easyEdit")
, and add it back to
tokenDF
, so that when we look at each word, we know the
average length of the word in its units. Here’s how:
rez007$tokenDF = rez007$tokenDF %>% rez_left_join(
y = rez007$unitDF %>% rez_select(id, averageWordLength),
by = c("unit" = "id"),
fieldaccess = "foreign",
df2Address = "unitDF",
fkey = "unit",
rezrObj = rez007
)
#> You didn't give me a df2 key for future updates, so I've guessed it from your by-line.
rez007$tokenDF %>% select(id, text, averageWordLength) %>% head
#> # A tibble: 6 × 3
#> id text averageWordLength
#> <chr> <chr> <dbl>
#> 1 31F282855E95E (...) 3
#> 2 363C1D373B2F7 God 3
#> 3 3628E4BD4CC05 , 3
#> 4 37EFCBECFD691 I 2.82
#> 5 12D67756890C1 said 2.82
#> 6 936363B71D59 I 2.82
Onwards!
Using EasyEdit or TidyRez, it is not hard to use some rules to add
some automatic annotations, and then correct them by hand in a
spreadsheet program. The next tutorial,
vignette("edit_external")
, will cover exactly this use
case. We will export data to a .csv
, edit it outside R, and
then import it back.
As always, saving is a virtue!
savePath = "rez007.Rdata"
rez_save(rez007, savePath)
#> Saving rezrObj ...