Editing II: Using TidyRez

Getting started

This tutorial is about TidyRez, the subset of functions derived from dplyr in Tidyverse. We will use the file saved at the end of vignette("edit_easyEdit"). As always, you don’t have to have read that tutorial beforehand, though it may be helpful if you are new to rezonateR.

library(rezonateR)
path = system.file("extdata", "rez007_edit2.Rdata", package = "rezonateR", mustWork = TRUE)
rez007 = rez_load(path)
#> Loading rezrObj ...

Unlike most other tutorials, this one will be very brief, because it assumes that you have knowledge of dplyr, the R package containing functions like dplyr::mutate() and dplyr::select(). If you are not familiar with dplyr, I suggest working through an introductory dplyr tutorial first, and then coming back to this page.

Why TidyRez?

In general, TidyRez functions are called by adding rez_ in front of a dplyr function name, such as rez_group_by() or rez_mutate(). You might wonder why you’d want to use TidyRez instead of plain dplyr. The main reason is that TidyRez functions allow you to keep and/or update your field access values, inNodeMap values, and updateFunctions. Using base R or classic dplyr functions on rezrDFs will cause reload() to fail (unless you add those attributes back yourself).

Thus, TidyRez functions that add or change columns, such as rez_mutate() or rez_left_join(), give you the option to set the field access value of the new or changed field through the fieldaccess argument. updateFunctions are automatically added if you choose auto or foreign. Other TidyRez functions do not differ substantially from classic dplyr; they only allow you to keep your field access labels, updateFunctions, and inNodeMap values.

To see the power of TidyRez, let’s try creating an emancipated rezrDF with only a subset of the original columns, using rez_select(). Here, we take rez007$trackDF$default, the table of referential expressions. We then damage one of the auto fields using a classic dplyr function. As you can see, the emancipated rezrDF can still be updated from the rezrObj with reload(), effectively overriding the damage:

refTable = rez007$trackDF$default %>% rez_select(id, token, chain, name, text, tokenOrderLast)
print("Before:")
#> [1] "Before:"
head(refTable %>% select(id, tokenOrderLast))
#> # A tibble: 6 × 2
#>   id            tokenOrderLast
#>   <chr>                  <dbl>
#> 1 1096E4AFFFE65              1
#> 2 92F20ACA5F06               3
#> 3 7E5BB65072C                8
#> 4 1F74D2B049FA4              9
#> 5 2485C4F740FC0              2
#> 6 1BF2260B4AB78              5
refTable = refTable %>% mutate(tokenOrderLast = 1) # Damage the auto field tokenOrderLast with a classic dplyr function
print("After:")
#> [1] "After:"
refTable = refTable %>% reload(rez007)
head(refTable %>% select(id, tokenOrderLast))
#> # A tibble: 6 × 2
#>   id            tokenOrderLast
#>   <chr>                  <dbl>
#> 1 1096E4AFFFE65              1
#> 2 92F20ACA5F06               3
#> 3 7E5BB65072C                8
#> 4 1F74D2B049FA4              9
#> 5 2485C4F740FC0              2
#> 6 1BF2260B4AB78              5

A warning is in order: TidyRez only updates the current table. If other tables have references to the table you’re editing, they will not be updated. You must bear this in mind when using rez_select() and rez_rename(). No problems will arise if you use these functions on emancipated rezrDFs. However, if you use these functions on rezrDFs within rezrObjs, you should manually update any fields in other rezrDFs that refer to the field you’ve deleted or added. I plan to add a rename feature to EasyEdit in the near future that will update references from other rezrDFs.

Another implication is that if you use rez_mutate() and create an auto field, any references to other tables will not work. Thus, the EasyEdit function addFieldForeign() should be used instead.

What functions are available?

A few dplyr functions are completely safe to use in rezonateR, mostly those that focus on selecting rows of a table, such as dplyr::filter(), dplyr::arrange() or dplyr::slice(). Currently implemented TidyRez functions include:

rez_add_row() for adding new entries (not recommended; addRow() is better)
rez_mutate() for adding and editing columns
rez_rename() for renaming columns
rez_bind_rows() for combining rezrDFs vertically
rez_group_split() for splitting rezrDFs vertically
rez_group_by() and rez_ungroup() for grouping
rez_select() for selecting certain columns inside a rezrDF
rez_left_join() for left joins

Potential future additions include rez_bind_cols() and rez_outer_join(); suggestions for others are welcome if you have a use for them. The functions rez_dfop() and rez_validate_fieldchange() are used behind the scenes by TidyRez functions; if you want to create your own, please look through the documentation for these functions (and send in a pull request when you’re done with it!).
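As a reminder of what the row-wise dplyr verbs mentioned at the start of this section do, here is a minimal sketch on a toy tibble. The table, its columns and its values are invented for illustration and only stand in for a rezrDF; the point is simply that these verbs choose and reorder rows without adding, removing or renaming columns.

```r
library(dplyr)

# Toy table standing in for a rezrDF; columns and values are invented.
toy <- tibble(
  id = c("a", "b", "c", "d"),
  text = c("God", ",", "I", "said"),
  tokenOrder = c(2, 3, 1, 4)
)

rows_only <- toy %>%
  filter(text != ",") %>%   # drop the punctuation row
  arrange(tokenOrder) %>%   # reorder the remaining rows
  slice(1:2)                # keep the first two rows

# The set of columns is unchanged; only the rows differ.
identical(names(rows_only), names(toy))
rows_only$id
```

Verbs that add, drop, rename or combine columns are the ones for which the rez_ versions listed above matter.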

rez_left_join(): A special case

The syntax of most TidyRez functions deviates from dplyr only minimally, in ways that you can read about in the documentation. However, rez_left_join() is a notable exception.

Firstly, by default, if no suffix is specified, the suffixes are c("", "_lower"). That is, if you are joining two data.frames, both with a column called name, then the left data.frame’s column will still be called name in the new data.frame, but the right data.frame’s column will be called name_lower. Because _lower is not a very informative suffix, it’s best to supply your own suffixes.
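The naming behaviour described here can be previewed with plain dplyr::left_join() by passing the same suffixes explicitly. This is only a sketch on two toy tibbles (their names, columns and values are invented), and it uses dplyr rather than rez_left_join() itself, so it says nothing about field access or updateFunctions.

```r
library(dplyr)

# Toy stand-ins for two rezrDFs; all names and values are made up.
tokens <- tibble(
  id   = c("t1", "t2", "t3"),
  unit = c("u1", "u1", "u2"),
  name = c("alpha", "beta", "gamma")
)
units <- tibble(
  id   = c("u1", "u2"),
  name = c("first unit", "second unit")
)

# Suffixes equal to the rez_left_join() default described above:
# the left column stays `name`, the right one becomes `name_lower`.
default_like <- left_join(tokens, units, by = c("unit" = "id"),
                          suffix = c("", "_lower"))
names(default_like)

# Suffixes chosen by hand, which say where each column came from.
explicit <- left_join(tokens, units, by = c("unit" = "id"),
                      suffix = c("_token", "_unit"))
names(explicit)
```

In the second call the clashing columns come out as name_token and name_unit, which is usually easier to read later than name and name_lower.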

In addition to a fieldaccess field, as we’ve mentioned before, you will also need:

rezrObj - the rezrObj that the rezrDFs belong to.
fkey - the name of the field in the first rezrDF that corresponds to IDs of the second rezrDF. If this is not specified, it will be guessed from the by argument of dplyr::left_join().
df2key - the name of the field in the second rezrDF that corresponds to fkey. If this is not specified, it will be guessed from the by argument of dplyr::left_join().
df2Address - a string that tells rez_left_join() how to find the source rezrDF in the rezrObj next time.

Addresses are mostly used by rezonateR under the hood, but in the case of rez_left_join(), you do need to know how to write one. If the source rezrDF doesn’t belong to a layer, e.g. tokenDF, then the name of the rezrDF is the address. If the source rezrDF belongs to a layer, put a ‘/’ between the table name and the layer name, e.g. 'trackDF/refexpr'. Don’t forget that default layers are also layers!
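The address format can be sketched in a few lines of base R. Both makeAddress() and resolveAddress() below are hypothetical helpers written for this illustration, not rezonateR functions, and toyObj is an invented nested list that only imitates the shape of a rezrObj; how rezonateR resolves addresses internally may well differ.

```r
# Hypothetical helper (not part of rezonateR): build an address string
# from a table name and an optional layer name.
makeAddress <- function(table, layer = NULL) {
  if (is.null(layer)) table else paste(table, layer, sep = "/")
}

makeAddress("tokenDF")             # no layer: the table name is the address
makeAddress("trackDF", "refexpr")  # layered table: table/layer
makeAddress("trackDF", "default")  # default layers need the layer name too

# Hypothetical helper: follow an address through a nested list.
resolveAddress <- function(obj, address) {
  parts <- strsplit(address, "/", fixed = TRUE)[[1]]
  for (part in parts) obj <- obj[[part]]
  obj
}

# Invented stand-in for a rezrObj, for illustration only.
toyObj <- list(
  tokenDF = data.frame(id = "t1"),
  trackDF = list(
    default = data.frame(id = "r1"),
    refexpr = data.frame(id = "r2")
  )
)
resolveAddress(toyObj, "trackDF/refexpr")
```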

As an example, we’ll take averageWordLength in the unitDF, which we added in vignette("edit_easyEdit"), and add it to tokenDF, so that when we look at each word, we know the average word length of the unit it belongs to. Here’s how:

rez007$tokenDF = rez007$tokenDF %>% rez_left_join(
  y = rez007$unitDF %>% rez_select(id, averageWordLength),
  by = c("unit" = "id"),
  fieldaccess = "foreign",
  df2Address = "unitDF",
  fkey = "unit",
  rezrObj = rez007
)
#> You didn't give me a df2 key for future updates, so I've guessed it from your by-line.
rez007$tokenDF %>% select(id, text, averageWordLength) %>% head
#> # A tibble: 6 × 3
#>   id            text  averageWordLength
#>   <chr>         <chr>             <dbl>
#> 1 31F282855E95E (...)              3   
#> 2 363C1D373B2F7 God                3   
#> 3 3628E4BD4CC05 ,                  3   
#> 4 37EFCBECFD691 I                  2.82
#> 5 12D67756890C1 said               2.82
#> 6 936363B71D59  I                  2.82

Onwards!

Using EasyEdit or TidyRez, it is not hard to use some rules to add some automatic annotations, and then correct them by hand in a spreadsheet program. The next tutorial, vignette("edit_external"), will cover exactly this use case. We will export data to a .csv, edit it outside R, and then import it back.

As always, saving is a virtue!

savePath = "rez007.Rdata"
rez_save(rez007, savePath)
#> Saving rezrObj ...
