Editing III: Using external tools
edit_external.Rmd
Getting started
This tutorial is about editing rezonateR data externally. In many
situations, you can use the automatic editing tools that we have
discussed in the last two tutorials
(vignette("edit_easyEdit")
and
vignette("edit_tidyRez")
) to prepare an automatic
annotation, and then edit it using external tools (such as Microsoft
Excel). RezonateR has a number of tools to make this a less painful and
error-prone experience.
library(rezonateR)
path = system.file("extdata", "rez007_edit3.Rdata", package = "rezonateR", mustWork = T)
rez007 = rez_load(path)
#> Loading rezrObj ...
The workflow
In general, the workflow when working externally is to first finish
at least some annotations in Rezonator. You then export it from
rezonateR
and, typically, use the editing functions from
the previous tutorials to add some annotations automatically.
You will then export a .csv
file from
rezonateR.
Because rezrDF
s often contain a lot
of information that will be a distraction when working with a
.csv
, you can export a lighter version of it as a
.csv
file. After working with the .csv
file,
you will import it into Rezonator, then combine the information in the
newly imported .CSV
back to the original
rezrDF
.
Let’s say we want to annotate the number of the referential
expressions inside trackDF$default
. A good approximation
will be to mark everything that ends with <s> as plural, along
with common plural pronouns and demonstratives, mark coordinate noun
phrases as plural, mark noun phrases with singular demonstratives as
singular, mark you as uncertain, and then mark the rest as
singular. Here is how it may be done with TidyRez:
rez007$trackDF$default = rez007$trackDF$default %>% rez_mutate(number = case_when(
str_detect(text, " and ") ~ "pl",
str_ends(text, "s") ~ "pl",
str_detect(tolower(text), "(these|those)") ~ "pl",
str_detect(tolower(text), "(this|that)") ~ "sg",
tolower(text) %in% c("we", "they", "us", "them") ~ "pl",
tolower(text) %in% c("you", "<0>") ~ "?",
T ~ "sg"
))
head(rez007$trackDF$default %>% select(id, name, text, number))
#> # A tibble: 6 × 4
#> id name text number
#> <chr> <chr> <chr> <chr>
#> 1 1096E4AFFFE65 Mary I sg
#> 2 92F20ACA5F06 Mary I sg
#> 3 7E5BB65072C Trail 87 was n't gon na do sg
#> 4 1F74D2B049FA4 Staying up late this pl
#> 5 2485C4F740FC0 Mary <0> ?
#> 6 1BF2260B4AB78 Staying up late Stay up late sg
Updating rezonateR using external information
Before we export this as a CSV for annotation, I would like to add a
column inside the rezrDF
that gives us the word of the
entire unit. (Since this document currently does not have multi-unit
track entries, it will suffice to use unitLast
or
unitFirst
). It will be useful to be able to see this column
while making manual annotations:
rez007$trackDF$default = rez007$trackDF$default %>%
rez_left_join(rez007$unitDF %>% rez_select(unitSeq, text), by = c(unitSeqLast = "unitSeq"), suffix = c("", "_unit"), df2key = "unitSeq", rezrObj = rez007, df2Address = "unitDF", fkey = "unitSeqLast") %>%
rez_rename(unitLastText = text_unit)
#> Tip: When performed on a rezrDF inside a rezrObj, rez_rename is a potentially destructive action. It is NOT recommended to assign it back to a rezrDF inside a rezrObj. If you must do so, be careful to update all addresses from other DFs to this DF.
The next step is to write the .csv
file.
rez_write_csv()
allows us to do this easily. This function
has three arguments. The first two are straightforward: The
rezrDF
we want to export, and the name of the filename we
want to export to. The third argument is a vector of field names,
i.e. column names, that we want to export. It is a good idea to keep the
number of exported fields small to make the spreadsheet more manageable
and require less scrolling. Here is how I would do it:
rez_write_csv(rez007$trackDF$default, "rez007_refexpr.csv", c("id", "unitLastText", "tokenOrderLast", "text", "name", "number"))
Whenever you export a .csv
with the intention of
importing it back, you must export the id
column so
that the resulting table can be merged back. The
text
column is obviously needed since you need to know
which referential expression you’re dealing with, and
tokenOrderLast
is occasionally useful when a formally
identical noun phrase appears twice in an IU (you will need to
usetokenSeq
or wordSeq
). name
is
the name of the chain/entity. Finally, for obvious reasons you should
export the column(s) you’re editing.
In general, I recommend that you copy the .csv
file
before editing it, and edit from the copy. This is because you don’t
want to accidentally re-run your code and lose all your edits. In this
case, I have added _edited
to the end of the filename of
the copy.
Note: From time to time, you will find cases where an ID is a pure
numeral. In that case, Microsoft Excel will treat it as a number. This
will cause issues when importing back to rezonateR
, so you
need to manually mark it as a string by adding a ` to the beginning of
the cell. Unfortunately, because this is an Excel issue, there is no way
for me to get around this.
After editing the CSV in a spreadsheet program, let’s import it back
using rez_read_csv()
. The first argument of this function
is just the path to the .csv
, and the origDF
argument tells rezonateR
to look in the original
rezrDF
that produced the CSV, and determine the data types
accordingly:
changeDF = rez_read_csv("rez007_refexpr_edited.csv", origDF = rez007$trackDF$default)
Finally, the updateFromDF()
function allows us to update
the original rezrDF using information from the new rezrDF.
The first argument of this function is the target rezrDF
(i.e. the rezrDF
you’re changing), and the second one is
the data frame that you’re changing from, i.e. the data frame you just
imported. The rest of the arguments are all settings:
-
changeCols
: Columns to be changed. By default, this is all theflex
columns, but I encourage you to set this to avoid changing columns you did not mean to change. -
changeType
: Which types of columns (in field-access terms) will you change? If you did not specify the first one, then you can use this to select all columns of a field-access type. Note thatauto
andforeign
columns’ changes will be lost upon reload. -
renameCols
: Will you rename columns according to the new data frame? Generally, and by default, this should be ‘no’. -
colCorr
: IfrenameCols = T
, specify a list where keys are the new names and values are the old names. IfrenameCols = F
but you changed some names in your spreadsheet program anyway, you can specify a list where the keys are the old names and the values are the new names. -
delRows
: Will you delete rows fromtargetDF
if not present inchangeDF
? Generally, and by default, this should be a ‘no’. -
addRows
: Will you add rows totargetDF
if not present intargetDF
? Bydefault
, this is ‘no’. -
addCols
: Will you add columns present in thechangeDF
but not in thetargetDF
? By default, ‘no’. -
reloadAfterCorr
: Would you like to do a local reload on therezrDF
afterwards? By default, ‘yes’.
Here is one example:
rez007$trackDF$default = rez007$trackDF$default %>% updateFromDF(changeDF, changeCols = "number")
#>
#> "id"
#> NULL
head(rez007$trackDF$default %>% select(id, text, number))
#> # A tibble: 6 × 3
#> id text number
#> <chr> <chr> <chr>
#> 1 1096E4AFFFE65 I sg
#> 2 92F20ACA5F06 I sg
#> 3 7E5BB65072C was n't gon na do sg
#> 4 1F74D2B049FA4 this pl
#> 5 2485C4F740FC0 <0> sg
#> 6 1BF2260B4AB78 Stay up late sg
What if I want to go back and change my Rezonator annotations?
If you don’t edit anything outside of R, it’s straightforward to
change something in Rezonator. All you need to do is import your new
.rez
file and run all your code again. However, if your
workflow also involves external editing, this becomes a little more
complicated.
The rez_write_csv()
> rez_read_csv()
> updateFromDF()
workflow allows for a fair amount of
flexibility in changing your Rezonator annotations. If you need to go
back and change something in your .rez
file, don’t panic!
Here is what you will need to do in each situation:
- If you simply changed values of tags that you
manually annotate in Rezonator, then simply re-run all your code from
the beginning, and make sure that the fields you’ve changed in Rezonator
are not being updated in
updateFromDF()
through thechangeCols
field. - If you deleted certain fields from Rezonator, simply re-run all your
code from the beginning, and ensure that
addCols = FALSE
inupdateFromDF()
. - If you added fields in Rezonator, simply re-run all your code. There is nothing to worry about.
- If you deleted certain elements (e.g. chunks, tracks, rezzes) from
Rezonator, simply re-run all your code from the beginning, and ensure
that
addRows = FALSE
inupdateFromDF()
. - If you added certain elements in Rezonator, this is by far the
trickiest. You should first make sure that your edited
.csv
is separate from the.csv
exported byrezonateR
(see the previous section). Then, you should re-run all your code up to the point where you export your.csv
. Go to the newly exported.csv
and copy the rows corresponding to the new elements to your edited.csv
, edit the columns as needed, and then run the rest of your code.
This should cover the vast majority of use cases!
Onwards!
By now you should be very familiar with all the editing features that
Rezonator provides. The next step is, naturally, to understand how to
analyse the data. The next tutorial vignette("tree")
will
start with the dependency tree. If you will not be working with this
data structure, you can skip forward to vignette("track")
,
which is currently the final tutorial.
As always, saving is a virtue:
savePath = "rez007.Rdata"
rez_save(rez007, savePath)
#> Saving rezrObj ...