Functions for getting summary information about a vector of strings.
complexActions.Rd
These functions may be used for the complexAction field in addFieldForeign / changeFieldForeign, or within expressions in functions like addField.
Usage
concatenateAll(x)
longestLength(x, isWord = T)
longest(x, isWord = T)
shortestLength(x, isWord = T)
shortest(x, isWord = T)
inLength(x, isWord = T)
Arguments
- x
The information from the source rezrDF.
- isWord
Logical value or vector indicating whether each token is a word. This is typically a column, or an expression on a column such as kind == "Word". Defaults to T.
Note
concatenateAll concatenates everything in x together. It is not to be confused with concatStringFields, which is applied to data frames. longest and shortest give the longest and shortest strings in x, and may return multiple entries if there are ties. longestLength and shortestLength give the lengths of the longest and shortest strings in x. Base R functions such as max, min, mean and range may also be used in the same places.
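To make the behaviour described above concrete, here is a minimal sketch using base R only on a plain character vector. These are rough analogues, not the package's own implementations: the empty separator in paste and the handling of isWord are assumptions, and the variable names are made up for illustration.

```r
# Base R analogues of the summaries described above (not the rezonateR code).
words <- c("so", "anyway", "uh", "really")
lens <- nchar(words)                      # number of characters per string

joined <- paste(words, collapse = "")     # everything joined (separator is an assumption)
longestLen <- max(lens)                   # length of the longest string
shortestLen <- min(lens)                  # length of the shortest string
longestWords <- words[lens == longestLen]     # all ties are kept
shortestWords <- words[lens == shortestLen]   # all ties are kept

print(longestWords)
print(shortestWords)
```

With this input there are two ties at each extreme, so both longestWords and shortestWords hold two entries.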
In complexAction fields, supply only the function name (for example, longest). In expression fields, write the full call including the 'x' argument, which is normally the name of a column in your rezrDF (for example, longest(text)).
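The difference between the two styles can be sketched in base R. applyAction below is a hypothetical stand-in for any function that accepts a complexAction-style argument. It is not part of rezonateR and is only meant to show a bare function name being passed versus a full call being written out.

```r
# Hypothetical helper: receives a function and calls it on the values itself.
applyAction <- function(values, action) {
  action(values)
}

wordLengths <- c(2, 6, 2, 6)

# complexAction style: bare function name, no parentheses and no 'x'
viaAction <- applyAction(wordLengths, max)

# expression style: full call with the argument written out
viaExpression <- max(wordLengths)

print(identical(viaAction, viaExpression))
```

Both routes give the same value; the only difference is who writes the call.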
Examples
sbc007 = addField(sbc007, entity = "token", layer = "",
                  fieldName = "longestWordInUnit",
                  expression = longestLength(text),
                  type = "complex",
                  groupField = "unit",
                  fieldaccess = "auto")
sbc007UnitLengths = sbc007$tokenDF %>%
  rez_group_by(unit) %>%
  summarise(lenWords = inLength(text, isWord = (kind == "Word")))