Count the number of competing referents to the current mention
countCompete.Rd
Competitors may be counted within a window of tokens from the current mention, across all referents competing with the current one, or under a mix of both conditions. By default, we count the referents intervening between the current mention and the previous mention. Despite its name, tokenOrder may also be set to a unit-based sequence such as unitSeqLast.
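To make the window and the between condition concrete, here is a minimal base R sketch. It is not the package's implementation: the helper `count_competitors_sketch` and its arguments `pos` and `window_size` are hypothetical names, and it assumes that a competitor is any earlier mention from a different chain and that the window's lower bound is inclusive.

```r
# Hypothetical helper, not part of rezonateR: illustrates the counting idea only.
count_competitors_sketch <- function(pos, chain, window_size = Inf, between = TRUE) {
  vapply(seq_along(pos), function(i) {
    # Start of the look-back window (assumed inclusive).
    lower <- pos[i] - window_size
    if (between) {
      prev <- pos[chain == chain[i] & pos < pos[i]]
      # First mention of a chain: there is no previous mention, so return NA.
      if (length(prev) == 0) return(NA_integer_)
      # Look back no further than the previous mention of the same chain.
      lower <- max(lower, max(prev))
    }
    # Competitors: earlier mentions from other chains inside the window.
    sum(chain != chain[i] & pos < pos[i] & pos >= lower)
  }, integer(1))
}

pos   <- c(1, 3, 5, 8, 10)             # toy sequence values
chain <- c("A", "B", "A", "C", "B")    # toy chain IDs

count_competitors_sketch(pos, chain)
#> [1] NA NA  1 NA  2
count_competitors_sketch(pos, chain, window_size = 4, between = FALSE)
#> [1] 0 1 1 1 1
```

With between = TRUE, first mentions get NA because there is no previous mention to bound the span; with between = FALSE, only the window limits how far back the count reaches.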
Usage
countCompetitors(
cond = NULL,
windowSize = Inf,
tokenSeq = NULL,
unitSeq = NULL,
chain = NULL,
between = T,
exclFrag = F,
combinedChunk = NULL,
nonFragmentMember = F,
windowType = "unit"
)
countCompetitorsMatch(
matchCol,
windowSize = Inf,
tokenOrder = NULL,
chain = NULL,
between = T,
exclFrag = F,
combinedChunk = NULL,
nonFragmentMember = F
)
countMatchingCompetitors(
matchCol,
windowSize = Inf,
tokenOrder = NULL,
chain = NULL,
between = T,
exclFrag = F,
combinedChunk = NULL,
nonFragmentMember = F
)
Arguments
- cond
The condition under which something counts as a competitor. Leave blank if anything goes.
- windowSize
The size of the window in which you will be counting.
- unitSeq
The vector of tokenOrder values where the mentions appeared. You can choose tokenOrderFirst, tokenOrderLast, or perhaps an average of the two. By default it's tokenOrderFirst.
- chain
The chain that each mention belongs to.
- between
Do we only count competitors between the current mention and the previous mention? (If T, then the value is NA for first mentions.)
- exclFrag
Exclude 'fragments' (i.e. members of a combined chunk which do not serve as meaningful chunks in their own right)
- combinedChunk
The combinedChunk column of the rezrDF. By default, named combinedChunk.
- nonFragmentMember
Vector indicating whether each entry is a non-fragment member, i.e. a member of a combined chunk that also serves as a meaningful chunk in its own right.
- matchCol
The column for which a value is to be matched.
- tokenOrder
The vector of sequence values where the mentions appeared. Common choices are docTokenSeqFirst, docTokenSeqLast, wordTokenSeqFirst and wordTokenSeqLast (the last two are available after running addIsWordField on a rezrObj). By default it's docTokenSeqLast.
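The matching variants can be pictured with a similar base R sketch. It assumes, as the matchCol description suggests, that a competitor only counts when its matchCol value equals that of the current mention. The helper `count_matching_sketch` and the argument names `match_col`, `pos` and `window_size` are hypothetical, and the inclusive lower window bound is likewise an assumption rather than documented behaviour.

```r
# Hypothetical helper, not part of rezonateR: illustrates the matching idea only.
count_matching_sketch <- function(match_col, pos, chain, window_size = Inf) {
  vapply(seq_along(pos), function(i) {
    # Earlier mentions that fall inside the look-back window (assumed inclusive).
    in_window <- pos < pos[i] & pos >= pos[i] - window_size
    # Same value as the current mention; %in% TRUE treats NA as a non-match.
    same_value <- (match_col == match_col[i]) %in% TRUE
    sum(in_window & chain != chain[i] & same_value)
  }, integer(1))
}

pos     <- c(1, 3, 5, 8, 10)                    # toy sequence values
chain   <- c("A", "B", "A", "C", "B")           # toy chain IDs
is_zero <- c(FALSE, TRUE, TRUE, FALSE, TRUE)    # toy match column

count_matching_sketch(is_zero, pos, chain, window_size = 4)
#> [1] 0 0 1 0 0
```

Only the third mention has a competitor in its window that shares its is_zero value, so it is the only one with a non-zero count.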
Examples
sbc007$trackDF$default %>%
  # Flag zero mentions, then count all competitors and those matching on isZero
  rez_mutate(isZero = (text == "<0>")) %>%
  rez_mutate(noCompetitors = countCompetitors(windowSize = 40, between = F),
             noMatchingCompetitors = countMatchingCompetitors(isZero, windowSize = 40, between = F))
#> # A tibble: 236 × 34
#> id doc chain sourc…¹ token gapWo…² charC…³ token…⁴ gapUn…⁵ kind place
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 1096E4… sbc0… 278D… "" 37EF… N/A 1 1 N/A "Wor… "1"
#> 2 92F20A… sbc0… 278D… "174E6… 9363… 2 1 1 0 "Wor… "3"
#> 3 7E5BB6… sbc0… 2B67… "" 744A… N/A 17 5 N/A "" ""
#> 4 1F74D2… sbc0… 2A01… "52452… 1265… N/A 4 1 N/A "Wor… "9"
#> 5 2485C4… sbc0… 278D… "CB1D9… 2113… 10 3 1 1 "" ""
#> 6 1BF226… sbc0… 2A01… "" 35E3… 5 12 3 1 "" ""
#> 7 6B37B5… sbc0… 2A01… "ED8C9… 233E… 5 3 1 1 "" ""
#> 8 259C2C… sbc0… 251A… "" 1F6B… N/A 40 8 N/A "" ""
#> 9 1D1F2B… sbc0… 10FA… "" 24FE… N/A 25 5 N/A "" ""
#> 10 1FA380… sbc0… 3067… "" 158B… N/A 11 2 N/A "" ""
#> # … with 226 more rows, 23 more variables: text <chr>, transcript <chr>,
#> # endNote <chr>, order <chr>, negPlace <chr>, corpusSeq <chr>,
#> # pSentOrder <chr>, POS_dft <chr>, tokenSeq <chr>, chunkType <chr>,
#> # turnOrder <chr>, largerChunk <chr>, tokenOrderFirst <dbl>,
#> # tokenOrderLast <dbl>, docTokenSeqFirst <dbl>, docTokenSeqLast <dbl>,
#> # chainCreateSeq <dbl>, name <chr>, chainSize <dbl>, layer <chr>,
#> # isZero <lgl>, noCompetitors <int>, noMatchingCompetitors <int>, and …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names