Count the number of competing referents to the current mention

Competitors may be counted within a window of tokens before the current mention, across all referents competing with the current one, or under a combination of both conditions. By default, we count the referents intervening between the current mention and the previous mention of the same chain. Despite its name, tokenOrder can be set to unitSeqLast or a similar column.
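The counting logic described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `countCompetitorsSketch`, its arguments `pos` and `chain`, and the exact window boundaries (strictly before the current mention, strictly after the lower bound) are assumptions, as is the choice to count competing mentions rather than distinct chains.

```r
# Hypothetical helper (not part of the package): for each mention, count the
# earlier mentions that belong to a different chain.
countCompetitorsSketch <- function(pos, chain, windowSize = Inf, between = TRUE) {
  vapply(seq_along(pos), function(i) {
    # Positions of earlier mentions of the same chain
    prevSame <- pos[chain == chain[i] & pos < pos[i]]
    # With between = TRUE, first mentions have no previous mention, so NA
    if (between && length(prevSame) == 0) return(NA_integer_)
    # Lower bound: start of the window, or the previous mention if later
    lower <- pos[i] - windowSize
    if (between) lower <- max(lower, max(prevSame))
    sum(chain != chain[i] & pos < pos[i] & pos > lower)
  }, integer(1))
}

pos   <- c(1, 2, 3, 4, 5, 6)
chain <- c("a", "b", "a", "c", "b", "a")
countCompetitorsSketch(pos, chain)
#> [1] NA NA  1 NA  2  2
countCompetitorsSketch(pos, chain, windowSize = 2, between = FALSE)
#> [1] 0 1 1 1 1 1
```

In the first call, the third mention (chain "a") has one competitor because only the "b" mention lies between it and the previous "a" mention. In the second call, each mention only looks back over a window of two positions, regardless of where its chain was last mentioned.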

Usage

countCompetitors(
  cond = NULL,
  windowSize = Inf,
  tokenSeq = NULL,
  unitSeq = NULL,
  chain = NULL,
  between = T,
  exclFrag = F,
  combinedChunk = NULL,
  nonFragmentMember = F,
  windowType = "unit"
)

countCompetitorsMatch(
  matchCol,
  windowSize = Inf,
  tokenOrder = NULL,
  chain = NULL,
  between = T,
  exclFrag = F,
  combinedChunk = NULL,
  nonFragmentMember = F
)

countMatchingCompetitors(
  matchCol,
  windowSize = Inf,
  tokenOrder = NULL,
  chain = NULL,
  between = T,
  exclFrag = F,
  combinedChunk = NULL,
  nonFragmentMember = F
)

Arguments

cond: The condition under which something counts as a competitor. Leave blank if anything goes.
windowSize: The size of the window in which you will be counting.
unitSeq: The vector of tokenOrder values where the mentions appeared. You can choose tokenOrderFirst, tokenOrderLast, or maybe an average of the two. By default it's tokenOrderFirst.
chain: The chain that each mention belongs to.
between: Do we only count competitors between the current mention and previous mention? (If T, then the value is NA for first mentions.)
exclFrag: Exclude 'fragments' (i.e. members of a combined chunk which do not serve as meaningful chunks in their own right)
combinedChunk: The combinedChunk column of the rezrDF. By default, named combinedChunk.
nonFragmentMember: Vector indicating whether each entry is a non-fragment member, i.e. a member of a combined chunk that also serves as a meaningful chunk in its own right.
matchCol: The column for which a value is to be matched.
tokenOrder: The vector of sequence values where the mentions appeared. Common choices are docTokenSeqFirst, docTokenSeqLast, wordTokenSeqFirst and wordTokenSeqLast (the last two are available after running addIsWordField on a rezrObj). By default it's docTokenSeqLast.

Value

A vector giving the number of competitors for each mention.
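To make the role of matchCol concrete, here is a minimal base R sketch of a matching count that returns one value per mention. It assumes that a competitor "matches" when its matchCol value equals that of the current mention, and it corresponds to the between = F case only. The helper name `countMatchingSketch` and the arguments `pos` and `chain` are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): for each mention, count the
# earlier mentions from other chains whose matchCol value equals its own.
countMatchingSketch <- function(matchCol, pos, chain, windowSize = Inf) {
  vapply(seq_along(pos), function(i) {
    # Earlier mentions that fall inside the look-back window
    inWindow <- pos < pos[i] & pos > pos[i] - windowSize
    sum(inWindow & chain != chain[i] & matchCol == matchCol[i])
  }, integer(1))
}

isZero <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
pos    <- 1:5
chain  <- c("a", "b", "c", "a", "b")
countMatchingSketch(isZero, pos, chain)
#> [1] 0 0 1 1 0
countMatchingSketch(isZero, pos, chain, windowSize = 2)
#> [1] 0 0 0 1 0
```

The result has the same length as the input vectors, which is why it can be assigned directly to a new column inside a mutate-style call.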

Examples

sbc007$trackDF$default %>%
  rez_mutate(isZero = (text == "<0>")) %>%
  rez_mutate(noCompetitors = countCompetitors(windowSize = 40, between = F),
             noMatchingCompetitors = countMatchingCompetitors(isZero, windowSize = 40, between = F))
#> # A tibble: 236 × 34
#>    id      doc   chain sourc…¹ token gapWo…² charC…³ token…⁴ gapUn…⁵ kind  place
#>    <chr>   <chr> <chr> <chr>   <chr> <chr>     <dbl>   <dbl> <chr>   <chr> <chr>
#>  1 1096E4… sbc0… 278D… ""      37EF… N/A           1       1 N/A     "Wor… "1"  
#>  2 92F20A… sbc0… 278D… "174E6… 9363… 2             1       1 0       "Wor… "3"  
#>  3 7E5BB6… sbc0… 2B67… ""      744A… N/A          17       5 N/A     ""    ""   
#>  4 1F74D2… sbc0… 2A01… "52452… 1265… N/A           4       1 N/A     "Wor… "9"  
#>  5 2485C4… sbc0… 278D… "CB1D9… 2113… 10            3       1 1       ""    ""   
#>  6 1BF226… sbc0… 2A01… ""      35E3… 5            12       3 1       ""    ""   
#>  7 6B37B5… sbc0… 2A01… "ED8C9… 233E… 5             3       1 1       ""    ""   
#>  8 259C2C… sbc0… 251A… ""      1F6B… N/A          40       8 N/A     ""    ""   
#>  9 1D1F2B… sbc0… 10FA… ""      24FE… N/A          25       5 N/A     ""    ""   
#> 10 1FA380… sbc0… 3067… ""      158B… N/A          11       2 N/A     ""    ""   
#> # … with 226 more rows, 23 more variables: text <chr>, transcript <chr>,
#> #   endNote <chr>, order <chr>, negPlace <chr>, corpusSeq <chr>,
#> #   pSentOrder <chr>, POS_dft <chr>, tokenSeq <chr>, chunkType <chr>,
#> #   turnOrder <chr>, largerChunk <chr>, tokenOrderFirst <dbl>,
#> #   tokenOrderLast <dbl>, docTokenSeqFirst <dbl>, docTokenSeqLast <dbl>,
#> #   chainCreateSeq <dbl>, name <chr>, chainSize <dbl>, layer <chr>,
#> #   isZero <lgl>, noCompetitors <int>, noMatchingCompetitors <int>, and …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names