Inter-annotator agreement

Usage

IAA(
  d1,
  d2,
  m = NA,
  transCost = 0.5,
  boundaries = c(",", ".", "?", "-", "+"),
  noboundary = ";",
  trans = TRUE,
  K = 100,
  metric = "kappa"
)

Arguments

d1

A data.frame containing the first annotator's annotation. Each line represents a segment, for example an intonation unit in intonation unit segmentation or a turn constructional unit in turn segmentation. Tokens within a line are separated by spaces.

d2

A data.frame of the second annotator's annotation, similar to `d1`.

m

A similarity matrix to customize substitution cost. The size of the matrix should either be the number of boundary types in `boundaries` plus two, if `noboundary` has been set, or the number of boundary types in `boundaries` plus one, otherwise. In both cases, the final column gives deletion cost, and the final row gives insertion cost. In the first case, the second-last row and column are for unclassified boundaries.
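The sizing rule above can be sketched in base R. This is only an illustration of the expected dimensions: the label `"indel"` is a hypothetical name, the dimnames are there for readability and may not be used by `IAA()`, and the identity values are placeholders, since this page does not state the scale of the entries.

```r
# Boundary symbols and the unclassified-boundary symbol, copied from the
# defaults shown under Usage.
boundaries <- c(",", ".", "?", "-", "+")
noboundary <- ";"

# Row/column order: the boundary types, then the unclassified boundary
# (only when `noboundary` is set), then a final slot whose column holds
# deletion costs and whose row holds insertion costs.
# "indel" is a hypothetical label chosen for this sketch.
labels <- c(boundaries, if (nzchar(noboundary)) noboundary, "indel")

# Placeholder values (assumption): 1 on the diagonal, 0 elsewhere.
m <- diag(length(labels))
dimnames(m) <- list(labels, labels)

dim(m)
```

With `noboundary = ""` the same code yields a matrix with one fewer row and column, matching the second case described above.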

transCost

The transposition cost: either a single value, or a vector with the same length as the number of rows/columns in `m`.

boundaries

A character vector of the boundary symbols that appear in the data.

noboundary

A symbol assigned for unclassified boundary types. This will be appended to lines that do not end in any symbol found in `boundaries`. Use "" if unclassified boundaries are not allowed; lines not ending with a defined boundary type will then be treated as not ending in a boundary.
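The effect described above can be pictured with a small base R sketch. It is not the package's internal code: the example segments are invented, and joining the symbol with a space is an assumption made for this illustration.

```r
boundaries <- c(",", ".", "?", "-", "+")
noboundary <- ";"

# Invented example segments, one per line, with space-separated tokens.
segments <- c("so I went ,", "to the store", "yesterday .")

# Last space-separated token of each line.
last_token <- vapply(
  strsplit(segments, " ", fixed = TRUE),
  function(tok) tok[length(tok)],
  character(1)
)

# Lines that do not end in a defined boundary symbol.
needs_mark <- !(last_token %in% boundaries)

# Append the unclassified-boundary symbol only when one is set.
# The space separator is an assumption of this sketch.
marked <- ifelse(
  needs_mark & nzchar(noboundary),
  paste(segments, noboundary),
  segments
)

marked
```

Setting `noboundary <- ""` leaves every line unchanged in this sketch, which corresponds to treating such lines as not ending in a boundary.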

trans

If `TRUE`, the transposition operation will be performed.

K

Number of iterations

metric

One of `"kappa"` (Cohen's kappa), `"pi"` (Scott's pi), `"s"` (Bennett's S), or `"s_modified"` (Bennett's S, except with the probability of non-boundary estimated from data).
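For orientation, the three standard coefficients share the form (Ao - Ae) / (1 - Ae) and differ only in how the expected agreement Ae is estimated. The sketch below shows the textbook definitions for plain categorical labels. It is not how `IAA()` computes agreement on segmentations; the function name `chance_corrected` and the toy labels are hypothetical, and `"s_modified"` is omitted because this page does not specify how its non-boundary probability is estimated.

```r
# Toy categorical labels from two annotators (invented for illustration).
a1 <- c("B", "B", "N", "N", "N", "B", "N", "N")
a2 <- c("B", "N", "N", "N", "N", "B", "N", "B")
cats <- c("B", "N")

# Hypothetical helper: chance-corrected agreement for categorical labels.
chance_corrected <- function(a1, a2, cats, metric = c("kappa", "pi", "s")) {
  metric <- match.arg(metric)

  # Per-annotator category proportions.
  p1 <- as.numeric(table(factor(a1, levels = cats))) / length(a1)
  p2 <- as.numeric(table(factor(a2, levels = cats))) / length(a2)

  # Observed agreement: share of items labelled identically.
  Ao <- mean(a1 == a2)

  # Expected agreement under each coefficient's chance model.
  Ae <- switch(metric,
    kappa = sum(p1 * p2),              # Cohen: separate marginals
    pi    = sum(((p1 + p2) / 2)^2),    # Scott: pooled marginals
    s     = 1 / length(cats)           # Bennett: uniform categories
  )

  (Ao - Ae) / (1 - Ae)
}

chance_corrected(a1, a2, cats, "kappa")
chance_corrected(a1, a2, cats, "pi")
chance_corrected(a1, a2, cats, "s")
```

In this toy data both annotators use each category equally often, so kappa and pi coincide; they diverge when the annotators' marginal distributions differ.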

Value

The inter-annotator agreement value, computed with the chosen `metric`.