Similarity score calculation

Usage

sim_Score(
  d1,
  d2,
  record = FALSE,
  m = NA,
  transCost = 0.5,
  boundaries = c(",", ".", "?", "-", "+"),
  noboundary = ";",
  trans = TRUE
)

Arguments

d1: A data.frame of the first annotator's annotation. Each line represents a segment, which may be an intonation unit in the case of intonation unit segmentation, a turn constructional unit for turn segmentation, and so on. Spaces are used for tokenisation.
d2: A data.frame of the second annotator's annotation, similar to `d1`.
record: Whether to record the steps of the transformation (slow process!).
m: A similarity matrix for customizing substitution costs. Its number of rows and columns should be the number of boundary types in `boundaries` plus two if `noboundary` has been set, and the number of boundary types in `boundaries` plus one otherwise. In both cases, the final column gives the deletion cost and the final row gives the insertion cost. In the first case, the second-to-last row and column are for unclassified boundaries.
transCost: The transposition cost: either a single value, or a vector with the same length as the number of rows/columns in `m`.
boundaries: A vector of boundary symbols that will exist in the data.
noboundary: A symbol assigned for unclassified boundary types. This will be appended to lines that do not end in any symbol found in `boundaries`. Use "" if unclassified boundaries are not allowed; lines not ending with a defined boundary type will then be treated as not ending in a boundary.
trans: If `TRUE`, the transposition operation will be performed.
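
The size rule for `m` and the vector form of `transCost` can be sketched as follows. This is only an illustration of the dimensions described above: `cost_matrix_template` is a hypothetical helper, not part of the package, and the row and column labels are added for readability only. The template is filled with `NA` because the appropriate values depend on how the package interprets the entries, which should be checked before use.

```r
# Hypothetical helper (not part of the package): builds an empty, labelled
# template with the dimensions described for `m`.
cost_matrix_template <- function(boundaries = c(",", ".", "?", "-", "+"),
                                 noboundary = ";") {
  # Boundary types, plus the unclassified symbol when one is set.
  types <- if (nzchar(noboundary)) c(boundaries, noboundary) else boundaries
  # One extra row (insertion) and one extra column (deletion).
  n <- length(types) + 1
  matrix(
    NA_real_, nrow = n, ncol = n,
    dimnames = list(c(types, "insertion"), c(types, "deletion"))
  )
}

# With the default arguments: 5 boundary types + 2 = 7 rows and columns.
m_template <- cost_matrix_template()
dim(m_template)

# With noboundary = "": 5 boundary types + 1 = 6 rows and columns.
dim(cost_matrix_template(noboundary = ""))

# A per-type transposition cost vector has one entry per row/column of `m`.
trans_cost <- rep(0.5, nrow(m_template))
length(trans_cost)
```

Whether the function accepts a matrix with `dimnames` is an assumption; `unname(m_template)` strips the labels if needed.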

Value

The similarity score.

Examples

sim_Score(nccu_t049_1, nccu_t049_2, record = TRUE)
#> Error in calCostV2(bdlist1, bdlist2, m, order, transCost): different length of elements
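
As a rough illustration of how `boundaries` and `noboundary` interact, the sketch below classifies the final token of a few toy lines. `final_boundary` and `toy_lines` are hypothetical and not part of the package; the sketch assumes that the boundary symbol is the last space-separated token of a line, and the package's own parsing may differ.

```r
# Hypothetical helper (not part of the package): returns the boundary symbol
# at the end of each space-tokenised line, or `noboundary` if the last token
# is not one of `boundaries`.
final_boundary <- function(lines,
                           boundaries = c(",", ".", "?", "-", "+"),
                           noboundary = ";") {
  vapply(strsplit(trimws(lines), " +"), function(tokens) {
    # An empty line has no tokens, so it has no final symbol.
    last <- if (length(tokens) > 0) tokens[length(tokens)] else ""
    if (last %in% boundaries) last else noboundary
  }, character(1))
}

# Toy data, invented for illustration only.
toy_lines <- c("so I went home .", "and then ,", "you know")

# The third line ends in no defined boundary, so it is marked ";".
final_boundary(toy_lines)

# With noboundary = "", the third line is treated as having no boundary.
final_boundary(toy_lines, noboundary = "")
```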