Similarity score calculation

Usage

sim_Score(
  d1,
  d2,
  record = FALSE,
  m = NA,
  transCost = 0.5,
  boundaries = c(",", ".", "?", "-", "+"),
  noboundary = ";",
  trans = TRUE
)

Arguments

d1

A data.frame of the first annotator's annotation. Each line represents a segment, which may be an intonation unit in the case of intonation unit segmentation, a turn constructional unit for turn segmentation, and so on. Spaces are used for tokenisation.

d2

A data.frame of the second annotator's annotation, similar to `d1`.
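As a rough illustration of the input shape described above, the sketch below builds a toy annotation with one row per segment and splits each segment into tokens on spaces. The column name `text`, the object name `d1_sketch`, and the sample utterances are all assumptions made for illustration; the column layout that `sim_Score()` actually expects is not specified here.

```r
# Hypothetical annotation: one row per segment, tokens separated by spaces.
# The column name `text` and the utterances are invented for illustration.
d1_sketch <- data.frame(
  text = c("so I went there,", "and then.", "you know?"),
  stringsAsFactors = FALSE
)

# Split each segment into tokens on single spaces.
tokens <- strsplit(d1_sketch$text, " ", fixed = TRUE)

# Number of tokens in each segment.
token_counts <- lengths(tokens)
token_counts
```

A second annotator's data.frame would have the same shape, with segment breaks and boundary symbols placed wherever that annotator put them.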

record

Whether to record the transformation steps (slow process!).

m

A similarity matrix to customize substitution cost. The size of the matrix should either be the number of boundary types in `boundaries` plus two, if `noboundary` has been set, or the number of boundary types in `boundaries` plus one, otherwise. In both cases, the final column gives deletion cost, and the final row gives insertion cost. In the first case, the second-last row and column are for unclassified boundaries.
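The size rule above can be checked with a small sketch that works out the expected dimension from `boundaries` and `noboundary` and builds a matrix of that shape. The cell values are arbitrary placeholders, and the row and column labels (including the name `indel`) are assumptions added for readability; whether `sim_Score()` uses or requires dimnames is not stated here.

```r
boundaries <- c(",", ".", "?", "-", "+")
noboundary <- ";"

# One row/column per boundary type, one more for unclassified boundaries
# (only when `noboundary` is set), and a final one for deletion/insertion.
has_unclassified <- nzchar(noboundary)
n <- length(boundaries) + if (has_unclassified) 2 else 1

# Labels are for readability only; "indel" is a made-up name for the
# final row (insertion) and final column (deletion).
labels <- c(boundaries, if (has_unclassified) noboundary, "indel")

# Placeholder values only: an identity matrix of the required size.
m_sketch <- diag(n)
dimnames(m_sketch) <- list(labels, labels)
dim(m_sketch)
```

With the default arguments this gives a 7 x 7 matrix; setting `noboundary <- ""` in the sketch would give 6 x 6 instead.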

transCost

The transposition cost: either a single value, or a vector with the same length as the number of rows/columns in `m`.

boundaries

A vector of the boundary symbols that occur in the data.

noboundary

A symbol assigned for unclassified boundary types. This will be appended to lines that do not end in any symbol found in `boundaries`. Use "" if unclassified boundaries are not allowed; lines not ending with a defined boundary type will then be treated as not ending in a boundary.
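The behaviour described for `noboundary` can be sketched in base R as follows. `append_noboundary()` is a hypothetical helper written for this illustration and is not part of the package; it assumes each boundary symbol is a single character at the very end of the line, which may differ from how the package handles its input internally.

```r
# Hypothetical helper, not part of the package: append `noboundary` to
# every line whose last character is not one of `boundaries`.
append_noboundary <- function(lines, boundaries, noboundary) {
  last_char <- substring(lines, nchar(lines), nchar(lines))
  unclassified <- !(last_char %in% boundaries)
  # With noboundary = "", lines are left unchanged, i.e. treated as
  # not ending in a boundary.
  if (nzchar(noboundary)) {
    lines[unclassified] <- paste0(lines[unclassified], noboundary)
  }
  lines
}

lines <- c("so I went there,", "and then", "you know?")
marked <- append_noboundary(lines, c(",", ".", "?", "-", "+"), ";")
marked
```

Only the second line changes here, since it is the only one that does not end in a symbol from `boundaries`.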

trans

If `TRUE`, the transposition operation will be performed.

Value

The similarity score.

Examples

sim_Score(nccu_t049_1, nccu_t049_2, record = TRUE)
#> Error in calCostV2(bdlist1, bdlist2, m, order, transCost): different length of elements