Similarity score calculation
Usage
sim_Score(
d1,
d2,
record = FALSE,
m = NA,
transCost = 0.5,
boundaries = c(",", ".", "?", "-", "+"),
noboundary = ";",
trans = TRUE
)
Arguments
- d1
A data.frame of the first annotator's annotation. Each line represents a segment, which may be an intonation unit in the case of intonation unit segmentation, a turn constructional unit for turn segmentation, and so on. Spaces are used for tokenisation.
- d2
A data.frame of the second annotator's annotation, similar to `d1`.
- record
Whether to record the individual transformation steps. Recording is slow.
- m
A similarity matrix to customize substitution cost. The size of the matrix should either be the number of boundary types in `boundaries` plus two, if `noboundary` has been set, or the number of boundary types in `boundaries` plus one, otherwise. In both cases, the final column gives deletion cost, and the final row gives insertion cost. In the first case, the second-last row and column are for unclassified boundaries.
- transCost
A transposition cost: either a single value, or a vector with the same length as the number of rows/columns in `m`.
- boundaries
A vector of the boundary symbols that occur in the data.
- noboundary
A symbol assigned for unclassified boundary types. This will be appended to lines that do not end in any symbol found in `boundaries`. Use "" if unclassified boundaries are not allowed; lines not ending with a defined boundary type will then be treated as not ending in a boundary.
- trans
If `TRUE`, the transposition operation will be performed.
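Examples
The size rule for `m` can be easier to follow with a concrete matrix. The sketch below uses base R only and builds a matrix of the expected size for the default `boundaries` and `noboundary`. The cell values (1 on the diagonal, 0.5 elsewhere) are placeholders chosen for illustration and are not the package defaults. The commented-out call at the end assumes `d1` and `d2` are data.frames prepared as described above.

```r
# Boundary symbols and the unclassified-boundary symbol, as in Usage
boundaries <- c(",", ".", "?", "-", "+")
noboundary <- ";"

# One row/column per boundary type, one more for unclassified boundaries
# when `noboundary` is set, plus a final row (insertion) and a final
# column (deletion)
n_types <- length(boundaries) + if (nzchar(noboundary)) 1 else 0
n <- n_types + 1

# Placeholder values (assumed for illustration, not package defaults)
m <- matrix(0.5, nrow = n, ncol = n)
diag(m) <- 1

# A per-type transposition cost must match the number of rows/columns in m
transCost <- rep(0.5, nrow(m))

dim(m)  # 7 7

# Hypothetical call, assuming d1 and d2 exist:
# sim_Score(d1, d2, m = m, transCost = transCost,
#           boundaries = boundaries, noboundary = noboundary)
```

With `noboundary = ""` the same code yields a 6 x 6 matrix, since no row or column is reserved for unclassified boundaries.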