Basic functions
intro.RmdGet similarity score, with a record of operations taken (simple example):
library(segsimflex)## Warning: package 'tidyverse' was built under R version 4.2.2
data_raw = c("JOHN: Hello , how are you ? MARY: I am fine , thank you .", "JOHN: Hello how are you ? MARY: I am fine thank you .")
data_df = create_dfs(data_raw)
segsimflex::sim_Score(data_df[[1]], data_df[[2]], boundaries = c(",", "?", "."), record = T)## [1] "Speaker 1/2, Segment 1/1 - t1:, |t2: "
## [1] "Speaker 2/2, Segment 1/1 - t1: , |t2: "
## $cost
## [1] 2
##
## $actions
## [1] 4
##
## $record
## type e1 e2 pos1 pos2 postTranspose
## 1 Substitution , 1 1 FALSE
## 2 Substitution , 3 3 FALSE
##
## $fullMatches
## [1] "." "?"
##
## $sim_N
## [1] 0.7777778
##
## $sim_B
## [1] 0.5
Here, cost indicates the total cost of the actions
taken, which is 2 because of the two substitution operations (from
, to ). actions indicates the
total number of actions, including boundaries left intact, which is 4.
record indicates all the operations performed.
fullMatches gives the boundaries which matched in position
and type. The final numbers are the similarity scores, normalised by the
potential vs actual numbers of boundaries.
Get similarity score:
M_nccu = matrix(
c(1, .5, .25, .25, 1, 0,
.5, 1, .5, .5, 1, .25,
.25, .5, 1, .25, 1, 0,
.25, .5, .25, 1, 1, .25,
1,1,1,1,1, 1,
0, .25, 0, .25, 1, 1),
nrow = 6)
bounds_nccu = c(",", ".", "?", "--")
transCost = (1-M_nccu[,6])*.5
nccu_t001 = nccu_squareno[[1]]
t01_sim = segsimflex::sim_Score(nccu_t001[[1]], nccu_t001[[2]], transCost = transCost, m = M_nccu, boundaries = bounds_nccu)
t01_sim## [1] 0.941676 0.745334
Get IAA (K = 2 for speed, for display purposes):
t01_iaa_m = segsimflex::IAA(nccu_t001[[1]], nccu_t001[[2]], transCost = transCost, m = M_nccu, boundaries = bounds_nccu, K = 2)
t01_iaa_m## [1] 0.7354592 0.6333383