R/corplingr_collex_fye.R
collex_fye.Rd
This is a vectorised wrapper for the dhyper function in the stats package.
The implementation of the code is adapted from Gries (2012).
collex_fye also provides a logical argument, two_sided. When two_sided is TRUE, its value is passed on to the alternative argument of the embedded fisher.test.
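The relationship between fisher.test and dhyper mentioned above can be illustrated with a minimal sketch. It does not reproduce the internals of collex_fye; the 2x2 counts are hypothetical, and the choice of alternative = "greater" for the one-sided case is an assumption made for illustration.

```r
# Hypothetical 2x2 counts, not real corpus data.
# Rows: target word vs. all other words.
# Columns: inside vs. outside the construction.
crosstab <- matrix(c(20, 130,
                     380, 99470),
                   nrow = 2, byrow = TRUE)

# One-sided test (assumed direction: more co-occurrences than expected)
p_one <- stats::fisher.test(crosstab, alternative = "greater")$p.value

# Two-sided test
p_two <- stats::fisher.test(crosstab, alternative = "two.sided")$p.value

# The same one-sided p-value as an upper tail of the
# hypergeometric distribution, summed with dhyper()
x <- crosstab[1, 1]       # observed co-occurrence frequency
m <- sum(crosstab[, 1])   # first column total
n <- sum(crosstab[, 2])   # second column total
k <- sum(crosstab[1, ])   # first row total
p_tail <- sum(stats::dhyper(x:min(k, m), m, n, k))
```

In this sketch p_one and p_tail agree up to floating-point error, which is why a one-sided Fisher-Yates Exact test can be computed from dhyper directly.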
collex_fye(
  a = "frequency of co-occurrence of the collocate and the node",
  a_exp = "expected frequency",
  n_w_in_corp = "total frequency of collexemes/collocates in the whole corpus",
  corpus_size = "total size of the corpus",
  n_pattern = "total frequency of the construction/node word in the whole corpus",
  two_sided = FALSE,
  collstr_res = TRUE,
  float = 3
)
| Argument | Description |
|---|---|
| a | cell a: the frequency of co-occurrence of the collocate and the node. |
| a_exp | the expected frequency for cell a. |
| n_w_in_corp | the total frequency of the collexemes/collocates of the target construction/node word in the corpus. |
| corpus_size | the total size (in word tokens) of the corpus. |
| n_pattern | the total frequency of occurrence of the target construction/node word in the corpus. |
| two_sided | logical; whether to perform a one-sided test (FALSE, the default) or a two-sided test (TRUE). |
| collstr_res | logical; whether to output the FYE p-value as the Collostruction Strength value (TRUE, the default). |
| float | the floating digits of the Collostruction/Collocation Strength. The default value is 3. |
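As a rough illustration of how these frequency arguments relate to one another, the sketch below arranges them into the usual 2x2 contingency table of collostructional analysis. All values are hypothetical, and the formula for a_exp is the standard expected frequency under independence; whether the package computes a_exp in exactly this way is an assumption.

```r
# Hypothetical frequencies for a single collexeme; not from a real corpus.
a           <- 20      # co-occurrences of the word and the construction
n_w_in_corp <- 150     # frequency of the word in the whole corpus
n_pattern   <- 400     # frequency of the construction in the whole corpus
corpus_size <- 100000  # corpus size in word tokens

# 2x2 table: the remaining three cells follow from the marginal totals
crosstab <- matrix(
  c(a,             n_w_in_corp - a,
    n_pattern - a, corpus_size - n_w_in_corp - n_pattern + a),
  nrow = 2, byrow = TRUE,
  dimnames = list(word = c("target", "other"),
                  construction = c("inside", "outside"))
)

# Expected frequency of cell a under independence (assumed formula)
a_exp <- n_w_in_corp * n_pattern / corpus_size
```

Here the four cells sum to corpus_size, and the observed frequency a (20) is well above a_exp (0.6), which is the situation described as attraction between a word and a construction.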
A numeric vector of the same length as the argument a, interpreted as the Collostruction Strength between the construction/node word and its collexemes/collocates.
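Collostruction Strength is (i) the negative base-10 logarithm of the Fisher-Yates Exact test p-value when a > a_exp, and (ii) the positive base-10 logarithm of that p-value when a <= a_exp. Because a p-value never exceeds 1, case (i) yields values of zero or above and case (ii) yields values of zero or below.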
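The sign convention can be sketched in a few lines. The helper collstr_from_p below is hypothetical and not part of the package; it assumes that the float argument is applied as a rounding precision, which the documentation does not state explicitly.

```r
# Hypothetical helper (not part of the package) showing the sign convention.
# Assumption: `float` is used as the number of digits for rounding.
collstr_from_p <- function(p, a, a_exp, float = 3) {
  # negative log10 when observed > expected, positive log10 otherwise
  strength <- ifelse(a > a_exp, -log10(p), log10(p))
  round(strength, float)
}

# Same p-value, opposite directions of association
res <- collstr_from_p(p = c(0.001, 0.001),
                      a = c(20, 2),
                      a_exp = c(5, 5))
res
# first element is 3 (a > a_exp), second is -3 (a <= a_exp)
```

Keeping the direction of association in the sign lets attracted and repelled items be sorted on a single scale.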
if (FALSE) {
  # do the collocate search using "corpus_path" input-option
  library(tidyverse)
  df <- colloc_default(corpus_path = orti_bali_path,
                       pattern = "^nuju$",
                       window = "b", # focusing on both left and right context window
                       span = 3)     # retrieve 3 collocates to the left and right of the node

  # prepare the collexeme analysis input tibble
  # and select to focus on R1 and R2 collocates.
  collex_tb <- collex_prepare(df, span = c("r1", "r2"))

  # run the Fisher-Yates Exact (FYE) Test in vectorised fashion with the help of purrr's pmap
  # the example below runs the one-tailed FYE and output the p-value in log10 of CollStr value
  collex_tb <- mutate(collex_tb,
                      collstr = purrr::pmap_dbl(list(a, a_exp, n_w_in_corp, corpus_size, n_pattern),
                                                collex_fye,
                                                two_sided = FALSE,
                                                collstr_res = TRUE))

  # preview the results
  collex_tb
}