Multiple distinctive collexeme analysis (MDCA)

Performs Multiple Distinctive Collexeme Analysis (MDCA), as applied in Rajeg (2019, Chapter 7).

mdca(
  df = NULL,
  cxn_var = "synonyms",
  coll_var = "metaphors",
  already_count_table = FALSE,
  assocstr_digits = 3L,
  correct_holm = TRUE,
  concise_output = TRUE
)

Arguments

df

the data frame for the thesis (phd_data_metaphor) included in the package.

cxn_var

character string giving the name of the column that holds the constructions variable; in this case, the "synonyms" column.

coll_var

character string giving the name of the column that holds the collocates variable; in this case, the "metaphors" column.

already_count_table

logical; the default is FALSE, indicating that mdca takes a raw input data frame in observation-per-row format, as in phd_data_metaphor. When TRUE, mdca expects a tidy table of co-occurrence counts between the values of cxn_var and coll_var, with three columns:

synonyms	metaphors	n
kesenangan	happiness is a possessable object	182
kebahagiaan	happiness is a possessable object	181
...	...	...

assocstr_digits

integer giving the number of decimal digits reported for the Association Strength. The default is 3L.

correct_holm

logical; the default is TRUE, which applies Holm's correction to the p-values (cf. Gries, 2009, p. 249).

concise_output

logical; if TRUE (the default), mdca outputs the following columns:

metaphors
synonyms
n (for the observed co-occurrence frequency between metaphors and the synonyms).
exp (for the expected co-occurrence frequency between metaphors and the synonyms).
p_binomial (the one-tailed p-value of the Binomial Test).
assocstr (the log10-transformed p-value of the Binomial Test; the value is positive when n is higher than the exp frequency and negative otherwise).
p_holm (when correct_holm is TRUE)
dec (significance decision after Holm's correction) (when correct_holm is TRUE)

If concise_output is FALSE, mdca additionally returns the total tokens in the data, the total frequency of each collexeme/collocate, the total frequency of each construction, the sum of absolute deviations of the collexeme/collocate, the name of the construction showing the largest deviation from the expected co-occurrence frequency with the collexeme, the expected probability of the co-occurrence, and the direction of the deviation from the expected frequency (i.e., whether a collexeme is attracted or repelled by a construction).
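The three-column count table described under already_count_table can be produced from observation-per-row data in several ways. The following is a minimal sketch using dplyr::count; the toy data frame toy_df and the label "placeholder metaphor" are made up for illustration and are not part of the package data.

```r
library(dplyr)

# Toy observation-per-row data (hypothetical values, not phd_data_metaphor)
toy_df <- data.frame(
  synonyms = c("kesenangan", "kesenangan", "kesenangan",
               "kebahagiaan", "kebahagiaan", "kebahagiaan"),
  metaphors = c("happiness is a possessable object",
                "happiness is a possessable object",
                "placeholder metaphor",
                "happiness is a possessable object",
                "placeholder metaphor",
                "placeholder metaphor")
)

# One row per synonyms-by-metaphors combination, with the frequency in column n
count_tbl <- count(toy_df, synonyms, metaphors)

# count_tbl has the columns synonyms, metaphors, n and could then be passed
# to mdca() with already_count_table = TRUE
```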

Value

A tbl_df whose columns depend on concise_output.

Details

The mdca function is built on top of the core members of the tidyverse suite of packages. The computation of the Association Strength is based on the dbinom function (Gries, 2009, pp. 41-42; cf. Hilpert, 2006). The correction of the p-values of the one-tailed Binomial Test with Holm's method is performed using p.adjust.
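To make the computation concrete, the following base R sketch works through one collocate distributed over three constructions. It assumes the common formulation in which the one-tailed p-value sums dbinom over the upper tail when the observed frequency is at least the expected one, and over the lower tail otherwise; the internals of mdca may differ in detail. The counts and construction shares are invented for illustration.

```r
# Hypothetical observed frequencies of one collocate in constructions A, B, C
obs <- c(A = 12, B = 3, C = 15)

# Hypothetical share of each construction in the data
# (construction total / total tokens); these are the expected probabilities
p_exp <- c(A = 0.2, B = 0.5, C = 0.3)

n_coll   <- sum(obs)       # total frequency of the collocate
exp_freq <- n_coll * p_exp # expected co-occurrence frequencies

# One-tailed binomial p-value per construction, built from dbinom
p_binomial <- mapply(function(o, p) {
  if (o >= n_coll * p) {
    sum(dbinom(o:n_coll, size = n_coll, prob = p)) # attraction: upper tail
  } else {
    sum(dbinom(0:o, size = n_coll, prob = p))      # repulsion: lower tail
  }
}, obs, p_exp)

# Signed log10 transformation: positive for attraction, negative for repulsion
assocstr <- round(ifelse(obs >= exp_freq,
                         -log10(p_binomial),
                         log10(p_binomial)), 3L)

# Holm's correction of the p-values
p_holm <- p.adjust(p_binomial, method = "holm")
```

In this toy case, constructions A and C receive positive association strengths (observed above expected) and B a negative one.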

A well-known interactive R script for performing MDCA is Coll.analysis 3.5 by Stefan Th. Gries (Gries, 2014). The script also includes code for the other methods in the family of Collostructional Analyses. The mdca function in happyr aims to achieve the same analytical goal as Coll.analysis 3.5, but differs in its usage and internal code, as it is based on the tidyverse.

mdca lets users keep both the input and the output as data frames within the R environment, which primarily enables them to write interactive documents in R Markdown around MDCA. Moreover, happyr provides two functions dedicated to handling the output of mdca, retrieving the distinctive (attracted) and the repelled collexemes/collocates for a given construction. In contrast, Stefan Gries' script offers two options, printing the output either (i) to the terminal or (ii) to an external plain-text file, which requires post-processing of the results, mostly in a spreadsheet.
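Because the output is an ordinary data frame, attracted and repelled items can also be retrieved with generic dplyr verbs. The sketch below does not use the two dedicated happyr functions, which are not named here; it only illustrates the idea on a made-up table shaped like the concise output (toy_res, its metaphor labels, and its values are all hypothetical).

```r
library(dplyr)

# Toy table with a subset of the concise-output columns; values are invented
toy_res <- data.frame(
  metaphors = c("metaphor_1", "metaphor_2", "metaphor_1", "metaphor_2"),
  synonyms  = c("kesenangan", "kesenangan", "kebahagiaan", "kebahagiaan"),
  assocstr  = c(2.5, -1.8, -2.1, 3.2)
)

# Collocates attracted to "kesenangan": positive assocstr, strongest first
attracted <- toy_res %>%
  filter(synonyms == "kesenangan", assocstr > 0) %>%
  arrange(desc(assocstr))

# Collocates repelled by "kesenangan": negative assocstr, most repelled first
repelled <- toy_res %>%
  filter(synonyms == "kesenangan", assocstr < 0) %>%
  arrange(assocstr)
```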

References

Gries, S. T. (2009). Statistics for linguistics with R: A practical introduction. Berlin: Mouton de Gruyter.
Gries, S. T. (2014). Coll.analysis 3.5. A script for R to compute perform collostructional analyses. http://www.linguistics.ucsb.edu/faculty/stgries/teaching/groningen/index.html.
Hilpert, M. (2006). Distinctive collexeme analysis and diachrony. Corpus Linguistics and Linguistic Theory, 2(2), 243–256.
Rajeg, G. P. W. (2019). Metaphorical profiles and near-synonyms: A corpus-based study of Indonesian words for HAPPINESS (PhD Thesis). Monash University. Melbourne, Australia. https://doi.org/10.26180/5cac231a97fb1.

Examples

# for distinctive metaphors
mdca_res <- mdca(df = phd_data_metaphor,
                 cxn_var = "synonyms",
                 coll_var = "metaphors",
                 correct_holm = TRUE,
                 concise_output = TRUE,
                 already_count_table = FALSE,
                 assocstr_digits = 3L)

# for distinctive 4-window span collocates
data("colloc_input_data")
mdca_colloc <- mdca(df = colloc_input_data,
                   cxn_var = "synonyms",
                   coll_var = "collocates",
                   correct_holm = TRUE,
                   concise_output = TRUE,
                   already_count_table = FALSE,
                   assocstr_digits = 3L)