Compute the Mutual Information (MI) association measure for the collocates of a node word.

collex_MI(df, collstr_digit = 3)



df: The output of assoc_prepare.


collstr_digit: A numeric value for the number of decimal digits of the collostruction strength. The default is 3.


A tibble containing the collocates (column w), their co-occurrence frequencies with the node (column a), their expected co-occurrence frequencies with the node (column a_exp), the direction of the association (e.g., attraction or repulsion) (column assoc), the Mutual Information score (column MI), and two uni-directional Delta P association measures.
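To illustrate how the MI column relates to the observed and expected frequencies, the base-R sketch below assumes the common pointwise formulation MI = log2(a / a_exp). This is an assumption about the computation rather than something taken from the package source, although it agrees with the values printed in the example output. The helper name mi_sketch is hypothetical and is not part of collogetr.

```r
# Hypothetical helper (not part of collogetr): pointwise MI from observed
# and expected co-occurrence frequencies, ASSUMING MI = log2(a / a_exp).
mi_sketch <- function(a, a_exp, digits = 3) {
  round(log2(a / a_exp), digits)
}

# Observed (a) and expected (a_exp) frequencies copied from the first
# three rows of the example output (collocates rumah, luar, arah).
a     <- c(10, 6, 5)
a_exp <- c(0.578, 0.273, 0.127)

mi <- mi_sketch(a, a_exp)
print(mi)
```

Because the expected frequencies shown in the output are already rounded, results recomputed this way can differ from the MI column in the last digit.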


out <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                      pattern = "ke", # it is a preposition meaning 'to(wards)'
                      window = "r",
                      span = 2L,
                      save_interim = FALSE)
#> Detecting a 'named list' input!
#> You chose NOT to SAVE INTERIM RESULTS, which will be stored as a list in console!
#> 1. Tokenising the "ind_mixed_2012_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_mixed_2012_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_news_2008_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_news_2008_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_news_2009_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_news_2009_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_news_2010_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_news_2010_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_news_2011_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_news_2011_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_news_2012_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_news_2012_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_newscrawl_2011_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_newscrawl_2011_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_newscrawl_2012_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_newscrawl_2012_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_newscrawl_2015_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_newscrawl_2015_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_newscrawl_2016_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_newscrawl_2016_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_web_2011_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_web_2011_300K.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_web_2012_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_web_2012_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind_wikipedia_2016_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind_wikipedia_2016_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind-id_web_2013_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind-id_web_2013_1M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 1. Tokenising the "ind-id_web_2015_3M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'ke' in ind-id_web_2015_3M.
#> 2.1 Gathering the collocates for 'ke' ...
#> 3. Storing all of the outputs...
#> DONE!
assoc_tb <- assoc_prepare(colloc_out = out,
                          stopword_list = collogetr::stopwords[collogetr::stopwords != "ke"])
#> Your colloc_leipzig output is stored as list!
#> You chose to combine the collocational and frequency list data from ALL CORPORA!
#> Tallying frequency list of all words in ALL CORPORA!
#> You chose to remove stopwords!
collex_MI(df = assoc_tb)
#> # A tibble: 301 x 8
#> # Groups:   node, w [301]
#>    node  w            a a_exp assoc        MI dP_collex_cue_cxn dP_cxn_cue_coll…
#>    <chr> <chr>    <int> <dbl> <chr>     <dbl>             <dbl>            <dbl>
#>  1 ke    rumah       10 0.578 attracti…  4.11             0.036            0.104
#>  2 ke    luar         6 0.273 attracti…  4.46             0.022            0.133
#>  3 ke    arah         5 0.127 attracti…  5.30             0.019            0.244
#>  4 ke    berbagai     5 0.33  attracti…  3.92             0.018            0.09
#>  5 ke    kata         5 1.33  attracti…  1.92             0.014            0.018
#>  6 ke    negara       5 0.647 attracti…  2.95             0.017            0.043
#>  7 ke    daerah       4 0.489 attracti…  3.03             0.013            0.046
#>  8 ke    tempat       4 0.317 attracti…  3.66             0.014            0.074
#>  9 ke    belakang     3 0.076 attracti…  5.30             0.011            0.244
#> 10 ke    dua          3 0.609 attracti…  2.3              0.009            0.025
#> # … with 291 more rows