word_pairs
Searches sentences for occurrences of a pair of words. The two words may be separated by intervening strings (i.e. other words in between).
word_pairs(corpus, word_1 = NULL, word_2 = NULL, min_intervening = 0L, max_intervening = 3L)
corpus | A character vector of sentences. |
---|---|
word_1 | A regular expression for the first word. The regex must enclose the word in word-boundary characters (i.e. `\\b` in an R string). |
word_2 | A regular expression for the second word. The regex must enclose the word in word-boundary characters (i.e. `\\b` in an R string). |
min_intervening | Minimum number of intervening words between the first and the second word. The default is `0L`. |
max_intervening | Maximum number of intervening words between the first and the second word. The default is `3L`. |
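To picture how the two intervening-word bounds interact with the word regexes, here is a minimal base-R sketch. It is not the package's implementation: `build_pair_regex()` is a hypothetical helper, the assumption that an intervening word is any whitespace-delimited token is mine, and the three sentences are made up for illustration.

```r
# Hypothetical helper (NOT part of the package): glue the two word regexes
# together with a bounded number of whitespace-delimited tokens in between.
build_pair_regex <- function(word_1, word_2,
                             min_intervening = 0L, max_intervening = 3L) {
  paste0(word_1,
         "(?:\\s+\\S+){", min_intervening, ",", max_intervening, "}",
         "\\s+", word_2)
}

# Made-up sentences, for illustration only
sentences <- c(
  "Dia menyampaikan secara langsung kepada kami.",        # 2 intervening words
  "Mereka menawarkan jasa kepada warga.",                 # 1 intervening word
  "Ia menyerahkan satu dua tiga empat buku kepada guru."  # 5 intervening words
)

rx <- build_pair_regex("\\bmen[a-z]{3,}kan\\b", "\\bkepada\\b", 0L, 3L)

# regmatches() keeps only the sentences that matched; the third sentence
# exceeds max_intervening = 3 and is therefore dropped.
hits <- regmatches(sentences, regexpr(rx, sentences, perl = TRUE))
print(hits)
```

Under these assumptions, raising `max_intervening` to `5L` would also let the third sentence through, which shows why the upper bound matters for precision.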
A list object with the following elements:
pattern
: the extracted pattern spanning from the first word to the second word.
pattern_tagged
: the version of pattern containing tags for the first and the second word.
matches
: the matching sentences, with the first and the second word of each pair tagged.
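Because the tags in pattern_tagged mark which token is the first word and which is the second, they can be post-processed with ordinary base-R regex functions. The sketch below is only an illustration: the `tagged` vector is typed in by hand in the tag format shown in this page's example output, rather than produced by a call to `word_pairs()`.

```r
# Hand-typed strings in the tag format of pattern_tagged
# (stand-ins for m$pattern_tagged, not real function output)
tagged <- c(
  "<w id='1'>menyampaikan</w> secara langsung <w id='2'>kepada</w>",
  "<w id='1'>menawarkan</w> jasa <w id='2'>kepada</w>"
)

# Extract only the token tagged as the first word
first_word <- sub(".*<w id='1'>([^<]+)</w>.*", "\\1", tagged)

# Strip all tags to recover the untagged pattern
plain <- gsub("</?w[^>]*>", "", tagged)

print(first_word)
print(plain)
```

Tabulating `first_word` with `table()` would then count the first-word types regardless of how many words intervene.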
Rajeg, Gede Primahadi Wijaya. (2018). wordpairs: An R package to retrieve word pair in sentences of the (Indonesian) Leipzig Corpora.
```r
# co-occurrence of *me-X-kan* transitive verbs with *kepada*
word_1 <- "\\bmen[a-z]{3,}kan\\b"
word_2 <- "\\bkepada\\b"
corpus <- my_leipzig_sample
m <- word_pairs(corpus,
                word_1 = word_1,
                word_2 = word_2,
                min_intervening = 0L,
                max_intervening = 3L)

# inspect a snippet of the results
head(m$pattern)
#> [1] "menyampaikan secara langsung kepada"
#> [2] "mengutamakan pemberian kredit kepada"
#> [3] "menuangkan pikiran dan menyampaikan kepada"
#> [4] "mengembalikan ID card kepada"
#> [5] "mengeluhkan diskriminasi pemberian nilai kepada"
#> [6] "menawarkan jasa kepada"

head(m$pattern_tagged)
#> [1] "<w id='1'>menyampaikan</w> secara langsung <w id='2'>kepada</w>"
#> [2] "<w id='1'>mengutamakan</w> pemberian kredit <w id='2'>kepada</w>"
#> [3] "<w id='1'>menuangkan</w> pikiran dan menyampaikan <w id='2'>kepada</w>"
#> [4] "<w id='1'>mengembalikan</w> ID card <w id='2'>kepada</w>"
#> [5] "<w id='1'>mengeluhkan</w> diskriminasi pemberian nilai <w id='2'>kepada</w>"
#> [6] "<w id='1'>menawarkan</w> jasa <w id='2'>kepada</w>"

# generate frequency table for the patterns
freq_tb <- table(m$pattern_tagged)

# sort in decreasing order of frequency
head(sort(freq_tb, decreasing = TRUE))
#>
#> <w id='1'>mengatakan</w> <w id='2'>kepada</w>
#>     56
#> <w id='1'>mengingatkan</w> <w id='2'>kepada</w>
#>     6
#> <w id='1'>mengucapkan</w> terima kasih <w id='2'>kepada</w>
#>     5
#> <w id='1'>menyampaikan</w> <w id='2'>kepada</w>
#>     5
#> <w id='1'>mengemukakan</w> <w id='2'>kepada</w>
#>     4
#> <w id='1'>mengusulkan</w> <w id='2'>kepada</w>
#>     4
```