word_pairs searches sentences for occurrences of a pair of words. The two words may be separated by intervening strings (i.e., other words in between).

word_pairs(corpus, word_1 = NULL, word_2 = NULL,
  min_intervening = 0L, max_intervening = 3L)



corpus: A character vector of sentences.


word_1: A regular expression for the first word. The regex must enclose the word in word-boundary anchors (i.e., "\\b").


word_2: A regular expression for the second word. The regex must enclose the word in word-boundary anchors (i.e., "\\b").


min_intervening: Minimum number of words intervening between word_1 and word_2. The default is 0L.


max_intervening: Maximum number of words intervening between word_1 and word_2. The default is 3L. Use Inf to allow any number of intervening words between word_1 and word_2.
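How word_1, word_2, min_intervening and max_intervening interact can be pictured as a single regular expression. The following is a hypothetical base-R sketch, not the package's implementation: the helper pair_regex() is an assumption, intervening words are assumed to be whitespace-delimited tokens, and PCRE (perl = TRUE) is used so that a lazy quantifier picks the nearest occurrence of word_2.

```r
# Hypothetical helper, NOT part of wordpairs: build one PCRE pattern that
# matches word_1, then min_n..max_n intervening words, then word_2.
pair_regex <- function(word_1, word_2, min_n = 0L, max_n = 3L) {
  # An infinite maximum becomes an open upper bound, e.g. "{0,}"
  upper <- if (is.infinite(max_n)) "" else as.character(as.integer(max_n))
  # Each intervening word is whitespace followed by a run of non-space
  # characters; the lazy "?" prefers the nearest occurrence of word_2
  sprintf("%s(\\s+\\S+){%d,%s}?\\s+%s",
          word_1, as.integer(min_n), upper, word_2)
}

rx <- pair_regex("\\bmen[a-z]{3,}kan\\b", "\\bkepada\\b")
sentences <- c("Ia menyampaikan secara langsung kepada wartawan.",
               "Ia menyampaikan hal itu pada hari Senin kepada wartawan.")
# regexpr() returns -1 for sentences without a match; regmatches() drops them
hits <- regmatches(sentences, regexpr(rx, sentences, perl = TRUE))
hits
#> [1] "menyampaikan secara langsung kepada"
```

The second sentence is not matched because five words separate the verb from kepada, exceeding the default maximum of three. Since each intervening word is matched as \S+, punctuation attached to a word counts as part of it; the real function may tokenize differently.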


A list object with the following elements:

  • pattern: the extracted pattern spanning from the first word to the second word.

  • pattern_tagged: the same pattern with the first and second words tagged (as <w id='1'>...</w> and <w id='2'>...</w>).

  • matches: the matching sentences, with the first and second words of each pair tagged.
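The relation between pattern and pattern_tagged can be illustrated with a small hypothetical helper, tag_pair(), which wraps the first and last tokens in the <w id='1'> and <w id='2'> tags seen in the examples. It assumes both words are single whitespace-delimited tokens and is not how wordpairs itself builds the tags.

```r
# Hypothetical helper, NOT the package's code: tag the first and last
# whitespace-delimited tokens of an extracted pattern in the same
# <w id='1'>...</w> / <w id='2'>...</w> format as pattern_tagged.
tag_pair <- function(pattern) {
  words <- strsplit(pattern, "\\s+")[[1]]
  n <- length(words)
  words[1] <- sprintf("<w id='1'>%s</w>", words[1])
  words[n] <- sprintf("<w id='2'>%s</w>", words[n])
  paste(words, collapse = " ")
}

patterns <- c("mengembalikan ID card kepada", "menawarkan jasa kepada")
tagged <- vapply(patterns, tag_pair, character(1), USE.NAMES = FALSE)

# Removing the tags again recovers the plain patterns
untagged <- gsub("</?w[^>]*>", "", tagged)
```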


Rajeg, Gede Primahadi Wijaya. (2018). wordpairs: An R package to retrieve word pair in sentences of the (Indonesian) Leipzig Corpora.


# co-occurrence of *me-X-kan* transitive verbs with *kepada*
word_1 <- "\\bmen[a-z]{3,}kan\\b"
word_2 <- "\\bkepada\\b"
corpus <- my_leipzig_sample
m <- word_pairs(corpus, word_1 = word_1, word_2 = word_2,
                min_intervening = 0L, max_intervening = 3L)
# inspect the snippet of the results
head(m$pattern)
#> [1] "menyampaikan secara langsung kepada"
#> [2] "mengutamakan pemberian kredit kepada"
#> [3] "menuangkan pikiran dan menyampaikan kepada"
#> [4] "mengembalikan ID card kepada"
#> [5] "mengeluhkan diskriminasi pemberian nilai kepada"
#> [6] "menawarkan jasa kepada"
head(m$pattern_tagged)
#> [1] "<w id='1'>menyampaikan</w> secara langsung <w id='2'>kepada</w>"
#> [2] "<w id='1'>mengutamakan</w> pemberian kredit <w id='2'>kepada</w>"
#> [3] "<w id='1'>menuangkan</w> pikiran dan menyampaikan <w id='2'>kepada</w>"
#> [4] "<w id='1'>mengembalikan</w> ID card <w id='2'>kepada</w>"
#> [5] "<w id='1'>mengeluhkan</w> diskriminasi pemberian nilai <w id='2'>kepada</w>"
#> [6] "<w id='1'>menawarkan</w> jasa <w id='2'>kepada</w>"
# generate frequency table for the patterns
freq_tb <- table(m$pattern_tagged)
# sort in decreasing order of frequency
head(sort(freq_tb, decreasing = TRUE))
#>
#> <w id='1'>mengatakan</w> <w id='2'>kepada</w>
#> 56
#> <w id='1'>mengingatkan</w> <w id='2'>kepada</w>
#> 6
#> <w id='1'>mengucapkan</w> terima kasih <w id='2'>kepada</w>
#> 5
#> <w id='1'>menyampaikan</w> <w id='2'>kepada</w>
#> 5
#> <w id='1'>mengemukakan</w> <w id='2'>kepada</w>
#> 4
#> <w id='1'>mengusulkan</w> <w id='2'>kepada</w>
#> 4
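Beyond the pattern-level frequency table, one may want counts per first word (here, the verb) or the number of intervening words in each pair. A minimal sketch, using a few tagged patterns copied from the example output as toy data so that it runs without the package or my_leipzig_sample; with real results, m$pattern_tagged would take the place of the toy vector.

```r
# Toy data: tagged patterns copied from the example output
tagged <- c("<w id='1'>menyampaikan</w> secara langsung <w id='2'>kepada</w>",
            "<w id='1'>mengatakan</w> <w id='2'>kepada</w>",
            "<w id='1'>menyampaikan</w> <w id='2'>kepada</w>",
            "<w id='1'>menawarkan</w> jasa <w id='2'>kepada</w>")

# Pull out the word inside the <w id='1'>...</w> tag
verbs <- sub("^.*<w id='1'>([^<]+)</w>.*$", "\\1", tagged)
verb_freq <- sort(table(verbs), decreasing = TRUE)

# Strip the tags, count whitespace-delimited tokens, and subtract the
# two tagged words to get the number of intervening words per pair
n_between <- lengths(strsplit(gsub("</?w[^>]*>", "", tagged), "\\s+")) - 2L
```

Counting tokens this way assumes each tagged word is a single whitespace-delimited token, as in the examples.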