The function retrieves collocates from Leipzig Corpora files and returns them as tibbles.

colloc_leipzig(
  leipzig_path = NULL,
  leipzig_corpus_list = NULL,
  pattern = NULL,
  window = "b",
  span = 2,
  case_insensitive = TRUE,
  to_lower_colloc = TRUE,
  save_results = FALSE,
  coll_output_name = "colloc_tidy_colloc_out.txt",
  sent_output_name = "colloc_tidy_sent_out.txt"
)



leipzig_path: character strings giving (i) the file names of the Leipzig corpus files, if they are in the working directory, or (ii) the complete file path to each Leipzig corpus file.


leipzig_corpus_list: specify this argument if each Leipzig corpus file has already been loaded as an R object and is stored as an element of a list. An example of this type of input can be seen in data("demo_corpus_leipzig"). Specify either leipzig_path or leipzig_corpus_list, and leave the other as NULL.


pattern: regular expressions or exact strings that define the target (node) pattern.


window: direction of the context window for the collocates: "r" (right of the node), "l" (left of the node), or "b" (both left and right of the node; the default).


span: integer vector indicating the span of the collocate window, that is, how many words away from the node collocates are retrieved.
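
To make the interaction between window and span concrete, here is a minimal base-R sketch. It is an illustration only and not the function's actual implementation: the helper `window_collocates()`, the toy token vector, and the output column names `collocate` and `offset` are all hypothetical.

```r
# Illustrative sketch only; `window_collocates()` is a hypothetical helper,
# not part of the documented function.
window_collocates <- function(tokens, node_pattern, window = "b", span = 2) {
  # positions of the tokens that match the node pattern
  node_pos <- grep(node_pattern, tokens, perl = TRUE)
  out <- lapply(node_pos, function(i) {
    # candidate positions to the left and/or right of the node
    left  <- if (window %in% c("l", "b")) seq(i - span, i - 1) else integer(0)
    right <- if (window %in% c("r", "b")) seq(i + 1, i + span) else integer(0)
    pos <- c(left, right)
    # drop positions that fall outside the sentence
    pos <- pos[pos >= 1 & pos <= length(tokens)]
    data.frame(collocate = tokens[pos], offset = pos - i,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

tokens <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog")

# both sides, two words each way
res_both <- window_collocates(tokens, "^fox$", window = "b", span = 2)

# right side only, one word
res_right <- window_collocates(tokens, "^fox$", window = "r", span = 1)
```

With window = "b" and span = 2 the sketch keeps up to two tokens on each side of the node; with window = "r" and span = 1 it keeps only the token immediately to the right.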


case_insensitive: whether the search pattern ignores case (TRUE -- the default) or not (FALSE).


to_lower_colloc: whether to lowercase the retrieved collocates and the nodes (TRUE -- the default) or not (FALSE).
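
The difference between ignoring case in the search and lowercasing what is retrieved can be sketched in base R as follows. The sentences are made-up toy data, and the calls to `grepl()`, `regmatches()`, and `tolower()` are stand-ins chosen for illustration; they are not claimed to be what the function uses internally.

```r
# Illustrative sketch only, using made-up sentences and base-R regex functions.
sentences <- c("Masalah itu Terelakkan.",
               "Hal ini tak terelakkan lagi.",
               "Tidak ada kecocokan di sini.")
pattern <- "\\bterelakkan\\b"

# case-insensitive search also matches the capitalised form
hits_ci <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)

# case-sensitive search matches the lowercase form only
hits_cs <- grepl(pattern, sentences, ignore.case = FALSE, perl = TRUE)

# extract the matched nodes as they appear in the text, then lowercase them
nodes <- regmatches(sentences,
                    regexpr(pattern, sentences, ignore.case = TRUE, perl = TRUE))
nodes_lower <- tolower(nodes)
```

Ignoring case widens what counts as a match, while lowercasing only normalises the retrieved strings so that variants such as "Terelakkan" and "terelakkan" are counted together.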


save_results: whether to write the collocates to a tab-separated plain-text file (TRUE) or not (FALSE -- the default).


coll_output_name: name of the output file for the collocates table.


sent_output_name: name of the output file for the full sentence matches containing the collocates.
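
As a rough picture of what a tab-separated plain-text output looks like, the sketch below writes a small made-up table to a temporary directory and reads it back. The data frame and its columns `collocate` and `offset` are hypothetical, and `write.table()` is an assumption used for illustration; the function may write its files differently.

```r
# Illustrative sketch only: a made-up table written as tab-separated text.
collocates <- data.frame(collocate = c("tak", "lagi"),
                         offset = c(-1L, 1L),
                         stringsAsFactors = FALSE)

# write to a temporary directory, reusing the default collocate file name
out_file <- file.path(tempdir(), "colloc_tidy_colloc_out.txt")
write.table(collocates, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# read the file back to check that the table survives the round trip
read_back <- read.delim(out_file, stringsAsFactors = FALSE)
```

A file written this way can be opened in a spreadsheet program or re-imported into R with `read.delim()`.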


Returns a list of two tibbles: (i) the collocates, with the sentence number of the match, window-span information, and the corpus file; and (ii) the full sentence for each match, with the sentence number and the corpus file.


if (FALSE) {
  # get the corpus file paths
  # this example uses the file-path input rather than a list of corpora
  leipzig_corpus_path <- c("my/path/to/leipzig_corpus_file_1M-sent_1.txt",
                           "my/path/to/leipzig_corpus_file_300K-sent_2.txt",
                           "my/path/to/leipzig_corpus_file_300K-sent_3.txt")

  # run the function
  colloc <- colloc_leipzig(leipzig_path = leipzig_corpus_path[2:3],
                           pattern = "\\bterelakkan\\b",
                           window = "b",
                           span = 3,
                           save_results = FALSE,
                           to_lower_colloc = TRUE)

  # Inspect outputs
  ## This one outputs the collocates tibble
  colloc$collocates

  ## This one outputs the sentence matches tibble
  colloc$sentence_matches
}