Retrieves collocates from Leipzig Corpora files and returns them as tibbles.

colloc_leipzig(
  leipzig_path = NULL,
  leipzig_corpus_list = NULL,
  pattern = NULL,
  window = "b",
  span = 2,
  case_insensitive = TRUE,
  to_lower_colloc = TRUE,
  save_results = FALSE,
  coll_output_name = "colloc_tidy_colloc_out.txt",
  sent_output_name = "colloc_tidy_sent_out.txt"
)

Arguments

leipzig_path

character vector of either (i) the file names of the Leipzig corpus files, if they are in the working directory, or (ii) the complete file path to each Leipzig corpus file.

leipzig_corpus_list

specify this argument if each Leipzig corpus file has been loaded as an R object and is stored as an element of a list. For an example of this type of input, see data("demo_corpus_leipzig"). Specify either leipzig_path or leipzig_corpus_list, and leave the other as NULL.

pattern

regular expression or exact string for the target (node) pattern.

window

direction of the collocate window relative to the node: "r" (right of the node), "l" (left of the node), or "b" (both left and right of the node; the default).

span

integer vector indicating the span of the collocate window, i.e. how many words to the left and/or right of the node are retrieved (default 2).

case_insensitive

whether the search pattern ignores case (TRUE -- the default) or not (FALSE).

to_lower_colloc

whether to lowercase the retrieved collocates and the nodes (TRUE -- default) or not (FALSE).

save_results

whether to write the collocates to a tab-separated plain-text file (TRUE) or not (FALSE -- default).

coll_output_name

name of the file for the collocate tables.

sent_output_name

name of the file for the full sentence match containing the collocates.
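
To make the interaction of pattern, window, span, case_insensitive, and to_lower_colloc more concrete, here is a minimal base-R sketch of window-based collocate retrieval for a single sentence. It is not the package's internal code: the helper collocates_in_window(), its whitespace tokenisation, the "l1"/"r1" position labels, and the example sentence are all assumptions made for illustration.

```r
# Illustrative sketch only; collocates_in_window() is a hypothetical helper,
# not part of the package.
collocates_in_window <- function(sentence, pattern, window = "b", span = 2,
                                 case_insensitive = TRUE,
                                 to_lower_colloc = TRUE) {
  stopifnot(window %in% c("l", "r", "b"), span >= 1)

  # Naive tokenisation: strip punctuation, split on whitespace (an assumption).
  tokens <- strsplit(gsub("[[:punct:]]", "", sentence), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]

  # Positions of the tokens that match the node pattern.
  node_pos <- grep(pattern, tokens, ignore.case = case_insensitive, perl = TRUE)

  # Optionally lowercase the nodes and collocates that are returned.
  fmt <- if (to_lower_colloc) tolower else identity

  out <- lapply(node_pos, function(pos) {
    left  <- if (window %in% c("l", "b")) seq(pos - span, pos - 1) else integer(0)
    right <- if (window %in% c("r", "b")) seq(pos + 1, pos + span) else integer(0)
    idx <- c(left, right)
    # Drop positions that fall outside the sentence.
    idx <- idx[idx >= 1 & idx <= length(tokens)]
    data.frame(
      node = rep(fmt(tokens[pos]), length(idx)),
      collocate = fmt(tokens[idx]),
      position = ifelse(idx < pos, paste0("l", pos - idx), paste0("r", idx - pos)),
      stringsAsFactors = FALSE
    )
  })

  # Returns NULL when the pattern does not match any token.
  do.call(rbind, out)
}

# Made-up sentence; both-sided window of two words around the node.
res <- collocates_in_window("Hal itu tidak terelakkan lagi bagi kami.",
                            pattern = "\\bterelakkan\\b",
                            window = "b", span = 2)
res$collocate  # "itu" "tidak" "lagi" "bagi"
```

With window = "r" and span = 3, the same call would keep only the three words that follow the node.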

Value

a list of two tibbles: (i) the collocates, with the sentence number of each match, the window-span information, and the corpus file, and (ii) the full sentence for each match, with its sentence number and corpus file.
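
If save_results is left at FALSE, a returned table can still be written to a tab-separated file by hand. The sketch below uses base R only and a small made-up data frame in place of the real collocates tibble; the column names w, span, and sent_num are assumptions, and the way the function itself writes its files is not shown here.

```r
# Hypothetical stand-in for the collocates table; column names are assumptions.
collocates <- data.frame(
  w = c("tidak", "lagi"),
  span = c("l1", "r1"),
  sent_num = c(12L, 12L),
  stringsAsFactors = FALSE
)

# Write to a temporary directory, reusing the default collocate file name.
out_file <- file.path(tempdir(), "colloc_tidy_colloc_out.txt")
write.table(collocates, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm that it is valid tab-separated text.
reloaded <- read.delim(out_file, stringsAsFactors = FALSE)
```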

Examples

if (FALSE) {
# get the corpus filepaths
# so this example uses the filepath input rather than a list of corpora
leipzig_corpus_path <- c("my/path/to/leipzig_corpus_file_1M-sent_1.txt",
                         "my/path/to/leipzig_corpus_file_300K-sent_2.txt",
                         "my/path/to/leipzig_corpus_file_300K-sent_3.txt")

# run the function
colloc <- colloc_leipzig(leipzig_path = leipzig_corpus_path[2:3],
                         pattern = "\\bterelakkan\\b",
                         window = "b",
                         span = 3,
                         save_results = FALSE,
                         to_lower_colloc = TRUE)

# Inspect outputs
## This one outputs the collocates tibble
colloc$collocates

## This one outputs the sentence matches tibble
colloc$sentence_matches
}