Generates a tibble of token counts for one or more words or regular expressions in each supplied Leipzig corpus file.

freqlist_leipzig_each(
  pattern = NULL,
  leipzig_path = "(full) filepath to a (set of) Leipzig corpus files",
  case_insensitive = TRUE
)

Arguments

pattern

one or more regular expressions or exact patterns for the target word(s) whose frequency in the Leipzig corpus file(s) you want to compute.

leipzig_path

either (i) the file names of the corpus files, if they are in the working directory, or (ii) the complete file path to each of the Leipzig corpus files.
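The two forms of leipzig_path can be sketched as follows. This is only an illustration in base R: the folder and file names are hypothetical and are created in a temporary directory so that the code runs on its own; real Leipzig corpus files would take their place.

```r
# Hypothetical folder and file names, created here only so the sketch runs.
corpus_dir <- file.path(tempdir(), "leipzig_demo")
dir.create(corpus_dir, showWarnings = FALSE)
file.create(file.path(corpus_dir,
                      c("corpus_a-sentences.txt", "corpus_b-sentences.txt")))

# (ii) complete file paths: usable from any working directory
full_paths <- list.files(corpus_dir, pattern = "-sentences\\.txt$",
                         full.names = TRUE)

# (i) bare file names: these resolve only if corpus_dir is the working directory
bare_names <- basename(full_paths)
```

Passing complete paths is usually the safer choice, since the result then does not depend on the current working directory.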

case_insensitive

logical; whether case differences should be ignored (TRUE -- the default) or not (FALSE).
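The effect of ignoring case can be illustrated with base R alone. The sketch below is not the function's internal implementation; the helper find_matches() and the toy sentences are invented for illustration.

```r
# Invented toy sentences, not taken from any Leipzig corpus.
sentences <- c("Memberikan hadiah itu mudah.", "Dia memberi dan memberikan.")
regex <- "\\bmemberi(kan)?\\b"

# Hypothetical helper: extract every match of `pattern` in `x`,
# optionally ignoring case differences.
find_matches <- function(x, pattern, ignore_case) {
  hits <- gregexpr(pattern, x, ignore.case = ignore_case, perl = TRUE)
  unlist(regmatches(x, hits))
}

insensitive <- find_matches(sentences, regex, ignore_case = TRUE)
sensitive   <- find_matches(sentences, regex, ignore_case = FALSE)

length(insensitive)  # 3: the capitalised "Memberikan" is also found
length(sensitive)    # 2: only the lower-case tokens match
```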

Value

a tibble with three columns: (i) match, (ii) corpus_id, and (iii) n, the token count.
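As a rough picture of that three-column shape, the sketch below counts invented matches per corpus in base R. It returns a plain data frame rather than a tibble, the corpus IDs are hypothetical, and the row order or handling of zero counts in the actual function may differ.

```r
# Invented toy matches, one row per token found; corpus IDs are hypothetical.
matches <- data.frame(
  match     = c("memberi", "memberikan", "memberikan", "memberi"),
  corpus_id = c("corpus_a", "corpus_a", "corpus_a", "corpus_b"),
  stringsAsFactors = FALSE
)

# Give every token a weight of 1, then sum per match within each corpus.
counts <- aggregate(n ~ match + corpus_id,
                    data = transform(matches, n = 1L),
                    FUN = sum)
counts
```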

Examples

if (FALSE) {
  # prepare the input
  regex <- "\\bmemberi(kan)?\\b"
  corpus.path <- leipzig_file_path[1:2]

  # generate the frequency count
  freqlist_leipzig_each(pattern = regex,
                        leipzig_path = corpus.path,
                        case_insensitive = TRUE)
}