Generate word-/regex-specific frequency list from the Leipzig Corpora

The function generates a tibble of token counts for one or more target words or regular expressions, computed separately for each supplied Leipzig corpus file.

freqlist_leipzig_each(
  pattern = NULL,
  leipzig_path = "(full) filepath to a (set of) Leipzig corpus files",
  case_insensitive = TRUE
)

Arguments

pattern	the regular expression(s) or exact word form(s) whose frequency in one or more Leipzig corpus files you want to generate.
leipzig_path	either (i) the file names of the corpus files, if they are in the working directory, or (ii) the complete file path to each of the Leipzig corpus files.
case_insensitive	logical; whether case differences should be ignored (`TRUE` -- the default) or not (`FALSE`).
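
The two forms accepted by leipzig_path can be illustrated with base R alone. This is a minimal sketch, not part of the package: the folder and the file names (corpus_a.txt, corpus_b.txt) are hypothetical placeholders created in a temporary directory, and the object name leipzig_file_path is borrowed from the Examples section only for readability.

```r
# Hypothetical folder standing in for a directory of Leipzig corpus files;
# two empty placeholder files are created so the sketch is self-contained.
corpus_dir <- file.path(tempdir(), "leipzig_demo")
dir.create(corpus_dir, showWarnings = FALSE)
file.create(file.path(corpus_dir, c("corpus_a.txt", "corpus_b.txt")))

# (i) bare file names: usable only when corpus_dir is the working directory
file_names <- list.files(corpus_dir, pattern = "\\.txt$")

# (ii) complete file paths: usable from any working directory
leipzig_file_path <- list.files(corpus_dir, pattern = "\\.txt$",
                                full.names = TRUE)
```

Passing complete paths, as in (ii), avoids depending on the current working directory.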

Value

A tibble with three columns: (i) match, the matched form; (ii) corpus_id, the corpus file the match comes from; and (iii) n, the token count of that match in that corpus.
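
To make the shape of this return value concrete, the following sketch builds a comparable three-column table with base R. It is an illustration under stated assumptions, not the function's actual implementation: the helper count_matches() is hypothetical, the text is made-up toy data rather than real Leipzig content, a plain data frame stands in for the tibble, and lowercasing the matches when case is ignored is an assumption about how case variants might be pooled.

```r
# Toy stand-ins for two corpus files (hypothetical text, not Leipzig data)
corpora <- list(
  corpus_a = c("Dia memberi buku.",
               "Mereka memberikan hadiah. Memberi itu baik."),
  corpus_b = c("Saya memberikan uang.")
)
regex <- "\\bmemberi(kan)?\\b"

# Hypothetical helper: count every distinct match of `pattern` in one corpus
count_matches <- function(lines, corpus_id, pattern, case_insensitive = TRUE) {
  found <- gregexpr(pattern, lines, ignore.case = case_insensitive, perl = TRUE)
  hits <- unlist(regmatches(lines, found))
  # Assumption: pool case variants by lowercasing when case is ignored
  if (case_insensitive) hits <- tolower(hits)
  if (length(hits) == 0) {
    return(data.frame(match = character(0), corpus_id = character(0),
                      n = integer(0), stringsAsFactors = FALSE))
  }
  tab <- table(hits)
  data.frame(match = names(tab), corpus_id = corpus_id,
             n = as.integer(tab), stringsAsFactors = FALSE)
}

# One block of rows per corpus, stacked into a single table
result <- do.call(rbind, Map(count_matches, corpora, names(corpora),
                             MoreArgs = list(pattern = regex)))
rownames(result) <- NULL
result
```

Each row pairs one matched form with one corpus, so a form that occurs in several corpora appears once per corpus rather than being summed across them.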

Examples

if (FALSE) {
# prepare the input
regex <- "\\bmemberi(kan)?\\b"
corpus.path <- leipzig_file_path[1:2]

# generate the frequency count
freqlist_leipzig_each(pattern = regex,
                      leipzig_path = corpus.path,
                      case_insensitive = TRUE)
}