The function generates a frequency list of all word tokens in a Leipzig Corpus file. Although users can supply the file paths of all corpus files at once, for memory efficiency it is recommended to process each file in a separate function call. If several corpus files are supplied, the function returns a list with as many elements as there are input file paths.

freqlist_leipzig_all(
  split_regex = "([^a-zA-Z0-9-]+|--)",
  leipzig_path = NULL,
  case_insensitive = TRUE
)

Arguments

split_regex

user-defined regular expression used to tokenise the corpus.

leipzig_path

full file path(s) to one or more Leipzig Corpus files.

case_insensitive

logical; whether to ignore (TRUE) or preserve (FALSE) case when splitting the corpus into word tokens.
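
To make the effect of split_regex and case_insensitive concrete, the following is a minimal base-R sketch of this kind of tokenisation. It is an approximation for illustration only and not the package's internal code; the helper name tokenise and the sample sentence are hypothetical.

```r
# Illustrative only: a base-R approximation, not the package's implementation.
split_regex <- "([^a-zA-Z0-9-]+|--)"
sentence <- "Saya makan nasi -- saya suka nasi goreng."

# Hypothetical helper: split text on the regex, optionally lower-casing first.
tokenise <- function(text, regex, case_insensitive = TRUE) {
  if (case_insensitive) text <- tolower(text)
  tokens <- unlist(strsplit(text, regex, perl = TRUE))
  tokens[nzchar(tokens)]  # drop empty strings left by adjacent delimiters
}

tokens_lower <- tokenise(sentence, split_regex, case_insensitive = TRUE)
tokens_cased <- tokenise(sentence, split_regex, case_insensitive = FALSE)

# Frequency table in descending order of frequency.
freq <- sort(table(tokens_lower), decreasing = TRUE)
```

With case_insensitive = TRUE the tokens "Saya" and "saya" are counted as one type; with FALSE they stay distinct. The default regex treats a double hyphen as a delimiter while keeping single hyphens inside words.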

Value

A tibble containing the frequency list, sorted in descending order of frequency. If several file paths are supplied, a list with one element per input file path.

Examples

if (FALSE) {
wlist_all <- freqlist_leipzig_all(
  split_regex = "([^a-zA-Z0-9-]+|--)",
  leipzig_path = leipzig_corpus_path[1]
)
}
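
The recommendation to process each file in a separate function call could be sketched as below. The wrapper freqlist_per_file is a hypothetical helper, not part of the package; it assumes only that the frequency-list function accepts a leipzig_path argument, as shown in the usage above.

```r
# Hypothetical helper (not part of the package): call a frequency-list
# function once per corpus file and collect the results in a named list.
freqlist_per_file <- function(paths, freq_fun, ...) {
  results <- lapply(paths, function(p) freq_fun(leipzig_path = p, ...))
  names(results) <- basename(paths)  # label each result by its file name
  results
}

if (FALSE) {
# Assumes leipzig_corpus_path is a character vector of corpus file paths.
wlists <- freqlist_per_file(leipzig_corpus_path, freqlist_leipzig_all)
}
```

Each call then handles a single file, and the returned list has one named element per input file path.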