Frequency list of all words in a Leipzig Corpus file — freqlist_leipzig

The function generates a frequency list of all word tokens in a single Leipzig Corpus file. Although users can supply the filepaths to all corpus files at once, it is recommended, for memory efficiency, that each file be processed in a separate function call. If all corpus files are processed in one call, the function outputs a list with as many elements as there are input filepaths.
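The difference between one call per file and one call for all files can be illustrated with a minimal base-R sketch. It does not call the package: `count_words_in_file()` is a hypothetical stand-in, and it assumes each line of a Leipzig sentence file has the form `<sentence id><TAB><sentence>`. Applying it to several paths with `lapply()` returns a list with one element per filepath, mirroring the list output described above.

```r
# Hypothetical helper (not part of the package): count word tokens in ONE file.
# Assumes each line is "<sentence id>\t<sentence>", as in Leipzig sentence files.
count_words_in_file <- function(path, split_regex = "([^a-zA-Z0-9-]+|--)") {
  lines <- readLines(path, encoding = "UTF-8")
  sentences <- sub("^[^\t]*\t", "", lines)       # drop the sentence ID column
  # Lower-case first, mirroring case_insensitive = TRUE
  tokens <- unlist(strsplit(tolower(sentences), split_regex))
  tokens <- tokens[nzchar(tokens)]               # strsplit() can yield "" tokens
  sort(table(tokens), decreasing = TRUE)         # most frequent words first
}

# Two tiny stand-in corpus files written to temporary paths
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("1\tThe cat sat.", "2\tThe dog sat."), paths[1])
writeLines("1\tA bird sang.", paths[2])

# Recommended pattern: one file per call
freq_first <- count_words_in_file(paths[1])

# All files at once: a list with one element per input filepath
freq_all <- lapply(paths, count_words_in_file)
length(freq_all)  # 2
```

Note that accumulating every result in a list, as in the last step, keeps all frequency tables in memory at once; handling files in separate calls and keeping only what is needed is what saves memory.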

freqlist_leipzig_all(
  split_regex = "([^a-zA-Z0-9-]+|--)",
  leipzig_path = NULL,
  case_insensitive = TRUE
)

Arguments

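split_regex	a user-defined regular expression used to split the corpus into word tokens. The default, `"([^a-zA-Z0-9-]+|--)"`, splits on any run of characters other than ASCII letters, digits, and hyphens, and on a double hyphen, so hyphenated words such as `well-known` stay intact.
leipzig_path	character vector of full filepaths to one or more Leipzig Corpus files.
case_insensitive	logical; whether to ignore (`TRUE`, the default) or preserve (`FALSE`) letter case when splitting the corpus into word tokens.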
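The interaction of `split_regex` and `case_insensitive` can be sketched in base R with `strsplit()`. The `tokenise()` helper below is hypothetical and only approximates what the package may do internally: it lower-cases the text when `case_insensitive = TRUE` and drops the empty strings that `strsplit()` produces between adjacent separators.

```r
# Hypothetical helper illustrating split_regex and case_insensitive;
# the package's own tokeniser may differ in details.
tokenise <- function(text, split_regex = "([^a-zA-Z0-9-]+|--)",
                     case_insensitive = TRUE) {
  if (case_insensitive) text <- tolower(text)   # fold case before splitting
  tokens <- unlist(strsplit(text, split_regex))
  tokens[nzchar(tokens)]                        # drop empty strings
}

x <- "Well-known words -- and The words, the end."

tokenise(x)
# returns c("well-known", "words", "and", "the", "words", "the", "end")

tokenise(x, case_insensitive = FALSE)
# returns c("Well-known", "words", "and", "The", "words", "the", "end")

# A double hyphen acts as a separator, a single hyphen does not
tokenise("a--b")
# returns c("a", "b")
```

With `case_insensitive = FALSE`, `The` and `the` would be counted as two separate word types in the frequency list.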

Value

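A tibble containing the frequency list, sorted in descending order of frequency. When more than one filepath is supplied, a list with one element per input filepath.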
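The shape of the returned frequency list can be pictured as one row per word type, ordered by descending count. This sketch uses a base `data.frame` rather than a tibble to stay dependency-free; the column names `word` and `freq` are assumptions and may not match the package's actual output. `tibble::as_tibble()` would convert the result if a tibble is wanted.

```r
# Hypothetical sketch of the returned shape: one row per word type,
# sorted by descending frequency. Column names are assumptions.
tokens <- c("the", "cat", "sat", "the", "dog", "sat", "the")

counts <- table(tokens)                 # counts per word, alphabetical order
freqlist <- data.frame(
  word = names(counts),
  freq = as.integer(counts),
  stringsAsFactors = FALSE
)

# Descending frequency; ties broken alphabetically for a stable order
freqlist <- freqlist[order(-freqlist$freq, freqlist$word), ]
rownames(freqlist) <- NULL
freqlist
# rows: the (3), sat (2), cat (1), dog (1)
```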

Examples

if (FALSE) {
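# `leipzig_corpus_path` is assumed to be a character vector of filepaths
# to the Leipzig Corpus files; only the first file is processed here.
wlist_all <- freqlist_leipzig_all(split_regex = "([^a-zA-Z0-9-]+|--)",
                                  leipzig_path = leipzig_corpus_path[1])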
}