R/corplingr_freqlist_leipzig_all.R
freqlist_leipzig_all.Rd
The function generates a frequency list of all word tokens in a Leipzig Corpus file. Although users can supply the file paths to all corpus files at once, for memory efficiency it is recommended that each file be processed in a separate function call. If all corpus files are processed in one call, the function outputs a list with as many elements as the number of input file paths.
freqlist_leipzig_all(
  split_regex = "([^a-zA-Z0-9-]+|--)",
  leipzig_path = NULL,
  case_insensitive = TRUE
)
Argument | Description |
---|---|
split_regex | user-defined regular expression used to tokenise the corpus. |
leipzig_path | full file path to one or more Leipzig Corpus files. |
case_insensitive | logical; whether to ignore case when generating the frequency list. Defaults to TRUE. |
A tibble containing the frequency list, sorted in descending order of frequency. When several file paths are supplied, a list with one element per input file.
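if (FALSE) {
# leipzig_corpus_path is not defined on this page; it is presumably a
# character vector of file paths to Leipzig Corpus files.
wlist_all <- freqlist_leipzig_all(split_regex = "([^a-zA-Z0-9-]+|--)",
                                  leipzig_path = leipzig_corpus_path[1])
}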
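To give a rough idea of what the default split_regex does when used as a splitting pattern, here is a minimal base-R sketch. It is not the package's implementation: the sample text is made up, the lower-casing step is only an assumption about what case_insensitive = TRUE amounts to, and the result is a plain data frame rather than a tibble.

```r
# Hypothetical sample text; not taken from any Leipzig Corpus file.
txt <- "The cat sat -- the CAT slept; the well-fed cat purred."

# Default pattern from the usage section: split on runs of characters that
# are not letters, digits, or hyphens, or on a double hyphen.
split_regex <- "([^a-zA-Z0-9-]+|--)"

# Lower-case first to mimic case-insensitive counting (an assumption),
# then split the text at every match of the pattern.
tokens <- strsplit(tolower(txt), split_regex)[[1]]

# Adjacent matches (e.g. " -- ") leave empty strings behind; drop them.
tokens <- tokens[nzchar(tokens)]

# Count the tokens and sort the counts in descending order.
freq <- sort(table(tokens), decreasing = TRUE)
freq_df <- data.frame(token = names(freq),
                      n = as.integer(freq),
                      stringsAsFactors = FALSE)
print(freq_df)
```

Because the hyphen is excluded from the splitting class, a hyphenated form such as "well-fed" stays a single token, while the double hyphen is treated as a separator.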