Generate Leipzig corpus-size — corpus_size

Function to get the total word-token count of a given Leipzig corpus file. It is built on top of str_count.

corpus_size_leipzig(
  leipzig_path = "(full) filepath to Leipzig corpus files",
  word_regex = "\\b(?i)([-a-zA-Z0-9]+)\\b"
)

Arguments

leipzig_path	path to the directory in which the Leipzig corpus files are stored
word_regex	regular expression defining what counts as "a word"

Value

A tibble containing corpus_id, size, and size_print (size formatted for printing in text).
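To illustrate how the default word_regex turns text into a token count, here is a minimal sketch that applies str_count (assumed here to be the stringr function) to two made-up sentences and collects the result in a tibble with the same three column names. The sentences, the "toy_corpus" id, and the comma formatting of size_print are assumptions for illustration only; this is not the function's actual implementation.

```r
library(stringr)
library(tibble)

# Default word pattern taken from the usage of corpus_size_leipzig()
word_regex <- "\\b(?i)([-a-zA-Z0-9]+)\\b"

# Hypothetical sentences standing in for the contents of a corpus file
sentences <- c(
  "The cat sat on the mat.",
  "A well-known corpus has 3 files."
)

# Count the regex matches (word tokens) in each sentence, then sum them
tokens_per_sentence <- str_count(sentences, word_regex)
size <- sum(tokens_per_sentence)

# Collect the result in a tibble shaped like the documented return value;
# the id and the big.mark formatting are illustrative assumptions
toy_size <- tibble(
  corpus_id = "toy_corpus",
  size = size,
  size_print = format(size, big.mark = ",")
)

toy_size
```

Because the character class includes the hyphen, a form such as "well-known" is counted as a single token, and digits count as tokens as well. Passing a different word_regex changes what is treated as a word and therefore the reported size.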