Source: `R/colloc_leipzig.R` (documented in `colloc_leipzig.Rd`)
`colloc_leipzig()` retrieves collocates from Leipzig corpus files and returns them as tibble output.
```r
colloc_leipzig(
  leipzig_path = NULL,
  leipzig_corpus_list = NULL,
  pattern = NULL,
  case_insensitive = TRUE,
  window = "b",
  span = 2,
  split_corpus_pattern = "([^a-zA-Z-¬]+|--)",
  to_lower_colloc = TRUE,
  save_interim = FALSE,
  freqlist_output_file = "collogetr_out_1_freqlist.txt",
  colloc_output_file = "collogetr_out_2_collocates.txt",
  corpussize_output_file = "collogetr_out_3_corpus_size.txt",
  search_pattern_output_file = "collogetr_out_4_search_pattern.txt"
)
```
Argument | Description
---|---
`leipzig_path` | Character strings of (i) file names of the Leipzig corpus files, if they are in the working directory, or (ii) the complete file path to each of the Leipzig corpus files.
`leipzig_corpus_list` | Use this argument if each Leipzig corpus file has been loaded as an R object and stored as an element of a named list. An example of this type of input is `demo_corpus_leipzig`.
`pattern` | Character vector containing a set of exact word forms to search for.
`case_insensitive` | Logical; whether the search for the `pattern` ignores case (`TRUE`, the default) or not (`FALSE`).
`window` | Character; window-span direction of the collocates: `"r"` (right of the node), `"l"` (left of the node), or `"b"` (both sides of the node; the default).
`span` | A numeric vector indicating the span of the collocate scope, in words. The default is `2`.
`split_corpus_pattern` | Regular expression used to tokenise the corpus into a word vector. The default regex is `"([^a-zA-Z-¬]+\|--)"`.
`to_lower_colloc` | Logical; whether to lowercase the retrieved collocates and the nodes (`TRUE`, the default) or not (`FALSE`).
`save_interim` | Logical; whether to save interim results into plain text files (`TRUE`) or not (`FALSE`, the default).
`freqlist_output_file` | Character string naming the file for the word frequency list of the corpus.
`colloc_output_file` | Character string naming the file for the raw collocate table.
`corpussize_output_file` | Character string naming the file for the total word size of the corpus.
`search_pattern_output_file` | Character string naming the file for the search pattern.
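Read literally, the default `split_corpus_pattern` treats ASCII letters, the hyphen and the `¬` character (U+00AC) as word characters, and splits on every other run of characters as well as on a double hyphen. The base-R sketch below shows how such a regex tokenises a sentence with `strsplit()` and then lowercases the tokens, as `to_lower_colloc = TRUE` suggests. `tokenise_sketch()` is a hypothetical helper, and it is an assumption that the package applies the pattern in a comparable way.

```r
# Hypothetical helper (not part of the package): tokenise text with the
# default split_corpus_pattern; "\u00ac" is the "¬" character (U+00AC).
tokenise_sketch <- function(text,
                            split_corpus_pattern = "([^a-zA-Z-\u00ac]+|--)",
                            to_lower = TRUE) {
  # Split on runs of non-word characters (or on a double hyphen).
  tokens <- unlist(strsplit(text, split_corpus_pattern))
  # strsplit() yields "" when the text starts with a delimiter; drop those.
  tokens <- tokens[nzchar(tokens)]
  if (to_lower) tokens <- tolower(tokens)
  tokens
}

tokens <- tokenise_sketch("Dia mengatakan, rumah-rumah itu besar.")
print(tokens)
```

Because only `a-zA-Z` are listed, accented letters would also act as separators under this pattern; a different `split_corpus_pattern` would be needed for such text.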
Returns a list of the raw collocate items, the frequency list of all words in the loaded corpus files, the total number of word tokens in each loaded corpus, and the search pattern.
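As a rough illustration of how the collocates, frequency list and corpus size relate, the base-R sketch below computes all three for one pre-tokenised sentence. `window_collocates()` is a hypothetical helper, not the package's implementation: it assumes that `window` takes `"l"`, `"r"` or `"b"` and that `span` counts words on each requested side of the node, and it returns plain vectors rather than the package's tibbles.

```r
# Hypothetical helper (not part of the package): collect the collocates of
# `node` within `span` words to the left ("l"), right ("r") or both ("b").
window_collocates <- function(tokens, node, window = "b", span = 2) {
  out <- character(0)
  for (i in which(tokens == node)) {
    left  <- if (window %in% c("l", "b")) i - rev(seq_len(span)) else integer(0)
    right <- if (window %in% c("r", "b")) i + seq_len(span) else integer(0)
    idx <- c(left, right)
    # Drop positions that fall outside the sentence.
    idx <- idx[idx >= 1 & idx <= length(tokens)]
    out <- c(out, tokens[idx])
  }
  out
}

# A pre-tokenised, lowercased Indonesian sentence (illustrative data).
tokens <- c("dia", "mengatakan", "bahwa", "rumah", "itu", "besar",
            "dan", "dia", "mengatakan", "lagi")

# Raw collocates: right window, span 3, mirroring the example call.
colloc <- window_collocates(tokens, "mengatakan", window = "r", span = 3)

# Frequency list of all words and the total number of word tokens.
freqlist <- sort(table(tokens), decreasing = TRUE)
corpus_size <- length(tokens)

print(colloc)
print(freqlist)
print(corpus_size)
```

Clipping positions to the sentence bounds means a node near a sentence edge simply yields fewer collocates; how the package itself handles such positions is not stated here.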
```r
collout <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                          pattern = "mengatakan",
                          window = "r",
                          span = 3,
                          save_interim = FALSE)

# Reading the corpus from files instead:
# collout <- colloc_leipzig(leipzig_path = c('path_to_corpus1.txt',
#                                            'path_to_corpus2.txt'),
#                           pattern = "mengatakan",
#                           window = "r",
#                           span = 3,
#                           # save interim output files; specify their paths
#                           # with the `*_output_file` arguments
#                           save_interim = TRUE)
```