Source: `R/corplingr_colloc_leipzig.R` (`colloc_leipzig.Rd`)
Retrieves the collocates of a search pattern from Leipzig Corpora Collection files and returns them as tibbles.
```r
colloc_leipzig(
  leipzig_path = NULL,
  leipzig_corpus_list = NULL,
  pattern = NULL,
  window = "b",
  span = 2,
  case_insensitive = TRUE,
  to_lower_colloc = TRUE,
  save_results = FALSE,
  coll_output_name = "colloc_tidy_colloc_out.txt",
  sent_output_name = "colloc_tidy_sent_out.txt"
)
```
Argument | Description |
---|---|
leipzig_path | character vector of either (i) the file names of the Leipzig corpus files, if they are in the working directory, or (ii) the complete path to each Leipzig corpus file. |
leipzig_corpus_list | use this argument instead of `leipzig_path` when each Leipzig corpus file has already been loaded into R as an element of a list. An example of this type of data input can be seen in |
pattern | regular expression or exact string for the target pattern (the node). |
window | window-span direction of the collocates; defaults to `"b"`. |
span | integer giving the span of the collocate window, i.e. how many word positions from the node are included (default `2`). |
case_insensitive | whether the search pattern ignores case (`TRUE`, the default) or not (`FALSE`). |
to_lower_colloc | whether to lowercase the retrieved collocates and nodes (`TRUE`, the default) or not (`FALSE`). |
save_results | whether to save the results as tab-separated plain-text files (`TRUE`) or not (`FALSE`, the default). |
coll_output_name | name of the output file for the collocate table. |
sent_output_name | name of the output file for the full sentences that contain the matches. |
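The following base-R sketch makes the `pattern`, `window`, `span`, `case_insensitive`, and `to_lower_colloc` arguments concrete by extracting collocates from a single sentence. It is not corplingr's implementation. The helper `collocate_window()` is hypothetical and uses naive whitespace tokenisation. The window codes `"l"` (left) and `"r"` (right) are assumptions; only `"b"` appears in the usage above.

```r
# Hypothetical helper, NOT part of corplingr: shows how a node pattern,
# a window direction, and a span select collocates in one sentence.
# Assumed window codes: "l" = left, "r" = right, "b" = both sides.
collocate_window <- function(sentence, pattern, window = "b", span = 2,
                             case_insensitive = TRUE, to_lower_colloc = TRUE) {
  # Offsets relative to the node, chosen by window direction and span
  offsets <- switch(window,
    l = -span:-1,
    r = 1:span,
    b = c(-span:-1, 1:span),
    stop("window must be \"l\", \"r\", or \"b\"")
  )
  # Naive whitespace tokenisation; punctuation stays attached to tokens
  tokens <- strsplit(sentence, "\\s+")[[1]]
  # Positions of the tokens that match the node pattern
  node_pos <- which(grepl(pattern, tokens,
                          ignore.case = case_insensitive, perl = TRUE))
  rows <- lapply(node_pos, function(i) {
    pos <- i + offsets
    keep <- pos >= 1 & pos <= length(tokens)  # drop positions outside the sentence
    data.frame(node = rep(tokens[i], sum(keep)), offset = offsets[keep],
               collocate = tokens[pos[keep]], stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, rows)
  if (is.null(res)) {  # no match: return an empty table with the same columns
    res <- data.frame(node = character(), offset = integer(),
                      collocate = character(), stringsAsFactors = FALSE)
  }
  if (to_lower_colloc) {
    res$node <- tolower(res$node)
    res$collocate <- tolower(res$collocate)
  }
  res
}
```

For example, `collocate_window("Hal itu tak terelakkan lagi.", "\\bterelakkan\\b", span = 2)` returns the left collocates `itu` and `tak` and the right collocate `lagi.` (with its full stop, because the tokeniser is naive). The second right-hand position is dropped because it lies past the end of the sentence.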
A list of two tibbles, `collocates` and `sentence_matches`: (i) the collocates, with the sentence number of each match, the window-span information, and the corpus file; and (ii) the full sentence of each match, with its sentence number and corpus file.
```r
if (FALSE) { # not run
library(corplingr)

# File paths to the Leipzig corpus files; this example uses the
# file-path input (`leipzig_path`) rather than a list of loaded corpora
leipzig_corpus_path <- c("my/path/to/leipzig_corpus_file_1M-sent_1.txt",
                         "my/path/to/leipzig_corpus_file_300K-sent_2.txt",
                         "my/path/to/leipzig_corpus_file_300K-sent_3.txt")

# Run the function on the second and third corpus files
colloc <- colloc_leipzig(leipzig_path = leipzig_corpus_path[2:3],
                         pattern = "\\bterelakkan\\b",
                         window = "b",
                         span = 3,
                         save_results = FALSE,
                         to_lower_colloc = TRUE)

# Inspect the outputs
## Collocates tibble
colloc$collocates
## Sentence-matches tibble
colloc$sentence_matches
}
```
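The example above uses file paths. The sketch below prepares the alternative `leipzig_corpus_list` input. It is an assumption, not the documented format: each list element is taken to be the character vector of raw lines from one Leipzig file, as returned by `readLines()`, and each element is named after its file. Two tiny temporary files stand in for real corpus files and follow the Leipzig sentence-file layout of `<id>` TAB `<sentence>` per line. Check the package documentation for the exact shape the function expects before relying on this.

```r
# Two tiny stand-in files in the Leipzig "<id>\t<sentence>" line layout
tmp_files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("1\tIni contoh kalimat.", "2\tHal itu tak terelakkan."), tmp_files[1])
writeLines("1\tKalimat lain.", tmp_files[2])

# ASSUMPTION: one list element per corpus file, holding its raw lines,
# named after the file so that matches can be traced back to it
leipzig_corpus_list <- lapply(tmp_files, readLines, encoding = "UTF-8")
names(leipzig_corpus_list) <- basename(tmp_files)

if (FALSE) { # not run: requires corplingr
  colloc <- colloc_leipzig(leipzig_corpus_list = leipzig_corpus_list,
                           pattern = "\\bterelakkan\\b",
                           window = "b", span = 3)
}
```

With real data, replace `tmp_files` with the paths to your Leipzig files. Reading the files once into a list can avoid re-reading them from disk when the same corpora are queried with several patterns.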