The corpus is a random subset of 25,000 sentences from one of the Indonesian Leipzig Corpora files, namely "ind_news_2008_300K-sentences.txt". This file originally contains 300,000 sentences from Indonesian online newspapers.
my_leipzig_sample
A character vector of 25,000 elements, one sentence per element.