Corpus data

The corpus is a random sample of 25,000 sentences drawn from one of the Indonesian Leipzig Corpora files, "ind_news_2008_300K-sentences.txt". The original file contains 300,000 sentences from Indonesian online newspapers.

my_leipzig_sample

Format

A character vector of 25,000 elements, each holding one sentence.

Source

http://wortschatz.uni-leipzig.de/en/download
