The function extracts the full sentence matches for a collocate (or a set of significant collocates) of a given nodeword.

colloc_sentmatch(
  collout,
  colloc = NULL,
  wspan = NULL,
  nodeword = NULL,
  sampled = NULL
)

Arguments

collout

List output of colloc_leipzig.

colloc

Character vector of the collocate(s) whose sentence match(es) are to be retrieved.

wspan

Character vector of the window span(s) in which the collocates occur. Defaults to NULL, which retrieves the collocates' occurrences in all spans.

nodeword

Character vector specifying one of the nodewords, for use when the search pattern passed to colloc_leipzig includes more than one nodeword.

sampled

Integer vector indicating the number of randomly sampled sentence matches to retrieve. Defaults to NULL, which retrieves all sentence matches. If the requested number exceeds the number of available matches, all matches are returned with a warning.

Value

Character vector of sentence-match(es).
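
In the example output on this page, each returned string begins with a number (apparently the sentence ID in the source corpus), separated from the sentence text by a space or a tab. The following is a minimal base-R sketch, assuming that layout, for separating the two parts. The helper name split_sentmatch is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): split each sentence match
# into its leading numeric ID and the sentence text.
# Assumes every element starts with digits followed by whitespace.
split_sentmatch <- function(x) {
  data.frame(
    sent_id = sub("^([0-9]+)[[:space:]]+.*$", "\\1", x),
    sentence = sub("^[0-9]+[[:space:]]+", "", x),
    stringsAsFactors = FALSE
  )
}

# Two strings copied from the example output on this page
m <- c(
  "123872 Para aktivis oposisi mengatakan bahwa amnesti dukungan negara ditujukan kepada para pembunuh, tetapi tidak pada korban.",
  "283324\tTheater mengatakan, pemilu akan sedikit mendorong pertumbuhan ekonomi Indonesia."
)
split_sentmatch(m)
```

Keeping the ID as a character column avoids silently dropping leading zeros, should a corpus use them.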

Details

Experimental lifecycle

See also

colloc_sentmatch_tagged for a tagged, data frame version of the output.

Examples

collout <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig, pattern = "mengatakan", window = "r", span = 3, save_interim = FALSE)
#> Detecting a 'named list' input!
#> You chose NOT to SAVE INTERIM RESULTS, which will be stored as a list in console!
#> 1. Tokenising the "ind_mixed_2012_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_mixed_2012_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_news_2008_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_news_2008_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_news_2009_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_news_2009_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_news_2010_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_news_2010_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_news_2011_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_news_2011_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_news_2012_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_news_2012_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_newscrawl_2011_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_newscrawl_2011_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_newscrawl_2012_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_newscrawl_2012_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_newscrawl_2015_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_newscrawl_2015_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_newscrawl_2016_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_newscrawl_2016_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_web_2011_300K" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_web_2011_300K.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_web_2012_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_web_2012_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind_wikipedia_2016_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind_wikipedia_2016_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind-id_web_2013_1M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind-id_web_2013_1M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 1. Tokenising the "ind-id_web_2015_3M" corpus. This process may take a while!
#> 1.1 Removing one-character tokens...
#> 1.2 Lowercasing the tokenised corpus...
#> At least a match is detected for 'mengatakan' in ind-id_web_2015_3M.
#> 2.1 Gathering the collocates for 'mengatakan' ...
#> 3. Storing all of the outputs...
#>
#> DONE!
colloc_sentmatch(collout, colloc = "bahwa", sampled = 10)
#> [1] "195109 Ia mengatakan bahwa dalam prosesi pemberangkatan jenazah di Puro Pakualaman, Paku Alam IX meminta izin kepada Sri Sultan HB X sebagai raja."
#> [2] "126523 Juru bicara partai Nyan Win mengatakan bahwa penulis di surat kabar tersebut mungkin menulis \"untuk kepentingannya\"."
#> [3] "261205 Sementara itu, Venny, seorang Tim Relawan Derry Drajat mengatakan bahwa Derry serius dalam pencalonan sebagai wakil wali kota Depok pada Pilkada 2010 dan bukan main-main."
#> [4] "45217 Sekretaris Perusahaan XL Ike Andriani mengatakan bahwa pembatalan tersebut dilakukan karena harga penawaran penjualan menara tidak mencapai target yang diharapkan karena calon pembeli sulit mendapatkan dana akibat krisis global."
#> [5] "880598 Ketua Panwaslu, Tommy Sumakul SH MH senada mengatakan bahwa KPU Manado jelas-jelas melanggar UU."
#> [6] "880598 Ketua Panwaslu, Tommy Sumakul SH MH senada mengatakan bahwa KPU Manado jelas-jelas melanggar UU."
#> [7] "157580 Sementara itu Panglima TNI Jenderal TNI Djoko Santoso mengatakan bahwa TNI siap menghalau para pelanggar batas wilayah."
#> [8] "45217 Sekretaris Perusahaan XL Ike Andriani mengatakan bahwa pembatalan tersebut dilakukan karena harga penawaran penjualan menara tidak mencapai target yang diharapkan karena calon pembeli sulit mendapatkan dana akibat krisis global."
#> [9] "123872 Para aktivis oposisi mengatakan bahwa amnesti dukungan negara ditujukan kepada para pembunuh, tetapi tidak pada korban."
#> [10] "261205 Sementara itu, Venny, seorang Tim Relawan Derry Drajat mengatakan bahwa Derry serius dalam pencalonan sebagai wakil wali kota Depok pada Pilkada 2010 dan bukan main-main."
# This will produce a warning indicating that
# the requested sample size is higher than
# the number of sentence matches for "akan"
colloc_sentmatch(collout, colloc = "akan", sampled = 10)
#> Warning: Returning all 6matches!
#> Length of matches (6) is lower than the number of the queried sample (10).
#> [1] "221918 Dia mengatakan, pemerintah akan terus mencermati setiap perkembangan eksternal yang akan terjadi dan mempengaruhi perekonomian dalam negeri."
#> [2] "282144 Sumber itu mengatakan Sarkozy akan mengeluarkan pengumuman tersebut pekan depan, \"terkecuali terjadi bencana\"."
#> [3] "118409 Sementara itu di London pada Kamis, sekretaris jenderal OPEC Abdalla Salem El-Badri mengatakan bahwa kartel akan mempertimbangkan peningkatan produksi minyak mentah di pertemuan berikutnya pada Desember jika kondisi kunci terpenuhi."
#> [4] "242862 Rekan senegara Safina, Dementieva sementara itu mengatakan dia akan menerapkan strategi berbeda ketika menghadapi Serena."
#> [5] "9443\tAndyawan mengatakan pemadaman akan menurunkan dua helikopter bantuan perusahaan untuk menjatuhkan bom air di daerah Cagar Biosfer Giam Siak Kecil, Bengkalis."
#> [6] "283324\tTheater mengatakan, pemilu akan sedikit mendorong pertumbuhan ekonomi Indonesia."
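
The fallback seen in the last call (returning every match when sampled exceeds the number of matches) can be sketched in base R as follows. This is an illustration of the documented behaviour only, not the package's implementation: the function name sample_matches, the wording of the warning, and the choice to sample without replacement are all assumptions.

```r
# Hypothetical stand-in for the sampling step; not the package's code.
sample_matches <- function(matches, sampled = NULL) {
  # NULL means "return every match"
  if (is.null(sampled)) return(matches)
  n <- length(matches)
  if (sampled > n) {
    # Fewer matches than requested: warn and fall back to all of them
    warning("Returning all ", n, " matches: fewer matches than the requested sample (", sampled, ").")
    return(matches)
  }
  # Sampling without replacement is an assumption made for this sketch
  matches[sample.int(n, sampled)]
}

x <- paste("sentence", 1:6)
sample_matches(x, sampled = 3)   # three randomly chosen elements
sample_matches(x, sampled = 10)  # warns, then returns all six
```

Indexing with sample.int() rather than calling sample() on the vector directly keeps the behaviour predictable when only one match is available.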