R Notebook for “Usage-based perspective on argument realisation: A corpus study of Indonesian BUY verbs in applicative construction with -KAN”

Authors: Gede Primahadi Wijaya Rajeg and I Wayan Arka

Affiliations: Universitas Udayana; University of Oxford; Australian National University; Universitas Udayana

Published: January 6, 2023

Modified: December 1, 2023


The paper (Rajeg & Arka 2023) associated with this R (Quarto) Notebook has been published in NUSA (Linguistic studies of languages in and around Indonesia) special volume on “Applicatives in Austronesian Languages” (Aznar, Döhler & Vander Klok 2023).

How to cite the paper:

Rajeg, Gede Primahadi Wijaya and I Wayan Arka, 2023. ‘Usage-based perspective on argument realisation: A corpus study of Indonesian BUY verbs in applicative construction with -kan’. In Jocelyn Aznar, Christian Döhler, Jozina Vander Klok, eds. Applicatives in Austronesian Languages. NUSA 74: 83-114. https://tufs.repo.nii.ac.jp/records/2000019

1 Preparation

Code
# load packages =====
library(tidyverse)
library(readxl)
library(vcd)
library(EMT)
library(knitr)
library(ggpubr)
library(rstatix)

Below is the code to load the corpus size table.

Code
# load the corpus size table
# corpussize <- tibble::as_tibble(read.table(file = "/Volumes/GoogleDrive/Other computers/My MacBook Pro/Documents/Corpora/_corpusindo/Leipzig Corpora/corpus_total_size_per_file.txt", header = TRUE, sep = "\t", comment.char = "", quote = "")[-c(1, 13, 15), ])

# readr::write_tsv(corpussize, "data/corpussize.txt")
corpussize <- readr::read_tsv("data/corpussize.txt")

The total corpus size is 119,557,093 word-tokens.

Below is the code to read in the spreadsheet containing the annotated concordance data for beli (membeli, membelikan, dibeli, and dibelikan).

Code
# mydat <- read_xlsx("data/BELI-main.xlsx")
mydat <- read_csv2("data/BELI-main.csv")
df_membelikan <- mydat %>% 
  filter(node == "membelikan")
df_membeli <- mydat %>% 
  filter(node == "membeli")
df_dibeli <- mydat %>% 
  filter(node == "dibeli")
df_dibelikan <- mydat %>% 
  filter(node == "dibelikan")
mydat_clause_type <- mydat |> 
  count(clause_type, sort = TRUE) |> 
  mutate(perc = n/sum(n) * 100,
         perc = round(perc, 1))
mydat_clause_type_binom <- binom.test(mydat_clause_type$n)
#   Exact binomial test
# 
# data:  mydat_clause_type$n
# number of successes = 254, number of trials = 400, p-value = 7.382e-08
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
#  0.5857092 0.6822768
# sample estimates:
# probability of success 
#                  0.635

The sample comprises 254 tokens (63.5%) of subordinate clauses and 146 tokens (36.5%) of main clauses (pBinomial < 0.0001).
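As a minimal sketch, the test reported above can be reproduced from the two counts alone. The object names `clause_counts` and `clause_binom` are introduced here for illustration and are not part of the notebook's objects.

```r
# Counts reported in the text above: subordinate vs. main clauses
clause_counts <- c(subordinate = 254, main = 146)

# With a length-2 vector, binom.test() treats the first element as the
# number of successes and the sum as the number of trials (H0: p = 0.5)
clause_binom <- binom.test(clause_counts)

# Share of subordinate clauses and the two-sided p-value
prop_subordinate <- unname(clause_binom$estimate)
p_value <- clause_binom$p.value

round(prop_subordinate * 100, 1)
p_value < 0.0001
```

Because the counts are sorted in descending order before the test, the "success" category here is the subordinate clause.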

2 Analyses for membeli

2.1 Construction types/schemas and syntactic transitivity

Below is the code to count the construction types and syntactic valence/transitivity for membeli (Table 1).

Code
cxn_type_membeli <- df_membeli %>% 
  mutate(schema = replace(schema, 
                          schema == "theme_obj_cxn", 
                          "[Goods]{.smallcaps}_Obj Construction"),
         schema = replace(schema, 
                          schema == "intransitive", 
                          "Intransitive Construction")) %>% 
  count(schema, syntactic_transitivity) %>% 
  arrange(desc(n))
cxn_type_membeli %>% 
  rename(`syntactic transitivity` = syntactic_transitivity,
         `token freq.` = n) %>% 
  kable(caption = "Construction types/schemas and syntactic valence/transitivity for *membeli*")
Table 1: Construction types and syntactic valence for membeli

|schema                    |syntactic transitivity | token freq.|
|:-------------------------|:----------------------|-----------:|
|Goods_Obj Construction    |monotransitive         |          87|
|Intransitive Construction |intransitive           |          13|

Below is the code to run the Proportion Test (Gries 2013: 135) for the schema and syntactic transitivity frequency.

Code
cxn_type_membeli <- cxn_type_membeli %>% 
         # create factor for plotting.
  mutate(syntactic_transitivity = factor(syntactic_transitivity, 
                                         levels = c("monotransitive", 
                                                    "intransitive")),
         N = sum(n),
         expected = N/nrow(.),
         alternatives = if_else(n < expected, "less", "greater"),
         
         # run binomial test
         binomtest = pmap(list(x = n, n = N), binom.test, conf.level = 0.99), 
         
         # extract confidence interval
         conf_low = map_dbl(binomtest, list("conf.int", 1)), 
         conf_high = map_dbl(binomtest, list("conf.int", 2)),
         
         # extract the estimate
         estimate = map_dbl(binomtest, "estimate"),
         
         # extract p-value
         pval = map_dbl(binomtest, "p.value"),
         signifs = "ns",
         signifs = if_else(pval < 0.05, "*", signifs),
         signifs = if_else(pval < 0.01, "**", signifs),
         signifs = if_else(pval < 0.001, "***", signifs)
         )
cxn_type_membeli %>% 
  select(-binomtest, -N, -alternatives, -expected) %>% 
  mutate(conf_low = round(conf_low, 2),
         conf_high = round(conf_high, 2),
         estimate = round(estimate, 2),
         pval = format(pval, digits = 4, scientific = TRUE)) %>% 
  kable()
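Table 2: Output of the Proportion Test for membeli

|schema                    |syntactic_transitivity |  n| conf_low| conf_high| estimate|pval      |signifs |
|:-------------------------|:----------------------|--:|--------:|---------:|--------:|:---------|:-------|
|Goods_Obj Construction    |monotransitive         | 87|     0.76|      0.94|     0.87|1.313e-14 |***     |
|Intransitive Construction |intransitive           | 13|     0.06|      0.24|     0.13|1.313e-14 |***     |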
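The per-row binomial tests behind Table 2 can also be sketched in base R, which may help when checking the confidence intervals without the tidyverse pipeline. This is an illustrative re-implementation under the assumption that each count is tested against the 50/50 baseline out of the total of 100 tokens; the names `counts`, `tests`, and `result` are hypothetical.

```r
# Token frequencies from Table 1
counts <- c(monotransitive = 87, intransitive = 13)
total <- sum(counts)

# One exact binomial test per category, each against the 50/50 baseline,
# with a 99% confidence interval as in the notebook
tests <- lapply(counts, function(x) binom.test(x, total, conf.level = 0.99))

result <- data.frame(
  n = counts,
  conf_low = vapply(tests, function(bt) bt$conf.int[1], numeric(1)),
  conf_high = vapply(tests, function(bt) bt$conf.int[2], numeric(1)),
  estimate = vapply(tests, function(bt) unname(bt$estimate), numeric(1)),
  pval = vapply(tests, function(bt) bt$p.value, numeric(1))
)

# Map p-values to significance codes; right = FALSE keeps the strict
# "p < threshold" comparisons used in the notebook
result$signifs <- as.character(cut(
  result$pval,
  breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
  labels = c("***", "**", "*", "ns"),
  right = FALSE
))

result
```

Using `cut()` instead of a chain of `if_else()` calls is only a stylistic alternative; both assign the same codes for the same thresholds.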
Code
# get the base, "red" ggplot2 colour using `scales` package
ggred <- scales::hue_pal()(2)[1]

cxn_type_membeli %>% 
  # edit factor for plotting.
  mutate(schema = replace(schema, str_detect(schema, "Goods"), "GOODS_obj\n(Monotransitive)"),
         schema = replace(schema, str_detect(schema, "Intran"), "Deprofiled_obj\n(Intransitive)"),
         schema = factor(schema, levels = c("GOODS_obj\n(Monotransitive)", "Deprofiled_obj\n(Intransitive)"))) %>% 
  ggplot(aes(x = schema, 
             y = estimate, 
             fill = syntactic_transitivity)) + 
  geom_col(position = position_dodge(.9), colour = "gray50") +
  geom_text(aes(label = paste("n=", n, sep = "")), 
            position = position_dodge(.9),
            vjust = c(8.75, 1.25),
            hjust = c(0.5, -.5),
            colour = c("white", "black"),
            size = 9) +
  theme_bw() +
  scale_fill_manual(values = c(ggred, "gold")) +
  labs(y = "Proportion",
       fill = NULL,
       x = NULL) +
  theme(legend.position = "none",
        axis.title.y = element_text(size = 20),
        axis.text.y = element_text(size = 11.5),
        axis.text.x = element_text(size = 22)) +
  geom_errorbar(aes(ymin = conf_low, ymax = conf_high), 
                width = .2, position = position_dodge(.9))

Figure 1: Proportion of the syntactic transitivity for membeli. The monotransitive pattern only realises the Goods as the direct object (see Table 2)

The results in Table 2 and Figure 1 show that the Monotransitive, Goods-as-object construction is, unsurprisingly, the predominant argument realisation pattern for the base membeli, and that this predominance is highly significant.

2.2 Intransitive membeli in subordinate clause

2.2.1 Frequency of membeli in main vs. subordinate clause

The code below counts the distribution of the intransitive membeli in main vs. subordinate clauses (a question raised by Reviewer A).

Code
intransitive_membeli_clause_type <- df_membeli |> 
  filter(schema == "intransitive") |> 
  count(clause_type, sort = TRUE) |> 
  mutate(percentage = n/sum(n) * 100)

intransitive_membeli_clause_type_binom <- binom.test(intransitive_membeli_clause_type$n)
intransitive_membeli_clause_type_binom_pval <- intransitive_membeli_clause_type_binom$p.value

intransitive_membeli_clause_type
# A tibble: 2 × 3
  clause_type     n percentage
  <chr>       <int>      <dbl>
1 subordinate    11       84.6
2 main            2       15.4

The intransitive membeli is significantly more frequent in the subordinate clause (N=11) than in the main clause (N=2) (pBinomial < 0.05).
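The p-value behind this statement can be worked out by hand, which makes the size of the effect with only 13 tokens easier to judge. The sketch below assumes the same two-sided exact binomial test against an even split; the variable names are introduced for illustration.

```r
# Counts reported above for intransitive membeli
n_sub <- 11
n_main <- 2
n_total <- n_sub + n_main

# Under H0 (p = 0.5) the distribution is symmetric, so the two-sided
# p-value is twice the lower tail P(X <= 2) for X ~ Binomial(13, 0.5)
p_by_hand <- 2 * pbinom(n_main, size = n_total, prob = 0.5)

# The same value from binom.test()
p_from_test <- binom.test(c(n_sub, n_main))$p.value

c(by_hand = p_by_hand, binom_test = p_from_test)
```

Both routes give 184/8192, roughly 0.0225, which is below 0.05 but would not survive a 0.01 threshold.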

2.2.2 Frequency of the subordinate clause types of the intransitive membeli

The following code counts the subordinate clause types in which the intransitive membeli occurs.

Code
df_membeli |> 
  filter(syntactic_transitivity=='intransitive', clause_type == "subordinate") |> 
  count(subordinate_clause_type)
# A tibble: 3 × 2
  subordinate_clause_type     n
  <chr>                   <int>
1 adverbial                   6
2 complement_clause           4
3 relative_clause             1

2.2.3 The intransitive membeli in main clause of compound sentence

The following code extracts the occurrences of intransitive membeli in main clauses of compound sentences.

Code
df_membeli |> 
  filter(syntactic_transitivity=='intransitive', clause_type == "main") |> 
  pull(node_sentences)
[1] "4. Tarmizi on 2 Agustus 2009 Kami dari toko buku Azhar dari Malaysia ingin tahu adakah buku Kiamat 2012 sudah tersedia dan bulan Agustus tanggal berapa akan di launching buku tersebut dan kami ingin<m>membeli</m>in bulk."
[2] "\"Mereka bilang pasar produk ini captive-market, konsumen tidak hanya<m>membeli</m>tapi juga menikmatinya,\" ujar Hidayat."                                                                                                  

2.2.4 The intransitive membeli in subordinate clause without antecedent in the matrix clause

Code
df_membeli |> 
  filter(syntactic_transitivity=='intransitive', clause_type == "subordinate") |> 
  pull(node_sentences)
 [1] "Ia menambahkan, pemerintah, melalui Badan Perencanaan Pembangunan Nasional (Bappenas) dan Kementerian Keuangan, harus menggodoknya terlebih dulu, baru kemudian mengambil keputusan untuk<m>membeli</m>atau tidak."                  
 [2] "Ketika<m>membeli</m>Barclays berkomitmen tak akan menggadaikan Bank Akita selama lima tahun."                                                                                                                                        
 [3] "Ini artinya ketiganya harus<m>membeli</m>dengan proporsi yang berimbang yaitu masing-masing Rp752 miliar."                                                                                                                           
 [4] "Ada juga cara lain dengan<m>membeli</m>di tempat penjualan tak resmi."                                                                                                                                                               
 [5] "Sedangkan penerima ginjal atau yang<m>membeli</m>diminta bayaran sebesar Rp250 juta hingga Rp300 juta."                                                                                                                              
 [6] "! jika anda memesan atau<m>membeli</m>jangan lupa untuk mencantumkan NGG dikarenakan NGG merupakan kode pemesanan yang wajib di cantumkan pada setiap pemesanan dan apabila tidak mencantumkan maka kami anggap tidak ada pemesanan."
 [7] "Selain dengan modus itu, kalau ada orang yang mau<m>membeli</m>juga dilayani."                                                                                                                                                       
 [8] "Ny Suhartinah (45) warga Kecamatan Cibadak, Kabupaten Lebak, mengatakan hingga kini sulit mendapatkan minyak tanah di wilayahya sehingga ia terpaksa<m>membeli</m>ke pangkalan di Jalan Pasar Baru Rangkasbitung."                   
 [9] "Menurut dia, kondisi ini banyak dimanfaatkan atau mendorong pelaku bisnis yang mengadalkan bahan baku impor akan<m>membeli</m>lebih banyak, karena harga dolar lagi murah."                                                          
[10] "Ini karena kemampuan masyarakat untuk<m>membeli</m>pun terbatas."                                                                                                                                                                    
[11] "Sudah ada beberapa investor yang tertarik untuk<m>membeli</m>tetapi kami belum tentukan siapanya."                                                                                                                                   

Sentences [5], [8] and [9] contain a referent for the omitted object in a preceding, higher clause (ginjal ‘kidney’ for [5], minyak tanah ‘kerosene’ for [8], and bahan baku impor ‘imported raw materials’ for [9]).
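For reading the concordance lines more comfortably, the `<m>` tags around the node word can be replaced by spaces. This is a small sketch, assuming (as the printed sentences suggest) that the tags took the place of the whitespace around the node word; `sentences`, `untagged`, and `node_words` are illustrative names.

```r
# Two of the sentences printed above, with the node word wrapped in <m> tags
sentences <- c(
  "Ada juga cara lain dengan<m>membeli</m>di tempat penjualan tak resmi.",
  "Ini karena kemampuan masyarakat untuk<m>membeli</m>pun terbatas."
)

# Replace each <m>...</m> pair with the node word padded by single spaces
untagged <- gsub("<m>(.*?)</m>", " \\1 ", sentences, perl = TRUE)

# Pull out the node word itself
node_words <- sub(".*<m>(.*?)</m>.*", "\\1", sentences, perl = TRUE)

untagged
node_words
```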

3 Analyses for dibeli

Code
cxn_type_dibeli <- df_dibeli %>% 
  filter(schema != "???") %>% 
  count(schema, syntactic_transitivity) %>%
  arrange(desc(n))
Code
cxn_type_dibeli2 <- df_dibeli %>% 
  filter(schema != "???") %>% 
  count(schema) %>%
  arrange(desc(n)) %>% 
  mutate(schema = str_replace(schema, "theme", "goods"),
         schema = replace(schema, schema == "subj_goods", "GOODS_pass.subj"),
         schema = replace(schema, schema == "subj_rate", "RATE_pass.subj"),
         schema = factor(schema, levels = c("GOODS_pass.subj", "RATE_pass.subj")),
         N = sum(n), 
         
         # run binomial test
         binomtest = pmap(list(x = n, n = N), binom.test, conf.level = 0.99), 
         
         # extract confidence interval
         conf_low = map_dbl(binomtest, list("conf.int", 1)), 
         conf_high = map_dbl(binomtest, list("conf.int", 2)),
         
         # extract the estimate
         estimate = map_dbl(binomtest, "estimate"),
         
         # extract the p-value
         pval = map_dbl(binomtest, "p.value"),
         signifs = "ns",
         signifs = if_else(pval < 0.05, "*", signifs),
         signifs = if_else(pval < 0.01, "**", signifs),
         signifs = if_else(pval < 0.001, "***", signifs))
Code
cxn_type_dibeli2 %>% 
  ggplot(aes(x = schema, 
             y = estimate, 
             fill = schema)) + 
  geom_col(position = position_dodge(.9), colour = "gray50") +
  geom_text(aes(label = paste("n=", n, sep = "")), 
            position = position_dodge(.9),
            vjust = c(9, -.5),
            hjust = c(.5, -.75),
            size = 9,
            colour = c("white", "black")) +
  theme_bw() +
  # scale_fill_manual(values = c("limegreen", "gold")) +
  labs(y = "Proportion",
       fill = "Cxn Type",
       x = NULL) +
  theme(axis.text.x = element_text(size = 22),
        legend.position = "none",
        axis.title.y = element_text(size = 20),
        axis.text.y = element_text(size = 11.5)) +
  geom_errorbar(aes(ymin = conf_low, ymax = conf_high), 
                width = .2, position = position_dodge(.9))

Figure 2: Constructional profiles of dibeli

4 Analyses for membelikan

Code
cxn_type_membelikan <- df_membelikan %>% 
  count(schema, syntactic_transitivity) %>% 
  arrange(syntactic_transitivity, desc(n)) %>% 
  group_by(syntactic_transitivity) %>% 
  mutate(n_transitivity = sum(n)) %>%
  arrange(desc(n_transitivity), desc(n)) %>% 
  ungroup()

## retrieve the intransitive singleton for "membelikan"
df_membelikan %>% filter(syntactic_transitivity=='intransitive') %>% pull(node_sentences)
[1] "AQSIQ minta kepada orang tua untuk melakukan pemeriksaan terhadap mainan sebelum<m>membelikan</m>kepada anak-anaknya."
Code
padjs1 <- 0.05/3
padjs2 <- 0.01/3
padjs3 <- 0.001/3

synt_trans_membelikan <- cxn_type_membelikan %>% 
  # filter(schema != "intransitive") %>% 
  mutate(schema = str_replace(schema, "theme", "goods"), 
         schema = str_replace(schema, "recipient", "recipient/beneficiary"), 
         schema = replace(schema, schema == "intransitive", "deprofiled_obj"),
         schema = str_replace(schema, "_cxn$", ""), 
         syntactic_transitivity = factor(syntactic_transitivity, 
                                         levels = c("monotransitive", "ditransitive", "intransitive")), 
         schema = factor(schema, 
                         levels = c("goods_obj", "recipient/beneficiary_obj", "deprofiled_obj"))) %>% 
  group_by(syntactic_transitivity) %>% 
  summarise(n = sum(n))

synt_trans_membelikan <- synt_trans_membelikan %>% 
  mutate(N = sum(n), 
         
         # run binomial test
         binomtest = pmap(list(x = n, n = N), binom.test, conf.level = 0.99), 
         
         # extract confidence interval
         conf_low = map_dbl(binomtest, list("conf.int", 1)), 
         conf_high = map_dbl(binomtest, list("conf.int", 2)),
         
         # extract the estimate
         estimate = map_dbl(binomtest, "estimate"),
         
         # extract the p-value
         pval = map_dbl(binomtest, "p.value"),
         signifs = "ns",
         signifs = if_else(pval < padjs1, "*", signifs),
         signifs = if_else(pval < padjs2, "**", signifs),
         signifs = if_else(pval < padjs3, "***", signifs))
Code
synt_trans_membelikan_vector <- synt_trans_membelikan$n
names(synt_trans_membelikan_vector) <- synt_trans_membelikan$syntactic_transitivity
synt_trans_membelikan_pairwise_binom <- pairwise_binom_test(synt_trans_membelikan_vector, 
                                                            p.adjust.method = "bonferroni",
                                                            conf.level = 0.99) %>% 
  mutate(p.adjt = paste(format(p.adj, digits = 4, scientific = TRUE), " (", p.adj.signif, ")", sep = ""))
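Conceptually, a pairwise binomial comparison tests each pair of categories against an even split and then corrects for the number of tests. The sketch below is a base-R approximation of that idea with hypothetical counts; it is not the rstatix implementation and does not use the notebook's data.

```r
# Hypothetical counts for three categories (not the notebook's data)
counts <- c(cat_a = 60, cat_b = 30, cat_c = 10)

# All unordered pairs of category names
pairs <- combn(names(counts), 2, simplify = FALSE)

# One exact binomial test per pair, restricted to the tokens of that pair
pairwise <- do.call(rbind, lapply(pairs, function(pr) {
  bt <- binom.test(unname(counts[pr]))
  data.frame(group1 = pr[1], group2 = pr[2], p = bt$p.value)
}))

# Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1
pairwise$p.adj <- p.adjust(pairwise$p, method = "bonferroni")

# Same logic as dividing alpha by the number of tests (cf. padjs1 <- 0.05/3):
# compare the unadjusted p-value with the lowered threshold
alpha_adj <- 0.05 / nrow(pairwise)
pairwise$signif_at_05 <- pairwise$p < alpha_adj

pairwise
```

Adjusting the p-values upward and lowering the threshold are two views of the same correction, so the two columns lead to the same decisions here.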
Code
length_valence <- length(synt_trans_membelikan$syntactic_transitivity)
prob_valence <- rep(1/length_valence, length_valence)
pmultinom <- EMT::multinomial.test(observed = synt_trans_membelikan$n, prob = prob_valence)

 The model includes 5151 different events.


 Exact Multinomial Test

    Events        pObs     p.value
      5151 8.31986e-26 1.80758e-06
Code
# p-value = 0
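To make the logic of the exact multinomial test concrete, the following base-R sketch enumerates every possible outcome and sums the probabilities of those no more likely than the observed one. The counts 79, 20 and 1 are an assumption inferred for illustration (they are not printed in this notebook, although they reproduce the `pObs` value shown above); all object names are hypothetical.

```r
# Assumed counts (monotransitive, ditransitive, intransitive); they are
# inferred for illustration and are not printed in the notebook
observed <- c(79, 20, 1)
n_total <- sum(observed)
k <- length(observed)
prob_h0 <- rep(1 / k, k)

# Enumerate every way of distributing n_total tokens over three categories
first <- rep(0:n_total, times = (n_total + 1):1)
second <- unlist(lapply(0:n_total, function(i) 0:(n_total - i)))
events <- cbind(first, second, n_total - first - second)

# Multinomial probability of each event under the uniform null
event_probs <- apply(events, 1, dmultinom, prob = prob_h0)
p_obs <- dmultinom(observed, prob = prob_h0)

# Exact p-value: total probability of events no more likely than the observed
p_value <- sum(event_probs[event_probs <= p_obs * (1 + 1e-7)])

nrow(events)
p_obs
p_value
```

Under these assumptions the enumeration yields 5151 events, and the summed probability is vanishingly small, in line with the `# p-value = 0` comment. Any difference from the printed `p.value` is not investigated here.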
Code
synt_trans_membelikan %>% 
  ggplot(aes(x = syntactic_transitivity, 
             y = n, 
             fill = syntactic_transitivity)) + 
  geom_col(position = position_dodge(.9), colour = "gray50") +
  geom_text(aes(label = paste("n=", n, sep = "")), 
            position = position_dodge(.9),
            vjust = c(10, 3, -.5),
            hjust = c(.5, .5, .5),
            size = c(8, 8, 7.5),
            colour = c("white", "white", "black")) +
  theme_bw() +
  scale_fill_manual(values = c("limegreen", "royalblue1", "gold")) +
  labs(y = "Raw frequency",
       fill = NULL,
       x = NULL) +
  theme(legend.position = "none",
        axis.title.y = element_text(size = 20),
        axis.text.y = element_text(size = 11.5),
        axis.text.x = element_text(size = 22)) # +
  #geom_errorbar(aes(ymin = conf_low, ymax = conf_high), 
  #              width = .2, position = position_dodge(.9)) # +
  # ylim(NA, 1) +
  # geom_segment(x = 1.3, xend = 2, y = 0.84, yend = 0.84) +
  # geom_segment(x = 1.3, xend = 1.3, y = 0.84, yend = 0.82) +
  # geom_segment(x = 2, xend = 2, y = 0.84, yend = 0.82) +
  # annotate("text",
  #          x = 1.7, y = 0.88,
  #          label = pull(filter(synt_trans_membelikan_pairwise_binom, group1 == "monotransitive", group2 == "ditransitive"), p.adjt)) +
  # 
  # geom_segment(x = 1, xend = 3, y = 0.96, yend = 0.96) +
  # geom_segment(x = 1, xend = 1, y = 0.96, yend = 0.94) +
  # geom_segment(x = 3, xend = 3, y = 0.96, yend = 0.94) +
  # annotate("text",
  #          x = 2, y = 1,
  #          label = pull(filter(synt_trans_membelikan_pairwise_binom, group1 == "monotransitive", group2 == "intransitive"), p.adjt)) +
  # 
  # geom_segment(x = 2, xend = 3, y = 0.38, yend = 0.38) +
  # geom_segment(x = 2, xend = 2, y = 0.38, yend = 0.36) +
  # geom_segment(x = 3, xend = 3, y = 0.38, yend = 0.36) +
  # annotate("text",
  #          x = 2.5, y = 0.42,
  #          label = pull(filter(synt_trans_membelikan_pairwise_binom, group1 == "ditransitive", group2 == "intransitive"), p.adjt))

Figure 3: Syntactic transitivity of membelikan
Code
# get the base, "blue" ggplot2 colour using `scales` package
ggblue <- scales::hue_pal()(3)[3]

### 2.1 data preparation and binomial test for CI =====
cxn_type_synt_trans_membelikan <- cxn_type_membelikan %>% 
  filter(schema != "intransitive") %>% 
  mutate(schema = str_replace(schema, "theme", "GOODS"), 
         schema = str_replace(schema, "recipient", "BEN/REC"), 
         schema = str_replace(schema, "_cxn$", ""), 
         syntactic_transitivity = factor(syntactic_transitivity, 
                                         levels = c("monotransitive", "ditransitive")), 
         schema = factor(schema, levels = c("GOODS_obj", "BEN/REC_obj")),
         perc_schema = round(n/n_transitivity * 100, 2),
         binomtest = pmap(list(x = n, n = n_transitivity), 
                          binom.test, conf.level = 0.99), 
         conf_low = map_dbl(binomtest, list("conf.int", 1)), 
         conf_high = map_dbl(binomtest, list("conf.int", 2)), 
         estimate = map_dbl(binomtest, "estimate"), 
         pval = map_dbl(binomtest, "p.value")
         )

### 2.2 visualisation proper =======
cxn_type_synt_trans_membelikan %>% 
  ggplot(aes(x = syntactic_transitivity, 
           y = n, 
           fill = schema)) + 
  geom_col(position = position_dodge(.9), colour = "gray50") +
  geom_text(aes(label = paste("n=", n, sep = "")), 
            position = position_dodge(.9),
            vjust = c(9, -.35, 3, -.35),
            hjust = c(.5, .5, .5, .5),
            size = c(8, 7, 8, 7),
            colour = c("white", "black", "white", "black")) +
  theme_bw() +
  scale_fill_manual(values = c(ggred, ggblue)) +
  labs(y = "Raw frequency",
       fill = NULL,
       x = NULL) +
  # geom_errorbar(aes(ymin = conf_low, ymax = conf_high), 
  #              width = .2, position = position_dodge(.9)) +
  theme(axis.text.x = element_text(size = 22),
        axis.title.y = element_text(size = 20),
        axis.text.y = element_text(size = 11.5),
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 18),
        legend.position = "top")

Figure 4: Syntactic transitivity and construction types of membelikan

The applicative membelikan shows a similar profile to the base membeli in its predominantly monotransitive usage, especially in the GOODS-as-OBJ construction. There is no statistically significant difference in the frequency of this construction between membelikan (N=77) and membeli (N=87). The statistical test for this is shown in the code below.

Code
goods_obj_membelikan <- cxn_type_synt_trans_membelikan %>% filter(schema == "GOODS_obj", syntactic_transitivity == "monotransitive") %>% pull(n)
goods_obj_membeli <- cxn_type_membeli %>% slice_max(n = 1, order_by = n) %>% pull(n)

chisq.test(c(goods_obj_membelikan, goods_obj_membeli))

    Chi-squared test for given probabilities

data:  c(goods_obj_membelikan, goods_obj_membeli)
X-squared = 0.60976, df = 1, p-value = 0.4349
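The statistic above can be recomputed step by step, which shows what the goodness-of-fit test assumes. This is a sketch under the null hypothesis that both verbs share the tokens equally; `observed`, `expected`, `x_squared`, and `p_value` are illustrative names.

```r
# Goods-as-object counts for membelikan and membeli (from the text above)
observed <- c(membelikan = 77, membeli = 87)

# Under H0 both verbs share the tokens equally
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson statistic: sum of (O - E)^2 / E, with df = categories - 1
x_squared <- sum((observed - expected)^2 / expected)
p_value <- pchisq(x_squared, df = length(observed) - 1, lower.tail = FALSE)

round(x_squared, 5)
round(p_value, 4)
```

With an expected frequency of 82 per verb, each count deviates by only 5 tokens, which is why the statistic stays well below any conventional critical value.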
Code
membelikan_theme_object_clause_type <- count(filter(df_membelikan, syntactic_transitivity == "monotransitive", schema == "theme_obj_cxn"), schema, clause_type)

Moreover, membelikan in the Monotransitive, THEME-Object Construction occurs in main clauses (N=31) and subordinate clauses (N=46) without a statistically significant difference (pBinomial = 0.11).

4.1 Co-referentiality of Beneficiary in control construction

Code
coreferentiality_df <- df_membelikan %>% 
  filter(syntactic_transitivity == "monotransitive",
         str_detect(recipient_syntax, "MATRIX"))

4.2 Binomial test for Oblique vs. Double Object patterns for membelikan

Code
# 1. data preparation ======
oblique_membelikan_df <- df_membelikan %>% 
  filter(syntactic_transitivity == "monotransitive") %>% 
  count(recipient_syntax) %>% 
  mutate(syntx = if_else(str_detect(recipient_syntax, "^PP"), 
                         "PP", 
                         "others")) %>% 
  group_by(syntx) %>% 
  summarise(n=sum(n)) %>% 
  mutate(perc = n/sum(n) * 100)
oblique_membelikan <- oblique_membelikan_df %>% 
  filter(syntx == "PP") %>% pull(n)
ditrans_membelikan <- df_membelikan %>% 
  filter(syntactic_transitivity == "ditransitive") %>% 
  nrow()
# 2. binomial test
binom.test(c(oblique_membelikan, ditrans_membelikan))

    Exact binomial test

data:  c(oblique_membelikan, ditrans_membelikan)
number of successes = 25, number of trials = 45, p-value = 0.5515
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.3999735 0.7035561
sample estimates:
probability of success 
             0.5555556 

There is no significant distributional difference between the Monotransitive Oblique (N=25) and the Double Object (N=20) constructions (pBinomial = 0.55).

4.3 PRONOMINALITY of Beneficiary in the Oblique vs. Double Object patterns for membelikan

Below is the pronominality analysis. There is no significant difference (perhaps due to the small sample) between the Oblique and Double Object patterns in terms of the pronominality of the Beneficiary across three categories: NP, Pronoun, and Proper Name.

Code
benef_pron_mono <- df_membelikan %>% 
  filter(syntactic_transitivity %in% c("monotransitive"), 
         str_detect(recipient_syntax, "^PP")) %>% 
  count(recipient_pronominality) %>% 
  mutate(cxn = "monotransitive")

benef_pron_doubleobject <- df_membelikan %>% 
  filter(syntactic_transitivity %in% c("ditransitive")) %>% 
  count(recipient_pronominality) %>% 
  mutate(cxn = "ditransitive")

benef_pron <- bind_rows(benef_pron_mono, 
                        benef_pron_doubleobject) %>% 
  # merge personal-pronoun-suffix with personal-pronoun
  mutate(recipient_pronominality = str_replace(recipient_pronominality,
                                               "^personal\\-pronoun(\\-suffix)?$", 
                                               "pronoun")) %>% 
  group_by(cxn, recipient_pronominality) %>% 
  summarise(n = sum(n), .groups = "drop")

benef_pron_mtx <- benef_pron %>% 
  pivot_wider(names_from = "recipient_pronominality", values_from = "n") %>% 
  data.frame(row.names = 1) %>% 
  as.matrix()

benef_pron_mtx
               np pronoun proper.name
ditransitive    8       9           3
monotransitive 16       4           5
Code
fisher.test(benef_pron_mtx)

    Fisher's Exact Test for Count Data

data:  benef_pron_mtx
p-value = 0.1035
alternative hypothesis: two.sided

The lexical types of the Beneficiary/Recipient role of ditransitive membelikan are distributed as follows: 45% (n=9) pronouns, 40% (n=8) noun phrases referring to animate entities/humans, and 15% (n=3) proper names.
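The percentages above follow directly from the contingency table, as the sketch below illustrates by rebuilding the printed matrix by hand. The object names `benef_pron_tab`, `row_perc`, and `fisher_p` are introduced here and are not part of the notebook.

```r
# Rebuild the contingency table printed above
benef_pron_tab <- matrix(
  c(8, 9, 3,
    16, 4, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(cxn = c("ditransitive", "monotransitive"),
                  pronominality = c("np", "pronoun", "proper.name"))
)

# Row-wise percentages: share of each Beneficiary type per construction
row_perc <- round(prop.table(benef_pron_tab, margin = 1) * 100)

# Fisher's exact test on the full 2 x 3 table
fisher_p <- fisher.test(benef_pron_tab)$p.value

row_perc
fisher_p
```

Setting `margin = 1` normalises within each construction, so the ditransitive row reproduces the 40/45/15 split and the monotransitive row gives the comparable shares for the Oblique pattern.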

Next, the categories are merged into Pronoun vs. Non-Pronoun (proper name and NP). The result is again not significant.

Code
benef_pron_merge <- benef_pron %>% 
  mutate(recipient_pronominality = replace(recipient_pronominality,
                                           recipient_pronominality %in% c("np", "proper-name"),
                                           "non_pronoun")) %>% 
  group_by(cxn, recipient_pronominality) %>% 
  summarise(n = sum(n), .groups = "drop")

benef_pron_merge_mtx <- benef_pron_merge %>% 
  pivot_wider(names_from = "recipient_pronominality", values_from = "n") %>% 
  data.frame(row.names = 1) %>% 
  as.matrix()

benef_pron_merge_mtx
               non_pronoun pronoun
ditransitive            11       9
monotransitive          21       4
Code
chisq.test(benef_pron_merge_mtx) # assumption met for exp. frequency

    Pearson's Chi-squared test with Yates' continuity correction

data:  benef_pron_merge_mtx
X-squared = 3.2465, df = 1, p-value = 0.07157
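The comment on the expected-frequency assumption can be checked explicitly, and the continuity-corrected statistic recomputed by hand. This is a sketch that re-enters the printed 2 x 2 table; the object names are illustrative.

```r
# Rebuild the merged table printed above
benef_pron_merge_tab <- matrix(
  c(11, 9,
    21, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(cxn = c("ditransitive", "monotransitive"),
                  pronominality = c("non_pronoun", "pronoun"))
)

chi <- chisq.test(benef_pron_merge_tab)

# Expected frequencies under independence: row total * column total / N
expected <- outer(rowSums(benef_pron_merge_tab),
                  colSums(benef_pron_merge_tab)) / sum(benef_pron_merge_tab)
min_expected <- min(expected)

# Yates' correction subtracts 0.5 from each absolute deviation before squaring
yates <- sum((abs(benef_pron_merge_tab - expected) - 0.5)^2 / expected)

min_expected
yates
```

The smallest expected frequency is 20 * 13 / 45, about 5.8, so the usual rule of thumb (all expected frequencies above 5) is met, though only narrowly.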

4.4 ANIMACY of Beneficiary in the Oblique vs. Double Object patterns for membelikan

Code
# 1. data preparation ======
benef_anim_monotransitive_oblique <- df_membelikan %>%
  filter(syntactic_transitivity %in% c("monotransitive"),
         str_detect(recipient_syntax, "^PP")) %>% 
  count(recipient_animacy) %>% 
  mutate(cxn = "monotransitive_oblique")
benef_anim_ditransitive <- df_membelikan %>%
  filter(syntactic_transitivity %in% c("ditransitive")) %>% 
  count(recipient_animacy) %>% 
  mutate(cxn = "ditransitive")
benef_anim_combined <- bind_rows(benef_anim_ditransitive, benef_anim_monotransitive_oblique)
benef_anim_combined_mtx <- benef_anim_combined %>% 
  pivot_wider(names_from = "recipient_animacy", values_from = "n", values_fill = 0L) %>% 
  data.frame(row.names = 1) %>% 
  as.matrix()
benef_anim_combined_mtx
                       animate inanimate
ditransitive                20         0
monotransitive_oblique      24         1
Code
# 2. Fisher-Yates Exact Test ======
fisher.test(benef_anim_combined_mtx)

    Fisher's Exact Test for Count Data

data:  benef_anim_combined_mtx
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.02052812        Inf
sample estimates:
odds ratio 
       Inf 

The Oblique and Double Object patterns also do not differ significantly with respect to the animacy of the Beneficiary (Fisher's exact test, p = 1).

5 Analyses for dibelikan

Code
df_dibelikan1 <- df_dibelikan %>% 
  filter(str_detect(schema, "^null_", negate = TRUE)) %>% 
  mutate(schema = str_replace(schema, "recipient", "BEN/REC"),
         schema = str_replace(schema, "theme", "GOODS"),
         schema = str_replace(schema, "money", toupper("money")),
         schema = str_replace_all(schema, "^([^_]+)_([^_]+)$", "\\2_pass.\\1"),
         schema = factor(schema, 
                         levels = c("GOODS_pass.subj",
                                    "MONEY_pass.subj",
                                    "BEN/REC_pass.subj")))

cxn_type_dibelikan <- df_dibelikan1 %>% 
  count(schema) %>% 
  mutate(prop = n/sum(n), prop = round(prop, 2),
         N = sum(n))
Code
padjs1 <- 0.05/3
padjs2 <- 0.01/3
padjs3 <- 0.001/3

cxn_type_dibelikan1 <- cxn_type_dibelikan %>% 
  
  # run binomial test
  mutate(binomtest = pmap(list(x = n, n = N), binom.test, conf.level = 0.99), 
         
         # extract confidence interval
         conf_low = map_dbl(binomtest, list("conf.int", 1)), 
         conf_high = map_dbl(binomtest, list("conf.int", 2)),
         
         # extract the estimate
         estimate = map_dbl(binomtest, "estimate"),
         pval = map_dbl(binomtest, "p.value"),
         
         # p-value
         signifs = "ns",
         signifs = if_else(pval < padjs1, "*", signifs),
         signifs = if_else(pval < padjs2, "**", signifs),
         signifs = if_else(pval < padjs3, "***", signifs))

## pairwise binom
cxn_type_dibelikan_vector <- cxn_type_dibelikan$n
names(cxn_type_dibelikan_vector) <- cxn_type_dibelikan$schema
cxn_type_dibelikan_binom_pairwise <- pairwise_binom_test(cxn_type_dibelikan_vector, conf.level = .99, p.adjust.method = "bonferroni") %>% 
  mutate(p.adjt = paste(format(p.adj, digits = 3, scientific = TRUE), " (", p.adj.signif, ")", sep = ""))
Code
cxn_type_dibelikan1 %>% 
  ggplot(aes(x = fct_reorder(schema, -estimate), 
             y = estimate, 
             fill = schema)) + 
  geom_col(position = position_dodge(.9), colour = "gray50") +
  geom_text(aes(label = paste("n=", n, sep = "")), 
            position = position_dodge(.9),
            vjust = c(-.5, 5, 5),
            hjust = c(-.5, .5, .5),
            colour = c("black", "white", "white"),
            size = 7) +
  theme_bw() +
  # scale_fill_manual(values = c("limegreen", "gold")) +
  labs(y = "Proportion",
       fill = NULL,
       x = NULL) +
  theme(axis.text.x = element_text(size = 13),
        axis.text.y = element_text(size = 11.5),
        axis.title.y = element_text(size = 20),
        legend.position = "none") +
  geom_errorbar(aes(ymin = conf_low, ymax = conf_high), 
                width = .2, position = position_dodge(.9)) +
  ylim(NA, 0.95) +
  geom_segment(x = 1, xend = 2, y = 0.78, yend = 0.78) +
  geom_segment(x = 1, xend = 1, y = 0.78, yend = 0.74) +
  geom_segment(x = 2, xend = 2, y = 0.78, yend = 0.74) +
  annotate("text",
           x = 1.5, y = 0.82,
           label = cxn_type_dibelikan_binom_pairwise[3,][["p.adj.signif"]]) +

  geom_segment(x = 1, xend = 3, y = 0.9, yend = 0.9) +
  geom_segment(x = 1, xend = 1, y = 0.9, yend = 0.86) +
  geom_segment(x = 3, xend = 3, y = 0.9, yend = 0.86) +
  annotate("text",
           x = 2, y = 0.94,
           label = cxn_type_dibelikan_binom_pairwise[2,][["p.adj.signif"]]) +
  
  geom_segment(x = 2, xend = 3, y = 0.58, yend = 0.58) +
  geom_segment(x = 2, xend = 2, y = 0.58, yend = 0.54) +
  geom_segment(x = 3, xend = 3, y = 0.58, yend = 0.54) +
  annotate("text",
           x = 2.5, y = 0.62,
           label = cxn_type_dibelikan_binom_pairwise[1,][["p.adj.signif"]])

Figure 5: Constructional profiles of dibelikan

References

Aznar, Jocelyn, Christian Döhler & Jozina Vander Klok (eds.). 2023. Applicatives in Austronesian Languages (NUSA: Linguistic Studies of Languages in and Around Indonesia). Vol. 74.
Gries, Stefan Th. 2013. Statistics for linguistics with R: A practical introduction. 2nd edn. Berlin: Mouton de Gruyter.
Rajeg, Gede Primahadi Wijaya & I Wayan Arka. 2023. Usage-based perspective on argument realisation: A corpus study of Indonesian BUY verbs in applicative construction with -kan. In Jocelyn Aznar, Christian Döhler & Jozina Vander Klok (eds.), NUSA (special issue on "Applicatives in Austronesian Languages"), vol. 74, 83–114. https://tufs.repo.nii.ac.jp/records/2000019.

Citation

For attribution, please cite this work as:
Rajeg, Gede Primahadi Wijaya & I Wayan Arka. 2023. R Notebook for “Usage-based perspective on argument realisation: A corpus study of Indonesian BUY verbs in applicative construction with -KAN.” https://doi.org/10.6084/m9.figshare.23612631. https://gederajeg.github.io/applicative-buy/.