Bayesian analysis, speech intelligibility, bounded outcomes, clustering, measurement error, outliers, heteroscedasticity, generalized linear latent and mixed models, robust regression models.
The purpose of this Additional Notebook is to improve the transparency and replicability of the analysis for the study Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores. A digital version of the study can be found in the Code-links
section of this document.
Furthermore, the notebook meticulously follows the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist) developed by Depaoli & van de Schoot (2017). The checklist outlines the ten crucial points that need careful scrutiny when employing Bayesian inference procedures.
WAMBS checklist
Questionnaire outlining the ten crucial points that need careful scrutiny when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of Bayesian analysis (Depaoli & van de Schoot, 2017).
# load packages
libraries = c('RColorBrewer','stringr','dplyr','tidyverse',
              'reshape2','knitr','kableExtra',
              'rstan','StanHeaders','runjags','rethinking')
sapply(libraries, require, character.only=T)
# load functions
main_dir = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/paper1'
source( file.path( main_dir, 'walkthrough/code', 'user-defined-functions.R') )
In this notebook, Section 3 introduces various background topics that are relevant to the present study. These topics enable readers to progress smoothly through this research. Specifically, Section 3.1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. Section 3.2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role in modeling bounded data. Section 3.3 explains (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. Section 3.4 illustrates the concept of measurement error and the role of latent variables in overcoming the problems arising from it. Lastly, Section 3.5 explains the effects of distributional departures in the data on the parameter estimates, and their importance for this research.
The R packages utilized in the production of this document can be divided into three groups. First, the packages utilized to generate the walk-through: RColorBrewer (Neuwirth, 2022) and quarto (Allaire, Teague, Scheidegger, Xie, & Dervieux, 2022). Second, the packages used for handling the data: stringr (H. Wickham, 2022), dplyr (Hadley Wickham, François, Henry, Müller, & Vaughan, 2023), tidyverse (Hadley Wickham et al., 2019), reshape2 (H. Wickham, 2007), knitr (Xie, 2024), and kableExtra (Zhu, 2024). Lastly, the packages used for the Bayesian estimation: coda (Plummer, Best, Cowles, & Vines, 2006), loo (A. Vehtari et al., 2023; A. Vehtari, Simpson, Gelman, Yao, & Gabry, 2021), cmdstanr (Gabry & Češnovar, 2022), rstan (Stan Development Team, 2020), runjags (Denwood, 2016), and rethinking (McElreath, 2021).
Bayesian inference is an approach to statistical modeling and inference that is primarily based on Bayes’ theorem. The procedure consists of defining the model assumptions in the form of a likelihood for the outcome and a set of prior distributions for the parameters of interest. Upon observing empirical data, these priors are updated to posterior distributions following Bayes’ rule (Jeffreys, 1998), from which the statistical inferences are derived 1. As an example, a simple linear regression model with a parameter \beta can be encoded under the Bayesian inference paradigm in the following form:
Bayesian inference
Approach to statistical modeling and inference that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence (Everitt & Skrondal, 2010).
\begin{align*} P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) } \end{align*} \tag{1}
where P( Y| \beta, X ) defines the likelihood of the outcome, which represents the assumed probability distribution for the outcome Y, given the parameter \beta and covariate X. P( \beta ) defines the prior distribution of the parameter \beta. Lastly, P( Y ) defines the probability distribution of the data, which represents the evidence of the observed empirical data. Consequently, the posterior distribution of the parameter P( \beta | Y, X ) describes the probability distribution of \beta after observing empirical data.
Likelihood
Probability distribution that describes the assumption about the underlying processes that give rise to the data (Everitt & Skrondal, 2010).
Prior distribution
Probability distribution summarizing the information about a parameter known or assumed before observing any empirical data (Everitt & Skrondal, 2010).
Posterior distribution
Probability distribution summarizing the information about a parameter after observing empirical data (Everitt & Skrondal, 2010).
Before implementing Bayesian inference procedures, two important concepts related to Equation 1 need to be understood. First, the evidence of the empirical data P(Y) serves as a normalizing constant. This means that the numerator in the equation is re-scaled by a constant obtained from calculating P(Y). Consequently, without loss of generality, the equation can be succinctly rewritten in the following form:
\begin{align*} P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\ \end{align*} \tag{2}
where \propto denotes the proportional symbol. This implies that the posterior distribution of \beta is proportional (up to a constant) to the multiplication of the outcome’s likelihood and the parameter’s prior distribution. This definition makes the calculation of posterior distributions easier, by separating the parameter’s updating process from the integration of new empirical data (this will be clearly seen in the code provided in Section 3.1.3).
Second, a dataset usually has multiple observations of the outcome Y and covariates X, in the form of y_{i} and x_{i}. Therefore, by the laws of probability, and assuming independence among the observations, the likelihood of a dataset can be rewritten as the product of all individual likelihoods. Consequently, Equation 2 can also be rewritten as follows:
\begin{align*} P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta ) \end{align*} \tag{3}
Several methods within the Bayesian inference procedures can be utilized to estimate the posterior distribution of the parameter, and most of these fall into the category of Markov Chain Monte Carlo methods (MCMC). However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the grid approximation method. This method is used in Section 3.1.3 to illustrate how the Bayesian inference works.
Markov Chain Monte Carlo (MCMC)
Methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt & Skrondal, 2010) 2.
Grid approximation
Method to indirectly simulate random observations from low dimensional continuous probability distributions, by considering a finite candidate list of parameter values (McElreath, 2020) 3.
A simple Bayesian linear regression model can be written in the following form:
\begin{align*} y_{i} &= \beta \cdot x_{i} + e_{i} \\ e_{i} &\sim \text{Normal}( 0, 1 ) \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} where y_{i} denotes the outcome’s observation i, \beta the expected effect of the observed covariate x_{i} on the outcome, and e_{i} the outcome’s residual in observation i. Furthermore, the residuals e_{i} are assumed to follow a normal distribution with mean zero and standard deviation equal to one. Lastly, prior to observing any data, it is assumed that \beta is uniformly distributed within the range of [-20,+20]. This prior implies that equal probabilities are assigned to all values of \beta within this range. However, a more convenient and general manner to represent the same linear regression model is as follows:
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} In this definition, the components of the Bayesian inference procedure detailed in Section 3.1.1 are more easily spotted. First, regarding the likelihood, the outcome is assumed to be normally distributed with mean \mu_{i} and standard deviation equal to one. Second, it is assumed that \beta has a uniform prior within the range of [-20,+20]. Moreover, the equations reveal that the mean of the outcome \mu_{i} is modeled by a linear predictor composed of the covariate x_{i} and its effect on the outcome \beta.
For illustration purposes, a simulated regression with n=100 observations was generated assuming \beta=0.2. Figure 1 shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of \beta. Two noteworthy results emerge from the approach. Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with n=100 observations, the influence of the prior on the posterior distribution of \beta is negligible. Specifically, prior to observing any data, assuming that \beta could take any value within the range of [-20,+20] with equal probability (left panel of Figure 2) did not have a substantial impact on the distribution of \beta after empirical data was observed (right panel of Figure 2).
set.seed(12345)

n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y    = rnorm( n=n, mean=mu_y, sd=1 )
# grid approximation
Ngp = 1000

# candidate values for beta over the grid
b_cand = seq( from=-20, to=20, length.out=Ngp )

# mean of the outcome implied by each candidate value
udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

# likelihood of the data for each candidate value
udf   = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

# uniform prior over [-20, +20]
b_prior = rep( 1/40, length(b_cand) )

# posterior = likelihood * prior, normalized over the grid
b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.299"
[1] "estimated beta (maximum probability) = 0.3"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-3,3),
pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))
Prior to observing empirical data, assuming the parameter could take any value within the range of [-20,+20] with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept is illustrated in Figure 3 through Figure 5, where three different types of priors are used to encode three levels of uncertainty about the parameter \beta.
# grid approximation
Ngp = 1000

post = data.frame( b_cand=seq( from=-20, to=20, length.out=Ngp ) )

ud_func = function(i){ post$b_cand[i]*x }
mu_y    = sapply( 1:length(post$b_cand), ud_func )

ud_func = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik   = sapply( 1:length(post$b_cand), ud_func )

post$b_prior1 = rep( 1/40, length(post$b_cand) )
post$b_prior2 = dnorm( post$b_cand, mean=0, sd=0.5 )
post$b_prior3 = dnorm( post$b_cand, mean=0.2, sd=0.05 )

nam = c()
for( i in 1:3 ){
  b_prop = y_lik * post[, paste0('b_prior',i) ]
  nam    = c(nam, paste0('b_post',i) )
  post   = cbind(post, data.frame( b_prop / sum(b_prop) ) )
}
names(post)[5:7] = nam
First, the distribution depicted in Figure 3 assumes \beta \sim \text{Uniform}(-20, +20) (similar to what is observed in Section 3.1.3). The distribution does not favor any particular range of values for \beta within [-20, +20]. This type of distribution is commonly referred to as a non-informative prior.
Non-informative priors
Prior that reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space (Everitt & Skrondal, 2010).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior1')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post1')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
Second, the distribution described in Figure 4 assumes \beta \sim \text{Normal}(0, 0.5). Consequently, the effect of \beta is more probable within the range [-1,+1], with less probability associated with parameter values outside this range. This is an example of a weakly-informative prior distribution.
Weakly informative priors
Prior that reflects the distributional commitment of a parameter to a weakly constrained range of values within a realistic parameter space (McElreath, 2020).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior2')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post2')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
Third, the distribution described in Figure 5 assumes \beta \sim \text{Normal}(0.2, 0.05). As a result, the effect of \beta is more probable within the range [0.1,0.3], with less probability associated with parameter values outside this range. This is an example of an informative prior distribution.
Informative priors
Prior distribution that expresses specific and definite information about a parameter (McElreath, 2020).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior3')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post3')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
Lastly, regarding the influence of different priors on the posterior distributions, Figure 3 and Figure 4 reveal that non-informative and weakly-informative priors have a negligible influence on the posterior distribution. Both priors result in similar posteriors. Furthermore, the figures show that a data sample size of n=100 is still not enough to provide an unbiased and precise estimation of the true effect. In contrast, Figure 5 shows that informative priors can have a meaningful influence on the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. This result shows that when the data sample size is not sufficiently large, the prior assumptions can play a significant role in obtaining appropriate parameter estimates.
In cases requiring greater modeling flexibility, a more refined representation of the parameters’ priors can be defined in terms of hyperparameters and hyperpriors. A simple example of the use of hyperpriors would be to define the regression model shown in Section 3.1.3 in the following form:
Hyperparameters
Parameters \theta_{2} that index a family of possible prior distributions for another parameter \theta_{1} (Everitt & Skrondal, 2010).
Hyperpriors
Prior distributions for hyperparameters (Everitt & Skrondal, 2010).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\ v &\sim \text{Normal}(0, 3) \end{align*} where v defines the hyperparameter for the parameter \beta, and its associated distribution defines its hyperprior.
However, setting prior distributions through hyperparameters brings its own challenges. One notable challenge pertains to the geometry of the parameter’s sample space. This implies that prior probabilistic representations defined in terms of hyperparameters sometimes exhibit complex sample geometries compared to simple priors 4. The reparametrization of hyperpriors into such simpler sample geometries leads to the notion of non-centered priors. By incorporating non-centered priors, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:
Non-centered priors
Expression of a parameter’s distribution in terms of a hyperparameter defined by a transformation of the original parameter of interest (Gorinova et al., 2019).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &= z \cdot \text{exp}(v) \\ v &\sim \text{Normal}(0, 3) \\ z &\sim \text{Normal}( 0, 1 ) \end{align*} where z is a hyperparameter sampled independently from v, and the parameter of interest \beta is obtained as a transformation of the two hyperparameters. Figure 6 illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel, and as a result, Bayesian inference procedures can have a harder time sampling from the former than the latter distributions.
n = 5000

v = rnorm( n=n, mean=0, sd=1 )
z = rnorm( n=n, mean=0, sd=1 )

b_cent = rnorm( n=n, mean=0, sd=exp(v) )
b_non  = z*exp(v)
par( mfrow=c(1,2) )
plot( b_cent, v, pch=19, col=rgb(0,0,0,alpha=0.1),
xlab=expression(beta), ylab=expression(v),
main='Centered parametrization' )
plot( z, v, pch=19, col=rgb(0,0,0,alpha=0.1),
xlab=expression(z), ylab=expression(v),
main='Non-centered parametrization' )
par( mfrow=c(1,1) )
The selection of the Bayesian approach was based on three key properties. Firstly, empirical evidence from prior research demonstrates that Bayesian methods outperform frequentist methods, particularly in handling complex and over-parameterized models (Baker, 1998; Kim & Cohen, 1999). This superiority is evident when dealing with complex models, like the proposed GLLAMM, that are challenging to program or are not viable under frequentist methods (Depaoli, 2014).
Benefits of Bayesian inference procedures
More suitable to deal with:
- complex and over-parameterized models
- the incorporation of prior information
- relatively small sample sizes
Secondly, the approach allows for the incorporation of prior information, ensuring that certain parameters are confined within specified boundaries. This helps mitigate non-convergence or improper parameter estimation issues commonly observed in complex models under frequentist methods (Martin & McDonald, 1975; Seaman III, Seaman Jr., & Stamey, 2011). In this study, for example, this property was leveraged to incorporate information about the variances of random effects and constrain them to be positive.
Lastly, the Bayesian approach demonstrates proficiency in handling relatively small sample sizes (Baldwin & Fellingham, 2013; Depaoli, 2014; Lambert, Sutton, Burton, Abrams, & Jones, 2006). In this case, despite the study dealing with 2,263 entropy scores, these were derived from a modest sample size of 32 speakers, from whom the inferences are drawn. Consequently, reliance on the asymptotic properties of frequentist methods may not be warranted in this context, underscoring the pertinence of this property to the current study.
A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line \left( y_{i} \in [-\infty, \infty] \right). The distribution is characterized by two independent parameters: the mean \mu and the standard deviation \sigma (Everitt & Skrondal, 2010). Thus, a random variable can take on values that are gathered around a mean \mu, with some values dispersed based on some amount of deviation \sigma, without any restriction. Importantly, by definition of the normal distribution, the location (mean) of the distribution does not influence its spread (deviation).
Figure 7 illustrates how the distribution of an outcome changes with different values of \mu and \sigma. The left panel demonstrates that the distribution of the outcome can shift in terms of its location based on the value of \mu. The right panel shows how the distribution of the outcome can become narrower or wider based on the values of \sigma. It is noteworthy that alterations in the mean \mu of the distribution have no impact on its standard deviation \sigma.
require(rethinking)

mu    = c(-1.5, 0, 1.5)
sigma = c(1.5, 1, 0.5)

par(mfrow=c(1,2))

cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
  if(i==1){
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==-1.5,
                              mu[2]==0,
                              mu[3]==+1.5,
                              sigma==1 ) )
  } else{
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}

cp = sapply( 1:length(sigma), col.alpha, alpha=0.7)
for(i in 1:length(sigma)){
  if(i==1){
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( sigma[1]==1.5,
                              sigma[2]==1,
                              sigma[3]==0.5,
                              mu==0 ) )
  } else{
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one \left( y_{i} \in [0, 1] \right). The distribution is characterized by two parameters: the mean \mu and the sample size M (Everitt & Skrondal, 2010). This implies that a random variable can take on values restricted within the unit interval, centered around a mean \mu, with values being more or less dispersed based on the sample size M. Additionally, two characteristics define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval (\mu \in [0,1]). Secondly, the mean and sample size parameters are no longer independent of each other.
Figure 8 illustrates how an outcome with a beta-proportion distribution changes with different values of \mu and M. The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.
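For readers more familiar with the standard two-shape parametrization of the beta distribution, the short sketch below shows how the (mean, sample size) pair maps onto the usual shape parameters; this mirrors what the dbeta2() function from the rethinking package (used in the plotting code that follows) computes internally, and the specific values passed to the functions are arbitrary illustrations.

# beta-proportion density written with the standard beta shapes:
# shape1 = mu*M and shape2 = (1-mu)*M
dbetaprop = function(x, mu, M){ dbeta( x, shape1=mu*M, shape2=(1-mu)*M ) }

# both calls should return the same density values (illustrative inputs)
dbetaprop( c(0.1, 0.5, 0.9), mu=0.3, M=10 )
rethinking::dbeta2( c(0.1, 0.5, 0.9), prob=0.3, theta=10 )

The plotting code below uses dbeta2() directly with this (mean, sample size) parametrization.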
require(rethinking)

mu = c(0.2, 0.5, 0.8)
M  = c(2, 5, 20)

par(mfrow=c(1,2))

cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
  if(i==1){
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==0.2,
                              mu[2]==0.5,
                              mu[3]==0.8,
                              M==10 ) )
  } else{
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}

cp = sapply( 1:length(M), col.alpha, alpha=0.7)
for(i in 1:length(M)){
  if(i==1){
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0.3, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( M[1]==2,
                              M[2]==5,
                              M[3]==20,
                              mu==0.3 ) )
  } else{
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
It is crucial to comprehend what it means for an outcome to follow a normal distribution, as the assumption of normally distributed outcomes is ubiquitous in speech intelligibility research (see Boonen, Kloots, Nurzia, & Gillis, 2021; Flipsen, 2006; Lagerberg, Asberg, Hartelius, & Persson, 2014).
Boundedness
Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022)
Underfitting
Occurs when statistical models fail to capture the underlying data patterns, potentially causing the generation of predictions outside the data range and hindering the model’s ability to generalize its results when confronted with new data (Everitt & Skrondal, 2010).
Misspecification
Occurs when the model’s functional form or choice of covariates poorly represents relevant aspects of the true data. This can lead to inconsistent and inefficient parameter estimates (Everitt & Skrondal, 2010).
In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally distributed, bounded outcomes, such as the entropy scores utilized in this study. Neglecting the bounded nature of an outcome can lead, at best, to underfitting, and, at worst, to misspecification.
A commonly known Bayesian probabilistic representation of an ordinary linear mixed model (LMM) can be expressed as follows:
Ordinary linear mixed model (LMM)
Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes, Bolin, & Kelley, 2019).
\begin{align*} y_{ib} &= \beta x_{i} + a_{b} + \varepsilon_{ib} \\ \varepsilon_{ib} &\sim \text{Normal}(0, 1) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \end{align*}
where y_{ib} denotes the outcome’s i’th observation clustered in block b, and x_{i} denotes the covariate for observation i. Moreover, \beta denotes the fixed slope of the regression, a_{b} denotes the random effects, and \varepsilon_{ib} defines the random outcome residuals. The residuals \varepsilon_{ib} are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, \beta is assumed to be normally distributed with mean zero and standard deviation equal to 0.5. Similarly, a_{b} is assumed to be normally distributed with mean zero and standard deviation equal to one.
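To make the data-generating process implied by this representation concrete, the short simulation below is a minimal sketch (with arbitrary cluster counts and a hypothetical seed, not part of the study’s analysis) that draws clustered observations from the LMM above:

set.seed(2024)                           # hypothetical seed for the sketch
n_blocks = 5                             # illustrative number of blocks
n_obs    = 20                            # illustrative observations per block

beta = rnorm( 1, mean=0, sd=0.5 )        # fixed slope drawn from its prior
a_b  = rnorm( n_blocks, mean=0, sd=1 )   # block random effects

block = rep( 1:n_blocks, each=n_obs )
x     = rnorm( n_blocks*n_obs, mean=0, sd=1 )
y     = rnorm( n_blocks*n_obs, mean=beta*x + a_b[block], sd=1 )

Observations sharing a block share the same random effect a_{b}, which is precisely what induces the within-cluster dependence the model accounts for.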
A generalized linear mixed model (GLMM) is a generalization of LMM (Lee & Nelder, 1996). Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of GLMM, as follows:
Generalized linear mixed model (GLMM)
Procedure employed to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee & Nelder, 1996).
\begin{align*} y_{ib} &\sim \text{Normal}( \mu_{ib}, 1) \\ \mu_{ib} &= \beta x_{i} + a_{b} \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}
Notice this representation explicitly highlights the three components of a GLMM: the likelihood component, the linear predictor, and the link function (McElreath, 2020). The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean \mu_{ib} and standard deviation equal to one. The linear predictor specifies the manner in which the covariate will predict the mean of the outcome. In this case the linear predictor is a linear combination of the parameter \beta, the covariate x_{i}, and the random effects a_{b}. The link function specifies the relationship between the mean of the outcome \mu_{ib} and the linear predictor. In this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to Section 3.2.1). Lastly, resulting from the use of Bayesian procedures, a fourth component can be added to any GLMM: the prior distributions. The priors describe what is known about the parameters \beta and a_{b} before observing any empirical data.
GLMM components
On the other hand, a Beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:
\begin{align*} y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\ \mu_{ib} &= \text{logit}^{-1}( \beta x_{i} + a_{b} ) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}
Notice the representation also highlights the three components of a GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome with mean \mu_{ib} and sample size equal to 10. The linear predictor is still a linear combination of the parameter \beta, the covariate x_{i}, and the random intercepts a_{b}. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by an inverse-logit function: \text{logit}^{-1}(x) = exp(x) / (1+exp(x)). The inverse-logit function allows the linear predictor to match the range of the mean of the beta-proportion distribution \mu_{ib} \in [0,1] (refer to Section 3.2.2). Lastly, as the additional fourth component resulting from the use of Bayesian procedures, the prior assumptions for \beta and a_{b} are also declared.
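Analogously, the sketch below (again purely illustrative, with arbitrary settings and the rbeta2() parametrization from the rethinking package) simulates bounded outcomes from the beta-proportion GLMM, making the role of the inverse-logit link explicit:

require(rethinking)                      # provides rbeta2() for the (mu, M) parametrization

set.seed(2024)                           # hypothetical seed for the sketch
n_blocks = 5
n_obs    = 20

beta = rnorm( 1, mean=0, sd=0.5 )        # fixed slope drawn from its prior
a_b  = rnorm( n_blocks, mean=0, sd=1 )   # block random effects

block = rep( 1:n_blocks, each=n_obs )
x     = rnorm( n_blocks*n_obs, mean=0, sd=1 )

inv_logit = function(v){ exp(v) / ( 1 + exp(v) ) }    # maps the linear predictor into [0,1]
mu = inv_logit( beta*x + a_b[block] )
y  = rbeta2( n_blocks*n_obs, prob=mu, theta=10 )      # bounded outcome within (0,1)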
Understanding LMM is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field (see Boonen et al., 2021; Flipsen, 2006; Lagerberg et al., 2014). Furthermore, their significance also lies in their ability to model clustered outcomes. Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish statistical power or increase the likelihood of committing a type I error.
Clustering
It occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020).
Statistical power
The model’s ability to reject the null hypothesis when it is false (Everitt & Skrondal, 2010).
Type I error
The error that results when a null hypothesis is erroneously rejected (Everitt & Skrondal, 2010).
Moreover, the significance of GLMM lies in offering the same benefits as the LMMs, in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to Section 3.2.3 to understand the importance of considering the bounded nature of the data in the modeling process.
The problem with measurement error in an outcome is easier to understand with a motivating example. Using a similar model as the one depicted in Section 3.1.3, the probabilistic representation of measurement error in the outcome can be depicted as follows:
Measurement error
It refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed true value which is not directly observable (Everitt & Skrondal, 2010).
\begin{align*} \tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\ y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation effectively means that a manifest outcome \tilde{y}_{i} is assumed to be normally distributed with a mean equal to the latent outcome y_{i} and a measurement error s. The latent outcome y_{i} is also assumed to be normally distributed but with a mean \mu_{i} and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate x_{i} and its expected effect \beta. Lastly, prior to observing any data, \beta is assumed to follow a uniform distribution within the range of [-20, +20], representing a non-informative prior.
For illustrative purposes, a simulated outcome with n=100 observations was generated, assuming \beta=0.2, and a measurement error of s=2. Figure 9 shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the manifest outcome has a larger spread than the latent outcome depicted in the right panel. As a result, although \beta is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected due to this larger variability.
The estimation output confirms the previous hypothesis. The posterior distribution of \beta, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see Figure 10 and code output below). Furthermore, the code output shows the parameter’s posterior distribution can no longer reject the null hypothesis at confidence levels of 90\% and 95\%, indicating a reduced statistical power.
set.seed(12345)

n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y    = rnorm( n=n, mean=mu_y, sd=1 )

s       = 2
y_tilde = rnorm( n=n, mean=y, sd=s )
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf       = function(i){ prod( dnorm( y_tilde, mean=mu_y[,i], sd=s ) ) }
y_lik_man = sapply( 1:length(b_cand), udf )

udf       = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik_lat = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop_man = y_lik_man * b_prior
b_post_man = b_prop_man / sum(b_prop_man)

b_prop_lat = y_lik_lat * b_prior
b_post_lat = b_prop_lat / sum(b_prop_lat)
paste0( 'true beta = ', b )

# manifest outcome
b_exp_man = sum( b_cand * b_post_man )
paste0( 'estimated beta (expectation on manifest) = ',
        round(b_exp_man, 3) )
b_var_man = sqrt( sum( ( (b_cand-b_exp_man)^2 ) * b_post_man ) )
paste0( 'estimated beta (standard deviation on manifest) = ',
        round(b_var_man, 3) )

# latent outcome
b_exp_lat = sum( b_cand * b_post_lat )
paste0( 'estimated beta (expectation on latent) = ',
        round(b_exp_lat, 3) )
b_var_lat = sqrt( sum( ( (b_cand-b_exp_lat)^2 ) * b_post_lat ) )
paste0( 'estimated beta (standard deviation on latent) = ',
        round(b_var_lat, 3) )

# null hypothesis rejection
b_prob_man = sum( b_post_man[ b_cand > 0 ] )
paste0( 'P(estimated beta on manifest > 0) = ',
        round(b_prob_man, 3) )
b_prob_lat = sum( b_post_lat[ b_cand > 0 ] )
paste0( 'P(estimated beta on latent > 0) = ',
        round(b_prob_lat, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation on manifest) = 0.212"
[1] "estimated beta (standard deviation on manifest) = 0.176"
[1] "estimated beta (expectation on latent) = 0.299"
[1] "estimated beta (standard deviation on latent) = 0.088"
[1] "P(estimated beta on manifest > 0) = 0.887"
[1] "P(estimated beta on latent > 0) = 1"
par( mfrow=c(1,2) )
plot( x, y_tilde, xlim=c(-3,3), ylim=c(-7,7),
pch=19, col=rgb(0,0,0,alpha=0.3),
ylab=expression(tilde(y)),
main='manifest outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_man, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( x, y, xlim=c(-3,3), ylim=c(-7,7),
pch=19, col=rgb(0,0,0,alpha=0.3),
main='latent outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_lat, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par( mfrow=c(1,1) )
par(mfrow=c(1,2))
plot( b_cand, b_post_man, type='l', xlim=c(-0.5,1),
main='Posterior on manifest outcome',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_man), lty=2, col=c('gray', 'blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( b_cand, b_post_lat, type='l', xlim=c(-0.5,1),
main='Posterior on latent outcome',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_lat), lty=2, col=c('gray', 'blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))
Latent variables can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. Latent variables can be interpreted as hypothetical constructs, traits, or true variables that account for the variability that induces dependence in one or more manifest variables (Rabe-Hesketh, Skrondal, & Pickles, 2004). This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes (Rabe-Hesketh et al., 2004) (refer to Section 3.3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).
Latent variables
Variables that cannot be measured directly but are assumed to be principally responsible for the common variability in one or more manifest variables (Everitt & Skrondal, 2010).
Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model (Luce, 1959; Thurstone, 1927) are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships (Hoyle, 2014). While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part (Rabe-Hesketh et al., 2004).
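To give a flavor of the measurement part, the toy simulation below (a hedged sketch with made-up values, unrelated to the study’s data) generates three noisy manifest indicators of one latent trait; the indicators correlate only because they share the latent variable, which is precisely the common variability that latent variable models exploit:

set.seed(2024)                     # hypothetical seed for the sketch
n  = 200
SI = rnorm( n, mean=0, sd=1 )      # latent trait (e.g., a speaker's potential intelligibility)

# three manifest indicators = latent trait + independent measurement error
m1 = rnorm( n, mean=SI, sd=0.8 )
m2 = rnorm( n, mean=SI, sd=0.8 )
m3 = rnorm( n, mean=SI, sd=0.8 )

# correlations among the indicators are driven entirely by the shared latent trait
round( cor( cbind(m1, m2, m3, SI) ), 2 )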
It becomes evident that when an outcome is measured with error, the estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a type II error.
Type II error
The error that results when a null hypothesis is erroneously accepted (Everitt & Skrondal, 2010).
Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker’s (latent) potential intelligibility contributes, in part, to the observed variability in the speaker’s (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.
In the context of regression analysis, heteroscedasticity occurs when the variance of an outcome depends on the values of another variable (Everitt & Skrondal, 2010). The opposite case is called homoscedasticity. An example of heteroscedasticity can be probabilistically represented as follows:
Heteroscedasticity
Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called homoscedasticity (Everitt & Skrondal, 2010).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \sigma_{i} &= exp( \gamma \cdot x_{i} ) \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \gamma &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation implies that an outcome y_{i} is assumed to be normally distributed with mean \mu_{i} and standard deviation \sigma_{i}. Furthermore, the mean and standard deviation of the outcome are explained by the covariate x_{i}, through the parameters \beta and \gamma. Lastly, prior to observing any data, \beta and \gamma are assumed to be uniformly distributed in the range of [-20,+20].
Figure 11 illustrates the presence of heteroscedasticity using the previous representation, assuming a sample size of n=100, and parameters \beta=0.2 and \gamma=1. Notice the variability of the outcome increases as the covariate increases. Consequently, it is easy to intuit that this difference in the outcome’s variability could have an impact on the statistical hypothesis tests of \beta, and even on the estimate itself. To test this intuition, an incorrect model is used to estimate \beta.
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \end{align*} As a result, the intuition is confirmed. When an outcome is erroneously assumed to be homoscedastic, the parameter estimates not only become inefficient but are also estimated farther from the true value, as seen in the code output below and in Figure 12.
set.seed(12345)

n = 100

b = 0.2
g = 1

x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
s_y  = exp(g*x)

y = rnorm( n=n, mean=mu_y, sd=s_y )
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf   = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.638"
[1] "estimated beta (maximum probability) = 0.621"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
abline( a=-4, b=-1, lty=2, col=rgb(0,0,0,alpha=0.3))
abline( a=4.4, b=1.5, lty=2, col=rgb(0,0,0,alpha=0.3))
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,2) )
par(mfrow=c(1,1))
Outliers can occur by chance in any distribution, but they typically indicate novel behavior or structures in a dataset. For instance, outliers may arise due to measurement error, a heavy-tailed distribution within a population, or when some sampled observations come from a different population than intended (Grubbs, 1969). Additionally, outliers may emerge when the assumed (causal) generating mechanisms of the data differ from the observed data at the extreme end.
Outliers
An observation that appears to deviate markedly from the other observations in the sample in which it occurs (Grubbs, 1969).
Although outliers do not have a unique probabilistic representation, a simple example is illustrated in Figure 13. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easy to intuit that, in the presence of influential observations, the parameter estimates, and the hypothesis tests resulting from them, can be affected. The intuition is proven correct when \beta is estimated using the same “incorrect” model used in Section 3.5.1. When an outcome is erroneously assumed to be free of outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in Figure 14.
set.seed(12345)

n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y    = rnorm( n=n, mean=mu_y, sd=1 )

# inflate three observations with large covariate values to create outliers
idx = which( x>1 )
sel = 1:3
y[idx[sel]] = 6
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf   = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.477"
[1] "estimated beta (maximum probability) = 0.46"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
pch=19, col=rgb(0,0,0,alpha=0.3) )
points( x[idx[sel]], y[idx[sel]],
pch=19, col=rgb(0,0,1,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,2) )
par(mfrow=c(1,1))
As recommended by McElreath (2020), robust models can be used to deal with these types of distributional departures. The procedure consists of modifying the statistical models to include traits that effectively make them robust to small departures from the distributional assumption, such as heteroscedastic errors or the presence of outliers.
Robust models
A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumptions of a model (Everitt & Skrondal, 2010).
It is known that dealing with heteroscedasticity and the identification of outliers through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately bias the parameter estimates, and even make them inefficient (McElreath, 2020).
Bias
It refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt & Skrondal, 2010).
Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers. Therefore, it is necessary to consider using robust regression models to address these distributional departures and generate unbiased parameter estimates.
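As a concrete and deliberately minimal illustration of this idea, the sketch below reuses the objects from the outlier example in Section 3.5.2 (y, mu_y, b_cand, and b_prior) and simply swaps the normal likelihood for a heavier-tailed Student-t likelihood; the degrees-of-freedom value is an arbitrary choice for illustration, not the specification used in the study’s models.

# robust alternative: Student-t likelihood with location mu_y and heavy tails
nu  = 2                                                  # illustrative degrees of freedom
udf = function(i){ prod( dt( y - mu_y[,i], df=nu ) ) }
y_lik_rob = sapply( 1:length(b_cand), udf )

b_prop_rob = y_lik_rob * b_prior
b_post_rob = b_prop_rob / sum(b_prop_rob)

b_exp_rob = sum( b_cand * b_post_rob )
paste0( 'estimated beta (robust expectation) = ', round(b_exp_rob, 3) )

Because the heavy tails assign non-negligible probability to large residuals, the inflated observations pull the posterior mean around less than under the normal likelihood.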
data_dir   = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/#data/final'
data_nam   = "data_H_list.RData"
model_data = file.path(data_dir, data_nam )
load( model_data )
# str(dlist)

var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib')
data_H  = data.frame(dlist[var_int])
# str(data_H)
As expected, the data reveal the two significant features of entropy scores: clustering and boundedness (refer to Section 3.2.3 and Section 3.3.3). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originating from different speakers and evaluated in different blocks (Table 1). On the other hand, entropy scores exhibit boundedness as they can only take on values within the continuous interval between zero and one, particularly H_{wsib} \in [0,1] (see Figure 15 showing randomly selected speakers).
var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib')

kable( head( data_H[, var_int], 10 ),
       col.names=c('Block ID','Speaker ID','Sentence ID','Word ID',
                   'Hearing status','Chron. age','Chron. age (centered)',
                   'Entropy') )
| Block ID | Speaker ID | Sentence ID | Word ID | Hearing status | Chron. age | Chron. age (centered) | Entropy |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 2 | 85 | 17 | 0.0570388 |
| 1 | 1 | 1 | 2 | 2 | 85 | 17 | 0.2785730 |
| 1 | 1 | 1 | 3 | 2 | 85 | 17 | 0.2785730 |
| 1 | 1 | 1 | 4 | 2 | 85 | 17 | 0.4607393 |
| 1 | 1 | 1 | 5 | 2 | 85 | 17 | 0.1134471 |
| 1 | 1 | 1 | 6 | 2 | 85 | 17 | 0.0001000 |
| 1 | 1 | 1 | 7 | 2 | 85 | 17 | 0.0001000 |
| 1 | 1 | 1 | 8 | 2 | 85 | 17 | 0.0001000 |
| 2 | 1 | 2 | 1 | 2 | 85 | 17 | 0.0662660 |
| 2 | 1 | 2 | 2 | 2 | 85 | 17 | 0.0662660 |
require(rethinking)

speakers = c(20,8,11, 25,30,6)

par(mfrow=c(2,3))
for( i in speakers ){
  speaker = data_H$cid == i
  dat = binning( y=data_H$Hwsib[speaker], min_y=0, max_y=1,
                 n_bins=20, dens=T )
  plot(dat, ylab="Frequency-Density", ylim=c(-0.15,max(dat)), xaxt='n',
       xlim=c(-0.05,1.05), xlab='entropy', col=rgb(0,0,0,0.6) )
  abline( h=0, col='gray' )
  abline( v=c(0,1), lty=2, col=rgb(0,0,0,0.3) )
  axis( side=1, at=as.numeric(names(dat)),
        labels=names(dat), las=2, cex.axis=0.8 )
  mtext( text=paste0('Speaker ', i), side=3, adj=0, cex=1.1)
}
par(mfrow=c(1,1))
Additionally, the data show that the 32 speakers’ speech samples consist of sentences with a minimum of 3 and a maximum of 11 words per sentence (M=7.1, SD=1.1, Table 2), where most sentences have between 5 and 9 words (see Figure 16).
speech_samples = with(data_H, table(cid, uid) )
speech_samples
uid
cid 1 2 3 4 5 6 7 8 9 10
1 8 8 8 7 8 7 7 6 6 5
2 9 7 7 7 6 6 6 6 5 7
3 8 8 8 8 8 7 8 7 7 6
4 8 8 8 7 7 7 7 7 6 6
5 8 9 7 7 7 7 7 7 6 7
6 8 6 3 6 6 6 6 5 4 4
7 9 7 7 7 7 8 4 6 6 6
8 8 8 8 9 7 7 6 6 6 6
9 9 11 7 6 7 6 6 5 5 6
10 9 8 8 9 8 8 7 6 6 6
11 10 9 7 7 7 7 7 7 6 7
12 9 8 7 8 8 8 6 7 6 6
13 7 7 7 7 7 7 7 7 6 6
14 8 8 8 7 8 6 6 6 5 6
15 8 8 7 7 7 8 7 8 7 6
16 8 7 7 7 7 6 6 6 6 6
17 8 7 7 7 7 6 7 7 6 6
18 8 8 8 7 7 7 6 6 6 6
19 9 8 7 7 8 8 6 8 6 6
20 10 8 8 7 7 7 7 6 6 6
21 9 8 8 8 7 7 7 8 6 6
22 9 8 8 7 7 7 7 7 8 7
23 8 8 7 7 7 7 7 7 6 6
24 7 7 7 7 7 7 7 8 7 7
25 9 8 8 7 7 8 8 7 6 6
26 7 7 7 9 7 7 7 7 7 7
27 9 8 8 7 7 7 7 7 6 6
28 9 8 8 8 7 7 7 7 6 6
29 8 9 8 9 7 7 7 6 6 7
30 9 7 7 7 7 7 6 6 5 5
31 11 8 8 8 7 7 7 7 7 6
32 8 8 8 10 7 8 6 6 6 6
speech_samples = data.frame( speech_samples )
hist(speech_samples$Freq, breaks=20, xlim=c(2, 12),
     main='', xlab='words per sentence')
kable( psych::describe( speech_samples$Freq ),
digits=2, row.names=F )
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 320 | 7.07 | 1.06 | 7 | 7.04 | 1.48 | 3 | 11 | 8 | 0.19 | 1.45 | 0.06 |
Moreover, the data comprise 16 normal hearing children (NH, hearing status category 1) and 16 hearing-impaired children with a cochlear implant (HI/CI, hearing status category 2). At the time of the collection of the speech samples, the NH group was between 78 and 98 months old (M=86.3, SD=6.7), while the HI/CI group was between 68 and 104 months old (M=86.3, SD=9.0).
d_mom = unique( data_H[,c('cid','HS','A')])
with( d_mom, table( A, HS ) )
HS
A 1 2
68 0 1
76 0 1
78 2 0
80 1 1
82 5 0
83 0 1
84 0 1
85 0 6
86 2 1
88 1 0
93 2 2
94 1 0
97 1 0
98 1 0
104 0 2
Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all necessary information for the fitting process:
dlist = list(
  N     = nrow(data_H),
  B     = max(data_H$bid),
  I     = max(data_H$cid),
  U     = max(data_H$uid),
  W     = max(data_H$wid),
  cHS   = max(data_H$HS),
  bid   = data_H$bid,
  cid   = data_H$cid,
  uid   = data_H$uid,
  wid   = data_H$wid,
  HS    = data_H$HS,
  A     = data_H$A,
  Am    = with( data_H, A - min(A) ),
  Hwsib = data_H$Hwsib
)
str(dlist)
List of 14
$ N : int 2263
$ B : int 5
$ I : int 32
$ U : int 10
$ W : num 11
$ cHS : int 2
$ bid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ cid : int [1:2263] 1 1 1 1 1 1 1 1 1 1 ...
$ uid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ wid : num [1:2263] 1 2 3 4 5 6 7 8 1 2 ...
$ HS : int [1:2263] 2 2 2 2 2 2 2 2 2 2 ...
$ A : int [1:2263] 85 85 85 85 85 85 85 85 85 85 ...
$ Am : int [1:2263] 17 17 17 17 17 17 17 17 17 17 ...
$ Hwsib: num [1:2263] 0.057 0.279 0.279 0.461 0.113 ...
This section articulates the probabilistic formalism of both the Normal LMM and the proposed Beta-proportion GLLAMM. Subsequently, it details the set of fitted models and the estimation procedure.
The general mathematical formalism of the Normal LMM posits that the likelihood of the (manifest) entropy scores H_{wsib} follows a normal distribution, i.e.
\begin{align} H_{wsib} & \sim \text{Normal} \left( \mu_{sib}, \sigma_{i} \right) \end{align} \tag{4}
where \mu_{sib} represents the average entropy at the word-level and \sigma_{i} denotes the standard deviation of the average entropy at the word-level, varying for each speaker. Given the clustered nature of the data, \mu_{sib} is defined by the linear combination of individual characteristics and several random effects:
\begin{align} \mu_{sib} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + u_{si} + e_{i} + a_{b} \end{align} \tag{5}
where HS_{i} and A_{i} denote the hearing status and chronological age of speaker i, respectively. Additionally, \alpha denotes the general intercept, \alpha_{HS[i]} represents the average entropy for each hearing status group, and \beta_{A,HS[i]} denotes the evolution of the average entropy per unit of chronological age A_{i} for each hearing status group. Furthermore, u_{si} denotes the sentence-speaker random effects measuring the unexplained entropy variability within sentences for each speaker, e_{i} denotes the speaker random effects describing the unexplained entropy variability between speakers, and a_{b} denotes the block random effects assessing the unexplained variability between experimental blocks.
Several notable features of the Normal LMM can be discerned from the equations. Firstly, Equation 4 indicates that the variability of the average entropy at the word-level can differ for each speaker, enhancing the model’s robustness to mild or moderate data departures from the normal distribution assumption, such as heteroscedasticity or outliers (refer to Section 3.5). Secondly, Equation 5 reveals that the model assumes no transformation is applied to the relationship between the average entropy and the linear predictor. This is commonly known as a direct link function. Moreover, Equation 5 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. Lastly, the equation implies the model considers a separate intercept and a separate slope of age for each hearing status group, i.e., NH and HI/CI speakers.
Centering
Procedure used to facilitate the interpretation of regression parameters (Everitt & Skrondal, 2010).
The general mathematical formalism of the proposed Beta-proportion GLLAMM comprises four components: a response model, with its likelihood, linear predictor, and link function, and a structural model. The response model posits the likelihood of entropy scores follow a beta-proportion distribution,
GLLAMM components
\begin{align} H_{wsib} & \sim \text{BetaProp} \left( \mu_{ib}, M_{i} \right) \end{align} \tag{6}
where \mu_{ib} denotes the average entropy at the word-level and M_{i} signifies the dispersion of the average entropy at the word-level, varying for each speaker. Additionally, \mu_{ib} is defined as,
\begin{align} \mu_{ib} &= \text{logit}^{-1}[ a_{b} - SI_{i} ] \end{align} \tag{7}
where \text{logit}^{-1}(x) = \exp(x) / (1+\exp(x)) is the inverse-logit link function, a_{b} denotes the block random effects, and SI_{i} describes the speaker’s latent potential intelligibility. In turn, the structural model relates the speakers’ latent potential intelligibility to the individual characteristics:
\begin{align} SI_{i} = \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \end{align} \tag{8}
where \alpha defines the general intercept, \alpha_{HS[i]} denotes the potential intelligibility for different hearing status groups, and \beta_{A,HS[i]} indicates the evolution of potential intelligibility per unit of chronological age for each hearing status group. Furthermore, e_{i} represents the speaker random effects, describing unexplained potential intelligibility variability between speakers, and u_{i} = \sum_{s=1}^{S} u_{si}/S denotes the sentence random effects, assessing the average unexplained potential intelligibility variability among sentences within each speaker, with S denoting the total number of sentences per speaker.
Several features are evident in this probabilistic representation. Firstly, akin to the Normal LMM, Equation 6 reveals that the dispersion of average entropy at the word level can differ for each speaker. This enhances the model’s robustness to mild or moderate data departures from the beta-proportion distribution assumption (refer to Section 3.5). Secondly, in contrast with the Normal LMM, Equation 7 shows the potential intelligibility of a speaker has a negative non-linear relationship with the entropy scores, explicitly highlighting the inverse relationship between intelligibility and entropy. This feature also maps the unbounded linear predictor to the bounded limits of the entropy scores. Thirdly, in contrast with the Normal LMM, Equation 8 demonstrates that the structural parameters are interpretable in terms of the latent potential intelligibility scores, where the scale of the latent trait is set by the general intercept \alpha, as required in latent variable models (Depaoli, 2021). Furthermore, the equation implies the model also considers a separate intercept and a separate slope of age for each hearing status group, i.e., NH and HI/CI speakers. Additionally, Equation 8 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. Lastly, the same equation assumes the intelligibility scores have two sources of unexplained variability: e_{i} and u_{i}. The former represents inherent differences in potential intelligibility among different speakers, while the latter assumes that different sentences measure potential intelligibility differently due to variations in word difficulties and their interplay within the sentence.
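Analogously, the sketch below simulates entropy scores from the Beta-proportion GLLAMM of Equations 6-8, using inv_logit() and rbeta2() from the rethinking package. Again, all parameter values are illustrative assumptions; note how larger values of SI_{i} push the expected entropy towards zero, reflecting the negative link in Equation 7.

```{r}
# Minimal sketch of the Beta-proportion GLLAMM (Equations 6-8) with
# illustrative values; higher SI implies lower expected entropy.
require(rethinking)
set.seed(1)

A     = c(7, 9, 12)                  # hypothetical chronological ages
Abar  = min(A)                       # minimum age in the sample
alpha = 0                            # general intercept (sets the scale)
aHS   = 0.5                          # hearing status group intercept
bAHS  = 0.10                         # slope of (centered) age for the group
e_i   = rnorm( length(A), 0, 0.10 )  # speaker random effects
u_i   = rnorm( length(A), 0, 0.10 )  # aggregated sentence random effects
a_b   = rnorm( length(A), 0, 0.05 )  # block random effects
M     = 10                           # dispersion ('sample size') parameter

SI = alpha + aHS + bAHS*(A - Abar) + e_i + u_i          # Equation 8
mu = inv_logit( a_b - SI )                              # Equation 7
H  = rbeta2( n=length(A), prob=mu, theta=M )            # Equation 6
H
```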
Bayesian procedures require the incorporation of priors (refer to Section 3.1). This study establishes priors and hyperpriors for the parameters of both the Normal LMM and the Beta-proportion GLLAMM using prior predictive simulations.
Prior predictive simulations
Procedure that entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models’ specifications. The procedure aims to establish meaningful priors and comprehend their implications within the context of the model before incorporating any information derived from empirical data (McElreath, 2020).
For the parameters of the Normal LMM, non-informative priors and hyperpriors are established to align with analogous model assumptions in frequentist methods (refer to Section 3.1.4). The specified priors are as follows:
As described in Section 5.3, the models initially consider one \sigma prior for all the speakers. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align} \sigma_{i} &\sim \text{Exponential}\left( 2 \right) \end{align} \tag{9}
The left panel of Figure 17 shows the weakly informative prior expects \sigma to be possible only in a positive range, as required for variability parameters (Depaoli, 2021). Furthermore, the right panel of Figure 17 shows that when transformed to the entropy scale, the model expects predictions to fall beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

param_pscale = rexp( n=n, rate=2 )
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(sigma))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
Furthermore, as described in Section 5.1.1 and Section 5.3, there is a possibility that the model considers one \sigma_{i} prior for each of the speakers in the data. This choice implies that the presumed uncertainty about unexplained variability of the average entropy at the word-level is similar for each speaker, prior to observing empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} r_{S} &\sim \text{Exponential}\left( 2 \right) \\ \sigma_{i} &\sim \text{Exponential}\left( r_{S} \right) \end{align} \tag{10}
The left panel of Figure 18 shows the weakly informative prior expects \sigma_{i} to be possible only in a positive range, as required for variability parameters (Depaoli, 2021). The panel also shows the parameters are most likely to fall in the interval [0, 2.5]. Moreover, the right panel of Figure 18 shows that when the prior is transformed to the entropy scale, the model expects scores to fall beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )

param_pscale = r_s * z_s
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(sigma[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The general intercept \alpha is used in preliminary models where no mathematical formulations regarding how speaker-related factors influence intelligibility are investigated. The prior distribution for \alpha under the Normal LMM is described in Equation 11.
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{11}
The left panel of Figure 19 shows the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 19 demonstrates that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at low levels of the feasible range of the outcome. This implies that a particular bias in entropy scores towards lower values is expected by the prior.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \alpha_{HS[i]} under the Normal LMM is described in Equation 12. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \end{align} \tag{12}
The left panel of Figure 20 reveals a weakly informative prior that restricts the range of probability of \alpha_{HS[i]} to [0.3, 0.7]. This implies that no particular bias towards entropy values above or below 0.5 for different hearing status groups is present in the priors. However, the right panel of Figure 20 demonstrates that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around low levels of entropy, but also beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

param_pscale = rnorm( n=n, mean=0, sd=0.2 )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \beta_{A,HS[i]} under the Normal LMM is described in Equation 13. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of entropy attributed to chronological age is presumed to have similar uncertainties for both categories, prior to the observation of empirical data.
\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{13}
The left panel of Figure 21 shows the prior restricts \beta_{A,HS[i]} to be mostly within the range of [-0.4, 0.4]. This implies that there is no particular bias towards a positive or negative evolution of entropy scores due to chronological age per hearing status group. However, the right panel of Figure 21 shows that when this prior is transformed to the entropy scale, the model anticipates a concentration of entropy values at lower levels, but it also expects entropy scores significantly beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd          = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of e_{i} for the Normal LMM is described in Equation 14. The same prior is assigned to each speaker in the sample. This choice implies that differences in entropy scores between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{14}
The left panel of Figure 22 shows the prior anticipates differences in entropy scores between speakers as large as 3 units of entropy. However, the right panel of Figure 22 demonstrates that when transformed to the entropy scale, the model anticipates a concentration of scores around low levels, but it also expects the differences to go well beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m_i = rnorm(n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )

param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of u_{si} for the Normal LMM is described in Equation 15. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average entropy score differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters (refer to Section 3.1.5).
\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ \end{align} \tag{15}
The left panel of Figure 23 shows the prior allows the average differences in entropy among sentences within speakers to be as large as 3 units of measurement. Furthermore, the right panel of Figure 23 demonstrates that when transformed to the entropy scale, the model anticipates a concentration of scores around mid-levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m_u = rnorm(n=n, mean=0, sd=0.05 )
s_u = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_u, sd=s_u )

param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of a_{b} for the Normal LMM is described in Equation 16. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{16}
The left panel of Figure 24 shows a prior with no particular bias towards differences between blocks above or below zero units of entropy. Nevertheless, the right panel of Figure 24 demonstrates that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, but also contemplates differences beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m_b = rnorm(n=n, mean=0, sd=0.05 )
s_b = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )

param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[si]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the linear predictor can be constructed for the Normal LMM. The prior predictive simulation can be described as in Equation 17:
\begin{align} m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ a_{b} &\sim \text{Normal} \left( m, s \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ g(\cdot) &= \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{si} + a_{b} \\ \end{align} \tag{17}
The left panel of Figure 25 shows the prior expects the values of the linear predictor to be most probable between [-2.5, 2.5], implying a particular bias towards negative entropy scores is jointly present in these priors. Furthermore, the right panel of Figure 25 demonstrates that when transformed to the entropy scale, the model anticipates predictions of entropy scores within its feasible range, but somewhat more probable at the extremes of entropy.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m = rnorm(n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i  = rnorm( n=n, mean=m, sd=s )
u_si = rnorm( n=n, mean=m, sd=s )
a_b  = rnorm( n=n, mean=m, sd=s )

aHS  = rnorm( n=n, mean=0, sd=0.2 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = aHS + bAHS + u_si + e_i + a_b

param_oscale = rnorm( n=n, mean=param_pscale,
                      sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
For the parameters of the Beta-proportion GLLAMM, weakly informative priors and hyperpriors are established (refer to Section 3.1.4). The specified priors are as follows:
Similar to the Normal LMM, Section 5.3 describes a Beta-proportion GLLAMM that initially considers one M for all speakers in the data. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align} M &\sim \text{Exponential}\left( 0.4 \right) \end{align} \tag{18}
The left and right panels of Figure 26 demonstrate that the prior of M expects the parameter to be most probable in the positive range [0, 7], while predicting scores within the boundaries of the data. This implies that no particular bias is present for the unexplained word-level entropy variability, only that it is positive, as expected for measures of variability.
```{r}
require(rethinking)

n = 1000

param_pscale = rexp( n=n, rate=0.4 )
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,10),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
Furthermore, as described in Section 5.1.2 and Section 5.3, there is a possibility that the model considers one M_{i} prior for each speaker in the data. This choice implies the presumed uncertainty for the unexplained dispersion of the average entropy at the word-level is similar for each speaker, prior to the observation of empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} r_{M} &\sim \text{Exponential}\left( 0.2 \right) \\ M_{i} &\sim \text{Exponential}\left( r_{M} \right) \end{align} \tag{19}
The left and right panels of Figure 27 demonstrate that the prior of M_{i} expects the parameters to be most probable in the positive range [0, 20], while at the same time predicting data within the boundaries of the entropy scores. This implies that no particular bias is present for the unexplained word-level entropy variability, only that it is positive, as expected for measures of variability.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )

param_pscale = r_M * z_M
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,20),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
Considering that the structural parameters are now interpretable in terms of the (latent) potential intelligibility scores, the general intercept \alpha is used to set the scale of the latent trait, as it is required in latent variable models (Depaoli, 2021) (refer to Section 3.1.4). The prior distribution for \alpha under the Beta-proportion GLLAMM is described in Equation 20.
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{20}
The left panel of Figure 28 shows the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 28 demonstrates that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at mid-levels of the feasible range of the outcome. This implies that no particular bias in entropy scores is expected by the prior.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \alpha_{HS[i]} under the Beta-proportion GLLAMM is described in Equation 21. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \end{align} \tag{21}
The right panel of Figure 29 demonstrates that when the \alpha_{HS[i]} prior is transformed to the entropy scale, the model anticipates a concentration of data around mid-levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards specific entropy score values is expected from using this prior.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

param_pscale = rnorm( n=n, mean=0, sd=0.3 )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \beta_{A,HS[i]} under the Beta-proportion GLLAMM is described in Equation 22. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of potential intelligibility attributed to chronological age is presumed to have similar uncertainties for both categories, prior to the observation of empirical data.
\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{22}
The left panel of Figure 30 shows the weakly informative prior has no particular bias towards a positive or negative evolution of potential intelligibility due to chronological age per hearing status group. Furthermore, the right panel of Figure 30 demonstrates that when transformed to the entropy scale, the model anticipates a slight concentration of data around mid-levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd          = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of e_{i} for the Beta-proportion GLLAMM is described in Equation 23. The same prior is assigned to each speaker in the sample. This choice implies that differences in potential intelligibility between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters, called hyperpriors (refer to Section 3.1.5).
\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{23}
The left panel of Figure 31 shows the prior anticipates differences in intelligibility between speakers as large as 3 units of measurement. Furthermore, the right panel of Figure 31 demonstrates that when transformed to the entropy scale, the model anticipates a high concentration around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected from using this prior.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )

param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of u_{i} for the Beta-proportion GLLAMM is described in Equation 24. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average potential intelligibility differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5). The sentence-within-speaker differences are then aggregated to the speaker level to form the sentence random effects u_{i},
\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \end{align} \tag{24}
The left panel of Figure 32 shows the prior allows the average differences in potential intelligibility among sentences within speakers to be as large as 0.8 units of measurement. Furthermore, the right panel of Figure 32 demonstrates that when u_{i} is transformed to the entropy scale, the model anticipates a high concentration of scores around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m_u = rnorm( n=n, mean=0, sd=0.05 )
s_u = rexp(n=n, rate=2 )
param_pscale = replicate(
  n=n, expr=mean( rnorm( n=10, mean=m_u, sd=s_u ) ) )

param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of a_{b} for the Beta-proportion GLLAMM is described in Equation 25. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{25}
The left panel of Figure 33 shows a prior with no particular bias towards positive or negative differences between blocks. Furthermore, the right panel of Figure 33 demonstrates that when transformed to the entropy scale, the model anticipates a high concentration of data around mid-levels of entropy, but not beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m_b = rnorm( n=n, mean=0, sd=0.05 )
s_b = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )

param_oscale = rbeta2( n=n, prob=inv_logit(param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[si]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed for the Beta-proportion GLLAMM. The prior predictive simulation can be described as in Equation 26:
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \\ a_{b} &\sim \text{Normal} \left( m, s \right) \\ SI_{si} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \\ \end{align} \tag{26}
The left panel of Figure 34 shows the prior expects speakers’ potential intelligibility scores to be most probable between [-3, 3], implying no particular bias towards positive or negative potential intelligibility is jointly present in these priors. Furthermore, the right panel of Figure 34 demonstrates that when transformed to the entropy scale, the model anticipates predictions of entropy scores within its feasible range, but somewhat more probable at the extremes of entropy.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_i = replicate( n=n,
                 expr=mean( rnorm( n=10, mean=m, sd=s ) ) )
a_b = rnorm( n=n, mean=m, sd=s )

a    = rnorm( n=n, mean=0, sd=0.05 )
aHS  = rnorm( n=n, mean=0, sd=0.3 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = a + aHS + bAHS + e_i + u_i

param_oscale = rbeta2( n=n, prob=inv_logit(a_b-param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
This study evaluates the comparative predictive capabilities of both the Normal LMM and the Beta-proportion GLLAMM (RQ1) while simultaneously examining various formulations regarding how speaker-related factors influence intelligibility (RQ3). In this context, the predictive capabilities of the models are intricately connected to these formulations. As a result, the study requires fitting 12 different models, each representing a specific way of investigating one or both research questions. The models comprise six versions of both the Normal LMM and the Beta-proportion GLLAMM. The differences among the models hinge on (1) whether they address data clustering in conjunction with measurement error, denoted as the model type, (2) the assumed distribution for the entropy scores, which aims to handle boundedness, (3) whether the model incorporates a robust feature to address mild or moderate departures of the data from distributional assumptions, and (4) the inclusion or exclusion of speaker-related factors in the models. A detailed overview of the fitted models is available in Table 3.
Table 3. Overview of the fitted models; the last three columns indicate which fixed effects each model includes.

| Model | Model type | Entropy distribution | Robust feature | \beta_{HS[i]} | \beta_{A} | \beta_{A,HS[i]} |
|---|---|---|---|---|---|---|
| 1 | LMM | Normal | No | No | No | No |
| 2 | LMM | Normal | No | Yes | Yes | No |
| 3 | LMM | Normal | No | Yes | No | Yes |
| 4 | LMM | Normal | Yes | No | No | No |
| 5 | LMM | Normal | Yes | Yes | Yes | No |
| 6 | LMM | Normal | Yes | Yes | No | Yes |
| 7 | GLLAMM | Beta-prop. | No | No | No | No |
| 8 | GLLAMM | Beta-prop. | No | Yes | Yes | No |
| 9 | GLLAMM | Beta-prop. | No | Yes | No | Yes |
| 10 | GLLAMM | Beta-prop. | Yes | No | No | No |
| 11 | GLLAMM | Beta-prop. | Yes | Yes | Yes | No |
| 12 | GLLAMM | Beta-prop. | Yes | Yes | No | Yes |
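Before the Stan programs are listed, the following sketch illustrates how the data list they expect could be assembled. It assumes the analysis data frame data_H contains the identifier columns used in the Stan data blocks below; the column names bid (experimental block) and wid (word) are assumptions, as the block identifier is not shown in the data structure above.

```{r}
# Sketch: assembling the data list expected by the Stan programs below.
# Assumes data_H holds integer identifiers starting at 1 for blocks (bid),
# speakers (cid), sentences (uid), words (wid), and hearing status (HS).
stan_data = list(
  N     = nrow(data_H),        # number of experimental runs
  B     = max(data_H$bid),     # max. number of blocks
  I     = max(data_H$cid),     # max. number of speakers
  U     = max(data_H$uid),     # max. number of sentences
  W     = max(data_H$wid),     # max. number of words
  cHS   = max(data_H$HS),      # max. number of hearing status categories
  bid   = data_H$bid,          # block id
  cid   = data_H$cid,          # speaker id
  uid   = data_H$uid,          # sentence id
  HS    = data_H$HS,           # hearing status
  Am    = data_H$Am,           # chronological age minus minimum age
  Hwsib = data_H$Hwsib         # entropy scores
)
```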
The following tabset panel provides the commented Stan code for all fitted models. Furthermore, the models are implemented using non-centered priors (refer to Section 3.1.5).
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model01.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
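As a minimal illustration of the estimation step, the sketch below compiles and samples the program just saved (model01.stan) with rstan, using the stan_data list sketched above. The chain, iteration, and seed settings are illustrative assumptions, not necessarily those used for the results reported in the study.

```{r}
# Sketch only: compile and sample model01.stan with rstan, using the
# stan_data list assembled earlier; settings below are illustrative.
require(rstan)

m01   = stan_model( file.path(getwd(), 'real_models', 'model01.stan') )
fit01 = sampling( m01, data=stan_data,
                  chains=4, iter=2000, warmup=1000,
                  cores=4, seed=1234 )
print( fit01, pars=c('a','s_w','s_b','s_i','s_u') )
```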
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model02.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model03.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model04.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
bAm ~ normal( 0 , 0.1 );
//bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model05.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model06.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model07.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model08.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model09.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model10.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model11.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.3 );
bAmHS ~ normal( 0 , 0.3 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model12.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
Moreover, the following code is provided so that the reader can fit all twelve Stan models.
```{r}
library(cmdstanr)  # provides cmdstan_model() and the $sample() method

for(i in 1:12){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i, '.stan')
  model_in  = file.path(getwd(), 'real_models')
  model_out = file.path(getwd(), 'real_chain')
  mod = cmdstan_model( file.path(model_in, model_nam) )
  print(model_nam)
  mod$sample( data=dlist,
              output_dir=model_out,
              output_basename = str_replace(model_nam, fixed('.stan'), ''),
              iter_warmup=2000, iter_sampling=2000,  # cmdstanr argument names
              chains=4, parallel_chains=4,
              max_treedepth=20, adapt_delta=0.95 ) #, init=0
}
```
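The loop above only writes the chains to disk. It may also be convenient to inspect the sampler behaviour of a single model right after fitting, before turning to the full diagnostic analysis. The following sketch, which is not part of the original pipeline, illustrates this for model 12; it assumes a recent `cmdstanr` version that provides the `$diagnostic_summary()` and `$summary()` methods, and it simply refits the model, prints the HMC diagnostics, and summarizes a few parameters.
```{r}
# illustrative sketch (not part of the original pipeline): fit one model and
# inspect the HMC sampler diagnostics and a quick posterior summary
mod12 = cmdstan_model( file.path(getwd(), 'real_models', 'model12.stan') )
fit12 = mod12$sample( data=dlist,
                      iter_warmup=2000, iter_sampling=2000,
                      chains=4, parallel_chains=4,
                      max_treedepth=20, adapt_delta=0.95 )

# divergent transitions, tree-depth saturation, and E-BFMI per chain
fit12$diagnostic_summary()

# posterior mean, quantiles, ess_bulk, and rhat for selected parameters
fit12$summary( variables=c('a','aHS','bAmHS','m_b','s_b') )
```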
Given the considerable number of fitted models and the resulting abundance of parameters, this section showcases the quality of, and the information embedded in, the Bayesian chains only for models 6 and 12. These two models were selected because they register the highest parameter counts among those detailed in Section 5.3. It should be underscored, however, that all fitted models were examined meticulously, and all of them showed results comparable to those of the two models chosen for illustration.
In general, both the graphical analysis and the diagnostic statistics indicated that all chains exhibited low to moderate autocorrelation, explored the parameter space in a seemingly random manner, and converged to a constant mean and variance in the post-warm-up phase. Figure 35 visualizes the \widehat{\text{R}} diagnostic statistic, and Figure 36 through Figure 48 illustrate the chains' graphical analysis.
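The \widehat{\text{R}} statistic compares the between-chain and within-chain variability of each parameter; values close to 1 indicate that all chains are sampling from the same stationary distribution. The chunks below compute these diagnostics with the user-defined helpers `par_recovery` and `tri_plot`. As a complementary check, the short sketch that follows shows how similar information could be obtained with rstan's built-in functions, assuming the `model12` `stanfit` object created in the loading chunk further below.
```{r}
# complementary check with rstan's built-in tools (assumes model12 is a stanfit)
fit_summary = rstan::summary( model12, probs=c(0.025, 0.975) )$summary

# parameters whose Rhat exceeds the usual 1.05 threshold (ideally none)
rownames(fit_summary)[ fit_summary[, 'Rhat'] > 1.05 ]

# trace and autocorrelation plots for the intercept and hearing-status effects
rstan::traceplot( model12, pars=c('a','aHS[1]','aHS[2]'), inc_warmup=FALSE )
rstan::stan_ac( model12, pars=c('a','aHS[1]','aHS[2]') )
```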
```{r}
# load reference models
for(i in c(6,12)){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i)
  model_out = file.path( main_dir, 'real_chain')
  model_fit = file_id( model_out, model_nam )
  assign( model_nam,
          rstan::read_stan_csv( file.path( model_out, model_fit ) ) )
}
# parameters of interest
par_int = c( 'a','aHS[1]','aHS[2]',
             'bAm','bAmHS[1]','bAmHS[2]',
             'm_b','s_b',
             'm_i','s_i',
             'm_u','s_u',
             'r_s','r_M',
             's_w','Mw')
# parameter summaries (including Rhat and n_eff) via the user-defined helper
model06_parameters = par_recovery( stanfit_obj = model06,
                                   est_par = par_int,
                                   p = 0.95 )
model12_parameters = par_recovery( stanfit_obj = model12,
                                   est_par = par_int,
                                   p = 0.95 )
```
```{r}
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$Rhat4,
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Rhat',
      main='Normal LMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) )
plot( 1:nrow(model12_parameters), model12_parameters$Rhat4,
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Rhat',
      main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) )
par( mfrow=c(1,1) )
```
```{r}
# chains' graphical analysis per parameter group (user-defined tri_plot helper)
tri_plot( stan_object=model06, pars=c('aHS[1]','aHS[2]') )
tri_plot( stan_object=model12, pars=c('a','aHS[1]','aHS[2]') )
tri_plot( stan_object=model06, pars=c('bAmHS[1]','bAmHS[2]') )
tri_plot( stan_object=model12, pars=c('bAmHS[1]','bAmHS[2]') )
tri_plot( stan_object=model06, pars=c('m_b','s_b') )
tri_plot( stan_object=model12, pars=c('m_b','s_b') )
tri_plot( stan_object=model06, pars=c('m_i','s_i') )
tri_plot( stan_object=model12, pars=c('m_i','s_i') )
tri_plot( stan_object=model06, pars=c('m_u','s_u') )
tri_plot( stan_object=model12, pars=c('m_u','s_u') )
tri_plot( stan_object=model06, pars=c('r_s', paste0('s_w[', 1:4,']')) )
tri_plot( stan_object=model12, pars=c('r_M', paste0('Mw[', 1:4,']')) )
tri_plot( stan_object=model12, pars=paste0('SI[', 1:5, ']') )
```
Moreover, the density plots and n_{\text{eff}} statistics collectively confirmed that all posterior distributions are unimodal, centered around a mean, and generated from a satisfactory number of uncorrelated samples, and that they make substantive sense when compared with the models' prior beliefs. Figure 49 visualizes the n_{\text{eff}} diagnostic statistic, and Figure 51 through Figure 56 illustrate the chains' graphical analysis.
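As a complement to the user-defined `dens_plot` helper used below, the following sketch shows how the n_{\text{eff}} statistic and the marginal posterior densities could also be inspected with rstan's built-in functions, assuming `model06` and `model12` are the `stanfit` objects loaded above.
```{r}
# complementary check with rstan's built-in tools (assumes model06 and model12
# are stanfit objects, as loaded in the previous chunks)
for( fit_name in c('model06', 'model12') ){
  n_eff = rstan::summary( get(fit_name) )$summary[, 'n_eff']
  cat( fit_name, ': minimum n_eff =', round( min(n_eff, na.rm=TRUE) ), '\n' )
}

# marginal posterior densities of the hearing-status effects in model 12
rstan::stan_dens( model12, pars=c('aHS[1]','aHS[2]'), separate_chains=TRUE )
```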
```{r}
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$n_eff,
      ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Neff',
      main='Normal LMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) )
plot( 1:nrow(model12_parameters), model12_parameters$n_eff,
      ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Neff',
      main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) )
par( mfrow=c(1,1) )
```
```{r}
# posterior density plots per parameter group (user-defined dens_plot helper)
par_int = c('a','aHS','bAm','bAmHS')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)
par_int = c('a','aHS','bAm','bAmHS')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
par_int = c('m_b','m_i','m_u','s_b','s_i','s_u')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)
par_int = c('m_b','m_i','m_u','s_b','s_i','s_u')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
par_int = c('r_s','s_w')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)
par_int = c('r_M','Mw')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
par_int = 'SI'
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
```
For a thorough explanation of Bayesian inference procedures, the reader can refer to Kruschke (2015) or McElreath (2020).
The reader can refer to Brooks et al. (2011) for a detailed treatment of MCMC methods.
An interested reader can further refer to McElreath (2020) for a detailed explanation of grid approximation.
An interested reader can refer to McElreath (2020), Gorinova et al. (2019), and Neal (2003).
```bibtex
@online{rivera espejo2024,
  author = {Rivera Espejo, Jose Manuel},
  title = {Additional {Notebook}},
  date = {2024-04-09},
  langid = {en}
}
```