Bayesian analysis, speech intelligibility, bounded outcomes, clustering, measurement error, outliers, heteroscedasticity, generalized linear latent and mixed models, robust regression models.
The purpose of this Additional Notebook is to improve the transparency and replicability of the analysis for the study Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores. A digital version of the study can be found in the Code-links
section of this document.
Furthermore, the notebook meticulously follows the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist) developed by Depaoli & van de Schoot (2017). The checklist outlines the ten crucial points that need careful scrutiny when employing Bayesian inference procedures.
WAMBS checklist
Questionnaire outlining the ten crucial points that need careful scrutiny when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of Bayesian analysis (Depaoli & van de Schoot, 2017).
# load packages
libraries = c('RColorBrewer','stringr','dplyr','tidyverse',
              'reshape2','knitr','kableExtra',
              'rstan','StanHeaders','runjags','rethinking')
sapply(libraries, require, character.only=T)
# load functions
main_dir = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/paper1'
source( file.path( main_dir, 'walkthrough/code', 'user-defined-functions.R') )
In this notebook, Section 3 introduces various background topics that are relevant to the present study. These topics enable readers to progress smoothly through this research. Specifically, Section 3.1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. Section 3.2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role in modeling bounded data. Section 3.3 explains (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. Section 3.4 illustrates the concept of measurement error and the role of latent variables in overcoming the problems arising from it. Lastly, Section 3.5 explains the effects of distributional departures in the data on the parameter estimates, and their importance for this research.
The R packages utilized in the production of this document can be divided into three groups. First, the packages utilized to generate the walk-through: RColorBrewer (Neuwirth, 2022) and quarto (Allaire, Teague, Scheidegger, Xie, & Dervieux, 2022). Second, the packages used for handling the data: stringr (H. Wickham, 2022), dplyr (Hadley Wickham, François, Henry, Müller, & Vaughan, 2023), tidyverse (Hadley Wickham et al., 2019), reshape2 (H. Wickham, 2007), knitr (Xie, 2024), and kableExtra (Zhu, 2024). Lastly, the packages used for the Bayesian estimation: coda (Plummer, Best, Cowles, & Vines, 2006), loo (A. Vehtari et al., 2023; A. Vehtari, Simpson, Gelman, Yao, & Gabry, 2021), cmdstanr (Gabry & Češnovar, 2022), rstan (Stan Development Team, 2020), runjags (Denwood, 2016), and rethinking (McElreath, 2021).
Bayesian inference is an approach to statistical modeling and inference that is primarily based on Bayes’ theorem. The procedure consists of defining the model assumptions in the form of a likelihood for the outcome and a set of prior distributions for the parameters of interest. Upon observing empirical data, these priors are updated to posterior distributions following Bayes’ rule (Jeffreys, 1998), from which the statistical inferences are derived 1. As an example, a simple linear regression model with a parameter \beta can be encoded under the Bayesian inference paradigm in the following form:
Bayesian inference
Approach to statistical modeling and inference that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence (Everitt & Skrondal, 2010).
\begin{align*} P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) } \end{align*} \tag{1}
where P( Y| \beta, X ) defines the likelihood of the outcome, which represents the assumed probability distribution for the outcome Y, given the parameter \beta and covariate X. P( \beta ) defines the prior distribution of the parameter \beta. Lastly, P( Y ) defines the probability distribution of the data, which represents the evidence of the observed empirical data. Consequently, the posterior distribution of the parameter P( \beta | Y, X ) describes the probability distribution of \beta after observing empirical data.
Likelihood
Probability distribution that describes the assumption about the underlying processes that give rise to the data (Everitt & Skrondal, 2010).
Prior distribution
Probability distribution summarizing the information about a parameter known or assumed before observing any empirical data (Everitt & Skrondal, 2010).
Posterior distribution
Probability distribution summarizing the information about a parameter after observing empirical data (Everitt & Skrondal, 2010).
Before implementing Bayesian inference procedures, two important concepts related to Equation 1 need to be understood. First, the evidence of the empirical data P(Y) serves as a normalizing constant. This means that the numerator in the equation is re-scaled by a constant obtained from calculating P(Y). Consequently, without loss of generality, the equation can be succinctly rewritten in the following form:
\begin{align*} P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\ \end{align*} \tag{2}
where \propto denotes the proportional symbol. This implies that the posterior distribution of \beta is proportional (up to a constant) to the multiplication of the outcome’s likelihood and the parameter’s prior distribution. This definition makes the calculation of posterior distributions easier, by separating the parameter’s updating process from the integration of new empirical data (this will be clearly seen in the code provided in Section 3.1.3).
Second, a dataset usually has multiple observations of the outcome Y and covariates X, in the form of y_{i} and x_{i}. Therefore, by the laws of probability, and assuming independence among the observations, the likelihood of a dataset can be rewritten as the product of all individual likelihoods. Consequently, Equation 2 can also be rewritten as follows:
\begin{align*} P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta ) \end{align*} \tag{3}
Several methods within the Bayesian inference procedures can be utilized to estimate the posterior distribution of the parameter, and most of these fall into the category of Markov Chain Monte Carlo methods (MCMC). However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the grid approximation method. This method is used in Section 3.1.3 to illustrate how the Bayesian inference works.
Markov Chain Monte Carlo (MCMC)
Methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt & Skrondal, 2010) 2.
Grid approximation
Method to indirectly simulate random observations from low dimensional continuous probability distributions, by considering a finite candidate list of parameter values (McElreath, 2020) 3.
A simple Bayesian linear regression model can be written in the following form:
\begin{align*} y_{i} &= \beta \cdot x_{i} + e_{i} \\ e_{i} &\sim \text{Normal}( 0, 1 ) \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} where y_{i} denotes the outcome’s observation i, \beta the expected effect of the observed covariate x_{i} on the outcome, and e_{i} the outcome’s residual in observation i. Furthermore, the residuals e_{i} are assumed to follow a normal distribution with mean zero and standard deviation equal to one. Lastly, prior to observing any data, it is assumed that \beta is uniformly distributed within the range of [-20,+20]. This prior implies that equal probabilities are assigned to all values of \beta within this range. However, a more convenient and general manner to represent the same linear regression model is as follows:
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} In this definition, the components of the Bayesian inference procedure detailed in Section 3.1.1 are more easily spotted. First, regarding the likelihood, the outcome is assumed to be normally distributed with mean \mu_{i} and standard deviation equal to one. Second, it is assumed that \beta has a uniform prior within the range of [-20,+20]. Moreover, the equations reveal that the mean of the outcome \mu_{i} is modeled by a linear predictor composed of the covariate x_{i} and its effect on the outcome \beta.
For illustration purposes, a simulated regression with n=100 observations was generated assuming \beta=0.2. Figure 1 shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of \beta. Two noteworthy results emerge from the approach. Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with n=100 observations, the influence of the prior on the posterior distribution of \beta is negligible. Specifically, prior to observing any data, assuming that \beta could take any value within the range of [-20,+20] with equal probability (left panel of Figure 2) did not have a substantial impact on the distribution of \beta after empirical data was observed (right panel of Figure 2).
set.seed(12345)

n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y    = rnorm( n=n, mean=mu_y, sd=1 )
# grid approximation
Ngp = 1000

# candidate values for beta over the grid
b_cand = seq( from=-20, to=20, length.out=Ngp )

# mean of the outcome implied by each candidate value
udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

# likelihood of the data for each candidate value
udf   = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

# uniform prior over [-20, +20]
b_prior = rep( 1/40, length(b_cand) )

# posterior = likelihood * prior, normalized over the grid
b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.299"
[1] "estimated beta (maximum probability) = 0.3"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-3,3),
pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))
Prior to observing empirical data, assuming the parameter could take any value within the range of [-20,+20] with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept is illustrated in Figure 3 through Figure 5, where three different types of priors are used to encode three levels of uncertainty about the parameter \beta.
# grid approximation
Ngp = 1000

post = data.frame( b_cand=seq( from=-20, to=20, length.out=Ngp ) )

ud_func = function(i){ post$b_cand[i]*x }
mu_y    = sapply( 1:length(post$b_cand), ud_func )

ud_func = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik   = sapply( 1:length(post$b_cand), ud_func )

post$b_prior1 = rep( 1/40, length(post$b_cand) )
post$b_prior2 = dnorm( post$b_cand, mean=0, sd=0.5 )
post$b_prior3 = dnorm( post$b_cand, mean=0.2, sd=0.05 )

nam = c()
for( i in 1:3 ){
  b_prop = y_lik * post[, paste0('b_prior',i) ]
  nam    = c(nam, paste0('b_post',i) )
  post   = cbind(post, data.frame( b_prop / sum(b_prop) ) )
}
names(post)[5:7] = nam
First, the distribution depicted in Figure 3 assumes \beta \sim \text{Uniform}(-20, +20) (similar to what is observed in Section 3.1.3). The distribution does not favor any particular range of values for \beta within [-20, +20]. This type of distribution is commonly referred to as a non-informative prior.
Non-informative priors
Prior that reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space (Everitt & Skrondal, 2010).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior1')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post1')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
Second, the distribution described in Figure 4 assumes \beta \sim \text{Normal}(0, 0.5). Consequently, the effect of \beta is more probable within the range [-1,+1], with less probability associated with parameter values outside this range. This is an example of a weakly-informative prior distribution.
Weakly informative priors
Prior that reflects the distributional commitment of a parameter to a weakly constrained range of values within a realistic parameter space (McElreath, 2020).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior2')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post2')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
Third, the distribution described in Figure 5 assumes \beta \sim \text{Normal}(0.2, 0.05). As a result, the effect of \beta is more probable within the range [0.1,0.3], with less probability associated with parameter values outside this range. This is an example of an informative prior distribution.
Informative priors
Prior distribution that expresses specific and definite information about a parameter (McElreath, 2020).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior3')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post3')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
Lastly, regarding the influence of different priors on the posterior distributions, Figure 3 and Figure 4 reveal that non-informative and weakly-informative priors have a negligible influence on the posterior distribution. Both priors result in similar posteriors. Furthermore, the figures show that a data sample size of n=100 is still not enough to provide an unbiased and precise estimation of the true effect. In contrast, Figure 5 shows that informative priors can have a meaningful influence on the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. This result shows that when the data sample size is not sufficiently large, the prior assumptions can play a significant role in obtaining appropriate parameter estimates.
In cases requiring greater modeling flexibility, a more refined representation of the parameters’ priors can be defined in terms of hyperparameters and hyperpriors. A simple example of the use of hyperpriors would be to define the regression model shown in Section 3.1.3 in the following form:
Hyperparameters
Parameters \theta_{2} that index a family of possible prior distributions for another parameter \theta_{1} (Everitt & Skrondal, 2010).
Hyperpriors
Prior distributions for hyperparameters (Everitt & Skrondal, 2010).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\ v &\sim \text{Normal}(0, 3) \end{align*} where v defines the hyperparameter for the parameter \beta, and its associated distribution defines its hyperprior.
However, setting prior distributions through hyperparameters brings its own challenges. One notable challenge pertains to the geometry of the parameter’s sample space. This implies that prior probabilistic representations defined in terms of hyperparameters sometimes exhibit complex sample geometries compared to simple priors 4. The reparametrization of hyperpriors into such simpler sample geometries leads to the notion of non-centered priors. By incorporating non-centered priors, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:
Non-centered priors
Expression of a parameter’s distribution in terms of a hyperparameter defined by a transformation of the original parameter of interest (Gorinova et al., 2019).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &= z \cdot \text{exp}(v) \\ v &\sim \text{Normal}(0, 3) \\ z &\sim \text{Normal}( 0, 1 ) \end{align*} where z is a hyperparameter sampled independently from v, and the parameter of interest \beta is obtained as a transformation of the two hyperparameters. Figure 6 illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel, and as a result, Bayesian inference procedures can have a harder time sampling from the former than the latter distributions.
n = 5000

v = rnorm( n=n, mean=0, sd=1 )
z = rnorm( n=n, mean=0, sd=1 )

b_cent = rnorm( n=n, mean=0, sd=exp(v) )
b_non  = z*exp(v)
par( mfrow=c(1,2) )
plot( b_cent, v, pch=19, col=rgb(0,0,0,alpha=0.1),
xlab=expression(beta), ylab=expression(v),
main='Centered parametrization' )
plot( z, v, pch=19, col=rgb(0,0,0,alpha=0.1),
xlab=expression(z), ylab=expression(v),
main='Non-centered parametrization' )
par( mfrow=c(1,1) )
The selection of the Bayesian approach was based on three key properties. Firstly, empirical evidence from prior research demonstrates that Bayesian methods outperform frequentist methods, particularly in handling complex and over-parameterized models (Baker, 1998; Kim & Cohen, 1999). This superiority is evident when dealing with complex models, like the proposed GLLAMM, that are challenging to program or are not viable under frequentist methods (Depaoli, 2014).
Benefits of Bayesian inference procedures
More suitable to deal with:
- complex and over-parameterized models
- the incorporation of prior information
- relatively small sample sizes
Secondly, the approach allows for the incorporation of prior information, ensuring that certain parameters are confined within specified boundaries. This helps mitigate non-convergence or improper parameter estimation issues commonly observed in complex models under frequentist methods (Martin & McDonald, 1975; Seaman III, Seaman Jr., & Stamey, 2011). In this study, for example, this property was leveraged to incorporate information about the variances of random effects and constrain them to be positive.
Lastly, the Bayesian approach demonstrates proficiency in handling relatively small sample sizes (Baldwin & Fellingham, 2013; Depaoli, 2014; Lambert, Sutton, Burton, Abrams, & Jones, 2006). In this case, despite the study dealing with 2,263 entropy scores, these were derived from a modest sample size of 32 speakers, from whom the inferences are drawn. Consequently, reliance on the asymptotic properties of frequentist methods may not be warranted in this context, underscoring the pertinence of this property to the current study.
A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line \left( y_{i} \in [-\infty, \infty] \right). The distribution is characterized by two independent parameters: the mean \mu and the standard deviation \sigma (Everitt & Skrondal, 2010). Thus, a random variable can take on values that are gathered around a mean \mu, with some values dispersed based on some amount of deviation \sigma, without any restriction. Importantly, by definition of the normal distribution, the location (mean) of the distribution does not influence its spread (deviation).
Figure 7 illustrates how the distribution of an outcome changes with different values of \mu and \sigma. The left panel demonstrates that the distribution of the outcome can shift in terms of its location based on the value of \mu. The right panel shows how the distribution of the outcome can become narrower or wider based on the values of \sigma. It is noteworthy that alterations in the mean \mu of the distribution have no impact on its standard deviation \sigma.
require(rethinking)

mu    = c(-1.5, 0, 1.5)
sigma = c(1.5, 1, 0.5)

par(mfrow=c(1,2))

cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
  if(i==1){
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==-1.5,
                              mu[2]==0,
                              mu[3]==+1.5,
                              sigma==1 ) )
  } else{
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}

cp = sapply( 1:length(sigma), col.alpha, alpha=0.7)
for(i in 1:length(sigma)){
  if(i==1){
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( sigma[1]==1.5,
                              sigma[2]==1,
                              sigma[3]==0.5,
                              mu==0 ) )
  } else{
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one \left( y_{i} \in [0, 1] \right). The distribution is characterized by two parameters: the mean \mu and the sample size M (Everitt & Skrondal, 2010). This implies that a random variable can take on values restricted within the unit interval, centered around a mean \mu, with values being more or less dispersed based on the sample size M. Additionally, two characteristics define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval (\mu \in [0,1]). Secondly, the mean and sample size parameters are no longer independent of each other.
Figure 8 illustrates how an outcome with a beta-proportion distribution changes with different values of \mu and M. The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.
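For readers more familiar with the standard two-shape parametrization of the beta distribution, the short sketch below shows how the (mean, sample size) pair maps onto the usual shape parameters; this mirrors what the dbeta2() function from the rethinking package (used in the plotting code that follows) computes internally, and the specific values passed to the functions are arbitrary illustrations.

# beta-proportion density written with the standard beta shapes:
# shape1 = mu*M and shape2 = (1-mu)*M
dbetaprop = function(x, mu, M){ dbeta( x, shape1=mu*M, shape2=(1-mu)*M ) }

# both calls should return the same density values (illustrative inputs)
dbetaprop( c(0.1, 0.5, 0.9), mu=0.3, M=10 )
rethinking::dbeta2( c(0.1, 0.5, 0.9), prob=0.3, theta=10 )

The plotting code below uses dbeta2() directly with this (mean, sample size) parametrization.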
require(rethinking)

mu = c(0.2, 0.5, 0.8)
M  = c(2, 5, 20)

par(mfrow=c(1,2))

cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
  if(i==1){
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==0.2,
                              mu[2]==0.5,
                              mu[3]==0.8,
                              M==10 ) )
  } else{
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}

cp = sapply( 1:length(M), col.alpha, alpha=0.7)
for(i in 1:length(M)){
  if(i==1){
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0.3, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( M[1]==2,
                              M[2]==5,
                              M[3]==20,
                              mu==0.3 ) )
  } else{
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
It is crucial to comprehend what it means for an outcome to follow a normal distribution, as the assumption of normally distributed outcomes is ubiquitous in speech intelligibility research (see Boonen, Kloots, Nurzia, & Gillis, 2021; Flipsen, 2006; Lagerberg, Asberg, Hartelius, & Persson, 2014).
Boundedness
Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022)
Underfitting
Occurs when statistical models fail to capture the underlying data patterns, potentially causing the generation of predictions outside the data range and hindering the model’s ability to generalize its results when confronted with new data (Everitt & Skrondal, 2010).
Misspecification
Occurs when the model’s functional form or choice of covariates poorly represents relevant aspects of the true data. This can lead to inconsistent and inefficient parameter estimates (Everitt & Skrondal, 2010).
In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally distributed, bounded outcomes, such as the entropy scores utilized in this study. Neglecting the bounded nature of an outcome can lead, at best, to underfitting, and, at worst, to misspecification.
A commonly known Bayesian probabilistic representation of an ordinary linear mixed model (LMM) can be expressed as follows:
Ordinary linear mixed model (LMM)
Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes, Bolin, & Kelley, 2019).
\begin{align*} y_{ib} &= \beta x_{i} + a_{b} + \varepsilon_{ib} \\ \varepsilon_{ib} &\sim \text{Normal}(0, 1) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \end{align*}
where y_{ib} denotes the outcome’s i’th observation clustered in block b, and x_{i} denotes the covariate for observation i. Moreover, \beta denotes the fixed slope of the regression, a_{b} denotes the random effects, and \varepsilon_{ib} defines the random outcome residuals. The residuals \varepsilon_{ib} are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, \beta is assumed to be normally distributed with mean zero and standard deviation equal to 0.5. Similarly, a_{b} is assumed to be normally distributed with mean zero and standard deviation equal to one.
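To make the data-generating process implied by this representation concrete, the short simulation below is a minimal sketch (with arbitrary cluster counts and a hypothetical seed, not part of the study’s analysis) that draws clustered observations from the LMM above:

set.seed(2024)                           # hypothetical seed for the sketch
n_blocks = 5                             # illustrative number of blocks
n_obs    = 20                            # illustrative observations per block

beta = rnorm( 1, mean=0, sd=0.5 )        # fixed slope drawn from its prior
a_b  = rnorm( n_blocks, mean=0, sd=1 )   # block random effects

block = rep( 1:n_blocks, each=n_obs )
x     = rnorm( n_blocks*n_obs, mean=0, sd=1 )
y     = rnorm( n_blocks*n_obs, mean=beta*x + a_b[block], sd=1 )

Observations sharing a block share the same random effect a_{b}, which is precisely what induces the within-cluster dependence the model accounts for.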
A generalized linear mixed model (GLMM) is a generalization of LMM (Lee & Nelder, 1996). Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of GLMM, as follows:
Generalized linear mixed model (GLMM)
Procedure employed to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee & Nelder, 1996).
\begin{align*} y_{ib} &\sim \text{Normal}( \mu_{ib}, 1) \\ \mu_{ib} &= \beta x_{i} + a_{b} \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}
Notice this representation explicitly highlights the three components of a GLMM: the likelihood component, the linear predictor, and the link function (McElreath, 2020). The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean \mu_{ib} and standard deviation equal to one. The linear predictor specifies the manner in which the covariate will predict the mean of the outcome. In this case the linear predictor is a linear combination of the parameter \beta, the covariate x_{i}, and the random effects a_{b}. The link function specifies the relationship between the mean of the outcome \mu_{ib} and the linear predictor. In this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to Section 3.2.1). Lastly, resulting from the use of Bayesian procedures, a fourth component can be added to any GLMM: the prior distributions. The priors describe what is known about the parameters \beta and a_{b} before observing any empirical data.
GLMM components
On the other hand, a Beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:
\begin{align*} y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\ \mu_{ib} &= \text{logit}^{-1}( \beta x_{i} + a_{b} ) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}
Notice the representation also highlights the three components of a GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome with mean \mu_{ib} and sample size equal to 10. The linear predictor is still a linear combination of the parameter \beta, the covariate x_{i}, and the random intercepts a_{b}. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by an inverse-logit function: \text{logit}^{-1}(x) = exp(x) / (1+exp(x)). The inverse-logit function allows the linear predictor to match the range of the mean of the beta-proportion distribution \mu_{ib} \in [0,1] (refer to Section 3.2.2). Lastly, as the additional fourth component resulting from the use of Bayesian procedures, the prior assumptions for \beta and a_{b} are also declared.
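Analogously, the sketch below (again purely illustrative, with arbitrary settings and the rbeta2() parametrization from the rethinking package) simulates bounded outcomes from the beta-proportion GLMM, making the role of the inverse-logit link explicit:

require(rethinking)                      # provides rbeta2() for the (mu, M) parametrization

set.seed(2024)                           # hypothetical seed for the sketch
n_blocks = 5
n_obs    = 20

beta = rnorm( 1, mean=0, sd=0.5 )        # fixed slope drawn from its prior
a_b  = rnorm( n_blocks, mean=0, sd=1 )   # block random effects

block = rep( 1:n_blocks, each=n_obs )
x     = rnorm( n_blocks*n_obs, mean=0, sd=1 )

inv_logit = function(v){ exp(v) / ( 1 + exp(v) ) }    # maps the linear predictor into [0,1]
mu = inv_logit( beta*x + a_b[block] )
y  = rbeta2( n_blocks*n_obs, prob=mu, theta=10 )      # bounded outcome within (0,1)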
Understanding LMM is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field (see Boonen et al., 2021; Flipsen, 2006; Lagerberg et al., 2014). Furthermore, their significance also lies in their ability to model clustered outcomes. Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish statistical power or increase the likelihood of committing a type I error.
Clustering
It occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020).
Statistical power
The model’s ability to reject the null hypothesis when it is false (Everitt & Skrondal, 2010).
Type I error
The error that results when a null hypothesis is erroneously rejected (Everitt & Skrondal, 2010).
Moreover, the significance of GLMM lies in offering the same benefits as the LMMs, in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to Section 3.2.3 to understand the importance of considering the bounded nature of the data in the modeling process.
The problem with measurement error in an outcome is easier to understand with a motivating example. Using a similar model as the one depicted in Section 3.1.3, the probabilistic representation of measurement error in the outcome can be depicted as follows:
Measurement error
It refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed true value which is not directly observable (Everitt & Skrondal, 2010).
\begin{align*} \tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\ y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation effectively means that a manifest outcome \tilde{y}_{i} is assumed to be normally distributed with a mean equal to the latent outcome y_{i} and a measurement error s. The latent outcome y_{i} is also assumed to be normally distributed but with a mean \mu_{i} and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate x_{i} and its expected effect \beta. Lastly, prior to observing any data, \beta is assumed to follow a uniform distribution within the range of [-20, +20], representing a non-informative prior.
For illustrative purposes, a simulated outcome with n=100 observations was generated, assuming \beta=0.2, and a measurement error of s=2. Figure 9 shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the manifest outcome has a larger spread than the latent outcome depicted in the right panel. As a result, although \beta is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected due to this larger variability.
The estimation output confirms the previous hypothesis. The posterior distribution of \beta, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see Figure 10 and code output below). Furthermore, the code output shows the parameter’s posterior distribution can no longer reject the null hypothesis at confidence levels of 90\% and 95\%, indicating a reduced statistical power.
set.seed(12345)

n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y    = rnorm( n=n, mean=mu_y, sd=1 )

s       = 2
y_tilde = rnorm( n=n, mean=y, sd=s )
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf       = function(i){ prod( dnorm( y_tilde, mean=mu_y[,i], sd=s ) ) }
y_lik_man = sapply( 1:length(b_cand), udf )

udf       = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik_lat = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop_man = y_lik_man * b_prior
b_post_man = b_prop_man / sum(b_prop_man)

b_prop_lat = y_lik_lat * b_prior
b_post_lat = b_prop_lat / sum(b_prop_lat)
paste0( 'true beta = ', b )

# manifest outcome
b_exp_man = sum( b_cand * b_post_man )
paste0( 'estimated beta (expectation on manifest) = ',
        round(b_exp_man, 3) )
b_var_man = sqrt( sum( ( (b_cand-b_exp_man)^2 ) * b_post_man ) )
paste0( 'estimated beta (standard deviation on manifest) = ',
        round(b_var_man, 3) )

# latent outcome
b_exp_lat = sum( b_cand * b_post_lat )
paste0( 'estimated beta (expectation on latent) = ',
        round(b_exp_lat, 3) )
b_var_lat = sqrt( sum( ( (b_cand-b_exp_lat)^2 ) * b_post_lat ) )
paste0( 'estimated beta (standard deviation on latent) = ',
        round(b_var_lat, 3) )

# null hypothesis rejection
b_prob_man = sum( b_post_man[ b_cand > 0 ] )
paste0( 'P(estimated beta on manifest > 0) = ',
        round(b_prob_man, 3) )
b_prob_lat = sum( b_post_lat[ b_cand > 0 ] )
paste0( 'P(estimated beta on latent > 0) = ',
        round(b_prob_lat, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation on manifest) = 0.212"
[1] "estimated beta (standard deviation on manifest) = 0.176"
[1] "estimated beta (expectation on latent) = 0.299"
[1] "estimated beta (standard deviation on latent) = 0.088"
[1] "P(estimated beta on manifest > 0) = 0.887"
[1] "P(estimated beta on latent > 0) = 1"
par( mfrow=c(1,2) )
plot( x, y_tilde, xlim=c(-3,3), ylim=c(-7,7),
pch=19, col=rgb(0,0,0,alpha=0.3),
ylab=expression(tilde(y)),
main='manifest outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_man, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( x, y, xlim=c(-3,3), ylim=c(-7,7),
pch=19, col=rgb(0,0,0,alpha=0.3),
main='latent outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_lat, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par( mfrow=c(1,1) )
par(mfrow=c(1,2))
plot( b_cand, b_post_man, type='l', xlim=c(-0.5,1),
main='Posterior on manifest outcome',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_man), lty=2, col=c('gray', 'blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( b_cand, b_post_lat, type='l', xlim=c(-0.5,1),
main='Posterior on latent outcome',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_lat), lty=2, col=c('gray', 'blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))
Latent variables can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. Latent variables can be interpreted as hypothetical constructs, traits, or true variables that account for the variability that induces dependence in one or more manifest variables (Rabe-Hesketh, Skrondal, & Pickles, 2004). This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes (Rabe-Hesketh et al., 2004) (refer to Section 3.3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).
Latent variables
Variables that cannot be measured directly but are assumed to be principally responsible for the common variability in one or more manifest variables (Everitt & Skrondal, 2010).
Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model (Luce, 1959; Thurstone, 1927) are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships (Hoyle, 2014). While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part (Rabe-Hesketh et al., 2004).
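To give a flavor of the measurement part, the toy simulation below (a hedged sketch with made-up values, unrelated to the study’s data) generates three noisy manifest indicators of one latent trait; the indicators correlate only because they share the latent variable, which is precisely the common variability that latent variable models exploit:

set.seed(2024)                     # hypothetical seed for the sketch
n  = 200
SI = rnorm( n, mean=0, sd=1 )      # latent trait (e.g., a speaker's potential intelligibility)

# three manifest indicators = latent trait + independent measurement error
m1 = rnorm( n, mean=SI, sd=0.8 )
m2 = rnorm( n, mean=SI, sd=0.8 )
m3 = rnorm( n, mean=SI, sd=0.8 )

# correlations among the indicators are driven entirely by the shared latent trait
round( cor( cbind(m1, m2, m3, SI) ), 2 )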
It becomes evident that when an outcome is measured with error, the estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a type II error.
Type II error
The error that results when a null hypothesis is erroneously accepted (Everitt & Skrondal, 2010).
Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker’s (latent) potential intelligibility contributes, in part, to the observed variability in the speaker’s (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.
In the context of regression analysis, heteroscedasticity occurs when the variance of an outcome depends on the values of another variable (Everitt & Skrondal, 2010). The opposite case is called homoscedasticity. An example of heteroscedasticity can be probabilistically represented as follows:
Heteroscedasticity
Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called homoscedasticity (Everitt & Skrondal, 2010).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \sigma_{i} &= exp( \gamma \cdot x_{i} ) \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \gamma &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation implies that an outcome y_{i} is assumed to be normally distributed with mean \mu_{i} and standard deviation \sigma_{i}. Furthermore, the mean and standard deviation of the outcome are explained by the covariate x_{i}, through the parameters \beta and \gamma. Lastly, prior to observing any data, \beta and \gamma are assumed to be uniformly distributed in the range of [-20,+20].
Figure 11 illustrates the presence of heteroscedasticity using the previous representation, assuming a sample size of n=100, and parameters \beta=0.2 and \gamma=1. Notice the variability of the outcome increases as the covariate increases. Consequently, it is easy to intuit that this difference in the outcome’s variability could have an impact on the statistical hypothesis tests of \beta, and even on the estimate itself. To test this intuition, an incorrect model is used to estimate \beta.
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \end{align*} As a result, the intuition is confirmed. When an outcome is erroneously assumed to be homoscedastic, the parameter estimates not only become inefficient but are also estimated farther from the true value, as seen in the code output below and in Figure 12.
set.seed(12345)

n = 100

b = 0.2
g = 1

x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
s_y  = exp(g*x)

y = rnorm( n=n, mean=mu_y, sd=s_y )
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf   = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.638"
[1] "estimated beta (maximum probability) = 0.621"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
abline( a=-4, b=-1, lty=2, col=rgb(0,0,0,alpha=0.3))
abline( a=4.4, b=1.5, lty=2, col=rgb(0,0,0,alpha=0.3))
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,2) )
par(mfrow=c(1,1))
Outliers can occur by chance in any distribution, but they typically indicate novel behavior or structures in a dataset. For instance, outliers may arise due to measurement error, a heavy-tailed distribution within a population, or when some sampled observations come from a different population than intended (Grubbs, 1969). Additionally, outliers may emerge when the assumed (causal) generating mechanisms of the data differ from the observed data at the extreme end.
Outliers
An observation that appears to deviate markedly from the other observations in the sample in which it occurs (Grubbs, 1969).
Although outliers do not have a unique probabilistic representation, a simple example is illustrated in Figure 13. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easy to intuit that, in the presence of influential observations, the parameter estimates, and the hypothesis tests resulting from them, can be affected. The intuition is proven correct when \beta is estimated using the same “incorrect” model used in Section 3.5.1. When an outcome is erroneously assumed to be free of outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in Figure 14.
set.seed(12345)

n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y    = rnorm( n=n, mean=mu_y, sd=1 )

# inflate three observations with large covariate values to create outliers
idx = which( x>1 )
sel = 1:3
y[idx[sel]] = 6
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf  = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf   = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.477"
[1] "estimated beta (maximum probability) = 0.46"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
pch=19, col=rgb(0,0,0,alpha=0.3) )
points( x[idx[sel]], y[idx[sel]],
pch=19, col=rgb(0,0,1,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,2) )
par(mfrow=c(1,1))
As recommended by McElreath (2020), robust models can be used to deal with these types of distributional departures. The procedure consists of modifying the statistical models to include traits that effectively make them robust to small departures from the distributional assumption, such as heteroscedastic errors or the presence of outliers.
Robust models
A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumptions of a model (Everitt & Skrondal, 2010).
It is known that dealing with heteroscedasticity and the identification of outliers through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately bias the parameter estimates, and even make them inefficient (McElreath, 2020).
Bias
It refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt & Skrondal, 2010).
Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers. Therefore, it is necessary to consider using robust regression models to address these distributional departures and generate unbiased parameter estimates.
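As a concrete and deliberately minimal illustration of this idea, the sketch below reuses the objects from the outlier example in Section 3.5.2 (y, mu_y, b_cand, and b_prior) and simply swaps the normal likelihood for a heavier-tailed Student-t likelihood; the degrees-of-freedom value is an arbitrary choice for illustration, not the specification used in the study’s models.

# robust alternative: Student-t likelihood with location mu_y and heavy tails
nu  = 2                                                  # illustrative degrees of freedom
udf = function(i){ prod( dt( y - mu_y[,i], df=nu ) ) }
y_lik_rob = sapply( 1:length(b_cand), udf )

b_prop_rob = y_lik_rob * b_prior
b_post_rob = b_prop_rob / sum(b_prop_rob)

b_exp_rob = sum( b_cand * b_post_rob )
paste0( 'estimated beta (robust expectation) = ', round(b_exp_rob, 3) )

Because the heavy tails assign non-negligible probability to large residuals, the inflated observations pull the posterior mean around less than under the normal likelihood.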
data_dir   = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/#data/final'
data_nam   = "data_H_list.RData"
model_data = file.path(data_dir, data_nam )
load( model_data )
# str(dlist)

var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib')
data_H  = data.frame(dlist[var_int])
# str(data_H)
As expected, the data reveal the two significant features of entropy scores: clustering and boundedness (refer to Section 3.2.3 and Section 3.3.3). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originating from different speakers and evaluated in different blocks (Table 1). On the other hand, entropy scores exhibit boundedness as they can only take on values within the continuous interval between zero and one, particularly H_{wsib} \in [0,1] (see Figure 15 showing randomly selected speakers).
var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib')

kable( head( data_H[, var_int], 10 ),
       col.names=c('Block ID','Speaker ID','Sentence ID','Word ID',
                   'Hearing status','Chron. age','Chron. age (centered)',
                   'Entropy') )
| Block ID | Speaker ID | Sentence ID | Word ID | Hearing status | Chron. age | Chron. age (centered) | Entropy |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 2 | 85 | 17 | 0.0570388 |
| 1 | 1 | 1 | 2 | 2 | 85 | 17 | 0.2785730 |
| 1 | 1 | 1 | 3 | 2 | 85 | 17 | 0.2785730 |
| 1 | 1 | 1 | 4 | 2 | 85 | 17 | 0.4607393 |
| 1 | 1 | 1 | 5 | 2 | 85 | 17 | 0.1134471 |
| 1 | 1 | 1 | 6 | 2 | 85 | 17 | 0.0001000 |
| 1 | 1 | 1 | 7 | 2 | 85 | 17 | 0.0001000 |
| 1 | 1 | 1 | 8 | 2 | 85 | 17 | 0.0001000 |
| 2 | 1 | 2 | 1 | 2 | 85 | 17 | 0.0662660 |
| 2 | 1 | 2 | 2 | 2 | 85 | 17 | 0.0662660 |
require(rethinking)

speakers = c(20,8,11, 25,30,6)

par(mfrow=c(2,3))
for( i in speakers ){
  speaker = data_H$cid == i
  dat = binning( y=data_H$Hwsib[speaker], min_y=0, max_y=1,
                 n_bins=20, dens=T )
  plot(dat, ylab="Frequency-Density", ylim=c(-0.15,max(dat)), xaxt='n',
       xlim=c(-0.05,1.05), xlab='entropy', col=rgb(0,0,0,0.6) )
  abline( h=0, col='gray' )
  abline( v=c(0,1), lty=2, col=rgb(0,0,0,0.3) )
  axis( side=1, at=as.numeric(names(dat)),
        labels=names(dat), las=2, cex.axis=0.8 )
  mtext( text=paste0('Speaker ', i), side=3, adj=0, cex=1.1)
}
par(mfrow=c(1,1))
Additionally, the data show that the 32 speakers’ speech samples consist of sentences with a minimum of 3 and a maximum of 11 words per sentence (M=7.1, SD=1.1, Table 2), where most sentences have between 5 and 9 words (see Figure 16).
speech_samples = with(data_H, table(cid, uid) )
speech_samples
uid
cid 1 2 3 4 5 6 7 8 9 10
1 8 8 8 7 8 7 7 6 6 5
2 9 7 7 7 6 6 6 6 5 7
3 8 8 8 8 8 7 8 7 7 6
4 8 8 8 7 7 7 7 7 6 6
5 8 9 7 7 7 7 7 7 6 7
6 8 6 3 6 6 6 6 5 4 4
7 9 7 7 7 7 8 4 6 6 6
8 8 8 8 9 7 7 6 6 6 6
9 9 11 7 6 7 6 6 5 5 6
10 9 8 8 9 8 8 7 6 6 6
11 10 9 7 7 7 7 7 7 6 7
12 9 8 7 8 8 8 6 7 6 6
13 7 7 7 7 7 7 7 7 6 6
14 8 8 8 7 8 6 6 6 5 6
15 8 8 7 7 7 8 7 8 7 6
16 8 7 7 7 7 6 6 6 6 6
17 8 7 7 7 7 6 7 7 6 6
18 8 8 8 7 7 7 6 6 6 6
19 9 8 7 7 8 8 6 8 6 6
20 10 8 8 7 7 7 7 6 6 6
21 9 8 8 8 7 7 7 8 6 6
22 9 8 8 7 7 7 7 7 8 7
23 8 8 7 7 7 7 7 7 6 6
24 7 7 7 7 7 7 7 8 7 7
25 9 8 8 7 7 8 8 7 6 6
26 7 7 7 9 7 7 7 7 7 7
27 9 8 8 7 7 7 7 7 6 6
28 9 8 8 8 7 7 7 7 6 6
29 8 9 8 9 7 7 7 6 6 7
30 9 7 7 7 7 7 6 6 5 5
31 11 8 8 8 7 7 7 7 7 6
32 8 8 8 10 7 8 6 6 6 6
speech_samples = data.frame( speech_samples )
hist(speech_samples$Freq, breaks=20, xlim=c(2, 12),
     main='', xlab='words per sentence')
kable( psych::describe( speech_samples$Freq ),
digits=2, row.names=F )
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 320 | 7.07 | 1.06 | 7 | 7.04 | 1.48 | 3 | 11 | 8 | 0.19 | 1.45 | 0.06 |
Moreover, the data comprise 16 normal hearing children (NH, hearing status category 1) and 16 hearing-impaired children with a cochlear implant (HI/CI, hearing status category 2). At the time of the collection of the speech samples, the NH group was between 78 and 98 months old (M=86.3, SD=6.7), while the HI/CI group was between 68 and 104 months old (M=86.3, SD=9.0).
d_mom = unique( data_H[,c('cid','HS','A')])
with( d_mom, table( A, HS ) )
HS
A 1 2
68 0 1
76 0 1
78 2 0
80 1 1
82 5 0
83 0 1
84 0 1
85 0 6
86 2 1
88 1 0
93 2 2
94 1 0
97 1 0
98 1 0
104 0 2
Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all necessary information for the fitting process:
dlist = list(
  N     = nrow(data_H),
  B     = max(data_H$bid),
  I     = max(data_H$cid),
  U     = max(data_H$uid),
  W     = max(data_H$wid),
  cHS   = max(data_H$HS),
  bid   = data_H$bid,
  cid   = data_H$cid,
  uid   = data_H$uid,
  wid   = data_H$wid,
  HS    = data_H$HS,
  A     = data_H$A,
  Am    = with( data_H, A - min(A) ),
  Hwsib = data_H$Hwsib
)
str(dlist)
List of 14
$ N : int 2263
$ B : int 5
$ I : int 32
$ U : int 10
$ W : num 11
$ cHS : int 2
$ bid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ cid : int [1:2263] 1 1 1 1 1 1 1 1 1 1 ...
$ uid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ wid : num [1:2263] 1 2 3 4 5 6 7 8 1 2 ...
$ HS : int [1:2263] 2 2 2 2 2 2 2 2 2 2 ...
$ A : int [1:2263] 85 85 85 85 85 85 85 85 85 85 ...
$ Am : int [1:2263] 17 17 17 17 17 17 17 17 17 17 ...
$ Hwsib: num [1:2263] 0.057 0.279 0.279 0.461 0.113 ...
This section articulates the probabilistic formalism of both the Normal LMM and the proposed Beta-proportion GLLAMM. Subsequently, it details the set of fitted models and the estimation procedure.
The general mathematical formalism of the Normal LMM posits that the likelihood of the (manifest) entropy scores H_{wsib} follows a normal distribution, i.e.
\begin{align} H_{wsib} & \sim \text{Normal} \left( \mu_{sib}, \sigma_{i} \right) \end{align} \tag{4}
where \mu_{sib} represents the average entropy at the word-level and \sigma_{i} denotes the standard deviation of the average entropy at the word-level, varying for each speaker. Given the clustered nature of the data, \mu_{sib} is defined by the linear combination of individual characteristics and several random effects:
\begin{align} \mu_{sib} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + u_{si} + e_{i} + a_{b} \end{align} \tag{5}
where HS_{i} and A_{i} denote the hearing status and chronological age of speaker i, respectively. Additionally, \alpha denotes the general intercept, \alpha_{HS[i]} represents the average entropy for each hearing status group, and \beta_{A,HS[i]} denotes the evolution of the average entropy per unit of chronological age A_{i} for each hearing status group. Furthermore, u_{si} denotes the sentence-speaker random effects measuring the unexplained entropy variability within sentences for each speaker, e_{i} denotes the speaker random effects describing the unexplained entropy variability between speakers, and a_{b} denotes the block random effects assessing the unexplained variability between experimental blocks.
Several notable features of the Normal LMM can be discerned from the equations. Firstly, Equation 4 indicates that the variability of the average entropy at the word-level can differ for each speaker, enhancing the model’s robustness to mild or moderate data departures from the normal distribution assumption, such as heteroscedasticity or outliers (refer to Section 3.5). Secondly, Equation 5 reveals that the model assumes no transformation is applied to the relationship between the average entropy and the linear predictor. This is commonly known as a direct link function. Moreover, Equation 5 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. Lastly, the equation implies the model considers a separate intercept and a separate slope of age for each hearing status group, i.e., NH and HI/CI speakers.
Centering
Procedure used to facilitate the interpretation of regression parameters (Everitt & Skrondal, 2010).
The general mathematical formalism of the proposed Beta-proportion GLLAMM comprises four components: a response model, with its likelihood, linear predictor, and link function, and a structural model. The response model posits the likelihood of entropy scores follow a beta-proportion distribution,
GLLAMM components
\begin{align} H_{wsib} & \sim \text{BetaProp} \left( \mu_{ib}, M_{i} \right) \end{align} \tag{6}
where \mu_{ib} denotes the average entropy at the word-level and M_{i} signifies the dispersion of the average entropy at the word-level, varying for each speaker. Additionally, \mu_{ib} is defined as,
\begin{align} \mu_{ib} &= \text{logit}^{-1}[ a_{b} - SI_{i} ] \end{align} \tag{7}
where \text{logit}^{-1}(x) = \exp(x) / (1+\exp(x)) is the inverse-logit link function, a_{b} denotes the block random effects, and SI_{i} describes the speaker’s latent potential intelligibility. In turn, the structural model relates the speakers’ latent potential intelligibility to the individual characteristics:
\begin{align} SI_{i} = \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \end{align} \tag{8}
where \alpha defines the general intercept, \alpha_{HS[i]} denotes the potential intelligibility for different hearing status groups, and \beta_{A,HS[i]} indicates the evolution of potential intelligibility per unit of chronological age for each hearing status group. Furthermore, e_{i} represents the speaker random effects, describing unexplained potential intelligibility variability between speakers, and u_{i} = \sum_{s=1}^{S} u_{si}/S denotes the sentence random effects, assessing the average unexplained potential intelligibility variability among sentences within each speaker, with S denoting the total number of sentences per speaker.
Several features are evident in this probabilistic representation. Firstly, akin to the Normal LMM, Equation 6 reveals that the dispersion of average entropy at the word level can differ for each speaker. This enhances the model’s robustness to mild or moderate data departures from the beta-proportion distribution assumption (refer to Section 3.5). Secondly, in contrast with the Normal LMM, Equation 7 shows the potential intelligibility of a speaker has a negative non-linear relationship with the entropy scores, explicitly highlighting the inverse relationship between intelligibility and entropy. This feature also maps the unbounded linear predictor to the bounded limits of the entropy scores. Thirdly, in contrast with the Normal LMM, Equation 8 demonstrates that the structural parameters are interpretable in terms of the latent potential intelligibility scores, where the scale of the latent trait is set by the general intercept \alpha, as required in latent variable models (Depaoli, 2021). Furthermore, the equation implies the model also considers a separate intercept and a separate slope of age for each hearing status group, i.e., NH and HI/CI speakers. Additionally, Equation 8 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. Lastly, the same equation assumes the intelligibility scores have two sources of unexplained variability: e_{i} and u_{i}. The former represents inherent differences in potential intelligibility among different speakers, while the latter assumes that different sentences measure potential intelligibility differently due to variations in word difficulties and their interplay within the sentence.
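Analogously, the sketch below simulates entropy scores from the Beta-proportion GLLAMM of Equations 6-8, using inv_logit() and rbeta2() from the rethinking package. Again, all parameter values are illustrative assumptions; note how larger values of SI_{i} push the expected entropy towards zero, reflecting the negative link in Equation 7.

```{r}
# Minimal sketch of the Beta-proportion GLLAMM (Equations 6-8) with
# illustrative values; higher SI implies lower expected entropy.
require(rethinking)
set.seed(1)

A     = c(7, 9, 12)                  # hypothetical chronological ages
Abar  = min(A)                       # minimum age in the sample
alpha = 0                            # general intercept (sets the scale)
aHS   = 0.5                          # hearing status group intercept
bAHS  = 0.10                         # slope of (centered) age for the group
e_i   = rnorm( length(A), 0, 0.10 )  # speaker random effects
u_i   = rnorm( length(A), 0, 0.10 )  # aggregated sentence random effects
a_b   = rnorm( length(A), 0, 0.05 )  # block random effects
M     = 10                           # dispersion ('sample size') parameter

SI = alpha + aHS + bAHS*(A - Abar) + e_i + u_i          # Equation 8
mu = inv_logit( a_b - SI )                              # Equation 7
H  = rbeta2( n=length(A), prob=mu, theta=M )            # Equation 6
H
```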
Bayesian procedures require the incorporation of priors (refer to Section 3.1). This study establishes priors and hyperpriors for the parameters of both the Normal LMM and the Beta-proportion GLLAMM using prior predictive simulations.
Prior predictive simulations
Procedure that entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models’ specifications. The procedure aims to establish meaningful priors and comprehend their implications within the context of the model before incorporating any information derived from empirical data (McElreath, 2020).
For the parameters of the Normal LMM, non-informative priors and hyperpriors are established to align with analogous model assumptions in frequentist methods (refer to Section 3.1.4). The specified priors are as follows:
As described in Section 5.3, the models initially consider one \sigma prior for all the speakers. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align} \sigma_{i} &\sim \text{Exponential}\left( 2 \right) \end{align} \tag{9}
The left panel of Figure 17 shows the weakly informative prior expects \sigma to be possible only in a positive range, as required for variability parameters (Depaoli, 2021). Furthermore, the right panel of Figure 17 shows that when transformed to the entropy scale, the model expects predictions to fall beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

param_pscale = rexp( n=n, rate=2 )
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(sigma))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
Furthermore, as described in Section 5.1.1 and Section 5.3, there is a possibility that the model considers one \sigma_{i} prior for each of the speakers in the data. This choice implies that the presumed uncertainty about unexplained variability of the average entropy at the word-level is similar for each speaker, prior to observing empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} r_{S} &\sim \text{Exponential}\left( 2 \right) \\ \sigma_{i} &\sim \text{Exponential}\left( r_{S} \right) \end{align} \tag{10}
The left panel of Figure 18 shows the weakly informative prior expects \sigma_{i} to be possible only in a positive range, as required for variability parameters (Depaoli, 2021). The panel also shows the parameters are most likely to fall in the interval [0, 2.5]. Moreover, the right panel of Figure 18 shows that when the prior is transformed to the entropy scale, the model expects scores to fall beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )

param_pscale = r_s * z_s
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(sigma[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The general intercept \alpha is used in preliminary models where no mathematical formulations regarding how speaker-related factors influence intelligibility are investigated. The prior distribution for \alpha under the Normal LMM is described in Equation 11.
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{11}
The left panel of Figure 19 shows the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 19 demonstrates that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at low levels of the feasible range of the outcome. This implies that a particular bias in entropy scores towards lower values is expected by the prior.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \alpha_{HS[i]} under the Normal LMM is described in Equation 12. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \end{align} \tag{12}
The left panel of Figure 20 reveals a weakly informative prior that restricts the range of probability of \alpha_{HS[i]} to [0.3, 0.7]. This implies that no particular bias towards entropy values above or below 0.5 for different hearing status groups is present in the priors. However, the right panel of Figure 20 demonstrates that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around low levels of entropy, but also beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

param_pscale = rnorm( n=n, mean=0, sd=0.2 )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \beta_{A,HS[i]} under the Normal LMM is described in Equation 13. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of entropy attributed to chronological age is presumed to have similar uncertainties for both categories, prior to the observation of empirical data.
\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{13}
The left panel of Figure 21 shows the prior restricts \beta_{A,HS[i]} to be mostly within the range of [-0.4, 0.4]. This implies that there is no particular bias towards a positive or negative evolution of entropy scores due to chronological age per hearing status group. However, the right panel of Figure 21 shows that when this prior is transformed to the entropy scale, the model anticipates a concentration of entropy values at lower levels, but it also expects entropy scores significantly beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd          = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of e_{i} for the Normal LMM is described in Equation 14. The same prior is assigned to each speaker in the sample. This choice implies that differences in entropy scores between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{14}
The left panel of Figure 22 shows the prior anticipates differences in entropy scores between speakers as large as 3 units of entropy. However, the right panel of Figure 22 demonstrates that when transformed to the entropy scale, the model anticipates a concentration of scores around low levels, but it also expects the differences to go well beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m_i = rnorm(n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )

param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of u_{si} for the Normal LMM is described in Equation 15. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average entropy score differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters (refer to Section 3.1.5).
\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ \end{align} \tag{15}
The left panel of Figure 23 shows the prior allows the average differences in entropy among sentences within speakers to be as large as 3 units of measurement. Furthermore, the right panel of Figure 23 demonstrates that when transformed to the entropy scale, the model anticipates a concentration of scores around mid-levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m_u = rnorm(n=n, mean=0, sd=0.05 )
s_u = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_u, sd=s_u )

param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of a_{b} for the Normal LMM is described in Equation 16. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{16}
The left panel of Figure 24 shows a prior with no particular bias towards differences between blocks above or below zero units of entropy. Nevertheless, the right panel of Figure 24 demonstrates that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, but also contemplates differences beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m_b = rnorm(n=n, mean=0, sd=0.05 )
s_b = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )

param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[si]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the linear predictor can be constructed for the Normal LMM. The prior predictive simulation can be described as in Equation 17:
\begin{align} m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ a_{b} &\sim \text{Normal} \left( m, s \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ g(\cdot) &= \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{si} + a_{b} \\ \end{align} \tag{17}
The left panel of Figure 25 shows the prior expects the values of the linear predictor to be most probable between [-2.5, 2.5], implying a particular bias towards negative entropy scores is jointly present in these priors. Furthermore, the right panel of Figure 25 demonstrates that when transformed to the entropy scale, the model anticipates predictions of entropy scores within its feasible range, but somewhat more probable at the extremes of entropy.
```{r}
require(rethinking)

n = 1000

r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s

m = rnorm(n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i  = rnorm( n=n, mean=m, sd=s )
u_si = rnorm( n=n, mean=m, sd=s )
a_b  = rnorm( n=n, mean=m, sd=s )

aHS  = rnorm( n=n, mean=0, sd=0.2 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = aHS + bAHS + u_si + e_i + a_b

param_oscale = rnorm( n=n, mean=param_pscale,
                      sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
For the parameters of the Beta-proportion GLLAMM, weakly informative priors and hyperpriors are established (refer to Section 3.1.4). The specified priors are as follows:
Similar to the Normal LMM, Section 5.3 describes a Beta-proportion GLLAMM that initially considers one M for all speakers in the data. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align} M &\sim \text{Exponential}\left( 0.4 \right) \end{align} \tag{18}
The left and right panels of Figure 26 demonstrate that the prior of M expects the parameter to be most probable in the positive range [0, 7], while predicting scores within the boundaries of the data. This implies that no particular bias is present for the unexplained word-level entropy variability, only that it is positive, as expected for measures of variability.
```{r}
require(rethinking)

n = 1000

param_pscale = rexp( n=n, rate=0.4 )
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,10),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
Furthermore, as described in Section 5.1.2 and Section 5.3, there is a possibility that the model considers one M_{i} prior for each speaker in the data. This choice implies the presumed uncertainty for the unexplained dispersion of the average entropy at the word-level is similar for each speaker, prior to the observation of empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} r_{M} &\sim \text{Exponential}\left( 0.2 \right) \\ M_{i} &\sim \text{Exponential}\left( r_{M} \right) \end{align} \tag{19}
The left and right panels of Figure 27 demonstrate that the prior of M_{i} expects the parameters to be most probable in the positive range [0, 20], while at the same time predicting data within the boundaries of the entropy scores. This implies that no particular bias is present for the unexplained word-level entropy variability, only that it is positive, as expected for measures of variability.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )

param_pscale = r_M * z_M
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,20),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
Considering that the structural parameters are now interpretable in terms of the (latent) potential intelligibility scores, the general intercept \alpha is used to set the scale of the latent trait, as it is required in latent variable models (Depaoli, 2021) (refer to Section 3.1.4). The prior distribution for \alpha under the Beta-proportion GLLAMM is described in Equation 20.
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{20}
The left panel of Figure 28 shows the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 28 demonstrates that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at mid-levels of the feasible range of the outcome. This implies that no particular bias in entropy scores is expected by the prior.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \alpha_{HS[i]} under the Beta-proportion GLLAMM is described in Equation 21. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \end{align} \tag{21}
The right panel of Figure 29 demonstrates that when the \alpha_{HS[i]} prior is transformed to the entropy scale, the model anticipates a concentration of data around mid-levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards specific entropy score values is expected from using this prior.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

param_pscale = rnorm( n=n, mean=0, sd=0.3 )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution for \beta_{A,HS[i]} under the Beta-proportion GLLAMM is described in Equation 22. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of potential intelligibility attributed to chronological age is presumed to have similar uncertainties for both categories, prior to the observation of empirical data.
\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{22}
The left panel of Figure 30 shows the weakly informative prior has no particular bias towards a positive or negative evolution of potential intelligibility due to chronological age per hearing status group. Furthermore, the right panel of Figure 30 demonstrates that when transformed to the entropy scale, the model anticipates a slight concentration of data around mid-levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd          = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of e_{i} for the Beta-proportion GLLAMM is described in Equation 23. The same prior is assigned to each speaker in the sample. This choice implies that differences in potential intelligibility between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters, called hyperpriors (refer to Section 3.1.5).
\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{23}
The left panel of Figure 31 shows the prior anticipates differences in intelligibility between speakers as large as 3 units of measurement. Furthermore, the right panel of Figure 31 demonstrates that when transformed to the entropy scale, the model anticipates a high concentration around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected from using this prior.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )

param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of u_{i} for the Beta-proportion GLLAMM is described in Equation 24. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average potential intelligibility differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5). The sentence-within-speaker differences are then aggregated to the speaker level to form the sentence random effects u_{i},
\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \end{align} \tag{24}
The left panel of Figure 32 shows the prior allows the average differences in potential intelligibility among sentences within speakers to be as large as 0.8 units of measurement. Furthermore, the right panel of Figure 32 demonstrates that when u_{i} is transformed to the entropy scale, the model anticipates a high concentration of scores around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m_u = rnorm( n=n, mean=0, sd=0.05 )
s_u = rexp(n=n, rate=2 )
param_pscale = replicate(
  n=n, expr=mean( rnorm( n=10, mean=m_u, sd=s_u ) ) )

param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
The prior distribution of a_{b} for the Beta-proportion GLLAMM is described in Equation 25. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{25}
The left panel of Figure 33 shows a prior with no particular bias towards positive or negative differences between blocks. Furthermore, the right panel of Figure 33 demonstrates that when transformed to the entropy scale, the model anticipates a high concentration of data around mid-levels of entropy, but not beyond the feasible range of the outcome.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m_b = rnorm( n=n, mean=0, sd=0.05 )
s_b = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )

param_oscale = rbeta2( n=n, prob=inv_logit(param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(u[si]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed for the Beta-proportion GLLAMM. The prior predictive simulation can be described as in Equation 26:
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \\ a_{b} &\sim \text{Normal} \left( m, s \right) \\ SI_{si} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \\ \end{align} \tag{26}
The left panel of Figure 34 shows the prior expects speakers’ potential intelligibility scores to be most probable between [-3, 3], implying no particular bias towards positive or negative potential intelligibility is jointly present in these priors. Furthermore, the right panel of Figure 34 demonstrates that when transformed to the entropy scale, the model anticipates predictions of entropy scores within its feasible range, but somewhat more probable at the extremes of entropy.
```{r}
require(rethinking)

n = 1000

r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M

m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_i = replicate( n=n,
                 expr=mean( rnorm( n=10, mean=m, sd=s ) ) )
a_b = rnorm( n=n, mean=m, sd=s )

a    = rnorm( n=n, mean=0, sd=0.05 )
aHS  = rnorm( n=n, mean=0, sd=0.3 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = a + aHS + bAHS + e_i + u_i

param_oscale = rbeta2( n=n, prob=inv_logit(a_b-param_pscale),
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3),
      show.HPDI=0.95,
      main='Parameter scale',
      xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale',
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
This study evaluates the comparative predictive capabilities of both the Normal LMM and the Beta-proportion GLLAMM (RQ1) while simultaneously examining various formulations regarding how speaker-related factors influence intelligibility (RQ3). In this context, the predictive capabilities of the models are intricately connected to these formulations. As a result, the study requires fitting 12 different models, each representing a specific way of investigating one or both research questions. The models comprise six versions of both the Normal LMM and the Beta-proportion GLLAMM. The differences among the models hinge on (1) whether they address data clustering in conjunction with measurement error, denoted as the model type, (2) the assumed distribution for the entropy scores, which aims to handle boundedness, (3) whether the model incorporates a robust feature to address mild or moderate departures of the data from distributional assumptions, and (4) the inclusion or exclusion of speaker-related factors in the models. A detailed overview of the fitted models is available in Table 3.
Table 3. Overview of the fitted models; the last three columns indicate which fixed effects each model includes.

| Model | Model type | Entropy distribution | Robust feature | \beta_{HS[i]} | \beta_{A} | \beta_{A,HS[i]} |
|---|---|---|---|---|---|---|
| 1 | LMM | Normal | No | No | No | No |
| 2 | LMM | Normal | No | Yes | Yes | No |
| 3 | LMM | Normal | No | Yes | No | Yes |
| 4 | LMM | Normal | Yes | No | No | No |
| 5 | LMM | Normal | Yes | Yes | Yes | No |
| 6 | LMM | Normal | Yes | Yes | No | Yes |
| 7 | GLLAMM | Beta-prop. | No | No | No | No |
| 8 | GLLAMM | Beta-prop. | No | Yes | Yes | No |
| 9 | GLLAMM | Beta-prop. | No | Yes | No | Yes |
| 10 | GLLAMM | Beta-prop. | Yes | No | No | No |
| 11 | GLLAMM | Beta-prop. | Yes | Yes | Yes | No |
| 12 | GLLAMM | Beta-prop. | Yes | Yes | No | Yes |
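Before the Stan programs are listed, the following sketch illustrates how the data list they expect could be assembled. It assumes the analysis data frame data_H contains the identifier columns used in the Stan data blocks below; the column names bid (experimental block) and wid (word) are assumptions, as the block identifier is not shown in the data structure above.

```{r}
# Sketch: assembling the data list expected by the Stan programs below.
# Assumes data_H holds integer identifiers starting at 1 for blocks (bid),
# speakers (cid), sentences (uid), words (wid), and hearing status (HS).
stan_data = list(
  N     = nrow(data_H),        # number of experimental runs
  B     = max(data_H$bid),     # max. number of blocks
  I     = max(data_H$cid),     # max. number of speakers
  U     = max(data_H$uid),     # max. number of sentences
  W     = max(data_H$wid),     # max. number of words
  cHS   = max(data_H$HS),      # max. number of hearing status categories
  bid   = data_H$bid,          # block id
  cid   = data_H$cid,          # speaker id
  uid   = data_H$uid,          # sentence id
  HS    = data_H$HS,           # hearing status
  Am    = data_H$Am,           # chronological age minus minimum age
  Hwsib = data_H$Hwsib         # entropy scores
)
```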
The following tabset panel provides the commented Stan code for all fitted models. Furthermore, the models are implemented using non-centered priors (refer to Section 3.1.5).
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model01.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
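As a minimal illustration of the estimation step, the sketch below compiles and samples the program just saved (model01.stan) with rstan, using the stan_data list sketched above. The chain, iteration, and seed settings are illustrative assumptions, not necessarily those used for the results reported in the study.

```{r}
# Sketch only: compile and sample model01.stan with rstan, using the
# stan_data list assembled earlier; settings below are illustrative.
require(rstan)

m01   = stan_model( file.path(getwd(), 'real_models', 'model01.stan') )
fit01 = sampling( m01, data=stan_data,
                  chains=4, iter=2000, warmup=1000,
                  cores=4, seed=1234 )
print( fit01, pars=c('a','s_w','s_b','s_i','s_u') )
```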
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model02.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model03.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model04.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
bAm ~ normal( 0 , 0.1 );
//bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model05.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model06.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model07.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model08.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model09.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model10.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model11.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.3 );
bAmHS ~ normal( 0 , 0.3 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model12.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
Moreover, the following code is provided so that the reader can fit all twelve Stan models.
```{r}
library(cmdstanr)  # provides cmdstan_model() and the $sample() method

for(i in 1:12){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i, '.stan')
  model_in  = file.path(getwd(), 'real_models')
  model_out = file.path(getwd(), 'real_chain')
  mod = cmdstan_model( file.path(model_in, model_nam) )
  print(model_nam)
  mod$sample( data=dlist,
              output_dir=model_out,
              output_basename = str_replace(model_nam, fixed('.stan'), ''),
              iter_warmup=2000, iter_sampling=2000,  # cmdstanr argument names
              chains=4, parallel_chains=4,
              max_treedepth=20, adapt_delta=0.95 ) #, init=0
}
```
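The loop above only writes the chains to disk. It may also be convenient to inspect the sampler behaviour of a single model right after fitting, before turning to the full diagnostic analysis. The following sketch, which is not part of the original pipeline, illustrates this for model 12; it assumes a recent `cmdstanr` version that provides the `$diagnostic_summary()` and `$summary()` methods, and it simply refits the model, prints the HMC diagnostics, and summarizes a few parameters.
```{r}
# illustrative sketch (not part of the original pipeline): fit one model and
# inspect the HMC sampler diagnostics and a quick posterior summary
mod12 = cmdstan_model( file.path(getwd(), 'real_models', 'model12.stan') )
fit12 = mod12$sample( data=dlist,
                      iter_warmup=2000, iter_sampling=2000,
                      chains=4, parallel_chains=4,
                      max_treedepth=20, adapt_delta=0.95 )

# divergent transitions, tree-depth saturation, and E-BFMI per chain
fit12$diagnostic_summary()

# posterior mean, quantiles, ess_bulk, and rhat for selected parameters
fit12$summary( variables=c('a','aHS','bAmHS','m_b','s_b') )
```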
Given the considerable number of fitted models and the resulting abundance of parameters, this section showcases the quality of, and the information embedded in, the Bayesian chains only for models 6 and 12. These two models were selected because they register the highest parameter counts among those detailed in Section 5.3. It should be underscored, however, that all fitted models were examined meticulously, and all of them showed results comparable to those of the two models chosen for illustration.
In general, both the graphical analysis and the diagnostic statistics indicated that all chains exhibited low to moderate autocorrelation, explored the parameter space in a seemingly random manner, and converged to a constant mean and variance in the post-warm-up phase. Figure 35 visualizes the \widehat{\text{R}} diagnostic statistic, and Figure 36 through Figure 48 illustrate the chains' graphical analysis.
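The \widehat{\text{R}} statistic compares the between-chain and within-chain variability of each parameter; values close to 1 indicate that all chains are sampling from the same stationary distribution. The chunks below compute these diagnostics with the user-defined helpers `par_recovery` and `tri_plot`. As a complementary check, the short sketch that follows shows how similar information could be obtained with rstan's built-in functions, assuming the `model12` `stanfit` object created in the loading chunk further below.
```{r}
# complementary check with rstan's built-in tools (assumes model12 is a stanfit)
fit_summary = rstan::summary( model12, probs=c(0.025, 0.975) )$summary

# parameters whose Rhat exceeds the usual 1.05 threshold (ideally none)
rownames(fit_summary)[ fit_summary[, 'Rhat'] > 1.05 ]

# trace and autocorrelation plots for the intercept and hearing-status effects
rstan::traceplot( model12, pars=c('a','aHS[1]','aHS[2]'), inc_warmup=FALSE )
rstan::stan_ac( model12, pars=c('a','aHS[1]','aHS[2]') )
```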
```{r}
# load reference models
for(i in c(6,12)){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i)
  model_out = file.path( main_dir, 'real_chain')
  model_fit = file_id( model_out, model_nam )
  assign( model_nam,
          rstan::read_stan_csv( file.path( model_out, model_fit ) ) )
}
# parameters of interest
par_int = c( 'a','aHS[1]','aHS[2]',
             'bAm','bAmHS[1]','bAmHS[2]',
             'm_b','s_b',
             'm_i','s_i',
             'm_u','s_u',
             'r_s','r_M',
             's_w','Mw')
# parameter summaries (including Rhat and n_eff) via the user-defined helper
model06_parameters = par_recovery( stanfit_obj = model06,
                                   est_par = par_int,
                                   p = 0.95 )
model12_parameters = par_recovery( stanfit_obj = model12,
                                   est_par = par_int,
                                   p = 0.95 )
```
```{r}
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$Rhat4,
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Rhat',
      main='Normal LMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) )
plot( 1:nrow(model12_parameters), model12_parameters$Rhat4,
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Rhat',
      main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) )
par( mfrow=c(1,1) )
```
```{r}
# chains' graphical analysis per parameter group (user-defined tri_plot helper)
tri_plot( stan_object=model06, pars=c('aHS[1]','aHS[2]') )
tri_plot( stan_object=model12, pars=c('a','aHS[1]','aHS[2]') )
tri_plot( stan_object=model06, pars=c('bAmHS[1]','bAmHS[2]') )
tri_plot( stan_object=model12, pars=c('bAmHS[1]','bAmHS[2]') )
tri_plot( stan_object=model06, pars=c('m_b','s_b') )
tri_plot( stan_object=model12, pars=c('m_b','s_b') )
tri_plot( stan_object=model06, pars=c('m_i','s_i') )
tri_plot( stan_object=model12, pars=c('m_i','s_i') )
tri_plot( stan_object=model06, pars=c('m_u','s_u') )
tri_plot( stan_object=model12, pars=c('m_u','s_u') )
tri_plot( stan_object=model06, pars=c('r_s', paste0('s_w[', 1:4,']')) )
tri_plot( stan_object=model12, pars=c('r_M', paste0('Mw[', 1:4,']')) )
tri_plot( stan_object=model12, pars=paste0('SI[', 1:5, ']') )
```
Moreover, the density plots and n_{\text{eff}} statistics collectively confirmed that all posterior distributions are unimodal, centered around a mean, and generated from a satisfactory number of uncorrelated samples, and that they make substantive sense when compared with the models' prior beliefs. Figure 49 visualizes the n_{\text{eff}} diagnostic statistic, and Figure 51 through Figure 56 illustrate the chains' graphical analysis.
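As a complement to the user-defined `dens_plot` helper used below, the following sketch shows how the n_{\text{eff}} statistic and the marginal posterior densities could also be inspected with rstan's built-in functions, assuming `model06` and `model12` are the `stanfit` objects loaded above.
```{r}
# complementary check with rstan's built-in tools (assumes model06 and model12
# are stanfit objects, as loaded in the previous chunks)
for( fit_name in c('model06', 'model12') ){
  n_eff = rstan::summary( get(fit_name) )$summary[, 'n_eff']
  cat( fit_name, ': minimum n_eff =', round( min(n_eff, na.rm=TRUE) ), '\n' )
}

# marginal posterior densities of the hearing-status effects in model 12
rstan::stan_dens( model12, pars=c('aHS[1]','aHS[2]'), separate_chains=TRUE )
```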
```{r}
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$n_eff,
      ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Neff',
      main='Normal LMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) )
plot( 1:nrow(model12_parameters), model12_parameters$n_eff,
      ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Neff',
      main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) )
par( mfrow=c(1,1) )
```
```{r}
# posterior density plots per parameter group (user-defined dens_plot helper)
par_int = c('a','aHS','bAm','bAmHS')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)
par_int = c('a','aHS','bAm','bAmHS')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
par_int = c('m_b','m_i','m_u','s_b','s_i','s_u')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)
par_int = c('m_b','m_i','m_u','s_b','s_i','s_u')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
par_int = c('r_s','s_w')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)
par_int = c('r_M','Mw')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
par_int = 'SI'
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)
```
For a thorough explanation of Bayesian inference procedures, the reader can refer to Kruschke (2015) or McElreath (2020).
The reader can refer to Brooks et al. (2011) for a detailed treatment of MCMC methods.
An interested reader can further refer to McElreath (2020) for a detailed explanation of grid approximation.
An interested reader can refer to McElreath (2020), Gorinova et al. (2019), and Neal (2003).
```bibtex
@online{rivera espejo2024,
  author = {Rivera Espejo, Jose Manuel},
  title = {Additional {Notebook}},
  date = {2024-04-09},
  langid = {en}
}
```