Chapter 11

Authors
Juan H Klopper (MD), Do Hee Lee (MPH)

Affiliation
The George Washington University

This chapter of Applied Categorical Data Analysis by JH Klopper and DH Lee is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International

1 Introduction

  • Up until now, we have explored only binary response (outcome) variables. For such a binary variable, Y, we chose a level as the success or the disease, and encoded it with 1. This allowed us to consider P(Y=1 \, \vert \, \mathbf{x}), the probability of success given a vector of explanatory variables \mathbf{x} for each case Y_{i}.

In this chapter, we consider response variables that are multilevel categorical. The variables can be either nominal or ordinal. We will explore two model types: the baseline-category logistic model (for nominal response variables) and the cumulative logistic model (for ordinal response variables).

To start our exploration, we consider the multinomial distribution.

The key terms covered in this chapter are listed below by section.

  • Section 3 Multinomial Distribution: multinomial distribution
  • Section 4 Baseline-Category Logistic Models: baseline-category logistic model, polytomous logistic model

2 Libraries

Load the required libraries for Chapter 11. If these packages are not already installed, please do so before loading the libraries.

Code
library(rio)   # Data import
library(dplyr) # Data manipulation
library(VGAM)  # vglm for multinomial and cumulative logit models
library(brant) # Brant's test of the proportional odds assumption

3 Multinomial Distribution

We are already familiar with the binomial distribution, described by the parameters n and \pi. The binomial distribution, for n trials with a population probability of success of \pi, is a special case of the multinomial distribution.

For the multinomial distribution, we consider a J-level categorical response variable with levels i = 1, 2, 3, \ldots, J. Each of the J levels of the response variable has a probability \pi_{i} such that Equation 1 holds.

\sum_{i=1}^{J} \pi_{i} = 1 \tag{1}

For any specific experiment, we can express the frequency of each of the J levels of a response variable Y, as n_{i} such that Equation 2 holds, where n is the total number of observations in the experiment.

\sum_{i=1}^{J} n_{i} = n \tag{2}

The outcome of an experiment is then a vector of frequencies written as \left( n_{1} , n_{2} , \ldots , n_{J} \right), where as mentioned, n_{i} is the number of cases of the i^{\text{th}} level of Y. The distribution for these frequencies is written in Equation 3, where Mult refers to the multinomial distribution.

\left( n_{1} , n_{2} , \ldots , n_{J} \right) \sim \text{Mult} \left( n , \left( \pi_{1} , \pi_{2} , \ldots , \pi_{J} \right) \right) \tag{3}

The probability mass function for the multinomial distribution is written in Equation 4.

\begin{align} &P \left( n_{1} , n_{2} , \ldots , n_{J} \right) = \left( \frac{n!}{\prod_{i=1}^{J} n_{i}!} \prod_{i=1}^{J} \pi_{i}^{n_{i}} \right) \\ &P \left( n_{1} , n_{2} , \ldots , n_{J} \right) = \frac{n!}{n_{1}!\, n_{2}! \, \ldots \, n_{J}!} \left( \pi_{1}^{n_{1}} \, \pi_{2}^{n_{2}} \, \ldots \, \pi_{J}^{n_{J}} \right) \end{align} \tag{4}

As is clear from Equation 4, we have that the binomial distribution is indeed a special case with J=2. Furthermore, the marginal distribution of the frequency in each level is binomial. For an arbitrary level, j, n_{j} has a mean of n \pi_{j} and a variance of n \pi_{j} \left( 1 - \pi_{j} \right).
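As a quick check (a sketch added here, not part of the original analysis), we can verify these marginal moments by simulation with the rmultinom function, comparing the empirical mean and variance of one level's frequency with n \pi_{j} and n \pi_{j} \left( 1 - \pi_{j} \right). The values below are illustrative only.

Code
set.seed(42) # For reproducibility
draws <- rmultinom(100000, size = 10, prob = c(0.25, 0.5, 0.25))

mean(draws[2, ]) # Should be close to 10 * 0.5 = 5
var(draws[2, ])  # Should be close to 10 * 0.5 * (1 - 0.5) = 2.5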

As an example, consider a nominal categorical response variable with three levels A, B, and C, with probabilities of 0.25, 0.5, and 0.25. If we randomly select three subjects, the 10 possible outcomes are shown below. In the first row, all three subjects have level A, and so on.

Each row below shows an outcome \left( n_{1},n_{2},n_{3}\right) followed by its probability written using the probability mass function.
\left(3,0,0\right) \left(\frac{3!}{3!0!0!}\right)0.25^{3}0.5^{0}0.25^{0}
\left(2,1,0\right) \left(\frac{3!}{2!1!0!}\right)0.25^{2}0.5^{1}0.25^{0}
\left(2,0,1\right) \left(\frac{3!}{2!0!1!}\right)0.25^{2}0.5^{0}0.25^{1}
\left(1,1,1\right) \left(\frac{3!}{1!1!1!}\right)0.25^{1}0.5^{1}0.25^{1}
\left(1,2,0\right) \left(\frac{3!}{1!2!0!}\right)0.25^{1}0.5^{2}0.25^{0}
\left(1,0,2\right) \left(\frac{3!}{1!0!2!}\right)0.25^{1}0.5^{0}0.25^{2}
\left(0,1,2\right) \left(\frac{3!}{0!1!2!}\right)0.25^{0}0.5^{1}0.25^{2}
\left(0,0,3\right) \left(\frac{3!}{0!0!3!}\right)0.25^{0}0.5^{0}0.25^{3}
\left(0,2,1\right) \left(\frac{3!}{0!2!1!}\right)0.25^{0}0.5^{2}0.25^{1}
\left(0,3,0\right) \left(\frac{3!}{0!3!0!}\right)0.25^{0}0.5^{3}0.25^{0}

We calculate each of the probabilities below using the dmultinom function.

Code
probability = c(0.25, 0.5, 0.25)
s = 3
dmultinom(c(3,0,0), size = s, prob = probability)
[1] 0.015625
Code
dmultinom(c(2,1,0), size = s, prob = probability)
[1] 0.09375
Code
dmultinom(c(2,0,1), size = s, prob = probability)
[1] 0.046875
Code
dmultinom(c(1,1,1), size = s, prob = probability)
[1] 0.1875
Code
dmultinom(c(1,2,0), size = s, prob = probability)
[1] 0.1875
Code
dmultinom(c(1,0,2), size = s, prob = probability)
[1] 0.046875
Code
dmultinom(c(0,1,2), size = s, prob = probability)
[1] 0.09375
Code
dmultinom(c(0,0,3), size = s, prob = probability)
[1] 0.015625
Code
dmultinom(c(0,2,1), size = s, prob = probability)
[1] 0.1875
Code
dmultinom(c(0,3,0), size = s, prob = probability)
[1] 0.125
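Since the ten outcomes enumerate the entire sample space for three subjects, their probabilities must sum to 1. A short check of this (reusing the objects defined above) is sketched below.

Code
outcomes <- list(
  c(3,0,0), c(2,1,0), c(2,0,1), c(1,1,1), c(1,2,0),
  c(1,0,2), c(0,1,2), c(0,0,3), c(0,2,1), c(0,3,0)
)

# Apply dmultinom to each outcome and add the results
sum(sapply(outcomes, dmultinom, size = s, prob = probability))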

We can use the multinomial distribution in logistic regression, which we explore next, starting with the baseline-category logistic model.

4 Baseline-Category Logistic Models

For nominal responses, we use the baseline-category logistic model, sometimes referred to as the polytomous logistic model.

Consider the p explanatory variables x_{1} , x_{2} , \ldots , x_{p} and a response variable, Y, with J levels.

As the name of the model implies, we must select one of the levels of the response variable as the baseline level. For this explanation, we consider J to be the baseline level and construct J-1 unique binary logistic equations, shown in Equation 5.

\begin{align} &\log{\left( \frac{P \left( Y=1 \, \vert \, x_{1} , \ldots , x_{p} \right)}{P \left( Y=J \, \vert \, x_{1} , \ldots , x_{p} \right)} \right)} = \beta_{10} + \beta_{11} x_{1} + \ldots + \beta_{1p} x_{p} \\ \\ &\log{\left( \frac{P \left( Y=2 \, \vert \, x_{1} , \ldots , x_{p} \right)}{P \left( Y=J \, \vert \, x_{1} , \ldots , x_{p} \right)} \right)} = \beta_{20} + \beta_{21} x_{1} + \ldots + \beta_{2p} x_{p} \\ &\vdots \\ \\ &\log{\left( \frac{P \left( Y=J-1 \, \vert \, x_{1} , \ldots , x_{p} \right)}{P \left( Y=J \, \vert \, x_{1} , \ldots , x_{p} \right)} \right)} = \beta_{(J-1)0} + \beta_{(J-1)1} x_{1} + \ldots + \beta_{(J-1)p} x_{p} \end{align} \tag{5}

Note that we still have a random component, which now follows a multinomial distribution, a systematic component, X\beta (in matrix form), and the link function, shown in Equation 6.

\log{\left( \frac{P(Y=j)}{P(Y=J)} \right)}, \text{ for } j = 1,2, \ldots , J-1 \tag{6}

The following holds for the baseline-category logistic model, which should be clear from Equation 5.

  • Each logistic equation in the model has its own intercept and set of slopes
  • For a J-level response variable and p explanatory variables, we have (J-1)(p+1) parameters
  • Any of the J levels can serve as the baseline (and is chosen based on the research question)
    • The quality (fit) of the model will not change for different choices of the baseline level, although the parameter values will change
  • The interpretation of parameters for any of the individual logistic equations in the model is the same as for binary logistic models
    • For the first equation in Equation 5, we would have the following
      • \beta_{10} is the intercept
      • \beta_{11} is the slope for the first explanatory variable
      • \vdots
      • \beta_{1p} is the slope for the p^{\text{th}} explanatory variable
  • As before, maximum likelihood estimation is used to obtain parameter estimates

As an example, we consider a study of children attending an emergency center with a fever due to infection. The three-level nominal response variable, Y, is the type of infection and has levels Pneu for bacterial pneumonia, UTI for a urinary tract infection, and Bact for all other infections.

If we consider a single numerical predictor, \texttt{age}, for age in years, and choose Bact as the baseline case, we can write our model as in Equation 7.

\begin{align} &\log{\left( \frac{P( Y = \text{Pneu} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})} \right)} = \beta_{10} + \beta_{11} \text{age} \\ \\ &\log{\left( \frac{P( Y = \text{UTI} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})} \right)} = \beta_{20} + \beta_{21} \text{age} \end{align} \tag{7}

We import the data set below, stored in the comma-separated values file SBI.csv. The response variable, Y, is named \texttt{sbi} for serious bacterial infection.

Code
dfr <- import_dataset("SBI.csv")

We summarize the response variable in a table below.

Code
xtabs(
  ~ sbi,
  data = dfr
)
sbi
         Bact NotApplicable          Pneu           UTI 
           34          1752           251           311 

For the sake of this example, we omit all cases with NotApplicable using the filter function in the dplyr library.

Code
dfr <- dfr %>% 
  filter(sbi != "NotApplicable") # Using the not equal, !=, logical operator

We repeat the summary statistics of the response variable.

Code
xtabs(
  ~ sbi,
  data = dfr
)
sbi
Bact Pneu  UTI 
  34  251  311 

To create a model, we recast the \texttt{sbi} variable in the data set as a factor type and list Bact as the first level to choose it as the baseline level for the model.

Code
dfr$sbi <- factor(dfr$sbi, levels = c("Bact", "Pneu", "UTI"))

We use the vglm function from the VGAM library to create our model and assign it to the variable bact_model. The family argument is set to multinomial(refLevel = 1) to indicate that we are using the multinomial distribution and that the level stated first, Bact, is the reference level.

Code
bact_model <- VGAM::vglm(
  formula = sbi ~ age,
  data = dfr,
  family = VGAM::multinomial(refLevel = 1) # Set Bact (listed first in the factor) as the reference level
)

We must know the order of the levels of the response variable to interpret the summary.

Code
bact_model

Call:
VGAM::vglm(formula = sbi ~ age, family = VGAM::multinomial(refLevel = 1), 
    data = dfr)


Coefficients:
(Intercept):1 (Intercept):2         age:1         age:2 
    2.2709729     3.5459520    -0.1263634    -0.8462970 

Degrees of Freedom: 1192 Total; 1188 Residual
Residual deviance: 929.1852 
Log-likelihood: -464.5926 

This is a multinomial logit model with 3 levels

The 1 will then refer to Pneu and the 2 to UTI in accordance with the order of the levels that we set before.

We write our model in Equation 8.

\begin{align} &\log{\left( \frac{P( Y = \text{Pneu} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})} \right)} = 2.271 - 0.1264 \times \text{age} \\ \\ &\log{\left( \frac{P( Y = \text{UTI} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})} \right)} = 3.546 - 0.8463 \times \text{age} \end{align} \tag{8}

The interpretation of the estimates is summarized below; a short code check follows the list.

  • \widehat{\text{OR}}_{\text{Pneu vs Bact}} = e^{-0.1264} \approx 0.8812623
    • For a child that is \nu +1 years old, the odds that the infection type is pneumonia rather than general bacterial is 0.88 times the odds for a child that is \nu years old
    • Alternatively, for every one year increase in age, there is a 12\% decrease in the odds that an infection is due to pneumonia compared to other bacterial infections
  • \widehat{\text{OR}}_{\text{UTI vs Bact}} = e^{-0.8463} \approx 0.4289993
    • For a child that is \nu +1 years old, the odds that the infection type is a urinary tract infection rather than general bacterial is 0.43 times the odds for a child that is \nu years old
    • Alternatively, for every one year increase in age there is a 57\% decrease in the odds that an infection is due to a urinary tract infection compared to other bacterial infections
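Rather than exponentiating by hand, we can compute the odds ratios directly from the fitted model and, if the installed VGAM version provides a confint method for vglm objects (recent versions do), Wald confidence intervals as well. A minimal sketch is shown below.

Code
exp(coef(bact_model))    # Odds ratios (the intercepts exponentiate to baseline odds)
exp(confint(bact_model)) # 95% confidence intervals for the odds ratios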

We can also compare Pneu to UTI by resetting the level order (or by changing the refLevel argument) and recreating the model. As an alternative, we can use the laws of logarithms as shown in Equation 9.

\begin{align} &\log{\left( \frac{P( Y = \text{Pneu} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})} \right)} - \log{\left( \frac{P( Y = \text{UTI} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})} \right)} \\ \\ = &\log{\left( \frac{\frac{P( Y = \text{Pneu} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})}}{\frac{P( Y = \text{UTI} \, \vert \, \text{age})}{P( Y = \text{Bact} \, \vert \, \text{age})}} \right)} \\ \\ = &\log{\left( \frac{P( Y = \text{Pneu} \, \vert \, \text{age})}{P( Y = \text{UTI} \, \vert \, \text{age})} \right)} \\ \\ = &2.271 - 0.1264 \times \text{age} - \left( 3.546 - 0.8463 \times \text{age} \right) \\ = &-1.275 + 0.7199 \times \text{age} \end{align} \tag{9}

The interpretation is as follows.

  • \widehat{\text{OR}}_{\text{Pneu vs UTI}} = e^{0.7199} \approx 2.054228
    • For a child that is \nu +1 years old, the odds that the infection type is pneumonia rather than a urinary tract infection is 2.05 times the odds for a child that is \nu years old
    • Alternatively, for every one year increase in age there is a 105% increase in the odds that infection is due to pneumonia compared to a urinary tract infection
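To confirm the result in Equation 9, we could refit the model with UTI as the reference level. A sketch, assuming refLevel = 3 selects the third level of the sbi factor (UTI), is shown below.

Code
uti_ref_model <- VGAM::vglm(
  formula = sbi ~ age,
  data = dfr,
  family = VGAM::multinomial(refLevel = 3) # UTI as the reference level
)

coef(uti_ref_model) # The Pneu vs UTI equation should match Equation 9 (about -1.275 + 0.7199 age)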

As with binary logistic models, we can conduct inference for baseline-category logistic models. If we reconsider Equation 7, we can state the null hypothesis of no association between the explanatory variable \texttt{age} (age in years) and the response variable \texttt{sbi} (type of serious bacterial infection) as H_{0}: \; \beta_{11} = \beta_{21}=0. The null hypothesis states that age has no effect on the log odds of pneumonia vs. other bacterial infection or on the log odds of urinary tract infection vs. other bacterial infection.

If we are interested in a specific association between an explanatory variable and the response variable, then we can test the null hypothesis that all of the slopes for that predictor are equal to 0. We test using a likelihood ratio test (LRT), where the reduced model is the intercept-only model, and the full model is as in Equation 7. If we reject the null hypothesis, that is, we find that at least one of the slopes is not equal to 0, then we can assess the individual slopes with either Wald tests or with confidence intervals (CIs).

Below then, we start with the reduced model (intercept only) and assign it to the variable bact_model_red. Note the use of 1 in the formula argument of the vglm function.

Code
bact_model_red <- VGAM::vglm(
  sbi ~ 1,
  data = dfr,
  family = VGAM::multinomial(refLevel = 1)
)

The lrtest_vglm function performs the LRT. In this case we have H_{0}: \; \beta_{11} = \beta_{21} = 0. We pass the full model and then the reduced model as arguments.

Code
VGAM::lrtest_vglm(
  bact_model, # Full model
  bact_model_red # Reduced (nested) model
)
Likelihood ratio test

Model 1: sbi ~ age
Model 2: sbi ~ 1
   #Df  LogLik Df  Chisq Pr(>Chisq)    
1 1188 -464.59                         
2 1190 -516.72  2 104.26  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see a G^{2} test statistic of 104.26, where G^{2} \sim \chi^{2}_{\text{df}} with \text{df} = 2. This is a very large test statistic for 2 degrees of freedom, with a p value that approaches 0, so we reject the null hypothesis at the 5% significance level. There is evidence in the data that at least one of the slopes is not equal to 0; in other words, there is an association between age and the type of infection.

Since we rejected the null hypothesis, we can review the summary of the full model using the summaryvglm function from the VGAM library.

Code
VGAM::summaryvglm(bact_model)

Call:
VGAM::vglm(formula = sbi ~ age, family = VGAM::multinomial(refLevel = 1), 
    data = dfr)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1   2.2710     0.3472   6.541 6.12e-11 ***
(Intercept):2   3.5460     0.3440  10.309  < 2e-16 ***
age:1          -0.1264     0.1315  -0.961    0.337    
age:2          -0.8463     0.1421  -5.954 2.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 929.1852 on 1188 degrees of freedom

Log-likelihood: -464.5926 on 1188 degrees of freedom

Number of Fisher scoring iterations: 6 

No Hauck-Donner effect found in any of the estimates


Reference group is level  1  of the response

For H_{0}: \; \beta_{11}=0, we see the results of the Wald test with a Z statistic of -0.961 and a p value of 0.337. We fail to reject the null hypothesis at the 5% significance level; there is no evidence in the data of an association between age and whether the type of serious bacterial infection is pneumonia compared to other infections.

For H_{0}: \; \beta_{21}=0, we see the results of the Wald test with a Z statistic of -5.954 and a p value of < 0.001. We reject the null hypothesis at the 5% significance level and conclude that there is an association between age and whether the type of serious bacterial infection is a urinary tract infection compared to other infections.

We can also calculate a probability that Y=j for any of the J levels of the response variable. This is shown in Equation 10, where the parameters are all set to 0 for baseline level J.

P \left( Y = j \, \vert \, x_{1} , x_{2} , \ldots , x_{p} \right) = \frac{e^{\beta_{j0} + \beta_{j1} x_{1} + \ldots + \beta_{jp} x_{p}}}{\sum_{h=1}^{J} e^{\beta_{h0} + \beta_{h1} x_{1} + \ldots + \beta_{hp} x_{p}}} \tag{10}

The three estimated probabilities for our example problem are shown in Equation 11.

\begin{align} &P \left( Y = \text{Pneu} \, \vert \, \text{age} \right) = \frac{e^{2.2710 - 0.1264 \times \text{age}}}{e^{0}+e^{2.2710 - 0.1264 \times \text{age}} + e^{3.5460 - 0.8463 \times \text{age}}} \\ \\ &P \left( Y = \text{UTI} \, \vert \, \text{age} \right) = \frac{e^{3.5460 - 0.8463 \times \text{age}}}{e^{0}+e^{2.2710 - 0.1264 \times \text{age}} + e^{3.5460 - 0.8463 \times \text{age}}} \\ \\ &P \left( Y = \text{Bact} \, \vert \, \text{age} \right) = \frac{e^{0}}{e^{0}+e^{2.2710 - 0.1264 \times \text{age}} + e^{3.5460 - 0.8463 \times \text{age}}} \end{align} \tag{11}

Since we have a numerical explanatory variable, we can plot all three estimated probabilities as line plots.

Code
age <- seq(0, 5, length = 200)
denominator <- 1 + exp(2.2710-0.1264*age) + exp(3.5460-0.8463*age)
pneu <- exp(2.2710-0.1264*age) / denominator
uti <- exp(3.5460-0.8463*age) / denominator
bact <- 1 / denominator

plot(
  age,
  pneu,
  type = "l",
  col = "deepskyblue",
  main = "Model of serious bacterial infection",
  xlab = "Age [years]",
  ylab = "Probability",
  las = 1,
  lty = 1,
  ylim = c(0, 1),
  lwd = 2,
  panel.first = grid()
)

lines(
  age,
  uti,
  col = "orange",
  lty = 2
)

lines(
  age,
  bact,
  col = "black",
  lty = 3,
  lwd = 2
)

legend(
  0.1,
  1,
  legend = c("Pneumonia", "UTI", "Other"),
  col = c("deepskyblue", "orange", "black"),
  lty = c(1, 2, 3)
)

For any age, the probabilities add to 1.
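Instead of coding Equation 11 by hand, we can also ask the fitted model for the same probabilities. A minimal sketch using the predict method for vglm objects (with type = "response") is shown below; each row should sum to 1.

Code
predict(
  bact_model,
  newdata = data.frame(age = c(0, 2.5, 5)),
  type = "response"
)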

Next, we explore modeling for ordinal response variables.

5 Cumulative Logistic Model Types

We are still considering response variables with J levels, but now the levels are ordered. The benefit of models for ordinal response variables is that they are simpler, and we have greater power to detect effects.

For models with an ordinal response variable, we consider the cumulative probability of Y, that is, the probability that Y falls at or below a particular level. For response level j, we write the cumulative probability as in Equation 12, for \pi_{1} , \pi_{2} , \ldots , \pi_{j}, where, since the levels are mutually exclusive (joined by a logical or), we can add the probabilities.

\begin{align} &P \left( Y \le j \right) = \sum_{i=1}^{j} \pi_{i} \\ &P(Y=1) + P(Y=2) + \ldots + P(Y=j) = \pi_{1} + \pi_{2} + \ldots + \pi_{j} \end{align} \tag{12}

The cumulative probabilities can be expressed as cumulative odds, shown in Equation 13.

\frac{P ( Y \le j)}{1 - P (Y \le j)} = \frac{P ( Y \le j )}{ P ( Y > j)} \tag{13}

The cumulative log odds (also known as cumulative logits) are shown in Equation 14.

\log{ \left( \frac{P ( Y \le j )}{ P ( Y > j)} \right)} = \log{\left( \frac{\sum_{i=1}^{j} \pi_{i}}{\sum_{i=j+1}^{J} \pi_{i}} \right)} \tag{14}

For a J-level ordinal response variable, we will have J-1 log odds functions.

As an example we can consider a five-point Likert scale, commonly used for survey questions. The levels of Y are listed below.

  • Strongly disagree, 1
  • Disagree, 2
  • Neither agree nor disagree, 3
  • Agree, 4
  • Strongly agree, 5

We can therefore look at the following.

  • 1 vs. 2 or 3 or 4 or 5, \text{logit} \left( P \left( Y \le 1 \right) \right)
  • 1 or 2 vs. 3 or 4 or 5, \text{logit} \left( P \left( Y \le 2 \right) \right)
  • 1 or 2 or 3 vs. 4 or 5, \text{logit} \left( P \left( Y \le 3 \right) \right)
  • 1 or 2 or 3 or 4 vs. 5, \text{logit} \left( P \left( Y \le 4 \right) \right)

Notice that it is not interesting to look at the log odds of P (Y \le 5) in this case, as the cumulative probability is 1.
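As a small numerical sketch (with made-up probabilities, for illustration only), we can compute the four cumulative logits from a vector of level probabilities using cumsum and the logit quantile function qlogis.

Code
pi_likert <- c(0.10, 0.20, 0.30, 0.25, 0.15) # Hypothetical Likert probabilities

# logit(P(Y <= j)) for j = 1, 2, 3, 4
qlogis(cumsum(pi_likert)[1:4])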

As an illustrative example, we consider a study comparing a new drug for arthritis against a placebo. Participants are asked to rate the improvement in their symptoms after one month of treatment. The levels of the explanatory variable, \texttt{trt}, are Active for the active drug, and Placebo for the placebo. The levels for the response variable, \texttt{improvement}, are alot for a lot of improvement, some for some improvement, and none for no improvement. The arth_data.csv file contains the data and is imported below, assigned to the variable arth.

Code
arth <- import_dataset("arth_data.csv")

The summary statistics are shown below.

Code
xtabs(
  ~ trt + improvement,
  data = arth
)[2:1, ]
         improvement
trt        1  2  3
  Placebo  7  7 29
  Active  21  7 13

We encode Y as shown in Equation 15.

Y = \begin{cases} 1 \text{ if a lot} \\ 2 \text{ if some} \\ 3 \text{ if none} \end{cases} \tag{15}

The three probabilities are P \left( Y = i \, \vert \, x_{1} \right) = \pi_{i} for i = 1,2,3. The cumulative logits are shown in Equation 16.

\begin{align} &\text{logit} \left[ P \left( Y \le 1 \, \vert \, x_{1}\right) \right] = \log{\left( \frac{P \left( Y \le 1 \, \vert \, x_{1} \right)}{P \left( Y > 1 \, \vert \, x_{1} \right)} \right)} = \log{\left( \frac{\pi_{1}}{\pi_{2} + \pi_{3}} \right)} \\ &\text{(the log odds for improvement being a lot vs. some or none)} \\ \\ &\text{logit} \left[ P \left( Y \le 2 \, \vert \, x_{1}\right) \right] = \log{\left( \frac{P \left( Y \le 2 \, \vert \, x_{1} \right)}{P \left( Y > 2 \, \vert \, x_{1} \right)} \right)} = \log{\left( \frac{\pi_{1} + \pi_{2}}{\pi_{3}} \right)} \\ &\text{(the log odds for improvement being a lot or some vs. none)} \end{align} \tag{16}
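Before fitting any model, we can compute the observed (empirical) cumulative logits for each treatment group directly from the table. A sketch follows; with a single binary predictor, these observed values should match the saturated cumulative logistic model fitted later in this section.

Code
tab <- xtabs(~ trt + improvement, data = arth)
props <- prop.table(tab, margin = 1) # Row-wise proportions

# Observed logit(P(Y <= 1)) and logit(P(Y <= 2)) per treatment group
qlogis(t(apply(props, 1, cumsum))[, 1:2])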

One generalized model that we can use for this illustrative example is the cumulative logistic model. If we have p explanatory variables, we can write the cumulative logistic model as in Equation 17.

\begin{align} &\log\left(\frac{P(Y \le 1 |x_1, \ldots, x_p)}{P(Y > 1 |x_1, \ldots, x_p)}\right) = \beta_{10} + \beta_{11}x_1 + \ldots + \beta_{1p}x_p \\ \\ &\log\left(\frac{P(Y \le 2 |x_1, \ldots, x_p)}{P(Y > 2 |x_1, \ldots, x_p)}\right) = \beta_{20} + \beta_{21}x_1 + \ldots + \beta_{2p}x_p \\ &\vdots \\ &\log\left(\frac{P(Y \le J-1 |x_1, \ldots, x_p)}{P(Y > J - 1 |x_1, \ldots, x_p)}\right) = \beta_{(J-1)0} + \beta_{(J-1)1}x_1 + \ldots + \beta_{(J-1)p}x_{p} \end{align} \tag{17}

Notice that each equation depends on all of the response levels from 1 through J. This contrasts with the baseline-category logistic model for nominal responses, where each equation only depended on two response variable levels (one always being the baseline or reference level).

For equation j in Equation 17, each slope is the log odds ratio for a response in levels 1 or 2 or 3 or \ldots or j vs. a response in levels j+1 or j+2 or \ldots or J.

A greatly simplified model is the cumulative logistic model with the proportional odds assumption, usually referred to as the proportional odds model.

We make the assumption that the slopes for a specific predictor are the same across the J-1 logistic equations, shown in Equation 18.

\begin{align} &\log\left(\frac{P(Y \le 1 \, \vert \, x_1, \ldots, x_p)}{P(Y > 1 \, \vert \, x_1, \ldots, x_p)}\right) = \beta_{10} + \beta_{1}x_1 + \ldots + \beta_{p}x_p \\ \\ &\log\left(\frac{P(Y \le 2 \, \vert \, x_1, \ldots, x_p)}{P(Y > 2 \, \vert \, x_1, \ldots, x_p)}\right) = \beta_{20} + \beta_{1}x_1 + \ldots + \beta_{p}x_p \\ &\vdots \\ &\log\left(\frac{P(Y \le J-1 \, \vert \, x_1, \ldots, x_p)}{P(Y > J - 1 \, \vert \, x_1, \ldots, x_p)}\right) = \beta_{(J-1)0} + \beta_{1}x_1 + \ldots + \beta_{p}x_p \\ \end{align} \tag{18}

The number of parameters is reduced from (J-1)(p+1) to p+J-1. For the arthritis example that follows (J = 3 and p = 1), this is a reduction from 4 to 3 parameters.

For our illustrative example, examining a new drug for arthritis symptoms, we have the two logit equations, shown in Equation 19.

\begin{align} &\log{\left( \frac{P \left( Y \le 1 \, \vert \, x_{1} \right)}{P \left( Y > 1 \, \vert \, x_{1} \right)} \right)} = \beta_{10} + \beta_{1} x_{1} \\ \\ &\log{\left( \frac{P \left( Y \le 2 \, \vert \, x_{1} \right)}{P \left( Y > 2 \, \vert \, x_{1} \right)} \right)} = \beta_{20} + \beta_{1} x_{1} \end{align} \tag{19}

In Equation 20 we interpret the slope of the first equation, and in Equation 21, the second equation. In both Equation 20 and Equation 21 the verbose explanation of \beta_{1} is key.

\begin{align}&\log{\left( \frac{P \left( Y \le 1 \, \vert \, x_{1} = 1 \right)}{P \left( Y > 1 \, \vert \, x_{1} = 1 \right)} \right)} - \log{\left( \frac{P \left( Y \le 1 \, \vert \, x_{1} = 0 \right)}{P \left( Y > 1 \, \vert \, x_{1} = 0 \right)} \right)} = \beta_{10} + \beta_{1} - \beta_{10} \\ \\ &\log{\left( \frac{\frac{P \left( Y \le 1 \, \vert \, x_{1} = 1 \right)}{P \left( Y > 1 \, \vert \, x_{1} = 1 \right)}}{\frac{P \left( Y \le 1 \, \vert \, x_{1} = 0 \right)}{P \left( Y > 1 \, \vert \, x_{1} = 0 \right)}} \right)} = \beta_{1} = \log{\left( \frac{\text{odds } Y \le 1 \, \vert \, \text{Active}}{\text{odds } Y \le 1 \, \vert \, \text{Placebo}} \right)} \end{align} \tag{20}

\begin{align}&\log{\left( \frac{P \left( Y \le 2 \, \vert \, x_{1} = 1 \right)}{P \left( Y > 2 \, \vert \, x_{1} = 1 \right)} \right)} - \log{\left( \frac{P \left( Y \le 2 \, \vert \, x_{1} = 0 \right)}{P \left( Y > 2 \, \vert \, x_{1} = 0 \right)} \right)} = \beta_{20} + \beta_{1} - \beta_{20} \\ \\ &\log{\left( \frac{\frac{P \left( Y \le 2 \, \vert \, x_{1} = 1 \right)}{P \left( Y > 2 \, \vert \, x_{1} = 1 \right)}}{\frac{P \left( Y \le 2 \, \vert \, x_{1} = 0 \right)}{P \left( Y > 2 \, \vert \, x_{1} = 0 \right)}} \right)} = \beta_{1} = \log{\left( \frac{\text{odds } Y \le 2 \, \vert \, \text{Active}}{\text{odds } Y \le 2 \, \vert \, \text{Placebo}} \right)} \end{align} \tag{21}

How do we reconcile the fact that we have two log odds ratio interpretations for \beta_{1}? We can express Equation 20 and Equation 21 as listed below.

  • \beta_{1} = \log{\text{OR}} for improvement being a lot vs. some or none comparing those taking active treatment to those taking a placebo
  • \beta_{1} = \log{\text{OR}} for improvement being a lot or some vs. none comparing those taking active treatment to those taking a placebo

The proportional odds assumption means that we assume the same effect of treatment no matter how we form groups with the response variable levels, as long as we group low levels together vs. high levels together (that is, as long as we respect the order). We therefore have a simplified interpretation of \beta_{1} as the log odds ratio for improvement being more vs. less, comparing those taking the active treatment to those taking the placebo. In general, any slope from a proportional odds model is the log odds ratio for lower levels of Y vs. higher levels of Y comparing … (the rest of the sentence depending on the type of explanatory variable).

Below, we use the data from our illustrative example to build a proportional odds model. We use the mutate function in the dplyr library to add new variables and set the factor levels and order. See the code comment for an explanation of the data transformation.

Code
arth <- 
  arth %>% 
  mutate(improvement = factor(improvement), # Convert to a factor
         improvement_fac = case_when(improvement == 1 ~ "alot",
                                     improvement == 2 ~ "some",
                                     improvement == 3 ~ "none"), # Add a column encoding words for the values 1, 2, 3
         improvement_fac = factor(improvement_fac, levels = c("alot", "some", "none")), # Convert to a factor
         improvement_ord = ordered(improvement_fac, levels = c("alot", "some", "none")), # Add an ordered factor
         trt = factor(trt, levels = c("Placebo", "Active"))) # Convert to a factor

We create a proportional odds model below, assigned to the variable po_model. We set the family argument to cumulative, with parallel = TRUE indicating that we want to use the proportional odds assumption.

Code
po_model <- VGAM::vglm(
  formula = improvement_ord ~ trt,
  data = arth,
  family = VGAM::cumulative(parallel = TRUE) # Use PO assumption
)

VGAM::summaryvglm(po_model)

Call:
VGAM::vglm(formula = improvement_ord ~ trt, family = VGAM::cumulative(parallel = TRUE), 
    data = arth)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  -1.5501     0.3579  -4.331 1.48e-05 ***
(Intercept):2  -0.7501     0.3233  -2.320 0.020349 *  
trtActive       1.5679     0.4442   3.529 0.000416 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2])

Residual deviance: 156.606 on 165 degrees of freedom

Log-likelihood: -78.303 on 165 degrees of freedom

Number of Fisher scoring iterations: 4 

No Hauck-Donner effect found in any of the estimates


Exponentiated coefficients:
trtActive 
 4.796405 

The two equations for the model are written in Equation 22.

\begin{align} &\log{\left( \frac{\hat{P} \left( Y \le 1 \, \vert \, x_{1} \right)}{\hat{P} \left( Y > 1 \, \vert \, x_{1} \right)} \right)} = -1.5501 + 1.5679 x_{1} \\ \\ &\log{\left( \frac{\hat{P} \left( Y \le 2 \, \vert \, x_{1} \right)}{\hat{P} \left( Y > 2 \, \vert \, x_{1} \right)} \right)} = -0.7501 + 1.5679 x_{1} \end{align} \tag{22}

Our interpretation of the model is listed below.

  • \hat{\beta}_{1} = 1.5679 is the log odds ratio for having more improvement (a lot vs. some or none, or a lot or some vs. none) comparing those taking the active treatment to those taking a placebo
  • \widehat{\text{OR}} = e^{1.5679} \approx 4.8, that is, the odds of having more improvement vs. less improvement for those on the active treatment are 4.8 times the odds for those taking the placebo
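The odds ratio is also printed as an exponentiated coefficient in the summary above. If the installed VGAM version provides a confint method for vglm objects (recent versions do), a 95% CI for the odds ratio can be obtained as sketched below.

Code
exp(coef(po_model)["trtActive"])      # Odds ratio, about 4.8
exp(confint(po_model)["trtActive", ]) # 95% CI for the odds ratio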

Having interpreted the model, we can conduct hypothesis testing. The null hypothesis is H_{0} : \; \beta_{1} = 0. Without the proportional odds assumption, we would need to test multiple slopes (each slope corresponding to the treatment variable in each of the (J-1) equations). The proportional odds assumption gives us more power to detect an association if one exists. However, we should always check if the proportional odds assumption is reasonable!

One method for evaluating the assumption is Brant’s test. The test compares the proportional odds model to one that fits (J - 1) separate binary logistic models, allowing for different slopes in each model. The null hypothesis is that the proportional odds assumption holds.

It has been suggested that Brant’s test is too sensitive, especially in the case of large sample sizes, which are commonplace today. Other suggestions include the following.

  • Construct empirical logit plots for each of the (J - 1) cumulative logit equations on the same set of axes; if the lines are approximately parallel, then the proportional odds assumption is approximately satisfied (this method can become complicated when there are many predictor variables)
  • Look at the estimates from a cumulative logit model without the proportional odds assumption and if the slopes for a predictor are similar across the (J - 1) equations, then the proportional odds assumption may be reasonable

If the proportional odds assumption appears to be badly violated, then we could consider the following.

  • May make sense to use the cumulative logistic model without the proportional odds assumption
  • Use the baseline-category logistic model, treating the response as nominal
  • It is even possible to use the proportional odds assumption for some predictors and not others

We conduct Brant’s test below, using the polr function from the MASS library to create the model and then the brant function from the brant library.

Code
polrm <- MASS::polr(improvement_ord ~ trt, data = arth, Hess = TRUE)

brant(polrm)
-------------------------------------------- 
Test for    X2  df  probability 
-------------------------------------------- 
Omnibus     0.21    1   0.64
trtActive   0.21    1   0.64
-------------------------------------------- 

H0: Parallel Regression Assumption holds

Using Brant’s test, we fail to reject the null hypothesis that the proportional odds assumption holds in this case.

Below, we also fit a model without the proportional odds assumption.

Code
po_model_no_assumption <- VGAM::vglm(
  formula = improvement_ord ~ trt,
  data = arth,
  family = VGAM::cumulative(parallel = FALSE) # No proportional odds assumption
)

VGAM::summaryvglm(po_model_no_assumption)

Call:
VGAM::vglm(formula = improvement_ord ~ trt, family = VGAM::cumulative(parallel = FALSE), 
    data = arth)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  -1.6376     0.4131  -3.964 7.36e-05 ***
(Intercept):2  -0.7282     0.3254  -2.238  0.02524 *  
trtActive:1     1.6864     0.5179   3.256  0.00113 ** 
trtActive:2     1.4955     0.4675   3.199  0.00138 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2])

Residual deviance: 156.3861 on 164 degrees of freedom

Log-likelihood: -78.1931 on 164 degrees of freedom

Number of Fisher scoring iterations: 4 

No Hauck-Donner effect found in any of the estimates


Exponentiated coefficients:
trtActive:1 trtActive:2 
   5.400000    4.461538 

For the sake of completeness, we note the two different values for \hat{\beta}_{1}, namely 1.6864 and 1.4955. These are similar, which supports the proportional odds assumption.

We can generate probabilities from the proportional odds model. For the explanatory variables x_{1} , x_{2} , \ldots , x_{p}, we have Equation 23.

P \left(Y \le j \, \vert \, x_{i} \right) = \frac{e^{\beta_{j0} + \beta_{1}x_{1} + \ldots + \beta_{p} x_{p}}}{1 + e^{\beta_{j0} + \beta_{1}x_{1} + \ldots + \beta_{p} x_{p}}}, \text{ where } j = 1,2, \ldots , J-1 \tag{23}

Note that for the highest level, J, we have that P \left( Y \le J \, \vert \, x_{i} \right) = 1. Probabilities for different levels make use of the fact shown in Equation 24, where we also show the equation for j=2 for our illustrative example.

\begin{align} &P \left( Y = j \, \vert \, x_{i} \right) = P \left( Y \le j \, \vert \, x_{i} \right) - P \left( Y \le j-1 \, \vert \, x_{i} \right) \\ &P \left( Y = 2 \, \vert \, x_{i} \right) = P \left( Y \le 2 \, \vert \, x_{i} \right) - P \left( Y \le 1 \, \vert \, x_{i} \right) \end{align} \tag{24}

In Equation 25, we then use the fitted proportional odds model for our arthritis example to find the three predicted probabilities for someone on the active treatment having a lot of improvement, j=1, some improvement, j=2, and no improvement, j=3.

\begin{align} &\hat{P} \left( Y = 1 \, \vert \, x_{1} = 1 \right) = \hat{P} \left( Y \le 1 \, \vert \, x_{1} = 1 \right) \\ &\hat{P} \left( Y = 1 \, \vert \, x_{1} = 1 \right) = \frac{e^{-1.5501 + 1.5679 (1)}}{1+ e^{-1.5501 + 1.5679 (1)}} \\ &\hat{P} \left( Y = 1 \, \vert \, x_{1} = 1 \right) \approx 0.50 \\ \\ &\hat{P} \left( Y = 2 \, \vert \, x_{1} = 1 \right) = \hat{P} \left( Y \le 2 \, \vert \, x_{1} = 1 \right) - \hat{P} \left( Y \le 1 \, \vert \, x_{1} = 1 \right) \\ &\hat{P} \left( Y = 2 \, \vert \, x_{1} = 1 \right) = \frac{e^{-0.7501 + 1.5679 (1)}}{1+ e^{-0.7501 + 1.5679 (1)}} - 0.50 \\ &\hat{P} \left( Y = 2 \, \vert \, x_{1} = 1 \right) \approx 0.69 - 0.50 = 0.19 \\ \\ &\hat{P} \left( Y = 3 \, \vert \, x_{1} = 1 \right) = 1 - \hat{P} \left( Y \le 2 \, \vert \, x_{1} = 1 \right) \\ &\hat{P} \left( Y = 3 \, \vert \, x_{1} = 1 \right) \approx 1-0.69 = 0.31 \end{align} \tag{25}
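The same three probabilities (and those for the placebo group) can also be obtained from the fitted model. A minimal sketch using predict with type = "response" follows; each row should sum to 1.

Code
predict(
  po_model,
  newdata = data.frame(trt = factor(c("Placebo", "Active"),
                                    levels = c("Placebo", "Active"))),
  type = "response"
)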

6 Conclusion

In this chapter, we used the baseline-category logistic model in the case of a multilevel nominal response variable. We used a likelihood ratio test, comparing a reduced model (omitting the variable of interest) and a full model. If a significant result is found (at least one of the slopes in the J-1 equations is not equal to 0), we consider the individual slopes using Wald tests.

We also considered the cumulative logistic model with the proportional odds assumption for ordinal responses. The proportional odds assumption is often used for its simplicity when interpreting the results and its increased power to detect effects.

7 Lab Problems

7.1 Question

Consider a nominal response variable Y with levels 1, 2, and 3 and a single explanatory variable x. If we let level 3 of Y be the baseline category, then the corresponding baseline-category logistic model with Y as the response and x as the explanatory variable is as shown in Equation 26 below.

\begin{align} &\log{\left( \frac{P \left( Y = 1 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} \right)} = \beta_{10} + \beta_{11} x \\ &\log{\left( \frac{P \left( Y = 2 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} \right)} = \beta_{20} + \beta_{21} x \end{align} \tag{26}

The probability P \left( Y = j \, \vert \, x \right) = \pi_{j} is given as in Equation 10 above. The coefficients for the baseline level are all set to 0. Show that this is so.

We start by considering P \left( Y=1 \, \vert \, x \right) and P \left( Y=2 \, \vert \, x \right) in Equation 27 below.

\begin{align} &\log{\left( \frac{P \left( Y = 1 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} \right)} = \beta_{10} + \beta_{11} x \\ &e^{\log{\left( \frac{P \left( Y = 1 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} \right)}} = e^{\beta_{10} + \beta_{11} x} \\ &\frac{P \left( Y = 1 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} = e^{\beta_{10} + \beta_{11} x} \\ &P \left( Y=1 \, \vert \, x \right) = P \left( Y = 3 \, \vert \, x \right) e^{\beta_{10} + \beta_{11} x} \\ \\ &\log{\left( \frac{P \left( Y = 2 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} \right)} = \beta_{20} + \beta_{21} x \\ &e^{\log{\left( \frac{P \left( Y = 2 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} \right)}} = e^{\beta_{20} + \beta_{21} x} \\ &\frac{P \left( Y = 2 \, \vert \, x \right)}{P \left( Y = 3 \, \vert \, x \right)} = e^{\beta_{20} + \beta_{21} x} \\ &P \left( Y=2 \, \vert \, x \right) = P \left( Y = 3 \, \vert \, x \right) e^{\beta_{20} + \beta_{21} x} \end{align} \tag{27}

We know that \pi_{1} + \pi_{2} + \pi_{3} = 1. The derivation in Equation 28 follows.

\begin{align} &P \left( Y=1 \, \vert \, x \right) + P \left( Y=2 \, \vert \, x \right) + P \left( Y=3 \, \vert \, x \right) = 1 \\ &P \left( Y=3 \, \vert \, x \right) e^{\beta_{10} + \beta_{11} x} + P \left( Y=3 \, \vert \, x \right) e^{\beta_{20} + \beta_{21} x} + P \left( Y=3 \, \vert \, x \right) = 1 \\ &P \left( Y=3 \, \vert \, x \right) \left[ e^{\beta_{10}+ \beta_{11}x} + e^{\beta_{20} + \beta_{21}x} + 1 \right] = 1 \\ &P \left( Y=3 \, \vert \, x \right) = \frac{1}{1 + e^{\beta_{10}+ \beta_{11}x} + e^{\beta_{20} + \beta_{21}x} } \end{align} \tag{28}

The results in Equation 27 and Equation 28 agree with Equation 10 only if the coefficients for the baseline level, J=3, are all set to 0, so that e^{\beta_{30} + \beta_{31} x} = e^{0} = 1. For this particular case we have Equation 29 below.

P \left( Y = j \, \vert \, x \right) = \frac{e^{\beta_{j0} + \beta_{j1} x}}{\sum_{h=1}^{J} e^{\beta_{h0} + \beta_{h1} x}} \tag{29}

7.2 Question

A subset of data taken from a study to assess factors associated with women’s attitudes, knowledge, and behavior towards mammograms is given in the file mammography.csv. At present, we are interested in investigating the association between whether a woman has had a mammography experience, \texttt{ME}, history of breast cancer in the family, \texttt{HIST}, and perceived benefit of mammography, \texttt{PB}. The values and levels are described in Table 1.

Table 1: Problem 2
Variable name Description Levels or Units of Measurement
\texttt {ME} Mammography experience 0 = never, 1 = within one year, 2 = over one year ago
\texttt {HIST} Family history 0 = No, 1 = Yes
\texttt {PB} Perceived benefit A value in the range of 5 – 20 \, ^{*}
  ^{*} Subjects with low values perceive the greatest benefit of mammography

Answer the following questions.

  1. Consider only the association between mammography experience and family history. Use R to construct a 2 \times 3 table that cross classifies these two variables. Test the null hypothesis of no association between mammography experience and family history using a 5% significance level.

  2. Use \texttt{ME} = 0 as the baseline category to compute the sample odds ratio (OR) for having a mammogram within one year vs. never having one comparing those women with a family history to those with no family history.

  3. Using \texttt{ME} = 0 as the baseline category, compute the sample odds ratio (OR) for having a mammogram over a year ago vs. never having one comparing those women with a family history to those with no family history.

  4. Using \texttt{ME} = 0 as the baseline category, fit a baseline-category logistic model in R with \texttt{ME} as the response and \texttt{HIST} as the explanatory variable. Print the estimated coefficient table and write out the fitted model.

  5. Take the antilogs of the slopes from the model that you fit. Interpret these values. How do they compare with the odds ratios computed in QUESTION 2 and QUESTION 3?

  6. Use the logistic model that you fit in QUESTION 4 to test the same null hypothesis that you tested in QUESTION 1. Use a 5% significance level.

  7. Add perceived benefit of mammography, \texttt{PB}, to the model. Adjusting for family history, is perceived benefit of mammography associated with mammography experience? Conduct an appropriate test at the 5% significance level. If it is associated, describe the association.

  8. Based on the model that you fit in QUESTION 7, what is the predicted probability of having a mammogram within one year for a woman who has a family history of breast cancer and has a perceived benefit of mammography score of 10?

Code
mamm <- import_dataset("mammography.csv")

Below is a 2 \times 3 cross tabulation of family history and mammography experience.

Code
mamm_fam_table <- table(
  Fam_hist = mamm$HIST,
  Mam_exper = mamm$ME
)

mamm_fam_table
        Mam_exper
Fam_hist   0   1   2
       0 220  85  63
       1  14  19  11
Code
chisq.test(mamm_fam_table, correct = FALSE)

    Pearson's Chi-squared test

data:  mamm_fam_table
X-squared = 13.05, df = 2, p-value = 0.001466

The chi-square test yields a chi-square statistic of 13.05 and a p value of 0.0015, which is less than the significance level of 0.05. Therefore, we can conclude that there is an association between family history and mammography experience.

Code
(220 * 19) / (85 * 14) # Sample OR: within one year (ME = 1) vs never (ME = 0), family history vs none
[1] 3.512605
Code
(220 * 11) / (63 * 14) # Sample OR: over one year ago (ME = 2) vs never (ME = 0), family history vs none
[1] 2.743764
Code
mamm$HIST <- factor(mamm$HIST, levels = c(0, 1))
mamm$ME <- factor(mamm$ME, levels = c(0, 1, 2))

# Model
lab_2_4_model <- VGAM::vglm(
  ME ~ HIST,
  data = mamm,
  family = multinomial(refLevel = 1)
)

# Summary of model
summary(lab_2_4_model)

Call:
VGAM::vglm(formula = ME ~ HIST, family = multinomial(refLevel = 1), 
    data = mamm)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  -0.9510     0.1277  -7.446  9.6e-14 ***
(Intercept):2  -1.2505     0.1429  -8.751  < 2e-16 ***
HIST1:1         1.2564     0.3747   3.353 0.000798 ***
HIST1:2         1.0093     0.4275   2.361 0.018225 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 792.3399 on 820 degrees of freedom

Log-likelihood: -396.17 on 820 degrees of freedom

Number of Fisher scoring iterations: 4 

No Hauck-Donner effect found in any of the estimates


Reference group is level  1  of the response

The model is shown in Equation 30 below.

\begin{align} &\log{\left( \frac{\hat{P} \left( \text{ME} = 1 \, \vert \, \text{HIST} \right)}{\hat{P} \left( \text{ME} = 0 \, \vert \, \text{HIST} \right)} \right)} = -0.9510 + 1.2564 \left( \text{HIST} \right) \\ &\log{\left( \frac{\hat{P} \left( \text{ME} = 2 \, \vert \, \text{HIST} \right)}{\hat{P} \left( \text{ME} = 0 \, \vert \, \text{HIST} \right)} \right)} = -1.2505 + 1.0093 \left( \text{HIST} \right) \end{align} \tag{30}

Code
exp(1.2564)
[1] 3.512753
Code
exp(1.0093)
[1] 2.74368

These odds ratios are, up to rounding, the same as the sample odds ratios computed directly from the table. In the case of the first, we state that the odds of having a mammogram within one year compared to never having a mammogram among those with a family history of breast cancer are 3.51 times the odds for those with no family history of breast cancer.

In the case of the second, we state that the odds of having a mammogram more than one year ago compared to never having a mammogram among those with family history of breast cancer are 2.74 times the odds for those with no family history of breast cancer.

We can use a likelihood ratio test (LRT) to test this association. In the LRT, we compare the log likelihood from the model that includes \texttt{HIST} as a predictor to a model that does not include \texttt{HIST} as a predictor (i.e., the intercept-only model).

Code
# Intercept only model
lab_2_6_model <- VGAM::vglm(
  ME ~ 1,
  data = mamm,
  family = VGAM::multinomial(refLevel = 1)
)

# LRT test
VGAM::lrtest(
  lab_2_4_model, # Full model
  lab_2_6_model # Intercept only model (nested model)
)
Likelihood ratio test

Model 1: ME ~ HIST
Model 2: ME ~ 1
  #Df  LogLik Df  Chisq Pr(>Chisq)   
1 820 -396.17                        
2 822 -402.60  2 12.858   0.001614 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We reject the null hypothesis of no association between mammography experience and family history. The data provide evidence of an association between mammography experience and family history.

Code
# Model
lab_2_7_model <- VGAM::vglm(
  ME ~ HIST + PB,
  data = mamm,
  family = multinomial(refLevel = 1)
)

#Summary of model
VGAM::summaryvglm(lab_2_7_model)

Call:
VGAM::vglm(formula = ME ~ HIST + PB, family = multinomial(refLevel = 1), 
    data = mamm)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  1.60050    0.49177   3.255  0.00114 ** 
(Intercept):2  0.30874    0.52830   0.584  0.55895    
HIST1:1        1.20540    0.39027   3.089  0.00201 ** 
HIST1:2        0.97664    0.43345   2.253  0.02425 *  
PB:1          -0.34706    0.06730  -5.157 2.51e-07 ***
PB:2          -0.20445    0.06878  -2.973  0.00295 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 759.0134 on 818 degrees of freedom

Log-likelihood: -379.5067 on 818 degrees of freedom

Number of Fisher scoring iterations: 4 

No Hauck-Donner effect found in any of the estimates


Reference group is level  1  of the response
Code
# LRT
VGAM::lrtest_vglm(
  lab_2_7_model, # Full model (perceived benefit and family history)
  lab_2_4_model # Nested model (family history only)
)
Likelihood ratio test

Model 1: ME ~ HIST + PB
Model 2: ME ~ HIST
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1 818 -379.51                         
2 820 -396.17  2 33.327  5.797e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We reject the null hypothesis of no association between mammography experience and perceived benefit of mammography, adjusting for family history. The data provide evidence of an association between mammography experience and perceived benefit of mammography, adjusting for family history.

Taking the antilog of the slope for the \texttt{PB} predictor in the first baseline-category logistic equation, we have e^{-0.34706} = 0.71. This means that for each 1-point increase in perceived benefit, the odds of having a mammogram within one year compared to never having a mammogram are reduced by 29% (1 - 0.71), adjusting for family history. Taking the antilog of the slope for the \texttt{PB} predictor in the second baseline-category logistic equation, we have e^{-0.20445} = 0.82. This means that for each 1-point increase in perceived benefit, the odds of having a mammogram over one year ago compared to never having a mammogram are reduced by 18% (1 - 0.82), adjusting for family history.

The predicted probability is computed as shown in Equation 31 below.

\begin{align} &\hat{P} \left( \text{ME} = 1 \, \vert \, \text{HIST} = 1, \, \text{PB} = 10 \right) = \frac{e^{1.6 + 1.21 (1) -0.35 (10)}}{e^{1.6 + 1.21 (1) -0.35 (10)} + e^{0.31 + 0.98 (1) -0.20 (10)} + 1} \\ &\hat{P} \left( \text{ME} = 1 \, \vert \, \text{HIST} = 1, \, \text{PB} = 10 \right) \approx 0.26 \end{align} \tag{31}
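As a check of the hand calculation in Equation 31, we can compute the same predicted probability from the fitted model. A sketch follows; the column for ME = 1 should be approximately 0.26.

Code
predict(
  lab_2_7_model,
  newdata = data.frame(HIST = factor(1, levels = c(0, 1)), PB = 10),
  type = "response"
)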

7.3 Question

A study was conducted to assess the association between certain pollutants/behaviors and respiratory function. The response measured was chronic respiratory disease status, \texttt{level}. The explanatory variables of interest are air pollution, job exposure to pollution, and smoking status. The data are stored in the file resp_data.csv, with a summary of the variables shown in Table 2.

Table 2: Problem 3
Variable Name Description Levels
\texttt {level} Chronic respiratory disease status 1 = no symptoms, 2 = cough for 3 months or less per year, 3 = cough for more than 3 months per year, 4 = cough for more than 3 months per year and shortness of breath
\texttt {air} Air pollution low for low pollution or high for high pollution
\texttt {exposure} Job exposure to pollution no for no or yes for yes
\texttt {smoking} Smoking non for never smoked, ex for history of smoking, cur for current smoker
Code
poll <- import_dataset("resp_data.csv")
Code
# Recode so that worse symptoms come first
poll <-
  poll %>%
  mutate(level_rev = case_when(level == 1 ~ 4, # Original no symptoms now last
                               level == 2 ~ 3,
                               level == 3 ~ 2,
                               level == 4 ~ 1), # Original worst symptoms now first
         level_rev = factor(level_rev, ordered = TRUE, levels = c(1,2,3,4)), # Worst symptoms first
         air = factor(air, levels = c("low", "high")),
         exposure = factor(exposure, levels = c("no", "yes")),
         smoking = factor(smoking, levels = c("non", "ex", "cur")))

Answer the following questions.

  1. Suppose that we want to fit a proportional odds model with respiratory response level as the ordinal outcome. We are interested in comparing those with a worse respiratory response to those with a better response. In this case, we can recode the data so that the assigned values for the level variable are reversed. Suppose that we do this and call the recoded variable level_rev, so that level_rev = 4 for no symptoms, 3 = cough for 3 months or less per year, 2 = cough for more than 3 months per year, and 1 = cough for more than 3 months per year and shortness of breath. Define the probabilities of interest and write down the three cumulative logits that we would be modeling.

  2. Fit a proportional odds model with the main effects of air pollution, job exposure to pollutants, and smoking in R. Show the coefficient table.

The data were originally encoded with 1 = no symptoms, 2 = cough for 3 months or less per year, 3 = cough for more than 3 months per year, and 4 = cough for more than 3 months per year and shortness of breath. If we were to pass the order 1, 2, 3, and 4, the semantic meaning would have a lower degree of symptoms before a higher degree of symptoms. We re-encode the values (in reverse order), such that when we pass the order (using the levels argument of the factor function), we have the highest degree of symptoms (worst symptoms) first, as needed.

  3. Is the proportional odds assumption satisfied? Below are the code and output from Brant’s test. Use this output, as well as the estimates from a cumulative logistic model, to justify your response.
Code
lab_3_3_model <- MASS::polr(
  level_rev ~ air + exposure + smoking,
  data = poll
)
unique(poll) # One of the 2 x 2 x 3 x 4 = 48 possible combinations does not occur
      air exposure smoking level level_rev
1     low       no     non     1         4
159   low       no     non     2         3
168   low       no      ex     1         4
335   low       no      ex     2         3
354   low       no     cur     1         4
661   low       no     cur     2         3
763   low      yes     non     1         4
789   low      yes     non     2         3
794   low      yes      ex     1         4
832   low      yes      ex     2         3
844   low      yes     cur     1         4
938   low      yes     cur     2         3
986  high       no     non     1         4
1080 high       no     non     2         3
1087 high       no      ex     1         4
1154 high       no      ex     2         3
1162 high       no     cur     1         4
1346 high       no     cur     2         3
1411 high      yes     non     1         4
1443 high      yes     non     2         3
1446 high      yes      ex     1         4
1485 high      yes      ex     2         3
1496 high      yes     cur     1         4
1573 high      yes     cur     2         3
1621  low       no     non     3         2
1626  low       no      ex     3         2
1631  low       no      ex     4         1
1634  low       no     cur     3         2
1717  low       no     cur     4         1
1785  low      yes     non     3         2
1790  low      yes     non     4         1
1791  low      yes      ex     3         2
1795  low      yes      ex     4         1
1799  low      yes     cur     3         2
1845  low      yes     cur     4         1
1905 high       no     non     3         2
1910 high       no     non     4         1
1911 high       no      ex     3         2
1915 high       no      ex     4         1
1918 high       no     cur     3         2
1951 high       no     cur     4         1
1987 high      yes     non     3         2
1993 high      yes     non     4         1
1994 high      yes      ex     3         2
1998 high      yes      ex     4         1
2000 high      yes     cur     3         2
2039 high      yes     cur     4         1
Code
brant::brant(lab_3_3_model)
-------------------------------------------- 
Test for    X2  df  probability 
-------------------------------------------- 
Omnibus     10.66   8   0.22
airhigh     1.39    2   0.5
exposureyes 0.04    2   0.98
smokingex   7.22    2   0.03
smokingcur  5.53    2   0.06
-------------------------------------------- 

H0: Parallel Regression Assumption holds
Warning in brant::brant(lab_3_3_model): 1 combinations in table(dv,ivs) do not
occur. Because of that, the test results might be invalid.
Code
# Fit the proportional odds model
lab_3_3_cum_model <- VGAM::vglm(
  level_rev ~ air + exposure + smoking,
  family = cumulative(parallel = FALSE),
  data = poll
  )

# Summary
VGAM::summaryvglm(lab_3_3_cum_model)

Call:
VGAM::vglm(formula = level_rev ~ air + exposure + smoking, family = cumulative(parallel = FALSE), 
    data = poll)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1 -5.07486    0.59333  -8.553  < 2e-16 ***
(Intercept):2 -2.84239    0.22043 -12.895  < 2e-16 ***
(Intercept):3 -2.08562    0.16441 -12.685  < 2e-16 ***
airhigh:1     -0.01699    0.14598  -0.116   0.9073    
airhigh:2     -0.11130    0.11312  -0.984   0.3252    
airhigh:3     -0.01645    0.09946  -0.165   0.8687    
exposureyes:1  0.91005    0.14403   6.318 2.64e-10 ***
exposureyes:2  0.87061    0.11256   7.734 1.04e-14 ***
exposureyes:3  0.84431    0.10292   8.204 2.33e-16 ***
smokingex:1    1.26099    0.65832   1.915   0.0554 .  
smokingex:2    0.03130    0.28822   0.109   0.9135    
smokingex:3    0.42984    0.20193   2.129   0.0333 *  
smokingcur:1   3.05178    0.59282   5.148 2.63e-07 ***
smokingcur:2   1.75824    0.22084   7.962 1.70e-15 ***
smokingcur:3   1.82920    0.16581  11.032  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]), 
logitlink(P[Y<=3])

Residual deviance: 4161.64 on 6252 degrees of freedom

Log-likelihood: -2080.82 on 6252 degrees of freedom

Number of Fisher scoring iterations: 6 

Warning: Hauck-Donner effect detected in the following estimate(s):
'(Intercept):1', '(Intercept):2'


Exponentiated coefficients:
    airhigh:1     airhigh:2     airhigh:3 exposureyes:1 exposureyes:2 
    0.9831513     0.8946705     0.9836885     2.4844372     2.3883710 
exposureyes:3   smokingex:1   smokingex:2   smokingex:3  smokingcur:1 
    2.3263795     3.5289310     1.0317980     1.5370140    21.1530498 
 smokingcur:2  smokingcur:3 
    5.8022116     6.2289294 

We view the result in Equation 32.

\begin{align*} \log\left(\frac{P(\text{level\_rev} \le 1)}{P(\text{level\_rev} > 1)}\right) &= -5.07486 - 0.01699 x_{1} + 0.91005 x_{2} + 1.26099 x_{3} + 3.05178 x_{4} \\ \log\left(\frac{P(\text{level\_rev} \le 2)}{P(\text{level\_rev} > 2)}\right) &= -2.84239 - 0.11130 x_{1} + 0.87061 x_{2} + 0.03130 x_{3} + 1.75824 x_{4} \\ \log\left(\frac{P(\text{level\_rev} \le 3)}{P(\text{level\_rev} > 3)}\right) &= -2.08562 - 0.01645 x_{1} + 0.84431 x_{2} + 0.42984 x_{3} + 1.82920 x_{4} \\ \end{align*} \tag{32}
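To make the mapping from the coefficient table to the three equations easier to check, one option (a sketch using the fitted object above) is to print the coefficients in matrix form, with one column per linear predictor.

Code
# One row per model term and one column per cumulative logit, matching
# the three equations in Equation 32
coef(lab_3_3_cum_model, matrix = TRUE)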

  1. Write the fitted model obtained in QUESTION 2 of PROBLEM 3.

  2. Using the fitted model from QUESTION 2 of PROBLEM 3, interpret the odds ratio corresponding to the \texttt{exposure} predictor. Construct a 95% confidence interval (CI) for the corresponding population odds ratio.

  3. Using the model that you fit in QUESTION 2 of PROBLEM 3, compute the predicted probability of the following event: that a person has a cough for 3 months or less per year, given that the air pollution where they live is low, they are not exposed to pollution at their job, and they are a non-smoker.

The cumulative probabilities of interest are P \left( \text{level\_rev} \le 1 \right), P \left( \text{level\_rev} \le 2 \right), and P \left( \text{level\_rev} \le 3 \right), and the corresponding cumulative logits are as shown in (Problem 3 Question 1).

\begin{align*} &\log{\left( \frac{P \left( \text{level\_rev} \le 1 \right)}{P \left( \text{level\_rev} > 1 \right)} \right)} \\ &\log{\left( \frac{P \left( \text{level\_rev} \le 2 \right)}{P \left( \text{level\_rev} > 2 \right)} \right)} \\ &\log{\left( \frac{P \left( \text{level\_rev} \le 3 \right)}{P \left( \text{level\_rev} > 3 \right)} \right)} \end{align*}\tag{Problem 3 Question 1}

Code
# Fit the proportional odds model
lab_3_2_model <- VGAM::vglm(
  level_rev ~ air + exposure + smoking,
  family = cumulative(parallel = TRUE),
  data = poll
  )

# Summary
summary(lab_3_2_model)

Call:
VGAM::vglm(formula = level_rev ~ air + exposure + smoking, family = cumulative(parallel = TRUE), 
    data = poll)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1 -3.89385    0.17786 -21.893   <2e-16 ***
(Intercept):2 -2.96964    0.16927 -17.544   <2e-16 ***
(Intercept):3 -2.08844    0.16329 -12.790   <2e-16 ***
airhigh       -0.03929    0.09370  -0.419   0.6750    
exposureyes    0.86476    0.09546   9.059   <2e-16 ***
smokingex      0.40003    0.20187   1.982   0.0475 *  
smokingcur     1.85271    0.16503  11.227   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]), 
logitlink(P[Y<=3])

Residual deviance: 4178.511 on 6260 degrees of freedom

Log-likelihood: -2089.256 on 6260 degrees of freedom

Number of Fisher scoring iterations: 5 

No Hauck-Donner effect found in any of the estimates


Exponentiated coefficients:
    airhigh exposureyes   smokingex  smokingcur 
  0.9614696   2.3744386   1.4918673   6.3770828 

The fitted model (for QUESTION 4) is shown in Equation 33 below.

\begin{align*} &\log\left(\frac{P(\text{level\_rev} \le 1)}{P(\text{level\_rev} > 1)}\right) = -3.89385 - 0.03929 \, x_{1} + 0.86476 \, x_{2} + 0.40003 \, x_{3} + 1.85271 \, x_{4} \\ &\log\left(\frac{P(\text{level\_rev} \le 2)}{P(\text{level\_rev} > 2)}\right) = -2.96964 - 0.03929 \, x_{1} + 0.86476 \, x_{2} + 0.40003 \, x_{3} + 1.85271 \, x_{4} \\ &\log\left(\frac{P(\text{level\_rev} \le 3)}{P(\text{level\_rev} > 3)}\right) = -2.08844 - 0.03929 \, x_{1} + 0.86476 \, x_{2} + 0.40003 \, x_{3} + 1.85271 \, x_{4} \end{align*} \tag{33}

From the coefficient table for the cumulative logistic model (without the proportional odds assumption), we see that the slopes for the air predictor are similar across the three cumulative logit equations. The same can be said for the slopes for the exposure predictor. The slopes for the two smoking dummy variables, however, do appear to differ across the three cumulative logit equations, which is not consistent with the parallelism implied by the proportional odds assumption.

This analysis concurs with the p values from Brant's test, where we see relatively small p values for the two smoking dummy variables (p = 0.03 and p = 0.06), suggesting that the proportional odds assumption may not hold for the smoking variable. However, the omnibus test, with a p value of 0.22, suggests that there is not sufficient evidence of a violation of the proportional odds assumption when all of the slopes are considered simultaneously. In a case like this, where the validity of the proportional odds assumption is unclear, some analysts might choose to keep the proportional odds model while noting that the assumption may be violated for the smoking variable. Others might opt for the cumulative logistic model without the proportional odds assumption, although one major drawback is that this model is more complex. Another type of model that may be suitable, which we will not cover in detail, is a partial proportional odds model. This model would impose the proportional odds assumption on the \texttt{air} and \texttt{exposure} variables but not on the \texttt{smoking} dummy variables. A sketch of how such a model could be specified follows below.
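As a minimal sketch (assuming the \texttt{poll} data frame and the model specification above; the name \texttt{lab\_3\_3\_ppo\_model} is illustrative), the \texttt{parallel} argument of \texttt{cumulative()} accepts a formula, which lets us relax the proportional odds assumption for the smoking dummy variables only.

Code
# Partial proportional odds model: parallel slopes for air and exposure,
# separate slopes per cumulative logit for the smoking dummy variables
lab_3_3_ppo_model <- VGAM::vglm(
  level_rev ~ air + exposure + smoking,
  family = cumulative(parallel = FALSE ~ smoking),
  data = poll
  )

# Summary
VGAM::summaryvglm(lab_3_3_ppo_model)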

The fitted model, with the conditioning on the explanatory variables made explicit, is shown in Equation 34 below.

\begin{align*} &\log\left( \frac{P(\text{level\_rev} \le 1 \mid x_{1}, x_{2}, x_{3}, x_{4})}{P(\text{level\_rev} > 1 \mid x_{1}, x_{2}, x_{3}, x_{4})} \right) \\ &= -3.89385 - 0.03929 x_{1} + 0.86476 x_{2} + 0.40003 x_{3} + 1.85271 x_{4} \\ \\ &\log\left( \frac{P(\text{level\_rev} \le 2 \mid x_{1}, x_{2}, x_{3}, x_{4})}{P(\text{level\_rev} > 2 \mid x_{1}, x_{2}, x_{3}, x_{4})} \right) \\ &= -2.96964 - 0.03929 x_{1} + 0.86476 x_{2} + 0.40003 x_{3} + 1.85271 x_{4} \\ \\ &\log\left( \frac{P(\text{level\_rev} \le 3 \mid x_{1}, x_{2}, x_{3}, x_{4})}{P(\text{level\_rev} > 3 \mid x_{1}, x_{2}, x_{3}, x_{4})} \right) \\ &= -2.08844 - 0.03929 x_{1} + 0.86476 x_{2} + 0.40003 x_{3} + 1.85271 x_{4} \end{align*} \tag{34}

The estimated odds ratio corresponding to the job exposure predictor is e^{0.86476} = 2.37. The odds of a worse respiratory response vs. a better respiratory response for those with job exposure to pollution are 2.37 times the odds for those with no job exposure, adjusting for air pollution and smoking status. We can construct a Wald CI in the same way that we have done with other GLMs. The 95% CI for the slope is 0.86476 \pm 1.96 \times 0.09546 = (0.68, 1.05). Therefore, the 95% CI for the OR is \left( e^{0.68} , e^{1.05} \right) = (1.97, 2.86).
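One way to reproduce this calculation in R (a sketch that types in the estimate and standard error from the summary above):

Code
# Wald 95% CI for the exposureyes slope, then exponentiate the limits to
# obtain the CI for the odds ratio
b  <- 0.86476  # estimated slope for exposureyes
se <- 0.09546  # standard error of the slope

ci_slope <- b + c(-1, 1) * qnorm(0.975) * se
round(ci_slope, 2)       # (0.68, 1.05)
round(exp(ci_slope), 2)  # (1.97, 2.86)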

We are asked to calculate \hat{P}(\text{level\_rev} = 3 \mid x_{1} = 0, x_{2} = 0, x_{3} = 0, x_{4} = 0). The solution is shown in Equation 35 below, where x_{i}=0 for i= \left( 1,2,3,4 \right).

\begin{align} \hat{P}(\text{level\_rev} = 3 \mid x_{i} = 0) & = \hat{P}(\text{level\_rev} \le 3 \mid x_{i} = 0) - \hat{P}(\text{level\_rev} \le 2 \mid x_{i} = 0) \\ \hat{P}(\text{level\_rev} = 3 \mid x_{i} = 0) & = \frac{e^{-2.08844}}{1 + e^{-2.08844}} - \frac{e^{-2.96964}}{1 + e^{-2.96964}} \\ \hat{P}(\text{level\_rev} = 3 \mid x_{i} = 0) & \approx 0.06 \end{align} \tag{35}
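This calculation can be checked in R (a sketch; plogis() is the inverse logit, and the VGAM::predictvglm() call assumes the factor levels shown in the data listing above).

Code
# Equation 35 directly from the fitted intercepts
plogis(-2.08844) - plogis(-2.96964)

# Alternatively, predicted probabilities for all four levels for the
# baseline profile: low air pollution, no job exposure, non-smoker
new_obs <- data.frame(air = "low", exposure = "no", smoking = "non")
VGAM::predictvglm(lab_3_2_model, newdata = new_obs, type = "response")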