This chapter of Applied Categorical Data Analysis by JH Klopper and DH Lee is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International
In all preceding chapters, we have assumed that the observations in our samples are independent. In this chapter, we explore data where the independence assumption no longer holds, that is to say, we have pairing or matching of observations.
We start by exploring binary responses (dichotomous outcome variables) in clustered data.
Load the required libraries for Chapter 10. If these packages are not already installed, please do so before loading the libraries.
#library(DT)
library(rio)
library(dplyr)
library(epiR)
library(survival)
In some studies, observations naturally form matched pairs, for example, the two eyes of the same subject.
Matching can also be introduced by design. In cohort studies, matching can be done on certain factors such as age, preexisting conditions, and so on. Here, matching ensures that the baseline distribution of the matching factors is identical between the exposed and unexposed subjects. Such a study design reduces bias that may arise if there were an imbalance in the matching factors. A matched set of individuals, one or more in the exposed group and one or more in the unexposed group, will then be of similar age and have the same comorbidities.
In another example of matching by design, we consider matched case-control studies, where matching on certain factors ensures that cases and controls do not have an imbalance on these factors. This is done to increase the precision of effect estimates upon adjustment for the matching factors.
In studies using matched pairs, it is the matched pairs that are the sampling unit. Observations within a matched pair are correlated. Observations from different matched pairs remain independent. This is analogous to the paired t test.
For table methods we will explore the use of McNemar’s test and Mantel-Haenszel methods. We will also explore conditional logistic regression. Note that there are even more methods, but they will not be considered here.
Some aspects of the analysis of matched pairs are being debated. You can read more on case-control studies and matched pairs in the paper published at https://www.bmj.com/content/352/bmj.i969. The authors state the following: “There are two common misconceptions about case-control studies: that matching in itself eliminates (controls) confounding by the matching factors, and that if matching has been performed, then a “matched analysis” is required. However, matching in a case-control study does not control for confounding by the matching factors; in fact it can introduce confounding by the matching factors even when it did not exist in the source population. Thus, a matched design may require controlling for the matching factors in the analysis. However, it is not the case that a matched design requires a matched analysis. Provided that there are no problems of sparse data, control for the matching factors can be obtained, with no loss of validity and a possible increase in precision, using a “standard” (unconditional) analysis, and a “matched” (conditional) analysis may not be required or appropriate.”
That being said, matched-pair analysis is performed in the published literature, and it is useful to be aware of the three approaches: McNemar’s test, Mantel-Haenszel methods, and conditional logistic regression.
In this chapter, we consider an illustrative example of mydriatic eye drops and the use of a placebo and a new active drug that purports to return pupil function to normal. Mydriatics dilate the pupils after application, such that ophthalmological examination can be performed. As a complication, light-sensitivity remains after application, as the pupils remain dilated for some time. A new eye drop solution is supposed to return pupils to normal activity soon after application. In a study of 124 subjects, both pupils are dilated with a mydriatic. Saline solution (placebo) is instilled in one eye and the new solution in the other eye. Both eyes are evaluated for normal activity after 10 minutes.
We import the data below as a tibble object and assign it to the variable eye.

eye <- import_dataset("ophth.csv")
The first six rows show that the data for each participant are recorded twice (once for each eye). The variable subj identifies the participant, treatment records the solution received (trt or placebo), and undilated records the outcome (1 if the pupil returned to normal and 0 if it remained dilated).
#DT::datatable(eye)
The first participant had a paradoxical response. In the eye that received a placebo solution, the pupil returned to normal. In the eye that received the new treatment, the pupil stayed dilated.
We can summarize the results in a 2 \times 2 table, shown in Table 1.
 | Placebo eye normal | Placebo eye dilated
---|---|---
Treated eye normal | 8 | 72
Treated eye still dilated | 25 | 19
We should not use a \chi^{2} test for association here, as we have matched pairs. Instead, we consider the pairs in which the treated eye returns to normal while the placebo eye stays dilated against the pairs in which the treated eye stays dilated while the placebo eye returns to normal.
In essence, our table design is as shown below, termed a population-averaged table or a marginal table, since the margins provide direct estimates of the marginal population proportions.
 | | Placebo eye $\overline{E}$, Normal $D$ | Placebo eye $\overline{E}$, Dilated $\overline{D}$ | Total
---|---|---|---|---
Treated eye $E$ | Normal $D$ | 8 | 72 | 80
 | Dilated $\overline{D}$ | 25 | 19 | 44
Total | | 33 | 97 | 124
From the population-averaged table, we note that the treated eye returned to normal in 80 cases, compared to 33 cases in which the placebo eye returned to normal. Compare this to Table 2 below, where, because each participant contributes data for two eyes, we have 248 observations. The table below is termed a subject-specific table and arises from pooling the 124 matched pairs.
 | Normal | Dilated | Total
---|---|---|---
Treated | 80 | 44 | 124
Placebo | 33 | 91 | 124
Total | 113 | 135 | 248
In the code below, we use the factor
function to set the order of the levels and we also label the binary outcomes. The xtabs
function is used to create the subject-specific table.
eye$undilated <- factor(
  eye$undilated,
  levels = c(1, 0),
  labels = c("Normal", "Dilated")
)

eye$treatment <- factor(
  eye$treatment,
  levels = c("trt", "placebo"),
  labels = c("Treatment", "Placebo")
)

xtabs(
  ~ treatment + undilated,
  data = eye
)
undilated
treatment Normal Dilated
Treatment 80 44
Placebo 33 91
Each of the 124 subjects falls into one of the four outcomes shown in Tables 3 to 6. The totals follow from the population-averaged table.
A total of 19 subjects had the following outcome.
 | Normal | Dilated
---|---|---
Treated | 0 | 1
Placebo | 0 | 1
Total | 0 | 2
If we reconsider the data, we note that a total of 72 subjects had the outcome shown in Table 4.
 | Normal | Dilated
---|---|---
Treated | 1 | 0
Placebo | 0 | 1
Total | 1 | 1
A total of 25 subjects had the outcome shown in Table 5.
 | Normal | Dilated
---|---|---
Treated | 0 | 1
Placebo | 1 | 0
Total | 1 | 1
A total of 8 subjects had the outcome shown in Table 6.
 | Normal | Dilated
---|---|---
Treated | 1 | 0
Placebo | 1 | 0
Total | 2 | 0
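As a check on these counts, we can tally the four pair types directly from the long-format data. This is a minimal sketch that assumes the factor labels created above (Treatment and Placebo for treatment, Normal and Dilated for undilated) and one row per eye per subject.

eye %>%
  group_by(subj) %>%
  summarise(
    treated = undilated[treatment == "Treatment"], # outcome for the treated eye
    placebo = undilated[treatment == "Placebo"], # outcome for the placebo eye
    .groups = "drop"
  ) %>%
  count(treated, placebo) # expected counts: 8, 72, 25, and 19 pairs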
Now that we have summarized the data, we can consider the analysis of matched pairs.
McNemar’s test is used to assess association between paired binary variables. The observed pair counts table and the population paired probabilities table, both shown below, identify the frequencies and then the population proportions for exposure status, E (new solution) and \overline{E} (saline solution), and disease status, D (normal) and \overline{D} (dilated).
 | | Saline solution $\overline{E}$, Normal $D$ | Saline solution $\overline{E}$, Dilated $\overline{D}$ | Total
---|---|---|---|---
New solution $E$ | Normal $D$ | $n_{11}$ | $n_{12}$ | $n_{1+}$
 | Dilated $\overline{D}$ | $n_{21}$ | $n_{22}$ | $n_{2+}$
Total | | $n_{+1}$ | $n_{+2}$ | $n$
 | | Saline solution $\overline{E}$, Normal $D$ | Saline solution $\overline{E}$, Dilated $\overline{D}$ | Total
---|---|---|---|---
New solution $E$ | Normal $D$ | $\pi_{11}$ | $\pi_{12}$ | $\pi_{1+}$
 | Dilated $\overline{D}$ | $\pi_{21}$ | $\pi_{22}$ | $\pi_{2+}$
Total | | $\pi_{+1}$ | $\pi_{+2}$ |
We want to see if there is an association between exposure and disease. Our null hypothesis is that there is no association between the two binary variables. We have that each exposed observation is paired with an unexposed observation. Therefore, under the null hypothesis, we expect equal proportions of exposed and unexposed to have the disease. We have the corresponding null hypothesis in Equation 1, referred to as marginal homogeneity.
H_{0}: \; \pi_{1+} - \pi_{+1} = 0 \tag{1}
In Equation 1, we state that the proportion with disease (a normal pupil) is the same in the exposed (treated with the new eye drop solution) and in the unexposed (saline solution).
Note that we can write Equation 1 as \pi_{1+} - \pi_{+1} = (\pi_{11} + \pi_{12}) - (\pi_{11} + \pi_{21}) = \pi_{12} - \pi_{21}. We rewrite the null hypothesis in Equation 2.
H_{0}: \; \pi_{12} - \pi_{21} = 0 \tag{2}
The frequencies n_{12} and n_{21} are the discordant pairs, and the frequencies n_{11} and n_{22} are the concordant pairs. From Equation 2, we only need the discordant pairs. For the discordant pairs, \pi_{12} is the probability that the treated eye returns to normal while the saline eye stays dilated, and \pi_{21} is the probability that the treated eye stays dilated while the saline eye returns to normal. In the concordant pairs, n_{11} and n_{22}, both eyes have the same outcome (both stay dilated or both return to normal). The total number of discordant pairs in our illustrative example is 72 + 25 = 97.
If the total number of discordant pairs is denoted by n^{*} = n_{12} + n_{21}, then, under the null hypothesis, any discordant pair is equally likely to fall in either of the two discordant cells, that is, with probability \frac{1}{2}, so that the expected count in each cell is \frac{1}{2} n^{*}.
McNemar’s test statistic, Z, is given in Equation 3, together with its distribution under the null hypothesis.
Z = \frac{n_{12} - \frac{1}{2} n^{*}}{\sqrt{n^{*}\frac{1}{2}\frac{1}{2}}} = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}} \sim \text{N}(0,1^{2}), \text{ under } H_{0} \tag{3}
In Equation 3, we have the assumption that the number of discordant pairs is \ge 10. If the number of discordant pairs is less than 10, we use the exact binomial distribution.
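In our illustrative example the number of discordant pairs is 97, so the normal approximation applies. For illustration only, the exact version can be carried out with the binom.test function, treating one of the discordant counts as a binomial outcome with success probability one half under the null hypothesis.

# Exact McNemar-type test: under the null hypothesis each of the 97 discordant
# pairs falls in either discordant cell with probability 1/2
binom.test(x = 72, n = 72 + 25, p = 0.5)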
Note that McNemar’s test statistic is often reported as Z^{2}, shown in Equation 4.
Z^{2} = \frac{\left( n_{12} - n_{21} \right)^{2}}{n_{12} + n_{21}} \sim \chi^{2}_{1}, \text{ under } H_{0} \tag{4}
In Equation 5, we use the data from our illustrative example to calculate Z^{2}.
Z^{2} = \frac{(72 - 25)^{2}}{72 + 25} \approx 22.773 \tag{5}
This is much larger than the critical test statistic, \chi^{2}_{1,0.05}, calculated below using the qchisq
function.
qchisq(0.05, df = 1, lower.tail = FALSE)
[1] 3.841459
We use the pchisq
function to calculate the p value for our Z^{2} test statistic value.
pchisq(22.7732, df = 1, lower.tail = FALSE)
[1] 1.822901e-06
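We can also reproduce the test statistic itself from the discordant counts.

z_stat <- (72 - 25) / sqrt(72 + 25) # Z from Equation 3
round(z_stat, digits = 3)
round(z_stat^2, digits = 3) # Z squared, as in Equation 5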
We can simply use R to conduct McNemar’s test. First we create a table to hold the population-averaged data and then use the mcnemar.test
function for a 5% significance level.
eye_pop_ave_table <- matrix(c(8, 72, 25, 19), nrow = 2)
eye_pop_ave_table
[,1] [,2]
[1,] 8 25
[2,] 72 19
mcnemar.test(eye_pop_ave_table, correct = FALSE)
McNemar's Chi-squared test
data: eye_pop_ave_table
McNemar's chi-squared = 22.773, df = 1, p-value = 1.823e-06
The results show that we can reject the null hypothesis at the 5% significance level. There is an association between the exposure and the disease.
Now that we know that there is an association, we can consider the measures of association. The two population-averaged measures of association for matched pair samples are the risk difference and the odds ratio.
The population-averaged risk difference is given in Equation 6: the difference between the discordant counts expressed as a proportion of the total number of pairs.
\widehat{\text{RD}} = \frac{n_{12} - n_{21}}{n} \tag{6}
The 100(1 - \alpha)% confidence interval (CI) for Equation 6 is shown in Equation 7.
\widehat{\text{RD}} \pm z_{\frac{\alpha}{2}} \frac{\sqrt{\left( n_{12} + n_{21} - \frac{{\left( n_{12} - n_{21} \right)}^{2}}{n} \right)}}{n} \tag{7}
We calculate the estimated risk difference for our illustrative example in the code cell below.
rd_est <- (72 - 25) / 124
round(rd_est, digits = 3)
[1] 0.379
The 95% CI is calculated next.
z_crit <- qnorm(0.975) # Confidence coefficient
se <- sqrt((72 + 25) - (((72 - 25)^2) / 124)) / 124 # Standard error
lower <- rd_est - z_crit * se
upper <- rd_est + z_crit * se
round(lower, digits = 3)
[1] 0.238
round(upper, digits = 3)
[1] 0.52
We can conclude that the chance that a pupil treated with the new solution returns to normal is 37.9 percentage points greater than the chance that a pupil treated with saline solution returns to normal. We are 95% confident that the true risk difference in the population is between 23.8 and 52.0 percentage points.
Note that a population risk difference of 0 is not in the interval of plausible values.
If we consider the sample proportions given in Equation 8, then we can express the estimated odds ratio as in Equation 9.
\begin{align} &p_{1+} = \frac{n_{1+}}{n}, \; p_{2+} = \frac{n_{2+}}{n} \\ &p_{+1} = \frac{n_{+1}}{n}, \; p_{+2} = \frac{n_{+2}}{n} \end{align} \tag{8}
\widehat{\text{OR}} = \frac{\frac{p_{1+}}{1 - p_{1+}}}{\frac{p_{+1}}{1 - p_{+1}}} = \frac{\frac{p_{1+}}{p_{2+}}}{\frac{p_{+1}}{p_{+2}}} = \frac{p_{1+} p_{+2}}{p_{2+} p_{+1}} = \frac{n_{1+} n_{+2}}{n_{2+} n_{+1}} \tag{9}
We have that Equation 9 is based on a logistic model for the marginal probability of response. We will consider an alternative method for calculating the CIs.
The code chunk below calculates Equation 9 for our illustrative example.
round((80 * 91) / (44 * 33), digits = 3)
[1] 5.014
We conclude that the odds of achieving normal function following instillation of the new solution are 5.014 times the odds of achieving normal function following the use of saline.
While we used a population-averaged approach, we can also consider a subject-specific approach, with a subject-specific odds ratio and its CI.
Note that matched-pair data are simply highly stratified data, where the strata are the matched pairs themselves. This means that we can use Mantel-Haenszel techniques. A further advantage is that these techniques also accommodate one-to-many and many-to-many matching.
In the case of our illustrative example, we can construct 124 subject-specific tables, that is, 2 \times 2 tables stratified by subject (pairs of eyes), shown in Table 7. Note that in each subject-specific table every row contains a single 1 and a single 0: either n_{11} = 1, n_{12} = 0 or n_{11} = 0, n_{12} = 1 for the treated eye, and either n_{21} = 1, n_{22} = 0 or n_{21} = 0, n_{22} = 1 for the placebo eye.
 | D | \overline{D} | Total
---|---|---|---
E | n_{11} | n_{12} | 1
\overline{E} | n_{21} | n_{22} | 1
Total | n_{+1} | n_{+2} | 2
We note subject 1 below, who had a discordant response as mentioned before.
xtabs(
~ treatment + undilated,
data = eye %>% filter(subj == 1)
)
undilated
treatment Normal Dilated
Treatment 0 1
Placebo 1 0
Subject 9 had a concordant response (both eyes had the same response).
xtabs(
~ treatment + undilated,
data = eye %>% filter(subj == 9)
)
undilated
treatment Normal Dilated
Treatment 1 0
Placebo 1 0
The Mantel-Haenszel estimated odds ratio is calculated as in Equation 10, from the discordant counts in the population-averaged table.
\widehat{\text{OR}}_{\text{ subject specific}} = \frac{n_{12}}{n_{21}} \tag{10}
For our illustrative example we have \widehat{\text{OR}}_{\text{ss}} = 72 \div 25 = 2.88.
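We can verify this simple calculation in R.

round(72 / 25, digits = 2) # subject-specific (Mantel-Haenszel) odds ratio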
In R we can create a table containing all the stratified tables and use the epi.2by2 function to conduct a test of matched pairs, with the subject identifier as the variable that we stratify on.
stratified_tables <- table(
  Treatment = eye$treatment, # Rows
  Normal = eye$undilated, # Columns
  Subject = eye$subj # Strata
)

epiR::epi.2by2(stratified_tables)
Outcome + Outcome - Total Inc risk *
Exposed + 80 44 124 64.52 (55.42 to 72.90)
Exposed - 33 91 124 26.61 (19.08 to 35.30)
Total 113 135 248 45.56 (39.25 to 51.99)
Point estimates and 95% CIs:
-------------------------------------------------------------------
Inc risk ratio (crude) 2.42 (1.76, 3.34)
Inc risk ratio (M-H) 2.42 (1.67, 3.53)
Inc risk ratio (crude:M-H) 1.00
Inc odds ratio (crude) 5.01 (2.92, 8.62)
Inc odds ratio (M-H) 2.88 (1.83, 4.54)
Inc odds ratio (crude:M-H) 1.74
Attrib risk in the exposed (crude) * 37.90 (26.44, 49.37)
Attrib risk in the exposed (M-H) * 37.90 (37.90, 37.90)
Attrib risk (crude:M-H) 1.00
-------------------------------------------------------------------
M-H test of homogeneity of IRRs: chi2(123) = NaN Pr>chi2 = NA
M-H test of homogeneity of ORs: chi2(123) = 71.680 Pr>chi2 = 1.000
Test that M-H adjusted OR = 1: chi2(1) = 22.773 Pr>chi2 = <0.001
Wald confidence limits
M-H: Mantel-Haenszel; CI: confidence interval
* Outcomes per 100 population units
The table that we see contains the marginal values of the population-averaged table. The crude odds ratio estimate (Inc odds ratio (crude)) is the population-averaged odds ratio that we calculated above. We use the epi.2by2 function incorrectly, as it does not allow for matching; the CIs for the crude OR are therefore not correct. In fact, we ignore all the results except the odds ratio (M-H) row.
The Mantel-Haenszel odds ratio (Odds ratio (M-H)) is the result and CI that we require. We see a subject-specific odds ratio estimate of 2.88 with a 95% CI of between 1.83 and 4.54.
The interpretation is on a subject level. We state that for any given subject, the odds of a pupil returning to normal function after receiving the new eye solution are 2.88 times their odds of the pupil returning to normal function after saline administration.
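As a cross-check, and not part of the original analysis above, the mantelhaen.test function in the stats package that ships with R accepts a 2 \times 2 \times K table such as stratified_tables and reports the Mantel-Haenszel test together with the common odds ratio and its confidence interval. Its estimate should agree with the subject-specific odds ratio of 2.88 reported above.

# Mantel-Haenszel test and common odds ratio from the 2 x 2 x 124 table
# created earlier; correct = FALSE omits the continuity correction so that
# the chi-squared statistic should match the value of 22.773 reported above
mantelhaen.test(stratified_tables, correct = FALSE)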
Logistic regression models can be used for matched pairs. One commonly used approach is conditional logistic regression.
For each matched set, i = 1, 2, 3, \ldots, n, we refer to observation 1 and observation 2. The outcome variable for observation 1 in matched set i is Y_{i1} and the outcome variable for observation 2 in matched set i is Y_{i2}. For our illustrative example, Y_{i1} corresponds to the dilation status, (0,1), of the placebo eye, and Y_{i2} corresponds to the dilation status, (0,1), of the treated eye.
The predictor or explanatory variable for observation 1 in matched set i is x_{i1} and the explanatory variable for observation 2 in matched set i is x_{i2}. For our illustrative example, x_{it} indicates the solution instilled, so that x_{i1} = 0 for the placebo (saline) eye and x_{i2} = 1 for the eye treated with the new solution.
The conditional logistic model allows a different intercept for each subject but assumes that the effect of x_{it} is the same across pairs. The model is given in Equation 11, where Y_{it} and x_{it} for t = 1, 2 are defined as above.
\text{logit}[P(Y_{it} = 1)] = \alpha_{i} + \beta x_{it} \Leftrightarrow P(Y_{it} = 1) = \frac{e^{\alpha_{i} + \beta x_{it}}}{1 + e^{\alpha_{i} + \beta x_{it}}} \tag{11}
If \beta = 0, then we have marginal homogeneity, that is, the probability of success is the same for both observations for any matched pair.
Note that a matched pair is a set of eyes from a single individual. If both eyes in a particular person remain dilated, then their \alpha_{i} value will be very small (a large negative value). In contrast, if both eyes in a particular person return to normal, then their \alpha_{i} value will be very large. By specifying \alpha_{i} for each individual, the model is flexible enough to account for this.
For a sample size of 124 subjects, we will have 125 coefficients (124 \alpha_{i} values and one \beta value). This is a lot of computation when using maximum likelihood estimation (MLE). We are typically not interested in the intercepts, which are called nuisance parameters, and only want to estimate \beta. In fact, we use conditional MLE to find an estimate for \beta and eliminate the subject-specific parameters, \alpha_{i}.
The section below on the three steps of conditional MLE is optional and is intended only for those who have taken a course in statistical inference and are interested in why we can eliminate the subject-specific parameters, \alpha_{i}. It is not required for this course and will not be discussed. Briefly, the three main steps are to (1) write down the joint probability of the two responses within a matched pair, (2) condition on the sufficient statistic S_{i} = Y_{i1} + Y_{i2} to eliminate \alpha_{i}, and (3) maximize the resulting conditional likelihood with respect to \beta. The function in R that we will use completes these steps.
To understand step 1, consider that we know the values of \alpha_{i} and \beta within a matched pair. The observations within this pair are independent. We use the product rule for probabilities, shown in Equation 12, where each factor is a Bernoulli probability mass function \pi^{y} (1 - \pi)^{1-y}.
\begin{align} &P(Y_{i1} = y_{i1},\, Y_{i2} = y_{i2}) = P(Y_{i1} = y_{i1}) P(Y_{i2} = y_{i2}) \\ = &{\left( \frac{e^{\alpha_{i} + \beta x_{i1}}}{1 + e^{\alpha_{i} + \beta x_{i1}}} \right)}^{y_{i1}} {\left( \frac{1}{1 + e^{\alpha_{i}+ \beta x_{i1}}} \right)}^{1-y_{i1}} {\left( \frac{e^{\alpha_{i} + \beta x_{i2}}}{1 + e^{\alpha_{i} + \beta x_{i2}}} \right)}^{y_{i2}} {\left( \frac{1}{1 + e^{\alpha_{i}+ \beta x_{i2}}} \right)}^{1-y_{i2}} \\ = &c_{i} e^{(y_{i1} + y_{i2})\alpha_{i} + (y_{i1}x_{i1} + y_{i2}x_{i2}) \beta}, \text{ where } c_{i} = {\left(1 + e^{\alpha_{i} + \beta x_{i1}}\right)}^{-1} {\left(1 + e^{\alpha_{i} + \beta x_{i2}}\right)}^{-1} \end{align} \tag{12}
If we condition on a sufficient statistic for \alpha_{i} (this requires knowledge from a statistical inference course), S_{i} = Y_{i1} + Y_{i2}, then we can eliminate \alpha_{i}, similar to conditioning on the row and column totals in Fisher’s exact test.
We have seen that there are four possible outcomes for each subject. The concordant pairs are not of interest since conditional on S_{i} = Y_{i1} + Y_{i2} = 0 we must have that Y_{i1} = Y_{i2} = 0 and P(Y_{i1} = 0 , Y_{i2} = 0 \, \vert \, Y_{i1} + Y_{i2} = 0) = 1, and conditional on S_{i} = Y_{i1} + Y_{i2} = 2 we must have that Y_{i1} = Y_{i2} = 1 and P(Y_{i1} = 1 , Y_{i2} = 1 \, \vert \, Y_{i1} + Y_{i2} = 2) = 1. In neither case is \beta involved. Such matched pairs are not used in the conditional MLE.
On the other hand, conditioning on S_{i} = Y_{i1} + Y_{i2} = 1, we have Equation 13.
\begin{align} &P(Y_{i1} = 0, \, Y_{i2} = 1 \; \vert \; Y_{i1} + Y_{i2} = 1) = \frac{e^{\beta x_{i2}}}{e^{\beta x_{i1}} + e^{\beta x_{i2}}} \\ &P(Y_{i1} = 1, \, Y_{i2} = 0 \; \vert \; Y_{i1} + Y_{i2} = 1) = \frac{e^{\beta x_{i1}}}{e^{\beta x_{i1}} + e^{\beta x_{i2}}} \end{align} \tag{13}
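For completeness, the first line of Equation 13 follows by dividing the joint probability of the (0,1) outcome from Equation 12 by the total probability of observing exactly one success in the pair; the constants c_{i} and the terms involving \alpha_{i} cancel.

P(Y_{i1} = 0, \, Y_{i2} = 1 \; \vert \; Y_{i1} + Y_{i2} = 1) = \frac{c_{i} e^{\alpha_{i} + \beta x_{i2}}}{c_{i} e^{\alpha_{i} + \beta x_{i1}} + c_{i} e^{\alpha_{i} + \beta x_{i2}}} = \frac{e^{\beta x_{i2}}}{e^{\beta x_{i1}} + e^{\beta x_{i2}}}

The second line of Equation 13 follows in the same way.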
In Equation 13, we do not have \alpha_{i}; we have conditioned it away. The conditional likelihood, given S_{i} = Y_{i1} + Y_{i2} = 1, is the product of all the pair-specific contributions, shown in Equation 14.
L( \beta ) = \prod_{\text{all } (0,1) \text{ pairs}} \frac{e^{\beta x_{i2}}}{e^{\beta x_{i1}} + e^{\beta x_{i2}}} \prod_{\text{all } (1,0) \text{ pairs}} \frac{e^{\beta x_{i1}}}{e^{\beta x_{i1}} + e^{\beta x_{i2}}} \tag{14}
Maximizing Equation 14 yields the slope estimate.
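To make this concrete for our illustrative example (a sketch using x_{i1} = 0 for the placebo eye and x_{i2} = 1 for the treated eye, with 72 pairs of the (0,1) type and 25 pairs of the (1,0) type), the conditional likelihood in Equation 14 reduces to a function of \beta alone, and setting its derivative to zero gives a closed-form estimate.

L(\beta) = {\left( \frac{e^{\beta}}{1 + e^{\beta}} \right)}^{72} {\left( \frac{1}{1 + e^{\beta}} \right)}^{25} \quad \Rightarrow \quad \hat{\beta} = \log{\left( \frac{72}{25} \right)} \approx 1.058, \quad e^{\hat{\beta}} = \frac{72}{25} = 2.88

This is the same subject-specific odds ratio that we obtained with the Mantel-Haenszel approach, and it matches the conditional logistic regression output below.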
This procedure generalizes to multiple predictors as well as to different matching schemes such as one-to-many and many-to-many matching, and the conditional likelihood has the same form as the partial likelihood used to estimate proportional hazards models.
Of particular interest for this textbook on applied categorical data analysis is that the interpretation of the conditional logistic model is subject-specific: \beta = \log{\text{OR}} and e^{\beta} = \text{OR} for observations from the same matched pair.
For our illustrative example, we let Y_{it} be the dilation status (1 if the pupil returns to normal and 0 if it stays dilated) and x_{it} the treatment indicator (1 for the new solution and 0 for saline). Our model is as described in Equation 11, where t = 1,2. We use conditional logistic regression and compare it to the results from the Mantel-Haenszel approach.
We use the clogit
function from the survival
library below.
eye <- import_dataset("ophth.csv") # Re-imported

model <- survival::clogit(
  undilated ~ treatment + strata(subj),
  data = eye
)

summary(model)
Call:
coxph(formula = Surv(rep(1, 248L), undilated) ~ treatment + strata(subj),
data = eye, method = "exact")
n= 248, number of events= 113
coef exp(coef) se(coef) z Pr(>|z|)
treatmenttrt 1.0578 2.8800 0.2321 4.557 5.2e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
treatmenttrt 2.88 0.3472 1.827 4.539
Concordance= 0.742 (se = 0.063 )
Likelihood ratio test= 23.76 on 1 df, p=1e-06
Wald test = 20.76 on 1 df, p=5e-06
Score (logrank) test = 22.77 on 1 df, p=2e-06
We see \hat{\beta} = 1.0578, with e^{\hat{\beta}} = 2.88, as before. We also see the 95% CI. The output reports the Wald test statistic, z, and a p value for this test statistic under H_{0}: \; \beta = 0.
We conclude that for an individual subject, we have that the estimated odds of the treated eye returning to normal function is 2.88 times the odds of the untreated (saline / placebo instilled) eye returning to normal status.
The exp(-coef)
simply considers the OR for the unexposed compared to the exposed.
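As a quick check, the reported 95% confidence limits can be reproduced from the coefficient and its standard error, copied from the summary output above, which returns approximately 1.827 and 4.539.

beta_hat <- 1.0578 # coefficient from the summary output
se_beta <- 0.2321 # standard error from the summary output
round(exp(beta_hat + c(-1, 1) * qnorm(0.975) * se_beta), digits = 3)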
A case-control study used matched pairs to study the relationship between adenomatous polyps of the colon and diet (high fiber vs. low fiber). Cases (who have polyps) and controls (who do not have polyps) were matched on several factors including age, gender and race. The table below reports the results for the 105 pairs.
 | | Control high fibre | Control low fibre | Total
---|---|---|---|---
Case | High fibre | 45 | 15 | 60
 | Low fibre | 20 | 25 | 45
Total | | 65 | 40 | 105
The data from the matched pairs are also available in the file polypcc_data.csv
. The variables in the data set are \texttt{pair} (the pair identifier), \texttt{polyps} (1 if case and 0 if control), and \texttt{diet} (high or low).
Answer the following questions.
Use McNemar’s test to test the null hypothesis of no association between diet and adenomatous polyps. Use a 5% significance level.
Compute and interpret the population-averaged odds ratio for having adenomatous polyps comparing those with a high fiber diet to those with a low fiber diet.
Use Mantel-Haenszel methods to compute the matched pairs-specific odds ratio for having adenomatous polyps comparing those with a high fiber diet to those with a low fiber diet. Also report the 95% confidence interval. Interpret both.
Fit an appropriate logistic regression model to analyze the association between development of adenomatous polyps of the colon and diet. Use this model to calculate the OR and its confidence interval for the association between diet and adenomatous polyps for this case-control sample. Does this OR have a marginal or subject-specific interpretation?
Based on the results from the logistic model that you fit in QUESTION 4, use a Wald test at the 5% level of significance to determine whether diet and development of adenomatous polyps are associated.
The data in the file benign_data.csv
contains information from a 1:3 matched design studying the risk factors associated with benign breast disease. The data are a subset from a hospital-based case-control study designed to examine the epidemiology of fibrocystic breast disease. Data are provided on 50 women who were diagnosed as having benign breast disease and 150 age-matched controls from the same hospital, with three controls per case. The variables in the data set are shown in Table 8 below.
Variable Name | Description | Levels or Units of Measurement
---|---|---
\texttt{STR} | Stratum | 1-50
\texttt{OBS} | Observation within stratum | 1 for case and 2-4 for control
\texttt{FNDX} | Final diagnosis | 1 for case and 0 for control
\texttt{CHK} | Regular medical check-ups | 1 for yes and 2 for no
\texttt{AGMN} | Age at menarche | Years
\texttt{WT} | Weight of the subject | Pounds
Answer the following questions.
Test for an association between final diagnosis and regular medical checkups using Mantel-Haenszel methods. Use a 5% significance level. Also report and interpret the relevant odds ratio and 95% confidence interval.
Fit an appropriate logistic regression model with final diagnosis as the response and CHK, AGMN, and WT as the predictors. Interpret the odds ratio corresponding to the CHK predictor.
Test for an association between final diagnosis and regular medical checkups adjusting for age at menarche and weight. Use a 5% significance level. (Note: Question 3 and 4 are the same. So, the solution to question 3 assesses whether predictors are linearly associated with the log odds of the response.)
Test for an association between final diagnosis and regular medical checkups adjusting for age at menarche and weight. Use a 5% significance level.
Does age at menarche modify the association between final diagnosis and regular medical checkups, adjusting for weight? Perform an appropriate test using a 5% significance level.