--- title: "Dyadic Multilevel Model" output: rmdformats::robobook: html_document: default word_document: default editor_options: chunk_output_type: console --- # Overview Repeated measures data, often obtained from experience sampling or daily diary studies, require analytic methods that accommodate the inherent nesting in the data. One special case of repeated measures data are those from dyads (e.g., couples, parent/child). Dyadic data is structured such that repeated measures are nested within a person, and each person is nested within a dyad. The dyads are assumed to be independently sampled. Non-independece of the members of the dyad and across the repeated occasions are modeled explicitly. *Multivariate multilevel modeling* is one technique for effectively handling this type of data. In this tutorial, we will follow the example from Bolger and Laurenceau (2013) *Chapter 8: Design and Analysis of Intensive Longitudinal Studies of Distingiushable Dyads.* Specifically, we will use the simulated dyadic process data set (p. 150). The data were simulated to represent 100 dual-career heterosexual couples where each partner provided diary reports twice daily over the course of 21 consecutive days. The first report is an end-of-workday report that includes (a) the number of stressors that occurred at work, and (b) work dissatisfaction. The data are complete and clean - with Bolger and Laurenceau's goal being to "create a pedagologically uncomplicated data set." Note that we are using *distinguishable* dyads in this example. *Distinguishability* is typically determined conceptually, based upon a stable, differentiating characteristic (e.g., gender, age, role). Note that notions of "dyadic" can be generalized from persons to any reasonable combination of two variables (e.g., emotion and behavior). In addition, the accompanying "DyadicMLM_Tutorial_2022August20.rmd" file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R). # Outline In this tutorial, we'll cover... * Reading in the data and loading needed packages. * Introducing the research questions and model. * Plotting the data. * The multilevel model. + Males only. + Females only. + The dyad. * Cautions. * Conclusion. # Read in the data and load needed packages. **Let's read the data into R.** The data set ("BLdyads") we are working with contains simulated dyadic repeated measures data. The data set is stored as .csv file (comma-separated values file, which can be created by saving an Excel file as a csv document) on my computer's desktop. ```{r} # Set working directory (i.e., where your data file is stored) # This can be done by going to the top bar of RStudio and selecting # "Session" --> "Set Working Directory" --> "Choose Directory" --> # finding the location of your file setwd("~/Desktop") # Note: You can skip this line if you have #the data file and this .rmd file stored in the same directory # Read in the repeated measures data data <- read.csv(file = "BLdyads.csv", head = TRUE, sep = ",") # View the first 10 rows of the repeated measures data head(data, 10) ``` In the data, we can see each row contains data from a measurement occasion. In total, there should be *N* (number of individuals) x measurement occasions rows per dyad in the data set. In this case, we should have a data set with 4,200 rows (100 dyads x 2 persons x 21 occasions). Specifically, there is a column for: * Couple ID (`coupleid`) * Person ID (`personid`; e.g., 1 = partner 1, 2 = partner 2) * Day within the daily diary study (`time`) * Centered version of `time` (`time7c`; optional) * Gender (`gender`; or whatever feature is distinguishing the dyad; in this case, 0 = male and 1 = female) * Dummy coded gender variables (`male` and `female`) * Outcome variable (in this case, "reldis") * Predictor variable (in this case, "wrkstrs") * Centered version of the predictor variable ("wrkstrsc") * Trait component of the predictor variable ("wrkstrsb") * State component of the predictor variable ("wrkstrsw") The "trick" is to stack the data so that there are two records for each repeated observation. The data file is twice as long, and we structure the model to "turn on" and "turn off" the double records to invoke parameter estimation for each variable. Bolger and Laurenceau's example data are already prepared in this format. **Load the R packages we need.** Packages in R are a collection of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model. ```{r, warning = FALSE, message = FALSE} # install.packages("ggplot2") # Install package if you have never used it before library(ggplot2) # For plotting # install.packages("devtools") # Install package if you have never used it before require(devtools) # For version control # install.packages("nlme") # Install package if you have never used it before library(nlme) # For APIM # install.packages("psych") # Install package if you have never used it before library(psych) # For descriptive statistics ``` # Introduction to the Research Questions and Model. ### The Research Questions. We are going to address: * Is the number of daily work stressors associated with end-of-day relationship satisfaction? + What is the extent of association for the *typical male partner* and for the *typical female partner* (fixed effects)? + Is there heterogeneity in the strength of the association across male partners in couples and across female partners in couples (random effects)? Notice that, so far, the questions are stated separately for the two types of dyad members (males and females). The dyadic longitudinal model provides for another type of question - questions related to non-independence. * Are dyad members' relationship satisfaction on any given day related, after accounting for other explanatory variables? Here, these relations manifest as correlations/covariances between intercepts, slopes, and residuals. ### The Modeling Enterprise. The basic multilevel model is designed as a model with a univariate outcome. The ability to model multiple outcomes simultaneoulsy used to be a distinguishing feature of structural equation models (SEM). However, researches discovered that the multilevel model can be adapted for examination of multivariate outcomes quite easily. One simply has to "trick" the program into thinking that two (or more) variables are one variable. # Plotting the Data. Before we begin running our models, it is always a good idea to look at our data. We start with examining the distribution of our outcome variable end-of-day relationship dissatisfaction, `reldis`. Let's look at the histogram by gender. ```{r, warning = FALSE, message = FALSE} ggplot(# Select data set and variable to plot data = data, aes(x = reldis)) + # Create histogram of selected variable and # set color of histogram bar geom_histogram(fill = "white", color = "black") + # Label x-axis of histogram labs(x = "Relationship Dissatisfaction") + # Create separate plot for each gender facet_grid(. ~ gender) + # Plot aesthetics theme_classic() ``` The outcome variable for each gender looks approximately normally distributed, which is good news for when we run our models. Next, let's plot a few dyads' reports of relationship dissatisfaction through the course of the study. Since the the predictor variable has already been split into time-varying (state) and time-invariant (trait) components, we use the time-varying predictor `wrkstrscw` as the "background" context variable. ```{r} ggplot(# Select a subset of dyads # Set time ("time") on x-axis # Plot each dyad member ("personid") separately data = subset(data, coupleid <= 9), aes(x = time, group = personid), legend = FALSE) + # Create rectangles that indicate the time-varying # predictor, daily work stress # Set width of rectangles geom_rect(mapping = aes(xmin = time - .5, xmax = time + .5, # Set height of rectangles ymin = 0, ymax = 10, # Set color of rectangles to indicate work stress fill = wrkstrscw), alpha = 0.6) + # Plot points of variables of interest # Set time ("time") on x-axis, and # variable of interest ("reldis") on y-axis # Change shape, size, and color of point # Set different color for each gender geom_point(aes(x = time, y = reldis, color = factor(gender)), shape = 17, size = 3) + # Plot lines of variables of interest # Set time ("time") on x-axis, and # variable of interest ("reldis") on y-axis # Change style, size, and color of point # Set different color for each gender geom_line(aes(x = time, y = reldis, color = factor(gender)), lty = 1, size = 1) + # Label for x-axis xlab("Time") + # Label for y-axis ylab("Relationship Dissatisfaction") + # Set limits of y-axis ylim(0, 10) + # Set limits and tick marks of x-axis scale_x_continuous(breaks=seq(0, 20, by = 5)) + # Set title for legend of rectangles (workstress) scale_fill_continuous(name = "Daily Work Stress") + # Set title for legend of points/lines (gender) scale_color_manual(name = "Gender", values = c("0" = "blue", "1" = "red")) + # Plot aesthetics theme_classic() + # Create a plot for each dyad facet_wrap( ~ coupleid) ``` It looks like there is quite a lot of day-to-day variability! Finally, we'll examine a histogram of the dyad-level (between-dyad) time-invariant variable is `wrkstrscb` ```{r, warning = FALSE, message = FALSE} ggplot(# Select data set and variable to plot data = data, aes(x = wrkstrscb)) + # Create histogram of selected variable and # set color of histogram bar geom_histogram(fill = "white", color = "black") + # Label x-axis of histogram labs(x = "Work Stress (dyad-level centered)") + # Plot aesthetics theme_classic() ``` # The Multilevel Model. We are now ready to start running our models! First, we'll construct a model looking at the within-person and between-person associations of relationship dissatisfaction (`reldis`) with work stressors (`wrkstrs` - which is centered and separated into `wrkstrscw` and `wrkdstrscb`). This multilevel model set-up proceeds as usual; however, we will include a special "dummy" variable indicator as follows: $$reldis_{it} = \beta_{0i}dummy_{it} + \beta_{1i}dummy_{it}time7c_{it}+ \beta_{2i}dummy_{it}wrkstrscw_{it} + e_{it}$$ with two random effects, so that $$\beta_{0i} = \gamma_{00} + \gamma_{01}wrkstrscb_{i} + u_{0i}$$ $$\beta_{1i} = \gamma_{10}$$ $$\beta_{2i} = \gamma_{20} + u_{2i}$$ and residual structures of where $$e_{it} \sim \mathcal{N}(0,\mathbf{R})$$, and $$\mathbf{U_{i}} \sim \mathcal{N}(0,\mathbf{G})$$. where $$\mathbf{R} = \mathbf{I} \left[\begin{array} {r} \sigma^2_{e} \end{array}\right]$$ , which with the auto-regressive structure becomes $$\mathbf{R} = \sigma^2 \left[\begin{array} {rrrr} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{array}\right]$$ and the between-person residuals structure is $$\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u2} \\ \sigma_{u2u0} & \sigma^{2}_{u2} \end{array}\right]$$ Writing out the long equation (after algebraic substitution) we have $$reldis_{it} = \\ \gamma_{00}dummy_{it} + \gamma_{10}dummy_{it}time7c_{it}+ \gamma_{20}dummy_{it}wrkstrscw_{it} + \gamma_{01}dummy_{it}wrkstrscb_{i} + \\ \left[ u_{0i}dummy_{it} + u_{2i}dummy_{it}wrkstrscw_{it} + e_{it} \right]$$ which then is the equation that maps to the `lme()` or `lmer()` code. ## Males-only model First, we run the model for just males. Although not necessary, we use the $dummy_{it}$ variable `male`. *Note also the coding to remove the default intercept* `~ -1 ...` *and for direct specification of interaction terms using colon* `x:z` *This is important. We are tricking the program, so we have to be very specific about what we want.* ```{r} model0male <- lme(# The outcome variable (rel_dis) is regressed onto # no intercept (-1) since we separately estimate an intercept # for males with the dummy coded variables "male", and # the effect of time ("male:time7c"), # association between predictors and outcome as indicated by # the interaction terms of the variable name and dummy code fixed = reldis ~ -1 + male + male:time7c + male:wrkstrscw + male:wrkstrscb, # Allow for random intercepts and slopes # for each dyad (coupleid) # The -1 makes sure no intercept is estimated random = ~ -1 + male + male:wrkstrscw | coupleid, # Set correlation structure, in this case, # a lag-1 autoregressive residual structure correlation = corAR1(), # Select data set data = data[which(data$gender == 0),], # subset to just male # Set optimizer control = list(maxIter = 1000)) # Examine the model summary summary(model0male) ``` We see that the average relationship dissatisfaction for males was 5.08. There were significant "state" effects, such that on days when a man had one more work stressor than usual, his relationship dissatisfaction increased by 0.11 points. There were no significant trends with time or significant "trait" level effects. ## Females-only model Second, we run the model for just females, with the $dummy_{it}$ variable `female`. ```{r} model0female <- lme(# The outcome variable (rel_dis) is regressed onto # no intercept (-1) since we separately estimate an intercept # for females with the dummy coded variables "female", and # the effect of time ("male:time7c"), # association between predictors and outcome as indicated by # the interaction terms of the variable name and dummy code fixed = reldis ~ -1 + female + female:time7c + female:wrkstrscw + female:wrkstrscb, # Allow for random intercepts and slopes # for each dyad (coupleid) # The -1 makes sure no intercept is estimated random = ~ -1 + female + female:wrkstrscw | coupleid, # Set correlation structure, in this case, # a lag-1 autoregressive residual structure correlation=corAR1(), # Select data set data = data[which(data$gender == 1),], # subset to just female # Set optimizer control=list(maxIter=1000)) # Examine the model summary summary(model0female) ``` We see that the average relationship dissatisfaction for females was 4.65. There were significant "state" effects, such that on days when a woman had one more work stressor than usual, her relationship dissatisfaction increased by 0.16 points. There were no significant trends with time or significant "trait" level effects. ## Full model Third, we combine the two models together examining the dyad. As above, we use the two dummy variables to turn on and off the parameters. The parameters invoked with $dummy1$ are associated with one member of the dyad, and parameters invoked with $dummy2$ are associated with the other member of the dyad. $$reldis_{it} = \\ \gamma_{00}dummy1_{it} + \gamma_{10}dummy1_{it}time7c_{it}+ \gamma_{20}dummy1_{it}wrkstrscw_{it} + \gamma_{01}dummy1_{it}wrkstrscb_{i} + \left[ u_{0i}dummy1_{it} + u_{2i}dummy1_{it}wrkstrscw_{it} + e1_{it} \right] + \\ \gamma_{30}dummy2_{it} + \gamma_{40}dummy2_{it}time7c_{it}+ \gamma_{50}dummy2_{it}wrkstrscw_{it} + \gamma_{31}dummy2_{it}wrkstrscb_{i} + \left[ u_{3i}dummy2_{it} + u_{5i}dummy2_{it}wrkstrscw_{it} + e2_{it} \right]$$ Noting that our random effect matrices also expand, $$\mathbf{R} = \left[\begin{array} {rr} \sigma^2_{e1} & \sigma_{e1e2} \\ \sigma_{e1e2} & \sigma^2_{e2} \end{array}\right]$$ where $\sigma_{e1e2}$ is the residual covariance between male and female relationship dissatisfaction. and $$\left[\begin{array} {rrrr} \sigma^{2}_{u0} & \sigma_{u0u2} & \sigma_{u0u3} & \sigma_{u0u5} \\ \sigma_{u2u0} & \sigma^{2}_{u2} & \sigma_{u2u3} & \sigma_{u2u5} \\ \sigma_{u3u0} & \sigma_{u3u2} & \sigma^{2}_{u3} & \sigma_{u3u5} \\ \sigma_{u5u0} & \sigma_{u5u2} & \sigma_{u5u3} & \sigma^{2}_{u5} \end{array}\right]$$ where the matrix is blocks of between-dyad associations, some among males only, some among females only, and some across genders. *These are sample-level, between-dyad relations, and should be interpreted appropriately.* ```{r, cache=TRUE} model1 <- lme(# The outcome variable (rel_dis) is regressed onto # no intercept (-1) since we separately estimate an intercept # for males and females with the dummy coded variables, and # the effect of time ("male:time7c", "female:time7c"), # association between predictors and outcome as indicated by # the interaction terms of the variable name and dummy code fixed = reldis ~ -1 + male + male:time7c + male:wrkstrscw + male:wrkstrscb + female + female:time7c + female:wrkstrscw + female:wrkstrscb, # Allow for random intercepts and slopes # for each dyad (coupleid) # The -1 makes sure no intercept is estimated random = ~ -1 + male + male:wrkstrscw + female + female:wrkstrscw | coupleid, # Set the weights of the variances, # allowing for differences between genders weights = varIdent(form = ~1 | gender), # Set correlation structure, in this case, # compound symmetry over time (coupleid/time) # within each dyad corr = corCompSymm(form = ~1 | coupleid/time), # Select data set data = data, # Set optimizer control=list(maxIter=1000)) # Examine the model summary summary(model1) ``` *Note that the residual structure is not quite the same as in Bolger & Laurenceau (2013).* It is not clear exactly how to get `lme()` structure to match. There is not a straightforward way, but it seems like it is possible. We just haven't found the exact combination of structures to use yet. But, here is another close variant. ```{r, cache=TRUE} model2 <- lme(# The outcome variable (rel_dis) is regressed onto # no intercept (-1) since we separately estimate an intercept # for males and females with the dummy coded variables, and # the effect of time ("male:time7c", "female:time7c"), # association between predictors and outcome as indicated by # the interaction terms of the variable name and dummy code fixed = reldis ~ -1 + male + male:time7c + male:wrkstrscw + male:wrkstrscb + female + female:time7c + female:wrkstrscw + female:wrkstrscb, # Allow for random intercepts and slopes # for each dyad (coupleid) # The -1 makes sure no intercept is estimated random = ~ -1 + male + male:wrkstrscw + female + female:wrkstrscw | coupleid, # Set the weights of the variances, # allowing for differences between genders weights = varIdent(form = ~1 | gender), # Set correlation structure, in this case, # an autoregressive structure # within each dyad and gender (coupleid/gender/time) corr = corAR1(form = ~1 | coupleid/gender/time), # this invokes an AR structure # Select data set data = data, # Set optimizer control=list(maxIter=1000)) # Examine the model summary summary(model2) ``` Although the structure is not exactly matched to the example in the book, the ones we are fitting are among the structures used in the literature. Across variants, the interpretations do not change in meaningful ways, suggesting that looking across the range of possibilites is fine. ### Brief Interpretation. The results for the dyadic model are similar to when the models for males and females were run separately; however, this will not always be the case. **Fixed effects:** On an average day, males and females relationship dissatisfaction is 5.08 and 4.65, respectively, on a 0 to 10 scale. On days when there is an additional work stressor than usual, relationship dissatisfaction increases by 0.11 and 0.16 for males and females, respectively. **Random effects:** The deviation around males and females average relationship dissatisfaction is 1.02 and 0.96, respectively. Additionally, males and females relationship dissatisfaction reports are correlated 0.27, indicating that members of the same dyad often have relatively similar reports of relationship dissatisfaction. The deviation around males and females reactivity to work stressors (i.e., slope of work stressors) is 0.17 and 0.13, respectively. The way in which dyad members' relationship dissatisfaction changes in response to work stressors is quite similar, as indicated by the 0.51 correlation. A more thorough interpretation and formal write-up of the results can be found on pages 165 - 171 of Bolger and Laurenceau (2013) *Chapter 8: Design and Analysis of Intensive Longitudinal Studies of Distingiushable Dyads.* # Cautions. While we only provide a brief description of the multivariate multilevel model for repeated measures dyadic data, we want to a highlight a few consierations when using this model. 1. The model used here is for *distinguishable* dyads. The persons within each dyad are separated (and pooled) according to an a priori defined feature (e.g., gender, role). 2. The model used here examined the *linear* association between work stressors and relationship dissatisfaction. Different formualtions are needed to test for non-linear relations. 3. Model inferences should be done carefully, keeping in mind that that level 2 covariances are between-individual covariances. # Conclusion. This tutorial is meant to accompany the example provided in *Chapter 8: Design and Analysis of Intensive Longitudinal Studies of Distingiushable Dyads* of Bolger and Laurenceau (2013). We provided brief explanation of (1) the underlying model, (2) the code to run these models in R, and (3) the interpretation of the results. More detailed information about these (and related analyses) can be found in Bolger and Laurenceau (2013), Laurenceau and Bolger (2005), and Kenny, Kashy, and Cook (2006). ----- ### Additional Information We created this tutorial with a system environment and versions of R and packages that might be different from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output and the tutorial posted on the LHAMA website may be helpful. ```{r} session_info(pkgs = c("attached")) ```