---
title: "Actor-Partner Interdependence Model (APIM)"
output:
  rmdformats::robobook: default
  html_document: default
  word_document: default
editor_options:
  chunk_output_type: console
---

# Overview

This tutorial reviews the Actor-Partner Interdependence Model (APIM; Kashy & Kenny, 2000; Kenny, Kashy, & Cook, 2006), which is often used to examine the association (1) between two constructs for two people using cross-sectional data, or (2) between the same construct from two people across two time points.

In this tutorial, we are going to examine the association between verbal and performance ability using measures from first grade and sixth grade. We are interested in simultaneously examining whether (1) verbal ability in the first grade is predictive of verbal ability in the sixth grade, (2) performance ability in the first grade is predictive of performance ability in the sixth grade, (3) verbal ability in the first grade is predictive of performance ability in the sixth grade, and (4) performance ability in the first grade is predictive of verbal ability in the sixth grade. When working with people, points 1 and 2 above are often referred to as "actor effects" and points 3 and 4 are often referred to as "partner effects." While this example is not a "traditional" dyad - i.e., two distinguishable people - the analytic processes demonstrated here are applicable to the examination of any bivariate relationship.

In addition, the accompanying "APIM_Tutorial_2022August20.rmd" file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more user-friendly interface to R).

# Outline

In this tutorial, we'll cover...

* Reading in the data and loading needed packages.
* Descriptive statistics for dyadic data.
* Dyadic data preparation.
* APIM model using the `nlme` package.
* Other resources.

# Read in the data and load needed packages.
**Let's read the data into R.**

The data set ("wisc3raw_gender") we are working with contains repeated measures of different assessments from children during grades 1, 2, 4, and 6. The data set is stored as a .csv file (comma-separated values file, which can be created by saving an Excel file as a csv document) on my computer's desktop.

```{r}
# Set working directory (i.e., where your data file is stored)
# This can be done by going to the top bar of RStudio and selecting
# "Session" --> "Set Working Directory" --> "Choose Directory" -->
# finding the location of your file
setwd("~/Desktop") # Note: You can skip this line if you have
# the data file and this .rmd file stored in the same directory

# Read in the repeated measures data
data <- read.csv(file = "wisc3raw_gender.csv", header = TRUE, sep = ",")

# View the first 10 rows of the repeated measures data
head(data, 10)
```

Subset the data to variables of interest.

```{r}
# Subset to variables of interest
data <- data[, c("id", "verb1", "verb6", "perfo1", "perfo6")]

# View the first 10 rows of the data
head(data, 10)
```

In the data, we can see each row contains information for one child, and the multiple time points are contained in the columns. In this data set, there are columns for:

* Child ID (`id`)
* Child's verbal score during first grade (`verb1`)
* Child's verbal score during sixth grade (`verb6`)
* Child's performance score during first grade (`perfo1`)
* Child's performance score during sixth grade (`perfo6`)

**Load the R packages we need.**

Packages in R are a collection of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model.
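If you are not sure which of these packages are already installed on your machine, the small base-R sketch below (our addition, not part of the original tutorial code) installs only the ones that are missing before they are loaded:

```{r}
# Sketch: install any of the tutorial's packages that are not yet installed.
pkgs <- c("ggplot2", "devtools", "nlme", "psych", "reshape")
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Running this chunk once is enough; after that, the `library()` calls below will work without the commented-out `install.packages()` lines.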
```{r, warning = FALSE, message = FALSE}
# install.packages("ggplot2") # Install package if you have never used it before
library(ggplot2) # For plotting

# install.packages("devtools") # Install package if you have never used it before
require(devtools) # For session_info() in the Additional Information section

# install.packages("nlme") # Install package if you have never used it before
library(nlme) # For APIM

# install.packages("psych") # Install package if you have never used it before
library(psych) # For descriptive statistics

# install.packages("reshape") # Install package if you have never used it before
library(reshape) # For reshaping the data (wide to long)
```

Before diving into the data, we will make a long version (i.e., each repeated measure has its own row) of the data set for later use.

```{r}
data_long <- reshape(# Select data set
                     data = data,
                     # Identify repeated measures variables
                     varying = c("verb1", "verb6", "perfo1", "perfo6"),
                     # Create new variable that represents time
                     timevar = c("grade"),
                     # Identify child ID variable
                     idvar = c("id"),
                     # Note direction of data reformat
                     direction = "long",
                     # No spaces in new column names
                     sep = "")

# For easy viewing - reorder by id and grade
data_long <- data_long[order(data_long$id, data_long$grade), ]

# View the first 10 rows of the repeated measures data
head(data_long, 10)
```

Note how each time point (i.e., grades 1 and 6) now has its own row for each child.

# Descriptive Statistics for Dyadic Data.

Before we run our models, it is useful to become familiar with the data via plotting and descriptive statistics. Let's begin with descriptive statistics of our four variables of interest: first grade verbal and performance ability, and sixth grade verbal and performance ability.

```{r}
describe(data$verb1)
describe(data$verb6)
describe(data$perfo1)
describe(data$perfo6)
```

We can see that both the mean and standard deviation of verbal and performance ability increase from first to sixth grade.
While this is worth noting, the APIM will not be examining mean-level change in verbal and performance ability.

Next, we'll plot the distributions of each of these variables as well.

```{r, message = FALSE}
ggplot(# Select data set and variable to plot
       data = data, aes(x = verb1)) +
  # Create histogram of selected variable and
  # set color of histogram bar
  geom_histogram(fill = "white", color = "black") +
  # Label x-axis of histogram
  labs(x = "Verbal Ability Grade 1") +
  # Plot aesthetics
  theme_classic()

ggplot(# Select data set and variable to plot
       data = data, aes(x = verb6)) +
  # Create histogram of selected variable and
  # set color of histogram bar
  geom_histogram(fill = "white", color = "black") +
  # Label x-axis of histogram
  labs(x = "Verbal Ability Grade 6") +
  # Plot aesthetics
  theme_classic()

ggplot(# Select data set and variable to plot
       data = data, aes(x = perfo1)) +
  # Create histogram of selected variable and
  # set color of histogram bar
  geom_histogram(fill = "white", color = "black") +
  # Label x-axis of histogram
  labs(x = "Performance Ability Grade 1") +
  # Plot aesthetics
  theme_classic()

ggplot(# Select data set and variable to plot
       data = data, aes(x = perfo6)) +
  # Create histogram of selected variable and
  # set color of histogram bar
  geom_histogram(fill = "white", color = "black") +
  # Label x-axis of histogram
  labs(x = "Performance Ability Grade 6") +
  # Plot aesthetics
  theme_classic()
```

Next, let's examine the association (i.e., rank-order stability) among each variable of interest using correlations and a plot.

```{r}
# Correlations
cor(data[, 2:5])

# Plot
pairs.panels(data[, c("verb1", "verb6", "perfo1", "perfo6")])
```

We can see there are strong, positive associations both across time and constructs.

# Dyadic Data Preparation.

We have already manipulated the data from "wide" to "long." Dyadic/bivariate analyses require further manipulation in order to get the data in the correct format for our analyses.
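To make the target "stacked" (double entry) format concrete, here is a compact base-R sketch of the same restructuring (the helper name `stack_outcomes` is ours and purely illustrative): for each child, one row carries the grade 6 verbal outcome and one row carries the grade 6 performance outcome.

```{r}
# Sketch (our addition): build the stacked format without the reshape package.
stack_outcomes <- function(d) {
  stacked <- rbind(
    # One row per child for the grade 6 verbal outcome
    data.frame(d[, c("id", "verb1", "perfo1")],
               grade6_variable = "verb6",
               grade6_outcome = d$verb6),
    # One row per child for the grade 6 performance outcome
    data.frame(d[, c("id", "verb1", "perfo1")],
               grade6_variable = "perfo6",
               grade6_outcome = d$perfo6)
  )
  # Order by child; within child, the verbal row precedes the performance row
  stacked[order(stacked$id), ]
}
```

Calling `stack_outcomes(data)` should return a data frame with twice as many rows as `data`, mirroring the structure created with `reshape::melt` below.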
We will walk through the data prep in two steps.

First, we need to create one column that has the information for both outcome variables - i.e., for each person, the verb6 and perfo6 values will alternate. This is almost like repeated measures data, but instead of having multiple time points nested within person, we have multiple (two) variables nested within person.

```{r}
data_melt <- reshape::melt(# Select data set
                           data = data,
                           # Identify columns that we want to remain the same,
                           # that is, the columns that we don't want "long"
                           id.vars = c("id", "verb1", "perfo1"),
                           # Do not remove missing data
                           na.rm = FALSE)

# View the first 10 rows of the data
head(data_melt, 10)
```

A little more data management on our newly created data set.

```{r}
# Rename "variable" and "value" variables to "grade6_variable" and "grade6_outcome"
colnames(data_melt)[4:5] <- c("grade6_variable", "grade6_outcome")

# Re-order for convenience
data_melt <- data_melt[order(data_melt$id, data_melt$grade6_variable), ]

# View the first 10 rows of the data
head(data_melt, 10)
```

Second, we need to create two dummy variables (each 0/1) that will be useful in our analyses to "turn on/off" a row (more on this later). We will create one column that assigns the first row of the double entry data to 1, and we'll call this "verb_on." We will create another column that assigns the second row of the double entry data to 1, and we'll call this "perform_on."
```{r}
# Create new variable ("verb_on") that repeats the sequence 1 0
# half the length of the data set (since 2 * half of the rows = all rows)
data_melt$verb_on <- rep(c(1, 0), times = (nrow(data_melt)/2))

# Create new variable ("perform_on") that repeats the sequence 0 1
# half the length of the data set (since 2 * half of the rows = all rows)
data_melt$perform_on <- rep(c(0, 1), times = (nrow(data_melt)/2))

# View the first 10 rows of the data
head(data_melt, 10)
```

Please note that this data preparation is probably *not* the most elegant way to organize the data. There are alternative ways one could prepare the data (https://github.com/RandiLGarcia/2day-dyad-workshop/blob/master/Day%201/R%20Code/Day%201-Data%20Restructuring.Rmd), but the choice will depend on how you choose to run your analysis (described further later).

# APIM using `nlme` package.

Now that we know a bit more about the data we are working with and have the data prepared in a usable format, we can set up our APIM model. We'll run this model using the `nlme` package. Specifically, we'll examine whether:

* verbal ability in the first grade is predictive of verbal ability in the sixth grade (verbal "actor" effect),
* performance ability in the first grade is predictive of performance ability in the sixth grade (performance "actor" effect),
* verbal ability in the first grade is predictive of performance ability in the sixth grade (verbal "partner" effect), and
* performance ability in the first grade is predictive of verbal ability in the sixth grade (performance "partner" effect).

Before running this full model, we will examine the empty model to determine how much variability there is within- and between-persons. Specifically,

$$Grade6Outcome_{it} = \beta_{0V}VerbOn_{it} + \beta_{0P}PerformOn_{it} + e_{Vi} + e_{Pi}$$

Empty model.
```{r}
apim_empty <- gls(# The outcome variable (grade6_outcome) is regressed onto
                  # no intercept (-1) since we separately estimate intercepts
                  # for the two variables with dummy coded variables, specifically
                  # verb_on and perform_on
                  grade6_outcome ~ -1 + verb_on + perform_on,
                  # Select data set
                  data = data_melt,
                  # Set correlation structure, in this case,
                  # compound symmetry within each individual
                  correlation = corCompSymm(form = ~1|id),
                  # Set the weights of the variances,
                  # allowing for differences between
                  # variables' error terms
                  weights = varIdent(form = ~1|verb_on),
                  # Exclude rows with missing data
                  na.action = na.exclude)

# Examine the model summary
summary(apim_empty)
```

We examine the correlation of the verbal and performance error terms to determine the degree of non-independence in the data. We can see that Rho = 0.61, indicating a positive correlation across constructs, such that children who have higher verbal ability also tend to have higher performance ability.

Other things to note in this output...

* The results for "verb_on" indicate the average or expected verbal score at grade 6 is 43.75.
* The results for "perform_on" indicate the average or expected performance score at grade 6 is 50.93.
* Both of these expected values correspond to their respective averages in the raw data.
* The verbal and performance scores each have their own estimated error values. The residual standard error for verbal scores is 10.67 and the residual standard error for performance scores is 12.48 (1.17*10.67).

Next, we are going to run our full APIM model using the two-intercept approach.
Specifically,

$$\begin{aligned} Grade6Outcome_{it} = &\beta_{0V}VerbOn_{it} + \beta_{1V}VerbOn_{it}*Verb1_{it} + \beta_{2V}VerbOn_{it}*Perform1_{it} \\ &+ \beta_{0P}PerformOn_{it} + \beta_{1P}PerformOn_{it}*Perform1_{it} \\ &+ \beta_{2P}PerformOn_{it}*Verb1_{it} + e_{Vi} + e_{Pi} \end{aligned}$$

So when "verb_on" is equal to 0, only the performance terms (and the performance error) remain:

$$\begin{aligned} Grade6Outcome_{it} = &\beta_{0P}PerformOn_{it} + \beta_{1P}PerformOn_{it}*Perform1_{it} \\ &+ \beta_{2P}PerformOn_{it}*Verb1_{it} + e_{Pi} \end{aligned}$$

and when "perform_on" is equal to 0, only the verbal terms (and the verbal error) remain:

$$\begin{aligned} Grade6Outcome_{it} = &\beta_{0V}VerbOn_{it} + \beta_{1V}VerbOn_{it}*Verb1_{it} + \beta_{2V}VerbOn_{it}*Perform1_{it} \\ &+ e_{Vi} \end{aligned}$$

Full model.

```{r}
apim_full <- gls(# The outcome variable (grade6_outcome) is regressed onto
                 # no intercept (-1) since we separately estimate intercepts
                 # for the two variables with dummy coded variables, specifically
                 # verb_on and perform_on and
                 # actor and partner effects as indicated by
                 # the interaction terms of the variable name and dummy code
                 grade6_outcome ~ -1 + verb_on + perform_on +
                   verb1:verb_on +     # verbal "actor" effect
                   perfo1:perform_on + # performance "actor" effect
                   verb1:perform_on +  # verbal "partner" effect
                   perfo1:verb_on,     # performance "partner" effect
                 # Select data set
                 data = data_melt,
                 # Set correlation structure, in this case,
                 # compound symmetry within each individual
                 correlation = corCompSymm(form = ~1|id),
                 # Set the weights of the variances,
                 # allowing for differences between
                 # variables' error terms
                 weights = varIdent(form = ~1|verb_on),
                 # Exclude rows with missing data
                 na.action = na.exclude)

# Examine the model summary
summary(apim_full)
```

Let's interpret the results!

* The expected verbal score at grade 6 is 19.87 and the expected performance score at grade 6 is 30.05 when verbal and performance scores at grade 1 are equal to zero.
* Actor effects:
    + The "actor effect" of verbal ability is 0.81, indicating that a child's grade 6 verbal ability increases by 0.81 points for every additional point in their grade 1 verbal ability score.
    + The "actor effect" of performance ability is 0.96, indicating that a child's grade 6 performance ability increases by 0.96 points for every additional point in their grade 1 performance ability score.
* Partner effects:
    + The "partner effect" of performance ability on verbal ability is 0.45, indicating that a child's grade 6 verbal ability increases by 0.45 points for every additional point in their grade 1 performance ability score.
    + The "partner effect" of verbal ability on performance ability is not significant, indicating that a child's verbal ability at grade 1 is not associated with their performance ability at grade 6.
* Other things to note:
    + Rho = 0.31, indicating the correlation across constructs, such that children who have higher verbal ability have higher performance ability.
    + The verbal and performance scores each have their own estimated error values. The residual standard error for verbal scores is 7.55 and the residual standard error for performance scores is 8.98 (1.19*7.55).

# Other Resources.

We've walked through one way of running an APIM model in R - however, there are alternative ways of doing so. Here are a few resources if you'd like to learn more about running an APIM model or dyadic data analyses in general.

* David Kenny's website, where he and his colleagues have created some useful shiny apps for running dyadic analyses in R: http://davidakenny.net/DyadR/DyadRweb.htm
* Randi Garcia's github page: https://github.com/RandiLGarcia

-----

### Additional Information

We created this tutorial with a system environment and versions of R and packages that might be different from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output with the tutorial posted on the LHAMA website may be helpful.
```{r}
session_info(pkgs = c("attached"))
```