---
title: "Dyadic Multinomial Logistic Growth Tutorial"
author: Miriam Brinberg
output:
  rmdformats::robobook:
  html_document: default
  word_document: default
editor_options:
  chunk_output_type: console
---

# Overview

This tutorial provides R code to conduct a simplified dyadic multinomial logistic growth curve model presented in the paper "Examining Individual Differences in How Conversation Behaviors Change Over Time: A Dyadic Multinomial Logistic Growth Modeling Approach."

Several theoretical perspectives suggest that dyadic experiences are distinguished by patterns of behavioral change that emerge during interactions. Methods for examining change in behavior over time are well elaborated for the study of change along continuous dimensions. Extensions for charting increases and decreases in individuals' use of specific, categorically defined behaviors, however, are rarely invoked. Greater accessibility of Bayesian frameworks that facilitate formulation and estimation of the requisite models is opening new possibilities.

This tutorial provides code for implementing dyadic multinomial logistic growth models to examine between-dyad differences in within-dyad behavioral change over the course of an interaction. We use turn-by-turn data from a subset of the data analyzed in the paper: 59 conversations between strangers during which one dyad member (the discloser) talked about a current problem with the other dyad member (the listener). Each speaking turn in these conversations was coded as being one of six types: acknowledgement, advice, elaboration, hedged disclosure, question, or reflection. We are specifically interested in how these six types of listeners' and disclosers' behaviors change as support conversations unfold.

In addition, the accompanying "MultinomialLogisticGrowthCurve_Tutorial March 2023.rmd" file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R).
# Outline

In this tutorial, we'll cover...

* Reading in the data and loading needed packages.
* Plotting two dyads' conversations.
* Plotting all the conversation sequences in one figure.
* Describing the data.
* Preparing the data for the dyadic multinomial logistic growth model.
    + Establishing a common time metric.
    + Reformatting to a bivariate data frame.
* Fitting the dyadic multinomial logistic growth model.
* Plotting the model-predicted growth curves.
* Conclusion.

# Read in the data and load needed packages.

**Let's read the data into R.**

The data set we are working with is called "StrangerConversations_N59" and is stored as a .csv file (comma-separated values file, which can be created by saving an Excel file as a .csv document) on my computer's desktop.

```{r}
# Set working directory (i.e., where your data file is stored)
# This can be done by going to the top bar of RStudio and selecting
# "Session" --> "Set Working Directory" --> "Choose Directory" -->
# finding the location of your file
setwd("~/Desktop") # Note: You can skip this line if you have
# the data file and this .rmd file stored in the same directory

# Read in the data
data <- read.csv(file = "StrangerConversations_N59.csv", header = TRUE, sep = ",")

# View the first 10 rows of the data
head(data, 10)
```

In the data, we can see that each row contains information for one speaking turn and there are multiple rows (i.e., speaking turns) for each dyad. Specifically, there is a column for:

* Dyad ID (`id`)
* Time variable - in this case, turn in the conversation (`turn`)
* Dyad member ID - in this case, role in the conversation (`role`; discloser = 1, listener = 2)
* Turn type - in this case, based upon a previously derived typology (`turn_type`)

**Load the R packages we need.**

Packages in R are collections of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model.
```{r, message = FALSE, warning = FALSE}
# install.packages("bayestestR") # Install package if you have never used it before
library(bayestestR) # For Bayesian probability of direction

# install.packages("brms") # Install package if you have never used it before
library(brms) # For Bayesian model

# install.packages("devtools") # Install package if you have never used it before
library(devtools) # For version control

# install.packages("dplyr") # Install package if you have never used it before
library(dplyr) # For data management

# install.packages("ggplot2") # Install package if you have never used it before
library(ggplot2) # For plotting

# install.packages("psych") # Install package if you have never used it before
library(psych) # For descriptive statistics

# install.packages("tidybayes") # Install package if you have never used it before
library(tidybayes) # For Bayesian analyses
```

# Plot Two Dyads' Conversations.

To get a better feel for the conversation data, let's plot two dyads' conversations.

**Setting the Color Palette**

Before creating the plots, it is helpful to set the colors for each turn type so that the colors of the conversation behavior categories are consistent across plots (i.e., the number of conversation behaviors present in a given conversation does not affect the color of the turn types). We do this by creating a vector "cols" that contains color assignments (via hex code: https://www.color-hex.com/) for each utterance type.

```{r}
cols <- c("Acknowledgement" = "#619CFF",
          "Advice" = "#FFE700",
          "Elaboration" = "#F8766D",
          "HedgedDisclosure" = "#FFA500",
          "Question" = "#00BA38",
          "Reflection" = "#DB72FB")
```

Note: To make your plots accessible, you may consider adopting a colorblind-friendly palette. David Nichols' website (https://davidmathlogic.com/colorblind/) provides a great explanation of this issue, as well as a color picking tool.
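Because `cols` is a *named* vector, ggplot2's `scale_fill_manual()` matches colors by category label rather than by position. A minimal sketch of why this keeps colors stable even when a conversation contains only a subset of the behavior types (the `present` vector here is a hypothetical example):

```r
# Named color vector: colors are tied to category labels
cols <- c("Acknowledgement" = "#619CFF", "Advice" = "#FFE700",
          "Elaboration" = "#F8766D", "HedgedDisclosure" = "#FFA500",
          "Question" = "#00BA38", "Reflection" = "#DB72FB")

# Suppose one conversation contains only three behavior types;
# indexing by name still returns the same colors for those types
present <- c("Question", "Elaboration", "Acknowledgement")
cols[present]
#> "#00BA38" "#F8766D" "#619CFF"
```

A positional (unnamed) palette, by contrast, would shift colors whenever a category was absent from a particular dyad's conversation.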
We create the dyadic categorical time series plot for each exemplar dyad and save these plots to the objects "dyad105_plot" and "dyad123_plot".

Dyad 105 plot.

```{r, warning = FALSE}
# First partition data of interest
dyad105 <- data[data$id == 105, ]

dyad105_plot <-
  # Choose the data, set time variable (turn) for the x-axis
  ggplot(dyad105, aes(x = turn)) +
  # Create title for plot by combining "Dyad = " with the dyad id variable (id)
  ggtitle(paste("Dyad =", unique(dyad105$id))) +
  # Create bars for the category of the listeners' turns
  # Partition data for listeners (role = 2)
  geom_rect(data = dyad105[dyad105$role == 2, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn - .5, xmax = turn + .5,
                          # Set the height of each bar to range from 0 to 5
                          ymin = 0, ymax = 5,
                          # Set the color of each bar to correspond to each conversation behavior
                          fill = turn_type)) +
  # Add a horizontal line to separate bars
  geom_hline(yintercept = 5, color = "black") +
  # Create bars for the category of the disclosers' turns
  # Partition data for disclosers (role = 1)
  geom_rect(data = dyad105[dyad105$role == 1, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn - .5, xmax = turn + .5,
                          # Set the height of each bar to range from 5 to 10
                          ymin = 5, ymax = 10,
                          # Set the color of each bar to correspond to each conversation behavior
                          fill = turn_type)) +
  # Set color of conversation behaviors to vector we created earlier ("cols")
  scale_fill_manual(values = cols) +
  # Label for x-axis
  xlab("Turn") +
  # Label for y-axis
  ylab("Role") +
  # X-axis ticks and labels
  scale_x_continuous(breaks = seq(0, 110, by = 10)) +
  # Y-axis ticks and labels
  scale_y_continuous(breaks = c(2.5, 7.5),
                     labels = c("Listener \n Conversation Behavior",
                                "Discloser \n Conversation Behavior")) +
  # Legend label
  labs(fill = "Conversation Behavior") +
  # Additional plot aesthetics
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"))
```

Dyad 123 plot.

```{r, warning = FALSE}
# First partition data of interest
dyad123 <- data[data$id == 123, ]

dyad123_plot <-
  # Choose the data, set time variable (turn) for the x-axis
  ggplot(dyad123, aes(x = turn)) +
  # Create title for plot by combining "Dyad = " with the dyad id variable (id)
  ggtitle(paste("Dyad =", unique(dyad123$id))) +
  # Create bars for the category of the listeners' turns
  # Partition data for listeners (role = 2)
  geom_rect(data = dyad123[dyad123$role == 2, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn - .5, xmax = turn + .5,
                          # Set the height of each bar to range from 0 to 5
                          ymin = 0, ymax = 5,
                          # Set the color of each bar to correspond to each conversation behavior
                          fill = turn_type)) +
  # Add a horizontal line to separate bars
  geom_hline(yintercept = 5, color = "black") +
  # Create bars for the category of the disclosers' turns
  # Partition data for disclosers (role = 1)
  geom_rect(data = dyad123[dyad123$role == 1, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn - .5, xmax = turn + .5,
                          # Set the height of each bar to range from 5 to 10
                          ymin = 5, ymax = 10,
                          # Set the color of each bar to correspond to each conversation behavior
                          fill = turn_type)) +
  # Set color of conversation behaviors to vector we created earlier ("cols")
  scale_fill_manual(values = cols) +
  # Label for x-axis
  xlab("Turn") +
  # Label for y-axis
  ylab("Role") +
  # X-axis ticks and labels
  scale_x_continuous(breaks = seq(0, 110, by = 10)) +
  # Y-axis ticks and labels
  scale_y_continuous(breaks = c(2.5, 7.5),
                     labels = c("Listener \n Conversation Behavior",
                                "Discloser \n Conversation Behavior")) +
  # Legend label
  labs(fill = "Conversation Behavior") +
  # Additional plot aesthetics
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"))
```

Print the plots we just created.

```{r}
print(dyad105_plot)
print(dyad123_plot)
```

On the x-axis, we have turn in the conversation. On the y-axis, we have the conversation behaviors for the disclosers on the top half and the listeners on the bottom half. Each conversation behavior category is represented by a different color, and the gray bars indicate when a particular dyad member is not speaking. We can see that Dyad 105 had greater back-and-forth exchange during their conversation, as indicated by the greater number of turns. In both dyads, we can see that the disclosers spent many of their turns elaborating on their problem (red) and the listeners used a variety of different turn types.

# Plot All Conversation Sequences in One Figure.

Next, we create a single figure that depicts all dyads' conversation behaviors over time. Each tiny row in the figure represents one conversation.

First, we create a variable that assigns unique consecutive integers to each dyad, which will help with plotting.

```{r}
data <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad (id)
  dplyr::group_by(id) %>%
  # Create new variable (plot_id) that relabels the id variable with consecutive numbers
  dplyr::mutate(plot_id = cur_group_id()) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the first 10 rows of the data
head(data, 10)
```

Plot all of the conversation sequences.
```{r}
all_convos <-
  # Select data
  ggplot(data = data) +
  geom_rect(aes(# Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
                xmin = turn - 0.5, xmax = turn + 0.5,
                # Set the height of each bar as -0.4 and +0.4 the value of the new id variable (plot_id)
                ymin = plot_id - .4, ymax = plot_id + .4,
                # Set the color of each bar to correspond to each conversation behavior
                fill = turn_type)) +
  # Set color of conversation behaviors to vector we created earlier ("cols")
  scale_fill_manual(values = cols) +
  # Label for x-axis
  xlab("Turn") +
  # Label for y-axis
  ylab("Dyad") +
  # X-axis ticks and labels
  scale_x_continuous(breaks = seq(0, 150, by = 10)) +
  # Y-axis ticks and labels
  scale_y_continuous(breaks = c(0, 120)) +
  # Legend label
  labs(fill = "Conversation Behavior") +
  # Additional plot aesthetics
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"),
        axis.text.y = element_blank())

# View plot
all_convos
```

In this plot, we can see that the use and timing of conversation behaviors varied between dyads, and that the "length" of conversations, as measured by the number of speaking turns, was quite different between dyads.

# Describe the Data.

The goal of this step is to describe our sample, specifically, (1) how many dyads are in the data set, (2) how many conversation turns there are for each dyad, (3) the frequency of each conversation behavior across all dyads, and (4) the frequency of each conversation behavior for each role across all dyads.

1. Number of dyads.

```{r, warning = FALSE}
# Number of dyads in the repeated measures data
# Length (i.e., number) of unique ID values
length(unique(data$id))
```

There are 59 dyads in the data set.

2. Number of conversation turns for each dyad.
```{r, message = FALSE}
num_occ <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad ID (id)
  group_by(id) %>%
  # Count the number of turns in each conversation
  dplyr::summarise(count = n()) %>%
  # Save the data as a data.frame
  as.data.frame()

# Calculate descriptives on the number of turns per conversation
describe(num_occ$count)
```

The dyads in this subset of the data had supportive conversations that had, on average, approximately 88 turns (*M* = 87.71, *SD* = 19.69), with the conversations ranging in length from 60 to 147 turns.

Plot a histogram of the number of turns per conversation.

```{r}
# Select data (num_occ) and value on the x-axis (number of turns per conversation: "count")
ggplot(data = num_occ, aes(x = count)) +
  # Create a histogram with binwidth = 5 and white bars outlined in black
  geom_histogram(binwidth = 5, fill = "white", color = "black") +
  # Label x-axis
  labs(x = "Number of Turns per Conversation") +
  # Change background aesthetics of plot
  theme_classic()
```

3. The number of total turns for each conversation behavior.

```{r}
# Create table that calculates the number of turns for each conversation behavior
turntype_table <- table(data$turn_type)

# Display the table
turntype_table

# Display the proportion of turns for each conversation behavior
prop.table(table(data$turn_type))
```

We can see that *elaboration* behaviors were used the most frequently (2,422 turns; 47%), while *advice* turns were used the least frequently (72 turns; 1%).

4. The number of total turns for each conversation behavior by role.
```{r}
# Calculate the frequency and proportion of each conversation behavior by role
role_turntable <-
  # Select the data
  data %>%
  # Select grouping variable, in this case, dyad member role (role)
  dplyr::group_by(role) %>%
  # Count the number of turns for each conversation behavior within each role
  # (the resulting column, n, is the frequency of each conversation behavior)
  dplyr::count(turn_type) %>%
  # Create a new variable (prop) in which the proportion of each conversation behavior is calculated
  dplyr::mutate(prop = prop.table(n)) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the number and proportion of conversation behaviors by role
role_turntable
```

For disclosers (role = 1), we can see that *elaboration* behaviors were used the most frequently (1,845 turns; 71%), while *advice* turns were used the least frequently (18 turns; 1%). For listeners (role = 2), we can see that *acknowledgement* behaviors were used the most frequently (1,014 turns; 39%), while *advice* turns were used the least frequently (54 turns; 2%).

# Prepare the Data for the Dyadic Multinomial Logistic Growth Model.

## Establish a Common Time Metric.

To enable the comparison of conversations that differed in the number of turns, we need to create a new time metric. Here, we represent the timing of each conversation behavior as the relative proportion of the conversation at which that behavior occurred. The beginning of the conversation is represented by 0 and the end of the conversation is represented by 1.

First, we create a new time variable that represents the proportion of time in each conversation.
```{r}
data <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad (id)
  dplyr::group_by(id) %>%
  # Create new variable (time_prop) in which each turn is divided by the total number of turns in the conversation
  dplyr::mutate(time_prop = turn/max(turn)) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the first 10 rows of the data
head(data, 10)
```

Next, we need to create a lag variable that will act as the lower bound of the time interval. So, each conversation behavior will "occur" between two time points, the lower bound being the lag variable created here and the upper bound being the "time_prop" variable created above.

```{r}
data <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad (id)
  dplyr::group_by(id) %>%
  # Create new variable (time_prop_lag) that takes the previous row's value of "time_prop"
  dplyr::mutate(time_prop_lag = lag(time_prop),
                # Set the first row of each "time_prop_lag" variable to 0
                time_prop_lag = ifelse(row_number() == 1, 0, time_prop_lag)) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the first 10 rows of the data
head(data, 10)
```

To see how the conversations are now on the same time metric, we can plot the conversation sequences again with our new time variable.
```{r}
prop_convos <-
  # Select data
  ggplot(data = data) +
  geom_rect(aes(# Set the width of each bar as the time_prop_lag (lower bound) and time_prop (upper bound)
                xmin = time_prop_lag, xmax = time_prop,
                # Set the height of each bar as -0.4 and +0.4 the value of the new id variable (plot_id)
                ymin = plot_id - .4, ymax = plot_id + .4,
                # Set the color of each bar to correspond to each conversation behavior
                fill = turn_type)) +
  # Set color of conversation behaviors to vector we created earlier ("cols")
  scale_fill_manual(values = cols) +
  # Label for x-axis
  xlab("Proportion of Conversation") +
  # Label for y-axis
  ylab("Dyad") +
  # X-axis ticks and labels
  scale_x_continuous(breaks = seq(0, 1, by = .1)) +
  # Y-axis ticks and labels
  scale_y_continuous(breaks = c(0, 120)) +
  # Legend label
  labs(fill = "Conversation Behavior") +
  # Additional plot aesthetics
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"),
        axis.text.y = element_blank())

# View plot
prop_convos
```

Compared to the initial plot, all conversations are now the same length after rescaling the speaking turn variable to represent the proportion of time in the conversation.

## Reformat to a Bivariate Data Frame.

The last step of our data preparation involves (a) identifying and setting a reference category and (b) reformatting the data so that each time point represents a turn pair - i.e., so that we have conversation behavior information for both listeners and disclosers simultaneously.

First, we want to assign the reference category in our dyadic multinomial logistic growth model. Since we believe listeners will spend a majority of their time using acknowledgments in response to the disclosers' sharing about their stressor, we set the "acknowledgement" turn as the reference category. To ensure this category is the reference category, we list the "acknowledgement" category as the first level within the variable. The other turn types can be listed in any order.
```{r}
# Set the reference category by listing "Acknowledgement" as the first level
# (factor() with an explicit "levels" argument sets the level order;
# the other turn types can be listed in any order)
data$turn_type <- factor(data$turn_type,
                         levels = c("Acknowledgement", "Advice", "Elaboration",
                                    "HedgedDisclosure", "Question", "Reflection"))
```

Next, we want to reformat our data set so that we have a column for listeners' turns and a column for disclosers' turns. Although there may be many ways to do this, we have chosen to split the data into two data sets (one for listeners and one for disclosers), create a new time variable in each data set for the sake of merging the data sets, and then merge the data sets back together.

First, we create two data sets: one for listeners and one for disclosers.

```{r}
# Select all rows in data that are labeled "Listener" (i.e., role == 2)
listener_growth <- data[data$role == 2, ]

# Select all rows in data that are labeled "Discloser" (i.e., role == 1)
discloser_growth <- data[data$role == 1, ]
```

Second, we create a new time variable that will help us align turn pairs when we merge the listener and discloser data sets.

```{r}
listener_growth <-
  # Select data set
  listener_growth %>%
  # Select grouping variable, in this case, dyad (id)
  dplyr::group_by(id) %>%
  # Create new variable (newturn) that counts the rows from 1:n for each listener
  dplyr::mutate(newturn = row_number()) %>%
  # Subset columns of interest
  dplyr::select(id, newturn, time_prop_lag, time_prop, turn_type) %>%
  # Save the data as a data.frame
  as.data.frame()

discloser_growth <-
  # Select data set
  discloser_growth %>%
  # Select grouping variable, in this case, dyad (id)
  dplyr::group_by(id) %>%
  # Create new variable (newturn) that counts the rows from 1:n for each discloser
  dplyr::mutate(newturn = row_number()) %>%
  # Subset columns of interest
  dplyr::select(id, newturn, time_prop_lag, time_prop, turn_type) %>%
  # Save the data as a data.frame
  as.data.frame()
```

We next need to relabel a few columns so we are able to distinguish the listener and discloser columns after the merge.
```{r}
# Rename a few listener columns
colnames(listener_growth)[3:5] <- c("list_time_prop_lag", "list_time_prop", "list_turn_type")

# View the first 10 rows of listener_growth
head(listener_growth, 10)

# Rename a few discloser columns
colnames(discloser_growth)[3:5] <- c("disc_time_prop_lag", "disc_time_prop", "disc_turn_type")

# View the first 10 rows of discloser_growth
head(discloser_growth, 10)
```

Finally, we merge the listener and discloser data sets back together by matching dyad IDs (id) and our new turn variable (newturn).

```{r}
dyad_growth <-
  # Select two data sets to merge
  merge(listener_growth, discloser_growth,
        # Identify variables to merge and connect data sets
        by = c("id", "newturn"))

# Reorder rows for convenience by dyad ID (id) and our new turn variable (newturn)
dyad_growth <- dyad_growth[order(dyad_growth$id, dyad_growth$newturn), ]

# View the first 10 rows of dyad_growth
head(dyad_growth, 10)
```

We then just have a few odds and ends to take care of before we are ready to fit the model!

We need to determine the minimum time proportion between the listener's and discloser's turns in each turn pair - i.e., the time at which the first speaker of the pair began. We will use this as the time variable in the growth model to represent the timing of turn pairs.

```{r}
dyad_growth <-
  # Select data
  dyad_growth %>%
  # Select grouping variable, in this case, dyad (id)
  dplyr::group_by(id) %>%
  # Compare columns within each row
  rowwise() %>%
  # Create new variable (pair_time_prop) that selects the first turn
  # based on the lowest value of time (time_prop_lag)
  mutate(pair_time_prop = min(list_time_prop_lag, disc_time_prop_lag)) %>%
  # Save the data as a data.frame
  as.data.frame()
```

Finally, we need to reformat some of the variables to make sure they have the right structure. Specifically, we want to make the dyad ID variable (id) and the conversation behavior variables (list_turn_type, disc_turn_type) into factor variables.
A factor variable makes sure R interprets the variables as categories instead of integers (like how the `id` variable is currently coded).

```{r}
dyad_growth$id <- as.factor(dyad_growth$id)
dyad_growth$list_turn_type <- as.factor(dyad_growth$list_turn_type)
dyad_growth$disc_turn_type <- as.factor(dyad_growth$disc_turn_type)
```

# Fit Dyadic Multinomial Logistic Growth Model.

The last step of this process involves fitting the dyadic multinomial logistic growth model.

Before we fit the model, we specify our model in the form of a formula. Here, we are specifying a dyadic model in which listener and discloser conversation behaviors (list_turn_type, disc_turn_type) are simultaneously predicted by time in the conversation (pair_time_prop), and we allow for random intercepts (1) and slopes (pair_time_prop) for each dyad (id).

```{r}
# Check the structure of the listener outcome variable
str(dyad_growth$list_turn_type)

# Save model formula
model_formula <- bf(# Set multivariate outcome for listeners' and disclosers' turns
                    mvbind(list_turn_type, disc_turn_type) ~
                    # Predicted by time in the conversation (pair_time_prop)
                    # Random intercepts and slopes for each dyad (id)
                    pair_time_prop + (1 + pair_time_prop|p|id),
                    # Set response distribution (categorical, with a logit link)
                    family = "categorical")

# Examine prior settings of model parameters
(myprior <- get_prior(model_formula, data = dyad_growth, family = "categorical"))
```

We use the default mildly informative priors available in the program brms (Bürkner, 2017), which include flat priors for the intercepts and time/growth parameters (i.e., gammas); half Student-t distribution priors with three degrees of freedom, a mean of zero, and (for our data) standard deviations of 2.5; and Lewandowski-Kurowicka-Joe (LKJ) distribution priors for the correlations among random effects.

Fit the model.

As described in the paper that accompanies this tutorial, fitting a model using a Bayesian approach requires multiple decisions, including establishing the MCMC sampling procedure.
Researchers must determine how samples of the parameter estimates will be selected from the iterative computations of the probability distribution. Specifically, choices include the length of the burn-in period, the number of sampling iterations, and the thinning rate.

The burn-in period refers to the number of initial iterations that are used to home in on a reasonable estimate before determining the estimated posterior distribution. Recent literature suggests that at least 10,000 iterations should be used in the burn-in period (Depaoli & van de Schoot, 2017).

Next, the number of sampling iterations following the burn-in period needs to be chosen. Prior work suggests the number of iterations should be large enough to achieve an effective sample size (ESS; to be described in more depth below) of at least 1,000 (Zitzmann & Hecht, 2019), but others have recommended at least 10,000 iterations regardless of the number of iterations needed to achieve a reasonable ESS (Depaoli & van de Schoot, 2017). Generally, the number of iterations should be large enough that the model can obtain stable estimates of the model parameters without wasting computational time. The number of iterations needed will depend on the specific model and data being used.

Finally, a thinning rate needs to be chosen. A thinning process helps avoid potential autocorrelation of parameter estimates that may manifest in the sampling procedure. By not retaining all parameter estimates (e.g., only retaining every other parameter estimate or every fifth parameter estimate), the final sampling distribution is less likely to be influenced by any autocorrelation of estimates. Some prior work has shown that thinning can reduce the precision of parameter distributions, so it should only be used when computational memory is an issue (Link & Eaton, 2012).
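These three choices map directly onto the `warmup`, `iter`, `thin`, and `chains` arguments of brms' `brm()` function, and together they determine how many posterior draws are actually retained. A small sketch of that arithmetic, using (for illustration) the sampling settings reported later in this tutorial for the accompanying paper:

```r
# Sampling settings used in the accompanying paper (per chain)
iter   <- 20000  # total iterations, including burn-in (brm argument: iter)
warmup <- 10000  # burn-in iterations (brm argument: warmup)
thin   <- 5      # keep every 5th post-burn-in draw (brm argument: thin)
chains <- 4      # number of chains (brm argument: chains)

# Retained posterior draws per chain and overall
draws_per_chain <- (iter - warmup) / thin
total_draws     <- draws_per_chain * chains
total_draws
#> [1] 8000
```

So, even with heavy thinning, these settings comfortably exceed the ESS target of 1,000 discussed above (the ESS can be at most the number of retained draws).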
In practice, estimation is often pre-tested using a relatively small number of burn-in and estimation iterations to make sure that the model is programmed and running correctly. More extended computational time is then invested to obtain the actual modeling results and output.

Furthermore, after considering the interpretive relevance of each parameter (using the credible intervals and probability of direction), the parameters of the dyadic multinomial logistic growth model provide information about whether and how the predictors are related to the expected log of the odds ratio of each behavior, relative to a reference category. These parameters are often much easier to interpret after back-transformation into a probability metric using the inverse link function. Specifically, exponentiated model parameters provide an informative description of the model-implied baselines and trajectories of change. Finally, visualizations of the model-implied trajectories support intuitive understanding of the (nonlinear) change in the observed interaction behaviors, reporting of findings, and interpretation with respect to the study hypotheses.

For the sake of demonstration, we fit a model with a much shorter sampling procedure - i.e., fewer iterations and a shorter burn-in period. Please adjust the MCMC sampling values based on current recommendations (e.g., Depaoli & van de Schoot, 2017) before reporting model results. In the paper that accompanies this tutorial, and in line with current guidelines (e.g., Depaoli & van de Schoot, 2017), MCMC estimation was done using 20,000 iterations, with a 10,000 iteration burn-in period, 4 chains, and a thinning rate of 5.
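Before fitting, it may help to see the inverse-link back-transformation described above in miniature. In a categorical (multinomial logit) model, the reference category's linear predictor is fixed at 0, and the probability of category *k* is exp(eta_k) divided by the sum of exp(eta_j) over all categories (a softmax). The log-odds values below are hypothetical, chosen only to illustrate the computation:

```r
# Hypothetical log-odds of each behavior relative to the reference
# category ("Acknowledgement", fixed at 0); values are made up
eta <- c(Acknowledgement  = 0,
         Advice           = -2.1,
         Elaboration      = 0.8,
         HedgedDisclosure = -0.5,
         Question         = -1.2,
         Reflection       = -0.9)

# Softmax inverse link: exp(eta_k) / sum_j exp(eta_j)
probs <- exp(eta) / sum(exp(eta))
round(probs, 3)

# The category probabilities sum to 1
sum(probs)
#> [1] 1
```

Exponentiating a single coefficient, by contrast, yields a relative risk ratio for that category versus the reference, which is how we will interpret the fixed effects after fitting.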
```{r}
model <- brm(# Select model formula
             model_formula,
             # Select data set
             data = dyad_growth,
             # Set the number of sampling chains
             chains = 2,
             # Set the number of cores to use when executing chains
             #cores = getOption("mc.cores", 1),
             # Set the number of warmup iterations
             warmup = 500,
             # Set the thinning rate
             thin = 1,
             # Set the number of iterations
             iter = 1000,
             # Set the seed for sampling
             seed = 1234)

# Examine model output
summary(model)
```

Based on the fixed (i.e., population-level) effects, we see that listeners' advice, elaboration, hedged disclosure, and reflection conversation behaviors were all more likely at the beginning of the conversation as compared to acknowledgements. At the beginning of the conversation, disclosers were less likely to provide advice, ask questions, and reflect, and more likely to use elaborations and hedged disclosures as compared to acknowledgements. Across the conversation, listeners' use of advice, elaborations, hedged disclosures, and reflections increased as compared to acknowledgements, and disclosers' use of elaborations and hedged disclosures decreased as compared to acknowledgements.

As described in the accompanying paper, inference about the parameter estimates in the Bayesian framework makes use of 95% credible intervals (instead of confidence intervals) and transformation into relative risk ratios (given the multinomial nature of the outcome variable). Another useful metric for interpretation is the probability of direction, which is the probability that a parameter (based upon its posterior distribution) is positive or negative, that is, different from zero (Makowski et al., 2019). The probability of direction ranges from 50% to 100%, with 50% indicating that the parameter is equally likely to be positive or negative and 100% indicating the parameter is very likely to be either positive or negative. Please see the accompanying paper for a complete exemplar write-up of these results.

View model convergence plots.
```{r}
plot(model)
```

The MCMC chains for the model parameters did not show any noticeable trends and indeed looked like "hairy caterpillars" (Roy, 2020), so we can infer successful model convergence.

Examine the value of the Widely Applicable Information Criterion (WAIC).

```{r}
model_waic <- waic(model)
model_waic
```

The ratio of the probability of choosing one outcome category over the probability of choosing the reference category is often referred to as the relative risk (and it is sometimes referred to as the odds). The relative risk ratios are determined by exponentiating the fixed effects model parameters. We can exponentiate the coefficients from our model to see these risk ratios.

```{r}
# Obtain fixed effects object
fe <- fixef(model, summary = TRUE)

# Exponentiate the fixed effects estimates (dropping the Est.Error column) and round to two decimal places
exp_fe <- round(exp(fe[, -2]), 2)
exp_fe
```

Finally, we can examine the probability of direction, which indicates the probability that a parameter is positive or negative (i.e., different from zero).

```{r}
pdirect <- p_direction(model,
                       effects = "all",
                       component = "all",
                       method = "direct",
                       null = 0)
print(pdirect)
```

*Note:* This provides the probability of direction for only the random effects. This seems to be because the p_direction() function cannot automatically parse a brms object that has both a multivariate outcome and a categorical outcome. So, we must calculate the probability of direction for the fixed effects "by hand."
```{r}
summary(model)

# Extract the posterior draws from the model object
post <- as_draws_df(model)

# Subset the fixed effects, which happen to be the first 20 columns
post_fixed <- post[ , 1:20]

# Create a function for calculating the probability of direction
mypdirect <- function(x) {
  p_direction <- ifelse(sign(median(x)) == -1, mean(x < 0), mean(x >= 0))
  p_direction
}

# Calculate the probability of direction of the fixed effects
post_pdirect <-
  # Select data set
  post_fixed %>%
  # For each of the 20 columns (i.e., fixed effects), apply the mypdirect function
  dplyr::summarise(across(1:20, mypdirect)) %>%
  # Save the data as a data.frame
  as.data.frame()

# Display results
t(post_pdirect)
```

Please see the brief description above or the accompanying paper for additional information about the probability of direction.

# Plotting the Model-Predicted Growth Curves.

As a last step, we plot the model-predicted trajectories of changes in conversation behaviors to gain an intuitive understanding of the (nonlinear) change in the observed behaviors.

Before we plot the data, we establish our color palette by assigning each conversation behavior a color.

```{r}
# Set colors
cols <- c("Acknowledgement" = "#619CFF",
          "Advice" = "#FFE700",
          "Elaboration" = "#F8766D",
          "HedgedDisclosure" = "#FFA500",
          "Question" = "#00BA38",
          "Reflection" = "#DB72FB")
```

Next, we calculate the model-predicted trajectories and the uncertainty around those estimates for listeners and disclosers.
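As an optional aside before computing those trajectories, you can preview the palette with `show_col()` from the scales package (installed as a dependency of ggplot2); this sketch simply checks that the colors look as intended and can be skipped.

```{r}
# Optional: display the color palette as labeled swatches (requires the scales package)
library(scales)
show_col(cols)
```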
```{r}
# Calculate the predicted scores (and uncertainty)
predplotdata <- conditional_effects(model, categorical = TRUE)

# Extract predictions for listeners and disclosers into separate data sets
# Select listener values from list within predplotdata
listenerplotdata <- as.data.frame(predplotdata[['listturntype.listturntype_pair_time_prop:cats__']])

# Select discloser values from list within predplotdata
discloserplotdata <- as.data.frame(predplotdata[['discturntype.discturntype_pair_time_prop:cats__']])
```

Finally, we plot the model-predicted trajectories of conversation behavior change for listeners and disclosers separately. These trajectories were all predicted from the same model, but we display them separately here because the number of variables (6 conversation behaviors * 2 dyad members = 12 trajectories) may be difficult to visually examine in a single plot.

```{r}
# Listener plot
listener_plot <-
  # Choose the data (listenerplotdata)
  ggplot(listenerplotdata) +
  # Create title for plot
  ggtitle("Probability of Listener Conversation Behavior Use") +
  # Create credible intervals
  # Set the value of the time variable (pair_time_prop) on the x-axis
  # Set the width of the credible intervals, with lower bounds (lower__) and upper bounds (upper__)
  geom_ribbon(aes(x = pair_time_prop, ymin = lower__, ymax = upper__,
                  # Set the color of the intervals to match the conversation behavior category (cats__) colors
                  fill = as.factor(cats__)),
              # Set the transparency of the intervals with alpha
              alpha = 0.3) +
  # Create the trajectories
  # Set the value of the time variable (pair_time_prop) on the x-axis
  # Set the predicted value (estimate__) on the y-axis
  # Set the color of the trajectory to the conversation behavior category (cats__)
  geom_line(aes(x = pair_time_prop, y = estimate__,
                color = cats__, group = as.factor(cats__)), size = 1) +
  # Set the color of the trajectory and intervals to the vector we created earlier (cols)
  scale_fill_manual(values = cols, name = "Conversation Behavior") +
  scale_color_manual(values = cols, name = "Conversation Behavior") +
  # Set tick marks on x- and y-axes
  scale_y_continuous(limits = c(0, 1), breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)) +
  scale_x_continuous(limits = c(0, 1), breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)) +
  # Label for x-axis
  xlab("Conversation Time (Proportion)") +
  # Label for y-axis
  ylab("Probability of Conversation Behavior") +
  # Plot aesthetics
  theme_classic()

# Discloser plot
discloser_plot <-
  # Choose the data (discloserplotdata)
  ggplot(discloserplotdata) +
  # Create title for plot
  ggtitle("Probability of Discloser Conversation Behavior Use") +
  # Create credible intervals
  # Set the value of the time variable (pair_time_prop) on the x-axis
  # Set the width of the credible intervals, with lower bounds (lower__) and upper bounds (upper__)
  geom_ribbon(aes(x = pair_time_prop, ymin = lower__, ymax = upper__,
                  # Set the color of the intervals to match the conversation behavior category (cats__) colors
                  fill = as.factor(cats__)),
              # Set the transparency of the intervals with alpha
              alpha = 0.3) +
  # Create the trajectories
  # Set the value of the time variable (pair_time_prop) on the x-axis
  # Set the predicted value (estimate__) on the y-axis
  # Set the color of the trajectory to the conversation behavior category (cats__)
  geom_line(aes(x = pair_time_prop, y = estimate__,
                color = cats__, group = as.factor(cats__)), size = 1) +
  # Set the color of the trajectory and intervals to the vector we created earlier (cols)
  scale_fill_manual(values = cols, name = "Conversation Behavior") +
  scale_color_manual(values = cols, name = "Conversation Behavior") +
  # Set tick marks on x- and y-axes
  scale_y_continuous(limits = c(0, 1), breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)) +
  scale_x_continuous(limits = c(0, 1), breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)) +
  # Label for x-axis
  xlab("Conversation Time (Proportion)") +
  # Label for y-axis
  ylab("Probability of Conversation Behavior") +
  # Plot aesthetics
  theme_classic()
```

Print the
plots we just created.

```{r}
print(listener_plot)
print(discloser_plot)
```

In both of these plots, the x-axis represents conversation time as a proportion, with the left representing the beginning of the conversation (conversation time = 0.0) and the right representing the end of the conversation (conversation time = 1.0), and the y-axis represents the probability of a conversation behavior occurring. In the plot of changes in listeners' conversation behaviors, we can see that the use of acknowledgements (blue) decreases and the use of elaborations (red) increases across the conversation. In the plot of changes in disclosers' conversation behaviors, we can see that the use of elaborations (red) is generally high, the use of acknowledgements (blue) increases, and the use of hedged disclosures (orange) decreases across the conversation.

# Conclusion.

We hope this tutorial increases the accessibility of methods that can be applied to longitudinal categorical time series to examine how behaviors during dyadic interactions change over time. We encourage researchers to read the accompanying paper to obtain a greater understanding of the specific considerations (e.g., rescaling the time variable, selection of change functions) when fitting dyadic multinomial logistic growth models.

-----

### Additional Information

We created this tutorial with a system environment and versions of R and packages that might be different from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output with the tutorial's may be helpful.

```{r}
session_info(pkgs = c("attached"))
```
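Relatedly, if any of the packages used in this tutorial are missing on your machine, the sketch below installs them from CRAN. The package list is an assumption based on the functions called in this tutorial (e.g., `brm()`, `as_draws_df()`, `p_direction()`); adjust it to match your own setup.

```{r}
# Packages used in this tutorial (assumed from the functions called above)
pkgs <- c("brms", "posterior", "bayestestR", "dplyr", "ggplot2")

# Install any packages that are not yet installed
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```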