--- title: "State Space Grid Metric Tutorial" author: Miriam Brinberg output: rmdformats::robobook: html_document: default word_document: default editor_options: chunk_output_type: console --- # Overview This tutorial will provide R code on establishing a state space grid and deriving several metrics. State space grids are typically used to depict and quantify longitudinal dyadic (or bivariate) data in a 2-dimensional space (Hollenstein, 2013). In this example, we plot the speaking turn behaviors enacted by each dyad member during a support conversation in which one dyad member shared a current problem they were facing. Each turn in the conversations was coded into one of six turn types e.g., as an acknowledgement, question, advice, etc.). Here, we demonstrate how to calculate three metrics illustrated in the accompanying paper ("Using State Space Grids to Quantify and Examine Dynamics of Dyadic Conversation" by Brinberg, Solomon, Bodie, Jones, & Ram in *Communication Methods and Measures*): + *Entropy* as a measure of behavioral flexibility + *Time using problem description behaviors* as a measure of an attractor + *Time of exit from the problem description attractor* as a measure of a phase shift We also demonstrate how to examine the association between the state space grid-derived metrics and a conversational outcome (in this case, distress) using multiple regression. Note that the accompanying "SSGMetric_Tutorial.rmd" file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R). This file can be used so you don't have to copy and paste code or so you can easily rename the variables in the current code. Thank you to Jon Helm for writing the original state space grid plotting and entropy code. # Outline In this tutorial, we'll cover... * Reading in the data and loading needed packages. * General data preparation. * Entropy. * Problem description attractor. * Attractor exit time. * SSG metrics and between-dyad differences. # Read in the data and load needed libraries. Let's read the data into R. The exemplar data set we are working with is called "StrangerConversations_N59" and is stored as a .csv file (comma-separated values file, which can be created by saving an Excel file as a csv document) on my computer's desktop. ```{r} # Set working directory (i.e., where your data file is stored) # This can be done by going to the top bar of RStudio and selecting "Session" --> "Set Working Directory" --> "Choose Directory" --> finding the location of your file setwd("~/Desktop") # Read in the repeated measures data data <- read.csv(file = "StrangerConversations_N59.csv", head = TRUE, sep = ",") # View the first 10 rows of the repeated measures data head(data, 10) # Read in the outcomes data outcomes <- read.csv(file = "StrangerConversations_N59_Outcomes.csv", head = TRUE, sep = ",") # View the first 10 rows of the outcomes data head(outcomes, 10) ``` In the data, we can see each row contains information for one utterance and there are multiple rows (i.e., multiple utterances) for each dyad. Specifically, there is a column for: * Dyad ID (`id`) * Time variable - in this case, turn in the conversation (`turn`) * Dyad member ID - in this case, role in the conversation (`role`; discloser = 1, listener = 2) * Turn type - in this case, a typology of six different speaking turn behaviors: acknowledgement, advice, elaboration, hedged disclosure, question, and reflection (`turn_type`) In the outcome data ("outcomes"), we can see there is one row for each dyad and there are columns for: * Dyad ID (`id`) * Outcome variable - in this case, post-conversation report of distress by the support receiver (`distress`) Load the R packages we need. Packages in R are a collection of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model. ```{r, message = FALSE, warning = FALSE} # install.packages("devtools") # Install package if you have never used it before library(devtools) # For version control # install.packages("dplyr") # Install package if you have never used it before library(dplyr) # For data management # install.packages("ggplot2") # Install package if you have never used it before library(ggplot2) # For plotting # install.packages("psych") # Install package if you have never used it before library(psych) # For descriptive statistics # install.packages("tidyr") # Install package if you have never used it before library(tidyr) # For data management # install.package("vctrs") # Install package if you have never used it before library(vctrs) # For data management ``` # General data preparation. Before calculating our state space grid metrics, we need to create a new variable that will assign each speaking turn transition pair to a cell (or state) in the state space grid. To do so, we first create two new variables that represent the listeners' turn types and disclosers' turn types separately. ```{r} # Add "Discloser" to turn_type variable, then set all Listener (role = 2) turns to NA data$discloser_turntype <- paste("Discloser", data$turn_type, sep = " ") data$discloser_turntype[data$role == 2] <- NA # Add "Listener" to turn_type variable, then set all Discloser (role = 1) turns to NA data$listener_turntype <- paste("Listener", data$turn_type, sep = " ") data$listener_turntype[data$role == 1] <- NA # Reset missing values data$listener_turntype[data$listener_turntype == "Listener NA"] <- NA data$discloser_turntype[data$discloser_turntype == "Discloser NA"] <- NA # View the first 10 rows of the data head(data, 10) ``` We then carry forward each listener and discloser turn type. Specifically, we assign the following row (i.e., the following turn) that is missing to the turn type prior for both the listener turn type column and the discloser turn type column. ```{r} data <- # Select data data %>% # Select grouping variable, in this case, dyad ID (id) dplyr::group_by(id) %>% # Fill in the turn type for "listener_turntype" and "discloser_turntype" # such that the turn type is carried forward one row # in order to capture turn transitions dplyr::mutate(listener_turntype = vec_fill_missing(listener_turntype, max_fill = 1), discloser_turntype = vec_fill_missing(discloser_turntype, max_fill = 1)) %>% # Save the data as a data.frame as.data.frame() # View the first 10 rows of the data head(data, 10) ``` Next, we create a new variable ("ssg_state") in which we label each combination of the disclosers' and listeners' turns to a cell in the state space grid. Since we have six turn types for both disclosers and listeners, we will have 36 states in which the conversation can exist. ```{r} data$ssg_cell <- NA data$ssg_cell[data$listener_turntype == "Listener Acknowledgement" & data$discloser_turntype == "Discloser Elaboration"] <- "LAck_DElab" data$ssg_cell[data$listener_turntype == "Listener Question" & data$discloser_turntype == "Discloser Elaboration"] <- "LQues_DElab" data$ssg_cell[data$listener_turntype == "Listener Reflection" & data$discloser_turntype == "Discloser Elaboration"] <- "LRefl_DElab" data$ssg_cell[data$listener_turntype == "Listener Acknowledgement" & data$discloser_turntype == "Discloser HedgedDisclosure"] <- "LAck_DHedg" data$ssg_cell[data$listener_turntype == "Listener Question" & data$discloser_turntype == "Discloser HedgedDisclosure"] <- "LQues_DHedg" data$ssg_cell[data$listener_turntype == "Listener Reflection" & data$discloser_turntype == "Discloser HedgedDisclosure"] <- "LRefl_DHedg" data$ssg_cell[data$listener_turntype == "Listener Elaboration" & data$discloser_turntype == "Discloser Elaboration"] <- "LElab_DElab" data$ssg_cell[data$listener_turntype == "Listener Elaboration" & data$discloser_turntype == "Discloser HedgedDisclosure"] <- "LElab_DHedg" data$ssg_cell[data$listener_turntype == "Listener Elaboration" & data$discloser_turntype == "Discloser Reflection"] <- "LElab_DRefl" data$ssg_cell[data$listener_turntype == "Listener Elaboration" & data$discloser_turntype == "Discloser Question"] <- "LElab_DQues" data$ssg_cell[data$listener_turntype == "Listener Elaboration" & data$discloser_turntype == "Discloser Acknowledgement"] <- "LElab_DAck" data$ssg_cell[data$listener_turntype == "Listener HedgedDisclosure" & data$discloser_turntype == "Discloser Elaboration"] <- "LHedg_DElab" data$ssg_cell[data$listener_turntype == "Listener HedgedDisclosure" & data$discloser_turntype == "Discloser HedgedDisclosure"] <- "LHedg_DHedg" data$ssg_cell[data$listener_turntype == "Listener HedgedDisclosure" & data$discloser_turntype == "Discloser Reflection"] <- "LHedg_DRefl" data$ssg_cell[data$listener_turntype == "Listener HedgedDisclosure" & data$discloser_turntype == "Discloser Question"] <- "LHedg_DQues" data$ssg_cell[data$listener_turntype == "Listener HedgedDisclosure" & data$discloser_turntype == "Discloser Acknowledgement"] <- "LHedg_DAck" data$ssg_cell[data$listener_turntype == "Listener Acknowledgement" & data$discloser_turntype == "Discloser Reflection"] <- "LAck_DRefl" data$ssg_cell[data$listener_turntype == "Listener Acknowledgement" & data$discloser_turntype == "Discloser Question"] <- "LAck_DQues" data$ssg_cell[data$listener_turntype == "Listener Acknowledgement" & data$discloser_turntype == "Discloser Acknowledgement"] <- "LAck_DAck" data$ssg_cell[data$listener_turntype == "Listener Question" & data$discloser_turntype == "Discloser Reflection"] <- "LQues_DRefl" data$ssg_cell[data$listener_turntype == "Listener Question" & data$discloser_turntype == "Discloser Question"] <- "LQues_DQues" data$ssg_cell[data$listener_turntype == "Listener Question" & data$discloser_turntype == "Discloser Acknowledgement"] <- "LQues_DAck" data$ssg_cell[data$listener_turntype == "Listener Reflection" & data$discloser_turntype == "Discloser Reflection"] <- "LRefl_DRefl" data$ssg_cell[data$listener_turntype == "Listener Reflection" & data$discloser_turntype == "Discloser Question"] <- "LRefl_DQues" data$ssg_cell[data$listener_turntype == "Listener Reflection" & data$discloser_turntype == "Discloser Acknowledgement"] <- "LRefl_DAck" data$ssg_cell[data$listener_turntype == "Listener Advice" & data$discloser_turntype == "Discloser Elaboration"] <- "LAdv_DElab" data$ssg_cell[data$listener_turntype == "Listener Advice" & data$discloser_turntype == "Discloser HedgedDisclosure"] <- "LAdv_DHedg" data$ssg_cell[data$listener_turntype == "Listener Advice" & data$discloser_turntype == "Discloser Reflection"] <- "LAdv_DRefl" data$ssg_cell[data$listener_turntype == "Listener Advice" & data$discloser_turntype == "Discloser Question"] <- "LAdv_DQues" data$ssg_cell[data$listener_turntype == "Listener Advice" & data$discloser_turntype == "Discloser Acknowledgement"] <- "LAdv_DAck" data$ssg_cell[data$listener_turntype == "Listener Advice" & data$discloser_turntype == "Discloser Advice"] <- "LAdv_DAdv" data$ssg_cell[data$listener_turntype == "Listener Acknowledgement" & data$discloser_turntype == "Discloser Advice"] <- "LAck_DAdv" data$ssg_cell[data$listener_turntype == "Listener Question" & data$discloser_turntype == "Discloser Advice"] <- "LQues_DAdv" data$ssg_cell[data$listener_turntype == "Listener Reflection" & data$discloser_turntype == "Discloser Advice"] <- "LRefl_DAdv" data$ssg_cell[data$listener_turntype == "Listener HedgedDisclosure" & data$discloser_turntype == "Discloser Advice"] <- "LHedg_DAdv" data$ssg_cell[data$listener_turntype == "Listener Elaboration" & data$discloser_turntype == "Discloser Advice"] <- "LElab_DAdv" # View the first 10 rows of the data head(data, 10) ``` Finally, we count the number of turns per conversation, which will serve as a control variable in a later analysis. ```{r} #create variable with total number of turns per dyad total_turns <- # Select data data %>% # Select grouping variable, in this case, dyad ID (id) dplyr::group_by(id) %>% # Create a new variable ("total_turns") that represents the turn number of turns # within the conversation - i.e., the maximum turn number dplyr::summarise(total_turns = max(turn)) %>% # Save the data as a data.frame as.data.frame() # View the first 10 rows of the total_turns data head(total_turns, 10) ``` Merge "total_turns" into "outcomes." ```{r} outcomes <- merge(outcomes, total_turns, by = "id") # View the first 10 rows of the outcomes data head(outcomes, 10) ``` # Entropy. In this section, we calculate entropy for each dyad. Create function to calculate entropy. Thank you to [blinded for review] for creating this function! ```{r} # Create the function called iEntropy which calculates entropy for each conversation # This function takes in two vectors. x is the observed vector on which means are calculated, id is a vector identification values. Entropy = function(x = x, base = exp(1)){ # First, create a vector that counts all of the responses of x count_x = table(x) # Divide that table by its sum to convert the counts of the responses to probabilities of the responses prob_x = count_x / sum(count_x) # Take the logarithm of all of the probabilities with the user-specified base log_x = log(prob_x, base) # If any of the values within log_x equal -Inf (i.e. negative infinity) then replace them with 0 if(any(log_x == -Inf)){ log_x_corrected = log_x log_x_corrected[which(log_x == -Inf)] = 0 } # If all of the values do not equal -Inf, then do not replace any of them if(all(log_x != -Inf)){ log_x_corrected = log_x } # Multiply the probablilities by the corresponding logs of the probabilities prod_x = prob_x * log_x_corrected # sum the products of the probabilities and log probabilities, and multiply by negative one entropy = -1*sum(prod_x) # return entropy return(entropy) } # This second function takes the originally defined Entropy function and applies it to each individual # I entropy takes in a vector of obesrvations, x, and id vector, and a user-specified base of the logarithm iEntropy = function(x = x, id = id, base = exp(1)){ # The aggregate function splits x by id, and then applies the entropy function to each of the subsets of x out = aggregate(x, by = list(id), FUN = Entropy, base = base) # Rename the 'out' to 'id' and 'iEntropy' names(out) = c('id', 'iEntropy') # Return the calculated iEntropy values return(out) } ``` Calculate entropy. ```{r} # Remove missing data for the sake of calculating entropy data_noNA <- data[!is.na(data$ssg_cell), ] # Change "ssg_cell" to factor so the function recognizes the variable as categorical (rather than as a character) data_noNA$ssg_cell <- as.factor(data_noNA$ssg_cell) # Calculate entropy and save to new data frame ("entropy_data") entropy_data <- # Select variable aggregate(data_noNA$ssg_cell, # Apply variable to each group, in this case, dyad id ("id") by = list(data_noNA$id), # Apply the Entropy function we created FUN = Entropy, base = exp(1)) # Rename columns in new data set names(entropy_data) = c('id', 'iEntropy') # View the first 10 rows of the entropy data head(entropy_data, 10) ``` Entropy descriptives. ```{r} # Examine descriptive statistics of iEntropy describe(entropy_data$iEntropy) # Plot iEntropy distribution # Select data and variable of interest (iEntropy) ggplot(entropy_data, aes(x = iEntropy)) + # Create histogram and set width of bars in histogram geom_histogram(binwidth = 0.10) + # Label x-axis xlab("Entropy - Conversation Behavioral Flexibility") + # Label y-axis ylab("Frequency") + # Update aesthetics theme_classic() ``` Merge "entropy_data" into "outcomes." ```{r} outcomes <- merge(outcomes, entropy_data, by = "id") # View the first 10 rows of the outcomes data head(outcomes, 10) ``` # Problem description attractor. In this next section, we calculate the relative proportion of time spent in the problem description attractor. To begin, we first have to define the attractor. Specifically, we define the problem description attractor as the cells that were created from the intersection of discloser elaboration and hedged disclosure and listener acknowledgement, question, and reflection. We create a new variable that labels each turn as either IN the attractor or OUT of the attractor. ```{r} # Create a variable that labels all turns as "OUT" of the attractor data$attractor <- "OUT" # Update the "attractor" variable so that turns that occur in one of the six problem description attractor cells is now labeled "IN" data$attractor[data$ssg_cell == "LAck_DElab"] <- "IN" data$attractor[data$ssg_cell == "LAck_DHedg"] <- "IN" data$attractor[data$ssg_cell == "LQues_DElab"] <- "IN" data$attractor[data$ssg_cell == "LQues_DHedg"] <- "IN" data$attractor[data$ssg_cell == "LRefl_DElab"] <- "IN" data$attractor[data$ssg_cell == "LRefl_DElab"] <- "IN" # View the first 10 rows of the data head(data, 10) ``` Calculate the proportion of time in the attractor for each dyad. ```{r} attractor_time <- # Select data data %>% # Select grouping variable, in this case, dyad ID (id) dplyr::group_by(id) %>% # Count the occurrence of each category ("IN", "OUT") in the "attractor" variable dplyr::count(attractor) %>% # Remove any missing data tidyr::drop_na() %>% # Calculate the proportion of time in each category dplyr::mutate(attractor_prop = prop.table(n)) %>% # Save the data as a data.frame as.data.frame() # View the first 10 rows of the attractor_time data head(attractor_time, 10) ``` Proportion of time in attractor descriptives. ```{r} # Examine descriptive statistics of attractor_prop describe(attractor_time$attractor_prop) # Plot attractor_prop distribution # Select data and variable of interest (attractor_prop) ggplot(attractor_time, aes(x = attractor_prop)) + # Create histogram and set width of bars in histogram geom_histogram(binwidth = 0.10) + # Label x-axis xlab("Proportion of Turns in the Problem Description Attractor for Each Dyad") + # Label y-axis ylab("Frequency") + # Update aesthetics theme_classic() ``` Merge proportion of time in the attractor (attractor_prop == "IN") into "outcomes." ```{r} outcomes <- merge(outcomes, attractor_time[attractor_time$attractor == "IN", c("id", "attractor_prop")], by = "id", all.x = TRUE) # View the first 10 rows of the outcomes data head(outcomes, 10) ``` # Attractor exit time. Finally, we will calculate the time (i.e., turn number) when the dyad first leaves the problem description attractor. Find first row in which zone is not equal to 1. ```{r} attractor_exit <- # Select data data %>% # Select grouping variable, in this case, dyad ID (id) dplyr::group_by(id) %>% # Remove the first row of each group since the first turn will # never be in a cell (since we need two data points to locate a turn # transition in a cell) dplyr::slice(2:n()) %>% # Keep rows in which the dyad is out of the attractor # (and is not due to missing data) dplyr::filter(attractor == "OUT" & !is.na(ssg_cell)) %>% # Keep the first row in which the dyad is out of the attractor dplyr::filter(row_number()==1) %>% # Save the data as a data.frame as.data.frame() # Rename turn variable to "attractor_exit" colnames(attractor_exit)[2] <- "attractor_exit" ``` Time of attractor exit descriptives. ```{r} # Examine descriptive statistics of attractor_exit describe(attractor_exit$attractor_exit) # Plot attractor_exit distribution # Select data and variable of interest (attractor_exit) ggplot(attractor_exit, aes(x = attractor_exit)) + # Create histogram and set width of bars in histogram geom_histogram(binwidth = 5) + # Label x-axis xlab("Time (in turns) of Problem Description Attractor Exit") + # Label y-axis ylab("Frequency") + # Update aesthetics theme_classic() ``` Merge "attractor_exit" variable into "outcomes." ```{r} outcomes <- merge(outcomes, attractor_exit[, c("id", "attractor_exit")], by = "id") # View the first 10 rows of the outcomes data head(outcomes, 10) ``` # SSG metrics and between-dyad differences. In this final step, we will examine how our state space grid metrics are associated with a between-dyad difference (distress). We will examine these associations using multiple regression, and we will control for the total number of turns in the conversation. Entropy. ```{r} entropy_regression <- lm(# Post-conversation distress is predicted by entropy and total turns distress ~ iEntropy + total_turns, data = outcomes) summary(entropy_regression) # Plot result # Select data and set x-axis (iEntropy - i.e., the predictor) # and y-axis (distress - i.e., the outcome) ggplot(outcomes, aes(x = iEntropy, y = distress)) + # Plot each dyad's iEntropy and distress geom_point() + # Plot smoothed regression line stat_smooth(method='lm', formula = y ~ x, color = "blue", size = 1) + # Label x-axis xlab("Conversation Behavior Flexibility") + # Label y-axis ylab("Disclosers' Post-conversation Distress") + # Update aesthetics theme_classic() ``` Entropy of conversation behaviors during a support conversation is not associated with discloser's post-conversation distress, while controlling for the total number of turns within the conversation. Problem description attractor. ```{r} attractor_regression <- lm(# Post-conversation distress is predicted by time in # attractor and total turns distress ~ attractor_prop + total_turns, data = outcomes) summary(attractor_regression) # Plot result # Select data and set x-axis (attractor_prop - i.e., the predictor) # and y-axis (distress - i.e., the outcome) ggplot(outcomes, aes(x = attractor_prop, y = distress)) + # Plot each dyad's attractor_prop and distress geom_point() + # Plot smoothed regression line stat_smooth(method='lm', formula = y ~ x, color = "blue", size = 1) + # Label x-axis xlab("Proportion of Turns in the Problem Description Attractor") + # Label y-axis ylab("Disclosers' Post-conversation Distress") + # Update aesthetics theme_classic() ``` The proportion of time spent in the problem description attractor during a support conversation is not associated with discloser's post-conversation distress, while controlling for the total number of turns within the conversation. Attractor exit time. ```{r} exit_regression <- lm(# Post-conversation distress is predicted by time of first # attractor exit and total turns distress ~ attractor_exit + total_turns, data = outcomes) summary(exit_regression) # Plot result # Select data and set x-axis (attractor_exit - i.e., the predictor) # and y-axis (distress - i.e., the outcome) ggplot(outcomes, aes(x = attractor_exit, y = distress)) + # Plot each dyad's attractor_exit and distress geom_jitter() + # Plot smoothed regression line stat_smooth(method='lm', formula = y ~ x, color = "blue", size = 1) + # Label x-axis xlab("Time (in turns) of Problem Description Attractor Exit") + # Label y-axis ylab("Disclosers' Post-conversation Distress") + # Update aesthetics theme_classic() ``` The timing of first exit from the problem description attractor during a support conversation is not associated with discloser's post-conversation distress, while controlling for the total number of turns within the conversation. ## Additional Information We created this tutorial with a system environment and versions of R and packages that might be different from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output may be helpful. ```{r} session_info(pkgs = c("attached")) ```