Conversational Motifs Tutorial

Overview

This tutorial provides R code to conduct the conversational motif analyses presented in the paper “Using Sequence Analysis to Identify Conversational Motifs in Supportive Interactions” (Solomon et al., in press).

The primary analytic technique forwarded and used in the paper is sequence analysis (MacIndoe & Abbott, 2004), a data-driven technique used to (a) identify patterns in categorical time-series data and (b) examine differences across pattern groups and/or whether and how the timing and prevalence of those patterns are related to an outcome of interest. Additional details about the method and how it is used to analyze and understand conversation data are elaborated in the paper.

In the empirical example presented here, we describe and examine patterns embedded in dyads’ conversations - conversational motifs. We use turn-by-turn data from a subset of the data analyzed in the paper (specifically, 53 conversations between friends during which one friend, the discloser, talked about a current problem with the other friend, the listener). Each speaking turn in these conversations was coded as being one of six types: acknowledgement, advice, elaboration, hedged disclosure, question, or reflection (see Bodie et al., 2021, in the Journal of Language and Social Psychology for more details about the creation of the turn typology). We are specifically interested in (a) identifying the specific types of turn-to-turn exchanges that these dyads use in their conversations - i.e., the five-turn sequences that we refer to as conversational motifs - and (b) whether the prevalence and timing of those sequences are associated with post-conversation reports of the discloser’s emotional improvement.

Please note that while the steps of sequence analysis demonstrated in this tutorial are the same as those described in the paper, the steps that connect the conversational motifs to the outcome of interest differ. Specifically, the paper uses multigroup SEMs to examine whether the timing and prevalence of conversational motifs are associated with the outcomes of interest. Here, we instead use regression models, in order to keep the focus of the tutorial on sequence analysis.

Note that the accompanying “ConversationalMotifs_Tutorial_2022August20.rmd” file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R).

Outline

This tutorial covers…

  • Reading in the data and loading needed packages.
  • Data descriptives.
  • Creating five-turn windows.
  • Creating sequences.
  • Establishing a cost matrix and obtaining a dissimilarity matrix.
  • Determining the number of clusters - conversational motifs.
  • Examining associations of timing and prevalence of conversational motifs with outcomes.
  • Conclusion.

Read in the data and load analysis packages.

Let’s read the data into R.

The exemplar data are stored in two .csv files. One file, “friends_subset.csv”, contains the repeated measures (i.e., turn-by-turn) supportive conversation data for all 53 dyads. The second file, “friends_outcomes_subset.csv”, contains the time-invariant outcome data for all 53 dyads (specifically, disclosers’ self-reported emotional improvement).

# Set working directory (i.e., where the data files are stored)
# This can also be done by going to the top bar of RStudio and selecting 
# "Session" --> "Set Working Directory" --> "Choose Directory" --> 
# finding the location of the folder that contains the data files
setwd("~/Desktop") # Note: You can skip this line if you have 
# the data files and this .rmd file stored in the same directory

# Read in the repeated measures data
data <- read.csv(file = "friends_subset.csv", header = TRUE, sep = ",")

# View the first 10 rows of the repeated measures data
head(data, 10)
##    id turn      role                 turn_type
## 1   1    1  Listener  Listener_Acknowledgement
## 2   1    2 Discloser Discloser_Acknowledgement
## 3   1    3  Listener  Listener_Acknowledgement
## 4   1    4 Discloser          Discloser_Advice
## 5   1    5  Listener      Listener_Elaboration
## 6   1    6 Discloser     Discloser_Elaboration
## 7   1    7  Listener      Listener_Elaboration
## 8   1    8 Discloser     Discloser_Elaboration
## 9   1    9  Listener  Listener_Acknowledgement
## 10  1   10 Discloser     Discloser_Elaboration
# Read in the outcomes data
outcomes <- read.csv(file = "friends_outcomes_subset.csv", header = TRUE, sep = ",")

# View the first 10 rows of the outcomes data
head(outcomes, 10)
##    id emo_improve
## 1  80    5.666667
## 2  79    7.000000
## 3  78    5.333333
## 4  77    5.333333
## 5  74    3.000000
## 6  73    1.666667
## 7  72    6.666667
## 8  70    6.000000
## 9  69    4.333333
## 10 68    5.000000

In the repeated measures data (“data”), we can see that each row contains information for one turn and that there are multiple rows (i.e., turns) for each dyad. Specifically, there is a column for:

  • Dyad ID (id)
  • Time variable - in this case, turn in the conversation (turn)
  • Dyad member’s role in the conversation (role; Discloser or Listener)
  • Turn type - in this case, based upon a typology derived in Bodie et al. (2021; turn_type)

In the outcome data (“outcomes”), we can see there is one row for each dyad and there are columns for:

  • Dyad ID (id)
  • Outcome variable - in this case, the discloser’s post-conversation report of emotional improvement (emo_improve; an average of the three items: “I feel better after having talked with my friend,” “My friend made me feel better about myself,” and “I feel more optimistic after having talked with my friend”)

Load the R packages we need.

Packages in R are a collection of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model.
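
If you are installing these packages for the first time, the following convenience snippet (a sketch; the package names match those loaded below) installs any that are missing in a single call.

# One-time setup (optional): install any needed packages that are missing
needed <- c("cluster", "devtools", "dplyr", "ggplot2", "psych",
            "reshape", "stringr", "tidyr", "TraMineR", "TraMineRextras")
missing_pkgs <- setdiff(needed, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)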

# install.packages("cluster") # Install package if you have never used it before
library(cluster) # For hierarchical cluster analysis

# install.packages("devtools") # Install package if you have never used it before
library(devtools) # For development tools (e.g., installing packages from GitHub)

# install.packages("dplyr") # Install package if you have never used it before
library(dplyr) # For data management

# install.packages("ggplot2") # Install package if you have never used it before
library(ggplot2) # For plotting

# install.packages("psych") # Install package if you have never used it before
library(psych) # For descriptive statistics

# install.packages("reshape") # Install package if you have never used it before
library(reshape) # For data management

# install.packages("stringr") # Install package if you have never used it before
library(stringr) # For changing character strings within variables

# install.packages("tidyr") # Install package if you have never used it before
library(tidyr) # For data management

# install.packages("TraMineR") # Install package if you have never used it before
library(TraMineR) # For sequence analysis

# install.packages("TraMineRextras") # Install package if you have never used it before
library(TraMineRextras) # For sequence analysis

Data Descriptives.

The goal of this step is to describe our sample, specifically,

  1. how many dyads are in the data set,
  2. how many conversation turns there are for each dyad, and
  3. the frequency of each turn type across all dyads.

  1. Number of dyads.
# Number of dyads in the repeated measures data
# Length (i.e., number) of unique ID values
length(unique(data$id))
## [1] 53
# Number of dyads in the outcome data
# Length (i.e., number) of unique ID values
length(unique(outcomes$id))
## [1] 53

There are 53 dyads in both data sets.

  2. Number of conversation turns for each dyad.
num_occ <- # Select data
           data %>%
           # Select grouping variable, in this case, dyad ID (id)
           group_by(id) %>%
           # Count the number of turns in each conversation
           summarise(count = n()) %>%
           # Save the data as a data.frame
           as.data.frame()

# Calculate descriptives on the number of turns per conversation
describe(num_occ$count)
##    vars  n mean    sd median trimmed   mad min max range skew kurtosis   se
## X1    1 53 74.7 20.63     68   72.88 19.27  49 132    83 0.67    -0.56 2.83

The dyads in this subset of the data had supportive conversations that had, on average, approximately 75 turns (M = 74.70, SD = 20.63), with the conversations ranging in length from 49 to 132 turns.

Plot a histogram of the number of turns per conversation.

# Select data (num_occ) and value on the x-axis (number of turns per conversation: "count")
ggplot(data = num_occ, aes(x = count)) +
  # Create a histogram with binwidth = 5 and white bars outlined in black
  geom_histogram(binwidth = 5, fill = "white", color = "black") + 
  # Label x-axis
  labs(x = "Number of Turns per Conversation") +
  # Change background aesthetics of plot
  theme_classic()

  3. The total number of turns for each turn type.
# Create table that calculates the number of turns for each turn type
turntype_table <- table(data$turn_type)

# Display the table
turntype_table
## 
##  Discloser_Acknowledgement           Discloser_Advice 
##                        136                         48 
##      Discloser_Elaboration Discloser_HedgedDisclosure 
##                       1449                        244 
##         Discloser_Question       Discloser_Reflection 
##                         88                         27 
##   Listener_Acknowledgement            Listener_Advice 
##                        721                         78 
##       Listener_Elaboration  Listener_HedgedDisclosure 
##                        381                         97 
##          Listener_Question        Listener_Reflection 
##                        310                        379

We can see that disclosers overall used Elaboration turns the most (1,449 turns), while listeners overall used Acknowledgement turns the most (721 turns).

Create Five-turn Windows.

The goal of this step is to prepare the data for sequence analysis by creating five-turn windows, which requires several sub-steps:

  1. manipulating the data so each conversation begins with a discloser’s turn,
  2. updating the turn variable so that all conversations begin at turn 1,
  3. adding empty rows where there is missing information for a turn, and
  4. creating a data set that contains all five-turn windows for each dyad.

Note: You are not required to begin these five-turn windows with a set role in the dyads. We decided to focus on five-turn windows that begin with a discloser in this analysis, but creating windows that begin with a listener is also possible. In the case of indistinguishable dyads (e.g., two arguers), steps (1) & (2) are not necessary.

Remove rows that precede the first discloser turn.

data1 <- # Select data
         data %>% 
         # Select grouping variable, in this case, dyad ID (id)
         group_by(id) %>%
         # Remove any rows that precede the first "Discloser" row; 
         slice(which.max(role == "Discloser") : n()) %>%
         # Save the data as a data.frame
         as.data.frame()

# View the first 10 rows of the data
head(data1, 10)
##    id turn      role                 turn_type
## 1   1    2 Discloser Discloser_Acknowledgement
## 2   1    3  Listener  Listener_Acknowledgement
## 3   1    4 Discloser          Discloser_Advice
## 4   1    5  Listener      Listener_Elaboration
## 5   1    6 Discloser     Discloser_Elaboration
## 6   1    7  Listener      Listener_Elaboration
## 7   1    8 Discloser     Discloser_Elaboration
## 8   1    9  Listener  Listener_Acknowledgement
## 9   1   10 Discloser     Discloser_Elaboration
## 10  1   11  Listener  Listener_Acknowledgement

Create a new turn number variable that is the same as the original turn variable if the conversation originally started with a discloser turn. If the conversation originally started with a listener turn, then the new turn variable is one less than the original turn variable.

data1 <- # Select data
         data1 %>%
         # Select grouping variable, in this case, dyad ID (id)
         group_by(id) %>%
         # Create a new variable called "turn_minus" that is the value of the "turn" variable minus 1
         mutate(turn_minus = turn - 1) %>%
         # Create a new variable called "newturn"
         # If the first value of "turn" within a dyad is 2, then label "newturn" as "Yes"
         # If the first value of "turn" within a dyad is not 2, then label "newturn" as "No"
         mutate(newturn = if_else(first(turn) == 2, "Yes", "No")) %>%
         # If "newturn" is equal to "Yes" than replace the "newturn" values with the 
         # values in "turn_minus"
         # If "newturn" is not equal to "Yes" than keep the values in "turn"
         mutate(newturn = ifelse(newturn == "Yes", turn_minus, turn)) %>%
         # Save the data as a data.frame
         as.data.frame()

# View the first 10 rows of the data
head(data1, 10)
##    id turn      role                 turn_type turn_minus newturn
## 1   1    2 Discloser Discloser_Acknowledgement          1       1
## 2   1    3  Listener  Listener_Acknowledgement          2       2
## 3   1    4 Discloser          Discloser_Advice          3       3
## 4   1    5  Listener      Listener_Elaboration          4       4
## 5   1    6 Discloser     Discloser_Elaboration          5       5
## 6   1    7  Listener      Listener_Elaboration          6       6
## 7   1    8 Discloser     Discloser_Elaboration          7       7
## 8   1    9  Listener  Listener_Acknowledgement          8       8
## 9   1   10 Discloser     Discloser_Elaboration          9       9
## 10  1   11  Listener  Listener_Acknowledgement         10      10

Some datasets may contain missing data (e.g., because of uncodable turns that were removed from analyses). To ensure that the original turn numbers and ordering are still maintained even with missing data, we add empty rows so there are consecutive time points for all IDs.

data2 <- # Select data
         data1 %>% 
         # Select grouping variable, in this case, dyad ID (id)
         group_by(id) %>%
         # Complete the sequence of values in "newturn" that range 
         # from the lowest value in "newturn" to 
         # the highest value in "newturn"
         complete(newturn = seq(min(newturn), max(newturn), 1L)) %>%
         # Save the data as a data.frame
         as.data.frame()

# View the first 10 rows of the data
head(data2, 10)
##    id newturn turn      role                 turn_type turn_minus
## 1   1       1    2 Discloser Discloser_Acknowledgement          1
## 2   1       2    3  Listener  Listener_Acknowledgement          2
## 3   1       3    4 Discloser          Discloser_Advice          3
## 4   1       4    5  Listener      Listener_Elaboration          4
## 5   1       5    6 Discloser     Discloser_Elaboration          5
## 6   1       6    7  Listener      Listener_Elaboration          6
## 7   1       7    8 Discloser     Discloser_Elaboration          7
## 8   1       8    9  Listener  Listener_Acknowledgement          8
## 9   1       9   10 Discloser     Discloser_Elaboration          9
## 10  1      10   11  Listener  Listener_Acknowledgement         10
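
Because complete() only adds rows where turn numbers are missing, nothing changes for dyads with fully consecutive turns. To see concretely what it does when a turn is missing, here is a minimal toy example (hypothetical data, not part of the tutorial data sets):

# Toy data: turn 3 is missing for this dyad (e.g., an uncodable turn was removed)
toy <- data.frame(id = c(1, 1, 1),
                  newturn = c(1, 2, 4),
                  turn_type = c("Discloser_Elaboration",
                                "Listener_Acknowledgement",
                                "Listener_Question"))

# complete() inserts a row with newturn = 3 and NA turn_type,
# preserving the original turn positions
toy %>%
  group_by(id) %>%
  complete(newturn = seq(min(newturn), max(newturn), 1L)) %>%
  as.data.frame()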

Finally, we need to create a data set that contains all the five-turn windows for each dyad that begin with a discloser turn. We create this data set in the loop below.

# Change the structure of the "id", "newturn", and "turn_type" variables
data2$id <- as.character(data2$id)
data2$newturn <- as.numeric(data2$newturn)
data2$turn_type <- as.character(data2$turn_type)

# Create a vector of all IDs in the data set that the loop will work through
data2_idlist <- unique(data2$id)

# Set the value of the window length 
# (1 less than the desired window length, because the loop below selects a 
# starting turn plus the `window` turns that follow it)
window <- 4

# Set the minimum length of a sequence 
# (not used directly in the loop below; partial windows toward the end of a 
# conversation contain NAs and are removed via na.omit())
min_length <- 5

# Create an empty data set called "window_data"
window_data <- NULL

# Start loop
# For each i (i.e., dyad) in the vector
for(i in 1:length(data2_idlist)){
  
  # Select i-th subject from the vector
  subject_id <- data2_idlist[i]
  
  # Subset i-th subject's data
  dat <- subset(data2, id == subject_id)
  
  # Create a vector that contains all of the turns in the i-th subject's data
  turn_list <- dat$newturn
  
  # Only keep odd-numbered turns from the vector (as Discloser turns are odd numbered)
  # Skip this step by inserting a # (pound sign) before turn_list, 
  # if you are working with indistinguishable dyads 
  turn_list <- turn_list[turn_list %% 2 != 0]
  
  # Create loop that selects a sub-sequence of turns starting with each turn in "turn_list"
  for(x in turn_list){
    
    # Select the x to x+window values in the "turn_type" variable and save to list "turntype"
    turntype <- dat[x:(x + window), "turn_type"]
    
    # Save the subject's ID variable to object "subject"
    subject <- unique(dat$id)
    
    # Combine the subject value and the turntype values into list called "new_row"
    new_row <- c(subject, turntype)
    
    # Add "new_row" to data set "window_data"
    window_data <- rbind(window_data, new_row)
    
    # Change the column names of "window_data"
    colnames(window_data)[1:6] <- c("id", "turn1", "turn2", "turn3", "turn4", "turn5")
    
  }
}

# Save "window_data" as data frame
window_data <- as.data.frame(window_data)

# Remove row names from data set
rownames(window_data) <- NULL

# If there is missing data in a row, then delete
window_data <- na.omit(window_data)

# View the first 10 rows of the data
head(window_data, 10)
##    id                      turn1                    turn2
## 1   1  Discloser_Acknowledgement Listener_Acknowledgement
## 2   1           Discloser_Advice     Listener_Elaboration
## 3   1      Discloser_Elaboration     Listener_Elaboration
## 4   1      Discloser_Elaboration Listener_Acknowledgement
## 5   1      Discloser_Elaboration Listener_Acknowledgement
## 6   1      Discloser_Elaboration Listener_Acknowledgement
## 7   1 Discloser_HedgedDisclosure Listener_Acknowledgement
## 16  1      Discloser_Elaboration Listener_Acknowledgement
## 17  1      Discloser_Elaboration Listener_Acknowledgement
## 18  1      Discloser_Elaboration Listener_Acknowledgement
##                         turn3                    turn4
## 1            Discloser_Advice     Listener_Elaboration
## 2       Discloser_Elaboration     Listener_Elaboration
## 3       Discloser_Elaboration Listener_Acknowledgement
## 4       Discloser_Elaboration Listener_Acknowledgement
## 5       Discloser_Elaboration Listener_Acknowledgement
## 6  Discloser_HedgedDisclosure Listener_Acknowledgement
## 7       Discloser_Elaboration Listener_Acknowledgement
## 16      Discloser_Elaboration Listener_Acknowledgement
## 17      Discloser_Elaboration Listener_Acknowledgement
## 18 Discloser_HedgedDisclosure Listener_Acknowledgement
##                         turn5
## 1       Discloser_Elaboration
## 2       Discloser_Elaboration
## 3       Discloser_Elaboration
## 4       Discloser_Elaboration
## 5  Discloser_HedgedDisclosure
## 6       Discloser_Elaboration
## 7       Discloser_Elaboration
## 16      Discloser_Elaboration
## 17 Discloser_HedgedDisclosure
## 18 Discloser_HedgedDisclosure

In the five-turn window data (“window_data”), the first column is the dyad ID variable (id) and columns two through six contain the turn type information for the five turns within the window.
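
As a quick check (a sketch), each dyad should contribute roughly (number of turns - 4) / 2 windows, because windows begin at each odd-numbered (discloser) turn and windows containing missing turns were dropped:

# Total number of complete five-turn windows, and the counts for the first few dyads
nrow(window_data)
head(table(window_data$id))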

Define Sequences.

Now that we know a little bit more about our repeated measures data and have re-formatted the data into five-turn sequences, we can move on to the data preparation required for sequence analysis.

The goal of this step is to:

  1. create an “alphabet” that represents each of our categories, and
  2. create and plot the categorical sequence data.

  1. Create alphabet.

We create an alphabet that represents each possible category within the categorical variable of interest (in this case, “turn_type”). The actual naming of these values is not important, but we are going to name them in a way that facilitates interpretation.

# This object contains the categories that appear in the data set.
turn_alphabet <- c("Listener_Question", "Listener_Acknowledgement", 
                   "Listener_Elaboration", "Listener_HedgedDisclosure", 
                   "Listener_Reflection", "Listener_Advice",
                   "Discloser_Elaboration", "Discloser_HedgedDisclosure", 
                   "Discloser_Question", "Discloser_Acknowledgement", 
                   "Discloser_Advice", "Discloser_Reflection")

# This object allows for more helpful labels 
turn_labels <- c("Listener_Question", "Listener_Acknowledgement", 
                 "Listener_Elaboration", "Listener_HedgedDisclosure", 
                 "Listener_Reflection", "Listener_Advice",
                 "Discloser_Elaboration", "Discloser_HedgedDisclosure", 
                 "Discloser_Question", "Discloser_Acknowledgement", 
                 "Discloser_Advice", "Discloser_Reflection")

  2. Create sequences.

Now that the data are in the correct (i.e., wide) format and we have created an alphabet for our sequence analysis, we next need to create a sequence object that can be understood by the R package for sequence analysis (TraMineR).

Before creating the sequences, we first assign colors to each of the categories, which will help us when viewing plots of the sequences. This step is not required since there is a default color palette, but this gives us control over what color is assigned to which category. We do this by assigning each category (i.e., turn type) a hex code (https://www.color-hex.com/). The categories should be written as they appear in the alphabet created above.

A note on accessibility: To make your plots accessible, you may consider adopting a colorblind-friendly palette. David Nichols’ website (https://davidmathlogic.com/colorblind/) provides a great explainer on this issue, as well as a color picking tool.

Listener_Acknowledgement <- "#619CFF"     # Blue
Listener_Advice <- "#FFE700"              # Yellow
Listener_Elaboration <- "#F8766D"         # Red
Listener_HedgedDisclosure <- "#FFA500"    # Orange
Listener_Question <- "#00BA38"            # Green
Listener_Reflection <- "#DB72FB"          # Purple

Discloser_Acknowledgement <- "#619CFF"    # Blue
Discloser_Advice <- "#FFE700"             # Yellow
Discloser_Elaboration <- "#F8766D"        # Red
Discloser_HedgedDisclosure <- "#FFA500"   # Orange
Discloser_Question <- "#00BA38"           # Green
Discloser_Reflection <- "#DB72FB"         # Purple

Next, we create an object (“turn_seq”) that contains all of the sequences in the format needed for the sequence analysis package.

turn_seq <- TraMineR::seqdef(window_data,             # Select data   
                      var = 2:6,                      # Columns containing repeated measures data
                      alphabet = turn_alphabet,       # Alphabet  
                      labels = turn_labels,           # Labels
                      xtstep = 5,                     # Steps between tick marks
                      cpal = c(Listener_Question, Listener_Acknowledgement, 
                               Listener_Elaboration, Listener_HedgedDisclosure, 
                               Listener_Reflection, Listener_Advice,
                               Discloser_Elaboration, Discloser_HedgedDisclosure, 
                               Discloser_Question, Discloser_Acknowledgement, 
                               Discloser_Advice, Discloser_Reflection))   # Color palette
##  [>] 12 distinct states appear in the data:
##      1 = Discloser_Acknowledgement
##      2 = Discloser_Advice
##      3 = Discloser_Elaboration
##      4 = Discloser_HedgedDisclosure
##      5 = Discloser_Question
##      6 = Discloser_Reflection
##      7 = Listener_Acknowledgement
##      8 = Listener_Advice
##      9 = Listener_Elaboration
##      10 = Listener_HedgedDisclosure
##      11 = Listener_Question
##      12 = Listener_Reflection
##  [>] state coding:
##        [alphabet]                 [label]                    [long label]
##      1  Listener_Question          Listener_Question          Listener_Question
##      2  Listener_Acknowledgement   Listener_Acknowledgement   Listener_Acknowledgement
##      3  Listener_Elaboration       Listener_Elaboration       Listener_Elaboration
##      4  Listener_HedgedDisclosure  Listener_HedgedDisclosure  Listener_HedgedDisclosure
##      5  Listener_Reflection        Listener_Reflection        Listener_Reflection
##      6  Listener_Advice            Listener_Advice            Listener_Advice
##      7  Discloser_Elaboration      Discloser_Elaboration      Discloser_Elaboration
##      8  Discloser_HedgedDisclosure Discloser_HedgedDisclosure Discloser_HedgedDisclosure
##      9  Discloser_Question         Discloser_Question         Discloser_Question
##      10  Discloser_Acknowledgement  Discloser_Acknowledgement  Discloser_Acknowledgement
##      11  Discloser_Advice           Discloser_Advice           Discloser_Advice
##      12  Discloser_Reflection       Discloser_Reflection       Discloser_Reflection
##  [>] 1721 sequences in the data set
##  [>] min/max sequence length: 5/5

A lot of text will appear after the sequence object is created. This text tells you about the number of sequences (which should be equal to the number of five-turn sequences in the sample), the states (i.e., categories) that appear in the sequence object, and the alphabet and labels of the categories.

Finally, we can plot the sequences.

seqIplot(turn_seq,                                             # Sequence object
         with.legend = "right",                                # Display legend on right side of plot
         cex.legend = 0.8,                                     # Change size of legend
         main = "Turn Type Use during a Conversational Motif", # Plot title
         legend.prop = .4)                                     # Proportion of space for legend

In this plot, each row represents a single five-turn conversational motif, and the turn types within that motif are shown as different colors. We can see that the conversations varied in content, although elaboration turns (in red) seem to appear frequently across all conversations.

Establish a Cost Matrix and Obtain Dissimilarity Matrix.

The goal of this step is to establish a cost matrix and to obtain a dissimilarity matrix.

Sequence analysis aims to identify groups of similar sequences by clustering sequences based on the distances between them. The distance between any pair of sequences is the minimum “cost” of transforming one sequence into the other, computed using an optimal matching algorithm. In the transformation, there are specific costs associated with inserting, deleting, and substituting elements of the sequence, as well as costs for substituting missing values. We set these costs when conducting the sequence analysis.

There are a number of ways to set substitution costs. Typically, substitution costs are established as the distance between categories. However, we do not have an ordinal scale for the categories (i.e., there is no logical order or distance between our turn types; e.g., what turn type is “closest” to acknowledgement?). In this case, we use a constant cost matrix (i.e., the cost of substituting any turn type for any other is the same). If we had a theoretical rationale for sorting turn types as more or less similar to each other, we could instead use Manhattan (city-block) or Euclidean distance.
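
As an aside, TraMineR can also derive data-driven substitution costs from observed transition rates, treating turn types that frequently follow one another as “closer.” A minimal sketch (not used in this tutorial) is:

# Alternative (not used here): substitution costs based on transition rates
costmatrix_trate <- seqsubm(turn_seq, method = "TRATE")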

We need to create a substitution cost matrix before conducting the sequence analysis. In general, when sequences contain missing values, the substitution cost matrix is a (k + 1) x (k + 1) matrix, with k = number of categories and an additional right-most column and bottom row to represent missingness costs (half of the highest cost; in this case, half of 2, i.e., 1). In our case, there are k = 12 categories (6 turn types x 2 roles). Because the five-turn windows analyzed here contain no missing values (windows with missing turns were dropped above), the resulting substitution cost matrix is 12 x 12, as the output below confirms.

Here, we establish our substitution cost matrix.

# Create substitution cost matrix and save to the object "costmatrix"
costmatrix <- seqsubm(turn_seq,             # Sequence object
                      method = "CONSTANT",  # Method to determine costs
                      cval = 2,             # Substitution cost
                      with.missing = TRUE,  # Allows for missingness state
                      miss.cost = 1,        # Cost for substituting a missing state
                      time.varying = FALSE, # Does not allow the cost to vary over time
                      weighted = TRUE)      # Allows weights to be used when applicable
##  [!!] seqcost: 'with.missing' set as FALSE as 'seqdata' has no non-void missing values
##  [>] creating 12x12 substitution-cost matrix using 2 as constant value
# Examine substitution cost matrix
costmatrix
##                            Listener_Question Listener_Acknowledgement
## Listener_Question                          0                        2
## Listener_Acknowledgement                   2                        0
## Listener_Elaboration                       2                        2
## Listener_HedgedDisclosure                  2                        2
## Listener_Reflection                        2                        2
## Listener_Advice                            2                        2
## Discloser_Elaboration                      2                        2
## Discloser_HedgedDisclosure                 2                        2
## Discloser_Question                         2                        2
## Discloser_Acknowledgement                  2                        2
## Discloser_Advice                           2                        2
## Discloser_Reflection                       2                        2
##                            Listener_Elaboration Listener_HedgedDisclosure
## Listener_Question                             2                         2
## Listener_Acknowledgement                      2                         2
## Listener_Elaboration                          0                         2
## Listener_HedgedDisclosure                     2                         0
## Listener_Reflection                           2                         2
## Listener_Advice                               2                         2
## Discloser_Elaboration                         2                         2
## Discloser_HedgedDisclosure                    2                         2
## Discloser_Question                            2                         2
## Discloser_Acknowledgement                     2                         2
## Discloser_Advice                              2                         2
## Discloser_Reflection                          2                         2
##                            Listener_Reflection Listener_Advice
## Listener_Question                            2               2
## Listener_Acknowledgement                     2               2
## Listener_Elaboration                         2               2
## Listener_HedgedDisclosure                    2               2
## Listener_Reflection                          0               2
## Listener_Advice                              2               0
## Discloser_Elaboration                        2               2
## Discloser_HedgedDisclosure                   2               2
## Discloser_Question                           2               2
## Discloser_Acknowledgement                    2               2
## Discloser_Advice                             2               2
## Discloser_Reflection                         2               2
##                            Discloser_Elaboration Discloser_HedgedDisclosure
## Listener_Question                              2                          2
## Listener_Acknowledgement                       2                          2
## Listener_Elaboration                           2                          2
## Listener_HedgedDisclosure                      2                          2
## Listener_Reflection                            2                          2
## Listener_Advice                                2                          2
## Discloser_Elaboration                          0                          2
## Discloser_HedgedDisclosure                     2                          0
## Discloser_Question                             2                          2
## Discloser_Acknowledgement                      2                          2
## Discloser_Advice                               2                          2
## Discloser_Reflection                           2                          2
##                            Discloser_Question Discloser_Acknowledgement
## Listener_Question                           2                         2
## Listener_Acknowledgement                    2                         2
## Listener_Elaboration                        2                         2
## Listener_HedgedDisclosure                   2                         2
## Listener_Reflection                         2                         2
## Listener_Advice                             2                         2
## Discloser_Elaboration                       2                         2
## Discloser_HedgedDisclosure                  2                         2
## Discloser_Question                          0                         2
## Discloser_Acknowledgement                   2                         0
## Discloser_Advice                            2                         2
## Discloser_Reflection                        2                         2
##                            Discloser_Advice Discloser_Reflection
## Listener_Question                         2                    2
## Listener_Acknowledgement                  2                    2
## Listener_Elaboration                      2                    2
## Listener_HedgedDisclosure                 2                    2
## Listener_Reflection                       2                    2
## Listener_Advice                           2                    2
## Discloser_Elaboration                     2                    2
## Discloser_HedgedDisclosure                2                    2
## Discloser_Question                        2                    2
## Discloser_Acknowledgement                 2                    2
## Discloser_Advice                          0                    2
## Discloser_Reflection                      2                    0

Now that we have created the substitution cost matrix, we can calculate the distances between each pair of five-turn sequences.

We use an optimal matching algorithm. The output of the sequence analysis is an n x n (n = number of five-turn sequences) dissimilarity matrix, in which the element in row i, column j indicates the minimal cost of transforming sequence i into sequence j. Insertion/deletion costs are typically set to 1.0, substitution costs are set to the matrix we established above, and missingness costs are typically set to half the highest cost within the matrix (and would be included in the substitution cost matrix when missing values are present).

Note: Other algorithms are available, and they can be specified in the method = “” argument below. To see other algorithms available in the TraMineR package, type ?seqdist in the console or type seqdist in the search bar at the top of the Help tab on the right.

# Obtain distance matrix 
dist_om <- seqdist(turn_seq,            # Sequence object
                   method = "OM",       # Optimal matching algorithm
                   indel = 1.0,         # Insert/deletion costs set to 1
                   sm = costmatrix,     # Substitution cost matrix
                   with.missing = TRUE)
##  [!!] seqdist: 'with.missing' set as FALSE as 'seqdata' has no non-void missing values
##  [>] 1721 sequences with 12 distinct states
##  [>] checking 'sm' (size and triangle inequality)
##  [>] 549 distinct  sequences
##  [>] min/max sequence lengths: 5/5
##  [>] computing distances using the OM metric
##  [>] elapsed time: 0.191 secs
# Examine the top left corner of the dissimilarity matrix
dist_om[1:10, 1:10]
##    1 2 3 4 5 6 7 16 17 18
## 1  0 4 6 6 6 6 6  6  6  8
## 2  4 0 4 6 6 6 6  6  6  8
## 3  6 4 0 2 4 4 4  2  4  6
## 4  6 6 2 0 2 2 2  0  2  4
## 5  6 6 4 2 0 4 4  2  0  2
## 6  6 6 4 2 4 0 4  2  4  2
## 7  6 6 4 2 4 4 0  2  4  6
## 16 6 6 2 0 2 2 2  0  2  4
## 17 6 6 4 2 0 4 4  2  0  2
## 18 8 8 6 4 2 2 6  4  2  0
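
To build intuition for these distances, consider a minimal toy example (hypothetical sequences over a three-state alphabet, not part of the tutorial data): two five-element sequences that differ at a single position are at distance 2 under these costs, because the cheapest transformation is one substitution (cost 2), which ties with one deletion plus one insertion (cost 1 + 1).

# Toy example: two sequences that differ only at position 3
toy <- data.frame(t1 = c("A", "A"), t2 = c("B", "B"), t3 = c("A", "C"),
                  t4 = c("B", "B"), t5 = c("A", "A"))
toy_seq <- seqdef(toy, var = 1:5, alphabet = c("A", "B", "C"))
toy_cost <- seqsubm(toy_seq, method = "CONSTANT", cval = 2)

# The resulting 2 x 2 dissimilarity matrix has 2 in the off-diagonal cells
seqdist(toy_seq, method = "OM", indel = 1, sm = toy_cost)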

Determine Number of Clusters - Conversational Motifs.

The goal of this step is to determine the number of clusters - i.e., a typology of conversational motifs - using a data-driven approach.

We next use the distance matrix obtained in the prior step to determine an appropriate number of clusters that represent the different conversational motifs within our supportive conversations. We use hierarchical cluster analysis with Ward’s method to determine the number of clusters that represent the data well. We then create an object that contains cluster membership for each five-turn sequence (which will be used in the final step) and plot the clusters.

Conduct hierarchical cluster analysis and save cluster analysis results to the object “clusterward”.

# Insert dissimilarity matrix ("dist_om"), 
# indicate that we are using a dissimilarity matrix, and
# indicate that we want to use Ward's clustering method
clusterward <- cluster::agnes(dist_om, diss = TRUE, method = "ward")

# Plot the results of the cluster analysis using a dendrogram
# Insert cluster analysis results object ("clusterward")
plot(clusterward, which.plot = 2)

In this example, the resulting dendrogram indicates three clusters. We reached this conclusion by examining the length of the vertical lines (longer vertical lines indicate greater differences between groups) and the number of sequences within each group (we didn’t want a group with too few sequences).
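
Beyond visual inspection of the dendrogram, cluster separation can also be checked more quantitatively. One option (a sketch, using the already-loaded cluster package) is to compare average silhouette widths across candidate solutions, where higher values indicate better-separated clusters:

# Compare average silhouette widths for candidate numbers of clusters
for (k in 2:5) {
  clk <- cutree(clusterward, k = k)                  # cut the tree at k clusters
  sil <- cluster::silhouette(clk, dmatrix = dist_om) # silhouette from the dissimilarity matrix
  cat("k =", k, "- average silhouette width =",
      round(mean(sil[, "sil_width"]), 3), "\n")
}

After selecting the 3-cluster solution, we plot the sequences of the three clusters for visual comparison.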

# Cut dendrogram (or tree) by the number of determined groups (in this case, 3)
# Insert cluster analysis results object ("clusterward") 
# and the number of cut points
cl3 <- cutree(clusterward, k = 3) 

# Turn cut points into a factor variable and label them
# Insert cut point object ("cl3") and create labels 
# by combining the text "Type" with either 1, 2, or 3
cl3fac <- factor(cl3, labels = paste("Type", 1:3)) 

# Plot the sequences for each cluster
seqplot(turn_seq,              # Sequence object
        group = cl3fac,        # Grouping factor level variable
        type = "I",            # Create whole sequence plot
        sortv = "from.start",  # Sort sequences based upon the category in which they begin
        with.legend = "right", # Display legend on right side of plot
        cex.legend = 0.8,      # Change size of legend
        border = NA)           # No plot border

Notable interpretations of the clusters include: (1) the listeners in the “Type 1” dyads use more elaboration turns, which fits with the listener-focused elaboration conversational motif; (2) the disclosers in the “Type 2” dyads spend most of their time elaborating on their problem, which fits with the discloser problem description conversational motif; and (3) the disclosers in the “Type 3” dyads use elaboration and hedged disclosure turns, which fits with the discloser problem processing conversational motif. These plots can sometimes be difficult to distinguish, so further descriptives (e.g., the percentage of each turn type within each cluster) can be helpful.
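
One such descriptive can be computed directly from the sequence object; a minimal sketch using TraMineR’s seqstatf() is:

# Frequency and percent of each turn type within each cluster
for (k in 1:3) {
  cat("\nType", k, "motif:\n")
  print(seqstatf(turn_seq[cl3 == k, ]))
}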

In the next (and final) step, we will examine the associations of the timing and prevalence of conversational motifs with the discloser’s emotional improvement following the conversation.

Examine Associations of Prevalence and Timing of Conversational Motifs with Outcomes.

The goal of the final step is to examine whether the prevalence and timing of conversational motifs are associated with the discloser’s post-conversation reports of emotional improvement.

This step involves (1) data management substeps that add the conversational motif cluster information into the sequence data and calculate the proportion of each conversational motif for each third of the conversation, and (2) fitting regression models that examine the association between the timing and prevalence of each conversational motif and the discloser’s emotional improvement.

Data Management.

We first add the cluster information back into the data set.

# Add grouping variable to data set
window_data$cluster <- cl3

We next create a variable that counts the window number within each dyad, which preserves the temporal order in which the conversational motifs occurred for later analyses.

window_data <- # Select data
               window_data %>%
               # Select grouping variable, in this case, dyad ID (id)
               group_by(id) %>%
               # Create new variable "windownumber" that counts from 1 to n (the last row)
               mutate(windownumber = 1:n()) %>%
               # Save the data as a data.frame
               as.data.frame()

# View the first 10 rows of the data
head(window_data, 10)
##    id                      turn1                    turn2
## 1   1  Discloser_Acknowledgement Listener_Acknowledgement
## 2   1           Discloser_Advice     Listener_Elaboration
## 3   1      Discloser_Elaboration     Listener_Elaboration
## 4   1      Discloser_Elaboration Listener_Acknowledgement
## 5   1      Discloser_Elaboration Listener_Acknowledgement
## 6   1      Discloser_Elaboration Listener_Acknowledgement
## 7   1 Discloser_HedgedDisclosure Listener_Acknowledgement
## 8   1      Discloser_Elaboration Listener_Acknowledgement
## 9   1      Discloser_Elaboration Listener_Acknowledgement
## 10  1      Discloser_Elaboration Listener_Acknowledgement
##                         turn3                    turn4
## 1            Discloser_Advice     Listener_Elaboration
## 2       Discloser_Elaboration     Listener_Elaboration
## 3       Discloser_Elaboration Listener_Acknowledgement
## 4       Discloser_Elaboration Listener_Acknowledgement
## 5       Discloser_Elaboration Listener_Acknowledgement
## 6  Discloser_HedgedDisclosure Listener_Acknowledgement
## 7       Discloser_Elaboration Listener_Acknowledgement
## 8       Discloser_Elaboration Listener_Acknowledgement
## 9       Discloser_Elaboration Listener_Acknowledgement
## 10 Discloser_HedgedDisclosure Listener_Acknowledgement
##                         turn5 cluster windownumber
## 1       Discloser_Elaboration       1            1
## 2       Discloser_Elaboration       1            2
## 3       Discloser_Elaboration       2            3
## 4       Discloser_Elaboration       3            4
## 5  Discloser_HedgedDisclosure       3            5
## 6       Discloser_Elaboration       3            6
## 7       Discloser_Elaboration       3            7
## 8       Discloser_Elaboration       3            8
## 9  Discloser_HedgedDisclosure       3            9
## 10 Discloser_HedgedDisclosure       3           10

We then need to calculate the time at which each conversational motif occurred within a conversation. We do so by expressing time as the proportion of the conversation completed at the point the motif occurred, with 0 representing the beginning of the conversation and 1 representing the end of the conversation.

We first create a new data set that records, for each conversational motif, the proportion of the conversation at which it occurred.

window_data2 <- # Select data
                window_data %>%
                # Select variables of interest: dyad ID, conversational motif cluster, window number
                select(id, cluster, windownumber) %>%
                # Select grouping variable, in this case, dyad ID (id)
                group_by(id) %>%
                # Create new variables
                # The "max_turn" variable determines how many motif windows are in the conversation
                # The "time_prop" variable is calculated by dividing the window number by the
                # total number of motif windows in the conversation
                # The "time_prop" value is then rounded to 3 decimal places
                mutate(max_turn = max(windownumber),
                       time_prop = windownumber/max_turn,
                       time_prop = round(time_prop, digits = 3)) %>%
                # Save the data as a data.frame
                as.data.frame()

# View the first 10 rows of the data
head(window_data2, 10)
##    id cluster windownumber max_turn time_prop
## 1   1       1            1       37     0.027
## 2   1       1            2       37     0.054
## 3   1       2            3       37     0.081
## 4   1       3            4       37     0.108
## 5   1       3            5       37     0.135
## 6   1       3            6       37     0.162
## 7   1       3            7       37     0.189
## 8   1       3            8       37     0.216
## 9   1       3            9       37     0.243
## 10  1       3           10       37     0.270

We then create a data set that contains rows representing 1000 time points in the conversation, since we’ve rounded the proportion of time to the third decimal place.

# Create vector of id values 
window_data_id <- unique(window_data2$id)

# Determine the number of unique IDs
length(window_data_id) # 53 dyads - which matches what we determined earlier in the tutorial
## [1] 53
# Create a sequence of 1000 evenly spaced values between 0 and 1 (approximately .001 apart)
turns_seq <- seq(from = 0, to = 1, length = 1000)

# Create data set with two variables:
# id variable that is repeated 1000 times for each id
# time_prop variable that repeats the "turns_seq" sequence for each id (i.e., repeat it 53 times)
turns_seq_data <- data.frame(id = rep(window_data_id, each = 1000), 
                             time_prop = rep(turns_seq, 53))

# Round value in "time_prop" to 3 decimal places
turns_seq_data$time_prop <- round(turns_seq_data$time_prop, digits = 3)

# View the first 10 rows of the data
head(turns_seq_data, 10)
##    id time_prop
## 1   1     0.000
## 2   1     0.001
## 3   1     0.002
## 4   1     0.003
## 5   1     0.004
## 6   1     0.005
## 7   1     0.006
## 8   1     0.007
## 9   1     0.008
## 10  1     0.009

Merge the conversational motif data (“window_data2”) with the 1000-step time data (“turns_seq_data”).

window_data3 <- merge(turns_seq_data, # Select 1000-step time data
                      window_data2,   # Select conversational motif data
                      by = c("id", "time_prop"), # Merge on the ID and time variables
                      all.x = TRUE)              # Keep all rows of the 1000-step time data

# View the first 10 rows of the data
head(window_data3, 10)
##    id time_prop cluster windownumber max_turn
## 1   1     0.000      NA           NA       NA
## 2   1     0.001      NA           NA       NA
## 3   1     0.002      NA           NA       NA
## 4   1     0.003      NA           NA       NA
## 5   1     0.004      NA           NA       NA
## 6   1     0.005      NA           NA       NA
## 7   1     0.006      NA           NA       NA
## 8   1     0.007      NA           NA       NA
## 9   1     0.008      NA           NA       NA
## 10  1     0.009      NA           NA       NA

Finally, we need to fill in the conversational motif cluster information in the “cluster” column.

# Fill in NA values for cluster 
window_data3 <- # Select data
                window_data3 %>% 
                # Select grouping variable, in this case, dyad ID (id)
                group_by(id) %>% 
                # Fill NAs in the cluster column upward, so that NAs preceding 
                # a value are filled in with that value
                fill(cluster, .direction = "up") %>%
                # Save the data as a data.frame
                as.data.frame()

# View the first 10 rows of the data
head(window_data3, 10)
##    id time_prop cluster windownumber max_turn
## 1   1     0.000       1           NA       NA
## 2   1     0.001       1           NA       NA
## 3   1     0.002       1           NA       NA
## 4   1     0.003       1           NA       NA
## 5   1     0.004       1           NA       NA
## 6   1     0.005       1           NA       NA
## 7   1     0.006       1           NA       NA
## 8   1     0.007       1           NA       NA
## 9   1     0.008       1           NA       NA
## 10  1     0.009       1           NA       NA

Now that we have a more precise measure of time in our data, we next need to divide the conversations into phases. Here, we divide the conversations into three phases (first, middle, and final) and add a variable indicating the phase of the conversation to the data.

# Create labels that will be repeated for each ID
# Since we are dividing the conversations (as measured in 1000 steps) into thirds,
# we repeat each value either 334 or 333 times
convo_part <- c(rep("first", 334), rep("middle", 333), rep("final", 333))

# Add variable to data set that contains phase information for each dyad
window_data3$convo_part <- rep(convo_part, 53)

# View the first 10 rows of the data
head(window_data3, 10)
##    id time_prop cluster windownumber max_turn convo_part
## 1   1     0.000       1           NA       NA      first
## 2   1     0.001       1           NA       NA      first
## 3   1     0.002       1           NA       NA      first
## 4   1     0.003       1           NA       NA      first
## 5   1     0.004       1           NA       NA      first
## 6   1     0.005       1           NA       NA      first
## 7   1     0.006       1           NA       NA      first
## 8   1     0.007       1           NA       NA      first
## 9   1     0.008       1           NA       NA      first
## 10  1     0.009       1           NA       NA      first
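
An alternative way to derive the phase labels (a sketch) is to cut the time proportion directly, which does not depend on each dyad having exactly 1000 ordered rows; note that it may disagree with the rep() approach by a row at the phase boundaries because of rounding.

# Alternative: derive the phase from time_prop itself
window_data3$convo_part_alt <- cut(window_data3$time_prop,
                                   breaks = c(-Inf, 1/3, 2/3, Inf),
                                   labels = c("first", "middle", "final"))

# Check agreement between the two approaches
# table(window_data3$convo_part, window_data3$convo_part_alt)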

As our last data preparation step, we calculate the proportion of each conversational motif type for each phase of the conversation.

window_data_prop <- # Select data
                    window_data3 %>%
                    # Select grouping variable, in this case, dyad ID (id), phase of the conversation,
                    # and then motif type
                    group_by(id, convo_part, cluster) %>%
                    # Calculate the percentage of each motif type
                    # Divide by 333 because that is (approximately) the total number of instances
                    # in each phase
                    summarise(percentage = n()/333) %>%
                    # Create new variable that connects the cluster value 
                    # with the phase of the conversation
                    mutate(cluster_time = paste0(cluster, "_", convo_part)) %>% 
                    # Save the data as a data.frame
                    as.data.frame()
## `summarise()` has grouped output by 'id', 'convo_part'. You can override using
## the `.groups` argument.
# View the first 10 rows of the data
head(window_data_prop, 10)
##    id convo_part cluster percentage cluster_time
## 1   1      final       1 0.24324324      1_final
## 2   1      final       2 0.35135135      2_final
## 3   1      final       3 0.40540541      3_final
## 4   1      first       1 0.16516517      1_first
## 5   1      first       2 0.08108108      2_first
## 6   1      first       3 0.75675676      3_first
## 7   1     middle       2 0.13513514     2_middle
## 8   1     middle       3 0.86486486     3_middle
## 9  10      final       1 0.15915916      1_final
## 10 10      final       2 0.60060060      2_final

We then reshape the data from long to wide so that each column represents the proportion of each motif type at each phase of the conversation.

window_data_third_prop <- reshape(# Select variables of interest
                                  data = window_data_prop[, c("id", "percentage", "cluster_time")], 
                                  # Select variable that will now represent the columns
                                  timevar = c("cluster_time"),        
                                  # Select the ID variable
                                  idvar = c("id"),   
                                  # Select the variable that will be the values in the columns
                                  v.names = "percentage", 
                                  # Reshape from long to wide
                                  direction = "wide", 
                                  # Separate words in column names with _
                                  sep = "_")

# Replace NAs with 0s for proportion variables
window_data_third_prop[, 2:10][is.na(window_data_third_prop[, 2:10])] <- 0

# View the first 10 rows of the data
head(window_data_third_prop, 10)
##    id percentage_1_final percentage_2_final percentage_3_final
## 1   1         0.24324324          0.3513514          0.4054054
## 9  10         0.15915916          0.6006006          0.2402402
## 17 11         0.00000000          0.6396396          0.3603604
## 24 12         0.06906907          0.4444444          0.4864865
## 32 13         0.00000000          0.8888889          0.1111111
## 40 14         0.42942943          0.1231231          0.4474474
## 49 15         0.00000000          0.6006006          0.3993994
## 56 16         0.06906907          0.7927928          0.1381381
## 65 17         0.20720721          0.5165165          0.2762763
## 74 18         0.09909910          0.3003003          0.6006006
##    percentage_1_first percentage_2_first percentage_3_first percentage_2_middle
## 1          0.16516517         0.08108108          0.7567568           0.1351351
## 9          0.00000000         0.75975976          0.2432432           0.4384384
## 17         0.06306306         0.39939940          0.5405405           0.6426426
## 24         0.14114114         0.55855856          0.3033033           0.6066066
## 32         0.11111111         0.11111111          0.7807808           0.3333333
## 40         0.42642643         0.20720721          0.3693694           0.3423423
## 49         0.00000000         0.30330330          0.6996997           0.4954955
## 56         0.06906907         0.52252252          0.4114114           0.7297297
## 65         0.06906907         0.62462462          0.3093093           0.3093093
## 74         0.00000000         0.70270270          0.3003003           0.3513514
##    percentage_3_middle percentage_1_middle
## 1            0.8648649          0.00000000
## 9            0.2402402          0.32132132
## 17           0.3573574          0.00000000
## 24           0.3933934          0.00000000
## 32           0.3333333          0.33333333
## 40           0.3513514          0.30630631
## 49           0.3063063          0.19819820
## 56           0.2042042          0.06606607
## 65           0.3453453          0.34534535
## 74           0.4474474          0.20120120

Merge the outcome variable of interest with the proportion data.

window_data_third_prop <- merge(window_data_third_prop, outcomes, by = "id")

# View the first 10 rows of the data
head(window_data_third_prop, 10)
##    id percentage_1_final percentage_2_final percentage_3_final
## 1   1         0.24324324          0.3513514          0.4054054
## 2  10         0.15915916          0.6006006          0.2402402
## 3  11         0.00000000          0.6396396          0.3603604
## 4  12         0.06906907          0.4444444          0.4864865
## 5  13         0.00000000          0.8888889          0.1111111
## 6  14         0.42942943          0.1231231          0.4474474
## 7  15         0.00000000          0.6006006          0.3993994
## 8  16         0.06906907          0.7927928          0.1381381
## 9  17         0.20720721          0.5165165          0.2762763
## 10 18         0.09909910          0.3003003          0.6006006
##    percentage_1_first percentage_2_first percentage_3_first percentage_2_middle
## 1          0.16516517         0.08108108          0.7567568           0.1351351
## 2          0.00000000         0.75975976          0.2432432           0.4384384
## 3          0.06306306         0.39939940          0.5405405           0.6426426
## 4          0.14114114         0.55855856          0.3033033           0.6066066
## 5          0.11111111         0.11111111          0.7807808           0.3333333
## 6          0.42642643         0.20720721          0.3693694           0.3423423
## 7          0.00000000         0.30330330          0.6996997           0.4954955
## 8          0.06906907         0.52252252          0.4114114           0.7297297
## 9          0.06906907         0.62462462          0.3093093           0.3093093
## 10         0.00000000         0.70270270          0.3003003           0.3513514
##    percentage_3_middle percentage_1_middle emo_improve
## 1            0.8648649          0.00000000    6.000000
## 2            0.2402402          0.32132132    7.000000
## 3            0.3573574          0.00000000    3.000000
## 4            0.3933934          0.00000000    4.000000
## 5            0.3333333          0.33333333    5.000000
## 6            0.3513514          0.30630631    5.333333
## 7            0.3063063          0.19819820    6.666667
## 8            0.2042042          0.06606607    7.000000
## 9            0.3453453          0.34534535    4.000000
## 10           0.4474474          0.20120120    5.000000
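
Before moving on, a quick sanity check may be useful (this check is our addition and was not part of the original output): merge() performs an inner join by default, so any dyad missing outcome data would be dropped silently.

# Optional check: confirm the merge retained all 53 dyads
nrow(window_data_third_prop)                     # expect 53
sum(is.na(window_data_third_prop$emo_improve))   # expect 0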

The data are finally ready for regression analyses!

Data Analysis.

We examine whether the timing and prevalence of each conversational motif are associated with the discloser’s emotional improvement, fitting a separate regression model for each type of conversational motif.
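
Because the three regressions share the same structure, they can also be fit in a single loop. Here is a minimal sketch (an optional alternative, not part of the original analysis) using base R’s reformulate() to build each formula; it assumes the merged window_data_third_prop data frame created above.

# Optional: fit all three motif-specific regressions in one loop
motif_models <- lapply(1:3, function(k) {
  # Builds, e.g., emo_improve ~ percentage_1_first + percentage_1_middle + percentage_1_final
  predictors <- paste0("percentage_", k, c("_first", "_middle", "_final"))
  lm(reformulate(predictors, response = "emo_improve"),
     data = window_data_third_prop)
})
# summary(motif_models[[1]]) reproduces the Type 1 model fit below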

Type 1: Listener-focused elaboration.

type1_reg <- lm(emo_improve ~ percentage_1_first + percentage_1_middle + 
                              percentage_1_final, 
                data = window_data_third_prop)
summary(type1_reg)
## 
## Call:
## lm(formula = emo_improve ~ percentage_1_first + percentage_1_middle + 
##     percentage_1_final, data = window_data_third_prop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3880 -0.7447  0.0100  0.9627  2.0165 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.99002    0.28341  17.607   <2e-16 ***
## percentage_1_first   0.23645    1.44401   0.164    0.871    
## percentage_1_middle -0.02926    1.17913  -0.025    0.980    
## percentage_1_final   2.20076    1.37425   1.601    0.116    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.353 on 49 degrees of freedom
## Multiple R-squared:  0.06725,    Adjusted R-squared:  0.01014 
## F-statistic: 1.178 on 3 and 49 DF,  p-value: 0.3279

The timing and prevalence of the listener-focused elaboration motif were not associated with the discloser’s emotional improvement following the supportive conversation.
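
If you would like interval estimates to accompany these significance tests (not shown in the original tutorial), base R’s confint() can be applied directly to the fitted model object.

# Optional: 95% confidence intervals for the Type 1 model coefficients
confint(type1_reg, level = 0.95)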

Type 2: Discloser problem description.

type2_reg <- lm(emo_improve ~ percentage_2_first + percentage_2_middle + 
                              percentage_2_final, 
                data = window_data_third_prop)
summary(type2_reg)
## 
## Call:
## lm(formula = emo_improve ~ percentage_2_first + percentage_2_middle + 
##     percentage_2_final, data = window_data_third_prop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5974 -0.7426 -0.0661  1.2742  2.1831 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          5.74145    0.46740  12.284   <2e-16 ***
## percentage_2_first  -0.85943    1.00624  -0.854    0.397    
## percentage_2_middle -0.26380    0.99979  -0.264    0.793    
## percentage_2_final   0.07066    0.96776   0.073    0.942    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.38 on 49 degrees of freedom
## Multiple R-squared:  0.02945,    Adjusted R-squared:  -0.02997 
## F-statistic: 0.4957 on 3 and 49 DF,  p-value: 0.687

The timing and prevalence of the discloser problem description motif were not associated with the discloser’s emotional improvement following the supportive conversation.

Type 3: Discloser problem processing.

type3_reg <- lm(emo_improve ~ percentage_3_first + percentage_3_middle + 
                              percentage_3_final, 
                data = window_data_third_prop)
summary(type3_reg)
## 
## Call:
## lm(formula = emo_improve ~ percentage_3_first + percentage_3_middle + 
##     percentage_3_final, data = window_data_third_prop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4991 -0.8959  0.0432  1.2095  2.1759 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           5.2058     0.4872  10.684 2.14e-14 ***
## percentage_3_first    0.8396     0.9865   0.851    0.399    
## percentage_3_middle   0.7607     1.1117   0.684    0.497    
## percentage_3_final   -1.4248     1.0773  -1.323    0.192    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.37 on 49 degrees of freedom
## Multiple R-squared:  0.04418,    Adjusted R-squared:  -0.01434 
## F-statistic: 0.7549 on 3 and 49 DF,  p-value: 0.5249

The timing and prevalence of the discloser problem processing motif were not associated with the discloser’s emotional improvement following the supportive conversation.

In sum, in this example, neither the prevalence nor the timing of any of the three conversational motifs was associated with the discloser’s emotional improvement.
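
For a compact side-by-side view of the three models (again, an optional addition to the workflow shown above), the coefficient tables can be collected with base R alone.

# Optional: collect the coefficient tables from all three fitted models
all_models <- list(type1 = type1_reg, type2 = type2_reg, type3 = type3_reg)
lapply(all_models, function(m) round(coef(summary(m)), 3))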

Conclusion.

In this tutorial, we demonstrated how to use sequence analysis to identify five-turn sequences - conversational motifs - in supportive conversations. We then examined whether the timing and prevalence of those conversational motifs were related to the discloser’s emotional improvement following the supportive conversation.

We are excited about the potential for sequence analysis to contribute to the study of interpersonal dynamics across a variety of relationship types and interaction episodes.


Additional Information

We created this tutorial with a system environment and versions of R and packages that may differ from yours. If R reports errors when you attempt to run this tutorial, run the code chunk below and compare your output with the session information shown here (and in the tutorial posted on the LHAMA website) to identify version mismatches.
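
Note that session_info() is re-exported by the devtools package (from the sessioninfo package), which was attached earlier in the tutorial. If R cannot find the function, a minimal fallback, assuming you are able to install packages, is:

# Install and attach devtools if session_info() is not available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
library(devtools)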

session_info(pkgs = c("attached"))
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       macOS Big Sur/Monterey 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2022-08-20
##  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package        * version date (UTC) lib source
##  cluster        * 2.1.3   2022-03-28 [1] CRAN (R 4.2.0)
##  devtools       * 2.4.3   2021-11-30 [1] CRAN (R 4.2.0)
##  dplyr          * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
##  ggplot2        * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
##  psych          * 2.2.5   2022-05-10 [1] CRAN (R 4.2.0)
##  reshape        * 0.8.9   2022-04-12 [1] CRAN (R 4.2.0)
##  stringr        * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
##  tidyr          * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)
##  TraMineR       * 2.2-4   2022-06-09 [1] CRAN (R 4.2.0)
##  TraMineRextras * 0.6.4   2022-06-13 [1] CRAN (R 4.2.0)
##  usethis        * 2.1.6   2022-05-25 [1] CRAN (R 4.2.0)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## 
## ──────────────────────────────────────────────────────────────────────────────