Grid-sequence Analysis Tutorial

Overview

Grid-sequence analysis is a descriptive technique that captures within-dyad dynamics and allows for between-dyad comparisons. This analytic technique draws from state space grids, typically used in developmental psychology (Hollenstein, 2013), and from sequence analysis, previously used in sociology and biology (Macindoe & Abbott, 2004).

Grid-sequence analysis combines state space grids and sequence analysis by:

  1. Tracking a dyad’s movements across a grid.
  2. Converting dyadic movements into a univariate sequence.
  3. Clustering dyads with similar movements using sequence analysis.

In this example, we will examine data simulated from couples’ self-reported happiness collected six times a day for seven days using ecological momentary assessment. In this case, we have simulated 98 couples’ data with 42 time points for each member of the couple. We are interested in whether couples’ day-to-day emotion dynamics (specifically, happiness) are related to either couple members’ relationship satisfaction or health (which have also been simulated).

Although the current example highlights how grid-sequence analysis can identify differences in variability across dyads, one particular advantage of grid-sequence analysis is identifying how use of the state space grid can change over time (e.g., certain states or patterns appearing early in the sequence and other states or patterns appearing later in the sequence).

In addition, the accompanying “GridSequenceAnalysis_Tutorial_2022August20.rmd” file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R).

To read more about grid-sequence analysis see:

Brinberg, M., Fosco, G.M., & Ram, N. (2017). Examining inter-family differences in intra-family (parent-adolescent) dynamics using grid-sequence analysis. Journal of Family Psychology, 31(8), 994-1004. doi: 10.1037/fam0000371

Brinberg, M., Ram, N., Hülür, G., Brick, T.R., & Gerstorf, D. (2018). Analyzing dyadic data using grid-sequence analysis: Inter-dyad differences in intra-dyad dynamics. Journals of Gerontology: Psychological Sciences, 73(1), 5-18. doi: 10.1093/geronb/gbw160

Outline

In this tutorial, we’ll cover…

  • Reading in the data and loading needed packages.
  • Creating state space grids for each dyad.
  • Creating sequences.
  • Establishing a cost matrix and conducting sequence analysis.
  • Determining the number of clusters.
  • Examining group differences among clusters.
  • Cautions.
  • Conclusions.

Read in the data and load needed packages.

Let’s read the data into R.

We are working with two data sets in this tutorial. One data set contains repeated measures (“gridsequence_simulation_data”), specifically the simulated reports of happiness for each dyad member. The other data set contains person-level data (“gridsequence_simulation_persons”), specifically the simulated reports of relationship satisfaction and health for each dyad member.

Both data sets are stored as .csv files (comma-separated values file, which can be created by saving an Excel file as a csv document) on my computer’s desktop.

# Set working directory (i.e., where your data file is stored)
# This can be done by going to the top bar of RStudio and selecting 
# "Session" --> "Set Working Directory" --> "Choose Directory" --> 
# finding the location of your file
setwd("~/Desktop") # Note: You can skip this line if you have 
#the data files and this .rmd file stored in the same directory

# Read in the repeated measures data
data <- read.csv(file = "gridsequence_simulation_data.csv", header = TRUE, sep = ",")

# View the first 10 rows of the repeated measures data
head(data, 10)
##    id time  outcome1  outcome2
## 1   1    1  88.27175 100.00000
## 2   1    2  93.70530  88.49227
## 3   1    3  97.49972  91.96082
## 4   1    4 100.00000 100.00000
## 5   1    5  96.65027  84.73661
## 6   1    6  96.44185 100.00000
## 7   1    7  96.67286 100.00000
## 8   1    8  94.50198 100.00000
## 9   1    9  97.27314 100.00000
## 10  1   10  93.99552  87.85946
# Read in the person-level data
persons <- read.csv(file = "gridsequence_simulation_persons.csv", header = TRUE, sep = ",")

# View the first 10 rows of the persons data
head(persons, 10)
##    id member  rel_sat   health
## 1   1      1 1.388314 2.747397
## 2   1      2 1.895276 2.032986
## 3   2      1 1.074569 1.548543
## 4   2      2 1.022392 1.843375
## 5   3      1 1.052483 1.883714
## 6   3      2 2.006875 3.357317
## 7   4      1 1.000000 1.699980
## 8   4      2 1.000000 2.927734
## 9   5      1 1.575167 2.004248
## 10  5      2 1.000000 1.242259

In the repeated measures data (“data”), we can see that each row contains information for one time point, with both dyad members’ reports in that row, and that there are multiple rows (time points) for each dyad. In this data set, there are columns for:

  • Dyad ID (id)
  • Time variable - in this case, a momentary assessment (time)
  • Repeated measures variable for one dyad member - in this case, simulated self-reports of happiness from dyad member 1 (outcome1)
  • Repeated measures variable for the other dyad member - in this case, simulated self-reports of happiness from dyad member 2 (outcome2)

In the person-level data (“persons”), we can see there is one row for each dyad member and there are columns for:

  • Dyad ID (id)
  • Dyad member (member)
  • Person-level variables - in this case, simulated dyad members’ relationship satisfaction (rel_sat) and health status (health)

Load the R packages we need.

Packages in R are a collection of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model.

# install.packages("cluster")         # Install package if you have never used it before
library(cluster)                      # For cluster analysis

# install.packages("devtools")        # Install package if you have never used it before
library(devtools)                     # For session and package version information (session_info)

# install.packages("ggplot2")         # Install package if you have never used it before
library(ggplot2)                      # For plotting

# install.packages("reshape")         # Install package if you have never used it before
library(reshape)                      # For reshaping the data (long to wide)

# install.packages("TraMineR")        # Install package if you have never used it before
library(TraMineR)                     # For sequence analysis

# install.packages("TraMineRextras")  # Install package if you have never used it before
library(TraMineRextras)               # For sequence analysis

Create State Space Grids for Each Dyad.

The first step in grid-sequence analysis is to establish the state space grid that will collapse the dyads’ bivariate time series into a univariate categorical time series.

We create the state space grid by setting up cut points. In this example, we create a 4 x 4 (16 cell) grid, and use three cut points (at intervals of 25 because our measure of happiness is on a 0-100 scale). Our cut points are only for the sake of demonstration, but these divisions can vary depending on your research question or underlying theory.
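
As a quick aside, the cut points need not be fixed in advance. The sketch below shows how data-driven cut points (here, quartiles of happiness pooled across both partners) could be obtained instead; we proceed with the fixed cut points of 25, 50, and 75.

# Optional sketch: empirically derived cut points (quartiles of happiness
# pooled across both partners) as an alternative to the fixed 25/50/75 used below
quantile(c(data$outcome1, data$outcome2), probs = c(.25, .50, .75), na.rm = TRUE)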

# Create cut points as standalone objects (used later when plotting)
# and as new variables within the repeated measures data set ("data")
cut1 <- 25
cut2 <- 50
cut3 <- 75
data$cut1 <- cut1
data$cut2 <- cut2
data$cut3 <- cut3

Next, we use these cut points to label the cells on the grid with letters of the English alphabet and create a new variable that contains these letters (labeled as “happy” in our data set “data”). The lettering of the cells is a simple way to create categories that represent information from both dyad members in a univariate space. This new variable will be used in the subsequent sequence analysis.

To label the state space grid, we set the x-axis to be dyad member 1 (who is associated with “outcome1”) and the y-axis to be dyad member 2 (who is associated with “outcome2”). In this case, we label the upper-left hand corner with “A” and the lower-right hand corner with “P,” filling in alphabetically along the way.

# Create new variable "happy" that will contain cell labels
data$happy <- NA

# Letter each cell of the grid

# Set the value of "happy" to "A" when the value of "outcome1" is less than cut-point 1 and when the value of "outcome2" is greater than cut-point 3
# Similar logic follows for the labeling of all the other cells below
data$happy[data$outcome1 <= data$cut1 & 
           data$outcome2 > data$cut3] <- "A"

data$happy[data$outcome1 > data$cut1 & 
           data$outcome1 <= data$cut2 & 
           data$outcome2 > data$cut3] <- "B"

data$happy[data$outcome1 > data$cut2 & 
           data$outcome1 <= data$cut3 & 
           data$outcome2 > data$cut3] <- "C"

data$happy[data$outcome1 > data$cut3 & 
           data$outcome2 > data$cut3] <- "D" 

data$happy[data$outcome1 <= data$cut1 & 
           data$outcome2 > data$cut2 & 
           data$outcome2 <= data$cut3] <- "E"

data$happy[data$outcome1 > data$cut1 & 
           data$outcome1 <= data$cut2 & 
           data$outcome2 > data$cut2 & 
           data$outcome2 <= data$cut3] <- "F"

data$happy[data$outcome1 > data$cut2 & 
           data$outcome1 <= data$cut3 & 
           data$outcome2 > data$cut2 & 
           data$outcome2 <= data$cut3] <- "G"

data$happy[data$outcome1 > data$cut3 & 
           data$outcome2 > data$cut2 & 
           data$outcome2 <= data$cut3] <- "H"

data$happy[data$outcome1 <= data$cut1 & 
           data$outcome2 > data$cut1 & 
           data$outcome2 <= data$cut2] <- "I"

data$happy[data$outcome1 > data$cut1 & 
           data$outcome1 <= data$cut2 & 
           data$outcome2 > data$cut1 & 
           data$outcome2 <= data$cut2] <- "J"

data$happy[data$outcome1 > data$cut2 & 
           data$outcome1 <= data$cut3 & 
           data$outcome2 > data$cut1 & 
           data$outcome2 <= data$cut2] <- "K"

data$happy[data$outcome1 > data$cut3 & 
           data$outcome2 > data$cut1 & 
           data$outcome2 <= data$cut2] <- "L"

data$happy[data$outcome1 <= data$cut1 & 
           data$outcome2 <= data$cut1] <- "M"

data$happy[data$outcome1 > data$cut1 & 
           data$outcome1 <= data$cut2 & 
           data$outcome2 <= data$cut1] <- "N"

data$happy[data$outcome1 > data$cut2 & 
           data$outcome1 <= data$cut3 & 
           data$outcome2 <= data$cut1] <- "O"

data$happy[data$outcome1 > data$cut3 & 
           data$outcome2 <= data$cut1] <- "P"

# View the first 10 rows of the data
head(data, 10)
##    id time  outcome1  outcome2 cut1 cut2 cut3 happy
## 1   1    1  88.27175 100.00000   25   50   75     D
## 2   1    2  93.70530  88.49227   25   50   75     D
## 3   1    3  97.49972  91.96082   25   50   75     D
## 4   1    4 100.00000 100.00000   25   50   75     D
## 5   1    5  96.65027  84.73661   25   50   75     D
## 6   1    6  96.44185 100.00000   25   50   75     D
## 7   1    7  96.67286 100.00000   25   50   75     D
## 8   1    8  94.50198 100.00000   25   50   75     D
## 9   1    9  97.27314 100.00000   25   50   75     D
## 10  1   10  93.99552  87.85946   25   50   75     D
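
As an aside, the same letter labels can also be generated more compactly. The sketch below assumes the 0-100 scale, the 25/50/75 cut points, and the A-through-P lettering described above; the variable “happy_alt” is just a hypothetical name used for checking.

# Compact alternative (sketch) to the 16 assignments above:
# bin each partner's score into 4 intervals (1 = lowest, 4 = highest)
x_bin <- cut(data$outcome1, breaks = c(-Inf, 25, 50, 75, Inf), labels = FALSE)
y_bin <- cut(data$outcome2, breaks = c(-Inf, 25, 50, 75, Inf), labels = FALSE)

# Letters run A-P from the upper-left corner across each row of the grid,
# so the letter index is (rows from the top - 1) * 4 + (column from the left)
data$happy_alt <- LETTERS[(4 - y_bin) * 4 + x_bin]

# Check agreement with the labels created above
# table(data$happy, data$happy_alt, useNA = "ifany")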

It is useful to plot each dyad’s movements across the state space grid to obtain a greater understanding of the data. Here are plots of two different dyads (one with low variability and one with high variability).

Example Dyad 1: Low variability.

Example Dyad 2: High variability.

We will create a loop that plots each dyad in the data set and saves these plots to a PDF.

First, identify the location where you would like to save the PDF and save this location as the object “dir”.

# This can be done by going to the top bar of RStudio and selecting 
# "Session" --> "Set Working Directory" --> "Choose Directory" --> 
# finding the location where you want your file saved
setwd("~/Desktop")
dir <- getwd()

Second, create a vector of all the IDs in the data set.

# Create vector 
idlist <- unique(data$id) 

# View contents of vector
idlist
##  [1]   1   2   3   4   5   6   7   8   9  11  12  13  14  15  16  17  18  19  20
## [20]  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
## [39]  40  41  42  43  44  45  46  47  49  50  51  52  53  54  55  56  57  58  59
## [58]  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78
## [77]  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97
## [96]  98  99 100

Finally, create and run the loop for the plots.

# Open the pdf file
pdf('State Space Grid Simulation.pdf', width = 10, height = 7)

for(x in 1:length(idlist)) #looping through plots 
  { 
    
    # Select participant ID from the vector of IDs
    subject_id <- idlist[x]
    
    # Subset data for selected participant ID
    data_sub <- subset(data, id == subject_id)
  
    # Create object with participant ID
    name <- as.character(data_sub$id[1])
    
    # Remove the missing data
    # So only points will be plotted when there is information
    # from both dyad members
    data_sub_noNA <- na.omit(data_sub)
  
    # Only build the plot if there is at least one complete observation
    if(nrow(data_sub_noNA) > 0){

  
  grid_plot <-

  ggplot(# Select data and set variables of x- and y- axes
         data = data_sub_noNA, aes(x = outcome1, y = outcome2)) +
    
         # Set x-axis limits
         # Give a little extra room (the scale is 0-100) so points on the 
         # edge of the scale will show up easily
         xlim(-5, 105) + 

         # Set y-axis limits
         # Give a little extra room (the scale is 0-100) so points on the 
         # edge of the scale will show up easily
         ylim(-5, 105) + 
  
         # Label y-axis
         ylab('Happiness Partner 2') + 
  
         # Label x-axis
         xlab('Happiness Partner 1') +
  
         # Create each cell of the state space grid by setting its width (xmin and xmax), 
         # height (ymin and ymax), and color (fill using hex code: https://www.color-hex.com/)
         # Set transparency of cell with the "alpha" command
         geom_rect(xmin = 0, xmax = cut1, ymin = 0, ymax = cut1, fill = "#000000", alpha = .05) + 
         geom_rect(xmin = 0, xmax = cut1, ymin = cut1, ymax = cut2, fill = "#000054", alpha = .05) +
         geom_rect(xmin = 0, xmax = cut1, ymin = cut2, ymax = cut3, fill = "#0000A8", alpha = .05) +
         geom_rect(xmin = 0, xmax = cut1, ymin = cut3, ymax = 100, fill = "#0000FC", alpha = .05) +
         geom_rect(xmin = cut1, xmax = cut2, ymin = 0, ymax = cut1, fill = "#540000", alpha = .05) +
         geom_rect(xmin = cut1, xmax = cut2, ymin = cut1, ymax = cut2, fill = "#540054", alpha = .05) +
         geom_rect(xmin = cut1, xmax = cut2, ymin = cut2, ymax = cut3, fill = "#5400A8", alpha = .05) +
         geom_rect(xmin = cut1, xmax = cut2, ymin = cut3, ymax = 100, fill = "#5400FC", alpha = .05) +
         geom_rect(xmin = cut2, xmax = cut3, ymin = 0, ymax = cut1, fill = "#A80000", alpha = .05) +
         geom_rect(xmin = cut2, xmax = cut3, ymin = cut1, ymax = cut2, fill = "#A80054", alpha = .05) +
         geom_rect(xmin = cut2, xmax = cut3, ymin = cut2, ymax = cut3, fill = "#A800A8", alpha = .05) +
         geom_rect(xmin = cut2, xmax = cut3, ymin = cut3, ymax = 100, fill = "#A800FC", alpha = .05) +
         geom_rect(xmin = cut3, xmax = 100, ymin = 0, ymax = cut1, fill = "#FC0000", alpha = .05) +
         geom_rect(xmin = cut3, xmax = 100, ymin = cut1, ymax = cut2, fill = "#FC0054", alpha = .05) +
         geom_rect(xmin = cut3, xmax = 100, ymin = cut2, ymax = cut3, fill = "#FC00A8", alpha = .05) +
         geom_rect(xmin = cut3, xmax = 100, ymin = cut3, ymax = 100, fill = "#FC00FC", alpha = .05) +
  
         # Additional plot aesthetics
         theme(
                # Adjust text size and color on x-axis and y-axis 
                axis.text = element_text(size = 14, color = 'black'),
                
                # Adjust size of plot title
                axis.title = element_text(size = 18),
                
                # Adjust color and lines surrounding plot
                panel.grid.major = element_line(colour = "white"),
                panel.grid.minor = element_blank(),
                panel.background = element_blank(),
                axis.ticks = element_blank()) + 
        
          # Vertical lines to divide grid
          geom_vline(xintercept = c(0,25,50,75,100)) + 
        
          # Horizontal lines to divide grid
          geom_hline(yintercept = c(0,25,50,75,100)) + 
        
          # Create point that represents the x-y intersection point of the dyads' variables
          geom_point(colour = "white") + 
        
          # Create line that connects the consecutive points in time
          geom_path(colour = "white")  + 
        
          # Create title for plot by combining "Dyad = " with the name object created above
          ggtitle(paste("Dyad =", name)) 
  
  print(grid_plot)
}

dev.off()

Examination of dyads’ state space grids can provide valuable insights into within-dyad dynamics and how these dynamics differ from dyad-to-dyad.

Create Sequences.

In this step, we:

  1. reformat the repeated measures data (“data”) from long to wide,
  2. create an “alphabet” that represents each of our categories, and
  3. create and plot the categorical sequence data.

1. Reformat the data from long to wide.

Our repeated measures data (“data”) are currently in a “long” format, which means that each dyad has multiple rows, with each assessment in its own row. For sequence analysis, we need our repeated measures data to be in a “wide” format, which means that each dyad has only one row and each assessment of the study has its own column.

# Reformat repeated measures data ("data") from long to wide
data_wide <- reshape(
                     # Select data and specific columns that should be kept 
                     # (dyad ID: id, time variable: time, 
                     # categorical variable: happy)
                     data = data[, c("id", "time", "happy")],
                     # Identify time variable ("time")
                     timevar = "time",           
                     # Identify dyad ID variable ("id")
                     idvar = "id",             
                     # Identify categorical variable ("happy")
                     v.names = "happy",             
                     # Note direction of data reformat
                     direction = "wide",      
                     # No spaces in new column names
                     sep = "")                    

# View the first 10 rows and first 10 columns of the wide data
head(data_wide[1:10, 1:10])
##     id happy1 happy2 happy3 happy4 happy5 happy6 happy7 happy8 happy9
## 1    1      D      D      D      D      D      D      D      D      D
## 43   2      H      D      D      D      D      H      D      D      D
## 85   3      O      H      H      G      D      L      G      L      L
## 127  4      I      E      J      N      F      M      K      N      N
## 169  5      D      C      D      D      C      D      C      C      C
## 211  6      D      H      H      H      D      D      H      H      H

2. Create alphabet.

We create an alphabet that represents each possible category within the categorical variable of interest (in this case, “happy”). The actual naming of these values is not important, but the categories can be named in a way that facilitates interpretation.

# Create a vector that contains the categories that appear in the wide data set
gs_alphabet <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P")

# Create a vector that allows for more helpful labels if applicable (e.g., negative, neutral, and positive affect).
gs_labels <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P")

3. Create sequences.

Now that the data are in the correct (i.e., wide) format and we have created an alphabet for our sequence analysis, we next need to create a sequence object that can be understood by the R package for sequence analysis (TraMineR).

Before creating the sequences, we first assign colors to each of the categories, which will help us when viewing plots of the sequences. This step is not required since there is a default color palette, but it gives us control over which color is assigned to which category. We do this by assigning each category (i.e., grid cell) a hex code (https://www.color-hex.com/). The categories should be written as they appear in the alphabet created above.

A note on accessibility: To make your plots accessible, you may consider adopting a colorblind-friendly palette. David Nichols’ website (https://davidmathlogic.com/colorblind/) provides a great explainer on this issue, as well as a color picking tool.

A <- "#0000FC"
B <- "#5400FC"
C <- "#A800FC"
D <- "#FC00FC"
E <- "#0000A8"
F <- "#5400A8"
G <- "#A800A8"
H <- "#FC00A8"
I <- "#000054"
J <- "#540054"
K <- "#A80054"
L <- "#FC0054"
M <- "#000000"
N <- "#540000"
O <- "#A80000"
P <- "#FC0000" 

Next, we create an object (“happy_seq”) that contains all of the sequences in the format needed for the sequence analysis package.

happy_seq <- seqdef(data_wide,                     # Select data   
                    var = 2:41,                    # Columns containing repeated measures data
                    alphabet = gs_alphabet,        # Alphabet  
                    labels = gs_labels,            # Labels
                    xtstep = 6,                    # Steps between tick marks
                    cpal=c(A, B, C, D, E, F, G, H, 
                           I, J, K, L, M, N, O, P))# Color palette
##  [>] 16 distinct states appear in the data:
##      1 = A
##      2 = B
##      3 = C
##      4 = D
##      5 = E
##      6 = F
##      7 = G
##      8 = H
##      9 = I
##      10 = J
##      11 = K
##      12 = L
##       ...
##  [>] state coding:
##        [alphabet]  [label]  [long label]
##      1  A           A        A
##      2  B           B        B
##      3  C           C        C
##      4  D           D        D
##      5  E           E        E
##      6  F           F        F
##      7  G           G        G
##      8  H           H        H
##      9  I           I        I
##      10  J           J        J
##      11  K           K        K
##      12  L           L        L
##       ... (16 states)
##  [>] 98 sequences in the data set
##  [>] min/max sequence length: 40/40

A lot of text will appear after the sequence object is created. This text tells you about the number of sequences (which should be equal to the number of dyads in the sample), the states (i.e., categories) that appear in the sequence object, and the alphabet and labels of the categories.

Finally, we can plot the sequences.

seqIplot(happy_seq,                   # Sequence object
         with.legend = "right",       # Display legend on right side of plot
         cex.legend = 0.8,            # Change size of legend
         main = "Dyad Happiness")     # Plot title

In this plot, each row represents one dyad’s state space grid location throughout the study, and each color represents a different state space grid cell. We can see that location and movement throughout the state space grid varied considerably from dyad to dyad.
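
If a summary view is helpful, TraMineR can also produce a state distribution plot, which shows the proportion of dyads in each grid cell at each assessment (a sketch; this plot is not used further in the tutorial).

seqdplot(happy_seq,                   # Sequence object
         with.legend = "right",       # Display legend on right side of plot
         cex.legend = 0.8,            # Change size of legend
         border = NA,                 # No bar borders
         main = "Dyad Happiness")     # Plot title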

Establish a Cost Matrix and Conduct Sequence Analysis.

There is one last step before conducting sequence analysis: establishing a cost matrix.

Sequence analysis aims to minimize the “cost” of transforming one sequence into another and relies on an optimal matching algorithm. There are costs for inserting, deleting, and substituting letters, as well as costs for missingness. The output of sequence analysis is an n x n (n = number of dyads) dissimilarity matrix; each cell contains the cost of transforming one dyad’s sequence into another dyad’s sequence.

The researcher establishes the cost matrix and often uses standard values, such as insertion/deletion costs of 1.0 and missingness costs of half the highest cost. There are a number of ways to determine substitution costs; typically, cost is based on the distance between cells. In this case, we use Manhattan (city-block) distance to determine substitution costs, but other methods (e.g., Euclidean distance) are also possible.

There may be instances when it does not make sense to order categories. For the sake of demonstration, we show how to obtain a constant cost matrix below.

Example constant cost matrix - i.e., the cost of substituting one category for another is the same regardless of the category (this may be a useful approach if working with categories that are truly categorical and not ordinal).

# Create substitution cost matrix and save to the object "costmatrix"
costmatrix <- seqsubm(happy_seq,            # Sequence object
                      method = "CONSTANT",  # Method to determine costs
                      cval = 2,             # Substitution cost
                      with.missing = TRUE,  # Allows for missingness state
                      miss.cost = 1,        # Cost for substituting a missing state
                      time.varying = FALSE, # Does not allow the cost to vary over time
                      weighted = TRUE)      # Allows weights to be used when applicable
##  [!!] seqcost: 'with.missing' set as FALSE as 'seqdata' has no non-void missing values
##  [>] creating 16x16 substitution-cost matrix using 2 as constant value
# Examine substitution cost matrix
costmatrix
##   A B C D E F G H I J K L M N O P
## A 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## B 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## C 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2
## D 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2
## E 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2
## F 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2
## G 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2
## H 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2
## I 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2
## J 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2
## K 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2
## L 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2
## M 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2
## N 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2
## O 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2
## P 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0

Here, we will create our own cost matrix since we believe there are differential costs for substituting categories within the sequences (e.g., a lower cost to substitute categories that are near each other on the state space grid versus a higher cost to substitute categories that are far from each other on the state space grid).

In general, the cost matrix is (k+1) by (k+1), with k = the number of cells in the grid. However, because we have no missing data, our cost matrix is simply k by k (16 by 16). When there are missing data, an additional column and row are needed such that the right-most column and the bottom row contain the missingness costs (half of the highest cost, which in this case would be half of 6).

We use Manhattan (city-block) distance to calculate the substitution costs. We calculated the Manhattan distances by hand and then entered the results into the matrix below. For example, the cost of staying in the same cell is zero, the cost of moving from “A” to “B” is one, and the cost of moving from “A” to “N” is four.
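
If you would rather not type the matrix by hand, the sketch below computes the same Manhattan distances from each cell’s (column, row) position on the 4 x 4 grid, assuming the A-through-P lettering described earlier; “grid_pos” and “costmatrix_manhattan” are names we chose for illustration.

# Sketch: compute the Manhattan-distance cost matrix programmatically
# Cells are lettered A-P from the upper-left corner across each row, so cell A
# is (column 1, row 1), B is (column 2, row 1), ..., P is (column 4, row 4)
grid_pos <- expand.grid(col = 1:4, row = 1:4)
costmatrix_manhattan <- as.matrix(dist(grid_pos, method = "manhattan"))
dimnames(costmatrix_manhattan) <- list(LETTERS[1:16], LETTERS[1:16])

# costmatrix_manhattan should match the "costmatrix2" typed out below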

Create new cost matrix.

costmatrix2 <- matrix(c(0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 6.0, 
                   1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0, 4.0, 4.0, 3.0, 4.0, 5.0, 
                   2.0, 1.0, 0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 4.0, 3.0, 2.0, 3.0, 5.0, 4.0, 3.0, 4.0, 
                   3.0, 2.0, 1.0, 0.0, 4.0, 3.0, 2.0, 1.0, 5.0, 4.0, 3.0, 2.0, 6.0, 5.0, 4.0, 3.0, 
                   1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0, 
                   2.0, 1.0, 2.0, 3.0, 1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0, 4.0, 
                   3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 4.0, 3.0, 2.0, 3.0, 
                   4.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0, 4.0, 3.0, 2.0, 1.0, 5.0, 4.0, 3.0, 2.0, 
                   2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 
                   3.0, 2.0, 3.0, 4.0, 2.0, 1.0, 2.0, 3.0, 1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 3.0,  
                   4.0, 3.0, 2.0, 3.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 
                   5.0, 4.0, 3.0, 2.0, 4.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0, 4.0, 3.0, 2.0, 1.0, 
                   3.0, 4.0, 5.0, 6.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 
                   4.0, 3.0, 4.0, 5.0, 3.0, 2.0, 3.0, 4.0, 2.0, 1.0, 2.0, 3.0, 1.0, 0.0, 1.0, 2.0, 
                   5.0, 4.0, 3.0, 4.0, 4.0, 3.0, 2.0, 3.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 
                   6.0, 5.0, 4.0, 3.0, 5.0, 4.0, 3.0, 2.0, 4.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0), 
                   nrow = 16, ncol = 16, byrow=TRUE)

# Fix the row and column names to nice labels from first cost matrix
rownames(costmatrix2) <- rownames(costmatrix)
colnames(costmatrix2) <- colnames(costmatrix)
costmatrix2
##   A B C D E F G H I J K L M N O P
## A 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6
## B 1 0 1 2 2 1 2 3 3 2 3 4 4 3 4 5
## C 2 1 0 1 3 2 1 2 4 3 2 3 5 4 3 4
## D 3 2 1 0 4 3 2 1 5 4 3 2 6 5 4 3
## E 1 2 3 4 0 1 2 3 1 2 3 4 2 3 4 5
## F 2 1 2 3 1 0 1 2 2 1 2 3 3 2 3 4
## G 3 2 1 2 2 1 0 1 3 2 1 2 4 3 2 3
## H 4 3 2 1 3 2 1 0 4 3 2 1 5 4 3 2
## I 2 3 4 5 1 2 3 4 0 1 2 3 1 2 3 4
## J 3 2 3 4 2 1 2 3 1 0 1 2 2 1 2 3
## K 4 3 2 3 3 2 1 2 2 1 0 1 3 2 1 2
## L 5 4 3 2 4 3 2 1 3 2 1 0 4 3 2 1
## M 3 4 5 6 2 3 4 5 1 2 3 4 0 1 2 3
## N 4 3 4 5 3 2 3 4 2 1 2 3 1 0 1 2
## O 5 4 3 4 4 3 2 3 3 2 1 2 2 1 0 1
## P 6 5 4 3 5 4 3 2 4 3 2 1 3 2 1 0
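
Relatedly, as noted above, if the data had contained missing values, the matrix would need one additional row and column for the missing state. A rough sketch of that expansion is below (the object name “costmatrix2_miss” is ours; the exact format TraMineR expects, typically a final row/column labeled “*”, should be checked against ?seqcost and ?seqdist).

# Rough sketch (not needed here, since these data are complete):
# add a final row/column for the missing state, costed at half of the highest
# substitution cost (6 / 2 = 3), with 0 on the diagonal
costmatrix2_miss <- rbind(cbind(costmatrix2, 3), c(rep(3, 16), 0))
rownames(costmatrix2_miss)[17] <- "*"
colnames(costmatrix2_miss)[17] <- "*"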

Now that we have created the substitution cost matrix, we can conduct our sequence analysis.

We use an optimal matching algorithm for the sequence analysis. The output of the sequence analysis is an n x n (n = number of dyads) dissimilarity matrix, with the cost of transforming one sequence into every other sequence in the corresponding cells of the matrix.

Note: Other algorithms are available, and they can be specified in the method = “” argument below. To see other algorithms available in the TraMineR package, type ?seqdist in the console or type seqdist in the search bar at the top of the Help tab on the right.
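
For instance, here is a minimal sketch of one alternative, the longest common subsequence (“LCS”) metric, which does not require a substitution-cost matrix (we do not use it further in this tutorial).

# Sketch: distances based on the longest common subsequence metric
dist_lcs <- seqdist(happy_seq, method = "LCS")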

# Conduct sequence analysis 
dist_om <- seqdist(happy_seq,            # Sequence object
                   method = "OM",        # Optimal matching algorithm
                   indel = 1,            # Insert/deletion costs set to 1
                   sm = costmatrix2,     # Substitution cost matrix
                   with.missing = TRUE)
##  [!!] seqdist: 'with.missing' set as FALSE as 'seqdata' has no non-void missing values
##  [>] 98 sequences with 16 distinct states
##  [>] checking 'sm' (size and triangle inequality)
## Warning: At least one substitution cost greater than twice the indel cost. Such
## substitution costs will never be used.
##  [>] 98 distinct  sequences
##  [>] min/max sequence lengths: 40/40
##  [>] computing distances using the OM metric
##  [>] elapsed time: 0.064 secs
# Examine the top left corner of the dissimilarity matrix
dist_om[1:10, 1:10]
##      1 43 85 127 169 211 253 295 337 379
## 1    0 10 50  80  30  18  54  55  49  40
## 43  10  0 46  80  34  15  50  51  45  36
## 85  50 46  0  62  44  42  40  41  38  37
## 127 80 80 62   0  76  80  65  66  65  75
## 169 30 34 44  76   0  44  38  45  44  46
## 211 18 15 42  80  44   0  49  45  39  26
## 253 54 50 40  65  38  49   0  37  36  42
## 295 55 51 41  66  45  45  37   0  31  37
## 337 49 45 38  65  44  39  36  31   0  34
## 379 40 36 37  75  46  26  42  37  34   0

Cluster Determination.

We next use the dissimilarity matrix obtained in the prior step to determine an appropriate number of clusters that represent the different emotion dynamic patterns within our dyads. Several clustering techniques are available, but we use hierarchical cluster analysis with Ward’s method. Other possible methods include k-medoids clustering or latent mixture models. After determining the number of clusters that works for the data, we create an object that contains cluster membership for each dyad (which will be used in the final step) and plot the clusters.
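
As a brief illustration of the k-medoids alternative mentioned above, the sketch below applies PAM (from the already-loaded cluster package) directly to the dissimilarity matrix; the object name “pam3” and the choice of 3 medoids are ours, for illustration only.

# Sketch: k-medoids (PAM) clustering applied directly to the dissimilarity matrix
pam3 <- pam(dist_om, k = 3, diss = TRUE)
table(pam3$clustering)   # cluster sizes for the 3-medoid solution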

Conduct hierarchical cluster analysis and save cluster analysis results to the object “clusterward”.

# Insert dissimilarity matrix ("dist_om"), 
# indicate that we are using a dissimilarity matrix, and
# indicate that we want to use Ward's clustering method
clusterward <- agnes(dist_om, diss = TRUE, method = "ward")

# Plot the results of the cluster analysis using a dendrogram
# Insert cluster analysis results object ("clusterward")
plot(clusterward, which.plot = 2)

In this example, the resulting dendrogram indicates three clusters. We reached this conclusion by examining the length of the vertical lines (longer vertical lines indicate greater differences between groups) and the number of dyads within each group (we didn’t want a group with too few dyads). After selecting a three cluster solution, we plotted the sequences of the three clusters for visual comparison.

# Cut dendrogram (or tree) by the number of 
# determined groups (in this case, 3)
# Insert cluster analysis results object ("clusterward") 
# and the number of cut points
cl3 <- cutree(clusterward, k = 3) 

# Turn cut points into a factor variable and label them
# Insert cut point object ("cl3") and 
# create labels by combining the text "Type" with either 1, 2, or 3
cl3fac <- factor(cl3, labels = paste("Type", 1:3)) 

# Plot the sequences for each cluster
seqplot(happy_seq,             # Sequence object
        group = cl3fac,        # Grouping factor level variable
        type = "I",            # Create whole sequence plot
        sortv = "from.start",  # Sort sequences based upon the category in which they begin
        with.legend = "right", # Display legend on right side of plot, change to "auto" if overlapping with plots
        cex.legend = 0.8,      # Change size of legend
        border = NA)           # No plot border

It appears that “Type 1” dyads are those in which both members are generally happy (as indicated by the bright pink colors in the top-right of the state space grid). “Type 2” dyads seem to be those in which both members experience low levels of happiness (as indicated by the dark colors in the bottom-left of the state space grid). Finally, “Type 3” dyads appear to exhibit high variability in their levels of happiness.
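
If a more formal check on the chosen solution is desired, one option (a sketch using the cluster package) is the average silhouette width computed from the dissimilarity matrix; values closer to 1 indicate better-separated clusters.

# Sketch: average silhouette width for the 3-cluster solution
sil <- silhouette(cl3, dmatrix = dist_om)
summary(sil)$avg.width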

In the next (and final) step, we will formally test whether the dyads within these clusters differ on self-reported relationship satisfaction and health.

Examine Group Differences among Clusters.

The final step of our analysis is to examine group differences among the clusters. One can use a variety of methods to examine group differences, and the choice of method will depend on the number of clusters chosen and the research question. For example, if two clusters are chosen and one wants to examine differences on an outcome variable between the clusters, then one would use a t-test. In this case, we use analysis of variance (ANOVA) to examine group differences.

We examined whether dyad members’ self-reported relationship satisfaction and health (both simulated variables) differed by cluster membership, which represented daily dynamics of happiness.

# Add grouping variable to persons data set
# "cl3" contains one value per dyad (in the order of "data_wide"),
# so match each person to their dyad's cluster by dyad ID
persons$cl3 <- cl3[match(persons$id, data_wide$id)]

# Run ANOVA

rel_sat <- aov(persons$rel_sat ~ factor(persons$cl3))
summary(rel_sat)
##                      Df Sum Sq Mean Sq F value Pr(>F)
## factor(persons$cl3)   2   0.04 0.02124   0.103  0.902
## Residuals           193  39.63 0.20534
# If post hoc test is needed
TukeyHSD(rel_sat) 
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = persons$rel_sat ~ factor(persons$cl3))
## 
## $`factor(persons$cl3)`
##            diff        lwr       upr     p adj
## 2-1 -0.01761480 -0.1985305 0.1633009 0.9712709
## 3-1 -0.03692131 -0.2288114 0.1549687 0.8924814
## 3-2 -0.01930650 -0.2111966 0.1725836 0.9693531
health <- aov(persons$health ~ factor(persons$cl3))
summary(health)
##                      Df Sum Sq Mean Sq F value Pr(>F)
## factor(persons$cl3)   2   0.69  0.3428   0.636   0.53
## Residuals           193 103.96  0.5387
# If post hoc test is needed
TukeyHSD(health) 
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = persons$health ~ factor(persons$cl3))
## 
## $`factor(persons$cl3)`
##            diff        lwr       upr     p adj
## 2-1 -0.09358810 -0.3866024 0.1994262 0.7313341
## 3-1 -0.14413030 -0.4549189 0.1666583 0.5180964
## 3-2 -0.05054219 -0.3613308 0.2602464 0.9219316

As you can see, there were no significant group differences with these simulated data. Alternatively, we could examine dyad members separately (e.g., if dyad member 1 = male partners, dyad member 2 = female partners) to assess whether relationship satisfaction and subjective health differ by cluster membership for each dyad member type.
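
A minimal sketch of that alternative is below, assuming member == 1 identifies the first partner in each dyad (as in the persons data shown earlier); the object name “member1” is ours.

# Sketch: repeat the ANOVAs within one dyad member type (here, member 1)
member1 <- subset(persons, member == 1)
summary(aov(rel_sat ~ factor(cl3), data = member1))
summary(aov(health ~ factor(cl3), data = member1))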

Cautions.

Although grid-sequence analysis has several distinct advantages, there are also limitations and considerations to keep in mind, including:

  1. The need for simultaneous data from each member of the dyad.
  2. The length of time series needed (dependent on the process under examination, but could be lengthy).
  3. The change of a continuous variable into an interval variable when creating axes.
  4. The extent of missingness.
  5. The (current) need to use distinguishable dyads, although this can be flexible if the state space grid is symmetric across the bottom-left to top-right diagonal.

Conclusion.

Theories of interpersonal dynamics and social interaction emphasize the need to study within-dyad dynamics. Grid-sequence analysis is a flexible new approach that allows researchers to capture within-dyad dynamics and to make between-dyad comparisons using repeated-measures dyadic data. We hope that the combination of state space grids and sequence analysis helps researchers better study and understand the processes of interpersonal interactions.


Additional Information

We created this tutorial with a system environment and versions of R and its packages that might differ from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output with the tutorial posted on the LHAMA website may be helpful.

session_info(pkgs = c("attached"))
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       macOS Big Sur/Monterey 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2022-08-20
##  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package        * version date (UTC) lib source
##  cluster        * 2.1.3   2022-03-28 [1] CRAN (R 4.2.0)
##  devtools       * 2.4.3   2021-11-30 [1] CRAN (R 4.2.0)
##  ggplot2        * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
##  reshape        * 0.8.9   2022-04-12 [1] CRAN (R 4.2.0)
##  TraMineR       * 2.2-4   2022-06-09 [1] CRAN (R 4.2.0)
##  TraMineRextras * 0.6.4   2022-06-13 [1] CRAN (R 4.2.0)
##  usethis        * 2.1.6   2022-05-25 [1] CRAN (R 4.2.0)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## 
## ──────────────────────────────────────────────────────────────────────────────