---
title: "Dyadic Categorical Time Series Plot Tutorial"
author: Miriam Brinberg
output: 
  rmdformats::robobook:
  html_document: default
  word_document: default
editor_options:
  chunk_output_type: console
---

# Overview
This tutorial provides R code for creating dyad-level plots for categorical time series. Specifically, this visualization is well-suited for data in which there are two people (or variables of interest) measured asynchronously over time, as in conversation data. 

In this example, we'll be using data from conversations between two strangers in which each utterance in the conversation was coded for its verbal response mode category (Stiles, 1992).

Note that the accompanying "CategoricalDyadicTimeSeriesPlot_Tutorial_2022July26.rmd" file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R). 

# Outline
In this tutorial, we'll cover...

* Reading in the data and loading needed packages.
* Plotting an exemplar dyad.
* Running a loop that will plot all dyads and save the plots in a PDF.

# Read in the data and load needed packages.

In this tutorial, we will be working with a subset (*N* = 10) of natural stranger dyads presented in Bodie et al. (2021) in the *Journal of Language and Social Psychology*. Specifically, stranger pairs had a brief support conversation in the lab in which one dyad member disclosed a current problem. The form and intent of each utterance within the conversation were coded with one of eight verbal response modes (e.g., question, acknowledgment, reflection) or deemed uncodable. Furthermore, the person-centeredness of each listener's utterances was coded on a scale of one to nine.

**Let's read the data into R.**  

The data set we are working with is called "StrangerConversations_Subset" and is stored as a .csv file (comma-separated values file, which can be created by saving an Excel file as a csv document) on my computer's desktop.
```{r}
# Set working directory (i.e., where your data file is stored)
# This can be done by going to the top bar of RStudio and selecting 
# "Session" --> "Set Working Directory" --> "Choose Directory" --> 
# finding the location of your file

setwd("~/Desktop") # Note: You can skip this line if you have 
#the data file and this .rmd file stored in the same directory

# Read in the data
data <- read.csv(file = "StrangerConversations_Subset.csv", header = TRUE, sep = ",")

# View the first 10 rows of the data
head(data, 10)
```
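
If you would rather not change the working directory, one alternative is to build the full path to the file and pass it to `read.csv()`. The sketch below is self-contained: it writes a tiny made-up file to a temporary folder and reads it back. The folder, the file name `toy_conversations.csv`, and the values are illustrative assumptions, not the tutorial data.

```r
# Hypothetical folder and file, used only for illustration
toy_dir  <- tempdir()
toy_path <- file.path(toy_dir, "toy_conversations.csv")

# Write a small made-up data set with the same column names as the tutorial data
toy_written <- data.frame(id = c(1, 1, 1), seg = 1:3, role = c(1, 2, 1),
                          form = c(1, 5, 2), intent = c(1, 5, 2),
                          pc = c(NA, 4, NA))
write.csv(toy_written, toy_path, row.names = FALSE)

# Check that the file exists before reading it
stopifnot(file.exists(toy_path))
toy_read <- read.csv(toy_path, header = TRUE)

# View the first rows
head(toy_read)
```

With the real data, `toy_dir` would be replaced by the folder that holds "StrangerConversations_Subset.csv".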

The output shows the first 10 rows of the data (from Dyad 2). Each row contains information for one utterance, and there are multiple rows for each dyad. The data contain a column for each of the following: 

* Dyad ID (`id`)  
* Time variable - in this case, utterance (or segment) in the conversation (`seg`)  
* Dyad member ID - in this case, role in the conversation (`role`; discloser = 1, listener = 2)  
* Verbal response mode category for form (`form`)  
* Verbal response mode category for intent (`intent`)  
* Person-centeredness of listeners' utterances (`pc`)  
     + Note that only the listeners (role = 2) have scores for person-centeredness and even listeners may have missing values for person-centeredness since it was only coded once the supportive conversation officially "began" (i.e., comments or small talk prior to the official start of the supportive conversation were not coded). 
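
Before plotting, it can help to confirm this structure. The sketch below uses a small made-up data frame (`toy`, not the tutorial data) with the same column names to count utterances per dyad and role and to check which roles have person-centeredness scores.

```r
# Made-up data with the same columns as the tutorial data
toy <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2, 2),
  seg    = c(1, 2, 3, 4, 1, 2, 3),
  role   = c(1, 2, 1, 2, 2, 1, 2),
  form   = c(1, 6, 1, 5, 5, 1, 8),
  intent = c(1, 6, 2, 5, 5, 1, 8),
  pc     = c(NA, NA, NA, 5, 4, NA, 6)
)

# Number of utterances per dyad (rows) and role (columns)
utterance_counts <- table(dyad = toy$id, role = toy$role)
utterance_counts

# Roles that have at least one non-missing person-centeredness score
roles_with_pc <- unique(toy$role[!is.na(toy$pc)])
roles_with_pc
# 2
```

Running the same two checks on `data` should show only role 2 (listeners) in the second result.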

**Load the R packages we need.**  

Packages in R are collections of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model.
```{r, warning = FALSE, message = FALSE}
# install.packages("devtools") # Install package if you have never used it before
library(devtools) # For reporting session information (session_info)

# install.packages("ggplot2") # Install package if you have never used it before
library(ggplot2) # For plotting
```
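
If you are unsure whether a package is already installed, a small check can save a reinstall. The following is a minimal sketch; the helper name `missing_packages` is hypothetical and not part of the tutorial code.

```r
# Packages this tutorial relies on
needed <- c("devtools", "ggplot2")

# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(needed)
to_install

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```

The installation line is left commented out so that nothing is installed without your say-so.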

# Setting the color palette

Before creating the plots, it is helpful to set the colors for each utterance type so that the colors of the utterance categories are consistent across plots (i.e., the number of utterance types present in a given conversation does not affect the color assigned to each type). We do this by creating a named vector "cols" that contains a color assignment (via hex code: https://www.color-hex.com/) for each utterance type.

```{r}
cols <- c("Acknowledgement"="#e57d72",
          "Advisement"="#c89432",
          "Confirmation"="#98a934",
          "Disclosure"="#59b64c", 
          "Edification"="#5cbea1",
          "Interpretation"="#58b7de",  
          "Question"="#709cf8", 
          "Reflection"="#cb79f4", 
          "Uncodable"="#ea6dbf")
```

Note: To make your plots accessible, you may consider adopting a colorblind-friendly palette. David Nichols' website (https://davidmathlogic.com/colorblind/) provides a great explainer on this issue, as well as a color picking tool.  
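
As one possible starting point, R 4.0.0 and later ship the Okabe-Ito palette, which was designed with color vision deficiency in mind. The sketch below assigns its nine colors to the nine categories. The object name `cols_cb` and the particular pairing of colors with categories are illustrative assumptions, not part of the original tutorial.

```r
vrm_categories <- c("Acknowledgement", "Advisement", "Confirmation",
                    "Disclosure", "Edification", "Interpretation",
                    "Question", "Reflection", "Uncodable")

# palette.colors() is in grDevices (R >= 4.0.0); Okabe-Ito has nine colors
okabe_ito <- palette.colors(n = 9, palette = "Okabe-Ito")

# Name each color by category so the mapping stays fixed across plots
cols_cb <- setNames(unname(okabe_ito), vrm_categories)
cols_cb

# Subsetting by name returns the same colors no matter which categories are present
cols_cb[c("Question", "Disclosure")]
```

A vector like this could be passed to `scale_fill_manual(values = cols_cb)` in place of `cols`. Because this palette includes black and gray, it is worth checking how those two colors look against the gray plot background and the black triangles before adopting it.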

# Plot an Exemplar Dyad.

To get a feel for how this plotting works, let's begin by only plotting one dyad.

First, let's partition the data to focus on a single dyad: Dyad 2.
```{r}
# Partition the data
dyad2 <- data[data$id == 2, ]

# View the first 10 rows of the data
head(dyad2, 10)
```

Before we plot the categorical time series of the conversation, we need to create a factor variable for the "form" and "intent" variables and label each of the Verbal Response Mode categories. A factor variable makes sure R interprets the variables as categories instead of integers (like how the "form" and "intent" variables are currently coded).

The first statement in the code chunk below creates a factor variable for the "form" variable and the second statement creates a factor variable for the "intent" variable.

Note: our labels are not listed in a random order, but instead correspond to the numeric coding of the "form" and "intent" variables. For instance, in our data, questions are indicated by a 5 in the "form" and "intent" variables, so "Question" is the fifth element of the `labels` vectors below.
```{r}
# Create factor variable for form
dyad2$form_categories <- factor(dyad2$form, levels=c(1:9), 
                                labels = c("Disclosure", "Edification",
                                           "Advisement", "Confirmation", 
                                           "Question", "Acknowledgement",
                                           "Interpretation", "Reflection", 
                                           "Uncodable"))

# Create factor variable for intent
dyad2$intent_categories <- factor(dyad2$intent, levels=c(1:9), 
                                  labels = c("Disclosure", "Edification",
                                             "Advisement", "Confirmation", 
                                             "Question", "Acknowledgement",
                                             "Interpretation", "Reflection", 
                                             "Uncodable"))

# View the first 10 rows of the data
head(dyad2, 10)
```

In the "dyad2" data, we can now see two new variables: "form_categories" and "intent_categories". 
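
To see how `factor()` pairs each numeric code with its label, here is a minimal sketch on a handful of made-up codes (the objects `codes` and `coded` are illustrative, not part of the tutorial data).

```r
vrm_labels <- c("Disclosure", "Edification", "Advisement", "Confirmation",
                "Question", "Acknowledgement", "Interpretation", "Reflection",
                "Uncodable")

# Made-up numeric codes, including a missing value
codes <- c(5, 1, 6, NA, 9)

# The i-th label is attached to the i-th level
coded <- factor(codes, levels = 1:9, labels = vrm_labels)

as.character(coded)
# "Question" "Disclosure" "Acknowledgement" NA "Uncodable"

# All nine levels are kept, even those that do not occur in the data
nlevels(coded)
# 9
```

Keeping all nine levels is what lets the legend and colors stay the same across dyads that use different subsets of categories.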

Now that the data are prepared, we'll create the dyadic categorical time series plot and save it to the object "dyad2_plot".
```{r, warning = FALSE}
dyad2_plot <-
  # Choose the data (dyad2), set the time variable (seg), and the dyad member variable (role)
  ggplot(dyad2, aes(x = seg, group = factor(role))) +
  
  # Create title for plot by combining "Dyad = " with the dyad id variable (id)
          ggtitle(paste("Dyad =", unique(dyad2$id))) +
  
  # Create bars for the form of the listeners' utterances
          # Partition data for listeners (role = 2)
          geom_rect(data = dyad2[dyad2$role == 2, ], 
                    # Set the width of each bar as -0.5 and +0.5 the value of the time variable (seg)
                    mapping = aes(xmin = seg-.5, xmax = seg+.5, 
                    # Set the height of each bar to range from 0 to 5              
                                  ymin = 0, ymax = 5, 
                    # Set the color of each bar to correspond to each form category
                                  fill = form_categories)) +
  
  # Add a horizontal line to separate bars
          geom_hline(yintercept = 5, color = "black") +
  
  # Create bars for the intent of the listeners' utterances
          # Partition data for listeners (role = 2)
          geom_rect(data = dyad2[dyad2$role == 2, ],
                    # Set the width of each bar as -0.5 and +0.5 the value of the time variable (seg)
                    mapping = aes(xmin = seg-.5, xmax = seg+.5, 
                    # Set the height of each bar to range from 5 to 10              
                                  ymin = 5, ymax = 10, 
                    # Set the color of each bar to correspond to each intent category
                                  fill = intent_categories)) +
  
  # Add a horizontal line to separate bars
          geom_hline(yintercept = 10, color = "black") +
  
  # Create bars for the form of the disclosers' utterances
          # Partition data for disclosers (role = 1)
          geom_rect(data = dyad2[dyad2$role == 1, ],
                    # Set the width of each bar as -0.5 and +0.5 the value of the time variable (seg)
                    mapping = aes(xmin = seg-.5, xmax = seg+.5, 
                    # Set the height of each bar to range from 10 to 15
                                  ymin = 10, ymax = 15,
                    # Set the color of each bar to correspond to each form category
                                  fill = form_categories)) +
  
  # Add a horizontal line to separate bars
          geom_hline(yintercept = 15, color = "black") +
  
  # Create bars for the intent of the disclosers' utterances
          # Partition data for disclosers (role = 1)
          geom_rect(data = dyad2[dyad2$role == 1, ],
                    # Set the width of each bar as -0.5 and +0.5 the value of the time variable (seg)
                    mapping = aes(xmin = seg-.5, xmax = seg+.5, 
                    # Set the height of each bar to range from 15 to 20
                                  ymin = 15, ymax = 20,
                    # Set the color of each bar to correspond to each intent category
                                  fill= intent_categories)) +
  
  # Create point (triangle) for listeners' person-centeredness
          # Time (seg) is on the x-axis, person-centeredness (pc) is on the y-axis
          # Control the shape and the size of the point
          geom_point(aes(x = seg,y = pc), shape = 17, size = 3) +

  # Set color of utterances to vector we created earlier ("cols")
          scale_fill_manual(values = cols) +

  # Label for x-axis
          xlab("Utterance") + 
  
  # Label for y-axis
          ylab("Role") +
  
  # X-axis ticks 
          scale_x_continuous(breaks = seq(0, 150, by = 50)) +
  
  # Y-axis ticks and labels
          scale_y_continuous(breaks = c(2.5, 7.5, 12.5, 17.5), 
                             labels=c("Listener Form", "Listener Intent", 
                                      "Discloser Form", "Discloser Intent")) +
  
  # Legend label
          labs(fill = "VRM Code") +
  
  # Additional plot aesthetics
          theme(panel.grid.major = element_blank(), 
                panel.grid.minor = element_blank(),
                axis.text=element_text(color = "black"))
```

Print the plot we just created.
```{r}
print(dyad2_plot)
```

On the x-axis, we have utterance or thought unit across the conversation. On the y-axis, we have the form and intent category for the utterances for the disclosers on the top half and the listeners on the bottom half. Each category is represented by a different color and the gray areas indicate when a particular dyad member is not speaking. Finally, the triangles in the bottom half represent the listeners’ person-centeredness. The higher the triangle, the higher the person-centeredness of the message.
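
The x-axis breaks in the plot code are hard-coded as `seq(0, 150, by = 50)`, which suits conversations of up to roughly 150 utterances. One way to derive the breaks from the data instead is sketched below; the helper name `seg_breaks` is hypothetical and not part of the tutorial code.

```r
# Hypothetical helper: tick positions from 0 up to the largest multiple of `by`
# that does not exceed the highest segment number
seg_breaks <- function(seg, by = 50) {
  seq(0, max(seg, na.rm = TRUE), by = by)
}

seg_breaks(1:121)
# 0 50 100

seg_breaks(1:230)
# 0 50 100 150 200
```

In the plot, this could be used as `scale_x_continuous(breaks = seg_breaks(dyad2$seg))`.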

Note: we receive the warning message "121 rows containing missing values (geom_point)". This warning indicates that there are missing values for the variable that creates the points/triangles on the plot, which is the person-centeredness variable (`pc`). We expect missing values because person-centeredness was not coded for disclosers (or for listeners before the official conversation began), so we can safely ignore the warning.
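
To check where such a count comes from, you can tally the missing person-centeredness values yourself. The sketch below does this on a small made-up data frame (`toy_pc`, not the tutorial data).

```r
# Made-up data: role 1 = discloser, role 2 = listener
toy_pc <- data.frame(
  seg  = 1:6,
  role = c(2, 1, 2, 1, 2, 1),
  pc   = c(NA, NA, 4, NA, 6, NA)
)

# Total number of rows with a missing person-centeredness score
n_missing <- sum(is.na(toy_pc$pc))
n_missing
# 4

# Missing values split by role
missing_by_role <- tapply(is.na(toy_pc$pc), toy_pc$role, sum)
missing_by_role
```

Applied to the tutorial data, `sum(is.na(dyad2$pc))` should match the number reported in the warning. If you prefer not to see the warning at all, `geom_point()` accepts `na.rm = TRUE`, which drops missing values silently.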

# Run a Loop that will Plot all Dyads and Save the Plots in a PDF.

Now that we know how to create a plot for a single dyad, we will create a loop that plots each dyad in the data set and saves these plots to a PDF.

First, identify the location where you would like to save the PDF and save this location as the object "dir".
```{r}
# Store the folder where the PDF should be saved
# (in RStudio, you can browse for a folder via "Session" --> 
# "Set Working Directory" --> "Choose Directory")
dir <- "~/Desktop"

# Make that folder the working directory so the PDF is written there
# Note: setwd() returns the *previous* working directory, so we store the path first
setwd(dir)  # You can skip this line if the working directory is already set
```

Second, create a vector of all the IDs in the data set.
```{r}
# Create vector
idlist <- unique(data$id) 

# View contents of vector
idlist
```

Note: the first number in brackets ([1]) is not part of the vector; it is an index showing the position of the first element printed on that line. The numbers following the brackets are the IDs contained in the data set.
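
As a small illustration of what `unique()` does to an ID column with one row per utterance, here is a sketch on made-up IDs (`toy_ids` is not the tutorial data).

```r
# Made-up ID column with one row per utterance
toy_ids <- c(2, 2, 2, 3, 3, 5, 5, 5, 5)

# unique() keeps the first occurrence of each value, in order of appearance
toy_idlist <- unique(toy_ids)
toy_idlist
# 2 3 5

# Number of dyads
length(toy_idlist)
# 3

# Square brackets select an element by position
toy_idlist[2]
# 3
```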

Following what we did earlier for one dyad, we need to perform some data management for the whole data set.

Third, create a factor variable for the form and intent variables and label each of the Verbal Response Mode categories for these variables.
```{r}
# Create factor variable for form
data$form_categories <- factor(data$form, levels=c(1:9), 
                               labels = c("Disclosure", "Edification",
                                          "Advisement", "Confirmation", 
                                          "Question", "Acknowledgement",
                                          "Interpretation", "Reflection", 
                                          "Uncodable"))

# Create factor variable for intent
data$intent_categories <- factor(data$intent, levels=c(1:9), 
                                 labels = c("Disclosure", "Edification",
                                            "Advisement", "Confirmation", 
                                            "Question", "Acknowledgement",
                                            "Interpretation", "Reflection", 
                                            "Uncodable"))

# View the first 10 rows of the data
head(data, 10)
```

Finally, create and run the loop for the plots.
```{r, warning = FALSE, eval = FALSE}
# Open the pdf file
pdf('ConversationTurn_TimeSeries.pdf', width = 10, height = 7)

  for (x in seq_along(idlist)) # Loop through the dyads
  { 
    
    # Select dyad ID from the list of IDs
    subject_id <- idlist[x]
    
    # Partition data for selected dyad ID
    data_sub <- subset(data, id == subject_id)
  
    # Create object with dyad ID (as text, for the plot title)
    name <- as.character(data_sub$id[1])
        
    # Plot the dyad's conversation
    plot <- 
      ggplot(data_sub, aes(x = seg, group = factor(role))) +
      ggtitle(paste("Dyad =", name)) +
      geom_rect(data = data_sub[data_sub$role == 2, ], 
                mapping = aes(xmin = seg-.5, xmax = seg+.5, ymin = 0, ymax = 5, 
                fill = form_categories)) +
      geom_hline(yintercept = 5, color = "black") +
      geom_rect(data = data_sub[data_sub$role == 2, ],
                mapping = aes(xmin = seg-.5, xmax = seg+.5, ymin = 5, ymax = 10, 
                fill = intent_categories)) +
      geom_hline(yintercept = 10, color = "black") +
      geom_rect(data = data_sub[data_sub$role == 1, ],
                mapping = aes(xmin = seg-.5, xmax = seg+.5, ymin = 10, ymax = 15,
                fill = form_categories)) +
      geom_hline(yintercept = 15, color = "black") +
      geom_rect(data = data_sub[data_sub$role == 1, ],
                mapping = aes(xmin = seg-.5, xmax = seg+.5, ymin = 15, ymax = 20,
                fill= intent_categories)) +
      geom_point(aes(x = seg,y = pc), shape = 17, size = 3) +
      scale_fill_manual(values = cols) +
      xlab("Utterance") + 
      ylab("Role") +
      scale_x_continuous(breaks = seq(0, 200,by = 50)) +
      scale_y_continuous(breaks = c(2.5, 7.5, 12.5, 17.5), 
                         labels=c("Listener Form", "Listener Intent", 
                                  "Discloser Form", "Discloser Intent")) +
      labs(fill = "VRM Code") +
      theme(panel.grid.major = element_blank(), 
            panel.grid.minor = element_blank(),
            axis.text=element_text(color = "black"))

    # Print the plot (this adds a new page to the open PDF)
    print(plot)

}

dev.off()
```
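
The loop relies on a general pattern: everything printed between `pdf()` and `dev.off()` is written to the same file, one page per plot. The sketch below shows that pattern with base R graphics and made-up data, so it runs without the tutorial data or ggplot2. The temporary file path and the object names are illustrative assumptions.

```r
# Hypothetical output file in a temporary folder
toy_pdf <- file.path(tempdir(), "toy_plots.pdf")

# Made-up series for three "dyads"
toy_series <- list("2" = c(1, 3, 2), "3" = c(2, 2, 4), "5" = c(5, 1, 3))

# Open the PDF device
pdf(toy_pdf, width = 10, height = 7)

for (dyad_name in names(toy_series)) {
  # Each high-level plotting call starts a new page
  plot(toy_series[[dyad_name]], type = "b",
       main = paste("Dyad =", dyad_name),
       xlab = "Utterance", ylab = "Value")
}

# Close the device so the file is written to disk
dev.off()

# Confirm the file was created
file.exists(toy_pdf)
```

If a loop stops with an error before reaching `dev.off()`, the PDF device stays open and the file may not open properly; running `dev.off()` by hand closes it.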

Hooray for plotting!

-----
### Additional Information

We created this tutorial with a system environment and versions of R and packages that might be different from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output with the output shown in the tutorial posted on the LHAMA website may be helpful.
  
```{r}
session_info(pkgs = c("attached"))
```
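
If devtools is not available, base R can report similar information. The following is a minimal sketch; "stats" is used only because it ships with every R installation, and you would swap in the package you care about (e.g., "ggplot2").

```r
# Version of R itself
r_version <- R.version.string
r_version

# Version of an installed package
stats_version <- packageVersion("stats")
as.character(stats_version)
```

Base R's `sessionInfo()` gives a fuller summary, including attached packages and their versions.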