---
title: "Configural Frequency Analysis Tutorial"
author: Miriam Brinberg, Yuwei Li
output:
  rmdformats::robobook: 
  html_document: default
  word_document: default
editor_options: 
  chunk_output_type: console
---

# Overview

This tutorial provides R code for conducting configural frequency analysis (Lienert & Krauth, 1975; Stemmler, 2020; von Eye, 1990). In brief, configural frequency analysis is similar to a chi-square test in that it uses a contingency table to determine whether instances are evenly distributed across categories. Beyond a chi-square test, however, configural frequency analysis identifies whether particular cells within a contingency table are over- or under-represented.

In this example, we use configural frequency analysis to examine the use of specific turn transitions during a subset (*N* = 59) of conversations between strangers in which one dyad member disclosed about a current problem. Each turn in the conversations between stranger dyads was coded (e.g., as an acknowledgement, question, hedged disclosure, etc.; see Bodie et al., 2021 in the *Journal of Language and Social Psychology* for more details about the creation of the turn typology) and the transitions between turn types were tallied across the entire sample. We use configural frequency analysis to determine which turn transitions occur more or less frequently than expected during these conversations.

We also add a third hypothetical variable to illustrate the value of configural frequency analysis for examining more than two variables at a time. Specifically, we add a hypothetical "condition" variable that represents an experimental manipulation in which dyads were assigned to communicate face-to-face or using computer-mediated communication. We include this hypothetical scenario following our initial demonstration of how configural frequency analysis was used in the Bodie et al. (2021) paper.
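
To make the idea of over- and under-represented cells concrete, here is a minimal base R sketch using a hypothetical 2 x 2 table (the counts and the object names `toy_counts`, `toy_expected`, and `toy_resid` are invented for illustration). It computes the counts expected under independence and a standardized residual for each cell. It illustrates the general logic only and is not the exact computation performed by the `confreq` package.

```{r}
# Toy 2 x 2 contingency table (hypothetical counts, not the tutorial data)
toy_counts <- matrix(c(30, 10,
                       10, 30),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(lagged = c("A", "B"),
                                     current = c("A", "B")))

# Expected counts under independence:
# (row total * column total) / grand total
toy_expected <- outer(rowSums(toy_counts), colSums(toy_counts)) / sum(toy_counts)

# Cell-wise standardized (Pearson) residuals:
# (observed - expected) / sqrt(expected)
toy_resid <- (toy_counts - toy_expected) / sqrt(toy_expected)

# Positive residuals mark cells observed more often than expected,
# negative residuals mark cells observed less often than expected
toy_expected
round(toy_resid, 2)
```

A chi-square test would summarize these residuals in a single statistic for the whole table, whereas configural frequency analysis evaluates each cell separately.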
Note that the accompanying "ConfiguralFrequency_Tutorial_2023June16.rmd" file contains all of the code presented in this tutorial and can be opened in RStudio (a somewhat more friendly user interface to R).

# Outline

In this tutorial, we'll cover...

* Reading in the data and loading needed packages.
* Plotting two dyads' conversation.
* Preparing the data for configural frequency analysis.
* Conducting configural frequency analysis.
* Adding a hypothetical experimental condition to the data.
* Conducting configural frequency analysis with three variables.

# Read in the data and load needed libraries.

**Let's read the data into R.**

The data set we are working with is called "StrangerConversations_N59" and is stored as a .csv file (comma-separated values file, which can be created by saving an Excel file as a csv document) on my computer's desktop.

```{r}
# Set working directory (i.e., where your data file is stored)
# This can be done by going to the top bar of RStudio and selecting
# "Session" --> "Set Working Directory" --> "Choose Directory" -->
# finding the location of your file
# Note: You can skip this line if you have the data file and
# this .rmd file stored in the same directory
setwd("~/Desktop")

# Read in the data
data <- read.csv(file = "StrangerConversations_N59.csv", header = TRUE, sep = ",")

# View the first 10 rows of the data
head(data, 10)
```

In the data, we can see each row contains information for one turn and there are multiple rows (i.e., turns) for each dyad. Specifically, there is a column for:

* Dyad ID (`id`)
* Time variable - in this case, turn in the conversation (`turn`)
* Dyad member ID - in this case, role in the conversation (`role`; discloser = 1, listener = 2)
* Turn type - in this case, based upon a typology derived in Bodie et al.
(2021; `turn_type`)

**Load the R packages we need.**

Packages in R are collections of functions (and their documentation/explanations) that enable us to conduct particular tasks, such as plotting or fitting a statistical model.

```{r, message = FALSE, warning = FALSE}
# install.packages("data.table") # Install package if you have never used it before
library(data.table) # For data management: counting turn transitions

# install.packages("devtools") # Install package if you have never used it before
library(devtools) # For session_info(), which reports R and package versions

# install.packages("confreq") # Install package if you have never used it before
library(confreq) # For conducting configural frequency analysis

# install.packages("dplyr") # Install package if you have never used it before
library(dplyr) # For data management

# install.packages("ggplot2") # Install package if you have never used it before
library(ggplot2) # For plotting

# install.packages("tidyverse") # Install package if you have never used it before
library(tidyverse) # For data management
```

# Plot Two Dyads' Conversation.

To get a better feel for the conversation data, let's plot two dyads' conversations.

Before creating the plots, it is helpful to set the colors for each turn type so the colors of the turn categories are consistent across plots (i.e., the number of turn types present in a given conversation does not affect the color of the turn types). We do this by creating a vector "cols" that contains color assignments (via hex code: https://www.color-hex.com/) for each turn type.

A note on accessibility: To make your plots accessible, you may consider adopting a colorblind-friendly palette. David Nichols' website (https://davidmathlogic.com/colorblind/) provides a great explainer on this issue, as well as a color picking tool.
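
As one possible way to follow the accessibility note, the sketch below builds a named color vector from the Okabe-Ito palette that ships with base R (`grDevices::palette.colors()`, available in R 4.0.0 and later). The object names `toy_turn_types`, `okabe_ito`, and `cols_accessible` are hypothetical, and the tutorial's own plots use the hand-picked hex codes defined next rather than this palette.

```{r}
# Turn type labels used in this tutorial
toy_turn_types <- c("Elaboration", "Question", "Acknowledgement",
                    "Reflection", "Advice", "HedgedDisclosure")

# Okabe-Ito palette from base R (grDevices, R >= 4.0.0);
# request 7 colors and drop the first one (black)
okabe_ito <- grDevices::palette.colors(n = 7, palette = "Okabe-Ito")[-1]

# Named vector of hex codes, one per turn type,
# in the same shape that scale_fill_manual(values = ...) accepts
cols_accessible <- setNames(unname(okabe_ito), toy_turn_types)
cols_accessible
```

Such a vector could be passed to `scale_fill_manual(values = ...)` in place of a manually specified one.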
```{r}
cols <- c("Elaboration" = "#00BA38",
          "Question" = "#619CFF",
          "Acknowledgement" = "#F8766D",
          "Reflection" = "#DB72FB",
          "Advice" = "#93AA00",
          "HedgedDisclosure" = "#00C19F")
```

We'll create the dyadic categorical time series plot for each exemplar dyad and save these plots to the objects "dyad105_plot" and "dyad123_plot".

Dyad 105 plot.

```{r, warning = FALSE}
# First partition data of interest
dyad105 <- data[data$id == 105, ]

dyad105_plot <-
  # Choose the data, set time variable (turn) for the x-axis
  ggplot(dyad105, aes(x = turn)) +
  # Create title for plot by combining "Dyad = " with the dyad id variable (id)
  ggtitle(paste("Dyad =", unique(dyad105$id))) +
  # Create bars for the form of the listeners' turns
  # Partition data for listeners (role = 2)
  geom_rect(data = dyad105[dyad105$role == 2, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn-.5, xmax = turn+.5,
                          # Set the height of each bar to range from 0 to 5
                          ymin = 0, ymax = 5,
                          # Set the color of each bar to correspond to each turn type
                          fill = turn_type)) +
  # Add a horizontal line to separate bars
  geom_hline(yintercept = 5, color = "black") +
  # Create bars for the form of the disclosers' turns
  # Partition data for disclosers (role = 1)
  geom_rect(data = dyad105[dyad105$role == 1, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn-.5, xmax = turn+.5,
                          # Set the height of each bar to range from 5 to 10
                          ymin = 5, ymax = 10,
                          # Set the color of each bar to correspond to each turn type
                          fill = turn_type)) +
  # Set color of turn types to vector we created earlier ("cols")
  scale_fill_manual(values = cols) +
  # Label for x-axis
  xlab("Turn") +
  # Label for y-axis
  ylab("Role") +
  # X-axis ticks and labels
  scale_x_continuous(breaks = seq(0, 110, by = 10)) +
  # Y-axis ticks and label
  scale_y_continuous(breaks = c(2.5, 7.5),
                     labels = c("Listener Turn", "Discloser Turn")) +
  # Legend label
  labs(fill = "Turn Type") +
  # Additional plot aesthetics
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"))
```

Dyad 123 plot.

```{r, warning = FALSE}
# First partition data of interest
dyad123 <- data[data$id == 123, ]

dyad123_plot <-
  # Choose the data, set time variable (turn) for the x-axis
  ggplot(dyad123, aes(x = turn)) +
  # Create title for plot by combining "Dyad = " with the dyad id variable (id)
  ggtitle(paste("Dyad =", unique(dyad123$id))) +
  # Create bars for the form of the listeners' turns
  # Partition data for listeners (role = 2)
  geom_rect(data = dyad123[dyad123$role == 2, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn-.5, xmax = turn+.5,
                          # Set the height of each bar to range from 0 to 5
                          ymin = 0, ymax = 5,
                          # Set the color of each bar to correspond to each turn type
                          fill = turn_type)) +
  # Add a horizontal line to separate bars
  geom_hline(yintercept = 5, color = "black") +
  # Create bars for the form of the disclosers' turns
  # Partition data for disclosers (role = 1)
  geom_rect(data = dyad123[dyad123$role == 1, ],
            # Set the width of each bar as -0.5 and +0.5 the value of the time variable (turn)
            mapping = aes(xmin = turn-.5, xmax = turn+.5,
                          # Set the height of each bar to range from 5 to 10
                          ymin = 5, ymax = 10,
                          # Set the color of each bar to correspond to each turn type
                          fill = turn_type)) +
  # Set color of turn types to vector we created earlier ("cols")
  scale_fill_manual(values = cols) +
  # Label for x-axis
  xlab("Turn") +
  # Label for y-axis
  ylab("Role") +
  # X-axis ticks and labels
  scale_x_continuous(breaks = seq(0, 110, by = 10)) +
  # Y-axis ticks and label
  scale_y_continuous(breaks = c(2.5, 7.5),
                     labels = c("Listener Turn", "Discloser Turn")) +
  # Legend label
  labs(fill = "Turn Type") +
  # Additional plot aesthetics
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"))
```

Print the plots we
just created.

```{r}
print(dyad105_plot)
print(dyad123_plot)
```

On the x-axis, we have turn in the conversation. On the y-axis, we have the turn type for the disclosers on the top half and the listeners on the bottom half. Each turn category is represented by a different color and the gray bars indicate when a particular dyad member is not speaking. We can see that Dyad 105 had greater back-and-forth exchange during their conversation, as indicated by the greater number of turns. In both dyads, we can see that the disclosers spent many of their turns elaborating on their problem (green) and the listener used a variety of different turn types.

# Prepare the Data for Configural Frequency Analysis.

To prepare the conversation data for configural frequency analysis, we must first count the number of transitions between listener --> discloser turns and the number of transitions between discloser --> listener turns.

First, let's make sure all the data are ordered by turn within each dyad.

```{r}
# Order data by ID and turn number
data <- data[order(data$id, data$turn), ]

# View the first 10 rows of the data
head(data, 10)
```

Second, before calculating the number of turn transitions, we need to distinguish between listener and discloser turns of the same label (e.g., listener question vs. discloser question) since these are not distinguished in the "turn_type" variable.

```{r}
data <-
  # Select data
  data %>%
  # Update "turn_type" variable so that
  # if the role = 1 (i.e., if it is a discloser turn),
  # then add a "D" in front of turn type,
  # otherwise add an "L" in front of turn type
  # separate the D (or L) and the turn type with a "_"
  dplyr::mutate(turn_type = paste0(ifelse(role == 1, "D", "L"), "_", turn_type)) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the first 10 rows of the data
head(data, 10)
```

Third, let's create a lagged "turn_type" variable.
This lagged variable will then be combined with the original "turn_type" variable to create a new variable that represents the turn transition. After running the code, you will see below that the discloser’s first turn is shown as D_Elaboration. On the same line and in the next column, the first lagged turn is shown as NA to represent the fact that the listener did not speak prior to the discloser’s first turn.

```{r}
# Create a lagged variable
data <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad ID (id)
  dplyr::group_by(id) %>%
  # Create new variable that is a lag of "turn_type"
  dplyr::mutate(lagged_turn_type = dplyr::lag(turn_type)) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the first 10 rows of the data
head(data, 10)
```

We next generate a data frame containing transition frequencies through cross-tabulation, and save it to the data frame "data_counts". Because there are 7 turn types (6 identified and 1 NA) and 2 roles (listener and discloser), there are 14 turn categories in our dataset. Thus, the crosstab table contains 14 * 14 = 196 cells, representing all pairings or combinations of two turns.

```{r}
# Reformat data frame to data table
data_dt <- data.table::as.data.table(data)

# Count the occurrence of lagged turn and turn pairs
data_counts <- as.data.frame(with(data_dt, table(turn_type, lagged_turn_type)))

# Count the rows in data_counts
nrow(data_counts)
```

Of course, not all of the 196 cells are legitimate turn transitions. Transitions that were at the beginning or end of the conversation and those that involved uncodable turns will need to be removed. In addition, while they exist in the table, D->D and L->L transitions do not occur in our data. We will remove them in the two code chunks below.
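
Before filtering, it may help to see the lag-and-count step on a small scale. The sketch below uses invented data for two hypothetical dyads (`toy_turns`) and, to stay dependency-free, lags the turn type within each dyad with base R's `ave()` rather than the `dplyr` pipeline used in this tutorial. All object names are assumptions made for illustration.

```{r}
# Toy conversation data for two hypothetical dyads (not the tutorial data)
toy_turns <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2),
  turn      = c(1, 2, 3, 4, 1, 2, 3),
  turn_type = c("D_Elaboration", "L_Question", "D_Elaboration", "L_Question",
                "D_Elaboration", "L_Reflection", "D_Elaboration")
)

# Lag turn_type within each dyad:
# the first turn of every dyad has no preceding turn, so its lag is NA
toy_turns$lagged_turn_type <- ave(
  toy_turns$turn_type, toy_turns$id,
  FUN = function(x) c(NA, head(x, -1))
)

# Cross-tabulate current by lagged turn type;
# table() drops pairs that contain NA by default
toy_transitions <- as.data.frame(with(toy_turns, table(turn_type, lagged_turn_type)))

# One row per cell, including cells that never occurred (Freq = 0)
toy_transitions
```

The cross-tabulation returns every combination of levels, which is why cells such as D->D transitions appear with a frequency of zero and have to be filtered out afterwards.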
```{r}
# Remove rows in which the lagged turn in the transition contains a _NA
data_counts <- data_counts[ grep("_NA", data_counts$lagged_turn_type, invert = TRUE) , ]

# Remove rows in which the following turn in the transition contains a _NA
data_counts <- data_counts[ grep("_NA", data_counts$turn_type, invert = TRUE) , ]

# Count the rows in data_counts
nrow(data_counts)
```

Since we will be examining listener --> discloser and discloser --> listener turn transitions separately, we will create two different data sets that contain the counts of these turn transitions. We will also remove the L->L and D->D transitions at this step.

```{r}
# Create listener --> discloser transition data set
# Remove all transitions that begin with a discloser turn from data_counts
# by keeping only the rows in which lagged_turn_type does not contain
# the discloser tag "D_" (invert = TRUE)
list2disc <- data_counts[ grep("D_", data_counts$lagged_turn_type, invert = TRUE) , ]

# Remove all transitions that end with a listener turn to create the L->D subset
# by keeping only the rows in which turn_type does not contain
# the listener tag "L_" (invert = TRUE)
list2disc <- list2disc[ grep("L_", list2disc$turn_type, invert = TRUE) , ]

nrow(list2disc)

# View the first 10 rows of the data
head(list2disc, 10)

# Create discloser --> listener transition data set
# Remove all transitions that begin with a listener turn from data_counts
# by keeping only the rows in which lagged_turn_type does not contain
# the listener tag "L_" (invert = TRUE)
disc2list <- data_counts[ grep("L_", data_counts$lagged_turn_type, invert = TRUE) , ]

# Remove all transitions that end with a discloser turn to create the D->L subset
# by keeping only the rows in which turn_type does not contain
# the discloser tag "D_" (invert = TRUE)
disc2list <- disc2list[ grep("D_", disc2list$turn_type, invert = TRUE) , ]

nrow(disc2list)

# View the first 10 rows of the data
head(disc2list, 10)
```

Given that we are examining the transitions between six listener turn types and six discloser turn types, we should expect each of our data sets to contain 36 rows.
These rows represent all possible transitions, including a few transitions that were not observed in our dataset.

Finally, the `confreq` package requires the data to be organized in a specific way to conduct configural frequency analysis. Specifically, each data set should have three columns that represent (1) the type of lagged turn, (2) the current type of turn, and (3) the frequency of that turn transition. In the first two columns, each turn type needs to be represented by a number instead of a category label. So, we number the categories in alphabetical order (1 = acknowledgement, 2 = advice, 3 = elaboration, 4 = hedged disclosure, 5 = question, 6 = reflection). The code chunk below recodes the turn labels into numbers.

```{r}
# Recoding variables in listener --> discloser transition data set
list2disc <-
  # Select data
  list2disc %>%
  # Recode lagged turn and following turn variables with numbers
  dplyr::mutate(lagged_turn_type = recode(lagged_turn_type,
                                          "L_Acknowledgement" = 1,
                                          "L_Advice" = 2,
                                          "L_Elaboration" = 3,
                                          "L_HedgedDisclosure" = 4,
                                          "L_Question" = 5,
                                          "L_Reflection" = 6),
                turn_type = recode(turn_type,
                                   "D_Acknowledgement" = 1,
                                   "D_Advice" = 2,
                                   "D_Elaboration" = 3,
                                   "D_HedgedDisclosure" = 4,
                                   "D_Question" = 5,
                                   "D_Reflection" = 6)) %>%
  # Save the data as a data.frame
  as.data.frame()

# Recoding variables in discloser --> listener transition data set
disc2list <-
  # Select data
  disc2list %>%
  # Recode lagged turn and following turn variables with numbers
  dplyr::mutate(lagged_turn_type = recode(lagged_turn_type,
                                          "D_Acknowledgement" = 1,
                                          "D_Advice" = 2,
                                          "D_Elaboration" = 3,
                                          "D_HedgedDisclosure" = 4,
                                          "D_Question" = 5,
                                          "D_Reflection" = 6),
                turn_type = recode(turn_type,
                                   "L_Acknowledgement" = 1,
                                   "L_Advice" = 2,
                                   "L_Elaboration" = 3,
                                   "L_HedgedDisclosure" = 4,
                                   "L_Question" = 5,
                                   "L_Reflection" = 6)) %>%
  # Save the data as a data.frame
  as.data.frame()
```

Let's reorder the values of the turn variables since they represent the turn
categories in alphabetical order, and reorganize the columns so the lagged turns come first.

```{r}
# Order data by lagged_turn_type and turn_type for listener --> discloser turn transitions
list2disc <- list2disc[order(list2disc$lagged_turn_type, list2disc$turn_type), ]

# Reorder the columns for listener --> discloser turn transitions
list2disc <- list2disc[, c(2, 1, 3)]

# View the first 10 rows of the data
head(list2disc, 10)

# Order data by lagged_turn_type and turn_type for discloser --> listener turn transitions
disc2list <- disc2list[order(disc2list$lagged_turn_type, disc2list$turn_type), ]

# Reorder the columns for discloser --> listener turn transitions
disc2list <- disc2list[, c(2, 1, 3)]

# View the first 10 rows of the data
head(disc2list, 10)
```

We can also rename the columns to represent the speaker of that turn.

```{r}
# Rename columns in listener --> discloser turn transition data
colnames(list2disc) <- c("listener", "discloser", "freq")

# Rename columns in discloser --> listener turn transition data
colnames(disc2list) <- c("discloser", "listener", "freq")
```

# Conduct the Configural Frequency Analysis.

Let's examine the structure of the data sets. We need to check whether the variables are in the correct format. Specifically, we need the variables that label the turn types of the listeners and disclosers to be factor variables (instead of integer variables). A factor variable makes sure R interprets the variables as categories instead of integers.

```{r}
# Examine the structure of the listener --> discloser
# and discloser --> listener data
str(list2disc)
str(disc2list)

# Need the discloser and listener variables to be factors in both datasets
list2disc$listener <- as.factor(list2disc$listener)
list2disc$discloser <- as.factor(list2disc$discloser)

disc2list$listener <- as.factor(disc2list$listener)
disc2list$discloser <- as.factor(disc2list$discloser)
```

Now that the data are in the correct format, we can run the configural frequency analyses.
Let's first examine the listener to discloser transitions. The configural frequency analysis function requires that the data are formatted as a response pattern frequency table.

```{r}
# Change format of data for configural frequency analysis
# Insert data ("list2disc") in the dat2fre(fre2dat()) function
cfa_list2disc <- dat2fre(fre2dat(list2disc))

# Examine the response pattern frequency table
cfa_list2disc
```

Next, we conduct the configural frequency analysis and save the results. We then examine a summary of the results.

```{r}
# Conduct configural frequency analysis and save the results
# Insert the response pattern frequency table
# and the name of the variables of interest (with a ~ and +)
results_list2disc <- CFA(cfa_list2disc, form = "~ listener + discloser")

# Examine the saved results
summary(results_list2disc, showall = TRUE)
```

In the results, we can examine the column "Type" to determine which transitions occurred more (+) or less (-) frequently than expected. We can see that the transitions between (1) listener acknowledgements and discloser hedged disclosures, (2) listener elaborations and discloser acknowledgements, (3) listener elaborations and discloser questions, (4) listener elaborations and discloser reflections, and (5) listener hedged disclosures and discloser acknowledgements occurred more frequently than expected. Furthermore, transitions between (1) listener acknowledgements and discloser acknowledgements and (2) listener elaborations and discloser elaborations occurred less frequently than expected.

We next examine the discloser to listener transitions. The configural frequency analysis function requires that the data are formatted as a response pattern frequency table.
```{r}
# Change format of data for configural frequency analysis
# Insert data (disc2list) in the dat2fre(fre2dat()) function
cfa_disc2list <- dat2fre(fre2dat(disc2list))

# Examine the response pattern frequency table
cfa_disc2list
```

Next, we conduct the configural frequency analysis and save the results. We then examine a summary of the results.

```{r}
# Conduct configural frequency analysis and save the results
# Insert the response pattern frequency table
# and the name of the variables of interest (with a ~ and +)
results_disc2list <- CFA(cfa_disc2list, form = "~ listener + discloser")

# Examine the saved results
summary(results_disc2list, showall = TRUE)
```

In the results, we can examine the column "Type" to determine which transitions occurred more (+) or less (-) frequently than expected. We can see that the transitions between (1) discloser acknowledgements and listener elaborations, (2) discloser acknowledgements and listener hedged disclosures, (3) discloser hedged disclosures and listener acknowledgements, and (4) discloser questions and listener elaborations occurred more frequently than expected. Furthermore, transitions between (1) discloser acknowledgements and listener acknowledgements, (2) discloser elaborations and listener elaborations, (3) discloser hedged disclosures and listener elaborations, (4) discloser questions and listener acknowledgements, and (5) discloser questions and listener reflections occurred less frequently than expected.

# Add Hypothetical Experimental Condition to the Data.

One particular advantage of using configural frequency analysis is that it allows researchers to examine the association between 2+ categorical variables at a time. For instance, researchers may examine the association between gender, health status, and education (or any other combination of variables that you can think of!).
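
To illustrate what "expected" can mean with three variables, the base R sketch below computes expected cell counts for a hypothetical 2 x 2 x 2 table under the assumption that all three variables are mutually independent. The counts and object names are invented, and we assume (rather than confirm from the `confreq` documentation) that a main-effects formula such as `~ listener + discloser + condition` corresponds to this kind of independence model.

```{r}
# Hypothetical 2 x 2 x 2 table of counts (not the tutorial data)
toy_array <- array(c(20, 10, 10, 20,
                     5, 15, 15, 5),
                   dim = c(2, 2, 2),
                   dimnames = list(listener  = c("1", "2"),
                                   discloser = c("1", "2"),
                                   condition = c("0", "1")))

n_total <- sum(toy_array)

# Marginal proportions for each variable
p_listener  <- apply(toy_array, 1, sum) / n_total
p_discloser <- apply(toy_array, 2, sum) / n_total
p_condition <- apply(toy_array, 3, sum) / n_total

# Expected count per cell if the three variables were mutually independent:
# n * p(listener) * p(discloser) * p(condition)
toy_expected3 <- n_total * outer(outer(p_listener, p_discloser), p_condition)

# Observed minus expected: positive values are candidate "types" and
# negative values candidate "antitypes" (before any significance test)
round(toy_array - toy_expected3, 2)
```

Each cell is compared against its own expected count, so a pattern can be over-represented in one condition and under-represented in the other.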
Stemmler and colleagues' work provides many examples of how to examine 2+ categorical variables using configural frequency analysis.

Here, we build on our current example examining turn-to-turn transitions by adding a hypothetical variable to the data set. Specifically, we add a variable called "condition" that indicates whether the conversation occurred in a face-to-face or computer-mediated setting and randomly assign a condition to each dyad.

```{r}
# Create new variable called "condition" in the data set "data"
data$condition <- NA

# Set seed for random number generator that will be used below
# (to make sure we get consistency across runs)
set.seed(1234)

# Randomly assign each dyad to a condition
data <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad ID (id)
  dplyr::group_by(id) %>%
  # Create function that assigns a condition to each dyad
  # Within mutate, assign the condition variable a random value between 0 and 1
  # and round that value to an integer - i.e., 0 decimal places
  dplyr::group_modify(function(.x, .y) .x %>%
                        mutate(condition = round(runif(1, 0, 1), 0))) %>%
  # Relabel the conditions
  # If condition is equal to 0, relabel it as the face-to-face condition (FtF)
  # Otherwise (condition is equal to 1), relabel it as the
  # computer-mediated communication condition (CMC)
  dplyr::mutate(condition = ifelse(condition == 0, "FtF", "CMC")) %>%
  # Save the data as a data.frame
  as.data.frame()

# View the first 10 rows of the data
head(data, 10)
```

Let's double check to see how our random assignment worked. Specifically, we will count how many dyads ended up in each condition.
```{r}
# Create dyad level data set
dyad_data <-
  # Select data
  data %>%
  # Select grouping variable, in this case, dyad ID (id)
  dplyr::group_by(id) %>%
  # Reduce each dyad to one row with its assigned condition
  dplyr::summarise(assigned_condition = unique(condition)) %>%
  # Count the number of dyads for each condition
  dplyr::count(assigned_condition) %>%
  # Save the data as a data.frame
  as.data.frame()

# Print data set
dyad_data
```

Woohoo - random assignment was successful! Twenty-nine dyads were assigned to the CMC condition, and 30 dyads were assigned to the FtF condition.

Now, we get back to preparing our data for configural frequency analysis.

Like we did above, we generate a data frame containing transition frequencies by condition through cross-tabulation and save it to the data frame "data_counts3" (the 3 indicating our 3 variable analysis). Because there are 7 turn types (6 identified and 1 NA) and 2 roles (listener and discloser), there are 14 turn categories for both the current turn and the lagged turn, and there are 2 conditions (CMC and FtF). Thus, the crosstab table contains 14 * 14 * 2 = 392 cells, representing all turn transition/condition combinations.

```{r}
# Reformat data frame to data table
data_dt3 <- data.table::as.data.table(data)

# Count the occurrence of lagged turn and turn pairs by condition
data_counts3 <- as.data.frame(with(data_dt3, table(turn_type, lagged_turn_type, condition)))

# Count the rows in data_counts3
nrow(data_counts3)
```

As described above, not all of the 392 cells are legitimate turn transition/condition combinations. Transitions that were at the beginning or end of the conversation and those that involved uncodable turns will need to be removed. In addition, while they exist in the table, D->D and L->L transitions do not occur in our data. We will remove them in the two code chunks below.
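
As a quick check on how the size of a cross-tabulation is determined, the sketch below crosses three small hypothetical factors (`toy_a`, `toy_b`, `toy_c`, all invented for illustration). The number of rows returned by `as.data.frame(table(...))` is the product of the numbers of levels of the variables being crossed, regardless of how many observations there are.

```{r}
# Three toy factors with 3, 3, and 2 levels (hypothetical values)
toy_a <- factor(c("x", "y", "z", "x"))
toy_b <- factor(c("u", "v", "w", "u"))
toy_c <- factor(c("FtF", "CMC", "FtF", "CMC"))

# as.data.frame(table(...)) returns one row per combination of levels,
# including combinations that were never observed (Freq = 0)
toy_cells <- as.data.frame(table(toy_a, toy_b, toy_c))

# Number of cells in the cross-tabulation
nrow(toy_cells)

# Product of the numbers of levels: 3 * 3 * 2
nlevels(toy_a) * nlevels(toy_b) * nlevels(toy_c)
```

The same arithmetic applies to the tutorial data: the row count of the cross-tabulated data frame can be compared against the product of the level counts of its variables.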
```{r}
# Remove rows in which the lagged turn in the transition contains a _NA
data_counts3 <- data_counts3[ grep("_NA", data_counts3$lagged_turn_type, invert = TRUE) , ]

# Remove rows in which the following turn in the transition contains a _NA
data_counts3 <- data_counts3[ grep("_NA", data_counts3$turn_type, invert = TRUE) , ]

# Count the rows in data_counts3
nrow(data_counts3)
```

Since we will be examining listener --> discloser and discloser --> listener turn transitions separately, we will create two different data sets that contain the counts of these turn transitions. We will also remove the L->L and D->D transitions at this step.

```{r}
# Create listener --> discloser transition data set
# Remove all transitions that begin with a discloser turn from data_counts3
# by keeping only the rows in which lagged_turn_type does not contain
# the discloser tag "D_" (invert = TRUE)
list2disc3 <- data_counts3[ grep("D_", data_counts3$lagged_turn_type, invert = TRUE) , ]

# Remove all transitions that end with a listener turn to create the L->D subset
# by keeping only the rows in which turn_type does not contain
# the listener tag "L_" (invert = TRUE)
list2disc3 <- list2disc3[ grep("L_", list2disc3$turn_type, invert = TRUE) , ]

nrow(list2disc3)

# View the first 10 rows of the data
head(list2disc3, 10)

# Create discloser --> listener transition data set
# Remove all transitions that begin with a listener turn from data_counts3
# by keeping only the rows in which lagged_turn_type does not contain
# the listener tag "L_" (invert = TRUE)
disc2list3 <- data_counts3[ grep("L_", data_counts3$lagged_turn_type, invert = TRUE) , ]

# Remove all transitions that end with a discloser turn to create the D->L subset
# by keeping only the rows in which turn_type does not contain
# the discloser tag "D_" (invert = TRUE)
disc2list3 <- disc2list3[ grep("D_", disc2list3$turn_type, invert = TRUE) , ]

nrow(disc2list3)

# View the first 10 rows of the data
head(disc2list3, 10)
```

Given that we are examining the transitions between six listener turn types and six discloser turn types by condition, we should expect each of our data sets to contain
72 rows. These rows represent all possible transitions, including a few transitions that were not observed in our dataset.

Finally, the `confreq` package requires the data to be organized in a specific way to conduct configural frequency analysis. Specifically, each data set should have four columns that represent (1) the type of lagged turn, (2) the current type of turn, (3) the condition, and (4) the frequency of that combination. In the first three columns, each turn type and condition needs to be represented by a number instead of a category label. So, we number the turn categories in alphabetical order (1 = acknowledgement, 2 = advice, 3 = elaboration, 4 = hedged disclosure, 5 = question, 6 = reflection) and the condition as initially created above (0 = FtF, 1 = CMC). The code chunk below recodes the turn and condition labels into numbers.

```{r}
# Recoding variables in listener --> discloser transition data set
list2disc3 <-
  # Select data
  list2disc3 %>%
  # Recode lagged turn, following turn, and condition variables with numbers
  dplyr::mutate(lagged_turn_type = recode(lagged_turn_type,
                                          "L_Acknowledgement" = 1,
                                          "L_Advice" = 2,
                                          "L_Elaboration" = 3,
                                          "L_HedgedDisclosure" = 4,
                                          "L_Question" = 5,
                                          "L_Reflection" = 6),
                turn_type = recode(turn_type,
                                   "D_Acknowledgement" = 1,
                                   "D_Advice" = 2,
                                   "D_Elaboration" = 3,
                                   "D_HedgedDisclosure" = 4,
                                   "D_Question" = 5,
                                   "D_Reflection" = 6),
                condition = recode(condition,
                                   "FtF" = 0,
                                   "CMC" = 1)) %>%
  # Save the data as a data.frame
  as.data.frame()

# Recoding variables in discloser --> listener transition data set
disc2list3 <-
  # Select data
  disc2list3 %>%
  # Recode lagged turn, following turn, and condition variables with numbers
  dplyr::mutate(lagged_turn_type = recode(lagged_turn_type,
                                          "D_Acknowledgement" = 1,
                                          "D_Advice" = 2,
                                          "D_Elaboration" = 3,
                                          "D_HedgedDisclosure" = 4,
                                          "D_Question" = 5,
                                          "D_Reflection" = 6),
                turn_type = recode(turn_type,
                                   "L_Acknowledgement" = 1,
                                   "L_Advice" = 2,
                                   "L_Elaboration" = 3,
                                   "L_HedgedDisclosure" = 4,
                                   "L_Question" = 5,
                                   "L_Reflection" = 6),
                condition = recode(condition,
                                   "FtF" = 0,
                                   "CMC" = 1)) %>%
  # Save the data as a data.frame
  as.data.frame()
```

Let's reorder the values of the turn variables since they represent the turn categories in alphabetical order, and reorganize the columns so the lagged turns come first.

```{r}
# Order data by lagged_turn_type and turn_type for listener --> discloser turn transitions
list2disc3 <- list2disc3[order(list2disc3$lagged_turn_type, list2disc3$turn_type), ]

# Reorder the columns for listener --> discloser turn transitions
list2disc3 <- list2disc3[, c(2, 1, 3, 4)]

# View the first 10 rows of the data
head(list2disc3, 10)

# Order data by lagged_turn_type and turn_type for discloser --> listener turn transitions
disc2list3 <- disc2list3[order(disc2list3$lagged_turn_type, disc2list3$turn_type), ]

# Reorder the columns for discloser --> listener turn transitions
disc2list3 <- disc2list3[, c(2, 1, 3, 4)]

# View the first 10 rows of the data
head(disc2list3, 10)
```

We can also rename the columns to represent the speaker of that turn.

```{r}
# Rename columns in listener --> discloser turn transition data
colnames(list2disc3) <- c("listener", "discloser", "condition", "freq")

# Rename columns in discloser --> listener turn transition data
colnames(disc2list3) <- c("discloser", "listener", "condition", "freq")
```

# Conduct the Configural Frequency Analysis with Three Variables.

As before, let's examine the structure of the data sets. We need to check whether the variables are in the correct format. Specifically, we need the variables that label the turn types of the listeners and disclosers as well as the condition variable to be factor variables (instead of integer variables). A factor variable makes sure R interprets the variables as categories instead of integers.
```{r}
# Examine the structure of the listener --> discloser
# and discloser --> listener data
str(list2disc3)
str(disc2list3)

# Need the discloser, listener, and condition variables to be factors in both datasets
list2disc3$listener <- as.factor(list2disc3$listener)
list2disc3$discloser <- as.factor(list2disc3$discloser)
list2disc3$condition <- as.factor(list2disc3$condition)

disc2list3$listener <- as.factor(disc2list3$listener)
disc2list3$discloser <- as.factor(disc2list3$discloser)
disc2list3$condition <- as.factor(disc2list3$condition)
```

Now that the data are in the correct format, we can run the configural frequency analyses.

Let's first examine the listener to discloser transitions across conditions. The configural frequency analysis function requires that the data are formatted as a response pattern frequency table.

```{r}
# Change format of data for configural frequency analysis
# Insert data ("list2disc3") in the dat2fre(fre2dat()) function
cfa_list2disc3 <- dat2fre(fre2dat(list2disc3))

# Examine the response pattern frequency table
cfa_list2disc3
```

Next, we conduct the configural frequency analysis and save the results. We then examine a summary of the results.

```{r}
# Conduct configural frequency analysis and save the results
# Insert the response pattern frequency table
# and the name of the variables of interest (with a ~ and +)
results_list2disc3 <- CFA(cfa_list2disc3, form = "~ listener + discloser + condition")

# Examine the saved results
summary(results_list2disc3, showall = TRUE)
```

In the results, we can examine the column "Type" to determine which transitions occurred more (+) or less (-) frequently than expected.
For the face-to-face condition, we see that the transitions between (1) listener acknowledgements and discloser hedged disclosures, (2) listener elaborations and discloser questions, (3) listener elaborations and discloser reflections, (4) listener hedged disclosures and discloser acknowledgements, and (5) listener reflections and discloser acknowledgements occurred more frequently than expected. Furthermore, transitions between (1) listener acknowledgements and discloser acknowledgements and (2) listener elaborations and discloser elaborations occurred less frequently than expected.

For the computer-mediated communication condition, we see that the transitions between (1) listener advice and discloser acknowledgements, (2) listener elaborations and discloser acknowledgements, (3) listener elaborations and discloser questions, and (4) listener hedged disclosures and discloser acknowledgements occurred more frequently than expected. Furthermore, transitions between (1) listener acknowledgements and discloser acknowledgements and (2) listener hedged disclosures and discloser elaborations occurred less frequently than expected.

We next examine the discloser to listener transitions across conditions. The configural frequency analysis function requires that the data are formatted as a response pattern frequency table.

```{r}
# Change format of data for configural frequency analysis
# Insert data (disc2list3) in the dat2fre(fre2dat()) function
cfa_disc2list3 <- dat2fre(fre2dat(disc2list3))

# Examine the response pattern frequency table
cfa_disc2list3
```

Next, we conduct the configural frequency analysis and save the results. We then examine a summary of the results.
```{r}
# Conduct configural frequency analysis and save the results
# Insert the response pattern frequency table
# and the name of the variables of interest (with a ~ and +)
results_disc2list3 <- CFA(cfa_disc2list3, form = "~ listener + discloser + condition")

# Examine the saved results
summary(results_disc2list3, showall = TRUE)
```

In the results, we can examine the column "Type" to determine which transitions occurred more (+) or less (-) frequently than expected.

For the face-to-face condition, we see that the transitions between (1) discloser acknowledgements and listener elaborations and (2) discloser hedged disclosures and listener acknowledgements occurred more frequently than expected. Furthermore, transitions between (1) discloser acknowledgements and listener acknowledgements, (2) discloser elaborations and listener elaborations, and (3) discloser hedged disclosures and listener elaborations occurred less frequently than expected.

For the computer-mediated communication condition, we see that the transitions between (1) discloser acknowledgements and listener advice, (2) discloser acknowledgements and listener elaborations, (3) discloser acknowledgements and listener hedged disclosures, and (4) discloser questions and listener elaborations occurred more frequently than expected. Furthermore, transitions between discloser acknowledgements and listener acknowledgements occurred less frequently than expected.

-----

### Additional Information

We created this tutorial with a system environment and versions of R and packages that might be different from yours. If R reports errors when you attempt to run this tutorial, running the code chunk below and comparing your output with the tutorial posted on the LHAMA website may be helpful.

```{r}
session_info(pkgs = c("attached"))
```