Lab 07: Qualitative Data Analysis
Learning Goals
- Students will learn how to collaboratively perform a thematic analysis on open-ended interview/survey responses.
- Students will gain experience implementing the steps for open/axial coding, codebook development, and inter-coder reliability testing.
- Students will gain practical experience using coding as a tool for identifying patterns in human-robot interaction (HRI) research data.
Working in Groups
For this lab, you will work in groups of ~3 students. Each group will turn in ONE set of deliverables.
Lab 7 Deliverables & Submission
Lab 7 introduces the basics of thematic analysis, one kind of qualitative data analysis, adapted from Richards & Hemphill (2018). You will work in groups to analyze anonymized open-ended interview responses (lab_07_interview_data.xlsx, available on Canvas), identify recurring themes, develop and refine a shared codebook, and apply that codebook to additional data. You will also calculate inter-coder reliability and write up your results as you would for a research paper.
You will be asked to submit:
- The Lab 7 Qualitative Data Analysis Worksheet — the main worksheet for this lab where you will report:
- Your codebook
- Your data analysis approach + inter-rater reliability score
- A writeup of the results of your thematic analysis, similar to what we might see in the "Results" section of an HRI research paper
- A Spreadsheet of Thematic Analysis/Coding — a spreadsheet you create that records the themes you coded for each participant. Your group decides on the format, but it should make your coding decisions clear (e.g., one row per participant per coder, with columns for the themes you chose).
To receive credit for this lab, one of the members of your group will need to submit your completed qualitative data analysis worksheet to Canvas by Friday, May 8 at 6:00pm.
Lab 7 HRI Study, Data, and Your Goal for this Lab
During this lab, you'll be analyzing open-ended response data from the same HRI study we examined during Lab 6. In case it's helpful, here's the study overview again, so you can remind yourself about the study hypotheses, methods, and measures.
The dataset (lab_07_interview_data) contains responses from 79 participants across four experimental conditions in a 2×2 design that crosses robot form factor with personality customization:
NAO : humanoid SoftBank NAO robot, no customization (n = 20)
TURTLE : non-humanoid TurtleBot3 with OpenMANIPULATOR-X arm, no customization (n = 20)
NAO + Customizable : humanoid robot, participant customized Big Five traits (n = 19)
TURTLE + Customizable : non-humanoid robot, participant customized Big Five traits (n = 20)
Each row of the spreadsheet corresponds to one participant. The relevant columns are:
Q53 : participant ID
Condition : which of the four conditions the participant was assigned to
The remaining five columns hold the free-response questions:
- Mid-interaction: "Describe your interaction with the robot so far." (asked between Phases 2 and 3)
- Mid-interaction: "Describe the robot's personality so far."
- Post-interaction: "Describe your interaction with the robot." (asked after Phase 3)
- Post-interaction: "Describe the robot's personality."
- Post-interaction: "Describe your experience customizing the robot's personality." (customizable conditions only; blank for participants who could not customize)
The five free-response columns map onto a 2×2 grid of timepoint × topic (mid- vs. post-interaction × describe interaction vs. describe personality), plus one customization-specific question:
                    | Describe the INTERACTION            | Describe the PERSONALITY
Mid-interaction     | "...so far."                        | "...so far."
Post-interaction    | "...the robot."                     | "...the robot's personality."
Customizable only   | "...your experience customizing..." | (n/a)
Customizable-condition participants answered all 5 questions; non-customizable participants answered 4 (the customization question is blank for them).
Because each participant produces several short open-ended responses rather than a long interview transcript, your coding unit for this lab is the full set of responses for a single participant (i.e., one row of the spreadsheet). You will assign theme codes to each participant based on the content of all of their responses considered together.
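As a rough illustration of this coding unit, the sketch below joins one participant's non-empty responses into a single text block. The column names and the sample row are hypothetical placeholders; use the actual headers from lab_07_interview_data.xlsx.

```python
# Hypothetical short names for the five free-response columns.
RESPONSE_COLUMNS = [
    "mid_interaction", "mid_personality",
    "post_interaction", "post_personality",
    "customization_experience",
]

def coding_unit(row: dict) -> str:
    """Join all non-empty responses for one participant into one coding unit."""
    parts = []
    for col in RESPONSE_COLUMNS:
        text = (row.get(col) or "").strip()
        if text:  # the customization question is blank in no-customization conditions
            parts.append(f"[{col}] {text}")
    return "\n".join(parts)

# Example participant row (fabricated for illustration):
row = {
    "Q53": "P01",
    "Condition": "NAO",
    "mid_interaction": "It was friendly so far.",
    "mid_personality": "Cheerful.",
    "post_interaction": "Overall a smooth interaction.",
    "post_personality": "Upbeat and polite.",
    "customization_experience": "",  # blank: NAO is a no-customization condition
}
print(coding_unit(row))
```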
To access the data file for Lab 7 (lab_07_interview_data.xlsx), download it from the Lab 7 announcement on Canvas.
Your goal for this lab is to conduct a thematic analysis on the open-ended responses and report your data analysis methods and findings. The outcome of this lab will be a written report that resembles a qualitative "Results" section of the papers we've read in class.
Steps for Thematic Analysis
Phase One: Preparing for the Analysis
This phase involves understanding the context and goals of your analysis. Your goals for this analysis include:
- Collecting evidence for or against the experiment hypotheses, in particular those about how robot form factor and personality customization affect user perceptions of the robot as a social agent (for a refresher, look over the study overview again)
- Understanding participants' perceptions of the overall experience
- Understanding participants' perceptions of the robot (its personality, helpfulness, naturalness, etc.)
You are not required to pursue all three of these goals. These are meant to serve as starting points and guidance for your analysis.
Phase Two: Open and Axial Coding
- Step 1: Each team member independently reads the responses from 8 participants (2 from each of the 4 conditions: NAO, TURTLE, NAO + Customizable, TURTLE + Customizable) and identifies initial themes and subthemes. Some example themes and subthemes could be:
- Example theme: overall opinion of the robot
- Example subtheme: positive
- Example subtheme: negative
- Example subtheme: neutral / mixed
- Example theme: perceived effect of customizing the robot's personality
- Example subtheme: customization made the interaction feel more personal/aligned
- Example subtheme: customization felt awkward or had little effect
- Example subtheme: customization was not mentioned or not relevant (no-customization conditions)
- Example theme: humanlikeness of the robot
- Example subtheme: described as humanlike or socially present
- Example subtheme: described as machine-like, scripted, or robotic
- Example subtheme: compared to other AI/voice assistants (e.g., ChatGPT, Alexa)
- Step 2: As a team, discuss and refine your themes iteratively. For the purposes of this lab, please select 2 themes to code, each of which can have 2–4 subthemes. Your group should review at least 24 participant responses (6 from each of the 4 conditions) to refine your themes. Richards & Hemphill (2018) recommend reviewing around 30% of the data during this phase for a full-fledged thematic analysis (and 24 of 79 ≈ 30%).
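If it helps, the stratified review sample (6 participants per condition, 24 total) can be drawn programmatically. A minimal sketch, with fabricated participant IDs; in practice, group the real IDs by the Condition column of the spreadsheet:

```python
import random

# Hypothetical participant IDs grouped by condition (replace with real IDs).
by_condition = {
    "NAO": [f"P{i:02d}" for i in range(1, 21)],
    "TURTLE": [f"P{i:02d}" for i in range(21, 41)],
    "NAO + Customizable": [f"P{i:02d}" for i in range(41, 60)],
    "TURTLE + Customizable": [f"P{i:02d}" for i in range(60, 80)],
}

random.seed(7)  # fix the seed so the whole team draws the same sample
sample = [pid for ids in by_condition.values() for pid in random.sample(ids, 6)]
print(len(sample))  # 24
```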
Phase Three: Preliminary Codebook
- Step 1: Create a preliminary codebook based on the discussion above (put your codebook in your Lab 7 Qualitative Data Analysis Worksheet).
- Step 2: The team reviews the draft together. [Optional] You may invite an external researcher familiar with the study (but not part of the coding process) to review it as well. You can scroll down on the worksheet to find an example codebook of mine (Pooja).
Phase Four: Pilot Testing the Codebook
- Step 1: All team members independently code the same 4 new participant responses (one from each condition) using the codebook (e.g., in separate tabs of a Google spreadsheet).
- Step 2: Discuss discrepancies and revise the codebook accordingly until your team is confident in it.
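One quick way to surface discrepancies after pilot coding is to compute raw percent agreement and list the rows where coders differ. A sketch with hypothetical participant IDs and codes:

```python
# Each coder's labels for the same 4 pilot participants, in the same order.
pilot_ids = ["P05", "P25", "P45", "P65"]
coder_a = ["positive", "negative", "positive", "mixed"]
coder_b = ["positive", "mixed", "positive", "mixed"]

matches = sum(a == b for a, b in zip(coder_a, coder_b))
agreement = matches / len(pilot_ids)
print(f"Percent agreement: {agreement:.0%}")  # Percent agreement: 75%

# Rows to revisit together before revising the codebook.
disagreements = [pid for pid, a, b in zip(pilot_ids, coder_a, coder_b) if a != b]
print("Discuss:", disagreements)
```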
Phase Five: Final Coding and Inter-Coder Reliability
- Step 1: Perform full coding on the dataset using either consensus coding or split coding. For HRI research, we typically use split coding: all team members code an overlap set (typically ~10% of the data; for this dataset, ~8 participants, ideally with at least 2 from each condition). Once sufficient inter-rater reliability is achieved on the overlap set (see Step 2), the team splits up the rest of the data, with exactly one team member coding each of the remaining ~71 participant rows.
- Step 2: Calculate inter-coder reliability on the participant rows coded by all team members, using the appropriate metric below:
Choosing an Inter-Coder Reliability Metric:
- 2 coders, mutually exclusive themes: Cohen's Kappa
- 3+ coders, mutually exclusive themes: Krippendorff's Alpha or Fleiss' Kappa
- 2 coders, non-mutually exclusive themes: Cohen's Kappa per theme (binary coding)
- 3+ coders, non-mutually exclusive themes: Krippendorff's Alpha (nominal)
You can download cohen_kappa.py, krippendorff_alpha.py, and fleiss_kappa.py from the Lab 7 GitHub repository. You may need to install dependencies via pip install scikit-learn statsmodels krippendorff.
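If you're curious what these scripts compute, Cohen's Kappa for two coders is simple enough to write by hand: it is the observed agreement corrected for the agreement expected by chance. A self-contained sketch (for illustration only, not a replacement for cohen_kappa.py):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two coders assigning one mutually exclusive code per unit."""
    n = len(labels_a)
    # Observed agreement: fraction of units where the coders match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal code frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Toy example: 4/5 observed agreement, 0.48 chance agreement.
coder_1 = ["pos", "pos", "neg", "neg", "pos"]
coder_2 = ["pos", "neg", "neg", "neg", "pos"]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.615
```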
Phase Six: Interpreting and Writing Results
This phase involves drawing conclusions based on your thematic analysis and writing up your results. In particular, consider whether the patterns you observe in your codes differ across the four conditions. For example, do participants in customization conditions describe the robot's personality differently than those without customization? Does the robot's form factor (NAO vs. TURTLE) shape how participants describe the interaction? Rather than prescribing how to write these sections, we recommend reviewing good examples of thematic analyses from published HRI papers and emulating the best of what you see.
Tips & Resources
- Use color coding or margin notes while reviewing responses to help identify themes.
- When building the codebook, be specific about the criteria for each theme; because the responses are short, ambiguity in your definitions can quickly hurt reliability.
- Consider using a spreadsheet to compare coded responses side-by-side. Adding a column per subtheme (with a 1/0 indicator) makes inter-rater reliability calculations straightforward.
- Some participants give very brief responses (e.g., a few words) while others write paragraphs. Decide as a team how to handle thin responses — for example, whether to mark them as "insufficient information" rather than forcing a code.
- Python scripts to calculate inter-rater reliability will be provided.
- Refer to the original article for examples and guidance: Richards & Hemphill (2018).
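Building on the 1/0-indicator-column tip above, per-theme agreement between two coders reduces to comparing binary columns. A sketch with fabricated theme names and codes (raw agreement is shown here; feed the same binary columns into the provided scripts for chance-corrected values):

```python
# One 1/0 column per subtheme, one entry per participant (8 overlap participants).
themes = ["positive", "negative", "humanlike"]
coder_a = {
    "positive":  [1, 0, 1, 1, 0, 1, 0, 1],
    "negative":  [0, 1, 0, 0, 1, 0, 1, 0],
    "humanlike": [1, 0, 0, 1, 0, 0, 0, 1],
}
coder_b = {
    "positive":  [1, 0, 1, 0, 0, 1, 0, 1],
    "negative":  [0, 1, 0, 1, 1, 0, 1, 0],
    "humanlike": [1, 1, 0, 1, 0, 0, 0, 1],
}

# Raw per-theme agreement: fraction of participants where the 1/0 values match.
results = {}
for theme in themes:
    a, b = coder_a[theme], coder_b[theme]
    results[theme] = sum(x == y for x, y in zip(a, b)) / len(a)
    print(f"{theme}: {results[theme]:.0%} raw agreement")
```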
Extra Challenge
- Compare how themes shift across the four experimental conditions using a visualization, for example by plotting the proportion of each subtheme in each condition as a stacked bar chart. Do NAO and TURTLE participants describe the robot's personality differently? Does customization change which themes appear most frequently?
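As a starting point for that visualization, the sketch below computes the per-condition proportion of each subtheme from a list of (condition, subtheme) codes (fabricated here). These proportions are exactly the heights a stacked bar chart plots (e.g., with matplotlib's plt.bar and its bottom argument).

```python
from collections import Counter

# Fabricated (condition, subtheme) codes, one per coded participant.
coded = [
    ("NAO", "positive"), ("NAO", "negative"), ("NAO", "positive"),
    ("TURTLE", "negative"), ("TURTLE", "negative"), ("TURTLE", "mixed"),
]

counts = Counter(coded)                       # occurrences of each (condition, subtheme)
totals = Counter(cond for cond, _ in coded)   # participants per condition
proportions = {
    (cond, sub): counts[(cond, sub)] / totals[cond] for (cond, sub) in counts
}
print(proportions[("NAO", "positive")])  # 2 of the 3 NAO participants coded positive
```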