This R data package loads a concatenated dataframe composed of two-party US Presidential debate transcripts from 1960 to 2020. Each debate transcript is laid out in a one-row-per-utterance format, with each row marked by a distinct document_id (debate and year), candidate (e.g., REAGAN), and turn number.
Each utterance is also marked with associated metadata regarding the candidate (e.g., name, party), the election outcome (e.g., party_winner), and various other statistics about the US in that debate year, including:
1) Democrat candidate age
2) Republican candidate age
3) Age difference between candidates
4) Inflation rate in the year of the debate
5) GDP % change from the year before the debate
We edited each raw transcript to omit commentator questions and remarks (e.g., “What is the most important issue facing the economy?”). We also omitted parenthetical transcription notes (e.g., […coughing]).
We initially compiled debate transcripts from the Commission on Presidential Debates (debates.org). Since each of the debates we included was widely televised and publicly available, the data are considered fair use (https://www.copyright.gov/fair-use/). That is, their intended use is for academic/scholarly research.
Researchers are free to make use of these data for any purpose. We optimized the format to work well with our companion R package (ConversationAlign), which is designed to analyze alignment between two interlocutors.
Install the development version of USAPresidentialDebates from GitHub by typing the following in your console or script (make sure you have devtools installed):
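A minimal sketch of the install step, assuming the usual devtools workflow. The GitHub account name is not given in this text, so "OWNER" below is a placeholder and not the real repository path; replace it with the account that hosts USAPresidentialDebates.

```r
# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# ASSUMPTION: "OWNER" is a placeholder for the GitHub account hosting the
# package; this call will fail until it is replaced with the real name.
devtools::install_github("OWNER/USAPresidentialDebates")
```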
Call the transcripts
It could not be simpler: load the library, then type the word debates.
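As a rough illustration, the snippet below loads the package, inspects the debates dataframe, and counts utterances per debate and speaker. The helper count_turns() is hypothetical (it is not part of the package), and the column names document_id and candidate are assumed from the description above; check names(debates) and adjust if they differ.

```r
# Hypothetical helper (not part of the package): count utterances per
# debate and speaker using base R only.
# ASSUMPTION: the columns are named `document_id` and `candidate`.
count_turns <- function(df, id_col = "document_id", speaker_col = "candidate") {
  stopifnot(all(c(id_col, speaker_col) %in% names(df)))
  counts <- as.data.frame(table(df[[id_col]], df[[speaker_col]]),
                          stringsAsFactors = FALSE)
  names(counts) <- c(id_col, speaker_col, "n_utterances")
  # Drop debate/speaker pairs that never occur together
  counts[counts$n_utterances > 0, ]
}

# Only runs if the package has been installed
if (requireNamespace("USAPresidentialDebates", quietly = TRUE)) {
  library(USAPresidentialDebates)
  str(debates)                       # one row per utterance
  print(head(count_turns(debates)))  # utterances per debate and candidate
}
```

Because each row is a single utterance, the per-speaker counts give a quick check that both candidates are represented in every debate before passing the data to ConversationAlign.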
Get in touch!
Contact jamie_reilly@temple.edu for feedback and assistance.