Cleans and formats language transcripts produced by the read stage. Removes non-alphabetic characters and stopwords, and optionally lemmatizes words when lemmatize = TRUE. Vectorizes each utterance (one word per row) and reports the total word count and mean word length by interlocutor in each dyad, along with the number of words in each turn.

Usage

clean_dyads(read_ts_df, lemmatize = TRUE, stop_words_df = "default")

Arguments

read_ts_df

data frame produced by the read_dyads() function

lemmatize

logical; should words be lemmatized (converted to their base morphological form)? Defaults to TRUE.

stop_words_df

defaults to the built-in list of stopwords; otherwise, supply a file path to a CSV file with a column of stopwords titled 'Word'
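
A custom stopword file can be prepared in base R. The following is a minimal sketch, assuming only what the argument description states (a CSV file with a column titled 'Word'). The words, the temporary path, and the my_dyads object are illustrative and not part of the package.

```r
# Illustrative stopword list; the column must be titled 'Word'
my_stopwords <- data.frame(Word = c("um", "uh", "like", "okay"))

# Write the list to a temporary CSV file (any writable path would do)
stop_path <- tempfile(fileext = ".csv")
write.csv(my_stopwords, stop_path, row.names = FALSE)

# Read it back to confirm the header and contents
check <- read.csv(stop_path, stringsAsFactors = FALSE)

# The path could then be supplied to clean_dyads(); 'my_dyads' is a
# hypothetical data frame returned by read_dyads()
# cleaned <- clean_dyads(my_dyads, lemmatize = TRUE, stop_words_df = stop_path)
```

Setting row.names = FALSE keeps the file to a single 'Word' column, so no unnamed index column is written alongside it.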

Value

data frame with cleaned text data, formatted with one word per row
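
To illustrate the one-word-per-row format, the cleaning steps described above can be approximated in base R. This is a rough sketch, not the package's implementation: the toy transcript, its column names, the stopword list, and the clean_one() helper are all assumptions made for illustration, and lemmatization is omitted.

```r
# Toy transcript; column names are illustrative, not the package's own
transcript <- data.frame(
  dyad_id = c("d1", "d1"),
  speaker = c("A", "B"),
  text = c("Well, I saw 2 dogs!", "The dogs were running."),
  stringsAsFactors = FALSE
)

# Tiny illustrative stopword list (the package ships its own default)
stopwords <- c("i", "the", "were", "well")

# Hypothetical helper: lowercase, strip non-alphabetic characters,
# split into words, and drop stopwords
clean_one <- function(txt) {
  txt <- tolower(txt)
  txt <- gsub("[^a-z ]", " ", txt)
  words <- strsplit(trimws(txt), " +")[[1]]
  words[!(words %in% stopwords) & nzchar(words)]
}

# Expand to one word per row, keeping dyad, speaker, and turn number
rows <- lapply(seq_len(nrow(transcript)), function(i) {
  w <- clean_one(transcript$text[i])
  data.frame(
    dyad_id = rep(transcript$dyad_id[i], length(w)),
    speaker = rep(transcript$speaker[i], length(w)),
    turn = rep(i, length(w)),
    word = w,
    stringsAsFactors = FALSE
  )
})
cleaned <- do.call(rbind, rows)

# Word count and mean word length per speaker
word_count <- tapply(cleaned$word, cleaned$speaker, length)
mean_length <- tapply(nchar(cleaned$word), cleaned$speaker, mean)
```

Here each surviving word occupies its own row, tagged with the dyad, speaker, and turn it came from, which is what makes per-interlocutor and per-turn summaries straightforward to compute.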