Cleans and vectorizes a language corpus, then appends lexical norms to every content word. The user controls options for stopword removal and lemmatization and selects up to three psycholinguistic dimensions whose norms are yoked to each content word in the original conversation transcript.
Usage
prep_dyads(
  dat_read,
  lemmatize = TRUE,
  omit_stops = TRUE,
  which_stoplist = "Temple_stops25",
  verbose = TRUE
)
Arguments
- dat_read
dataframe produced by the read_dyads() function
- lemmatize
logical, should words be lemmatized (converted to their base morphological form); default is TRUE
- omit_stops
logical, should stopwords be removed; default is TRUE
- which_stoplist
user-specified stopword list, with options including "none", "SMART", "MIT_stops", "CA_OriginalStops", and "Temple_stops25". "Temple_stops25" is the default list
- verbose
logical, display detailed output such as error messages and progress; default is TRUE
Value
dataframe with text cleaned and vectorized to a one-word-per-row format. Cleaned text appears in a new column called 'Text_Clean'. Lexical norms for each selected dimension (e.g., word length) are appended to every content word, along with speaker identity, turn, and Event_ID (the conversation identifier).
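Examples
A minimal sketch of a typical workflow, assuming transcript files are stored in a local folder; the folder path passed to read_dyads() is hypothetical, and that function's argument details are covered in its own documentation:

library(ConversationAlign)

# Step 1: read raw transcripts into a single dataframe
# (path shown is illustrative)
dat_raw <- read_dyads("my_transcripts")

# Step 2: clean, lemmatize, remove stopwords, and yoke norms;
# prep_dyads() prompts the user to select up to three dimensions
dat_prepped <- prep_dyads(
  dat_read = dat_raw,
  lemmatize = TRUE,
  omit_stops = TRUE,
  which_stoplist = "Temple_stops25",
  verbose = TRUE
)

# Variant (also a sketch): keep stopwords and skip lemmatization
dat_minimal <- prep_dyads(dat_raw, lemmatize = FALSE, omit_stops = FALSE)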