Survey researchers commonly use behavior coding to identify whether interviewers read questions exactly as worded in the questionnaire; deviations from the scripted wording are a potential source of interviewer variance. Recurrent Neural Networks (RNNs), a machine learning technique, can partially automate this coding with reliability comparable to that of human coders (Timbrook and Eck 2019), saving time and money. It is unknown, however, whether the reliability of RNN-based coding differs across question characteristics (e.g., the number of words in a question or its reading difficulty). Using human-coded transcripts of interviewer question-asking as learning examples, I trained RNNs to identify whether interviewers asked questions exactly as worded or with changes. I then compared the reliability of human and RNN coding of interviewer question-asking across four question characteristics: question stem length, Flesch-Kincaid reading level, question type (demographic, attitude/opinion, behavior), and comprehensibility as assessed by the Question Understanding Aid (QUAID). Preliminary results indicate that RNNs were more reliable when coding questions with shorter stems (κ = .604) than questions with longer stems (κ = .401).
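
The reliability comparison described above can be illustrated with a minimal sketch. It assumes that agreement between human and RNN codes is summarized with Cohen's kappa and that each coded question-asking turn is labeled with a question characteristic such as stem length. The function names, the "short"/"long" grouping, and the toy records are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter, defaultdict


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("code sequences must be non-empty and equal in length")
    n = len(codes_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of each coder's marginal proportions per code.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_by_group(records):
    """Compute kappa separately for each question-characteristic group.

    `records` is an iterable of (group, human_code, rnn_code) tuples.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, human_code, rnn_code in records:
        grouped[group][0].append(human_code)
        grouped[group][1].append(rnn_code)
    return {g: cohens_kappa(h, r) for g, (h, r) in grouped.items()}


# Hypothetical toy data: (stem-length group, human code, RNN code).
toy_records = [
    ("short", "exact", "exact"),
    ("short", "exact", "exact"),
    ("short", "change", "change"),
    ("short", "change", "change"),
    ("long", "exact", "exact"),
    ("long", "exact", "exact"),
    ("long", "exact", "change"),
    ("long", "change", "change"),
]

for group, kappa in kappa_by_group(toy_records).items():
    print(f"{group}: kappa = {kappa:.3f}")
```

Computing kappa within each group, rather than pooling all questions, is what allows reliability to be compared across levels of a question characteristic. The same grouping could be applied to reading level, question type, or QUAID-based comprehensibility.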