This study compared the utterances produced by test-takers in two modes of a speaking test: a video-conferencing mode and a human-to-machine semi-direct mode. In the video-conferencing mode, test-takers’ utterances were more fluent and grammatically accurate but less complex. Test-takers in this mode also produced more expressions of agreement, such as “yes” and “OK,” and more interrogative utterances. In the semi-direct mode, by contrast, test-takers used more connectives such as “if,” “because,” and “also,” producing longer and more complex sentences. These findings suggest that the two modes elicit qualitatively different types of speech.