Searching Discourse Segments for Formulaic Sequences in a Closed Caption TV Corpus for Language Learning

  1. Hajime Mochizuki, Tokyo University of Foreign Studies, Japan
  2. Kohji Shibano, Tokyo University of Foreign Studies, Japan
Friday, October 20 10:15-10:45 AM Junior Ballroom B

Abstract: This paper describes a retrieval method for extracting discourse segments from a closed caption TV corpus using formulaic sequences that serve an important function in discourse. We have been building the corpus from closed caption TV since December 2012, and as of February 2016 it contains more than 655 million words. Because TV is a major medium in daily life and contains many natural or near-natural dialogues, we expect to be able to apply the corpus to language education in e-learning systems. We expect that a discourse segment that includes a practical formulaic expression plays an ...