Abstracts of presentations/posters accepted for the CALICO Pre-Conference Workshop on
Automatic Analysis of Learner Language:
Bridging Foreign Language Teaching Needs and NLP Possibilities
(Alphabetical list, by first author)
A New Arabic Interlanguage Database: Collection, Annotation, Analysis
We describe an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. The current collection of texts, which is constantly growing, contains intermediate- and advanced-level student writings. We describe the need for such corpora, the learner data we have collected, the tagset we have developed and present the results of our preliminary CEA. We also discuss our ongoing work, including adding extra levels of annotation, inter-annotator agreement, and experiments with analyzing the data for Second Language Acquisition (SLA) research.
Little Things With Big Effects: On the Identification and Interpretation of Tokens for Error Diagnosis in ICALL
Error diagnosis in ICALL typically analyzes the learner input in an attempt to abstract and identify indicators of the learner’s (mis)conceptions of the linguistic properties of the input. For written input, this process usually starts with the identification of tokens that will serve as the atomic building blocks of the analysis. In this talk, we discuss the consequences of mismatches between the learner’s perception of the linguistic properties of a given token and the system’s interpretation of it. Based on our analysis of the interaction of beginning learners of Portuguese at OSU with the ICALL system TAGARELA, we discuss why tokenization and the interpretation of accented characters deserve particular attention in a system used by language learners.
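As a minimal illustration of why accented characters need care (a sketch only, not the TAGARELA implementation): the same visible character can be stored as one precomposed code point or as a base letter plus a combining mark, so a naive string comparison may reject a correct learner answer. The function names below are hypothetical.

```python
import unicodedata

def normalize(token: str) -> str:
    """Return the NFC (precomposed) form of a token."""
    return unicodedata.normalize("NFC", token)

def strip_accents(token: str) -> str:
    """Remove combining marks, e.g. to spot a missing or wrong accent."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def compare(learner: str, target: str) -> str:
    """Classify a learner token against the target token."""
    learner, target = normalize(learner), normalize(target)
    if learner == target:
        return "match"
    if strip_accents(learner) == strip_accents(target):
        return "accent error"
    return "mismatch"

precomposed = "voc\u00ea"   # 'você' with a single code point for 'ê'
decomposed = "voce\u0302"   # 'você' as 'e' followed by a combining circumflex

print(precomposed == decomposed)         # raw comparison fails
print(compare(decomposed, precomposed))  # normalized comparison succeeds
print(compare("voce", precomposed))      # accent omitted by the learner
```

Normalizing before tokenization keeps the two encodings from being treated as different tokens, while the accent-stripped comparison separates accent errors from other mismatches.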
Diagnosing meaning errors in short answers to reading comprehension questions
The ability of an ICALL system to diagnose and provide feedback on the meaning conveyed by a learner response depends on how well it can deal with the response variation allowed by an activity. We focus on short-answer reading comprehension questions, which have a clearly defined target response whose meaning the learner may nonetheless convey in multiple ways. We analyze the target and learner answers in an EFL learner corpus we have collected and develop a categorization of semantic errors. We then tie the observed errors to our work in building an ICALL content assessment module to diagnose meaning errors.
Developing an online system for automatically identifying errors in Brazilian learner English
This paper will demonstrate an application for automated error detection in written English as a foreign language. This application was initially developed by manually tagging errors from a training corpus of essays by Brazilian learners, using a simple tag set following Nicholls (1999). Subsequently, a pre-processor extracted the probability of each word, with surrounding three-word bundles, collocational frameworks, and parts of speech, being used erroneously within the corpus. Thus, possible errors are identified and correct alternatives provided. The demonstration will attempt to assess the use, precision and recall of this tool, using essays which were not part of the training corpus.
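For readers unfamiliar with the two evaluation measures named above, the following sketch shows how precision and recall are computed for an error detector. The token positions are invented for illustration and are not results from the tool.

```python
def precision_recall(flagged: set, gold: set) -> tuple:
    """Compute precision and recall of flagged error positions
    against manually annotated (gold) error positions."""
    true_positives = len(flagged & gold)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical token positions in a held-out essay.
gold = {2, 5, 9, 14}          # errors marked by a human annotator
flagged = {2, 5, 7, 14, 20}   # errors flagged by the system

p, r = precision_recall(flagged, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of flagged items that are real errors; recall is the share of real errors that were flagged.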
On Diagnosing Word Order Errors
We present a word order error typology developed for learners of German and discuss how it can be used in the annotation of a corpus of written responses by beginning college-level learners. Combining aspects of previously developed typologies, we develop a detailed typology with a focus on the position of items with respect to topological fields. We examine properties of these word order errors relevant for automatic error diagnosis and show how an analysis built on constraints over topological field units can overcome the shortcomings of phrase-structure-based encodings.
Automatic detection of preposition errors in learner writing
In this talk, we present a machine-learning-based approach to the automatic identification of preposition errors in L2 English writing. A model for the correct use of prepositions in English is developed using the British National Corpus; this model, which takes into account 345 contextual syntactic and semantic features, is then applied to L2 texts to test its suitability in identifying misused prepositions. In preliminary investigations, the model achieves up to 85.4% accuracy in matching prepositions to their context in the L1 data.
Multi-level annotation with adjustable granularity in a corpus of learner English
This paper reports on the construction of an error tagger called EARS, based on a multi-level, fine-grained taxonomy of errors from a corpus of Spanish students of English. No interlanguage exploitation of the material has been attempted as yet. So far, the main objective has been to provide a narrow, L1-specific tagging system covering areas that generic tagging systems may fall short of describing. The work builds on a PhD dissertation (Díaz Negrillo, 2007) and is currently in progress, funded by two Spanish research projects (HUM2007-60107/FILO and P07HUM-03028).
Korean Particle Error Detection via Probabilistic Parsing
A crucial question for automatically detecting errors in learner input is which grammatical information is relevant and useful for determining that a particular construction is erroneous. Based on knowledge about how learner input will vary in its grammatical properties, we propose a method for using probabilistic parsers to assist in detecting ill-formed input, allowing us to re-use analyses from corpus annotation. Using multiple grammatical models that capture different aspects of learner knowledge, we outline how the parsing models can assist in detecting errors in Korean particle usage, constructions which are problematic even for advanced learners.
A Variety of ‘Errors’? Automated Analysis of Teacher and Pupil Talk in Singapore Classrooms
The application of corpus methodology to the study of classroom discourse is a challenging new research area. The Singapore Corpus of Research in Education (SCoRE) comprises spoken pedagogic interactions in primary and secondary classrooms. SCoRE is characterized by Singaporean pupils’ and teachers’ spoken language that, problematically, deviates from ‘standard’ English in existing annotation schemes. We present methods for increasing the accuracy of annotation for the Singapore variety of English. As a case study, we also present techniques for exploring how far novice and professional users’ repertoires of ready-made patterns, or ‘phrasicons’, can be said to significantly differ in this context.
Using Statistical Techniques and Web Search to Correct ESL Errors
We present an ESL (English as a Second Language) proofing prototype that leverages information from multiple sources to detect potential errors made by non-native writers and suggest suitable corrections. A series of classifiers detects errors such as missing articles or faulty preposition choices on the basis of part of speech and lexical information. Proposed corrections are then scored by a large language model to filter spurious candidates. Finally, the World Wide Web is used as a corpus to provide usage examples to assist the writer. We present a technical overview and evaluation results from both human and automatic evaluation.
Toward accurate syntactic feedback in a writing lab for German-speaking elementary schoolers: A generation-based approach
We describe a first prototype of a virtual writing conference based on NLP techniques. State-of-the-art computer support for writing tasks is restricted to multiple-choice questions or quizzes. To our knowledge, no software tool exists that deploys generation technology to evaluate the grammatical quality of student output. We base exact feedback on the output of a natural language generation system that provides all paraphrases. Parsing technology is applied only in the teacher mode, where new stories are encoded in a simple manner. Exercises where the pupils improve the story given by simple clauses are generated fully automatically. Well-targeted syntactic feedback is provided for every student action.
Goals and Challenges for the Standardization of Error Typologies in Parser-Based CALL
Despite different architectural as well as pedagogical goals of a CALL system, classification systems ideally prove workable, robust and effective while following commonly accepted general principles of error tagging that suggest that any tagset be consistent, informative, flexible and reusable. During this presentation, we will discuss some of the goals and challenges associated with the standardization of error classifications by also providing, by way of example, an error typology that we created for a parser-based CALL system for German with HPSG as the underlying grammar formalism.
Annotation and analyses of temporal aspects of spoken fluency
In this talk, we will present the methodology adopted for transcribing and quantifying temporal fluency phenomena in a spoken L2 corpus (L2 English, French, and Italian, by learners of different proficiency levels). The CHILDES suite is being used for transcription and analysis, and we have adapted the CHAT format in order to code disfluencies as precisely as possible. We will also briefly present findings for two extreme sub-groups in the corpus, our most disfluent and most fluent learners, and trace the observable links between the hesitation phenomena observed and gaps in L2 declarative knowledge and procedures.
SCALE: Spelling Correction Adapted for Learners of English
Conventional spelling correction programs, which are designed for native English speakers, are not adequate for correcting spelling errors made by English Language Learners (ELLs). I will present a pilot study showing that one third of the spelling errors made by ELLs who are native speakers of Japanese (in the Miura (1999) Corpus) were either not detected by standard spell checkers, or were detected, but the list of suggestions did not contain the target word. I address four classes of errors made in the corpus: morphological overregularization, phonological confusion, real-word phonological confusion, and syntactically appropriate real-word phonological confusion.
Annotation of Korean Learner Corpora for Particle Error Detection
In this study, we focus on particle errors and discuss an efficient annotation scheme for Korean learner corpora, designed to facilitate the automatic error detection process as well as information extraction. Our further goal is to build an Intelligent CALL system for Korean language learning. As a stepping stone, we investigate properties of Korean learner corpora for diagnosing learner errors and provide detailed annotation guidelines. We will also present relevant issues in annotating Korean learner corpora, including how to represent error mark-ups, desirable forms, overlapping error types, etc., and how these affect other processes of evaluation and feedback.
Mastering noise and silence in learner answers processing: simple techniques for analysis and diagnosis
This paper presents a “low-cost” strategy to cope with the problem of reliability of NLP applications for CALL systems. Based on this approach, ExoGen is a prototype for generating activities such as gapfill exercises. It integrates a module for error detection and description, which checks learners’ answers against expected ones. Through the analysis of differences, it is able to diagnose problems like spelling errors, lexical mix-up, error-prone agreement, bad conjugations, etc. The first evaluation of ExoGen outputs, based on the FRIDA learner corpus, has yielded very promising results, paving the way for the development of an efficient and general model tailored to a wide variety of activities.
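The idea of diagnosing by comparing a learner answer with the expected one can be sketched with Python's standard `difflib` module. This is an illustrative assumption about how such a comparison might look, not the ExoGen algorithm; the function name and the example answers are hypothetical.

```python
from difflib import SequenceMatcher

def diff_answer(learner: str, expected: str) -> list:
    """List the character-level edits that turn the learner
    answer into the expected answer."""
    matcher = SequenceMatcher(None, learner, expected, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of 'replace', 'delete', 'insert'
            edits.append((tag, learner[i1:i2], expected[j1:j2]))
    return edits

# A plural formed by overgeneralization: only the ending differs.
print(diff_answer("chevals", "chevaux"))
# A single missing letter, typical of a spelling slip.
print(diff_answer("adress", "address"))
```

An empty edit list means the answer matches; a single short edit at the end of the word points toward an inflection problem, while scattered edits suggest a different lexical item.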
Croatian Corpus of Learner English
The purpose of this poster presentation is to inform about research that is being carried out as part of the Croatian national project Early English language learning: analysis of learner language. The project is expected to make possible the compilation of the first Croatian Corpus of Learner English (CCLE) which will consist of written language and utterances produced by young Croatian English language learners in both one-to-one and classroom interactions. The presentation will give insights into properties of learner language that may be relevant for research into Second Language Acquisition (SLA) and Foreign Language Teaching (FLT).
Scoring an Oral Language Test Using Automatic Speech Recognition
Elicited imitation (EI) is an oral language testing method enjoying renewed attention. In it learners repeat—word for word—sentences of increasing length and complexity until they are no longer able to reproduce them accurately, thus providing a reasonable approximation of global oral language proficiency. We first report on research in developing an EI test with encouraging performance when compared to human judges. Then we focus on how we developed a tool with the Sphinx speech recognition engine that automatically scores the imitated sentences. We discuss evaluation results and indicate how the integrated tool will be further developed and used in the future.
Automatic Measurement of Syntactic Complexity in Second Language Acquisition
A major challenge facing second language acquisition (SLA) researchers who apply syntactic complexity measures to large language samples is the labor-intensiveness of manual analysis. This has not only limited the size of the samples analyzed in previous studies, but has made it difficult to empirically evaluate the reliability of different complexity measures using large-scale corpus data. The current research addresses this challenge by implementing a computational system that can automatically measure the syntactic complexity of language samples of any size using a wide range of complexity metrics current in SLA research.
Robo-Sensei’s NLP-based Error Detection and Feedback Generation
The Robo-Sensei NLP system includes a lexicon, a morphological generator, a word segmentor, a morphological parser, a syntactic parser, an error detector, and a feedback generator. The author describes how the error detector and the feedback generator are integrated into the Robo-Sensei NLP system, what types of errors are detected, and what kinds of feedback messages are generated. Errors are classified into “missing word”, “unexpected word”, “particle error”, “predicate error”, “word order error”, “modifier error”, and “unknown word”. Error feedback explains grammatical principles violated in the learner’s sentence. The effectiveness of such principle-based feedback has been demonstrated by a series of empirical studies.
TechWriter: An individualized approach to writing assistance and improvement
Writing improvement software has often had a “one-size-fits-all” philosophy, which treats individual deficiencies as identical problems. Consequently, the system may overlook some of the user’s errors or automatically correct them, causing the user to become dependent on the system. Thus, we propose TechWriter, a comprehensive English-language system that aims to improve the quality of the user’s writing by learning from their mistakes and suggesting personalized corrections to them. The program is to be used in a classroom setting with an instructor helping the student correct their errors. TechWriter learns from the corrections so that it may automatically fix identical errors in the future. This way, the user’s writing will improve through the use of the program and they will not become dependent on it, but they will also not grow frustrated with it. TechWriter incorporates techniques from natural language processing, machine learning, and corpus linguistics to accomplish its goals.
Construction of an error information tagged corpus of Japanese language learners and automatic error detection
The number of learner corpora has grown greatly thanks to recent developments in computer technology. However, work remains to be done, such as tagging them with error information based on a refined error taxonomy and creating an automatic error detection system. We are currently working on the following tasks: 1) Constructing a raw corpus with students’ writing samples; 2) Designing a valid tag set with a generic error taxonomy; 3) Distinguishing erroneous sentences from correct Japanese sentences; 4) Categorizing errors into certain error types; 5) Analyzing the characteristics of learners’ errors statistically for SLA research. At the workshop, we will explain the whole project flow and discuss points 2, 3 and 4 above in detail.
The Challenges of Annotating Learner Language
This paper reports on the design and implementation of a learner language annotation scheme. The long-term goal is to develop a markup scheme that allows for reliable and objective learner language markup, which leads to the identification of patterns of language use and error in ESL students. The design of this markup scheme was guided by two main desiderata: (i) focus on learner language patterns as opposed to only errors, and (ii) objectivity of the markup. The development of a common framework for learner language annotation is crucial for progress in ICALL and CAT as well as for a better understanding of the SLA process.
Exploiting unsupervised techniques to predict EFL learner errors
We present ongoing research related to the development of unsupervised techniques for the detection task in large coverage error correction systems. We studied how a set of predefined parameters (corpus size, statistical measure, context, and linguistic model) can be combined to optimize performance. Evaluation is performed on error annotated learner corpora produced in real instruction settings. Overall results show these simple techniques provide performance rates comparable to related work. We are analysing in detail the pros and cons of n-gram based linguistic patterns, and plan to base future directions on the results of this error analysis.
Classification Systems for Misspellings by Non-native Writers
This presentation provides a survey of classification systems for misspellings by foreign language learners with an eye to both foreign language teaching needs and the evaluation and development of spell checkers in CALL. We will discuss four error taxonomies (edit distance, linguistic competence, linguistic subsystem, language influence) in the context of a study that found that spell checker improvement for L2 German should address intralingual morphological competence misspellings with an edit distance of more than one. Our presentation will invite discussion on the computational challenges for automatizing error tagging for L2 misspelling corpora.
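Of the four taxonomies, edit distance is the most directly computable. A minimal sketch of the standard Levenshtein distance (insertions, deletions, and substitutions each cost one) follows; the abstract does not say which variant of edit distance the study used, and the example word pair is invented for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))   # distances from '' to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ''
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

# A regularized participle lies several edits from its target,
# beyond the reach of a checker that only proposes distance-1 candidates.
print(edit_distance("gegeht", "gegangen"))
```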
Analyzing Learner Texts
We argue for the combination of a set of discourse analytic measures which provide some indication of different aspects of the quality of learner texts and show how these measures can be complemented with results from an NLP-based analysis. We introduce a browser-based tool which analyses student writing according to a number of accuracy and complexity measures (Skehan & Foster, 1997, 2005; Tavakoli & Skehan, 2005) and draws on recent research in automated essay scoring (Shermis & Burstein, 2003). In combination, the measures provide more information than a simple error count or an impressionistic measure of stylistic complexity.
Reliability of Human Annotation of Usage Errors in Learner Text
To date, most efforts in developing NLP tools for error detection have used corpora annotated by one rater. This presents a problem for evaluating an NLP system on a gold-standard corpus since if a human rater is systematically biased in judgment, it could impact evaluation of the system. To investigate, we tested the reliability of human raters in judging usage errors (prepositions and collocations) in ELL essays. Our results suggest that for these types of errors, high rater agreement can be difficult to achieve and thus annotating a corpus based solely on one rater’s judgments may not be prudent.
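Rater reliability of this kind is often summarized with a chance-corrected agreement statistic. The abstract does not name the statistic used, so the sketch below assumes Cohen's kappa for two raters as one common choice; the judgments are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's label proportions, summed.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical judgments on ten preposition uses.
rater_a = ["error", "ok", "ok", "error", "ok", "ok", "error", "ok", "ok", "ok"]
rater_b = ["error", "ok", "error", "error", "ok", "ok", "ok", "ok", "ok", "ok"]
print(round(cohens_kappa(rater_a, rater_b), 3))
```

In this made-up case the raters agree on 8 of 10 items, yet kappa is much lower than 0.8 because most of that agreement is expected by chance when "ok" dominates.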
Using Decision Trees to Detect and Classify Grammatical Errors
We present various methods to flag a sentence as ungrammatical. The methods exploit machine-learning and use ungrammatical sentences for training which are created automatically by transforming sentences in a regular corpus. The accuracy levels of our error detection methods range from 50% to 80% (depending on the error type). We apply our methods to a variety of real learner data, and we attempt to modify them so that the type of error is also detected. The performance of the error diagnosis methods is analysed and we provide suggestions for improvement.
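The automatic creation of ungrammatical training sentences can be illustrated with one simple transformation, article deletion. This is a sketch under assumptions: the abstract does not list the transformations actually used, the function names are hypothetical, and dropping an article does not always yield an ungrammatical sentence.

```python
import random

ARTICLES = {"a", "an", "the"}

def drop_article(sentence: str, rng: random.Random):
    """Delete one randomly chosen article; return None if there is none."""
    tokens = sentence.split()
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in ARTICLES]
    if not positions:
        return None
    i = rng.choice(positions)
    return " ".join(tokens[:i] + tokens[i + 1:])

def make_training_pairs(sentences: list, seed: int = 0) -> list:
    """Label each original sentence 'grammatical' and each corrupted
    copy 'ungrammatical', ready for training a classifier."""
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        pairs.append((sentence, "grammatical"))
        corrupted = drop_article(sentence, rng)
        if corrupted is not None:
            pairs.append((corrupted, "ungrammatical"))
    return pairs

corpus = ["She read the book yesterday.", "Birds sing."]
for text, label in make_training_pairs(corpus):
    print(label, "|", text)
```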
Formalizing the second language learner corpus by means of automatic analysis
The ongoing work we are presenting involves automatic analysis of an L2 learner corpus. As the corpus was manually analyzed, learner language properties (cf. Klein and Perdue and the Basic Variety) have already been described. The aim of the workshop is to show the feasibility of an automatic analysis of this type of corpus. This automatic analysis carries a double implication: a) it is possible to see how NLP can contribute to SLA research and b) it allows for reflection on how one might use a formalized NLP description of learner language to develop pedagogical tools for teaching foreign languages.
Construction of a rated speech corpus of L2 learners’ speech
This study aims to construct a speech database of L2 learners of English as an initial step in developing an automatic pronunciation assessment system for L2 learners. L2 learners’ spontaneous speech data were collected and each phone in the speech data was rated by phonetically trained ESL teachers. Despite the challenges of rating spontaneous speech, raters developed rating criteria based on phonetics training and ESL experience, and achieved a high agreement ratio. We examined those instances where raters disagreed and analyzed confounding influences. In addition, the relationships between the distribution of phonemes, words, and dysfluencies and speaking proficiency were examined.