CMSC 35100: Natural Language Processing

    Introduction to Discourse and Dialogue

 
Assignment 1
Due: October 12th
Theme: Anaphora resolution applied to spoken discourse

Procedure:
Select one of the transcribed, naturally occurring spoken discourses handed out in class, and within it identify a region of interest 5-10 sentences long. Since the materials were transcribed by non-experts, there may be typos or mistranscriptions. You may make some assumptions about your reference resolution algorithm's ability to perform the syntactic or semantic analysis it needs, given transcription errors or linguistic complexity. State clearly whatever assumptions you have made, and discuss their impact in the Discussion portion of the assignment below.

Select one of the anaphora resolution algorithms discussed in class: RAP (Lappin & Leass 1994), Centering (Brennan et al. 1987), or CogNIAC (Baldwin 2000). For each utterance in your section, describe the actions taken by your chosen resolution procedure on that utterance. Record the updates made to the discourse structures as each utterance is processed (e.g. for Centering, list the Cf, Cp, and Cb) and state what each anaphor in the utterance would resolve to.
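If you choose Centering, the per-utterance record described above could be laid out along the lines of the following Python sketch. It is only an illustration of the bookkeeping, not the resolution procedure itself: it assumes you have already resolved each anaphor by hand and ranked the entities of each utterance into a Cf list. All names in it (CenteringState, backward_center, classify, trace, DEMO) are hypothetical, the three-utterance discourse is invented for illustration, and the transition labels follow a common textbook formulation (Continue, Retain, Smooth-shift, Rough-shift), which may differ in naming from the version presented in class.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CenteringState:
    utterance: str
    cf: List[str]        # forward-looking centers, ranked by hand (assumed input)
    cb: Optional[str]    # backward-looking center; None when undefined
    transition: str

    @property
    def cp(self) -> Optional[str]:
        # The preferred center is the highest-ranked element of Cf.
        return self.cf[0] if self.cf else None


def backward_center(prev_cf: List[str], cur_cf: List[str]) -> Optional[str]:
    # Cb(Un) is the highest-ranked element of Cf(Un-1) realized in Un.
    for entity in prev_cf:
        if entity in cur_cf:
            return entity
    return None


def classify(prev_cb: Optional[str], cb: Optional[str], cp: Optional[str]) -> str:
    if cb is None:
        # No entity carried over; this sketch simply flags the case.
        return "no Cb"
    if prev_cb is None or cb == prev_cb:
        return "Continue" if cb == cp else "Retain"
    return "Smooth-shift" if cb == cp else "Rough-shift"


def trace(utterances: List[Tuple[str, List[str]]]) -> List[CenteringState]:
    states: List[CenteringState] = []
    for text, cf in utterances:
        if not states:
            # The first utterance has no previous Cf, so Cb is undefined.
            states.append(CenteringState(text, cf, None, "initial"))
            continue
        prev = states[-1]
        cb = backward_center(prev.cf, cf)
        cp = cf[0] if cf else None
        states.append(CenteringState(text, cf, cb, classify(prev.cb, cb, cp)))
    return states


# Invented example; pronouns are already replaced by their hand-resolved referents.
DEMO = [
    ("John saw a car at the dealership.", ["John", "car", "dealership"]),
    ("He showed it to Bob.", ["John", "car", "Bob"]),
    ("Bob liked it.", ["Bob", "car"]),
]

for state in trace(DEMO):
    print(f"{state.utterance}  Cf={state.cf}  Cp={state.cp}  "
          f"Cb={state.cb}  [{state.transition}]")
```

A table with the same columns (utterance, Cf, Cp, Cb, transition, and the referent chosen for each anaphor) serves equally well for the printed hand-simulation.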

Discussion:
The algorithms discussed in class had about an 80% accuracy rate on monologue texts. How do they perform on your more naturalistic discourses? How do they do when modeling multi-party discourse rather than monologues? What types of anaphora in your discourse were most problematic? Which were easiest? Most of the examples worked through in class and in the text were only two or three sentences long. Were there any difficulties that you would associate with the longer discourses, e.g. changes in topic?

Please turn in a printed hand-simulation of the anaphora resolution algorithm as applied to the short section of your transcribed discourse, together with a discussion of the points listed above. The discussion should be no longer than one page.