The aim of this two-day workshop is to identify and synthesize current needs for language-technology evaluation.
The space of possible dialogues is enormous, even for limited domains like travel information servers. The generalization of evaluation methodologies across different application domains and languages is an open problem. A review of published evaluations of dialogue models and systems suggests that usability techniques are the standard method. Dialogue-based systems are often evaluated in terms of standard, objective usability metrics, such as task-completion time and number of user actions. Researchers have proposed and debated theory-based methods for modifying and testing the underlying dialogue model, and more precise, empirical methods for evaluating the effectiveness of dialogue models have also been proposed, but usability testing remains the most widely used method of evaluation. For task-based interaction, typical measures of effectiveness are time-to-completion and task outcome, but the evaluation should focus on user satisfaction rather than on arbitrary effectiveness measurements.
Indeed, the problems faced in current approaches to measuring the effectiveness of dialogue models and systems include:
For the first day, the workshop organizers solicit papers on these issues, with particular emphasis on methods that go beyond usability testing to address the underlying dialogue model. Representative questions to be addressed include:
Of course, the problems faced in evaluating dialogue and system models are found in other domains of language engineering, even for non-interactive processes such as part-of-speech tagging, parsing, semantic disambiguation, information extraction, speech transcription, and audio document indexing. So the issue of evaluation can be viewed at a more generic level, raising fundamental, theoretical questions such as:
For the second day, the workshop organizers solicit papers on these issues, with the intent to address the problem of evaluation from both a broader perspective (including novel application domains for evaluation, new metrics for known tasks, and resource evaluation) and a more theoretical point of view (including formal theory of evaluation and infrastructural needs of language engineering).
NOTE: People who would like to submit a paper on lexical semantic disambiguation evaluation should consider the parallel workshop, on July 5-6, for the closure of the SENSEVAL-2 evaluation campaign.
The organization of each of the two days of the workshop will reflect the workshop's two main themes. Each day will begin with a session of presentations of selected papers and follow with panel discussions to synthesize and develop possible methodologies from additional selected workshop papers.
The workshop seeks participation from people involved or interested in the problem of evaluation in language processing, and from the research and industrial communities that study and implement dialogue models for natural-language interaction systems.
The first part of the workshop will specifically draw on the natural-language interaction community, such as the one developing at the confluence of SIGdial and SIGCHI, which will find in this workshop an atmosphere more flavored by issues related to computational linguistics (see, for example, the First SIGdial Workshop on Discourse and Dialogue).
The second part of the workshop is intended to provide a forum for a broader audience, more in the spirit of the one that attended the LREC'2000 Satellite Workshop on Evaluation (see http://www.limsi.fr/TLP/CLASS), in particular offering an opportunity to people involved in language engineering evaluation (e.g., the CLASS audience) in the context of national or transnational projects or programs, both in Europe and abroad.
Paper submissions should follow the two-column format of ACL proceedings and should not exceed eight (8) pages, including references. We strongly recommend the use of ACL LaTeX style files or Microsoft Word Style files tailored for this year's conference. They are available from the ACL-2001 program committee Web site at http://acl2001.dfki.de/style/.
Papers should be submitted electronically, as a LaTeX, Word, or PDF file, to either:
Deadline for workshop paper submissions: April 6, 2001
Deadline for notification of workshop paper acceptance: April 27, 2001
Deadline for camera-ready workshop papers: May 16, 2001
Workshop date: July 6-7, 2001
David G. Novick
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
Phone: +1 915-747-6952
novick@cs.utep.edu
http://www.cs.utep.edu/novick
Joseph Mariani
Limsi - CNRS
Bâtiment 508 Université Paris XI
BP 133 - 91403 ORSAY Cedex - France
Fax: +33 (0)1 69 85 80 88
mariani@limsi.fr
http://www.limsi.fr/Individu/mariani
Candy Kamm
AT&T Labs
180 Park, Bldg 103
Florham Park, NJ 07932, USA
+1 973-360-8540
cak@research.att.com
http://www.research.att.com/info/cak
Patrick Paroubek
Spoken Language Processing Group / Human-Machine Communication Department
Limsi - CNRS
Bâtiment 508 Université Paris XI
BP 133 - 91403 ORSAY Cedex - France
Fax: +33 (0)1 69 85 80 88
Phone: +33 (0)1 69 85 81 91
pap@limsi.fr
http://www.limsi.fr/Individu/pap
Nils Dahlbäck
Computer & Information Science Department
Linköping University
S-581 83 Linköping Sweden
Phone: +46 13 28 16 64
nilda@ida.liu.se
http://www.ida.liu.se/~nilda/
Frankie James
RIACS Mail Stop 19-39
NASA Ames Research Center
Moffett Field, CA 94035, USA
Phone: +1 650-604-0197
fjames@riacs.edu
http://www-pcd.stanford.edu/frankie/
Karen Ward
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
Phone: +1 915-747-6957
kward@cs.utep.edu
http://www.cs.utep.edu/kward
ACL 2001
CLASS
ELRA
ELSNET
We also anticipate co-sponsorship from SIGdial.
Additional information on the workshop, including accepted papers and the workshop schedule, will be made available as needed at http://www.limsi.fr/TLP/CLASS/eacl01.html