7 July 2001
Toulouse, France

With the increased availability of online corpora, data-driven approaches have become central to the NL community. A variety of data-driven approaches have been used to help build Machine Translation systems -- example-based, statistical MT, and other machine learning approaches -- and there are all sorts of possibilities for hybrid systems. We wish to bring together proponents of as many techniques as possible to engage in a discussion of which combinations will yield maximal success in translation.

We propose to center the workshop on Data-Driven MT, by which we mean all approaches that develop algorithms and programs to exploit data in the development of MT, primarily large bilingual corpora created by human translators, which serve as training data for MT systems. The workshop will focus on the following topics:

An especially important question that we wish to address is which techniques are best for each of the subparts of a complete MT system - e.g. learning grammars, building lexicons, parsing input data, determining transfer principles, generating target text, etc.

Invited speaker

Hermann Ney, RWTH Aachen

Workshop chairs

Jessie Pinkham, Microsoft Research
Kevin Knight, USC/ISI
Franz Josef Och, RWTH Aachen

Programme Committee

Srinivas Bangalore, AT&T Research
Ralf Brown, CMU
Francisco Casacuberta, Polytechnic Univ. of Valencia
Eugene Charniak, Brown University
Robert Frederking, CMU
Ulf Hermjakob, USC/ISI
Pierre Isabelle, Xerox Research Centre Europe
Bob Moore, MSR
Masaaki Nagata, NTT
Joseph Pentheroudakis, MSR
Norbert Reithinger, DFKI
Philip Resnik, Univ. of Maryland
Steve Richardson, MSR
Eiichiro Sumita, ATR
Koichi Takeda, IBM TRL
Enrique Vidal, Polytechnic Univ. of Valencia
Stephan Vogel, Univ. of Kaiserslautern
Hideo Watanabe, IBM TRL


Papers describing original work in the area of Data-Driven Machine Translation should be submitted electronically in PostScript or PDF format to:

Deborah Coughlin

Submissions should follow the two-column format of ACL proceedings and should not exceed eight (8) pages, including references. We strongly recommend the use of the ACL LaTeX style files or Microsoft Word style files tailored for this year's conference. They are available from the ACL-2001 program committee Web site. Because reviewing will be blind, the paper should not include the authors' names or affiliations. Instead, each submission must be accompanied by an email containing the following information (ASCII text):

TITLE:    <title of the paper>
AUTHORS:  <list of authors>
EMAIL:    <email of author for correspondence>
KEYWORDS: <keywords, topic sub-areas, ...>
ABSTRACT: <abstract of the paper>

Important dates

Paper submissions: 6 April 2001
Notification of acceptance: 27 April 2001
Camera-ready copies due: 16 May 2001
Workshop date: 7 July 2001


The registration fee for the workshop will be posted later. The fee covers attendance at the workshop and a copy of the workshop proceedings. Follow the registration instructions at the ACL site and indicate that you would like to attend the Data-Driven MT workshop.

