Workshop on Operational Text Classification Systems
2001
New Orleans, Louisiana, USA
September 13, 2001
http://www.DavidDLewis.com/events/otc2001
otc2001info@DavidDLewis.com
in conjunction with
ACM SIGIR 2001
September 9-13, 2001
http://www.sigir2001.org
Text classification research and practice have exploded in the past decade. This work has been pursued under a variety of headings (text categorization, automated indexing, text mining, topic detection and tracking, etc.). Both the automated assignment of textual data to classes and the automated discovery of such classes (by techniques such as clustering) have been of intense interest. A variety of practical applications have been fielded, in areas such as indexing of documents for retrieval, hierarchical organization of Web sites, alerting and routing of news, creation of specialized information products, enforcement of information security, content filtering (spam, porn, etc.), help desk automation, knowledge discovery in textual and partially textual databases, and many others.
Experiments on text classification data sets have been widely presented in a variety of forums. The technical details of operational text classification, however, have rarely been discussed.
The goal of this workshop is to expose researchers and practitioners to the challenges encountered in building and fielding operational text classification systems. We hope to begin systematizing engineering principles in this area, and to spark new directions for research as well.
TOPICS
Workshop topics will include (but are not limited to):
* Cost effectiveness of automating text classification tasks
* Understanding what users want from classification systems
* Technical and personnel issues in using training data and prior knowledge
* Trading off space, time, and other resources in the training, adaptation, and execution phases of classification
* Integrating automated classification systems with pre-existing software, organizational procedures, relevant laws, and cultural expectations
* Maintaining and monitoring effectiveness as text sources and classes change over time
* Discovering, defining, updating, and explaining classes and classifiers
* The roles of classification and related technologies (information extraction, terminology discovery, etc.)
PARTICIPATION
To facilitate discussion, workshop attendance will be limited to a maximum of 70 participants. Anyone interested in attending should apply in one of these two ways:
1. Researchers, practitioners, and users with an interest in text classification:
   Please submit a paragraph describing your background, organizational affiliation (if any), and interest in text classification.
2. Prospective speakers with substantial knowledge of one or more operational text classification systems and an interest in presenting a talk based on their experience:
   Please submit both a paragraph of interest (as described above) and an abstract (maximum 750 words) outlining the major points you would speak on. Talks whose focus is experimental results on standard test collections are discouraged. Conversely, operational text classification at any scale from the tiny (e.g. an evaluation of content filtering software for a small organization) to the huge (e.g. categorizing hundreds of newswires each day) is of interest. Selection of talks will be largely based on the speaker's ability and willingness to discuss technical details of operational systems, as reflected in their abstract.
Submissions should be sent in ASCII or PDF form to:
otc2001submit@DavidDLewis.com
All submissions will be reviewed by the organizers and program committee. The interest paragraphs and talk proposals of invited participants will be reproduced and distributed as an informal notebook at the workshop.
IMPORTANT DATES
Interest paragraphs must be received: June 15, 2001
Talk abstracts must be received: June 15, 2001
Notification of acceptance: July 15, 2001
Workshop: September 13, 2001
Please visit http://www.sigir2001.org for hotel and registration deadlines.
ORGANIZERS
David D. Lewis, independent consultant (Chair)
Susan Dumais, Microsoft
Ronen Feldman, Clearforest
Fabrizio Sebastiani, Italian National Council of Research
PROGRAM COMMITTEE
James Allan, University of Massachusetts
David Evans, Clairvoyance
Sue Feldman, IDC
Norbert Fuhr, University of Dortmund
Thorsten Joachims, GMD
Andras Kornai, Northern Light
Wai Lam, Chinese University of Hong Kong
Dunja Mladenic, J. Stefan Institute and Carnegie Mellon Univ.
Isabelle Moulinier, Thomson
Christopher Porter, Factiva
Prabhakar Raghavan, Verity
Mehran Sahami, E.piphany
Robert Schapire, AT&T
Frank Smadja, Elron Software
Richard Tong, Tarragon Consulting
Mark Wasson, LexisNexis
Scott Waterman, Kanisa Inc.
Yiming Yang, Carnegie Mellon University