Invited Talks
Robustness through Prior Knowledge: Using Explanation-Based Learning to Distinguish Handwritten Chinese Characters
(Page 1)
Gerald DeJong (University of Illinois, Urbana-Champaign)
Paper Session: Classification of Noisy Text
Finding Structure in Noisy Text: Topic Classification and Unsupervised Clustering
(Page 3)
R. Prasad, P. Natarajan, K. Subramanian, S. Saleem and R. Schwartz
BBN Technologies, Cambridge, MA, USA
Genre as Noise - Noise in Genre
(Page 9)
Andrea Stubbe, Christoph Ringlstetter* and Klaus U. Schulz
University of Munich, Germany
*University of Alberta, Canada
A Practical Implementation of Automatic Text Categorisation and Correction of Noisy OCR Documents into Braille and Large Print
(Page 17)
Ryan Brooks, William John Teahan, and David Hunnisett*
University of Wales, Bangor, UK
*ETL Solutions Ltd., UK
Discovering Identities in Web Contexts Using Unsupervised Clustering
(Page 23)
Ted Pedersen and Anagha Kulkarni*
University of Minnesota, Duluth, MN, USA
*Carnegie Mellon University, USA
Ontology Based Algorithms for Indexing and Search of Semantically Close Natural Language Phrases
(Page 31)
Srikanth Kamath
National Institute of Technology Karnataka, India
A Treebank Conversion Algorithm for Non-Configurational Languages
(Page 35)
Ahmad Pouramini and Naser Mozayani
Iran University of Science and Technology, Iran
Treebanks Gone Bad: Generating a Treebank of Ungrammatical English
(Page 39)
Jennifer Foster
Dublin City University, Dublin, Ireland
Paper Session: Detecting and Correcting Noisy Text
Text Correction Using Domain Dependent Bigram Models from Web Crawls
(Page 47)
Christoph Ringlstetter, Max Hadersbeck*, Klaus U. Schulz* and Stoyan Mihov**
University of Alberta, Canada
*University of Munich, Germany
**Bulgarian Academy of Sciences, Sofia, Bulgaria
Enhanced Integrated Scoring for Cleaning Dirty Texts
(Page 55)
Wilson Wong, Wei Liu and Mohammed Bennamoun
University of Western Australia, Australia
Investigation and Modeling of the Structure of Texting Language
(Page 63)
Monojit Choudhury, Rahul Saraf*, Vijit Jain, Sudeshna Sarkar and Anupam Basu
Indian Institute of Technology, Kharagpur, India
*National Institute of Technology, Jaipur, India
Adding Sentence Boundaries to Conversational Speech Transcriptions using Noisily Labelled Examples
(Page 71)
Tetsuya Nasukawa, Diwakar Punjani*, Shourya Roy*, L Venkata Subramaniam* and Hironori Takeuchi
IBM Research, Tokyo, Japan
*IBM Research, New Delhi, India
Multi-Level Feature Extraction for Spelling Correction
(Page 79)
Johannes Schaback and Fang Li*
Technische Universitaet, Berlin, Germany
*Shanghai Jiao Tong University, China
Hidden Markov Model Based Identification of Transliterated Regional Language Words in Text Documents
(Page 87)
Achuth Sankar S. Nair, Vrinda V. Nair* and Vinod Chandra S. S.**
University of Kerala, Thiruvananthapuram, India
*College of Engineering, Trissur, India
**College of Engineering, Thiruvananthapuram, India
Multiclass Hierarchical SVM for Recognition of Printed Tamil Characters
(Page 93)
Shivsubramani K., Loganathan Ramasamy, Srinivasan C. J., Ajay V. and Soman K. P.
Amrita Vishwa Vidyapeetham, India
A Causal Classification of Orthography Errors in Web Texts
(Page 99)
Mirko Tavosanis
University of Pisa, Italy
Paper Session: Information Extraction from Noisy Text
A Supervised Machine Learning Approach to Conjunction Disambiguation in Named Entities
(Page 107)
Pawel Mazur and Robert Dale
Macquarie University, Australia
BlogVox: Separating Blog Wheat from Blog Chaff
(Page 115)
Akshay Java, Pranam Kolari, Tim Finin, Justin Martineau, Anupam Joshi and James Mayfield*
University of Maryland, Baltimore County, MD, USA
*Johns Hopkins University, USA
An Automatic Approach to Semantic Annotation of Unstructured, Ungrammatical Sources: A First Look
(Page 123)
Matthew Michelson and Craig Knoblock
Information Sciences Institute, University of Southern California, USA
Information Extraction for Multi-Participant, Task-Oriented, Synchronous, Computer-Mediated Communication: A Corpus Study of Chat Data
(Page 131)
Cassandre Creswell, Nicholas Schwartzmyer and Rohini Srihari
Janya Inc., USA
Alignment of Noisy Unstructured Text Data
(Page 139)
Julien Bourdaillet and Jean-Gabriel Ganascia
Universite Pierre et Marie Curie, France
Information Access to Historical Documents from the Early New High German Period
(Page 147)
Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz and Christiane Wanzeck
University of Munich, Germany
On Extracting Structured Knowledge from Unstructured Business Documents
(Page 155)
Gaurav Pandey and Rakshit Daga*
University of Minnesota, Twin Cities, USA
*SAP Labs, USA
Mining Conversational Text for Procedures
(Page 163)
Deepak P. and Krishna Kummammuru
IBM Research, Bangalore, India
Panel Discussion
Noisy Text Analytics: An Exercise in Futility?
(Page 171)
Daniel Lopresti (moderator)
Lehigh University