IJCAI-07 Workshop on Analytics for Noisy Unstructured Text Data

Proceedings
Hyderabad, India - January 8, 2007



Table of Contents

Invited Talks

Robustness through Prior Knowledge: Using Explanation-Based Learning to Distinguish Handwritten Chinese Characters (Page 1)
Gerald DeJong (University of Illinois, Urbana-Champaign)

Paper Session: Classification of Noisy Text

Finding Structure in Noisy Text: Topic Classification and Unsupervised Clustering (Page 3)
R. Prasad, P. Natarajan, K. Subramanian, S. Saleem and R. Schwartz
BBN Technologies, Cambridge, MA, USA

Genre as Noise - Noise in Genre (Page 9)
Andrea Stubbe, Christoph Ringlstetter* and Klaus U. Schulz
University of Munich, Germany
*University of Alberta, Canada

A Practical Implementation of Automatic Text Categorisation and Correction of Noisy OCR Documents into Braille and Large Print (Page 17)
Ryan Brooks, William John Teahan, and David Hunnisett*
University of Wales, Bangor, UK
*ETL Solutions Ltd., UK

Discovering Identities in Web Contexts Using Unsupervised Clustering (Page 23)
Ted Pedersen and Anagha Kulkarni*
University of Minnesota, Duluth, MN, USA
*Carnegie Mellon University, USA

Ontology Based Algorithms for Indexing and Search of Semantically Close Natural Language Phrases (Page 31)
Srikanth Kamath
National Institute of Technology Karnataka, India

A Treebank Conversion Algorithm for Non-Configurational Languages (Page 35)
Ahmad Pouramini and Naser Mozayani
Iran University of Science and Technology, Iran

Treebanks Gone Bad: Generating a Treebank of Ungrammatical English (Page 39)
Jennifer Foster
Dublin City University, Dublin, Ireland

Paper Session: Detecting and Correcting Noisy Text

Text Correction Using Domain Dependent Bigram Models from Web Crawls (Page 47)
Christoph Ringlstetter, Max Hadersbeck*, Klaus U. Schulz* and Stoyan Mihov**
University of Alberta, Canada
*University of Munich, Germany
**Bulgarian Academy of Sciences, Sofia, Bulgaria

Enhanced Integrated Scoring for Cleaning Dirty Texts (Page 55)
Wilson Wong, Wei Liu and Mohammed Bennamoun
University of Western Australia, Australia

Investigation and Modeling of the Structure of Texting Language (Page 63)
Monojit Choudhury, Rahul Saraf*, Vijit Jain, Sudeshna Sarkar and Anupam Basu
Indian Institute of Technology, Kharagpur, India
*National Institute of Technology, Jaipur, India

Adding Sentence Boundaries to Conversational Speech Transcriptions using Noisily Labelled Examples (Page 71)
Tetsuya Nasukawa, Diwakar Punjani*, Shourya Roy*, L Venkata Subramaniam* and Hironori Takeuchi
IBM Research, Tokyo, Japan
*IBM Research, New Delhi, India

Multi-Level Feature Extraction for Spelling Correction (Page 79)
Johannes Schaback and Fang Li*
Technische Universitaet, Berlin, Germany
*Shanghai Jiao Tong University, China

Hidden Markov Model Based Identification of Transliterated Regional Language Words in Text Documents (Page 87)
Achuth Sankar S. Nair, Vrinda V. Nair* and Vinod Chandra S. S.**
University of Kerala, Thiruvananthapuram, India
*College of Engineering, Trissur, India
**College of Engineering, Thiruvananthapuram, India

Multiclass Hierarchical SVM for Recognition of Printed Tamil Characters (Page 93)
Shivsubramani K., Loganathan Ramasamy, Srinivasan C. J., Ajay V. and Soman K. P.
Amrita Vishwa Vidyapeetham, India

A Causal Classification of Orthography Errors in Web Texts (Page 99)
Mirko Tavosanis
University of Pisa, Italy

Paper Session: Information Extraction from Noisy Text

A Supervised Machine Learning Approach to Conjunction Disambiguation in Named Entities (Page 107)
Pawel Mazur and Robert Dale
Macquarie University, Australia

BlogVox: Separating Blog Wheat from Blog Chaff (Page 115)
Akshay Java, Pranam Kolari, Tim Finin, Justin Martineau, Anupam Joshi and James Mayfield*
University of Maryland, Baltimore County, MD, USA
*Johns Hopkins University, USA

An Automatic Approach to Semantic Annotation of Unstructured, Ungrammatical Sources: A First Look (Page 123)
Matthew Michelson and Craig Knoblock
Information Sciences Institute, University of Southern California, USA

Information Extraction for Multi-Participant, Task-Oriented, Synchronous, Computer-Mediated Communication: A Corpus Study of Chat Data (Page 131)
Cassandre Creswell, Nicholas Schwartzmyer and Rohini Srihari
Janya Inc., USA

Alignment of Noisy Unstructured Text Data (Page 139)
Julien Bourdaillet and Jean-Gabriel Ganascia
Universite Pierre et Marie Curie, France

Information Access to Historical Documents from the Early New High German Period (Page 147)
Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz and Christiane Wanzeck
University of Munich, Germany

On Extracting Structured Knowledge from Unstructured Business Documents (Page 155)
Gaurav Pandey and Rakshit Daga*
University of Minnesota, Twin Cities, USA
*SAP Labs, USA

Mining Conversational Text for Procedures (Page 163)
Deepak P. and Krishna Kummamuru
IBM Research, Bangalore, India

Panel Discussion

Noisy Text Analytics: An Exercise in Futility? (Page 171)
Daniel Lopresti (moderator)
Lehigh University

Endorsed by the International Association for Pattern Recognition

Supported by IBM Research