IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data

Hyderabad, India - January 8, 2007

Supported by IBM Research
Endorsed by the International Association for Pattern Recognition


09.00 - 11.05 Session I
09.00 - 09.05 Introduction [ppt]
Craig Knoblock
09.05 - 09.55 Keynote address by Gerald DeJong
University of Illinois at Urbana-Champaign
Robustness through Prior Knowledge: Using Explanation-Based Learning to Distinguish Handwritten Chinese Characters [ppt]
09.55 - 11.05 Sub-Session: Classification of Noisy Text (Session Chair: Craig Knoblock)
09.55 - 10.15 Paper 1 Finding Structure in Noisy Text: Topic Classification and Unsupervised Clustering [ppt]
Rohit Prasad, Prem Natarajan, Krishna Subramanian, Shirin Saleem and Rich Schwartz
BBN Technologies, Cambridge, MA, USA.
10.15 - 10.35 Paper 2 Genre as Noise - Noise in Genre [ppt]
Andrea Stubbe, Christoph Ringlstetter* and Klaus U. Schulz
University of Munich, Germany.
*University of Alberta, Canada.
10.35 - 10.55 Paper 3 A Practical Implementation of Automatic Text Categorisation and Correction of Noisy OCR Documents into Braille and Large Print
Ryan Brooks, William John Teahan and David Hunnisett*
University of Wales, Bangor, UK.
*ETL Solutions Ltd., UK.
10.55 - 11.05 Boasters for Posters 1, 2, 3 and 4
11.05 - 11.30 Tea/Coffee Break
11.30 - 13.00 Session II: Detecting and Correcting Noisy Text (Session Chair: Venu Govindaraju)
11.30 - 11.50 Paper 1 Text Correction Using Domain Dependent Bigram Models from Web Crawls [ppt]
Christoph Ringlstetter, Max Hadersbeck*, Klaus U. Schulz* and Stoyan Mihov**
University of Alberta, Canada.
*University of Munich, Germany.
**Bulgarian Academy of Sciences, Sofia, Bulgaria.
11.50 - 12.10 Paper 2 Enhanced Integrated Scoring for Text Preprocessing in Ontology Engineering from Dirty Text [ppt]
Wilson Wong, Wei Liu and Mohammed Bennamoun
University of Western Australia, Australia.
12.10 - 12.30 Paper 3 Investigation and Modeling of the Structure of Texting Language [ppt]
Monojit Choudhury, Rahul Saraf*, Vijit Jain, Sudeshna Sarkar and Anupam Basu
Indian Institute of Technology, Kharagpur, India.
*National Institute of Technology, Jaipur, India.
12.30 - 12.50 Paper 4 Adding Sentence Boundaries to Conversational Speech Transcriptions using Noisily Labelled Examples [ppt]
Tetsuya Nasukawa, Diwakar Punjani*, Shourya Roy*, L Venkata Subramaniam* and Hironori Takeuchi
IBM Research, Tokyo, Japan.
*IBM Research, New Delhi, India.
12.50 - 13.00 Boasters for Posters 5, 6, 7 and 8
13.00 - 14.00 Lunch
14.00 - 15.30 Session III: Information Extraction from Noisy Text (Session Chair: Klaus Schulz)
14.00 - 14.20 Paper 1 A Supervised Machine Learning Approach to Conjunction Disambiguation in Named Entities [ppt]
Pawel Mazur and Robert Dale
Macquarie University, Australia.
14.20 - 14.40 Paper 2 BlogVox: Separating Blog Wheat from Blog Chaff [ppt]
Akshay Java, Pranam Kolari, Tim Finin, Justin Martineau, Anupam Joshi and James Mayfield*
University of Maryland, Baltimore County, MD, USA.
*Johns Hopkins University, USA.
14.40 - 15.00 Paper 3 An Automatic Approach to Semantic Annotation of Unstructured, Ungrammatical Sources: A First Look [ppt]
Matthew Michelson and Craig Knoblock
Information Sciences Institute, University of Southern California, USA.
15.00 - 15.20 Paper 4 Information Extraction for Multi-Participant, Task-Oriented, Synchronous, Computer-Mediated Communication: A Corpus Study of Chat Data [ppt]
Cassandre Creswell, Nicholas Schwartzmyer and Rohini Srihari
Janya Inc., USA.
15.20 - 15.30 Boasters for Posters 9, 10, 11 and 12
15.30 - 16.00 Tea/Coffee Break (along with Posters)
16.00 - 18.30 Session IV
16.00 - 17.00 Poster Presentations
Paper 1 Discovering Identities in Web Contexts Using Unsupervised Clustering [ppt]
Ted Pedersen and Anagha Kulkarni
University of Minnesota, Duluth, MN, USA.
Paper 2 Ontology Based Algorithms for Indexing and Search of Semantically Close Natural Language Phrases [ppt]
Srikanth Kamath
National Institute of Technology Karnataka, India.
Paper 3 A Treebank Conversion Algorithm for Non-Configurational Languages
Ahmad Pouramini and Naser Mozayani
Iran University of Science and Technology, Iran.
Paper 4 Generating a Treebank of Ungrammatical English [ppt]
Jennifer Foster
Dublin City University, Dublin, Ireland.
Paper 5 Multi-Level Feature Extraction for Spelling Correction [pdf]
Johannes Schaback and Fang Li*
Technische Universitaet, Berlin, Germany.
*Shanghai Jiao Tong University, China.
Paper 6 Hidden Markov Model Based Identification of Transliterated Regional Language Words in Text Documents [ppt]
Achuth Sankar S. Nair, Vrinda V. Nair* and Vinod Chandra S. S.**
University of Kerala, Thiruvananthapuram, India.
*College of Engineering, Trissur, India.
**College of Engineering, Thiruvananthapuram, India.
Paper 7 Multiclass Hierarchical SVM for Recognition of Printed Tamil Characters [ppt]
Shivsubramani K, Loganathan Ramasamy, Srinivasan CJ, Ajay V and Soman KP
Amrita Vishwa Vidyapeetham, India.
Paper 8 A Causal Characterisation of Orthography Errors in Web Texts [ppt]
Mirko Tavosanis
University of Pisa, Italy.
Paper 9 Alignment of Noisy Unstructured Text Data [ppt]
Julien Bourdaillet and Jean-Gabriel Ganascia
Universite Pierre et Marie Curie, France.
Paper 10 Information Access to Historical Documents from the Early New High German Period [ppt]
Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz and Christiane Wanzeck
University of Munich, Germany.
Paper 11 On Extracting Structured Knowledge from Unstructured Business Documents [ppt]
Gaurav Pandey and Rakshit Daga*
University of Minnesota, Twin Cities, USA.
*SAP Labs, USA.
Paper 12 Mining Conversational Text for Procedures [ppt]
Deepak S. Padmanabhan and Krishna Kummamuru
IBM Research, Bangalore, India.
17.00 - 18.00 Panel Discussion: Noisy Text Analytics: An Exercise in Futility?
Daniel Lopresti (moderator), Lehigh University, Bethlehem, PA, USA. [ppt]
Sreeram Balakrishnan, IBM Research, New Delhi, India. [ppt]
Hwee Tou Ng, National University of Singapore, Singapore. [ppt]
Rohini Srihari, Janya Inc., USA. [ppt]
18.00 - 18.03 IAPR Best Student Paper Award Announcement [ppt]
Raghuram Krishnapuram
IBM Research, New Delhi, India.
18.03 - 18.10 Closing [ppt]
Craig Knoblock, Daniel Lopresti, Shourya Roy, L. Venkata Subramaniam
18.00 - 22.00 IJCAI Inauguration and Welcome Dinner

