IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data

Hyderabad, India - January 8, 2007


Supported by IBM Research
Endorsed by the International Association for Pattern Recognition

Programme

   
09.00 - 11.05 Session I
   
09.00 - 09.05 Introduction [ppt]
Craig Knoblock
   
09.05 - 09.55 Keynote address by Gerald DeJong
University of Illinois at Urbana-Champaign
Robustness through Prior Knowledge: Using Explanation-Based Learning to Distinguish Handwritten Chinese Characters [ppt]
   
09.55 - 11.05 Sub-Session: Classification of Noisy Text (Session Chair: Craig Knoblock)
   
09.55 - 10.15 Paper 1 Finding Structure in Noisy Text: Topic Classification and Unsupervised Clustering [ppt]
Rohit Prasad, Prem Natarajan, Krishna Subramanian, Shirin Saleem and Rich Schwartz
BBN Technologies, Cambridge, MA, USA.
   
10.15 - 10.35 Paper 2 Genre as Noise - Noise in Genre [ppt]
Andrea Stubbe, Christoph Ringlstetter* and Klaus U. Schulz
University of Munich, Germany.
*University of Alberta, Canada.
   
10.35 - 10.55 Paper 3 A Practical Implementation of Automatic Text Categorisation and Correction of Noisy OCR Documents into Braille and Large Print
Ryan Brooks, William John Teahan and David Hunnisett*
University of Wales, Bangor, UK.
*ETL Solutions Ltd., UK.
   
10.55 - 11.05 Boasters for Posters 1, 2, 3 and 4
   
11.05 - 11.30 Tea/Coffee Break
   
11.30 - 13.00 Session II: Detecting and Correcting Noisy Text (Session Chair: Venu Govindaraju)
   
11.30 - 11.50 Paper 1 Text Correction Using Domain Dependent Bigram Models from Web Crawls [ppt]
Christoph Ringlstetter, Max Hadersbeck*, Klaus U. Schulz* and Stoyan Mihov**
University of Alberta, Canada.
*University of Munich, Germany.
**Bulgarian Academy of Sciences, Sofia, Bulgaria.
   
11.50 - 12.10 Paper 2 Enhanced Integrated Scoring for Text Preprocessing in Ontology Engineering from Dirty Text [ppt]
Wilson Wong, Wei Liu and Mohammed Bennamoun
University of Western Australia, Australia.
   
12.10 - 12.30 Paper 3 Investigation and Modeling of the Structure of Texting Language [ppt]
Monojit Choudhury, Rahul Saraf*, Vijit Jain, Sudeshna Sarkar and Anupam Basu
Indian Institute of Technology, Kharagpur, India.
*National Institute of Technology, Jaipur, India.
   
12.30 - 12.50 Paper 4 Adding Sentence Boundaries to Conversational Speech Transcriptions using Noisily Labelled Examples [ppt]
Tetsuya Nasukawa, Diwakar Punjani*, Shourya Roy*, L Venkata Subramaniam* and Hironori Takeuchi
IBM Research, Tokyo, Japan.
*IBM Research, New Delhi, India.
   
12.50 - 13.00 Boasters for Posters 5, 6, 7 and 8
   
13.00 - 14.00 Lunch
   
14.00 - 15.30 Session III: Information Extraction from Noisy Text (Session Chair: Klaus Schulz)
   
14.00 - 14.20 Paper 1 A Supervised Machine Learning Approach to Conjunction Disambiguation in Named Entities [ppt]
Pawel Mazur and Robert Dale
Macquarie University, Australia.
   
14.20 - 14.40 Paper 2 BlogVox: Separating Blog Wheat from Blog Chaff [ppt]
Akshay Java, Pranam Kolari, Tim Finin, Justin Martineau, Anupam Joshi and James Mayfield*
University of Maryland, Baltimore County, MD, USA.
*Johns Hopkins University, USA.
   
14.40 - 15.00 Paper 3 An Automatic Approach to Semantic Annotation of Unstructured, Ungrammatical Sources: A First Look [ppt]
Matthew Michelson and Craig Knoblock
Information Sciences Institute, University of Southern California, USA.
   
15.00 - 15.20 Paper 4 Information Extraction for Multi-Participant, Task-Oriented, Synchronous, Computer-Mediated Communication: A Corpus Study of Chat Data [ppt]
Cassandre Creswell, Nicholas Schwartzmyer and Rohini Srihari
Janya Inc., USA.
   
15.20 - 15.30 Boasters for Posters 9, 10, 11 and 12
   
15.30 - 16.00 Tea/Coffee Break (along with Posters)
   
16.00 - 18.30 Session IV
   
16.00 - 17.00 Poster Presentations
   
Paper 1 Discovering Identities in Web Contexts Using Unsupervised Clustering [ppt]
Ted Pedersen and Anagha Kulkarni
University of Minnesota, Duluth, MN, USA.
   
Paper 2 Ontology Based Algorithms for Indexing and Search of Semantically Close Natural Language Phrases [ppt]
Srikanth Kamath
National Institute of Technology Karnataka, India.
   
Paper 3 A Treebank Conversion Algorithm for Non-Configurational Languages
Ahmad Pouramini and Naser Mozayani
Iran University of Science and Technology, Iran.
   
Paper 4 Generating a Treebank of Ungrammatical English [ppt]
Jennifer Foster
Dublin City University, Dublin, Ireland.
   
Paper 5 Multi-Level Feature Extraction for Spelling Correction [pdf]
Johannes Schaback and Fang Li*
Technische Universitaet, Berlin, Germany.
*Shanghai Jiao Tong University, China.
   
Paper 6 Hidden Markov Model Based Identification of Transliterated Regional Language Words in Text Documents [ppt]
Achuth Sankar S. Nair, Vrinda V. Nair* and Vinod Chandra S. S.**
University of Kerala, Thiruvananthapuram, India.
*College of Engineering, Trissur, India.
**College of Engineering, Thiruvananthapuram, India.
   
Paper 7 Multiclass Hierarchical SVM for Recognition of Printed Tamil Characters [ppt]
Shivsubramani K, Loganathan Ramasamy, Srinivasan CJ, Ajay V and Soman KP
Amrita Vishwa Vidyapeetham, India.
   
Paper 8 A Causal Characterisation of Orthography Errors in Web Texts [ppt]
Mirko Tavosanis
University of Pisa, Italy.
   
Paper 9 Alignment of Noisy Unstructured Text Data [ppt]
Julien Bourdaillet and Jean-Gabriel Ganascia
Universite Pierre et Marie Curie, France.
   
Paper 10 Information Access to Historical Documents from the Early New High German Period [ppt]
Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz and Christiane Wanzeck
University of Munich, Germany.
   
Paper 11 On Extracting Structured Knowledge from Unstructured Business Documents [ppt]
Gaurav Pandey and Rakshit Daga*
University of Minnesota, Twin Cities, USA.
*SAP Labs, USA.
   
Paper 12 Mining Conversational Text for Procedures [ppt]
Deepak S. Padmanabhan and Krishna Kummamuru
IBM Research, Bangalore, India.
   
17.00 - 18.00 Panel Discussion: Noisy Text Analytics: An Exercise in Futility?
Daniel Lopresti (moderator), Lehigh University, Bethlehem, PA, USA. [ppt]
Sreeram Balakrishnan, IBM Research, New Delhi, India. [ppt]
Hwee Tou Ng, National University of Singapore, Singapore. [ppt]
Rohini Srihari, Janya Inc., USA. [ppt]
   
18.00 - 18.03 IAPR Best Student Paper Award Announcement [ppt]
Raghuram Krishnapuram
IBM Research, New Delhi, India.
   
18.03 - 18.10 Closing [ppt]
Craig Knoblock, Daniel Lopresti, Shourya Roy, L. Venkata Subramaniam
   
18.00 - 22.00 IJCAI Inauguration and Welcome Dinner




On the evening of Sunday, 7th January, there will be an IBM Welcome and Dinner.
AND-07 attendees and other special invitees are welcome. If you plan
to attend, please let Raj Sharma (rajks_AT_in.ibm.com) know.

List of Workshop Registrants as of 1 Jan. 2007.