International Workshop on Massive Data Analytics over the Cloud (MDAC2010)

in conjunction with WWW2010

April 26, 2010, Raleigh, North Carolina, USA



Invited talks by Amr Awadallah (CTO, Cloudera Inc.) and Vijay Narayanan (Yahoo! Labs Silicon Valley)

Papers available for download from ACM DL

Workshop presentations are available here.

 

Internet companies and traditional organizations alike continue to face the challenge of making sense of the mountains of data that surround them: blogs on the Web, transactions at Web-commerce sites, and data generated by telecom switches, health care systems, bioinformatics, and even homeland security. In this data-flooded world, the key question is: how do we analyze these enormous amounts of data in a timely and cost-effective manner? The traditional model of building bigger machines and adding more storage does not apply here, because the data is growing faster than scale-up methods can keep pace with. Promising solutions, such as those based on MapReduce, use racks of commodity servers with locally attached storage and can scale out quickly at low cost.

More generally, there is a need to assemble resources on demand, which is the motivation for Cloud Computing. Broadly speaking, Cloud Computing represents the desire to migrate from the traditional server-centric computing architecture to a fully network-centric architecture in which logical computing resources can be assembled flexibly, on demand. MapReduce is a software framework, introduced by Google, that attempts to bring the benefits of cloud computing to the problems posed by massive datasets. While the early signs of support from the developer community, academia, and industry are encouraging, many open problems still need to be addressed. Some of them are:

  • Is there a class of problems/workloads for which this distributed computing over commodity machines is the best solution?
  • How amenable are these algorithms to expression using MapReduce or other simple paradigms?
  • How do we define and use effective and practical metrics to compare the capability of different solutions being offered?
  • Do we need different solutions for different data types or algorithms? How do we build bridges between them?
  • Are there viable alternatives, such as Pregel, for important sets of problems?
  • How do we integrate these new computing platforms with existing traditional data warehouse-based analytics systems?
  • What are the challenges in managing massive datasets, such as security, versioning, and archiving?
  • How do we tackle the legal and moral challenges associated with mining those datasets?

We invite researchers working in any of the following areas to participate:

  • Data-intensive applications of cloud computing
  • Algorithms for massive data analysis such as data mining, statistics, network, and predictive algorithms
  • Distributed data management, retrieval and mining
  • Novel architectures for cloud computing
  • MapReduce and its generalizations
  • Large-scale social media analysis
  • Privacy preservation in the cloud
  • Parallel databases
  • Storage as a Service
  • Cloud resource management
  • Fault tolerance assessment and management
  • Reliability of applications running over the cloud

 

Important Dates

Manuscripts due: March 1, 2010

Notification of acceptance: March 19, 2010

Final revised manuscript: March 31, 2010

Workshop: April 26, 2010

 

Submission Guidelines

We welcome original, unpublished manuscripts of up to 6 pages (two-column format), inclusive of all references and figures. Vision papers and descriptions of work in progress are welcome as short paper submissions (4 pages). Papers must be written in English and formatted according to the WWW 2010 proceedings format.

Submission Site: Papers are to be submitted online at https://cmt.research.microsoft.com/MDAC2010/.

Proceedings: The proceedings will be published electronically as an ACM ICPS volume (ISBN: 978-1-60558-991-6) and will be available in the ACM Digital Library. We will follow ACM copyright and plagiarism policies.

 

Workshop Committee

Workshop Chairs

Ullas Nambiar
IBM India Research Lab, New Delhi, India

 

John McPherson
IBM Almaden Research Center, USA


David Konopnicki
IBM Haifa Research Lab, Israel

 

Steering Committee

Rakesh Agrawal
Microsoft Search Labs, Mountain View, CA, USA


Alon Halevy
Google Inc., Mountain View, CA, USA

 

Program Committee

Amr Awadallah, Cloudera, USA

Andrew McCallum, University of Massachusetts Amherst, USA

Assaf Schuster, Technion - Israel Institute of Technology

Gautam Das, University of Texas, Arlington, USA

Jimeng Sun, IBM Watson Research Center, USA

John Shafer, Microsoft Search Labs, USA

Kevin Chang, University of Illinois at Urbana-Champaign, USA

Kun Liu, Yahoo! Labs, USA

Louiqa Raschid, University of Maryland, College Park, USA

Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel

Michael Sheng, University of Adelaide, Australia

Mong Li Lee, National University of Singapore, Singapore

Rajeev Gupta, IBM India Research Lab, India

Vanja Josifovski, Yahoo Research, USA

Yannis Sismanis, IBM Almaden Research Center, USA

Yi Chen, Arizona State University, USA

Wen-syan Li, SAP, China