LADIS 2012 keynote and invited speakers include:
- Ron Bekkerman, LinkedIn
- Flavio Junqueira, Yahoo!
- Rodrigo Rodrigues, Universidade Nova de Lisboa and CITI research lab
- Scott Shenker, UC Berkeley and ICSI
Bios and Abstracts
Ron Bekkerman, LinkedIn
Title: Scaling Up Machine Learning
Abstract: In this talk, I'll provide an extensive introduction to parallel and
distributed machine learning. I'll answer questions such as "How big is
big data, really?", "How much training data is enough?", "What do
we do if we don't have enough training data?", and "What are the platform
choices for parallel learning?" Using k-means clustering as an
example, I'll discuss the pros and cons of machine learning in Apache
Pig, MPI, DryadLINQ, and CUDA. Time permitting, I'll take a dive into
a very large-scale text categorization task.
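As a rough illustration of why k-means is a convenient running example for parallel learning, the sketch below splits each iteration into a per-shard "map" step (partial sums and counts) and a "reduce" step (merging them into new centroids). This is not the speaker's code and is not tied to Pig, MPI, DryadLINQ, or CUDA; the names `nearest`, `partial_sums`, and `kmeans`, and the 2-D point representation, are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def partial_sums(shard: Sequence[Point], centroids: Sequence[Point]) -> List[list]:
    """Map step: per-centroid [sum_x, sum_y, count] for one shard of the data."""
    sums = [[0.0, 0.0, 0] for _ in centroids]
    for p in shard:
        i = nearest(p, centroids)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        sums[i][2] += 1
    return sums


def kmeans(shards: Sequence[Sequence[Point]],
           centroids: Sequence[Point],
           iterations: int = 10) -> List[Point]:
    """Run a fixed number of k-means iterations over sharded data."""
    centroids = list(centroids)
    for _ in range(iterations):
        total = [[0.0, 0.0, 0] for _ in centroids]
        # Each partial_sums call is independent, so it could run on its own worker.
        for shard in shards:
            for i, (sx, sy, n) in enumerate(partial_sums(shard, centroids)):
                total[i][0] += sx
                total[i][1] += sy
                total[i][2] += n
        # Reduce step: new centroid is the mean; keep the old one if a cluster is empty.
        centroids = [
            (sx / n, sy / n) if n else old
            for (sx, sy, n), old in zip(total, centroids)
        ]
    return centroids


if __name__ == "__main__":
    shards = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans(shards, [(0.0, 0.0), (10.0, 10.0)]))
```

Only the small per-centroid sums cross shard boundaries, which is the property that makes the algorithm map naturally onto very different platforms.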
Bio: Ron Bekkerman is a computer engineer and scientist whose experience
spans across disciplines from video processing to business
intelligence. Currently a senior research scientist at LinkedIn, he
previously worked for a number of major companies including
Hewlett-Packard and Motorola. Ron completed his PhD in Computer
Science at the University of Massachusetts Amherst in 2007. He holds
BSc and MSc degrees from the Technion - Israel Institute of
Technology. Ron's research interests lie primarily in the area of
large-scale unsupervised learning. He is the corresponding author of
several publications in top-tier venues, such as ICML, KDD, SIGIR,
WWW, IJCAI, CVPR, EMNLP and JMLR.
Flavio Junqueira, Yahoo!
Title: Durability with BookKeeper
Abstract: Durability is a property of data stores and refers to the persistence of committed data. To provide durability guarantees, data stores typically write data to a persistent store, such as a magnetic disk. Such writes can significantly hurt the performance of applications if not carefully designed, and a common technique to preserve durability while obtaining high performance is to use a write-ahead log (databases) or a journal (file systems). In fact, a number of production applications currently rely upon this technique: databases such as Sherpa and HBase; distributed file systems like HDFS; messaging systems such as ActiveMQ and Hedwig. The implementation of the write-ahead log for these applications is critical both for correctness and performance.
We have developed BookKeeper to be a building block for these applications. BookKeeper is a service that provides efficient write-ahead logging even in the presence of a large number of concurrent logs. Such a feature enables BookKeeper to support applications that require many concurrent logs, like large scale pub/sub, and to have applications sharing a common pool of BookKeeper storage units. The pool of storage units can dynamically grow and shrink as storage units are added or decommissioned.
BookKeeper stripes and replicates log records for performance and fault tolerance, respectively. Over time, as applications roll logs to reclaim storage space, they may use different values for the degree of replication and of striping, thus leveraging the existence of a pool of available storage units to elastically change the amount of resources used for logging. We also leverage the pool of storage units by replacing faulty storage units dynamically.
BookKeeper is currently an Apache project and it has been adopted by applications inside and outside Yahoo!.
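To make the striping and replication idea in this abstract concrete, here is a toy in-memory model, offered as a sketch only. It assumes a simple round-robin scheme in which entry `i` is written to `replication` consecutive storage units starting at unit `i mod num_units`; the class `StripedLog` and the function `placement` are hypothetical names and do not reflect BookKeeper's actual API or on-disk behaviour.

```python
from typing import Dict, FrozenSet, List


def placement(entry_id: int, num_units: int, replication: int) -> List[int]:
    """Storage units that hold a given entry under round-robin striping."""
    return [(entry_id + r) % num_units for r in range(replication)]


class StripedLog:
    """Toy append-only log striped and replicated over in-memory storage units."""

    def __init__(self, num_units: int, replication: int) -> None:
        if not 1 <= replication <= num_units:
            raise ValueError("replication must be between 1 and num_units")
        # Each storage unit is modelled as a dict: entry id -> record.
        self.units: List[Dict[int, bytes]] = [dict() for _ in range(num_units)]
        self.replication = replication
        self.next_id = 0

    def append(self, record: bytes) -> int:
        """Write the record to every unit in its placement; return its entry id."""
        entry_id = self.next_id
        for u in placement(entry_id, len(self.units), self.replication):
            self.units[u][entry_id] = record
        self.next_id += 1
        return entry_id

    def read(self, entry_id: int, failed: FrozenSet[int] = frozenset()) -> bytes:
        """Read from the first replica that is not in the failed set."""
        for u in placement(entry_id, len(self.units), self.replication):
            if u not in failed and entry_id in self.units[u]:
                return self.units[u][entry_id]
        raise KeyError(entry_id)


if __name__ == "__main__":
    log = StripedLog(num_units=3, replication=2)
    for rec in (b"a", b"b", b"c"):
        log.append(rec)
    # Entry 2 lives on units 2 and 0, so it survives the loss of unit 2.
    print(log.read(2, failed=frozenset({2})))
```

In this model, striping spreads write load across units while replication lets a read succeed when one replica is unavailable; a real service also has to handle acknowledgements, recovery, and membership changes, which the sketch omits.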
Bio: Flavio Junqueira is a Senior Research Scientist with Yahoo! Research and leads the Scalable Computing group in Barcelona, Spain. He holds a PhD degree in computer science from the University of California, San Diego (UCSD). His main research interest is distributed systems and algorithms, and he has focused on topics such as dependability, concurrency, and replication. He has additionally worked on projects related to the modeling of failures and vulnerabilities, systems for Web search, and storage systems. He is the recipient of awards and nominations, such as the CSE Department best PhD dissertation award, a nomination to the ACM PhD Dissertation award, and best paper awards at ACM CIKM 2009 and USENIX ATC 2010. He actively contributes to open source projects, such as Hadoop, ZooKeeper, BookKeeper, and S4, hosted by the Apache Software Foundation.
Rodrigo Rodrigues, Universidade Nova de Lisboa and CITI research lab
Title: Building fast and consistent geo-replicated systems
Abstract: To provide a good user experience, Internet services deploy data centers in geographically diverse locations and direct users to the closest data center. This poses a dilemma for the designers of the replication protocols these services employ: they must choose between fast operations that execute in a single data center and stronger consistency levels that prevent the undesirable semantics that arise when operations in different data centers are unaware of each other. This talk will discuss how to build geo-replicated systems that are fast if possible, and consistent when needed. The talk will summarize our recent research efforts in this area and outline a series of challenges that lie ahead. This is ongoing work with Allen Clement, Cheng Li and Daniel Porto (MPI-SWS), Nuno Preguiça (Universidade Nova de Lisboa) and Johannes Gehrke (Cornell).
Bio: Rodrigo Rodrigues is an associate professor at the Universidade Nova de Lisboa and a member of the CITI research lab. Previously, he was a tenure-track faculty member at the Max Planck Institute for Software Systems (MPI-SWS), where he led the Dependable Systems Group, and an assistant professor at the Technical University of Lisbon / INESC-ID. He graduated from the Massachusetts Institute of Technology with a doctoral degree in 2005. During his PhD, he was a researcher at MIT's Computer Science and Artificial Intelligence Laboratory, under the supervision of Prof. Barbara Liskov. He received his Master's degree from MIT in 2001, and an undergraduate degree from the Technical University of Lisbon in 1998. He has won several fellowships and awards, including a best paper award at the 18th ACM Symposium on Operating Systems Principles (SOSP), and a special recognition award from MIT's Department of Electrical Engineering and Computer Science.
Scott Shenker, UC Berkeley and ICSI
Title: Software-Defined Networking: History, Hype, and Hope
Abstract: Software-Defined Networking (SDN) has all the signs of a fad: massive hype in the trade rags, an increasing number of academic papers, and widespread confusion about what SDN really means (Isn't it just OpenFlow? Isn't it all about centralization? Isn't it all just hot air?). This talk will try to dispel some of this confusion by discussing how SDN arises from a few natural abstractions for the network control plane.
Bio: Scott Shenker spent his academic youth studying theoretical physics but soon gave up chaos
theory for computer science. Continuing to display a remarkably short attention span, his
research over the years has wandered from performance modeling and networking to game
theory and economics. Unable to focus on any single topic, his current research projects
include cluster programming models, genomic sequence aligners, software-defined
networking, and Internet architecture. Unable to hold a steady job, he currently splits
his time between the UC Berkeley Computer Science Department and the International
Computer Science Institute.