Computer Science Colloquia
SPEAKER: Dan Feldman*, MIT, The Distributed Robotics Lab
TOPIC: Learning patterns in big data from small data using core-sets
DATE: Friday, March 1st
TIME: 3:30 p.m.
PLACE: MEC 205 followed by reception in Rice Hall 4th Floor Atrium
HOST: Jack Stankovic
Abstract: When we need to solve an optimization problem, we usually use
the best available algorithm or software, or try to improve it. In recent
years we have started exploring a different approach: instead of
improving the algorithm, reduce the input data and run the existing
algorithm on the reduced data to obtain the desired output much faster.
A core-set for a given problem is a semantic compression of its input,
in the sense that solving the problem with the (small) core-set as
input yields a provably approximate solution to the problem on the
original (big) data. A core-set can usually be computed in one pass over
a streaming input, using a manageable amount of memory, and in parallel.
For real-time performance we use Hadoop, clouds, and GPUs.
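To make the definition above concrete, here is a minimal sketch for the simplest problem we know of that admits an exact core-set: the 1-mean cost, i.e. the sum of squared distances from the points to a single center. This is our own illustration, not the speaker's method, and the names `MeanCoreset`, `build`, and `merge` are hypothetical. It shows the three properties the abstract mentions: cost queries on the small summary match the full data, the summary is built in one pass with memory independent of the input size, and summaries of separate chunks can be merged, so chunks can be processed in parallel.

```python
# Hypothetical illustration (not the speaker's code): an exact core-set
# for the 1-mean cost, sum_p ||p - c||^2 over all input points p.
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class MeanCoreset:
    """Weighted centroid plus an additive constant.

    For every candidate center c:
        sum_p ||p - c||^2 == n * ||centroid - c||^2 + sse
    so cost queries are answered exactly, whatever the input size.
    """
    n: int
    centroid: list[float]
    sse: float  # sum of squared distances from the points to their centroid

    def cost(self, c: Sequence[float]) -> float:
        return self.n * sq_dist(self.centroid, c) + self.sse

    def merge(self, other: MeanCoreset) -> MeanCoreset:
        """Combine summaries of two disjoint point sets (e.g. two workers)."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = [b - a for a, b in zip(self.centroid, other.centroid)]
        centroid = [a + d * other.n / n for a, d in zip(self.centroid, delta)]
        # Pooled SSE: within-group terms plus the between-centroid term.
        sse = self.sse + other.sse + sum(d * d for d in delta) * self.n * other.n / n
        return MeanCoreset(n, centroid, sse)


def build(stream: Iterable[Sequence[float]]) -> MeanCoreset:
    """One pass over the stream, O(d) memory (Welford-style running update)."""
    n, mean, sse = 0, [], 0.0
    for p in stream:
        if n == 0:
            mean = [0.0] * len(p)
        n += 1
        delta = [x - m for x, m in zip(p, mean)]
        mean = [m + d / n for m, d in zip(mean, delta)]
        sse += sum(d * (x - m) for d, x, m in zip(delta, p, mean))
    return MeanCoreset(n, mean, sse)


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (5.0, 1.0)]
# Summarize two chunks independently, as separate workers would, then merge.
summary = build(points[:2]).merge(build(points[2:]))
center = (1.0, 1.0)
print(summary.cost(center), sum(sq_dist(p, center) for p in points))
```

The two printed values agree up to floating-point rounding. Most core-sets are instead weighted subsets of the input that give approximate answers for harder problems such as k-means; this toy case is exact only because the 1-mean cost decomposes algebraically.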
In this talk I will describe how we applied this magical paradigm to
obtain algorithmic results with performance guarantees in iDiary: a
system that combines sensor networks, robotics, differential privacy,
and text mining. It turns large signals collected from smartphones or
robots into maps and textual descriptions of their trajectories. The
system features a user interface similar to Google Search that allows
users to type text queries about their activities (e.g., "Where did I have
dinner the last time I visited Paris?") and receive textual answers based on
Bio: Dan Feldman is a postdoc at MIT in the Distributed Robotics Lab,
where he develops systems for handling streaming big data from sensors,
smartphones, images, and robots. He received his Ph.D. from Tel Aviv
University in 2010, under the supervision of Prof. Micha Sharir and
Prof. Amos Fiat. He was then a postdoc at the Center for the Mathematics
of Information at Caltech for a year and a half, where he began working to
reduce the gap between theoretical computational geometry and practical
machine learning. He specializes in developing software for scalable
data compression based on core-set constructions with provable
guarantees. In recent years his core-sets have been implemented by several
start-ups, banks, supermarkets, and internet search companies, to
name just a few. When he is not working, Dan is building robots with his
very own coresets, Ariel and Eleanor.
*Mr. Feldman is a faculty candidate for the Department of Computer Science.