Assistant Professor, Statistics and Electrical & Computer Engineering
Machine learning and data mining applications routinely deal with datasets at terabyte (TB) scale, and they are expected to reach petabyte (PB) scale soon. At this scale, classical approaches to inference and learning fail to simultaneously address new concerns: computational resource and storage limitations, network communication constraints, energy efficiency, real-time latency requirements, etc. These challenges prompt a fundamental question: how can we design machine learning algorithms that are significantly more frugal with resources?
To circumvent these big-data challenges, we design randomized algorithms that are practical and come with provable theoretical guarantees. The proposed algorithms trade a very small amount of certainty, which is typically insignificant for most purposes, for huge, often exponential, gains in computation and memory.
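As a minimal, self-contained illustration of this trade-off (a generic textbook sketch, not any specific algorithm from our work), signed random projections (SimHash) compress high-dimensional vectors into short bit fingerprints: similar vectors collide in the same hash bucket with high probability, so a search inspects only a tiny fraction of the data while occasionally missing a true neighbor.

```python
import numpy as np

def simhash_signatures(X, n_bits=16, seed=0):
    """Compress each row of X into an n_bits fingerprint via
    signed random projections (SimHash)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))  # random hyperplanes
    return (X @ planes > 0).astype(np.uint8)            # 1 bit per plane

def query(sig_q, signatures):
    """Return candidate indices whose fingerprints match the query's,
    i.e., vectors that landed in the same hash bucket."""
    return np.where((signatures == sig_q).all(axis=1))[0]

# Toy usage: with 16 bits there are 2^16 buckets, so a query over
# 100,000 vectors inspects only a handful of candidates on average,
# at the price of a small probability of missing a true near neighbor.
X = np.random.default_rng(1).standard_normal((100_000, 64))
sigs = simhash_signatures(X)
candidates = query(sigs[0], sigs)  # items colliding with vector 0
```

The exponential flavor of the gain is visible in the bucket count: each additional bit roughly halves the expected number of candidates a query must examine.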
Furthermore, the proposed algorithms are massively parallelizable and naturally amenable to the map-reduce framework, making them ideal for modern big-data systems.
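To see why hashing-based summaries compose so cleanly under map-reduce, consider the classic count-min sketch (again a standard structure used purely as an illustration): shards built with identical hash seeds combine by simple element-wise addition, which is exactly the associative reduce step the framework requires.

```python
import numpy as np

class CountMinSketch:
    """Fixed-size frequency sketch: estimates never undercount, and
    overcount by only a small additive error with high probability."""
    def __init__(self, width=2048, depth=5, seed=0):
        self.width, self.depth = width, depth
        rng = np.random.default_rng(seed)        # shared seed => mergeable shards
        self.seeds = rng.integers(0, 2**31, depth)
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _cells(self, item):
        for row, s in enumerate(self.seeds):
            yield row, hash((int(s), item)) % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row, col] += count

    def estimate(self, item):
        return min(self.table[row, col] for row, col in self._cells(item))

    def merge(self, other):
        # Element-wise addition of tables built with the same seeds is
        # the associative "reduce" step of a map-reduce job.
        self.table += other.table
        return self

# Two mappers sketch disjoint data shards; the reducer just adds tables.
shard_a, shard_b = CountMinSketch(), CountMinSketch()
for x in [1, 2, 2, 3]:
    shard_a.add(x)
for x in [2, 3, 3, 3]:
    shard_b.add(x)
combined = shard_a.merge(shard_b)
assert combined.estimate(3) >= 4  # never undercounts the true frequency
```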
We will show that significant computational and memory gains are still attainable in several frequent tasks, including search, alignment, and deep learning.