Professor, Computer Science & Artificial Intelligence
Many algorithmic developments for massive data have been enabled by the method of random projection. In particular, the well-known Johnson-Lindenstrauss (JL) lemma states that, given a data set of n points, one can represent each point using only O(log n) numbers while still preserving all pairwise distances up to a constant factor. This lemma was recently shown, by Green Larsen and Nelson, to be tight. In this talk I will show that, despite the aforementioned lower bound, one can still improve over the JL lemma by designing an algorithm that represents each point using only O(log n) *bits* on average (under mild precision assumptions). I will then describe how this approach leads to data analysis algorithms that are more efficient than previously known.
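The random projection idea behind the JL lemma can be illustrated in a few lines: multiply the data by a random Gaussian matrix mapping d dimensions down to a much smaller k, and pairwise distances are approximately preserved. The sketch below is a minimal illustration, not the bit-efficient scheme from the talk; the specific values of n, d, and k are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 500, 64   # k is on the order of log n, up to constants

X = rng.normal(size=(n, d))               # n data points in d dimensions
P = rng.normal(size=(d, k)) / np.sqrt(k)  # Gaussian projection, scaled so
                                          # squared norms are preserved in expectation
Y = X @ P                                 # projected points, k numbers each

# measure how much each pairwise distance is distorted by the projection
iu = np.triu_indices(n, 1)
orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)[iu]
proj = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)[iu]
ratios = proj / orig
print(ratios.min(), ratios.max())  # concentrated around 1
```

Note that each projected point still consists of O(log n) real numbers; the improvement described in the talk comes from encoding such representations with O(log n) bits on average.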
The random projection method and related techniques yield algorithms that are very efficient as a function of the input size, but whose running times grow polynomially in the inverse of the accuracy parameter (as opposed to, say, logarithmically). In the second part of this talk I will present evidence that a dependence of this type is unavoidable. In particular, I will show that, for a large class of kernel problems including Support Vector Machines, any algorithm that solves the problem to within super-polynomially small error must have a running time that is at least quadratic in the input size, unless a certain complexity-theoretic conjecture is false.