TTIC
Toyota Technological Institute at Chicago  

Amit Chakrabarti

Dartmouth College

Estimating Entropy and Entropy Norm on Data Streams

May 26, 2006 2:30pm

Abstract:

We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the ``entropy norm'' of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results are for estimating the empirical entropy of an input stream. We first present a sublinear space one-pass algorithm for this problem. For a stream of $m$ items and a given real parameter $\alpha$, our algorithm uses spac $\tilde{O}(m^{2\alpha})$ and provides an approximation of $1/\alpha in the worst case and $(1+\eps)$ in ``most'' cases. We then present a two-pass polylogarithmic space $(1+\eps)$-approximation algorithm. All our algorithms are quite simple.

If you have questions, or would like to meet the speaker, please contact Ponda at 4-1994 or pondabarnes@tti-c.org. For information on future TTI-C talks or events, please go to the TTI-C Events page.



return to events page