Book on a Rod

I was wondering: how can one quantitatively represent the capacity of a VAE’s latent variable (or of a reconstruction step of a diffusion model) in bits, making it comparable to other distributions, such as the data distribution?

Then a paradox I saw about 8 years ago came to mind:

“Theoretically, you can store all the information of a book on a rod. To do that, encode the book’s content as a decimal string, read it as the fractional part of a number between 0 and 1, and inscribe a mark at that fraction of the rod’s length.”

What prevents one from doing that in reality is the noise that occurs when inscribing and measuring the mark. So, in reality, it would be more like “You can store at most 50·x bits on a rod of x meters, with one mark on it.”
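To make the rod version concrete, here is a back-of-the-envelope sketch (my own illustration, not from the original paradox: I model the mark position as a signal of power $L^2/12$, the variance of a uniformly placed mark, read through Gaussian measurement noise of standard deviation `sigma`, and use the Gaussian-channel capacity as an upper bound):

```python
import numpy as np

def rod_capacity_bits(L, sigma):
    """Upper bound on the bits one mark on a rod of length L can carry,
    when the mark is inscribed/measured with Gaussian noise of std sigma.
    Signal power L^2/12 is the variance of a uniformly placed mark."""
    return 0.5 * np.log2(1.0 + (L**2 / 12.0) / sigma**2)

# A 1 m rod marked and measured to micrometer precision:
print(rod_capacity_bits(L=1.0, sigma=1e-6))  # ~18 bits -- finite, not a book
```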

That paradox is a down-to-earth version of the following:

“The Shannon entropy of a non-singular continuous distribution (such as a Gaussian) is infinite. Therefore, the latent variable has infinite capacity.”

That means that, with Shannon entropy as the measure, the capacity of a VAE’s latent variable is larger than that of any categorical data distribution, even if the latent space is only one-dimensional.
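To see why the entropy really is infinite (a standard quantization argument, e.g. in Cover & Thomas, spelled out here for completeness): discretize a density $p$ into bins of width $\Delta$; the entropy of the discretized variable is

$$
H(X_\Delta) \approx -\sum_i p(x_i)\,\Delta\,\log_2\!\big(p(x_i)\,\Delta\big) = h(X) - \log_2\Delta \;\to\; \infty \quad \text{as } \Delta \to 0,
$$

so pinning down the latent value with infinite precision corresponds to infinitely many bits.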

Yes, we have differential entropy, which provides a convergent measure of entropy for continuous distributions. But it is not robust to scaling, and we can’t really say “the distribution has 10 bits of differential entropy”.
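Concretely (a standard identity, added here for illustration): differential entropy shifts under scaling, so its numerical value depends on the unit of measurement:

$$
h(aX) = h(X) + \log_2|a|, \qquad h\big(\mathcal{N}(0, s^2)\big) = \tfrac{1}{2}\log_2\!\left(2\pi e\, s^2\right).
$$

Measuring the same Gaussian in millimeters instead of meters adds $\log_2 1000 \approx 10$ “bits” without changing the underlying information.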

After days of thinking, I believe that what’s missing is some noise when we access the latent variable. The noise prevents an encoded message (the mark) from occupying an infinitely small distinct subset of the latent space (the rod). With additive Gaussian noise, we can derive a nice answer to the original problem:


Given a $d$-dimensional latent variable $Z \sim \mathcal{N}(0, s^{2} I)$ and additive Gaussian noise when accessing it, $N \sim \mathcal{N}(0, t^{2} I)$, the capacity of the latent variable becomes $I(Z;\, Z+N) = \frac{1}{2} d \log\!\left(1 + \frac{s^{2}}{t^{2}}\right)$.


This measure is robust to scaling: it is unchanged as long as $s$ and $t$ scale together. And it can be measured in bits.
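As a quick sanity check, a minimal sketch (my own code; the function name is made up, and I use base-2 logarithms so the result comes out in bits):

```python
import numpy as np

def latent_capacity_bits(d, s, t):
    """I(Z; Z+N) = (d/2) * log2(1 + s^2/t^2) in bits,
    for Z ~ N(0, s^2 I) accessed through noise N ~ N(0, t^2 I)."""
    return 0.5 * d * np.log2(1.0 + (s / t) ** 2)

print(latent_capacity_bits(d=1, s=1.0, t=0.1))   # ~3.33 bits, even for d = 1
print(latent_capacity_bits(d=32, s=1.0, t=0.1))  # grows linearly with d

# Scaling s and t together leaves the capacity unchanged:
assert np.isclose(latent_capacity_bits(1, 1.0, 0.1),
                  latent_capacity_bits(1, 5.0, 0.5))
```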

The modified VAE now looks like a noisy channel, and the capacity formula has exactly the form of the Shannon–Hartley theorem, with signal power $s^2$ and noise power $t^2$ per dimension.
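For completeness, here is what “noise when we access the latent variable” could look like in code (a minimal sketch of my own in PyTorch; `read_latent` and the default `t` are hypothetical, not taken from any existing VAE implementation):

```python
import torch

def read_latent(z: torch.Tensor, t: float = 0.1) -> torch.Tensor:
    """Every read of the latent passes through an AWGN channel,
    capping the usable information at (d/2) * log2(1 + s^2/t^2) bits."""
    return z + t * torch.randn_like(z)

# Usage inside a VAE forward pass (encoder/decoder assumed to exist):
#   z = encoder(x)                          # per-dimension std ~ s
#   x_hat = decoder(read_latent(z, t=0.1))  # decoder sees the noisy channel output
```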