How to define sparseness of a vector?

You could of course generalize your current measure to

\begin{align} S(X) = \frac{\frac{k^{1/m}}{k^{1/n}} -\frac{\|X\|_m}{\|X\|_n} } {\frac{k^{1/m}}{k^{1/n}}-1} \end{align}

for any $m < n$, while preserving the properties you specified.

An interesting special case could be $m = 1, n \to \infty$, in which case the expression simplifies to

\begin{equation} S(X) = \frac{k-\frac{\|X\|_1}{\|X\|_\infty}}{k-1}. \end{equation}
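For concreteness, here is a minimal numerical sketch of the generalized measure and of this special case (using NumPy; the function names are mine, not from the question):

```python
import numpy as np

def sparseness(x, m, n):
    """Generalized measure: 0 for a constant vector, 1 for a single non-zero entry (m < n)."""
    x = np.asarray(x, dtype=float)
    k = x.size
    r = k ** (1.0 / m - 1.0 / n)                       # k^(1/m) / k^(1/n)
    return (r - np.linalg.norm(x, m) / np.linalg.norm(x, n)) / (r - 1.0)

def sparseness_special(x):
    """Special case m = 1, n -> infinity."""
    x = np.asarray(x, dtype=float)
    k = x.size
    return (k - np.abs(x).sum() / np.abs(x).max()) / (k - 1.0)

print(sparseness([3.0, 3.0, 3.0, 3.0], m=1, n=2))      # 0.0
print(sparseness([5.0, 0.0, 0.0, 0.0], m=1, n=2))      # 1.0
print(sparseness_special([5.0, 1.0, 0.0, 0.0]))        # roughly 0.93
```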


There is a definition of sparsity which is used, amongst others, in the compressed sensing literature; see e.g. here.

A vector $x\in \mathbb{C}^k$ is called $s$-sparse if $\| x \|_0 = |\text{supp}(x)| \leq s$, that is, if it has at most $s$ non-zero entries. Denote by $\Sigma_s$ the set of all such vectors. Then the $s$-term approximation error of a vector $x\in \mathbb{C}^k$ is defined as $$ \sigma_s(x)_p = \min_{y\in\Sigma_s} \|x-y\|_p. $$

Now this quantity equals $0$ if your vector $x$ is $s$-sparse, and is greater than $0$ otherwise. Note that you now have two parameters, $s$ and $p$, to tune this "measure". Clearly, you recover your definition of sparsity if you set $s=1$.
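Since for a $p$-norm the minimum over $\Sigma_s$ is attained by keeping the $s$ largest-magnitude entries of $x$, $\sigma_s(x)_p$ is simply the $p$-norm of the remaining entries. A minimal sketch (NumPy; the function name is mine):

```python
import numpy as np

def s_term_approx_error(x, s, p=2):
    """sigma_s(x)_p: the p-norm of x with its s largest-magnitude entries removed."""
    x = np.asarray(x)
    tail = np.sort(np.abs(x))[: max(x.size - s, 0)]    # the k - s smallest magnitudes
    return np.linalg.norm(tail, p)

print(s_term_approx_error([4, 0, 0, 1], s=1))          # 1.0 -> not 1-sparse
print(s_term_approx_error([4, 0, 0, 1], s=2))          # 0.0 -> 2-sparse
```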


Disclaimer: This post considers the case in which you do not mind some computational effort to get a nice sparseness value. For something new, please skip to Part 2.

Part 1

I agree with Mikael that this kind of generalization is nice; what's more, with Mikael's formula it is intuitive where it comes from: the most basic notion of vector sparseness would be $$\frac{\text{number of indices $i$ such that }X_i = 0}{\text{total number of indices}}.$$

However, by this definition $\langle 0, 0, \ldots, 0\rangle$ is sparse, while the vector $\langle c, c, \ldots, c \rangle$ is not. Still, it is easy to fix: $$\frac{\text{number of indices $i$ such that }X_i - c = 0}{\text{total number of indices}}\,,$$ where $c$ is some average of $X$, e.g. $c = \|X\|_\infty$. The problem with this measure is that it is not easy to count the number of such indices. To alleviate that, we can approximate the number of indices by $\frac{\|X\|_1}{\|X\|_\infty}$. Naturally we need some normalization (so that a constant vector gets $0$ and a single spike gets $1$), and with that we arrive at Mikael's special case: $$\frac{k-\frac{\|X\|_1}{\|X\|_\infty}}{k-1}.$$

But the average we took as an example, $c = \|X\|_\infty$, isn't the only choice. Similarly, we could approximate the number of indices in different ways: $\frac{\|X\|_m}{\|X\|_n}$ would do for any $m < n$, and the normalization is just $$\frac{\frac{\|C\|_m}{\|C\|_n}-\frac{\|X\|_m}{\|X\|_n}}{R_{\max}-R_{\min}}, \qquad R_{\max} = \frac{\|C\|_m}{\|C\|_n}, \quad R_{\min} = \frac{\|D\|_m}{\|D\|_n},$$ where $C = \langle c, c, \ldots, c \rangle$ and $D = \langle c, 0, 0, \ldots, 0 \rangle$ for any $c \neq 0$.
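As a sanity check, here is a small sketch of this normalized measure (NumPy; the names are mine), with $R_{\max}$ and $R_{\min}$ computed directly from $C$ and $D$; since $R_{\max} = k^{1/m - 1/n}$ and $R_{\min} = 1$, it coincides with Mikael's formula:

```python
import numpy as np

def norm_ratio_sparseness(x, m=1, n=np.inf):
    """Normalized (m, n) norm-ratio measure from Part 1."""
    x = np.asarray(x, dtype=float)
    k = x.size
    C = np.ones(k)                        # <c, c, ..., c> with c = 1
    D = np.zeros(k)                       # <c, 0, ..., 0> with c = 1
    D[0] = 1.0
    ratio = lambda v: np.linalg.norm(v, m) / np.linalg.norm(v, n)
    r_max, r_min = ratio(C), ratio(D)     # equal k^(1/m - 1/n) and 1, respectively
    return (r_max - ratio(x)) / (r_max - r_min)

print(norm_ratio_sparseness([2, 2, 2, 2]))             # 0.0
print(norm_ratio_sparseness([7, 0, 0, 0]))             # 1.0
print(norm_ratio_sparseness([5, 1, 0, 0], m=1, n=2))   # roughly 0.82
```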

Part 2

Still, this measure is rather weird, because intuitively it depends more on the sizes of the values than on how many different values there are. I am not sure that this is a property we would want to have. There is a different measure that can take that into account, namely one based on entropy. Interpreting the entries of $X$ as samples, one can calculate $$ -\sum_v P(X = v) \log_k P(X = v), $$ where the sum runs over the distinct values of $X$.
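A minimal sketch of this empirical-entropy quantity (NumPy; the function name is mine); with the base-$k$ logarithm it is $0$ for a single repeated value and $1$ when all $k$ values are distinct:

```python
import numpy as np
from collections import Counter

def value_entropy(x):
    """-sum_v P(X = v) log_k P(X = v) over the empirical distribution of values in x."""
    x = list(x)
    k = len(x)
    p = np.array([count / k for count in Counter(x).values()])
    return float((p * np.log(1.0 / p)).sum() / np.log(k))

print(value_entropy([3, 3, 3, 3]))   # 0.0 -- one repeated value
print(value_entropy([1, 2, 3, 4]))   # 1.0 -- all values distinct
print(value_entropy([5, 0, 0, 0]))   # roughly 0.41
```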

To soften this a bit, just pick any distribution you like (ideally one specific to your application), e.g. $F_\mu = N(\mu, \sigma^{2})$, set $$F = \frac{1}{k}\sum_i F_{X_i},$$ and then calculate the differential entropy (where $f$ is the density function of $F$) $$-\int_\mathbb{R} f(x) \ln f(x) \, dx,$$ or, even better, the relative entropy if you have some reference measure (e.g. the very same $F_\mu$, adjusted a bit, might do the trick). Of course, all this has to be scaled to $[0,1]$, which makes the formulas even nastier; however, in my opinion, it captures the notion of sparseness pretty well. Finally, you can combine the two approaches in infinitely many ways to get even more sparseness models!
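For illustration, a rough numerical sketch of this smoothed variant with Gaussian components (SciPy; the function name and the choice $\sigma = 1$ are mine, and the result is not yet rescaled to $[0,1]$):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def smoothed_entropy(x, sigma=1.0):
    """Differential entropy of the mixture F = (1/k) * sum_i N(X_i, sigma^2)."""
    x = np.asarray(x, dtype=float)

    def f(t):                                   # density of the mixture F at the point t
        return norm.pdf(t, loc=x, scale=sigma).mean()

    def integrand(t):
        d = f(t)
        return -d * np.log(d) if d > 0 else 0.0

    lo, hi = x.min() - 10 * sigma, x.max() + 10 * sigma   # tails beyond this are negligible
    value, _ = quad(integrand, lo, hi, limit=200)
    return value

print(smoothed_entropy([3, 3, 3, 3]))    # ~1.42, the entropy of a single N(3, 1)
print(smoothed_entropy([0, 5, 10, 15]))  # larger, since the values are spread out
```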