Expected edit distance

The only rigorous bound I am aware of is due to Gonzalo Navarro*

$$c\geq 1-{\rm e}/\sqrt{\sigma},$$

for an alphabet of $\sigma$ characters. Obviously, for the binary string ($\sigma=2$) this bound is ineffective. Navarro also mentions a large-$\sigma$ conjecture $c=1-1/\sqrt{\sigma}$, which for the binary string would give $c=0.2929$, quite close to your numerical finding.

*G. Navarro, A guided tour to approximate string matching (2001)


I am not sure there is any proven lower bound so far however.

This should really be a comment but it is too long.

A lot depends on how good lower bound you want. The trivial one can be obtained by just saying that if we have $k$ deletions, $k$ insertions, and $m$ substitutions, we can get just ${n\choose k}^2{n-k\choose m}$ new sequences from a given one. Now, if that is much less than $2^n$ for every $k,m$ with $2k+m < cn$, then the limit is at least $c$. Using the simple approximation $m!\approx m^me^{-m}$, we see that all we need is to ensure that $$ 2\mu\log\mu+(1-\mu)\log(1-\mu)+\nu\log\nu+(1-\mu-\nu)\log(1-\mu-\nu)>-\log 2 $$ whenever $2\mu+\nu\le c$, which (after some computations) yields $c>0.18$. Of course, this is very crude because it doesn't take into account the fact that the number of ways to convert one sequence into another within the edit distance is typically exponential in $n$ too.


I did some simulations myself, (5000 of each string length 1-1000) and thought I would share them here. They are plotted against Carlo's conjectured bound of $1-1/\sqrt{2}$ (in blue).

To me, the conjecture seems quite likely, since the smallest average distance I get is 0.296 at $n=1000$. enter image description here