What is the mistake in the proof of the Homotopy hypothesis by Kapranov and Voevodsky?
The MathSciNet review (by Julie Bergner) of Simpson's book: Homotopy theory of higher categories, New Mathematical Monographs, 19. Cambridge University Press, Cambridge, 2012, has a note about the counterexample.
REVIEWER'S ADDENDUM (October, 2015): While not explicitly stated as such, this book contains a counterexample of a result of M. M. Kapranov and V. Voevodsky (stated in Uspekhi Mat. Nauk 45 (1990), no. 5(275), 183–184; MR1084995 and presented in Cahiers Topologie Géom. Différentielle Catég. 32 (1991), no. 1, 29–46; MR1130401) that any n-type can be obtained as the realization of a strict n-groupoid. R. Brown and P. J. Higgins proved in Cahiers Topologie Géom. Différentielle 22 (1981), no. 4, 371–386; MR0639048 that the Whitehead products vanish for the realization of a strict n-groupoid, and C. Berger proved in Higher homotopy structures in topology and mathematical physics (Poughkeepsie, NY, 1996), 49–66, Contemp. Math., 227, Amer. Math. Soc., Providence, RI, 1999; MR1665460 that this result holds even if inverses are taken to be weak; the result was also mentioned by Grothendieck in his letter Pursuing Stacks. Simpson's argument in this book shows that, even under a slightly more general realization functor, these results imply that one cannot obtain the 3-type of $S^2$ as the realization of any strict 3-groupoid, contradicting the claim of Kapranov and Voevodsky.
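For context, here is a sketch of why the vanishing of Whitehead products rules out $S^2$, using only standard facts about the homotopy groups of spheres (the notation $\iota$, $\eta$ is illustrative and does not come from the review). Let $\iota$ be a generator of $\pi_2(S^2) \cong \mathbb{Z}$ and let $\eta$ be the class of the Hopf map, which generates $\pi_3(S^2) \cong \mathbb{Z}$. Then the Whitehead square of $\iota$ is

$$[\iota, \iota] = \pm 2\eta \neq 0 \quad \text{in } \pi_3(S^2).$$

The 3-type of $S^2$ retains $\pi_2$, $\pi_3$ and the Whitehead product $\pi_2 \times \pi_2 \to \pi_3$, so a space in which all Whitehead products vanish cannot have this 3-type.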
Here is my guess. To compare spaces with their notion of strict $\infty$-groupoids (in which everything is strict except inverses), Kapranov and Voevodsky use an intermediate category of Kan diagrammatic sets, which they claim to be equivalent to both spaces and strict $\infty$-groupoids (after inverting a suitable collection of weak equivalences). Whatever Kan diagrammatic sets are, they seem to be a non-strict model, so let's assume that they do form a model for spaces. In this case the mistake must be in the comparison of Kan diagrammatic sets and strict $\infty$-groupoids (Theorem 3.7). This theorem relies on Proposition 3.5, which compares the homotopy groups of a Kan diagrammatic set $X$ with the homotopy groups of the strict $\infty$-groupoid $\Pi(X)$ generated from $X$. This comparison, in turn, is based on Lemma 3.4, which says that any morphism in $\Pi(X)$ can be realized via a single pasting diagram in $X$; pasting diagrams are in some sense the cells of $X$ (since $X$ is a presheaf on pasting diagrams). But this claim doesn't seem to be true. The reason is that when one generates the $\infty$-groupoid $\Pi(X)$, one doesn't only freely add morphisms, but also identifies pairs of morphisms which are supposed to be the same in a strict $\infty$-category structure. This means, for example, that if two different pasting diagrams coincide after this identification, then the identity morphism between them might not be a pasting diagram in $X$ (or at least, one would have to explicitly argue why this would be the case). The proof of Lemma 3.4 seems to be vague enough to allow this subtlety to slip through. All of this could be wrong of course, but if I had to pick one possibly problematic lemma, it would be this Lemma 3.4.
It's been more than a year and a half since I asked this question, and I have given it a lot of thought, so I have decided to post my own answer.
First, I entirely agree with Yonatan that the main problem is with Lemma 3.4. The specific problem that he mentions appears exactly because of the "degeneracy maps" that Voevodsky and Kapranov add to their category of diagrams. More precisely, using the degeneracies one can construct a diagrammatic set whose realization will "collapse" because of the Eckmann-Hilton argument. Quite interestingly, if one modifies their definition so that the category has no degeneracies, then this no longer happens (the free $\infty$-category is then obtained just by gradually "freely adding arrows", which is how they assume it behaves in the paper). So if one thinks of degeneracies as corresponding exactly to units, this is very encouraging for the Simpson conjecture. I haven't been able to turn this into a clear counterexample to the lemma, but only because the lemma actually has other problems that appear before that.
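For reference, the Eckmann-Hilton argument invoked above can be sketched as follows. This is the standard argument in a strict 2-category, not a computation taken from the paper, and the notation is illustrative: $x$ is an object, $1$ denotes the identity 2-cell of the identity 1-cell $1_x$, $\circ$ is vertical composition, $*$ is horizontal composition, and $a, b \colon 1_x \Rightarrow 1_x$ are arbitrary 2-cells. Using strict units for both compositions and the interchange law $(a \circ b) * (c \circ d) = (a * c) \circ (b * d)$:

$$
\begin{aligned}
a * b &= (a \circ 1) * (1 \circ b) = (a * 1) \circ (1 * b) = a \circ b, \\
a * b &= (1 \circ a) * (b \circ 1) = (1 * b) \circ (a * 1) = b \circ a.
\end{aligned}
$$

Hence $a \circ b = a * b = b \circ a$: the two compositions coincide and are commutative. Every step uses a strict unit, which is why removing degeneracies (units) can be expected to avoid this kind of collapse.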
In the end, I believe the main obstruction to their proof is the following: the initial idea to use "generalized Moore homotopy" parametrized by some class of diagrams seems (at least intuitively) to need the following two properties of the class of diagrams:
1) One should be able to formally "compose" the diagrams, and this composition should correspond to a pushout at the level of geometric realizations, so that when you look at all the continuous functions $|D| \rightarrow X$ for all diagrams $D$ you indeed get an $\infty$-category.
2) Given two "parallel" $n$-diagrams, one should be able to construct an $(n+1)$-diagram whose source and target are the two given $n$-diagrams, so that if two diagram shapes are used to represent homotopically equivalent $n$-arrows, then there is actually an $(n+1)$-arrow that represents this homotopy.
It appears that both these properties fail for the kind of diagrams (Johnson diagrams) they are using! Unfortunately, because they actually use a slightly different construction from the one they explain in the introduction, these failures do not immediately translate into mistakes in their paper.
That being said, they do seem to use the fact that Johnson diagrams can be composed within the proof of Lemma 3.4 mentioned above, so this is probably a second reason why this lemma fails.
It is not clear to me whether (and where) they use the second property, but I expect that some property of this kind is important in order to prove that the geometric realization of diagrammatic sets indeed induces an equivalence with the category of spaces. They are extremely imprecise about how this equivalence is obtained: they just say that "one does exactly as for simplicial and cubical sets"!
For more details (and a third reason why Lemma 3.4 fails), I have a very recent preprint (https://arxiv.org/abs/1711.00744) which constructs a category of diagrams that has the two properties mentioned above as soon as you work in a 'non-unital' framework. Unfortunately, this category of diagrams is a lot more complicated than the category of Johnson diagrams (and it is unique, so this complication is unavoidable), and this prevents one from using exactly the same strategy as they do. In the appendix of the paper I discuss the proof of Kapranov and Voevodsky in detail (this expands a lot on this answer) and explain some ideas on how to turn it into a proof of the Simpson conjecture using the category of diagrams that I constructed. This new version also has the advantage of making the two ways of explaining the construction (in terms of generalized Moore homotopies, and using two adjunctions with a presheaf category of diagrams in the middle) actually equivalent.
Update: In fact, in a subsequent preprint, I did prove a version of the Simpson conjecture using essentially the strategy of Kapranov and Voevodsky with a modified category of diagrams.
Note that some difficulties still appear (due to the increased complexity of the category of diagrams), and at the moment I am still not able to prove the most general version of the Simpson conjecture. To be precise, at this point I am only able to strictify a certain set of composition operations, which I call the "regular composition operations" (informally, they are those whose pasting diagram is "topologically regular"). These are such that any kind of composition operation that you have in an $\infty$-category can be obtained as a regular composition of identity and non-identity arrows. So it does give a notion where you have a collection of operations that are strictly compatible, with only weak identities on top of that, but one can still hope to find a stronger statement with more strict operations.