Estimating direction from a distribution on a circle
This seems too simple to be true, but, combining some of the ideas posted earlier, I think you could just interpret the vectors as complex numbers and take the RMS.
Squaring maps the two modes, which are $\pi$ apart, onto one another, turning the bimodal distribution into a unimodal one. The two square roots of the mean should then give a good estimate of the two modes of the original distribution.
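If it helps, here is a minimal NumPy sketch of that idea on synthetic data (the true axis, noise level, and sample size below are made up purely for illustration):

```python
import numpy as np

# Hypothetical sample of angles whose directions may be reversed by pi
# (true axis and noise level chosen only for illustration).
rng = np.random.default_rng(0)
true_axis = 1.0
theta = true_axis + 0.1 * rng.standard_normal(1000)   # noisy directions
theta[rng.random(1000) < 0.5] += np.pi                 # randomly reverse about half of them

z = np.exp(1j * theta)              # data points as unit complex numbers
mean_of_squares = np.mean(z ** 2)   # squaring merges the two antipodal modes
root = np.sqrt(mean_of_squares)     # principal square root
print(np.angle(root), np.angle(-root))   # the two estimated modes, pi apart
```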
The standard way to solve this is to just consider each of your data points as unit vectors, then take the average of those unit vectors. The direction of this averaged vector is the estimated direction.
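For reference, a minimal sketch of that averaging (the function name is mine):

```python
import numpy as np

def circular_mean(angles):
    """Direction of the sum of the unit vectors (cos a, sin a): the circular mean."""
    return np.angle(np.sum(np.exp(1j * np.asarray(angles))))

# Correctly averages across the 0 / 2*pi wrap-around:
print(circular_mean([0.1, -0.2, 6.2]))   # close to 0, unlike the arithmetic mean
```

For the bimodal case discussed here, the same averaging is applied to the doubled angles, as described in the other answers.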
There is a large literature on this topic, which generally goes by the name of directional statistics. The seminal text is Mardia and Jupp's book Directional Statistics. This field has a huge number of applications in astronomy, biology, meteorology, engineering, etc.
Ok, so now I will describe why Niels's estimator works so well. Take a bimodal and symmetric circular density function $f$ with modes $p$ and $p - \pi$ (we will assume that $p \in [0, \pi)$), such as the one plotted in my previous answer. Let $\Theta_1, \Theta_2, \dots, \Theta_N$ be $N$ observations drawn from $f$.
Niels's estimator first computes the complex numbers $e^{i 2 \Theta_n}$ and takes their average $$ \bar{C} = \frac{1}{N}\sum_{n=1}^{N} e^{i 2 \Theta_n} .$$ The estimate, denoted $\hat{p}$, is given by taking the complex argument of $\bar{C}$ and dividing by 2, that is $$ \hat{p} = \frac{\angle{\bar{C}}}{2}$$ where $\angle{\bar{C}} \in [0,2\pi)$ denotes the complex argument. The next theorem describes the asymptotic properties of this estimator. I use the notation $\langle x \rangle_{\pi}$ to denote $x$ taken to its representative inside $[-\pi, \pi)$. So, for example, $\langle 2\pi \rangle_{\pi} = 0$ and $\langle \pi + 0.1 \rangle_{\pi} = -\pi + 0.1$.
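A NumPy sketch of this estimator might look like the following (the function name is mine):

```python
import numpy as np

def estimate_axis(theta):
    """Niels's estimator: double the angles, average, halve the argument.

    Returns an estimate in [0, pi), matching the convention that the
    complex argument is taken in [0, 2*pi).
    """
    theta = np.asarray(theta)
    C_bar = np.mean(np.exp(2j * theta))      # average of e^{i 2 Theta_n}
    arg = np.angle(C_bar) % (2 * np.pi)      # np.angle returns (-pi, pi]; map to [0, 2*pi)
    return arg / 2.0
```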
Theorem: Let $\lambda$ denote the difference $\lambda = \tfrac{1}{2}\langle 2\hat{p} - 2p \rangle_{\pi}.$ Then $\lambda$ converges almost surely to zero as $N \rightarrow \infty$ and the distribution of the normalised difference $\sqrt{N}\lambda$ converges to the zero mean normal with variance $$ \frac{\sigma_s^2}{c} $$ where $$ \sigma_s^2 = \int_{-\pi/2}^{\pi/2}\sin^2(\theta) f(\langle \theta + p \rangle_\pi) d\theta \qquad \text{and} \qquad c = \int_{-\pi/2}^{\pi/2}\cos(\theta) f(\langle \theta + p \rangle_\pi) d\theta. $$
The definition of the difference $\lambda$ might seem a little strange at first, but it is actually very natural. To see why, note that $p$ and the estimate $\hat{p}$ are both in $[0,\pi)$, but, for example, if $p = 0$ and $\hat{p} = \pi - 0.01$ then the difference between them is not $\pi - 0.01$, because the two axes are very nearly aligned in this case. The correct difference is $\lambda = \tfrac{1}{2}\langle 2(\pi-0.01) - 2 \times 0 \rangle_{\pi} = -0.01$, a misalignment of only $0.01$.
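The wrapped difference is easy to get wrong in code, so here is a small sketch of how $\lambda$ might be computed numerically (again, the helper names are mine):

```python
import numpy as np

def wrap_to_pi(x):
    """Map x to its representative in [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def axial_difference(p_hat, p):
    """lambda = 0.5 * <2*p_hat - 2*p>_pi, the error between two axes."""
    return 0.5 * wrap_to_pi(2 * (p_hat - p))

print(axial_difference(np.pi - 0.01, 0.0))   # -0.01, not pi - 0.01
```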
The proof of this theorem follows from a very similar argument to Theorem 6.1 (page 87) from my thesis. The original argument is due to Barry Quinn. Rather than restate the proof I'll just give you some convincing numerical evidence.
I've run some simulations for the case when the noise is a mixture of two von Mises circular distributions centred $\pi$ apart, each with concentration parameter $\kappa$. So, when $\kappa$ is large the distribution is concentrated and looks something like the picture on the left below ($\kappa = 20$ in this case), and when $\kappa$ is small the distribution is quite spread out and looks something like the picture on the right below ($\kappa = 0.5$). We obviously expect the estimator to perform better when the distribution is concentrated ($\kappa$ is large).
[Figure: the bimodal mixture density for $\kappa = 20$ (left) and $\kappa = 0.5$ (right)]
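For what it's worth, here is a sketch of how such noise could be generated. I assume an equal-weight mixture with modes $\pi$ apart, to match the symmetric bimodal densities described above; the helper name is mine:

```python
import numpy as np

def sample_axial_von_mises(p, kappa, N, rng=None):
    """Draw N angles from an equal-weight mixture of von Mises distributions
    centred at p and p + pi, both with concentration kappa (an assumption,
    chosen to match the symmetric bimodal setup above)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = p + rng.vonmises(0.0, kappa, size=N)   # von Mises noise around the axis
    flip = rng.random(N) < 0.5                     # pick one of the two antipodal modes
    theta[flip] += np.pi
    return (theta + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)

samples = sample_axial_von_mises(p=1.0, kappa=20.0, N=1000)
```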
Here are the results. The plot below shows the simulated variance of $\lambda$ over 5000 trials (the dots) against the variance predicted by the theorem above, for a range of values of $\kappa$ and numbers of observations $N$. You can see that the theorem does a very good job of predicting the performance as long as $\kappa$ isn't too small.
[Figure: simulated variance of $\lambda$ (dots) versus the variance predicted by the theorem, for a range of $\kappa$ and $N$]
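In case it is useful, here is a rough sketch of how the simulated variances (the dots) could be produced. It reuses the helper functions sketched earlier in this thread, and the trial count and parameter values below are just examples:

```python
import numpy as np

def simulated_variance(p, kappa, N, trials=5000, seed=0):
    """Monte Carlo estimate of var(lambda) for Niels's estimator."""
    rng = np.random.default_rng(seed)
    lams = np.empty(trials)
    for t in range(trials):
        theta = sample_axial_von_mises(p, kappa, N, rng)   # sketched above
        p_hat = estimate_axis(theta)                       # sketched above
        lams[t] = axial_difference(p_hat, p)               # sketched above
    return np.var(lams)

for kappa in (0.5, 2.0, 20.0):
    print(kappa, simulated_variance(p=1.0, kappa=kappa, N=100))
```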
There is still an open question as to whether this is the best estimator (in the sense of minimising the variance of $\lambda$). One could derive a Cramér-Rao bound for this estimation problem to get an idea of the best possible performance of an unbiased estimator. I suspect that this estimator performs very close to the Cramér-Rao bound, so in that sense it is near the best possible.