Why doesn't $ds^2 = 0$ imply that two points $p$ and $p'$ on a manifold are the same point?
Let's separate out some definitions:
metric(1): Given a set $X$, a function $d : X \times X \to \mathbb{R}$ such that the following axioms hold for all $x,y,z \in X$:
- $d(x,y) \geq 0$,
- $d(x,y) = 0 \Leftrightarrow x = y$,
- $d(x,y) = d(y,x)$, and
- $d(x,z) \leq d(x,y) + d(y,z)$.
pseudo-metric(1): Given a set $X$, a function $d : X \times X \to \mathbb{R}$ such that the following axioms hold for all $x,y,z \in X$:
- $d(x,x) = 0$,
- $d(x,y) = d(y,x)$, and
- $d(x,z) \leq d(x,y) + d(y,z)$.
metric(2): (aka "inner product") Given a vector space $V$ over a field $F$, which is either $\mathbb{R}$ or $\mathbb{C}$, a function $g : V \times V \to F$ such that the following axioms hold for all $x,y,z \in V$ and $a \in F$:
- $g(x,y) = \overline{g(y,x)}$,
- $g(ax,y) = a g(x,y)$,
- $g(x+y,z) = g(x,z) + g(y,z)$,
- $g(x,x) \geq 0$, and
- $g(x,x) = 0 \rightarrow x = 0$.
pseudo-metric(2): (aka "pseudo inner product") Given a vector space $V$ over a field $F$, which is either $\mathbb{R}$ or $\mathbb{C}$, a function $g : V \times V \to F$ such that the following axioms hold for all $x,y,z \in V$ and $a \in F$:
- $g(x,y) = \overline{g(y,x)}$,
- $g(ax,y) = a g(x,y)$,
- $g(x+y,z) = g(x,z) + g(y,z)$, and
- $x \neq 0 \rightarrow \exists\ v \in V : g(x,v) \neq 0$ (nondegeneracy).
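To make the difference between metric(2) and pseudo-metric(2) concrete, here is a small sketch that tests the Minkowski form $\eta = \mathrm{diag}(1,-1,-1,-1)$ (the $(+,-,-,-)$ convention is an assumption here) against the axioms on $\mathbb{R}^4$. It is nondegenerate, but it has a nonzero null vector and vectors of negative norm, so it fails the last two metric(2) axioms. The helper names `g` and `nondegeneracy_witness` are hypothetical.

```python
# Minkowski form in the (+, -, -, -) convention on R^4 (an assumed convention).
ETA = (1.0, -1.0, -1.0, -1.0)


def g(u, v):
    """Symmetric bilinear form g(u, v) = sum_i eta_i * u_i * v_i."""
    return sum(e * a * b for e, a, b in zip(ETA, u, v))


def nondegeneracy_witness(u):
    """For u != 0, return some v with g(u, v) != 0.

    Take the basis vector e_i for the first nonzero component u_i; then
    g(u, e_i) = eta_i * u_i, which is nonzero because eta_i is +1 or -1.
    """
    for i, a in enumerate(u):
        if a != 0:
            v = [0.0] * len(u)
            v[i] = 1.0
            return v
    raise ValueError("u = 0: g(0, v) = 0 for every v, so no witness exists")


null = (1.0, 1.0, 0.0, 0.0)       # nonzero, yet g(null, null) = 0
spacelike = (0.0, 1.0, 0.0, 0.0)  # g(spacelike, spacelike) = -1 < 0

print(g(null, null), g(spacelike, spacelike))
print(g(null, nondegeneracy_witness(null)))
```

The null vector violates "$g(x,x) = 0 \rightarrow x = 0$", the spacelike one violates "$g(x,x) \geq 0$", and yet every nonzero vector still has a witness, so $\eta$ is a pseudo-metric(2).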
Now you want to define a distance between points on a manifold. Intuitively, you are looking for a (pseudo-)metric(1) here: a distance function on a set without any extra structure. The problem is that all you are given is a (pseudo-)metric(2) on the tangent space at each point, and that can only give you magnitudes of tangent vectors at points. Intuitively, these are "infinitesimal distances." To get distances between points, you need to integrate such magnitudes along a path.
But this is the crux of the issue: which path do you choose? Even on a nice manifold like the 2-sphere (that is, one with a genuine metric(2), not just a pseudo-metric(2), on its tangent bundle), the length you get between two points depends on the path. You could fly directly from New York to London along a great circle (a geodesic), or you could stop over in Beijing.
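The path dependence is easy to quantify for the flight example. A rough sketch, assuming a perfectly spherical Earth of radius 6371 km and approximate city coordinates (both assumptions, not survey data), using the haversine formula for the great-circle distance:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a perfect sphere

# Approximate (latitude, longitude) in degrees, north and east positive.
NEW_YORK = (40.71, -74.01)
LONDON = (51.51, -0.13)
BEIJING = (39.90, 116.41)


def great_circle_km(a, b):
    """Geodesic (great-circle) distance between two points, via the haversine formula."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


direct = great_circle_km(NEW_YORK, LONDON)
via_beijing = great_circle_km(NEW_YORK, BEIJING) + great_circle_km(BEIJING, LONDON)
print(f"direct: {direct:.0f} km, via Beijing: {via_beijing:.0f} km")
```

Both routes connect the same two points, but their lengths differ by more than a factor of three, which is why a single number "the distance" needs a rule for choosing the path.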
If you have positive-definiteness working for you, you can take the infimum over all paths from one point to the other. Consider piecewise differentiable curves \begin{align} \gamma : [0,1] & \to M, \\ \lambda & \mapsto p(\lambda), \end{align} with $\gamma(0) = p_1$ and $\gamma(1) = p_2$. Then $$ d(p_1,p_2) = \inf_\gamma \int_0^1 \left(g_p \left(\frac{\mathrm{d}p}{\mathrm{d}\lambda}, \frac{\mathrm{d}p}{\mathrm{d}\lambda}\right)\right)^{1/2} \, \mathrm{d}\lambda $$ defines a distance function in the metric(1) sense, as long as $M$ is connected and $g_p$ is an honest metric(2) (inner product) at each $p$.
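The integral inside the infimum can be approximated numerically for any given curve. A minimal sketch, assuming the curve is given in a single coordinate chart and the metric as a component matrix $g_{ij}$; it evaluates the length of individual curves and does not search for the infimum. The names `curve_length`, `round_sphere`, `equator`, and `detour` are hypothetical.

```python
import math


def curve_length(metric, path, n=10_000):
    """Midpoint-rule approximation of
        L[gamma] = integral_0^1 sqrt(g_p(dp/dlam, dp/dlam)) dlam,
    where metric(p) returns the component matrix g_ij at coordinates p and
    path(lam) returns the coordinates p(lam) for lam in [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        p0, p1 = path(k * h), path((k + 1) * h)
        dp = [(b - a) / h for a, b in zip(p0, p1)]  # finite-difference velocity
        g = metric(path((k + 0.5) * h))             # metric at the midpoint
        speed_sq = sum(g[i][j] * dp[i] * dp[j]
                       for i in range(len(dp)) for j in range(len(dp)))
        total += math.sqrt(speed_sq) * h
    return total


def round_sphere(p):
    """Unit 2-sphere in (theta, phi): ds^2 = dtheta^2 + sin(theta)^2 dphi^2."""
    theta, _phi = p
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def equator(lam):
    """Great-circle arc from (pi/2, 0) to (pi/2, pi/2)."""
    return (math.pi / 2, lam * math.pi / 2)


def detour(lam):
    """Same endpoints, but bulging towards the north pole along the way."""
    return (math.pi / 2 - 0.5 * math.sin(math.pi * lam), lam * math.pi / 2)


print(curve_length(round_sphere, equator), math.pi / 2)
print(curve_length(round_sphere, detour))
```

The equatorial arc reproduces $\pi/2$, and the detour is longer, as the infimum definition requires. With a Lorentzian metric, `speed_sq` can be negative and `math.sqrt` raises `ValueError`, which is exactly where this construction starts to break down.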
Unfortunately, when you try this on a Lorentzian manifold equipped with a pseudo-metric(2), the construction fails to produce anything useful. Even if you take the absolute value of $g_p$ before the square root, any two points can be joined by a piecewise differentiable null path (in Minkowski space, for instance, by zig-zagging along light rays), and such a path has length exactly $0$. Rounding off its corners gives differentiable curves of length arbitrarily close to $0$, so the infimum is $0$ and the pseudo-metric(1) you induce is trivial: all distances are $0$.
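A small sketch of this failure in 1+1-dimensional Minkowski space (an assumption, with $c = 1$ and signature $(+,-)$): any two events are joined by two null segments, and a straight timelike segment can be shadowed arbitrarily closely by a null zig-zag of length $0$. Exact `Fraction` arithmetic keeps the null legs exactly null. The function names are hypothetical.

```python
import math
from fractions import Fraction


def interval(u):
    """g(u, u) = t^2 - x^2 for a displacement u = (t, x) in 1+1 Minkowski space, c = 1."""
    t, x = u
    return t * t - x * x


def abs_length(points):
    """Length of a piecewise-linear path, using sqrt(|g(u, u)|) on each straight segment."""
    return sum(
        math.sqrt(abs(interval((t1 - t0, x1 - x0))))
        for (t0, x0), (t1, x1) in zip(points, points[1:])
    )


def null_corner(p, q):
    """Corner r such that p -> r -> q is made of two null segments.

    Move from p along (1, 1) by a = (T + X) / 2, where (T, X) = q - p; the
    remainder q - r = ((T - X) / 2, (X - T) / 2) lies along (1, -1), so it is
    null too. A negative a just means that leg runs backwards in time.
    """
    a = (Fraction(q[0] - p[0]) + Fraction(q[1] - p[1])) / 2
    return (p[0] + a, p[1] + a)


def sawtooth(n):
    """Null zig-zag from (0, 0) to (1, 0) that stays within 1/(2n) of the t-axis."""
    step = Fraction(1, 2 * n)
    return [(k * step, step if k % 2 else Fraction(0)) for k in range(2 * n + 1)]


p, q = (0, 0), (1, 3)  # spacelike-separated: |X| > |T|
r = null_corner(p, q)
print(abs_length([p, r, q]))         # 0.0: two null legs
print(abs_length([(0, 0), (1, 0)]))  # 1.0: the straight timelike path
print(abs_length(sawtooth(1000)))    # 0.0, however fine the zig-zag
```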
I think it might help to think about the spacetime interval $\text{d}s^2$ as a measure of movement in spacetime relative to the speed of light. Say you want to move from the point $p=(0,0,0,0)$ to another point $p'=(t,x,0,0)$ with $t > 0$. For this finite separation the interval $\text{d}s^2 = c^2\text{d}t^2-\text{d}x^2$ becomes $c^2t^2 - x^2$, which is:
- Positive if $|x|<ct$, which means that you covered the distance slower than the speed of light;
- Zero if $|x| = ct$, which means that you covered it exactly at the speed of light;
- Negative if $|x|>ct$, which means that you would have to cover the distance faster than the speed of light.
So with the metric convention that you use, special relativity dictates that a massive particle can only travel between events separated by a positive interval, and a massless particle only between events separated by a zero interval: the interval measures separations relative to how a photon would move between the points.
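The three cases can be written as a small classifier. A minimal sketch, assuming motion along $x$ only and comparing $|x|$ with $c|t|$ up to floating-point tolerance (so "null" means equal up to rounding); `classify` is a hypothetical name.

```python
import math


def classify(dt, dx, c=1.0):
    """Sign of the interval c^2 dt^2 - dx^2 between (0, 0, 0, 0) and (dt, dx, 0, 0).

    Uses the (+, -, -, -) convention, so timelike separations come out positive.
    c defaults to 1 (natural units); pass c=299_792_458.0 for SI units.
    """
    light_distance = c * abs(dt)  # how far a photon gets in time dt
    if math.isclose(abs(dx), light_distance):
        return "null"       # ds^2 = 0: reachable only at exactly the speed of light
    if abs(dx) < light_distance:
        return "timelike"   # ds^2 > 0: reachable slower than light (massive particles)
    return "spacelike"      # ds^2 < 0: would need faster-than-light travel


print(classify(1.0, 0.5), classify(1.0, 1.0), classify(1.0, 2.0))
```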
Yeah, you've not yet adapted. That's OK. Let me take you through it.
In the conventional world of classical physics we have separate notions of distance and time: either two events happen at the same time, and then there is an objective distance between them, or they happen at different times, and then there is an objective time between them. It is always one or the other, never both and never neither. If there is a time gap between two events, then for any distance $L$ you like there is some reference frame in which they are a distance $L$ apart; if the time gap is zero, then everyone agrees on the distance between the two events. This is what allows you to freeze a moment in time and speak of distances.
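The classical claim can be checked with a Galilean boost $t' = t$, $x' = x - vt$. A minimal sketch in one spatial dimension (an assumption), with hypothetical helper names and illustrative numbers in seconds and metres:

```python
def galilean_boost(dt, dx, v):
    """Separation (dt', dx') seen from a frame moving at velocity v: t' = t, x' = x - v t."""
    return dt, dx - v * dt


def velocity_for_separation(dt, dx, L):
    """Frame velocity that makes the spatial separation equal to L (needs dt != 0)."""
    if dt == 0:
        raise ValueError("simultaneous events: every frame sees the same separation")
    return (dx - L) / dt


# Events 2 s apart in time and 10 m apart in space: pick a frame that sees them 3 m apart.
v = velocity_for_separation(2.0, 10.0, 3.0)
print(v, galilean_boost(2.0, 10.0, v))

# Simultaneous events: the separation is the same in every frame.
print([galilean_boost(0.0, 5.0, u)[1] for u in (-3.0, 0.0, 12.5)])
```

The time gap $\Delta t$ never changes, which is what makes "at the same time" an objective notion in classical physics.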
In relativity, we make things a little more complicated, but also more realistic. It's almost the same story, but not quite. Think about, say, a supernova explosion: what we see as a bright flash in the starry sky would, if you were to "look down upon it", look like an event in spacetime with a shell of light announcing it, expanding outward at the speed of light. That "expanding bubble" is important. In relativity we call that bubble a "light cone".
Imagine the two expanding-at-speed-$c$ bubbles from two events. Topologically, either one bubble is inside the other, or the two intersect on a ring (once they are large enough to intersect at all), or they touch at just one point where they "kiss". Those are your three possibilities: time-separated, space-separated, and null-separated. They correspond to a positive, negative, and zero interval $\text{d}s^2$. Lorentz transformations preserve this structure because they preserve the interval: light cones get mapped to light cones with the same relationship to each other.
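The invariance claim is easy to check numerically. A minimal sketch, assuming a boost along $x$ in 1+1 dimensions with $c = 1$ and the $(+,-)$ convention; `boost` and `interval` are hypothetical helper names:

```python
import math


def boost(dt, dx, v):
    """Lorentz boost along x with c = 1 and |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt)


def interval(dt, dx):
    """Delta s^2 = dt^2 - dx^2 in the (+, -) convention, c = 1."""
    return dt * dt - dx * dx


separations = {
    "time-separated": (2.0, 1.0),
    "space-separated": (1.0, 2.0),
    "null-separated": (1.0, 1.0),
}
velocities = (-0.9, -0.3, 0.0, 0.5, 0.99)
for name, (dt, dx) in separations.items():
    values = [interval(*boost(dt, dx, v)) for v in velocities]
    print(name, [round(s, 9) for s in values])  # same value in every frame
```

Because the sign of the interval never changes, a boost cannot turn one of the three cases into another.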
Given time-separated events, you still have that they are not objectively space-separated: there is a reference frame, that of a spaceship passing through both events, in which they happen at the same place. Given space-separated events, they are likewise not objectively time-separated. That's a little more subtle, but imagine someone on the expanding "ring" where the bubbles intersect: the light from both events reaches them at the same moment. The claim is that, with the right velocity, they would reconstruct both events as being the same distance $L$ away, and hence conclude that the two events were simultaneous. So there is no objective time ordering.
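Both frames can be found explicitly. A minimal sketch in 1+1 dimensions with $c = 1$ (assumptions); it does not model the observer on the ring, it only shows that a frame with $\Delta t' = 0$ exists for a spacelike pair and a frame with $\Delta x' = 0$ for a timelike pair. The helper names are hypothetical.

```python
import math


def boost(dt, dx, v):
    """Lorentz boost along x with c = 1 and |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt)


def comoving_velocity(dt, dx):
    """Timelike pair (|dx| < |dt|): the spaceship frame v = dx/dt sees dx' = 0."""
    return dx / dt


def simultaneity_velocity(dt, dx):
    """Spacelike pair (|dt| < |dx|): the frame v = dt/dx sees dt' = 0."""
    return dt / dx


print(boost(2.0, 1.0, comoving_velocity(2.0, 1.0)))      # (approx. sqrt(3), 0.0)
print(boost(1.0, 2.0, simultaneity_velocity(1.0, 2.0)))  # (0.0, approx. sqrt(3))
print(boost(1.0, 2.0, 0.9)[0] < 0)                       # True: time order reversed
```

For the spacelike pair, boosting past $v = \Delta t/\Delta x$ flips the sign of $\Delta t'$, which is the "no objective time ordering" statement in numbers.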
Null-separated is the new beast that you'll have to accept, lying between those two. It is objectively "time-separated" in the sense that one light cone is "inside" the other (touching it at one point), so you can say that one event comes "before" the other. However, as you go faster and faster, trying to be at both events, the time you see elapse between them shrinks, and in the limit no time elapses at all. Similarly it is objectively "space-separated", but there are reference frames in which the distance between the two events is arbitrarily small.
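The shrinking can be seen directly. A minimal sketch, assuming a boost along the direction of separation with $c = 1$ and a null pair with $\Delta t = \Delta x = 1$; `boost` is a hypothetical helper:

```python
import math


def boost(dt, dx, v):
    """Lorentz boost along x with c = 1 and |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt)


# Null-separated events: dt = dx = 1 (a light ray's worth of separation).
velocities = (0.5, 0.9, 0.99, 0.999, 0.9999)
seps = [boost(1.0, 1.0, v) for v in velocities]
for v, (dt_p, dx_p) in zip(velocities, seps):
    # Both shrink like sqrt((1 - v) / (1 + v)), the relativistic Doppler factor.
    print(f"v = {v}: dt' = {dt_p:.5f}, dx' = {dx_p:.5f}")
```

No frame reaches $\Delta t' = \Delta x' = 0$, since that would need $v = 1$, but both can be made as small as you like.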
Those are the values of the interval and what they mean. "Distance" is now measured relative to this new notion of "basically at the same place, basically at the same time" (null separation): the interval is negative for proper distances (spacelike separations) and positive for proper times (timelike separations).
You can also view this, if you like, as broadening the notion of "present" from a plane in classical physics to the whole region between two light cones. One light cone contains all the events in the "past" which "have been seen" by this spacetime point (light from them has had an opportunity to reach the point); the other light cone contains all the events in the "future" which "have seen" this spacetime point; the spacelike-separated events all lie in a "relativistic present" relative to this point. Different Lorentz transformations pick out different planes through the point as "present planes", but that choice is more arbitrary.
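A minimal classifier for this picture, assuming $c = 1$ and counting events on the light cone itself as past or future; `causal_region` and `boosted_time` are hypothetical names. The loop checks that boosts along $x$ never change the sign of $t'$ for a future or past event, but do for a "present" one:

```python
import math


def causal_region(t, x, y=0.0, z=0.0):
    """Where event (t, x, y, z) lies relative to the origin, with c = 1."""
    r = math.sqrt(x * x + y * y + z * z)
    if t == 0 and r == 0:
        return "here and now"
    if t >= r:
        return "future"   # on or inside the future light cone
    if t <= -r:
        return "past"     # on or inside the past light cone
    return "present"      # spacelike: the "relativistic present"


def boosted_time(t, x, v):
    """Time coordinate after a boost along x with c = 1 and |v| < 1."""
    return (t - v * x) / math.sqrt(1.0 - v * v)


velocities = [k / 10 for k in range(-9, 10)]
for event in [(2.0, 1.0), (-2.0, 1.0), (0.5, 1.0)]:
    signs = {math.copysign(1.0, boosted_time(*event, v)) for v in velocities}
    print(causal_region(*event), sorted(signs))
```

Each boost velocity corresponds to a different "present plane" $t' = 0$; only events in the relativistic present can land on either side of it.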