How do I develop an intuitive model of spacetime?
Here's a little story comparing visualizing distances with visualizing spacetime intervals.
In plane Euclidean geometry, the shape that's invariant with respect to rotations is a circle. If you rotate a circle any amount, no one can tell. This is in contrast to other curves, which have at most discrete symmetries with respect to rotations.
If we have a Cartesian coordinate system, the equation for a circle of radius $R$ centered at the origin is
$$x^2 + y^2 = R^2$$
The quantity $x^2 + y^2$ uniquely picks out which circle you're on, and everywhere on a given circle has the same value for that quantity.
If you draw a circle on a piece of graph paper, then put your finger down somewhere on the circle, then rotate the paper around the center of the circle while leaving your finger in the same spot, by the end of the rotation your finger is still on the same circle.
Thus, although the x-coordinate and y-coordinate under your finger change, the value of $x^2 + y^2$ does not. $x^2 + y^2$ is the square of what we call "distance" (here, distance from the origin), and it is invariant with respect to rotations.
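To make the rotation claim concrete, here is a minimal numerical sketch. The helper names `rotate` and `squared_distance` and the sample point are my own choices for illustration, not anything from a particular library.

```python
import math

def rotate(x, y, angle):
    """Rotate the point (x, y) about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y

def squared_distance(x, y):
    """The quantity x^2 + y^2 that labels which circle a point is on."""
    return x * x + y * y

point = (3.0, 4.0)  # arbitrary sample point, chosen so x^2 + y^2 = 25
for angle in (0.3, 1.0, 2.5):
    xr, yr = rotate(*point, angle)
    # The coordinates change with the angle; x^2 + y^2 stays at 25 up to rounding.
    print(f"angle={angle}: x={xr:.3f}, y={yr:.3f}, x^2+y^2={squared_distance(xr, yr):.6f}")
```

The individual coordinates printed on each line differ, but the last column does not, which is the "finger stays on the same circle" statement in numbers.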
In 1+1 dimensional Minkowski spacetime, the shape that's invariant with respect to Lorentz boosts is a right hyperbola. If you boost a right hyperbola by any amount, no one can tell. This is in contrast to other curves, which have at most discrete symmetries with respect to boosts.
This Wikipedia image illustrates the action of boosts, which are less intuitive than rotations. The diagonal lines shown are essentially the degenerate hyperbola $x^2 - t^2 = 0$, i.e. the light cone.
If you watch a single point, it drifts downward when the observer is not accelerating because time is moving forward. When the observer accelerates, the point will quickly move along a hyperbola, then start drifting downwards again.
If we introduce a coordinate system, the general equation for a right hyperbola is
$$x^2 - t^2 = s^2$$
where $s^2$ can be positive or negative. The quantity $x^2 - t^2$ uniquely picks out which hyperbola you're on, and everywhere on a given hyperbola has the same value for that quantity.
If you drew a hyperbola on a piece of graph paper, put your finger down somewhere on the hyperbola, and then somehow made the paper undergo a hyperbolic rotation (i.e. a Lorentz boost), by the end of the hyperbolic rotation your finger would still be on the same hyperbola.
The hyperbolic rotation can't be done with a normal sheet of paper, but it looks like this:
Thus, although the x-coordinate and t-coordinate under your finger change, the value of $x^2 - t^2$ does not. $x^2 - t^2$ is given the name "spacetime interval" (more precisely, the squared interval, in units where $c = 1$), and it is invariant with respect to Lorentz boosts.
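The same check can be run for a boost. This is only a sketch under a few assumptions: units with $c = 1$, the boost parametrized by rapidity (so the velocity is $\tanh$ of it), and the hypothetical helper names `boost` and `interval`.

```python
import math

def boost(t, x, rapidity):
    """Lorentz boost (hyperbolic rotation) of the event (t, x), with c = 1.

    The boost velocity is v = tanh(rapidity).
    """
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return ch * t - sh * x, ch * x - sh * t

def interval(t, x):
    """The quantity x^2 - t^2 that labels which hyperbola an event is on."""
    return x * x - t * t

event = (1.0, 2.0)  # arbitrary sample event (t, x), with x^2 - t^2 = 3
for rapidity in (0.2, 1.0, 2.0):
    tb, xb = boost(*event, rapidity)
    # t and x both change; x^2 - t^2 stays at 3 up to rounding.
    print(f"rapidity={rapidity}: t={tb:.3f}, x={xb:.3f}, x^2-t^2={interval(tb, xb):.6f}")
```

Replacing `cosh` and `sinh` with `cos` and `sin` (and fixing one sign) turns this back into the ordinary rotation, which is why a boost is called a hyperbolic rotation.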
This story begins by positing a transformation, then looking at what sort of shape is invariant under that transformation. Physically, I think it's a little more insightful to begin with the invariant - distance or spacetime interval - and then ask what transformations leave it invariant.
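Starting from the invariant can also be made concrete. For a linear map $\Lambda$ acting on $(t, x)$, preserving $x^2 - t^2$ is equivalent to $\Lambda^T \eta \Lambda = \eta$ with $\eta = \mathrm{diag}(-1, 1)$. The sketch below tests that condition for a boost and for an ordinary rotation; the function names and the tolerance are assumptions made for illustration.

```python
import math

# Metric in (t, x) ordering, so that the invariant is x^2 - t^2.
ETA = ((-1.0, 0.0), (0.0, 1.0))

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transpose(a):
    return tuple(zip(*a))

def preserves_interval(m, tol=1e-12):
    """True if m^T * ETA * m equals ETA to within tol."""
    lhs = matmul(matmul(transpose(m), ETA), m)
    return all(abs(lhs[i][j] - ETA[i][j]) <= tol for i in range(2) for j in range(2))

phi = 0.7  # arbitrary parameter value
boost_matrix = ((math.cosh(phi), -math.sinh(phi)), (-math.sinh(phi), math.cosh(phi)))
rotation_matrix = ((math.cos(phi), -math.sin(phi)), (math.sin(phi), math.cos(phi)))

print(preserves_interval(boost_matrix))     # True
print(preserves_interval(rotation_matrix))  # False
```

An ordinary rotation mixing $t$ and $x$ fails the test, so it is the invariant that singles out the boosts rather than the other way around.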
This is historically how the Lorentz transformations were discovered. They are linear transformations that leave Maxwell's equations invariant. Lorentz found them by searching for such transformations before Einstein published his first paper on relativity.
You can't describe the world without describing it. An inertial frame is a language. You can describe a physical system or process by using any language (inertial frame) you like, but you cannot describe it by using no language (inertial frame) at all. To be able to visualize a 4-vector, which is invariant under Lorentz transformations (translations from one language or frame to another), you would have to visualize at once all its frame-dependent decompositions into a spatial and a temporal component. The animation provided by Mark does this to some extent.
...do you ever get a proper intuitive model for Minkowski spacetime...?
Allow me to deepen your mystification. For any two events A, B there exist two reference frames FA and FB and a third event C such that C is simultaneous with A in FA and simultaneous with B in FB. This "simultaneity by proxy" of A with B compels us to conceive of all parts of the spatiotemporal whole as coexistent and as equally real.
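A small numerical sketch of this construction, restricted to 1+1 dimensions with $c = 1$. The particular way of choosing C here (midway in time and far enough to the right to be spacelike separated from both A and B) is just one convenient option of mine, and all function names are hypothetical.

```python
import math

def boosted_time(t, x, v):
    """Time coordinate of the event (t, x) in a frame moving at velocity v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x)

def proxy_event(a, b):
    """Pick an event C that is spacelike separated from both a and b."""
    (ta, xa), (tb, xb) = a, b
    tc = 0.5 * (ta + tb)
    # Far enough to the right that |dt| < |dx| relative to both a and b.
    xc = max(xa, xb) + abs(ta - tb) + 1.0
    return tc, xc

def simultaneity_velocity(e, c):
    """Velocity of the frame in which events e and c are simultaneous."""
    return (c[0] - e[0]) / (c[1] - e[1])

A = (0.0, 0.0)
B = (5.0, 1.0)  # timelike separated from A, so no single frame makes A and B simultaneous
C = proxy_event(A, B)

vA = simultaneity_velocity(A, C)  # frame FA
vB = simultaneity_velocity(B, C)  # frame FB
print("C =", C, " vA =", vA, " vB =", vB)
print("t'_C - t'_A in FA:", boosted_time(*C, vA) - boosted_time(*A, vA))
print("t'_C - t'_B in FB:", boosted_time(*C, vB) - boosted_time(*B, vB))
```

Both frame velocities come out below the speed of light in magnitude, and both time differences vanish up to rounding, even though A and B themselves are timelike separated.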
Or does it? The coexistence of the spatiotemporal whole cannot be a simultaneous existence, since that would be self-contradictory. It can only be a tenseless or atemporal coexistence. Can you imagine an expanse whose character is neither spatial nor temporal? If you can, then you have a proper intuitive model for Minkowski spacetime.