Confusion with Virtual Displacement
Here on SE, you may already find many answers to your question. Even if most of them are correct, I feel that a plain and correct answer is still missing. Plain does not mean non-rigorous, but mathematical rigor is not the same as introducing differential manifolds. One could keep making confusing arguments even after introducing fiber bundles.
If there were no constraints on a mechanical system described in 3D space by a set of $N$ position vectors ${\bf r}_i~~(i=1,\dots,N)$, one could consider any set of $N$ vectors as a set of possible displacements $d {\bf r}_i$. Said another way, for a set of starting positions ${\bf r}_i$, we have complete freedom in choosing the initial velocities ${\bf v}_i$, which provide a set of displacements $d{\bf r}_i={\bf v}_i\,dt$ after a time $dt$.
If there are constraints, we are not allowed to choose all velocities (and hence displacements) at will. We must be consistent with the existence of the constraints. For example, if a marble has to move on the floor, we are not allowed to assign it a velocity pointing downwards: that would contradict the constraint.
If constraints are static, there is no difference between the set of all possible displacements and the set of virtual displacements. All of them are compatible with the constraints. The reason for distinguishing the two, and for defining a virtual displacement as something different from a possible displacement (or, briefly, displacement), appears when constraints vary with time. In that case, we can consider two different sets of displacements: those compatible with the evolving constraint, and those compatible with the constraint frozen at a given time $t$. A simple example helps to clarify the difference.
Let's consider a point-like particle constrained to move on a ramp. If the ramp is fixed, all possible displacements (virtual or not) are displacements in the plane of the ramp. They can be obtained by assigning a velocity parallel to the ramp, and after a time $dt$ the particle will again be on the ramp.
If the ramp is moving, an initial velocity parallel to the instantaneous plane of the ramp would, in general, violate the evolving constraint. The velocities that are really possible, i.e. those that ensure the particle is still on the ramp after a time $dt$, are the sum of a velocity parallel to the ramp and the instantaneous velocity of the ramp at the initial time. The resulting displacements are the real displacements, since they form the set of all displacements that could be physically obtained. Still, in some cases one may be interested in what the displacement would be if the evolving constraint were frozen. In the case of the ramp, the virtual displacements are those parallel to the ramp plane. More formally, if ${\bf u}$ is the velocity of the ramp at some time $t$ and ${\bf v}$ is a velocity parallel to the ramp plane, then $$ d{\bf r} = ({\bf v} + {\bf u})\, dt $$ is a possible displacement after a time $dt$ compatible with the evolving constraint, while $$ \delta{\bf r} = {\bf v}\, dt $$ is a virtual displacement (${\bf v}\, dt$ is a displacement compatible with the frozen constraint).
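The distinction can be checked numerically. The following is only a minimal sketch under assumptions that are not in the text above: the problem is reduced to 2D, the ramp is a straight line inclined at an angle `theta` that passes through the origin at $t=0$ and translates rigidly with constant velocity `u`, and all numerical values (`theta`, `u`, `s`, `t0`, `dt`) are hypothetical. The constraint is then written as $f({\bf r},t)=\hat{\bf n}\cdot({\bf r}-{\bf u}\,t)=0$, with $\hat{\bf n}$ the unit normal to the ramp.

```python
import math

def constraint(r, t, theta, u):
    """f(r, t) = n . (r - u t); it vanishes when r lies on the ramp at time t."""
    n = (-math.sin(theta), math.cos(theta))  # unit normal to the ramp
    return n[0] * (r[0] - u[0] * t) + n[1] * (r[1] - u[1] * t)

# Hypothetical numbers, chosen only for illustration.
theta = math.radians(30.0)               # inclination of the ramp
u = (2.0, 0.0)                           # velocity of the ramp (horizontal)
e = (math.cos(theta), math.sin(theta))   # unit vector parallel to the ramp
s = 1.5                                  # speed of the particle along the ramp
v = (s * e[0], s * e[1])                 # velocity parallel to the ramp
t0, dt = 0.7, 1e-3

# Starting point: on the ramp at time t0, a distance 0.4 along it.
r0 = (u[0] * t0 + 0.4 * e[0], u[1] * t0 + 0.4 * e[1])

# Real displacement dr = (v + u) dt and virtual displacement delta_r = v dt.
dr = ((v[0] + u[0]) * dt, (v[1] + u[1]) * dt)
delta_r = (v[0] * dt, v[1] * dt)

r_real = (r0[0] + dr[0], r0[1] + dr[1])
r_virtual = (r0[0] + delta_r[0], r0[1] + delta_r[1])

# Real displacement: compatible with the evolving constraint at t0 + dt.
f_real_evolving = constraint(r_real, t0 + dt, theta, u)
# Virtual displacement: compatible with the constraint frozen at t0 ...
f_virtual_frozen = constraint(r_virtual, t0, theta, u)
# ... but not with the evolving constraint: the mismatch is -(n . u) dt.
f_virtual_evolving = constraint(r_virtual, t0 + dt, theta, u)

print(f_real_evolving, f_virtual_frozen, f_virtual_evolving)
```

With these choices the first two values vanish up to rounding, while the third equals $-(\hat{\bf n}\cdot{\bf u})\,dt$, which is non-zero whenever the ramp velocity has a component normal to the ramp. Because this assumed constraint is linear in ${\bf r}$ and $t$, the check holds for finite `dt` and not only to first order.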
The real reason for introducing virtual displacements is that useful information about the system can be derived by analyzing the work done by constraint forces. However, calculating that work with the actual displacements would provide useless information. In the case of a mass $m$ rigidly fixed on a moving ramp, the work of the constraint forces computed with the actual displacement of the mass is, in general, non-zero. Only virtual displacements, which are parallel to the plane of the ramp, give zero work, which is what we need in order to exploit the concept of work usefully in such a situation.
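As a worked illustration, take the particle sliding on the moving ramp and assume, which is an extra hypothesis introduced here, that the ramp is frictionless, so that the reaction is ${\bf R}=R\,\hat{\bf n}$ with $\hat{\bf n}$ the unit normal to the ramp plane (the symbols ${\bf R}$ and $\hat{\bf n}$ are notation added for this example). Using $d{\bf r}=({\bf v}+{\bf u})\,dt$ and $\delta{\bf r}={\bf v}\,dt$ with $\hat{\bf n}\cdot{\bf v}=0$, $$ {\bf R}\cdot d{\bf r} = R\,\hat{\bf n}\cdot({\bf v}+{\bf u})\,dt = R\,(\hat{\bf n}\cdot{\bf u})\,dt, \qquad {\bf R}\cdot \delta{\bf r} = R\,\hat{\bf n}\cdot{\bf v}\,dt = 0 . $$ The work over the real displacement depends on the motion of the ramp and is non-zero whenever ${\bf u}$ has a component along $\hat{\bf n}$, while the work over the virtual displacement vanishes whatever the ramp is doing.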
Edited after change of the title:
I have to say that I do not agree with the suggestion to remove the reference to virtual work from the title. Without the need for virtual work, understood as the intrinsic work performed by constraints, it is hard to understand the reason for introducing virtual displacements at all. Unfortunately, there is a long tradition of neglecting this key point in the conceptual development of the subject. The result is a widespread sense of lack of meaning, sometimes hidden under tons of math.
End of editing
At this point, one can generalize all these ideas by regarding the introduction of constraints as equivalent to the introduction of a non-trivial configuration space, well described as a differential manifold. The set of all tangent planes at each point of the manifold has a nice mathematical structure, and so on. But this is just a convenient and general dress for the physical ideas above. It becomes a trivial step only once one has the right concept of what a virtual displacement is and why it has been introduced.
Let there be given a $3N$-dimensional position manifold $M$ with coordinates $({\bf r}_1, \ldots, {\bf r}_N)$. Let the time axis $\mathbb{R}$ have coordinate $t$.
Let there be given $m\leq 3N$ holonomic constraint functions $$f^a: M\times \mathbb{R} ~\to~\mathbb{R}, \qquad a~\in~\{1,\ldots, m\}. $$ The constraint functions $(f^1,\ldots, f^m)$ are usually assumed to satisfy various regularity conditions, cf. my Phys.SE answer here. Moreover, they are assumed to be functionally independent, and the intersection of their zero-level-sets $$C~:=~\bigcap_{a=1}^m (f^a)^{-1}(\{0\})~\subseteq~ M\times \mathbb{R}$$ is assumed to form a submanifold of dimension $3N+1-m=n+1$, which we will call the constrained/physical submanifold. Let $(q^1, \ldots, q^n,t)$ be coordinates on $C$. The $q$'s are known as generalized coordinates.
Given a point $p\in C$ with coordinates $(q^1_0, \ldots, q^n_0,t_0)$, then the $n$-dimensional submanifold of finite virtual displacements at $p$ is $$ V~:=~C \cap (M\times \{t_0\}). $$ So in a nutshell, a finite virtual displacement is a displacement of position that doesn't violate the constraints and is frozen in time. See also this related Phys.SE post.
The heuristic notion of infinitesimal variations can in this context be replaced with tangent vectors. So the tangent space of infinitesimal virtual displacements at $p$ is $$ T_pV. $$ Concerning the use of infinitesimals in physics, see also this and this Phys.SE posts.
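In Cartesian components, and as a sketch that assumes the usual regularity of the $f^a$, the tangent vectors in $T_pV$ are the $(\delta{\bf r}_1,\ldots,\delta{\bf r}_N)$ obtained by differentiating the constraints at fixed time $t=t_0$: $$ \sum_{i=1}^N \frac{\partial f^a}{\partial {\bf r}_i}\cdot \delta{\bf r}_i ~=~0, \qquad a~\in~\{1,\ldots, m\}. $$ By contrast, a displacement $(d{\bf r}_1,\ldots,d{\bf r}_N,dt)$ tangent to the full constrained submanifold $C$ satisfies $$ \sum_{i=1}^N \frac{\partial f^a}{\partial {\bf r}_i}\cdot d{\bf r}_i + \frac{\partial f^a}{\partial t}\,dt ~=~0, \qquad a~\in~\{1,\ldots, m\}. $$ The two conditions coincide exactly when $\partial f^a/\partial t=0$, i.e. for time-independent constraints. For instance, for a single particle on a plane with unit normal $\hat{\bf n}$ translating with velocity ${\bf u}$ (an illustrative choice, with $f=\hat{\bf n}\cdot({\bf r}-{\bf u}\,t)$), they reduce to $\hat{\bf n}\cdot\delta{\bf r}=0$ and $\hat{\bf n}\cdot d{\bf r}=(\hat{\bf n}\cdot{\bf u})\,dt$, respectively.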