On "familiarity" (or How to avoid "going down the Math Rabbit Hole"?)
Your example makes me think of graphs.
Imagine some nice, helpful fellow came along, and made a big graph of every math concept ever, where each concept is one node and related concepts are connected by edges. Now you can take a copy of this graph, and color every node green based on whether you "know" that concept (unknowns can be grey).
How to define "know"? In this case: when somebody mentions that concept while talking about something, do you immediately feel confused and get the urge to look the concept up? If not, then you know it. (Funnily enough, you may be deluding yourself into thinking you know something that you completely misunderstand, and it would still count as "knowing" under this rule - but that's fine, and I'll explain why in a bit.) For the purposes of deciding whether you "know" it, assume that the particular thing the person is talking about isn't some intricate argument that hinges on obscure details of the concept or bizarre interpretations - it's just mentioned matter-of-factly, as a tangential remark.
When you are studying a topic, you are basically picking one grey node and trying to color it green. But you may discover that to do this, you must color some adjacent grey nodes first. So the moment you discover a prerequisite node, you go to color it right away, and put your original topic on hold. But this node also has prerequisites, so you put it on hold, and... What you are doing is known as a depth-first search (DFS). It's natural for it to feel like a rabbit hole - you are trying to go as deep as possible. The hope is that sooner or later you will run into a wall of greens, which is when your long, arduous search will have borne fruit, and you will get to feel that unique rush of climbing back up the stack with your little jewel of a recursion-terminating return value.
Then you get back to coloring your original node and find out about the other prerequisite, so now you can do it all over again.
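To make this concrete, here is a minimal sketch in Python. The prerequisite graph, the concept names, and the starting `known` set are all invented for illustration (and the graph is assumed acyclic; a real concept graph might not be):

```python
# Hypothetical prerequisite graph: each concept maps to the
# concepts you would need to look up while studying it.
prereqs = {
    "vector space": ["field", "abelian group"],
    "field": ["ring"],
    "ring": ["abelian group"],
    "abelian group": ["group"],
    "group": ["set"],
    "set": [],
}

known = {"set"}  # the "green" nodes

def learn_dfs(concept):
    """Depth-first learning: recurse into each unknown
    prerequisite before returning to the original topic."""
    if concept in known:
        return                # hit a green node: recursion terminates
    for pre in prereqs.get(concept, []):
        learn_dfs(pre)        # put the current topic on hold
    print("learned:", concept)
    known.add(concept)

learn_dfs("vector space")
# learned: group, abelian group, ring, field, vector space (in that order)
```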
DFS is well suited to some applications, but bad for others. If your goal is to color the whole graph (i.e., learn all of math), any strategy will have you visit the same number of nodes, so the choice doesn't matter as much. But if you are not seriously attempting to learn everything right now, DFS is not the best choice.
So, the solution to your problem is straightforward - use a more appropriate search algorithm!
Immediately obvious is breadth-first search. This means: when reading an article (or page, or book chapter), don't rush off to look up every new term as soon as you see it. Circle it or note it on a separate sheet of paper, but force yourself to finish your text even if it's completely incomprehensible to you without knowing the new term. You will then have a list of prerequisite nodes, and can deal with them in a more organized manner.
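As a sketch of the same idea in code (reusing the invented graph from before; everything here is illustrative, not a real curriculum):

```python
from collections import deque

# Same hypothetical prerequisite graph as in the DFS sketch.
prereqs = {"vector space": ["field", "abelian group"], "field": ["ring"],
           "ring": ["abelian group"], "abelian group": ["group"],
           "group": ["set"], "set": []}

def learn_bfs(concept, known):
    """Breadth-first learning: finish the current text first,
    queue up its unknown terms, then work through the queue
    level by level instead of diving into each term on sight."""
    queue = deque([concept])
    seen = {concept}
    reading_list = []
    while queue:
        topic = queue.popleft()
        reading_list.append(topic)
        for pre in prereqs.get(topic, []):
            if pre not in known and pre not in seen:
                seen.add(pre)
                queue.append(pre)
    return reading_list

print(learn_bfs("vector space", known={"set"}))
# ['vector space', 'field', 'abelian group', 'ring', 'group']
```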
Compared to your DFS, this already makes it much easier to avoid straying too far from your original area of interest. It also has another benefit, one which is not common in actual graph problems: often in math, and in general, understanding is cooperative. Say concept A has prerequisite concepts B and C. You may find that B is very difficult to understand (it leads down a deep rabbit hole) - but only if you don't yet know the very easy topic C. Knowing C makes B very easy to "get", because you quickly figure out the salient and relevant points (or it may turn out that knowing either B or C alone is sufficient to learn A). In that case, you really want a learning strategy that makes sure you do C before B!
BFS not only lets you exploit this kind of cooperation, it also lets you manage your time better. After your first pass, let's say you ended up with a list of 30 topics you need to learn first. They won't all be equally hard. Maybe 10 will take you 5 minutes of skimming Wikipedia to figure out. Maybe another 10 are so simple that the first Google Images diagram explains everything. Then there will be 1 or 2 which will take days or even months of work. You don't want to get tripped up on the big ones while you still have the small ones to take care of. After all, it may turn out that the big topic is not essential, but the small topic is. If that's the case, you would feel very silly for having tackled the big topic first! But if a small one proves useless, you haven't really lost much energy or time.
Once you're doing BFS, you might as well benefit from the other very nice and clever twists on it, such as Dijkstra's algorithm or A*. When you have the list of topics, can you order them by how promising they seem? Chances are you can, and chances are your intuition will be right. Another thing to do - since ultimately your aim is to link up with some green nodes, why not prioritize topics which seem to be getting closer to things you already know? The beauty of A* is that these heuristics don't even have to be very correct - even "wrong" or "unrealistic" heuristics may end up making your search faster.
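A minimal best-first sketch of that idea, again in Python with an invented effort score standing in for your gut feeling about each topic (not a real heuristic):

```python
import heapq

# Same hypothetical prerequisite graph as before.
prereqs = {"vector space": ["field", "abelian group"], "field": ["ring"],
           "ring": ["abelian group"], "abelian group": ["group"],
           "group": ["set"], "set": []}

# Invented heuristic: estimated effort in days; lower = more promising.
effort = {"vector space": 0, "field": 5, "abelian group": 2,
          "ring": 4, "group": 1}

def learn_best_first(concept, known):
    """Dijkstra/A*-flavored learning: always pull the most
    promising unknown topic off a priority queue next."""
    frontier = [(effort.get(concept, 10), concept)]
    seen = {concept}
    order = []
    while frontier:
        _, topic = heapq.heappop(frontier)
        order.append(topic)
        for pre in prereqs.get(topic, []):
            if pre not in known and pre not in seen:
                seen.add(pre)
                heapq.heappush(frontier, (effort.get(pre, 10), pre))
    return order

print(learn_best_first("vector space", known={"set"}))
# ['vector space', 'abelian group', 'group', 'field', 'ring']
# The easy topics surface first; "field", the deep one, waits its turn.
```

This sketch orders the frontier by the score alone; a fuller A* would also track distance traveled so far. But the point survives the simplification: cheap, even sloppy heuristics reorder your work usefully.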
You don't learn what a vector space is by swallowing a definition that says
A vector space $\langle V, S\rangle$ is a set $V$ and a field $S$ that satisfy the following 8 axioms: …
Or at least I don't, and from the sound of things that isn't working for you either. That definition is for someone who not only already knows what a field is, but who also already knows what a vector space is, and for whom the formal statement may illuminate what they already know.
Instead, if you want to learn what a vector space is, you pick up an elementary textbook on linear algebra and you start reading it. I picked up Linear Algebra and its Applications (G. Strang, 1988) from next to the bed just now, and I find that "vector space" isn't even defined. The first page of chapter 2 (“Vector Spaces and Linear Equations”) introduces the idea informally, leaning heavily on the example of $\Bbb R^n$, which was already introduced in Chapter 1, and then emphasizes the crucial property: “We can add any two vectors, and we can multiply vectors by scalars.” The next page reiterates this idea: “a real vector space is a set of ‘vectors’ together with rules for vector addition and multiplication by real numbers.” Then there follow three examples that are different from the $\Bbb R^n$ examples.
A good textbook will do this: it will reduce those 8 axioms to a brief statement of what the axioms are actually about, and provide a set of illuminating examples. In the case of the vector space, the brief statement I quoted, boldface in the original, was it: we can add any two vectors, and we can multiply vectors by scalars.
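For reference, and assuming real scalars throughout, here is one standard way those 8 axioms read, so you can see just how much that one boldface sentence compresses: for all $u, v, w \in V$ and $a, b \in \Bbb R$,

$$
\begin{aligned}
&u + v = v + u, &\quad& (u + v) + w = u + (v + w),\\
&\exists\, 0 \in V:\ v + 0 = v, &\quad& \forall v\ \exists\, {-v} \in V:\ v + (-v) = 0,\\
&a(u + v) = au + av, &\quad& (a + b)v = av + bv,\\
&a(bv) = (ab)v, &\quad& 1v = v.
\end{aligned}
$$

The first four are about adding vectors; the last four are about multiplying by scalars - which is exactly the boldface summary.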
You don't need to know what a field is to understand any of this, because it's restricted to real vector spaces, rather than to vector spaces over arbitrary fields. But it sets you up to understand the idea in its full generality once you do find out what a field is: “Just like the vector spaces you're used to, except instead of the scalars being real numbers, they can be elements of any field.”
If you find yourself chasing an endless series of definitions, that's because you're trying to learn mathematics from a mathematical encyclopedia. Well, it's worth a try; it worked for Ramanujan. But if you find that you're not Ramanujan, you might try what the rest of us non-Ramanujans do, and try reading a textbook instead. And if the textbook starts off by saying something like:
A vector space $\langle V, S\rangle$ is a set $V$ and a field $S$ that satisfy the following 8 axioms: …
then that means you have mistakenly gotten hold of a textbook that was written for people who already know what a vector space is, and you need to put it aside and get another one. (This is not a joke; there are many such books.)
The Strang book is really good, by the way. I recommend it.
One last note: It's not usually enough to read the book; you have to do a bunch of the exercises also.
A very well known mathematician showed me how he avoids the rabbit hole. I copied his method, and now I can stay out of it most of the time.
I had private weekly seminars with him. Every week, he would research a topic he knew nothing about (that was our deal, and that's what was in it for him). I would name the topic (examples: Bloom Filters, Knuth-Bendix Theorem, Linear Logic), and the following week he would give a zero-frills PowerPoint presentation of what he had found out. The presentations had a uniform pattern:
- Motivating Example
- Definitions
- Lemmas and Theorems
- Applications
By beginning with the motivating example, we never got lost in the thicket of technicalities, and the Applications section would circle back and explain the Motivating Example (and maybe some others if time allowed) in terms of the technicalities.
This is how he taught himself a topic without going down the Math Rabbit Hole:
- Limit your rabbit-hole time (one week)
  - your presentation must be one hour long
- Focus on a Motivating Example
  - do just enough technicalities to explain the example and optional variations
I have since copied this style. When I teach myself a new topic, I make a slide presentation like that, and then I present it to others in a weekly reading group.