Could *I* have come up with the definition of Compactness (and Connectedness)?
I'm going to make a stab at "compactness" here. Suppose you want to prove something about sets in, say, a metric space. You'd like to, say, define the "distance" between a pair of sets $A$ and $B$. You've thought about this question for, say, finite sets of real numbers, and things worked out OK, and you're hoping to generalize. So you say something like "I'll just take all points in $A$ and all points in $B$ and look at $d(a, b)$ for each of those, and then take the min."
But then you realize that "min" might be a problem, because the set of $(a,b)$-pairs might be infinite, even uncountably infinite, and "min" is only guaranteed to exist for finite sets.
But you've encountered this before, and you say "Oh... I'll just replace this with 'inf' the way I'm used to!" That's a good choice. But now something awkward happens: you find yourself with a pair of sets $A$ and $B$ whose distance is zero, but which share no points. You'd figured that, in analogy with the finite-subsets-of-$\Bbb R$ case, distance zero would mean "some point is in both sets", but that's just not true.
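In symbols, the provisional definition is
$$d(A, B) \;=\; \inf\{\, d(a,b) \;:\; a \in A,\ b \in B \,\}.$$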
Then you think a bit, and realize that if $A$ is the set of all negative reals, and $B$ is the set of positive reals, the "distance" between them is zero (according to your definition), but... there's no overlap. This isn't some weird metric-space thing... it's happening even in $\Bbb R$. And you can SEE what the problem is: it's the "almost getting to zero" problem, because $A$ and $B$ are open.
So you back up and say "Look, I'm gonna define this notion only for closed sets; that'll fix this stupid problem once and for all!"
And then someone says "Let $A$ be the $x$-axis in $\Bbb R^2$ and let $B$ be the graph of $y = e^{-x}$." And you realize that these are both closed sets, and they don't intersect, but the distance you've defined is still zero. Dammit!
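Concretely, the vertical gap at abscissa $x$ is
$$d\big((x, 0),\, (x, e^{-x})\big) \;=\; e^{-x} \;\to\; 0 \quad \text{as } x \to \infty,$$
so the infimum over all pairs is $0$, even though no single pair realizes it.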
You look more closely, and you realize the problem is with $\{ d(a, b) \mid a \in A, b \in B\}$. That set is an infinite set of positive numbers, but the inf still manages to be zero. If it were a finite set, the inf (or the min -- same thing in that case!) would be positive, and everything would work out the way it was supposed to.
Still looking at $A$ and $B$: instead of looking at all points in $A$ and $B$, you could say "Look, if $B$ is at distance $q$ from $A$, then around any point of $B$, I should be able to place an (open) ball of radius $q$ without hitting $A$. How 'bout I rethink things, and say this instead: consider, for all points $b \in B$, the largest $r$ such that $B_r(b) \cap A = \emptyset$... and then I'll just take the smallest of these radii as the distance."
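In symbols: that largest radius (call it $r(b)$, my notation) is just the distance from the point $b$ to the set $A$, so the new candidate is
$$r(b) \;=\; d(b, A) \;=\; \inf_{a \in A} d(a, b), \qquad d(A, B) \;=\; \inf_{b \in B} r(b) \;=\; \inf_{b \in B} \inf_{a \in A} d(a, b),$$
which is exactly the old infimum in disguise.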
Of course, that still doesn't work: the set of radii, being infinite, might still have zero as its inf. But what if you could somehow pick just finitely many of them? Then you could take a min and get a positive number.
Now, that exact approach doesn't really work, but something pretty close does work, and situations just like that keep coming up: you've got an infinite collection of open balls, and want to take the minimum radius, but "min" has to be "inf" and it might be zero. At some point, you say "Oh, hell. This proof isn't working, and something like that graph-and-$x$-axis problem keeps messing me up. How 'bout I just restate the claim and say that I'm only doing this for sets where my infinite collection of open sets can always be reduced to a finite collection?"
Your skeptical colleague from across the hall comes by and you explain your idea, and the colleague says "You're restricting your theorem to these 'special' sets, ones where every covering by open sets has a finite subcover... that seems like a pretty extreme restriction. Are there actually any sets with that property?"
And you go off and work for a while and convince yourself that the unit interval has that property. And then you realize that in fact if $X$ is special and $f$ is continuous, then $f(X)$ is also special, so suddenly you've got tons of examples, and you can tell your colleague that you're not just messing around with the empty set. But the colleague then asks, "Well, OK. So there are lots of these. But this finite-subcover stuff seems pretty...weird. Is there some equivalent characterization of these special sets?"
It turns out that there's not: the "change infinite into finite" trick is really the secret sauce. But in some cases, like for subsets of $\Bbb R^n$, there is an equivalent characterization, namely "closed and bounded". Well, that's something everyone can understand, and it's a pretty reasonable kind of set, so you need a word. Is "compact" the word I'd have chosen? Probably not. But it certainly matches up with the bounded-ness, and it's not such a bad word, so it sticks.
The key thing here is that the idea of compactness arises because of multiple instances of people trying to do stuff and finding it'd all work out better if they could just replace a cover by a finite cover, often so that they can take a "min" of some kind. And once something gets used enough, it gets a name.
[Of course, my "history" here is all fiction, but there are plenty of cases of this sort of thing getting named. Phrases like "in general position", for instance, arise to keep us out of the weeds of endless special cases that are arbitrarily near to perfectly nice cases.]
Sorry for the long and rambling discourse, but I wanted to make the case that stumbling on the notion of compactness (or "linear transformation", or "group") isn't that implausible.
One of the big problems I had when first learning math was that I thought all this stuff was handed down to Moses on stone tablets, and didn't realize that it arose far more organically. Perhaps one of the tip-offs was when I learned about topological spaces, and one of the classes of spaces was "$T_{2\frac{1}{2}}$". It seemed pretty clear that someone had skipped over something and then gone back and filled in a gap by giving it a "half-number" as a name. (This could well be wrong, but it's sure how it looked to a beginner!)
I like John Hughes' answer, but I'll take my own stab at it. I will also go on a rather long rant, so make sure you have some time if you're reading this. If I'm not mistaken, the minor problem concerning connectedness has been solved in the comments.
Before trying to see whether you could have come up with the concept of compactness yourself, you should try to see what specific aspect of compactness you're interested in, what "intuitive" property it represents. If that property is "from any covering you can extract a finite subcovering", then of course you could have discovered it yourself; but it's not a very intuitive property, so that's not very interesting.
So first let's figure out what we mean and what we want with the concept of compactness (for instance, this might help us in figuring out why it's called "compact").
Is it the Bolzano-Weierstrass property that we're trying to generalize to other spaces, where we noticed it didn't go as well? Is it the "closed and bounded" property? Is it a generalization of finiteness? Or is it just "something that shares the 'compact' properties of $[0,1]$"? Or perhaps "a property that says that you don't go off to infinity"?
I could give a different story according to what we're actually interested in, but to me the most intuitive route is Bolzano-Weierstrass. My other favourite is "don't go off to infinity" (which is closely related to "finiteness", of course), and I can add a few words on that if you want, but I'll start with Bolzano-Weierstrass because I think that's what would convince most students that compactness is an interesting notion: the BW theorem is such a powerful theorem in analysis, and you can prove so many great things with it, that it only makes sense to want to see what it looks like more generally.
See the end for a "tldr".
Moreover, I will take a somewhat deviant route from BW to compactness, which is not the one usually presented to students (at least not the one that was presented to me and my friends). This will also be a completely fictional story.
You're a young mathematician, you've learned a bunch of stuff in analysis, and you've noticed this marvelous BW property that $[0,1]$ has: any sequence has a convergent subsequence. You've also noticed, along your many encounters with analysis, that sequences tend to be a very important tool in studying real functions, or even subsets of euclidean space.
In fact, you notice that everything seems to be determined by sequences, which makes this BW property so much more interesting. Continuity can be determined by looking at sequences, and so can "being the complement of an open set" (the notion of open set being quite natural: it's a set that contains all the points close enough to each of its points): a set is the complement of an open set if and only if any convergent sequence that lies in it converges in it. You've used the BW property a couple of times here and there to prove that such-and-such function is continuous, or has that value, or is extendable, etc.
One day your colleague comes to you with a thing they claim is a "space". With modern definitions, this space is $\beta \mathbb N$, the space of ultrafilters on $\mathbb N$ [it's not important if you don't know what they are; you can do a similar example with many spaces, such as $\omega_1+1$ if you know what that is]. They claim that understanding this space is important for such and such reason. So you start looking at it for a while and you prove two theorems that are of interest to your colleague. You prove the first one on Monday, and by Friday you've forgotten its precise statement and proved the second one, without thinking of the first one, which you thought was "just a technicality". On Saturday, you see both theorems side by side on your way to your colleague's office, and you're stumped, because they seem to contradict each other!
The first theorem is: any sequence of principal ultrafilters [a special kind of ultrafilter; it's not important to know what they are] that converges in $\beta \mathbb N$ is eventually constant; in particular it converges to a principal ultrafilter.
The second theorem is: any open neighbourhood of any ultrafilter contains a principal ultrafilter [and there are nonprincipal ultrafilters].
Huh. You're very much used to analysis, so you think there's a problem, but neither you nor your colleague can see the mistake in your proofs. Then you start to doubt yourself and go over your analysis knowledge: you try to see why, in a space like $\mathbb R$, theorem 2 would imply that any ultrafilter is the limit of a convergent sequence of principal ultrafilters. You realize that you're using the following property: every point has a countable sequence of neighbourhoods such that any neighbourhood contains one of them. Hah! This is not true in $\beta \mathbb N$!
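In modern terms, the property being silently used here is called first countability: every point $x$ has a countable list of neighbourhoods $V_1, V_2, \ldots$ such that every neighbourhood of $x$ contains some $V_n$. Metric spaces have it (take $V_n = B_{1/n}(x)$); $\beta\mathbb N$ fails it at the nonprincipal ultrafilters.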
You're thinking of saying that $\beta\mathbb N$ is therefore just a pathology that you should ignore, but your colleague keeps telling you that it's very important to their work. In particular, they need the following result: any continuous function $\beta\mathbb N\to \mathbb R$ is bounded. In your world, you could use sequences to approach that, but you now see that sequences can't solve everything in "wilder" spaces; you have to think of something new.
Now the argument that the second theorem should contradict the first one doesn't work with sequences, but what if you change the meaning of "sequence"? After all, if you index points by the neighbourhoods of the point you're trying to approximate, then suddenly you get something that really approximates it.
Mhm, but note that this last idea is independent of $\beta\mathbb N$: what if you replaced sequences in your work by a more general notion of sequence? Something that can be indexed by a more general object than $\mathbb N$?
You work following this path and discover the notion of "nets" and work out many of their properties. You see that they seem to generalize sequences, and they have analogous properties, even in pathological spaces like $\beta\mathbb N$! For instance, continuity of a function can be determined by looking at nets, complements of open sets can be characterized by nets as well, etc.
You're happy because you destroyed the pathologies by going from the notion of "sequence", which was biased towards $\mathbb N$, to the notion of "net", which is more general and almost as easy to work with. Now has come the time to test out your theory: what does the BW property look like with nets?
Well, working some more on the example of $\beta \mathbb N$ (which your colleague has told you should have the analogous property, given the results they need and believe to be true) convinces you that you can't just take a subset of the indexing order to get your extraction property, so you need something more subtle. At this point you discover the notion of a subnet and define the analogous BW property for subnets.
With some work, you prove that $\beta \mathbb N$ does indeed have that analogous property, and so your colleague can safely go on with their research.
But you're not fully satisfied: sure, the BW property with nets is nice and all, but it doesn't seem to be an intrinsic characterization (in euclidean space we have the "closed and bounded" characterization, which is purely intrinsic). At this point you've noticed that a lot of properties of nets can be proved by taking sets of neighbourhoods as indexing sets, so you play around with that, and soon enough you find the intrinsic characterization. Indeed, suppose you have some net $(x_i)_{i\in I}$, and you want to force convergence of some subnet to, say, $x$. Then take the set of pairs $(i,V)$ where $i\in I$ and $V$ is a neighbourhood of $x$ with $x_i\in V$ (a standard trick you will have learnt by working with nets!). You have an obvious associated subnet, and it should converge to $x$, unless you have some neighbourhood $V_x$ with no $x_i$ beyond some index $i_x$ in it.
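In symbols, that trick reads (a sketch; the name $J$ is mine): order
$$J \;=\; \{\, (i,V) \;:\; i \in I,\ V \text{ a neighbourhood of } x,\ x_i \in V \,\}, \qquad (i,V) \le (i',V') \iff i \le i' \text{ and } V \supseteq V',$$
and let the associated subnet send $(i,V) \mapsto x_i$. If the net keeps returning to every neighbourhood of $x$ (i.e. no such $V_x$ and $i_x$ exist), then $J$ is directed and this subnet converges to $x$.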
Then, if your net in fact has no converging subnet, this happens for every $x\in X$, so you have a whole gallery of open sets $V_x$. Now you play around with this for a while: take $x\in X$; then beyond $i_x$, no point of the net is in $V_x$. Where is "$x_{i_x+1}$" (which doesn't quite make sense, but you're just playing, so you allow it)? It's in some $V_y$, but then not after $i_y$; so after that it's in some $V_z$, but not after $i_z$, etc., etc.
This last "etc. etc." is interesting because you start wondering : "hey ! the problem is that this 'etc. etc.' is infinite - if the process stopped at some point, I would get a contradiction, so my net would have a converging subnet !". OK but this is a given net. The cover $(V_x)$ can be pretty much as wild as you'd like if the net varies, the only thing that doesn't change is : it covers the whole space.
So to make sure that every net has a converging subnet, you need to ensure that for any cover, wild or not, the process stops. What this means is precisely that there's a finite subcover. Now you say "well, from what I've done, it's pretty clear [i.e. I will find a proof soon] that if I have this weird property on covers, I have my property on nets!"; and after thinking a bit you again use one of the usual tricks for going from a cover to a net, to see that there's a converse to that statement: you've found your intrinsic characterization of the generalized BW property.
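Stated in modern vocabulary, the characterization you've just found is:
$$X \text{ is compact (every open cover of } X \text{ has a finite subcover)} \iff \text{every net in } X \text{ has a convergent subnet}.$$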
Now you prove (much more easily) that $\beta\mathbb N$ has this cover property, and so does $\omega_1+1$, etc., and you reprove the equivalence with the special case of sequences for euclidean space (or, in fact, metric spaces).
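That special case is worth displaying, since it holds for all metric spaces but fails for general topological spaces:
$$X \text{ metric}: \quad X \text{ is compact} \iff \text{every sequence in } X \text{ has a convergent subsequence}.$$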
Unfortunately, along the way we haven't learnt why it's called "compact". I think the "go off to infinity" point of view is the best one to explain this name.
$\mathbf{tldr}$: Now I've done something quite long, but I think the main point to remember is the following: the notion of compactness for general spaces can be seen as just a reformulation of the Bolzano-Weierstrass property in a context where we understand that sequences don't characterize everything in more general spaces. Seeing the equivalence between "net-BW" and "finite cover" is pretty much straightforward; the problem is going from BW to net-BW, that is, understanding why we go from sequences to nets (or filters, but I preferred going through nets here because they're more intuitive to students).
Note that, as opposed to sequences, nets do characterize everything in sight even in pathological spaces (continuity, closedness, compactness, etc.).
It is important to remember that definitions are not a universal thing given to us by higher powers that we need to discover. Nor are they something we just make up out of nothing. They are choices made by humans among all the equivalent reformulations of the same property. Any equivalent reformulation can be taken as the definition of a property. Usually, we choose the reformulation that is either the shortest or the most convenient to work with. So your question becomes: how would you figure out that a certain reformulation of compactness is convenient? Before that, you should also decide why you care about compactness and why it is an interesting property in the first place, so that it is worth finding a convenient definition for it.
Now about your notions:
Continuity: Let me note that your definition of continuity is for functions $\mathbb{R}\to \mathbb{R}$, not in the full generality of topological spaces. It is worth mentioning that calculus existed many years before people came up with the nice $\varepsilon$-$\delta$ definition of limits and defined derivatives the way modern books do. The reason this definition stuck is that it is convenient to work with. Well, for many arguments with continuity, it is even more convenient to use the topological definition (the preimage of any open set is open). This definition certainly does not look natural to a high schooler, and I am not sure it gives you more intuition than the $\varepsilon$-$\delta$ one, or that it captures the "essence" of continuity. You might have discovered it if you were studying abstract topological spaces (but why would you do that?). The only way I can justify the topological definition is that, after trying to work with it, you notice that it results in more elegant proofs than the $\varepsilon$-$\delta$ one.
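For reference, the two definitions side by side:
$$f \text{ is continuous at } x_0 \iff \forall \varepsilon > 0\ \exists \delta > 0\ \forall x:\ |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon,$$
$$f : X \to Y \text{ is continuous} \iff f^{-1}(U) \text{ is open in } X \text{ for every open } U \subseteq Y.$$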
Compactness: similar story. To motivate that compactness is an important notion, you could define compact sets to be the sets on which any continuous function attains its minimum (certainly for subsets of $\mathbb{R}^n$ this is equivalent to compactness in the usual sense). Certainly an important notion for someone who cares about minimizing functions from real life (which are continuous). Now, you may ask: is this the most important feature of compactness, the one that captures everything about it, the one your students should always think about? Probably not, but I do not know how to compare those things. The point is that being a compact set is equivalent to a billion different things, and for people with different interests one is more important than another.
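A sketch of why this is an equivalence for nonempty $K \subseteq \mathbb{R}^n$: compactness gives the extreme value theorem in one direction. Conversely, if $K$ is unbounded, then $f(x) = -\lVert x \rVert$ is continuous on $K$ but attains no minimum; and if $K$ is not closed, with a limit point $p \notin K$, then $f(x) = \lVert x - p \rVert$ is continuous and positive on $K$ with infimum $0$, never attained. Heine-Borel then identifies "closed and bounded" with compactness.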
When you let a student prove that closed and bounded sets are such sets, they will most likely try to do it by extracting converging subsequences, so you see that the definition "every sequence has a converging subsequence" is useful. Then, when you try to show other features of compact sets, you notice that you keep using the same argument of extracting subsequences. That's how one decides that this should be a definition of a compact set. As with the topological definition of continuity, I find it hard to justify why one should use this definition before going into the proofs and seeing that it is convenient. Then you do other proofs of things that you find important, and you see that many times you need to extract a finite subcover. What would be the first such example that students see? It does not matter. So if you tried to answer questions related to compactness, I am sure you would have discovered the argument of extracting finite subcovers, and after you had done it many times you could have thought to just use it as the definition of compactness.
Since we do not know all the equivalent reformulations of being a compact set, it is possible that we have not yet discovered the best definition, the one which will shed the most light on what compactness is about. So could you discover it? I don't know...
Just to give one example, think of the following very basic fact: the image of any compact set under a continuous function is compact. Try to prove it using the various definitions (of compactness and of continuity), and see which one is more elegant.
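A sketch of the comparison, at the risk of spoiling the exercise: with open covers, if $\{U_\alpha\}$ covers $f(K)$, then $\{f^{-1}(U_\alpha)\}$ covers $K$ by continuity, and a finite subcover $f^{-1}(U_{\alpha_1}), \ldots, f^{-1}(U_{\alpha_n})$ of $K$ hands you the finite subcover $U_{\alpha_1}, \ldots, U_{\alpha_n}$ of $f(K)$. With sequences (in metric spaces): given $(y_n)$ in $f(K)$, pick $x_n \in K$ with $f(x_n) = y_n$, extract $x_{n_k} \to x \in K$, and conclude $y_{n_k} = f(x_{n_k}) \to f(x) \in f(K)$.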
Another nice example is amenable groups. They have many equivalent definitions (surely in the double digits, but possibly more than 100). Each new definition is a theorem. Many definitions are very natural when you study one property of these groups, but useless when you study other properties. Many mathematicians have intuition about one definition but not the others, depending on their areas of expertise. Could you or anyone else discover all or some of these definitions/theorems? If you were interested in a problem such that the relevant definition arises in the context of that problem, and you have decent skills, then the answer is yes.