Difference between superscalar and multi-core?
A super-scalar processor dispatches multiple instructions during a single clock cycle. The reason this is differentiated from multi-core is that you only get one instruction counter (the program counter). So you keep track of multiple instructions in flight, but all the instructions come from a single instruction stream. This is still just one thread of execution. Now I said "you get one instruction counter," and architecturally that's true: there is no point at which your code will observe a disparity, even though the hardware may run ahead of it internally using speculative execution (the processor predicts which way a branch will go, starts executing down that path early, and throws the results away if the prediction was wrong; some schemes go further and execute both sides of the branch, discarding the side that isn't taken).
When you get into multi-core you have multiple instruction streams executing simultaneously. The important part is that each core (executing with its own instruction counter) can also be super-scalar in order to execute each single process more quickly!
It is possible to have super-scalar without pipelining or out-of-order execution by using what's called very long instruction word, or "VLIW". This is also called "static" super-scalar (i.e. the parallelism is in the code itself). The processor has enough execution units to run multiple instructions at the same time, so it fetches a fixed-size bundle of instructions at once and runs them together. In its simplest form, imagine that you said "this processor will always fetch and execute two instructions at the same time." Then as long as the compiler (or the programmer) could find work to be done simultaneously in the same instruction stream, you would double your throughput! If you couldn't find two instructions to put together, you would simply pair one instruction with a NOP. This idea is not very good, mostly because the bundle width is baked into the compiled code: if you make a better version of the processor which can execute 3, 4, or more instructions at the same time, all your old code breaks (or at least has to be recompiled)! But they solved this in a quite ingenious way; you should check out explicitly parallel instruction computing, or "EPIC", if you want to know more.
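To make the "pair it with a NOP" idea concrete, here is a minimal sketch of a static bundler in Python. Everything in it is made up for illustration: the `Instr` tuple, the `pack` function, the register names, and the dependency rule (a later instruction can't share a bundle with an earlier one if it reads or overwrites that instruction's destination). A real VLIW compiler does far more than this.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str    # register written by the instruction
    srcs: tuple  # registers read by the instruction
    text: str    # human-readable form, for printing only


NOP = Instr(dest="", srcs=(), text="nop")


def independent(earlier: Instr, later: Instr) -> bool:
    """Toy rule: `later` may share a bundle with `earlier` only if it
    neither reads nor overwrites the register `earlier` writes."""
    return earlier.dest not in later.srcs and earlier.dest != later.dest


def pack(program, width=2):
    """Greedily pack instructions, in program order, into fixed-width
    bundles, padding every unused slot with a NOP."""
    bundles = []
    i = 0
    while i < len(program):
        bundle = [program[i]]
        i += 1
        # Keep filling the bundle while the next instruction is
        # independent of everything already in it.
        while (i < len(program) and len(bundle) < width
               and all(independent(p, program[i]) for p in bundle)):
            bundle.append(program[i])
            i += 1
        bundle += [NOP] * (width - len(bundle))
        bundles.append(bundle)
    return bundles


prog = [
    Instr("r1", ("r0",), "r1 = r0 + 1"),
    Instr("r2", ("r0",), "r2 = r0 * 2"),       # independent of the first
    Instr("r3", ("r1", "r2"), "r3 = r1 + r2"),  # needs r1 and r2
    Instr("r4", ("r3",), "r4 = r3 - 1"),        # needs r3
]

for bundle in pack(prog, width=2):
    print(" | ".join(instr.text for instr in bundle))
```

With `width=2` the first two instructions share a bundle and the last two each get a NOP for company. Note that the width is an argument to the packer, not to the processor: a wider machine needs the program packed again, which is the compatibility problem described above.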
Dynamic super-scalar with pipelining exploits instruction-level parallelism in two ways at once: pipelining overlaps the different stages of successive instructions, and multiple issue lets data-independent instructions occupy the same stage side by side. That is what makes it such a powerful combination. Essentially it allows you, with enough hardware, to execute as many independent instructions simultaneously as the processor can find.
Dynamic super-scalar with pipelining and out-of-order execution is essentially the limit of instruction-level parallelism: the processor tries to execute multiple instructions in the same stage simultaneously by hunting for operations that have no data dependencies on each other. Instructions can start out of order and finish out of order, and there are all sorts of things the hardware needs to do to keep its head on straight while doing this. Multi-core instead says "hey programmer! Give me multiple problems I can solve at the same time!" Since the programmer is capable of seeing independently solvable problems that are more than just a few lines apart in the compiled assembly, they can program those solutions for multi-core more efficiently than the hardware could discover them.
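For the instruction-level part of the above, here is a toy issue model in Python that counts cycles for the same four instructions on three imaginary machines. It is only a sketch under loud assumptions: every instruction takes exactly one cycle, each register is written once and before it is read, and a result can be used starting the cycle after it was produced. The function name `count_cycles` and the `(dest, srcs)` tuples are invented for this example.

```python
def count_cycles(program, width, out_of_order):
    """Count cycles needed to issue every instruction in `program`.

    program      -- list of (dest, srcs) tuples in program order
    width        -- maximum number of instructions issued per cycle
    out_of_order -- if False, a stalled instruction blocks all later ones
    """
    written = {dest for dest, _ in program}  # registers produced by the program
    ready = set()                            # results available so far
    pending = list(program)                  # not yet issued, in program order
    cycles = 0
    while pending:
        issued = []
        for instr in pending:
            _, srcs = instr
            # A source is usable if it is a program input (never written
            # here) or its producer issued in an earlier cycle.
            operands_ready = all(s in ready or s not in written for s in srcs)
            if operands_ready and len(issued) < width:
                issued.append(instr)
            elif not out_of_order:
                break  # in-order issue: nothing may pass a stalled instruction
        if not issued:
            raise ValueError("program reads a register before it is written")
        for instr in issued:
            pending.remove(instr)
        # Results only become visible to instructions issued in later cycles.
        ready.update(dest for dest, _ in issued)
        cycles += 1
    return cycles


program = [
    ("a", ("x",)),  # a = f(x)
    ("b", ("a",)),  # b = f(a), depends on a
    ("c", ("y",)),  # c = f(y), independent of a and b
    ("d", ("c",)),  # d = f(c), depends on c
]

print(count_cycles(program, width=1, out_of_order=False))  # scalar
print(count_cycles(program, width=2, out_of_order=False))  # in-order super-scalar
print(count_cycles(program, width=2, out_of_order=True))   # out-of-order super-scalar
```

In this model the scalar machine takes 4 cycles, the in-order two-wide machine takes 3 (b stalls behind a, so c has to wait), and the out-of-order two-wide machine takes 2 because it can pull c ahead of b. Notice that the hardware only ever looks at a handful of nearby instructions; the two dependency chains here are the kind of thing a programmer could hand to two separate cores instead.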
Super-scalar isn't even capable of solving problems like "how do I execute these two programs at the same time?" It can only execute each individual program faster.
Hope that helps, sorry if it's a bit disjointed.
--Edit--
Modified to take into account ajs410's point that I had confused multiple ideas.