Methods of making mathematical discovery
Often mathematicians make discoveries by working on an applied problem, and being confronted with a mathematical problem that is either new, or at least not well-known. Solving the required mathematics for an immediate applied problem can yield new mathematical results, and once this is done, there are often natural extensions/generalisations to the applied problem that can lead to broader mathematical problems, which have correspondingly broader results. If the mathematical problem turns out to arise in a lot of diverse applied problems (which is often the case) then the mathematical result becomes of direct interest.
The example you give in your question is a well-known mathematical problem, and it arises in a large number of applied problems in all sorts of applied mathematical fields. Although it is now well-known, once upon a time this was new, and had no known general formula. The formula for the sum of squares of the first $n$ natural numbers (called the square pyramidal numbers) was derived by Fibonacci in 1202 in his book Liber Abbaci (The Book of Calculation). A natural generalisation of the problem is to look at other positive integer powers, and this latter problem leads to a general form called the Bernoulli-Faulhaber formula. This general form of this summation for any positive integer power was published by Jacob Bernoulli in 1713 in his book Ars Conjectandi (The Art of Conjecturing), and was later independently discovered by Johann Faulhaber. In this particular case the formulae were given but a formal proof of correctness did not occur until the nineteenth century. There were many later extensions extending this sum to allow complex numbers, looking at the relationship with the Reimann-zeta function, etc.
As you can see from this example, mathematical discoveries may involve an initial discovery of a solution to some relatively narrow mathematical problem, and then later discovery of solutions to natural generalisations of that problem. This can occur in a gradual expansion as more and more extensions are made, and relationships between other results and mathematical objects are derived. Often an initial discovery is not made via strict proof, but may instead done with a heuristic demonstration, and formal proof might only come later (often from a different author).
Ben's answer above is good (upvote it!), but you might also be interested in the role of computer experiments in finding patterns that can then lead to mathematical discoveries. Jonathan Borwein was a rather well-known exponent of this (see here: https://carma.newcastle.edu.au/jon/index-books.shtml ) for example.
Before computers mathematicians would often look at the first few terms of a problem through hand-computation: now we often have the chance to see hundreds of terms, and test hypotheses really quickly. These can lead to moments of insight that then enable the reduction to simpler formulae.
Generating functions (I think the standard reference is probably this: https://www.math.upenn.edu/~wilf/gfologyLinked2.pdf ) are also a good way of deducing a simpler formula from known terms of something.