Probability vs Confidence

Your question is a natural one and the answer is controversial, lying at the heart of a decades-long debate between frequentist and Bayesian statisticians. Statistical inference is not mathematical deduction. Philosophical issues arise when one takes the limited information in a sample and tries to make a helpful statement about the population from which the sample was chosen. Here is my attempt at an elementary explanation of these issues as they arise in your question. Others may have different views and post different explanations.

Suppose you have a random sample $X_1, X_2, \dots, X_n$ from $Norm(\mu, \sigma)$ with $\sigma$ known and $\mu$ to be estimated. Then $\bar X \sim Norm(\mu, \sigma/\sqrt{n})$ and we have $$P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$ After some elementary manipulation, this becomes $$P(\bar X - 1.96\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\sigma/\sqrt{n}) = 0.95.$$ According to the frequentist interpretation of probability, the two displayed equations mean the same thing: over the long run, the event inside the parentheses will be true 95% of the time. This interpretation holds as long as $\bar X$ is viewed as a random variable based on a random sample of size $n$ from the normal population specified at the start. Notice that the second equation must therefore be read as a statement about the random interval $\bar X \pm 1.96\sigma/\sqrt{n}$: with probability 0.95, that random interval covers the fixed but unknown mean $\mu.$
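To see the long-run reading in action, here is a minimal simulation sketch (the values of $\mu$, $\sigma$, and $n$ are arbitrary choices for illustration): it repeatedly draws samples from the stated normal population, forms the interval $\bar X \pm 1.96\sigma/\sqrt{n}$ each time, and reports the fraction of intervals that cover $\mu$, which should be close to 0.95.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

mu, sigma, n = 10.0, 2.0, 25           # illustrative values; mu is "unknown" to the procedure
n_reps = 100_000                       # number of repeated samples ("the long run")
half_width = 1.96 * sigma / np.sqrt(n)

samples = rng.normal(mu, sigma, size=(n_reps, n))
x_bar = samples.mean(axis=1)           # one sample mean per repetition

# The interval x_bar +/- 1.96*sigma/sqrt(n) is random; count how often it covers mu.
covered = (x_bar - half_width <= mu) & (mu <= x_bar + half_width)
print(f"Empirical coverage: {covered.mean():.4f}")   # close to 0.95
```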

However, when we have a particular sample and the numerical value $\bar x$ of the observed mean, the frequentist "long run" approach to probability is in potential conflict with a naive interpretation of the interval. In this particular case $\bar x$ is a fixed observed number and $\mu$ is a fixed unknown number. Either $\mu$ lies in the interval or it doesn't; there is no "probability" about it. What the 95% describes is the process by which the interval was derived, which yields coverage in 95% of cases over the long run. As shorthand for this distinction, it is customary to use the word confidence instead of probability.
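For instance (with numbers invented purely for illustration), suppose a particular sample of size $n = 25$ from a population with known $\sigma = 2$ gives $\bar x = 10.30.$ The realized interval is $$10.30 \pm 1.96(2)/\sqrt{25} = 10.30 \pm 0.784, \quad \text{that is, } (9.516,\; 11.084).$$ The unknown $\mu$ either lies in $(9.516,\, 11.084)$ or it does not; the 95% attaches to the procedure that produced the interval, not to this particular pair of numbers.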

There is really no deep difference between the two words: the proper frequentist use of the word probability simply becomes awkward once the interval has been computed from data, and the convention is to say confidence instead.

In a Bayesian approach to estimation, one establishes a probability framework for the experiment at hand from the start by choosing a "prior distribution." A Bayesian probability interval (sometimes called a credible interval) is then based on a melding of the prior distribution and the data. One difficulty Bayesian statisticians face in helping nonstatisticians understand their interval estimates is explaining the origin and influence of the prior distribution.
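As a sketch of how the prior and the data combine, here is a minimal example assuming a conjugate normal prior $\mu \sim Norm(\mu_0, \tau_0)$ in the same known-$\sigma$ setting as above; the prior parameters and the data are invented purely for illustration. The posterior for $\mu$ is again normal, and a 95% credible interval is read directly from it.

```python
import numpy as np

# Invented illustration: known sigma, conjugate normal prior on mu.
sigma = 2.0                  # known population standard deviation
mu0, tau0 = 8.0, 3.0         # prior mean and prior standard deviation (assumed for illustration)

rng = np.random.default_rng(seed=2)
data = rng.normal(10.0, sigma, size=25)   # a particular observed sample
n, x_bar = data.size, data.mean()

# Normal-normal conjugate update: posterior precision is the sum of precisions.
post_prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + n * x_bar / sigma**2) / post_prec
post_sd = np.sqrt(1 / post_prec)

# 95% credible interval: an interval with posterior probability 0.95 of containing mu.
lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"Posterior: Norm({post_mean:.3f}, {post_sd:.3f}); 95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Here it is legitimate to say "the probability that $\mu$ lies in this interval is 0.95," because under the Bayesian setup $\mu$ itself has a (posterior) distribution. The influence of the prior is visible in how $\mu_0$ and $\tau_0$ enter the update; with a very vague prior (large $\tau_0$) the credible interval is numerically close to the confidence interval above.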