Why do knowers of Bayes's Theorem still commit the Base Rate Fallacy?
As a sometimes instructor of probability, I will take a shot at this.
Answer to (1): I think the result is nonintuitive. Students may be in a hurry to learn a formula to get a homework or test problem right, and real understanding may not be their goal. Many concepts get learned for a class and then forgotten. (This has been studied a lot in physics, where even something as basic as the force of gravity acting on a ball is not well understood.) Later, in the heat of the moment, it is much easier to think that the test (for cancer) is 90% accurate, and so a positive test implies a 90% chance of cancer.
Answer to (2): One suggestion, also mentioned in the article you reference, is to use actual numbers. In the above situation, let's say there is a population of 100,000 women. Of these, 1$\%$, or 1000, have breast cancer. Of those 1000, 90$\%$, or 900, will test positive. Of the remaining 99,000 (cancer-free) women, 9$\%$, or 8910, will test (falsely) positive. We know we got a positive test result, so it comes from a pool of 900 + 8910 = 9810 positive tests. But only 900 of these come from people who have cancer. So the chance of cancer is still only $\frac{900}{9810}$, or about 9.2$\%$.
The hope is that seeing the actual numbers is more convincing and gives a better understanding of what is going on with Bayes' Theorem.
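The counting argument above can be reproduced in a few lines. This is only a sketch: the function name and its parameters are my own, and exact fractions are used so that the counts come out as whole numbers.

```python
from fractions import Fraction

def positive_test_counts(population, prevalence, sensitivity, false_positive_rate):
    """Return the expected (true positive, false positive) counts."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity              # sick women who test positive
    false_pos = healthy * false_positive_rate  # healthy women who test positive
    return true_pos, false_pos

# Numbers from the example: 100,000 women, 1% prevalence,
# 90% sensitivity, 9% false positive rate.
true_pos, false_pos = positive_test_counts(
    100_000, Fraction(1, 100), Fraction(9, 10), Fraction(9, 100)
)
p_cancer_given_positive = true_pos / (true_pos + false_pos)

print(true_pos, false_pos, p_cancer_given_positive)  # 900 8910 10/109
```

Changing the prevalence argument and rerunning shows how strongly the answer depends on the base rate, which is the point students tend to miss.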
I think the problem is with the way we usually solve the problem. The students follow a procedure to get an answer (like the tree solution shown in the article), but nothing is learned about the structure of the problem. Whatever the answer is, we move on to the next problem. Let $D$ be the event 'has disease' and let $D^c$ be its complement, and let "+" denote the event "the test gives a positive response." Then we can write Bayes' Rule as $$\frac{P(D|+)}{P(D^c|+)}=\frac{P(+|D)}{P(+|D^c)} \frac{P(D)}{P(D^c)} $$
$$\frac{P(D|+)}{1-P(D|+)}= \frac{\text{Sensitivity}}{\text{False Positive}}\frac{P(D)}{1-P(D)} $$
From left to right, this is: the posterior odds equal the Likelihood ratio times the prior odds. The term $P(D)/[1-P(D)]$ is the prior odds of having the disease, or 1/99 in this example. Note that these are "odds for an event," in this case having the disease. Gambling odds in Las Vegas terminology are "odds against." I think odds for an event are a little easier to understand, since then odds and probability are monotonically related. The Likelihood ratio is not an odds like the other two terms. That is, the first and third ratios are of the form $p/(1-p)$ and are just a different way of expressing probabilities. The middle term can be thought of as an amplification factor which turns the prior odds into posterior odds. If it is equal to $1$, a + result from the test provides us no additional information. We would hope for a large number for this ratio. In this example we have a ratio of $0.9/0.09 = 10$. So this converts our prior odds of 1/99 to posterior odds of 10/99. To convert the posterior odds into a probability, use: $\dfrac{\text{posterior odds}}{1+ \text{posterior odds}}$. So we get $P(D|+)=10/109.$ We see that a positive test result raises the probability of disease from $0.01$ to $0.092$.
We clearly see that if the test had higher Sensitivity or a lower False Positive rate, the Likelihood ratio would increase and we would wind up with higher posterior odds.
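As a rough illustration of the odds form, here is a short sketch. The helper `posterior_probability` is my own name, not a library function, and the "better test" with a 1% false positive rate is a made-up value used only for comparison.

```python
from fractions import Fraction

def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio via the odds form."""
    prior_odds = prior / (1 - prior)                # p / (1 - p)
    posterior_odds = likelihood_ratio * prior_odds  # the amplification step
    return posterior_odds / (1 + posterior_odds)    # back to a probability

prior = Fraction(1, 100)
sensitivity = Fraction(9, 10)
false_positive_rate = Fraction(9, 100)

# Test from the example: Likelihood ratio 10, posterior 10/109.
lr_positive = sensitivity / false_positive_rate
p_positive = posterior_probability(prior, lr_positive)
print(lr_positive, p_positive, float(p_positive))

# Hypothetical better test (assumed 1% false positive rate, same sensitivity).
better_lr = sensitivity / Fraction(1, 100)
p_better = posterior_probability(prior, better_lr)
print(better_lr, p_better, float(p_better))
```

With the hypothetical 1% false positive rate the Likelihood ratio becomes 90 and the posterior probability becomes 10/21, a little under one half, starting from the same prior odds of 1/99.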
The other quantity of interest is $P(D|-),$ namely the chance that you have the disease although you got a negative test result. (Remember: negative results are good and positive results are bad in health screening.)
$$\frac{P(D|-)}{1-P(D|-)}= \frac{1-\text{Sensitivity}}{1-\text{False Positive}}\frac{P(D)}{1-P(D)} $$
This starts out at the same prior odds of $1/99$ as in the previous example and has a Likelihood ratio of $0.1/0.91 = 10/91 \approx 0.11.$ The posterior odds of $10/9009$ convert into $P(D|-)=10/9019 \approx 1/902.$ So you can relax if you get a negative result: the test has lowered your disease probability from $0.01$ to about $0.0011.$ Note that here we are starting with the same disease odds $P(D)/[1-P(D)]$, but since we are now looking at a "-" result, an informative test will reduce those odds, which is what the $10/91$ ratio does. If you prefer Likelihood ratios that are greater than 1 for a good test, just take the reciprocal of each factor.
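The negative-result calculation can be checked the same way. This sketch assumes the sensitivity of 0.9 and the false positive rate of 0.09 used for the positive case, with variable names of my own choosing; it computes the odds-form answer exactly and compares it with a direct count over 100,000 people.

```python
from fractions import Fraction

prior = Fraction(1, 100)
sensitivity = Fraction(9, 10)
false_positive_rate = Fraction(9, 100)

# Likelihood ratio for a negative result: P(-|D) / P(-|D^c).
lr_negative = (1 - sensitivity) / (1 - false_positive_rate)

prior_odds = prior / (1 - prior)
posterior_odds = lr_negative * prior_odds
p_disease_given_negative = posterior_odds / (1 + posterior_odds)

# Cross-check by counting in a population of 100,000.
population = 100_000
false_negatives = population * prior * (1 - sensitivity)
true_negatives = population * (1 - prior) * (1 - false_positive_rate)
p_by_counting = false_negatives / (false_negatives + true_negatives)

print(lr_negative, posterior_odds, p_disease_given_negative)
print(p_by_counting == p_disease_given_negative)
```

Both routes give the same probability, which is a useful sanity check on the odds-form arithmetic.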
I think this approach is desirable since it clearly separates the prior information from the information obtained from the test. It also shows Bayes' Rule as a factor that modifies the prior odds depending on how good the test is.
While this may appear to be an approach only a Bayesian statistician would use, it does not require a Bayesian statistical viewpoint.