P-values differ in tests for normality between Mathematica and SAS

I should have looked at your code first, but I didn't, because I saw that the test statistics for the tests in question were the same.

What I did instead was look up how p-values are computed for the Anderson-Darling test, which is not as standardized as one might think. I came across this site, where a formula is given for computing the p-value from the test statistic. For your case, this formula is

p[ad_] := Exp[1.2937 - 5.709 ad + 0.0186 ad^2];
p[1.275759]
(* 0.00258163 *)

which matches the result given by SAS. At this point, I DID look at your code.
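For reference, that approximation is easy to transcribe outside Mathematica as well. Here is a minimal Python sketch of the formula above (the function name `p_anderson_darling` is mine; the coefficients are exactly the ones quoted, and this branch of the published approximation is the one that applies to statistic values of this size):

```python
import math

def p_anderson_darling(ad):
    """Approximate p-value for an Anderson-Darling statistic `ad`
    in a normality test with estimated parameters."""
    return math.exp(1.2937 - 5.709 * ad + 0.0186 * ad**2)

print(p_anderson_darling(1.275759))  # about 0.00258163, matching SAS
```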

So what you are doing is using the test incorrectly. By supplying the (estimated) normal distribution as the second argument, Mathematica does not perform the usual single-sample normality test; instead, it tests against the specific distribution you provided.

This yields the same statistic value, but the p-values are computed differently. Therefore, the simple solution is

DistributionFitTest[data, Automatic, {"TestDataTable", All}]



Yes, the P-values are being computed differently, but there's a good reason for that, and no bug report to Support needs to be made.

First generate some data:

SeedRandom[12345];
data = RandomVariate[NormalDistribution[], 2500];

Now test to see if the sample is from a normal distribution (and by that I mean any normal distribution: an unknown mean and unknown standard deviation):

h = DistributionFitTest[data, NormalDistribution[μ, σ], 
   "HypothesisTestData"];
h["TestDataTable", All]

$$ \begin{array}{l|ll} \text{} & \text{Statistic} & \text{P-Value} \\ \hline \text{Anderson-Darling} & 1.28718 & 0.00211436 \\ \text{Cramer-von Mises} & 0.184634 & 0.00785495 \\ \text{Pearson }\chi ^2 & 29.6667 & 0.000242055 \\ \text{Shapiro-Wilk} & 0.930615 & 0.0011702 \\ \end{array}$$

The same output is obtained when NormalDistribution[μ, σ] is replaced with Automatic.

But now suppose we wanted to test against a specific normal distribution. In fact, suppose we test against the normal distribution whose mean and standard deviation equal the sample mean and sample standard deviation.

h = DistributionFitTest[data, 
   NormalDistribution[Mean[data], StandardDeviation[data]],
   "HypothesisTestData"];
h["TestDataTable", All]

$$\begin{array}{l|ll} \text{} & \text{Statistic} & \text{P-Value} \\ \hline \text{Anderson-Darling} & 1.27575 & 0.239873 \\ \text{Cramer-von Mises} & 0.183053 & 0.302856 \\ \text{Pearson }\chi ^2 & 29.6667 & 0.000971023 \\ \text{Shapiro-Wilk} & 0.930615 & 0.0011702 \\ \end{array}$$

So we see that all of the P-values (except the Shapiro-Wilk) have increased, because we've stacked the deck by testing against a normal distribution with the same mean and standard deviation as the sample. In other words, we're much less likely to reject the null hypothesis.
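The mechanics can be sketched with standard-library Python (all names here, like `normal_cdf` and `anderson_darling`, are mine, not from any library, and `observed = 1.28` just approximates the statistic in the tables above). In both calls above the Anderson-Darling statistic is computed by plugging the sample mean and standard deviation into the normal CDF, so the statistic is essentially the same; what differs is the null distribution used to turn it into a P-value. A small Monte Carlo makes the stacked deck visible: for samples that really are normal, the statistic computed with estimated parameters is stochastically much smaller than the statistic computed against a fixed, fully specified distribution.

```python
import math
import random
import statistics

def normal_cdf(x, mu=0.0, sigma=1.0):
    # CDF of Normal(mu, sigma) via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling(data, mu, sigma):
    # A^2 = -n - (1/n) * sum_{i=1}^n (2i - 1) *
    #       [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    x = sorted(data)
    n = len(x)
    eps = 1e-12  # guard the logarithms against CDF values rounding to 0 or 1
    total = 0.0
    for i in range(1, n + 1):
        fi = min(max(normal_cdf(x[i - 1], mu, sigma), eps), 1.0 - eps)
        fj = min(max(normal_cdf(x[n - i], mu, sigma), eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(fi) + math.log(1.0 - fj))
    return -n - total / n

random.seed(1)
n, reps, observed = 100, 2000, 1.28
fixed, estimated = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    # Simple hypothesis: compare against the true, fully specified N(0, 1)
    fixed.append(anderson_darling(sample, 0.0, 1.0))
    # Composite hypothesis: parameters estimated from the sample itself
    # (statistics.stdev uses the n - 1 denominator; implementations vary here)
    estimated.append(anderson_darling(sample, statistics.mean(sample),
                                      statistics.stdev(sample)))

p_simple = sum(a >= observed for a in fixed) / reps
p_composite = sum(a >= observed for a in estimated) / reps
print(p_simple, p_composite)
```

Under the simple (fully specified) null, a statistic of about 1.28 sits well inside the bulk of the distribution, which is why Mathematica reports a P-value near 0.24; under the composite null the same value is far out in the tail, in line with the P-value near 0.002 that SAS reports.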

So the P-values differ, but that's because we're testing two different hypotheses. The first is whether the sample might have been drawn from some unspecified normal distribution; the other deals with one specific normal distribution.

(I've only kept the tests that matched what SAS produced and why the Shapiro-Wilk P-value does not change is left as an exercise for the reader.)