perl6: why is array skipping calculated values inside declaration?

(It turned out that this answer completely missed what was ailing @con. But I'll not delete it because it gathers together some hopefully useful links regarding rational numerics.)

What's wrong?

why is array skipping calculated values inside declaration?

It isn't.

I am learning Perl6 ...

Numbers like 1.3 are decimals in math and in Perl 6. Some explanations:

  • Damian Conway spends 3 minutes demonstrating the difference between the Perl 6 and Perl 5 approaches to rationals.

  • My Medium comment on Baking rationals into a programming language.

  • The P6 doc's Numerics page. (Written by Zoffix. Thank you for the ton of excellent stuff you did for P6 Zoffix.)

  • My SO answer to the question "Does Perl 6 performance suffer by using rationals for decimal numbers".

... from Perl5

Something about this belongs in one of the Perl 5 to Perl 6 guide docs. Would you be willing to open a new doc issue?

[1.0/15.0, 10.0/62.0],#this isn't evaluated

[1.0/10, 2/50.0],#neither is this

They're all evaluated.

In math and in P6 a literal 1.0 is a decimal is a rational. foo / bar is also a rational.

(Well, it is if foo and bar are both integers or rationals and either the denominator of the result remains 64 bits or less or one of foo or bar is an arbitrary precision FatRat rational.)

However, Perl6 doesn't seem to like values like I have indicated here.

You haven't explained the situation in which you're seeing something that makes you think P6 doesn't like them.

A strong possibility is that, per Brad Gilbert++'s comment, you are seeing values like, say, <1/15> instead of 0.066667. The former is a P6 literal value that 100% accurately denotes 1/15. To 100% accurately denote that in decimal the display would have to be 0.06666666666666... with a ... or some such at the end to denote an infinitely repeating final 6. But <1/15> denotes the same number and is shorter and arguably simpler so dd and .perl use the <1/15> form instead.

How can I get Perl6 to evaluate expressions like this when I'm declaring the 2D array?

You don't have to do anything. It's evaluating them. :)


(This is a nanswer, i.e. not an answer per se. It was originally written after @con rewrote their question to include all their code and was a step towards my Third and final answer. It's now hopefully a useful resource for those learning Perl 6.)

Original intro to my first edit of this nanswer

That's a whole lot of code! :)

My thought thus far is: are you absolutely, positively, 100% certain you haven't just missed out some input data? It seems vastly more likely that you've skipped data than that P6 has, especially given that the value calculated is exactly the one you're expecting for your next correct result.

(Update It did indeed turn out to be that the problem was incorrect input data.)

Translation, with commentary, of the code in question

The rest of this nanswer is a line-by-line "clean up" (not a refactor) of the code in the question. I had two goals in mind, with the second one being the most important one as you, dear reader, read this:

  • My translation effectively proved to me and demonstrated to @con that I'd taken every bit of their code into account. This was intended to reduce uncertainty about where the bug might be. Note that most of my changes could not possibly be directly relevant to their bug but I didn't feel comfortable assuming anything until I'd done the rewrite.

  • @con's code, and my translation of it, can presumably be useful for anyone learning Perl 6. @con's code is a P6 translation of Perl 5 code. The P5 code is in turn a translation of C code. And there are other translations of that code into other languages. My code takes @con's translation and translates it to a more idiomatic version with commentary explaining why I changed their code.

My translation of code in question, runnable via tio

My translation of @con's code, minus my commentary (see below), in tio.

(My original plan was that we would further explore @con's bug based on modifying the code in tio and sharing links to the modified tio version(s). To do that you/they would just click within tio the link icon at the top () to get a link to the code as it is when the link icon is clicked. Tio is great for being able to write, edit and run code written in P6 and other languages and then share it.)

My translation of code in question, with commentary

I've left the opening lgamma routine as is. I've written extensive commentary about it in a footnote, mainly for other readers, because it packs in a ton of interesting features1:

sub lgamma ( Num(Real) \n --> Num ){
  use NativeCall;
  sub lgamma (num64 --> num64) is native {}
  lgamma( n )
}

I note that the lgamma code is all about floating point numbers (in P6 that's the Num / num... types).


sub pvalue (@a, @b) {

For those who don't know Perls, the sub keyword introduces a subroutine (aka function).

This one takes two "listy" (Positional) arguments.

(When you see an @ (either as a "sigil" eg as in @foo or as an operator, eg as in @($bar)), think "list".)


To speed up code reading and reduce repetitive code I replaced this:

  if @a.elems <= 1 {
    return 1.0;
  }
  if @b.elems <= 1 {
    return 1.0;
  }

with this:

  return 1 if @a | @b <= 1;

Read this in English as "return 1 if the number of elements in either list @a or list @b is less than or equal to 1".


I used the | operator to construct an any junction.

(Don't waste time trying to wrap your head around the theory of what junctions do, and how they do it. Just use them in practice in simple and/or succinct ways. If they're kept simple, and read in a simple fashion, they're great and just do the obvious thing. If using them makes code not obvious in a particular case, then consider using some other construct for that particular case.)


An array in numeric context evaluates to its number of elements. Numeric operators like <= impose numeric context. So I dropped the .elems.

Numeric context, and arrays evaluating to their length in numeric context, are basic aspects of P6 so this is idiomatic coding that's appropriate for all but the most basic introductory newbie examples.


I switched from 1.0 to 1.

My guess is that @con wrote 1.0 because that was how the value was literally written in code they were translating and/or with the intent it represented a floating point value. But in P6 plain decimal literals (without an e exponent) like 1.0 produce rational numbers instead.

Using 1.0 instead of 1 in this code replaces a simpler faster type (integers) with a more complicated slower type (rationals). A value of this slower type will then be forced to convert (more slowly than an integer) into a floating point number when it's used in formulae in which any component value is floating point. Which will happen for most or all the formulae in this code because lgamma returns a floating point number.

More generally, P6 works and reads best if you leave the types of variables and values unspecified, leaving them for the compiler to figure out, unless there's a compelling reason to specify them. Leaving types unspecified reduces cognitive load for the reader and increases flexibility for reuse of the code and for optimization of the code by the compiler.

This leads to a complementary pair of maxims:

  • By default, leave type information implicit. If you don't know if it matters what type a value or variable is given, or know that it doesn't matter, then don't specify it.

  • If you make the type of a value, variable or parameter explicit, then P6 will use that type forcing compliance with that type, regardless of whether that improves your code or unnecessarily slows code down or stops it altogether.

If you absolutely know that you need to use a different type or add a type constraint to make code correct, then by all means go ahead. Likewise, if you've already verified that your code is correct without the more specific type but you want to make it faster, safer, or clearer, go ahead. But if you don't know, then leave it generic. P6 has been designed to do what you mean and will succeed far more often than you can possibly imagine, as Ben once said.

To state this point even more forcefully, in analogy with premature optimization, in P6, premature typing is an anti-pattern in both one-off code and long term production code.


  my Rat $mean1 = @a.sum / @a.elems;
  my Rat $mean2 = @b.sum / @b.elems;
  if $mean1 == $mean2 {
    return 1.0;
  }

became:

  my (\mean_a, \mean_b) = @a.sum / @a, @b.sum / @b;
  return 1 if mean_a == mean_b;

I typically "slash sigils" of "variables" unless I know I'll need a sigil. If a "variable" doesn't actually vary, it likely doesn't need a sigil.

I've renamed $mean1 to mean_a because it clearly corresponds to @a.


  my Rat $variance1 = 0.0;
  my Rat $variance2 = 0.0;

  for @a -> $i {
    $variance1 += ($mean1 - $i)**2#";" unnecessary for last statement in block
  }
  for @b -> $i {
    $variance2 += ($mean2 - $i)**2
  }

became:

  my ($vari_a, $vari_b);

  $vari_a += (mean_a - $_)² for @a;
  $vari_b += (mean_b - $_)² for @b;

Read the variable $_ as "it".


  if ($variance1 == 0 && $variance2 == 0) {
    return 1.0;
  }
  $variance1 /= (@a.elems - 1);
  $variance2 /= (@b.elems - 1);

became:

  return 1 unless ($vari_a or $vari_b);

  $vari_a /= (@a - 1);
  $vari_b /= (@b - 1);

In Perls, 0 evaluates in a boolean test context as False. And an array or list like @a evaluates in a numeric context to its element count.


  my $WELCH_T_STATISTIC = ($mean1-$mean2)/sqrt($variance1/@a.elems+$variance2/@b.elems);
  my $DEGREES_OF_FREEDOM = (($variance1/@a.elems+$variance2/@b.elems)**2)
  /
  (
  ($variance1*$variance1)/(@a.elems*@a.elems*(@a.elems-1))+
  ($variance2*$variance2)/(@b.elems*@b.elems*(@b.elems-1))
  );
  my $A = $DEGREES_OF_FREEDOM/2;
  my $value = $DEGREES_OF_FREEDOM/($WELCH_T_STATISTIC*$WELCH_T_STATISTIC+$DEGREES_OF_FREEDOM);
  my Num $beta = lgamma($A)+0.57236494292470009-lgamma($A+0.5);
  my Rat $acu = 10**(-15);
  my ($ai,$cx,$indx,$ns,$pp,$psq,$qq,$rx,$temp,$term,$xx);
  # Check the input arguments.
  return $value if $A <= 0.0;# || $q <= 0.0;

became:

  my \WELCH_T_STATISTIC  = (mean_a - mean_b)
                / ( $vari_a / @a   +   $vari_b / @b ).sqrt;

  my \DEGREES_OF_FREEDOM = ($vari_a / @a   +   $vari_b / @b)²
          / ($vari_a² / (@a² * (@a - 1))   +   $vari_b² / (@b² * (@b - 1)));

  my \A                  = DEGREES_OF_FREEDOM / 2;
  my $value              = DEGREES_OF_FREEDOM
             / (WELCH_T_STATISTIC² + DEGREES_OF_FREEDOM);

  my \magic-num          = 0.57236494292470009;
  my \beta               = lgamma(A) + magic-num - lgamma(A + 0.5);
  my \acu                = 1e-15;

  # Check the input arguments.
  return $value if A <= 0;# || $q <= 0.0;

(What's that $q about in the above line?)

Note that I dropped the types on beta and acu. The value assigned to the beta variable will be a Num anyway because lgamma returns a Num. The value assigned to acu will also be a Num because use of the e exponent in a number literal means the value it constructs is a Num.


  return $value if $value < 0.0 || 1.0 < $value;
  # Special cases
  return $value if $value == 0.0 || $value == 1.0;

became:

  return $value unless $value ~~ 0^..^1;

Read the 0^ as "above zero" (not including zero) and the ^1 as "up to one" (not including one).

The ~~ is the "smart match" operator. It returns True if the value on its right accepts the value on its left.

So this return statement returns $value if $value is less than or equal to 0 or greater than or equal to 1.


As a minor tidy up I moved a my declaration of a load of variables that I dropped out in the above rewrite to this point, immediately before they became relevant, and added $ns too:

  my ($ai, $cx, $indx, $pp, $psq, $qq, $rx, $temp, $term, $xx, $ns);

  $psq = $A + 0.5;
  $cx = 1.0 - $value;
  if $A < $psq * $value {
      ($xx, $cx, $pp, $qq, $indx) = ($cx, $value, 0.5, $A, 1);
  } else {
      ($xx, $pp, $qq, $indx) = ($value, $A, 0.5, 0);
  }

became:

  $psq = A + 0.5;
  $cx = 1 - $value;
  ($xx, $cx, $pp, $qq, $indx) =
        A < $psq * $value
          ?? ($cx, $value, 0.5, A, 1)
          !! ($value, $cx, A, 0.5, 0);

I rewrote the conditional assignment of a bunch of variables as a ternary so it was easier to see what got assigned to what.


  $term = 1.0;
  $ai = 1.0;
  $value = 1.0;
  $ns = $qq + $cx * $psq;
  $ns = $ns.Int;

became:

  $term = 1;
  $ai = 1;
  $value = 1;
  $ns = ($qq + $cx * $psq) .Int;

Replacing 1.0s with 1s and combining the expression assigned to $ns with the .Int coercion.

(I stripped types from the code as I translated and it continued to calculate the correct results except that removing the above Int coercion made the code infiniloop. This is what finally led me to search the net to see if I could find the code @con was translating. That's when I found it on rosettacode.org. It was explicitly typed as an integer in the code I saw so presumably it's central to ensuring the algorithm works.)


  #Soper reduction formula.
  $rx = $xx / $cx;
  $temp = $qq - $ai;
  $rx = $xx if $ns == 0;

(Didn't change.)


  while (True) {

became:

  loop {

    $term = $term * $temp * $rx / ( $pp + $ai );
    $value = $value + $term;
    $temp = $term.abs;

(Didn't change.)


    if $temp <= $acu && $temp <= $acu * $value {
      $value = $value * ($pp * $xx.log + ($qq - 1.0) * $cx.log - $beta).exp / $pp;
      $value = 1.0 - $value if $indx;
      last;
    }

became:

    if $temp <= acu & acu * $value {
      $value = $value * ($pp * $xx.log + ($qq - 1) * $cx.log - beta).exp / $pp;
      $value = 1 - $value if $indx;
      last;
    }

This time the condition containing the junction (&) reads in English as "if temp is less than or equal to both acu and acu times value".


    $ai++;
    $ns--;
    if 0 <= $ns {
        $temp = $qq - $ai;
        $rx = $xx if $ns == 0;
    } else {
        $temp = $psq;
        $psq = $psq + 1;
    }
  }
  return $value;
}

I just replaced 1.0 with 1.


Now the problematic array. As I wrote at the start, I'm pretty sure you have (or the supplier of your data has) just forgotten a couple lines:

my @array2d =

  [27.5,21.0,19.0,23.6,17.0,17.9,16.9,20.1,21.9,22.6,23.1,19.6,19.0,21.7,21.4],
  [27.1,22.0,20.8,23.4,23.4,23.5,25.8,22.0,24.8,20.2,21.9,22.1,22.9,20.5,24.4],

  [17.2,20.9,22.6,18.1,21.7,21.4,23.5,24.2,14.7,21.8],
  [21.5,22.8,21.0,23.0,21.6,23.6,22.5,20.7,23.4,21.8,20.7,21.7,21.5,22.5,23.6,21.5,22.5,23.5,21.5,21.8],

  [19.8,20.4,19.6,17.8,18.5,18.9,18.3,18.9,19.5,22.0],
  [28.2,26.6,20.1,23.3,25.2,22.1,17.7,27.6,20.6,13.7,23.2,17.5,20.6,18.0,23.9,21.6,24.3,20.4,24.0,13.2],

  [30.02,29.99,30.11,29.97,30.01,29.99],
  [29.89,29.93,29.72,29.98,30.02,29.98],

  [3.0,4.0,1.0,2.1],
  [490.2,340.0,433.9],

  [<1.0/15.0>, <10.0/62.0>],
  [<1.0/10>, <2/50.0>],

  [0.010268,0.000167,0.000167],
  [0.159258,0.136278,0.122389],

  [9/23.0,21/45.0,0/38.0],
  [0/44.0,42/94.0,0/22.0];

say @array2d[11][0];

Who or what says these are the correct answers? Are you 100% sure the 0.0033... answer goes with the [<1.0/15.0>, <10.0/62.0>],[<1.0/10>, <2/50.0>] data?

my @CORRECT_ANSWERS =
      0.021378001462867,
      0.148841696605327,
      0.0359722710297968,
      0.090773324285671,
      0.0107515611497845,
      0.00339907162713746,
      0.52726574965384,
      0.545266866977794;

And in the last bit I just removed types and once again used a superscript for a prettier value-plus-exponent (10⁻⁹):

my $i = 0;
my $error = 0;
for @array2d -> @left, @right {
  my $pvalue = pvalue(@left, @right);
  $error += ($pvalue - @CORRECT_ANSWERS[$i]).abs;
  say "$i [" ~ @left.join(',') ~ '] [' ~ @right ~ "] = $pvalue";
  if $error > 10⁻⁹ {
    say "\$p = $pvalue, but should be @CORRECT_ANSWERS[$i]";
    die;
  }
  #  printf("Test sets %u p-value = %.14g\n",$i+1,$pvalue);
  $i++
}

printf("the cumulative error is %g\n", $error);

Footnotes

1 Love that nice lgamma routine! (Turns out Brad Gilbert wrote it.)

sub lgamma ( Num(Real) \n --> Num ){
  use NativeCall;
  sub lgamma (num64 --> num64) is native {}
  lgamma( n )
}

Showcasing:

  • Shadowing a low level (C) routine with a high level P6 (Perl 6) routine with the same name to add further processing before calling the C function.

  • Explicitly converting the input type from the broader P6 type Real numbers to the narrower Num. Perls are "strongly typed" per the original technical definition of the term, but P6 provides additional options per several other interpretations of "strong typing" and relative to languages less capable of these other interpretations of "strong typing" (like Perl 5, Python, and C). Explicit type conversion is part of this shift in capability that P6 introduces.

  • Slashing the sigil. This is further discussed where I've done the same elsewhere in this post.

  • Lexically scoping use of a library. The inner lgamma routine, and the symbols imported by use Nativecall;, are not visible anywhere but inside the outer lgamma routine containing it.

  • Using NativeCall (the P6 C FFI) to allow high level P6 code to sweetly map directly to C code including automatic conversion from P6's IEEE double float compatible boxed type Num to the unboxed machine data type equivalent num64.

All in 5 lines! Very nice. :)


Third and final answer

.oO (He says, hopefully. Has anyone ever written four answers to an SO question?!? )

If you swap these lines in your data:

  [<1.0/15.0>, <10.0/62.0>],
  [<1.0/10>, <2/50.0>],

  [0.010268,0.000167,0.000167],
  [0.159258,0.136278,0.122389],

to instead be the other way around:

  [0.010268,0.000167,0.000167],
  [0.159258,0.136278,0.122389],

  [<1.0/15.0>, <10.0/62.0>],
  [<1.0/10>, <2/50.0>],

then the output of your program (at least my version of it from my rewrite of your code nanswer) is:

0.159258
0 [27.5,21,19,23.6,17,17.9,16.9,20.1,21.9,22.6,23.1,19.6,19,21.7,21.4] [27.1 22 20.8 23.4 23.4 23.5 25.8 22 24.8 20.2 21.9 22.1 22.9 20.5 24.4] = 0.02137800146286709
1 [17.2,20.9,22.6,18.1,21.7,21.4,23.5,24.2,14.7,21.8] [21.5 22.8 21 23 21.6 23.6 22.5 20.7 23.4 21.8 20.7 21.7 21.5 22.5 23.6 21.5 22.5 23.5 21.5 21.8] = 0.14884169660532756
2 [19.8,20.4,19.6,17.8,18.5,18.9,18.3,18.9,19.5,22] [28.2 26.6 20.1 23.3 25.2 22.1 17.7 27.6 20.6 13.7 23.2 17.5 20.6 18 23.9 21.6 24.3 20.4 24 13.2] = 0.035972271029797116
3 [30.02,29.99,30.11,29.97,30.01,29.99] [29.89 29.93 29.72 29.98 30.02 29.98] = 0.09077332428566681
4 [3,4,1,2.1] [490.2 340 433.9] = 0.010751561149784494
5 [0.010268,0.000167,0.000167] [0.159258 0.136278 0.122389] = 0.003399071627137453
6 [1.0/15.0,10.0/62.0] [1.0/10 2/50.0] = 0.5272657496538401
7 [0.391304,0.466667,0] [0 0.446809 0] = 0.5452668669777938
the cumulative error is 5.50254e-15

In retrospect this was very obviously what was wrong. 2020 hindsight and all that. :)

Tags:

Raku