How can I break apart fixed-width columns in Perl?

I like using unpack for this sort of thing. It's fast, flexible, and reversible.

You just need to know the positions for each column, and unpack's A format automatically trims the trailing whitespace from each column.

If you change something in one of the columns, it's easy to go back to the original format by repacking with the same format:

use strict;
use warnings;

my $format = 'A23 A8 A7 A*';
my @grades;

while( <DATA> ) {
    chomp( my $line = $_ );

    my( $machine, $year, $letter, $sentence ) =
        unpack( $format, $line );

    # save the original line too, which might be useful later
    push @grades, [ $machine, $year, $letter, $sentence, $_ ];
    }

# sort the records by the letter in the third column
my @sorted = sort { $a->[2] cmp $b->[2] } @grades;

foreach my $tuple ( @sorted ) {
    print $tuple->[-1];
    }

# go the other way, especially if you changed things
foreach my $tuple ( @sorted ) {
    print pack( $format, @$tuple[0..3] ), "\n";
    }

__END__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5
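To see the trimming and padding in isolation, here is a minimal sketch of one record going through unpack and back through pack. The record is built with sprintf so its widths match the 23/8/7 format exactly; changing the letter to 'B' is just a hypothetical edit.

```perl
use strict;
use warnings;

# Same format as above: 23, 8 and 7 characters, then the rest of the line.
my $format = 'A23 A8 A7 A*';

# Build one fixed-width record whose columns match the format widths.
my $record = sprintf '%-23s%-8s%-7s%s', 'darren.local', '1987', 'A', 'Sentence1';

# 'A' fields lose their trailing spaces when unpacked.
my @cols = unpack( $format, $record );
print join( '|', @cols ), "\n";   # darren.local|1987|A|Sentence1

# Change one column, then pack: 'A' fields are padded with spaces again.
$cols[2] = 'B';
my $repacked = pack( $format, @cols );
print "$repacked\n";
```

Because pack pads each A field back out to its width, the repacked line lines up with the untouched records.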

Now, there's an additional consideration. It sounds like you might have this big chunk of multi-line text in a single variable. Handle this as you would a file by opening a filehandle on a reference to the scalar. The filehandle stuff takes care of the rest:

my $lines = '...multiline string...';

open my $fh, '<', \$lines or die "Cannot open in-memory file: $!";

while( <$fh> ) {
    # ... same as before ...
    }
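A self-contained sketch of the in-memory filehandle approach, assuming the data sits in a scalar named $lines (the two rows here are made up and built with sprintf so they match the format widths):

```perl
use strict;
use warnings;

my $format = 'A23 A8 A7 A*';

# Hypothetical multi-line string standing in for the real data.
my $lines = '';
foreach my $row ( [ 'darren.local', 1987, 'A', 'Sentence1' ],
                  [ 'darren.local', 1996, 'C', 'Sentence2' ] ) {
    $lines .= sprintf "%-23s%-8s%-7s%s\n", @$row;
}

# Open a read-only filehandle on a reference to the scalar.
open my $fh, '<', \$lines or die "Cannot open in-memory file: $!";

my @rows;
while ( my $line = <$fh> ) {
    chomp $line;
    push @rows, [ unpack( $format, $line ) ];
}
close $fh;

print scalar(@rows), " rows; first letter is $rows[0][2]\n";
```

The loop body is the same as for a real file; only the open call changes.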

use strict;
use warnings;

# this puts each line in the array @lines
my @lines = <DATA>; # <DATA> is a special filehandle that treats
                    # everything after __END__ as if it was a file.
                    # It's handy for testing things

# Iterate over the array of lines and for each iteration
# put that line into the variable $line
foreach my $line (@lines) {
   chomp $line;  # drop the trailing newline so it doesn't end up in $col4

   # Use split to 'split' each $line with the regular expression /\s+/
   # /\s+/ means match one or more whitespace characters.
   # The limit of 4 means split returns at most 4 fields: after the
   # third separator, the rest of the line (spaces included) goes
   # into $col4
   my ($col1, $col2, $col3, $col4) = split(/\s+/, $line, 4);

   # here you can do whatever you need to with the data
   # in the columns. I just print them out
   print "$col1, $col2, $col3, $col4\n";
}


__END__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5
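The sample sentences are single words, so the effect of the limit is easy to miss. This sketch uses a hypothetical line whose last column contains spaces:

```perl
use strict;
use warnings;

my $line = "darren.local   1987   A   a sentence with spaces\n";
chomp $line;

# Without a limit, every run of whitespace is a separator.
my @all = split /\s+/, $line;

# With a limit of 4, split stops after four fields, so the remaining
# spaces stay inside the last field.
my @four = split /\s+/, $line, 4;

print scalar(@all), " fields without a limit\n";   # 7 fields without a limit
print "last field: '$four[3]'\n";                   # last field: 'a sentence with spaces'
```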

Assuming that the text is in a single variable $info, you can split it into separate lines using Perl's built-in split function:

my @lines = split("\n", $info); 

where @lines is an array of your lines. The "\n" is the regex for a newline. You can loop through each line as follows:

foreach my $line (@lines) {
   # do something with $line....
}
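A runnable sketch of those two steps, using a made-up two-line $info:

```perl
use strict;
use warnings;

# Hypothetical text standing in for the real $info.
my $info = "darren.local 1987 A Sentence1\ndarren.local 1996 C Sentence2\n";

# Split on newlines; trailing empty fields are discarded,
# so the final "\n" does not add an empty line.
my @lines = split /\n/, $info;

foreach my $line (@lines) {
    print "got: $line\n";
}
```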

You can then split each line on whitespace (regex \s+, where the \s is one whitespace character, and the + means 1 or more times):

my @fields = split(/\s+/, $line);

and you can then access each field directly via its array index: $fields[0], $fields[1] etc.

or, you can do:

my ($var1, $var2, $var3, $var4) = split(/\s+/, $line);

which will put the fields in each line into separate named variables.
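One caveat: passing the pattern as a double-quoted string such as "\s+" does not work as intended. Inside double quotes \s is not a recognized escape, so the string becomes "s+" and split breaks on the letter s (use warnings reports "Unrecognized escape \s passed through"). A regex literal avoids this. The sketch below also shows the special ' ' pattern, which splits on whitespace like /\s+/ but skips leading whitespace; the indented line is hypothetical.

```perl
use strict;
use warnings;

my $line = '  darren.local   1991   E   Sentence3';

# A regex literal keeps \s meaning "any whitespace character".
# Leading whitespace produces an empty first field.
my @regex = split /\s+/, $line;

# The single-space string ' ' is special-cased: it splits on runs of
# whitespace and ignores leading whitespace, like awk.
my @awk = split ' ', $line;

print scalar(@regex), " vs ", scalar(@awk), " fields\n";   # 5 vs 4 fields
```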

Now - if you want to sort your lines by the character in the third column, you could do this:

my @lines = split("\n", $info);
my @arr = ();    # declare new array

foreach (@lines) {
   my @fields = split(/\s+/, $_);
   push(@arr, \@fields);   # add @fields REFERENCE to @arr
}

Now you have an "array of arrays". This can easily be sorted as follows:

my @sorted = sort { $a->[2] cmp $b->[2] } @arr;

which will sort @arr by the 3rd element (index 2) of each row. Use cmp rather than <=> here: the third column holds letters, and <=> compares numbers.
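A small self-contained version of the array-of-arrays sort, with three sample rows taken from the question's data in shuffled order:

```perl
use strict;
use warnings;

# Build an array of arrays: one array reference of fields per line.
my @arr;
foreach my $line ( 'darren.local 1996 C Sentence2',
                   'darren.local 1987 A Sentence1',
                   'darren.local 1991 E Sentence3' ) {
    push @arr, [ split ' ', $line ];
}

# Compare the third field (index 2) as a string.
my @sorted = sort { $a->[2] cmp $b->[2] } @arr;

print join( ' ', @$_ ), "\n" for @sorted;
```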

Edit 2: To collect lines that share the same third column, one hash entry per value, do this:

my %hash = ();             # declare new hash

foreach my $line (@arr) {  # loop through lines (each one an array reference)
  my @fields = @$line;     # dereference the field array

  my $el = $fields[2];     # get our key - the character in the third column
  my $text = join(' ', @fields);   # rebuild the line's text from its fields

  if (exists $hash{ $el }) {           # check if key already in hash
     $hash{ $el } .= "\n" . $text;     # append the new line to the current value
  } else {
     $hash{ $el } = $text;             # first line seen for this key
  }
}

Now you have a hash keyed with the third column characters, with the value for each key being the lines that contain that key. You can then loop through the hash and print out or otherwise use the hash values.
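A common alternative, swapped in here rather than taken from the answer above, is a hash of arrays: each key holds an array reference of lines instead of one newline-joined string, which makes the lines easier to loop over later. The third sample line is hypothetical, added so one key has two lines.

```perl
use strict;
use warnings;

my @lines = (
    'darren.local 1987 A Sentence1',
    'darren.local 1996 C Sentence2',
    'darren.local 2001 A Sentence6',   # hypothetical extra 'A' line
);

# push @{ $groups{$key} } creates the array reference on first use
# (autovivification), so no exists check is needed.
my %groups;
foreach my $line (@lines) {
    my @fields = split ' ', $line;
    push @{ $groups{ $fields[2] } }, $line;
}

foreach my $key ( sort keys %groups ) {
    print "$key:\n";
    print "  $_\n" for @{ $groups{$key} };
}
```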
