How can I reverse a string that contains combining characters in Perl?
You can use the \X escape (it matches one extended grapheme cluster: a base character plus any combining characters that follow it) with split
to make a list of graphemes. Because the pattern is in capturing parentheses, split returns the graphemes themselves, with empty strings between them. Reverse that list, then join
it back together:
#!/usr/bin/perl
use strict;
use warnings;
my $original = "re\x{0301}sume\x{0301}";
my $wrong = reverse $original;
my $right = join '', reverse split /(\X)/, $original;
print "original: $original\n",
"wrong: $wrong\n",
"right: $right\n";
The best answer is to use Unicode::GCString, as Sinan points out.
I modified Chas's example a bit:
- Set the encoding on STDOUT to avoid "wide character in print" warnings;
- I originally also used a positive lookahead assertion in split (with no separator retention mode), but that apparently doesn't work after 5.10, so I removed it.
It's basically the same thing with a couple of tweaks.
use strict;
use warnings;
binmode STDOUT, ":utf8";
my $original = "re\x{0301}sume\x{0301}";
my $wrong = reverse $original;
my $right = join '', reverse split /(\X)/, $original;
print <<HERE;
original: [$original]
wrong: [$wrong]
right: [$right]
HERE
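As a variation on the split approach above, a global \X match in list context returns the graphemes directly, so no empty strings end up in the list. This is only a minimal sketch: the reverse_graphemes helper is a hypothetical name of my own, not something from either answer, and I use ":encoding(UTF-8)" on STDOUT where the example above uses ":utf8".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answers above): reverse a string
# one grapheme cluster at a time.
sub reverse_graphemes {
    my ($string) = @_;

    # In list context, /\X/g returns every grapheme cluster it matches,
    # with no empty strings in between.
    my @graphemes = $string =~ /\X/g;

    return join '', reverse @graphemes;
}

binmode STDOUT, ':encoding(UTF-8)';

my $original = "re\x{0301}sume\x{0301}";
print "original: [$original]\n";
print "reversed: [", reverse_graphemes($original), "]\n";
```

Each combining acute accent stays attached to its "e", the same result as the split version.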