Greedy/Non-Greedy pattern matching and optional suffixes in Lua

I don't have Lua 4 installed, but this pattern works under Lua 5, and I would expect it to work under Lua 4 as well.

Update 1: Since additional requirements have been specified (localization) I've adapted the pattern and the tests to reflect these.

Update 2: Updated the pattern and tests to deal with an additional class of text containing a number as mentioned by @IanBoyd in the comments. Added an explanation of the string pattern.

Update 3: Added variation for the case where the localized number is dealt with separately as mentioned in the last update to the question.

Try:

"(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

Or, with no attempt to validate the number's localization tokens, take anything that is not a letter and use a digit as the sentinel that ends the number:

"(([%+%-][^%a]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

Neither of the patterns above is meant to deal with numbers in scientific notation (e.g. 1.23e+10).
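
To illustrate how the lazy `-` quantifier and the digit sentinel interact in the number capture, here is a minimal sketch. The sample string and variable names are only for illustration; the number sub-pattern is the one from the first pattern above.

```lua
local s = "+123 456 Critical Strike"

-- Lazy set on its own: it expands as little as possible, so the match
-- ends at the first run of digits.
local lazyAlone = s:match("[%+%-][',%.%d%s]-%d+")

-- Greedy set: it grabs every digit and separator, then backs off until
-- the digit sentinel can match.
local greedyAlone = s:match("[%+%-][',%.%d%s]*%d+")

-- Lazy set followed by more pattern: it keeps expanding until the
-- remainder (optional spaces, then a letter) can match.
local lazyInContext = s:match("([%+%-][',%.%d%s]-%d+)%s*%a")

print(lazyAlone)      -- +123
print(greedyAlone)    -- +123 456
print(lazyInContext)  -- +123 456
```

On its own the lazy form stops early, but once the rest of the pattern has to match, the number capture is pushed out to the last digit before the text.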

Lua5 test (Edited to clean up - tests getting cluttered):

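-- Print the four captures for every test string. tostring() keeps
-- string.format from raising an error on Lua 5.1 when a line does not match.
function test(tab, pattern)
  for i, v in ipairs(tab) do
    local f1, f2, f3, f4 = v:match(pattern)
    print(string.format("Test{%d} - Whole:{%s}\nFirst:{%s}\nSecond:{%s}\nThird:{%s}\n",
      i, tostring(f1), tostring(f2), tostring(f3), tostring(f4)))
  end
end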

 local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
 local testing = {"+123 Parry",
   "+123 Critical Strike",
   "-123 Parry",
   "-123 Critical Strike",
   "+123 Parry (Reforged from Dodge)",
   "+123 Critical Strike (Reforged from Dodge)",
   "-123 Parry (Reforged from Hit Chance)",
   "-123 Critical Strike (Reforged from Hit Chance)",
   "+122384    Critical    Strike      (Reforged from parry chance)",
   "+384 Critical Strike ",
   "+384Critical Strike (Reforged from parry chance)",
   "+1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+12345 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+123456 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
   "-1 MoUnT aNd RuN sPeEd InCrEaSe (Reforged from CrItIcAl StRiKe ChAnCe)",
   "-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+123,456 +1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+123.456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+123'456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+123 456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+1,23,456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
   "+9 mana every 5 sec",
   "-9 mana every 20 min (Does not occurr in data but gets captured if there)"}
 test(testing, pattern)

Here's a breakdown of the pattern:

local explainPattern =  
   "(" -- start whole string capture
   ..
   --[[
   capture localized number with sign - 
   take at first as few digits and separators as you can 
   ensuring the capture ends with at least 1 digit
   (the last digit is our sentinel enforcing the boundary)]]
   "([%+%-][',%.%d%s]-[%d]+)" 
   ..
   --[[
   gobble as much space as you can]]
   "%s*"
   ..
   --[[
   capture start with letters, followed by anything which is not a bracket 
   ending with at least 1 letter]]
   "([%a]+[^%(^%)]+[%a]+)"
   ..
   --[[
   gobble as much space as you can]]
   "%s*"
   ..
   --[[
   capture an optional bracket
   followed by 0 or more letters and spaces
   ending with an optional bracket]]
   "(%(?[%a%s]*%)?)"
   .. 
   ")" -- end whole string capture

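As a quick check of the breakdown, the sketch below applies the same pattern to single strings and names the captures. `parse` and the field names are hypothetical helpers introduced here, not part of the original answer.

```lua
local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

-- Returns a table with the three inner captures, or nil when the line
-- does not match at all.
local function parse(line)
  local whole, number, stat, suffix = line:match(pattern)
  if not whole then return nil end
  return { number = number, stat = stat, suffix = suffix }
end

local a = parse("+123 Critical Strike (Reforged from Dodge)")
local b = parse("-123 Parry")

print(a.number, a.stat, a.suffix)  -- +123  Critical Strike  (Reforged from Dodge)
print(b.number, b.stat, "[" .. b.suffix .. "]")  -- -123  Parry  []
```

Note that the optional suffix comes back as an empty string, not nil, when the bracketed part is absent. Also, because the text capture needs a letter, at least one more character, and a final letter, it only matches names of three characters or more.
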
A simpler approach: instead of only matching the pattern, you can produce the reformatted string you need directly with string.gsub.

Example:

local testing = {"+123 Parry",
"+123 Critical Strike",
"-123 Parry",
"-123 Critical Strike",
"+123 Parry (Reforged from Dodge)",
"+123 Critical Strike (Reforged from Dodge)",
"-123 Parry (Reforged from Hit Chance)",
"-123 Critical Strike (Reforged from Hit Chance)",
"+122384    Critical    Strike      (Reforged from parry chance)",
"+384 Critical Strike ",
"+384Critical Strike (Reforged from parry chance)",
"+1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+12345 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123456 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
"-1 MoUnT aNd RuN sPeEd InCrEaSe (Reforged from CrItIcAl StRiKe ChAnCe)",
"-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123,456 +1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123.456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123'456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123 456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+1,23,456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+9 mana every 5 sec",
"-9 mana every 20 min (Does not occurr in data but gets captured if there)"}

for k,v in ipairs(testing) do
  local result = string.gsub(v, "([%+%-][',%.%d%s]-[%+%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?)", '(%1) (%2) %3')
  print(result)
end

Output

(+123) (Parry) 
(+123) (Critical Strike) 
(-123) (Parry) 
(-123) (Critical Strike) 
(+123) (Parry) (Reforged from Dodge)
(+123) (Critical Strike) (Reforged from Dodge)
(-123) (Parry) (Reforged from Hit Chance)
(-123) (Critical Strike) (Reforged from Hit Chance)
(+122384) (Critical    Strike) (Reforged from parry chance)
(+384) (Critical Strike) 
(+384) (Critical Strike) (Reforged from parry chance)
(+1234) (Critical Strike Chance) (Reforged from CrItIcAl StRiKe ChAnCe)
(+12345) (Mount and run speed increase) (Reforged from CrItIcAl StRiKe ChAnCe)
(+123456) (Mount and run speed increase) (Reforged from CrItIcAl StRiKe ChAnCe)
(-1) (MoUnT aNd RuN sPeEd InCrEaSe) (Reforged from CrItIcAl StRiKe ChAnCe)
(-1) (HiT) (Reforged from CrItIcAl StRiKe ChAnCe)
(+123,456 +1234) (Critical Strike Chance) (Reforged from CrItIcAl StRiKe ChAnCe)
(+123.456) (Critical Strike Chance) (Reforged from CrItIcAl StRiKe ChAnCe)
(+123'456) (Critical Strike Chance) (Reforged from CrItIcAl StRiKe ChAnCe)
(+123 456) (Critical Strike Chance) (Reforged from CrItIcAl StRiKe ChAnCe)
(+1,23,456) (Critical Strike Chance) (Reforged from CrItIcAl StRiKe ChAnCe)
(+9) (mana every 5 sec) 
(-9) (mana every 20 min) (Does not occurr in data but gets captured if there)
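
string.gsub also returns the number of substitutions as a second value, which gives a cheap way to tell whether a line matched at all. A small sketch, assuming the same pattern as in the loop above; `reformat` is a hypothetical helper name.

```lua
local pattern = "([%+%-][',%.%d%s]-[%+%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?)"

-- Returns the rewritten line, or nil when nothing was substituted.
local function reformat(line)
  local result, count = string.gsub(line, pattern, "(%1) (%2) %3")
  if count == 0 then return nil end
  return result
end

print(reformat("+123 Parry (Reforged from Dodge)"))  -- (+123) (Parry) (Reforged from Dodge)
print(reformat("Parry"))                             -- nil
```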

Why parse this in one pattern when you can use several?

First, get the number:

-- %s* skips any whitespace between the number and the text that follows.
local num, rest = string.match(test_string, "([%+%-]?%d+)%s*(.+)")

Then make a table that enumerates the possibilities for the hit type.

local hitTypes =
{
  "Hit",
  "Critical Strike",
  -- Insert more.
}

Now, iterate over the list, testing against each one.

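local hitIndex = nil
local reforge = nil

for i, htype in ipairs(hitTypes) do
  -- Anchor at the start so "Hit" cannot match inside "(Reforged from Hit Chance)".
  local final = string.match(rest, "^%s*" .. htype .. "%s*(.*)")
  if final then
    hitIndex = i
    reforge = string.match(final, "%(Reforged from (.+)%)")
    break -- stop at the first matching type
  end
end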
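
Putting the three steps together, a self-contained sketch might look like the following. `parseStat` is a hypothetical name, the `hitTypes` entries are example values, and the patterns assume whitespace (`%s*`) between the parts and a name anchored at the start of the remaining text.

```lua
-- Example list of known names; extend as needed.
local hitTypes = { "Hit", "Critical Strike", "Parry" }

-- Returns number, index into hitTypes, and the reforge source (or nil).
-- Returns nil when the line has no leading number or no known name.
local function parseStat(line)
  -- Step 1: split off the signed number and skip whitespace after it.
  local num, rest = string.match(line, "^([%+%-]?%d+)%s*(.+)$")
  if not num then return nil end

  -- Step 2: find the known name that the remainder starts with.
  for i, htype in ipairs(hitTypes) do
    local final = string.match(rest, "^" .. htype .. "%s*(.*)$")
    if final then
      -- Step 3: the reforge note is optional, so this may be nil.
      local reforge = string.match(final, "^%(Reforged from (.+)%)$")
      return tonumber(num), i, reforge
    end
  end
  return nil
end

print(parseStat("+123 Critical Strike (Reforged from Dodge)"))  -- 123  2  Dodge
```

Names that share a prefix (say, a "Hit Chance" entry next to "Hit") would need the longer one listed first, and names containing pattern magic characters would need escaping before being concatenated into the pattern.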

Lua patterns are limited, so it's best to use actual code to avoid their limitations.
