Is there something akin to regex in AppleScript, and if not, what's the alternative?

Don't despair: since OS X, you can also access sed and grep through "do shell script". For example:

set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
set sedResult to do shell script thecommandstring
set isgood to sedResult starts with "*good*"

My sed skills aren't too crash hot, so there might be a more elegant way than wrapping the first run of ten digits ([0-9]{10}) in *good*(...) and then checking whether the result starts with *good*. But basically, if filename is "1234567890foo.mov", this will run the command:

echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"

Note the escaped quotes \" and escaped backslashes \\ in the AppleScript. If you're escaping things in the shell, you have to escape the escapes: to run a shell command that contains a backslash, you escape it for the shell as \\, and then escape each of those backslashes again for AppleScript, giving \\\\. This can get pretty hard to read.

So anything you can do on the command line, you can do by calling it from AppleScript (woohoo!). Whatever the command writes to stdout is returned to the script as the result of do shell script.
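To see what the shell half of this does, the same sed substitution can be tried directly in bash. This is a sketch, not the original code: the function name is_good is hypothetical, and it uses printf instead of echo so that names beginning with "-" are passed through unchanged. Because the sed pattern is not anchored and sed replaces the leftmost match, the result in effect starts with *good* only when the ten digits sit at the very beginning of the name.

```shell
#!/bin/bash
# Hypothetical helper mirroring the AppleScript above: wrap the first run of
# ten digits in *good*(...) with sed, then test whether the result starts
# with the *good* marker.
is_good() {
  local tagged
  # printf instead of echo, so names such as "-n.mov" are printed verbatim.
  tagged=$(printf '%s\n' "$1" | sed 's/[0-9]\{10\}/*good*(&)/')
  # Quoted part is literal; the trailing * is a glob for "anything after".
  [[ $tagged == '*good*'* ]]
}

printf '%s\n' "1234567890foo.mov" | sed 's/[0-9]\{10\}/*good*(&)/'
is_good "1234567890foo.mov" && echo "good"
is_good "foo1234567890.mov" || echo "not good"
```

Run with bash, this should print *good*(1234567890)foo.mov, then good, then not good: in the second name the digits are tagged too, but the tag is not at the start.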


There is an easier way to make use of the shell (works on bash 3.2+) for regex matching:

set isMatch to "0" = (do shell script ¬
  "[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")

Note:

  • Uses bash's [[ ... ]] conditional expression with the regex-matching operator =~. On bash 3.2+, the right operand (or at least its special regex characters) must be left unquoted, unless you prepend shopt -s compat31;
  • The do shell script statement executes the test and then reports its exit code via an additional command, printf $? (thanks, @LauriRanta); "0" indicates success.
  • Note that the =~ operator does not support shortcut character classes such as \d or assertions such as \b (true as of OS X 10.9.4; this is unlikely to change anytime soon).
  • For case-INsensitive matching, prepend the command string with shopt -s nocasematch;
  • For locale-awareness, prepend the command string with export LANG='" & user locale of (system info) & ".UTF-8';.
  • If the regex contains capture groups, you can access the captured strings via the built-in ${BASH_REMATCH[@]} array variable.
  • As in the accepted answer, you'll have to \-escape double quotes and backslashes.
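The points above can be tried directly in bash before wiring them into an AppleScript string. A minimal sketch, assuming bash 3.2 or later; fileName is just an example value, and the capture-group regex is an illustration added here, not part of the original answer:

```shell
#!/bin/bash
# Example value standing in for the AppleScript variable.
fileName="1234567890foo.mov"

# The regex on the right of =~ is left unquoted (required on bash 3.2+
# unless `shopt -s compat31` is in effect); printf $? reports the exit code.
[[ $fileName =~ ^[[:digit:]]{10} ]]; printf '%s\n' "$?"

# Capture groups are exposed via BASH_REMATCH; element 0 is the overall match.
if [[ $fileName =~ ^([[:digit:]]{10})(.*)\.([[:alnum:]]+)$ ]]; then
  printf '%s\n' "${BASH_REMATCH[@]}"
fi

# With nocasematch set, =~ ignores case; unset it again afterwards.
shopt -s nocasematch
[[ "FOO.MOV" =~ ^foo\.mov$ ]] && echo "case-insensitive match"
shopt -u nocasematch
```

This should print 0, then 1234567890foo.mov, 1234567890, foo and mov (one per line), then case-insensitive match.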

Here's an alternative using egrep:

set isMatch to "0" = (do shell script ¬
  "egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")

Though this presumably performs worse, it has two advantages:

  • You can use shortcut character classes such as \d and assertions such as \b
  • You can more easily make matching case-INsensitive by calling egrep with -i.
  • You canNOT, however, gain access to sub-matches via capture-groups; use the [[ ... =~ ... ]] approach if that is needed.
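Here is the grep side of that approach as a standalone sketch. Two deliberate deviations from the AppleScript version: it calls grep -E rather than egrep (recent GNU grep warns that egrep is obsolescent), and it uses [[:digit:]] instead of \d, because support for \d depends on which grep implementation is installed. grep -q prints nothing; the match result is carried entirely by the exit status, which printf $? turns into text.

```shell
#!/bin/bash
# Example value standing in for the AppleScript variable.
filename="1234567890foo.mov"

# -q: no output, exit status 0 on a match; the here-string feeds the name.
grep -Eq '^[[:digit:]]{10}' <<< "$filename"; printf '%s\n' "$?"

# -i makes the match case-insensitive, so FOO also matches "foo".
grep -Eiq '^[[:digit:]]{10}FOO' <<< "$filename"; printf '%s\n' "$?"
```

Both lines should print 0. Without -i, the second pattern would not match and grep would exit with 1.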

Finally, here are utility subroutines that package both approaches: doesMatch() uses egrep, and getMatch() uses bash's =~ operator.

# SYNOPSIS
#   doesMatch(text, regexString) -> Boolean
# DESCRIPTION
#   Matches string s against regular expression (string) regex using egrep's extended regular expression language, *including*
#   support for shortcut classes such as `\d` and assertions such as `\b`, and *returns a Boolean* to indicate whether
#   there is a match.
#    - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
#      a 'considering case' block.
#    - The current user's locale is respected.
# EXAMPLE
#    my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
on doesMatch(s, regex)
    local ignoreCase, extraGrepOption
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraGrepOption to "i"
    else
        set extraGrepOption to ""
    end if
    # Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
    #       Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
    tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
end doesMatch

# SYNOPSIS
#   getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
# DESCRIPTION
#   Matches string s against regular expression (string) regex using bash's extended regular expression language and
#   *returns the matching string and substrings matching capture groups, if any.*
#   
#   - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
#     a 'considering case' block.
#   - The current user's locale is respected.
#   
#   IMPORTANT: 
#   
#   Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
#   Instead, use one of the following POSIX classes (see `man re_format`):
#       [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
#       [[:alnum:]] [[:digit:]] [[:xdigit:]]
#       [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]] 
#       [[:graph:]]  [[:print:]] 
#   
#   Also, `\b`, `\B`, `\<`, and `\>` are not supported; you can use `[[:<:]]` for `\<` and `[[:>:]]` for `\>`.
#   
#   Always returns a *list*:
#    - an empty list, if no match is found
#    - otherwise, the first list element contains the matching string
#       - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
#  EXAMPLE
#       my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
on getMatch(s, regex)
    local ignoreCase, extraCommand
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraCommand to "shopt -s nocasematch; "
    else
        set extraCommand to ""
    end if
    # Note: 
    #  So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
    #  Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
    #  Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
    tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
    return paragraphs of result
end getMatch
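For reference, here is roughly what getMatch() asks bash to do, as a standalone function. The name get_match is hypothetical, and it swaps in a different technique from the subroutine above: instead of `shopt -s compat31` (which lets a single-quoted regex still act as a regex), it stores the regex in a variable and expands it unquoted, which bash 3.2+ treats as a regex. Case sensitivity and locale handling are left out of this sketch.

```shell
#!/bin/bash
# Hypothetical shell-side counterpart of getMatch(): prints the overall match
# and each capture group on its own line, or nothing if there is no match.
get_match() {
  local s=$1 re=$2
  # $re is expanded unquoted, so its contents are interpreted as a regex.
  if [[ $s =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[@]}"
  fi
}

get_match "127.0.0.1" '^([[:digit:]]{1,3})\.([[:digit:]]{1,3})\.([[:digit:]]{1,3})\.([[:digit:]]{1,3})$'
```

This should print 127.0.0.1, 127, 0, 0 and 1 on separate lines, matching the EXAMPLE in the getMatch() comments.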

I recently had need of regular expressions in a script, and wanted to find a scripting addition to handle them, so it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like the following:

find text "n(.*)" in "to be or not to be" with regexp

The only downside is that (as of 11/08/2010) it's a 32-bit addition, so it throws errors when it's called from a 64-bit process. This bit me in a Mail rule for Snow Leopard, as I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it's really great, lets you pick whatever regex syntax you want, and supports back-references.

Update 5/28/2011

Thanks to Mitchell Model for pointing out that Satimage has updated it to be 64-bit, so I have no more reservations: it does everything I need.