What is the fastest way to turn every member of an array alphanumeric?
tl;dr - Regular expressions destroy VBA implementations. If this were a code challenge, @brettj or @Slai should win it.
There are a bunch of tricks to make your AlphaNumericOnly
faster.
First, you can get rid of the vast majority of the function calls by treating it as a byte array instead of a string. That removes all of the calls to Mid$
and Asc
. Although these are incredibly fast functions, they still add the overhead pushing onto and popping off of the call stack. That adds up over a couple hundred thousand iterations.
The second optimization is to not use Case x To y
syntax if you can avoid it. The reason has to do with how it compiles - it doesn't compile to a test like Case = Condition >= x And Condition <= y
, it actually creates a loop with an early exit condition like this:
Case = False
For i = x To y
If Condition = i Then
Case = True
End If
Next
Again, not a huge performance hit, but it adds up. The third optimization is to order your tests in a way that makes them sort circuit on the most likely hits in your data set. I tailored my examples below for primarily letters, with most of them upper case. You may do better with different ordering. Put it all together and you get something that looks like this:
Public Function ByteAlphaNumeric(source As Variant) As String
Dim chars() As Byte
Dim outVal() As Byte
chars = CStr(source) 'Load the array up.
Dim bound As Long
bound = UBound(chars) 'Size the outbound array.
ReDim outVal(bound)
Dim i As Long, pos As Long
For i = 0 To bound Step 2 'Wide characters, only care about the ASCII range.
Dim temp As Byte
temp = chars(i) 'Pointer math isn't free. Cache it.
Select Case True 'Order is important here.
Case temp > 64 And temp < 91
outVal(pos) = temp
pos = pos + 2 'Advance the output pointer.
Case temp < 48
Case temp > 122
Case temp > 96
outVal(pos) = temp
pos = pos + 2
Case temp < 58
outVal(pos) = temp
pos = pos + 2
End Select
Next
'This is likely the most expensive operation.
ReDim Preserve outVal(pos) 'Trim the output array.
ByteAlphaNumeric = outVal
End Function
How does it do? Pretty well:
Public Sub Benchmark()
Dim starting As Single, i As Long, dummy As String, sample As Variant
sample = GetRandomString
starting = Timer
For i = 1 To 1000000
dummy = AlphaNumericOnlyOP(sample)
Next i
Debug.Print "OP's AlphaNumericOnly: ", Timer - starting
starting = Timer
For i = 1 To 1000000
dummy = AlphaNumericOnlyThunderframe(sample)
Next i
Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting
starting = Timer
For i = 1 To 1000000
dummy = AlphaNumeric(sample)
Next i
Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting
starting = Timer
For i = 1 To 1000000
dummy = ByteAlphaNumeric(sample)
Next i
Debug.Print "ByteAlphaNumeric: ", Timer - starting
Dim cast As String
cast = CStr(sample)
starting = Timer
For i = 1 To 1000000
dummy = ByteAlphaNumericString(cast)
Next i
Debug.Print "ByteAlphaNumericString: ", Timer - starting
Set stripper = Nothing
starting = Timer
For i = 1 To 1000000
dummy = OptimizedRegex(sample)
Next i
Debug.Print "OptimizedRegex: ", Timer - starting
End Sub
Private Function GetRandomString() As Variant
Dim chars(30) As Byte, i As Long
Randomize
For i = 0 To 30 Step 2
chars(i) = Int(96 * Rnd + 32)
Next i
Dim temp As String
temp = chars
GetRandomString = CVar(temp)
End Function
Results with a 15 character random String
:
OP`s AlphaNumericOnly: 6.565918 ThunderFrame`s AlphaNumericOnly: 3.617188 CallumDA33`s AlphaNumeric: 23.518070 ByteAlphaNumeric: 2.354980
Note, I omitted submissions that weren't trivial to convert to functions. You may notice 2 additional test - the ByteAlphaNumericString
is exactly the same as the ByteAlphaNumeric
function, but it takes a String
as input instead of a Variant
and gets rid of the cast. That's not trivial:
ByteAlphaNumericString: 2.226074
And finally, the elusive OptimizedRegex
function (basically @brettj's code in function form for comparison timing):
Private stripper As RegExp 'Module level
Function OptimizedRegex(strSource As Variant) As String
If stripper Is Nothing Then
Set stripper = New RegExp
With stripper
.Global = True
.Pattern = "[^0-9A-Za-z]"
End With
End If
OptimizedRegex = stripper.Replace(strSource, vbNullString)
End Function
OptimizedRegex: 1.094727
EDIT: Bonus implementation!
It occurred to me that a hash table lookup might be faster than a Select Case
structure, so I built one with using a Scripting.Dictionary
:
Private hash As Scripting.Dictionary 'Module level
Function HashLookups(source As Variant) As String
Dim chars() As Byte
Dim outVal() As Byte
chars = CStr(source)
Dim bound As Long
bound = UBound(chars)
ReDim outVal(bound)
Dim i As Long, pos As Long
With hash
For i = 0 To bound Step 2
Dim temp As Byte
temp = chars(i)
If .Exists(temp) Then
outVal(pos) = temp
pos = pos + 2
End If
Next
End With
ReDim Preserve outVal(pos)
HashLookups = outVal
End Function
Private Sub LoadHashTable()
Set hash = New Scripting.Dictionary
Dim i As Long
For i = 48 To 57
hash.Add i, vbNull
Next
For i = 65 To 90
hash.Add i, vbNull
Next
For i = 97 To 122
hash.Add i, vbNull
Next
End Sub
'Test code:
starting = Timer
LoadHashTable
For i = 1 To 1000000
dummy = HashLookups(sample)
Next i
Debug.Print "HashLookups: ", Timer - starting
It turned out to be not too shabby:
HashLookups: 1.655273
Final Version
Woke up and thought I'd try a vector lookup instead of a hash lookup (just fill a byte array of values to keep and use that for tests). This seems reasonable in that it's only a 256 element array - basically a truth table:
Private lookup(255) As Boolean 'Module level
Function VectorLookup(source As Variant) As String
Dim chars() As Byte
Dim outVal() As Byte
chars = CStr(source)
Dim bound As Long
bound = UBound(chars)
ReDim outVal(bound)
Dim i As Long, pos As Long
For i = 0 To bound Step 2
Dim temp As Byte
temp = chars(i)
If lookup(temp) Then
outVal(pos) = temp
pos = pos + 2
End If
Next
ReDim Preserve outVal(pos)
VectorLookup = outVal
End Function
Private Sub GenerateTable()
Dim i As Long
For i = 48 To 57
lookup(i) = True
Next
For i = 65 To 90
lookup(i) = True
Next
For i = 97 To 122
lookup(i) = True
Next
End Sub
Assuming that the lookup table is only generated once, it's clocking in somewhere around 10-15% faster than any other pure VBA method above.
Not sure if this would be faster because it depends on too many factors, but might be worth testing. Instead of Regex.Replace each value separately, you can get the copied Range text from the clipboard and replace all values at once. Note that \w
matches underscore and Unicode letters too, so being more specific in the regular expression can make it faster.
'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing
Dim r As Range, s As String
Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000
With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
r.Copy
.GetFromClipboard
Application.CutCopyMode = False
s = .GetText
.Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text"
With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp")
.Global = True
'.IgnoreCase = False ' .IgnoreCase is False by default
.Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters
s = .Replace(s, vbNullString)
End With
.SetText s
.PutInClipboard
End With
' about 70% of the time is spent here in pasting the data
r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1
'Debug.Print Timer - t
I expect this to be slower for less values because of the clipboard overhead, and maybe slower for a lot more values because of the memory needed.
Disabling events didn't seem to make difference in my tests, but might be worth trying.
Note that there is a tiny chance of another application using the clipboard while the macro is using it.
If early binding causes issues from running the same compiled macro on different machines, you can search for macro decompiler or remove the references and switch to late binding.