Parse CSV, ignoring commas inside string literals in VBA?
@Gimp said...
The current answers do not contain enough detail.
I'm running into the same problem. Looking for more detail in this answer.
To elaborate on @MRAB's answer:
Function ParseCSV(FileName)
Dim Regex 'As VBScript_RegExp_55.RegExp
Dim MatchColl 'As VBScript_RegExp_55.MatchCollection
Dim Match 'As VBScript_RegExp_55.Match
Dim FS 'As Scripting.FileSystemObject
Dim Txt 'As Scripting.TextStream
Dim CSVLine
ReDim ToInsert(0)
Set FS = CreateObject("Scripting.FileSystemObject")
Set Txt = FS.OpenTextFile(FileName, 1, False, -2)
Set Regex = CreateObject("VBScript.RegExp")
Regex.Pattern = """[^""]*""|[^,]*" '<- MRAB's answer
Regex.Global = True
Do While Not Txt.AtEndOfStream
ReDim ToInsert(0)
CSVLine = Txt.ReadLine
For Each Match In Regex.Execute(CSVLine)
If Match.Length > 0 Then
ReDim Preserve ToInsert(UBound(ToInsert) + 1)
ToInsert(UBound(ToInsert) - 1) = Match.Value
End If
Next
InsertArrayIntoDatabase ToInsert
Loop
Txt.Close
End Function
You need to customize the InsertArrayIntoDatabase Sub for your own table. Mine has several text fields named f00, f01, etc...
Sub InsertArrayIntoDatabase(a())
Dim rs As DAO.Recordset
Dim i, n
Set rs = CurrentDb().TableDefs("tbl").OpenRecordset()
rs.AddNew
For i = LBound(a) To UBound(a)
n = "f" & Format(i, "00") 'fields in table are f00, f01, f02, etc..
rs.Fields(n) = a(i)
Next
rs.Update
End Sub
Note that instead of using CurrentDb()
in InsertArrayIntoDatabase()
, you should really use a global variable that gets set to the value of CurrentDb()
before ParseCSV()
runs, because running CurrentDb()
in a loop is very slow, especially on a very large file.
The first way to solve this problem is to look at the structure of the line from the csv file (int,int,"String literal, will have at most one comma", etc). A naive solution would be (Assuming that the line don't have any semicolons)
Function splitLine1(line As String) As String()
Dim temp() As String
'Splits the line in three. The string delimited by " will be at temp(1)
temp = Split(line, Chr(34)) 'chr(34) = "
'Replaces the commas in the numeric fields by semicolons
temp(0) = Replace(temp(0), ",", ";")
temp(2) = Replace(temp(2), ",", ";")
'Joins the temp array with quotes and then splits the result using the semicolons
splitLine1 = Split(Join(temp, Chr(34)), ";")
End Function
This function only solves this particular problem. Another way to do the job is using the regular expression object from VBScript.
Function splitLine2(line As String) As String()
Dim regex As Object
Set regex = CreateObject("vbscript.regexp")
regex.IgnoreCase = True
regex.Global = True
'This pattern matches only commas outside quotes
'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))"
regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
'regex.replaces will replace the commas outside quotes with semicolons and then the
'Split function will split the result based on the semicollons
splitLine2 = Split(regex.Replace(line, ";"), ";")
End Function
This method seems much more cryptic, but does not deppends on the structure of the line
You can read more about regular expressions patterns in VBScript Here
A simple regex for parsing a CSV line, assuming no quotes inside quoted fields, is:
"[^"]*"|[^,]*
Each match will return a field.