SSIS reading LF as terminator when it's set as CRLF
Before answering: I don't think the column contains only LF, because if the row delimiter is CRLF, SSIS will not treat a bare LF as a delimiter. So it is probably CRLF, but I will give a solution that covers both cases (CRLF or LF).
Solution
You can fix this situation with the following steps:
- First, in the Flat File connection manager, add only one column (of type DT_STR and length 4000) so that each row is read as a single column.
- In the Data Flow Task, add a Script Component that fixes the file structure and splits each row into columns.
Simple Test
I will consider a flat file with the following content:
ID;name;DOB;Notes;ClassID{CRLF}
1;John;2001-01-01;;1{CRLF}
2;Moh;2002-01-01;Very cool{LF}
Genius;2{CRLF}
3;Ali;2000-01-01;Calm;2{CRLF}
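The {CRLF} and {LF} markers above stand for the actual line-ending characters. To reproduce the test file with exactly those endings, a minimal VB.NET console sketch could look like the following. The module name and the sample.txt path are placeholders, not part of the original setup.

```vbnet
Imports System.IO
Imports System.Text

Module CreateSampleFile
    Sub Main()
        ' Placeholder path (assumption); point it at the file the connection manager reads
        Dim filePath As String = "sample.txt"

        Dim sb As New StringBuilder()
        sb.Append("ID;name;DOB;Notes;ClassID").Append(vbCrLf)
        sb.Append("1;John;2001-01-01;;1").Append(vbCrLf)
        ' The Notes value is broken by a bare LF, which splits the row across two lines
        sb.Append("2;Moh;2002-01-01;Very cool").Append(vbLf)
        sb.Append("Genius;2").Append(vbCrLf)
        sb.Append("3;Ali;2000-01-01;Calm;2").Append(vbCrLf)

        ' Write the text exactly as built, without translating line endings
        File.WriteAllText(filePath, sb.ToString())
    End Sub
End Module
```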
- First, I will add a Flat File connection manager with a single column (Column0, of type DT_STR and length 4000), as described in the steps above.
- In the Data Flow Task I will add a Flat File Source, 2 x Script Component, and an OLE DB Destination.
- In the first Script Component I will mark Column0 as input, add one output column named NewRow (the column the script below assigns to), and set the output's Synchronous Input to None.
- In the second Script Component I will mark NewRow as input and add 5 output columns: ID, Name, DOB, Notes, ClassID.

In the first Script Component I will write a script that stores each line in a memory variable and assigns it to an output row when the row is complete and another row is present:
' Holds the line(s) collected so far for the row currently being rebuilt
Dim strLine As String = String.Empty
Dim strDelimiter As String = ";"

Public Sub EmptyMemoryVariables()
    strLine = String.Empty
End Sub

Public Sub AssignMemoryVariablesToOutput()
    With Output0Buffer
        .AddRow()
        .NewRow = strLine
    End With
End Sub

Public Function AreVariablesEmpty() As Boolean
    If strLine = "" Then
        Return True
    Else
        Return False
    End If
End Function

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim strColumns As String() = Row.Column0.Split(CChar(strDelimiter))

    If strColumns.Length = 5 Then
        ' Complete row: flush anything still buffered, then emit this line
        If Not AreVariablesEmpty() Then
            AssignMemoryVariablesToOutput()
            EmptyMemoryVariables()
        End If
        strLine = Row.Column0
        AssignMemoryVariablesToOutput()
        EmptyMemoryVariables()
    Else
        ' Partial row: emit the buffer once it has 5 columns, then keep appending
        If strLine.Split(CChar(strDelimiter)).Length = 5 Then
            AssignMemoryVariablesToOutput()
            EmptyMemoryVariables()
        End If
        strLine &= Row.Column0
    End If
    ' Note: a partial row still buffered when the input ends is never emitted here
End Sub
In the second Script Component I will split each row into columns:
Dim strDelimiter As String = ";"

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' NewRow holds one rebuilt row; split it into its 5 columns
    Dim strColumns As String() = Row.NewRow.Split(CChar(strDelimiter))
    Row.ID = strColumns(0)
    Row.NAME = strColumns(1)
    Row.DOB = strColumns(2)
    Row.NOTES = strColumns(3)
    Row.CLASSID = strColumns(4)
End Sub
Important Note: the provided code is not optimal. It may need more validation, or it could be simpler and better, but I am trying to show how you can approach this issue.
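As one example of the extra validation mentioned in the note, the split could be guarded by a column-count check. The helper below, TrySplitRow, is a hypothetical name in a standalone console sketch and is not part of the SSIS-generated code. Inside a Script Component the same check would go before assigning Row.ID and the other columns.

```vbnet
Imports System

Module RowValidation
    ' Hypothetical helper: split a rebuilt row and confirm it has the expected column count.
    ' Returns False (and leaves columns as Nothing) for empty or malformed rows.
    Function TrySplitRow(ByVal line As String, ByVal delimiter As Char, ByVal expectedColumns As Integer, ByRef columns As String()) As Boolean
        columns = Nothing
        If String.IsNullOrEmpty(line) Then Return False

        Dim parts As String() = line.Split(delimiter)
        If parts.Length <> expectedColumns Then Return False

        columns = parts
        Return True
    End Function

    Sub Main()
        Dim cols As String() = Nothing

        ' A rebuilt row with 5 columns passes the check
        Console.WriteLine(TrySplitRow("2;Moh;2002-01-01;Very coolGenius;2", ";"c, 5, cols))   ' True

        ' A leftover fragment with only 2 columns is rejected
        Console.WriteLine(TrySplitRow("Genius;2", ";"c, 5, cols))   ' False
    End Sub
End Module
```

Rows that fail the check could be skipped or redirected to an error output instead of raising an index-out-of-range exception during the split.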
I have no SSIS experience, but as an ETL developer I have faced this many times. So my suggestions might not solve the problem, but hopefully they point you in the right direction
- If the problem field has a text qualifier (usually a single or double quote) and SSIS supports it, use it
- Also, if there is an option to force SSIS to use an end-of-record delimiter other than LF (CRLF in this case), I'd use it (hopefully there is no CRLF in the problem field's text)
- If the problem field is not the last field, you can read each record as a single LF-delimited field and count the delimiters to identify and filter out the problem records (if there are only a few), then try to stitch them back together
- If possible, read the file as a single record (if SSIS has such an option) and replace all LF characters, provided CR is a consistent end-of-record delimiter in the source
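The last suggestion can be sketched outside SSIS as a pre-processing step. This is a variant rather than a literal transcription: it assumes records end in CRLF and replaces only the LF characters that are not preceded by CR, using a space so the two fragments stay separated. ReplaceBareLf is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FixBareLineFeeds
    ' Replace every LF that is not preceded by CR with a space; CRLF pairs are left intact.
    Function ReplaceBareLf(ByVal content As String) As String
        Return Regex.Replace(content, "(?<!\r)\n", " ")
    End Function

    Sub Main()
        ' Two records; the first one is broken by a bare LF inside the Notes field
        Dim sample As String = "2;Moh;2002-01-01;Very cool" & vbLf & "Genius;2" & vbCrLf &
                               "3;Ali;2000-01-01;Calm;2" & vbCrLf

        Dim fixedText As String = ReplaceBareLf(sample)

        ' After the fix, splitting on CRLF yields one string per logical record
        For Each record As String In fixedText.Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
            Console.WriteLine(record)
        Next
        ' Prints:
        ' 2;Moh;2002-01-01;Very cool Genius;2
        ' 3;Ali;2000-01-01;Calm;2
    End Sub
End Module
```

For large files, reading the whole content into memory may not be practical; the same replacement could be applied in chunks, taking care with a CR that falls on a chunk boundary.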