SSIS Excel Data Source - Is it possible to override column data types?
I was looking for a solution to a similar issue and found nothing on the internet that fully worked. Most of the solutions I found work at design time, but they don't work when you want to automate your SSIS package.
I resolved the issue and made it work by changing the properties of the "Excel Source". By default the AccessMode property is set to OpenRowSet. If you change it to SQL Command, you can write your own SQL to convert any column as you wish.
For me, SSIS was treating the NDCCode column as float, but I needed it as a string, so I used the following SQL:
Select [Site], Cstr([NDCCode]) as NDCCode From [Sheet1$]
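If you have many columns, typing that query by hand gets tedious. Here is a minimal Python sketch that generates the same kind of statement to paste into the SQL Command text. The helper name build_excel_select and its parameters are made up for illustration and are not part of SSIS. I have not checked how CStr behaves on empty cells, so treat that as something to verify.

```python
def build_excel_select(sheet, columns, as_text=()):
    """Build a SELECT for the Excel Source's SQL Command access mode.

    Columns named in as_text are wrapped in CStr(...) so they come back
    as strings. Hypothetical helper, not an SSIS API.
    """
    force = set(as_text)
    parts = []
    for name in columns:
        if name in force:
            # Keep the original column name via an alias.
            parts.append("CStr([{0}]) AS [{0}]".format(name))
        else:
            parts.append("[{0}]".format(name))
    return "SELECT {0} FROM [{1}$]".format(", ".join(parts), sheet)


query = build_excel_select("Sheet1", ["Site", "NDCCode"], as_text=["NDCCode"])
print(query)
```

For the sheet and columns above this prints a query equivalent to the one I used.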
If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.
I also noticed that SSIS uses the first row of data to override what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still inferred the type from the data, which was a bunch of digits). Once I fixed that by putting a single quote in my Excel file in front of the number in the first row of data, I thought it would get it right, but no, there is additional work.
In fact, even though the SSIS External DataSource Column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.
Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.
According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows:
- If the top 8 records contain an equal number of numeric and character values, numeric takes priority
- If the majority of the top 8 records are numeric, it assigns the data type as numeric and all character values are read as NULLs
- If the majority of the top 8 records are of character type, it assigns the data type as string and all numeric values are read as NULLs
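To make the three rules concrete, here is a small Python simulation of them as the post describes them. It is not the driver's actual code; the function names are invented for the example, and ignoring empty cells when counting is my own assumption.

```python
def is_number(value):
    """True for int/float cells; bool is excluded on purpose."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def guess_column_type(values, sample_rows=8):
    """Apply the majority rule to the first sample_rows cells."""
    # Assumption: empty cells (None) are not counted for either side.
    sample = [v for v in values[:sample_rows] if v is not None]
    numeric = sum(1 for v in sample if is_number(v))
    text = len(sample) - numeric
    # A tie goes to numeric, as the first rule states.
    return "numeric" if numeric >= text else "text"


def read_column(values, sample_rows=8):
    """Return (guessed_type, cells) with mismatched cells replaced by None."""
    col_type = guess_column_type(values, sample_rows)
    want_number = col_type == "numeric"
    cells = [v if v is not None and is_number(v) == want_number else None
             for v in values]
    return col_type, cells


ndc = [101, 102, 103, "A7", 105, 106, "B9", 108, "C1"]
col_type, cells = read_column(ndc)
print(col_type)
print(cells)
```

With six numbers and two strings in the first eight rows, the column is guessed as numeric, and "A7", "B9" and "C1" all come back as None. That is the silent data loss people run into.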
The post outlines two things you can do to fix this:
- First, add IMEX=1 to the end of your Excel driver connection string. This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.
- In the registry, change the value for HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.
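For the first fix, IMEX=1 goes inside the Extended Properties part of the connection string. Here is a rough Python sketch of what the finished string looks like. I'm assuming the Jet 4.0 OLE DB provider with "Excel 8.0" (matching the Jet 4.0 registry key above); the file path and the helper name are placeholders, so adjust them for your own provider and file.

```python
def excel_connection_string(path, header=True, imex=True):
    """Build a Jet OLE DB connection string for an .xls file.

    Assumes Provider=Microsoft.Jet.OLEDB.4.0 and "Excel 8.0";
    hypothetical helper, shown only to illustrate where IMEX=1 goes.
    """
    props = ["Excel 8.0", "HDR=YES" if header else "HDR=NO"]
    if imex:
        # IMEX=1 is appended last, inside the quoted Extended Properties.
        props.append("IMEX=1")
    joined = ";".join(props)
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        "Data Source=" + path + ";"
        'Extended Properties="' + joined + '";'
    )


conn = excel_connection_string(r"C:\data\book.xls")  # placeholder path
print(conn)
```

Note that the whole Extended Properties value is wrapped in double quotes because it contains semicolons of its own.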
Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.
To get to the output columns list, right-click the Excel source, select 'Show Advanced Editor', and click the tab labeled 'Input and Output Properties'.
A potentially better solution is to use the Derived Column component, where you can build "new" columns for each column in Excel. This has several benefits:
- You have more control over what you convert to.
- You can put in rules that control the change (e.g. if null give me an empty string, but if there is data then give me the data as a string)
- Your data source is not tied directly to the rest of the process (e.g. you can change the source and the only place you will need to do work is in the derived column)
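The "if null give me an empty string" rule is easy to get subtly wrong, so here is the logic sketched in Python. This is not SSIS expression syntax, just the rule written out; the function name to_text is made up, and dropping the trailing ".0" from whole-number floats is my own addition for columns that arrive as float.

```python
def to_text(value):
    """Convert one cell to a string following the rule described above."""
    if value is None:
        # Null becomes an empty string rather than staying null.
        return ""
    if isinstance(value, float) and value.is_integer():
        # Assumption: whole-number floats should not show a ".0" suffix.
        return str(int(value))
    return str(value)


rows = [43567192.0, None, "00123"]
converted = [to_text(v) for v in rows]
print(converted)
```

In the package itself you would write the equivalent rule as an expression in the derived column, once per column you want to convert.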