Import most recent CSV file to SQL Server in SSIS
Assuming that you wanted to use C#, to get the newest file in a given directory, you can use a method like this...
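// Needs "using System.IO;" and "using System.Linq;" at the top of the file.
private static FileInfo GetLatestFile(string directoryName, string fileExtension)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
    // Newest write time first; FirstOrDefault returns null when nothing matches.
    return directoryInfo.GetFiles(fileExtension)
        .OrderByDescending(q => q.LastWriteTimeUtc)
        .FirstOrDefault();
}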
This method is called like...
FileInfo file = GetLatestFile(@"C:\myDirectory", "*.csv");
And it returns a FileInfo instance (or null) of the file with the most recent write time. You can then use the FileInfo instance to get the name of the file and so on for your processing...
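As a rough sketch of how the null case and the ordering behave, the program below runs the same method against a scratch folder. The class name LatestFileDemo, the temp folder and the two file names are made up for illustration, and it assumes a framework with LINQ available (.NET 3.5 or later).

```csharp
using System;
using System.IO;
using System.Linq;

public static class LatestFileDemo
{
    // Same approach as the method above: newest file by last write time, or null.
    public static FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
        return directoryInfo.GetFiles(fileExtension)
            .OrderByDescending(q => q.LastWriteTimeUtc)
            .FirstOrDefault();
    }

    public static void Main()
    {
        // Hypothetical scratch folder so the sketch does not touch real data.
        string folder = Path.Combine(Path.GetTempPath(),
            "latest_file_demo_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(folder);
        try
        {
            // No matching files yet, so the result is null and must be checked.
            FileInfo none = GetLatestFile(folder, "*.csv");
            Console.WriteLine(none == null ? "No CSV files found" : none.FullName);

            string older = Path.Combine(folder, "older.csv");
            string newer = Path.Combine(folder, "newer.csv");
            File.WriteAllText(older, "a,b");
            File.WriteAllText(newer, "c,d");
            // Set the write times explicitly so the outcome does not depend on timing.
            File.SetLastWriteTimeUtc(older, new DateTime(2012, 1, 10, 0, 0, 0, DateTimeKind.Utc));
            File.SetLastWriteTimeUtc(newer, new DateTime(2012, 1, 11, 0, 0, 0, DateTimeKind.Utc));

            FileInfo latest = GetLatestFile(folder, "*.csv");
            Console.WriteLine(latest.Name); // prints newer.csv
        }
        finally
        {
            Directory.Delete(folder, true);
        }
    }
}
```

The point of the first call is that an empty folder gives you null rather than an exception, so check for it before touching Name or FullName.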
The code from @garry Vass, or one like it, is going to be needed even if you're using SSIS as your import tool.
Within SSIS, you will need to update the connection string of your flat file connection manager to point to the new file. Ergo, you first need to determine which file is the most recent.
Finding the most recent file
Whether you do it by file attributes (Garry's code) or by slicing and dicing file names depends on your business rules. Is it always the most recently modified file (attribute), or does it need to be based on the file name interpreted as a sequence? This matters if test_01112012_120122.csv had a mistake in it and its contents are later corrected. The modified date will change but the file name will not, so a name-based selection would not pick the file up again and those changes wouldn't get ported back into the database.
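If your rule turns out to be the name-based one, a minimal sketch might look like the following. It assumes the names end in a stamp laid out as MMddyyyy_HHmmss (guessed from test_01112012_120122.csv; it could just as well be ddMMyyyy), and the class and method names (FileNameSequence, TryGetStamp, GetLatestByName) are hypothetical. It avoids LINQ so it also compiles against the 2.0 framework.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileNameSequence
{
    // Assumed layout: <prefix>_<MMddyyyy>_<HHmmss>.csv, e.g. test_01112012_120122.csv.
    // The date order is a guess; use "ddMMyyyy_HHmmss" if that is what your names mean.
    private const string StampFormat = "MMddyyyy_HHmmss";

    // Returns true and the parsed timestamp when the name ends with a valid stamp.
    public static bool TryGetStamp(string fileName, out DateTime stamp)
    {
        stamp = DateTime.MinValue;
        string name = Path.GetFileNameWithoutExtension(fileName);
        if (name == null || name.Length < StampFormat.Length)
        {
            return false;
        }
        // The stamp is fixed width, so take it from the end of the name.
        string tail = name.Substring(name.Length - StampFormat.Length);
        return DateTime.TryParseExact(tail, StampFormat, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out stamp);
    }

    // Picks the name with the greatest embedded timestamp; null if none parse.
    public static string GetLatestByName(string[] fileNames)
    {
        string best = null;
        DateTime bestStamp = DateTime.MinValue;
        foreach (string candidate in fileNames)
        {
            DateTime stamp;
            if (TryGetStamp(candidate, out stamp) && (best == null || stamp > bestStamp))
            {
                best = candidate;
                bestStamp = stamp;
            }
        }
        return best;
    }

    public static void Main()
    {
        string[] names = { "test_01112012_120122.csv", "test_01122012_080000.csv", "notes.csv" };
        Console.WriteLine(GetLatestByName(names)); // prints test_01122012_080000.csv
    }
}
```

In a real package you would feed it the result of System.IO.Directory.GetFiles on your folder instead of a hard-coded array. Names that do not carry a parsable stamp, like notes.csv here, are simply skipped.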
I would suggest you create 2 variables of type String, scoped to the package, named RootFolder and CurrentFile. Optionally, you can create one called FileMask if you are restricting to a particular type like *.csv. RootFolder would be the base folder you expect to find files in, such as C:\ssisdata\MyProject. CurrentFile will be assigned, by a script, the fully qualified path to the most recently modified file. I find it helpful at this point to assign a design-time value to CurrentFile, usually the oldest file in the collection.
Drag a Script Task onto the Control Flow and set User::RootFolder (and optionally User::FileMask) as its ReadOnlyVariables. Set User::CurrentFile as its ReadWriteVariables.
This script would go inside the public partial class ScriptMain: ... braces
    /// <summary>
    /// This verbose script identifies the most recently modified file matching fileMask
    /// in RootFolder and assigns its full path to a DTS level variable.
    /// </summary>
    public void Main()
    {
        string fileMask = "*.csv";
        string mostRecentFile = string.Empty;
        string rootFolder = string.Empty;
        // Assign values from the DTS variables collection.
        // Variable names are case sensitive. The User:: prefix is not required,
        // but the value is of type Object, so you must convert it to a strong type
        rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();
        // Repeat the above pattern to assign a value to fileMask if you wish
        // to make it a more flexible approach
        // Determine the most recent file, this could be null
        System.IO.FileInfo candidate = ScriptMain.GetLatestFile(rootFolder, fileMask);
        if (candidate != null)
        {
            mostRecentFile = candidate.FullName;
        }
        // Push the results back onto the variable
        Dts.Variables["CurrentFile"].Value = mostRecentFile;
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    /// <summary>
    /// Find the most recent file matching a pattern
    /// </summary>
    /// <param name="directoryName">Folder to begin searching in</param>
    /// <param name="fileExtension">Search pattern, e.g. *.csv</param>
    /// <returns>The most recently written matching file, or null if there is none</returns>
    private static System.IO.FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(directoryName);
        System.IO.FileInfo mostRecent = null;
        // Change the SearchOption to AllDirectories if you need to search subfolders
        System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileExtension, System.IO.SearchOption.TopDirectoryOnly);
        foreach (System.IO.FileInfo current in legacyArray)
        {
            // The first file seen becomes the baseline; later files replace it if at least as new
            if (mostRecent == null || current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }
        return mostRecent;
        // The LINQ version below needs the project's TargetFramework set to 3.5 or later
        // plus a using System.Linq; directive. OrderByDescending doesn't exist
        // in the 2.0 framework, which is why the loop above is used instead.
        //return directoryInfo.GetFiles(fileExtension)
        //    .OrderByDescending(q => q.LastWriteTimeUtc)
        //    .FirstOrDefault();
    }
    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    ///
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion
}
Updating a Connection Manager
At this point, our script has assigned a value to the CurrentFile variable. The next step is to tell SSIS to use that file. Select the Connection Manager for your CSV, open its Properties (F4, or right click and select Properties) and set an Expression for the ConnectionString property. The value you want to assign is our CurrentFile variable, which is expressed as @[User::CurrentFile].
Finally, these screen shots are based on the upcoming release of SQL Server 2012 so the icons may appear different but the functionality remains the same.