Effecient way to compare data tables
First I would ask if you have tried this in a simple for/foreach loop instead and compared the performance?
At the moment you are creating a new Enumerable and then copying to a datatable. If you use a for/foreach loop then you can compare and copy in the same iteration.
You should also look at the string comparison. At the moment you are trimming then converting to lowercase. This will allocate new memory for each operation for each string as strings are immutable. So in your where statement you are basically doing this (up to) 8 times per iteration.
I would also ask if you really need Trim()
? Is it likely that one DT will have a space at the front of the string and the other not? Or will a comparison still be true? Don't trim strings unless really needed.
Then you should use case insensitive string comparison rather than converting ToLower
. This will be quicker. According to MS StringComparison.OrdinalIgnoreCase
is better performing.
Do these and then compare performance and see how much difference you have
See also: https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings
Update:
This intrigued me, so I went back and done some tests. I generated 10,000 rows of random(ish) data in two datatables where every second row would match and executed your comparison vs a simplified for loop comparison with a String comparison like this:
for (int i = 0; i < dt1.Rows.Count; i++)
{
if (dt1.Rows[i]["N"].ToString().Equals(dt2.Rows[i]["N"].ToString(), StringComparison.OrdinalIgnoreCase)
&& dt1.Rows[i][columnName].ToString().Equals(dt2.Rows[i][columnName].ToString(), StringComparison.OrdinalIgnoreCase))
{
dtDifference.Rows.Add(dt1.Rows[i].ItemArray);
}
}
Your code = 66,000ms -> 75,000ms
For loop code = 12ms -> 20ms
A significant difference!
Then I did a comparison using the for loop method but with the two different string comparison types for the string. Using my string comparison, vs yours. But I had to test on 1 million rows for this, to get a significant difference.
This differend by between 200ms and 800ms
So it seems in this case that the string comparison is not a major factor.
So it seems that your Linq query creating the datarows is what is taking the majority of time and not the comparison of the rows themselves.
So switch to using the for loop, and all will be well in the world again!