How to execute large amount of sql queries asynchronous and in threads
RunspacePool is the way to go here, try this:
$AllQueries = @( ... )
$MaxThreads = 5
# Each thread keeps its own connection but shares the query queue
$ScriptBlock = {
Param($WorkQueue)
$objConnection = New-Object System.Data.SqlClient.SqlConnection
$objConnection.ConnectionString = 'Data Source=...'
$objCmd = New-Object System.Data.SqlClient.SqlCommand
$objCmd.Connection = $objConnection
$objCmd.CommandTimeout = 0
$query = ""
while ($WorkQueue.TryDequeue([ref]$query)) {
$objCmd.CommandText = $query
$objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter $objCmd
$objDataTable = New-Object System.Data.DataTable
$objAdapter.Fill($objDataTable) | Out-Null
}
$objConnection.Close()
}
# create a pool
$pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
$pool.ApartmentState = 'STA'
$pool.Open()
# convert the query array into a concurrent queue
$workQueue = New-Object System.Collections.Concurrent.ConcurrentQueue[object]
$AllQueries | % { $workQueue.Enqueue($_) }
$threads = @()
# Create each powershell thread and add them to the pool
1..$MaxThreads | % {
$ps = [powershell]::Create()
$ps.RunspacePool = $pool
$ps.AddScript($ScriptBlock) | Out-Null
$ps.AddParameter('WorkQueue', $workQueue) | Out-Null
$threads += [pscustomobject]@{
Ps = $ps
Handle = $null
}
}
# Start all the threads
$threads | % { $_.Handle = $_.Ps.BeginInvoke() }
# Wait for all the threads to complete - errors will still set the IsCompleted flag
while ($threads | ? { !$_.Handle.IsCompleted }) {
Start-Sleep -Seconds 1
}
# Get any results and display an errors
$threads | % {
$_.Ps.EndInvoke($_.Handle) | Write-Output
if ($_.Ps.HadErrors) {
$_.Ps.Streams.Error.ReadAll() | Write-Error
}
}
Unlike powershell jobs, a RunspacePools can share resources. So there is one concurrent queue of all the queries, and each thread keeps its own connection to the database.
As others have said though - unless you're stress testing your database, you're probably better off reorganising the queries into bulk inserts.
You need to reorganize your script so that you keep a database connection open in each worker thread, using it for all queries performed by that thread. Right now you are opening a new database connection for each query, which adds a large amount of overhead. Eliminating that overhead should speed things up to or beyond your target.
- Group your queries based on the table and operations on that table. Using this you can identity how much async sql queries you could run against your different tables.
- Make sure the size of the each table against which you are going to run.
Because if table contains millions of rows and your doing a join operation with some other table as well will increase the time or if it is a CUD operation then might lock your table as well.
- And also choose number of threads based on your CPU cores and not based on assumptions. Because CPU core will run one process at a time so better you could create number of cores * 2 threads are efficient one.
So first study your dataset and then do the above 2 items so that you could easily identity what are all the queries are run parallely and efficiently.
Hope this will give some ideas. Better you could use any python script for that So that you could easily trigger more than one process and also monitor their activites.
Try using SqlCmd.
You can use run multiple processes using Process.Start() and use sqlcmd to run queries in parallel processes.
Of course if you're obligated to do it in threads, this answer will no longer be the solution.