Transactional replication latency issue
I had to finally call Microsoft support and just a simple command called DBCC LOG INFO on publisher revealed a possible root cause. I saw more than 8600 VLFs! and this was the cause of that latency. Also, our log file is preallocated to 538GB.
Replication was out of sync by almost 19 hours by the time I got follow up call from Microsoft help 4:00pm next day after I opened case with Microsoft. Steps to be taken were really simple. Backup publisher db log couple of time and try to shrink log file. Set increment factor for log file by 8GB or 12GB instead of percentage or 500MB. So, that next time log file grows, it will create 16 VLFs per 8GB or 12GB depending on your increment factor.
After I backed up log, I was able to shrink log file to 350GB and total VLFs to around 5300. Still higher. Latency didn't come down though. It went upto 22 hours. I started wondering whether number of VLFs was just one of the causes. However, around 11:30pm, latency reduced to around 7h30m and I freed up more space around that time, reducing VLFs to 2001. By 2am, replication was in sync. I hurried up and backed up log twice and shrink the log file to 10GB and grew it back to around 248GB. Total number VLFs as of now are 184 and replication is in sync since then. Whew! Log file is almost empty.
Let me know if you have any questions about this. I am glad to help. Hopefully someone else doesn't have to call Microsoft for this issue.
Will changing isolation level to Read Committed Snapshot Isolation (RCSI) help here?
This is not a straight forward change and it comes with additional tempdb penalty. I would not suggest you to just change the isolation level to RCSI without properly testing and seeing benefits in your environment. Trust me, this is a sledge hammer approach.
We recently encountered about the same problem
massive updates being done on few tables involved in replication and Log Reader is just scanning transaction log
Below is how I resolved the problem :
Make Articles replicate as BATCHED (This change is dynamic and does not require re-initialization):
- This can be done by right click publication --> generate scripts --> check
@status
value. Any value less than 16 indicates that it is set to replicate using TSQL ==> NOT Batched ! Even if 1 article is not set to BATCH, non of the articles will be BATCHED when applying changes to subscriber.
Use below TSQL
EXEC sp_changearticle @publication = N'<pub name>' -- your publication Name , @article = N'<article name>' -- your article Name , @property = 'status' , @value = 'parameters'
- This can be done by right click publication --> generate scripts --> check
Created a nonclusted index on the distribution database :
USE [distribution] GO CREATE NONCLUSTERED INDEX [nc_MSrepl_commands_DBA] ON [dbo].[MSrepl_commands] ([publisher_database_id],[article_id],[xact_seqno]) INCLUDE ([type],[originator_id]) GO
For more advance tuning, you can refer to Enhance Transactional Replication Performance especially Distribution Agent and Log Reader Agent Parameters.
Below is the script that I am using to find T-Rep Replication status :
USE [distribution]
-- Ref: http://www.sqlservercentral.com/blogs/basits-sql-server-tips/2012/07/25/t-sql-script-to-monitor-transactional-replication-status/
IF OBJECT_ID('Tempdb.dbo.#ReplStats') IS NOT NULL
DROP TABLE #ReplStats
CREATE TABLE [dbo].[#ReplStats](
[DistributionAgentName] [nvarchar](100) NOT NULL,
[DistributionAgentStartTime] [datetime] NOT NULL,
[DistributionAgentRunningDurationInSeconds] [int] NOT NULL,
[IsAgentRunning] [bit] NULL,
[ReplicationStatus] [varchar](14) NULL,
[LastSynchronized] [datetime] NOT NULL,
[Comments] [nvarchar](max) NOT NULL,
[Publisher] [sysname] NOT NULL,
[PublicationName] [sysname] NOT NULL,
[PublisherDB] [sysname] NOT NULL,
[Subscriber] [nvarchar](128) NULL,
[SubscriberDB] [sysname] NULL,
[SubscriptionType] [varchar](64) NULL,
[DistributionDB] [sysname] NULL,
[Article] [sysname] NOT NULL,
[UndelivCmdsInDistDB] [int] NULL,
[DelivCmdsInDistDB] [int] NULL,
[CurrentSessionDeliveryRate] [float] NOT NULL,
[CurrentSessionDeliveryLatency] [int] NOT NULL,
[TotalTransactionsDeliveredInCurrentSession] [int] NOT NULL,
[TotalCommandsDeliveredInCurrentSession] [int] NOT NULL,
[AverageCommandsDeliveredInCurrentSession] [int] NOT NULL,
[DeliveryRate] [float] NOT NULL,
[DeliveryLatency] [int] NOT NULL,
[TotalCommandsDeliveredSinceSubscriptionSetup] [int] NOT NULL,
[SequenceNumber] [varbinary](16) NULL,
[LastDistributerSync] [datetime] NULL,
[Retention] [int] NULL,
[WorstLatency] [int] NULL,
[BestLatency] [int] NULL,
[AverageLatency] [int] NULL,
[CurrentLatency] [int] NULL
) ON [PRIMARY]
INSERT INTO #ReplStats
SELECT da.[name] AS [DistributionAgentName]
,dh.[start_time] AS [DistributionAgentStartTime]
,dh.[duration] AS [DistributionAgentRunningDurationInSeconds]
,md.[isagentrunningnow] AS [IsAgentRunning]
,CASE md.[status]
WHEN 1 THEN '1 - Started'
WHEN 2 THEN '2 - Succeeded'
WHEN 3 THEN '3 - InProgress'
WHEN 4 THEN '4 - Idle'
WHEN 5 THEN '5 - Retrying'
WHEN 6 THEN '6 - Failed'
END AS [ReplicationStatus]
,dh.[time] AS [LastSynchronized]
,dh.[comments] AS [Comments]
,md.[publisher] AS [Publisher]
,da.[publication] AS [PublicationName]
,da.[publisher_db] AS [PublisherDB]
,CASE
WHEN da.[anonymous_subid] IS NOT NULL
THEN UPPER(da.[subscriber_name])
ELSE UPPER (s.[name]) END AS [Subscriber]
,da.[subscriber_db] AS [SubscriberDB]
,CASE da.[subscription_type]
WHEN '0' THEN 'Push'
WHEN '1' THEN 'Pull'
WHEN '2' THEN 'Anonymous'
ELSE CAST(da.[subscription_type] AS [varchar](64)) END AS [SubscriptionType]
,md.[distdb] AS [DistributionDB]
,ma.[article] AS [Article]
,ds.[UndelivCmdsInDistDB]
,ds.[DelivCmdsInDistDB]
,dh.[current_delivery_rate] AS [CurrentSessionDeliveryRate]
,dh.[current_delivery_latency] AS [CurrentSessionDeliveryLatency]
,dh.[delivered_transactions] AS [TotalTransactionsDeliveredInCurrentSession]
,dh.[delivered_commands] AS [TotalCommandsDeliveredInCurrentSession]
,dh.[average_commands] AS [AverageCommandsDeliveredInCurrentSession]
,dh.[delivery_rate] AS [DeliveryRate]
,dh.[delivery_latency] AS [DeliveryLatency]
,dh.[total_delivered_commands] AS [TotalCommandsDeliveredSinceSubscriptionSetup]
,dh.[xact_seqno] AS [SequenceNumber]
,md.[last_distsync] AS [LastDistributerSync]
,md.[retention] AS [Retention]
,md.[worst_latency] AS [WorstLatency]
,md.[best_latency] AS [BestLatency]
,md.[avg_latency] AS [AverageLatency]
,md.[cur_latency] AS [CurrentLatency]
FROM [distribution]..[MSdistribution_status] ds
INNER JOIN [distribution]..[MSdistribution_agents] da
ON da.[id] = ds.[agent_id]
INNER JOIN [distribution]..[MSArticles] ma
ON ma.publisher_id = da.publisher_id
AND ma.[article_id] = ds.[article_id]
INNER JOIN [distribution]..[MSreplication_monitordata] md
ON [md].[job_id] = da.[job_id]
INNER JOIN [distribution]..[MSdistribution_history] dh
ON [dh].[agent_id] = md.[agent_id]
AND md.[agent_type] = 3
INNER JOIN [master].[sys].[servers] s
ON s.[server_id] = da.[subscriber_id]
--Created WHEN your publication has the immediate_sync property set to true. This property dictates
--whether snapshot is available all the time for new subscriptions to be initialized.
--This affects the cleanup behavior of transactional replication. If this property is set to true,
--the transactions will be retained for max retention period instead of it getting cleaned up
--as soon as all the subscriptions got the change.
WHERE da.[subscriber_db] <> 'virtual'
AND da.[anonymous_subid] IS NULL
AND dh.[start_time] = (SELECT TOP 1 start_time
FROM [distribution]..[MSdistribution_history] a
JOIN [distribution]..[MSdistribution_agents] b
ON a.[agent_id] = b.[id] AND b.[subscriber_db] <> 'virtual'
WHERE [runstatus] <> 1
ORDER BY [start_time] DESC)
AND dh.[runstatus] <> 1
SELECT 'Transactional Replication Summary' AS [Comments];
SELECT [DistributionAgentName]
,[DistributionAgentStartTime]
,[DistributionAgentRunningDurationInSeconds]
,[IsAgentRunning]
,[ReplicationStatus]
,[LastSynchronized]
,[Comments]
,[Publisher]
,[PublicationName]
,[PublisherDB]
,[Subscriber]
,[SubscriberDB]
,[SubscriptionType]
,[DistributionDB]
,SUM([UndelivCmdsInDistDB]) AS [UndelivCmdsInDistDB]
,SUM([DelivCmdsInDistDB]) AS [DelivCmdsInDistDB]
,[CurrentSessionDeliveryRate]
,[CurrentSessionDeliveryLatency]
,[TotalTransactionsDeliveredInCurrentSession]
,[TotalCommandsDeliveredInCurrentSession]
,[AverageCommandsDeliveredInCurrentSession]
,[DeliveryRate]
,[DeliveryLatency]
,[TotalCommandsDeliveredSinceSubscriptionSetup]
,[SequenceNumber]
,[LastDistributerSync]
,[Retention]
,[WorstLatency]
,[BestLatency]
,[AverageLatency]
,[CurrentLatency]
FROM #ReplStats
GROUP BY [DistributionAgentName]
,[DistributionAgentStartTime]
,[DistributionAgentRunningDurationInSeconds]
,[IsAgentRunning]
,[ReplicationStatus]
,[LastSynchronized]
,[Comments]
,[Publisher]
,[PublicationName]
,[PublisherDB]
,[Subscriber]
,[SubscriberDB]
,[SubscriptionType]
,[DistributionDB]
,[CurrentSessionDeliveryRate]
,[CurrentSessionDeliveryLatency]
,[TotalTransactionsDeliveredInCurrentSession]
,[TotalCommandsDeliveredInCurrentSession]
,[AverageCommandsDeliveredInCurrentSession]
,[DeliveryRate]
,[DeliveryLatency]
,[TotalCommandsDeliveredSinceSubscriptionSetup]
,[SequenceNumber]
,[LastDistributerSync]
,[Retention]
,[WorstLatency]
,[BestLatency]
,[AverageLatency]
,[CurrentLatency]
SELECT 'Transactional Replication Summary Details' AS [Comments];
SELECT [Publisher]
,[PublicationName]
,[PublisherDB]
,[Article]
,[Subscriber]
,[SubscriberDB]
,[SubscriptionType]
,[DistributionDB]
,SUM([UndelivCmdsInDistDB]) AS [UndelivCmdsInDistDB]
,SUM([DelivCmdsInDistDB]) AS [DelivCmdsInDistDB]
FROM #ReplStats
GROUP BY [Publisher]
,[PublicationName]
,[PublisherDB]
,[Article]
,[Subscriber]
,[SubscriberDB]
,[SubscriptionType]
,[DistributionDB]