How to sync between two Azure storage (blobs) hosted on two different data centers
Is there a service that provides blob synchronization between different data centers?
No. There is currently no out-of-the-box service that synchronizes content between two data centers.
If not, how can I implement one?
Although all the necessary infrastructure is available for you to implement this, the actual implementation would be tricky.
First, you would need to decide whether you want real-time synchronization or whether batched synchronization would do.
For real-time synchronization, you could rely on Async Copy Blob. With async copy blob you can instruct the storage service to copy a blob from one storage account to another, instead of manually downloading the blob from the source and uploading it to the target. Assuming all uploads happen through your application, you know which data center a blob lands in as soon as it is uploaded. You could then create a SAS URL for this blob and initiate an async copy to the other data center.
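To make the upload-then-copy flow concrete, here is a rough Python sketch. The account names and the `start_copy` callback are hypothetical; the callback stands in for whatever actually starts the server-side copy. With the `azure-storage-blob` package, I believe that would be a thin wrapper around `BlobClient.start_copy_from_url`, with the SAS token coming from `generate_blob_sas`, but treat that as an assumption and check it against the SDK version you use.

```python
from urllib.parse import quote

# Hypothetical storage account names, one per data center.
ACCOUNTS = ["myappwestus", "myappnortheurope"]


def blob_sas_url(account, container, blob_name, sas_token):
    """Build a source URL the other data center can read from."""
    path = quote(f"{container}/{blob_name}")  # escapes spaces, keeps "/"
    return (
        f"https://{account}.blob.core.windows.net/{path}"
        f"?{sas_token.lstrip('?')}"
    )


def replicate_after_upload(uploaded_to, container, blob_name, sas_token, start_copy):
    """Start one server-side copy for every account except the source.

    start_copy(dest_account, container, blob_name, source_url) is supplied
    by the caller and is expected to kick off the async copy.
    """
    source_url = blob_sas_url(uploaded_to, container, blob_name, sas_token)
    targets = [account for account in ACCOUNTS if account != uploaded_to]
    for dest_account in targets:
        start_copy(dest_account, container, blob_name, source_url)
    return targets
```

The copy runs inside the storage service, so your application only has to issue the request right after the upload succeeds; the SAS token just needs read permission and a lifetime long enough for the copy to finish.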
For batched synchronization, you would need to query both storage accounts and list the blobs in each blob container. If a blob exists in only one storage account, you could simply create it in the other storage account by initiating an async copy. Things become trickier if a blob with the same name is present in both storage accounts. In that case you would need to define some rules (such as comparing the last-modified dates) to decide whether the blob should be copied from the source to the destination storage account.
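The comparison rules could look something like the sketch below. It is only one possible rule set, and the `BlobInfo` type and `plan_sync` function are names I made up for illustration. It assumes you have already listed each container into a dictionary of blob name to last-modified time and content MD5. Blobs with matching MD5 values are skipped; otherwise the more recently modified copy wins.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class BlobInfo:
    last_modified: datetime
    content_md5: Optional[str]  # may be missing for some blobs


def plan_sync(listing_a, listing_b):
    """Decide which blobs to copy in each direction.

    Each listing maps blob name -> BlobInfo.
    Returns (copy_a_to_b, copy_b_to_a) as lists of blob names.
    """
    a_to_b, b_to_a = [], []
    for name in sorted(set(listing_a) | set(listing_b)):
        a, b = listing_a.get(name), listing_b.get(name)
        if b is None:
            a_to_b.append(name)  # exists only in account A
        elif a is None:
            b_to_a.append(name)  # exists only in account B
        elif a.content_md5 is not None and a.content_md5 == b.content_md5:
            continue  # same content on both sides, nothing to do
        elif a.last_modified > b.last_modified:
            a_to_b.append(name)  # A holds the newer version
        elif b.last_modified > a.last_modified:
            b_to_a.append(name)  # B holds the newer version
        # equal timestamps with different content: left for manual review
    return a_to_b, b_to_a
```

The MD5 check is there because a copied blob gets a fresh last-modified time in the destination account. Comparing dates alone would make the next run think the copy is the newer version and send it straight back.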
For scheduling the batch synchronization, you could make use of the Windows Azure Scheduler Service. Even with this service, you would still need to write the synchronization logic yourself; the Scheduler service only takes care of the scheduling and does not do the actual synchronization.
I would recommend using a worker role to implement the synchronization logic. Another alternative is Web Jobs, which were announced recently, though I don't know much about them.
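Whichever host you pick, the synchronization code usually ends up as a loop that runs one batch, waits, and repeats. A minimal sketch in Python, where `run_worker` and `sync_once` are hypothetical names and `sync_once` is whatever function performs a single synchronization pass:

```python
import time


def run_worker(sync_once, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call sync_once() repeatedly, pausing between runs.

    A failed run is reported and counted, and the loop carries on so that
    one bad pass does not stop future synchronization.
    max_runs=None means run forever. Returns the number of failed runs.
    """
    runs = failures = 0
    while max_runs is None or runs < max_runs:
        try:
            sync_once()
        except Exception as exc:  # keep the worker alive on any error
            failures += 1
            print(f"sync run failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return failures
```

If you use an external scheduler instead, you would drop the loop and have the scheduler trigger a single `sync_once()` call per run.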
If your goal is just performance and the content is public, use Azure CDN. Point it at your primary blob storage container and it will cache the files at locations around the world for best performance.