Add Cache-Control and Expires headers to Azure Storage Blobs
The latest version of Cerebrata Cloud Storage Studio, v2011.04.23.00, supports setting cache-control on individual blob objects. Right click on the blob object, choose "View/Edit Blob Properties" then set the value for the Cache-Control
attribute. (e.g. public, max-age=2592000
).
If you check the HTTP headers of the blob object using curl, you'll see the cache-control header returned with the value you set.
Sometimes, the simplest answer is the best one. If you just want to manage a small amount of blobs, you can use Azure Management to change the headers/metadata for your blobs.
- Click on Storage, then click on the storage account name.
- Click the Containers tab, then click on a container.
- Click on a blob, then click on Edit at the bottom of the screen.
In that edit window, you can customize the Cache Control, Content Encoding, Content Language, and more.
Note: you cannot currently edit this data from the Azure Portal
I had to run a batch job on about 600k blobs and found 2 things that really helped:
- Running the operation from a worker role in the same data center. The speed between Azure services is great as long as they are in the same affinity group. Plus there are no data transfer costs.
Running the operation in parallel. The Task Parallel Library (TPL) in .net v4 makes this really easy. Here is the code to set the cache-control header for every blob in a container in parallel:
// get the info for every blob in the container var blobInfos = cloudBlobContainer.ListBlobs( new BlobRequestOptions() { UseFlatBlobListing = true }); Parallel.ForEach(blobInfos, (blobInfo) => { // get the blob properties CloudBlob blob = container.GetBlobReference(blobInfo.Uri.ToString()); blob.FetchAttributes(); // set cache-control header if necessary if (blob.Properties.CacheControl != YOUR_CACHE_CONTROL_HEADER) { blob.Properties.CacheControl = YOUR_CACHE_CONTROL_HEADER; blob.SetProperties(); } });
Here's an updated version of Joel Fillmore's answer using Net 5 and V12 of Azure.Storage.Blobs. (Aside: wouldn't it be nice if default header properties could be set on the parent container?)
Instead of creating a website and using a WorkerRole, Azure has the ability to run "WebJobs". You can run any executable on demand on a website at the same datacenter where your storage account is located to set cache headers or any other header field.
- Create a throw-away, temporary website in the same datacenter as your storage account. Don't worry about affinity groups; create an empty ASP.NET site or any other simple site. The content is unimportant. I needed to use at least a B1 service plan, otherwise the WebJob aborted after 5 minutes.
- Create a console program using the code below which works with the updated Azure Storage APIs. Compile it for release, and then zip the executable and all required DLLs into a .zip file, or just publish it from VisualStudio and skip #3 below.
- Create a WebJob and upload the .zip file from step #2.
- Run the WebJob. Everything written to the console is available to view in the log file created and accessible from the WebJob control page.
- Delete the temporary website, or change it to a Free tier (under "Scale Up").
The code below runs a separate task for each container, and I'm getting up to 100K headers updated per minute (depending on time of day?). No egress charges.
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace AzureHeaders
{
class Program
{
private static string connectionString = "DefaultEndpointsProtocol=https;AccountName=REPLACE_WITH_YOUR_CONNECTION_STRING";
private static string newCacheControl = "public, max-age=7776001"; // 3 months
private static string[] containersToProcess = { "container1", "container2" };
static async Task Main(string[] args)
{
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
var tasks = new List<Task>();
foreach (var container in containersToProcess)
{
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(container);
tasks.Add(Task.Run(() => UpdateHeaders(containerClient, 1000))); // I have no idea what segmentSize should be!
}
Task.WaitAll(tasks.ToArray());
}
private static async Task UpdateHeaders(BlobContainerClient blobContainerClient, int? segmentSize)
{
int processed = 0;
int failed = 0;
try
{
// Call the listing operation and return pages of the specified size.
var resultSegment = blobContainerClient.GetBlobsAsync()
.AsPages(default, segmentSize);
// Enumerate the blobs returned for each page.
await foreach (Azure.Page<BlobItem> blobPage in resultSegment)
{
var tasks = new List<Task>();
foreach (BlobItem blobItem in blobPage.Values)
{
BlobClient blobClient = blobContainerClient.GetBlobClient(blobItem.Name);
tasks.Add(UpdateOneBlob(blobClient));
processed++;
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine($"Container {blobContainerClient.Name} processed: {processed}");
}
}
catch (RequestFailedException e)
{
Console.WriteLine(e.Message);
failed++;
}
Console.WriteLine($"Container {blobContainerClient.Name} processed: {processed}, failed: {failed}");
}
private static async Task UpdateOneBlob(BlobClient blobClient) {
Response<BlobProperties> propertiesResponse = await blobClient.GetPropertiesAsync();
BlobHttpHeaders httpHeaders = new BlobHttpHeaders
{
// copy any existing headers you wish to preserve
ContentType = propertiesResponse.Value.ContentType,
ContentHash = propertiesResponse.Value.ContentHash,
ContentEncoding = propertiesResponse.Value.ContentEncoding,
ContentDisposition = propertiesResponse.Value.ContentDisposition,
// update CacheControl
CacheControl = newCacheControl
};
await blobClient.SetHttpHeadersAsync(httpHeaders);
}
}
}