Invalid character exception when adding Metadata to a CloudBlob

Unless I get an answer that actually solves the issue, this workaround is a solution for the above issue!

Workaround

To get this to work, I am using a combination of the below methods to:

  1. Convert all possible characters to their ascii/english equivivalent
  2. Invalid Characters that escape this cleanup are literally stripped out of the string

But this isn't ideal as we are losing data!

Diacritics to ASCII

/// <summary>
/// Converts all Diacritic characters in a string to their ASCII equivalent
/// Courtesy: http://stackoverflow.com/a/13154805/476786
/// A quick explanation:
/// * Normalizing to form D splits charactes like è to an e and a nonspacing `
/// * From this, the nospacing characters are removed
/// * The result is normalized back to form C (I'm not sure if this is neccesary)
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public static string ConvertDiacriticToASCII(this string value)
{
    if (value == null) return null;
    var chars =
        value.Normalize(NormalizationForm.FormD)
             .ToCharArray()
             .Select(c => new {c, uc = CharUnicodeInfo.GetUnicodeCategory(c)})
             .Where(@t => @t.uc != UnicodeCategory.NonSpacingMark)
             .Select(@t => @t.c);
    var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);
    return cleanStr;
}

Non-ASCII Burninator

/// <summary>
/// Removes all non-ASCII characters from the string
/// Courtesy: http://stackoverflow.com/a/135473/476786
/// Uses the .NET ASCII encoding to convert a string. 
/// UTF8 is used during the conversion because it can represent any of the original characters. 
/// It uses an EncoderReplacementFallback to to convert any non-ASCII character to an empty string.
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public static string RemoveNonASCII(this string value)
{
    string cleanStr = 
           Encoding.ASCII
                   .GetString(
                              Encoding.Convert(Encoding.UTF8,
                                               Encoding.GetEncoding(Encoding.ASCII.EncodingName,
                                                                    new EncoderReplacementFallback(string.Empty),
                                                                    new DecoderExceptionFallback()
                                                                    ),
                                               Encoding.UTF8.GetBytes(value)
                                               )
                              );
    return cleanStr;
}

I really hope to get an answer as the workaround is obviously not ideal, and it also doesn't make sense why this is not possible!


Just have had confirmation from the azure-sdk-for-net team on GitHub that only ASCII characters are valid as data within blob meta-data.

joeg commented:
The supported characters in the blob metadata must be ASCII characters. To work around this you can either escape the string ( percent encode), base64 encode etc.

Source on GitHub

So as a work-around, either:

  • escape the string (percent encode), base64 encode, etc, as suggested by joeg
  • use the techniques that I have mentioned in my other answer.