Parse UTF8 string from ReadOnlySequence<byte>
The first thing we should do here is test whether the sequence actually is a single span; if it is, we can hugely simplify and optimize.
Once we know that we have a multi-segment (discontiguous) buffer, there are two ways we can go:
- linearize the segments into a contiguous buffer, probably leasing an oversized buffer from ArrayPool<byte>.Shared, and use Encoding.UTF8.GetString on the correct portion of the leased buffer, or
- use the GetDecoder() API on the encoding, and use that to populate a new string, which on older frameworks means overwriting a newly allocated string, or on newer frameworks means using the string.Create API
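The second option can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names DecoderSketch and GetStringViaDecoder are hypothetical, and instead of string.Create (which needs the final string length up front) it decodes into a char[] leased from ArrayPool<char>.Shared and then builds the string from that. It assumes .NET Core 3.0 or later for FirstSpan and the span-based Decoder.GetChars overload.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class DecoderSketch
{
    // Hypothetical helper name; illustrates the decoder-based route only.
    public static string GetStringViaDecoder(in ReadOnlySequence<byte> payload,
        Encoding encoding = null)
    {
        encoding ??= Encoding.UTF8;
        if (payload.IsSingleSegment) return encoding.GetString(payload.FirstSpan);

        int byteCount = checked((int)payload.Length);
        // GetMaxCharCount is an upper bound, so the leased buffer is always big enough.
        char[] chars = ArrayPool<char>.Shared.Rent(encoding.GetMaxCharCount(byteCount));
        try
        {
            // The decoder keeps partial-character state between calls, so a
            // multi-byte character split across two segments still decodes correctly.
            Decoder decoder = encoding.GetDecoder();
            int written = 0;
            foreach (ReadOnlyMemory<byte> segment in payload)
            {
                written += decoder.GetChars(segment.Span, chars.AsSpan(written), flush: false);
            }
            // Final flush: emits a replacement for any trailing incomplete character.
            written += decoder.GetChars(ReadOnlySpan<byte>.Empty, chars.AsSpan(written), flush: true);
            return new string(chars, 0, written);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chars);
        }
    }
}
```

The trade-off in this variant is that the bytes are never copied, but the decoded chars are copied once from the leased buffer into the final string.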
The first option is massively simpler, but involves copying every segment into the leased buffer before decoding (with no additional allocations other than the string):
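// Extension method, so it must be declared inside a static class.
public static string GetString(in this ReadOnlySequence<byte> payload,
    Encoding encoding = null)
{
    encoding ??= Encoding.UTF8;
    // Fast path: one contiguous span can be decoded directly.
    // (FirstSpan needs .NET Core 3.0+; on older targets use payload.First.Span.)
    return payload.IsSingleSegment ? encoding.GetString(payload.FirstSpan)
        : GetStringSlow(payload, encoding);

    static string GetStringSlow(in ReadOnlySequence<byte> payload, Encoding encoding)
    {
        // linearize: copy all segments into one leased buffer
        int length = checked((int)payload.Length);
        var oversized = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            payload.CopyTo(oversized);
            // the leased array may be larger than requested; decode only the first 'length' bytes
            return encoding.GetString(oversized, 0, length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(oversized);
        }
    }
}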
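To exercise the slow path you need a sequence that really is discontiguous. The following is a minimal sketch of building one; Chunk and SplitDemo are hypothetical names introduced for illustration, not part of any library. It also shows why the segments cannot simply be decoded one at a time: when a multi-byte character straddles a segment boundary, decoding each segment in isolation corrupts it, whereas decoding the linearized bytes does not.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical segment type for building test sequences.
public sealed class Chunk : ReadOnlySequenceSegment<byte>
{
    public Chunk(ReadOnlyMemory<byte> memory, long runningIndex = 0)
    {
        Memory = memory;
        RunningIndex = runningIndex;
    }

    // Links a new segment after this one and returns it.
    public Chunk Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Chunk(memory, RunningIndex + Memory.Length);
        Next = next;
        return next;
    }
}

public static class SplitDemo
{
    // Builds a two-segment sequence over 'bytes', split at offset 'at'.
    public static ReadOnlySequence<byte> Split(byte[] bytes, int at)
    {
        var first = new Chunk(bytes.AsMemory(0, at));
        var last = first.Append(bytes.AsMemory(at));
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }

    // Deliberately wrong: each segment is decoded with no knowledge of its neighbours.
    public static string DecodePerSegment(in ReadOnlySequence<byte> payload)
    {
        var sb = new StringBuilder();
        foreach (ReadOnlyMemory<byte> segment in payload)
        {
            sb.Append(Encoding.UTF8.GetString(segment.Span));
        }
        return sb.ToString();
    }

    // Baseline: copy everything into one new array, then decode.
    public static string DecodeLinearized(in ReadOnlySequence<byte> payload)
        => Encoding.UTF8.GetString(payload.ToArray());
}
```

For example, "héllo" encodes "é" as two bytes, so SplitDemo.Split(Encoding.UTF8.GetBytes("héllo"), 2) puts half of that character in each segment. DecodeLinearized still returns "héllo", while DecodePerSegment yields replacement characters in place of the "é". The ToArray baseline allocates, so it is only meant as a reference result to compare a pooled implementation against.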