Golang truncate strings with special characters without corrupting data
Slicing a string operates on its underlying bytes: the slice operator uses byte indexes, not rune indexes, and a single rune can occupy several bytes. A range loop over a string, however, iterates rune by rune, while the index it yields is the byte offset at which each rune starts. This makes it fairly straightforward to do what you're looking for (full playground example here):
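func SanitizeName(name string, limit int) string {
	name = reNameBlacklist.ReplaceAllString(name, "")
	result := name
	chars := 0
	// range yields the byte offset i at which each rune starts.
	for i := range name {
		if chars >= limit {
			// i is a rune boundary, so slicing here cannot split a character.
			result = name[:i]
			break
		}
		chars++
	}
	return result
}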
This is explained in further detail on the Go blog.
Your data is getting corrupted because some characters occupy more than one byte, and slicing by byte index can split them in the middle. To avoid this, Go has the rune type, which represents a Unicode code point. You can convert the string to a []rune like this:
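func SanitizeName(name string, limit int) string {
	// Remove the special chars; ReplaceAllString returns a new string.
	name = reNameBlacklist.ReplaceAllString(name, "")
	result := []rune(name)
	// Guard against limit exceeding the rune count, which would panic.
	if limit > len(result) {
		limit = len(result)
	}
	return string(result[:limit])
}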
This should leave at most the first limit runes (Unicode code points), never splitting a multi-byte character.
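For reference, a runnable sketch of the []rune approach might look like the following. The pattern assigned to reNameBlacklist is a placeholder assumption, since the original definition is not shown in the question, and the function is renamed sanitizeNameRunes to mark it as illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// reNameBlacklist is a hypothetical stand-in for the pattern in the question:
// it matches anything that is not a letter, a digit, or a space.
var reNameBlacklist = regexp.MustCompile(`[^\p{L}\p{N} ]`)

// sanitizeNameRunes strips blacklisted characters, then keeps at most
// limit runes of what remains.
func sanitizeNameRunes(name string, limit int) string {
	name = reNameBlacklist.ReplaceAllString(name, "")
	runes := []rune(name)
	if limit < 0 {
		limit = 0 // a negative slice bound would panic
	}
	if limit < len(runes) {
		runes = runes[:limit]
	}
	return string(runes)
}

func main() {
	fmt.Println(sanitizeNameRunes("日本語/テキスト!", 4)) // 日本語テ
	fmt.Println(sanitizeNameRunes("abc", 10))       // abc
}
```

The conversion to []rune copies the whole string, so for very long inputs the range-based loop from the first answer is likely cheaper. Note also that a rune is a code point, not necessarily what a user perceives as one character: combining marks count as separate runes.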