How do I escape a Unicode string with Ruby?

To use a unicode character in Ruby use the "\uXXXX" escape; where XXXX is the UTF-16 codepoint. see http://leejava.wordpress.com/2009/03/11/unicode-escape-in-ruby/


In Ruby 1.8.x, String#inspect may be what you are looking for, e.g.

>> multi_byte_str = "hello\330\271!"
=> "hello\330\271!"

>> multi_byte_str.inspect
=> "\"hello\\330\\271!\""

>> puts multi_byte_str.inspect
"hello\330\271!"
=> nil

In Ruby 1.9 if you want multi-byte characters to have their component bytes escaped, you might want to say something like:

>> multi_byte_str.bytes.to_a.map(&:chr).join.inspect
=> "\"hello\\xD8\\xB9!\""

In both Ruby 1.8 and 1.9 if you are instead interested in the (escaped) unicode code points, you could do this (though it escapes printable stuff too):

>> multi_byte_str.unpack('U*').map{ |i| "\\u" + i.to_s(16).rjust(4, '0') }.join
=> "\\u0068\\u0065\\u006c\\u006c\\u006f\\u0639\\u0021"

Tags:

Ruby

Unicode