Show non printable characters in a string

You'll have to do the translation manually: for example, go through the string with a regular expression and replace each match with its hex equivalent.

import re

replchars = re.compile(r'[\n\r]')
def replchars_to_hex(match):
    return r'\x{0:02x}'.format(ord(match.group()))

replchars.sub(replchars_to_hex, inputtext)

The above example only matches newlines and carriage returns, but you can expand what characters are matched, including using \x escape codes and ranges.

>>> inputtext = 'Some example containing a newline.\nRight there.\n'
>>> replchars.sub(replchars_to_hex, inputtext)
'Some example containing a newline.\\x0aRight there.\\x0a'
>>> print(replchars.sub(replchars_to_hex, inputtext))
Some example containing a newline.\x0aRight there.\x0a
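As a sketch of what "expanding the match" could look like, the pattern below uses \x escapes and a range to cover all ASCII control characters plus DEL. The names control_chars, to_hex, and escape_controls are illustrative and not part of the answer above; the choice of 0x00-0x1f and 0x7f is an assumption about what counts as non-printable.

```python
import re

# Assumed definition of "non-printable": C0 control characters
# (0x00-0x1f) plus DEL (0x7f). Adjust the class to taste.
control_chars = re.compile(r'[\x00-\x1f\x7f]')

def to_hex(match):
    # Format the matched character's code point as a two-digit hex escape.
    return r'\x{0:02x}'.format(ord(match.group()))

def escape_controls(text):
    # Replace every control character with its \xNN form.
    return control_chars.sub(to_hex, text)

print(escape_controls('tab\there\x7f\n'))
```

Printable text passes through unchanged, so the function can be applied to any string before logging it.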

I don't know of any built-in method, but it's fairly easy to do using a comprehension:

import string
printable = string.ascii_letters + string.digits + string.punctuation + ' '
def hex_escape(s):
    return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)

Modifying ecatmur's solution to handle non-printable non-ASCII characters makes it less trivial and more obnoxious:

def escape(c):
    if c.isprintable():
        return c
    c = ord(c)
    if c <= 0xff:
        return r'\x{0:02x}'.format(c)
    elif c <= 0xffff:
        return r'\u{0:04x}'.format(c)
    else:
        return r'\U{0:08x}'.format(c)

def hex_escape(s):
    return ''.join(escape(c) for c in s)

Of course, if str.isprintable isn't exactly the definition you want, you can write a different function. (Note that it's a very different set from what's in string.printable: besides handling non-ASCII printable and non-printable characters, it also considers \n, \r, \t, \x0b, and \x0c as non-printable.)
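To make the difference between the two definitions concrete, here is a small comparison. The helper name classify and the sample characters are chosen for illustration only.

```python
import string

def classify(ch):
    """Return (str.isprintable() result, membership in string.printable)."""
    return ch.isprintable(), ch in string.printable

# ASCII letter, space, ASCII whitespace controls, a non-ASCII letter,
# and a zero-width space (an invisible format character).
for ch in ['a', ' ', '\n', '\t', '\x0b', 'é', '\u200b']:
    print(repr(ch), classify(ch))
```

The two tests disagree on the ASCII whitespace controls (only string.printable accepts them) and on 'é' (only str.isprintable accepts it).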

You can make this more compact; this is explicit just to show all the steps involved in handling Unicode strings. For example:

def escape(c):
    if c.isprintable():
        return c
    elif c <= '\xff':
        return r'\x{0:02x}'.format(ord(c))
    else:
        return c.encode('unicode_escape').decode('ascii')

Really, no matter what you do, you're going to have to handle \r, \n, and \t explicitly, because all of the built-in and stdlib functions I know of will escape them via those special sequences instead of their hex versions.
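A quick sketch of that behaviour, and of one way to override it. Both helper names are hypothetical. Note that this version leans on the unicode_escape codec for everything else, so it also escapes all non-ASCII characters and doubles backslashes, which may be more than you want.

```python
def builtin_escape(s):
    # The unicode_escape codec uses the short forms \n, \r, \t
    # where they exist, and \xNN / \uNNNN / \UNNNNNNNN otherwise.
    return s.encode('unicode_escape').decode('ascii')

def hex_only_escape(s):
    # Hypothetical helper: force \xNN for the three characters the
    # codec would otherwise write as \n, \r, and \t.
    special = {'\n': r'\x0a', '\r': r'\x0d', '\t': r'\x09'}
    return ''.join(special.get(c, builtin_escape(c)) for c in s)

sample = 'a\nb\tc\x00'
print(builtin_escape(sample))   # a\nb\tc\x00
print(hex_only_escape(sample))  # a\x0ab\x09c\x00
```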

I'm kind of late to the party, but if you need it for simple debugging, I found that this works:

string = "\n\t\nHELLO\n\t\n\a\17"

procd = [c for c in string]
        
print(procd)

# Prints ['\n', '\t', '\n', 'H', 'E', 'L', 'L', 'O', '\n', '\t', '\n', '\x07', '\x0f']

While a plain list(string) is simpler, a comprehension makes it easier to add filtering or mapping if necessary.
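For instance, a filtering variant might keep only the non-printable characters along with their positions. This is just a sketch; find_nonprintable is a made-up name, and it assumes str.isprintable is the definition of "printable" you want.

```python
def find_nonprintable(text):
    # Pair each non-printable character with its index in the string.
    return [(i, c) for i, c in enumerate(text) if not c.isprintable()]

sample = "\n\t\nHELLO\n\t\n\a\17"
print(find_nonprintable(sample))
```

Because the result is a list, printing it shows each character via repr, so the escapes stay visible just as in the example above.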
