Removing newlines from messy strings in pandas DataFrame cells?
df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True, inplace=True)
This worked for me.
Source:
https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a
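The two patterns in that call target different things: the raw-string pattern matches escape sequences stored as literal text (a backslash followed by t, n, or r), while the second matches the actual control characters. A minimal sketch with made-up data (the column name A and the cell values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data: the first cell holds a real newline and a real tab;
# the second holds the two-character text sequences backslash+n and backslash+t.
df = pd.DataFrame({"A": ["real\nnewline\there", r"literal\nnewline\there"]})

cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],  # literal escapes, then real control characters
    value=["", ""],
    regex=True,
)
print(cleaned["A"].tolist())
```

If the data only ever contains real control characters, the second pattern alone should be enough.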
To remove carriage returns (\r), newlines (\n), and tabs (\t):
df = df.replace(r'\r+|\n+|\t+','', regex=True)
EDIT: the correct answer to this is:
df = df.replace(r'\n',' ', regex=True)
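Whether to replace with an empty string or a space depends on whether the line breaks separate words. A small sketch of the difference, using made-up data; the character-class pattern [\r\n\t]+ is a variation not taken from the answers above, chosen so that a Windows-style \r\n pair becomes one space rather than two:

```python
import pandas as pd

# Hypothetical cell containing a Windows line ending and a tab.
df = pd.DataFrame({"A": ["line one\r\nline two\tend"]})

# Deleting the characters joins the neighbouring words together.
removed = df.replace(r"\r+|\n+|\t+", "", regex=True)

# Replacing each run of \r, \n, \t with one space keeps the words apart.
spaced = df.replace(r"[\r\n\t]+", " ", regex=True)

print(removed["A"].iloc[0])
print(spaced["A"].iloc[0])
```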
I think you need replace:
df = df.replace('\n','', regex=True)
Or:
df = df.replace('\n',' ', regex=True)
Or, if the cells contain a literal backslash followed by n rather than a real newline:
df = df.replace(r'\\n',' ', regex=True)
Sample:
import pandas as pd

text = '''hands-on\ndev nologies\nrelevant scripting\nlang
'''
df = pd.DataFrame({'A':[text]})
print(df)
A
0 hands-on\ndev nologies\nrelevant scripting\nla...
df = df.replace('\n',' ', regex=True)
print(df)
A
0 hands-on dev nologies relevant scripting lang
In messy data it might be a good idea to remove all whitespace:
df.replace(r'\s', '', regex=True, inplace=True)
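Because \s also matches ordinary spaces, removing every match joins words together. If that is not wanted, one alternative (a sketch, not part of the answer above) is to collapse each run of whitespace into a single space and then trim the ends. The data and the column name A are made up for illustration:

```python
import pandas as pd

# Hypothetical messy cell: leading spaces, a newline, repeated spaces, a trailing tab.
df = pd.DataFrame({"A": ["  hands-on\ndev   nologies\t"]})

# Removing all whitespace also removes the spaces between words.
stripped_all = df.replace(r"\s", "", regex=True)

# Collapse each whitespace run to one space, then trim both ends of each string.
collapsed = df.replace(r"\s+", " ", regex=True)
collapsed["A"] = collapsed["A"].str.strip()

print(stripped_all["A"].iloc[0])
print(collapsed["A"].iloc[0])
```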