Does Hive have a String split function?
There does exist a split function based on regular expressions. It's not listed in the tutorial, but it is listed on the language manual on the wiki:
split(string str, string pat)
Split str around pat (pat is a regular expression)
In your case, the delimiter "|
" has a special meaning as a regular expression, so it should be referred to as "\\|
".
Just a clarification on the answer given by Bkkbrad.
I tried this suggestion and it did not work for me.
For example,
split('aa|bb','\\|')
produced:
["","a","a","|","b","b",""]
But,
split('aa|bb','[|]')
produced the desired result:
["aa","bb"]
Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.
For elaboration of this behaviour of regexp, see: http://www.regular-expressions.info/charclass.html
Another interesting usecase for split in Hive is when, for example, a column ipname
in the table has a value "abc11.def.ghft.com" and you want to pull "abc11" out:
SELECT split(ipname,'[\.]')[0] FROM tablename;