Split line into key-value pairs based on first string

Here goes

awk '{for (i=2; i<=NF; ++i) print $1, $i}' file
A B
A C
1 2
1 3
1 4
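Here is a minimal sketch of the awk one-liner run on the same sample input the sed examples below use, with the loop spread out and commented. The wrapper name pairs_awk is hypothetical. The awk program itself is unchanged and reads stdin instead of a file.

```shell
#!/bin/sh
# Hypothetical wrapper name; the awk program is the one-liner above.
pairs_awk() {
    awk '{
        # $1 is the key; $2 .. $NF are the values paired with it.
        # A line with only one field prints nothing.
        for (i = 2; i <= NF; ++i)
            print $1, $i
    }'
}

# Same sample input as the sed examples, fed on stdin instead of "file"
printf '%s\n' 'A B C' '1 2 3 4' | pairs_awk
```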

printf %s\\n 'A B C' '1 2 3 4'|
sed -e's/\([^ ]*\)  *[^ ]*/&\n\1/;//P;D'

A B
A C
1 2
1 3
1 4

That works. It selects the first two sequences of zero or more not-space characters that are separated by one or more spaces. The first sequence is referenced as \1 and the whole selection as &. The selection is replaced with itself followed by a newline (\n) and then \1. If the empty // regex (which reuses the last one) still matches, pattern space is then Printed up to the first occurring newline, and that same portion is Deleted before the pattern space is recycled to the top of the script with what remains. Once no space is left, D discards the lone leftover key and sed reads the next line.
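Here is the same script with one command per line and a comment on each. It is wrapped in a hypothetical pairs_sed function. This sketch assumes GNU sed, which accepts \n as a newline in the replacement. Other seds may need a backslash followed by a literal newline there instead.

```shell
#!/bin/sh
# Hypothetical wrapper name; the sed script is the one-liner above.
# GNU sed assumed (\n in the replacement).
pairs_sed() {
    sed -e '
        # Key (\1), one or more spaces, the next field; append a newline
        # and a fresh copy of the key after it: "A B C" -> "A B\nA C"
        s/\([^ ]*\)  *[^ ]*/&\n\1/
        # Empty regex reuses the one above: if it still matches,
        # print up to the first newline
        //P
        # Delete through the first newline and rerun the script on what
        # is left without reading input; with no newline, end the cycle
        D
    '
}

printf '%s\n' 'A B C' '1 2 3 4' | pairs_sed
```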

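You can see what it does with the l (look) command, which prints pattern space unambiguously: each embedded newline is shown as \n and a $ marks the end. Replace the P with l and put another l before the s///ubstitution: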

A B C$
A B\nA C$
A C$
A C\nA$
A$
1 2 3 4$
1 2\n1 3 4$
1 3 4$
1 3\n1 4$
1 4$
1 4\n1$
1$
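The trace above should be reproducible with one l placed before the substitution and another in place of P. GNU sed is assumed again. The function name trace_pairs_sed is hypothetical.

```shell
#!/bin/sh
# Hypothetical name; GNU sed assumed. D ends every cycle, so nothing is
# autoprinted and only the l dumps appear.
trace_pairs_sed() {
    # l before the substitution, and l in place of P after it
    sed -e 'l;s/\([^ ]*\)  *[^ ]*/&\n\1/;//l;D'
}

printf '%s\n' 'A B C' '1 2 3 4' | trace_pairs_sed
```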

printf %s\\n 'A B C' '1 2 3 4'|
sed -ne:t -e'/  *[^ ]*/{s//\n&/2;P;s///;}' -ett

A B
A C
1 2
1 3
1 4

It acts on any pattern space containing at least one sequence of space characters plus any not-spaces trailing it. The first substitution inserts a newline before the second occurrence of such a sequence, then Prints up to that newline (or the whole pattern space if there was no second occurrence). The second substitution removes the first occurrence of the pattern, which now also includes the newline the first substitution appended to the tail of that sequence, because a newline is not a space. The t test branches back to the :t label whenever a substitution has succeeded, and so sed eats pattern space one space-separated field at a time.
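Here is the loop version spread out one command per line with comments. It sits inside a hypothetical pairs_sed_loop function, and GNU sed is assumed for the \n in the replacement.

```shell
#!/bin/sh
# Hypothetical wrapper name; GNU sed assumed (\n in the replacement).
pairs_sed_loop() {
    sed -n -e '
        :t
        # Only act while some run of spaces (plus a field) remains
        /  *[^ ]*/{
            # Empty regex reuses the address: put a newline before the
            # second match, e.g. "A B C" -> "A B\n C"
            s//\n&/2
            # Print through the first newline, or everything if none
            P
            # Delete the first match; its [^ ]* also takes in the newline
            s///
        }
        # Branch to :t if a substitution succeeded since the last input
        # line was read or the last t branch was taken
        tt
    '
}

printf '%s\n' 'A B C' '1 2 3 4' | pairs_sed_loop
```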

With look again:

A B C$
A B\n C$
A C$
A C$
1 2 3 4$
1 2\n 3 4$
1 3 4$
1 3\n 4$
1 4$
1 4$
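The trace above should be reproducible by putting an l ahead of the first substitution inside the braces and another l in place of P. GNU sed is assumed. The name trace_pairs_sed_loop is hypothetical.

```shell
#!/bin/sh
# Hypothetical name; GNU sed assumed. With -n only the l dumps print.
trace_pairs_sed_loop() {
    sed -ne:t -e'/  *[^ ]*/{l;s//\n&/2;l;s///;}' -ett
}

printf '%s\n' 'A B C' '1 2 3 4' | trace_pairs_sed_loop
```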
