How can I run the ‘:sort u’ command in Vim on a CSV table, using only the values in a particular column as sorting keys?

Since it is not possible to achieve the transformation in question in one run of the :sort command, let us approach it as a two-step process.

1. The first step is sorting lines by the values of the second column (separated from the first one by a comma). In order to do that, we can use the :sort command, passing a regular expression that matches the first column and the following comma:

:sort/^[^,]*,/

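Because :sort compares the text that starts just after the match of the specified pattern on each line, the comparison key is the second column together with the rest of the line. Lines with equal second-column values therefore end up next to each other, which is all the next step needs. To compare the values numerically rather than lexicographically, use the n flag, which sorts on the first decimal number found after the match: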

:sort n/^[^,]*,/
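To see which part of each line takes part in the comparison, here is a minimal Python sketch that models the lexicographic variant of the command. The sample rows are hypothetical, every row is assumed to contain a comma, and Python's code-point ordering is assumed to agree with Vim's byte-wise comparison for this ASCII data.

```python
import re

# Hypothetical sample rows; the second column is the intended sort key.
rows = [
    "x1,pear,3",
    "x2,apple,7",
    "x3,pear,1",
    "x4,banana,2",
]

# Same pattern as in the :sort command: the first column plus its comma.
skip = re.compile(r"^[^,]*,")

def sort_key(line):
    # Model of :sort /pattern/: the matched text is skipped and the
    # comparison uses whatever follows it. Every sample row has a comma,
    # so the match always succeeds here.
    return line[skip.match(line).end():]

sorted_rows = sorted(rows, key=sort_key)
for row in sorted_rows:
    print(row)
```

The two "pear" rows come out adjacent, and their relative order is decided by the third column, because the key extends to the end of the line.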

2. The second step is to run through the sorted lines and, in every block of consecutive lines with the same value in the second column, remove all lines but one. It is convenient to build this on the :global command, which executes a given Ex command on every line matching a certain pattern. For our purposes, a line can be deleted if it has the same value in the second column as the line that follows it. This formalization, together with the initial assumption that commas cannot occur within column values and the requirement that every line has at least three columns (the pattern expects a comma after the second one), gives us the following pattern:

^[^,]*,\([^,]*\),.*\n[^,]*,\1,.*

If we run the :delete command on every line that matches this pattern, going from top to bottom through the sorted lines, only a single line remains for every distinct value in the second column, namely the last line of each block:

:g/^[^,]*,\([^,]*\),.*\n[^,]*,\1,.*/d_
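The effect of this :global pass can be checked with a small Python model. It is a sketch under stated assumptions, not Vim's actual implementation: the sample rows are hypothetical and already sorted, and the pattern is translated to Python syntax, where [^,\n] replaces [^,] because a negated class in Python would otherwise also match a newline.

```python
import re

# Hypothetical rows, already sorted by the second column.
sorted_rows = [
    "x2,apple,7",
    "x4,banana,2",
    "x3,pear,1",
    "x1,pear,3",
]

# Python translation of the Vim pattern: a line, a newline, and a following
# line whose second column repeats the captured value.
dup = re.compile(r"[^,\n]*,([^,\n]*),.*\n[^,\n]*,\1,.*")

def drop_repeated(lines):
    kept = []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        # Model of :g/pattern/d_: a line is deleted when it and the line
        # after it carry the same second-column value.
        if nxt is not None and dup.fullmatch(line + "\n" + nxt):
            continue
        kept.append(line)
    return kept

unique_rows = drop_repeated(sorted_rows)
for row in unique_rows:
    print(row)
```

Of the two "pear" rows, the first is dropped and the second survives, so each block is represented by its last line.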

Finally, both steps can be combined in a single Ex command:

:sort/^[^,]*,/|g/^[^,]*,\([^,]*\),.*\n[^,]*,\1,.*/d_
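As an end-to-end cross-check, the combined command can be modeled in a few lines of Python. This is only a sketch: the function name and sample rows are hypothetical, each row is assumed to have at least three comma-separated fields with no commas inside values, and the comparison is assumed to be plain lexicographic (no n flag).

```python
from itertools import groupby

def sort_unique_by_second_column(lines):
    # Step 1: order by everything after the first comma, like :sort /^[^,]*,/.
    ordered = sorted(lines, key=lambda s: s.split(",", 1)[1])
    # Step 2: in each run of equal second-column values keep the last line,
    # which is the one the :g ... d_ pass leaves behind.
    return [list(group)[-1]
            for _, group in groupby(ordered, key=lambda s: s.split(",")[1])]

rows = ["x1,pear,3", "x2,apple,7", "x3,pear,1", "x4,banana,2", "x5,apple,7"]
result = sort_unique_by_second_column(rows)
for row in result:
    print(row)
```

Running the Ex command on a buffer with the same five lines is a quick way to confirm that both agree on which row is kept for each key.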
