CSV separator auto-detection in JavaScript
A possible algorithm for getting the likely separator(s) is pretty simple, and assumes the data is well-formed:
- For every delimiter,
  - For every line,
    - Split the line by the delimiter, check the length.
    - If its length is not equal to the last line's length, this is not a valid delimiter.
Proof of concept (doesn't handle quoted fields):
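function guessDelimiters (text, possibleDelimiters) {
    return possibleDelimiters.filter(weedOut);

    // Keep a delimiter only if every non-empty line splits into the same number of fields.
    function weedOut (delimiter) {
        var cache = -1; // field count of the first non-empty line
        // Accept both \n and \r\n line endings.
        return text.split(/\r?\n/).every(checkLength);

        function checkLength (line) {
            if (!line) {
                return true; // skip empty lines, e.g. after a trailing newline
            }
            var length = line.split(delimiter).length;
            if (cache < 0) {
                cache = length;
            }
            return cache === length && length > 1;
        }
    }
}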
The length > 1 check is to make sure the split didn't just return the whole line. Note that this returns an array of possible delimiters - if there's more than one item, you have an ambiguity problem.
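The two limitations mentioned above (quoted fields and ambiguity) could be approached roughly as follows. This is only a sketch under several assumptions that are not part of the original answer: fields are quoted with double quotes in the RFC 4180 style, no quoted field spans more than one line, every candidate delimiter is a single character, and an ambiguity is resolved by preferring the delimiter that yields the most columns, which is just a heuristic. The names countFields and guessDelimiterQuoted are made up for illustration.

```javascript
// Count the fields in one line, ignoring delimiters that sit inside double quotes.
// Assumes `delimiter` is a single character other than the double quote.
function countFields (line, delimiter) {
    var count = 1;
    var inQuotes = false;
    for (var i = 0; i < line.length; i++) {
        if (line[i] === '"') {
            // An escaped quote ("") toggles twice, so the state ends up unchanged.
            inQuotes = !inQuotes;
        } else if (!inQuotes && line[i] === delimiter) {
            count++;
        }
    }
    return count;
}

// Return the single most plausible delimiter, or null if none fits.
function guessDelimiterQuoted (text, possibleDelimiters) {
    var lines = text.split(/\r?\n/).filter(function (line) {
        return line !== '';
    });
    var best = null;
    var bestCount = 1; // a delimiter must produce more than one field to qualify
    possibleDelimiters.forEach(function (delimiter) {
        var counts = lines.map(function (line) {
            return countFields(line, delimiter);
        });
        var consistent = counts.every(function (c) {
            return c === counts[0];
        });
        // Heuristic tie-break: prefer the consistent delimiter with the most columns.
        if (consistent && counts[0] > bestCount) {
            best = delimiter;
            bestCount = counts[0];
        }
    });
    return best;
}

console.log(guessDelimiterQuoted('name,city\n"Smith, John",Paris', [',', ';', '\t'])); // ","
```

With the plain split approach, the second line of that sample would count three fields for the comma and the comma would be rejected; counting only delimiters outside quotes keeps both lines at two fields.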
Another solution is using the detect method from the csv-string package:
detect(input : String) : String Detects the best separator.
var CSV = require('csv-string');
console.log(CSV.detect('a,b,c')); // OUTPUT : ","
console.log(CSV.detect('a;b;c')); // OUTPUT : ";"
console.log(CSV.detect('a|b|c')); // OUTPUT : "|"
console.log(CSV.detect('a\tb\tc')); // OUTPUT : "\t"