How do I capture utf-8 decode errors in node.js?

I hope you have solved the problem in the meantime. I had a similar one and eventually solved it with this ugly trick:

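  function isValidUTF8(buf){
   // Decode to a string, re-encode, and check that the bytes are unchanged.
   // Buffer.from() replaces the deprecated new Buffer() constructor.
   return Buffer.compare(Buffer.from(buf.toString(),'utf8') , buf) === 0;
  }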

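which converts the buffer to a string and back, and checks that it stays the same. Invalid byte sequences are replaced with U+FFFD when decoding, so for invalid input the re-encoded bytes differ from the original.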

The 'utf8' argument can be omitted, since it is the default encoding.

Then we have:

> isValidUTF8(Buffer.from('this is valid, 指事字 eè we hope','utf8'))
true
> isValidUTF8(Buffer.from([128]))
false
> isValidUTF8(Buffer.from('\ufffd'))
true

where the '\ufffd' character itself is correctly considered valid UTF-8.
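To see why the round trip catches bad input, here is a small sketch of what happens to a single invalid byte. The variable names are only for illustration; it assumes a Node version that has Buffer.from:

```javascript
// A lone continuation byte (0x80) is not valid UTF-8 on its own.
const invalid = Buffer.from([0x80]);

// Decoding does not throw; the bad byte is replaced with U+FFFD.
const decoded = invalid.toString('utf8');

// Re-encoding U+FFFD yields three bytes (EF BF BD), not the original one.
const reencoded = Buffer.from(decoded, 'utf8');

console.log(decoded === '\ufffd');                      // true
console.log(reencoded);                                 // <Buffer ef bf bd>
console.log(Buffer.compare(reencoded, invalid) === 0);  // false
```

A buffer that really contains an encoded U+FFFD round-trips unchanged, which is why the check does not reject it.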

UPDATE: now this works in JXcore, too


From Node 8.3 on, you can use util.TextDecoder to solve this cleanly:

const util = require('util')
const td = new util.TextDecoder('utf8', {fatal:true})
td.decode(Buffer.from('foo')) // works!
td.decode(Buffer.from([ 128 ])) // throws TypeError

This also works in some browsers, where TextDecoder is available in the global namespace.
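If you want a boolean check like the isValidUTF8 helper from the other answer, you can wrap the decoder in a try/catch. This is only a sketch: the function name is borrowed from that answer, and it assumes a Node build where the fatal option is supported:

```javascript
const { TextDecoder } = require('util');

// Returns true if buf decodes as strict UTF-8, false otherwise.
function isValidUTF8(buf) {
  try {
    // With fatal: true, decode() throws on malformed input
    // instead of silently substituting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUTF8(Buffer.from('foo')));     // true
console.log(isValidUTF8(Buffer.from([128])));     // false
console.log(isValidUTF8(Buffer.from('\ufffd')));  // true
```

Unlike the round-trip trick, this does not need to re-encode and compare the buffer.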