Execution time with process.hrtime() return vastly different result
As you have already noticed, the performance difference leads to the comparison: generated array
vs JSON.parse
d. What we have in both cases: same arrays with same numbers? So the look up performance must be the same? No.
Every Javascript engine has various data type structures for representing same values(numbers, objects, arrays, etc). In most cases optimizer tries to find out the best data type to use. And also often generates some additional meta information, like
hidden clases
ortags
for arrays.
There are several very nice articles about the data types:
- v8-perf/data-types
- v8 performance tips
So why arrays created by JSON.parse
are slow? The parser, when creating values, doesn't properly optimize the data structures, and as the result we get untagged
arrays with boxed
doubles. But we can optimize the arrays afterwards with Array.from
and in your case, same as the generated arrays, you get smi
arrays with smi
numbers. Here is an example based on your sample.
const fs = require('fs');
const path = require('path');
const outputFilePath = path.join(__dirname, process.argv[2] || 'test-data.json');
let tests = JSON.parse(fs.readFileSync(outputFilePath));
// for this demo we take only the first items array
var arrSlow = tests[0].input;
// `slice` copies array as-is
var arrSlow2 = tests[0].input.slice();
// array is copied and optimized
var arrFast = Array.from(tests[0].input);
console.log(%HasFastSmiElements(arrFast), %HasFastSmiElements(arrSlow), %HasFastSmiElements(arrSlow2));
//> true, false, false
console.log(%HasFastObjectElements(arrFast), %HasFastObjectElements(arrSlow), %HasFastObjectElements(arrSlow2));
//> false, true, true
console.log(%HasFastDoubleElements(arrFast), %HasFastDoubleElements(arrSlow), %HasFastDoubleElements(arrSlow2));
//> false, false, false
// small numbers and unboxed doubles in action
console.log(%HasFastDoubleElements([Math.pow(2, 31)]));
console.log(%HasFastSmiElements([Math.pow(2, 30)]));
Run it with node --allow-natives-syntax test.js
OK...first of all lets talk about testing strategy...
Running this tests several times gives incredible different results fluctuating a lot for each point... see results here
https://docs.google.com/spreadsheets/d/1Z95GtT85BljpNda4l-usPjNTA5lJtUmmcY7BVB8fFGQ/edit?usp=sharing
After test update (running 100 tests in a row and calculating average) I score that the main difference in execution times are:
- the indexOf and for loops are working better in GENERATOR scenario
- the binary search and interpolation search are working better in JSON-parse scenario
Please look at the google doc before...
OK.. Great... This thing is much more easier to explain... basicly we trapped into situation when RANDOM memory access(binary, interpolation search) and CONSECUTIVE memory access(indexOf, for) give different results
Hmmm. Lets go deeper inside the memory management model of NodeJS
First of all NodeJS has several array representations, I actually know only two - numberArray
, objectArray
(means array that can include value of any type)
Lets look at GENERATOR scenario:
During initial array creation NodeJS is ABLE to detect that your array contains only numbers, as array starting from only numbers, and nothing of other type is added to it. This results in using simple memory allocation strategy, just raw row of integers going one by one in memory...
Array is represented as array of raw numbers
in memory, most likely only memory paging table has an effect here
This fact clearly explains why CONSECUTIVE memory access works better in this case.
Lets look at JSON-parse scenario:
During the JSON parsing structure of JSON is unpredictable(NodeJS uses a stream JSON parser(99.99% confidence)), every value is tracted as most cozy for JSON parsing, so...
Array is represented as array of references to the numbers
in memory, just because while parsing JSON this solution is more productive in most cases (and nobody cares (devil) )
As far as we allocating memory in heap by small chunks memory gets filled in more fluid way
Also in this model RANDOM memory access gives better results, because NodeJS engine has no options - to optimize access time it creates good either prefix tree either hash map which gives constant access time in RANDOM memory access scenarios
And this is quite good explanation why JSON-parse scenario wins during binary, interpolation search