Using Papa Parse for big CSV files
You are probably using it correctly; the program simply needs some time to parse all 100k lines!
This is probably a good use case for Web Workers.
NOTE: Per @tomBryer's answer below, Papa Parse now has support for Web Workers out of the box. This may be a better approach than rolling your own worker.
If you've never used them before, this site gives a decent rundown, but the key part is:
Web Workers mimics multithreading, allowing intensive scripts to be run in the background so they do not block other scripts from running. Ideal for keeping your UI responsive while also performing processor-intensive functions.
Browser coverage is pretty decent as well, with IE9 and below being the only semi-modern browsers that don't support it.
Mozilla has a good video that shows how web workers can speed up frame rate on a page as well.
I'll try to get a working example with Web Workers for you. Note that this won't speed up the script; it just runs it asynchronously so your page stays responsive.
EDIT:
(NOTE: if you want to parse the CSV within the worker, you'll probably need to import the Papa Parse script within worker.js using the importScripts function, which is globally defined within the worker thread. See the MDN page for more info on that.)
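For readers who want the parsing itself to happen off the main thread, here is a minimal, untested sketch of a worker.js variant that loads Papa Parse with `importScripts` and parses inside the worker. The `PAPA_URL` constant, the `parseAndPost` helper, and the `{rows, errors}` message shape are assumptions made for illustration; the `inWorker` guard is only there so the helper can also be loaded outside a worker.

```javascript
// worker.js (variant that parses inside the worker)
// Assumption: the same CDN build of Papa Parse that csv.html loads.
var PAPA_URL = 'https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js';

// importScripts only exists in a worker's global scope.
var inWorker = typeof importScripts === 'function';
if (inWorker) {
    importScripts(PAPA_URL); // synchronous; defines the global `Papa`
}

// Hypothetical helper: parse `file` with `parser` and pass the result to `post`.
function parseAndPost(file, parser, post) {
    parser.parse(file, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
            // Send only plain data back; functions cannot be posted.
            post({ rows: results.data, errors: results.errors });
        }
    });
}

if (inWorker) {
    self.addEventListener('message', function (e) {
        parseAndPost(e.data.file, Papa, function (msg) {
            self.postMessage(msg);
        });
    }, false);
}
```

With this variant, the page's `message` handler would read `ev.data.rows` directly instead of calling `Papa.parse` itself. Keep in mind that posting 100k parsed rows back to the page still copies them, so the transfer has a cost of its own.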
Here is my working example:
csv.html
<!doctype html>
<html>
<head>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0/jquery.min.js"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js"></script>
</head>
<body>
    <input id="myFile" type="file" name="files" value="Load File" />
    <br>
    <button class="load-file">Load and Parse Selected CSV File</button>
    <div id="report"></div>
    <script>
        // initialize our parsed_csv to be used wherever we want
        var parsed_csv;
        var start_time, end_time;

        // document.ready
        $(function() {
            $('.load-file').on('click', function(e) {
                start_time = performance.now();
                $('#report').text('Processing...');
                console.log('initialize worker');

                var worker = new Worker('worker.js');
                worker.addEventListener('message', function(ev) {
                    console.log('received raw CSV, now parsing...');

                    // Parse our CSV raw text
                    Papa.parse(ev.data, {
                        header: true,
                        dynamicTyping: true,
                        complete: function (results) {
                            // Save result in a globally accessible var
                            parsed_csv = results;
                            console.log('parsed CSV!');
                            console.log(parsed_csv);
                            $('#report').text(parsed_csv.data.length + ' rows processed');
                            end_time = performance.now();
                            console.log('Took ' + (end_time - start_time) + " milliseconds to load and process the CSV file.");
                        }
                    });

                    // Terminate our worker
                    worker.terminate();
                }, false);

                // Submit our file to load
                var file_to_load = document.getElementById("myFile").files[0];
                console.log('call our worker');
                worker.postMessage({file: file_to_load});
            });
        });
    </script>
</body>
</html>
worker.js
self.addEventListener('message', function(e) {
    console.log('worker is running');
    var file = e.data.file;
    var reader = new FileReader();

    reader.onload = function (fileLoadedEvent) {
        console.log('file loaded, posting back from worker');
        var textFromFileLoaded = fileLoadedEvent.target.result;
        // Post our text file back from the worker
        self.postMessage(textFromFileLoaded);
    };

    // Actually load the text file
    reader.readAsText(file, "UTF-8");
}, false);
GIF of it processing, takes less than a second (all running locally)
As of v5, Papa Parse has Web Worker support built in.
A simple example of invoking the worker within Papa Parse is below:
Papa.parse(bigFile, {
    worker: true,
    step: function(results) {
        console.log("Row:", results.data);
    }
});
There's no need to re-implement anything if you already have your own worker running with Papa Parse, but for future projects some may find it easier to use Papa Parse's built-in solution.
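Logging every row of a 100k-line file gets noisy, so here is a small sketch that counts rows in `step` and reports once in `complete`. The `countRowsInWorker` helper and its `onDone` callback are hypothetical names used for illustration; `worker`, `header`, `step`, `complete`, and `error` are Papa Parse config options. The Papa object is passed in as a parameter only to keep the helper easy to try out in isolation.

```javascript
// Hypothetical helper: stream a file through Papa Parse's built-in worker
// and report a summary when parsing finishes.
function countRowsInWorker(papa, file, onDone) {
    var rowCount = 0;
    var errorCount = 0;

    papa.parse(file, {
        worker: true,  // parse off the main thread
        header: true,  // treat the first line as field names
        step: function (results) {
            // Called once per parsed row
            rowCount += 1;
            errorCount += results.errors.length;
        },
        complete: function () {
            onDone({ rows: rowCount, errors: errorCount });
        },
        error: function (err) {
            // Called if the file itself could not be read
            onDone({ rows: rowCount, errors: errorCount, failure: err });
        }
    });
}
```

On the page this might be called as `countRowsInWorker(Papa, document.getElementById('myFile').files[0], function (summary) { console.log(summary); });`, assuming the same file input as in the earlier example.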