How to speed up interpreting numbers from strings?
I would like to link this Wolfram Community thread, where I asked for functionality like this in September 2015 and explained why it is critical to have it. I can't link to individual posts, but you can find it by searching the page for "StringToDouble".
As Leonid mentioned, there is Internal`StringToDouble. This function is very fast, but it does not report errors. This makes it unsuitable for applications where not all inputs are numbers, especially when the type of the input is unpredictable.
Internal`StringToDouble["1e2"]
(* 100. *)
Internal`StringToDouble["foo"] (* not a number *)
(* 0. *)
As a workaround we can make a small LibraryLink function that parses numbers in this format. Fortunately it's very easy to do in C++.
I am going to use LTemplate out of laziness (as I usually do lately). LTemplate is absolutely not needed here; it just makes it quicker for me to set everything up.
First, put this C++ code in Parser.h:
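#include <sstream>

class Parser {
    bool good = false;  // whether the most recent parse succeeded

public:
    // Parse a real number such as "1e2". On failure, returns 0 and clears the flag.
    double parseReal(const char *s) {
        std::istringstream str(s);  // the stream keeps its own copy of the string
        mma::disownString(s);
        double res;
        str >> res;
        // Reject input that contains no number or has unread trailing characters.
        if (str.fail() || !str.eof()) {
            good = false;
            return 0;
        }
        good = true;
        return res;
    }

    // Same as parseReal, but for machine integers.
    mint parseInteger(const char *s) {
        std::istringstream str(s);
        mma::disownString(s);
        mint res;
        str >> res;
        if (str.fail() || !str.eof()) {
            good = false;
            return 0;
        }
        good = true;
        return res;
    }

    // Reports whether the last call to parseReal or parseInteger succeeded.
    bool success() const { return good; }
};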
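To see what the str.fail() || !str.eof() test accepts and rejects, the same logic can be tried outside LibraryLink. This is only a minimal standalone sketch: tryParseReal is a hypothetical helper name that is not part of Parser.h or LTemplate, and it reports success through its return value instead of a separate success method.

```cpp
#include <sstream>
#include <string>

// Hypothetical standalone helper (not part of Parser.h). It applies the same
// stream checks as Parser::parseReal, but returns the success flag directly.
bool tryParseReal(const std::string &s, double &out) {
    std::istringstream str(s);
    double res;
    str >> res;
    // fail(): no number could be extracted, e.g. "foo" or "".
    // !eof(): characters were left unread, e.g. the trailing space in "12 ".
    // Trailing garbage such as "1e2abc" is caught by one test or the other,
    // depending on the standard library implementation.
    if (str.fail() || !str.eof())
        return false;
    out = res;
    return true;
}
```

With this helper, "1e2" is accepted and stored as 100, while "foo", "1e2abc" and the empty string are rejected. Leading whitespace is skipped by operator>>, but trailing whitespace counts as unread input and causes a rejection.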
Then, from Mathematica, make sure that Directory[] is where Parser.h is and evaluate:
<<LTemplate`

template = LClass[
  "Parser",
  {
    LFun["parseReal", {"UTF8String"}, Real],
    LFun["parseInteger", {"UTF8String"}, Integer],
    LFun["success", {}, True | False]
  }
];

CompileTemplate[template]
LoadTemplate[template]
Then use it as follows: the parseReal method parses a real number in e-notation, and the success method tells us whether the parse was successful. We can then build on top of this.
I used this for personal projects. You can get the code here, but LTemplate must be installed first, and you must also have a working C++ compiler installed. Remember that I made this for personal use. It's a really basic package, and you will probably be better off writing your own, tailored to your own needs.
The only thing this provides over Internal`StringToDouble is a way to check for errors.
Import and ImportString handle the e number format okay. You might be able to Import directly from file, or use ImportString to process the data you've already read in:
res = fromReadList ~StringRiffle~ "\n" ~ImportString~ "TSV" ~Cases~ {__,"Cake"|"Cookies",_};