Is there a faster way to create a matrix of indices from ragged data?
You get a modest improvement when you replace Replace[...] with Transpose@Thread:
(udata = Sort[DeleteDuplicates[Flatten@testData], Less];
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
out1 = Replace[testData /. dsptch, a_Integer :> {a, a}, 1];) // AbsoluteTiming
(* {2.1282128, Null} *)
(udata = Sort[DeleteDuplicates[Flatten@testData], Less];
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
out2 = Transpose@Thread[testData /. dsptch];) // AbsoluteTiming
(* {1.9421942, Null} *)
out1==out2
(* True *)
I get about a 20% speed-up by using kguler's Thread trick to transform the data at the beginning, saving the Transpose until the end. There seems to be a slight advantage to working on data with dimensions {2,10^6} over data with dimensions {10^6,2}. I'm not sure why.
twobiglists=Thread[testData];
udata=Sort[DeleteDuplicates[Flatten@twobiglists],Less];
dsptch=Dispatch[Thread[udata->Range[Length[udata]]]];
result=Transpose[twobiglists/.dsptch];
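To make the Thread trick concrete, here is a minimal sketch on a tiny ragged list. The name small and its contents are made up for illustration; they are not the testData from the question.

```mathematica
(* Toy ragged input: pairs mixed with a lone value (hypothetical data) *)
small = {{1, 2}, 3, {4, 5}};

(* Thread treats the lone 3 as a constant and repeats it in both rows *)
rows = Thread[small]
(* {{1, 3, 4}, {2, 3, 5}} *)

(* Transposing turns the two long rows back into one pair per element *)
Transpose[rows]
(* {{1, 2}, {3, 3}, {4, 5}} *)
```

After Thread, the data is a rectangular 2 x n array, so the dispatch table can be applied to it before the final Transpose.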
Here is a slightly faster approach.
First, transform data and udata a little, representing Infinity and -Infinity by the strings "a2" and "a0":
data2 = Block [{DirectedInfinity = "a" <> ToString[# + 1] &}, data]
=>{{1, "a2"}, {"a0", 2}, 3, {2, 2}, {2, 3}}
udata2 = Block [{DirectedInfinity = "a" <> ToString[# + 1] &}, udata]
=>{"a0", 1, 2, 3, "a2"}
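A likely reason for swapping the infinities for strings (my reading, not something stated above): Infinity is not an atom but the compound expression DirectedInfinity[1], so a replacement restricted to the leaves at level {-1} would see the 1 inside it rather than Infinity itself. A small check, using a made-up two-element list:

```mathematica
(* Infinity evaluates to a compound expression, not an atom *)
AtomQ[Infinity]
(* False *)

FullForm[-Infinity]
(* DirectedInfinity[-1] *)

(* The leaves of {1, Infinity} are the 1 and the 1 inside DirectedInfinity[1] *)
Level[{1, Infinity}, {-1}]
(* {1, 1} *)

(* A string stand-in is a single leaf *)
AtomQ["a2"]
(* True *)
```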
Second, rebuild the dispatch table:
dsptch2 = Dispatch[Thread[udata2 -> Range[Length[udata2]]]];
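Third, apply Replace twice: the inner Replace applies the dispatch table at level {-1}, and the outer Replace expands each lone integer a into {a, a}: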
Replace[Replace[data2, dsptch2, {-1}], a_Integer :> {a, a}, 1]
==>{{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}}
The main difference from the original approach is the inner Replace, which takes the place of ReplaceAll (/.). To measure the effect, make some bigger test data:
l = Join[Range[ 100], {\[Infinity] , -\[Infinity] }];
l2 = Partition [RandomChoice[l, 10^6], 2];
data = Riffle[l2, Join[{\[Infinity] , -\[Infinity] }, Range[ 100]], 5];
Now time the inner Replace part alone:
c1 = data2 /. dsptch2; // Timing (*original approach*)
c2 = Replace[data2, dsptch2, {-1}]; // Timing (*modified approach*)
c1 == c2
=>{0.749, Null}
=>{0.343, Null}
=>True
The replacement step is about twice as fast. Now time the whole thing:
(udata = Sort[DeleteDuplicates[Flatten@data], Less];
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
a1 = Replace[data /. dsptch, a_Integer :> {a, a}, 1];) // Timing
(data2 = Block [{DirectedInfinity = "a" <> ToString[# + 1] &}, data];
udata = Sort[DeleteDuplicates[Flatten@data], Less];
udata2 =
Block [{DirectedInfinity = "a" <> ToString[# + 1] &}, udata];
dsptch2 = Dispatch[Thread[udata2 -> Range[Length[udata2]]]];
a4 = Replace[Replace[data2, dsptch2, {-1}], a_Integer :> {a, a},
1];) // Timing
a4 == a1
=>{1.092, Null}
=>{0.889, Null}
=>True
So overall it is a little faster: about 0.89 s versus 1.09 s in this run.
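A guess at why the level-restricted Replace helps, not something measured above: ReplaceAll tries the rules on every subexpression, including each pair as a whole, whereas Replace at level {-1} only visits the leaves. The sketch below uses a made-up expression expr and rule list rules to show that both give the same result while the leaf-only form has fewer parts to test.

```mathematica
(* Hypothetical small input and rules, mirroring the shape of data2 and dsptch2 *)
expr = {{1, "a2"}, {"a0", 2}, 3};
rules = Dispatch[{"a0" -> 1, 1 -> 2, 2 -> 3, 3 -> 4, "a2" -> 5}];

(* Both forms produce the same indices *)
expr /. rules
(* {{2, 5}, {1, 3}, 4} *)

Replace[expr, rules, {-1}]
(* {{2, 5}, {1, 3}, 4} *)

(* Parts that ReplaceAll may test: the whole list, the two pairs, the five leaves *)
Length[Level[expr, {0, Infinity}]]
(* 8 *)

(* Parts visited at level {-1}: only the five leaves *)
Length[Level[expr, {-1}]]
(* 5 *)
```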