Convert frequency counts to long notation in Mathematica
I added timings; the third from the bottom is the fastest. I am sure there are faster versions. If speed is important, you can parallelize or come up with a solution based on Compile.
In[1]:= list = RandomInteger[{3, 12}, {10^7, 2}];
In[2]:= list // Developer`PackedArrayQ
Out[2]= True
In[3]:= Table[#1, {#2}] & @@@ list // Flatten; // AbsoluteTiming
Out[3]= {22.015290, Null}
In[4]:= Join @@ (Table[#1, {#2}] & @@@ list); // AbsoluteTiming
Out[4]= {18.528328, Null}
In[13]:= Join @@ ConstantArray @@@ list; // AbsoluteTiming
Out[13]= {18.261945, Null}
In[5]:= ConstantArray[#1, #2] & @@@ list // Flatten; // AbsoluteTiming
Out[5]= {43.177745, Null}
In[6]:= NestList[# &, #1, #2 - 1] & @@@ list // Flatten; // AbsoluteTiming
Out[6]= {30.278883, Null}
In[7]:= Join @@ MapThread[ConstantArray, Thread[list]]; // AbsoluteTiming
Out[7]= {15.465663, Null}
In[8]:= Flatten @ MapThread[ConstantArray, Thread[list]]; // AbsoluteTiming
Out[8]= {40.184748, Null}
In[9]:= Join @@ MapThread[Table[#1, {#2}] &, Thread[list]]; // AbsoluteTiming
Out[9]= {18.716637, Null}
In[3]:= Inner[ConstantArray, Sequence @@ Transpose@list, Join]; // AbsoluteTiming
Out[3]= {16.525300, Null}
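Before timing these variants on 10^7 rows, it can help to confirm on a tiny input that they compute the same thing. The following is only a sanity-check sketch: the input `small` and the names `viaTable`, `viaConstantArray` and `viaMapThread` are made up for illustration and do not appear in the timings above.

```mathematica
(* Each row is a {value, count} pair *)
small = {{1, 2}, {7, 3}, {4, 1}};

(* Three of the formulations timed above, applied to the small input *)
viaTable = Join @@ (Table[#1, {#2}] & @@@ small);
viaConstantArray = Join @@ ConstantArray @@@ small;
viaMapThread = Join @@ MapThread[ConstantArray, Thread[small]];

viaTable
(* {1, 1, 7, 7, 7, 4} *)

viaTable == viaConstantArray == viaMapThread
(* True *)
```

Thread[small] transposes the pairs into a list of values and a list of counts, which is the shape MapThread expects.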
Internal`RepetitionFromMultiplicity
list = {{1, 5}, {2, 10}, {3, 5}};
Internal`RepetitionFromMultiplicity @ list
{1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3}
This is faster than the fastest method in @Vitaliy's post.
list = RandomInteger[{3, 12}, {10^7, 2}];
res1 = Join @@ MapThread[ConstantArray, Thread[list]]; // AbsoluteTiming // First
8.5771
res2 = Internal`RepetitionFromMultiplicity[list]; // AbsoluteTiming // First
6.6958
res1 == res2
True
kglr made a very interesting find with Internal`RepetitionFromMultiplicity. However, Internal`RepetitionFromMultiplicity produces unpacked arrays and that tells me that it is not as efficient as it could be.
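One way to check the packing claim yourself is sketched below. Internal`RepetitionFromMultiplicity is undocumented, so its behavior may differ between versions; the names `pairs`, `res` and `packed` are illustrative only.

```mathematica
pairs = {{1, 5}, {2, 10}, {3, 5}};   (* {value, count} pairs *)
res = Internal`RepetitionFromMultiplicity[pairs];

(* False here would confirm the unpacked result described above *)
Developer`PackedArrayQ[res]

(* Packing after the fact is possible, at the cost of an extra pass *)
packed = Developer`ToPackedArray[res];
Developer`PackedArrayQ[packed]

(* Compare the memory footprint of the two representations *)
{ByteCount[res], ByteCount[packed]}
```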
Here is an attempt to produce a compiled version that also allows for parallelization:
getRepetitionFromMultiplicity =
Compile[{{list, _Integer, 2}, {start, _Integer}, {stop, _Integer}},
Block[{a, x, y, c = 0},
a = Table[0, {i, 1, Total[list[[start ;; stop, 2]]]}];
Do[
x = Compile`GetElement[list, i, 1];
y = Compile`GetElement[list, i, 2];
Do[c++; a[[c]] = x, {j, 1, y}],
{i, start, stop}
];
a
],
CompilationTarget -> "C",
RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"
];
repetitionFromMultiplicity[list_?MatrixQ, jobs_: 1000] :=
Module[{len, starts, stops},
If[jobs <= Length[list],
len = Floor[Length[list]/jobs];
starts = len Range[0, jobs - 1] + 1;
stops = len Range[1, jobs];
stops[[-1]] = Length[list];
Join @@ getRepetitionFromMultiplicity[list, starts, stops]
,
getRepetitionFromMultiplicity[list, 1, Length[list]]
]
]
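To see how the wrapper splits the rows into jobs, the boundary computation can be looked at in isolation. The helper `chunkBounds` below is hypothetical (it is not part of the code above); it simply repeats the same arithmetic for a row count `n` and a job count `jobs`.

```mathematica
(* Hypothetical helper: same start/stop arithmetic as in repetitionFromMultiplicity *)
chunkBounds[n_Integer, jobs_Integer] :=
 Module[{len, starts, stops},
  len = Floor[n/jobs];                  (* rows per job, rounded down *)
  starts = len Range[0, jobs - 1] + 1;  (* first row of each job *)
  stops = len Range[1, jobs];           (* last row of each job *)
  stops[[-1]] = n;                      (* last job absorbs the remainder *)
  {starts, stops}
 ]

chunkBounds[10, 3]
(* {{1, 4, 7}, {3, 6, 10}} *)
```

Because the compiled function is declared with RuntimeAttributes -> {Listable}, passing the lists of starts and stops in place of the scalar arguments makes it thread over the jobs, and Parallelization -> True allows those jobs to run in parallel.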
These are the timings (on a quad core machine):
list = RandomInteger[{3, 12}, {10^7 + 1, 2}];
res2 = Internal`RepetitionFromMultiplicity[list]; // AbsoluteTiming // First
res3 = repetitionFromMultiplicity[list]; // AbsoluteTiming // First
Developer`ToPackedArray@res2 == res3
4.85631
0.586881
True