"Indexing" a vector
I think the built-in function ArrayComponents is what you need:
vec = {1, 4, 4, 8, 7, 7, 4};
ArrayComponents[vec]
(* {1,2,2,3,4,4,2} *)
mat = {{1, 4}, {2, 7}, {7, 2}, {9, 4}};
ArrayComponents[mat]
(* {{1,2},{3,4},{4,3},{5,2}} *)
raggedarray = RandomSample /@ (CharacterRange["a", "z"][[#]] & /@
Range[RandomSample[Range[5]]])
(* {{"a","b"},{"a"},{"c","b","a"},{"d","c","b","a"},{"e","b","c","d","a"}} *)
ArrayComponents[raggedarray]
(* {{1,2},{1},{3,2,1},{4,3,2,1},{5,2,3,4,1}} *)
genericinput = {{"a", "b"}, 1, 2, {3, 4}, {"a"}, "c", "b", "a", {"d", "c", "d"}, {2, 3}} ;
ArrayComponents[genericinput]
(* {{1,2},3,4,{5,6},{1},7,2,1,{8,7,8},{4,5}} *)
The old-school way to do this:
index[a_] := Module[{i = 1, f}, f[x_] := f[x] = i++; f /@ a]
index @ vec
(* {1, 2, 2, 3, 4, 4, 2} *)
A method using Association, introduced long after ArrayComponents:
index2[a_List] := AssociationThread[#, Range@Length@#] ~Lookup~ a & @ DeleteDuplicates @ a
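To see why this works, here is a small sketch that unpacks index2 step by step on the vec from above. The names keys, assoc and test are introduced only for illustration. The last line checks that index2 agrees with index on random input, assuming both definitions have been evaluated.

```mathematica
vec = {1, 4, 4, 8, 7, 7, 4};

(* distinct elements, in order of first appearance *)
keys = DeleteDuplicates[vec]
(* {1, 4, 8, 7} *)

(* map each distinct element to its position in keys *)
assoc = AssociationThread[keys, Range @ Length @ keys]
(* <|1 -> 1, 4 -> 2, 8 -> 3, 7 -> 4|> *)

(* look up every element of the original list at once *)
Lookup[assoc, vec]
(* {1, 2, 2, 3, 4, 4, 2} *)

(* sanity check against the memoization version *)
SeedRandom[1];
test = RandomInteger[50, 1000];
index[test] === index2[test]
```

The final comparison should return True, since both approaches number elements in order of first appearance.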
Edit #2: extended to matrices using eldo's own method:
index2[m_List?MatrixQ] := Partition[index2 @ Flatten @ m, Last @ Dimensions @ m]
halirutan's unflatten could be used in similar fashion for application to arbitrary nested lists of any structure.
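For nested lists whose leaves are all atoms (strings, numbers, symbols), a simpler alternative to unflattening is to map the memoizing function from index over the leaves only. This is a sketch under that assumption, not halirutan's method, and indexNested is a hypothetical name.

```mathematica
(* assumes every leaf of a is atomic; level {-1} selects the atoms *)
indexNested[a_List] := Module[{i = 1, f},
  f[x_] := f[x] = i++;   (* first sight of x assigns the next index *)
  Map[f, a, {-1}]]

genericinput = {{"a", "b"}, 1, 2, {3, 4}, {"a"}, "c", "b", "a", {"d", "c", "d"}, {2, 3}};
indexNested[genericinput]
(* {{1,2},3,4,{5,6},{1},7,2,1,{8,7,8},{4,5}} *)
```

Compound leaves such as "foo"[3] would be descended into rather than indexed as a whole, so this sketch does not cover them.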
Benchmarks
Needs["GeneralUtilities`"]
BenchmarkPlot[{ArrayComponents, index, index2}, RandomInteger[#, 5 #] &, 2^Range[3, 20],
"IncludeFits" -> True, ImageSize -> 600]
Well then, at least on this first test the older ArrayComponents is several times slower than the newer Association and Lookup functionality. Let's try benchmarks with first denser and then sparser duplication:
BenchmarkPlot[{ArrayComponents, index, index2}, RandomInteger[99, 5 #] &, 2^Range[3, 20],
"IncludeFits" -> True, ImageSize -> 600]
With dense duplication index2 still beats ArrayComponents; it is about six times faster here.
BenchmarkPlot[{ArrayComponents, index, index2}, RandomInteger[15 #, 5 #] &,
2^Range[3, 20], "IncludeFits" -> True, ImageSize -> 600]
With sparse duplication index2 is still the winner, but there is some indication that it has higher complexity. Let's try a single-point test with a larger set. (Each run in a fresh kernel.)
SeedRandom[0]
big = RandomInteger[3*^7, 1*^7];
ArrayComponents[big] // Timing // First
MaxMemoryUsed[]
(* 23.758952 seconds, 2092193592 bytes *)
SeedRandom[0]
big = RandomInteger[3*^7, 1*^7];
index2[a_] := AssociationThread[#, Range@Length@#] ~Lookup~ a & @ DeleteDuplicates @ a
index2[big] // Timing // First
MaxMemoryUsed[]
(* 13.400486 seconds, 1199556824 bytes *)
Not only does index2 remain faster than ArrayComponents, it also uses only a bit more than half as much memory.
Alright, a final test: perhaps unpacked data is the Achilles heel of index2:
(* don't forget to reload definitions needed for this plot *)
BenchmarkPlot[{ArrayComponents, index, index2}, "foo" /@ RandomInteger[#, #] &,
2^Range[3, 20], "IncludeFits" -> True, ImageSize -> 600]
Nope! :-) It appears that index2 is superior across the board.
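For context on this last test: wrapping each integer in "foo" turns the packed integer array produced by RandomInteger into an ordinary unpacked list, so optimizations that depend on packed arrays no longer apply. A small sketch to confirm this with Developer`PackedArrayQ (the names packed and unpacked are illustrative):

```mathematica
packed = RandomInteger[10, 10];
unpacked = "foo" /@ packed;   (* {"foo"[...], "foo"[...], ...} *)

Developer`PackedArrayQ[packed]
(* True *)

Developer`PackedArrayQ[unpacked]
(* False *)
```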
You can also use the ClusteringComponents function:
inex[m_] := ClusteringComponents[m, Length@m + 1];
vec = {1, 4, 4, 8, 7, 7, 4};
inex[vec]
(*{1, 2, 2, 3, 4, 4, 2}*)
mat = {{1, 4}, {2, 7}, {7, 2}, {9, 4}};
inex[mat]
(*{{1, 2}, {3, 4}, {4, 3}, {5, 2}}*)