Removing rows whose elements appear fewer than a certain number of times
Make some fake data:
SeedRandom[10]; numOccurrences = 5;
data = Table[{RandomInteger[{0, 100}], RandomInteger[{1, 100}], 1}, 100];
Group the data by the value of b (the second element of each row), then keep only those values of b that occur at least numOccurrences
times, then take the values of the resulting association and flatten them back into the desired shape:
Select[Length[#] >= numOccurrences &]@ GroupBy[#[[2]] &]@ data;
Values[%]~Flatten~1
(* Out:
{{83, 1, 1}, {33, 1, 1}, {27, 1, 1}, {12, 1, 1}, {74, 1, 1}}
*)
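To make the intermediate steps easier to follow, here is a minimal sketch on a small hand-made list. The name toy and its contents are made up for illustration, and the threshold 2 is arbitrary; only the GroupBy / Select / Values / Flatten chain comes from the code above.

```wolfram
(* Toy data, invented for illustration; the second element is the grouping key *)
toy = {{1, "x", 1}, {2, "y", 1}, {3, "x", 1}, {4, "z", 1}, {5, "x", 1}, {6, "y", 1}};

(* Step 1: group rows by their second element *)
groups = GroupBy[toy, #[[2]] &]
(* <|"x" -> {{1, "x", 1}, {3, "x", 1}, {5, "x", 1}},
     "y" -> {{2, "y", 1}, {6, "y", 1}},
     "z" -> {{4, "z", 1}}|> *)

(* Step 2: keep only groups with at least 2 rows *)
kept = Select[groups, Length[#] >= 2 &];

(* Step 3: drop the keys and flatten one level to get rows back *)
Flatten[Values[kept], 1]
(* {{1, "x", 1}, {3, "x", 1}, {5, "x", 1}, {2, "y", 1}, {6, "y", 1}} *)
```

Note that the surviving rows come back clustered by key (all "x" rows, then all "y" rows) rather than in their original order.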
To generate data (with a[i] instead of ai, but the principle is unchanged):
SeedRandom[2020];
n = 20;
indexA = RandomInteger[{1, 4}, n];
indexB = RandomInteger[{1, 4}, n];
data = Table[{a[indexA[[i]]], b[indexB[[i]]], 1}, {i, n}]
(* {{a[1], b[4], 1}, {a[1], b[2], 1}, {a[1], b[2], 1}, {a[1], b[4],
1}, {a[4], b[4], 1}, {a[1], b[4], 1}, {a[3], b[2], 1}, {a[4], b[3],
1}, {a[4], b[1], 1}, {a[3], b[3], 1}, {a[1], b[3], 1}, {a[4], b[3],
1}, {a[4], b[1], 1}, {a[4], b[4], 1}, {a[4], b[4], 1}, {a[2], b[2],
1}, {a[1], b[4], 1}, {a[2], b[1], 1}, {a[4], b[3], 1}, {a[2], b[4],
1}} *)
Then, store the variables that appear fewer than min times in the second column, and select the rows whose second value is not in the list of bad indices badB:
min = 5;
badB = Select[Tally[data[[All, 2]]], #[[2]] < min &][[All, 1]]
Select[data, ! MemberQ[badB, #[[2]]] &]
(* {{a[1], b[4], 1}, {a[1], b[4], 1}, {a[4], b[4], 1}, {a[1], b[4],
1}, {a[4], b[4], 1}, {a[4], b[4], 1}, {a[1], b[4], 1}, {a[2], b[4],
1}} *)
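As a rough illustration of how the list of bad indices is built, the sketch below runs the same Tally / Select step on a short hand-made list of keys. The name toyKeys, its contents, and the threshold 3 are assumptions chosen so the result can be checked by eye.

```wolfram
(* Toy keys, invented for illustration *)
toyKeys = {"x", "y", "x", "z", "x", "y"};

(* Tally returns {element, count} pairs in order of first appearance *)
Tally[toyKeys]
(* {{"x", 3}, {"y", 2}, {"z", 1}} *)

(* Keep the pairs whose count is below the threshold, then extract the elements *)
Select[Tally[toyKeys], #[[2]] < 3 &][[All, 1]]
(* {"y", "z"} *)
```

Rows whose second element is "y" or "z" would then be the ones dropped by the final Select.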
ClearAll[pick]
(* keep the rows of #1 whose second element occurs at least #2 times *)
pick = Pick[#,
   Developer`ToPackedArray @ UnitStep[(Counts[#[[All, 2]]] /@ #[[All, 2]]) - #2], 1] &;
Using data from MarcoB's answer:
SeedRandom[10];
data = Table[{RandomInteger[{0, 100}], RandomInteger[{1, 100}], 1}, 100];
pick[data, 5]
(* {{83, 1, 1}, {33, 1, 1}, {27, 1, 1}, {12, 1, 1}, {74, 1, 1}} *)
Note: This approach preserves the ordering of kept rows.
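To see why the order is preserved, it may help to unpack pick into its pieces on a small example. This is only a sketch: toy, keys, counts, perRow and mask are names introduced here for illustration, the threshold 2 is arbitrary, and the packed-array conversion is left out because it should only affect speed, not the result.

```wolfram
(* Toy data, invented for illustration *)
toy = {{1, "x", 1}, {2, "y", 1}, {3, "x", 1}, {4, "z", 1}, {5, "x", 1}, {6, "y", 1}};
keys = toy[[All, 2]];

(* How often each key occurs *)
counts = Counts[keys]
(* <|"x" -> 3, "y" -> 2, "z" -> 1|> *)

(* Look up the count for every row, in row order *)
perRow = counts /@ keys
(* {3, 2, 3, 1, 3, 2} *)

(* 1 where count >= 2, 0 otherwise (UnitStep[0] is 1) *)
mask = UnitStep[perRow - 2]
(* {1, 1, 1, 0, 1, 1} *)

(* Keep the rows whose mask entry is 1 *)
Pick[toy, mask, 1]
(* {{1, "x", 1}, {2, "y", 1}, {3, "x", 1}, {5, "x", 1}, {6, "y", 1}} *)
```

Because the mask is aligned with the rows, Pick simply drops the rows marked 0 and leaves the rest where they were, whereas a grouping-based approach returns the rows clustered by key.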