Removing rows whose elements appear fewer than a certain number of times

Make some fake data:

SeedRandom[10]; numOccurrences = 5;
data = Table[{RandomInteger[{0, 100}], RandomInteger[{1, 100}], 1}, 100];

Group the rows by the value in the second column (b), keep only the groups that contain at least numOccurrences rows, then take the values of the resulting association and flatten them back into a list of rows:

Select[Length[#] >= numOccurrences &]@ GroupBy[#[[2]] &]@ data;
Values[%]~Flatten~1

(* Out: 
 {{83, 1, 1}, {33, 1, 1}, {27, 1, 1}, {12, 1, 1}, {74, 1, 1}}
*)

To generate data (with a[i] instead of ai but the principle is unchanged):

SeedRandom[2020];
n = 20;
indexA = RandomInteger[{1, 4}, n];
indexB = RandomInteger[{1, 4}, n];
data = Table[{a[indexA[[i]]], b[indexB[[i]]], 1}, {i, n}]

(* {{a[1], b[4], 1}, {a[1], b[2], 1}, {a[1], b[2], 1}, {a[1], b[4], 
  1}, {a[4], b[4], 1}, {a[1], b[4], 1}, {a[3], b[2], 1}, {a[4], b[3], 
  1}, {a[4], b[1], 1}, {a[3], b[3], 1}, {a[1], b[3], 1}, {a[4], b[3], 
  1}, {a[4], b[1], 1}, {a[4], b[4], 1}, {a[4], b[4], 1}, {a[2], b[2], 
  1}, {a[1], b[4], 1}, {a[2], b[1], 1}, {a[4], b[3], 1}, {a[2], b[4], 
  1}} *)

Then store the values that appear fewer than min times in the second column in badB, and select the rows whose second element is not in that list:

min = 5;
badB = Select[Tally[data[[All, 2]]], #[[2]] < min &][[All, 1]]
Select[data, ! MemberQ[badB, #[[2]]] &]

(* {{a[1], b[4], 1}, {a[1], b[4], 1}, {a[4], b[4], 1}, {a[1], b[4], 
  1}, {a[4], b[3], 1}, {a[3], b[3], 1}, {a[1], b[3], 1}, {a[4], b[3], 
  1}, {a[4], b[4], 1}, {a[4], b[4], 1}, {a[1], b[4], 1}, {a[4], b[3], 
  1}, {a[2], b[4], 1}} *)

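Another option is to count how often each second-column value occurs, turn those counts into a 0/1 mask with UnitStep, and Pick the rows whose mask entry is 1. The first argument is the data, the second is the minimum number of occurrences:

ClearAll[pick]
pick = Pick[#, 
    Developer`ToPackedArray @ UnitStep[(Counts[#[[All, 2]]] /@ #[[All, 2]]) - #2], 1] &;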

Using data from MarcoB's answer:

SeedRandom[10];
data = Table[{RandomInteger[{0, 100}], RandomInteger[{1, 100}], 1}, 100];
pick[data, 5]

 {{83, 1, 1}, {33, 1, 1}, {27, 1, 1}, {12, 1, 1}, {74, 1, 1}}
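
To see what pick does internally, it can help to evaluate the intermediate steps on a small hand-made list. The following is only a sketch: toy, keys, perRow, and mask are illustrative names that do not appear in the answer above, and the threshold 2 is chosen arbitrarily.

```mathematica
(* Hypothetical toy data; the second column is the one being counted *)
toy = {{1, "x", 1}, {2, "y", 1}, {3, "x", 1}, {4, "z", 1}, {5, "x", 1}};
keys = toy[[All, 2]];

(* Look up, for every row, how often its key occurs in the column *)
perRow = Counts[keys] /@ keys    (* {3, 1, 3, 1, 3} *)

(* UnitStep gives 1 where the count is at least 2, and 0 otherwise *)
mask = UnitStep[perRow - 2]      (* {1, 0, 1, 0, 1} *)

(* Keep the rows whose mask entry is 1 *)
Pick[toy, mask, 1]               (* {{1, "x", 1}, {3, "x", 1}, {5, "x", 1}} *)
```

The ToPackedArray call in pick is omitted here because it affects only speed, not the result.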

Note: This approach preserves the ordering of kept rows.
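
The difference in ordering shows up as soon as the rows for different keys are interleaved. A minimal sketch on made-up data (rows, grouped, and ordered are illustrative names, and the threshold of 2 is arbitrary):

```mathematica
(* Hypothetical data in which the keys "x" and "y" alternate *)
rows = {{1, "x", 1}, {2, "y", 1}, {3, "x", 1}, {4, "y", 1}, {5, "z", 1}};

(* Grouping first returns the kept rows clustered by key *)
grouped = Flatten[Values[Select[GroupBy[rows, #[[2]] &], Length[#] >= 2 &]], 1]
(* {{1, "x", 1}, {3, "x", 1}, {2, "y", 1}, {4, "y", 1}} *)

(* Filtering row by row against the counts keeps the original order *)
counts = Counts[rows[[All, 2]]];
ordered = Select[rows, counts[#[[2]]] >= 2 &]
(* {{1, "x", 1}, {2, "y", 1}, {3, "x", 1}, {4, "y", 1}} *)
```

Both results contain the same rows, so the choice only matters when the position of a row carries meaning.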
