Fast membership test for integer lists/sets

The classical approach here is to use a (tiny) font as an array, exploiting its \fontdimen parameters. For a single array we can do

\font\myintarray = cmr10 at 1sp %
\count255 = 0 %
\loop
  \advance\count255 by 1 %
  \fontdimen\count255 \myintarray = 0sp %
  \ifnum\count255 < 11 %
\repeat
\protected\def\setarray#1#2{%
  \fontdimen#1 \myintarray = #2sp %
}
\def\getarray#1{%
  \number\fontdimen#1 \myintarray
}
\setarray{5}{27}
\count255 = 0 %
\loop
  \advance\count255 by 1 %
  \getarray{\count255 } %
  \ifnum\count255 < 11 %
\repeat
\bye

With more arrays we need a little management (each one has to be a separate font). These structures are global but have constant access time (so mapping over all entries takes linear time).
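
The management mentioned above could look roughly like the following plain TeX sketch. The names \newintarray and \arraycount are made up for illustration, and the sketch assumes that cmr10 has not already been loaded at these tiny sizes elsewhere (TeX reuses a font when file name and size both match, and only the most recently loaded font can grow its \fontdimen list).

```latex
% Plain TeX sketch; \newintarray and \arraycount are hypothetical names.
\newcount\arraycount
\def\newintarray#1#2{% #1 = font identifier, #2 = number of entries
  \global\advance\arraycount by 1 %
  % Give every array its own size so that TeX loads a fresh font.
  \global\font#1 = cmr10 at \the\arraycount sp %
  \count255 = 0 %
  \loop
    \advance\count255 by 1 %
    \fontdimen\count255 #1 = 0sp %
    \ifnum\count255 < #2 %
  \repeat
}
\newintarray\arrayA{10}
\newintarray\arrayB{20}
\fontdimen 5 \arrayA = 27sp %
\fontdimen 5 \arrayB = 42sp %
% The two arrays are independent: this typesets "27 and 42".
\number\fontdimen 5 \arrayA \space and \number\fontdimen 5 \arrayB
\bye
```

Each array has to be filled immediately after it is created, before the next font is loaded, which is why the initialisation loop sits inside \newintarray.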


In expl3 this approach is abstracted as the intarray data type:

\intarray_new:Nn \g_my_intarray { 100 }
\intarray_gset:Nnn \g_my_intarray { 5 } { 27 }
\intarray_item:Nn \g_my_intarray { 5 }

In terms of limitations, the key one is that the maximum value is one power of two lower than the usual TeX limit (2^{30} - 1 rather than 2^{31} - 1), because each entry is stored as a dimension. As far as I know, there is no predetermined limit on the number of fonts that can be loaded. However, the total number of fontdimens (that is, the number of items across all arrays) is limited: with standard settings, 4 million entries are allowed.
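
To connect this to the membership test in the title, here is a minimal sketch that uses an intarray as a table of flags. The names \g_my_member_intarray, \my_mark_members:n and \IfMember are invented for this example, and it assumes a LaTeX release from 2020-10 or later (so that expl3 and \NewExpandableDocumentCommand are available in the kernel) and that all values lie between 1 and the array size.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical names: \g_my_member_intarray, \my_mark_members:n, \IfMember.
\intarray_new:Nn \g_my_member_intarray { 100 } % all entries start at zero
\cs_new_protected:Npn \my_mark_members:n #1
  {
    % Set the flag at position n for every n in the comma list.
    \clist_map_inline:nn {#1}
      { \intarray_gset:Nnn \g_my_member_intarray {##1} { 1 } }
  }
\NewExpandableDocumentCommand \IfMember { m m m }
  {
    % One array lookup per query, independent of the list length.
    \int_compare:nNnTF
      { \intarray_item:Nn \g_my_member_intarray {#1} } = { 1 }
      {#2} {#3}
  }
\my_mark_members:n { 1 , 2 , 3 , 4 , 6 , 9 , 10 }
\ExplSyntaxOff
\begin{document}
6 is \IfMember{6}{in}{not in} the list.

7 is \IfMember{7}{in}{not in} the list.
\end{document}
```

Because \intarray_item:Nn works by expansion, the test itself is expandable; only the set-up step performs assignments.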


UPDATE: Maybe it is just me, but I cannot follow the argument that there is an O(n^2) cost. Of course, this may just be a misunderstanding, since in your question n is used for several different objects. Let us call the number of elements of the big integer list M, and its largest element n_max. Then I claim that you need "only" M+n_max steps: about n_max steps to initialize one flag per candidate integer, and M steps to mark the entries of the list. There is no quadratic dependence on the number of entries, the length of the list, or anything else. The following code addresses your updated problem: you have a possibly large list and want a membership test. The list is set up with \ProcessList{<list>}{<largest entry>}; afterwards \IsInList{<integer>}{<macro>} stores 1 or 0 in <macro>. The detailed implementation can certainly be improved (I am sure you can add more \expandafters and \ignorespaces and so on), but the point is that there is no quadratic dependence whatsoever.

\documentclass{article}
\newcounter{iloop}
\makeatletter
\newcommand{\ProcessList}[2]{\setcounter{iloop}{0}%
\loop%
\stepcounter{iloop}%
\edef\temp{\noexpand\xdef\csname member\roman{iloop}\endcsname{0}}%
\temp%
\ifnum\number\value{iloop}<\the\numexpr#2+1\repeat%
\@for\next:=#1\do{\edef\mynum{\romannumeral\next}%
\expandafter\xdef\csname member\mynum\endcsname{1}}}
\newcommand{\IsInList}[2]{%
\edef\temp{\noexpand\xdef\noexpand#2{\csname member\romannumeral#1\endcsname}}%
\temp}
\makeatother
\begin{document}
% we assume that the list is known as well as its largest element
% they will become the arguments of \ProcessList
% (the largest element can also be found out automatically)
\ProcessList{1,2,3,4,6,9,10,14,19,21,22,25,30,33,%
35,38,39,40,42,44,49,50,59,60,62,63,64,%
66,67,70,71,80,82,83,85,88,89,94,95,96,%
97,99,103,106,107,109,112,116,117,119,121,%
123,126,128,132,133,134,138,139,140,141,%
143,147,148,150,153,155,157,163,165,168,%
170,176,177,178,180,184,186,190,197,202,%
207,208,209,219,220,224,234,235,238,239,%
242,244,247,249,251,259,262,265,267,268,%
270,275,280,283,285,287,288,289,292,300,%
301,303,307,311,313,314,315,318,319,323,%
324,325,326,327,331,337,346,352,354,356,%
361,362,363,366,367,368,369,372,375,377,%
378,382,383,384,388,391,393,394,395,398,%
399,400,402,404,405,407,408,409,412,417,%
421,423,426,434,439,440,443,445,446,448,%
456,461,466,467,468,470,472,477,478,479,%
481,482,483,485,489,493,494,496,500,502,%
505,509,512,514,518,522,527,528,530,531,%
533,535,536,541,545,548,551,553,554,556,%
557,560,562,564,565,566,570,571,572,575,%
577,587,593,600,601,604,605,607,610,611,%
613,614,619,621,622,623,625,632,633,634,%
635,636,637,639,645,648,651,656,661,665,%
666,669,674,677,678,679,680,682,683,684,%
685,687,689,690,693,698,700,703,704,708,%
710,713,714,718,719,729,730,733,737,738,%
741,744,745,746,753,760,761,762,765,770,%
772,775,780,782,783,784,789,790,792,801,%
803,804,806,809,810,814,815,818,822,823,%
824,827,829,833,836,837,838,840,841,843,%
844,847,849,853,854,855,859,864,870,871,%
873,874,876,881,882,885,887,889,890,891,%
892,893,895,900,901,903,908,910,911,913,%
915,917,919,920,922,925,927,928,931,932,%
933,934,935,936,938,942,943,945,951,956,%
959,963,964,966,971,972,974,978,989,993,%
995,997,998}{998}

test if 6 is in the list:\IsInList{6}{\mytest} \mytest

test if 7 is in the list:\IsInList{7}{\mytest} \mytest
\end{document}
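
The comment in the example notes that the largest element can also be found automatically. One way to do this, as a sketch only: a single extra pass over the list, which keeps the overall cost linear. \FindMax is a hypothetical helper name, and the sketch assumes the list contains only nonnegative integers.

```latex
\documentclass{article}
\makeatletter
% \FindMax is a hypothetical helper; it assumes nonnegative integers.
\newcommand{\FindMax}[2]{% #1 = comma list, #2 = macro receiving the maximum
  \def#2{0}%
  \@for\next:=#1\do{%
    \ifnum\next>#2\relax
      \edef#2{\number\next}%
    \fi}}
\makeatother
\begin{document}
\FindMax{3,17,5}{\mymax}
The largest entry is \mymax.
\end{document}
```

The resulting macro can then be passed as the second argument, as in \ProcessList{<list>}{\mymax}, since \numexpr expands it to the stored number.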


OLD ANSWER: To select the entries that are no larger than a given bound, you need only one loop over all M elements of the list. This gives you a list of K elements, say. At this stage, the cost is M. If you want to find out whether a given integer is in the big list, you also need only M steps.

In any case, these are some basic routines that do something along these lines. I strongly believe similar routines must exist somewhere but I couldn't find them.

\documentclass{article}
\newcounter{iloop}
\newif\ifmember
\newif\iflstart
\makeatletter% for \@for see e.g. https://tex.stackexchange.com/a/100684/121799
\newcommand{\MemberQ}[2]{\global\memberfalse%
\@for\next:=#1\do{\ifnum\next=#2\global\membertrue\fi}}
\newcommand{\Preselect}[3]{\edef\itest{\the\numexpr#2+1}%
\lstarttrue%
\@for\next:=#1\do{\ifnum\next<\itest%
\iflstart%
\xdef#3{\next}%
\global\lstartfalse%
\else%
\xdef#3{#3,\next}%
\fi%
\fi}}
\newcommand{\Hits}[3]{\edef#3{-1}%
\lstarttrue%
\setcounter{iloop}{-1}\loop%
\stepcounter{iloop}%
\MemberQ{{#1}}{\number\value{iloop}}%
\ifmember%
\iflstart%
\xdef#3{\number\value{iloop}}%
\global\lstartfalse%
\else%
\xdef#3{#3,\number\value{iloop}}%
\fi\fi%
\ifnum\number\value{iloop}<#2\repeat}
\makeatother
\begin{document}
\subsection*{Tests of MemberQ}
\MemberQ{1,2,3,4}{2}
\ifmember 2 is in list \fi

\MemberQ{1,2,3,4}{5}
\ifmember 5 is in list \fi

\subsection*{Select all members of list which are smaller than or equal to a certain number}
% random list generated by Mathematica
\edef\LstLong{638, 761, 899, 899, 315, 827, 954, 696, 102, 577, 
525, 279, 108, 983, 845, 530, 658, 896, 818, 342, 
515, 946, 62, 632, 495, 784, 218, 583, 624, 761, 
230, 176, 38, 801, 514, 643, 720, 991, 930, 219, 
115, 585, 527, 115, 837, 50, 955, 566, 579, 600, 
184, 987, 212, 941, 966, 63, 192, 973, 801, 322, 
571, 946, 786, 433, 586, 997, 903, 820, 672, 618, 
355, 338, 183, 384, 479, 341, 507, 849, 431, 292, 
470, 927, 93, 460, 518, 865, 257, 712, 351, 732, 
817, 839, 217, 951, 194, 222, 604, 292, 208, 220, 
197, 476, 973, 232, 250, 527, 972, 496, 751, 824, 
334, 342, 751, 484, 883, 526, 644, 424, 368, 410, 
530, 243, 600, 216, 661, 273, 412, 685, 724, 12, 
556, 587, 380, 43, 792, 827, 687, 568, 275, 608, 
893, 863, 825, 741, 831, 406, 855, 83, 279, 290, 
341, 7, 381, 256, 437, 292, 945, 474, 326, 970, 820, 
44, 539, 903, 640, 592, 285, 512, 594, 788, 677, 
197, 787, 927, 400, 239, 220, 342, 14, 902, 677, 
858, 481, 824, 925, 639, 677, 903, 287, 223, 271, 
997, 774, 602, 293, 766, 10, 416, 638, 311, 186, 
729, 613, 31, 930, 219, 357, 887, 88, 579, 985, 446, 
334, 910, 447, 321, 183, 862, 297, 641, 139, 980, 
199, 687, 374, 322, 22, 319, 991, 672, 788, 262, 
828, 389, 684, 178, 958, 492, 597, 803, 259, 386, 
800, 86, 936, 712, 494, 447, 254, 932, 78, 789, 121, 
897, 120, 819, 935, 307, 246, 96, 16, 639, 549, 85, 
867, 509, 960, 690, 301, 348, 440, 792, 117, 157, 
567, 184, 912, 244, 686, 843, 112, 927, 328, 801, 
178, 720, 385, 380, 399, 377, 287, 76, 574, 291, 
731, 430, 670, 466, 758, 104, 825, 23, 502, 821, 
979, 753, 28, 970, 855, 958, 20, 999, 184, 598, 668, 
877, 736, 174, 850, 715, 131, 289, 786, 55, 36, 785, 
129, 851, 411, 677, 493, 913, 405, 630, 695, 582, 
555, 806, 65, 775, 448, 774, 905, 925, 353, 356, 
106, 884, 178, 176, 182, 114, 258, 112, 924, 923, 
853, 959, 300, 652, 729, 141, 14, 493, 94, 281, 668, 
173, 834, 855, 839, 665, 361, 168, 808, 34, 179, 
736, 139, 396, 963, 946, 760, 458, 390, 70, 698, 
846, 979, 597, 410, 194, 888, 97, 852, 770, 572, 
623, 453, 323, 941, 876, 99, 5, 129, 868, 552, 146, 
231, 949, 268, 755, 608, 705, 504, 635, 392, 970, 
654, 785, 295, 761, 684, 146, 482, 162, 541, 818, 
622, 828, 724, 232, 568, 807, 569, 580, 864, 709, 
217, 594, 687, 167, 248, 447, 27, 339, 341, 921, 
508, 923, 962, 430, 240, 62, 688, 212, 176, 478, 
664, 871, 219, 398, 889, 577, 312, 827, 365, 33, 
677, 751, 506, 658, 848, 717, 321, 400, 180, 561, 
926, 515, 932, 839, 828, 997, 355, 42, 334, 854, 
884, 599, 93, 393, 399, 246, 825, 553, 456, 181, 
564, 64}

% selects all elements that are smaller than or equal to 97
\Preselect{\LstLong}{97}{\mylist}
\mylist

\MemberQ{\mylist}{5}
5 is \ifmember\else not\space\fi in the list

\MemberQ{\mylist}{6}
6 is \ifmember\else not\space\fi in the list


% selects all elements that are smaller than or equal to 50 and sorts them,
% but is this the output you want?
\Hits{\mylist}{50}{\hitlist}
\hitlist
\end{document}

Just for completeness: a membership test that is not restricted to integers. (I am sure that there are many features, such as expandability, which this does not have, but it does not require packages and seems to be reasonably fast. If I knew what "expandable" means precisely, I might be able to appreciate this feature more. ;-)

\documentclass{article}
\newif\ifmember
\makeatletter% for \@for see e.g. https://tex.stackexchange.com/a/100684/121799
\newcommand{\MemberQ}[2]{\global\memberfalse%
    \edef\temp{#2}%
    \@for\next:=#1\do{\ifx\next\temp\relax\global\membertrue\fi}}
\makeatother
\begin{document}

\MemberQ{a,4,7,11}{11} \ifmember in\else out \fi

\MemberQ{a,4,7,11}{3} \ifmember in\else out \fi

\MemberQ{a,4,7,11}{A} \ifmember in\else out \fi

\MemberQ{a,4,7,11}{a} \ifmember in\else out \fi

\end{document}