How to create a table of random characters in XeTeX

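I use a simple approach to generate the list: every drawn code point is remembered by defining a control sequence, and a draw that hits an already used one is repeated. This may use too much memory and could be improved with a better algorithm.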

\documentclass{article}
\usepackage[scale=0.8,centering]{geometry}
\usepackage[nofonts]{ctex}
\setCJKmainfont{SimSun}
\parindent=0pt

\usepackage{pgfcore,pgffor}
\pgfmathsetseed{2} % set seed if you wish
\begin{document}
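\foreach \i in {1,...,1000}{% 1000 characters in total
  \loop % draw until we hit a code point not used yet
    \pgfmathrandominteger\randind{"4E00}{"9FA5}%
  \ifcsname used\randind\endcsname\repeat
  \expandafter\xdef\csname used\randind\endcsname{}% mark it as used (globally)
  \symbol{\randind}%
  \ifnum\numexpr\i/25*25\relax=\i (\i)\par \fi % 25 characters per line
}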
\end{document}
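The rejection loop above keeps one control sequence per drawn character and occasionally has to redraw. A hedged alternative is a partial Fisher–Yates shuffle, sketched below with expl3: it never redraws and keeps its state in an integer array rather than in the hash table, at the cost of allocating one integer per code point up front. This is a minimal sketch, not the method used above; it assumes a recent TeX distribution (TeX Live 2020 or later, so that \int_rand:nn and \sys_gset_rand_seed:n work under XeTeX and \NewDocumentCommand is in the kernel) and an installed font named SimSun. The module prefix rc and the command \RandomCJK are hypothetical names chosen for this sketch.

\documentclass{article}
\usepackage[scale=0.8,centering]{geometry}
\usepackage{fontspec}
\setmainfont{SimSun} % assumption: any installed font covering U+4E00--U+9FA5
\parindent=0pt

\ExplSyntaxOn
% Hypothetical names (module "rc"). Slot i (1..N) stands for code point "4E00+i-1.
% A slot value of 0 means "never swapped", i.e. the slot still holds its own index.
\int_const:Nn \c__rc_offset_int { "4E00 - 1 }
\int_const:Nn \c__rc_size_int { "9FA5 - "4E00 + 1 }
\intarray_new:Nn \g__rc_slots_intarray { \c__rc_size_int }
\int_new:N \l__rc_pick_int
\int_new:N \l__rc_value_int
\sys_gset_rand_seed:n { 2 } % fixed seed for reproducible output; remove if not wanted

% Current content of slot #1 (expandable)
\cs_new:Npn \__rc_slot:n #1
  {
    \int_compare:nNnTF { \intarray_item:Nn \g__rc_slots_intarray {#1} } = 0
      { \int_eval:n {#1} }
      { \intarray_item:Nn \g__rc_slots_intarray {#1} }
  }

% Partial Fisher--Yates shuffle: step k picks a random slot j in [k, N],
% prints its content and moves the content of slot k into slot j,
% so slots 1..k hold the characters already printed and are never drawn again.
\cs_new_protected:Npn \rc_print_random:n #1
  {
    \int_step_inline:nn {#1}
      {
        \int_set:Nn \l__rc_pick_int { \int_rand:nn {##1} { \c__rc_size_int } }
        \int_set:Nn \l__rc_value_int { \__rc_slot:n { \l__rc_pick_int } }
        \intarray_gset:Nnn \g__rc_slots_intarray
          { \l__rc_pick_int } { \__rc_slot:n {##1} }
        \char_generate:nn { \l__rc_value_int + \c__rc_offset_int } { 11 }
        % 25 characters per line, followed by the running count
        \int_compare:nNnT { \int_mod:nn {##1} { 25 } } = 0
          { ~ (##1) \par }
      }
  }
\NewDocumentCommand \RandomCJK { m } { \rc_print_random:n {#1} }
\ExplSyntaxOff

\begin{document}
\RandomCJK{1000}
\end{document}

Because every step draws only among the slots not yet printed, the number of random draws is exactly the number of characters requested, however many are asked for.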



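I tested myself: I recognized about 40% of the characters in the sheet. Scaled to the whole range, that is about 8000 out of 20000 characters, more than I expected; but the range contains 2103 simplified Hanzi together with their unsimplified (traditional) variants, so say I know about 6000.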
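The estimate can be spelled out with the size of the code-point range used in the first example. In this worked version, subtracting the 2103 doubled characters is one reading of the adjustment above, not necessarily the exact calculation intended:

\begin{align*}
  N &= \texttt{"9FA5} - \texttt{"4E00} + 1 = 40869 - 19968 + 1 = 20902,\\
  0.40 \times N &\approx 8361,\\
  8361 - 2103 &= 6258 \approx 6000.
\end{align*}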

Well, according to some linguistic research, the 3800 most common characters cover 99.9% of general text, and the 6600 most common cover 99.999%. Thus, knowing 95% of the characters in Hongloumeng (Dream of the Red Chamber) does not mean knowing very many. The 3500 most common Hanzi in 《现代汉语常用字表》 (List of Frequently Used Characters in Modern Chinese) cover 99.48% of text; I think having your friend read all of them would make a proper test.


\documentclass[a4paper]{article}
\usepackage{geometry}
\geometry{margin=2cm,heightrounded}
\usepackage{fontspec}
\setmainfont{STFangsong}

\input{random}
\newcount\cjkcharcnt
%\randomi=3 % uncomment for a reproducible sequence (\randomi is the seed of random.tex)

\newif\ifshownumbers

\def\cjkchar{\setrannum{\cjkcharcnt}{"4E00}{"9FBB}% random code point
  \ifcsname CJK\the\cjkcharcnt\endcsname
    \message{Recomputing (collision)}\let\next\cjkchar
  \else
    % mark the code point as used; \global because \row is an \hbox group
    \global\expandafter\let\csname CJK\the\cjkcharcnt\endcsname\empty
    \ifnum\XeTeXcharglyph\cjkcharcnt=0 % 0 means the font has no glyph
      \message{Recomputing (missing character)}\let\next\cjkchar
    \else
      \ifshownumbers{\footnotesize(\number\cjkcharcnt) }\fi
      \char\cjkcharcnt\let\next\relax
    \fi
  \fi
  \next}

\newcommand{\row}{\hbox to\hsize{% one line: 20 characters, or 10 when numbers are shown
  \cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil
  \ifshownumbers\else
    \cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil
    \cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil
  \fi
  \cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar\hfil\cjkchar}}

\begin{document}

%\shownumberstrue % uncomment to show (decimal) numbers

\row\row\row\row\row\row\row\row\row\row
\row\row\row\row\row\row\row\row\row\row
\row\row\row\row\row\row\row\row\row\row
\row\row\row\row\row\row\row\row\row\row
\row\row\row\row\row\row\row\row\row\row
\ifshownumbers
  \row\row\row\row\row\row\row\row\row\row
  \row\row\row\row\row\row\row\row\row\row
  \row\row\row\row\row\row\row\row\row\row
  \row\row\row\row\row\row\row\row\row\row
  \row\row\row\row\row\row\row\row\row\row
\fi

\end{document}

Twenty characters per row and fifty rows; if \shownumberstrue is uncommented, ten characters per row and a hundred rows, each character preceded by its (decimal) character number. If by chance the same number is generated twice, it's discarded and recomputed; collisions are rare. The macros also check whether the character exists in the font, and recompute if it doesn't.
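How rare should collisions be? A back-of-the-envelope estimate, assuming uniform draws over the $N = \texttt{"9FBB} - \texttt{"4E00} + 1 = 20924$ code points of this example: when $k$ characters are already marked, a draw succeeds with probability $(N-k)/N$, so the expected number of rejected draws before a fresh one is $k/(N-k)$. Summing over $n$ characters:

\[
  E(n) = \sum_{k=0}^{n-1} \frac{k}{N-k} \approx \frac{n(n-1)}{2N},
  \qquad
  E(1000) \approx \frac{1000 \cdot 999}{2 \cdot 20924} \approx 24 .
\]

This assumes the markers persist for the whole run, i.e. they are set with \global. A marker set with a plain \let inside a row's \hbox is forgotten when the box ends, so only collisions within a row are detected; for 50 rows of 20 characters that is about $50 \cdot \frac{20 \cdot 19}{2N} \approx 0.45$ per run. Draws rejected for a missing glyph are ignored in both figures.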

Timing the run with time (without \shownumberstrue) gives:

real        0m2.740s
user        0m1.660s
sys         0m0.384s
