What characters should I use or not use in usernames on Linux?
Solution 1:
More specifically, the POSIX ("Portable Operating System Interface") standard (IEEE Std 1003.1-2017) states:
3.437 User Name
A string that is used to identify a user; see also User Database. To be portable across systems conforming to POSIX.1-2017, the value is composed of characters from the portable filename character set. The <hyphen-minus> character should not be used as the first character of a portable user name.
3.282 Portable Filename Character Set
The set of characters from which portable filenames are constructed.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . _ -
Any username built only from these characters, and not starting with a hyphen, is portable under POSIX and ought to be safe.
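As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks a candidate name against the portable filename character set and rejects a leading hyphen. The function name is_portable_username is made up for this example, and the standard only says a leading hyphen "should not" be used, so treating it as a hard failure is a choice made here.

```python
import re

# Portable filename character set: A-Z a-z 0-9 . _ -
# The first character may be anything from the set except the hyphen.
PORTABLE_USERNAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_username(name: str) -> bool:
    """Return True if name uses only portable characters and has no leading hyphen."""
    return PORTABLE_USERNAME.fullmatch(name) is not None
```

For example, is_portable_username("jane.doe_01") is accepted, while "-jane" and "jane doe" are rejected. Individual tools may still be stricter than this check.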
Solution 2:
My advice to you is to follow the convention enforced by the default NAME_REGEX (the pattern adduser checks new names against). You can actually put nearly anything in a user name under *NIX, but you may run into odd problems with library code that makes assumptions about what a valid name looks like. Case in point:
http://blog.endpoint.com/2008/08/on-valid-unix-usernames-and-ones-sanity.html
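To make the NAME_REGEX advice concrete, here is a small Python sketch. The pattern below is an assumption: it is the value often cited as the Debian adduser default, but the real setting lives in /etc/adduser.conf and varies between versions, so check your own system before relying on it. The helper name matches_name_regex is also invented for this example.

```python
import re

# ASSUMPTION: a commonly cited default for adduser's NAME_REGEX.
# Your system's /etc/adduser.conf may define a different pattern.
ASSUMED_NAME_REGEX = re.compile(r"^[a-z][-a-z0-9]*$")


def matches_name_regex(name: str) -> bool:
    """Return True if name fits the assumed NAME_REGEX pattern."""
    # fullmatch avoids accepting a name with a trailing newline.
    return ASSUMED_NAME_REGEX.fullmatch(name) is not None
```

Under this assumed pattern "qa-team" passes, while "Quality", "1abc", and "a.b" do not, which is noticeably stricter than the POSIX portable set.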
My question to you: do you have a lot of domain names that would collide with each other if you stripped out the unusual punctuation? For example, do you have both "QUALITY-ASSURANCE" and "QUALITYASSURANCE" as domain names? If not, you could simply adopt a policy of stripping out the unusual characters and using what's left as the user name.
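One way to answer that question before committing to the policy is to run the stripping over your full list of names and look for duplicates. The sketch below is only an illustration: strip_unusual and find_collisions are hypothetical helpers, and lowercasing the result is an extra assumption on top of the stripping described above.

```python
import re
from collections import defaultdict


def strip_unusual(name: str) -> str:
    """Lowercase the name (an assumption) and drop all but ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_collisions(names):
    """Map each stripped name to the originals that share it, keeping only clashes."""
    groups = defaultdict(list)
    for original in names:
        groups[strip_unusual(original)].append(original)
    return {stripped: originals
            for stripped, originals in groups.items()
            if len(originals) > 1}
```

If find_collisions returns an empty dict for your list, the stripping policy is safe to adopt as far as uniqueness goes. Names that strip down to an empty string or start with a digit would still need separate handling.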
Also, you could use the "real name" section of the GECOS field in the /etc/passwd information to store the original, unmodified domain name, and scripts could extract it pretty easily.
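As a sketch of how a script might pull that value back out, the function below parses one /etc/passwd-style line and returns the first comma-separated subfield of the GECOS field, which is conventionally the full name. The function name and the sample account are invented for illustration.

```python
def real_name_from_passwd_line(line: str) -> str:
    """Return the first GECOS subfield (the "real name") from a passwd line."""
    # passwd format: name:password:UID:GID:GECOS:home:shell
    fields = line.rstrip("\n").split(":")
    gecos = fields[4]
    # GECOS subfields are separated by commas; the first is the full name.
    return gecos.split(",")[0]


# Hypothetical entry storing the original domain name in the GECOS field.
sample = "qualityassurance:x:1001:1001:QUALITY-ASSURANCE,,,:/home/qualityassurance:/bin/bash"
print(real_name_from_passwd_line(sample))
```

On a live system, Python's standard pwd module exposes the same field as pwd.getpwnam(name).pw_gecos. Since the colon separates passwd fields and the comma separates GECOS subfields, an original name containing either character could not be stored this way unchanged.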