Warning message during building an R package: invalid uid value replaced by that for user 'nobody'

I get this all the time. tl;dr: you can ignore it. It has to do with the fact that tar files can't portably store user IDs greater than 32767, but user IDs on some systems are larger than this.

Searching the code on Winston Chang's Github mirror of the R source tree finds this code in src/library/utils/R/tar.R:

    uid <- info$uid
    ## uids are supposed to be less than 'nobody' (32767)
    ## but it seems there are broken ones around: PR#15436
    if(!is.null(uid) && !is.na(uid)) {
        if(uid < 0L || uid > 32767L) {invalid_uid <- TRUE; uid <- 32767L}
        header[109:115] <- charToRaw(sprintf("%07o", uid))
    }

(here's a link to the referenced bug report and other discussion on the devtools issue list).
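
As a rough illustration of the sprintf("%07o", uid) line in that code: header[109:115] is seven bytes, so the UID is written as seven zero-padded octal digits. The sketch below reproduces that formatting with the shell's printf; uid_octal is a hypothetical helper name, not something provided by R or tar. Seven octal digits could in principle hold values up to 2097151, so the 32767 cut-off looks like the portability convention mentioned in the code comment rather than a limit of the field width itself.

```shell
# Hypothetical helper: format a UID as seven zero-padded octal digits,
# mirroring sprintf("%07o", uid) in the quoted R code.
uid_octal() {
    printf '%07o\n' "$1"
}

uid_octal 32767      # the 'nobody' cap used by R: prints 0077777
uid_octal 56347      # a UID above the cap:        prints 0156033

# Largest value that seven octal digits can represent (octal literal 07777777).
echo $((07777777))   # prints 2097151
```

So a UID such as 56347 would physically fit in the field; R replaces it anyway because tools on other systems may not unpack it portably.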

Looking at /etc/passwd on my system shows that my user ID is 56347:

bolker:x:56347:1001:Ben Bolker,,,,:/home/bolker:/bin/bash

Wikipedia says

POSIX requires the UID to be an integer type. Most Unix-like operating systems represent the UID as an unsigned integer. The size of UID values varies amongst different systems; some UNIX OS's used 15-bit values, allowing values up to 32767, while others such as Linux supported 16-bit UIDs, making 65536 unique IDs possible. The majority of modern Unix-like systems have switched to 32-bit UIDs, allowing 4,294,967,296 (2^32) unique IDs.

and

For compatibility between 16-bit and 32-bit UIDs, many Linux distributions now set it to be 2^16−2 = 65,534; the Linux kernel defaults to returning this value when a 32-bit UID does not fit into the return value of the 16-bit system calls.[11]

Brian Ripley says

A tarball can only store uids up to 'nobody' (usually 32767), and certainly larger ones cannot be unpacked portably. The warnings did not occur before, but the tarball produced could cause problems when unpacking with other tools.

I can't find much more documentation on this (except that Wikipedia says there are Unix systems with 15-bit uids out there). The GNU tar page appears to give uid as a length-8 character field ...


To further clarify Ben Bolker's spot-on explanation and to provide a workaround: this only happens for users whose UID or GID is greater than 32767, e.g.

$ id --user
60839   # <= causes the warning

$ id --group
900     # <= OK

The warning occurs when utils::tar() uses R's internal, built-in tar implementation, which is the default. It is that internal function that produces the warning. The most common trigger is building an R package, e.g. R CMD build pkgname.
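
To check whether the current account would trigger the warning before running a build, the 0 to 32767 range test from R's internal tar can be mimicked in the shell. This is only a sketch: id_fits_internal_tar is a hypothetical helper name, and the range is taken from the R source quoted in the other answer.

```shell
# Hypothetical helper (not part of R or tar): succeeds when an ID lies in
# the 0..32767 range that R's internal tar writes without a warning.
id_fits_internal_tar() {
    [ "$1" -ge 0 ] && [ "$1" -le 32767 ]
}

# Report on the current user's UID and primary GID.
for kind in user group; do
    if [ "$kind" = user ]; then value=$(id -u); else value=$(id -g); fi
    if id_fits_internal_tar "$value"; then
        echo "$kind ID $value: OK"
    else
        echo "$kind ID $value: would trigger the warning"
    fi
done
```

With the example IDs above, the user ID 60839 would be reported as triggering the warning and the group ID 900 as OK.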

It's not easy to find, but help("build", package="utils") documents that the internal tar function can be overridden with the environment variable R_BUILD_TAR. So, for users on Linux and macOS, set:

R_BUILD_TAR=tar

in your ~/.Renviron. This will cause R CMD build to use the tar on your PATH, and most likely that tar tool will accept UIDs and GIDs larger than 32767. This is, for instance, true for the tar tool that comes with CentOS 7:

$ tar --version | head -1
tar (GNU tar) 1.26
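
If you would rather not edit ~/.Renviron, the variable can presumably also be set for a single build from the shell, since R_BUILD_TAR is an ordinary environment variable. The sketch below assumes that; build_with_system_tar is a hypothetical wrapper name, and pkgname is the placeholder package directory used earlier.

```shell
# Show which tar is on PATH; R_BUILD_TAR=tar would resolve to this one.
command -v tar || echo "no tar found on PATH"

# Hypothetical wrapper: run one build with the system tar instead of
# R's internal tar, leaving ~/.Renviron untouched.
build_with_system_tar() {
    R_BUILD_TAR=tar R CMD build "$1"
}

# Usage (requires R to be installed):
#   build_with_system_tar pkgname
```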

PS. help("build", package="utils") also mentions R_INSTALL_TAR. That seems to come into play when using R CMD check (sic!), so I'm not sure it's wise to set it. For instance, you might not get warnings about too-long filenames in tarballs when you check your package locally, but it will then fail when you submit to CRAN.

Tags:

R