What is causing sprof to complain about "inconsistency detected by ld.so"?
I got this error with a PyTorch DataLoader when using multiple workers. Python's multiprocessing launches several worker processes, and one of them hit this error while reading a file in read-only mode (for the CIFAR10 dataset). Simply re-running the script solved the issue, so I believe this is some sort of rare, sporadic OS-level error. With PyTorch, setting num_workers=0 may also help avoid the error.
Below is the full error in case anyone is interested:
Inconsistency detected by ld.so dl-open.c 272 dl_open_worker Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
Traceback (most recent call last):
File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/queue.py", line 173, in get
self.not_empty.wait(remaining)
File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/threading.py", line 299, in wait
gotit = waiter.acquire(True, timeout)
File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError DataLoader worker (pid 272) exited unexpectedly with exit code 127. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
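The num_workers=0 fallback suggested above (and by the error message itself) could be sketched roughly like this. It is only an illustration, not anything PyTorch provides: `run_with_worker_fallback`, `make_loader` and `consume` are hypothetical names, and the sketch assumes a dead worker surfaces as a RuntimeError in the parent process, as it does in the trace.

```python
def run_with_worker_fallback(make_loader, consume, num_workers=4):
    """Run one pass over the data, retrying single-process if a worker dies.

    make_loader(num_workers) -> iterable of batches (hypothetical factory)
    consume(batch)           -> per-batch callback  (hypothetical)
    Returns the num_workers value that completed the pass.
    """
    try:
        for batch in make_loader(num_workers):
            consume(batch)
        return num_workers
    except RuntimeError as exc:
        if num_workers == 0:
            raise  # already single-process, nothing left to fall back to
        print(f"worker failure ({exc}); retrying with num_workers=0")

    # The retry starts from the beginning, so consume() may see batches
    # from the failed pass a second time.
    for batch in make_loader(0):
        consume(batch)
    return 0
```

With PyTorch, `make_loader` would typically be something like `lambda n: DataLoader(dataset, batch_size=64, num_workers=n)`, where `dataset` and the batch size are placeholders. Running with num_workers=0 also keeps everything in one process, so any error shows up with a full traceback.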
I got a bit curious since this is still broken in openSUSE 12.x. I would have thought a bug originally reported in '09 or so would have been fixed by now. I guess nobody really uses sprof (or maybe dl-open is so fragile that people are scared to touch it :-).
The issue boils down to the __RTLD_SPROF flag being passed to dlopen. Take any simple program that calls dlopen, OR that flag into the second argument, and you get the same failed assertion. I used the sample program at the bottom of http://linux.die.net/man/3/dlopen as an example, changing the dlopen call to:
handle = dlopen(argv[1], RTLD_LAZY | __RTLD_SPROF);
From what I can tell from a quick look at dl-open.c, this flag short-circuits some of what dl_open does, so the r_state field checked in the assertion never gets set to RT_CONSISTENT.
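If you want to check whether a given system is affected without writing C, the same dlopen call can be made from Python through ctypes. This is a rough sketch under a few assumptions: the value 0x40000000 for __RTLD_SPROF is taken from glibc's internal headers (it is not part of the public dlfcn.h API, so verify it against your glibc), and `dlopen_flags` and `probe_dlopen` are hypothetical helper names. The call runs in a child process so that a failed assertion in ld.so does not take down the calling interpreter.

```python
import os
import subprocess
import sys

# Assumption: value of glibc's internal __RTLD_SPROF flag; not a public API.
RTLD_SPROF = 0x40000000

# The child calls dlopen(path, flags) via ctypes; an empty path means
# dlopen(NULL), i.e. the main program itself.
_CHILD = "import ctypes, sys; ctypes.CDLL(sys.argv[1] or None, mode=int(sys.argv[2]))"


def dlopen_flags(profile):
    """Return RTLD_LAZY, optionally OR-ed with the profiling flag."""
    flags = os.RTLD_LAZY
    if profile:
        flags |= RTLD_SPROF
    return flags


def probe_dlopen(path, profile):
    """Run dlopen(path, flags) in a child process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, path or "", str(dlopen_flags(profile))],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode
```

Comparing `probe_dlopen(lib, profile=False)` with `probe_dlopen(lib, profile=True)` for some shared library `lib` should show whether the flag alone changes the outcome. On an affected system I would expect the second call to return a non-zero exit code, as in the trace in the other answer, but I have not verified that across glibc versions.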