Linux, sockets, non-blocking connect
Use the following steps for an asynchronous connect:
- create the socket with socket(..., SOCK_STREAM | SOCK_NONBLOCK, ...)
- start the connection with connect(fd, ...)
- if connect() returns 0, the socket is already connected; if it returns -1 and errno is anything other than EINPROGRESS, abort with an error
- wait until fd is signalled as ready for output
- check the status of the socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
- done
No loops are needed, unless you want to handle EINTR.
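The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the names connect_nonblocking and fail_with, the IPv4-only address handling, the use of poll() for the wait and the timeout parameter are all my own assumptions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd and report failure without losing the original error code. */
static int fail_with(int fd, int err)
{
    close(fd);
    errno = err;
    return -1;
}

/*
 * Hypothetical helper (not from the answer above): connect to an IPv4
 * address without blocking inside connect(). Returns a connected,
 * non-blocking fd, or -1 with errno set.
 */
int connect_nonblocking(const char *ipv4, uint16_t port, int timeout_ms)
{
    assert(ipv4 != NULL);

    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(port) };
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    /* Step 1: SOCK_NONBLOCK is OR-ed into the type argument. */
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;

    /* Step 2: start the connection. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;                        /* connected immediately */

    /* Step 3: anything other than EINPROGRESS is a hard error. */
    if (errno != EINPROGRESS)
        return fail_with(fd, errno);

    /* Step 4: wait for writability; the loop exists only for EINTR. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return fail_with(fd, errno);
    if (rc == 0)
        return fail_with(fd, ETIMEDOUT);  /* assumption: report a timeout */

    /* Step 5: fetch the result of the connection attempt. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return fail_with(fd, errno);
    if (err != 0)
        return fail_with(fd, err);        /* e.g. ECONNREFUSED */

    return fd;                            /* Step 6: done */
}
```

Note that restarting poll() after EINTR also restarts the full timeout here; a stricter version would track the remaining time.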
If the client is started before the server, you should see the error ECONNREFUSED in the last step. If this happens, close the socket and start from the beginning.
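The "close the socket and start from the beginning" advice could look something like the sketch below. Everything here is hypothetical naming on my part: connect_attempt_fn stands for one complete attempt (create the socket, connect, wait, check SO_ERROR, close on failure), and the attempt limit and delay are assumptions, since the answer does not say how often or how quickly to retry. fake_attempt is only a stand-in so the wrapper can be tried without a network.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical callback: performs one complete connect attempt and
 * returns a connected fd, or -1 with errno set (socket already closed). */
typedef int (*connect_attempt_fn)(void *ctx);

/* Retry only while the peer refuses the connection; any other error is
 * returned to the caller unchanged. */
int connect_with_retry(connect_attempt_fn attempt, void *ctx,
                       int max_attempts, int delay_ms)
{
    assert(attempt != NULL);

    for (int i = 0; i < max_attempts; i++) {
        int fd = attempt(ctx);
        if (fd >= 0)
            return fd;
        if (errno != ECONNREFUSED)
            return -1;                 /* different failure: give up */
        if (i + 1 < max_attempts)
            poll(NULL, 0, delay_ms);   /* plain sleep between attempts */
    }
    errno = ECONNREFUSED;              /* poll() may have changed errno */
    return -1;
}

/* Stand-in attempt: refuses until *remaining reaches zero, then returns
 * a dummy descriptor number (100) instead of a real socket. */
int fake_attempt(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        errno = ECONNREFUSED;
        return -1;
    }
    return 100;
}
```

In real code the callback would wrap whatever routine performs the steps listed earlier, with ctx carrying the address to connect to.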
It is difficult to tell what's wrong with your code without seeing more details. I suspect that you do not abort on errors in your check_socket operation.
There are a few ways to test if a nonblocking connect succeeds.
- Call getpeername() first. If it fails with ENOTCONN, the connection failed; then call getsockopt() with SO_ERROR to fetch the pending error on the socket.
- Call read() with a length of 0. If the read fails, the connection failed, and the errno from read() indicates why; read() returns 0 if the connection succeeded.
- Call connect() again. If it fails with errno EISCONN, the socket is already connected and the first connect() succeeded.
Ref: UNIX Network Programming V1
D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded. Many of these methods have drawbacks on certain systems, so writing portable code for this is unexpectedly hard. If anyone wants to read about all the possible methods and their drawbacks, check out this document.
For those who just want the tl;dr version, the most portable way is the following:
Once the system signals the socket as writable, first call getpeername() to see whether it connected. If that call succeeds, the socket is connected and you can start using it. If that call fails with ENOTCONN, the connection failed. To find out why it failed, try to read one byte from the socket with read(fd, &ch, 1), which will fail as well, but the error you get is the error you would have gotten from connect() if it weren't non-blocking.
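As a minimal sketch of that tl;dr (the function name connect_outcome and the convention of returning an errno value are my own choices, not something from the linked document):

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Call once the socket has been reported writable.
 * Returns 0 if the connection succeeded, otherwise the errno value
 * describing why it failed. */
int connect_outcome(int fd)
{
    assert(fd >= 0);

    struct sockaddr_storage peer;
    socklen_t len = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        return 0;                  /* connected, ready to use */
    if (errno != ENOTCONN)
        return errno;              /* some unrelated problem with fd */

    /* The connection failed: a one-byte read is expected to fail too
     * and to report the error that connect() would have returned. */
    char ch;
    if (read(fd, &ch, 1) < 0)
        return errno;

    return ENOTCONN;               /* read did not fail; keep generic error */
}
```

A caller would typically close the socket when this returns a non-zero value and then decide, based on the error, whether to retry.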