Python Socket Receive Large Amount of Data
The accepted answer is fine but it will be really slow with big files -string is an immutable class this means more objects are created every time you use the +
sign, using list
as a stack structure will be more efficient.
This should work better
while True:
chunk = s.recv(10000)
if not chunk:
break
fragments.append(chunk)
print "".join(fragments)
You can use it as: data = recvall(sock)
def recvall(sock):
BUFF_SIZE = 4096 # 4 KiB
data = b''
while True:
part = sock.recv(BUFF_SIZE)
data += part
if len(part) < BUFF_SIZE:
# either 0 or end of data
break
return data
TCP/IP is a stream-based protocol, not a message-based protocol. There's no guarantee that every send()
call by one peer results in a single recv()
call by the other peer receiving the exact data sent—it might receive the data piece-meal, split across multiple recv()
calls, due to packet fragmentation.
You need to define your own message-based protocol on top of TCP in order to differentiate message boundaries. Then, to read a message, you continue to call recv()
until you've read an entire message or an error occurs.
One simple way of sending a message is to prefix each message with its length. Then to read a message, you first read the length, then you read that many bytes. Here's how you might do that:
def send_msg(sock, msg):
# Prefix each message with a 4-byte length (network byte order)
msg = struct.pack('>I', len(msg)) + msg
sock.sendall(msg)
def recv_msg(sock):
# Read message length and unpack it into an integer
raw_msglen = recvall(sock, 4)
if not raw_msglen:
return None
msglen = struct.unpack('>I', raw_msglen)[0]
# Read the message data
return recvall(sock, msglen)
def recvall(sock, n):
# Helper function to recv n bytes or return None if EOF is hit
data = bytearray()
while len(data) < n:
packet = sock.recv(n - len(data))
if not packet:
return None
data.extend(packet)
return data
Then you can use the send_msg
and recv_msg
functions to send and receive whole messages, and they won't have any problems with packets being split or coalesced on the network level.