Write thread-safe to file in python
We used the logging module:
import logging

# The logging module's handlers take a lock around each write,
# so concurrent logger calls from multiple threads are safe.
logpath = "/tmp/log.log"
logger = logging.getLogger('log')
logger.setLevel(logging.INFO)
ch = logging.FileHandler(logpath)
ch.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(ch)
def application(env, start_response):
    # use logging's %-style lazy formatting; str.format with %s placeholders does nothing
    logger.info("%s %s", "hello", "world!")
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]
I've made a simple writer that uses threading and Queue, and it works fine with multiple threads. Pros: it can accept data from multiple threads without blocking them and writes asynchronously in a separate thread. Cons: the extra writer thread consumes resources, and because of the GIL, CPython threads don't run Python code in parallel anyway.
from queue import Queue, Empty
from threading import Thread

class SafeWriter:
    def __init__(self, *args):
        self.filewriter = open(*args)  # pass anything you would pass to open()
        self.queue = Queue()
        self.finished = False
        self.writer_thread = Thread(name="SafeWriter", target=self.internal_writer)
        self.writer_thread.start()

    def write(self, data):
        # may be called from any thread; Queue.put is thread-safe
        self.queue.put(data)

    def internal_writer(self):
        # single consumer thread: the file is only ever touched here
        while not self.finished:
            try:
                data = self.queue.get(True, 1)
            except Empty:
                continue
            self.filewriter.write(data)
            self.queue.task_done()

    def close(self):
        self.queue.join()          # wait until everything queued so far is written
        self.finished = True
        self.writer_thread.join()  # let the writer thread exit before closing the file
        self.filewriter.close()

# use it like an ordinary open() result:
w = SafeWriter("filename", "w")
w.write("can be used among multiple threads")
w.close()  # it is really important to close, or the program will not exit
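To show the multi-thread case described above, here is a quick sketch (thread count and file name are made up) where several threads share one SafeWriter:

from threading import Thread

w = SafeWriter("shared.log", "w")

def worker(n):
    # every thread writes through the same SafeWriter; only its internal
    # writer thread ever touches the file
    for i in range(100):
        w.write("thread %d line %d\n" % (n, i))

threads = [Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
w.close()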
Look at the Queue class; it is thread-safe.
from Queue import Queue  # "queue" (lowercase) in Python 3

writeQueue = Queue()
In a worker thread:
writeQueue.put(repr(some_object))
Then, to dump it to a file:
outFile = open(path, 'w')
# qsize() is only reliable here because the producers have already stopped putting
while writeQueue.qsize():
    outFile.write(writeQueue.get())
outFile.flush()
outFile.close()
Queue will accept any Python object, so if you're trying to do something other than print to a file, just store the objects from the worker threads via Queue.put.
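As a quick sketch of that idea (the Result class and its fields are made up), worker threads can put whole objects on the queue and a single consumer can decide later how to serialize them:

from queue import Queue
from threading import Thread

results = Queue()

class Result(object):
    # hypothetical payload object; anything printable or picklable works
    def __init__(self, worker_id, value):
        self.worker_id = worker_id
        self.value = value

def worker(worker_id):
    results.put(Result(worker_id, worker_id * worker_id))

threads = [Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# single consumer: serialize the objects however you like, e.g. repr() to a file
with open("results.txt", "w") as out:
    while results.qsize():
        out.write(repr(results.get().__dict__) + "\n")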
If you need to split the commits across multiple invocations of the script, you'll need a way to cache partially built commits to disk. To avoid multiple copies trying to write to the file at the same time, use the lockfile module, available via pip. I usually use json to encode data for these purposes; it supports serializing strings, unicode, lists, numbers, and dicts, and is safer than pickle.
import json
import lockfile

with lockfile.LockFile('/path/to/file'):
    # read the current contents, append the new record, and rewrite the file
    with open('/path/to/file') as fin:
        data = json.loads(fin.read())
    data.append(newdata)
    with open('/path/to/file', 'w') as fout:
        fout.write(json.dumps(data))
Note that depending on OS features, the time taken to lock and unlock the file, as well as to rewrite it for every request, may be more than you expect. If possible, try to just append to the file, as that will be faster. Also, you may want to use a client/server model, where each 'request' launches a worker script which connects to a server process and forwards the data on via a network socket. This sidesteps the need for lockfiles; depending on how much data you're talking about, the server process may be able to hold it all in memory, or it may need to serialize it to disk and pass it to the database that way.
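A minimal sketch of that client/server idea, assuming a local TCP socket on port 9999 (the port and file path are made up): the server is the only process that touches the file, so no lockfile is needed.

# server.py: single process that owns the file and appends whatever clients send
import socketserver

class AppendHandler(socketserver.StreamRequestHandler):
    def handle(self):
        data = self.rfile.read()             # read until the client closes
        with open('/path/to/file', 'a') as f:  # append only, one writer process
            f.write(data.decode('utf-8'))

server = socketserver.TCPServer(('127.0.0.1', 9999), AppendHandler)
server.serve_forever()

# client side (inside each request/worker script):
import socket
with socket.create_connection(('127.0.0.1', 9999)) as conn:
    conn.sendall(b"one record\n")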
WSGI server example:
from Queue import Queue  # "queue" (lowercase) in Python 3

q = Queue()

def flushQueue():
    # append, so earlier flushes are not overwritten
    with open(path, 'a') as f:
        while q.qsize():
            f.write(q.get())

def application(env, start_response):
    q.put("Hello World!")
    if q.qsize() > 999:
        flushQueue()
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]