Python Selenium: how to find out when a download has completed?
Selenium has no built-in way to wait for a download to complete.
The general idea is to wait until the file appears in your "Downloads" directory.
This can be achieved either by polling in a loop until the file exists:
- Check and wait until a file exists to read it
Or by using something like watchdog to monitor the directory:
- How to watch a directory for changes?
- Monitoring contents of files/directories?
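The polling approach can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `wait_for_file` and the `timeout` and `poll_interval` parameters are my own assumptions, and it only checks that the file exists, not that the browser has finished writing it.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll_interval=0.5):
    """Poll until `path` exists; return True if found, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            return True
        time.sleep(poll_interval)
    # One last check so a file that appeared during the final sleep is not missed.
    return os.path.isfile(path)
```

You would call it right after triggering the download, e.g. `wait_for_file(os.path.join(download_dir, "report.csv"))`, where both the directory and the file name are placeholders for your own values.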
import os
import time


def latest_download_file():
    # Return the name of the most recently modified file in the downloads folder.
    path = r'Downloads folder file path'
    os.chdir(path)
    files = sorted(os.listdir(os.getcwd()), key=os.path.getmtime)
    newest = files[-1]
    return newest


# Poll once per second until the newest file is no longer a partial Chrome download.
fileends = "crdownload"
while "crdownload" == fileends:
    time.sleep(1)
    newest_file = latest_download_file()
    if "crdownload" in newest_file:
        fileends = "crdownload"
    else:
        fileends = "none"
This is a combination of a few solutions. I didn't like having to scan the entire downloads folder for a file ending in "crdownload". This code implements a function that returns the newest file in the downloads folder, then simply checks whether that file is still being downloaded. I used it in a Selenium tool I am building, and it worked very well.
I came across this problem recently. I was downloading multiple files at once and had to build in a way to time out if the downloads failed.
The code checks the filenames in the download directory every second and exits once the downloads are complete or once 20 seconds have passed. I used the returned elapsed time to check whether the downloads succeeded or timed out.
import time
import os


def download_wait(path_to_downloads):
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < 20:
        time.sleep(1)
        dl_wait = False
        for fname in os.listdir(path_to_downloads):
            if fname.endswith('.crdownload'):
                dl_wait = True
        seconds += 1
    return seconds
I believe this only works with Chrome, since its in-progress downloads end with the .crdownload extension. There may be a similar way to check in other browsers.
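For other browsers, the same idea might be sketched by checking several temporary suffixes at once. Treat the suffix list as an assumption to verify for your setup: `.part` is what Firefox uses for partial downloads and `.download` is what I would expect from Safari. The names `PARTIAL_SUFFIXES`, `downloads_in_progress` and `wait_for_partials` are hypothetical.

```python
import os
import time

# Assumed temporary-download suffixes; verify them for the browsers you target.
PARTIAL_SUFFIXES = ('.crdownload', '.part', '.download')


def downloads_in_progress(directory, suffixes=PARTIAL_SUFFIXES):
    """Return the names in `directory` that still look like partial downloads."""
    return [f for f in os.listdir(directory) if f.endswith(suffixes)]


def wait_for_partials(directory, timeout=20, poll_interval=1.0,
                      suffixes=PARTIAL_SUFFIXES):
    """Wait until no partial files remain; True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not downloads_in_progress(directory, suffixes):
            return True
        time.sleep(poll_interval)
    # Final check after the deadline has passed.
    return not downloads_in_progress(directory, suffixes)
```

Note that this checks immediately rather than sleeping first, so it should only be called once the download has actually started; otherwise it may return True before any partial file has appeared.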
Edit: I recently changed the way I use this function to handle cases where .crdownload does not appear as the extension. Essentially, it now also waits until the expected number of files is present.
def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.

    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True

        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True

        seconds += 1
    return seconds
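Counting every file in the directory only works if you know what was there beforehand. One possible variation, sketched here under my own assumptions, is to snapshot the directory before triggering the download and wait only for new names. The function `wait_for_new_files` and its `before` and `poll_interval` parameters are hypothetical and not part of the answer above.

```python
import os
import time


def wait_for_new_files(directory, before, nfiles=1, timeout=20, poll_interval=1.0):
    """Return the set of new, finished file names, or an empty set on timeout.

    `before` is a set of names captured with set(os.listdir(directory))
    prior to triggering the download.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new = set(os.listdir(directory)) - before
        finished = {f for f in new if not f.endswith('.crdownload')}
        # Done when enough new files exist and none is still a partial download.
        if len(finished) >= nfiles and finished == new:
            return finished
        time.sleep(poll_interval)
    return set()
```

In a Selenium script you would take `before = set(os.listdir(directory))`, click the download link, and then call `wait_for_new_files(directory, before)`; an empty result means the wait timed out.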