Format of cookies when using wget?
The format is the Netscape cookie file format, as stated in the wget man page. It is described as follows:
The layout of Netscape's cookies.txt file is such that each line contains one name-value pair. An example cookies.txt file may have an entry that looks like this:
.netscape.com TRUE / FALSE 946684799 NETSCAPE_ID 100103
Each line represents a single piece of stored information. A tab is inserted between each of the fields.
From left-to-right, here is what each field represents:
domain - The domain that created AND that can read the variable.
flag - A TRUE/FALSE value indicating if all machines within a given domain can access the variable. This value is set automatically by the browser, depending on the value you set for domain.
path - The path within the domain that the variable is valid for.
secure - A TRUE/FALSE value indicating if a secure connection with the domain is needed to access the variable.
expiration - The UNIX time that the variable will expire on. UNIX time is defined as the number of seconds since Jan 1, 1970 00:00:00 GMT.
name - The name of the variable.
value - The value of the variable.
(From "The Unofficial Cookie FAQ", edited for clarity)
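To make the field order concrete, here is a minimal Python sketch that splits one data line into the seven fields described above. The names `CookieEntry` and `parse_cookie_line` are made up for illustration and are not part of wget or any library. It assumes the fields are separated by real tab characters, as the description says.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CookieEntry(NamedTuple):
    """One data line of a Netscape-format cookies.txt (illustrative type)."""
    domain: str
    flag: bool        # TRUE if all machines within the domain can access it
    path: str
    secure: bool      # TRUE if a secure connection is required
    expiration: int   # UNIX time, in seconds since 1970-01-01 00:00:00 GMT
    name: str
    value: str


def parse_cookie_line(line: str) -> CookieEntry:
    """Split a tab-separated cookie line into its seven fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    domain, flag, path, secure, expiration, name, value = fields
    return CookieEntry(domain, flag == "TRUE", path, secure == "TRUE",
                       int(expiration), name, value)


# The example entry from above, written with explicit tabs.
sample = ".netscape.com\tTRUE\t/\tFALSE\t946684799\tNETSCAPE_ID\t100103"
entry = parse_cookie_line(sample)

# Convert the UNIX expiration time into a readable UTC timestamp.
expires_at = datetime.fromtimestamp(entry.expiration, tz=timezone.utc)
print(entry.name, entry.value, expires_at.isoformat())
```

For the sample line, the expiration value works out to the last second of 1999 in UTC.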
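One way of getting cookies for wget is to let wget save them itself, using --save-cookies together with the --keep-session-cookies option (which makes --save-cookies write session cookies as well).
For example: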
wget --keep-session-cookies --save-cookies cookies.txt "http://MYSITE/?__login=USER&__password=PASS"
The ?__login etc. part depends on the web site you're trying to mirror; you might have to look at how its authentication form works.
Then you can use:
wget --mirror --load-cookies cookies.txt http://MYSITE/
The Netscape cookies file format for each data line is as above, but you won't be able to read the file with HTTP::Cookies::Netscape unless it starts with a header line like this, which the complete file format requires:
# Netscape HTTP Cookie File
or this:
# HTTP Cookie File
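The effect of the header line can be tried out without Perl. The sketch below uses Python's standard `http.cookiejar.MozillaCookieJar` as a stand-in, not HTTP::Cookies::Netscape itself; it applies a similar check on the first line of the file. The helper `load_cookie_names` is a made-up name for illustration, and the cookie entry is the example from the first answer.

```python
import http.cookiejar
import os
import tempfile

HEADER = "# Netscape HTTP Cookie File\n"
ENTRY = "\t".join([".netscape.com", "TRUE", "/", "FALSE",
                   "946684799", "NETSCAPE_ID", "100103"]) + "\n"


def load_cookie_names(text):
    """Write text to a temporary file and load it with MozillaCookieJar."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        path = handle.name
    try:
        jar = http.cookiejar.MozillaCookieJar(path)
        # The sample cookie expired in 1999, so keep expired cookies too.
        jar.load(ignore_discard=True, ignore_expires=True)
        return [cookie.name for cookie in jar]
    finally:
        os.remove(path)


# With the header line, the file loads and the cookie is found.
with_header = load_cookie_names(HEADER + ENTRY)

# Without the header line, the loader rejects the file.
try:
    load_cookie_names(ENTRY)
    without_header_ok = True
except http.cookiejar.LoadError:
    without_header_ok = False

print(with_header, without_header_ok)
```

If a cookies.txt written by hand or by another tool is rejected by a parser like this, adding the header as the very first line is the first thing to check.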