Is it possible to import dates and times directly as AbsoluteTime and by pass DateLists?
The data in the file test.xls
are
03/Jan/2000 45.46 04/Jan/2000 43.92 05/Jan/2000 44.38 06/Jan/2000 42.9 07/Jan/2000 43.46 10/Jan/2000 43.78 11/Jan/2000 42.65 12/Jan/2000 41.26 13/Jan/2000 42.04 14/Jan/2000 43.78
An alternative approach is to exploit the fact that Office documents are zipped collections of XML
files. So,
Step 1: rename the source file by adding .zip
to the file name: test.xlsx.zip
.
Step 2: Import the appropriate xml
file in the zip
file, extract the data elements and re-format:
Cases[Import["C:\\ your directory \\test.xlsx.zip", {"ZIP", "xl\\worksheets\\sheet1.xml"}],
XMLElement["v", {}, {value_}] :> value, Infinity]
// Partition[#, 2] &
This gives:
{{"36528", "45.46"}, {"36529", "43.92"}, {"36530", "44.38"}, {"36531", "42.9"}, {"36532", "43.46"}, {"36535", "43.78"}, {"36536", "42.65"}, {"36537", "41.26"}, {"36538", "42.04"}, {"36539", "43.78"}}
where the first entry in each sublist is Excel's DATEVALUE
(serial date number that counts the number of days from 1/1/1900).
Puzzle: I would expect that converting Excel's DATEVALUE
to Mathematica's AbsoluteTime
(number of seconds from 1/1/1900) would be as simple as multiplying the former by 24*60*60
. But doing that with:
excelDateValues = {"36528", "36529", "36530", "36531", "36532", "36535", "36536", "36537", "36538", "36539"}
and
DateList /@ (24*60*60*ToExpression@excelDateValues)
gives
{{2000, 1, 5, 0, 0, 0.}, {2000, 1, 6, 0, 0, 0.}, {2000, 1, 7, 0, 0, 0.}, {2000, 1, 8, 0, 0, 0.}, {2000, 1, 9, 0, 0, 0.}, {2000, 1, 12, 0, 0, 0.}, {2000, 1, 13, 0, 0, 0.}, {2000, 1, 14, 0, 0, 0.}, {2000, 1, 15, 0, 0, 0.}, {2000, 1, 16, 0, 0, 0.}}
which is off by two days. Hopefully, there is a less naive approach to the get the right conversion factor to go from excel Datevalues
to Mma AbsoluteTime
so that a modified version of Cases[]
above gives the desired result.
Puzzle resolved: Thanks to Mr.Wizard's reference, the historical background to the two-day discrepancy is explained beautifully in Joel Spolsky's great story . So, unless your data does contain dates going back early 1900's for most
cases just subtracting 2 from final output dates should be ok. But ... things can get more complicated considering possible excel date system settings and varying defaults accross OSs. (see XL 1900 and 1904 date systems)
EDIT: Import
uses the filename extension if no format is provided as the second argument. For zip files it returns the filenames in the zipped archive. For the example case
Import["C:\\ your directory \\test.xlsx.zip"]
returns
{"[Content_Types].xml", "_rels\\.rels","xl\\_rels\\workbook.xml.rels", "xl\\workbook.xml", "xl\\styles.xml", "xl\\worksheets\\sheet1.xml", "xl\\theme\\theme1.xml", "customXml\\item1.xml", "customXml\\_rels\\item1.xml.rels", "customXml\\_rels\\item2.xml.rels", "docProps\\app.xml", "customXml\\itemProps2.xml", "customXml\\item2.xml", "customXml\\itemProps1.xml", "docProps\\core.xml"}
Unfortunately, according to the documentation, Import
automatically converts things that look like dates in an Excel spreadsheet to DateList
format.
I tried messing about with alternative ways of importing such as realrawdata = ("Data" /. Import["Data for graphs.xlsx", "Rules"])[[1]]
, but no luck.
It might be possible to develop your own import converter, but this seems like an awful lot of work.
I have not yet figured out how to change or configure the J/Link modules that do the actual translation, but there is a wasteful (from your perspective) application of ToDate
which we can bypass, thereby returning dates in AbsoluteTime format and saving a bit of time during the Import. (specifying the format is a bit faster than letting Mathematica detect it.)
Block[{ToDate = Identity},
Import["test.xls", "XLS"]
]
The low level conversion function appears to be inside:
\SystemFiles\Converters\Java\Convert.jar\System\Convert\ExcellDump.class
I don't know how to decompile a .class file much less how to write one, but it appears to be quite simple.