Diagnosing slow Excel spreadsheets?
I've found the answer to my problem!
Using the clues given to me by allquixotic, I opened the xlsx files in 7-Zip, and compared the file sizes. There was one file that was much larger than the others. The file xl\drawings\drawing1.xml was a file that contained multiple references to formats, over and over again.
I spent a few hours trying to figure out a pattern, but wasn't able to. Nothing I was able to do could get it working! Then, after a fit of aggravation, I just deleted the damn file, and tried to reopen in Excel (2010- didn't test it in Excel 2007).
It complained that the file was damaged, and asked if I'd like to attempt a repair. Repairing the file simply erased the shape, but didn't change the formatting of the file at all. I had to resave the file as the same file, which was a bit odd, but it worked!
As I'd mentioned before, I hadn't tested this solution for any products other than Excel 2010, so I don't know if the shape file was critical to Excel 2k7 or any OpenOffice products. But, if you have a similar problem, hopefully this could be helpful.
Are you using the binary .xls format or the new XML-based .xlsx format? In general the .xlsx format results in a dramatic reduction in filesize.
Check for things like an excessive number of styles stored in the document.
Try to "remove personal information" (a feature of Excel/Word/etc.) to clean up certain kinds of cruft that might be sitting in the file.
If the spreadsheet is, or has previously been shared, it may have old share data stored in it.
One simple fix is to copy and paste just the relevant data from that spreadsheet to a new one, then save it in .xlsx format, and see how small it is. If it's very small then you have your answer -- Excel is doing poor accounting of its internal file format's data structures.
Also check for complicated or circular formulas and references to external sheets (especially those on network drives). If you think a formula might be slow, you can step through it using the formula auditing tool.
Last thing: if you save it as an .xlsx and it's still big, try downloading the OpenXML SDK Productivity tool: http://www.microsoft.com/en-us/download/details.aspx?id=5124
Open up the .xlsx in that and just take a look at all the elements within the file and see if anything is obviously extraneous. This may require knowledge of XML and of the specific OpenXML schemas, but it's a surefire way to find out what is causing the bloat.
P.S. -- if this kind of thing annoys you, stop using Microsoft formats / programs or suggest to your customer to do so. If you google around for "Microsoft Access database bloat" you'll see that Microsoft has a long history of letting their proprietary formats leak tons of useless data to disk that never gets cleaned up. It's like a really nasty memory leak that eats your disk instead of your RAM.
Another possible solution would be:
- Create a copy of the problematic Excel file
- Open that copy
- Press CTRL+A and then click on "Clear" -> "Clear Formats"
- Repeat step 3 on every worksheet
- Save the file and try to reopen it
When there is too much formatting, especially if there are many different styles applied to many individual cells, Excel really struggles trying to apply formatting when opening files.
Naturally, when you remove all formatting you'll end up with just text with default style applied to it in all of the cells with borders, shading, etc gone. But this way you can pinpoint the cause of the problem.