Wherein I occasionally rant on various topics including, but not limited to, PHP, Music, and whatever other Topics I find interesting at the moment, including my brain tumor surgery of August 2010.

Saturday, July 07, 2007

PHP Microsoft Excel Reader and Serial Killer Dates

So, my sister's ex-company of about 6 years that has shut down operations, but still has outstanding loan accounts to collect on...

They have this arcane accounting system and it generates fixed-width field .TXT files, with the wrong dates (with 2-digit year, of course).

The correct dates are in some Excel spreadsheets.

So they wanted a quick hack script to change columns 254 through 259 in some .TXT files based on a simple lookup in some Excel files.

Some developer told them it would take two weeks, and my sister didn't believe that, and they weren't real keen on spending that much $$$.

I told her it was more like a 2 hour job, which might turn into 2 days, when all is said and done.

It actually only took about 8 hours, all told, and most of that was spent in attempting to use various Excel reader PHP scripts that just plain didn't work.

The one up on PHPClasses ends up using parse_url to try to figure out where a local file is, and bombs out. It's a really nifty stream-filter OOP thingie, if it only actually worked...

And what's with all the annoying ads? Maybe less ads would get more traffic would get more revenue... [shrug]

There were several Excel Reader commercial options in the $100 to $200 range, which didn't appeal at all.

There's an Excel Writer in the PEAR Repository, but I couldn't see how to make it read Excel files.

There was some other Excel Reader script, but I forget now why I rejected it. Oh well.

Finally, I found an Excel Reader PEAR package, only it's on Sourceforge instead of in PEAR.

Go figure.

It also had a bug, where it was doing an include of 'OLERead.php' but the file was actually named 'OLERead.inc'

Not quite sure how that passed by a QA process, any QA process but it's trivial to fix.

I did submit a bug report, so hopefully it will get fixed. That is the nice thing about OpenSource.

The example to read a whole sheet in as an array was pretty much all I needed after that quick '.php' -> '.inc' hack.

I didn't even try the reading as a stream thing, since there are only a few thousand accounts.

So I had a nice Excel Reader to do the account number to date lookup.

Yippee.

Then came the joy of Microsoft Excel internal date format...

It's basically a "count" of days from January 1 1900 for the integer part, and a count of seconds for the fractional/decimal part.

Of course, Microsoft aped Lotus 123 and knowingly left in the bug of 1900 being a leap year (it's not)!

So from 0 to 60, the date is "off" by one, and at 60, the bogus date of February 29, 1900 is output. Everything is great from 61 up to a zillion or so where you get to December 31, 9999.

Now, granted, none of these loans date back to the first couple months of 1900, but it's still pretty irksome...

And you'd want a conversion function to be correct and re-usable, rather than something that only works for a limited input set. (Y2K anybody?)

I found a conversion function in C, and ported it to PHP:
http://l-i-e.com/excel_date.phps

I'm not claiming it's the best code ever, and I don't even know what it does, really, as I just changed the variables to have $ in front, and swapped int() function into floor() function.

Slapped in an sprintf instead of returning the individual month/day/year as pass-by-access args, since it seemed easier.

That pretty much sums up the past couple evenings for me.

I occasionally run into a Microsoft devotee who wonders why I hate Microsoft so much.

Really, if this rant doesn't make it clear why I hate Microsoft, I simply cannot hold a rational conversation with you...

It would have been a 2-hour job if MS wasn't so stupid, but it was an 8-hour job because Microsoft is, well, stupid.

17 comments:

susenjit said...

Hi,

I need a help. I have installed the PHP-ExcelReader and corrected the file name problem. Now while trying to run the example its giving error as -

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 512 bytes) in /home/susen/public_html/xls-read/Excel/oleread.inc on line 172

Can you suggest me solution in this regards. Thanks in advance.

regards,
susenjit

Richard Lynch said...

When you run PHP, it has php.ini settings which allow it to limit the amount of RAM it will use, to make sure run-away memory leaks don't kill your machine.

So you'll need to alter php.ini to crank up that memory limit to be big enough to hold your spreadsheet.

If you are running from the command line, you can also use the -d flag to over-ride php.ini settings.
php -h
will tell you more about that.

d'Doc said...

hi, i know of your pain. you were lucky 8 hours was all you spent---me, well... luckily i found the explanation...here in your blog, first! thanks man. whew, 'intentional bug' my ass---i could shoot those crazy microsoft nerds!

Wojciech said...

If you run ExcelReader on 64 bit architecture you will experience memory exhausting no matter the file and no matter the phsyhical memory. That is causing by overloading an integer in a program that counts that integer has 32bits!

On 64 bits Operating Systems PHP Integer has 64bits!

So i made simple solution in my version:
in excelreader file I've changed function:

_GetInt4d

from:
function _GetInt4d($data, $pos) {
return ord($data[$pos]) | (ord($data[$pos+1]) << 8) | (ord($data[$pos+2]) << 16) | (ord($data[$pos+3]) << 24);
}

to:
function _GetInt4d($data, $pos){
$result=ord($data[$pos]) | (ord($data[$pos+1]) << 8) | (ord($data[$pos+2]) << 16) | (ord($data[$pos+3]) << 24);
if ($result>268435455) $result-=4294967296;
return $result;
}

same in file oleread with GetInt4d.
I know that's poor hack but that was fast patch to allow script to work on 64 bits OS.

max said...

Oh my goodness! Thanx for the blessed getInt4D hack for 64bit systems!!!

I've almost broken my forehead digging into memory exhaust problem.

this could also help in detecting 64-bit system with PHP.
http://www.ush.it/2007/08/22/architecture-detection-by-php-anomaly/


p.s. And M$ === Must Die of couz.

Guinevir Freng said...

you saved me from wasting my entire week, mate

10x a lot and have a great life

Ihsanul said...

thanks for your solution. I didn't realize if my clients using 64bit AMD processor. But with a little patch you give, viola... everything work fine... thanks.. thanks... a lot !

ahs10 said...

hello, thanks for the fix, but i can't get it to work. where are you implementing this excel_date function?

Daniel said...

A slightly more robust fix for GetInt4d looks like this (I don't know why your function has a preceding '_', mine's perhaps a different version):

function GetInt4d($data, $pos) {
//32/64bit architecture code:
$_or_24 = ord($data[$pos+3]);

if ($_or_24>=128)
$_ord_24 = -abs((256-$_or_24) << 24);
else
$_ord_24 = ($_or_24&127) << 24;

return ord($data[$pos]) | (ord($data[$pos+1]) << 8) | (ord($data[$pos+2]) << 16) | $_ord_24;
}

ahs10: I implemented it as a global function but anywhere that's accessible from where you want access it should be fine - it's not an existing function you have to patch but a new function you'll need to call when handling your spreadsheet. I guess you'll need to offer some more information about your project for more specific help.

Javier said...

about the excel Reader:
I've got the files for the reader but the xlt file does not open with excel prg. it says that is read only file. Can you send or post the files you guys have been using to make the excel reader work?

Thanks,

Jav

Rubem said...

Thanks a lot for to share with us your solution! It worked fine to solve the "exhausted memory" problem in a FreeBSD 6.2 / SMP amd64 system.

Thanks!!

dangerastafari said...

Thanks for the GetInt4d fix, daniel's code is working for us on both 32 and 64 bit machines. Would have taken us a while to find this bug I think had it not been for wojciech.

LS said...

I never post comments, but now ... How can I ever thank you? Was looking for over hours! Thanks

Erickson said...

Thanks Wojciech !! You're the man!!!

Alexis said...

Yesterday I worked with excel sheet and saved all info.But today all data were lost.And I used-Excel repairs,this tool I saw in one forum.It is free as far as I know,besides that tool can recovered contents and export of recovered data into a new document in Microsoft Excel format.

Eiger Yap said...

Thanks so much ^^

Webtemp said...

Hi there ...

Go figure , that 3 years later . This post actualy saved MY 8 hours.
For me things worked just fine for 15 mins ... and that's all thanks to YOUR post!

Thanks a million !

--Ex