MySQL SELECT query within a serialized array

If you have control of the data model, stuffing serialized data in the database will almost always bite you in the long run. However, one often does not have control over the data model, for example when working with certain open source content management systems. Drupal sticks a lot of serialized data in dumpster columns in lieu of a proper model. For example, Ubercart has a 'data' column for all of its orders. Contributed modules need to attach data to the main order entity, so out of convenience they tack it onto the serialized blob. As a third party to this, I still need a way to get at some of the data stuffed in there to answer some questions.

a:4:{s:7:"cc_data";s:112:"6"CrIPY2IsMS1?blpMkwRj[XwCosb]gl<Dw_L(,Tq[xE)~(!$C"9Wn]bKYlAnS{[Kv[&Cq$xN-Jkr1qq<z](td]ve+{Xi!G0x:.O-"=yy*2KP0@z";s:7:"cc_txns";a:1:{s:10:"references";a:1:{i:0;a:2:{s:4:"card";s:4:"3092";s:7:"created";i:1296325512;}}}s:13:"recurring_fee";b:1;s:12:"old_order_id";s:2:"25";}

See that 'old_order_id'? That's the key I need to find out where this recurring order came from, but since not everybody uses the recurring orders module, there isn't a proper place to store it in the database, so the module developer opted to stuff it in that dumpster column.

My solution is to use a few targeted SUBSTRING_INDEX calls to chisel off insignificant data until I've sculpted the resultant string into the data gemstone of my desires. Then I tack on a HAVING clause to find all that match, like so:

SELECT uo.*,
SUBSTRING_INDEX(
 SUBSTRING_INDEX(
  SUBSTRING_INDEX( uo.data, 'old_order_id' , -1 ),
 '";}', 1),
'"',-1) 
AS `old order id`
FROM `uc_orders` AS `uo`
HAVING `old order id` = 25

The innermost SUBSTRING_INDEX gives me everything past the old_order_id, and the outer two clean up the remainder.
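
To make the three steps easier to follow, here is a small sketch that runs them one at a time against a literal string. The `@data` value is a shortened, made-up blob in the same format as the real one, not actual Ubercart data.

```sql
-- Hypothetical, shortened blob in PHP's serialize() format
SET @data = 'a:2:{s:13:"recurring_fee";b:1;s:12:"old_order_id";s:2:"25";}';

SELECT
  -- Step 1: keep everything after the key name
  --   ";s:2:"25";}
  SUBSTRING_INDEX(@data, 'old_order_id', -1) AS step1,

  -- Step 2: keep everything before the first '";}'
  --   ";s:2:"25
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(@data, 'old_order_id', -1),
  '";}', 1) AS step2,

  -- Step 3: keep everything after the last double quote
  --   25
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(@data, 'old_order_id', -1),
    '";}', 1),
  '"', -1) AS step3;
```

One thing to watch: when the delimiter is not found, SUBSTRING_INDEX returns the whole input string. For rows that have no 'old_order_id' key at all, the expression therefore still returns some unrelated fragment of the blob, so it may be worth adding something like WHERE uo.data LIKE '%old_order_id%' before trusting the result.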

This complicated hackery is not something you want in code that runs more than once. It is more of a tool for getting the data out of a table without having to resort to writing a PHP script.

Note that this could be simplified to merely:

SELECT uo.*,
SUBSTRING_INDEX(
  SUBSTRING_INDEX( uo.data, '";}' , 1 ),
'"',-1) 
AS `old order id`
FROM `uc_orders` AS `uo`
HAVING `old order id` = 25

but that would only work in this specific case, where the value I want is at the end of the data blob.


As GWW says in the comments, if you need to query things this way, you really ought to be considering storing this data as something other than a big-ole-string (which is what your serialized array is).

If that's not possible (or you're just lazy), you can use the fact that the serialized array is just a big-ole-string, and figure out a LIKE clause to find matching records. The way PHP serializes data is pretty easy to figure out (hint: those numbers indicate lengths of things).

Now, if your serialized array is fairly complex, this will break down fast. But if it's a flat array, you should be able to do it.

Of course, you'll be using LIKE '%...%', so you'll get no help from any indexes: the leading wildcard forces a scan of every row, and performance will be very poor.

Which is why folks are suggesting you store that data in some normalized fashion, if you need to query "inside" it.
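
As a rough sketch of the LIKE approach described above, assuming the `uc_orders` table and `data` column from the other answer, and assuming the value was serialized as a string rather than an integer:

```sql
-- In PHP's serialize() format:
--   s:12:"old_order_id"  is a string of length 12 (the key)
--   s:2:"25"             is a string of length 2  (the value)
-- The underscores are escaped because an unescaped _ in LIKE matches
-- any single character (this assumes backslash escapes are enabled).
SELECT uo.*
FROM `uc_orders` AS `uo`
WHERE uo.data LIKE '%s:12:"old\_order\_id";s:2:"25";%';
```

If the value had been stored as an integer, the tail of the pattern would be `i:25;` instead of `s:2:"25";`, so it pays to look at a real row before writing the pattern. Including the length prefixes and the trailing `;` keeps a search for 25 from also matching 250 or 125.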
