Database Design: track a vast number of attributes for each user. So much so, that I will likely run out of columns (row storage space)

What you're describing is an Entity-Attribute-Value database, which is often used for exactly th situation you describe, sparse data tied to a single entity.

An E-A-V table is easy to search. The problem isn't finding rows, it's finding related rows.

Having different tables for different entities provides domain modeling, but they also provide a weak form of metadata. In E-A-V there are no such abstractions. (The Java analogy to E-A-V would be declaring that all functions' formal arguments were of type Object -- so you'd get no type-checking.)

We can easily look up the property keys, but nothing groups these property keys.

Wikipedia has a very good article on E-A-V, but read it now -- it's mostly the work of one author, and is slated for "improvement".

I recommend that you consider the approach known as vertical partitioning. This means that you keep defining tables with a UserID key, you could call them User1, User2, etc. Start a new table when you hit the maximum row size for your database. The benefit of this approach is that the values are still true database attributes. This will wind up saving time when working with this data, e.g. data binding.

The key question to answer is: are these really attributes? Do they represent the struture of information that you must collect about the user. If so, the best way to model them is to make them columns. The only reason you must resort to vertical partitioning is the row size limit of the database.

If, on the other hand, a flexible attribute system is called for, then by all means go with the property key/property value system. For example, if users were allowed to define their own attributes dynamically, then you'd definitely want the key/value system. However, I would say key/value is not the best way if you understand the structure of your data and have legitimately identified hundreds of attributes for users.

As a side note, I must say that you should question entities with large numbers of attributes. They may be valid, but it's also quite likely that you're missing some entities at the conceptual level. In other words, mabe all of these attributes don't related to the user per se, but to some other entity that is related to users.

The UserProperties table approach is how I would model it. As you suggested, a clustered index on userID will mean range searches on userID will be fast (i.e. for all properties relating to a single user). Might also add a non-clustered index on UserID and PropertyKey for single key-2-value selects per user.

Database Design: track a vast number of attributes for each user. So much so, that I will likely run out of columns (row storage space)

Tags:

Database Design

Related

Recent Posts