How to understand a database that is already developed?

I'm actually hoping you don't find much for answers here as I've made my career based on coming in on these large undocumented data models and trying to figure them out. But for what it's worth:

  • I don't like the automated data modeller / electronic modeller, though this might be personal opinion. My preference is to find a white board (or paper) and draw out your data model by hand. If you are a kinaesthetic learner (learn by hands on participation), I've found this to be the best way of familiarizing yourself with the new database...as nice as an automated system is to read the database, you won't learn what you will when you draw it by hand.

  • There is a limited number of data modelling techniques, however they can be combined in a lot of ways. My guess with a larger database like you have here, you will have multiple programmers creating it, which means you'll likely see multiple techniques used in the same database. In the past I have found a system that had it's circuit information stored as a single table that self joined onto itself repeatedly to store the information for a data circuit while the customer information section was a very straight forward star design...2 very separate programming styles, likely two separate developers. I later ventured into the phone circuit section of the app, which I recognized immediately as the same style (likely same programmer) as the data circuit section was. Usually, developers will be assigned to a logical division that correlates to a section of your business...watch for patterns in similar sections.

  • The physical database structure is only one section to understand...on the Left (prior to the database) is how the data is generated and loaded into your database (data warehouse?). Understanding what your data is and how it is created is the first step in knowing what you are looking for in the database after it's loaded.

  • Opposite side of above, after the data is in the database...understanding how the data is consumed (used by your users) will help you understand what they have been getting out of it and what they need from it. Extra points if you can get your hands on scripting used to generate existing reports as the from statement will help you see how existing tables are used.

  • Never forget to interview your users...especially if you can locate one that was around for the initial deploy of the system. If it's in-house designed, odds are it was these people that provided some of the initial requirements for the system and talking to them will give you an idea of what the people who designed the system first heard when they went requirement gathering. The logical division of your company (customer care vs operations vs billing vs etc...) is usually the same division your data model will follow.

  • And lastly...Play! If a dev or QA environment is available, start writing queries and see what comes back...alter your statement and try again.

I think the biggest folly you will want to avoid is focussing solely on how the tables are arranged. Understand the data thats in it, how it is generated and how it is consumed. Understand your company, how it's arranged and how it functions. The manner in which it's stored (the data modelling) is secondary to this understanding.


I recently went through the same process... The thing that I think helped me the most was creating my own database diagram. I used the free tool wwwsqldesigner. It seemed pretty easy, the only issue I had was that it wouldn't work in Chrome for some reason, but firefox worked fine. I just put in the tables that I use a lot, and I find that I frequently refer back to it when I need to do something new.

If you can use the automatically generated database diagrams, that would be a better way to go, but I wasn't personally able to get them to work (I'm using sql server).

Good luck!


If I jump into a massive SQL Server database with hundreds of tables and millions of records here are two queries I use to help make sense of it all, find the "main" tables, and then narrow down to specific tables and columns.

--Query to show list of tables ordered by number of records in each table
    SELECT
        t.NAME AS TableName,
        SUM(p.rows) AS [RowCount]
    FROM 
        sys.tables t
    INNER JOIN      
        sys.indexes i ON t.OBJECT_ID = i.object_id
    INNER JOIN 
        sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
    INNER JOIN 
        sys.columns c on t.object_id = c.object_id
    WHERE   
        i.index_id <= 1
    GROUP BY 
        t.NAME, i.object_id, i.index_id, i.name 
    ORDER BY 
        SUM(p.rows) DESC



--Query to show any columns or table names like what I'm looking for
SELECT
    c.name, t.name
FROM sys.columns c
    INNER JOIN sys.tables t ON c.object_id = t.object_id
WHERE c.name LIKE '%#ColumnName%' OR t.name LIKE '%#TableName%'

Tags:

Sql

Foxpro