What are object-relational databases, and why is this model necessary in spatial databases?
The relational model and the object-oriented paradigm
The relational model, introduced by Dr. E. F. Codd in 1970, is a state-of-the-art applied science in the field of database administration. Its two solid pillars are first-order logic and set theory.
The object-oriented paradigm, as devised by Dr. Alan Kay, is an approach suited to building application programs. It has proven quite effective in the construction of graphical user interfaces that facilitate the interaction between end users and information systems.
As noted, each of these two frameworks serves a very specific purpose. When each is used to build the kind of component it is suited for, both can be of great help in a software development project.
Database administration, application programs and the term “object-relational”
Before the conception and development of the relational model, (a) the design, creation and administration of application programs was heavily mixed with (b) the design, creation and administration of the data, all in an ad hoc fashion. There was no science available to handle data in a general and sound manner. Dr. E. F. Codd was actually a programmer at IBM and faced first-hand the difficulties caused by these circumstances. With his brilliance, practical experience and mathematical background, he was able to envision and develop an elegant and robust foundation for managing data which, among other substantial benefits, allows that resource to be handled independently of application programs.
I am not sure whether an “‘object-relational’ database management system” exists, but if one did, it would be an artefact on which “‘object-relational’ databases” could be implemented. An “‘object-relational’ database” would be a device created by entangling object-oriented constructs with relational instruments.
It is hard to tell whether a database management system of that nature can exist since, by introducing object-oriented constructs, many of the relational capabilities are put at risk (see “Codd’s Twelve Rules” for relevant information); therefore, such a system could hardly be considered relational, even if it were partly so.
In fact, (i) appending object-oriented tools to a relational database management system and (ii) using them in the relational databases implemented on top of it is entirely needless.
Nevertheless, there are “current” opinions that advocate, through object-oriented patterns, mixing the design, creation and administration of (a) application programs with those of (b) databases. This can be interpreted as an invitation to regress to a pre-scientific (i.e., pre-relational) era as far as databases are concerned (because, naturally, there is no object-oriented model of data administration).
For its part, the relational model provides general and strong mechanisms to design (the logical structure), constrain (the values) and manipulate (with abstractions) the data. One of their benefits is that they are very simple (but never simplistic). To get the most out of these mechanisms, a database administrator/operator must necessarily work with relations (usually declared as tables in a given SQL platform), and, as you know, a relation or table is not part of the object-oriented paradigm (which provides neither data integrity constraints nor general data manipulation operations, and is not supposed to).
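To make the three mechanisms concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite merely stands in for "a given SQL platform", and the parcel table and its columns are hypothetical. The table declares the structure, the PRIMARY KEY, NOT NULL and CHECK clauses constrain the values, and a declarative SELECT manipulates the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design: declare the logical structure of a (hypothetical) parcel relation.
conn.execute("""
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,                -- the key identifies each row
        owner     TEXT    NOT NULL,
        area_m2   REAL    NOT NULL CHECK (area_m2 > 0) -- constrains the values
    )
""")
conn.execute("INSERT INTO parcel VALUES (1, 'Ana', 1200.0)")
conn.execute("INSERT INTO parcel VALUES (2, 'Ben', 850.5)")

# Constrain: the DBMS itself rejects a value that breaks the declared rule.
try:
    conn.execute("INSERT INTO parcel VALUES (3, 'Cid', -10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Manipulate: a declarative query; no loop over rows is written by hand.
large = conn.execute(
    "SELECT owner FROM parcel WHERE area_m2 > 1000 ORDER BY owner"
).fetchall()

print(rejected)  # True
print(large)     # [('Ana',)]
```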
The power of a relational language lies in its expressive capabilities, not in its computational aspects. Roughly speaking, when following relational methods one declares what the things of interest are (their structure), while with an object-oriented language (e.g., Smalltalk) one deals mainly with the behaviour of the things of significance (how they do what they do, the processes they carry out), which is paramount when programming an application.
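The contrast between declaring a result and describing a process can be sketched as follows, again with Python and SQLite and a hypothetical road relation. The procedural function spells out how to accumulate the totals step by step, whereas the SQL query only states what result is wanted and leaves the how to the DBMS. Both produce the same answer:

```python
import sqlite3

# Hypothetical data: (road_id, kind, length_km)
roads = [(1, "highway", 12.5), (2, "street", 0.8), (3, "highway", 30.0), (4, "street", 1.2)]

def totals_procedural(rows):
    # Procedural: we say *how* to compute the totals, one step at a time.
    totals = {}
    for _road_id, kind, length in rows:
        totals[kind] = totals.get(kind, 0.0) + length
    return sorted(totals.items())

def totals_declarative(rows):
    # Declarative: we say *what* we want; the DBMS decides how to compute it.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE road (road_id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL, length_km REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO road VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT kind, SUM(length_km) FROM road GROUP BY kind ORDER BY kind"
    ).fetchall()
    conn.close()
    return result

print(totals_procedural(roads))   # [('highway', 42.5), ('street', 2.0)]
print(totals_declarative(roads))  # [('highway', 42.5), ('street', 2.0)]
```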
One of the many advantages of a relational database is that, since it has to be designed independently of the languages used or paradigms followed at the application programming stage, it can work with multiple languages, multiple application programming paradigms and multiple application programs at the same time.
That being said, a relational database must be built by a designer who follows relational principles; building (1) a particular database on (2) a certain relational database management system does not automatically grant that database the relational label.
The ArcGIS geodatabase
To help in understanding the arcgis.com links included in your question, I consider that the term geodatabase is actually a contextual designation for a complete geographic information system that may include
- one or more proper databases built on top of one or more database management systems, and
- one or more application programs that work along with said database(s).
In that respect, let us take a look at the architecture of the proper database(s) of a geodatabase, where it is stated that:
The geodatabase storage model is based on a series of simple yet essential relational database concepts and leverages the strengths of the underlying database management system (DBMS). Simple tables and well-defined attribute types are used to store the schema, rule, base, and spatial attribute data for each geographic dataset. This approach provides a formal model for storing and working with your data. Through this approach, structured query language (SQL)—a series of relational functions and operators—can be used to create, modify, and query tables and their data elements.
This somewhat suggests that the data is handled independently of the application programs that access it, which would of course be very valuable.
Then, this page contains the following title (in the form of an assertion) and paragraph:
The geodatabase is object relational
The geodatabase employs a multitier application architecture by implementing advanced logic and behavior in the application tier on top of the data storage tier (managed within various database management systems [DBMS], files, or extensible markup language [XML]). The geodatabase application logic includes support for a series of generic geographic information system (GIS) data objects and behaviors such as feature classes, raster datasets, topologies, networks, and much more.
This seems to indicate that the objects’ behaviour (the way in which the application program objects act) is handled where it should be, i.e., at the application program level (or “tier”, as described there).
Further, back on the geodatabase architecture page, an identical title and a paragraph very similar to the one quoted above are introduced as follows:
The geodatabase is object relational
The geodatabase is implemented using the same multitier application architecture found in other advanced DBMS applications; there is nothing exotic or unusual about in its implementation. The multitier architecture of the geodatabase is sometimes referred to as an object-relational model. The geodatabase objects persist as rows in DBMS tables that have identity, and the behavior is supplied through the geodatabase application logic. The separation of the application logic from the storage is what allows support for several different DBMSs and data formats.
One excerpt that particularly catches my attention is “the geodatabase objects persist as rows in DBMS tables that have identity”. It is misleading, because the tables (i.e., relations) of a relational database retain rows (i.e., tuples) that represent assertions carrying a specific meaning given by a certain business domain predicate (which can be used to define an entity type); therefore, a table does not “persist” objects. Moreover, since an essential feature of an object is its behaviour (basically, its methods), it would be very interesting to know how that behaviour is “persisted” in a relational database.
On the other hand, the significant fragment “the separation of the application logic from the storage is what allows support for several different DBMSs and data formats” appears to stress the enormous relevance of handling the data separately from the application program.
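A minimal sketch of that separation, assuming a hypothetical Point class and point table (this is an illustration, not how ArcGIS implements it): the table only holds rows asserting where each point is, while the move behaviour exists solely in the application code and is re-attached by the class when a row is read back into an object.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Point:
    """Application-tier object: state (x, y) plus behaviour (move)."""
    point_id: int
    x: float
    y: float

    def move(self, dx: float, dy: float) -> None:
        # The behaviour lives here, in the program, not in the database.
        self.x += dx
        self.y += dy

conn = sqlite3.connect(":memory:")
# Only values are stored: each row asserts "point <id> is at (<x>, <y>)".
conn.execute(
    "CREATE TABLE point (point_id INTEGER PRIMARY KEY, x REAL NOT NULL, y REAL NOT NULL)"
)

def save(p: Point) -> None:
    conn.execute("INSERT OR REPLACE INTO point VALUES (?, ?, ?)", (p.point_id, p.x, p.y))

def load(point_id: int) -> Point:
    row = conn.execute(
        "SELECT point_id, x, y FROM point WHERE point_id = ?", (point_id,)
    ).fetchone()
    # The methods come from the Point class, not from anything stored in the table.
    return Point(*row)

p = Point(1, 0.0, 0.0)
p.move(3.0, 4.0)
save(p)
q = load(1)
print(q)  # Point(point_id=1, x=3.0, y=4.0)
```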
Conclusion
My conclusion is that an ArcGIS geodatabase is neither (a) an object-relational database nor (b) an object-relational database management system. It may well be, as previously mentioned, a complete geographic information system that consists of (i) one or more actual databases (some of which could be more or less relational) and (ii) one or more application programs (some of which could be more or less object-oriented) accessing said databases.
Perhaps that is why it is contextually called “object relational”.
The "object" in object-relational refers to object-oriented programming. This is a programming style where objects such as a triangle, a square or some other geographical entity can be told to move themselves (have their coordinates translated), to rotate by a certain angle or to scale by some percentage, regardless of which class they belong to (triangle, square, whatever). As the programmer, you don't need to know which class a particular object is: the programming language takes care of carrying out the correct calculations when you tell the object to move (changing 3, 4 or more coordinates).
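A minimal Python sketch of the idea, assuming rotation and scaling are done about the origin (the class names and that convention are illustrative choices, not taken from any particular GIS). The loop at the end sends the same move and scale messages to every shape without checking its class, and each class performs its own calculation:

```python
import math

class Polygon:
    """A shape described by its vertices; triangles and squares are polygons."""
    def __init__(self, vertices):
        self.vertices = [(float(x), float(y)) for x, y in vertices]

    def move(self, dx, dy):
        # Translate every vertex (3 for a triangle, 4 for a square, ...).
        self.vertices = [(x + dx, y + dy) for x, y in self.vertices]

    def rotate(self, degrees):
        # Rotate every vertex about the origin.
        a = math.radians(degrees)
        c, s = math.cos(a), math.sin(a)
        self.vertices = [(x * c - y * s, x * s + y * c) for x, y in self.vertices]

    def scale(self, percent):
        # Scale every vertex about the origin.
        f = percent / 100.0
        self.vertices = [(x * f, y * f) for x, y in self.vertices]

class Triangle(Polygon):
    def __init__(self, a, b, c):
        super().__init__([a, b, c])

class Square(Polygon):
    def __init__(self, corner, side):
        x, y = corner
        super().__init__([(x, y), (x + side, y), (x + side, y + side), (x, y + side)])

class Circle:
    """Not a polygon: it moves its centre and also scales its radius."""
    def __init__(self, centre, radius):
        self.centre = (float(centre[0]), float(centre[1]))
        self.radius = float(radius)

    def move(self, dx, dy):
        self.centre = (self.centre[0] + dx, self.centre[1] + dy)

    def rotate(self, degrees):
        a = math.radians(degrees)
        c, s = math.cos(a), math.sin(a)
        x, y = self.centre
        self.centre = (x * c - y * s, x * s + y * c)

    def scale(self, percent):
        f = percent / 100.0
        self.centre = (self.centre[0] * f, self.centre[1] * f)
        self.radius *= f

shapes = [Triangle((0, 0), (4, 0), (0, 3)), Square((1, 1), 2), Circle((5, 5), 1)]
for shape in shapes:
    # The caller neither knows nor checks the class; each object does the right thing.
    shape.move(10, 0)
    shape.scale(200)
```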
An object-relational database combines features of object-oriented programming and relational databases, and takes care of converting between objects with methods (move, rotate and scale) and tables, which usually lack these methods.
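What such a conversion involves can be sketched with a hypothetical hand-written mapping in Python and SQLite; real object-relational products work differently and handle much more. Saving an object writes one shape row plus one row per vertex, and loading uses the stored kind column to choose the class. That choice is where the methods come back from, since the tables hold no methods:

```python
import sqlite3

class Polygon:
    kind = "polygon"
    def __init__(self, vertices):
        self.vertices = [(float(x), float(y)) for x, y in vertices]
    def move(self, dx, dy):
        self.vertices = [(x + dx, y + dy) for x, y in self.vertices]

class Triangle(Polygon):
    kind = "triangle"

class Square(Polygon):
    kind = "square"

# Maps the value stored in the "kind" column back to a class.
CLASSES = {cls.kind: cls for cls in (Triangle, Square)}

def create_schema(conn):
    conn.executescript("""
        CREATE TABLE shape  (shape_id INTEGER PRIMARY KEY,
                             kind     TEXT NOT NULL);
        CREATE TABLE vertex (shape_id INTEGER NOT NULL REFERENCES shape (shape_id),
                             seq      INTEGER NOT NULL,
                             x REAL NOT NULL, y REAL NOT NULL,
                             PRIMARY KEY (shape_id, seq));
    """)

def save(conn, shape_id, shape):
    # Object -> rows: one row for the shape, one per vertex. Methods are not stored.
    conn.execute("INSERT INTO shape VALUES (?, ?)", (shape_id, shape.kind))
    conn.executemany(
        "INSERT INTO vertex VALUES (?, ?, ?, ?)",
        [(shape_id, i, x, y) for i, (x, y) in enumerate(shape.vertices)],
    )

def load(conn, shape_id):
    # Rows -> object: the "kind" column decides which class (and methods) to use.
    (kind,) = conn.execute(
        "SELECT kind FROM shape WHERE shape_id = ?", (shape_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT x, y FROM vertex WHERE shape_id = ? ORDER BY seq", (shape_id,)
    ).fetchall()
    return CLASSES[kind](rows)

conn = sqlite3.connect(":memory:")
create_schema(conn)
save(conn, 1, Triangle([(0, 0), (4, 0), (0, 3)]))
save(conn, 2, Square([(1, 1), (3, 1), (3, 3), (1, 3)]))
loaded = [load(conn, 1), load(conn, 2)]
for shape in loaded:
    shape.move(1, 1)
```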
The relational model is indeed very simple. Columns in a row in a relation are all related to a key, thereby defining a table/relation. Everything else follows from those principles.
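"Columns all related to a key" corresponds to a functional dependency: each key value determines exactly one combination of the other column values. A small pure-Python check of that property, on hypothetical parcel rows, could look like this:

```python
def key_determines(rows, key, columns):
    """Return True if, in `rows`, each value of `key` maps to exactly one
    combination of values in `columns` (the functional dependency key -> columns)."""
    seen = {}
    for row in rows:
        k = tuple(row[c] for c in key)
        v = tuple(row[c] for c in columns)
        # setdefault stores the first combination seen for k and returns it.
        if seen.setdefault(k, v) != v:
            return False
    return True

parcels = [
    {"parcel_id": 1, "owner": "Ana", "area_m2": 1200.0},
    {"parcel_id": 2, "owner": "Ben", "area_m2": 850.5},
    {"parcel_id": 2, "owner": "Cid", "area_m2": 850.5},  # same key, different owner
]
print(key_determines(parcels[:2], ["parcel_id"], ["owner", "area_m2"]))  # True
print(key_determines(parcels, ["parcel_id"], ["owner", "area_m2"]))      # False
```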
Typically, the goal of object databases is to abstract away some of the modeling that would be done by hand in a traditional database: determining which object attributes are related to a key, how the design should be laid out, how things will be accessed and how they should be indexed. They also aim to handle higher-level concepts like inheritance, polymorphism or object variation, all of which are effectively hand-designed in a relational model.
Network databases do something similar, turning some of these higher-level constructs into first-class features of the database.
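As an example of hand-designed inheritance, one common relational pattern (supertype and subtype tables) is sketched below with SQLite; the feature, road and river tables are hypothetical. Roads and rivers share the supertype's key and attributes, and a query with outer joins plays the role of a "polymorphic" read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Supertype: attributes every geographic feature has.
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        kind       TEXT NOT NULL CHECK (kind IN ('road', 'river'))
    );
    -- Subtypes: attributes only roads or only rivers have, keyed by the supertype's key.
    CREATE TABLE road (
        feature_id INTEGER PRIMARY KEY REFERENCES feature (feature_id),
        lanes      INTEGER NOT NULL CHECK (lanes > 0)
    );
    CREATE TABLE river (
        feature_id INTEGER PRIMARY KEY REFERENCES feature (feature_id),
        flow_m3s   REAL NOT NULL CHECK (flow_m3s >= 0)
    );
    INSERT INTO feature VALUES (1, 'Main St', 'road'), (2, 'Blue River', 'river');
    INSERT INTO road  VALUES (1, 2);
    INSERT INTO river VALUES (2, 35.0);
""")

# "Polymorphic" read: every feature, plus whichever subtype attributes apply.
rows = conn.execute("""
    SELECT f.feature_id, f.name, f.kind, r.lanes, v.flow_m3s
    FROM feature AS f
    LEFT JOIN road  AS r ON r.feature_id = f.feature_id
    LEFT JOIN river AS v ON v.feature_id = f.feature_id
    ORDER BY f.feature_id
""").fetchall()
print(rows)  # [(1, 'Main St', 'road', 2, None), (2, 'Blue River', 'river', None, 35.0)]
```

This sketch does not prevent a road row from referencing a feature whose kind is 'river'; enforcing that (for example with triggers or additional constraints) is exactly the sort of work that has to be designed by hand.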
I think it's a little unfair to relational databases, which are thoroughly standardized, researched and documented, that all these other "databases" get to use the same name (as did the databases that existed before relational databases and normalization were researched and defined), but so be it: how many times have you been told that someone's database is in Excel?
There's nothing wrong with these approaches, but they tend not to be good general-purpose ones: the platforms are not very standardized, and performance is still typically what it is. A well-indexed and normalized data model will still perform as well as its design allows. And a system with a lot of specialization, like an object or network database, may or may not outperform it, depending on whether the design really uses those features to their best advantage.
It's still bits on a disk and references to other bits on a disk, and if you don't handle that smartly, it will still be slow...
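To illustrate the point about indexing, here is a small SQLite experiment in Python (hypothetical point table; the exact wording of the plan varies between SQLite versions). The same range query is planned as a full scan before the index exists and as an index search afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point (point_id INTEGER PRIMARY KEY, x REAL NOT NULL, y REAL NOT NULL)")
conn.executemany(
    "INSERT INTO point (x, y) VALUES (?, ?)",
    [(i % 1000, i // 1000) for i in range(10_000)],
)

query = "SELECT point_id FROM point WHERE x BETWEEN 10 AND 20"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # a full scan of the point table
conn.execute("CREATE INDEX idx_point_x ON point (x)")
after = plan(query)   # a search that uses idx_point_x
print(before)
print(after)
```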