Combine two unrelated tables/models with same primary key in Django
General idea
You can use qs.union:
- create 2 models without any relations between them. Don't forget to use
class Meta: managed = False
- select from the first model, annotate with subquery and union with second:
from django.db import models
from django.db.models import F, OuterRef, Subquery, Value
from django.db.models.functions import Coalesce
# OperationalDevice fields: ip, mac
# AllowedDevice fields: ip, type, owner
USE_EMPTY_STR_AS_DEFAULT = True
null_char_field = models.CharField(null=True)
if USE_EMPTY_STR_AS_DEFAULT:
default_value = ''
else:
default_value = None
# By default Expressions treat strings as "field_name" so if you want to use
# empty string as a second argument for Coalesce, then you should wrap it in
# `Value()`.
# `None` can be used there without wrapping in `Value()`, but in
# `.annotate(type=NoneValue)` it still should be wrapped, so it's easier to
# just "always wrap".
default_value = Value(default_value, output_field=null_char_field)
operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef('ip'))
qs1 = (
AllowedDevice.objects
.all()
.annotate(
mac=Coalesce(
Subquery(operational_devices_subquery.values('mac')[:1]),
default_value,
output_field=null_char_field,
),
)
)
qs2 = (
OperationalDevice.objects
.exclude(
ip__in=qs1.values('ip'),
)
.annotate(
type=default_value,
owner=default_value,
)
)
final_qs = qs1.union(qs2)
Generic approach for multiple fields
A more complex but "universal" approach may use Model._meta.get_fields()
. It will be easier to use for cases where "second" model have more that 1 extra field (not only ip,mac
). Example code (not tested, but gives general impression):
# One more import:
from django.db.models.fields import NOT_PROVIDED
common_field_name = 'ip'
# OperationalDevice fields: ip, mac, some_more_fields ...
# AllowedDevice fields: ip, type, owner
operational_device_fields = OperationalDevice._meta.get_fields()
operational_device_fields_names = {_f.name for _f in operational_device_fields} # or set((_f.name for ...))
allowed_device_fields = AllowedDevice._meta.get_fields()
allowed_device_fields_names = {_f.name for _f in allowed_device_fields} # or set((_f.name for ...))
operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef(common_field_name))
left_joined_qs = ( # "Kind-of". Assuming AllowedDevice to be "left" and OperationalDevice to be "right"
AllowedDevice.objects
.all()
.annotate(
**{
_f.name: Coalesce(
Subquery(operational_devices_subquery.values(_f.name)[1]),
Value(_f.get_default()), # Use defaults from model definition
output_field=_f,
)
for _f in operational_device_fields
if _f.name not in allowed_device_fields_names
# NOTE: if fields other than `ip` "overlap", then you might consider
# changing logic here. Current implementation keeps fields from the
# AllowedDevice
}
# Unpacked dict is partially equivalent to this:
# mac=Coalesce(
# Subquery(operational_devices_subquery.values('mac')[:1]),
# default_for_mac_eg_fallback_text_value,
# output_field=null_char_field,
# ),
# other_field = Coalesce(...),
# ...
)
)
lonely_right_rows_qs = (
OperationalDevice.objects
.exclude(
ip__in=AllowedDevice.objects.all().values(common_field_name),
)
.annotate(
**{
_f.name: Value(_f.get_default(), output_field=_f), # Use defaults from model definition
for _f in allowed_device_fields
if _f.name not in operational_device_fields_names
# NOTE: See previous NOTE
}
)
)
final_qs = left_joined_qs.union(lonely_right_rows_qs)
Using OneToOneField for "better" SQL
Theoretically you can use device_info = models.OneToOneField(OperationalDevice, db_column='ip', primary_key=True, related_name='status_info')
: in AllowedDevice
. In this case your first QS may be defined without use of Subquery
:
from django.db.models import F
# Now 'ip' is not in field names ('device_info' is there), so add it:
allowed_device_fields_names.add(common_field_name)
# NOTE: I think this approach will result in a more compact SQL query without
# multiple `(SELECT "some_field" FROM device_info_table ... ) as "some-field"`.
# This also might result in better query performance.
honest_join_qs = (
AllowedDevice.objects
.all()
.annotate(
**{
_f.name: F(f'device_info__{_f.name}')
for _f in operational_device_fields
if _f.name not in allowed_device_fields_names
}
)
)
final_qs = honest_join_qs.union(lonely_right_rows_qs)
# or:
# final_qs = honest_join_qs.union(
# OperationalDevice.objects.filter(status_info__isnull=True).annotate(**missing_fields_annotation)
# )
# I'm not sure which approach is better performance-wise...
# Commented one will use something like:
# `SELECT ... FROM "device_info_table" LEFT OUTER JOIN "status_info_table" ON ("device_info_table"."ip" = "status_info_table"."ip") WHERE "status_info_table"."ip" IS NULL
#
# So it might be a little better than first with `union(QS.exclude(ip__in=honest_join_qs.values('ip'))`.
# Because later uses SQL like this:
# `SELECT ... FROM "device_info_table" WHERE NOT ip IN (SELECT ip FROM "status_info_table")`
#
# But it's better to measure timings of both approaches to be sure.
# @GrannyAching, can you compare them and tell in the comments which one is better ?
P.S. To automate models definition you can use manage.py inspectdb
P.P.S. Maybe multi-table inheritance with custom OneToOneField(..., parent_link=True)
may be more helpful for you than using union
.
Since ip
is primary key in both an the first table is getting updated frequently, I suggest updating the second table and converting the ip
in the second table to have ip
of the first table as a OneToOneField
.
This is how your models should look like:
class ModelA(models.Model):
ip = models.GenericIPAddressField(unique=True)
mac = models.CharField(max_length=17, null=True, blank=True)
class ModelB(models.Model):
ip = models.OneToOneField(ModelA)
type = models.CharField()
owner = models.CharField()
docs
You can also have the one to one relation using a separate column:
class ModelB(models.Model):
ip = models.GenericIPAddressField(unique=True)
type = models.CharField()
owner = models.CharField()
modelA = models.OneToOneField(ModelA)
So now you can have the ip address as the primary key, and you can still refer to the table ModelA
using the field modelA
.
Once you have a value from one of both tables just do a query into the other one, looking for id. Since these two tables are separated you must do an extra query. You don't need to create an explicit relation, since you are looking into its "id/ip". So once you have a first value, named 'first_object', just look for its relative into the other table.
other_columns = ModelB.objects.get(id=first_object.id)
Then if you want just 'add' the desired columns to the other model and sent a single object to whatever you want:
first_object.attr1 = other_columns.attr1
...