Values larger than 1/3 of a buffer page cannot be indexed
You have a UNIQUE index on `(content, ref_id)` called `editor_contentmodel_content_2192f49c_uniq`:
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)
I'm not sure why this is there to begin with, so let's step back and address what it does. It makes sure that the combination of `content` and `ref_id` is unique. However, in PostgreSQL the UNIQUE constraint is implemented with a btree, which makes this a poor solution: you're creating a btree that contains `content`, which essentially duplicates this small table and makes for a gigantic index. A gigantic index whose entries are still limited in size, though -- as you've found. It raises a few questions:
Do you care that `content` is unique? If you do care that `content` is unique for a given `ref_id`, then what you probably want is to store a hash of the content instead. Something like:
CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id, md5(content));
This stores the md5 sum of `content` in the btree instead. As long as each `content` for a given `ref_id` has a unique md5, you're good.
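As an illustration (a sketch, using the example table and index above), a duplicate insert is rejected while distinct content for the same `ref_id` is still allowed:

-- Sketch: behaviour of the (ref_id, md5(content)) unique index above.
INSERT INTO foo VALUES (1, 'This content');   -- ok
INSERT INTO foo VALUES (1, 'This content');   -- fails: duplicate (ref_id, md5(content))
INSERT INTO foo VALUES (1, 'Other content');  -- ok: different md5 for the same ref_id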
If you don't care that `content` is unique, consider removing the constraint entirely.
It may be worth noting that when you implement a UNIQUE constraint with a btree (as PostgreSQL does), you get an added index for free. Under normal circumstances this has a fringe benefit:
CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,content);
will speed up the query:
SELECT *
FROM foo
WHERE ref_id = 5
AND content = 'This content'
However, when you change it to use the functional md5() variant, there is no longer an index on `content` itself, so to use that index you'll have to either
- only query on `ref_id`, or
- add to the `ref_id` clause a condition of the form `md5(content) = md5('This content')`.
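For example, the earlier query rewritten so it can use the (ref_id, md5(content)) expression index:

-- Rewritten query: compare md5 values so the expression index above can be used.
SELECT *
FROM foo
WHERE ref_id = 5
  AND md5(content) = md5('This content');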
The whole `text = text` comparison is overrated anyway; it's almost never what you want. If you're looking to speed up query time over text, a btree is pretty useless. You likely want to look into one of the following (a sketch follows this list):
- pg_trgm
- text_pattern_ops
- Full text search (FTS)
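For instance, a trigram index is often the right tool for LIKE/ILIKE and similarity searches (a minimal sketch, assuming the pg_trgm extension is available and reusing the example table foo from above):

-- Sketch: GIN trigram index to accelerate LIKE/ILIKE and similarity queries on content.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON foo USING gin (content gin_trgm_ops);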
UPDATE 1
Based on your JSON, I would suggest storing it as jsonb and then creating the index on `md5(content)`; so perhaps, rather than the above, run this:
ALTER TABLE public.editor_contentmodel
ALTER COLUMN content
SET DATA TYPE jsonb
USING content::jsonb;
CREATE UNIQUE INDEX ON public.editor_contentmodel (ref_id, md5(content::text));
UPDATE 2
You ask which indexes you should remove.
gollahalli_me_django=> create unique index on editor_contentmodel (ref_id, md5(content::text));
CREATE INDEX
gollahalli_me_django=> \d editor_contentmodel;
Table "public.editor_contentmodel"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 ref_id    | character varying(120)   | not null
 content   | jsonb                    | not null
 timestamp | timestamp with time zone | not null
Indexes:
"editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id) <---- 1
"editor_contentmodel_ref_id_md5_idx" UNIQUE, btree (ref_id, md5(content::text))
"editor_contentmodel_ref_id_8f74b4f3_like" btree (ref_id varchar_pattern_ops) <----2
Here is the surprising answer: you should remove all of them except `editor_contentmodel_pkey`, which says that all `ref_id`s need to be unique.

- `editor_contentmodel_content_2192f49c_uniq`: this index makes sure that you're UNIQUE on `ref_id` AND `content`, but if you can't have a duplicate `ref_id`, you can never have a duplicate `content` for that `ref_id`. So you can never violate this index without also violating `editor_contentmodel_pkey`. That makes it pointless.
- `editor_contentmodel_ref_id_md5_idx`: this index is pointless for the same reason. You can never have a duplicate `md5(content::text)` over `ref_id`, because regardless of what the value of `md5(content::text)` is, you can never have a duplicate `ref_id`.
- `editor_contentmodel_ref_id_8f74b4f3_like`: this one is also a bad idea, because you're duplicating the index over `ref_id`. It isn't useless, it's just not optimal. Instead, if you need `varchar_pattern_ops`, use it over just the `content` field.
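Dropping the redundant ones might look like this (a sketch, using the constraint and index names from the \d output above; your exact names may differ):

-- Sketch: drop the redundant unique constraint and indexes, keeping only the primary key.
ALTER TABLE public.editor_contentmodel
    DROP CONSTRAINT editor_contentmodel_content_2192f49c_uniq;
DROP INDEX editor_contentmodel_ref_id_md5_idx;
DROP INDEX editor_contentmodel_ref_id_8f74b4f3_like;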
As a last note, we don't much use varchar in PostgreSQL, because it's implemented as a varlena with a check constraint: there is no gain to it, and nothing is lost when you simply use text. So unless there is a concrete reason why `ref_id` may be 120 characters but must never be 121, I would simply use the text type.
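If you decide to make that change, it could look like this (a sketch, assuming the varchar_pattern_ops index has already been dropped as suggested above and that nothing else relies on the 120-character limit):

-- Sketch: switch ref_id from varchar(120) to text.
ALTER TABLE public.editor_contentmodel
    ALTER COLUMN ref_id
    SET DATA TYPE text;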
UPDATE 3
Let's go back to your earlier problem:
psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"
This is telling you that the problem is specifically with the index `editor_contentmodel_content_2192f49c_uniq`. You've defined that as:
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)
So the problem here is that you're trying to create an index over `content`. But, again, the index itself stores the actual JSON content of `content`, and that's what exceeds the limit. This isn't actually an issue, because even if that limit weren't in place, `editor_contentmodel_content_2192f49c_uniq` would be totally useless. Why? Again, you can't add more uniqueness to a row that is already guaranteed to be 100% unique. You don't seem to be getting this, so let's keep it simple.
ref_id | content
     1 | 1
     1 | 1
     1 | 2
     2 | 1
In the table above, a lone unique index/constraint (with no other indexes) over (ref_id, content) makes sense, because it would stop the duplication of (1,1). An index over (ref_id, md5(content)) would also make sense, because it would stop the duplication of (1,1) by proxy of stopping the duplication of (1, md5(1)). However, all of this works only because, in the example I've given, `ref_id` is NOT guaranteed to be UNIQUE. Your `ref_id` is not this `ref_id`. Your `ref_id` is a PRIMARY KEY, which means it is guaranteed to be UNIQUE. That means the duplicate (1,1) and the row (1,2) could NEVER be inserted. It also means that indexes over anything in addition to `ref_id` cannot guarantee any more uniqueness; they would have to be less strict than the index you currently have. So your table could only ever look like this:
ref_id | content
     1 | 1
     2 | 1
"editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id) "editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)
Since `ref_id` is the primary key, you can't have duplicate values of it. That means the unique constraint on the combination (content, ref_id) is useless: anything that violates it would also violate the primary key constraint. Just get rid of it.
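To see it concretely (a sketch against your table; the literal values here are made up for illustration):

-- Sketch: the primary key alone already rejects a second row with the same ref_id,
-- so the (content, ref_id) unique constraint never gets a chance to matter.
INSERT INTO editor_contentmodel (ref_id, content, "timestamp")
VALUES ('1', '"some content"'::jsonb, now());   -- ok
INSERT INTO editor_contentmodel (ref_id, content, "timestamp")
VALUES ('1', '"other content"'::jsonb, now());  -- fails: duplicate key value violates "editor_contentmodel_pkey"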