Designing a SQL schema for a combination of many-to-many relationship (variations of products)
Applying normalization to your problem the solution is as given. Run and see it on Fiddle
Fiddle
CREATE TABLE products
(
product_id int auto_increment primary key,
name varchar(20),
description varchar(30)
);
INSERT INTO products
(name, description)
VALUES
('Rug', 'A cool rug' ),
('Cup', 'A coffee cup');
create table variants (variant_id int auto_increment primary key,
variant varchar(50)
);
insert into variants (variant)
values ('color'),('material'),('size') ;
create table variant_value(value_id int auto_increment primary key,
variant_id int ,
value varchar(50)
);
insert into variant_value (variant_id,value)
values (1 ,'red'),(1 ,'blue'),(1 ,'green'),
(2 ,'wool'),(2 ,'polyester'),
(3 ,'small'),(3 ,'medium'),(3 ,'large');
create table product_Variants( product_Variants_id int auto_increment primary key,
product_id int,
productVariantName varchar(50),
sku varchar(50),
price float
);
create table product_details(product_detail_id int auto_increment primary key,
product_Variants_id int,
value_id int
);
insert into product_Variants(product_id,productVariantName,sku,price)
values (1,'red-wool' ,'a121',50);
insert into product_details(product_Variants_id , value_id)
values( 1,1),(1,4);
insert into product_Variants(product_id,productVariantName,sku,price)
values (1,'red-polyester' ,'a122',50);
insert into product_details(product_Variants_id , value_id)
values( 2,1),(2,5);
Part of your issues stem from a confusion between product and SKU.
When you sell, "XYZ pullover, size M, blue model", the latter corresponds to an SKU. It is marketed as an XYZ pullover (the product), which has a set of attributes (size and colors), each with their own set of potential values. And not all possible combinations of the latter might yield valid deliverables: you won't find absurdly thin and long jeans. SKUs, products, attributes, attribute values.
And when a user wants a $10 blue pullover, he's actually looking for an SKU within a product category.
I hope the above clears up your confusion and where your problem and question stem from.
In terms of schema, you want something like this:
products
- #product_id
- name
- description
Optionally, also add:
- price
- in_stock
This is a marketing related table. Nothing else. If anything outside of marketing uses a product in your application, you'll end up in a world of pain down the road.
The price, if present, is a master price used to populate the field when it's null in SKUs. This makes price entry more user-friendly.
in_stock is a hopefully self-explanationary flag, ideally maintained by a trigger. It should be true if any SKU related to that product is in stock.
product_attributes
- product_id
- #attribute_id
- name
product_attribute_values
- attribute_id
- #value_id
- value
This just holds things like Color, Size, etc., along with their values like blue, red, S, M, L.
Note the product_id field: create a new set of attributes and values per product. Sizes change depending on the product. Sometimes it's S, M, L, etc.; other times, it'll be 38, 40, 42, and what not. Sometimes, Size is enough; other times, you need Width and Length. Blue might be a valid color for this product; another might offer Navy, Royal Blue, Teal and what not. Do NOT assume that there is any relationship between one product's attributes and those of another; the similarities, when they exist, are entirely cosmetic and coincidental.
SKUs
- product_id
- #sku_id
- price
Optionally, add:
- name
- barcode
- stock
This corresponds to the deliverables that get shipped.
It's actually the most important table underneath. This, rather than the product_id, is almost certainly what should get referenced in customer orders. It's also what should get referenced to for stock-keeping and so forth. (The only exception I've ever seen to the latter two points is when you sell something really generic. But even then, the better way to deal with this in my experience is to toss in an n-m relationship between interchangeable SKUs.)
The name field, if you add it, is primarily for convenience. If left null, use app-side code to make it correspond to the generic product's name, expanded if necessary with the relevant attribute names and values. Filling it allows to rephrase the latter generic name ("Levis' 501, W: 32, L: 32, Color: Dark Blue") with something more natural ("Levis' 501, 32x32, Dark Blue").
In case it matters, stock is better maintained using a trigger in the long run, with a double-entry bookkeeping schema in the background. This allows to distinguish between in stock and available for shipment today (which is the figure that you actually want here) vs in stock but already sold, among the multitudes of real-world scenarios that you'll encounter. Oh, and... it's occasionally a numeric, rather than an integer, if you ever need to sell anything measured in kilos or liters. If so, be sure to add an extra is_int flag, to avoid customers sending you orders for .1 laptops.
product_variants
- product_id
- #sku_id
- #attribute_id
- value_id
This links the deliverable's id with the corresponding attributes and values, for the sake of generating default names.
The primary key is on (sku_id, attribute_id).
You might find the product_id field an aberrance. It is, unless you add foreign keys referencing:
- SKUs (product_id, sku_id)
- product_attributes (product_id, attribute_id)
- product_attribute_values (attribute_id, value_id)
(Don't forget the extra unique indexes on the corresponding tuples if you decide to add these foreign keys.)
Three additional remarks in conclusion.
Firstly, I'd like to stress once again that, in terms of flow, not all combinations of attributes and values yield a valid deliverable. Width might be 28-42 and length might be 28-42, but you probably won't see a seriously skinny 28x42 jeans. You're best off NOT automatically populating every possible variation of every product by default: add UI to enable/disable them as needed, make it checked by default, alongside name, barcode and price fields. (Name and price will usually be left blank; but one day, you'll need to organize a sale on blue pullovers only, on grounds that the color is discontinued, while you continue to sell the other options.)
Secondly, keep in mind, if you ever need to additionally manage product options, that many actually are product attributes in disguise, and that those that aren't yield new SKUs that must also be taken into account when it comes to stock-keeping. A bigger HD option for a laptop, for instance, is really a variant of the same product (Normal vs Large HD size) that is masquerading as an option due to (very valid) UI considerations. In contrast, wrapping the laptop as a christmas gift is a genuine option that has references a completely separate SKU in bookkeeping terms (e.g. .8m of gift wrap) -- and, should you ever need to come up with average marginal costs, a fraction of staff time.
Lastly, you'll need to come up with an ordering method for your attributes, their values, and the subsequent variants. For this, the easiest is to toss in an extra position field in the attributes and values tables.
I would use 4 tables:
generic_product: product_id, name, description
e.g. 1, 'rug', 'a coffee rug' / 2, 'mug', 'a coffee mug'
generic_product_property: product_id, property_id, property_name
e.g. 1, 10, 'color' / 1, 11, 'material'
sellable_product: sku, product_id, price
e.g. 'A121', 1, 50.00 / 'A122', 1, 45.00
sellable_product_property: sku, property_id, property_value
e.g. 'A121', 10, 'red' / 'A121', 11, 'wool' / 'A122', 10, 'green' / 'A122', 11, 'wool'
This will allow your user to define any property for your sellable products he wants.
Your application will have to ensure with its business logic that sellable_products are described completely (check that for every applicable generic product property the sellable product property is defined).