How to copy/migrate data to new tables with an identity column while preserving FK relationships?

Here's a way that scales easily to three related tables.

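Use MERGE to insert the data into the copy tables. Unlike the OUTPUT clause of a plain INSERT, the OUTPUT clause of MERGE can reference columns from the source as well as INSERTED, so you can write each row's old and new IDENTITY values into a linkage (control) table and use that table to remap the foreign keys of the related tables.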

The actual answer is just two CREATE TABLE statements and three MERGE statements. The rest is sample data setup and teardown.

USE tempdb;

--## Create test tables ##--

CREATE TABLE Customers(
    [Id] INT NOT NULL PRIMARY KEY IDENTITY,
    [Name] NVARCHAR(200) NOT NULL
);

CREATE TABLE Orders(
    [Id] INT NOT NULL PRIMARY KEY IDENTITY,
    [CustomerId] INT NOT NULL,
    [OrderDate] DATE NOT NULL,
    CONSTRAINT [FK_Customers_Orders] FOREIGN KEY ([CustomerId]) REFERENCES [Customers]([Id])
);

CREATE TABLE OrderItems(
    [Id] INT NOT NULL PRIMARY KEY IDENTITY,
    [OrderId] INT NOT NULL,
    [ItemId] INT NOT NULL,
    CONSTRAINT [FK_Orders_OrderItems] FOREIGN KEY ([OrderId]) REFERENCES [Orders]([Id])
);

CREATE TABLE Customers2(
    [Id] INT NOT NULL PRIMARY KEY IDENTITY,
    [Name] NVARCHAR(200) NOT NULL
);

CREATE TABLE Orders2(
    [Id] INT NOT NULL PRIMARY KEY IDENTITY,
    [CustomerId] INT NOT NULL,
    [OrderDate] DATE NOT NULL,
    CONSTRAINT [FK_Customers2_Orders2] FOREIGN KEY ([CustomerId]) REFERENCES [Customers2]([Id])
);

CREATE TABLE OrderItems2(
    [Id] INT NOT NULL PRIMARY KEY IDENTITY,
    [OrderId] INT NOT NULL,
    [ItemId] INT NOT NULL,
    CONSTRAINT [FK_Orders2_OrderItems2] FOREIGN KEY ([OrderId]) REFERENCES [Orders2]([Id])
);

--== Populate some dummy data ==--

INSERT Customers(Name)
VALUES('Aaberg'),('Aalst'),('Aara'),('Aaren'),('Aarika'),('Aaron'),('Aaronson'),('Ab'),('Aba'),('Abad');

INSERT Orders(CustomerId, OrderDate)
SELECT Id, Id+GETDATE()
FROM Customers;

INSERT OrderItems(OrderId, ItemId)
SELECT Id, Id*1000
FROM Orders;

INSERT Customers2(Name)
VALUES('Zysk'),('Zwiebel'),('Zwick'),('Zweig'),('Zwart'),('Zuzana'),('Zusman'),('Zurn'),('Zurkow'),('Zurheide');

INSERT Orders2(CustomerId, OrderDate)
SELECT Id, Id+GETDATE()+20
FROM Customers2;

INSERT OrderItems2(OrderId, ItemId)
SELECT Id, Id*1000+10000
FROM Orders2;

SELECT * FROM Customers JOIN Orders ON Orders.CustomerId = Customers.Id JOIN OrderItems ON OrderItems.OrderId = Orders.Id;

SELECT * FROM Customers2 JOIN Orders2 ON Orders2.CustomerId = Customers2.Id JOIN OrderItems2 ON OrderItems2.OrderId = Orders2.Id;

--== ** START ACTUAL ANSWER ** ==--

--== Create Linkage tables ==--

CREATE TABLE CustomerLinkage(old INT NOT NULL PRIMARY KEY, new INT NOT NULL);
CREATE TABLE OrderLinkage(old INT NOT NULL PRIMARY KEY, new INT NOT NULL);

--== Copy Header (Customers) rows and record the new key ==--

MERGE Customers2
USING Customers
ON 1=0 -- we just want an insert, so this forces every row as unmatched
WHEN NOT MATCHED THEN
INSERT (Name) VALUES(Customers.Name)
OUTPUT Customers.Id, INSERTED.Id INTO CustomerLinkage;

--== Copy Detail (Orders) rows using the new key from CustomerLinkage and record the new Order key ==--

MERGE Orders2
USING (SELECT Orders.Id, CustomerLinkage.new, Orders.OrderDate
FROM Orders 
JOIN CustomerLinkage
ON CustomerLinkage.old = Orders.CustomerId) AS Orders
ON 1=0 -- we just want an insert, so this forces every row as unmatched
WHEN NOT MATCHED THEN
INSERT (CustomerId, OrderDate) VALUES(Orders.new, Orders.OrderDate)
OUTPUT Orders.Id, INSERTED.Id INTO OrderLinkage;

--== Copy Detail (OrderItems) rows using the new key from OrderLinkage ==--

MERGE OrderItems2
USING (SELECT OrderItems.Id, OrderLinkage.new, OrderItems.ItemId
FROM OrderItems 
JOIN OrderLinkage
ON OrderLinkage.old = OrderItems.OrderId) AS OrderItems
ON 1=0 -- we just want an insert, so this forces every row as unmatched
WHEN NOT MATCHED THEN
INSERT (OrderId, ItemId) VALUES(OrderItems.new, OrderItems.ItemId);

--== ** END ACTUAL ANSWER ** ==--

--== Display the results ==--

SELECT * FROM Customers2 JOIN Orders2 ON Orders2.CustomerId = Customers2.Id JOIN OrderItems2 ON OrderItems2.OrderId = Orders2.Id;

--== Drop test tables ==--

DROP TABLE OrderItems;
DROP TABLE OrderItems2;
DROP TABLE Orders;
DROP TABLE Orders2;
DROP TABLE Customers;
DROP TABLE Customers2;
DROP TABLE CustomerLinkage;
DROP TABLE OrderLinkage;

When I've done this in the past, I did it something like this:

1. Back up both databases.
2. Copy the rows you want to move from the first DB to the second into a new table, without an IDENTITY column.
3. Copy all child rows of those rows into new tables without foreign keys to the parent table.

Note: We'll refer to the above set of tables as "temporary"; however, I highly recommend you store them in their own database, and back that up as well when you're done.

4. Determine how many ID values you need from the second database for rows from the first database.
5. Use DBCC CHECKIDENT to shift the next IDENTITY value for the target table to 1 beyond what you need for the move. This will leave an open block of X IDENTITY values that you can assign to the rows being brought over from the first database.
6. Set up a mapping table, identifying the old IDENTITY value for the rows from the first DB, and the new value that they'll use in the second DB.
   Example: You're moving 473 rows that will need a new IDENTITY value from the first database to the second. The next identity value for that table in the second database is 1128 right now (DBCC CHECKIDENT with NORESEED reports the current value, 1127). Use DBCC CHECKIDENT to reseed so that the next value issued is 1601; RESEED sets the current identity value, so on a table that already contains rows you pass 1600. You will then populate your mapping table with the current values for the IDENTITY column from your parent table as old values, and use the ROW_NUMBER() function to assign the numbers 1128 through 1600 as the new values.
7. Using the mapping table, update the values in what's usually the IDENTITY column in the temporary parent table.
8. Using the mapping table, update the values that are usually foreign keys to the parent table, in all the copies of the child tables.
9. Using SET IDENTITY_INSERT <parent> ON, insert the updated parent rows from the temporary parent table into the second DB.
10. Insert the updated child rows from the temporary child tables into the second DB.
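
The steps above can be sketched in T-SQL roughly as follows. This is a minimal illustration, not the original scripts: the table names (SourceParent, SourceChild, TargetParent, TargetChild, ParentMap) and the sample rows are hypothetical, both "databases" are collapsed into tempdb, and the new keys are applied by joining to the mapping table at insert time instead of updating the temporary tables first. It also assumes nothing else inserts into the target table while the block is being reserved.

```sql
USE tempdb;

-- Hypothetical stand-ins: Source* plays the "temporary" copies taken from the
-- first database (no IDENTITY, no FKs); Target* plays the second database.
CREATE TABLE SourceParent(Id INT NOT NULL PRIMARY KEY, Name NVARCHAR(200) NOT NULL);
CREATE TABLE SourceChild(ParentId INT NOT NULL, Note NVARCHAR(50) NOT NULL);
CREATE TABLE TargetParent(Id INT NOT NULL PRIMARY KEY IDENTITY, Name NVARCHAR(200) NOT NULL);
CREATE TABLE TargetChild(
    Id INT NOT NULL PRIMARY KEY IDENTITY,
    ParentId INT NOT NULL REFERENCES TargetParent(Id),
    Note NVARCHAR(50) NOT NULL
);
CREATE TABLE ParentMap(old INT NOT NULL PRIMARY KEY, new INT NOT NULL UNIQUE);

INSERT SourceParent(Id, Name) VALUES (7, 'Aaberg'), (12, 'Aalst'), (40, 'Aara');
INSERT SourceChild(ParentId, Note) VALUES (7, 'a'), (12, 'b'), (12, 'c'), (40, 'd');
INSERT TargetParent(Name) VALUES ('Zysk'), ('Zwiebel'); -- rows that already exist in the target

-- How many values are needed, and the last identity value already issued.
DECLARE @rows INT = (SELECT COUNT(*) FROM SourceParent);
DECLARE @last INT = CONVERT(INT, IDENT_CURRENT('TargetParent'));

-- Reserve the block @last+1 .. @last+@rows by moving the current identity
-- value to the end of it. Assumption: no concurrent inserts happen here.
DECLARE @reseed INT = @last + @rows;
DBCC CHECKIDENT ('TargetParent', RESEED, @reseed);

-- Map each old key to one value from the reserved block.
INSERT ParentMap(old, new)
SELECT Id, @last + ROW_NUMBER() OVER (ORDER BY Id)
FROM SourceParent;

-- Insert parents with their pre-assigned keys.
SET IDENTITY_INSERT TargetParent ON;
INSERT TargetParent(Id, Name)
SELECT m.new, s.Name
FROM SourceParent AS s
JOIN ParentMap AS m ON m.old = s.Id;
SET IDENTITY_INSERT TargetParent OFF;

-- Insert children, translating the foreign key through the mapping table.
INSERT TargetChild(ParentId, Note)
SELECT m.new, c.Note
FROM SourceChild AS c
JOIN ParentMap AS m ON m.old = c.ParentId;

SELECT * FROM ParentMap ORDER BY old;
SELECT * FROM TargetParent JOIN TargetChild ON TargetChild.ParentId = TargetParent.Id;

DROP TABLE TargetChild;
DROP TABLE TargetParent;
DROP TABLE SourceChild;
DROP TABLE SourceParent;
DROP TABLE ParentMap;
```

In a real migration the reservation and the inserts would normally run in a maintenance window or under a table lock, since reading the current identity value and reseeding it are two separate steps.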

NOTE: If some of the child tables have IDENTITY values of their own, this gets quite complicated. My actual scripts (partially developed by a vendor, so I can't really share them) deal with dozens of tables and primary key columns, including some that weren't auto-increment numeric values. However, these are the basic steps.

I retained the mapping tables, post migration, which had the benefit of allowing us to find a "new" record based on an old ID.

It's not for the faint of heart, and must, must, must be tested (ideally multiple times) in a test environment.

UPDATE: I should also say that, even with this, I didn't worry overmuch about "wasting" ID values. I actually set up my ID blocks in the second database to be 2-3 values larger than what I needed, to try to ensure I wouldn't accidentally collide with existing values.

I certainly understand not wanting to skip hundreds of thousands of potential valid IDs during this process, especially if the process will be repeated (mine was ultimately run a grand total of around 20 times over the course of 30 months). That said, in general, one cannot rely on auto-increment ID values to be sequential without gaps. When a row is created and rolled back, the auto-increment value for that row goes away; the next row added will have the next value, and the one from the rolled back row will be skipped.
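
The gap behaviour described above is easy to reproduce. A minimal sketch, using a hypothetical GapDemo table in tempdb:

```sql
USE tempdb;

-- Hypothetical demo table, for illustration only.
CREATE TABLE GapDemo(
    Id INT NOT NULL PRIMARY KEY IDENTITY,
    Name NVARCHAR(50) NOT NULL
);

INSERT GapDemo(Name) VALUES ('first');          -- receives Id 1

BEGIN TRANSACTION;
INSERT GapDemo(Name) VALUES ('rolled back');    -- consumes Id 2
ROLLBACK TRANSACTION;                           -- the row is undone, the identity value is not

INSERT GapDemo(Name) VALUES ('second');         -- receives Id 3

SELECT Id, Name FROM GapDemo ORDER BY Id;       -- rows with Id 1 and Id 3; 2 is skipped

DROP TABLE GapDemo;
```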

