Expanding a dataset based on a weight field
An efficient way to perform this task uses an auxiliary table of numbers: simply a table holding the integers from 1 to n, where n is perhaps a million or so. Joining each sample row to the numbers from 1 up to its weight produces exactly that many copies of the row. A numbers table comes in handy for all sorts of everyday tasks.
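The query below assumes a table dbo.Numbers with an integer column n already exists. If you do not have one, a minimal sketch of one common way to build it follows; the TOP value and the sys.all_columns cross join are just illustrative choices, so adjust them to the size you need.

-- Sketch: build a one-million-row numbers table (dbo.Numbers, column n, matching the query below).
SELECT TOP (1000000)
    n = CONVERT(integer, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
INTO dbo.Numbers
FROM sys.all_columns AS AC1
CROSS JOIN sys.all_columns AS AC2;

-- Make the column NOT NULL so it can be the primary key.
ALTER TABLE dbo.Numbers
    ALTER COLUMN n integer NOT NULL;

ALTER TABLE dbo.Numbers
    ADD CONSTRAINT [PK dbo.Numbers n]
    PRIMARY KEY (n);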
CREATE TABLE dbo.Demo
(
    SampleNumber integer IDENTITY NOT NULL,
    SampleWeight integer NOT NULL,
    Attribute1 integer NOT NULL,
    Attribute2 integer NOT NULL,
    Attribute3 integer NOT NULL,

    CONSTRAINT [PK dbo.Demo SampleNumber]
        PRIMARY KEY (SampleNumber),

    CONSTRAINT [CK dbo.Demo SampleWeight 1-50]
        CHECK (SampleWeight BETWEEN 1 AND 50)
);

INSERT INTO dbo.Demo
(
    SampleWeight,
    Attribute1,
    Attribute2,
    Attribute3
)
VALUES
    (2, 23, 32, 65),
    (1, 32, 56, 75),
    (3, 54, 25, 87);

SELECT
    D.SampleNumber,
    D.Attribute1,
    D.Attribute2,
    D.Attribute3
FROM dbo.Demo AS D
JOIN dbo.Numbers AS N
    ON N.n BETWEEN 1 AND D.SampleWeight;

DROP TABLE dbo.Demo;
Output:
╔══════════════╦════════════╦════════════╦════════════╗
║ SampleNumber ║ Attribute1 ║ Attribute2 ║ Attribute3 ║
╠══════════════╬════════════╬════════════╬════════════╣
║ 1 ║ 23 ║ 32 ║ 65 ║
║ 1 ║ 23 ║ 32 ║ 65 ║
║ 2 ║ 32 ║ 56 ║ 75 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
╚══════════════╩════════════╩════════════╩════════════╝
You can use a recursive CTE to do it.
Here is a SQL Fiddle for it: http://sqlfiddle.com/#!3/0b172/8
create table t (sampn int, weight int, attrib1 int, attrib2 int, attrib3 int);

insert into t values (1, 2, 23, 32, 65);
insert into t values (2, 1, 32, 56, 75);
insert into t values (3, 3, 54, 25, 87);
-- solution using a recursive CTE
;WITH cte (
    sampn
    ,[weight]
    ,attrib1
    ,attrib2
    ,attrib3
    ,RepeatedTime
    )
AS (
    -- anchor: one row per sample; skips rows whose weight is zero
    SELECT sampn
        ,[weight]
        ,attrib1
        ,attrib2
        ,attrib3
        ,1
    FROM t
    WHERE [weight] <> 0
    UNION ALL
    -- recursive part: adds another copy of the row until the weight is reached
    SELECT sampn
        ,[weight]
        ,attrib1
        ,attrib2
        ,attrib3
        ,RepeatedTime + 1
    FROM cte a
    WHERE a.[weight] > a.RepeatedTime
    )
SELECT sampn
    ,attrib1
    ,attrib2
    ,attrib3
FROM cte
ORDER BY sampn
OPTION (MAXRECURSION 100); -- default limit is 100; raise it (or use 0 for no limit) if any weight can exceed 100
I agree with Paul that the numbers table is probably the best solution you are going to come up with in terms of manageability and performance. With that said, you could approach this problem from a different angle, using XML.
This solution is fairly easy to understand. Performance on a small number of rows should be fine, but you will have to test it against a large number of rows.
Update: I have added the exact code that I am using, which is working in my test environment.
USE tempdb;
GO

IF OBJECT_ID('tempdb.dbo.Demo') IS NOT NULL
BEGIN
    DROP TABLE dbo.Demo;
END
GO

CREATE TABLE dbo.Demo
(
    SampleNumber integer IDENTITY NOT NULL,
    SampleWeight integer NOT NULL,
    Attribute1 integer NOT NULL,
    Attribute2 integer NOT NULL,
    Attribute3 integer NOT NULL,

    CONSTRAINT [PK dbo.Demo SampleNumber]
        PRIMARY KEY (SampleNumber),

    CONSTRAINT [CK dbo.Demo SampleWeight 1-50]
        CHECK (SampleWeight BETWEEN 1 AND 50)
);

INSERT INTO dbo.Demo
(
    SampleWeight,
    Attribute1,
    Attribute2,
    Attribute3
)
VALUES
    (2, 23, 32, 65),
    (1, 32, 56, 75),
    (3, 54, 25, 87);
GO
SELECT
    y.SampleNumber,
    Attribute1 = Att.vals.value('@Attribute1', 'int'),
    Attribute2 = Att.vals.value('@Attribute2', 'int'),
    Attribute3 = Att.vals.value('@Attribute3', 'int')
FROM
(
    SELECT
        SampleNumber,
        -- repeat the XML fragment SampleWeight times
        Attribute = (SELECT CAST(REPLICATE(CAST(Attribute.Data AS VARCHAR(MAX)), SampleWeight) AS XML))
    FROM dbo.Demo
    CROSS APPLY
    (
        SELECT
            [@Attribute1] = Attribute1,
            [@Attribute2] = Attribute2,
            [@Attribute3] = Attribute3
        FOR XML PATH('data'), ROOT('root'), TYPE
    ) AS Attribute (Data)
) AS y
CROSS APPLY y.Attribute.nodes('/root/data') AS Att(vals);
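The trick here is that REPLICATE repeats the <root><data .../></root> fragment SampleWeight times inside the XML value, and nodes('/root/data') then shreds each repeated element back out into its own row, so a sample with weight 3 comes back three times, matching the numbers-table output above.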