Dec 28, 2014

MERGE statement data synchronization sql server best practices

A very common business requirement we generally come across where we need to synchronization of two tables which includes:

1. Inserting of new data if it is not present in target table while it is in the source table
2. Updating the existing data if it already present in the source table
3. Deletion of data if present in the target table while it is not present in the source table.
 
Sometimes we need to implement all above three requirement of sometimes part of this. There are many approaches to accomplish this task:

Approach one:

a. Delete all the records from target table.
b. Insert the all the records from the source table to target table

Script:

INSERT INTO tblTarget
SELECT * FROM tblSource

Total logical read in first execution: 141996
Total logical read in second execution: 141996
Total logical read in third execution: 141996

If this script is executing by a scheduler in the interval of 30 seconds then in each execution total logical read would be around 141996.

Approach two:
a. Check the existence of records.
b. INSERT\DELETE|UPDATE operation should be performed according the existence.

Script:

UPDATE tblTarget
SET vcData1 = S.vcData1
     ,vcData2 = S.vcData2
     ,vcData3 = S.vcData3
     ,vcData4 = S.vcData4
FROM tblSource S
WHERE S.ntSourceKeyID = tblTarget.ntTargetKeyID

INSERT INTO tblTarget
SELECT *
FROM tblSource S
WHERE NOT EXISTS (
          SELECT *
          FROM tblTarget
          WHERE S.ntSourceKeyID = ntTargetKeyID
          )

DELETE
FROM tblTarget
WHERE NOT EXISTS (
          SELECT *
          FROM tblSource
          WHERE ntSourceKeyID = tblTarget.ntTargetKeyID
          )


Total logical read in first execution: 136159
Total logical read in second execution: 6420
Total logical read in third execution: 6420

This approach is much better than approach one. There is very less number of logical read from second time onward.

Approach three:
a. Using merge statement

Script:


MERGE tblTarget T
USING tblSource S
     ON T.ntTargetKeyID = S.ntSourceKeyID
WHEN MATCHED
     THEN
          UPDATE
          SET vcData1 = S.vcData1
              ,vcData2 = S.vcData2
              ,vcData3 = S.vcData3
              ,vcData4 = S.vcData4
WHEN NOT MATCHED BY TARGET
     THEN
          INSERT (
              ntTargetKeyID
              ,vcData1
              ,vcData2
              ,vcData3
              ,vcData4
              )
          VALUES (
              S.ntSourceKeyID
              ,S.vcData1
              ,S.vcData2
              ,S.vcData3
              ,S.vcData4
              )
WHEN NOT MATCHED BY SOURCE
     THEN
          DELETE;


Total logical read in first execution: 134019
Total logical read in second execution: 2140
Total logical read in third execution: 2140

So, approach three is much better than approach two from second time onward. MERGE statement is very efficient way to accomplish multiple DML operation at the same time and will get significant performance difference to perform some cases and should be used at appropriate time.

No comments:

Post a Comment