Search This Blog

Wednesday, November 16, 2016

How to Load the Data in Complex Schema using SSIS

Case Study - How to Load the Data in Complex Schema using SSIS


Scenario

  • Flat file which contains the Employee Information, and each row needs to move or insert multiple table/Schema in Parallel.
  • Schema is basically like parent and Child Architecture.
  • Parent table Primary Key will be foreign key of Child table.

Structure of the Flat File

  1. Employee Name
  2. Physical Address
  3. Digital Email Address
  4. Mobile Number
  5. Experience
  6. Occupation
  7. Basic pay
  8. PF
  9. Allowance
  10. (etc..)  

DB Schema      

Company Master is the Parent table and its Child tables are Employee Master, Salary Master and Address Master.
Employee Master is having 2 Child tables Employee Name, Occupation and Experience.
Salary Master is having 1 Child table Employee Salary Details.
Address Master is having 2 Childs Physical Address and Digital Address.
There are 2 Association tables.
  1. Salary and Employee master Association.
Primary key of Employee Master and Salary Master will be Stored in this Association table

  1. Address and Employee master Association.
Primary key of Employee Master and Address Master will be Stored in this Association table

Relationship between the tables and the data flow  


Task

Extract the Flat file and Load the data into multiple tables in parallel.

How to Achieve?

We can achieve this in 2 ways. One is using Script Component and another one is using tricky methods.

Solution – 1




--> In SSIS - DFT (Data Flow task) we have a Script Component.


Using the Script component write all the insert Script using C# or VB.net.
Configuring the Script component:-
Drag and drop the Script component into the dataflow of DFT it will show the following options.
















 Select the Destination option. Since we are going to insert the target table.









Select the connection manager and provide the Name . 















Once you configured your Source Component all the Columns will be listed in Input Column tab in Script component.

  


















Go to Script tab and select the Script language based on your option (VB or C#)
Click Edit Script to write your coding.
Once the script window is opened you can see 3 Methods.
  1. Sub PreExecute()
This method is called once, before rows begin to be processed in the data flow.

  1. Sub PostExecute()
This method is called after all the rows have passed through this component.

  1. Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
This method is called once for every row that passes through the component from Input0.
How to Acquire the Connection?
We have another public function in the Script window. Here the ADO is the Connection manager which we specified while configuring the Script component.
Refer the Second Image.
  Public Overrides Sub AcquireConnections(ByVal Transaction As Object)

       connMgr = Me.Connections.ADO
       sqlConn = CType(connMgr.AcquireConnection(Nothing), SqlConnection)

End Sub
What to write in the 3 methods?
In the PreExecute method we need to define our Query and its Parameters. Also Refer the yellow marked query which will be provide the Auto generated ID of each row. 

In the Input0_ProcesInputRow method we need to pass the source column values to PreExecute method. Yellow marked Row represents the Source Column and Address_ID is returned the Auto generated Id which can be used for other Child tables foreign Key. 
We can use this Script component to insert all the tables when we have these kind of complex Schema’s.
Drawbacks:-
  1. Since we are using the Script component and Insert statement this will take long time to import all the records.
  2. This code will work only the ADO.Net Connection and it will not work OLEDB Connection.
  3. If we are using OLEDB Connection we need to change the .net Code accordingly.

Solution – 2

This is another method to achieve our goal which is tricky and cool method.
Once the File is loaded in to stage table. We need to generate our own RowID using the below Query
select ROW_NUMBER() OVER (ORDER BY (SELECT 1))
+(select isnull(max(EmployeeID),4000) from Employee with (nolock))  AS EmployeeID

  • Query initially select the maximum Employee id and added with the Row number of the Stage Table.
  • We need to Generate ID for all Parent ID and it be passed to Child tables Foreign Key.

Once we build our Query we need to pass as SQL Command in OLEDB Source Editor.



 


    And check the Keep identity in OLE DB Destination Editor.
 




This will load the Data into all the target tables in parallel.
Sample DFT


This will Load the data without any performance issue and it’s one of quicker method to load the data.
Drawbacks:-
  1. While this SSIS package is running we cannot insert any single record in the target table. Since the auto generated id is defined by us.

Conclusion

In One of our Project we have complex schema which is defined by Client. We approaches the 2 solutions and finally selected the second one. Since the Second solution is faster as expected.