SSIS Complex source File : Script Component to Handle Key=value records
Integration services – Complex Structured Source File
What is the complexity in the File .
- The data is available in Source file in the form of “Key = Value “ in each row of file . The complexity here is some Key names are columns in Table1 of the same row in destination and Some Key names are Row data in Table2.
- The data is Available in some sets in Source File and each Set represents to one test case . Each set is separated by 2 new rows . The Complexity here is each set needs to be load in table1 as well as in Table2 . So one set of data which presents as each row of source files needs to transfer as columns in Table1 and Rows in Table2 based on Key Name .
- The souce file name naming convention is like TcRunExport_From20120425_110700_To20120425_112200_244258.txt
The complexity here is we need to extract To datetime and store in a destination tables in Datetime column . The other complexity here is when ever the new file not available the SSIS package should fail with error file not found .
4 . Given below the sets of data in Source file and final destination loaded data in tables for certain Tctun_id ( unique representationid for a set ) . For clear understanding of requirement set the of data for Tcrun_id = 61987899 in source file and how it is loaded in destination tables ( See fig1 and Fig2)
Fig 1 :Sample Data in Source File :
Td.Name = Sms Mobisud
Tc.Name = Sms Mobisud - Mobisud
Tp.Name = Sms_Notif_Rating_1.5
TestScenario.Name = Sms Mobisud
TcRun.Id = 61987899
TcRun.StartDate = 2012/04/25 11:03:49
TcRun.EndDate = 2012/04/25 11:06:14
TcRun.Result = Passed
Error.Step =
Error.Protocol =
O.NeId = PLMN 206/01
T.NeId = PLMN 206/01
O.Port.Code = RTUPC00426-04
T.Port.Code = GSE42-0446-01
O.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Sim.MsisdnVoice = +32 xxxxxxxxxxx
T.Region.Name = ANTWERPEN
O.Msc.Name = Prox.ARLON
O.Region.Name = ARLON
ErrorCause = NoError
O.AttachSuccessful = Yes
O.Gprs.AttachTime = 2.373
O.SmsAcknowledgement = Yes
O.SmsReceivedNotif = Yes
O.SmsSendingTime = 2.824
OT.SmsDeliveryTime = 5.408
OT.SmsDeliveryToNotifTime = 3.264
OT.SmsReceivedSuccessful = Yes
OT.SmsWithinExpectedTime = Yes
Result = PASSED
T.AttachSuccessful = Yes
Td.Name = Voice_qual
Tc.Name = VoicePesq_2.0_2G
Tp.Name = VoicePesq_2.0
TestScenario.Name = Wavre_Marais_Voice_qual
TcRun.Id = 61987921
TcRun.StartDate = 2012/04/25 11:03:51
TcRun.EndDate = 2012/04/25 11:06:17
TcRun.Result = Passed
Error.Step =
Error.Protocol =
O.NeId = PORT Wavre_VQM
T.NeId = PORT Marais_VQM
O.Port.Code = RTUPC01148-01
T.Port.Code = RTUPC00978-03
O.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Region.Name = BRUXELLES
T.RingNumber = 2
O.Msc.Name = Prox.WAVRE
O.Region.Name = WAVRE
ErrorCause = NoError
O.AttachSuccessful = Yes
O.AudioCodec = AMR_FR
O.DetachSuccessful = Yes
OT.AnswerTimeStamp = 89.671
OT.CallSetupSuccessful = Yes
OT.CallSetupTime = 6.118
OT.CallSetupTimeUnderThrld = Yes
OT.DialStartTimeStamp = 76.924
OT.HangupTimeStamp = 137.924
OT.PesqScoreAvg = 3.7
OT.PesqScoreMax = 3.84
OT.PesqScoreMin = 3.54
OT.PesqUnderThrldPerc = 0
OT.PlayErrorPerc = 0
OT.QoSEvaluateErrorPerc = 0
OT.RingDuration = 6.008
Result = PASSED
T.AttachSuccessful = Yes
T.AudioCodec = AMR_FR
T.Cli.CheckSuccessful = Yes
T.DetachSuccessful = Yes
T.RingSuccessful = Yes
TO.PesqScoreAvg = 3.01
TO.PesqScoreMax = 3.35
TO.PesqScoreMin = 2.76
TO.PesqUnderThrldPerc = 0
TO.PlayErrorPerc = 0
TO.QoSEvaluateErrorPerc = 0
Td.Name = Voice_qual
Tc.Name = VoicePesq_2.0_2G
Tp.Name = VoicePesq_2.0
TestScenario.Name = Leuven_Marais_Voice_qual
TcRun.Id = 61987923
TcRun.StartDate = 2012/04/25 11:03:50
TcRun.EndDate = 2012/04/25 11:06:18
TcRun.Result = Passed
Error.Step =
Error.Protocol =
O.NeId = PORT Leuven_VQM
T.NeId = PORT Marais_VQM
O.Port.Code = RTUPC01019-02
T.Port.Code = RTUPC01149-03
O.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Region.Name = BRUXELLES
T.RingNumber = 2
O.Msc.Name = Prox.LEUVENMSC
O.Region.Name = LEUVEN
ErrorCause = NoError
O.AttachSuccessful = Yes
O.AudioCodec = AMR_FR
O.DetachSuccessful = Yes
OT.AnswerTimeStamp = 93.202
OT.CallSetupSuccessful = Yes
OT.CallSetupTime = 6.226
OT.CallSetupTimeUnderThrld = Yes
OT.DialStartTimeStamp = 80.273
OT.HangupTimeStamp = 140.179
OT.PesqScoreAvg = 3.71
OT.PesqScoreMax = 3.86
OT.PesqScoreMin = 3.53
OT.PesqUnderThrldPerc = 0
OT.PlayErrorPerc = 0
OT.QoSEvaluateErrorPerc = 0
OT.RingDuration = 6
Result = PASSED
T.AttachSuccessful = Yes
T.AudioCodec = AMR_FR
T.Cli.CheckSuccessful = Yes
T.DetachSuccessful = Yes
T.RingSuccessful = Yes
TO.PesqScoreAvg = 3.46
TO.PesqScoreMax = 3.71
TO.PesqScoreMin = 3.25
TO.PesqUnderThrldPerc = 0
TO.PlayErrorPerc = 0
TO.QoSEvaluateErrorPerc = 0
Fig2 :Destination Tables for one Tcrun_id = 61987899
Solution in Step by Step .
Step 1. Create an FTP task to pick the latest file from source FTP location and place it in SourceInput Folder .
Step1.1 Create FTP connection Manager : give the FTP server name , Port , User name and password
Step 1.2 : Add the FTP task in Control flow .
Create Input text file connection to store the FTP file .
Create a variable to give the FTP file path which changes every time . Till fulfill dynamic fTP file path create expression in variable
Expression = "/PDMS_directory/*_"+ @[User::SrcFileProgressiveNumber] +".txt" and value looks like /PDMS_directory/*_244258.txt
Step 2 : How to Extract data from Complex structured file .
Step 2.1 Create For Each loop container which locates the folder where the source file is located . get the filename and store it in a variable @Filename.
Step 2.2 To get the Todate from the file name create a variable and give the expression to get the Todata in datetime formate
Expression :
SUBSTRING( @[User::FileNameOnly] ,35,4)+"/"+SUBSTRING( @[User::FileNameOnly] ,39,2)+"/"+SUBSTRING( @[User::FileNameOnly] ,41,2)+" "+SUBSTRING( @[User::FileNameOnly] ,44,2)+":"+SUBSTRING( @[User::FileNameOnly] ,46,2)+":"+SUBSTRING( @[User::FileNameOnly] ,48,2)
And value looks like 2012/04/25 11:22:00
Step 3 : Create Data Flow task in For each loop container .
Step 3.1 this is the most critical part . to read the complex file we need to create a script task as source in data flow tab .
The data flow task looks like
Step 3.2 Open the Script transformation Editor by double clicking on script component task .
In Inputs and Outputs Tab : Create 2 Outputs one for each destination table and name it to your convenience . and add the all the columns in each table in those output buffer as shown below . and give the exact data type for each column as per the destination tables .
In script Tab , pass the variables of the source file full path , To End date variable .
And click on edit script tab to write the code .
Here is the code . Declare the variables required in script main block .
In Preexcute () method create streamreader object by passing the source file path .
In PostExcute() method close the Sreamreader object .
Brief explination of the code .
Step 1 Create while loop to read the file till Endofstream ie last line of the file .
Step 2 Read each line of source file split the string with “=” delimiter to separate Key and Value details of each line .
Step 3 . compare the Key name of each line in source file with Test table columns if it matches store the value part in the corresponding Output buffer column .
Step 4 if the Key name is not matched with Test table columns then we need to store that Key name in KPI_name column in Results table and its value part in KPI_value column in Results table .
Step 5 : How to identify the each set . each set ends with 2 new lines so check for every blank line and count it after getting 2 blank line that means one set is completed
So add new buffer row by Output0Buffer.AddRow()
Step 6 : For Results table each row in source file is a row in destination table for that you need to add results buffer row for each source file like OutputResultsBuffer.AddRow()
' Microsoft SQL Server Integration Services Script Component
' Write scripts using Microsoft Visual Basic 2008.
' ScriptMain is the entry point class of the script.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
Inherits UserComponent
Dim sr As System.IO.StreamReader
Dim FileProcessingNumber As Long
Dim TcRunEnddate As String
Dim FilePath As String
Dim TcRunId As Integer
Dim TcRun_StartDate As Date
Public Overrides Sub PreExecute()
MyBase.PreExecute()
FilePath = Variables.FileName
sr = New System.IO.StreamReader(FilePath)
FileProcessingNumber = Variables.FileProgressiveNumber
TcRunEnddate = Variables.TcRunEndDate
End Sub
Public Overrides Sub PostExecute()
MyBase.PostExecute()
sr.Close()
End Sub
Public Overrides Sub CreateNewOutputRows()
Dim lineIndex As Integer = 0
Dim EmptyRow As Integer = 0
Dim TestRecord As Integer = 0
Dim T_Sim_default As Integer = 0
Output0Buffer.AddRow()
While (Not sr.EndOfStream)
Dim line As String = sr.ReadLine()
If (line.Length = 0) Then
EmptyRow = EmptyRow + 1
If (EmptyRow = 2 And (Not sr.EndOfStream)) Then
If T_Sim_default = 0 Then
Output0Buffer.tsimmsisdnvoice = "03200000"
End If
Output0Buffer.AddRow()
T_Sim_default = 0
EmptyRow = 0
GoTo line1
End If
End If
TestRecord = 0
Dim columnArray As String() = line.Split(Convert.ToChar("="))
If (columnArray(0).Length > 0) Then
'Output0Buffer.tsimmsisdnvoice = "03200000"
Output0Buffer.server = "ANTS01"
Try
If (columnArray(0).ToString = "Td.Name ") Then
Output0Buffer.tdname = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "Tc.Name ") Then
Output0Buffer.tcname = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "TestScenario.Name ") Then
Output0Buffer.testscenarioname = columnArray(1)
TestRecord = 1
ElseIf (columnArray(0) = "TcRun.Id ") Then
Output0Buffer.tcrunid = columnArray(1).Substring(1)
TcRunId = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "TcRun.StartDate ") Then
Output0Buffer.tcrunstartdate = columnArray(1).Substring(1)
TcRun_StartDate = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "TcRun.Result ") Then
Output0Buffer.tcrunresult = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "Error.Step ") Then
Output0Buffer.errorstep = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "Error.Protocol ") Then
Output0Buffer.errorprotocol = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "O.NeId ") Then
Output0Buffer.oneid = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "O.Port.Code ") Then
Output0Buffer.oportcode = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "O.Sim.MsisdnVoice ") Then
Output0Buffer.osimmsisdnvoice = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "T.NeId ") Then
Output0Buffer.tneid = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "T.Port.Code ") Then
Output0Buffer.tportcode = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "O.Region.Name ") Then
Output0Buffer.oregionname = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "T.Region.Name ") Then
Output0Buffer.tregionname = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "T.Sim.MsisdnVoice ") Then
Output0Buffer.tsimmsisdnvoice = columnArray(1).Substring(1)
T_Sim_default = 1
TestRecord = 1
ElseIf (columnArray(0) = "Server ") Then
Output0Buffer.server = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "ErrorCause ") Then
Output0Buffer.errorcause = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "ErrorCause.FailureReason ") Then
Output0Buffer.errorcausefailurereason = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "ErrorCause.PhaseName ") Then
Output0Buffer.errorcausephasename = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "ErrorCause.Role ") Then
Output0Buffer.errorcauserole = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "ErrorCause.StepError ") Then
Output0Buffer.errorcausesteperror = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "ErrorCause.UserDefinedErrorCause ") Then
Output0Buffer.errorcauseuserdefined = columnArray(1).Substring(1)
TestRecord = 1
ElseIf (columnArray(0) = "TcRun.EndDate ") Then
Output0Buffer.tcrunenddate = columnArray(1).Substring(1)
TestRecord = 1
End If
Output0Buffer.tsmeas = TcRunEnddate
Output0Buffer.FileProgressiveNumber = FileProcessingNumber
If (TestRecord = 0 And columnArray(0) <> "Tp.Name " And columnArray(0) <> "O.Msc.Name " And columnArray(1).Length > 1 And columnArray(1) <> " 0") Then
If (Not sr.EndOfStream) Then
OutputResultsBuffer.AddRow()
End If
OutputResultsBuffer.kpiname = columnArray(0).Replace(".", "_")
OutputResultsBuffer.kpivalue = columnArray(1)
OutputResultsBuffer.tcrunid = TcRunId
OutputResultsBuffer.tcrunstartdate = TcRun_StartDate
OutputResultsBuffer.server = "ANTS01"
OutputResultsBuffer.tsmeas = TcRunEnddate
End If
Catch
'MsgBox("Startdate :" + columnArray(1).Substring(1))
End Try
End If
line1: lineIndex = lineIndex + 1
End While
End Sub
End Class
Step 4 : Avoid Package Failure when new file not found in FTP location
As we need to look for new file for every 5 minutes in FTP location , so times file creation will be delayed and in that case FTP task fails with File not Found Error .
The SSIS package shouldn’t fail for File not found error . to avoid this increase the
Maximum Error count parameter of the package to 2 . so that if we get that error it won’t show the package as failed .
Comments
Post a Comment