SSIS Complex source File : Script Component to Handle Key=value records

Integration services – Complex Structured Source File

 

What is the complexity in the File .
  1. The data is available in  Source file  in the  form of “Key = Value “ in each row of file .  The complexity here is some Key  names are columns in Table1 of the same row in destination and Some Key names are Row data in Table2.

  1. The data is Available  in some sets in Source File and each Set represents to one test case . Each set is separated by 2 new rows . The Complexity here is each set needs to be load in table1 as well as in Table2 . So one set of data  which presents as each row of source files needs to transfer as columns in Table1 and Rows in Table2 based on Key Name .

  1. The souce file name naming convention is like  TcRunExport_From20120425_110700_To20120425_112200_244258.txt
The complexity here is we need to extract To datetime and store in a destination tables in Datetime column . The other complexity here is when ever the new file not available the SSIS package should fail with error file not found .

      4 . Given below the sets of data  in Source file and final destination loaded data in tables for certain Tctun_id ( unique representationid for a set ) . For clear understanding of requirement  set the of data for Tcrun_id = 61987899 in source file and how it is loaded in destination tables ( See fig1 and Fig2)



Fig 1 :Sample Data in Source File :
Td.Name = Sms Mobisud
Tc.Name = Sms Mobisud - Mobisud
Tp.Name = Sms_Notif_Rating_1.5
TestScenario.Name = Sms Mobisud
TcRun.Id = 61987899
TcRun.StartDate = 2012/04/25 11:03:49
TcRun.EndDate = 2012/04/25 11:06:14
TcRun.Result = Passed
Error.Step =
Error.Protocol =
O.NeId = PLMN 206/01
T.NeId = PLMN 206/01
O.Port.Code = RTUPC00426-04
T.Port.Code = GSE42-0446-01
O.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Sim.MsisdnVoice = +32 xxxxxxxxxxx
T.Region.Name = ANTWERPEN
O.Msc.Name = Prox.ARLON
O.Region.Name = ARLON
ErrorCause = NoError
O.AttachSuccessful = Yes
O.Gprs.AttachTime = 2.373
O.SmsAcknowledgement = Yes
O.SmsReceivedNotif = Yes
O.SmsSendingTime = 2.824
OT.SmsDeliveryTime = 5.408
OT.SmsDeliveryToNotifTime = 3.264
OT.SmsReceivedSuccessful = Yes
OT.SmsWithinExpectedTime = Yes
Result = PASSED
T.AttachSuccessful = Yes


Td.Name = Voice_qual
Tc.Name = VoicePesq_2.0_2G
Tp.Name = VoicePesq_2.0
TestScenario.Name = Wavre_Marais_Voice_qual
TcRun.Id = 61987921
TcRun.StartDate = 2012/04/25 11:03:51
TcRun.EndDate = 2012/04/25 11:06:17
TcRun.Result = Passed
Error.Step =
Error.Protocol =
O.NeId = PORT Wavre_VQM
T.NeId = PORT Marais_VQM
O.Port.Code = RTUPC01148-01
T.Port.Code = RTUPC00978-03
O.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Sim.MsisdnVoice = +32 xxxxxxxxxx

T.Region.Name = BRUXELLES
T.RingNumber = 2
O.Msc.Name = Prox.WAVRE
O.Region.Name = WAVRE
ErrorCause = NoError
O.AttachSuccessful = Yes
O.AudioCodec = AMR_FR
O.DetachSuccessful = Yes
OT.AnswerTimeStamp = 89.671
OT.CallSetupSuccessful = Yes
OT.CallSetupTime = 6.118
OT.CallSetupTimeUnderThrld = Yes
OT.DialStartTimeStamp = 76.924
OT.HangupTimeStamp = 137.924
OT.PesqScoreAvg = 3.7
OT.PesqScoreMax = 3.84
OT.PesqScoreMin = 3.54
OT.PesqUnderThrldPerc = 0
OT.PlayErrorPerc = 0
OT.QoSEvaluateErrorPerc = 0
OT.RingDuration = 6.008
Result = PASSED
T.AttachSuccessful = Yes
T.AudioCodec = AMR_FR
T.Cli.CheckSuccessful = Yes
T.DetachSuccessful = Yes
T.RingSuccessful = Yes
TO.PesqScoreAvg = 3.01
TO.PesqScoreMax = 3.35
TO.PesqScoreMin = 2.76
TO.PesqUnderThrldPerc = 0
TO.PlayErrorPerc = 0
TO.QoSEvaluateErrorPerc = 0


Td.Name = Voice_qual
Tc.Name = VoicePesq_2.0_2G
Tp.Name = VoicePesq_2.0
TestScenario.Name = Leuven_Marais_Voice_qual
TcRun.Id = 61987923
TcRun.StartDate = 2012/04/25 11:03:50
TcRun.EndDate = 2012/04/25 11:06:18
TcRun.Result = Passed
Error.Step =
Error.Protocol =
O.NeId = PORT Leuven_VQM
T.NeId = PORT Marais_VQM
O.Port.Code = RTUPC01019-02
T.Port.Code = RTUPC01149-03
O.Sim.MsisdnVoice = +32 xxxxxxxxxx
T.Sim.MsisdnVoice = +32 xxxxxxxxxx

T.Region.Name = BRUXELLES
T.RingNumber = 2
O.Msc.Name = Prox.LEUVENMSC
O.Region.Name = LEUVEN
ErrorCause = NoError
O.AttachSuccessful = Yes
O.AudioCodec = AMR_FR
O.DetachSuccessful = Yes
OT.AnswerTimeStamp = 93.202
OT.CallSetupSuccessful = Yes
OT.CallSetupTime = 6.226
OT.CallSetupTimeUnderThrld = Yes
OT.DialStartTimeStamp = 80.273
OT.HangupTimeStamp = 140.179
OT.PesqScoreAvg = 3.71
OT.PesqScoreMax = 3.86
OT.PesqScoreMin = 3.53
OT.PesqUnderThrldPerc = 0
OT.PlayErrorPerc = 0
OT.QoSEvaluateErrorPerc = 0
OT.RingDuration = 6
Result = PASSED
T.AttachSuccessful = Yes
T.AudioCodec = AMR_FR
T.Cli.CheckSuccessful = Yes
T.DetachSuccessful = Yes
T.RingSuccessful = Yes
TO.PesqScoreAvg = 3.46
TO.PesqScoreMax = 3.71
TO.PesqScoreMin = 3.25
TO.PesqUnderThrldPerc = 0
TO.PlayErrorPerc = 0
TO.QoSEvaluateErrorPerc = 0


Fig2 :Destination Tables for one Tcrun_id = 61987899


Solution in Step by Step .

 

 Step 1. Create an FTP task to pick the latest file  from source FTP location and place it in SourceInput Folder .

  
  Step1.1  Create FTP connection Manager : give the FTP server name , Port , User name and password 

Step 1.2  : Add the FTP task in Control flow .
            Create Input text file connection to store the FTP file .
            Create a variable to give the FTP file path which changes every time . Till fulfill dynamic fTP file path create expression in variable

Expression =  "/PDMS_directory/*_"+ @[User::SrcFileProgressiveNumber] +".txt"  and value looks like  /PDMS_directory/*_244258.txt

Step 2 : How to Extract data from  Complex structured file .


Step 2.1  Create For Each loop container which locates the folder where the source file is located . get the filename and store it in a variable @Filename.

Step 2.2 To get the Todate from the file name create a variable and give the expression to get the Todata in datetime formate

 Expression :

SUBSTRING( @[User::FileNameOnly] ,35,4)+"/"+SUBSTRING( @[User::FileNameOnly] ,39,2)+"/"+SUBSTRING( @[User::FileNameOnly] ,41,2)+" "+SUBSTRING( @[User::FileNameOnly] ,44,2)+":"+SUBSTRING( @[User::FileNameOnly] ,46,2)+":"+SUBSTRING( @[User::FileNameOnly] ,48,2)


And  value looks like   2012/04/25 11:22:00


Step 3 : Create Data Flow task  in For each loop container .



Step 3.1  this is the most critical part . to read the complex file we need to create a script task  as source in data flow tab .

The data flow task looks like

Step 3.2  Open the Script transformation Editor by double clicking on script component task .

In Inputs and Outputs Tab :  Create 2 Outputs one for each destination table and name it to your convenience . and add the all the columns in each table in those output buffer  as shown  below . and give the exact data type for each column  as per the destination tables .


In script Tab ,  pass the variables of the source file full path , To End date variable .
 And click on edit script tab to write the code .
Here is the code . Declare the variables required in script main block .
In Preexcute ()  method  create streamreader object  by passing the source file path .
In PostExcute() method close the Sreamreader object .

Brief explination of the code .

Step  1 Create while loop to read the file till Endofstream ie last line of the file .

Step 2  Read each line of source file split the string with “=” delimiter to separate Key and Value details of each line .

Step 3 . compare the Key name of each line in source file with Test table columns if it matches store the value part  in the corresponding  Output buffer column .

Step 4 if the Key name is not matched with Test table columns then we need to store that Key name in KPI_name  column in Results table and its value part in KPI_value column in Results table .


Step 5 : How to identify the each set .  each set ends with 2 new lines so check for every blank line and count it after getting 2 blank line that means one set is completed
So add new buffer row by Output0Buffer.AddRow()

Step 6 : For Results table each row in source file is a row in destination table for that you need to add results buffer row for each source file like OutputResultsBuffer.AddRow()









' Microsoft SQL Server Integration Services Script Component
' Write scripts using Microsoft Visual Basic 2008.
' ScriptMain is the entry point class of the script.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Dim sr As System.IO.StreamReader
    Dim FileProcessingNumber As Long
    Dim TcRunEnddate As String
    Dim FilePath As String
    Dim TcRunId As Integer
    Dim TcRun_StartDate As Date

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        FilePath = Variables.FileName
        sr = New System.IO.StreamReader(FilePath)
        FileProcessingNumber = Variables.FileProgressiveNumber
        TcRunEnddate = Variables.TcRunEndDate
    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        sr.Close()
    End Sub

    Public Overrides Sub CreateNewOutputRows()
        Dim lineIndex As Integer = 0
        Dim EmptyRow As Integer = 0
        Dim TestRecord As Integer = 0
        Dim T_Sim_default As Integer = 0

        Output0Buffer.AddRow()

        While (Not sr.EndOfStream)
            Dim line As String = sr.ReadLine()
            If (line.Length = 0) Then
                EmptyRow = EmptyRow + 1
                If (EmptyRow = 2 And (Not sr.EndOfStream)) Then
                    If T_Sim_default = 0 Then
                        Output0Buffer.tsimmsisdnvoice = "03200000"
                    End If
                    Output0Buffer.AddRow()
                    T_Sim_default = 0
                    EmptyRow = 0


                    GoTo line1

                End If
            End If
            TestRecord = 0

            Dim columnArray As String() = line.Split(Convert.ToChar("="))
            If (columnArray(0).Length > 0) Then


                'Output0Buffer.tsimmsisdnvoice = "03200000"
                Output0Buffer.server = "ANTS01"

                Try
                    If (columnArray(0).ToString = "Td.Name ") Then
                        Output0Buffer.tdname = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "Tc.Name ") Then
                        Output0Buffer.tcname = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "TestScenario.Name ") Then
                        Output0Buffer.testscenarioname = columnArray(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "TcRun.Id ") Then
                        Output0Buffer.tcrunid = columnArray(1).Substring(1)
                        TcRunId = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "TcRun.StartDate ") Then
                        Output0Buffer.tcrunstartdate = columnArray(1).Substring(1)
                        TcRun_StartDate = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "TcRun.Result ") Then
                        Output0Buffer.tcrunresult = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "Error.Step ") Then
                        Output0Buffer.errorstep = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "Error.Protocol ") Then
                        Output0Buffer.errorprotocol = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "O.NeId ") Then
                        Output0Buffer.oneid = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "O.Port.Code ") Then
                        Output0Buffer.oportcode = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "O.Sim.MsisdnVoice ") Then
                        Output0Buffer.osimmsisdnvoice = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "T.NeId ") Then
                        Output0Buffer.tneid = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "T.Port.Code ") Then
                        Output0Buffer.tportcode = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "O.Region.Name ") Then
                        Output0Buffer.oregionname = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "T.Region.Name ") Then
                        Output0Buffer.tregionname = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "T.Sim.MsisdnVoice ") Then
                        Output0Buffer.tsimmsisdnvoice = columnArray(1).Substring(1)
                        T_Sim_default = 1
                        TestRecord = 1
                    ElseIf (columnArray(0) = "Server ") Then
                        Output0Buffer.server = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "ErrorCause ") Then
                        Output0Buffer.errorcause = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "ErrorCause.FailureReason ") Then
                        Output0Buffer.errorcausefailurereason = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "ErrorCause.PhaseName ") Then
                        Output0Buffer.errorcausephasename = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "ErrorCause.Role ") Then
                        Output0Buffer.errorcauserole = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "ErrorCause.StepError ") Then
                        Output0Buffer.errorcausesteperror = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "ErrorCause.UserDefinedErrorCause ") Then
                        Output0Buffer.errorcauseuserdefined = columnArray(1).Substring(1)
                        TestRecord = 1
                    ElseIf (columnArray(0) = "TcRun.EndDate ") Then
                        Output0Buffer.tcrunenddate = columnArray(1).Substring(1)
                        TestRecord = 1
                    End If

                    Output0Buffer.tsmeas = TcRunEnddate
                    Output0Buffer.FileProgressiveNumber = FileProcessingNumber
                    If (TestRecord = 0 And columnArray(0) <> "Tp.Name " And columnArray(0) <> "O.Msc.Name " And columnArray(1).Length > 1 And columnArray(1) <> " 0") Then
                        If (Not sr.EndOfStream) Then
                            OutputResultsBuffer.AddRow()
                        End If
                        OutputResultsBuffer.kpiname = columnArray(0).Replace(".", "_")
                        OutputResultsBuffer.kpivalue = columnArray(1)
                        OutputResultsBuffer.tcrunid = TcRunId
                        OutputResultsBuffer.tcrunstartdate = TcRun_StartDate
                        OutputResultsBuffer.server = "ANTS01"
                        OutputResultsBuffer.tsmeas = TcRunEnddate

                    End If

                Catch
                    'MsgBox("Startdate :" + columnArray(1).Substring(1))
                End Try

            End If

line1:      lineIndex = lineIndex + 1

        End While

    End Sub

End Class



Step 4 : Avoid Package Failure when new file not found in FTP location 


As we need to look for new file for every 5 minutes in FTP location , so times file creation will be delayed and in that case FTP task fails with File not Found Error .
The SSIS package shouldn’t fail for File not found error . to avoid this increase  the
Maximum Error count parameter of the package to 2 . so that if we get that  error it won’t show the package as failed .
 

Comments

Popular posts from this blog

SSAS : A connection cannot be made to redirector. Ensure that 'SQL Browser' service is running. (Microsoft.AnalysisServices.AdomdClient). + Solution

SSIS Error : Unable to infer the XSD from the XML file. The XML contains multiple namespaces

SSRS : [Teradata Database] 3939 There is a mismatch between the number of parameters specified and the number of parameters required.