SSIS | Data Flow Task for Beginners – ETL Data Pump Component with Example

Last modified date:

In the last article, I introduced the Control Flow feature. Using control flow we can manage through tasks what SSIS package does. The most frequently used is a Data flow task, which contains data transfer logic (ETL processes).

Difference between Control Flow and Data flow task

Unfortunately, these 2 terms are often confused. Control flow is not a task. It is the SSIS package management layer. It defines tasks (eg Execute SQL, Script task, Dataflow task, etc.) and relations/sequence between them.

Data flow is one of the tasks that can be implemented within Control flow and contains a logic to get data from point A to point B. Control flow can contain 1 or more Data flow tasks.

Data Flow – Introduction

As I mentioned earlier, it is probably the most used task at all. This is because the SSIS package is most often created to provide some data flow or data transformation before it is transferred to a target destination.

Together with the connection manager, data flow ensures connection to a data source (and destination), transformation and data transfer to a certain destination. On the screenshot below is the final state of our SSIS package from the previous article where we have prepared the general control flow logic. We have used Execute SQL tasks that provide logging (start and end) and the data flow task (but we have no logic yet in it).

Dat flow task - example

The data flow task on the screenshot is in the middle of the workspace, named “ETL”. We can double-click this task and look inside.

Data flow task - detail

The content of the task is still completely empty. Note that the SSIS Toolbox (on the left in the workspace) offers a different set of components than a control flow.

SSIS Data Flow – Component Types, Source, Destination

Let’s take a look at the SSIS Toolbox – what we have in Data flow by default in Business Intelligence Studio. The toolbox can be logically classified into two basic categories:

  • Data Connection (Source and Destination) and
  • Data Transformation

In SSIS Toolbox, components are categorized as follows:

1. Data Connection

    • 1.1. Favorites – We can find here Source and Destination Assistant (wizard), which will help us to create a provider to some file or database table
    • 1.2. Other Sources – We can choose the connection provider. E.g. we would choose OLE DB Source to connect to SQL Server.
      • ADO NET Source
      • CDC Source
      • Excel Source
      • Flat File Source
      • ODBC Source
      • OLE DB Source
      • Raw File Source
      • XML Source
    • 1.3. Other Destination – We can choose the connection provider, for example, to connect to SQL Server we would choose OLE DB Source. Data will be imported to this destination
      • ADO NET Destination
      • Data Mining Model Training
      • DataReader Destination
      • Dimension processing
      • Excel Destination
      • Flat File Destination
      • ODBC Destination
      • OLE DB Destination
      • Partion Processing
      • Raw File Destination
      • Recordset Destination
      • SQL Server Compact Destination
      • SQL Server Destination

2. Data Transformation

    • 2.1. Common – Here we find the most commonly used tools for data transformation, such as Derived Column, Data Conversion, etc.
      • Aggregate component
      • Balanced Data Distributor
      • Conditional Split
      • Data Conversion
      • Data Streaming Destination
      • Derived Column
      • HDFS File Destination
      • HDFS FIle Source
      • Lookup component
      • Merge component
      • Merge Join component
      • Multicast component
      • OData Source
      • OLE DB Command
      • Row Count component
      • Script Component
      • Slowly Changing dimension
      • Sort component
      • Union All component
    • 2.2. Other Transformations
      • Audit
      • Cache Transform
      • CDC Splitter
      • Character Map
      • Copy Column
      • Data Mining Query
      • DQS Cleansing
      • Export Column
      • Fuzzy Grouping
      • Fuzzy Lookup
      • Import Column
      • Percentage Sampling
      • Pivot
      • Row sampling
      • Term Extraction
      • Term Lookup
      • Unpivot

Individual components will be presented in separate articles.

Example – Load data from Excel into SQL Server table

The task in our SSIS package will be to modify the Data flow task to do the following logic:

  • In the C: \ Biportal_Data folder we have an excel named “Source_Data.xls”. To this file we want to connect
  • Add a new Date_Timestamp column as a timestamp
  • Save the data to SQL Server – localhost, dbo.Excel_Data table

The source data looks like this:

Data Flow - example - load from excel to SQL Server

The target table in SQL Server looks like this:

DAta flow example - oad data from excel to SQL server - destination

Solution:

1) First, we need to connect to the data in Excel

  • In other Sources, drag the Excel Source component to the Data flow space
  • Then click on New connection manager (because the connection to Excel is not created yet)
  • Set the path to the file C:\Biportal_Data\Source_Data.xls and then click OK

Excel connection manager - data flow example

  • Next, select Sheet on which the data is located

Data flow example - create excel connection manager - sheet

  • Then click Columns in the left sidebar to see if the result contains our data column

data flow example - excel connection manager - source editor - columns

  • Everything is OK and we can click OK, Data source is ready

2) Now we want to add a new Date_Timestamp column to our source data

  • Select the Derived Column component from the SSIS toolbox and drag it to the data flow. Connect the two components using the arrow

data flow example - adding derived column

  • Double click on the derived column and set it exactly as on the screenshot. It’s done

data flow example - derived column settings

3) Prepare the Data Destination Provider – we need to load our source data together with the Timestamp (derived column) into the SQL Server table

  • Similarly to the Excel source (step 1), we also prepare the destination connection. In SSIS Toolbox under category “Other destination”, select OLE DB Destination and drag it to the Data Flow space. Then open it and set the connection to the target server (localhost) and database (biportal). Under “Name of the table or view” select our target table dbo.Excel_Data

Data flow example - ole db destination settings

  • Then we have to map the columns from the source file to the target table. Click on “Mappings”. If the column names in the source Excel and in the destination table have the same name, the OLE DB destination provider understands to map them. If they are named differently, it is necessary to map the columns manually using black arrows. So there is no need to do anything in our case. Confirm OK

Data flow example - destination settings - mapping

Our final Control flow and data flow look like this

Control flow example - final  Data flow example - final

So we can run the entire SSIS package and look at the result

Control flow example - final after package execution  data flow example - final after package execution

Data has been successfully delivered to a SQL Server table

Data flow example - final check

My name is Jan Zedníček and I work as a BI Developer at Kentico Software in Brno. Mostly you can see me there working in the office, but I also work partly as a freelancer. I have been working as a freelancer for many companies for more than 5 years, but Kentico is the matter of my heart. I used to work as a financial controller at companies like Aero Vodochody or Sberbank and I also used to be a bond program manager in Unicapital Investment group. When I am not at work, I like playing volleyball, chess, doing a workout in the gym and I enjoy tasting of best quality rums. I am trying to summarize all my knowledge on this website not to forget them (because of the rum effect, you know =) and to put them forward to anyone. Don´t worry about asking for help or writing some comments.

Category: SSIS - Intgration Services

About Ing. Jan Zedníček - BI Developer, Finance controller

My name is Jan Zedníček and I work as a BI Developer at Kentico Software in Brno. Mostly you can see me there working in the office, but I also work partly as a freelancer. I have been working as a freelancer for many companies for more than 5 years, but Kentico is the matter of my heart. I used to work as a financial controller at companies like Aero Vodochody or Sberbank and I also used to be a bond program manager in Unicapital Investment group. When I am not at work, I like playing volleyball, chess, doing a workout in the gym and I enjoy tasting of best quality rums. I am trying to summarize all my knowledge on this website not to forget them (because of the rum effect, you know =) and to put them forward to anyone. Don´t worry about asking for help or writing some comments.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.