A bit about Azure Data Factory: A SQL Script for creating Structure from SQL Tables

by Julie Smith

I have been working with Microsoft’s shiny new Azure data integration tool, Azure Data Factory (ADF). ADF was made generally available on August 12th.
ADF is available on the Azure portal, and you can use it to create pipelines that move data to and from cloud-based data stores and on-premises data stores using Data Management Gateways.
There is a lot of documentation and info about ADF online. If you are brand new to it, I’d recommend starting here with the learning path from Microsoft. ADF has been in preview since 2014, and one caution I’d give you is that the domain-specific JSON used by ADF went through a major rewrite in mid-July. So if you find a post from before that, understand that any of the JSON in it will be old and need to be translated. In my experience, the JSON editor on the portal attempts to translate it for you. There is also a GitHub site with a translator.
Reza Rad is sharing a lot of great content as well on his blog.
This post is going to be a “skip to the chase,” targeted post with a script to help you speed up your JSON descriptions of SQL tables. As I mentioned, ADF is ALL JSON, ALL THE TIME.
So if you have a SQL table or Azure SQL table and it is going to be a dataset, here is a script to help you create the bits that go inside the Structure tag. Use the @TableName variable to provide the table name. Adjust with a schema name variable if your table has a schema and you have multiple schemas using the same table names.

For table FactSales defined as:
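For illustration, here is a simplified, hypothetical definition (the column names and types below are invented to exercise each data-type branch of the script, not taken from a real FactSales table):

```sql
-- Hypothetical FactSales table; columns chosen to cover each
-- data type the script translates: int, decimal, date, bit,
-- uniqueidentifier, and the string fallback.
CREATE TABLE dbo.FactSales (
    SalesKey INT NOT NULL,
    SaleDate DATETIME NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL,
    IsReturn BIT NOT NULL,
    RowId UNIQUEIDENTIFIER NOT NULL,
    ProductName VARCHAR(50) NULL
);
```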

I will use the following SQL script to produce the highlighted JSON for the structure inside the ADF dataset:

DECLARE @FinalOrd INT
  ,@TableName VARCHAR(100)

SET @TableName = 'FactSales'

-- Find the last column so it can be emitted without a trailing comma
SET @FinalOrd = (
    SELECT MAX(ordinal_position)
    FROM information_schema.columns
    WHERE Table_Name = @TableName
    );

WITH cte
AS (
  SELECT Ordinal_Position
    ,CASE 
      WHEN Ordinal_Position < @FinalOrd
        THEN '{ "name": "' + column_name + '", "type": "' + CASE 
            WHEN Data_Type LIKE '%int%'
              THEN 'Int32"},'
            WHEN Data_Type NOT LIKE '%int%'
              AND Numeric_Precision IS NOT NULL
              THEN 'Decimal"},'
            WHEN Data_Type LIKE '%date%'
              THEN 'Date"},'
            WHEN Data_Type = 'bit'
              THEN 'Boolean"},'
            WHEN Data_Type = 'uniqueidentifier'
              THEN 'Guid"},'
            ELSE 'String"},'
            END
      -- Last column: same translation, but no trailing comma
      ELSE '{ "name": "' + column_name + '", "type": "' + CASE 
          WHEN Data_Type LIKE '%int%'
            THEN 'Int32"}'
          WHEN Data_Type NOT LIKE '%int%'
            AND Numeric_Precision IS NOT NULL
            THEN 'Decimal"}'
          WHEN Data_Type LIKE '%date%'
            THEN 'Date"}'
          WHEN Data_Type = 'bit'
            THEN 'Boolean"}'
          WHEN Data_Type = 'uniqueidentifier'
            THEN 'Guid"}'
          ELSE 'String"}'
          END
      END AS GeneratedJSON
  FROM information_schema.columns
  WHERE Table_Name = @TableName
  )
SELECT GeneratedJSON
FROM cte
ORDER BY Ordinal_Position;

This will produce the following in SSMS:
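For example, for a hypothetical three-column table whose columns are SalesKey (int), SaleDate (datetime), and SalesAmount (decimal), the result set would be:

```
{ "name": "SalesKey", "type": "Int32"},
{ "name": "SaleDate", "type": "Date"},
{ "name": "SalesAmount", "type": "Decimal"}
```

Note the last row has no trailing comma, which is what the @FinalOrd check is for.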


Copy and paste that into the JSON template in between the brackets for the Structure. To get to this, from the Azure Portal in a factory, go to Author and Deploy, then click on New Data Set and select the SQL type, either SQL Server table or Azure SQL Table:

Insert the JSON this script provides in between the brackets after the word “structure”.

Initially, it will look like this:
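As a rough sketch, the dataset JSON is shaped something like the following. The dataset name, linked service name, column names, and availability values here are placeholders; your factory’s names and schedule will differ.

```json
{
    "name": "FactSalesDataset",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": "AzureSqlLinkedService",
        "structure": [
            { "name": "SalesKey", "type": "Int32"},
            { "name": "SaleDate", "type": "Date"},
            { "name": "SalesAmount", "type": "Decimal"}
        ],
        "typeProperties": {
            "tableName": "FactSales"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```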

Once you deploy successfully, it will reformat itself and look like this:

The ADF editor formats the columns.

The reason I built this script was mainly to translate the data types (ADF only uses six data types: Int, Decimal, Date, Boolean, Guid, and String), only to learn through my research that for a linked service to SQL Server or Azure SQL, the data type is optional. In fact, defining the structure is ONLY NECESSARY if you are going to use column mappings later on in the copy activity of the pipeline. If you are just copying a dataset to another structure on another linked service with the same column names, then even defining the structure is unnecessary. All of that is a bit unclear when you are first starting out, though. And I see use cases where you could use this script to define the datasets for Azure Blob Storage or other dataset types, leveraging a SQL table as a template.
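If you do end up using column mappings, the copy activity takes a translator element in its JSON. A sketch, with hypothetical source and sink column names:

```json
"translator": {
    "type": "TabularTranslator",
    "columnMappings": "SalesKey: SaleId, SalesAmount: Amount"
}
```

It is only in this scenario that the names and types you generated for the structure actually get used, which is why the structure is otherwise optional.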
I hope this saves you some laborious JSON scripting. Thanks for reading!
