I was happy to co-present a session, entitled “Harvesting XML Data from the Web with Power Query and Curl,” with Rob Volk (@SQL_R) at this week’s Atlanta BI User Group meeting. The demo gods were not with me on my grand finale demo that night, however. I had spent the demo building a Power Query function, and when I tried to invoke it against a list of values, I got a failure I couldn’t resolve on the spot. Of course, as soon as I opened the spreadsheet the next day I immediately saw the problem, which I will share here, as I think it is something people will encounter frequently as they start to work with Power Query.
What the Function Did:
Here’s the setup: www.SQLSaturday.com contains a page for every SQL Saturday and, if it’s available, the schedule for the event. Atlanta’s most recent event was this month and was SQL Saturday #285; hence, its schedule is located at http://sqlsaturday.com/285/schedule.aspx. Any other SQL Saturday event number works in the same manner. If I want to use Power Query to view this data, I would enter that URL as the source in Power Query:
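For illustration, here is a minimal sketch of what the generated M query might look like at this point (my reconstruction; the step names are Power Query defaults, not the exact script from the demo):

```
let
    // Pull the page and let Power Query detect the HTML tables on it
    Source = Web.Page(Web.Contents("http://sqlsaturday.com/285/schedule.aspx")),
    // Take the first detected table (the schedule grid)
    Data = Source{0}[Data]
in
    Data
```

Because the event number is the only part of the URL that varies, this is exactly the kind of query that lends itself to being wrapped in a Power Query function.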
So two days ago I posted this. It’s a way to generate an SSIS expression for use in an Incremental Load’s Conditional Split. A friend had pointed out that this pattern was not the best, as NULL handling is not always as easy as replacing the NULL with what you might consider a safe value. I also got a very thoughtful comment on the post from a lovely gentleman expressing the same concern. So, obsessed, I went back to tinkering. I came up with ANOTHER expression (and consequently another T-SQL generator for it). I like this one a little better, as it seems to me that it performs what is asked without introducing the risk of replacing NULL values. So folks, please read this, use it, bash it up, and let me know what you think.
Here’s the new (to me; I’m sure someone has already figured this out) NULL-handling expression for DELTA rows, using the column Color as an example:
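Reconstructed here as plain text from the color-coded description that follows (the Source_/Dest_ column prefixes are my assumption, not necessarily the exact names in the original package):

```
(ISNULL(Source_Color) && !ISNULL(Dest_Color))
|| (ISNULL(Dest_Color) && !ISNULL(Source_Color))
|| (!ISNULL(Source_Color) && !ISNULL(Dest_Color) && Source_Color != Dest_Color)
```

The first two clauses catch the “one side is NULL, the other isn’t” cases; the last clause only compares the values once both sides are known to be non-NULL.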
This does NOT break the Conditional Split if there are NULLs. There can be NULLs in the source, the destination, or both, and it does not break the pipeline. I love that.
How to read it from the left:
The whole expression evaluates as TRUE, and the row is split into the Delta path, when either the yellow portion or the green and blue portions together evaluate to TRUE.
The yellow-highlighted expression asks: is either side NULL while the other is not? If yes, it evaluates to TRUE.
The green-highlighted section asks: are both sides NOT NULL? If yes, then the blue-highlighted section asks: are they unequal? If yes, it evaluates to TRUE.
If both sides are non-NULL yet equal, or if both sides are NULL, then the condition is not met and the row is ignored. Just like we want it to be.
Now the SQL to generate the whole expression can be datatype-agnostic. I love that too. Here is the SQL to generate the whole concatenated shebang for all of your columns:
Click on the SQL below to get a copy/paste version 🙂
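A sketch of the kind of generator query involved, built against INFORMATION_SCHEMA.COLUMNS (my reconstruction, not the exact original; the Source_/Dest_ prefixes, the DimProduct table name, and the ProductKey exclusion are all hypothetical):

```sql
-- Build the NULL-safe delta expression for every column of the destination table.
-- Assumes the data flow exposes columns as Source_<name> and Dest_<name>.
DECLARE @expr NVARCHAR(MAX) = N'';

SELECT @expr = @expr +
       CASE WHEN @expr = N'' THEN N'' ELSE N' || ' END +
       N'(ISNULL(Source_' + COLUMN_NAME + N') && !ISNULL(Dest_' + COLUMN_NAME + N'))'
     + N' || (ISNULL(Dest_' + COLUMN_NAME + N') && !ISNULL(Source_' + COLUMN_NAME + N'))'
     + N' || (!ISNULL(Source_' + COLUMN_NAME + N') && !ISNULL(Dest_' + COLUMN_NAME + N')'
     + N' && Source_' + COLUMN_NAME + N' != Dest_' + COLUMN_NAME + N')'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'DimProduct'          -- hypothetical destination table
  AND COLUMN_NAME <> 'ProductKey';       -- exclude the key column used for the lookup

SELECT @expr AS DeltaExpression;
```

Because the generator never inspects the column’s values, only its name, the same script works regardless of datatype, which is what makes it datatype-agnostic.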
Happy 2014! Happy to report that Audrey and I were both renewed as SQL Server MVPs today! To celebrate I’m publishing a really long blog post.
This is a MONSTER long post. The main point of this post was to give you guys some T-SQL code which can be used against the information schema view of a SQL Server table destination to spit out a complex SSIS Expression. While that was the point of the post, I felt like I also needed to provide context of what I was trying to do or what you might be doing when the need for such SQL would arise.
Also, I had Audrey read it over, and she scolded me for a sad lack of chuckles. So I’m adding in some random Chuckle-y interludes. Enjoy them. Or skip them completely. Chuckle interludes are indicated by the face of Chuck Norris for easy recognition.
One of the most common scenarios encountered while ETL’ing is an Incremental Load: determining whether a source row already exists in the target database (often your warehouse), and if it does, whether it has changed since the last time it was loaded. The pseudo code goes like this:
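My sketch of that familiar pattern (pseudo code only, not runnable):

```
FOR EACH row IN source
    IF matching key NOT IN destination THEN
        INSERT the row                      -- new row
    ELSE IF any non-key column differs THEN
        route to the Delta path (UPDATE)    -- changed row
    ELSE
        ignore the row                      -- unchanged row
```

In SSIS this usually becomes a Lookup (does the key exist?) followed by a Conditional Split (did anything change?), which is where the delta expression comes in.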
I hadn’t played with Power Query since Data Explorer became Power Query last week with the announcement of Power BI for Office 365. Since I am presenting on it at the Microsoft Integration Architects group this week, I opened it up to prep the demos. By the way, Power Query is very cool. It’s an Excel add-in that allows an information worker to perform self-service BI. With Power Pivot and Power View, it completes a self-service BI trifecta of ETL, compressed storage, and presentation. Now that Power BI is available as a cloud service, users not only get a complete stack of BI tools, but can also access those tools more easily via mobile devices.
Since Power Query is designed to be a self-service tool, it focuses on ETL by way of drag-and-drop GUI tools. However, one of my favorite features of Power Query is that you can access and copy the underlying code (which is written in a language called “M,” by the way), make a few slight changes to it, and reuse it. For instance, if I needed data from several tables on the internet at different URLs, but they all had the same structure, I could labor through the first page with the GUI, then copy that code, paste it into Notepad, change only the URL of the source, and paste the new code into a new Power Query. And BAM! It’s done.
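As an illustration of that copy-and-tweak workflow (the URLs, step names, and column names here are made up, not from the original demo):

```
let
    // GUI-generated query for the first page
    Source = Web.Page(Web.Contents("http://example.com/region1/stats.aspx")),
    Data = Source{0}[Data],
    Typed = Table.TransformColumnTypes(Data, {{"Total", Int64.Type}})
in
    Typed
```

Changing only the URL inside Web.Contents() (say, to "http://example.com/region2/stats.aspx") and pasting the result into a new query gives you a second query with identical shaping steps.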
This was all accomplished via the query script icon, a cute little fella that looks like a scrolled piece of paper.
All of the steps in the right-hand column above represent pieces of work done via the GUI to get the data into this format. By clicking on the query script icon, I get this:
Saturday I was at Innovative Architects with several of my colleagues, prepping labs for an exciting event being held April 27th. It’s the Global Windows Azure Bootcamp, and it’s a very cool thing that Microsoft is doing. There will be sessions all over the world (see the map below) where Microsoft will be hosting free training on Azure. While we are all training, we will also each deploy a processor to a global render farm. Here is a cute video explaining what a render farm is: http://www.youtube.com/watch?v=4VUWrZRCtI8 (this video is a plug for renderfarm.fi, which is a volunteer/free service NOT affiliated with Microsoft; I just thought it offered a cute and good explanation of what a render farm is). Here (link) is another video showing creator Alan Smith explaining the render farm he created for Azure.
The session in Atlanta will be hosted by two of my colleagues from Innovative Architects, Vaishali Shah and Rob Bramhill. They have devoted weeks of time to prepping the day and their hard work was very evident yesterday as we were doing final preparations.
I look forward to working more with Azure, specifically creating HDInsight clusters to apply the same type of scalability to data as this Render Farm experiment applies to processing animations.
I’m back from the MVP Summit and processing a staggering number of thoughts. Not only am I processing the brain dump of NDA technical goodness of exciting new stuff coming to the product, but also a number of personal reflections. How lucky I am to have been somehow deemed worthy of this honor. How fortunate the SQL MVPs are in the number of sessions, the pampering, and the genuine respect we receive from the SQL Server product team. How lucky I am to be a part of one of the closest technical communities out there. The SQL Community calls itself family because we truly have a network of close, warm friendships. It dawned on me recently that I could go practically anywhere in the world (the world!) and find a colleague I have met through being a part of the SQL Community. I am a happy lady right now (in spite of the double ear infection).
Another benefit to you, my readers, from my attendance at the Summit is a new motivation to continue sharing my knowledge with the community. That said, let me be the millionth prodigal blogger to apologize for my lack of blogging. Insert every lame but true excuse here: I’ve been busy, I’ve had things come up in my personal life. All true, but I know you want to hear from me.
So from the backlog of my “things I wanted to blog about” list, I picked a random DQS tidbit: how to import domain values and synonyms into DQS from Excel. Get ready for a wee bit of technical content!
Quick, happy announcement… My fellow Datachix, Julie Smith, has received the SQL Server MVP award for 2013! Also, I’ve been renewed as a SQL Server MVP for 2013. What does this mean? Yeah… 100% of the people who blog on this site (all two of us) are MVPs!
So, Julie, while I am not the first to congratulate you, let me be the first to do it on our blog. 🙂
On a personal note, I am grateful and humbled to be a part of the MVP community for another year. Having experienced 12 months of the program, I’m honored that Microsoft saw fit to include me for another year.
As a bonus, here’s a picture of Julie being awesome:
Rock on, my friend!
p.s. Note to self: It’s been a while since you blogged about anything, Audrey. Get your act together and post more often! (Nodding…)
Guess what, guys? I’m speaking at PASS Summit this year! My session, titled “Data Quality Services—Finally!”, was selected, and I get to go to the premier SQL Server conference and present on a topic I find truly exciting. Data Quality Services (DQS) is new with SQL Server 2012. Along with Master Data Services (MDS), it represents Microsoft’s entry into Master Data Management (MDM).
The ideal use of MDS and DQS represents to me a paradigm shift in the way I.T. folks and business folks interact. These tools are not just another way for I.T. to develop solutions for “the business” based on requirements (whether those requirements are thoroughly documented via a waterfall lifecycle or verbally shared via agile). DQS and MDS were explicitly designed for continued and iterative use by business users with expert knowledge about their data. In MDM speak, these experts are called Data Stewards. I recommend that anyone in data continue getting familiar with these concepts. Data Governance and Master Data Management are terms you are likely to see from now on. For more information I recommend David Loshin (http://mdmbook.com/). His book, Master Data Management, is a comprehensive guide to the topic. I’ve also found useful articles here, including articles on ways for I.T. to get “buy-in” from C-levels.
So, back to today’s post. As I was prepping my session for PASS Summit on Data Quality Services, I was extremely fortunate to be in communication with Matthew Roche and Matt Masson of Microsoft, who are also presenting on DQS (and MDS, actually all of EIM) at Summit. These two are both uber “ersum,” always willing to help me and answering emails at all hours of the day and night.
Let’s dive in: keeping data clean with DQS starts with creating a Knowledge Base. This is done in the Data Quality Client. Inside a Knowledge Base you create and maintain Domains. A single domain is the equivalent of a column or attribute in a database.
A glitch somewhere caused PASS to redo the Community Vote for Lightning Talks, so here is another chance to make your choice known for your favorite sessions. And if you need help with the menu, may I suggest the ham?
Also, if you are in the area, I hope you are considering Columbus, Georgia’s first SQL Saturday (#167) ! I am delighted to be speaking at this event!
Thanks to the folks who came out to see my DQS presentation in Indy this past weekend. Thanks to Caroline Bailey, Hope Foley, Kyle Neier, Eddie Wuerch, and the rest of the Indy crew who put on an excellent event. Here is my slide deck, as promised. I met many fantastic folks and had a paleolithic good time.