The Datachix Blog











{July 11, 2014}   Codestock –SlideDeck

by Julie Smith

 

I had a blast presenting BIML at Codestock today– for my attendees, here are the materials.



By Julie Smith

I was happy to co-present a session at this week’s Atlanta BI User Group with Rob Volk (@SQL_R) meeting entitled “Harvesting XML Data from the Web with Power Query and Curl.” The demo gods were not with me on my grand finale demo that night however. I had spent the demo building a Power Query Function and when I tried to invoke it against a list of values, I got a failure which I couldn’t resolve that night. Of course, as soon as I opened the spreadsheet the next day I immediately saw the problem, which I will share here, as I think it is probably going to be something people encounter frequently as they start to work with Power Query.

What the Function Did:

Here’s the setup: www.SQLSaturday.com contains a page for every SQL Saturday, and if it’s available, the schedule for the event. Atlanta’s last event was this month and was SQL Saturday #285-hence, its schedule is located at  http://sqlsaturday.com/285/schedule.aspx. Any other SQL Saturday event number works in the same manner. If I want to use Power Query to view this data as a page, I would enter that url as the source in Power Query:

After choosing Table 1 from the choices on the Navigator pane: Read the rest of this entry »



by Julie Smith

So two days ago I posted this. It’s a way to generate an SSIS Expression for use in an Incremental Load’s Conditional Split. A friend had pointed out that this pattern was not the best– as NULL handling is not always as easy as replacing the NULL with what you might consider a safe value. I also got a very thoughtful comment on the post from a lovely gentlemen expressing the same concern. So obsessed, I went back to tinkering.  I came up with ANOTHER expression (and consequently another T-SQL generator for it). I like this one a little better as it seems to me that it performs what is asked without introducing risk of replacing NULL values. So folks please read this, use it, bash it up and let me know what you think.

Here’s the new (to me, sure someone had already figured this out) NULL Handling expression for DELTA rows, using the column Color as an example:

This does NOT break the Conditional Split if there are NULLS. There can be NULLS in either the source or the destination or both and it does not break the pipeline. I love that.

How to read it from the left:

The whole expression will evaluate as TRUE and the row split into the Delta path when either the yellow portion or the Green and blue portion together evaluate to True.

The yellow highlighted expression asks: is either side NULL while the other is not? If yes, then evaluate to TRUE.

The Green highlighted section: Are both sides NOT NULL ? If yes, Then Blue highlighted section asks: are they unequal? If yes,  then evaluate to True.

If both sides are non null, yet equal, or if both sides are NULL, then the condition is not met and the row is ignored. Just like we want it to be.

Now the SQL to generate the whole expression can be datatype agnostic. I love that too. Here is the SQL to generate the whole concatenated shebang for all of your columns:

Click on the SQL below to get a copy/past version :)

Click on this to get copy/paste version.

Read the rest of this entry »



By Julie Smith

Happy 2014! Happy to report that Audrey and I were both renewed as SQL Server MVPs today!  To celebrate I’m publishing a really long blog post.

This is a MONSTER long post. The main point of this post was to give you guys some T-SQL code which can be used against the information schema view of a SQL Server table destination to spit out a complex SSIS Expression. While that was the point of the post, I felt like I also needed to provide context of what I was trying to do or what you might be doing when the need for such SQL would arise. 

also, I had Audrey read it over and she scolded me for a sad lack of chuckles. So I’m adding in some random Chuckle-y interludes. Enjoy them. Or skip them completely. Chuckle interludes are indicated by the face Chuck Norris doesn't sleep. He waits. of Chuck Norris for easy recognition.

One of the most common scenarios encountered while ETL’in is an Incremental Load –that is determining if a source row already exists in the target database (often your warehouse), and if it does, has it changed since the last time it was loaded? The pseudo code goes like this: Read the rest of this entry »



Today, I was reading a blog post.  The article was published by a group that I would consider reliable and reputable.  It was on a topic that I have passing familiarity with, and would like to be better at.  I’m not going to name the post nor the topic, because that’s not the point.  Here’s what went down…

1130 Hours: Read post.  Surprised at the absolutes declared in the article.

1135 Hours: Read it again.  Think that either I’ve missed some really basic lessons on this topic, or that maybe the article has provided some less than ideal guidance.

1209 Hours: E-mail a friend who I know is an expert on the subject.  Ask him to read it and let me know if he thinks it is right.

1300 Hours: Get response from generous and patient friend.  His e-mail (which was longer than the blog post) explains in very clear terms that there are problems with not only the guidance that the article provides, but also how some of the fundamental concepts were represented.

1420 Hours: Go find another post on the same topic by another trusted expert.  Read it and confirm for the second time that my original suspicions about the article were correct.

1430 Hours: Pat myself on the back for knowing just enough about the topic to realize that it seemed off in the first place.

1445 Hours: Go back to blog post to write comment that maybe the article could use a second look.  See that someone has already done that.  Decide not to pile on.

Now, I’ve been watching a lot of The Newsroom and House of Cards lately, both of which have characters who are journalists.  I think that reading blog posts, articles, and books is a lot like being a journalist.  One source is not enough.  If you’re hearing something new or something that contradicts what you think you know, don’t take the article at face value.  Go find a second source, and make sure that the second source didn’t use your first source as their source.  (Caveat:  I do have a list of absolutely trusted writers.  But it is my list, and is based on a lot of factors.  Okay, fine.  I’ll share one.  His name rhymes with Tall Candle.)  Also, if you’re finding conflicting advice, don’t be afraid to ask questions.  Any writer worth her salt is willing to accept some peer review.

And yes, if you’re wondering, I felt very MacKenzie McHale for all of 3 seconds. 

??????????????????????????? (photo from: http://www.fanpop.com/clubs/the-newsroom-2012/images/33579445/title/mackenzie-mchale-photo)

I imagined myself, headset on, shouting, “We’re not going live with this until we confirm a second source!”  Then I remembered that no one but me was consuming this information.  Oh well.

If you’re on the other side of the keyboard and are writing an article you plan to send out into the world, here are a few guidelines to live by:

1) Unless you are 110% sure that your guidance applies 100% of the time, don’t speak in absolutes.  “It depends” is a running joke for a reason.

2) Find someone you trust to tech edit for you.  Heck, find two people.

3) Remember that there are a lot of young database professionals out there who are reading your work in order to figure out how to do their jobs.  Don’t take that lightly.

4) If someone comments on your article and says you’re wrong, engage with them.  They’ve taken the time to read and comment on your work.  Granted, there are trolls out there, but a thoughtful comment demands a thoughtful response.  And keep an open mind.

5) If you’re preaching something that goes against conventional thought, take the time to post links to opposing views.  Help your reader make an informed choice about which advice to follow.

6) Encourage your reader to do their own research with the information you’ve provided.

7) If you discover that you’ve presented bad information, correct the article.  Own it.

All that being said, the proliferation of online resources has made us all better.  Don’t be afraid to put your research and opinions out there.  Just research, verify, and test.  And look for that second source before going live with the scoop. 

Blog on, my friends…

–Audrey



by Julie

 

I hadn’t played with Data Explorer Power Query since it became Power Query last week with the announcement of Power BI for Office 365. Since I am presenting on it at the Microsoft Integration Architects group this week, I opened it up to prep the demos. By the way, Power Query is very cool. It’s an Excel Add-in that allows an information worker to perform self service BI. With Power Pivot and Power View, it completes a self service BI trifecta of ETL, compressed storage and presentation. Now that Power BI is available as a cloud service, this not only gives users a complete stack of BI tools, but allows those tools to be more easily accessed via mobile devices.

Since Power Query is designed to be a self service tool, it focuses on ETL by way of drag and drop/gui tools. However, one of my favorite features of Power Query was that you can access and copy the underlying code (which is a language called “M” by the way), make a few slight changes to it and reuse. For instance if I needed data from several tables on the internet on different urls, but they all have the same structure, I could labor through the first page with the gui, then copy that code, paste it into notepad, change the url address of the source only and paste the new code into a new Power Query. and BAM! It’ done.

This was all accomplished via the query script icon, a cute little fella that looks like a scrolled piece of paper.

powerquery

All of the steps on the right hand column above represent pieces of work done via the gui to get the data in this format. By clicking on the query script icon, I get this:

powerqueryM

which I can modify and paste into new queries.

But back to my story of a
change between Data Explorer and Power Query. In Data Explorer, the query script icon was always there. I spent a good ten minutes one night this week in a cold panic, wondering if I had forgotten how to “work” Power Query, because I couldn’t find it after upgrading. It appears that now you have to expressly enable this feature by going to Options under Machine Settings and clicking “Enable Advanced Query Editing.” See my steps 1,2, and 3 below.

enablingScrollyThingie

So there you have it people. Happy Advanced Querying.

Shameless plugs:

Going to be in the Indianapolis area August 9th? Interested in learning the basics of how to create and load a data warehouse? Join Audrey and me for “From Here to BI: Data Warehousing with SQL Server MVPs Audrey Hammonds and Julie Smith.” We’ll also be presenting for SQL Saturday #242 main event on August 10th. Look forward to seeing you there.

Also don’t miss Audrey’s session for 24 Hours of PASS– Design Matters! The Performance Impact of Database Design



In the scramble to get hands on keyboards in a development shop, one of the most important phases of development often gets short shrift.  That phase is discovery.  I was lucky enough to spend a few weeks with one of our clients working through a formal discovery phase, and was reminded how valuable this time and effort can be.  It also got me thinking… How can any developer, regardless of project role, gather the information that normally comes out of a discovery effort?

Discovery Who? 

So what is Discovery, exactly?  Discovery is the part of development where you answer a few key and seemingly basic questions about the work you think you want to do.  These questions include:  Why? What? Who? When?  There are a few key outputs of a discovery cycle, including:

  • Problem statement – a short, concise explanation of what’s in need of improvement/enhancement/rework
  • Solution statement – a short, concise statement about what’s going to be done about the problem
  • Scope – a list of things that will be done, as well as a list of things that won’t be done
  • Assumptions – any assumptions that have been made about environment, people, technology, complexity, etc. in regards to the project
  • Risks – a list of things that can derail the project, which might include company direction, technology changes, dependent projects, or that “simple” external system that you need to integrate but haven’t had a chance to see yet
  • High-level time estimates – a high level SWAG (Stupid Wild-Ass Guess) at how long it should take to complete the work, or a time window based on other external factors
  • High-level resource needs – a high level list of people (or skill sets) and technology resources (hardware, software) that are needed to complete the project

Notably absent is one thing:  How?  That’s because the how is design, not discovery.  Resist the overwhelming urge to start designing the solution before you fully define the problem.  This is the part that I struggle with every time I’m asked to help out with the discovery phase.

A Developer’s Life

Discovery information will come at you in one of two ways:  either you’ll be asked to answer the above questions and help define the project, or someone might come to you with all or part of this information and expect you to be involved in the subsequent phases of the project.  In both cases, it is absolutely vital that you (Yes, you… the dashing figure reading this post) feel comfortable with the outputs of discovery before you start designing or developing anything.  This is known (in my head) as Sneaky Discovery.   (Credit to Julie for her invention, the Sneaky Proof of Concept… I’ve adapted the concept for Discovery) It ain’t as evil as it sounds, I promise.  Let’s talk about this one for a bit.

I know… it can be uncomfortable to question a project’s intent or setup, especially when you haven’t been asked to.  But bear with me for a moment.  I’m not asking you to put on your fedora w/optional press pass tucked into the band and annoy the ever loving crap out of the people around you while sporting your best Humphrey Bogart voice.  I’m just asking you to take 30-60 minutes and make sure that you feel good about the work you’re about to do.  Let’s look at a dramatic reenactment of a common situation:

 

Scene: John, a diligent data professional is at his desk, working on the latest product release.  Enter Manager (Mike).

Manager: Hey! Bob!  How are you?

John: It’s, um, it’s John.  Not Bob.  I’m good.  What can I do for you?

Mike:  Good, Bob, good.  Glad to hear it.  I just walked out of a Very Important Meeting with Very Important People, and I have a new set of tasks for you.  We’re going to speed up Database X.  Also, I have these 17 new reports that need to be developed… um, somewhere.  Let me check my notes… Ah, there we are!  I’ll e-mail this right over to you so you can get started.  How long do you think this will take?  Can you have it by next Friday?

John takes a deep breath, and time stops.  Enter Mysterious Announcer (MA).

MA:  Right now, John is faced with a career conundrum.  His fate is in his hands, even though he doesn’t realize it right now.  He can either nod and say “Yessir, right on that, sir” and make his manager feel temporarily happy about his decision to hire John, or he can ask a few questions that might make his manager feel a bit uncomfortable in the short term, but will save the company thousands of dollars in the long term.  Let’s see how John handles it.

Time restarts. John turns to his Manager.

John:  Gee, Mike.  That’s a lot to absorb.  I have a few questions to ask before I commit to a time estimate.  Mind if we sit down and talk through the project together?

Mike: (uncertain) Sure?  I think we can do that.

Angels sing, bells ring, beautiful and intelligent women begin to notice John’s unconventional good looks and charming, yet adorably shy, personality

 

Friends, this is your moment to shine.  When faced with a situation like this, it’s up to you to do your own version of Sneaky Discovery and make sure that you’re not diving into the shallow end head-first.

When John gets a chance to talk to his manager, here are a few basic questions he should be asking:

1)      Why are we doing this?  What is the problem we’re trying to solve here?  (Problem Statement)

2)      What’s the solution that we’re going with?  Is it the only one available?  Why did we pick this one? (Solution Statement)

3)      What are the things that we’re going to do, and how does each of them help solve the problem we’re fixing?  What are we NOT doing? (Scope)

4)      Any assumptions about technology direction, complexity, or anything else that led us to the next Friday deadline? (Assumptions)

5)      Anything I should know that can cause us to get off track? (Risks)

6)      How long did you assume this project would take?  Have you made any commitments about delivery?  (High-level time)

7)      Who else is working on this? Do we have a dedicated development environment? (High-level resources)

8)      BONUS QUESTIONS #1: Who is impacted by this change?  Is it okay if I check in with them?

9)      BONUS QUESTIONS #2: How are we testing this?  How are we deploying it?

This conversation will probably take less than an hour for a narrow scope of work.  It does two things:  1) you step into the project with more information than you had initially and, 2) you show your manager that you want to understand the work you’re doing.  Good managers appreciate this.  If you’ve a manager who doesn’t appreciate your questions and desire to understand the work you’re doing, you’ve got a lot to think about, my friend.

Now, the worst case scenario is that John’s manager can’t answer these questions.  What to do then?  In a perfect world, you pull the comically large brake and hit the giant red STOP! button.  Now, it is up to you to decide how to handle the situation.  I’ll tell you how I normally handle stuff like this, because it happens… often.

 

Manager: Um, I don’t really know why we’re doing this.  The CTO asked for it, and I’m here to get it done.  That’s why I’ve been the vice president of middle management for 15 years, damn it!

Audrey:  Okay.  I can do this work.  However, let me lay out some of the risks for you.  We don’t understand why we’re doing it, and we might make poor design decisions because we don’t see the big picture.  Also, without a clear understanding of what problem we’re solving, I don’t know how to tell you that we’re done.

Manager: Okay, works for me.

Audrey:  Can you put that in an e-mail for me?

 

If I feel likely to get burned, I put my concerns/feedback in e-mail.  If I trust my manager to hear me, I tell him in person.  The point is that I never dive into design or development until I’ve either gained broad understanding of the project or at least told someone that I’m working under less than ideal circumstances.

As long as we’re all sharing, let me tell you… we’ve all been there.  We’ve all taken the task, done the work, and delivered something without really knowing why.  It happens.  The trick is to know when you’re dealing with a tough situation and how to mitigate risk to you as well as your team.

You’ll likely find that it is up to you to help flesh out those Assumptions and Risks, and to make suggestions about Scope.  As the in the trenches guy, you’re probably aware of issues that your manager isn’t.  Don’t be shy about sharing that information.  Trust me, ‘tis better to air the dirty laundry at the beginning of the project than to grumble at the end because you “just knew” that this issue was going to come up.

An Aside on Scope

We’ve all heard of it, and if you’ve worked for any amount of time in this business, you’ve lived it.  The dreaded scope creep.  Many times, scope creep happens because discovery was left wanting.  Or, the team isn’t disciplined about sticking to the original scope.  The pain of scope creep is never totally avoided, but it can be minimized.  Foremost, make sure that you have some scope to work with.  Remember how Manager Mike said that he wanted to “speed up Database X”?  Wow, that’s a broad statement.  He also wants to do it pretty quickly.  This means that you probably can’t re-architect the entire system or purchase shiny new hardware to solve this problem.  You might end up with something like this in your scope statement:

Task:  Speed up Database X

In Scope:
1. Tune top 10 worst performing queries
2. Examine index maintenance plan and add key missing indexes
3. Archive old data that is no longer used and remove from primary database
 
Out of Scope:
1. Change logical design of the database
2. Upgrade hardware
3. Rework data access layer

On a highly regulated team, this may end up in a formal document.  If you’re like many developers and things run a little more fast and loose in your shop, there is nothing wrong with shooting off an e-mail or having a conversation about what can be done in the time you have.  You could say, “Mike, we have a week and a half to do this work.  Let’s focus on low-hanging fruit and non-invasive changes.  When we have more time, we can talk about more aggressive changes”.  (Managers love idioms like “low-hanging fruit”)  He might say, “Great!”, or he might come back with, “So if we wanted to address your out of scope items, how long would that take?”  Both are acceptable responses, and you should be ready for them.  Point is, put some boundaries around the work you’re doing, and set expectations about how much can really be done with the time and budget provided.  This allows you to be the guy who never says, “No”, but also protects you from unreasonable requirements.

Final Thoughts

One last thought on discovery within the development cycle.  As developers, we want to know everything up front.  We want details, and we want them now.  Here’s the thing… it ain’t gonna happen.  Take a deep breath, accept the unknowable, and focus your Sneaky Discovery efforts on getting some clarity on why you’re doing what you’re doing, what you’re doing (and not doing), and what might throw the project off.  The rest will shake out during design and development (and possibly testing), and that’s okay.  We’re never going to get it perfectly right, but asking a few of the right questions at the right time can get everyone comfortable with how the rest of the project is going to go.

Now, get out there, get proactive, discover some stuff, and get vocal about how your work fits into the larger picture!  I’m rooting for you!

–Audrey



We are so excited about SQL Saturday 242 on August 10 in Indianapolis, IN!  Why?  Well, glad you asked us.  We’re going to be presenting our first-ever SQL Saturday Pre-Con, and are forever grateful to the wonderful organizers of this event for giving us this opportunity.  We’ve done this training day here in Atlanta before, but there’s just something extra-special about being part of a SQL Saturday.

The Pre-Con is titled, From Here to BI: Data Warehousing from Start to Finish, and we think it is going to be a lot of fun.  We’re most excited that we get to spend a day talking about topics that we love, including database design, ETL, and all the cool concepts and tools that go along with building a data warehouse and moving data around.  Our goal is to show that SSIS isn’t scary, and neither is dimensional modeling.  With practical examples and real-world scenarios, our audience will hopefully walk away with great ideas that they can take to their own teams on Monday morning.

We were discussing all of the reasons why we’re excited about Indy, so we decided that what we really needed was a list (we’re nothing if not organized).  So, without further ado, our Top Ten Reasons Julie and Audrey are excited about presenting a Pre-Con at SQL Saturday 242:

 

10:  David Letterman and Kurt Vonnegut are both Indianapolis natives.  It happens that Julie has something of a crush on Dave.  Audrey would have a crush on Kurt, but he died in 2007 so that’s kind of creepy.  Let’s just say she’s an admirer.

9:  Audrey gets to talk about Dimensional Modeling, which is one of her favorite topics.  Rumor has it that she has a Ralph Kimball poster hanging in her closet, but no one’s been able to verify it yet.

8:  Julie gets to talk about ETL Frameworks, a key design concept when moving data, especially in complex integration environments.  The extra time allows her to dig into the topic and discuss best practices and lessons learned.

7:  If the SQL Saturday organizers (Hope Foley (B|T), Eddie Wuerch (B|T)) are any indication, Indiana is populated by the coolest people on the planet, and we can’t wait to meet them.   Plus, Audrey gets to finally mark Indiana off of her list of states to visit.

6:  We get to discuss Master Data Services (MDS) and Data Quality Services (DQS), two products that we’re actually using at client sites and want to share with the world. Julie has been doing a lot of research and writing on DQS, so expect a great discussion about the tool!

5:  Julie gets to preach the business key gospel to a whole new audience.  By now, everyone in the South Eastern U.S. has heard her story, and it is time to take it to the Midwest.  Audrey asks that someone in the audience suggest that business keys aren’t that important in ETL development.  She promises that you’ll all be entertained by Julie’s response.

4:  We get to hang out with Bill Pearson (B|T), who is presenting his own Pre-Con, called Practical Self-Service BI with PowerPivot for Excel.  We contend that he has the best hair in the SQL Server community.  Don’t believe us?  Come see for yourself!

3:  We hear there’s this thing in Indiana called a Tenderloin Sandwich.  Deep fried pork loin on a bun?  Sign these Southern Girls up!

2:  We get to talk end-to-end development, which is almost impossible to do in a one-hour session.  There are so many great topics to discuss, and we’re super excited that we get a full day to cover them!

1:  And the number one reason we’re excited about SQL Saturday 242…  Road Trip!

tl

 

So there you have it, just 10 of the myriad reasons that we’re thrilled to present a Pre-Con at SQL Saturday 242.  Sign up before July 12 to get the Early Bird price.  Hope to see you there!

–Julie and Audrey



from Julie Smith

Saturday I was at Innovative Architects with several of my colleagues prepping labs for an exciting event being held April 27th. It’s the Global Windows Azure Bootcamp and it’s a very cool idea and thing that Microsoft is doing.  There will be sessions all over the world (see below map) where Microsoft will be hosting free training on Azure.  While we are all training, we will also each deploy a processor to a Global Render Farm. Here is a cute video explaining what a render farm is:  http://www.youtube.com/watch?v=4VUWrZRCtI8 (this video is a plug for renderfarm.fi , which is a volunteer/free service, NOT affiliated with Microsoft, I just thought it offered a cute and good explanation of what a render farm is.)  Here (link) is another video showing creator Alan Smith explaining the render farm he created for Azure.

The session in Atlanta will be hosted by two of my colleagues from Innovative Architects, Vaishali Shah and Rob Bramhill.  They have devoted weeks of time to prepping the day and their hard work was very evident yesterday as we were doing final preparations.

Apparently penguins are participating from Antarctica.

Apparently penguins are participating from Antarctica.

I look forward to working more with Azure, specifically creating HDInsight clusters to apply the same type of scalability to data as this Render Farm experiment applies to processing animations.



{March 15, 2013}   A Giant Thank You, and a Link

First of all, thanks so much to everyone who supported, organized, and attended the Day of Data Warehousing fundraiser yesterday. I am humbled and honored to have had a chance to spend the day with all of you. Couldn’t have asked for a better group to spend a Thursday with!

As promised, here is the link to my slides, demos, databases, and documentation: http://sdrv.ms/16tszZZ

If you have any questions, comments, feedback, or just want to say hi, please get in touch!

Personal E-Mail: audreydhammonds@gmail.com
Work E-Mail: audrey.hammonds@innovativearchitect.com
Twitter: @DataAudrey

Model on, my friends…

–Audrey



et cetera