Discovery… Not Just for Explorers

In the scramble to get hands on keyboards in a development shop, one of the most important phases of development often gets short shrift.  That phase is discovery.  I was lucky enough to spend a few weeks with one of our clients working through a formal discovery phase, and was reminded how valuable this time and effort can be.  It also got me thinking… How can any developer, regardless of project role, gather the information that normally comes out of a discovery effort?

Discovery Who? 

So what is Discovery, exactly?  Discovery is the part of development where you answer a few key and seemingly basic questions about the work you think you want to do.  These questions include:  Why? What? Who? When?  There are a few key outputs of a discovery cycle, including:

  • Problem statement – a short, concise explanation of what’s in need of improvement/enhancement/rework
  • Solution statement – a short, concise statement about what’s going to be done about the problem
  • Scope – a list of things that will be done, as well as a list of things that won’t be done
  • Assumptions – any assumptions that have been made about environment, people, technology, complexity, etc. in regards to the project
  • Risks – a list of things that can derail the project, which might include company direction, technology changes, dependent projects, or that “simple” external system that you need to integrate but haven’t had a chance to see yet
  • High-level time estimates – a high level SWAG (Stupid Wild-Ass Guess) at how long it should take to complete the work, or a time window based on other external factors
  • High-level resource needs – a high level list of people (or skill sets) and technology resources (hardware, software) that are needed to complete the project

Notably absent is one thing:  How?  That’s because the how is design, not discovery.  Resist the overwhelming urge to start designing the solution before you fully define the problem.  This is the part that I struggle with every time I’m asked to help out with the discovery phase.

A Developer’s Life

Discovery information will come at you in one of two ways:  either you’ll be asked to answer the above questions and help define the project, or someone might come to you with all or part of this information and expect you to be involved in the subsequent phases of the project.  In both cases, it is absolutely vital that you (Yes, you… the dashing figure reading this post) feel comfortable with the outputs of discovery before you start designing or developing anything.  This is known (in my head) as Sneaky Discovery.   (Credit to Julie for her invention, the Sneaky Proof of Concept… I’ve adapted the concept for Discovery) It ain’t as evil as it sounds, I promise.  Let’s talk about this one for a bit.

I know… it can be uncomfortable to question a project’s intent or setup, especially when you haven’t been asked to.  But bear with me for a moment.  I’m not asking you to put on your fedora w/optional press pass tucked into the band and annoy the ever loving crap out of the people around you while sporting your best Humphrey Bogart voice.  I’m just asking you to take 30-60 minutes and make sure that you feel good about the work you’re about to do.  Let’s look at a dramatic reenactment of a common situation:

 

Scene: John, a diligent data professional is at his desk, working on the latest product release.  Enter Manager (Mike).

Manager: Hey! Bob!  How are you?

John: It’s, um, it’s John.  Not Bob.  I’m good.  What can I do for you?

Mike:  Good, Bob, good.  Glad to hear it.  I just walked out of a Very Important Meeting with Very Important People, and I have a new set of tasks for you.  We’re going to speed up Database X.  Also, I have these 17 new reports that need to be developed… um, somewhere.  Let me check my notes… Ah, there we are!  I’ll e-mail this right over to you so you can get started.  How long do you think this will take?  Can you have it by next Friday?

John takes a deep breath, and time stops.  Enter Mysterious Announcer (MA).

MA:  Right now, John is faced with a career conundrum.  His fate is in his hands, even though he doesn’t realize it right now.  He can either nod and say “Yessir, right on that, sir” and make his manager feel temporarily happy about his decision to hire John, or he can ask a few questions that might make his manager feel a bit uncomfortable in the short term, but will save the company thousands of dollars in the long term.  Let’s see how John handles it.

Time restarts. John turns to his Manager.

John:  Gee, Mike.  That’s a lot to absorb.  I have a few questions to ask before I commit to a time estimate.  Mind if we sit down and talk through the project together?

Mike: (uncertain) Sure?  I think we can do that.

Angels sing, bells ring, beautiful and intelligent women begin to notice John’s unconventional good looks and charming, yet adorably shy, personality

 

Friends, this is your moment to shine.  When faced with a situation like this, it’s up to you to do your own version of Sneaky Discovery and make sure that you’re not diving into the shallow end head-first.

When John gets a chance to talk to his manager, here are a few basic questions he should be asking:

1)      Why are we doing this?  What is the problem we’re trying to solve here?  (Problem Statement)

2)      What’s the solution that we’re going with?  Is it the only one available?  Why did we pick this one? (Solution Statement)

3)      What are the things that we’re going to do, and how does each of them help solve the problem we’re fixing?  What are we NOT doing? (Scope)

4)      Any assumptions about technology direction, complexity, or anything else that led us to the next Friday deadline? (Assumptions)

5)      Anything I should know that can cause us to get off track? (Risks)

6)      How long did you assume this project would take?  Have you made any commitments about delivery?  (High-level time)

7)      Who else is working on this? Do we have a dedicated development environment? (High-level resources)

8)      BONUS QUESTIONS #1: Who is impacted by this change?  Is it okay if I check in with them?

9)      BONUS QUESTIONS #2: How are we testing this?  How are we deploying it?

This conversation will probably take less than an hour for a narrow scope of work.  It does two things:  1) you step into the project with more information than you had initially and, 2) you show your manager that you want to understand the work you’re doing.  Good managers appreciate this.  If you’ve a manager who doesn’t appreciate your questions and desire to understand the work you’re doing, you’ve got a lot to think about, my friend.

Now, the worst case scenario is that John’s manager can’t answer these questions.  What to do then?  In a perfect world, you pull the comically large brake and hit the giant red STOP! button.  Now, it is up to you to decide how to handle the situation.  I’ll tell you how I normally handle stuff like this, because it happens… often.

 

Manager: Um, I don’t really know why we’re doing this.  The CTO asked for it, and I’m here to get it done.  That’s why I’ve been the vice president of middle management for 15 years, damn it!

Audrey:  Okay.  I can do this work.  However, let me lay out some of the risks for you.  We don’t understand why we’re doing it, and we might make poor design decisions because we don’t see the big picture.  Also, without a clear understanding of what problem we’re solving, I don’t know how to tell you that we’re done.

Manager: Okay, works for me.

Audrey:  Can you put that in an e-mail for me?

 

If I feel likely to get burned, I put my concerns/feedback in e-mail.  If I trust my manager to hear me, I tell him in person.  The point is that I never dive into design or development until I’ve either gained broad understanding of the project or at least told someone that I’m working under less than ideal circumstances.

As long as we’re all sharing, let me tell you… we’ve all been there.  We’ve all taken the task, done the work, and delivered something without really knowing why.  It happens.  The trick is to know when you’re dealing with a tough situation and how to mitigate risk to you as well as your team.

You’ll likely find that it is up to you to help flesh out those Assumptions and Risks, and to make suggestions about Scope.  As the in the trenches guy, you’re probably aware of issues that your manager isn’t.  Don’t be shy about sharing that information.  Trust me, ‘tis better to air the dirty laundry at the beginning of the project than to grumble at the end because you “just knew” that this issue was going to come up.

An Aside on Scope

We’ve all heard of it, and if you’ve worked for any amount of time in this business, you’ve lived it.  The dreaded scope creep.  Many times, scope creep happens because discovery was left wanting.  Or, the team isn’t disciplined about sticking to the original scope.  The pain of scope creep is never totally avoided, but it can be minimized.  Foremost, make sure that you have some scope to work with.  Remember how Manager Mike said that he wanted to “speed up Database X”?  Wow, that’s a broad statement.  He also wants to do it pretty quickly.  This means that you probably can’t re-architect the entire system or purchase shiny new hardware to solve this problem.  You might end up with something like this in your scope statement:

Task:  Speed up Database X

In Scope:
1. Tune top 10 worst performing queries
2. Examine index maintenance plan and add key missing indexes
3. Archive old data that is no longer used and remove from primary database
 
Out of Scope:
1. Change logical design of the database
2. Upgrade hardware
3. Rework data access layer

On a highly regulated team, this may end up in a formal document.  If you’re like many developers and things run a little more fast and loose in your shop, there is nothing wrong with shooting off an e-mail or having a conversation about what can be done in the time you have.  You could say, “Mike, we have a week and a half to do this work.  Let’s focus on low-hanging fruit and non-invasive changes.  When we have more time, we can talk about more aggressive changes”.  (Managers love idioms like “low-hanging fruit”)  He might say, “Great!”, or he might come back with, “So if we wanted to address your out of scope items, how long would that take?”  Both are acceptable responses, and you should be ready for them.  Point is, put some boundaries around the work you’re doing, and set expectations about how much can really be done with the time and budget provided.  This allows you to be the guy who never says, “No”, but also protects you from unreasonable requirements.

Final Thoughts

One last thought on discovery within the development cycle.  As developers, we want to know everything up front.  We want details, and we want them now.  Here’s the thing… it ain’t gonna happen.  Take a deep breath, accept the unknowable, and focus your Sneaky Discovery efforts on getting some clarity on why you’re doing what you’re doing, what you’re doing (and not doing), and what might throw the project off.  The rest will shake out during design and development (and possibly testing), and that’s okay.  We’re never going to get it perfectly right, but asking a few of the right questions at the right time can get everyone comfortable with how the rest of the project is going to go.

Now, get out there, get proactive, discover some stuff, and get vocal about how your work fits into the larger picture!  I’m rooting for you!

–Audrey

Where have we been? and, Debugging SSIS Variables Part 3 — The Watch Window

So like all bloggers, Audrey and I made a resolution to blog regularly and frequently back in November of 2010.  We did well for a stretch, but we’ve had many good things happen and the blog has taken a hit.  I went to the inaugural SQL Rally two weeks ago.  I met for the first time a lot of great community people– Karla Landrum, Michael Blizzard, Mike and Karen Rhodes, Bill Graziano, Rick Heiges, Kendra Little, Jeremiah Peschka, Karen Lopez, Timothy Mitchell, Bradley Balls, Tom LaRock, Allen Kinsel,  Mark Broadbent— I know I’m forgetting lots of names.  It was wonderful to meet folks from all over the country and the world who are excited to learn about SQL Server.  I also got to hang with lots of my buddies whom I only get to see at events like these.

Then it was on to Microsoft TechEd which was hosted in our home town of Atlanta, GA.  Audrey and I spent some time at the PASS booth, encouraging folks to attend their local user group meetings.

Then there’s the book.  See Jorge’s post here on the great news about the SQL Bible for 2011.  Audrey and I will probably not be sleeping much in the next few months as we work on the Wiley SQL 2011 Bible.  It’s crazy exciting!

So if we become ever so slightly less vocal here on the ol’ Datachix blog; remember, it’s not you , it’s us!  And now on to our regularly scheduled blog post:

Debugging SSIS Variables Part 3 — The Watch Window

Continue reading

Cage Match I: Anti-Joins

Before we get started on the Cage Match, a quick shout-out to my blogging partner and fellow Datachix, Julie Smith (Blog | Twitter).  She’ll be speaking at SQL Rally (May 11-13).  I’ve seen her slides, and man, are you guys in for a treat.  Attend her session for some SSIS and Datachix-y Awesomeness!

 On to your regularly scheduled blog post…

One of the coolest things about writing and presenting is every time I present, I end up learning something new. Either someone in the audience points out something I hadn’t thought of, or a question spawns a whole new line of thinking. Both scenarios are very cool.

Recently, I presented on EXCEPT. I love this concept, because it is very “natural language-y”. It’s simple and effective, and I use it all the time. I had a few comments, mostly around performance and other patterns that can be used to achieve the same results. As a result of these comments, I decided that it was time for the inaugural Datachix Cage Match. So, without further ado, I present to you….

CAGE MATCH I: ANTI-JOINS!

Anti-Joins are used when we want a result set that represents this idea: Give me everything from one result set that doesn’t exist in another result set.

The Players:

NOT IN (The Old Man) – This is probably the first form of anti-join we all learn. It’s easy to understand, and it works. It’s developed a reputation as a resource-eater, and common wisdom is to avoid it.

SELECT Title, LastName 
FROM dbo.ImportBookONE 
WHERE Title +'--'+ LastName NOT IN  
  (SELECT Title +'--'+ LastName FROM dbo.ImportBookTWO );

LEFT JOIN (The Ninja) – This is the pattern that looks least like an Anti-Join. It’s an outer join coupled with a WHERE clause eliminating any rows that didn’t show results in the right query. It can be confusing if seen in the wild, because it isn’t immediately apparent that it’s an anti-join. It’s a favorite of savvy database developers.

SELECT o.Title, o.LastName 
FROM dbo.ImportBookONE o 
  LEFT JOIN dbo.ImportBookTWO t   
    ON o.Title = t.Title 
      AND o.LastName = t.LastName 
WHERE t.Title IS NULL   
  AND t.LastName IS NULL;

NOT EXISTS (The Existentialist) – NOT EXISTS has been around for a long time, and seems to get overlooked pretty often as an Anti-Join option. It comes right out and asks the age-old question, “Does what I’m looking for even exist”?

SELECT Title, LastName 
FROM dbo.ImportBookONE o 
WHERE NOT EXISTS   
  (SELECT t.Title, t.LastName
  FROM dbo.ImportBookTWO t
  WHERE t.Title = o.Title     
    AND t.LastName = o.LastName );

EXCEPT (The New Kid) – EXCEPT has been available since SQL Server 2005, and is a cousin to UNION. It’s the one I presented on, and is a personal favorite. (But I promise we’ll stick to facts for this cage match)

SELECT Title, LastName 
FROM dbo.ImportBookONE   
  EXCEPT 
SELECT Title, LastName 
FROM dbo.ImportBookTWO;

The Setup:

The cage match will consist of 4 rounds:

Round 1: iHeartBooks Match – I take the EXCEPT I wrote for the presentation I did and rewrite the Anti-Join using all 4 options

Round 2: Mini Match – 4 contestants, 10 and 8 row tables, respectively

Round 3: Big Match – 4 contestants, 332,500 rows in one table, 321,416 rows in the second

Round 4: Main Event Match – 4 contestants, 6,760,676 rows in one table, and 6,250,625 rows in the second, with a 2-part business key thrown in for good measure

The Measurements:

CPU Time – The amount of CPU time per query (in milliseconds)

Reads – The number of logical disk reads per query

Duration – The total amount of processing time for the query

Percent of Batch – The percentage of the batch taken up by the query (when all 4 anti-joins are run in a single batch) – this doesn’t really tell us much besides giving us rough comparative numbers, but it’s interesting so I’m keeping it

ROUND 1: iHeartBooks Match

Using the same database I presented with, I run the variations on Anti-Join. Below are the queries competing in Round 1:

LEFT JOIN NOT EXISTS
SELECT o.Title, o.LastName
FROM dbo.ImportBookONE o
LEFT JOIN dbo.ImportBookTWO t
 ON o.Title = t.Title
  AND o.LastName = t.LastName
WHERE t.Title IS NULL
 AND t.LastName IS NULL;
SELECT Title, LastName
FROM dbo.ImportBookONE o
WHERE  NOT EXISTS
(
SELECT t.Title, t.LastName
FROM dbo.ImportBookTWO t
WHERE t.Title = o.Title
AND t.LastName = o.LastName
);
EXCEPT NOT IN
SELECT Title, LastName
FROM dbo.ImportBookONE
EXCEPT
SELECT Title, LastName
FROM dbo.ImportBookTWO;
SELECT Title, LastName
FROM dbo.ImportBookONE
WHERE Title +'--'+ LastName NOT IN
(
SELECT Title +'--'+ LastName
FROM dbo.ImportBookTWO
);

OUTCOME:

CONTESTANT CPU READS DURATION Percent of Batch
LEFT JOIN 0 51 40 25
NOT EXISTS 15 51 6 25
EXCEPT 0 51 7 25
NOT IN 0 51 0 25

Hmmm… Not very decisive. Every query had the same number of logical reads? That’s no fun! We’re working with very few rows, and we really end up with no discernable difference between each contestant. On to Round 2!

ROUND 2: Mini Match

Now, we’re going to see if this logical reads thing holds with another set of queries. This time, we’ve got 2 very small tables, one with 10 dates and one with 8 dates. The Anti-Join will return 2 rows.

LEFT JOIN NOT EXISTS
SELECT m.CageMatchDate
FROM ExceptCageMatchTiny m
LEFT JOIN ExceptCageMatchTinySub s
ON m.CageMatchDate = s.CageMatchDate
WHERE s.CageMatchDate IS NULL;
SELECT CageMatchDate
FROM ExceptCageMatchTiny m
WHERE NOT EXISTS
(
SELECT CageMatchDate
FROM ExceptCageMatchTinySub s
WHERE m.CageMatchDate = s.CageMatchDate
);
EXCEPT NOT IN
SELECT CageMatchDate
FROM ExceptCageMatchTiny
EXCEPT
SELECT CageMatchDate
FROM ExceptCageMatchTinySub;
SELECT CageMatchDate
FROM ExceptCageMatchTiny
WHERE CageMatchDate NOT IN
  (SELECT CageMatchDate
  FROM ExceptCageMatchTinySub);

OUTCOME:

CONTESTANT CPU READS DURATION Percent of Batch
LEFT JOIN 0 31 5 19
NOT EXISTS 16 31 6 18
EXCEPT 0 31 7 45
NOT IN 15 31 6 18

So what have our first two rounds of the cage match taught us? Well, if you’re working with very small data sets, you probably won’t see much difference in performance. Look at the differential between overall duration – almost nothing. This is a great example of the trap that we as database developers tend to fall into. We work with small sample databases during development, and it allows us to be lazy. Every option looks similar, and we go with what’s familiar instead of what’ll scale. Watch what happens in Rounds 3 & 4. Also, a great example of why only looking at Percent of Batch in SSMS can be misleading… 45% of the batch was EXCEPT, even though CPU time registered 0 and overall Duration was only 7? If you’re interested in seeing a complete picture of what’s happening on a server when you hit F5 in SSMS, take the time to kick off Profiler.

ROUND 3: Big Match

Ah, now the pros are entering the cage! We’re working with over 300,000 rows in each of our two tables, and there might be some blood spilled (finally!).

LEFT JOIN NOT EXISTS
SELECT m.CageMatchDate
FROM ExceptCageMatchMain m
LEFT JOIN ExceptCageMatchSecondary s
ON m.CageMatchDate = s.CageMatchDate
WHERE s.CageMatchDate IS NULL;
SELECT CageMatchDate
FROM ExceptCageMatchMain m
WHERE NOT EXISTS
(
SELECT CageMatchDate
FROM ExceptCageMatchSecondary s
WHERE m.CageMatchDate = s.CageMatchDate
);
EXCEPT NOT IN
SELECT CageMatchDate
FROM ExceptCageMatchMain
EXCEPT
SELECT CageMatchDate
FROM ExceptCageMatchSecondary;
SELECT CageMatchDate
FROM ExceptCageMatchMain
WHERE CageMatchDate NOT IN
 (SELECT CageMatchDate
 FROM ExceptCageMatchSecondary)

OUTCOME:

CONTESTANT CPU READS DURATION Percent of Batch
LEFT JOIN 389 1912 436 7
NOT EXISTS 390 1912 372 7
EXCEPT 624 1912 705 12
NOT IN 1262 1001148 880 74

Ah, the Old Man’s starting to show some wear. Logical reads for NOT IN went through the roof, and if you look at the Execution Plan, you can see why:

Poor guy had to spool out to a temporary table to hold the rows. None of the other contestants ran into this, and it absolutely killed NOT IN. I think he might be on his last leg. Very interesting that LEFT JOIN, NOT EXISTS, and EXCEPT are still showing the same number of logical reads. I was surprised by this.

ROUND 4: The Main Event

Finally, the round you’ve all been waiting for! A chance to see a no holds barred showdown between each of the contestants. So far, NOT IN has fallen behind, but LEFT JOIN, NOT EXISTS, and EXCEPT have both held up well. In this round, we’re working with two tables containing well over 6 million rows each. Just to make it more challenging, we’ve set up a two-part key to evaluate. Let’s see the final queries and results:

LEFT JOIN NOT EXISTS
SELECT m.CageMatchAlpha,  m.CageMatchNumber
FROM ExceptCageMatchGinormous m
LEFT JOIN ExceptCageMatchGinormousSub s
ON m.CageMatchAlpha = s.CageMatchAlpha
AND m.CageMatchNumber = s.CageMatchNumber
WHERE s.CageMatchAlpha IS NULL
AND s.CageMatchNumber IS NULL;
SELECT CageMatchAlpha, CageMatchNumber
FROM ExceptCageMatchGinormous m
WHERE NOT EXISTS
(
SELECT CageMatchAlpha, CageMatchNumber
FROM ExceptCageMatchGinormousSub s
WHERE m.CageMatchAlpha = s.CageMatchAlpha
AND m.CageMatchNumber = s.CageMatchNumber
);
EXCEPT NOT IN
SELECT CageMatchAlpha, CageMatchNumber
FROM ExceptCageMatchGinormous
EXCEPT
SELECT CageMatchAlpha, CageMatchNumber
FROM ExceptCageMatchGinormousSub;
SELECT CageMatchAlpha, CageMatchNumber
FROM ExceptCageMatchGinormous
WHERE CageMatchAlpha+'-'
  +CAST(CageMatchNumber as varchar(10))
NOT IN
 (SELECT CageMatchAlpha+'-'
  +CAST(CageMatchNumber as varchar(10))
 FROM ExceptCageMatchGinormousSub);

OUTCOME:

CONTESTANT CPU READS DURATION Percent of Batch SSMS Run Time
LEFT JOIN 12446 30844 13029 8 0:14
NOT EXISTS 11483 30840 13917 7 0:13
EXCEPT 19456 30848 19988 10 0:20
NOT IN 60216 20363994 116674 76 1:52

Aha! Finally, we see some real difference in performance. The Old Man (NOT IN) has been knocked out cold. But, surprisingly, the other 3 contestants held up surprisingly well. EXCEPT started to show some cracks in the millions-of-rows arena and lagged behind LEFT JOIN and NOT EXISTS. I think I’m going to have to call it a tie between The Ninja and The Existentialist, with The New Kid in a close third.

This post was fun to research and fun to write. But there’s a moral to the story besides “NOT IN = Bad and LEFT JOIN = Good”. If you’re in a development environment, think about the size of your development data set versus what you expect your production data to look like. (Yeah, you heard me… Capacity Planning) Develop using best practices, and avoid lazy pitfalls (Like NOT IN). And most importantly, take the time to try different options. I mean, who hasn’t dreamed of refereeing a cage match?

Query on, my friends….

–Audrey

Five Places to Document Before You Win the Lottery

When a project starts to go sideways and you’re cursing Naive Past You who signed off on the “aggressive” time estimates as you feed quarters into the vending machine in hopes that powdered donuts and honey buns won’t give you a heart attack before you hit the deadline, the last thing you want to talk about is documentation. Non-developers, if you value your life, observe the developer and her surroundings before you ask whether the documentation is updated. If you see any of the following…

  • Pyramid of Diet Coke or Mountain Dew cans
  • Pile of takeout containers (half-eaten because there’s no time)
  • Wild, bloodshot eyes
  • Caffeine-induced shakes
  • Sleeping bag and/or pillow

 …Back away slowly. DO NOT, under any circumstances, ask the developer if the documentation is up to date. You may lose a valuable body part before you can wedge yourself behind the file cabinet.

I love the beginning of a project. It’s full of resolutions and promises about how this time it’s going to be different. A quick run-down of statements I’ve heard (and said):

  1. “We’re building time into the schedule to keep the documentation up to date”
  2. “All of my time estimates will include some buffer for documentation”
  3. “QA will not sign off without documentation”
  4. “We’ve made accurate documentation part of the critical path”
  5. “We’ll document as we go”

What ends up happening? I can sum it up in 4 words: “We’ll fix it later”. But we never do. Then we leave and the next poor soul comes in blind and realizes that the documentation is either non-existent or woefully out of date. He gets to spend the next few weeks being indignant and self-righteous because the documentation is bad. He quickly forgets the undocumented tangle of code he left behind at his last job.

So, what exactly is documentation? Is it a necessary evil, a sacrifice to the development lifecycle gods? Is it yet another bit of unproductive administration work you have to do to check all the boxes on the project plan? Is it some antiquated legacy of waterfall methods? No. It is none of these things. Documentation, when done well, is a conversation between you and the next guy who comes along and looks at your work. You might not be there to explain why you had to do that weird thing in that stored procedure. You might not be able to tell the story behind the funky flat table you wish your name wasn’t associated with. You might not be able to sit down over lunch and talk about the wacky business rules you’re trying to enforce. But, your documentation will be there. More than anything else, this is your legacy.

Am I proposing that you create exhaustive documentation explaining every decision and line of code? No. That’s totally unrealistic, and I believe that it’s what gets us into trouble. Overwhelmed by the goal of perfect documentation, we end up not writing any documentation at all. I fall into the good enough camp in my approach. Even then, it can be a struggle to keep up. In that spirit, I have a few must-haves and guidelines for what and when to document. Also, I’m sort of a function over form kind of girl. If it works, do it. If it’s pretty, well, that’s nice too. Of course, I’m a developer. If you’re a DBA, your list will probably look different. But, I recommend the same basic approach. What are my must-haves and what are my triggers for a bit of explanatory prose? Or, you know, poetry… but if you have time to write documentation in iambic pentameter, you probably need a more challenging job.

Must-Haves

Data Dictionary – If nothing else, get the basics down:

  1. Entities and their purpose
  2. Attributes and their definition
  3. Business names for entities and attributes
  4. Explanation or examples of what data can end up in a column

Source to Target Mapping – if you’re moving data around, make a map

  1. Where does the data come from?
  2. Where is it going?
  3. What are the stops along the way?

Anything Manual – if you have to do anything out of process or manually, write it down

  1. Bob from Accounting asked you to run a “quick little report” for him in 2009. Somehow, you’re still running it every Monday
  2. You manually kick off that pesky job every Thursday evening instead of scheduling it
  3. You had to (gasp!) manually update data in the database

Other Guidelines

In-Line Documentation – (Stored Procedures/Functions/Scripts/etc.)

  1. Set up a nice little template inside any procedure/function/script that tells: Author/Date/Purpose/Major Revisions
  2. If you did something unconventional inside some encapsulated code, make a note of why you went this direction
  3. If some piece of code started out pretty and then grew horns and a tail and bought a pitchfork from Evil Processes R Us, explain what happened
  4. If you’re working around some limitation of the application/database/etc., say how and why

In Case I Win the Lottery Documentation (aka, In Case I Get Hit by a Bus)

    Imagine that you walk out the door this afternoon, and for some wonderful (not getting hit by a bus) reason, you never step foot inside your office again. You never log into the network again, and no one knows how to get in touch with you. What do only you know? Trust me, if you’ve been with an organization for any length of time, I can guarantee that you have some proprietary knowledge. This can be the least formal of all. Just make a list of things you hope someone else knows how to do when you’re sitting in Tahiti sipping fruity drinks, ogling the very cute cabana boy as you ask him to bring you yet another extra towel.In our little SQL Server community, we talk a lot about being part of the community. (I know… totally Meta) Documentation is a great way to be a good citizen inside our world. Chances are someone you know, or someone who knows someone you know, will be coming into an organization after you leave. Helping the next guy ramp up more quickly and avoid your early stumbles is just about the best you can do to support your fellow database professional.

Noir SQL… Or, a Hardboiled Approach to Getting the Job Done

You can tell a lot about my state of mind by the books I’m reading. Lately, it’s Urban Fantasy with a Noir feel to it. Specifically, I’m reading Mike Carey’s Felix Castor series, and I just finished a book by Richard Kadrey called Sandman Slim: A Novel. I love the anti-hero. The protagonist who is gritty and dirty and has a few great scars is my kind of guy. He unapologetically breaks the rules and isn’t all, “it’s more about the journey than the destination.” For him, destination is what matters, no matter now you got there.

Lately, I feel a bit like the scarred anti-hero. I’m doing some things in a production environment that I’m not totally thrilled about, and I wish I could stop the line and do things the “right” way. I want to use SSIS to transform data. I want to encapsulate processes into neat, repeatable, parameterized modules. But, you know what? When there’s a same-day turnaround on a request, you make do. You go a little Noir on your T-SQL, know what I mean?

I want to show you two things that I’ve actually done in the past few weeks. No, given a nice, neat environment, this SQL might never have been written. Am I proud of it? Well, yes. Yes I am. At the end of the day, I got the customer what he needed. Was it pretty? No. I’m cool with that. Being the anti-hero is kind of fun every once in a while.

Fixed-Width Output

I needed to give a guy a text file in fixed-width format. I had a process from my predecessor that just wasn’t working. The file was already late. So here’s what I did. I’m using the AdventureWorks database to show an example.

SELECT
	LEFT((ISNULL(Title,'')+SPACE(50)), 8)+
	LEFT((ISNULL(FirstName,'')+SPACE(100)), 20)+
	LEFT((ISNULL(LastName,'')+SPACE(100)), 30)+
	LEFT((ISNULL(MiddleName,'')+SPACE(100)), 5)+
	LEFT((ISNULL(EmailAddress,'')+SPACE(100)), 35)+
	LEFT((ISNULL(Phone,'')+SPACE(100)), 25)
FROM AdventureWorks.Person.Contact	;

The result:

Paste it into Notepad and see how it looks:

I save the text file and send it on. Pour myself a whiskey, neat, and light up an unfiltered Lucky Strike.  Okay, not really, but you know what I mean. 

A quick run-down:

ISNULL: If any of the values I’m concatenating are NULL, then the entire string will come back as NULL. I wrap all of my columns in ISNULL like so:

ISNULL(Title, ‘’)

This sets the value to an empty string if the value is NULL.

SPACE: This handy little string function will pad the given number of spaces onto the result you return. I want to make sure I end up with enough padded spaces to fill out the fixed-width portion of that column. So, I pad the output:

ISNULL(Title, ‘’)+SPACE(50)

This will give me the output from the Title column, plus 50 spaces.

LEFT: Now, not every value coming out of the database is going to have the exact same number of columns. So, I use the LEFT function to trim it down to the exact length I want. LEFT will take the left-most number of characters you tell it to. If I say,

LEFT((ISNULL(Title,”)+SPACE(50)), 8 )

I’m telling it to give me characters 1-8 that are returned. Since I’ve padded my output with spaces, it’ll be the result from the column, plus as many spaces as I need to pad the output to 8.

Pretty? No. Functional? Yes. Noir SQL? Absolutely.

Remove Unwanted Characters

Next up, I have a source file I use from another department. It comes in Excel format, and includes a phone number. I’m supposed to get something that looks like this: 1112223333. Nice, neat, simple. What do I get? A hodge-podge of phone number formats. I’m looking at something like this:

CREATE TABLE PhoneNumber
(
	PhoneNumber varchar(50)
); 

INSERT INTO PhoneNumber(PhoneNumber)
VALUES
	('1112223333'), ('(111) 222-3333'), ('111-222-3333'), ('111 222 3333'); 	

SELECT PhoneNumber
FROM PhoneNumber

Okay. So I need to clean these numbers up quickly. Destination, not journey, my friends. I’m the anti-hero. I import the data into SQL Server using the Import/Export utility so I can manipulate the data. Then, I run this nifty little REPLACE statement:

SELECT PhoneNumber,
	CASE
	WHEN ISNUMERIC(PhoneNumber) = 0
		THEN REPLACE(
			REPLACE(
				REPLACE(
					REPLACE(PhoneNumber, '-', ''),			--Strip out dashes
				' ', ''),							--Strip out spaces
			')', ''),								--Strip out close parenthesis
		'(', '')									--Strip out open parenthesis
		ELSE PhoneNumber
	END as FormattedPhoneNumber
FROM dbo.PhoneNumber

Check out the results:

Sweet. It’s quick, it’s dirty, and it saved me having to wait on the source data provider to clean things up on his end. I turn the query into an UPDATE statement, and I’ve got clean data to import.  Again, a run-down of the functions:

ISNUMERIC: Tells me whether the value I’m passing is qualifies as a number or not. NOTE: It recognizes hexadecimal as a number, so use carefully. I set up a CASE statement that asks if the value is numeric. If it is, that means I don’t have any characters like “(“, “)”, or “-“ in there. If not, I apply a nested REPLACE to the value.

REPLACE: Replace is awesome. I can say something like this: REPLACE(PhoneNumber, ‘-‘, ‘’). This is saying that if I find a dash, I want to replace it with an empty string. What’s really cool is that I can nest them. So, I can tell it to remove the dashes, then the spaces, then the open parenthesis, and finally the close parenthesis in one statement.

Bottom line: Sometimes things just have to get done. The difference between an anti-hero and a true antagonist is that we anti-heroes know to go back and do things the right way as soon as we get a moment to breathe. In the meantime, don’t apologize for leaving behind a few unmarked graves when you need to get the job done. We’re anti-heroes. We have the scars to prove it.

Debugging SSIS Variables Part Two–The Locals Window

From the book of Bloggering, Chapter 59 Verse 11: Thou shalt humbly accepteth better solutions to thine posits as they appeareth in thine Comments Section.   Yea verily, although thou may fearest change, thou shalt learneth new and better coding in this manner and be pleasing in the sight of thine employer.

I had a blog post several weeks ago about debugging the values of SSIS Variables. I shared the method I have always used, which was simply (actually maybe it’s not simple at all) adding a script task into the control flow and placing a message box in the control flow, displaying the value of the variable. Continue reading

Adventures in MDX – Sets

Oy, Audrey has violated the Book of Bloggering! She failed to post on her designated Tuesday, and fellow Datachix Julie had to step in with back-to-back posts. (Reason #81 why I heart her.) In my defense, dear readers, I’m neck deep in work at a new client. And while the Book of Bloggering dictates the alternating schedule, the Book of BI Consulting Chapter 17, Verse 12 says: “Thou shalt keep thy client happy at all costs. Regardless of disgruntled users, tight deadlines, or processes in need of improvement, the BI Consultant shall deliver, and deliver well.” As much as I love to write blog posts, my Wine and Kindle budgets demand a regular paycheck.

So, with no further ado, I present to you my long-overdue post. It’ll be the second in what will hopefully become an MDX series. Today, we’ll be taking a look at Sets. Building upon the last post, Adventures in MDX – Tuples, we’re still getting a handle on the structure and concepts around querying a cube. But, I pinky-promise you; we WILL eventually begin writing some pretty darn cool queries.

As I’ve mentioned before, I’m just not good at MDX. No excuses… I’m just not. Rather than jumping in and memorizing functions and complex structures, I’m trying to train my over-saturated brain to comprehend how the data is structured, and subsequently, how to get said data out of the cube and into a result set. The first step was to understand Tuples. A quick recap: A Tuple is a data point – the intersection of all of the dimensions at a certain place. Imagine you have a nice, big, freshly baked cake. Maybe a chocolate layer cake with chocolate ganache. I’m just spit-balling… choose any flavor you like. This is our cube proxy. Anyway, stick a toothpick into that cake. The spot where the point of the toothpick stops: Tuple. It’s a single point in our cake, the intersection of eggs, flour, chocolate, sugar, butter, etc.

Now, take a knife and cut into the cake. You’ve defined a set. It’s a collection of Tuples. Cool, huh? Now, stop cutting! I have to get through a basic set before we cut (SELECT) a whole slice of data out of that decadent, delicious cake. As I mentioned before, I’m using the Microsoft SQL Server 2008 MDX Step-by-Step book as my primary resource. Much credit to these guys for their excellent tome.

The best way to illustrate a SET is to build up a SELECT statement in MDX. So, that’s just what we’re going to do.

Defining a set gives you a lot of power over the way that your result is presented to you as well as what’s included in it. In most cases, you’re going to define what’s on COLUMNS and ROWS, the first two of 128 possible axes you can define. I’d love to talk to the sadistic you-know-what at Microsoft that thought it would be funny to try to make my brain fry by encouraging me to even attempt to visualize how a result set would be presented on 128 different axes. It’s okay, though, somebody probably failed to tell him that SSMS only allows you to return two axes in a result set. If you try to define a third, PAGE, for those of you keeping track at home, you’ll get an error message instead of results. Ha! Take that, Mr. Microsoft Over-Achiever!

Anyway, let’s start building us a SELECT statement in Management Studio. First, make it as basic as possible:

SELECT
FROM [Adventure Works];

You get this:

Um, okay, that’s nice. 80 million dollars. That tells me… nothing useful. But, we have a syntactically correct MDX query, so I’m not complaining.

TANGENT:

By the way, what does that ~80 million represent? Reseller Sales Amount. Why? Because it’s the default measure for the Adventure Works cube. How do we know? Open up BIDS. Open the Analysis Services Database, Adventure Works. Open the Adventure Works cube, and go to the Cube Structure tab. Right-click on the Adventure Works cube in the Measures section (top-left corner) and select Properties. There’s a defaultmember property. It says Reseller Sales Amount. There you go.

END TANGENT

But, we can do better. Let’s define a set that will give us column headers:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
} ON COLUMNS
FROM [Adventure Works];

That thing up there in the SELECT clause? A SET! Note that it’s enclosed in curly brackets ({}). Yes, I know they’re called braces. I call them curly brackets. More descriptive. Also note that each thing between the commas is a Tuple. Therefore, Collection of Tuples! There is an important, nay, vital rule that is being followed here: When I explicitly name a dimension in my tuple, each of the tuples in the set references the same hierarchy. Now, I don’t have to define the SAME LEVEL of the hierarchy in all of my tuples. I can do something like this:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
	,([Sales Territory].[Sales Territory Country])
} ON COLUMNS
FROM [Adventure Works];

Cool. But, I can’t reference two different hierarchies from the Sales Territory dimension in one set. Check this out:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
	,([Sales Territory].[Sales Territory Region].[Northeast])
} ON COLUMNS
FROM [Adventure Works];

Ooh, error. Back to the cake analogy… This would be sort of like starting to cut into the cake, and then picking up the knife and stabbing it into another part of the cake. You wouldn’t expect a clean slice, and the same goes for the query. It just doesn’t know how to pull this data back. By the same token, I can’t reference two different hierarchies either. Really, why would you do this to your lovely chocolate ganache, anyway?

Okay, there’s more we can do with these column headers that are being returned. We can define a more detailed tuple. Maybe I want to see why people bought products in Australia. Watch this:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Quality])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Price])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Magazine Advertisement])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Review])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Manufacturer])
} ON COLUMNS
FROM [Adventure Works];

Sweet. Remember the rules from the Tuple episode? When a tuple is defined, every single dimension is actually represented in the query, even if you don’t explicitly name it. It defines the tuple members used according to the Other Three Rules*: Default Member, then (All) Members, then First Member. Before, the Sales Reason dimension was accounted for, but it was using the (All) Members rule because a Default Member isn’t defined. This time around, we’re telling the query exactly which members from the Sales Reason dimension to return, as well as which order to return them in. I could go on. I could define this tuple out to my heart’s content. BUT, there is one big rule to follow: The Set requires that the dimensions are given in the same order in every tuple. The following query will return an error:

SELECT
{
	 ([Sales Reason].[Sales Reason].[Quality], [Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Price])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Magazine Advertisement])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Review])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Manufacturer])
} ON COLUMNS
FROM [Adventure Works];

Again, this query is like stabbing your knife into the cake and expecting to come out with a beautiful slice. MDX likes clean cuts. So, it wants consistently defined tuples. Humor it.

Okay. Remember how I told you to quit after making the first cut into your cake? Go ahead, make the second cut. I’ll wait…… Oh good, you’re back. Hey, you have a little icing on your chin. Right there. No, there. To the left. There you go, got it. So, you cut twice (asked for two Sets) and ended up with a nice piece of cake (Data) didn’t you? Awesome. Let’s continue to wring the life out of this analogy and look at the MDX.

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
} ON COLUMNS
,
{
	 ([Date].[Calendar Year].[CY 2005])
	,([Date].[Calendar Year].[CY 2006])
	,([Date].[Calendar Year].[CY 2007])
} ON ROWS
FROM [Adventure Works];

Okay, so what’s this doing? Well, it’s saying, “Hey, MDX, I want you to go out and find the Reseller Sales Amount. Then, I want you to break it down for me. I want column headers that show the Countries I’ve specified. Then, I want row headers that show the years 2005 – 2007. Finally, I want the portion of the overall Reseller Sales Amount in a cell at the intersection of the Country and the Year.”

I said that we weren’t going to get into functions yet, but I do have one little thing I want to close with. The Members function. This is sort of like the “SELECT *” of MDX. You can tag a “.Members” onto the end of a [Dimension].[Hierarchy].[Level] reference (or even a [Dimension].[Hierarchy] reference) inside a tuple. I’m going to re-write the COLUMN set to return pretty much the same data, but with less carpal-tunnel syndrome.

SELECT
{
	 ([Sales Territory].[Sales Territory Country].Members)
} ON COLUMNS
,
{
	 ([Date].[Calendar Year].[CY 2005])
	,([Date].[Calendar Year].[CY 2006])
	,([Date].[Calendar Year].[CY 2007])
} ON ROWS
FROM [Adventure Works];

Check that out. It even gives us members we didn’t know to ask for, including an (All) Members summary. This function is great for a couple of reasons: 1) You don’t have to type so much. 2) If you don’t know all of the hierarchy members, you don’t have to go look them up.  And, if the members change down the road, you’re not slogging through MDX queries manually updating them. 

Alright, so maybe we’re not to awesome-ness yet, but you have to admit, not too shabby. There are about a bazillion other things you can do with these sets, and we’ll get to them. But for now, let’s take a break and enjoy the lovely piece of cake… er, data we’ve created.

Query on, my friends.

*Other Three Rules because the Three Rules are strictly reserved for references to Isaac Asimov’s I, Robot and the Foundation series. If you’ve only seen he movie, for the love of all that is good in this world, go read the books. While you’re at it, go read Starship Troopers by Robert Heinlein. That book got the short end of the movie stick too. Seriously, Denise Richards? Denise Richards?!? They should have made her shave her head to stay true to the story.

FOR XML a la Mode –a study on Modes, especially using For XML Path Mode to Concatenate Data

Hello folks, Julie here with the as promised weekly Datachix blog—

Hup Hup—did you say just say  “Julie” here?

Yes I did.

You specifically said last week that you and Audrey would blog weekly and ALTERNATELY.  You posted last week. Isn’t it Audrey’s turn?

Yes, but Audrey is fiercely battling Reporting Services at a new client site–she can’t post this week.   Also, I think there were Ninjas…. and maybe some Bear-Sharks……… she’s really really….. busy.

Fine.

 Hello folks, Julie here with the as promised weekly Datachix blog.  😉   Recently a buddy of mine left a comment for us which went like this:

 Can you post an example of putting data from multiple rows into a single column?

 For example you have this:

Cat | meow

Cat | eat

Dog | bark

Dog | fetch

 

And you want to display it as this:

Cat | meow, eat

Dog | bark, fetch

It’s hard to find a good example of this online.

Thanks, Ben

  Continue reading

Debugging SSIS Variables—and a Happy Announcement from the Datachix.

Audrey and I have decided to blog regularly and alternately on Tuesdays. We’ve each had good runs of things and had months where we posted a lot, but we’ve never had great consistency. So now we’ve committed to a schedule.

From the Book of Bloggering Chapter 23, Verse 9: The Datachix shall blog weekly. The day of the week on which they shall post shall be Tuesdays. Mondays will not be the day the postings shall occur, unless it is only to load and schedule a post for the following day–Tuesday. Neither shall Wednesdays be the day of the posting, unless there be a time zone situation where the reader is residing in Wednesday time whereas the Datachix (the said aforementioned poster/bloggers) are residing in Tuesday, and therefore posting their blog. Thursdays are right out, as are Fridays. Don’t even mention Saturdays.

Continue reading

Adventures in MDX – Tuples

Personal Note: I wrote the bulk of this post last night. Before I read Chris Webb’s blog post about the future of SSAS and MDX. I almost didn’t post this after reading that. But you know what? Screw you guys, I’m learning MDX anyway. I hate the idea that Microsoft would potentially remove aspects of the BI stack because the learning curve is too high. If I can’t keep up, then put me out of a job. Don’t dumb-down the functionality. That being said, I think there’s a lot of value in understanding the language that accesses any data store. It forces you to think about the internal structure of what you’re working with, and therefore, I see value in learning MDX either way. (But really… I was so disheartened after reading about some of the coming changes. You MS peeps had better know what you’re doing!) On to the original post:

My favorite movie is My Fair Lady. I love Audrey Hepburn. I love the Pygmalion story. Quick aside: I used to tell people that my parents named me after her. They didn’t, and the true story is convoluted. My mom loved the name Audrey Dalton (my middle name is Dalton), which was the name of a movie star. My great-great grandmother was Dalton Harris, and she thought it would be cool to name me after her and the actress. Then she met my dad. His mom’s name was Audrey. (She went by her middle name, Geraldine, which I never understood… but I digress.) Anyway, when I was born, she told everyone that I was named for my paternal grandmother and my maternal great-great grandmother. When, secretly, I was just named after an actress with a name she liked. I’m glad she told me this. (Audrey Dalton was a total hottie.)

<–  Audrey Dalton, HOTTIE (courtesy of Ballybane Enterprise Centre  http://www.bbec.ie/blog/?p=708)

This week, I’ve decided to start digging into MDX. There are three reasons for this:

1) It’s PASS Summit week. While I’m not there, I’m trying to get into the spirit of things by learning something new.
2) I’m just not good at MDX. There is no excuse for this.
3) I’m gearing up for my MCITP exam in Business Intelligence 2008. I hear rumor that there are MDX questions.

So anyway, I feel a lot like Eliza Doolittle this week. If you’re not familiar with the story, she is the subject of a bet between Henry Higgins and Colonel Pickering. They bet that Prof. Higgins can’t pass her off as a Lady (with a capital “L”) in a year. She’s just a lowly flower girl, complete with cockney accent. In order to refine her, he has to teach her how to speak again. It’s her own language, but she has to learn how to use it in a totally unfamiliar way. Instead of saying, “In ‘artford, ‘ereford, and ‘ampshire, ‘urricanes ‘ardly h-ever ‘appen”, she has to learn to say, “In Hartford, Hereford, and Hampshire, hurricanes hardly ever happen”. (Swear to cheesus, I haven’t hit IMDB yet… I really love this movie) Same words, same meaning, totally different accent.

Rather than a flower girl trying to sound like a Lady, I’m a T-SQL girl trying to sound like an MDX Lady. Or something like that. You know what I mean. 

   <– T-SQL Flower Girl

To get started, I picked up Microsoft SQL Server 2008 MDX Step by Step (by Brian C. Smith, C. Ryan Clay, and Hitachi Consulting). I’m starting with the basics, so right now I’m in “SELECT * FROM” territory. Or, “SELECT FROM ” territory, since we’re talking MDX.

Transitioning from T-SQL to MDX is not easy. The syntax is just familiar enough to me to trip me up. I keep catching myself trying to equate a query against a cube to a query against a relational data store. It’s not the same, and it has been tough for me to wrap my head around it. But, “I washed my face and ‘ands before I come, I did”, so I think I’m ready to get started.

So far, I’ve learned about one important concept: Tuples. The point of this blog post is to force myself to regurgitate what I’ve learned, because to paraphrase something Jen McCown (Blog | Twitter) said the other day, you don’t really know something until you’ve taught it. True that. Please keep reading, but also read a book by an expert. I’ve been happy with the Step by Step book so far.

Wait… one more silly analogy. Writing T-SQL is a bit like cutting out paper dolls. It can be complex, but it’s just two dimensional space. Writing MDX is like chiseling a hole into a big rock at a specific point. It’s n-dimensional space. While a bit goofy, this visualization has really helped me draw a line between T-SQL and MDX.

Tuples (as Translated by Me)

A tuple is basically the identifying characteristics of a cell inside a cube. Really, a data point inside a cube. Say I have three dimensions, Actor, Movie, and Year. Say I have a Measure Group that includes Budget Amount. Say I wanted to find the cell, or data point, identifying the budget for the movie My Fair Lady starring Audrey Hepburn that came out in 1964. I’d look at the attribute-hierarchies Audrey Hepburn, My Fair Lady, and 1964. (Hey, I didn’t say it was a well designed cube!) Those identifying characteristics of the cell are the tuple, which would be formatted something like this in MDX;

(
[Actor].[Audrey Hepburn]
,[Movie].[My Fair Lady]
,[Year].[1964]
,[Measures].[Budget Amount]
)

Another way to look at it is in terms of math. I always swore that geometry and algebra were pointless in high school. Well, Mr. Smith, you were right. I’m about to talk axes. (axises? axii?) Each of my attribue-hierarchies and my measure group make up an axis within my cube. Don’t even try to visualize a 4-dimensional cube. I did, and it made my head hurt when I ran out of 3-dimensional space. Let’s label each axis:

[Actor].[Audrey Hepburn] = a1
[Movie].[My Fair Lady] = a2
[Year].[1964] = a3
[Measures].[Budget Amount] = a4

Now, if I want to identify the point that is the intersection, my notation would look something like this: (a1, a2, a3, a4). I also imagine four lines (in 2-dimensional space, all intersecing one another at one point. That point is my tuple.

The MDX syntax for my query looks like this:

SELECT
FROM [Pretend Movie Cube]
WHERE
(
[Actor].[Audrey Hepburn]
,[Movie].[My Fair Lady]
,[Year].[1964]
,[Measures].[Budget Amount]
);

It would return one value: $17,000,000

Some Key Points:

1) Every attribute-hierarchy gets an axis, NOT every Dimension. So, if I had two attribute-hierarchies within my Actor dimension, Audrey Hepburn and Rex Harrison, they all have an axis. I could actually reference the same Dimension multiple times like so:

(
[Actor].[Audrey Hepburn]
,[Actor].[Rex Harrison] <–Henry Higgins!
,[Movie].[My Fair Lady]
,[Year].[1964]
,[Measures].[Budget Amount]
);

2) Measures each get an axis. They are treated differently at design time, but for the purposes of seeking out that one cell or set of cells, it’s treated just the same as an attribute hierarchy.

3) Analysis Services allows you to be lazy. You can define what’s called a Partial Tuple, leaving out some axis references. But… it’s going to try to figure out where on that missing axis you were headed. It’s going to go in this order:
        1 – Default member (defined at design time)
        2 – (All) member –remember that Measures don’t have an (All) member
        3 – First member

Am I getting this?  Have I missed the boat?  Close, but no cigar?  Any other cliche suggesting I don’t know what I’m talking about?