Adventures in MDX – Sets

Oy, Audrey has violated the Book of Bloggering! She failed to post on her designated Tuesday, and fellow Datachix Julie had to step in with back-to-back posts. (Reason #81 why I heart her.) In my defense, dear readers, I’m neck deep in work at a new client. And while the Book of Bloggering dictates the alternating schedule, the Book of BI Consulting Chapter 17, Verse 12 says: “Thou shalt keep thy client happy at all costs. Regardless of disgruntled users, tight deadlines, or processes in need of improvement, the BI Consultant shall deliver, and deliver well.” As much as I love to write blog posts, my Wine and Kindle budgets demand a regular paycheck.

So, with no further ado, I present to you my long-overdue post. It’ll be the second in what will hopefully become an MDX series. Today, we’ll be taking a look at Sets. Building upon the last post, Adventures in MDX – Tuples, we’re still getting a handle on the structure and concepts around querying a cube. But, I pinky-promise you, we WILL eventually begin writing some pretty darn cool queries.

As I’ve mentioned before, I’m just not good at MDX. No excuses… I’m just not. Rather than jumping in and memorizing functions and complex structures, I’m trying to train my over-saturated brain to comprehend how the data is structured, and subsequently, how to get said data out of the cube and into a result set. The first step was to understand Tuples. A quick recap: A Tuple is a data point – the intersection of all of the dimensions at a certain place. Imagine you have a nice, big, freshly baked cake. Maybe a chocolate layer cake with chocolate ganache. I’m just spit-balling… choose any flavor you like. This is our cube proxy. Anyway, stick a toothpick into that cake. The spot where the point of the toothpick stops: Tuple. It’s a single point in our cake, the intersection of eggs, flour, chocolate, sugar, butter, etc.

Now, take a knife and cut into the cake. You’ve defined a set. It’s a collection of Tuples. Cool, huh? Now, stop cutting! I have to get through a basic set before we cut (SELECT) a whole slice of data out of that decadent, delicious cake. As I mentioned before, I’m using the Microsoft SQL Server 2008 MDX Step-by-Step book as my primary resource. Much credit to these guys for their excellent tome.

The best way to illustrate a SET is to build up a SELECT statement in MDX. So, that’s just what we’re going to do.

Defining a set gives you a lot of power over the way that your result is presented to you as well as what’s included in it. In most cases, you’re going to define what’s on COLUMNS and ROWS, the first two of 128 possible axes you can define. I’d love to talk to the sadistic you-know-what at Microsoft that thought it would be funny to try to make my brain fry by encouraging me to even attempt to visualize how a result set would be presented on 128 different axes. It’s okay, though, somebody probably failed to tell him that SSMS only allows you to return two axes in a result set. If you try to define a third (PAGES, for those of you keeping track at home), you’ll get an error message instead of results. Ha! Take that, Mr. Microsoft Over-Achiever!

Anyway, let’s start building us a SELECT statement in Management Studio. First, make it as basic as possible:

SELECT
FROM [Adventure Works];

You get this:

Um, okay, that’s nice. 80 million dollars. That tells me… nothing useful. But, we have a syntactically correct MDX query, so I’m not complaining.

TANGENT:

By the way, what does that ~80 million represent? Reseller Sales Amount. Why? Because it’s the default measure for the Adventure Works cube. How do we know? Open up BIDS. Open the Analysis Services Database, Adventure Works. Open the Adventure Works cube, and go to the Cube Structure tab. Right-click on the Adventure Works cube in the Measures section (top-left corner) and select Properties. There’s a DefaultMeasure property. It says Reseller Sales Amount. There you go.
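
If you’d rather not dig through BIDS, here’s a quick sketch that should get the cube to name its own default measure. It leans on the standard DefaultMember function applied to the Measures dimension. I’m assuming the stock Adventure Works sample cube here, so the header you see may differ if somebody has changed the default.

```mdx
-- Put the cube's default measure on COLUMNS so its name shows up as the column header.
-- Assumption: the stock Adventure Works sample cube.
SELECT
{
	 ([Measures].DefaultMember)
} ON COLUMNS
FROM [Adventure Works];
```

If the default is what the Properties window says it is, the column header should read Reseller Sales Amount, sitting on top of that same ~80 million.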

END TANGENT

But, we can do better. Let’s define a set that will give us column headers:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
} ON COLUMNS
FROM [Adventure Works];

That thing up there in the SELECT clause? A SET! Note that it’s enclosed in curly brackets ({}). Yes, I know they’re called braces. I call them curly brackets. More descriptive. Also note that each thing between the commas is a Tuple. Therefore, Collection of Tuples! There is an important, nay, vital rule that is being followed here: When I explicitly name a dimension in my tuple, each of the tuples in the set references the same hierarchy. Now, I don’t have to define the SAME LEVEL of the hierarchy in all of my tuples. I can do something like this:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
	,([Sales Territory].[Sales Territory Country])
} ON COLUMNS
FROM [Adventure Works];

Cool. But, I can’t reference two different hierarchies from the Sales Territory dimension in one set. Check this out:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
	,([Sales Territory].[Sales Territory Region].[Northeast])
} ON COLUMNS
FROM [Adventure Works];

Ooh, error. Back to the cake analogy… This would be sort of like starting to cut into the cake, and then picking up the knife and stabbing it into another part of the cake. You wouldn’t expect a clean slice, and the same goes for the query. It just doesn’t know how to pull this data back. By the same token, I can’t reference two different dimensions in one set either. Really, why would you do this to your lovely chocolate ganache, anyway?
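
For contrast, here’s a sketch of a set that mixes levels without upsetting the query, because every tuple comes from the same hierarchy. I’m assuming the user hierarchy [Sales Territory].[Sales Territory] with Group, Country, and Region levels, which is how I understand the stock Adventure Works sample to be laid out. If your cube is built differently, swap in your own names.

```mdx
-- All three tuples reference the SAME hierarchy, just at three different levels.
-- Assumption: the [Sales Territory].[Sales Territory] user hierarchy and its
-- Group / Country / Region levels from the stock Adventure Works sample.
SELECT
{
	 ([Sales Territory].[Sales Territory].[Group].[North America])
	,([Sales Territory].[Sales Territory].[Country].[Australia])
	,([Sales Territory].[Sales Territory].[Region].[Northeast])
} ON COLUMNS
FROM [Adventure Works];
```

Same knife, same cut, just a few different depths. One hierarchy, many levels: fine. Two hierarchies: stabbing the cake.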

Okay, there’s more we can do with these column headers that are being returned. We can define a more detailed tuple. Maybe I want to see why people bought products in Australia. Watch this:

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Quality])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Price])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Magazine Advertisement])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Review])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Manufacturer])
} ON COLUMNS
FROM [Adventure Works];

Sweet. Remember the rules from the Tuple episode? When a tuple is defined, every single dimension is actually represented in the query, even if you don’t explicitly name it. It defines the tuple members used according to the Other Three Rules*: Default Member, then (All) Members, then First Member. Before, the Sales Reason dimension was accounted for, but it was using the (All) Members rule because a Default Member isn’t defined. This time around, we’re telling the query exactly which members from the Sales Reason dimension to return, as well as which order to return them in. I could go on. I could define this tuple out to my heart’s content. BUT, there is one big rule to follow: The Set requires that the dimensions are given in the same order in every tuple. The following query will return an error:

SELECT
{
	 ([Sales Reason].[Sales Reason].[Quality], [Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Price])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Magazine Advertisement])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Review])
	,([Sales Territory].[Sales Territory Country].[Australia], [Sales Reason].[Sales Reason].[Manufacturer])
} ON COLUMNS
FROM [Adventure Works];

Again, this query is like stabbing your knife into the cake and expecting to come out with a beautiful slice. MDX likes clean cuts. So, it wants consistently defined tuples. Humor it.
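
To be clear, the rule is about consistency, not about which dimension gets to go first. A minimal sketch, using the same members as the queries above: put Sales Reason first in every tuple, and the query should run just fine.

```mdx
-- Same five tuples as the working query, but with Sales Reason listed first in ALL of them.
-- The order is different from before, yet it is consistent across the set.
SELECT
{
	 ([Sales Reason].[Sales Reason].[Quality], [Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Reason].[Sales Reason].[Price], [Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Reason].[Sales Reason].[Magazine Advertisement], [Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Reason].[Sales Reason].[Review], [Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Reason].[Sales Reason].[Manufacturer], [Sales Territory].[Sales Territory Country].[Australia])
} ON COLUMNS
FROM [Adventure Works];
```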

Okay. Remember how I told you to quit after making the first cut into your cake? Go ahead, make the second cut. I’ll wait…… Oh good, you’re back. Hey, you have a little icing on your chin. Right there. No, there. To the left. There you go, got it. So, you cut twice (asked for two Sets) and ended up with a nice piece of cake (Data) didn’t you? Awesome. Let’s continue to wring the life out of this analogy and look at the MDX.

SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
} ON COLUMNS
,
{
	 ([Date].[Calendar Year].[CY 2005])
	,([Date].[Calendar Year].[CY 2006])
	,([Date].[Calendar Year].[CY 2007])
} ON ROWS
FROM [Adventure Works];

Okay, so what’s this doing? Well, it’s saying, “Hey, MDX, I want you to go out and find the Reseller Sales Amount. Then, I want you to break it down for me. I want column headers that show the Countries I’ve specified. Then, I want row headers that show the years 2005 – 2007. Finally, I want the portion of the overall Reseller Sales Amount in a cell at the intersection of the Country and the Year.”
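
The “go out and find the Reseller Sales Amount” part is implied, courtesy of the default measure. If you’d rather spell it out, one way is to name the measure in a WHERE clause. That’s getting a little ahead of this post, so treat the sketch below as a preview rather than the official way to do it.

```mdx
-- Same two sets as above, with the measure named explicitly instead of relying on the default.
SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Australia])
	,([Sales Territory].[Sales Territory Country].[Canada])
	,([Sales Territory].[Sales Territory Country].[Germany])
	,([Sales Territory].[Sales Territory Country].[United Kingdom])
	,([Sales Territory].[Sales Territory Country].[United States])
} ON COLUMNS
,
{
	 ([Date].[Calendar Year].[CY 2005])
	,([Date].[Calendar Year].[CY 2006])
	,([Date].[Calendar Year].[CY 2007])
} ON ROWS
FROM [Adventure Works]
WHERE ([Measures].[Reseller Sales Amount]);
```

As long as Reseller Sales Amount really is the default measure, this should hand back the same numbers as the version without the WHERE clause.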

I said that we weren’t going to get into functions yet, but I do have one little thing I want to close with. The Members function. This is sort of like the “SELECT *” of MDX. You can tag a “.Members” onto the end of a [Dimension].[Hierarchy].[Level] reference (or even a [Dimension].[Hierarchy] reference), and it hands back a set of every member it finds there. (Strictly speaking, that makes the parentheses around it in the query below just grouping, not a tuple.) I’m going to re-write the COLUMNS set to return pretty much the same data, but with less carpal-tunnel syndrome.

SELECT
{
	 ([Sales Territory].[Sales Territory Country].Members)
} ON COLUMNS
,
{
	 ([Date].[Calendar Year].[CY 2005])
	,([Date].[Calendar Year].[CY 2006])
	,([Date].[Calendar Year].[CY 2007])
} ON ROWS
FROM [Adventure Works];

Check that out. It even gives us members we didn’t know to ask for, including an (All) Members summary. This function is great for a couple of reasons: 1) You don’t have to type so much. 2) If you don’t know all of the hierarchy members, you don’t have to go look them up.  And, if the members change down the road, you’re not slogging through MDX queries manually updating them. 
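
If you want the countries without the (All) summary column, one option is to point .Members at the level instead of the hierarchy. A sketch, assuming the attribute hierarchy exposes a level with the same name as the attribute (which, as far as I know, is how the stock Adventure Works cube is set up):

```mdx
-- [Dimension].[Hierarchy].[Level].Members returns only the members of that level,
-- so the (All) member of the hierarchy is left out.
-- Assumption: the level is named [Sales Territory Country], same as the attribute hierarchy.
SELECT
{
	 ([Sales Territory].[Sales Territory Country].[Sales Territory Country].Members)
} ON COLUMNS
,
{
	 ([Date].[Calendar Year].[CY 2005])
	,([Date].[Calendar Year].[CY 2006])
	,([Date].[Calendar Year].[CY 2007])
} ON ROWS
FROM [Adventure Works];
```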

Alright, so maybe we’re not to awesome-ness yet, but you have to admit, not too shabby. There are about a bazillion other things you can do with these sets, and we’ll get to them. But for now, let’s take a break and enjoy the lovely piece of cake… er, data we’ve created.

Query on, my friends.

*Other Three Rules because the Three Rules are strictly reserved for references to Isaac Asimov’s I, Robot and the Foundation series. If you’ve only seen the movie, for the love of all that is good in this world, go read the books. While you’re at it, go read Starship Troopers by Robert Heinlein. That book got the short end of the movie stick too. Seriously, Denise Richards? Denise Richards?!? They should have made her shave her head to stay true to the story.
