#Meme15 – Why do you blog?

I have a confession to make… Blogging is hard. Staring at a blank page, cursor blinking in that impatient, foot-tapping way, crappy blog ideas spewing from my desperate mind like oil at Spindletop circa 1901.

Pictured: Audrey having crap ideas

“I know… I’ll blog about my cat. I’ll, um, equate my cat’s love of shiny things to my love of foreign key constraints. You know, because foreign keys are shiny. Wait? What? I’m an idiot.” So, when Jason Strate (B | T) proposed the idea of #meme15, I was all for it. Tell me what to blog about? Save me from myself? Sign me up!

This week’s questions are:

1) Why did you start blogging?

2) Why do you currently blog?

Question 1: Why did you start blogging?

Let’s go way back to January 2010. Julie Smith (B | T) and I were talking and we said, “Hey! Let’s do a blog! It’ll be fun! We can write funny stories, be irreverent, and amuse ourselves with our oodles of witty”. I’m pretty sure there was wine involved. We didn’t think much about networking or career development or even education. We just thought it would be fun to do. Some technical blogs can be, shall we say… dry. We name no names. We both like the idea of making data fun. “Hey”, we thought, “we crack ourselves up regularly. Maybe we can crack someone else up”. That’s it. Almost 2 years later, we’re still blogging. We love it. And our reasons for it have evolved. Which brings us to…

Question 2: Why do you currently blog?

Easy. I can name all my reasons in three words: Me. You. Us. Oh, you want details? Gosh, you’re demanding. Okay, twist my arm. I’ll elaborate.

1) I blog for myself.

This is an important principle to me. I blog because I enjoy it. My first rule of writing is: Amuse Yourself. If something amuses me, I like to share the fun. I think that as humans, we’re all storytellers. We want to know that there are people willing to listen to our stories, and what is a blog post but a story? Sure, it might not be Shakespeare, but it’s still my story. It could be called narcissism to say that I want people to hear what I have to say, and that’s okay. Anyone who tells you that they’re 100% altruistic is probably trying to recruit you into their cult. Don’t believe them. They’re going to make you wear burlap robes and ugly running shoes and refer to their leader as Supreme Ultimate Bob. And burlap is so 1990’s.

Trust me. I'm only thinking of you.

Point is, on the day that I don’t get a rush from hitting “Publish” on a post is the day I stop blogging.

p.s. That’s my Gavin, showing off his new winking skills. Watch out ladies!

2) I blog for you.

It’s sappy. I know. But it’s true. Absolutely NOTHING in my professional life makes me happier than having someone tell me that a blog post I wrote (or a presentation I did) helped them do their job better. The idea that something I put out there made somebody else look smart… that’s so cool.

Dramatic reenactment: Me hugging our readers

Photo courtesy of Stuart Miles http://www.freedigitalphotos.net/images/view_photog.php?photogid=2664

This is part of the evolution of blogging. When we started, I honestly didn’t think anyone was paying attention. Then, I had a few people come up to me and say that they’d read my post on Random Topic X, and that they had a question or that they’d applied my solution. It was a little scary to realize that people occasionally pay attention to my ramblings. I realized that I had a responsibility to be as correct as I knew how, and to continue to share what I learn. And it is so SATISFYING to share information. Try it. You’ll like it.

3) I blog for us.

By us, I mean the SQL Community. A community like ours requires care and feeding. I’ve benefitted in a thousand ways from this loose network of passionate people, and if I’m going to take from the community, I feel an obligation to give something back. If everyone read blog posts but no one ever wrote them, then we’d all be reading the same Microsoft Support articles about Installing SQL Server 6.5 over and over again. Boring. Blogging allows me to contribute to this awesome community while letting my Geek Flag fly.

What? I'm a geek, not an artist.

Plus, I get to have cool conversations with interesting people. Like I said, “Sign me up”.

Blog on, my friends…

–Audrey

SQL University – Entity-Relationship Modeling (Part II)

Hey there! I hope you’ve had a great Friday. Today, we’re wrapping up Data Modeling Week here at SQL University. Many thanks to Jorge Segarra (Blog|Twitter) for giving me the opportunity to cover these topics. Data Modeling is my first love, and I’m thrilled that I’ve had a chance to write about it. There is no way to cover anything but some of the big ideas in three posts, but there is a lot of information out there for you to learn from. I provided some resources in the last post, Entity-Relationship Modeling (Part I), and I think they bear repeating. Jump to the bottom of this post for recommendations on further learning.

Any-who… On Wednesday, we talked about the basics of an Entity-Relationship Diagram (ERD). Entities, Relationships, and Attributes were discussed, and we got the beginnings of an ERD about Country down on paper. If you haven’t read that post yet, you might want to jump over to it and go through that first. (Link in paragraph above) What we’ll talk about here will probably make more sense. Or not. All up to you, my friend.

So what are we talking about today? Refinement! No, not finishing school refinement. Wait… maybe we are. What we’re going to do is polish up the first cut of the ERD by walking through the entities, relationships, and attributes and identifying areas that don’t quite work. Then, we’ll talk about how to tweak those areas to make them walk with their shoulders back and their eyes up, like the prim and proper ERD elements they are.

<Soapbox 1> You will never get any model perfectly right on the first try. It’s okay. The fear of imperfection can create inertia, and that’s why a lot of modeling projects stall and eventually fail. Recognize that you don’t know everything up front, and that your data isn’t perfect (and neither are you). Just do what you can with what you’ve got. Something is almost always better than nothing. </Soapbox 1>

Between Wednesday and today, I went through the entities for the Country model and added them to the ERD. I added attributes and even some inherited information between the entities.

A note on attribute notation. I used underlines and italics to note some different things about the attributes. This isn’t formalized, but it helps me see what’s what. If I’m using a real modeling tool (which I’m not here), a lot of this metadata about the attributes will be handled by the modeling tool GUI.

But, for today, here’s the notation I’m using. I kind of like it, because I’m not dependent on a particular piece of software, but it tells me what’s going on. There’s a not-so-subtle message here: Care less about formal notation and more about consistency and readability. Do what works for you and your organization.

Here’s what I ended up with:

I’ll let you read through the model since you totally rock at reading ERD’s now. Next up, we’ll start addressing issues we see.

MANY-TO-MANY RELATIONSHIPS

The first issue we’re going to address is Many-to-Many relationships. In the real world, these kinds of relationships happen all the time. For example, COFFEE is drunk by many PEOPLE, and PEOPLE drink many cups of COFFEE. However, in a database, Many-to-Many relationships can create all kinds of problems. Trying to query data like this can be a bit of a nightmare, so we resolve the problem with what is called an Intersect Entity. It is just what it sounds like: An entity that intersects the relationship between two entities. We take that Many-to-Many relationship and turn it into two One-to-Many relationships. You know what? It’ll make more sense if I just show you:

This is actually very simple:

1) Create a new Entity, and give it a name that is the concatenation of the names of the entities you’re intersecting

2) Split up that Many-to-Many relationship into two One-to-Many relationships, with the Many side being on the Intersect Entity. Now I have two sets of verb phrases:

“LANGUAGE may be the communication method for one or more COUNTRY LANGUAGE, and COUNTRY LANGUAGE must be a type of LANGUAGE”

“COUNTRY must communicate via one or more COUNTRY LANGUAGE, and COUNTRY LANGUAGE must be the communication method for a COUNTRY”

One thing that you’ll want to make sure you adjust in the relationships: The optionality of the intersect entity needs to be mandatory, because it’s all inherited information.

3) Allow the Intersect Entity to inherit the key from the two entities. The combination of these two becomes the key for the Intersect Entity. (Note: A key made up of more than one attribute is known as a composite key)

4) Finally, add non-key attributes to the Intersect Entity. Now, this is very, very cool. Remember on Wednesday when we talked about Attribute rules? One of them was “Attributes must be atomic”. The example provided was about the percentage of people who speak a language within a country. An attribute like “English-60%” is not valid. However, we can now take advantage of that Intersect Entity, and we have a perfect place to put that information. Percentage is only one thing (atomic), and it helps to define the combination of COUNTRY and LANGUAGE.

We have one other Many-to-Many to resolve: CONTINENT and COUNTRY. Trivia Question: What countries are transcontinental? Answer: Egypt, Russia, Turkey, Kazakhstan, Azerbaijan, and Georgia. Keep that one in your back pocket for your next team trivia night. 🙂

INHERITED ATTRIBUTES

Classic Inheritance

When there is a relationship between two entities, the child entity (the one on the Many side of a One-to-Many relationship) will inherit the key of the parent entity (the one on the One side of a One-to-Many relationship). If you’re familiar with physical database implementation, these are your foreign keys. Thinking in the context of an ERD, there is a very important principle at play:

An Entity owns its data.

An entity is the one and only place for information about that particular thing. I don’t want information in more than one place. Let’s look at COUNTRY and GOVERNMENT TYPE:

I have an inherited key of Government Type Name. I also have an attribute in COUNTRY of Government Type. Because I’ve inherited the key of the GOVERNMENT TYPE entity, I don’t need the Government Type attribute any more. The advantage of allowing entities to inherit attributes like this is that I have a single place to store Government Type information, and will ensure that I don’t get variations on the descriptors later on. We’ll remove the Government Type attribute from COUNTRY.

The Aggregation Question

Another attribute that we should look at is Population. In my original ERD, I have Population in multiple entities: CONTINENT, COUNTRY, STATE/PROVINCE, and CITY. This is a tough one, and is going to require a decision on your part. Here’s what you need to ask yourself: Can the Population of a CITY aggregate up to a STATE/PROVINCE Population, then up to COUNTRY, then up to CONTINENT? Ideally, yes. In practice? Maybe not. Not only do you need to think about the principles of data modeling here, you need to think about your data quality and your business.

From a purist standpoint, we should store this attribute at the lowest level we can (CITY), and depend on the relationships between the entities to allow us to calculate Population at a higher level (STATE/PROVINCE, COUNTRY, or CONTINENT). From a pragmatic standpoint, we don’t really know if the entire population of a continent lives in defined cities. Also, are we looking for estimates or hard numbers?

The downside to storing an attribute in multiple places is that now I’m going to have to maintain that attribute multiple times. Let’s say I increase the Population of Atlanta by 5,000. Now I have to go through and increase the Population of Georgia by 5,000, the Population of the United States by 5,000, and the Population of North America by 5,000. That’s a bit of a nightmare. Conversely, what happens if I get an update to the population of North America? How do I push that data down to the lower levels? Allocation rules? Let me tell you something about allocation rules… they aren’t fun. I still get the shakes when I think of a system I worked on that had to manage financial allocations.

So, all that being said, be a purist and approach this question optimistically. If you’re not sure, assume you can define Population at the CITY level and aggregate up as needed. If someone proves you wrong, ask yourself if you really need Population at every level. Remember, you can always change it later, and use physical implementation to manage your data. Let the structure of the ERD work for you and keep things as simple as possible for as long as possible. We’re going to remove Population from STATE/PROVINCE, COUNTRY, and CONTINENT.

I could make the same argument about Area (as in the measurement width X height). But looking at the business rules, how often does the area of a continent change? Especially the “official” measurement. Same goes for COUNTRY and STATE/PROVINCE. While I could theoretically aggregate this data, I don’t see a lot of value in it. Entity-Relationship Modeling isn’t all about following checklists and rules. Sometimes you’re going to have to make judgment calls. Don’t panic. First, you’re pretty smart. Second, you can always change the model if you chose wrong.

Let’s take a quick look at the current state of our model now that we’ve made some tweaks:

Wow, she’s looking pretty good so far. Let’s see what else we can to do polish her up.

ATTRIBUTE GRANULARITY

We touched on attribute granularity a little when talking about Population and Area. We also have another, less obvious, issue. Take a look at STATE/PROVINCE. We have two attributes, Capital City and Largest City. You can definitely make the argument that these attributes help define a STATE/PROVINCE. But, sitting back and thinking about it for a while, we also have a CITY entity. We could also define a CITY as being the Capital City or the Largest City.

Plus, remember that principle we talked about earlier? Entities own their data. Which entity owns information about cities? CITY does. STATE/PROVINCE has a relationship to CITY, but it shouldn’t be in the business of defining information about cities. CITY is quite capable of doing that herself, thankyouverymuch. I want attributes to define business keys at the lowest level possible. Plus, we’ve got a potential data integrity issue here. What if someone fat-fingers the value for Capital City and types “Sacremento” instead of “Sacramento”? And what if I want to know the population of the capital of California? Now I can’t get that information from CITY, because the misspelled name doesn’t exist in my table. We’re going to move Capital City and Largest City out of STATE/PROVINCE and into CITY, and instead of having to store the name of the city again, now we just need some sort of indicator.

IS MY ATTRIBUTE REALLY AN ATTRIBUTE?

The final area of refinement we’re going to discuss is the one you’re most likely to miss. Again, don’t panic. If you miss it, remember that you can always clean it up later. What we want to do is look at each attribute and ask the question, “Is this really an attribute?”

Let’s take the Currency attribute in COUNTRY. At first glance, yeah, a country has a currency. Makes sense. But, let’s dig in a bit more.

1) Does this attribute have meaningful information I’d like to store about it?

Remember the initial definition of an Entity? A thing/object that is significant enough that we need to know information about it. What might we want to know about currency? Well, we might care about exchange rate and stability. I can’t throw Exchange Rate into COUNTRY, because it doesn’t describe the COUNTRY, it describes the CURRENCY. (Attributes should not describe other attributes) Oh, look at that… CURRENCY just got upgraded to ALL CAPS. It’s an entity now.

2) Will the values in this attribute repeat?

Another way to evaluate an attribute is to look ahead to the data that might be stored in it. Looking at Currency again, let’s think about what will end up in this attribute. Many countries have their own currency, so those values wouldn’t repeat. But… Euro-Zone. Most of Europe is now on a single currency. Do I want to store the value “Euro” multiple times in the COUNTRY entity, or do I want to store it one time in a CURRENCY entity and allow the COUNTRY entity to inherit the CURRENCY key? Naysayers will tell you that more entities make a model overly complex. I disagree. Having to manage one value across multiple instances of an entity makes a model complex.

<Soapbox 2> I believe that data modeling is 90% thinking and 10% physical labor. This can be tough for someone who likes Seeing Things Happening to understand. Give yourself time to think. Digest your model, think about it, and then let it stew for a bit. If you’re anything like me, you’ll be in the shower at 6:00 in the morning, and somewhere between shampoo and conditioner, you’ll go, “Hey! Currency isn’t an attribute!” Epiphanies are important, and fight for the time to allow them to happen. </Soapbox 2>

Now we have two good arguments for a CURRENCY entity. Let’s add it to the model:

This has turned into a pretty clean model. We’ve eliminated our Many-to-Many relationships, ensured that attributes live in the correct entities, that attributes really are attributes. I’m pretty happy with where we ended up. As I’ve said a few times, there is so much to learn about Entity-Relationship Modeling that there is no way we could cover it in a single week. But, thank you for sticking it out and reading what I have to say about one of my very favorite topics.

Now get out there and model something! It’s fun, I promise!

–Audrey

As promised: Resources!

Community Members:

  • Karen Lopez (B|T)
  • Louis Davidson (B|T)
  • Thomas LeBlanc (B|T)

A note: Not only do these three people write about data modeling, they also present regularly. Keep an eye on their blogs for great content and information about where you can go watch them teach in person.

User Group:

PASS Data Architecture Virtual Chapter

Books:

Case*Method: Entity Relationship Modelling by Richard Barker

This book is the one I received as course material in my very first database design class in 1997. I still use it. In fact, it’s sitting beside me right now so I can reference it as I work on this post. It’s out of print, but you can still find new and used versions for sale online. First published in 1990, it’s my personal definitive guide to purist modeling. It says Oracle on the cover, and you’re going to see COBOL code inside, but trust me… you’ll learn from this book.

Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, et al

Louis does an amazing job of teaching relational data modeling as well as couching his work in the context of SQL Server. If you’re a SQL Server professional (or want to be), pick up this book.

Data Model Patterns: Conventions of Thought by David C. Hay

Once upon a time, I used to think that all work I did had to be original. Now, I’m older and wiser. Mr. Hay provides an entire book of data model patterns for different industries and types of data. It’s a great reference to see how to do it right, and to get some ideas for your own database.

SQL University – Entity-Relationship Modeling (Part I)

Hey there! Welcome back! Or just Welcome, whichever applies. This week at SQL University (organized by SQLChicken himself, Jorge Segarra) we’re discussing data modeling. On Monday, we talked Conceptual Modeling. Today, we’re going to talk about Entity-Relationship Modeling. Now, fair warning, this is a BIG topic. People spend their entire careers working with ER Diagrams, and there is a plethora of material out there for you to peruse. Also, today I’m going to focus on OLTP, or transactional, data models. There is an entire discipline focused on OLAP, or dimensional data models (data warehousing), but I firmly believe that you’ve got to understand how to build an OLTP model before you tackle an OLAP model. I’m going to hit the high points, hopefully give you a few things to think about, and and direct you to some materials (and people) that can help round out the topic. Obviously, this isn’t a comprehensive list, but it’s a really good starting point.

Resources

Community Members:

  • Karen Lopez (B|T)
  • Louis Davidson (B|T)
  • Thomas LeBlanc (B|T)

A note: Not only do these three people write about data modeling, they also present regularly. Keep an eye on their blogs for great content and information about where you can go watch them teach in person.

User Group:

PASS Data Architecture Virtual Chapter

Books:

Case*Method: Entity Relationship Modelling by Richard Barker

This book is the one I received as course material in my very first database design class in 1997. I still use it. In fact, it’s sitting beside me right now so I can reference it as I work on this post. It’s out of print, but you can still find new and used versions for sale online. First published in 1990, it’s my personal definitive guide to purist modeling. It says Oracle on the cover, and you’re going to see COBOL code inside, but trust me… you’ll learn from this book.

Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, et al

Louis does an amazing job of teaching relational data modeling as well as couching his work in the context of SQL Server. If you’re a SQL Server professional (or want to be), pick up this book.

Data Model Patterns: Conventions of Thought by David C. Hay

Once upon a time, I used to think that all work I did had to be original. Now, I’m older and wiser. Mr. Hay provides an entire book of data model patterns for different industries and types of data. It’s a great reference to see how to do it right, and to get some ideas for your own database.

INTRODUCTION

So, what is an Entity-Relationship Diagram (ERD)? It is: A visual representation of entities, the relationships between entities, and the attributes that describe them.

At the end of the day, an ERD has three major components:

  • Entity – a thing/object that is significant enough that we need to know information about it
  • Relationship – an association between two entities
  • Attribute – something that describes the entity or the state of the entity

At this point, it would be easy to say, “Oh, I get it… tables, foreign keys, and columns!” Resist that urge. There is a lot of value in getting your mind away from technology and database platform for a while and thinking in terms of an ERD. Why? Well, your focus isn’t really on defining a database yet. I mean, we all know how this story ends (spoiler alert: with a database), but let’s let this plot thicken for a bit before we read the back page. What you’re doing at this point is defining the business. If you do it right, someone should be able to look at your ERD of Widgets, Inc. and with a little understanding of notation be able to say, “Ah, I see what they do for a living.”

Also, Entity-Relationship Diagram <> Database. Yes, what you model with an ERD will probably end up in a database, but I think that unless your database is pretty narrow in scope, you’ve probably got multiple ERD’s in a single database. Or conversely, you might have many databases that reflect a single ERD. It’s why Entity-Relationship Diagrams are referred to as “logical models”. You’re not defining the physical structure of the database; you’re defining the logical structure. The physical structure has a whole other set of factors to consider.

ENTITIES

As I mentioned above, an entity is a thing. Looking at Thesaurus.com, there are a few other synonyms for entity that apply in this context: Individual, Item, Material, Something, Stuff

And an entity doesn’t have to be a physical thing. It can be a concept or idea. It’s a NOUN. Remember that if nothing else sinks in today: entities are nouns.

As with all things, there are some guidelines you should follow. Building on the conceptual model we started with the last post, let’s start looking at an Entity-Relationship Diagram that models Country. (The nations, not the music.)

1) An entity is one thing, and that one thing should be defined by only one entity.

The entity COUNTRY is one thing, a single concept. If I follow this rule, I shouldn’t have a second entity named NATION, because that overlaps with COUNTRY.

2) An entity should have a name that represents the type of thing, not an instance of the thing.

This can be a tough one. Basically, your entity name should be high-level enough that it doesn’t limit how you can describe it later through relationships and attributes. For example, my entity is named COUNTRY, not UNITED STATES. COUNTRY is the type of thing, UNITED STATES is an instance of that thing.

3) An entity must be uniquely identifiable.

If my entity isn’t uniquely identifiable, then it isn’t a single thing. It’s multiple things, and in the world of data modeling, trying to cram two things into one thing is a bad thing. As you’re making your list of entities, ask yourself: Is there something about each occurrence of this entity that differentiates it from every other occurrence? That something is your natural key, or business key.

Aside: Next time you see my fellow Datachix Julie Smith, ask her about business keys. Then go get some popcorn and enjoy the rant. Take your time; she’ll still be going strong when you get back. 🙂

4) Entities are singular.

Why? Because they represent one thing! For some reason, there’s a lot of debate about this. Just because your entity has multiple instances of that thing – COUNTRY has United States, Canada, Mexico, etc. – doesn’t mean that you’re talking about Countries. You’re defining an entity that contains instances of COUNTRY. Thus, singular.

Notation

Entities should be represented by a rounded rectangle with the entity name in ALL CAPS inside of it. If it helps, make note of a few instances of the entity. If you know the natural key already, you can note that too.

So, let’s list some entities and occurrences related to this Country ERD:

The next step is to define the relationships between these entities.

VERY IMPORTANT CAVEAT: I’m using this subject area for illustrative purposes. My goal is to keep things as simple as possible, so I know that there are a lot of other things I might do with a Country ERD, and that there are some business rules I’ve oversimplified.

RELATIONSHIPS

What is a relationship? It is: A named, meaningful relationship between two entities.

Each relationship has two sides, and there are characteristics of those sides that need to be defined. If you can’t define the relationship, then you might not have one. It’s okay – there are always other fish in the sea.

1) Name – this is commonly known as a “verb phrase”. Where an entity is a noun, your relationships are verbs. Try to come up with meaningful verb phrases. If you’re stretching and end up with 42 relationships with verb phrases like, “is related to”, then you might be forcing it.

Let’s try one: COUNTRY and STATE/PROVINCE

Verb Phrase: A STATE/PROVINCE is contained within a COUNTRY, and a COUNTRY is made up of STATE/PROVINCE

2) Cardinality (how many) – there are three kinds of relationships: One-to-Many, Many-to-Many, and One-to-One

If we refine our verb phrase, we can add information about cardinality. In this case, we have a One-to-Many relationship between STATE/PROVINCE and COUNTRY

A STATE/PROVINCE is contained within one COUNTRY, and a COUNTRY is made up of one or more STATE/PROVINCE

3) Optionality – this tells us whether there is a mandatory relationship or not. Some entities must be related, others are optional. Let’s refine the verb phrase again:

A STATE/PROVINCE must be contained within one COUNTRY, and a COUNTRY may be made up of one or more STATE/PROVINCE

Notation

There are many different notations, but I prefer what is known as Crow’s Foot notation because it is simple and I’m comfortable with it. But, use what works for you. Just be consistent.

I like this notation because it tells you a lot about the relationship in a very concise way. Reading it can take a little getting used to, but here’s what you do: Starting with the entity and following the relationship line that comes out of it through to the related entity, say…

“[ENTITY NAME] [optionality] [name] [cardinality] [ENTITY NAME]” where optionality is denoted by solid (mandatory) or dashed (optional) line and cardinality is denoted by crow’s foot (many) or a straight-line (one) connector.

Such as, “STATE/PROVINCE must be contained within one COUNTRY”. I know it’s a lot to take in, but practice a little, and it’ll make sense. I promise.

What you’ve defined with a relationship is a business rule. If you’re lucky enough to have business rules in advance, that’ll take you a long way towards defining your relationships.

ATTRIBUTES

Let’s talk attributes. Attributes are a detail that helps to describe the entity or the state of the entity.

If I’m looking at my COUNTRY entity, I might have many ways to describe it and its state

  • Name
  • Population
  • Sovereign?
  • Established Date
  • Government Type
  • Currency
  • National Symbol
  • Language

These are potential attributes for my entity. Why do I say potential? Well, because until we start filling out this model, we’re not really sure whether they fit as attributes or not. Attributes have rules:

1) Attributes should depend on the key.

So far, I’ve defined my natural (or business) key of Country as Country Code. Each attribute needs to be examined and the question asked: Does this attribute define that key and nothing else?

2) Attributes should be atomic.

This means that an attribute serves as one descriptor of the entity. It can’t represent two things. Say I had an attribute called LanguagePercent that contained information like “English-60%” and “Spanish-40%”. This needs to be handled differently. (Come back on Friday when we talk about ERD refinement to see how)

3) Attributes should not repeat.

Taking the Language attribute, let’s think of this in real-world terms: Some countries have a single, official national language (I’m looking at you, France), but others have multiple recognized languages. Belgium has three: Dutch, French, and German. I don’t want to have to add attributes to my entity like Language1, Language2, Language3. There is a solution: LANGUAGE isn’t an attribute after all. It’s an entity, with a relationship to COUNTRY. How about that? We just refined our list of entities.

What’s really cool about this is that COUNTRY and LANGUAGE actually have a Many-to-Many relationship once you break it out and start working through your verb phrase:

“COUNTRY must communicate via one or more LANGUAGE, and a LANGUAGE may be the communication method for one or more COUNTRY”

Many-to-many relationships are valid business/real world relationships, but they don’t fly within a database. On Friday when we talk ERD refinement, we’re going to have to address this.

4) Attributes should not describe other attributes.

What if I had two attributes, National Symbol and Has National Symbol? Well, the information contained in National Symbol is going to be dependent on whether there is one. We don’t want that. Our attributes should only define the entity.

Given this, we have the start of an Entity-Relationship Diagram for Country.

We’re going to stop here for today. For my homework, I’m going to go through the rest of the entities, define the relationships between them, and then define attributes. If you’re feeling frisky, try fleshing out the model on your own. On Friday, we’ll compare notes then start the very important refinement process. This is where we take a first draft of an ERD, and see how we can improve it. Data Modeling is an iterative process. No one creates a perfect model on the first try – that’s why we’ll spend Friday learning about how to methodically refine our ERD.

Thanks for reading, and I hope to see you again on Friday.

–Audrey

VOTE! No Really, Go Vote. Please. (And One Personal Endorsement)

Suffrage, noun. Expression of opinion by means of a ballot. The right of suffrage (which is held to be both a privilege and a duty) means, as commonly interpreted, the right to vote for the man of another man’s choice, and is highly prized. –Ambrose Bierce

If you’ve been a member of PASS (Professional Association for SQL Server) since June 1, 2011, you’re eligible to vote for the PASS Board of Directors this December. Don’t take this lightly! Remember in 7th grade when you voted for that witch Sarah [last name redacted] for Student Council President because she gave you a glitter sticker and a scented pencil? And you ended up with the WORST HOMECOMING DANCE EVER? Yeah, um, neither do I.

Seriously, fellow PASSonians, this is kind of a big deal. If you care about your local user group, SQL Saturday, Virtual Chapters, PASS Summit, or any other awesome thing that PASS does for us, you have your Board of Directors to (partially) thank for it. (Never forget the local volunteers… Never forget.) Last year, the voter turnout was abysmal. Truly. This isn’t some bond issue, referendum-only mid-term election! These are our leaders! Let’s take an hour out of our lives to read up on the candidates, log in to the PASS website, and choose the men and women who will steer this big old ship we call Community for a while.

We’re lucky. Really and truly lucky to have such an amazing slate of candidates. I know most of them personally, and all of them by reputation, and I’ll tell you what, I’d be happy with any combination of these wonderful people as our leaders.

  • Adam Jorgensen (B|T)
  • Rob Farley (B|T)
  • Denise McInerney (B|T)
  • Geoff Hiten (B|T)
  • Kendal Van Dyke (B|T)
  • Sri Sridharan (B|T)

But, I want to shamelessly promote one candidate right now. Geoff Hiten. He’ll be getting my vote this year. Why? Well, let me tell you why.

1) Geoff gives a damn. Really, truly, honestly gives a damn. When he says that he wants to make PASS work for its members, he means it. He and I have had quite a few conversations about this topic, and every single time we talk I walk away more convinced that his motivations are sincere. He loves this community, and he wants to protect it.

2) He knows his stuff. He’s a fairly rare combination of up to speed, technically brilliant, socially aware, and community-involved. He can reconfigure your cluster for you while discussing the presentation you’re working on while simultaneously staying in the loop on the latest in Big Data. And, when you call him for help, he’s there and committed.

3) Geoff quietly promotes other community members. I know; he’s done it for me. So subtly, in fact, that sometimes I haven’t even noticed until afterwards. (Thank you, Geoff)

4) Geoff will tell you what he thinks. Bluntly. He’s not a politician, and getting it right is much more important to him than being perceived as nice. Friends, this is an asset. He cares about getting it right. On the flip side, if you think Geoff is wrong or has missed the mark, you can tell him. He’ll listen. He’ll internalize what he’s heard from you and use it to make things better.

5) And the biggest reason why I’m voting for Geoff for the PASS BoD? Because when he finds out that I wrote this blog post, he’s going to KILL ME. Seriously. He knows where I work. He might show up. He’s going to hate that I did this, and that’s why I love this guy. He’s a great friend, amazing colleague, and I’m proud to say: Vote for Geoff!

Vote on, my friends…

–Audrey

SQL University – Conceptual Modeling

Welcome to SQL University’s Fall 2011 Semester week of Data Modeling.  I’m thrilled to be a part of this great endeavor again, and a giant thank you to the man behind the project, Jorge Segarra (Blog|Twitter).  I hope you enjoyed Julie Smith’s (Blog|Twitter) SSIS Deployment posts last week, and are looking forward to Stacia Misner’s (Blog|Twitter) upcoming week of MDX.  I can’t even tell you how cool it is to be scheduled between these two amazing women.

So… Data Modeling!  Well, not quite yet. I want to take a step back and talk Conceptual  Modeling. Let me explain why. Every good IT person I know is a born problem-solver. It’s one of the reasons we love what we do. We are presented with a problem and get our thrills from finding a solution. But here’s the thing… We’re so enthusiastic about finding a solution that we occasionally jump right past understanding the problem at hand and just start doing stuff. In our world, the results of solutions to undefined problems can vary from the happy accident of an appropriate fix to a snowballing A Simple Plan-esque disaster.

I can assure you, they’re contemplating where to bury the server they just destroyed. It all started with dropping that index.

Continue reading SQL University – Conceptual Modeling

SQL University – SSIS 2012 Deployments, Lesson Two — Part B

Welcome back. Today we’ll finish up this whole series by Executing packages with the SSIS Catalog in SSIS 2012

This is the conclusion of my series on Deploying and Executing Projects and Packages in SSIS 2012. If you are just joining us, you may want to read Lesson One and Lesson Two, Part A.

We’ve deployed the project to a folder in SSIS Catalog, now let’s create Environments and Environment Variables inside the SSIS Catalog:

Continue reading SQL University – SSIS 2012 Deployments, Lesson Two — Part B

SQL University – SSIS 2012 Deployments, Lesson Two — Part A

Lesson Two. Deploying Projects in SSIS 2012 (Today’s class is longer than I expected, we’ll take a break and finish tomorrow.)

Before we get started, a big thank you to Jorge Segarra (blog|Twitter) for inviting me to write for SQL University .


PreRequisite: Lesson One

By the end of this part of the lesson you should understand the following processes/terms.

  1. Building SSIS Projects
  2. Ispac files
  3. The SSIS Catalog
  4. Deploying ispac files

By the end of the second half of the lesson, the following will be covered:

  1. SSIS Catalog Environments
  2. SSIS Catalog Environment Variables.
  3. SSIS Executions
  4. Building Agent jobs with SSIS Catalog projects.

So you’ve got your Project with a Parent package. (by the way, you should designate Parent Packages as Entry Point packages inside a project by right clicking them and selecting “Entry Point Package”). You’ve got some child packages. You’ve got some variables. You’ve got project level parameters for values which should be available to all packages inside a project (like say, a directory for importing/exporting files, a client to process, a business unit to process, etc. ) and you’ve got package parameters for values which need to be passed from parent packages to child packages. You’re ready to Build, Deploy and Execute in SSIS 2012 utilizing all of its new toys features.

Oooo shiny new SSIS Project—Ready for Deployment

Continue reading SQL University – SSIS 2012 Deployments, Lesson Two — Part A