SQL University – Entity-Relationship Modeling (Part I)

Hey there! Welcome back! Or just Welcome, whichever applies. This week at SQL University (organized by SQLChicken himself, Jorge Segarra) we’re discussing data modeling. On Monday, we talked Conceptual Modeling. Today, we’re going to talk about Entity-Relationship Modeling. Now, fair warning, this is a BIG topic. People spend their entire careers working with ER Diagrams, and there is a plethora of material out there for you to peruse. Also, today I’m going to focus on OLTP, or transactional, data models. There is an entire discipline focused on OLAP, or dimensional data models (data warehousing), but I firmly believe that you’ve got to understand how to build an OLTP model before you tackle an OLAP model. I’m going to hit the high points, hopefully give you a few things to think about, and and direct you to some materials (and people) that can help round out the topic. Obviously, this isn’t a comprehensive list, but it’s a really good starting point.

Resources

Community Members:

  • Karen Lopez (B|T)
  • Louis Davidson (B|T)
  • Thomas LeBlanc (B|T)

A note: Not only do these three people write about data modeling, they also present regularly. Keep an eye on their blogs for great content and information about where you can go watch them teach in person.

User Group:

PASS Data Architecture Virtual Chapter

Books:

Case*Method: Entity Relationship Modelling by Richard Barker

This book is the one I received as course material in my very first database design class in 1997. I still use it. In fact, it’s sitting beside me right now so I can reference it as I work on this post. It’s out of print, but you can still find new and used versions for sale online. First published in 1990, it’s my personal definitive guide to purist modeling. It says Oracle on the cover, and you’re going to see COBOL code inside, but trust me… you’ll learn from this book.

Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, et al

Louis does an amazing job of teaching relational data modeling as well as couching his work in the context of SQL Server. If you’re a SQL Server professional (or want to be), pick up this book.

Data Model Patterns: Conventions of Thought by David C. Hay

Once upon a time, I used to think that all work I did had to be original. Now, I’m older and wiser. Mr. Hay provides an entire book of data model patterns for different industries and types of data. It’s a great reference to see how to do it right, and to get some ideas for your own database.

INTRODUCTION

So, what is an Entity-Relationship Diagram (ERD)? It is: A visual representation of entities, the relationships between entities, and the attributes that describe them.

At the end of the day, an ERD has three major components:

  • Entity – a thing/object that is significant enough that we need to know information about it
  • Relationship – an association between two entities
  • Attribute – something that describes the entity or the state of the entity

At this point, it would be easy to say, “Oh, I get it… tables, foreign keys, and columns!” Resist that urge. There is a lot of value in getting your mind away from technology and database platform for a while and thinking in terms of an ERD. Why? Well, your focus isn’t really on defining a database yet. I mean, we all know how this story ends (spoiler alert: with a database), but let’s let this plot thicken for a bit before we read the back page. What you’re doing at this point is defining the business. If you do it right, someone should be able to look at your ERD of Widgets, Inc. and with a little understanding of notation be able to say, “Ah, I see what they do for a living.”

Also, Entity-Relationship Diagram <> Database. Yes, what you model with an ERD will probably end up in a database, but I think that unless your database is pretty narrow in scope, you’ve probably got multiple ERD’s in a single database. Or conversely, you might have many databases that reflect a single ERD. It’s why Entity-Relationship Diagrams are referred to as “logical models”. You’re not defining the physical structure of the database; you’re defining the logical structure. The physical structure has a whole other set of factors to consider.

ENTITIES

As I mentioned above, an entity is a thing. Looking at Thesaurus.com, there are a few other synonyms for entity that apply in this context: Individual, Item, Material, Something, Stuff

And an entity doesn’t have to be a physical thing. It can be a concept or idea. It’s a NOUN. Remember that if nothing else sinks in today: entities are nouns.

As with all things, there are some guidelines you should follow. Building on the conceptual model we started with the last post, let’s start looking at an Entity-Relationship Diagram that models Country. (The nations, not the music.)

1) An entity is one thing, and that one thing should be defined by only one entity.

The entity COUNTRY is one thing, a single concept. If I follow this rule, I shouldn’t have a second entity named NATION, because that overlaps with COUNTRY.

2) An entity should have a name that represents the type of thing, not an instance of the thing.

This can be a tough one. Basically, your entity name should be high-level enough that it doesn’t limit how you can describe it later through relationships and attributes. For example, my entity is named COUNTRY, not UNITED STATES. COUNTRY is the type of thing, UNITED STATES is an instance of that thing.

3) An entity must be uniquely identifiable.

If my entity isn’t uniquely identifiable, then it isn’t a single thing. It’s multiple things, and in the world of data modeling, trying to cram two things into one thing is a bad thing. As you’re making your list of entities, ask yourself: Is there something about each occurrence of this entity that differentiates it from every other occurrence? That something is your natural key, or business key.

Aside: Next time you see my fellow Datachix Julie Smith, ask her about business keys. Then go get some popcorn and enjoy the rant. Take your time; she’ll still be going strong when you get back. 🙂

4) Entities are singular.

Why? Because they represent one thing! For some reason, there’s a lot of debate about this. Just because your entity has multiple instances of that thing – COUNTRY has United States, Canada, Mexico, etc. – doesn’t mean that you’re talking about Countries. You’re defining an entity that contains instances of COUNTRY. Thus, singular.

Notation

Entities should be represented by a rounded rectangle with the entity name in ALL CAPS inside of it. If it helps, make note of a few instances of the entity. If you know the natural key already, you can note that too.

So, let’s list some entities and occurrences related to this Country ERD:

The next step is to define the relationships between these entities.

VERY IMPORTANT CAVEAT: I’m using this subject area for illustrative purposes. My goal is to keep things as simple as possible, so I know that there are a lot of other things I might do with a Country ERD, and that there are some business rules I’ve oversimplified.

RELATIONSHIPS

What is a relationship? It is: A named, meaningful relationship between two entities.

Each relationship has two sides, and there are characteristics of those sides that need to be defined. If you can’t define the relationship, then you might not have one. It’s okay – there are always other fish in the sea.

1) Name – this is commonly known as a “verb phrase”. Where an entity is a noun, your relationships are verbs. Try to come up with meaningful verb phrases. If you’re stretching and end up with 42 relationships with verb phrases like, “is related to”, then you might be forcing it.

Let’s try one: COUNTRY and STATE/PROVINCE

Verb Phrase: A STATE/PROVINCE is contained within a COUNTRY, and a COUNTRY is made up of STATE/PROVINCE

2) Cardinality (how many) – there are three kinds of relationships: One-to-Many, Many-to-Many, and One-to-One

If we refine our verb phrase, we can add information about cardinality. In this case, we have a One-to-Many relationship between STATE/PROVINCE and COUNTRY

A STATE/PROVINCE is contained within one COUNTRY, and a COUNTRY is made up of one or more STATE/PROVINCE

3) Optionality – this tells us whether there is a mandatory relationship or not. Some entities must be related, others are optional. Let’s refine the verb phrase again:

A STATE/PROVINCE must be contained within one COUNTRY, and a COUNTRY may be made up of one or more STATE/PROVINCE

Notation

There are many different notations, but I prefer what is known as Crow’s Foot notation because it is simple and I’m comfortable with it. But, use what works for you. Just be consistent.

I like this notation because it tells you a lot about the relationship in a very concise way. Reading it can take a little getting used to, but here’s what you do: Starting with the entity and following the relationship line that comes out of it through to the related entity, say…

“[ENTITY NAME] [optionality] [name] [cardinality] [ENTITY NAME]” where optionality is denoted by solid (mandatory) or dashed (optional) line and cardinality is denoted by crow’s foot (many) or a straight-line (one) connector.

Such as, “STATE/PROVINCE must be contained within one COUNTRY”. I know it’s a lot to take in, but practice a little, and it’ll make sense. I promise.

What you’ve defined with a relationship is a business rule. If you’re lucky enough to have business rules in advance, that’ll take you a long way towards defining your relationships.

ATTRIBUTES

Let’s talk attributes. Attributes are a detail that helps to describe the entity or the state of the entity.

If I’m looking at my COUNTRY entity, I might have many ways to describe it and its state

  • Name
  • Population
  • Sovereign?
  • Established Date
  • Government Type
  • Currency
  • National Symbol
  • Language

These are potential attributes for my entity. Why do I say potential? Well, because until we start filling out this model, we’re not really sure whether they fit as attributes or not. Attributes have rules:

1) Attributes should depend on the key.

So far, I’ve defined my natural (or business) key of Country as Country Code. Each attribute needs to be examined and the question asked: Does this attribute define that key and nothing else?

2) Attributes should be atomic.

This means that an attribute serves as one descriptor of the entity. It can’t represent two things. Say I had an attribute called LanguagePercent that contained information like “English-60%” and “Spanish-40%”. This needs to be handled differently. (Come back on Friday when we talk about ERD refinement to see how)

3) Attributes should not repeat.

Taking the Language attribute, let’s think of this in real-world terms: Some countries have a single, official national language (I’m looking at you, France), but others have multiple recognized languages. Belgium has three: Dutch, French, and German. I don’t want to have to add attributes to my entity like Language1, Language2, Language3. There is a solution: LANGUAGE isn’t an attribute after all. It’s an entity, with a relationship to COUNTRY. How about that? We just refined our list of entities.

What’s really cool about this is that COUNTRY and LANGUAGE actually have a Many-to-Many relationship once you break it out and start working through your verb phrase:

“COUNTRY must communicate via one or more LANGUAGE, and a LANGUAGE may be the communication method for one or more COUNTRY”

Many-to-many relationships are valid business/real world relationships, but they don’t fly within a database. On Friday when we talk ERD refinement, we’re going to have to address this.

4) Attributes should not describe other attributes.

What if I had two attributes, National Symbol and Has National Symbol? Well, the information contained in National Symbol is going to be dependent on whether there is one. We don’t want that. Our attributes should only define the entity.

Given this, we have the start of an Entity-Relationship Diagram for Country.

We’re going to stop here for today. For my homework, I’m going to go through the rest of the entities, define the relationships between them, and then define attributes. If you’re feeling frisky, try fleshing out the model on your own. On Friday, we’ll compare notes then start the very important refinement process. This is where we take a first draft of an ERD, and see how we can improve it. Data Modeling is an iterative process. No one creates a perfect model on the first try – that’s why we’ll spend Friday learning about how to methodically refine our ERD.

Thanks for reading, and I hope to see you again on Friday.

–Audrey

SQL University – Conceptual Modeling

Welcome to SQL University’s Fall 2011 Semester week of Data Modeling.  I’m thrilled to be a part of this great endeavor again, and a giant thank you to the man behind the project, Jorge Segarra (Blog|Twitter).  I hope you enjoyed Julie Smith’s (Blog|Twitter) SSIS Deployment posts last week, and are looking forward to Stacia Misner’s (Blog|Twitter) upcoming week of MDX.  I can’t even tell you how cool it is to be scheduled between these two amazing women.

So… Data Modeling!  Well, not quite yet. I want to take a step back and talk Conceptual  Modeling. Let me explain why. Every good IT person I know is a born problem-solver. It’s one of the reasons we love what we do. We are presented with a problem and get our thrills from finding a solution. But here’s the thing… We’re so enthusiastic about finding a solution that we occasionally jump right past understanding the problem at hand and just start doing stuff. In our world, the results of solutions to undefined problems can vary from the happy accident of an appropriate fix to a snowballing A Simple Plan-esque disaster.

I can assure you, they’re contemplating where to bury the server they just destroyed. It all started with dropping that index.

Continue reading SQL University – Conceptual Modeling

SQL University – SSIS 2012 Deployments, Lesson Two — Part B

Welcome back. Today we’ll finish up this whole series by Executing packages with the SSIS Catalog in SSIS 2012

This is the conclusion of my series on Deploying and Executing Projects and Packages in SSIS 2012. If you are just joining us, you may want to read Lesson One and Lesson Two, Part A.

We’ve deployed the project to a folder in SSIS Catalog, now let’s create Environments and Environment Variables inside the SSIS Catalog:

Continue reading SQL University – SSIS 2012 Deployments, Lesson Two — Part B

SQL University – SSIS 2012 Deployments, Lesson Two — Part A

Lesson Two. Deploying Projects in SSIS 2012 (Today’s class is longer than I expected, we’ll take a break and finish tomorrow.)

Before we get started, a big thank you to Jorge Segarra (blog|Twitter) for inviting me to write for SQL University .


PreRequisite: Lesson One

By the end of this part of the lesson you should understand the following processes/terms.

  1. Building SSIS Projects
  2. Ispac files
  3. The SSIS Catalog
  4. Deploying ispac files

By the end of the second half of the lesson, the following will be covered:

  1. SSIS Catalog Environments
  2. SSIS Catalog Environment Variables.
  3. SSIS Executions
  4. Building Agent jobs with SSIS Catalog projects.

So you’ve got your Project with a Parent package. (by the way, you should designate Parent Packages as Entry Point packages inside a project by right clicking them and selecting “Entry Point Package”). You’ve got some child packages. You’ve got some variables. You’ve got project level parameters for values which should be available to all packages inside a project (like say, a directory for importing/exporting files, a client to process, a business unit to process, etc. ) and you’ve got package parameters for values which need to be passed from parent packages to child packages. You’re ready to Build, Deploy and Execute in SSIS 2012 utilizing all of its new toys features.

Oooo shiny new SSIS Project—Ready for Deployment

Continue reading SQL University – SSIS 2012 Deployments, Lesson Two — Part A

SQL University –We Could Be Heroines…. Not Just For One Day

Hi All, Julie here.  Welcome to day 2 of Women In Technology week here at SQL University.  Thanks to Jorge for sponsoring this topic again.  Jes Borland (Blog: Twitter) has gotten us off to a great start.  Tomorrow will be my blogging partner Audrey Hammonds (Blog | Twitter).  Thursday’s Professor will be Jen McCown (Blog | Twitter) and Friday’s will be Wendy Pastrick (Blog | Twitter).

It has proven to be a hard writing task for me.  On the one hand, there does still seem to be a problem with a lack of women going into I.T.   I have listened to women tell me some pretty horrific stories of workplace malfeisance, and in addition to other factors which may have caused the mistreatment, gender did seem to play a role.   And when you miss an opportunity or get squeezed out of an environment you want to be a part of, that sucks.  It hurts financially.  It wounds.

But on the other hand, in my experience, the trials and tribulations I have encountered as a result of being a female in I.T. have been entirely surmountable and minor.  This may not have been the case for me a generation ago.  For that I count my blessings.  And my saying that does not for a minute mean that my attitude is “oh you girls, stop complaining” or “I don’t believe you”.  But my point is that I do not approach the gender gaps in technology with feminist rage, nor do most of my female coworkers.   Really the most common attitude I encounter from people around this issue is bewilderment.  We can’t understand why more women don’t want to go into a field which we find very rewarding.  So for now, I am simply interested in getting the word out that careers in I.T. are great.  I love my job and I want to help and encourage anyone (male or female) else who may be interested in this field to pursue it to the best of his/her ability.   I want people to know one of my favorite things about I.T., which is that it is still largely a meritocracy, where ability  and hard work will get you very far. Continue reading SQL University –We Could Be Heroines…. Not Just For One Day