|Hey there! Welcome back! Or just Welcome, whichever applies. This week at SQL University (organized by SQLChicken himself, Jorge Segarra) we’re discussing data modeling. On Monday, we talked Conceptual Modeling. Today, we’re going to talk about Entity-Relationship Modeling. Now, fair warning, this is a BIG topic. People spend their entire careers working with ER Diagrams, and there is a plethora of material out there for you to peruse. Also, today I’m going to focus on OLTP, or transactional, data models. There is an entire discipline focused on OLAP, or dimensional data models (data warehousing), but I firmly believe that you’ve got to understand how to build an OLTP model before you tackle an OLAP model. I’m going to hit the high points, hopefully give you a few things to think about, and and direct you to some materials (and people) that can help round out the topic. Obviously, this isn’t a comprehensive list, but it’s a really good starting point.
- Karen Lopez (B|T)
- Louis Davidson (B|T)
- Thomas LeBlanc (B|T)
A note: Not only do these three people write about data modeling, they also present regularly. Keep an eye on their blogs for great content and information about where you can go watch them teach in person.
PASS Data Architecture Virtual Chapter
Case*Method: Entity Relationship Modelling by Richard Barker
This book is the one I received as course material in my very first database design class in 1997. I still use it. In fact, it’s sitting beside me right now so I can reference it as I work on this post. It’s out of print, but you can still find new and used versions for sale online. First published in 1990, it’s my personal definitive guide to purist modeling. It says Oracle on the cover, and you’re going to see COBOL code inside, but trust me… you’ll learn from this book.
Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, et al
Louis does an amazing job of teaching relational data modeling as well as couching his work in the context of SQL Server. If you’re a SQL Server professional (or want to be), pick up this book.
Data Model Patterns: Conventions of Thought by David C. Hay
Once upon a time, I used to think that all work I did had to be original. Now, I’m older and wiser. Mr. Hay provides an entire book of data model patterns for different industries and types of data. It’s a great reference to see how to do it right, and to get some ideas for your own database.
So, what is an Entity-Relationship Diagram (ERD)? It is: A visual representation of entities, the relationships between entities, and the attributes that describe them.
At the end of the day, an ERD has three major components:
- Entity – a thing/object that is significant enough that we need to know information about it
- Relationship – an association between two entities
- Attribute – something that describes the entity or the state of the entity
At this point, it would be easy to say, “Oh, I get it… tables, foreign keys, and columns!” Resist that urge. There is a lot of value in getting your mind away from technology and database platform for a while and thinking in terms of an ERD. Why? Well, your focus isn’t really on defining a database yet. I mean, we all know how this story ends (spoiler alert: with a database), but let’s let this plot thicken for a bit before we read the back page. What you’re doing at this point is defining the business. If you do it right, someone should be able to look at your ERD of Widgets, Inc. and with a little understanding of notation be able to say, “Ah, I see what they do for a living.”
Also, Entity-Relationship Diagram <> Database. Yes, what you model with an ERD will probably end up in a database, but I think that unless your database is pretty narrow in scope, you’ve probably got multiple ERD’s in a single database. Or conversely, you might have many databases that reflect a single ERD. It’s why Entity-Relationship Diagrams are referred to as “logical models”. You’re not defining the physical structure of the database; you’re defining the logical structure. The physical structure has a whole other set of factors to consider.
As I mentioned above, an entity is a thing. Looking at Thesaurus.com, there are a few other synonyms for entity that apply in this context: Individual, Item, Material, Something, Stuff
And an entity doesn’t have to be a physical thing. It can be a concept or idea. It’s a NOUN. Remember that if nothing else sinks in today: entities are nouns.
As with all things, there are some guidelines you should follow. Building on the conceptual model we started with the last post, let’s start looking at an Entity-Relationship Diagram that models Country. (The nations, not the music.)
1) An entity is one thing, and that one thing should be defined by only one entity.
The entity COUNTRY is one thing, a single concept. If I follow this rule, I shouldn’t have a second entity named NATION, because that overlaps with COUNTRY.
2) An entity should have a name that represents the type of thing, not an instance of the thing.
This can be a tough one. Basically, your entity name should be high-level enough that it doesn’t limit how you can describe it later through relationships and attributes. For example, my entity is named COUNTRY, not UNITED STATES. COUNTRY is the type of thing, UNITED STATES is an instance of that thing.
3) An entity must be uniquely identifiable.
If my entity isn’t uniquely identifiable, then it isn’t a single thing. It’s multiple things, and in the world of data modeling, trying to cram two things into one thing is a bad thing. As you’re making your list of entities, ask yourself: Is there something about each occurrence of this entity that differentiates it from every other occurrence? That something is your natural key, or business key.
Aside: Next time you see my fellow Datachix Julie Smith, ask her about business keys. Then go get some popcorn and enjoy the rant. Take your time; she’ll still be going strong when you get back. 🙂
4) Entities are singular.
Why? Because they represent one thing! For some reason, there’s a lot of debate about this. Just because your entity has multiple instances of that thing – COUNTRY has United States, Canada, Mexico, etc. – doesn’t mean that you’re talking about Countries. You’re defining an entity that contains instances of COUNTRY. Thus, singular.
Entities should be represented by a rounded rectangle with the entity name in ALL CAPS inside of it. If it helps, make note of a few instances of the entity. If you know the natural key already, you can note that too.
So, let’s list some entities and occurrences related to this Country ERD:
The next step is to define the relationships between these entities.
VERY IMPORTANT CAVEAT: I’m using this subject area for illustrative purposes. My goal is to keep things as simple as possible, so I know that there are a lot of other things I might do with a Country ERD, and that there are some business rules I’ve oversimplified.
What is a relationship? It is: A named, meaningful relationship between two entities.
Each relationship has two sides, and there are characteristics of those sides that need to be defined. If you can’t define the relationship, then you might not have one. It’s okay – there are always other fish in the sea.
1) Name – this is commonly known as a “verb phrase”. Where an entity is a noun, your relationships are verbs. Try to come up with meaningful verb phrases. If you’re stretching and end up with 42 relationships with verb phrases like, “is related to”, then you might be forcing it.
Let’s try one: COUNTRY and STATE/PROVINCE
Verb Phrase: A STATE/PROVINCE is contained within a COUNTRY, and a COUNTRY is made up of STATE/PROVINCE
2) Cardinality (how many) – there are three kinds of relationships: One-to-Many, Many-to-Many, and One-to-One
If we refine our verb phrase, we can add information about cardinality. In this case, we have a One-to-Many relationship between STATE/PROVINCE and COUNTRY
A STATE/PROVINCE is contained within one COUNTRY, and a COUNTRY is made up of one or more STATE/PROVINCE
3) Optionality – this tells us whether there is a mandatory relationship or not. Some entities must be related, others are optional. Let’s refine the verb phrase again:
A STATE/PROVINCE must be contained within one COUNTRY, and a COUNTRY may be made up of one or more STATE/PROVINCE
There are many different notations, but I prefer what is known as Crow’s Foot notation because it is simple and I’m comfortable with it. But, use what works for you. Just be consistent.
I like this notation because it tells you a lot about the relationship in a very concise way. Reading it can take a little getting used to, but here’s what you do: Starting with the entity and following the relationship line that comes out of it through to the related entity, say…
“[ENTITY NAME] [optionality] [name] [cardinality] [ENTITY NAME]” where optionality is denoted by solid (mandatory) or dashed (optional) line and cardinality is denoted by crow’s foot (many) or a straight-line (one) connector.
Such as, “STATE/PROVINCE must be contained within one COUNTRY”. I know it’s a lot to take in, but practice a little, and it’ll make sense. I promise.
What you’ve defined with a relationship is a business rule. If you’re lucky enough to have business rules in advance, that’ll take you a long way towards defining your relationships.
Let’s talk attributes. Attributes are a detail that helps to describe the entity or the state of the entity.
If I’m looking at my COUNTRY entity, I might have many ways to describe it and its state
- Established Date
- Government Type
- National Symbol
These are potential attributes for my entity. Why do I say potential? Well, because until we start filling out this model, we’re not really sure whether they fit as attributes or not. Attributes have rules:
1) Attributes should depend on the key.
So far, I’ve defined my natural (or business) key of Country as Country Code. Each attribute needs to be examined and the question asked: Does this attribute define that key and nothing else?
2) Attributes should be atomic.
This means that an attribute serves as one descriptor of the entity. It can’t represent two things. Say I had an attribute called LanguagePercent that contained information like “English-60%” and “Spanish-40%”. This needs to be handled differently. (Come back on Friday when we talk about ERD refinement to see how)
3) Attributes should not repeat.
Taking the Language attribute, let’s think of this in real-world terms: Some countries have a single, official national language (I’m looking at you, France), but others have multiple recognized languages. Belgium has three: Dutch, French, and German. I don’t want to have to add attributes to my entity like Language1, Language2, Language3. There is a solution: LANGUAGE isn’t an attribute after all. It’s an entity, with a relationship to COUNTRY. How about that? We just refined our list of entities.
What’s really cool about this is that COUNTRY and LANGUAGE actually have a Many-to-Many relationship once you break it out and start working through your verb phrase:
“COUNTRY must communicate via one or more LANGUAGE, and a LANGUAGE may be the communication method for one or more COUNTRY”
Many-to-many relationships are valid business/real world relationships, but they don’t fly within a database. On Friday when we talk ERD refinement, we’re going to have to address this.
4) Attributes should not describe other attributes.
What if I had two attributes, National Symbol and Has National Symbol? Well, the information contained in National Symbol is going to be dependent on whether there is one. We don’t want that. Our attributes should only define the entity.
Given this, we have the start of an Entity-Relationship Diagram for Country.
We’re going to stop here for today. For my homework, I’m going to go through the rest of the entities, define the relationships between them, and then define attributes. If you’re feeling frisky, try fleshing out the model on your own. On Friday, we’ll compare notes then start the very important refinement process. This is where we take a first draft of an ERD, and see how we can improve it. Data Modeling is an iterative process. No one creates a perfect model on the first try – that’s why we’ll spend Friday learning about how to methodically refine our ERD.
Thanks for reading, and I hope to see you again on Friday.