Why You Need to Avoid Dinosaurs and Learn Data Normalization
12-Feb-2020
This is a post about data normalization, and why you should understand how it works. If you have never heard this term before, don't worry - most people haven't. If you have, this is a good place to refresh your knowledge.
You see, almost all data that you use and see in your daily life, came from a database somewhere. It may have taken several steps and transformations before it got into your hands, but it started in a database.
Take your phone bill, for example. The data started in your phone company's database, was queried to just get your data for last month, joined together with all your related data (address, usage etc), and then presented to you in a nice pdf or printed format for you to read. Seems simple, right?
Well, yes.. and no. You see, when the phone company took your data, created your account, and tracked your information, they split the data into logical
So, if you just looked at one table on its own, you would never understand it! For example, the table that has your phone plan on it might only be a table with columns of ID numbers! Why?
There are many reasons why relational databases have this characteristic:
- It makes the data much more efficient because no data item is duplicated. Imagine the crazy amount of space it would take if they wrote your name on each phone call's record! Instead, they probably just use your device number on that call record. The same ID is used in a device table which has your personal account or ID number on it, which can be used to pull up your single personal record on it, with your name in all of its glory.
- If they need to update your name, they only need to do it in one place!
- It makes the data storage more efficient, because there is almost no empty space or fields in the database.
When we split up our data in tables to represent objects in the real world, we call it data normalization.
For creators, it is an art form. For professionals and budding data analysts, it is a critical bit of knowledge and skill that you should master and practice regularly, until you understand it fully. You need an understanding on this concept as you study SQL and queries.
Click here to get started reading on data normalization. I encourage you to pay attention to the examples.
The first time you deep-dive into a database of any complexity (think manufacturing, insurance) to get data to transform for analytical purposes, you'll understand why you need to understand normalization. If you're an analyst with poor understanding of normalization, these databases will eat you alive. Think Jurassic Park.
Avoid dinosaurs. Don't wait around for someone else to get your data for you. Learn normalization and SQL, get read-only access where you can from your system administrators, and practice your SQL on real databases. Learn how each one was designed. You will see that some are clunky and inefficient, but workable; others, you may find are elegant or even beautiful in their design. You will find that the latter are a joy to run analytics against.