Why Finding Uniqueness is the Key: 6 Methods to Help
Paying attention to uniqueness in your table is an exercise in mindfulness. Why? Because most professional people who have a dataset land on their desk don't take the time to truly find out what it represents. Just as I said it: they don't really know what it is. They think they know. They want to know. But do they? Often, if they made a mistake identifying it, they don't know until some consequence happens as a result.
For example, Maria is asked to give sales numbers on a recent initiative. The new initiative doesn't have sales reports yet, so she asks for, and receives, a CSV data extract of sales numbers from the IT department. She promptly opens the sheet in Excel, sums the sale amount column, and gives the result to her boss. The numbers look pretty good compared to their estimates! Later, not as much money comes in for that initiative. It turns out that Maria didn't identify the key.. further down in the recordset, there were additional entries for each sale, each with the same total. Some sales were counted twice, making the numbers look better than they actually were. They wasted a bunch of extra money with no return because of this decision.
Finding uniqueness simply means finding out what makes each row in your table unique. Once we know what that is, we have found the key for that table. Sometimes there is a key. Sometimes not. If you feed data into a tool, like an Excel chart or a Power BI visualization, without knowing what the data key is, you will have no idea what the output really means.
You might have a fancy, colourful graph, but that result is dangerous because it may not represent the truth. You or your bosses might make decisions on that information; close a deal, ramp up production, terminate staff, invest large amounts of money, or other major decisions.
So, before you do any analysis on your data, take some steps to make sure you know the data! Here are some steps to get you started:
1. Can you even find the key? If you can't find it (even after help), don't analyze that data. Try to get a better dataset first.
2. Is the key prone to problems? For example are there employee numbers with leading zeroes? Tools like Excel might truncate those so when you feed the data into Power BI. Your totals could be affected.
3. Are there duplicates in the key column? If you find duplicates in your key, you may need to perform a simple transformation before your analysis to get the right totals.
4. Do several fields make up the key? This is a composite key. Pay attention to how things like dollar amounts relate to the key. Is it Order+Line Item? Is that number the order total or order line item total (as was the case in Maria's problem above).
5. Is the dataset complete? For example, if you didn't create the dataset, how do you know it has all of the product types? Do the extract yourself (as I'll show in upcoming posts) or ask for more information about how it was extracted.
6. Muddy data? Does one field hold data for more than one field in it? Is there a pattern of how it is stored in one field? Perhaps you can split this field into two or more fields for analysis.
So, to wrap this up: Paying attention to your keys and identifying uniqueness should become habit for anyone doing data analysis! Stay tuned: I will be creating a how-to post for each of these points over the next few weeks!