As a teenager, I was notorious in my family for having the messiest of rooms. While I preferred to call it “organized chaos” rather than a mess, I cannot deny the amount of time and effort I spent (wasted) trying to locate something in that chaos. My struggles to find a missing sock, or the essay I had printed off for school, were met with the oft-repeated question from my parents: “Why don’t you just tidy up?”
In data analysis, the workflow can generally be thought of as three stages: structural organization, visualization, and finally modelling of the data set. However, the data sets I typically work with are what data analysts would call “messy”. Much like the time I wasted searching through my childhood mess of a room, messy data sets drastically increase the time and cost of the structural organization stage of the data analysis workflow.
Being able to reduce costs and find efficiencies in processes is paramount. Having “tidy” data sets not only reduces costs but also allows data analysts (consultants) to deliver results faster and with less potential for error. You might be wondering: what is tidy data, exactly? Those tables and charts looked good in your PDF report, right? Aren’t they tidy?
Let me share a few tips on how you and your organization can save time and money by tidying up those data sets, which took valuable resources to collect and produce, so that you get the maximum return on that investment.
Rules for Tidy Data in Data Analysis:
That’s it, three rules: straightforward and simple. Once data is in a tidy format, then your data analyst can use powerful tools to visualize and model the data set.
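To make the idea concrete, here is a minimal sketch of tidying a small table in Python with pandas. It follows the definition in Wickham (2014), where each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The sample IDs, analyte names, values, and column names below are hypothetical and chosen only for illustration, and pandas is my assumption for this sketch; the same reshaping can be done in other tools.

```python
import pandas as pd

# Hypothetical "messy" table: one row per sample, one column per analyte.
# This layout reads well in a report but hides the variable "analyte"
# inside the column headers.
messy = pd.DataFrame(
    {
        "sample_id": ["S1", "S2"],
        "benzene_ug_L": [1.2, 0.4],
        "toluene_ug_L": [3.5, None],
    }
)

# Tidy form: each row is one observation (one analyte measured in one sample),
# and each variable (sample, analyte, concentration) has its own column.
tidy = messy.melt(
    id_vars="sample_id",
    var_name="analyte",
    value_name="concentration_ug_L",
)

# Strip the unit suffix so the analyte column holds only the analyte name.
tidy["analyte"] = tidy["analyte"].str.replace("_ug_L", "", regex=False)

print(tidy)
```

In the tidy version, filtering, grouping, or plotting by analyte becomes a single operation on one column rather than a loop over column headers.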
Now, these suggestions might not be applicable or possible in every situation. However, as a data analyst, I can say that more than 75% of my billable time is spent on the structural organization portion of the data analysis workflow. If you are able to implement even some of what I have discussed here, it will result in considerable savings in time and resources for your organization.
If you have any questions about this, or about a potential project you are looking to begin, don’t hesitate to reach out. At Chemistry Matters, we have the expertise and experience to help you get your data collected and analyzed right the first time! Tidy data can then turn into slick visuals. More to come on designing the proper visualizations to get your point across to your audiences.
References:
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10