
One of the Ten Commandments of programmers working with data: record each change you make!

One lesson that analyzing data has branded into me is the importance of step-by-step procedures. This may sound elementary, but when you start with an Excel file of data from a client, you must keep an audit trail of every transformation and calculation you apply to it.

If you rename columns so that they are consistent with code you have already written, record and store each change. If you add another variable (think of a variable as a column in Excel), track how you made that addition. Do likewise for any calculations, such as calculating and storing external spending per lawyer. Comments along the way complement your efforts to be logical and measured.
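Here is a minimal sketch of what such a record could look like in Python. Everything in it is an illustrative assumption rather than a prescribed method: the column names, the figures, and the helper functions `rename_column`, `add_spend_per_lawyer`, and `run_steps` are all made up for the example. The idea is simply that each change is written down as a described step, and the steps are applied in order to a copy of the original data.

```python
import copy

def rename_column(rows, old, new):
    """Return new rows with the column `old` renamed to `new`."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]

def add_spend_per_lawyer(rows):
    """Return new rows with a derived column: external spending per lawyer."""
    return [{**row, "spend_per_lawyer": row["external_spend"] / row["lawyers"]}
            for row in rows]

# Hypothetical client data, standing in for rows read from an Excel file.
original = [
    {"Company": "A", "Ext Spend": 1_200_000, "lawyers": 12},
    {"Company": "B", "Ext Spend": 450_000, "lawyers": 3},
]

# The audit trail: every change is a (description, function) pair, in order.
steps = [
    ("rename 'Ext Spend' to 'external_spend' to match existing code",
     lambda rows: rename_column(rows, "Ext Spend", "external_spend")),
    ("rename 'Company' to 'company'",
     lambda rows: rename_column(rows, "Company", "company")),
    ("add spend_per_lawyer = external_spend / lawyers", add_spend_per_lawyer),
]

def run_steps(rows, steps):
    """Apply each step in order; return the result and a log of what was done."""
    data = copy.deepcopy(rows)  # the original rows are never modified
    log = []
    for number, (description, step) in enumerate(steps, start=1):
        data = step(data)
        log.append(f"{number}. {description} -> columns: {sorted(data[0])}")
    return data, log

prepared, log = run_steps(original, steps)
print("\n".join(log))
```

Because the descriptions live next to the code that performs each change, the log doubles as the comments and the audit trail.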

Data preparation always involves learning as you go, so if you haven’t saved the steps you have taken, you create nightmares of uncertainty about the quality of your data, or you can’t figure out later how you got (or failed to get) some result.

I visualize data preparation as starting from the original data and then methodically molding it: cleaning it, rearranging it, adding to it, subsetting it, and naming it. When you save that sculpting, you can go back and confirm each step, or alter one or more of them, and be confident that the final data set gives you a consistent, accurate, and reliable platform for graphics and exploratory data analysis.
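To make the "go back and confirm each step, or alter one" idea concrete, here is a small Python sketch. The data, the step functions, and the `replay` helper are hypothetical, chosen only for illustration. It rebuilds the data from the untouched raw rows, keeps a snapshot after every step so each one can be inspected, and then swaps a single step to show that the whole result can be regenerated.

```python
# Hypothetical raw data; it is never modified in place.
raw = [
    {"firm": " Acme ", "spend": "1000"},
    {"firm": "Beta", "spend": ""},
    {"firm": "Cobalt", "spend": "250"},
]

def clean(rows):
    """Strip stray whitespace from firm names."""
    return [{**r, "firm": r["firm"].strip()} for r in rows]

def subset(rows):
    """Keep only rows that report a spend figure."""
    return [r for r in rows if r["spend"] != ""]

def fill_missing(rows):
    """Alternative to subset: treat a blank spend as zero instead of dropping the row."""
    return [{**r, "spend": r["spend"] or "0"} for r in rows]

def convert(rows):
    """Turn spend from text into an integer."""
    return [{**r, "spend": int(r["spend"])} for r in rows]

def replay(rows, steps):
    """Rebuild the data from the raw rows, keeping a snapshot after every step."""
    snapshots = [("raw", rows)]
    for step in steps:
        rows = step(rows)
        snapshots.append((step.__name__, rows))
    return snapshots

# Confirm each step: the row count after every stage shows where rows were dropped.
snapshots = replay(raw, [clean, subset, convert])
for name, rows in snapshots:
    print(f"{name}: {len(rows)} rows")
final = snapshots[-1][1]

# Alter one step and regenerate everything from the same raw data.
revised = replay(raw, [clean, fill_missing, convert])[-1][1]
```

Since every version of the data is derived from `raw` by a saved list of steps, changing your mind about one step costs a one-line edit rather than a fresh round of manual work.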