Statistics have been teased and tortured for the purposes of their producers since antiquity. So it’s not surprising that the vast majority of the metrics used for web analytics today are not actually telling you what you think they are.
Two key issues in any statistical endeavor are starting with clean data and applying the right filters to that data. These sound relatively straightforward, but overlapping industry jargon can complicate things. In fact, different analytics tools often give metrics similar names even though each tool is measuring a more or less fine-grained data set than the other.
So most of the problems with confusing web analytics metrics today are not about using the wrong statistical approach or bad math. Almost all of them result from conflating terms for metrics, making an “apples to oranges” comparison, or simply using bad data.
Technical Issues That Can Skew Your Web Analytics Results
Bad data is a general term for data that is not clean enough to produce accurate results when statistical analyses are applied. In most cases, bad data stems from technical issues in data collection that result in inconsistent data sets.
Some of the most common technical issues that lead to bad data include tracking code missing from some pages, unfiltered self-referrals, referral spam, the wrong data filters (very common), improper use of “regular expressions”, poor data sampling, problems caused by Do Not Track (DNT) settings and ad blockers, and poor URL tagging.
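As a rough illustration of two items on that list, self-referrals and referral spam, here is a minimal Python sketch that labels a referrer URL before it is counted. The domain `example.com`, the blocklist entry, and the function name `classify_referrer` are all hypothetical placeholders, not part of any particular analytics tool.

```python
from urllib.parse import urlparse

OWN_DOMAIN = "example.com"                # assumption: your own site's domain
SPAM_DOMAINS = {"spam-referrer.example"}  # hypothetical blocklist entry

def classify_referrer(url):
    """Label a referrer as 'direct', 'self', 'spam' or 'external'."""
    if not url:
        return "direct"
    host = (urlparse(url).hostname or "").lower()
    # Match the bare domain and its subdomains, but not look-alike
    # domains such as notexample.com.
    if host == OWN_DOMAIN or host.endswith("." + OWN_DOMAIN):
        return "self"
    if host in SPAM_DOMAINS:
        return "spam"
    return "external"

labels = [
    classify_referrer("https://www.example.com/pricing"),
    classify_referrer("https://notexample.com/"),
    classify_referrer("http://spam-referrer.example/offer"),
    classify_referrer(""),
]
```

Only the visits labeled as external would normally count as genuine referral traffic; how the other labels are handled depends on your own reporting setup.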
It’s pretty easy to understand why not having tracking code on every page, poor data sampling or applying the wrong filters to your data can result in inaccurate web metrics, but the improper use of regular expressions (regex) is also a common issue in web analytics.
A regex is a search pattern that lets you locate the data you are looking for, and many tools can also perform a specified action when the target data is found. Well-designed regexes can save you a great deal of time in analyzing your web data, especially with larger data sets, but a poorly thought-out regex can wreak havoc on your data accuracy, because it does not actually capture all the words, expressions or conditions it was intended to.
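To make the risk concrete, here is a small Python sketch that compares a loosely written pattern with a stricter one. The page paths and both patterns are invented for illustration, and the intended goal (capturing only the shop section of a site) is an assumption.

```python
import re

# Hypothetical page paths; not taken from any real site.
paths = [
    "/shop",
    "/shop/shoes",
    "/workshop/intro",
    "/blog/shopping-tips",
    "/Shop/hats",
]

# Loose pattern: matches "shop" anywhere in the path and is case-sensitive.
loose = re.compile(r"shop")

# Stricter pattern: the path must start with /shop, followed by a slash
# or the end of the string, and letter case is ignored.
strict = re.compile(r"^/shop(/|$)", re.IGNORECASE)

loose_hits = [p for p in paths if loose.search(p)]
strict_hits = [p for p in paths if strict.search(p)]

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

The loose pattern picks up the workshop and blog pages that do not belong to the shop section, and it misses `/Shop/hats` because of the capital letter. The stricter pattern returns only the three shop pages.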
One effective method to avoid the common problem of applying the wrong filters to your data is to set up an internal process in which a second IT staff member double-checks every new data filter before implementation. There is no going back after applying a new filter to your data. If you make a major mistake with a filter, you will have to roll back to a saved version of the data and start over. This can mean dozens of hours wasted redoing work you have already done.
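One way to support that kind of review is a dry run that shows what a filter would exclude without touching the original data. The sketch below is only an illustration in Python: the function `preview_exclude_filter`, the sample rows, and the internal IP range are all hypothetical.

```python
import re

def preview_exclude_filter(rows, pattern, field):
    """Return (kept, dropped) without modifying the original rows."""
    rx = re.compile(pattern)
    kept = [r for r in rows if not rx.search(r[field])]
    dropped = [r for r in rows if rx.search(r[field])]
    return kept, dropped

# Hypothetical hits; the 10.0.0.x range is assumed to be internal traffic.
hits = [
    {"ip": "10.0.0.5", "page": "/home"},
    {"ip": "203.0.113.7", "page": "/pricing"},
    {"ip": "10.0.0.12", "page": "/admin"},
    {"ip": "110.0.0.1", "page": "/home"},
]

# The anchor (^) and the escaped dots keep 110.0.0.1 from matching.
kept, dropped = preview_exclude_filter(hits, r"^10\.0\.0\.", "ip")

print(f"would drop {len(dropped)} of {len(hits)} rows")
for row in dropped:
    print("  ", row["ip"], row["page"])
```

The reviewer can read the list of dropped rows and confirm that it contains only what the filter was meant to exclude before the filter is applied for real.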
There are no real shortcuts when it comes to web analytics. The only way to be certain that the conclusions you draw from analyzing your web data are valid is to ensure that your data is clean and that you are applying appropriate filters given the type and amount of data.
Making decisions about your business based on web analytics derived from bad data or using poorly designed data filters is a recipe for disaster. But if you take the time to make sure you have clean data, and are careful and systematic with data filtering, web analytics can provide owners and managers with the actionable information they need to make the right decisions regarding the operation of their businesses.