Massive mobile data. Location analytics. GPS data analysis. These buzzwords highlight the explosion of services and technologies that promise to provide the kinds of consumer insights that previously were not possible for many businesses.
Mobile data solutions come in a variety of forms, from raw data to pre-packaged software to custom analysis. However, the technical jargon used to describe these solutions can make it difficult for the average buyer to determine which vendors will offer the most reliable answers.
Some vendors highlight the scale of their raw datasets. While the raw datasets that feed the analysis matter, an even more important factor is how that data is processed. Purchasing raw mobile data alone has two disadvantages: the data is noisy, and it is nearly impossible to verify that it is authentic. When shopping for data and mobile insights, buyers need to make sure that the location data they are accessing is of high enough quality to support accurate and reliable insights.
Quality location data should undergo an extensive cleaning process that removes duplicates, null values, outlying study dates, and any other irrelevant data. More importantly, a vendor should incorporate horizontal accuracy, clusters, phone-based pings, normalization, and persistence into the data cleansing process.
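As a rough illustration of the basic cleaning step described above, the following Python sketch drops records with null values, records outside the study dates, and exact duplicates. The record layout (`device_id`, `lat`, `lon`, `day`) and the function name `basic_clean` are assumptions made for this example, not any vendor's actual schema or method.

```python
from datetime import date

# Hypothetical ping records; the field names are assumptions for illustration.
pings = [
    {"device_id": "a", "lat": 32.75, "lon": -97.33, "day": date(2023, 5, 1)},
    {"device_id": "a", "lat": 32.75, "lon": -97.33, "day": date(2023, 5, 1)},  # duplicate
    {"device_id": "b", "lat": None, "lon": -97.30, "day": date(2023, 5, 2)},   # null value
    {"device_id": "c", "lat": 32.80, "lon": -97.31, "day": date(2022, 1, 9)},  # outside study dates
    {"device_id": "d", "lat": 32.76, "lon": -97.35, "day": date(2023, 5, 3)},
]


def basic_clean(records, study_start, study_end):
    """Drop records with nulls, records outside the study window, and exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Skip any record that has a missing field.
        if any(value is None for value in record.values()):
            continue
        # Skip records dated outside the study period.
        if not (study_start <= record["day"] <= study_end):
            continue
        # Skip records identical to one already kept.
        key = (record["device_id"], record["lat"], record["lon"], record["day"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned = basic_clean(pings, date(2023, 5, 1), date(2023, 5, 31))
print([r["device_id"] for r in cleaned])  # ['a', 'd']
```

Steps like these only remove obviously unusable rows; the factors discussed in the following sections address reliability problems that survive this first pass.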
If you are in the market for a mobile data analysis or software solution, consider the following factors when evaluating the cleanliness – and reliability – of the solution.
Horizontal Accuracy
During the data cleansing process, about a third of the raw mobile data collected has a horizontal accuracy value that is outside of what is considered reliable for interpretation.
What is horizontal accuracy? Horizontal accuracy is a radius of uncertainty for a location, measured in meters. In other words, a ping with an accuracy value of 100 means that the person's actual location could be anywhere within 100 meters of the reported point.
Different study areas require different horizontal accuracy thresholds. If analyzing a university, a 100-meter accuracy value may be sufficient. However, if studying a small fast-food restaurant location, the threshold must be much smaller in order to obtain greater accuracy.
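A minimal sketch of a per-study-area accuracy filter might look like the following. The 100-meter university threshold comes from the example above; the 20-meter fast-food threshold, the `accuracy_m` field, and the function name are hypothetical values chosen only to illustrate the idea.

```python
# Hypothetical thresholds in meters. Only the 100 m university figure comes
# from the text; the fast-food value is an assumed placeholder.
ACCURACY_THRESHOLD_M = {"university": 100.0, "fast_food": 20.0}


def filter_by_accuracy(pings, site_type):
    """Keep pings whose radius of uncertainty is within the site's threshold."""
    limit = ACCURACY_THRESHOLD_M[site_type]
    return [p for p in pings if p["accuracy_m"] <= limit]


pings = [
    {"id": 1, "accuracy_m": 5.0},
    {"id": 2, "accuracy_m": 65.0},
    {"id": 3, "accuracy_m": 100.0},
    {"id": 4, "accuracy_m": 250.0},
]

campus = filter_by_accuracy(pings, "university")
restaurant = filter_by_accuracy(pings, "fast_food")
print([p["id"] for p in campus])      # [1, 2, 3]
print([p["id"] for p in restaurant])  # [1]
```

The same set of pings yields three usable points for the larger site and only one for the smaller site, which is why a single global threshold is rarely appropriate.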
Clusters
While a ping could have a small horizontal accuracy of 5 meters, the mobile device could be in transit, i.e., a person is simply walking or driving by the study area, which means they likely did not visit the location. How do you determine whether the mobile device is in transit or visiting the location? This is where clusters come into play.
A cluster is what it sounds like: a group of data points. Instead of relying on one ping and its horizontal accuracy, a cluster estimates the actual location from multiple pings.
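One simplified way to separate a visit from a pass-by is to require several pings near the site spread over a minimum dwell time, then average them into a single estimated location. The sketch below assumes this approach; the parameters `radius_m`, `min_pings`, and `min_dwell_s`, the ping fields, and the function names are all hypothetical, and production clustering methods are likely more sophisticated.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def find_visit_cluster(pings, site, radius_m=50.0, min_pings=3, min_dwell_s=300):
    """Return the mean (lat, lon) of pings near the site, or None if the
    device looks like it was only in transit. All thresholds are assumed."""
    near = [p for p in pings
            if haversine_m(p["lat"], p["lon"], site[0], site[1]) <= radius_m]
    if len(near) < min_pings:
        return None
    times = [p["t"] for p in near]
    if max(times) - min(times) < min_dwell_s:
        return None
    # Averaging coordinates is a reasonable approximation over short distances.
    lat = sum(p["lat"] for p in near) / len(near)
    lon = sum(p["lon"] for p in near) / len(near)
    return (lat, lon)


site = (32.7500, -97.3300)

# Four pings within a few meters of the site over ten minutes ("t" in seconds).
visitor = [
    {"lat": 32.7500, "lon": -97.3300, "t": 0},
    {"lat": 32.7501, "lon": -97.3300, "t": 200},
    {"lat": 32.7500, "lon": -97.3301, "t": 400},
    {"lat": 32.7501, "lon": -97.3301, "t": 600},
]

# One ping at the site, bracketed by pings roughly a kilometer away.
passerby = [
    {"lat": 32.7400, "lon": -97.3300, "t": 0},
    {"lat": 32.7500, "lon": -97.3300, "t": 30},
    {"lat": 32.7600, "lon": -97.3300, "t": 60},
]

print(find_visit_cluster(visitor, site) is not None)  # True
print(find_visit_cluster(passerby, site))             # None
```

The passer-by has a perfectly placed ping at the site, yet is rejected because a single point says nothing about dwell time.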
Phone-Based Pings
Another important factor to take into account while cleansing data is whether a ping originates from a phone or a vehicle. Traffic data from GPS-enabled vehicles (vehicular volume) is generally noisy and not applicable to analysis, whereas a person's phone is much more indicative of human behavior because people almost always have their phones on them. A GPS ping from a vehicle may offer insight into the general area where a person is going. However, it does not accurately represent origin or destination.
Normalization
The volume of data received at the national level is not always representative of the population. The data, therefore, must be normalized for the population. Data normalization is a process that reorganizes the data so that it can be properly understood and used in analysis.
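One common way to normalize for population, shown here purely as an assumed example, is to weight each region's observed visitors by the ratio of its population to the number of devices observed there. The region names, figures, and function name below are invented for illustration and do not describe any specific vendor's method.

```python
# Hypothetical inputs: all figures are made up for illustration.
population = {"region_a": 100_000, "region_b": 50_000}
observed_devices = {"region_a": 2_000, "region_b": 5_000}
observed_visitors = {"region_a": 40, "region_b": 100}


def normalize_visits(visitors, devices, population):
    """Scale observed visitor counts by population per observed device."""
    estimates = {}
    for region, count in visitors.items():
        # Each observed device stands in for this many residents.
        weight = population[region] / devices[region]
        estimates[region] = count * weight
    return estimates


estimates = normalize_visits(observed_visitors, observed_devices, population)
print(estimates)  # {'region_a': 2000.0, 'region_b': 1000.0}
```

In the raw counts, region_b appears to send more visitors (100 versus 40), but that is only because its devices are over-represented in the sample. After weighting, region_a is estimated to contribute twice as many.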
Persistence
Persistence asks the question – how long does a particular mobile device’s signal trail last? Signals are obtained through the apps that collect the device’s location data over time. Some signal trails can only be followed for a week or even less.
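Persistence can be measured, for example, as the number of days between a device's first and last observed ping. The sketch below assumes that definition and a hypothetical 30-day minimum; whether and where to set such a cutoff depends on the analysis.

```python
from datetime import date


def persistence_days(ping_days):
    """Length of a device's signal trail in days, counting first and last day."""
    return (max(ping_days) - min(ping_days)).days + 1


# Hypothetical signal trails: the dates on which each device was observed.
trails = {
    "device_a": [date(2023, 5, 1), date(2023, 5, 3), date(2023, 5, 4)],
    "device_b": [date(2023, 5, 1), date(2023, 5, 20), date(2023, 6, 15)],
}

MIN_DAYS = 30  # assumed cutoff, not a recommended value

persistent = {device for device, days in trails.items()
              if persistence_days(days) >= MIN_DAYS}
print(persistence_days(trails["device_a"]))  # 4
print(persistence_days(trails["device_b"]))  # 46
print(persistent)                            # {'device_b'}
```

A device with a four-day trail can still confirm a single visit, but it reveals little about repeat behavior or where the person lives and works.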
Conclusion
Cleaning data is important if you want to turn big data into actionable findings. Buxton understands this, which is why our proprietary Mobilytics technology takes into account horizontal accuracy, clusters, phone-based pings, normalization, and persistence to provide a deeper understanding of consumer behavior, such as who visitors are, where they come from, where they live and work, and much more.
Learn more about Mobilytics and how to get started with a reliable mobile data tool.