Editor’s Note: Today’s author is Richard Boire, the Founder of Boire Filler Group, a tenured provider of data analytics and predictive modeling in the Canadian market. Hanifin Loyalty has partnered with Boire Filler Group to provide Customer Insight Solutions to many clients, and we thought readers of Loyalty Truth would enjoy reading about BFG’s methodology for managing a successful data analytics project. In this first of a two-part post, Richard details the Data Discovery process. Much of this article was published recently in Direct Magazine and is shared here in an adapted format with permission of Richard Boire and BFG.
In many of our engagements with new clients, the old Donald Rumsfeld phrase “We don’t know what we don’t know” is very applicable as these organizations commence their journey into database analytics. Most of these projects begin without a clearly defined objective or goal.
In fact, these companies look for outside consultation to create a roadmap of strategy and tactics for the database analytics projects they should undertake. It is often very difficult to convince these organizations of the longer-term benefits of database analytics, as they are focused on achieving short-term gains to resolve an immediate business need.
Since these exercises don’t always yield an immediate return on investment, the true challenge of a data discovery exercise is to strike a balance between the longer-term analytics goals and the desire for short-term ROI.
One common feature across these projects is the open-ended nature of the assignment. The process of exploration and discovery is the focus of the project, with the goal being to build an analytical roadmap. Yet even open-ended projects require structure, in the form of guidelines and steps, in order to succeed.
In the experience of Boire Filler Group, this process involves four steps:
- Preparation
- Data Audit
- Preliminary Analysis
The preparation stage represents the effort of the analytics practitioners to increase their knowledge of the client’s current business and results. In data mining and analytics, it is widely accepted that projects require both domain knowledge and data mining expertise in order to really optimize a given solution.
Domain knowledge is specific knowledge pertaining to the client’s business. It covers both what is unique to the industry sector (finance, retail, etc.) and what is unique to the mechanics of how that client’s business runs. Of course, the practitioner’s domain knowledge will never be as exhaustive as the client’s, but it will provide an adequate foundation for an effective discovery exercise.
Initial tasks include conducting extensive interviews with key business stakeholders from Marketing, IT, Analytics (if it exists), and Finance, as well as with the Executive sponsor. During these meetings, key business issues and challenges are identified and the practitioner gains an understanding of what data is available. Business reports or any other documents that provide meaningful information about the business are shared with the practitioner. At the end of this stage, a data extract is requested which consists of all the files and fields that will be required for the remainder of the project.
Data audits are a core prerequisite to any data discovery exercise. The practitioner moves to become “intimate” with the data, developing a stronger relationship with it than the standard phrase “data knowledge” implies.
Once the requested data extract is received, the practitioner loads it into their system and produces standardized reports that provide the following results:
- Data completeness or coverage as indicated by the number of missing values in a variable
- Assessment of how values or outcomes distribute within a given variable
- Identification of data inconsistencies and gaps, including documentation of changes in values over time and of groups of records where certain anomalies exist
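The first two report items above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BFG’s actual tooling: the `records` extract, its field names, and the `audit` and `is_missing` helpers are all hypothetical, and blank strings are assumed to count as missing.

```python
from collections import Counter

# Hypothetical extract: each record is a dict; None or "" marks a missing value.
records = [
    {"customer_id": "C1", "province": "ON", "age": 34},
    {"customer_id": "C2", "province": "", "age": None},
    {"customer_id": "C3", "province": "QC", "age": 51},
    {"customer_id": "C4", "province": "ON", "age": None},
]


def is_missing(value):
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def audit(rows):
    """Return missing counts, coverage rates, and value frequencies per field."""
    fields = sorted({name for row in rows for name in row})
    report = {}
    for name in fields:
        values = [row.get(name) for row in rows]
        present = [v for v in values if not is_missing(v)]
        report[name] = {
            "missing": len(values) - len(present),
            "coverage": len(present) / len(values) if values else 0.0,
            "distribution": Counter(present),
        }
    return report


report = audit(records)
for name, stats in report.items():
    # One line per variable: missing count, coverage, and most common values.
    print(name, stats["missing"], f"{stats['coverage']:.0%}",
          stats["distribution"].most_common(3))
```

A real audit would also track how these figures change over time, which this sketch does not attempt.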
From these results, the quality of the data can be assessed and information selected for use in future analytics exercises. Files are often linked in an attempt to create a customer view with one record per customer. With these links determined, the variable creation exercise may commence. This represents the most labor-intensive and arguably the most important portion of the work in the entire discovery process. It is here that meaningful variables are created for use in future analytics exercises.
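As a rough sketch of what linking files and creating variables can look like, the Python below rolls a transaction file up to one record per customer. The two input files, their field names, the derived variables, and the `build_customer_view` helper are all assumptions made for illustration; the article does not specify which variables are created.

```python
from collections import defaultdict
from datetime import date

# Hypothetical files: a customer master file and a transaction file.
customers = [
    {"customer_id": "C1", "province": "ON"},
    {"customer_id": "C2", "province": "QC"},
]
transactions = [
    {"customer_id": "C1", "date": date(2023, 1, 15), "amount": 40.0},
    {"customer_id": "C1", "date": date(2023, 3, 2), "amount": 60.0},
    {"customer_id": "C2", "date": date(2023, 2, 20), "amount": 25.0},
    {"customer_id": "C9", "date": date(2023, 2, 21), "amount": 10.0},  # no master record
]


def build_customer_view(customers, transactions, as_of):
    """Link the two files on customer_id and derive one record per customer."""
    by_customer = defaultdict(list)
    for txn in transactions:
        by_customer[txn["customer_id"]].append(txn)

    view = {}
    for cust in customers:
        txns = by_customer.get(cust["customer_id"], [])
        total = sum(t["amount"] for t in txns)
        last = max((t["date"] for t in txns), default=None)
        view[cust["customer_id"]] = {
            **cust,
            "txn_count": len(txns),
            "total_spend": total,
            "avg_spend": total / len(txns) if txns else 0.0,
            "days_since_last": (as_of - last).days if last else None,
        }

    # Transactions that link to no customer are an anomaly worth reporting.
    known = {c["customer_id"] for c in customers}
    orphans = sorted(set(by_customer) - known)
    return view, orphans


view, orphans = build_customer_view(customers, transactions, as_of=date(2023, 4, 1))
print(view["C1"])
print("Unlinked customer ids:", orphans)
```

Returning the unlinked ids alongside the view keeps the linking step tied to the audit: records that fail to match are themselves a finding.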
Besides the exhaustive reports from the data audit, a summary-level report is produced showing the major findings. Gaps within the data environment are identified and potential improvements are made clear. Some of these gaps can be filled by data overlays. Good examples are Statistics Canada (Stats Can) data for B2C analytics, or Dun and Bradstreet or Info-Canada data for B2B analytics.
For example, income might be a key component of an analysis yet be unavailable within the current data environment. In that case, using income at the postal area level rather than the individual level is a secondary option for deriving income-based insights. Once the data audit is complete, preliminary analysis begins, which transitions our work into analytics.
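The postal-area income overlay described above could be sketched as follows. Everything here is hypothetical: the `area_income` table and its figures are invented for illustration, and the sketch assumes the overlay is keyed by the first three characters of a Canadian postal code (the forward sortation area). A real overlay file may use a different geography.

```python
# Hypothetical area-level overlay keyed by the first three characters of a
# Canadian postal code. The income figures are invented, not real data.
area_income = {"M5V": 82000, "H2X": 61000}

customers = [
    {"customer_id": "C1", "postal_code": "M5V 2T6"},
    {"customer_id": "C2", "postal_code": "h2x1y4"},
    {"customer_id": "C3", "postal_code": None},
]


def postal_area(postal_code):
    """Normalize a postal code and return its first three characters, or None."""
    if not postal_code:
        return None
    cleaned = postal_code.replace(" ", "").upper()
    return cleaned[:3] if len(cleaned) >= 3 else None


def add_income_overlay(rows, overlay):
    """Attach area-level income to each customer; None where no match exists."""
    enriched = []
    for row in rows:
        area = postal_area(row.get("postal_code"))
        enriched.append({**row, "postal_area": area, "area_income": overlay.get(area)})
    return enriched


enriched = add_income_overlay(customers, area_income)
for row in enriched:
    print(row["customer_id"], row["postal_area"], row["area_income"])
```

Customers without a usable postal code keep a missing income value, so the overlay narrows the gap without hiding it.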