We've been looking at how to apply aspects of the data science process to L&D using a fictional health and social care company called XL Support. In previous posts I discussed the first two steps in the process, which are:
- Frame the problem or define the question you want to answer.
- Identify and get the data you need to answer the question.
The third step in the data science process is to process the data, also known as data wrangling. Raj Bandyopadhyay, whose version of the process I have been using in these posts, defines data wrangling this way:
Real, raw data is rarely usable out of the box. There are errors in data collection, corrupt records, missing values and many other challenges you will have to manage. You will first need to clean the data to convert it to a form that you can further analyze.
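To make this a little more concrete, here is a minimal sketch of what some basic cleaning steps can look like in Python, using only the standard library. It assumes the records have already been read into a list of dictionaries (one per row). The field names and the set of "missing" markers are hypothetical examples, not taken from XL Support's real data.

```python
# A minimal data-cleaning sketch using only the Python standard library.
# Field names ("name", "team", "completed") and missing markers are hypothetical.

MISSING_MARKERS = {"", "n/a", "na", "none", "-"}


def clean_value(value):
    """Trim whitespace and map common 'missing' markers to None."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in MISSING_MARKERS else text


def clean_records(records, required=("name", "team")):
    """Clean every field, drop exact duplicates and set aside incomplete rows."""
    cleaned, rejected, seen = [], [], set()
    for record in records:
        # Standardise header spelling and clean each value.
        row = {key.strip().lower(): clean_value(val) for key, val in record.items()}
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a row already processed
        seen.add(fingerprint)
        if any(row.get(field) is None for field in required):
            rejected.append(row)  # missing a required value: keep for manual review
        else:
            cleaned.append(row)
    return cleaned, rejected


raw = [
    {"Name": " Ann Smith ", "Team": "North", "Completed": "2023-01-10"},
    {"Name": "Ann Smith", "Team": "North", "Completed": "2023-01-10"},
    {"Name": "Bo Jones", "Team": "N/A", "Completed": ""},
]
good, bad = clean_records(raw)
print(len(good), len(bad))  # 1 1
```

Rejected rows are returned rather than thrown away, because in L&D data a missing team or name is usually something to chase up with a service, not something to ignore.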
In reality, people who work in a data science capacity often spend most of their time in this step of the process. While machine learning and model building are what we usually hear about in data science, the unexciting task of cleaning data and making it fit for analysis is crucial: without it, no machine learning can happen. The L&D team at XL has identified four data sources they want to use to answer their question:
- Details of face-to-face compliance training from the organisation's Compliance Training Tracker dashboard.
- Support needs of all the people XL supports and cares for. These support needs will be used to identify what staff need to know and be able to do. As it stands, the support needs are in 120 separate spreadsheets with messy data entered in different formats.
- Return data from an IT skills needs survey administered through SurveyMonkey.
- A list of key leadership and management behaviours and expected outcomes that the leadership team believes should form the basis of the organisation's leadership development programme for first line managers.
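The support needs spreadsheets are a good example of why this step takes so long. As a minimal sketch, assuming each of the 120 spreadsheets has first been exported to CSV, the Python below merges them into a single table, maps differently spelled column headers to one standard name, and converts dates entered in different formats to one format. The header aliases, column names and date formats are all hypothetical; in practice the team would build these lists by inspecting the real files.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from header spellings found in different sheets
# to one standard column name.
HEADER_ALIASES = {
    "person": "person_id", "person id": "person_id", "client ref": "person_id",
    "support need": "need", "support needs": "need", "needs": "need",
    "review date": "review_date", "date reviewed": "review_date",
}

# Assumed date formats seen across the sheets (day-first, as is usual in the UK).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def combine_sheets(sources):
    """Merge many CSV sources into one list of rows with standard column names.

    `sources` maps a source name (e.g. a file name) to any iterable of CSV
    lines, such as an open file object.
    """
    combined = []
    for source_name, lines in sources.items():
        for raw in csv.DictReader(lines):
            row = {"source": source_name}  # remember where each row came from
            for header, value in raw.items():
                if header is None:
                    continue  # extra cells with no header; ignored in this sketch
                key = header.strip().lower()
                row[HEADER_ALIASES.get(key, key)] = (value or "").strip()
            if "review_date" in row:
                row["review_date"] = parse_date(row["review_date"])
            if "need" in row:
                row["need"] = row["need"].lower()  # "EPILEPSY" and "Epilepsy" match
            combined.append(row)
    return combined


sheets = {
    "service_a.csv": io.StringIO("Person ID,Support Need,Review Date\nP01,Epilepsy,2023-01-10\n"),
    "service_b.csv": io.StringIO("client ref,needs,date reviewed\nP02,  EPILEPSY ,10/02/2023\n"),
}
for row in combine_sheets(sheets):
    print(row)
```

For the real files, each source would be opened with `open(path, newline="", encoding="utf-8")` instead of `io.StringIO`. Rows whose `review_date` comes back as `None` are worth setting aside for a manual check rather than silently dropping.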
As you can see, these data sources are all in different formats and need to be brought together to answer the question:
What learning and development needs must be met to provide all teams with the capabilities to support the people in their services effectively?
The team must collate and clean each piece of data into a form that will be usable for the analysis. As they will be trying to answer questions from the data, it needs to be of good quality; otherwise they will get low-quality answers. No doubt the team will spend a lot of time cleaning the data and getting it ready. Once the data is ready, analysis can start. Because answering the L&D team's question does not necessarily require complex data modelling or machine learning, most of their analysis will be Exploratory Data Analysis (EDA), which is all about understanding the data using descriptive statistics and visualization.
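EDA can start very simply. As a rough illustration, assuming the cleaned data is a list of rows with a `need` field and that the IT skills survey asked for a self-rating on a 1 to 5 scale (both hypothetical), the sketch below builds a frequency table of support needs and a few descriptive statistics for the ratings.

```python
from collections import Counter
from statistics import mean, median


def needs_frequency(rows):
    """Count how many rows mention each support need, skipping blanks."""
    return Counter(row["need"] for row in rows if row.get("need"))


def summarise_scores(scores):
    """Basic descriptive statistics for numeric ratings (assumes at least one score)."""
    return {
        "count": len(scores),
        "mean": round(mean(scores), 2),
        "median": median(scores),
        "min": min(scores),
        "max": max(scores),
    }


rows = [{"need": "epilepsy"}, {"need": "dementia"}, {"need": "epilepsy"}, {"need": ""}]
print(needs_frequency(rows).most_common())  # [('epilepsy', 2), ('dementia', 1)]

ratings = [2, 3, 3, 4, 5]  # hypothetical self-ratings from the IT skills survey
print(summarise_scores(ratings))
```

A frequency table like this is often enough to show which support needs are most common across services, and therefore which training topics are likely to matter to the most teams.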
We will discuss EDA in the next post.