Graduateland has been around for quite some time now. It was founded in 2010 as a result of a bachelor thesis by two students from CBS and a couple of tech-savvy people. Since day one, a lot of aspiring students and graduates have joined the Graduateland universe: more than 1,000,000 students from 190+ countries and 5,000+ universities, to be precise. Furthermore, a tremendous number of jobs have made their way through Graduateland and associated university portals. Twenty-three million jobs, actually. That’s a lot of jobs - and a lot of valuable data just lying around gathering dust.
The recruiter meets Genie’s lamp
How can we leverage these mountains of data to improve the experience for recruiters and our users? Everyone’s talking about AI, Artificial Intelligence. Machine learning, big data, deep learning, neural networks and self-enforcing deep neural networks are all buzzwords we hear on a daily basis. The 23M jobs that we have access to clearly qualify as “big data”, so we started looking into which possibilities we had at hand. If a recruiter encountered the genie from Aladdin, what would the recruiter’s deepest wish be? Answer: The ability to look into the crystal ball and predict the performance of a given recruiting campaign, including metrics such as number of applications, and most importantly, the quality of applicants.
So this is what we set out to solve. Our skilled recruitment advisers have always been quite good at predicting the outcome of a given job campaign, but this has been based on a combination of experience and qualified gut feeling - until now.
Scope of Version 1
Fast-forward a couple of weeks, and we had the first version of what we call the ‘Prediction Monster’ ready. The Prediction Monster is a machine learning algorithm, trained on all of the 23M jobs, designed to predict the number of applications a given job will get. The first version of the algorithm is solely engineered to predict the number of applications, and does not take into consideration the quality of these applicants, which is obviously a highly important factor. We are saving that for v2.
Some technical stuff
The algorithm is a so-called supervised ‘regression’ algorithm, meaning it returns a numerical result (the number of applications), as opposed to predicting a binary outcome (e.g. a YES or a NO). You might have encountered the term ‘regression’ in your high-school math, and the peculiar thing about machine learning is that even though it sounds super complicated and highly technical, a lot of the concepts behind it are actually pretty straightforward.
First things first, for a machine learning algorithm to perform well, it needs the right features. A feature is defined as “an individual measurable property or characteristic of a phenomenon being observed”. Translated into plain English: a feature is a variable that you feed into the algorithm, that aids in the prediction. In our case, the variables that play a role in the number of applications a job gets are, amongst others: the type of job, the job category, which university portals the job is posted at, the exposure level, how many emails are used to promote the job etc. These are all factors that influence the performance of a recruitment campaign.
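To make the idea of a feature a bit more tangible, here is a minimal sketch of how a job could be turned into a list of numbers that an algorithm can digest. The category lists, field names and values are made up for illustration, and this is not necessarily how the Prediction Monster encodes its inputs.

```python
def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == item else 0 for item in vocabulary]


# Hypothetical vocabularies; the real job types and categories are not
# listed in this post.
JOB_TYPES = ["internship", "graduate_programme", "full_time"]
JOB_CATEGORIES = ["engineering", "finance", "marketing"]


def job_to_features(job):
    """Turn a job record (a dict) into a flat numeric feature vector."""
    return (
        one_hot(job["job_type"], JOB_TYPES)
        + one_hot(job["category"], JOB_CATEGORIES)
        + [job["portal_count"], job["exposure_level"], job["promo_emails"]]
    )


# An invented job record, using assumed field names.
example_job = {
    "job_type": "internship",
    "category": "finance",
    "portal_count": 12,    # number of university portals the job is posted on
    "exposure_level": 2,   # assumed numeric exposure tier
    "promo_emails": 3,     # emails used to promote the job
}

features = job_to_features(example_job)
print(features)  # [1, 0, 0, 0, 1, 0, 12, 2, 3]
```

Text-like properties such as job type become a row of zeros and ones, while counts such as the number of emails can be passed through as they are.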
Hamburger / No hamburger?
Machine learning algorithms are “trained”. Essentially what this means is that you feed training data into the algorithm, and have it learn the relationship between the features and the “target variable” (which, in our case, is the number of applications). The algorithm is trained on the training data (usually around 80% of the dataset), and then one tests the performance of the algorithm on the “test set” (the remaining 20%). In our case, we used a Random Forest regression algorithm, which is a whole lot of “Decision Trees” in one algorithm. A decision tree is a tree-like graph of decisions and their possible consequences. Below is an example of a decision tree - keep in mind our algorithm is a bit more complex than this.
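The two ideas in this section, the 80/20 split and a forest as a bunch of decision trees, can be sketched in a few lines of plain Python. Keep in mind that this is an illustration under simplifying assumptions: the two trees are written by hand with invented thresholds, whereas a real Random Forest learns hundreds of trees from the training data, and all field names are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_tree(node, job):
    """Walk one tree: inner nodes ask a question, leaves hold a prediction."""
    while "leaf" not in node:
        branch = "left" if job[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def predict_forest(trees, job):
    """A regression forest averages the predictions of its trees."""
    return sum(predict_tree(tree, job) for tree in trees) / len(trees)


# Two hand-written toy trees (invented thresholds and leaf values).
tree_a = {
    "feature": "promo_emails", "threshold": 1,
    "left": {"leaf": 5.0},
    "right": {
        "feature": "portal_count", "threshold": 10,
        "left": {"leaf": 12.0},
        "right": {"leaf": 30.0},
    },
}
tree_b = {
    "feature": "portal_count", "threshold": 5,
    "left": {"leaf": 4.0},
    "right": {"leaf": 20.0},
}

jobs = [{"id": i} for i in range(100)]
train, test = train_test_split(jobs)
print(len(train), len(test))  # 80 20

job = {"promo_emails": 3, "portal_count": 12}
print(predict_forest([tree_a, tree_b], job))  # 25.0
```

Averaging over many trees is what makes a forest more stable than any single tree: one tree's odd guess gets diluted by the others.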
Now comes the fun part – testing our algorithm for the first time. Sweaty hands. Dilated pupils. Pulse racing… 27% accuracy. Damn, that is disappointing. Obviously, accuracy is a measure of how well the algorithm is predicting, but you might wonder how accuracy is actually calculated. As mentioned above, algorithms are tested on the test set, which is a subset of the overall dataset. In order to calculate the accuracy of algorithms, one measures the difference between the predicted value and the actual value in the test set. In this process, you can look at two metrics: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
MAE is basically just the average absolute difference between the predicted and actual values in the test set, whereas RMSE is a bit more advanced.
RMSE squares the errors (the differences between actual and predicted values) before they are averaged, and then takes the square root of that average. As a result, RMSE gives relatively high weight to large errors. This is advantageous when large errors are particularly undesirable.
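Here is a small sketch of both metrics in plain Python, using invented numbers rather than our real test set. Three predictions are close and one is way off, which shows how a single large miss pulls the RMSE far above the MAE.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square each error, average the squares, then take the square root."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: the last job is badly mispredicted.
actual = [10, 20, 30, 400]
predicted = [12, 18, 33, 100]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae)             # 76.75
print(round(rmse, 2))  # 150.01
```

The errors here are 2, 2, 3 and 300. The MAE treats them all alike, while the RMSE is dominated almost entirely by the 300.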
Our first run got an MAE of 7.11 and an RMSE of 214.79.
What that means is that, on average, our algorithm is off by about 7 applications.
Thus, when presented with a job that would actually receive 10 applications, the algorithm would typically predict anywhere from 3 to 17 applications for that job. The abnormally high RMSE is also evidence that some predictions are completely off the charts.
Digging some more into the data, we found that the algorithm has a hard time assessing the performance of jobs that have received more than 250 applications. Luckily, jobs that receive that number of applications are a minority (who wants that many applications anyway?), so we decided to limit the scope and solely fine-tune the algorithm to perform well for jobs under 250 applications.
The following chart showcases the relationship between the algorithm’s predicted number of applications versus the actual number of applications in the test set. One can sense a linear tendency, but too many jobs would still be predicted too imprecisely, so there was definitely room for improvement. Back to work.
A lot of tweaks, cups of coffee and long nights later, we made a breakthrough. The breakthrough was called feature engineering. Feature engineering is the process of using domain knowledge of the data to create features that improve the performance of machine learning algorithms. One example of a characteristic that we had to apply feature engineering to was the location of a given job. We know that location has a huge influence on the number of applications a job receives, primarily because we have hard data on the preferences of our users, as well as their search behaviour. But initially the algorithm didn’t pick up on that, so we had to feed in some features that would allow it to recognize the relationship between job location and performance.
This kick-started a comprehensive analysis of all of Graduateland’s 1,000,000+ users, their behaviour and preferences, in order to get a measure of the number of potential candidates who would be likely to apply for a given job, segmented by job type, job category, job location, etc. This kind of data really seemed to please the ‘Prediction Monster’, improving the accuracy of our algorithm dramatically, and moving us one step closer to fulfilling the prophecy for our dear recruiter.
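As a rough illustration of such an engineered feature, the sketch below counts how many users have preferences that match a job's location and category. The user records, field names and matching rule are all assumptions made for this example; the actual analysis behind the Prediction Monster is more involved than a simple match count.

```python
def candidate_pool_size(job, users):
    """Count users whose stated preferences match the job (assumed rule)."""
    return sum(
        1
        for user in users
        if job["location"] in user["preferred_locations"]
        and job["category"] in user["preferred_categories"]
    )


# Invented user preferences, using hypothetical field names.
users = [
    {"preferred_locations": {"Copenhagen", "Berlin"}, "preferred_categories": {"finance"}},
    {"preferred_locations": {"Copenhagen"}, "preferred_categories": {"engineering"}},
    {"preferred_locations": {"London"}, "preferred_categories": {"finance"}},
    {"preferred_locations": {"Copenhagen"}, "preferred_categories": {"finance", "marketing"}},
]

job = {"location": "Copenhagen", "category": "finance"}
pool = candidate_pool_size(job, users)
print(pool)  # 2
```

The resulting number can then be fed to the algorithm as one more feature, giving it a direct hint about how large the audience for a given job is.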
Another vital factor we sought to incorporate into the algorithm was seasonality. Seasonality refers to the time of year the job is posted, and we knew for a fact that the number of applications a given job receives is highly correlated with the time of year the recruitment campaign is initiated. This is based on the human behaviour of the students/graduates who have exams, go on vacation, return from vacation after having spent all their savings, act on their new year’s resolutions etc. The chart below shows the average number of applications a job gets, segmented by month.
As you can see, the number of applications varies a lot by month. There are interesting tendencies in this chart: January has by far the most applications, for a variety of reasons. One of them is the many summer internships, which are being promoted in that period. September is also busy because the vast majority of graduate programmes have their application deadline in autumn. This knowledge is crucial to incorporate into the algorithm, which we did by specifying the month the job was posted, along with the number of applications that are usually delivered in this period.
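The seasonality features described above could look something like the following sketch: compute the average number of applications per month from historical jobs, then attach that average to each new job alongside its posting month. The field names and figures are invented, and the fallback to the overall average for unseen months is our own assumption for the example.

```python
from collections import defaultdict


def monthly_average_applications(jobs):
    """Average number of applications per posting month (1 = January)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for job in jobs:
        totals[job["month"]] += job["applications"]
        counts[job["month"]] += 1
    return {month: totals[month] / counts[month] for month in totals}


def add_seasonality_feature(job, monthly_avg, fallback):
    """Return a copy of `job` with the usual application volume of its month."""
    enriched = dict(job)
    enriched["month_avg_applications"] = monthly_avg.get(job["month"], fallback)
    return enriched


# Made-up historical jobs.
history = [
    {"month": 1, "applications": 40},
    {"month": 1, "applications": 60},
    {"month": 7, "applications": 10},
    {"month": 9, "applications": 30},
]

monthly_avg = monthly_average_applications(history)
overall_avg = sum(job["applications"] for job in history) / len(history)

new_job = add_seasonality_feature({"month": 1}, monthly_avg, overall_avg)
print(new_job)  # {'month': 1, 'month_avg_applications': 50.0}
```

One thing to watch out for with this kind of feature: the monthly averages should be computed from the training data only, otherwise the test set leaks into the features and the measured accuracy looks better than it really is.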
All these initiatives helped sky-rocket the performance of the ‘Prediction Monster’.
From mediocre 27% accuracy to, hold on… 84% accuracy!
Check out the following diagram showing a subset of the results, and this time one can clearly see the linear trend. This time around, the algorithm has a Mean Absolute Error (MAE) of 0.8 applications, which means that, back to our previous example, it would typically predict between 9 and 11 applications – as opposed to between 3 and 17 applications. Now we’re talking.
We improved the algorithm from a rather horrible accuracy of 27% to 84%, actually managing to build a rather robust model to predict the performance of recruiting efforts. We have by no means crossed the finish line - there is still a lot of work ahead of us. Predicting the number of applications a job will get is one thing; measuring the quality of these applicants is another. That’s what matters in the end. What we have built now is what product people would label a Minimum Viable Product, or MVP. Now comes the time to turn this rather fragile ‘Prediction Monster’ into a full-blown, terrifying, fire-breathing, princess-guarding recruitment dragon.
Happy to answer any questions related to the Graduateland Prediction Monster.