How to make career guidance system intelligent
If you have a large amount of question, each of them can represent a feature. Assuming you are going to have a LOT of features, finding the series of if-else statements that fulfills the criteria is hard (Recall that a full tree with
n
questions is going to have2^n
"leaves" - representing2^n
possible answers for these questions, assuming each question is yes/no question).Since hard programming the above is not possible for a large enough (and probably a realistic size
n
- there is a place for heuristical solutions one of those is Machine Learning, and specifically - the classification problem. You can have a sample of people answering your survey, with an "expert" saying what is the best career for them, and let an algorithm find a classifier for the general problem (If you want to convert it into a series of yes-no questions automatically, it can be done with a decision tree, and an algorithm like C4.5 to create the tree).It could also be important to determine - which questions are actually relevant? Is a gender relevant? Is height relevant? These questions as well can be answered using ML algorithms with feature selection algorithms for example (one of these is PCA)
Regarding the "technology" aspect - there is a nice library in java - called Weka which implement many of the classification algorithms out there.
One question you could ask (and try to find out in your project) which classification algorithm will be best for this problem? Some possibilities are The above mentioned C4.5, Naive Bayes, Linear Regression, Neural Networks, KNN or SVM (which usually turned out best for me). You can try and back your decision which algorithm to use with a statistical research and a statistical proof which is better. Wilcoxon test is the standard for this.
EDIT: more details on point 2:
- In here an "expert" can be a human classifier from the field of HR that reads the features and classifies the answers. Obtaining this data (usually called the "training data") is hard and expansive sometimes, if your university has an IE or HR faculty, maybe they will be willing to help.
- The idea is: Gather a bunch of people who first answer your survey. Then, give it to a human classifier ("expert") which will chose what is the best career for this person, based on his answers. The data with the classification given by the expert is the input of the learning algorithm, its output will be a classifier.
- A classifier is a function itself, that given answers to a surveys - predicts what is the "classification" (suggested career) for the person who did this survey.
- Note that once you have a classifier - you do not need to maintain the training data any more, the classifier alone is enough. However, you should have your list of questions and the answers for these questions will be the features provided to the classifier.
All you have to do to satisfy them is create a simple learning system:
- Change your thesis terminology so it is described as "learning the best career" instead of using the word "intelligent". Learning is a form of artificial intelligence.
- Create a training regime. Do this by giving the questionnaire to people that already have careers and also ask questions to find out how satisfied they are with their career. That way your system can train on what makes a good career match and what makes a bad one.
- Choose a learning system to absorb the data from (2). For example, one source of ideas might be this recent paper: http://journals.cluteonline.com/index.php/RBIS/article/download/4405/4493. Product sum networks are cutting edge in AI and apply well to expert-system-like problems.
Finally, try to give a twist to whatever your technology is to make it specific to your problem.
In my final project, I had some experience with Jena RDF inference engine. Basically, what you do with it is create a sort of knowledge base with rules like "if user chose this answer, he has that quality" and "if user has those qualities, he might be good for that job". Adding answers into the system will let you query his current status and adjust questions accordingly. It's pretty easy to create a proof of concept with it, it's easier to do than a bunch of if-else, and if your professors worship prolog-ish style things, they'll like it.