Artificial Intelligence Methods to Detect Cheating in Games

At least in substantial part, this is my day job. From your Question, it seems you are thinking of the discipline of Machine Learning (rather than the broader rubric, AI). And i think your instincts are correct--an ML algorithm is ideally suited to fraud prediction/detection because it can generalize over a highly non-linear domain and it can adapt (as new data is fed to it). So because of these two primary characteristics, it is far more difficult for fraudsters to discern the algorithms' "rules" for prediction--because these rules are in fact a complexly reticulated set of soft-constraints and which change over time as the algorithm learns against new data. (I might suggest though setting aside A* unless you have a particular reason to believe pathfinding is a useful heuristic for your problem--i am reluctant to say there is no connection, but if there is, it's certainly an unorthodox one--i have never seen pathfinding applied to this sort of problem).

The only fact you mentioned about the type of online fraud you are interested in identifying is multiple accounts by a single user. No doubt a variety of techniques could be applied here, but i'll mention one analytical technique in particular because: (i) i have actually used it in the scenario you mentioned; and (ii) it is outside the scope of the other Answers, so far.

The technique is based in graph theory.

The premise: accounts which are owned by the same user are often best identified not by their individual behavior (clickstream) but by their relationship to one another--in other words by their network behavior.

An example: chip-dumping in online poker. Here, an individual opens multiple new accounts on a poker site (using bogus information) and then claims the advertised bonus for each account (e..g, deposit of $100 is matched by a $100 bonus). Of course, the bonus has highly restrictive "cash-out rules, generally a threshold number of hands played before the bonus becomes like cash and can be withdrawn from the player's accounts as cash.

So the goal of chip dumping is to turn those bonus dollars in to real cash. One person opens five separate accounts (as five different people) then opens one more "legitimate" account (using their genuine identity). These six players--again actually just a single player--will play at one table against each other and the five sham accounts will quickly lose their stacks to the legitimate account, which quickly cashes out their winnings because of course the cash-out restrictions on bonuses apply only to the account to which they were originally given; hence the cash-out restrictions are completely circumvented.

What's difficult about this type of scheme is that the illegal conduct is virtually impossible to detect on an individual account basis--*the bad behavior, collusion, arises from the interaction of a group of commonly-owned accounts*--in other words, the behavior of interest needs to be studied at the network level.

And therefore, Graph Theory is a natural framework for analysis.

The technique i have applied was based on an academic paper by Chau et al. at Carnegie Mellon, titled Detecting Fraudulent Personalities in Networks of Online Auctioneers (PDF).

The fraud scenario at the heart of this paper is this: a seller on eBay wishes to sell a very expensive item (which they likely don't even own, but in any event, have no intention of ever shipping to the buyer) to a willing buyer. In order to induce the innocent buyer to willingly engage in the transaction, the fraudulent seller first acquires a very high (artificially high) reputation by engaging in a number of "successful" sales of items to a group of buyers; these buyers are often sham accounts controlled by the buyer.

More specifically, the authors of this Paper combine data across two levels (account level and network level) using a Belief Propagation algorithm over a Markov Random Field.

The signature graph structure, by the way, is known as a bipartite core, arising from a group of accounts which have a very high number of transactions among the members of this group, but very few outside of this group (i.e., with the rest of the eBay community).


If you have access to the user's game-movements log you could use clustering to group users who play 'similar'. Once you have the clusters you could use the IP to filter users inside each cluster.

Another approach may be to use a supervised-learning algorithm like Desicion-Trees, IBK, etc. But for this to work you need a training set with samples of users you already know have cheated.

You can use Weka data mining software to find patterns inside the data. And it has an option to connect directly to a database. It includes clustering, desicion-trees, ibk and a lot of algorithms to try with. But you need a basic understanding of each algorithm in order to interpret the results.