Thursday, 24 November 2016

Some Help Detecting Fraudulent Transactions with Math

Jo Craven McGinty, "The Numbers" columnist for The Wall Street Journal, filed a story in December 2015 about a team of forensic accountants sifting through refunds issued by a national call center in the U.S. They found something that didn't add up: there were too many fours in the data. It was up to them to figure out why.

The anomaly was so subtle that it might have slipped by unnoticed. But with employee fraud costing an estimated $300 billion a year, they paid due attention. In this case, there were several hundred operators across the country authorised to issue refunds of up to $50; anything larger required the approval of a supervisor.
Each operator had processed more than 10,000 refunds over several years. With so much money going out the door, there was opportunity for fraud, and KPMG decided to check the validity of the payments using Benford's Law. (The Wall Street Journal, "Accountants Increasingly Use Data Analysis to Catch Fraud")

The history behind Benford's Law is as fascinating as the law itself. Although it carries Benford's name, he was not the first to observe this digit bias. The phenomenon was first discovered by astronomer-mathematician Simon Newcomb (1835-1909), who observed it more than 50 years before it became known as Benford's Law.
Newcomb noticed that library copies of logarithm books were considerably more worn in the beginning pages, which dealt with low digits, and progressively less worn on the pages dealing with higher digits. He concluded that the ten digits do not occur with equal frequency.

In 1881 he published a short article in the American Journal of Mathematics in which he concluded that more numbers begin with the digit 1 than with any other digit. The article didn't receive much attention at the time. Put simply, the first digits of many naturally occurring data sets follow Benford's Law: the digit 1 leads with a probability of about 30%. If all nine possible first digits occurred equally often, each would have a probability of 1 in 9 (11.1%).
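For anyone who wants to see where the 30% comes from, Benford's Law gives the probability of a first digit d as log10(1 + 1/d). Here is a minimal Python sketch of that formula. The function name is my own for illustration, and this is not the R code used for the analysis in this post.

```python
import math


def benford_first_digit_probs():
    """Return {digit: probability} for first digits 1..9 under Benford's Law."""
    # P(d) = log10(1 + 1/d) for d = 1, ..., 9
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probs()
for digit, p in probs.items():
    # Print each digit with its expected share as a percentage
    print(f"{digit}: {p:.1%}")
```

The probabilities fall steadily from digit 1 to digit 9 and add up to 1, which is the monotonic decline an auditor looks for in real data.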

The law is now so widely used in fraud detection that Computer Aided Audit Tools like IDEA and ACL include digital analytics based on Benford's Law. The results below are from an analysis using R (a free, open source statistical analysis tool) that you can use to test transactions. The data set is the Corporate Payments data that comes with the Benford Analysis package in R.

In fraud detection, the basic assumption is that the first, second, third, and other digits in real data follow the Benford distribution, while the digits in fabricated data do not. In this example we tested the first digits only.
We analysed the first digits of the invoice payment amounts for possible deviations from Benford's Law, and the results are shown in the Corporate Payments Distribution chart above. The bars represent the observed proportion of each first digit, and the line is the distribution expected under Benford's Law.

The results show that the first digits of the payments data largely followed Benford's Law, or at least followed a pattern of monotonic decline. The exceptions were the first digits 5 and 9, which occurred more often than Benford's Law predicts. Further examination is required to determine whether these deviations are material indicators of potentially fraudulent data.
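A first-digit check like the one described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the sample amounts are made up, and the chi-square statistic is one common way to measure the overall deviation, not necessarily what the R package reports.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom
CHI_SQ_CRITICAL_5PCT_8DF = 15.507


def first_digit(amount):
    """Return the first non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '4.95...e+01'
    return int(f"{abs(amount):.15e}"[0])


def first_digit_report(amounts):
    """Compare observed first-digit proportions with Benford's Law.

    Returns (rows, chi_sq), where rows holds (digit, observed, expected).
    """
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    rows, chi_sq = [], 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / n
        # Pearson chi-square term, written in terms of proportions
        chi_sq += n * (observed - expected) ** 2 / expected
        rows.append((d, observed, expected))
    return rows, chi_sq


# Made-up amounts for illustration only (NOT the Corporate Payments data)
sample_amounts = [12.50, 49.99, 45.00, 130.25, 48.75, 19.99, 47.10, 210.00, 44.40, 1.99]
rows, chi_sq = first_digit_report(sample_amounts)
for d, observed, expected in rows:
    print(f"{d}: observed {observed:.1%}, expected {expected:.1%}")
print(f"chi-square = {chi_sq:.2f} (5% critical value: {CHI_SQ_CRITICAL_5PCT_8DF})")
```

A statistic above the critical value only says the digits as a whole depart from Benford's Law. It is a prompt for a closer look at the digits that stand out, not proof of fraud, and it is not reliable on a sample as small as this one.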



Friday, 11 November 2016

Another look at the polls

The dust hasn't quite settled after the 2016 US presidential elections. The campaigns were rough and now everyone's asking why the polls got it so wrong. How did they all miss Trump's victory?

I'm in the group that doesn't really like politics, but I love looking at the stats and predictions. I might become ABC's Antony Green when I grow up. On election morning I didn't do much at work after stumbling on to this great US Polls Data Set on Kaggle, which is a collection of polls run from 2015.

It's a rich source of information, with over 10k polls run by 188 pollsters. A simple aggregation of the polls by means of adjusted poll results provided a fairly accurate prediction of the election results by smoothing out variations and biases between individual polls.
After running this prediction, I spent most of the time matching it against the results as they came in, and it was so accurate that I wasn't surprised Trump won Florida. As for what kind of president he'll turn out to be, we'll watch and wait.
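The aggregation idea can be sketched roughly as follows: average the adjusted results of every poll in a state and call the state for whoever leads. This is a Python stand-in, not the R script itself. The field names (state, adj_clinton, adj_trump) and the numbers are invented for illustration, and the real data set's columns may differ.

```python
from collections import defaultdict
from statistics import mean


def aggregate_polls(polls):
    """Average adjusted poll results per state and name the leader."""
    by_state = defaultdict(lambda: {"clinton": [], "trump": []})
    for poll in polls:
        by_state[poll["state"]]["clinton"].append(poll["adj_clinton"])
        by_state[poll["state"]]["trump"].append(poll["adj_trump"])

    summary = {}
    for state, shares in by_state.items():
        clinton = mean(shares["clinton"])
        trump = mean(shares["trump"])
        if clinton > trump:
            leader = "clinton"
        elif trump > clinton:
            leader = "trump"
        else:
            leader = "tie"
        summary[state] = {"clinton": clinton, "trump": trump, "leader": leader}
    return summary


# Invented poll rows for illustration only (not taken from the Kaggle data)
toy_polls = [
    {"state": "State A", "adj_clinton": 40.0, "adj_trump": 50.0},
    {"state": "State A", "adj_clinton": 44.0, "adj_trump": 46.0},
    {"state": "State B", "adj_clinton": 55.0, "adj_trump": 40.0},
]
for state, row in aggregate_polls(toy_polls).items():
    print(state, row)
```

A plain mean treats every poll equally. Weighting by sample size or recency would be a natural refinement, but that goes beyond the simple aggregation described here.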

And if you speak R, here's the R Script on Kaggle used for the above prediction map, just to kill any doubt, or in case you want to star in the next elections.