Credit Card Fraud Detection Using Machine Learning

Mar 4, 2021

You can find the complete code here


In this fast-paced digital world, we are part of a digital transaction society, and non-cash transactions are expected to grow steadily in the coming years. As digital transactions increase every year, so does the number of credit card frauds. According to a recent study, 15.4 million people in the U.S. experienced credit fraud in 2016 alone.

There are a few ways to stop these fraudulent activities, but here I am going to walk you through my machine learning approach.

Collecting the Data

I used the Kaggle dataset, which contains 284,807 transactions that occurred over two days, 492 of which are frauds. The dataset is highly imbalanced: the positive class (frauds) accounts for about 0.173% of all transactions.
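As a quick check of the imbalance, the fraud share can be recomputed from the two counts quoted above. This is a minimal sketch in plain Python; the variable names are illustrative only.

```python
# Counts quoted in the text for the Kaggle credit card dataset.
n_transactions = 284_807
n_frauds = 492

# Share of fraudulent transactions, as a percentage.
fraud_pct = 100 * n_frauds / n_transactions

# How many legitimate transactions there are for each fraud.
legit_per_fraud = (n_transactions - n_frauds) / n_frauds

print(f"Fraud share: {fraud_pct:.3f}%")
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")
```

With hundreds of legitimate transactions for every fraud, a model that always predicts "not fraud" would already be right almost all the time, which is why accuracy alone is a poor metric here.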

Exploratory Data Analysis

I started by exploring the Time, Amount, and Class columns.

Time is recorded in seconds, and the dataset includes all the transactions recorded over the course of two days. Most of the transactions happened in the daytime. The vast majority of transactions are relatively small, and only a tiny fraction come anywhere close to the maximum amount. Most are under $50, and that is likely where most fraudulent transactions occur as well.

In the long tail, fraudulent transactions happened more frequently. Even so, it would be hard to differentiate fraud from normal transactions by transaction amount alone. Hour “zero” corresponds to the hour in which the first transaction happened, not necessarily 12–1 am. Given the heavy decrease in normal transactions from hours 1 to 8 and again roughly from hours 24 to 32, it seems fraud tends to occur at higher rates during the night. A statistical test could be used to check this hypothesis.
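The hourly view described above can be sketched by bucketing the Time column (in seconds) into hours and computing the fraud rate per bucket. The rows below are made-up toy values, not real data, and the column name `Hour` is an assumption introduced for illustration.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "Time":  [0, 1800, 3700, 7300, 90000, 93700],  # seconds since first transaction
    "Class": [0, 0, 1, 0, 1, 0],                   # 1 = fraud, 0 = normal
})

# Hour 0 is the hour of the first transaction, not necessarily midnight.
df["Hour"] = (df["Time"] // 3600).astype(int)

# Mean of a 0/1 label per hour is the fraud rate for that hour.
fraud_rate_by_hour = df.groupby("Hour")["Class"].mean()
print(fraud_rate_by_hour)
```

On the real data, comparing these hourly rates between the quiet hours and the rest of the day would be the starting point for a formal test.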

Looking at the class distribution we can see there are only 492 fraudulent transactions. That’s only 0.173% of all of the transactions in this dataset!

Feature Scaling

As we know from the dataset description, features V1-V28 have already been transformed by PCA and scaled, whereas “Time” and “Amount” have not. Since these two features will be analyzed together with V1-V28, they should be scaled before we train any model. Which scaling method should we use? The Standard Scaler is a poor fit because “Time” and “Amount” are not normally distributed. The Min-Max Scaler is also a poor fit because there are noticeable outliers in “Amount”. The Robust Scaler is robust to outliers because it centers on the median and scales by the interquartile range: (xi - median(x)) / (Q3(x) - Q1(x)), where Q1 and Q3 are the 25% and 75% quartiles. So I chose the Robust Scaler for these two features.
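The robust scaling step can be sketched as follows. This assumes scikit-learn's `RobustScaler` with its default settings, which center on the median and scale by the interquartile range. The toy `amount` values are made up, and the manual NumPy computation is only there to show what the scaler does.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy "Amount" column with one large outlier (illustrative values only).
amount = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler defaults: subtract the median, divide by the IQR (Q3 - Q1).
scaled = RobustScaler().fit_transform(amount)

# The same computation by hand.
q1, median, q3 = np.percentile(amount, [25, 50, 75])
manual = (amount - median) / (q3 - q1)

print(scaled.ravel())
print(manual.ravel())
```

The outlier still ends up far from the rest, but it does not distort the center or the scale used for the other values, which is the property that matters here.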

Correlation Matrices

Correlation matrices are essential for understanding our data. We want to know whether any features heavily influence whether a specific transaction is fraudulent. I used a heatmap to check for strong collinearity in the data.
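A minimal sketch of the correlation step, assuming pandas for the matrix and seaborn for the plot. The data is synthetic, and the column names `V_related` and `V_noise` are hypothetical stand-ins for the real PCA features.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the real data: two features and a binary label.
n = 1000
label = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "V_related": label + rng.normal(0, 0.5, size=n),  # built to track the label
    "V_noise": rng.normal(0, 1, size=n),              # unrelated to the label
    "Class": label,
})

corr = df.corr()  # Pearson correlation matrix

# Features ranked by the strength of their correlation with the label.
print(corr["Class"].drop("Class").abs().sort_values(ascending=False))


def plot_heatmap(corr_matrix):
    """Draw the matrix as a heatmap (requires seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, cmap="coolwarm_r", annot=True, fmt=".2f")
    plt.show()
```

Calling `plot_heatmap(corr)` draws the figure. On a heavily imbalanced dataset the correlations with the label can look weak, so it may be worth repeating the plot on a balanced subsample.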

Building Model

Before creating any models, I split the dataset 70%/30% into training and test sets. Another issue to address is the highly imbalanced dataset. There are various ways to evaluate and handle class imbalance; the synthetic minority over-sampling technique (SMOTE) is one of the over-sampling methods that addresses it. I used borderline-SMOTE, a variant of SMOTE, to deal with the imbalance in my dataset. With this setup, I was ready to run the data through some models. I worked with three models (logistic regression, DecisionTreeClassifier, and RandomForestClassifier) and used the pipeline method to simplify my workflow.

Summary of results for the three models:

With the logistic regression model, we captured 129 out of 147 fraud cases in the test dataset, and 1501 transactions were mistakenly marked as fraudulent. A false positive is what happens when, for example, you travel out of state without notifying your bank and get a fraud alert after buying something.

With DecisionTreeClassifier, 104 out of 147 fraud cases in the test dataset were detected, but it produced far fewer false positives, around 81 transactions.

Confusion Matrix: DecisionTreeClassifier

RandomForestClassifier captured 117 out of 147 fraud cases, and only 13 normal transactions were mistakenly flagged as fraudulent.

Confusion Matrix: RandomForestClassifier
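To compare the three models on the same footing, recall and precision can be computed from the counts reported above. A small worked sketch: the false negatives are derived as 147 minus the detected frauds, and the decision tree's false-positive count is the approximate figure of 81 quoted in the text.

```python
# (true positives, false negatives, false positives) from the reported results.
results = {
    "LogisticRegression": (129, 147 - 129, 1501),
    "DecisionTreeClassifier": (104, 147 - 104, 81),
    "RandomForestClassifier": (117, 147 - 117, 13),
}

metrics = {}
for name, (tp, fn, fp) in results.items():
    recall = tp / (tp + fn)        # share of frauds that were caught
    precision = tp / (tp + fp)     # share of fraud alerts that were real
    metrics[name] = (recall, precision)
    print(f"{name:<24} recall={recall:.3f} precision={precision:.3f}")
```

Logistic regression catches the most frauds but raises many false alarms, while the random forest gives up some recall for much higher precision. Which trade-off is better depends on the relative cost of a missed fraud versus an unnecessary alert.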

TensorFlow Keras Model

I also built a deep learning model to classify the fraud cases and compared it with my previous three models.

I used the ReLU activation function for the hidden layers and the sigmoid activation function for the output layer. The Keras model was trained with the Adam optimizer, using binary_crossentropy as the loss function, for 100 epochs with a batch size of 256.
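A model along these lines could be sketched as below. The number of hidden layers and their sizes (`hidden_units`) are assumptions of mine, since the text does not state them; only the activations, optimizer, loss, epochs, and batch size come from the description above.

```python
import tensorflow as tf


def build_model(n_features: int, hidden_units=(32, 16)) -> tf.keras.Model:
    """Small feed-forward binary classifier; layer sizes are assumptions."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    # Sigmoid output gives a fraud probability between 0 and 1.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model(n_features=30)  # Time, Amount and V1-V28
model.summary()

# Training call with the settings described in the text
# (X_train, y_train, X_val, y_val are assumed to exist):
# model.fit(X_train, y_train, epochs=100, batch_size=256,
#           validation_data=(X_val, y_val))
```

Accuracy is included only as a convenience metric; on data this imbalanced, the confusion matrix and ROC curve are more informative.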

Model Evaluation

I used the receiver operating characteristic (ROC) curve and a confusion matrix to evaluate the model. The ROC curve measures classification performance across decision thresholds by plotting the true positive rate against the false positive rate. The higher the Area Under the Curve (AUC) score, the better the model is at separating fraudulent from non-fraudulent transactions.
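As a small illustration of how the ROC curve and AUC are computed, here is a sketch using scikit-learn's `roc_curve` and `roc_auc_score` on four toy predictions. The labels and scores are illustrative values, not model output.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy labels and predicted fraud probabilities (illustrative values only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC = {auc:.2f}")
```

The AUC here is 0.75: of the four possible (fraud, normal) pairs, the fraud gets the higher score in three. With real model output, `y_score` would be the predicted probabilities for the test set.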

At the end of training, out of 85334 validation transactions, we correctly identified 82 fraudulent transactions and missed 26 of them. We incorrectly flagged 1552 legitimate transactions.

In the real world, one would put an even higher weight on class 1, so as to reflect that False Negatives are more costly than False Positives. Next time your credit card gets declined in an online purchase — this is why.
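One common way to put more weight on class 1 is a per-class weight dictionary. The sketch below uses the "balanced" heuristic, weight = n_samples / (n_classes * class_count), applied to the full-dataset counts quoted earlier. This is an assumption about how the weights might be chosen, not what the original model used; in practice they would be computed on the training split and tuned to the real cost of each error type.

```python
# Class counts from the dataset description: 284,807 transactions, 492 frauds.
counts = {0: 284_807 - 492, 1: 492}
n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: rarer classes get proportionally larger weights.
class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
print(class_weight)

# A dictionary like this can be passed to Keras during training
# (model, X_train and y_train are assumed to exist):
# model.fit(X_train, y_train, epochs=100, batch_size=256,
#           class_weight=class_weight)
```

With these weights, each fraud example contributes several hundred times more to the loss than a normal transaction, which pushes the model toward fewer false negatives at the cost of more false positives.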

Conclusion

Credit card fraud detection has been an important application area for machine learning, largely because fraud patterns change continuously.

There are lots of methods to detect fraudulent activity on your credit card, and it’s really cool to see how companies deal with this on a day-to-day basis. Our machine learning models showed that even a simple logistic regression model trained on highly anonymized data was able to distinguish fraudulent from non-fraudulent transactions.