Donald John Trump, 45th president of the United States of America, is a prolific tweeter with a following of more than 20 million. Could we use a simple machine learning algorithm to identify whether a given tweet is from Donald Trump? Yes! Naive Bayes is a simple yet effective ML algorithm, and it has been used extensively to classify spam emails.

In this case study, we use a dataset comprising 200 tweets from Donald Trump and others to train and test our Naive Bayes classifier. The dataset is read from a CSV file into allTweets (our Pandas DataFrame). See the allTweets data structure below.

Dataset is loaded into a Pandas DataFrame – allTweets
Top 5 rows in allTweets
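
As a rough illustration of the loading step, here is a minimal sketch. It assumes the CSV has two columns named `text` and `author`; those names are hypothetical, since the actual file layout is not shown here. An in-memory string stands in for the file so the snippet runs on its own, and the rows are placeholder text rather than real tweets.

```python
import io

import pandas as pd

# Placeholder rows standing in for the real CSV file (not actual tweets).
# The column names "text" and "author" are assumptions for illustration.
csv_data = io.StringIO(
    "text,author\n"
    "placeholder tweet one,Trump\n"
    "placeholder tweet two,Other\n"
    "placeholder tweet three,Trump\n"
)

# With a real file on disk, a path would be passed to read_csv instead.
allTweets = pd.read_csv(csv_data)

# Inspect the first rows (head() returns at most 5) and the class balance.
print(allTweets.head())
print(allTweets["author"].value_counts())
```

Checking the class balance early is worthwhile because a heavily skewed dataset can make a raw accuracy figure look better than the classifier really is.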

The dataset is randomly split 80/20: 80% is allocated for training and 20% for testing our tweet classifier. The simple Naive Bayes algorithm worked very well, achieving an accuracy of 87.5% on the test set! See the confusion matrix below.

Our Naive Bayes tweets classifier has an accuracy of 87.5%!
Confusion Matrix for our tweets classifier
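
The split, training, and evaluation steps could look roughly like the sketch below. It assumes scikit-learn with a bag-of-words `CountVectorizer` and `MultinomialNB`, which may differ from the setup actually used in this case study. The texts are synthetic placeholders, not real tweets, and the labels `"Trump"` and `"Other"` are hypothetical names.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Synthetic placeholder data (not real tweets): 10 examples per class.
texts = [f"great tremendous deal number {i}" for i in range(10)] + [
    f"quiet sunny morning coffee {i}" for i in range(10)
]
labels = ["Trump"] * 10 + ["Other"] * 10

# Random 80/20 split; random_state only makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0
)

# Turn each text into word counts; the vocabulary comes from training data only.
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(X_train)
test_counts = vectorizer.transform(X_test)

# Fit multinomial Naive Bayes on the counts and predict the held-out texts.
model = MultinomialNB()
model.fit(train_counts, y_train)
predicted = model.predict(test_counts)

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_test, predicted, labels=["Trump", "Other"])
accuracy = accuracy_score(y_test, predicted)
print(cm)
print(accuracy)
```

The toy texts use disjoint vocabularies for the two classes, so the score printed here says nothing about real tweets. The diagonal of the confusion matrix counts correct predictions, and accuracy is the diagonal sum divided by the number of test examples.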

We have built a simple utility function identifyTweet() to return the prediction for a given tweet. See some sample runs of the utility function below.

Test runs for predicting if a tweet came from Donald Trump
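
A helper along the lines of identifyTweet() might be sketched as follows. The signature, the return value, and the scikit-learn pipeline are all assumptions for illustration, since the original function's details are not given here. The training texts are synthetic placeholders, not real tweets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Synthetic placeholder training data (not real tweets).
train_texts = [
    "great tremendous deal",
    "tremendous great wall",
    "quiet sunny morning",
    "sunny morning coffee",
]
train_labels = ["Trump", "Trump", "Other", "Other"]

# Chain the vectorizer and classifier so raw strings can be passed directly.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)


def identifyTweet(tweet: str) -> str:
    """Return the predicted label for a single tweet string (hypothetical signature)."""
    # predict() expects an iterable of documents, hence the one-item list.
    return str(classifier.predict([tweet])[0])


print(identifyTweet("tremendous deal"))
print(identifyTweet("morning coffee"))
```

Wrapping the vectorizer and the model in one pipeline keeps the helper short: the same vocabulary learned during training is applied automatically to each new tweet.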

The Python code is shown below: