Mining Twitter Data with Tweepy API

I became interested when the lecture covered Social Media Analytics on Twitter data. It described how tweets about cruise travel were collected and analysed. I therefore decided to collect Twitter data about the game Hearthstone to see what would turn up.


1.) I started by searching for “mining Twitter data”. Luckily, I found a website that teaches users to mine Twitter data step by step, and I followed its instructions.
2.) The first step is to register an application with Twitter. Then, in the Python environment, install Tweepy, the library used to collect data from Twitter.
3.) Next, I collected the statuses on my account's timeline. As my account only follows Hearthstone, only Hearthstone-related data appears. Of the two figures below, one shows the timeline as displayed on Twitter and the other shows the output of the Python program.



4.) After that, I started streaming tweets with the hashtag “#Hearthstone” and stored them in a file called “hearthstone.json”. The collection ran from Feb 22 19:31:44 +0000 2017 to Feb 23 00:19:50 +0000 2017, about 5 hours.
5.) Then I performed text pre-processing, that is, tokenising the input, and counted the five most frequent words in the collected Twitter data.
6.) However, without filtering out stop words, the result is meaningless.


7.) After filtering out stop words, the result is shown below.


“#Hearthstone” is the hashtag that we are capturing, so its count can be read as the number of tweets collected, which is 241. “7.1” is the update version of the game. “Arena” is an in-game mode. “Https” suggests that many tweets contain a hyperlink to a website or photo. “@hearthstone_exp” may indicate that some people help others to gain exp in Hearthstone. This is much more meaningful than the result above.
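
The tokenising, stop-word filtering and counting steps described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used for this post: it assumes that “hearthstone.json” holds one JSON tweet per line with a `text` field, and the regular expression, the tiny stop-word list, and the function names `load_texts`, `tokenize` and `top_terms` are all made up for illustration. Unlike the result above, this tokeniser keeps each URL as a single token instead of producing a separate “https” term.

```python
import json
import re
from collections import Counter

# Hypothetical token pattern: keep URLs, @mentions, #hashtags and
# version-like numbers (e.g. 7.1) whole, then fall back to plain words.
TOKEN_RE = re.compile(
    r"https?://\S+"       # URLs
    r"|@\w+"              # @mentions
    r"|#\w+"              # hashtags
    r"|\d+(?:\.\d+)+"     # dotted numbers such as 7.1
    r"|\w+(?:['-]\w+)*"   # words, allowing inner apostrophes or hyphens
)

# Assumption: a very small illustrative stop-word list. A real run would
# use a fuller list, for example the one shipped with NLTK.
STOP_WORDS = {
    "a", "an", "and", "the", "to", "of", "in", "on", "for", "is", "it",
    "i", "you", "my", "this", "that", "with", "at", "rt", "via",
}


def load_texts(path):
    """Yield the text of each tweet stored one JSON object per line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            tweet = json.loads(line)
            if "text" in tweet:  # skip records that are not tweets
                yield tweet["text"]


def tokenize(text):
    """Split a tweet into lower-cased tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def top_terms(texts, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent tokens that are not stop words."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    # Assumes the streaming step has already written this file.
    for term, count in top_terms(load_texts("hearthstone.json")):
        print(term, count)
```

Passing `stop_words=set()` to `top_terms` reproduces the unfiltered count from step 6, which makes it easy to compare the two results side by side.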


That is what I have done in mining Twitter data. Because of limited time, I performed word counts only. I believe there is room to do much more, for example term co-occurrences, sentiment analysis, geolocation and interactive maps. I feel that it is quite similar to our project. This is my first step in handling data from social media. I believe that analysing it can improve people's lives by making things more and more convenient.

Mining Twitter Data with Python (Part 1: Collecting data)

5 thoughts on “Mining Twitter Data with Tweepy API”

  1. Thanks for sharing, MOSESSMNG. Python is a really useful tool. I have also done some machine learning projects, taking advantage of Python's convenience. Web crawlers are also a very efficient method. I think you could try them, not only for Twitter, since you can get information from other websites too. 🙂

    I think the choice of model for sentiment analysis is very important and difficult. Looking forward to your next step!

  2. Running commands in an interactive console can be time-consuming compared with running them on Linux. If you use the nohup command on Linux to run the Python program from a shell script, the process will not be killed even if the remote connection is dropped.
    Using bigram and trigram models, you can also try to find frequent pairs {i, j} and frequent triples {i, j, k} if you are interested. Keep it up, Moses!

  3. Thanks for sharing. It is amazing that you have extracted so much data from Twitter and analysed it. However, your mining process only includes tweets that carry the tag #Hearthstone, and most people post without the full name Hearthstone. If we want to analyse tweets relevant to a theme like Hearthstone, what should we do?

    1. Yes. Some people use the hashtag #HS, while others do not use a tag at all even though the post is related to Hearthstone. For the first case, I think we can find synonyms of #Hearthstone and extract those as well. But the synonyms we collect may not actually refer to Hearthstone. For example, #HS can mean high school.
      For the second case, I do not think we can improve much, as it would require extracting all tweets and then letting the program decide whether the content is related to Hearthstone, which would be heavy.
      So I think collecting only tweets with the hashtag #hearthstone is the simple way, and it is enough for our analysis.

  4. Thank you for showing how to perform social media analysis with a Python program. The procedure is very clear, so I can follow your steps. However, as a Hearthstone player, I find the result differs from what I expected, which was words from the trending meta, for example “reno” or “jade”. The result may be more meaningful if the sample size increases. Anyway, I appreciate your study of Twitter.
