Creating a Twitter Bot

Nathan Fleck
7 min read · Jun 24, 2021

When I first joined the Blue Witness nonprofit project, the team presented their roadmap: this was a data scraping and categorization project. The nonprofit wanted a solution that could take Twitter posts, extract incidents involving the police, and present a heat map of the reported incidents. With this centralized system, users are able to browse the data through the provided user interface. The end goal was an automated system that could not only accurately identify incidents involving police but also determine the location of each incident. A Bidirectional Encoder Representations from Transformers (BERT) model performed the natural language processing used to classify tweets. To solve the location problem, the team had decided before I joined to use an automated Twitter account that would reach out to users and ask them specific questions. The responses would also need to be processed, but for that to happen there first had to be a functioning Twitter bot.

When I joined the project, the Twitter bot was virtually nonexistent. I knew I would be able to develop the baseline functionality: the ability to reply to a given tweet and update a database, and the ability to fetch all replies directed at our bot, connect those responses to the original tweet, and update the database. The only functionality that existed when I started was the ability to connect to the Twitter API and a function that could reply to a specific tweet ID.

The Process

My main contribution was writing the core functions the bot needs to send tweets and watch for replies while managing the data in a database. The more difficult challenge was not working with the Twitter API but managing the database. Each action the bot takes has to be tied to a specific row in the database. When using conditions in SQL queries I ran into a very annoying problem: I was using the correct variable in the condition, but the query returned a type mismatch error. I knew the table column I was trying to access was a text column and the variable I was passing into my f-string was a string, so I was puzzled as to why the query reported a type mismatch. Eventually I tried forcing the query to treat the value as a string by adding single quotes outside the f-string braces.

In a correct query, {reply_id} is wrapped in single quotes, which is necessary in condition statements: an f-string inserts the raw characters of the value into the query text, so without the quotes the database reads a numeric ID as a number rather than as text. Setting values is more forgiving: single quotes there do not produce a type mismatch error, or any error at all.
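A minimal sketch of the quoting fix, using Python's built-in sqlite3 module as a stand-in for the project's database. The bot_tweets table, its columns, and the values are hypothetical, and SQLite's type rules are looser than those of many database servers, so this is not guaranteed to reproduce the original error. It also shows a parameterized query, a commonly recommended alternative in which the driver binds the value and no manual quoting is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table: tweet IDs are stored as text.
cur.execute("CREATE TABLE bot_tweets (tweet_id TEXT, reply_id TEXT, responses TEXT)")
cur.execute("INSERT INTO bot_tweets VALUES ('100', '200', NULL)")

reply_id = "200"

# Quoted f-string form: the single quotes make SQL read the value as text.
quoted_query = f"SELECT tweet_id FROM bot_tweets WHERE reply_id = '{reply_id}'"
quoted_rows = cur.execute(quoted_query).fetchall()

# Parameterized form: the driver binds the value, so no manual quoting is needed.
param_rows = cur.execute(
    "SELECT tweet_id FROM bot_tweets WHERE reply_id = ?", (reply_id,)
).fetchall()

print(quoted_rows, param_rows)
conn.close()
```

Both queries find the same row. The parameterized form also avoids problems when a value itself contains a quote character.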

The Development

The Twitter bot has two main functions. The bot needs to be able to take a specific tweet and respond to it with specified text. It is also necessary to store the bot's actions in a database in a way that lets us connect every reply to the original tweet.

The first function I wrote takes a tweet object and text as input parameters and uses the Twitter API to respond to that tweet with the given text. When the API sends out a tweet, it returns the tweet object of the tweet it sent. I then take that tweet object and update the database, storing the ID of the sent tweet in the row indexed by the ID of the tweet the bot responded to.
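The reply step could look roughly like the sketch below. It assumes a tweepy-style client (the article does not name the library) whose update_status call returns the posted tweet. The FakeApi class, the send_reply name, the bot_tweets table, and its column names are hypothetical stand-ins so the example runs without Twitter credentials.

```python
import sqlite3
from types import SimpleNamespace


class FakeApi:
    """Hypothetical stand-in for a tweepy.API client; posts nothing."""

    def update_status(self, status, in_reply_to_status_id=None,
                      auto_populate_reply_metadata=False):
        # A real client would post the tweet and return its Status object.
        return SimpleNamespace(id=in_reply_to_status_id + 1, text=status)


def send_reply(api, cursor, tweet, text):
    """Reply to `tweet` with `text`, then record the ID of the sent tweet."""
    sent = api.update_status(
        status=text,
        in_reply_to_status_id=tweet.id,
        auto_populate_reply_metadata=True,
    )
    # Store the bot's tweet ID in the row indexed by the tweet it answered.
    cursor.execute(
        "UPDATE bot_tweets SET reply_id = ? WHERE tweet_id = ?",
        (str(sent.id), str(tweet.id)),
    )
    return sent


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bot_tweets (tweet_id TEXT, reply_id TEXT)")
cur.execute("INSERT INTO bot_tweets VALUES ('100', NULL)")

sent = send_reply(FakeApi(), cur, SimpleNamespace(id=100), "Where did this happen?")
stored_reply_id = cur.execute(
    "SELECT reply_id FROM bot_tweets WHERE tweet_id = '100'"
).fetchone()[0]
conn.commit()
```

Keeping the sent tweet's ID next to the original tweet's ID is what later lets incoming replies be matched back to the right row.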

With the ability to send tweets and store them in the database, the next step is to call the API and see whether users have replied to any of our tweets.

The get_mentions function is very simple: since_id is passed as a parameter so that only replies that came after that tweet ID are fetched. This filters out responses we have already seen, so we must update the value every time we process responses. Fetching the replies to our bot is not enough, though; we still have to process them, which is done in the update_mentions function.
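As a rough illustration of how since_id filters mentions, the sketch below again assumes a tweepy-style client, where mentions_timeline accepts a since_id argument. FakeApi is a hypothetical stand-in holding canned tweet IDs, so nothing here contacts Twitter.

```python
from types import SimpleNamespace


class FakeApi:
    """Hypothetical stand-in for a tweepy.API client, holding canned mentions."""

    def __init__(self, mention_ids):
        self._ids = sorted(mention_ids, reverse=True)  # newest first

    def mentions_timeline(self, since_id=None):
        # Like the real endpoint, return only tweets with an ID greater than since_id.
        return [SimpleNamespace(id=i) for i in self._ids
                if since_id is None or i > since_id]


def get_mentions(api, since_id):
    """Fetch mentions of the bot that are newer than `since_id`."""
    return api.mentions_timeline(since_id=since_id)


api = FakeApi([101, 105, 103])
new_ids = [m.id for m in get_mentions(api, since_id=101)]
print(new_ids)
```

Only the mentions newer than ID 101 come back, newest first, which is why the stored since_id has to advance after each processing run.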

The first thing update_mentions does is fetch the tweet ID of the most recently processed response. To do this it accesses a special row in the database whose index is “update id”. We extract the ID, which is stored in the user_name column, and pass it as a parameter when we call get_mentions. We also need to keep track of the ID of the most recent response, and because get_mentions returns a list starting with the most recent mention, the ID we want is always the first one processed. To capture it I initialize an empty list and add a condition: if the list is empty, store the current ID in it. This can only happen once, and always for the first element. We need this value later when updating the database. Each reply returned from get_mentions is processed by being passed into the received_reply function, which is described after this one. Once all replies are processed, I take the tweet ID of the most recent reply and update the database for future calls. Afterwards I commit the changes and close my connections.
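The flow above can be sketched as follows, under several assumptions: sqlite3 stands in for the real database, the bot_tweets table layout is hypothetical, and fetch_mentions and handle_reply are hypothetical parameters standing in for get_mentions and received_reply so the example runs on its own. Unlike the original, this version leaves the connection open so the caller can inspect the result.

```python
import sqlite3
from types import SimpleNamespace


def update_mentions(conn, fetch_mentions, handle_reply):
    """Process new mentions and move the 'update id' bookmark forward."""
    cur = conn.cursor()
    # The bookmark lives in a special row indexed by "update id";
    # the ID itself is kept in the user_name column.
    cur.execute("SELECT user_name FROM bot_tweets WHERE tweet_id = 'update id'")
    since_id = int(cur.fetchone()[0])

    newest = []  # holds at most one ID: the first (most recent) mention
    for mention in fetch_mentions(since_id):
        if not newest:
            newest.append(mention.id)
        handle_reply(mention, cur)

    if newest:
        cur.execute(
            "UPDATE bot_tweets SET user_name = ? WHERE tweet_id = 'update id'",
            (str(newest[0]),),
        )
    conn.commit()
    return newest[0] if newest else since_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bot_tweets (tweet_id TEXT, user_name TEXT)")
conn.execute("INSERT INTO bot_tweets VALUES ('update id', '10')")

mentions = [SimpleNamespace(id=30), SimpleNamespace(id=20)]  # newest first
seen = []
latest = update_mentions(conn, lambda since_id: mentions,
                         lambda mention, cur: seen.append(mention.id))
stored = conn.execute(
    "SELECT user_name FROM bot_tweets WHERE tweet_id = 'update id'"
).fetchone()[0]
```

A plain variable initialized to None would work as well as the one-element list; the list simply mirrors the approach described in the text.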

Each response to the bot must be processed individually, which is what the received_reply function manages. It takes a tweet object and the database cursor as input parameters. Two values are extracted from the tweet object; one of them, reply_id, is used to connect the tweet to the bot tweet it was responding to. We must first fetch all previously processed responses to that bot tweet from the database and save them to a variable called responses. One important check: if there are no responses, we need to make sure the variable is an empty string and not None. We effectively want to take all previous responses and add the current one, but the text column we use cannot hold a Python list, so we store a single string that is later decoded to extract the individual strings. This is done in two steps. First, we pass the current response text and the response string from the database into a function called string_to_list.

The string_to_list function takes the single string that holds all responses and splits it on the substring “:.:.:”. Because we are using a substring to separate values, the key must be one that will not occur naturally in a tweet. Once the string is split into a list of strings, we append the processed tweet text to the list of responses. The function then returns the new list so it can be analyzed or processed further. Whether or not the list is used in analysis, it needs to be stored back into the database, and to do that we must first convert it back into a single string.
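A minimal sketch of the two conversions, assuming the argument order described above (current text first, stored string second). The name list_to_string for the reverse step is hypothetical, since the article does not name that function.

```python
SEPARATOR = ":.:.:"  # delimiter unlikely to appear naturally in a tweet


def string_to_list(text, responses):
    """Split the stored string into a list and append the new response text."""
    # An empty stored string means there are no previous responses.
    items = responses.split(SEPARATOR) if responses else []
    items.append(text)
    return items


def list_to_string(items):
    """Join the responses back into one string for storage (hypothetical name)."""
    return SEPARATOR.join(items)


updated = string_to_list("third reply", "first reply:.:.:second reply")
encoded = list_to_string(updated)
print(updated)
print(encoded)
```

The empty-string check matters because splitting an empty string would otherwise produce a list containing one empty response.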

Every tweet response goes through this process inside the received_reply function, after which the database is updated with the appropriate responses.
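Putting the pieces together, received_reply might look something like the sketch below. It assumes tweepy-style tweet attributes (in_reply_to_status_id and text), uses sqlite3 as a stand-in database, and invents the bot_tweets table layout; all of these are assumptions for illustration rather than the project's actual code.

```python
import sqlite3
from types import SimpleNamespace

SEPARATOR = ":.:.:"


def received_reply(tweet, cursor):
    """Append one reply's text to the stored responses for the bot tweet it answers."""
    reply_id = str(tweet.in_reply_to_status_id)  # the bot tweet being answered
    text = tweet.text

    row = cursor.execute(
        "SELECT responses FROM bot_tweets WHERE reply_id = ?", (reply_id,)
    ).fetchone()
    # Treat a missing row or NULL column as an empty string rather than None.
    responses = row[0] if row and row[0] is not None else ""

    items = responses.split(SEPARATOR) if responses else []
    items.append(text)
    cursor.execute(
        "UPDATE bot_tweets SET responses = ? WHERE reply_id = ?",
        (SEPARATOR.join(items), reply_id),
    )
    return items


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bot_tweets (tweet_id TEXT, reply_id TEXT, responses TEXT)")
cur.execute("INSERT INTO bot_tweets VALUES ('100', '101', NULL)")

received_reply(SimpleNamespace(in_reply_to_status_id=101, text="Portland"), cur)
received_reply(SimpleNamespace(in_reply_to_status_id=101, text="near 5th Ave"), cur)
stored = cur.execute(
    "SELECT responses FROM bot_tweets WHERE reply_id = '101'"
).fetchone()[0]
```

After two replies to the same bot tweet, the row holds both texts joined by the separator.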

For the future of this product, the ideal goal is reliable automation. Getting there will take a management system that can efficiently decide which tweets to reach out to and how to process replies. The bot will also have to reliably determine location from a user's response.

Two future challenges stand out: reliable automation, and processing responses through natural language processing.

I really like developing automated processes, so I loved working on this project. It was an enjoyable problem to solve but very straightforward in its execution.
