We are almost at the end of this course; only two weeks are left. Luckily, we already finished this project; however, I can already feel the stress of the three pending projects for other courses. We might actually use this course's project as a starting point for our Advanced Database course project. We are thinking of developing a web app that, based on our data, displays a heat map of party popularity across different zones. We are focusing on Jalisco and on the parties competing for the governor and senator positions.
Stay tuned for this week's developments on the other projects.
Holy Week is here, and I won't be relaxing that much: I am preparing for a job interview. I will focus on sharpening my data structures, algorithms, and database skills. I won't be working on this project, so I won't be blogging during Holy Week.
On the other hand, this week we had our sprint presentation. We could say that the backend of the application is almost finished; we must now focus on the frontend.
This week I saw some great progress on our project. Alfonso and I finished polishing the code for the word-counting functionality of the word cloud. The graphics will be developed in the front end; however, the basic functionality of identifying which words are most frequently used by candidates is already done. We used Python's Counter class, from the collections module in the standard library, to do this. The current output of our word cloud program is a list of words with their respective number of appearances.
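As a rough sketch of how that counting works (the tweet texts here are invented for illustration, not our real data):

```python
from collections import Counter

def count_words(texts):
    """Count word frequencies across a list of tweet texts (illustrative sketch)."""
    counter = Counter()
    for text in texts:
        counter.update(text.lower().split())
    return counter

tweets = [
    "education reform starts with education funding",
    "funding for education is key",
]
counts = count_words(tweets)
print(counts.most_common(2))  # → [('education', 3), ('funding', 2)]
```

Counter handles all the tallying for us, and `most_common(n)` gives the top words ready to feed into the word cloud.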
This is what we are going to present at the sprint presentation on Tuesday, because on Friday the classrooms were occupied, so the presentation was moved.
Stay tuned to hear what the next steps are going to be.
This week we have a deliverable on Friday. We need to show progress on our project, which I believe we already have. I hope to finish the word cloud functionality and begin integrating everything. To do this, we need to remove stop words from our analyzed data. Finally, we need a library that maps each word to its count.
This week I said I was going to work with Alfonso on eliminating stop words; however, I didn't quite end up doing that. What I did instead was optimize the code to avoid repetition. The code for opening a file and reading it into an array is now a static method accessible to all micro-services. This is good practice: it stops us from repeating unnecessary lines of code.
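A minimal sketch of what that shared helper could look like (the class and method names here are my own, not necessarily the ones in our code):

```python
class FileUtils:
    """Shared helper so each micro-service doesn't re-implement file reading.
    (Illustrative sketch; the class and method names are assumptions.)"""

    @staticmethod
    def read_lines(path):
        """Open a text file and return its non-empty lines as a list."""
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
```

Any service can then call `FileUtils.read_lines("users.txt")` instead of duplicating the open/read logic.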
Next week we will be working intensely on Taller Vertical. I'll post an update on what I've done there, but I won't be working on this project next week.
It’s me again with another update on what we will be working on this week. Last week, I worked on mining all the tweets from a Twitter account. This week, I will be polishing that work, and Alfonso and I will work on eliminating the stop words from a tweet. Stop words are common words that carry little meaning and are filtered out in natural language processing, such as: the, is, at, which, and on.
By doing this, we will be able to create a more accurate word map, without many occurrences of stop words, focusing only on the information that matters.
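A minimal sketch of that filtering step, with a tiny hand-picked stop-word list (a real list would be much longer, loaded from a file or a library such as NLTK):

```python
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "of"}

def filter_stop_words(words):
    """Drop stop words so the word count reflects meaningful terms only."""
    return [w for w in words if w.lower() not in STOP_WORDS]

words = "the reform of education is on the agenda".split()
print(Counter(filter_stop_words(words)))
```

Running the filter before counting keeps words like "the" and "of" out of the word map entirely.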
Stay tuned for this week's progress in the post-mortem blog post.
This week is all about joining forces. So many things have been worked on separately, and it's time to integrate everything into a first version of our project. We can now focus on gathering and mining information, to be ready for the next phase: actually analyzing the data and displaying the results of our application. However, that's a topic for another day.
In the meantime, let’s wait and see how things develop within our project. Stay tuned for this week's post-mortem to know what's going on.
As you know, we already had a streamer that uses the Twitter API to stream tweets. With that streamer we could only filter by keywords, so tweets from all sources arrived. Now we have added a second streamer that lets us filter by the accounts we want tweets from. The list of users is provided via a .txt file, and we can also pass a list of keywords for more thorough filtering.
What did we do with the first streamer?
It is now fully working: it adds the filtered tweets to the database and prints who tweeted the matching results.
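The streaming itself goes through the Twitter API, but the filtering decision of the second streamer can be sketched in plain Python (the function names and file names here are illustrative assumptions, not our exact code):

```python
def load_list(path):
    """Read one entry per line from a .txt file (user IDs or keywords)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def matches(tweet_user_id, tweet_text, followed_ids, keywords):
    """Keep a tweet only if it comes from one of the followed accounts
    AND mentions at least one keyword (the second streamer's filter)."""
    return tweet_user_id in followed_ids and any(
        kw.lower() in tweet_text.lower() for kw in keywords
    )
```

With something like `matches(uid, text, load_list("users.txt"), load_list("keywords.txt"))`, tweets from unrelated accounts are discarded even if they contain a matching keyword.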
This week I added environment variables for the database connection information. I also renamed some classes and improved the code for better readability. The connection to the database is working, as is insertion. However, we still have problems encoding the text as UTF-8 and avoiding garbled characters. Emojis are a big problem: whenever a user has an emoji in their username or in their tweet, the program halts.
We are still figuring out a way to store the text with the emojis for later sentiment analysis. I know it should be possible.
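A likely fix, sketched here under the assumption that we connect with MySQL Connector/Python (the environment variable names and the `tweets` table are hypothetical): read the credentials from environment variables and request the utf8mb4 charset, since MySQL's plain "utf8" stores at most 3 bytes per character while emojis need 4.

```python
import os

def connection_settings():
    """Build MySQL connection settings from environment variables.
    charset='utf8mb4' is the key part: MySQL's plain 'utf8' stores at most
    3 bytes per character, so 4-byte emojis make inserts fail."""
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "user": os.environ.get("DB_USER", ""),
        "password": os.environ.get("DB_PASSWORD", ""),
        "database": os.environ.get("DB_NAME", "tweets"),
        "charset": "utf8mb4",
    }

# Usage (assumption: MySQL Connector/Python):
#   conn = mysql.connector.connect(**connection_settings())
# The table itself must also use utf8mb4, e.g.:
#   ALTER TABLE tweets CONVERT TO CHARACTER SET utf8mb4
#       COLLATE utf8mb4_unicode_ci;
```

Both sides have to agree: the connection charset and the table's character set must be utf8mb4, or the emojis will still be rejected on insert.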
This week is going to be about improving what I have achieved on the project so far. I will clean up the code, applying better coding practices, and add some functionality: reading from and deleting from the database. I will also try to store the tweets without modifying the text, for later use in sentiment analysis. For that last part, everything matters, from emojis to raw text. Therefore, I will be researching how to store tweets with special characters and emojis in the database.
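The planned read and delete operations might look roughly like this (assuming MySQL Connector/Python; the `tweets` table and its columns are hypothetical):

```python
def read_tweets(conn, limit=10):
    """Fetch the most recently stored tweets (sketch; table name assumed)."""
    cursor = conn.cursor()
    # %s placeholders let the driver escape values safely.
    cursor.execute("SELECT id, text FROM tweets ORDER BY id DESC LIMIT %s", (limit,))
    rows = cursor.fetchall()
    cursor.close()
    return rows

def delete_tweet(conn, tweet_id):
    """Remove a single tweet by its primary key (sketch)."""
    cursor = conn.cursor()
    cursor.execute("DELETE FROM tweets WHERE id = %s", (tweet_id,))
    conn.commit()  # deletes are not persisted until the commit
    cursor.close()
```

Using parameterized queries instead of string formatting is one of the "better practices" I want to apply, since it avoids SQL injection.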
This is what I will try to do, and I will keep learning about Python, MySQL, and all of these topics that are new to me.