MongoDB — Twitter dataset

Francesco
Jul 9, 2021

These are some of the results from my Advanced Databases & Information Systems exam.

Dataset

It contains tweets and user information from 29/01/2020 to 07/03/2020. The total size is 6.1 GB.

MongoDB

I chose to install MongoDB via Docker.

docker-compose.yml

init-mongo.js

The image below shows part of the content of init-mongo.js. The script creates an additional non-admin user and the first collection, user, with a validation schema.
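Since the original script is only shown as an image, here is a minimal sketch of the two database commands such an init script could run, written as Python dicts in the shape accepted by pymongo's `Database.command()`. The user name, password, database name, role and schema fields are all hypothetical placeholders, not the ones used in init-mongo.js.

```python
# Hypothetical sketch of what an init script like init-mongo.js could do,
# expressed as MongoDB command documents. All names and fields are placeholders.


def create_user_command(name: str, password: str, db_name: str) -> dict:
    """createUser command granting read/write access to a single database.
    The command name must be the first key, so insertion order matters."""
    return {
        "createUser": name,
        "pwd": password,
        "roles": [{"role": "readWrite", "db": db_name}],
    }


def create_user_collection_command() -> dict:
    """create command for the 'user' collection with a $jsonSchema validator.
    The schema fields are illustrative assumptions."""
    return {
        "create": "user",
        "validator": {
            "$jsonSchema": {
                "bsonType": "object",
                "required": ["id", "screen_name"],
                "properties": {
                    # Twitter ids can exceed 32 bits, so accept int or long.
                    "id": {"bsonType": ["int", "long"]},
                    "screen_name": {"bsonType": "string"},
                    "followers_count": {"bsonType": ["int", "long"]},
                },
            }
        },
    }
```

With pymongo, these could be sent as `client["twitter"].command(create_user_command("app_user", "secret", "twitter"))` and `client["twitter"].command(create_user_collection_command())`, where `client` is a connected `MongoClient` and `"twitter"` is an assumed database name.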

Results & Queries

Number of tweets per day

The total number of tweets is 5,405,147.
Query
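The query itself is only shown as an image. A minimal sketch of a per-day count with the aggregation framework could look like the following, assuming each tweet stores its creation time as a BSON date in a field called `created_at` (an assumption: raw Twitter JSON stores it as a string, which would need converting first). A pure-Python equivalent is included so the grouping logic can be checked on a small sample.

```python
from collections import Counter
from datetime import datetime, timezone

# Pipeline for pymongo's Collection.aggregate(). Assumes a BSON date in the
# hypothetical "created_at" field; $dateToString formats it in UTC by default.
TWEETS_PER_DAY_PIPELINE = [
    # Group tweets by calendar day and count them.
    {"$group": {
        "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$created_at"}},
        "count": {"$sum": 1},
    }},
    # "YYYY-MM-DD" strings sort chronologically as plain text.
    {"$sort": {"_id": 1}},
]


def tweets_per_day(tweets):
    """Pure-Python equivalent of the pipeline, assuming created_at values are UTC."""
    counts = Counter(t["created_at"].strftime("%Y-%m-%d") for t in tweets)
    return [{"_id": day, "count": n} for day, n in sorted(counts.items())]
```

Summing the `count` field of the aggregation output, e.g. `sum(d["count"] for d in db.tweets.aggregate(TWEETS_PER_DAY_PIPELINE))` with a hypothetical `tweets` collection, should give the overall total of tweets.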

The image below shows part of the code used to generate the plot.
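The plotting code is not reproduced in the text. One step that is usually needed before plotting is turning the aggregation output into aligned x/y lists, filling days without tweets with zero so the time axis has no gaps. A minimal sketch with a hypothetical helper `daily_series`, assuming the output shape `{"_id": "YYYY-MM-DD", "count": n}`:

```python
from datetime import date, timedelta


def daily_series(rows, start: date, end: date):
    """Convert [{"_id": "YYYY-MM-DD", "count": n}, ...] into (days, values)
    covering every day from start to end, using 0 for missing days."""
    counts = {row["_id"]: row["count"] for row in rows}
    days, values = [], []
    current = start
    while current <= end:
        key = current.isoformat()  # e.g. "2020-01-29"
        days.append(key)
        values.append(counts.get(key, 0))
        current += timedelta(days=1)
    return days, values
```

The two lists could then be passed to a plotting library, for instance `matplotlib.pyplot.bar(days, values)`.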

Most used hashtags on 19/02/2020

Most used hashtags on 20/02/2020

This is the query to get the data for the plots above.
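As with the other queries, the original is only available as an image. A minimal sketch of a "top hashtags on a given day" aggregation follows, assuming tweets keep the Twitter API's `entities.hashtags` array (objects with a `text` field) and a BSON date in a hypothetical `created_at` field. Hashtags are lower-cased so that different capitalisations are counted together; this is a design choice of the sketch, not necessarily of the original query. A pure-Python equivalent is included for checking on small samples.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_hashtags_pipeline(day: datetime, limit: int = 10):
    """Aggregation pipeline for the most used hashtags on one day."""
    return [
        # Keep only tweets created in [day, day + 1 day).
        {"$match": {"created_at": {"$gte": day, "$lt": day + timedelta(days=1)}}},
        # One document per hashtag occurrence.
        {"$unwind": "$entities.hashtags"},
        # Case-insensitive count per hashtag.
        {"$group": {"_id": {"$toLower": "$entities.hashtags.text"},
                    "count": {"$sum": 1}}},
        # Most used first; ties broken alphabetically.
        {"$sort": {"count": -1, "_id": 1}},
        {"$limit": limit},
    ]


def top_hashtags(tweets, day: datetime, limit: int = 10):
    """Pure-Python equivalent of the pipeline for an in-memory sample."""
    end = day + timedelta(days=1)
    counts = Counter(
        tag["text"].lower()
        for t in tweets
        if day <= t["created_at"] < end
        for tag in t.get("entities", {}).get("hashtags", [])
    )
    # Mirror the $sort stage: count descending, then hashtag ascending.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"_id": tag, "count": n} for tag, n in ranked[:limit]]
```

For the two plots, the pipeline would be run once per day, e.g. `db.tweets.aggregate(top_hashtags_pipeline(datetime(2020, 2, 19)))` with a hypothetical `tweets` collection. Note that MongoDB's `$toLower` is only well defined for ASCII characters, so results on non-ASCII hashtags may differ slightly from Python's `str.lower()`.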


Francesco

Master’s degree in Computer Engineering for Robotics and Smart Industry — Smart Systems & Data Analytics