Academic year 2017/2018
Universidad Complutense de Madrid
Facultad de Informática
Final Project
Aleksandra Zmudzinska
Blazej Wietczak
Analysis of Twitter activity by country during competitive events

Abstract
The subject of this work is the analysis of Twitter data. Twitter was chosen as the data source because of its popularity, its availability, and the fact that users comment on public events there. Data obtained in this way are particularly interesting because of the large social and age cross-section that can be observed among Twitter users, so they give a fairly good statistical sample for a given location. The aim was to create a solution that presents the number of tweets posted by users from different countries during media events. Geolocation is usually not included in the tweets, so it was decided to work on the location given by the user [1]. The main tools used in the work are the Twitter API, the non-relational database management system MongoDB and the high-level programming language Python. It was also necessary to use external databases to convert locations to coordinates. The solutions presented in the work make it possible to show the activity of Twitter users from different countries through different visualization methods. To analyze the data in the best way, it was decided to use a chart of tweets over time, maps showing the total intensity of tweets for the 20 most popular locations over the entire event, and a movie based on maps in minute intervals. It was important for the results to be legible and visually appealing, and understandable also for people not related to the project. For the analysis, it was decided to choose popular international events whose course can be reconstructed afterwards and during which certain culminating points can be observed.
During the tests, it was checked how the project deals with two events: the Eurovision Song Contest and the UEFA Champions League match between Real Madrid and Paris Saint Germain. The data obtained were also analyzed and attempts were made to draw conclusions from them. Thanks to the proposed methods, it was possible to obtain very interesting results, among others to observe how a culminating point such as a goal in a football match influences the activity of users on Twitter, and for which nations the given event is the most important.

Keywords: Twitter, MongoDB, data cleaning, data-analysis, visualization, geo-location, bubble-map, chart, football match, Eurovision

Contents
1. Introduction
a. Twitter
b. Tweets analysis
c. Examples of Twitter analysis (applications)
2. The aim of the project
3. Programming environment
a. MongoDB
b. Python
c. PyMongo
1. Implementation
a. Selection of the event
b. Preparing data
c. Data selection
d. Data visualization
e. Automation
f. Adding Tweets to movie
2. Results
a. Real Madrid - PSG football match
i. Chart
ii. Map
iii. Movie
b. Eurovision
i. Chart
ii. Map
iii. Movie
3. Division of labor
4. Conclusions
5. Program files
6. Bibliography

1. Introduction
a. Twitter
According to the Cambridge Dictionary, a social network is [2]: “a website or computer program that allows people to communicate and share information on the internet using a computer or mobile phone”. Twitter [3] is a social network that provides a microblogging service. A registered user can send and read so-called tweets. A tweet is a short text message (max. 280 characters) displayed on the author's profile and shown to users who are watching the profile [4]. Twitter allows tagging (using # in front of the word) and responding to other users (@ username = answer). Users write short messages in their Twitter profile via a website, SMS or a mobile application. Twitter [5] is fashionable among the elites and widely used by public figures, including politicians, diplomats, publicists and journalists. Newspapers, magazines, state institutions, companies, medical centers, celebrities, etc. also have Twitter profiles [6]. Tweets of well-known personalities (e.g. the pope, the president of the USA) are often prepared by communication specialists.
New Internet communication channels, such as Twitter or Facebook, have helped demonstrators coordinate their activities and send information about their demonstrations to the international community [7]. Social networks help in the communication, coordination and synchronization of demonstrations. The term Twitter revolution may refer to:
• riots after the elections in Iran in 2009
• riots in Moldova in 2009
• revolution in Tunisia in 2010-2011
• revolution in Egypt in 2011

b. Tweets analysis
According to the 2018 Global Digital suite of reports from We Are Social and Hootsuite, more than 4 billion people around the world are now using the internet [8].
Figure 1 Statistics on the use of Social media
• The number of internet users in 2018 is 4.021 billion,
• The number of social media users in 2018 is 3.196 billion, which is 42% of the world's population,
• The number of mobile phone users in 2018 is 5.135 billion, which is 68% of the world's population.
Figure 2 Map of the use of Social media around the world
Twitter is used by over 46% of the world's internet population. It allows commenting on events in the world in real time and following tweets published by public figures, celebrities or politicians. This makes it a very powerful tool for observing political moods, social trends and what is popular in a given place in the world, in a given age group or for a given gender. Since 2006, the possibility of using posts on Twitter as a source of sociological data has been tested. Software capable of processing natural language is tested for this purpose.

c. Examples of Twitter analysis (applications)
Data from Twitter is an interesting source of information about behaviors, interests and trends [9]. Many companies use this information extensively when creating marketing or political campaigns, investigating people's reactions to events or even examining the interest in specific cultural events.
Using Twitter, it is possible to collect material from any part of the world almost immediately after it has been generated. As a result, in recent years many applications have been created that connect to the Twitter API and download data. Below are the most popular applications using data from the Twitter site.

Table 1 The most popular applications using data from the twitter site
Name | Notes
BackTweets | Twitter monitoring tool which allows searching for keywords in historical tweets. It also has an advanced search option which makes searching more flexible.
Monitter | Monitors the Twitter world for a set of keywords, with a visual presentation of messages containing at least one of them.
Twazzup | Informs you every time your keywords are mentioned in a tweet. Categorizes results by the popularity of the messages and of the users sending them.
TweetBuzzer | Shows you which brands are the most popular on Twitter. You can choose a 24-hour, 7-day, or 30-day period.
TwiBuzz | Tells you in real time how often people are using Twitter to tweet your favorite keywords.

The described applications allow various operations on data from Twitter [1]. Most of them focus either on historical data or on tweets processed at longer intervals [10]. However, we know of no solution that allows analyzing the number of tweets together with information about the place from which they were sent.

2. The aim of the project
The aim of the project was to show how the number of tweets on the Twitter website changes during international events. The first stage was to collect all tweets about a specific event. The next was to clean and prepare the data so that only the information of interest is preserved. The next step was to assign a geolocation to each tweet in such a way that it would be possible to mark it on the map.
The final part of the work was to prepare a presentation of the collected data using a video showing the change in the number of tweets sent by users from individual countries.

3. Programming environment
a. MongoDB
MongoDB [11] is an open-source, non-relational database management system written in C++. It is characterized by high scalability, high performance and the absence of a fixed schema for the stored data. Instead, the data is stored as JSON-style documents, which allows applications to process them more naturally while retaining the ability to create hierarchies and indexes.

b. Python
Figure 3 Python Logo
Python [12] is a high-level general-purpose programming language with an extensive standard library, whose guiding concept is the readability and clarity of the source code. Its syntax is characterized by transparency and brevity. Python supports various programming paradigms: object-oriented, imperative and, to a lesser extent, functional. It has a fully dynamic type system and automatic memory management, similar in this respect to Perl, Ruby, Scheme or Tcl. Like other dynamic languages, it is often used as a scripting language. Python interpreters are available on many operating systems.

c. PyMongo
PyMongo [13] is a Python distribution containing tools for working with MongoDB and is the recommended way to work with MongoDB from Python.
Listing 1 Using PyMongo module

1. Implementation
a. Selection of the event
In order to get the most interesting results, we decided to choose an event that interests audiences from different parts of the world. We also wanted to be able to observe moments that are more exciting for a certain group of viewers. It was also important to be able to compare the results with the course of the event.
First, we decided to observe a football game, in which two teams from different countries play against each other, and to observe the reaction of users to climactic moments such as goals and fouls. The course of a match can also be followed in a newspaper or on a website. We decided, therefore, that this is the perfect event for our project. Another event that we expected to give interesting results is Eurovision. It is a song contest, organized annually since 1956, in which 56 countries of Europe, North Africa and the Middle East take part [14]. It is a live competition, shown on TV, in which teams from different countries compete. We expected to observe which countries or cities are most interested in the competition, as well as a temporary increase in the activity of users from the country whose performance is being presented. As selected events, we used data from:
- Real Madrid - PSG match, which took place on March 6, 2018
- Eurovision, which took place on May 12, 2018

b. Preparing data
The first step in creating the project was reading the data from Twitter. The apps.twitter.com application interface allows the application owner to generate an access token and a secret key for reading tweets, so this option was used. To read the data, we used a program written in Python that reads tweets matching the declared keyword and saves them in JSON format. Then it was necessary to create a database in MongoDB and to create a collection from the previously obtained data files. At this stage, we already had all the necessary data in the collection; however, to be able to analyze and visualize the location of tweets, it was necessary to clean the data.
Figure 4 Screenshot of part of document of pure Twitter data
The first step in data cleaning was to check whether each tweet has a location and to delete those that do not have one.
Then it was necessary to remove unnecessary information and keep the important fields: tweet creation time, tweet location, user id and tweet content. The next step necessary to display tweets on the map was to obtain the longitude and latitude for the given location. For this purpose, we took the city of each tweet and searched for it in a database containing the geographical locations of cities. After finding the city in the database, we read the longitude and latitude and added them to our collection. The problem turned out to be that the database contained cities sorted alphabetically by continent; for example, the first London found was the one in Africa and not the one in Europe, which introduced distortions in the data. To improve this, we sorted the location database by population in descending order and repeated the process of adding longitude and latitude. Thanks to this we obtained more reliable data. Next, we wanted to have the timestamp of each tweet in a form that allows sorting by it and creating time intervals. Twitter's 'created_at' field is a string with the UTC time at which the tweet was created. Example: "created_at" : "Tue Mar 06 21:41:23 +0000 2018". We decided to convert it to a numeric timestamp of 10 digits, which gives the time with an accuracy of one second.
Figure 5 Screenshot of document from prepared data collection

c. Data selection
Once the data was properly cleaned and prepared, we could proceed to its selection. We aimed to create a chart, a map, and a movie. The chart was supposed to show the number of tweets over time. For this purpose, it was necessary to aggregate the data over time, summing the number of tweets and sorting by time in ascending order.
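The timestamp conversion and the aggregation over time described above can be illustrated with a minimal pure-Python sketch. It is not the project's code: it assumes that the 10-digit value is the number of seconds since the Unix epoch, it groups tweets into whole minutes, and the function names are hypothetical. In the project itself this aggregation was performed on the MongoDB collection.

```python
from collections import Counter
from datetime import datetime

# Layout of the created_at string, e.g. "Tue Mar 06 21:41:23 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_epoch_seconds(created_at):
    """Convert a created_at string to whole seconds since the Unix epoch (assumed target format)."""
    return int(datetime.strptime(created_at, CREATED_AT_FORMAT).timestamp())


def tweets_per_minute(tweets):
    """Return (minute_start, tweet_count) pairs sorted by time in ascending order."""
    counts = Counter()
    for tweet in tweets:
        ts = to_epoch_seconds(tweet["created_at"])
        counts[ts - ts % 60] += 1  # round down to the start of the minute
    return sorted(counts.items())


sample = [
    {"created_at": "Tue Mar 06 21:41:23 +0000 2018"},
    {"created_at": "Tue Mar 06 21:41:59 +0000 2018"},
    {"created_at": "Tue Mar 06 21:42:00 +0000 2018"},
]
per_minute = tweets_per_minute(sample)
```

For the three sample tweets, the first two fall into the same minute and the third into the next one, so the result contains two entries in ascending time order.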
Figure 6 Screenshot of data aggregated for chart
To create a map showing the number of tweets for a given location during the whole event, it was necessary to aggregate the data by location, summing the number of tweets and sorting them in descending order. We decided to use the 20 most active locations, which was done with the limit function.
Figure 7 Screenshot of data aggregated for map
To create a movie, we decided to split the data into one-minute time intervals and, for each interval, to display the number of tweets for a given location. In this case, we also decided to limit the number of cities to 20. Selecting data for this purpose was a more difficult task.
Figure 8 Part of the code which creates time intervals and finds the start and finish time
First, it was necessary to determine the time of the first and the last tweet. Then the number of 60-second intervals was obtained by subtracting the start time from the end time and dividing by 60. Then, in a loop over the time intervals, the data was aggregated by location, summing the number of tweets and sorting them in descending order. The data prepared in this way enabled visualization.
Figure 9 Screenshot of data aggregated for one interval for movie

d. Data visualization
To get the most information from our data, we decided to use several different methods of data visualization. The first and the easiest was a graph of the intensity of tweets over time. It makes it easy to observe whether an occurrence during the event, for example a goal, a break in a match or the performance of a team, significantly increases the number of tweets. To show the data in the chart, we loaded them into two arrays, the time into one and the number of tweets into the other, and then plotted them using the Python library matplotlib, with the time on the x-axis and the number of tweets at that moment on the y-axis. To increase the readability of the chart, we decided to show data every 60 seconds.
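The minute-interval selection described in the data selection step can be sketched in plain Python as follows. This is an illustration under assumptions, not the code from Figure 8: the field names "timestamp" and "location" are hypothetical, timestamps are taken to be integer seconds, and one extra interval is added so that the last tweet always falls inside an interval.

```python
from collections import Counter

INTERVAL_SECONDS = 60  # one-minute intervals, as in the text
TOP_N = 20             # number of locations kept per interval


def top_locations_per_interval(tweets, interval=INTERVAL_SECONDS, top_n=TOP_N):
    """Return a list of (interval_start, [(location, count), ...]) in time order."""
    if not tweets:
        return []
    start = min(t["timestamp"] for t in tweets)  # time of the first tweet
    end = max(t["timestamp"] for t in tweets)    # time of the last tweet
    # Number of intervals: (end - start) / interval, plus one for the final partial interval.
    n_intervals = (end - start) // interval + 1
    buckets = [Counter() for _ in range(n_intervals)]
    for t in tweets:
        buckets[(t["timestamp"] - start) // interval][t["location"]] += 1
    # most_common() sorts the locations by tweet count in descending order.
    return [(start + i * interval, b.most_common(top_n)) for i, b in enumerate(buckets)]


sample = [
    {"timestamp": 0, "location": "Madrid"},
    {"timestamp": 10, "location": "Madrid"},
    {"timestamp": 20, "location": "Paris"},
    {"timestamp": 130, "location": "London"},
]
intervals = top_locations_per_interval(sample, top_n=1)
```

An interval without tweets yields an empty list, which in a movie would correspond to a frame without bubbles.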
Figure 10 The part of the code that adds elements to the array every 60 seconds and creates a graph
The next visualization method was a map showing, with bubbles of various sizes, the number of tweets for a given city during the whole event. Such a map made it possible to show which cities tweeted most actively about a given event. To display the data on the map, we used the Python library Folium [15]. Using the aggregated data, for each of the 20 cities we loaded its latitude and longitude as the location, the name of the city as the name, and the number of tweets as the weight. First, we declared an empty map, giving parameters such as the center point and zoom. Then, in a loop, we set the parameters of the successive bubbles. By using the number of tweets multiplied by an appropriate factor as the radius of the bubble, we get a larger bubble on the map the more tweets there are for the given location.
Figure 11 The code part which creates an empty map, adds bubbles and saves the map as HTML file
At the end, the map with bubbles is saved as an html file. The last visualization method we used was a movie composed of maps displaying the number of tweets for a given location in minute intervals. This way of presenting the data makes it possible to observe how people from given regions of the world react to a given situation during the entire event. It shows whether there are situations in which cities that were not generally active temporarily become active in response to some event. This part was done in a similar way to the previous one, with the difference that we used data aggregated in time intervals and the whole map creation was done in a loop. First, the number of time intervals, which determined the range of the loop, was calculated.
Figure 12 The part of the code which obtains interval number and connects to collections with data aggregated in intervals

e. Automation
To assemble the movie from the created html files, it was necessary to save each of them as an image. We had approximately 250 html files, so opening each of them separately and taking a screenshot would take a lot of time, and we decided to automate this process. For this purpose, we used the Python library Selenium and Chromedriver. Selenium WebDriver is one of the most popular tools for Web UI automation. It can automate the actions performed in a web browser window, such as navigating to a website, filling in forms (including text boxes, radio buttons and drop-downs), submitting the forms, browsing through web pages, handling pop-ups and so on. Chromedriver is a tool which allows automatic opening of files in the Google Chrome browser. The created program loads the html files in a loop, opens each of them in the Google Chrome browser, and then saves a screenshot of the map opened in the browser as a .png file using the driver.save_screenshot() method.
Figure 13 The part of code which opens the html file, creates a screenshot and saves it into .png file
In this way, we get files ready to be assembled in a very short time.

f. Adding Tweets to movie
The resulting video was clear; it showed how the activity of users from a given place in the world changed during the event. However, we wanted it to be more interesting and provide more information. So, we decided to add to each of the screenshots a caption containing the time and one example tweet from the most active place in the given minute. For this purpose, we used the Python library PIL, which enables operations on pictures, among others adding text to a picture. First, in a loop, we read the name of the city with the most tweets and the time for the given interval. Then, for the given time frame, we searched for one tweet from a user from that city and added this tweet as an inscription on the map's screenshot for the given interval.
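The text of such an inscription can be prepared with the standard library before it is drawn on the screenshot. The sketch below is only an illustration: the function name, the caption layout and the line width are assumptions, the interval start is assumed to be given in seconds since the Unix epoch, and the drawing step itself (done with PIL in the project) is left out.

```python
import textwrap
from datetime import datetime, timezone


def build_caption(interval_start, tweet_text, width=80):
    """Return the caption lines: a UTC date and time line followed by the wrapped tweet text."""
    stamp = datetime.fromtimestamp(interval_start, tz=timezone.utc)
    header = stamp.strftime("%Y-%m-%d %H:%M UTC")
    # Collapse line breaks and repeated spaces inside the tweet, then wrap it to fit the image.
    body = textwrap.wrap(" ".join(tweet_text.split()), width=width)
    return [header] + body


caption = build_caption(1520372460, "Goal!\nWhat a match")
```

Each returned line could then be written onto the image one below the other, which keeps long tweets from running past the edge of the screenshot.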
Figure 14 The part of the code which creates the inscription to the map's Screenshot
By adding the time to the images, we were able to compare users' activity with what was happening in the event, and by adding tweets, we could see what users were interested in.
Figure 15 The map's screenshot without inscription
Figure 16 The map's screenshot with inscription - date, hour and tweet text

2. Results
We used two example datasets to check the project. The first of them was the Real Madrid - PSG football match, which took place on March 6, 2018, while the second one was the Eurovision contest, which took place on May 12, 2018. The aim of the project was to observe how the activity of Twitter users in the world changes in response to the given event. To this end, we created a set of tools for visualizing results from each event.

a. Real Madrid - PSG football match
i. Chart
To illustrate how the general activity of Twitter users changes during the football game, we created a chart that displays the number of tweets on the y-axis and the time on the x-axis. The measurement starts a few minutes before the match and ends a few minutes after the match.
Figure 17 Chart of tweets number during match time of Real Madrid-PSG football match
The graph shows high activity of Twitter users a few minutes before the start of the match and at the beginning of the match. Later, you can see a large drop in the activity of Twitter users. We can see small peaks at the 30th and 40th minute of the measurement, i.e. the 20th and 30th minute of the match, which correspond to the yellow cards, first for Marco Verratti (PSG), and then for Mateo Kovacić (Real Madrid). Then, at about 55-65 minutes of the measurement, i.e. 45-55 minutes of the match, you can see increased activity. This is probably related to the break in the match: people who were previously busy watching the match could now post a tweet.
Then you can see a drop in activity and a very large increase at about 75 minutes of the measurement, which coincides with the goal scored in the 51st minute by Cristiano Ronaldo (Real Madrid). After this event, the activity of Twitter users decreases slightly but remains at a high level. The next peak can be seen at about 95 minutes, which can be interpreted as a reaction to the goal scored by Edinson Cavani (PSG). The last big increases in activity can be observed in the 105th minute of the measurement, that is the 80th minute of the match, after the goal scored by Casemiro, and 10 minutes later, after the end of the match.

ii. Map
To see which cities were the most active during the whole match, we created a map with the 20 places with the highest number of tweets over the entire match. The cities are marked with bubbles whose size reflects the number of tweets: the bigger the bubble, the more active the Twitter users from a given city were. The map shows that the most active were users from Rio de Janeiro, Madrid and Paris, as well as from London and other cities in Spain and France. Interestingly, there is a lot of interest in Nigeria, Argentina, Peru, Venezuela, and several cities in Brazil other than Rio de Janeiro.

iii. Movie
The last way of presenting the data was a video created from the map screenshots, showing the activity of users from different cities every minute. Its goal was to observe how the activity of regions changes over time.
Figure 18 Screenshot of the movie
Figure 19 Screenshot of the movie
Figure 20 Screenshot of the movie
The attached examples clearly show that the activity of Twitter users from a given place changes over time. We observed, among other things, that users from Rio de Janeiro and Madrid posted a lot of tweets throughout the match, while Paris and London were the most active during the break and at the end of the match.

b. Eurovision
i. Chart
As in the case of the football game, the first step in analyzing the Eurovision data was to create a chart of the number of tweets over time. We wanted to see whether, as with the goals during the Real Madrid - PSG match, increased activity can be observed at certain moments of the competition. In the case of the Eurovision competition, we see much more even activity of Twitter users during the entire event. The number of tweets rises rapidly every 10 minutes and then falls sharply. This is probably caused by users posting tweets after the performance of a given vocalist.

ii. Map
Another method used to visualize the data was, as in the case of the Real Madrid - PSG football game, showing the activity of given cities on the map during the entire event. On the map, we can see that Twitter users from Great Britain, especially from London, were the most active. It is also possible to observe very high activity in Spain, with the most tweets coming from Madrid and Barcelona. You can also see a lot of activity in Berlin.
Figure 21 Heatmap of tweets during Eurovision
This partly agrees with the viewer statistics, which are as follows: the final had around 8.21 million viewers in Germany, about 7.2 million in Spain and about 6.9 million in Great Britain.

iii. Movie
Just like in the case of the football game, we presented the obtained data in the form of a film consisting of maps depicting the intensity of tweets for a given location every minute. However, in the case of the Eurovision competition, the changes in the activity of the cities in question are much smaller than in the data from the football game. What can be noticed is a decrease in overall activity at certain times rather than a shift in activity between cities.
Figure 22 Part of Eurovision movie
Figure 23 Part of the Eurovision movie - decreasing number of tweets observed
An interesting thing we noticed while making the film was a gap in the data between minutes 102 and 106. This was most likely caused by a loss of the network connection while reading data.
Figure 24 Part of Eurovision movie - break of data

3. Division of labor
The first part of the project was done by Blazej. It covered acquiring data from the Twitter site and cleaning it by removing tweets from which it was impossible to obtain a location, then adding geolocation to the remaining tweets and changing the time format in tweets. The next part of the work was done by Aleksandra. It covered preparing the data so that a chart, map and film could be created from them, visualizing the data by creating a chart and maps, adding captions (tweets) to the images and, later, automating the data aggregation. The last stage of the work belonged to Blazej. It consisted of creating png files from the html files presenting maps, then automating this process and assembling a film from the images. The presented division of labor is not one hundred percent exact. An exact division is impossible because, during the implementation of the project, many of its functions and many of the problems that appeared were worked on together.

4. Conclusions
Twitter is a social network that provides a microblogging service. A registered user can send and read so-called tweets. A tweet is a short text message (max. 280 characters) displayed on the author's profile and shown to users who are watching the profile. The goal of the project was to analyze data from the Twitter site on media events, with special consideration of how the intensity of tweets in a given place in the world is shaped as a response to the event, and to present the data graphically. It was decided to use Twitter data for the analysis because of its popularity, its availability, and the ability of users to comment on public events.
The following steps were taken during the creation of the project:

Media events were selected that could engage users from different countries around the world and increase user activity as a response to certain climaxes. It was also important to choose media events whose course could be traced over time, e.g. in a newspaper. It was decided to use two media events to be able to compare results, to check that the program works correctly for another data set and to test the effectiveness of the program's automation. It was decided to use the football game (Real Madrid - PSG) due to the presence of clear climaxes such as a goal, a foul or a red card. The football game also had the advantage that we could expect certain areas to be more active (the cities the teams come from and the countries the players come from) and, on this basis, assess the correctness of the analysis. For example, if the results showed that during the match between Real Madrid (Madrid, Spain) and PSG (Paris, France) the most active users were Poles and Hungarians, we could suspect that the program made errors when assigning locations from the external database. The second event that was used was Eurovision. It was an interesting event due to its international scale and its popularity among young people (the age group that uses Twitter the most). This event did not have such characteristic points as the football match, so it could bring additional insight into whether the tweeting intensity differs between the two events. During such an event it seems likely that, during the performance of the team from a given country, increased activity of users from this country will be visible, so we wanted to see whether such relationships would appear in the analysis.

Data were cleaned and prepared so that they were best suited for further analysis. It was necessary to delete a lot of unnecessary data.
Only documents that contained the location of the tweet were kept. Then it was necessary to check whether the location really exists and is not an artificial one, e.g. "Neverland". The next step was to add coordinates to each location so that it could be displayed on the map. For this purpose, an external database was used containing: city, state, continent, longitude, latitude and population. Because the database was sorted alphabetically by continent, errors occurred: for example, the first London read was the London in Africa instead of the one in Europe. For this reason, the database was sorted by population instead, which gave very good results.

The data were aggregated separately for each visualization method. The chart was supposed to show the number of tweets over time, so the data were aggregated by time, summing the number of tweets and sorting by time in ascending order. To create a map showing the number of tweets for a given location during the whole event, the data were aggregated by location, summing the number of tweets and sorting them in descending order; the 20 locations with the most tweets were used. To create the movie, the data were split into one-minute intervals, and for each interval the number of tweets per location was displayed. In this case the number of cities was also limited to 20. This required creating a very large number of new collections, which would have been very tedious to do manually, and it was the first step in automating the program.

Next, the data were presented as a graph of tweet intensity over time for the entire media event, which made it possible to observe whether users' activity increased or decreased at specific moments of the event.
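The location lookup and the aggregation steps described above can be illustrated with a minimal sketch in plain Python. It is not the project's code and does not use MongoDB: the table rows, the field names and the function names (build_lookup, resolve, tweets_per_minute, top_locations) are hypothetical, all sample values are invented, and sorting by population in descending order is an assumption about how the database was sorted. The sketch also assumes that the user-supplied location is a bare city name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows of the external city table; all values are invented
# placeholders, not taken from the project's database.
CITY_ROWS = [
    {"city": "London", "continent": "Africa", "lat": 0.0, "lon": 0.0, "population": 1_000},
    {"city": "London", "continent": "Europe", "lat": 51.5, "lon": -0.1, "population": 8_000_000},
    {"city": "Madrid", "continent": "Europe", "lat": 40.4, "lon": -3.7, "population": 3_000_000},
]


def build_lookup(rows):
    """Map a lower-cased city name to the most populous row with that name."""
    lookup = {}
    # Assumption: descending population, so the largest city of each name wins.
    for row in sorted(rows, key=lambda r: r["population"], reverse=True):
        lookup.setdefault(row["city"].lower(), row)
    return lookup


def resolve(location, lookup):
    """Return the city row for a user-supplied location, or None if unknown."""
    if not location:
        return None
    return lookup.get(location.strip().lower())


def tweets_per_minute(tweets):
    """Number of tweets per minute, sorted by time ascending (chart data)."""
    counts = Counter(t["created_at"].replace(second=0, microsecond=0) for t in tweets)
    return sorted(counts.items())


def top_locations(tweets, n=20):
    """The n locations with the most tweets, sorted descending (map data)."""
    return Counter(t["city"] for t in tweets).most_common(n)


# Invented sample tweets: user-supplied location text plus a timestamp.
raw_tweets = [
    {"location": "London", "created_at": datetime(2018, 1, 1, 20, 0, 10)},
    {"location": "Neverland", "created_at": datetime(2018, 1, 1, 20, 0, 30)},
    {"location": "Madrid", "created_at": datetime(2018, 1, 1, 20, 0, 50)},
    {"location": "madrid ", "created_at": datetime(2018, 1, 1, 20, 1, 5)},
    {"location": None, "created_at": datetime(2018, 1, 1, 20, 1, 20)},
]

lookup = build_lookup(CITY_ROWS)
cleaned = []
for tweet in raw_tweets:
    row = resolve(tweet["location"], lookup)
    if row is None:
        continue  # no usable location, e.g. "Neverland" or an empty field
    cleaned.append({
        "city": row["city"],
        "lat": row["lat"],
        "lon": row["lon"],
        "created_at": tweet["created_at"],
    })

print(tweets_per_minute(cleaned))
print(top_locations(cleaned))
```

For the movie, the same top_locations function could be applied to the tweets of each one-minute interval in turn, which corresponds to the per-minute collections mentioned above.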
The data were also presented as a map showing the number of tweets for different locations during the entire media event, to show which areas were the most active overall. The last form of presentation was a video showing the course of the media event in one-minute intervals, as maps with the number of tweets for a given place in the world at a given time, to show whether specific areas become active at certain moments of the event. This part also required automation: the maps were generated automatically in HTML format, the browser was opened automatically, and screenshots were taken and saved as PNG files. It was important for the results to be legible and visually appealing, and understandable also to people not involved in the project.

Thanks to the use of several methods, it was possible to observe users' activity during the media event, to see whether the number of tweets changed visibly at its climaxes, to see which cities were the most active during the entire event, and to observe whether less active areas become active in response to particular incidents.

The graph for the football game shows a clear increase in users' activity when goals were scored. This confirms that the event was well chosen and that this type of visualization serves its purpose and is a valuable tool for analyzing Twitter data. In the case of Eurovision, the number of tweets rises and falls over the course of the event.

Figure 25 Football game - chart

Figure 26 The Eurovision - chart

The map of tweet locations for the football game shows that the most active users were from Rio de Janeiro, Madrid and Paris, which agrees with our predictions. On the map of the Eurovision data, we observed the largest number of tweets from Great Britain, Madrid and Barcelona.
After checking the viewership statistics, we found that they agree with these results.

Figure 27 The Eurovision - map

Figure 28 Football game - map

The movie of the football game clearly shows that the activity of Twitter users from a given place changes over time. We observed, among other things, that Rio de Janeiro and Madrid posted many tweets throughout the match, while Paris and London were the most active during the break and at the end of the match. In the case of Eurovision, however, the activity of the cities in question changes much less than in the football data. What can be noticed is a decrease in activity at certain times rather than a shift in activity between cities.

After analyzing the collected data, we concluded that the football game allowed viewers' reactions to be observed more precisely. The increases in the number of tweets coincided with important events during the match. In the movie of the football game, the changes in the number of tweets for a given city are also much more clearly visible than for Eurovision. However, both events allowed us to observe which moments were the most significant and which cities were the most involved in commenting on the media event.

Although we managed to present Twitter data in an interesting way that enables analysis, the project could still be improved and extended. One interesting possibility would be to display the most common words for a given media event, or even for particular time intervals. It would also be possible to study how the number of tweets in a given area depends on gender. The project could be implemented as a windowed application in which the user would specify the time range for which they want to load files, e.g.
on a website or a mobile application, and receive the result in the form of charts and statistics generated fully automatically. However, if we wanted to expand the project and turn it into a commercial one, we would have to take into account the development rules in [16]: https://developer.twitter.com/en/developer-terms/geo-guidelines.html.

The aim of the work was achieved: we created a project that allows the analysis of data obtained from the Twitter social network during particular media events.

5. Program files

The program files have been placed on Google Drive and can be accessed via the following link: https://drive.google.com/drive/folders/1dg_puyjBwUR8tPHC6Q7rNYUwvPrWbl6p?usp=sharing

6. Bibliography

[1] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic routing in social networks. Proceedings of the National Academy of Sciences, 102(33):11623--1162, 2005.
[2] https://dictionary.cambridge.org/
[3] https://www.lifewire.com/what-is-a-tweet-3486211
[4] https://twitter.com/
[5] Akshay Java, Xiaodan Song, Tim Finin, Belle Tseng. Why we twitter: understanding microblogging usage and communities. San Jose, California, August 12, 2007.
[6] https://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html
[7] Alexandra Segerberg and W. Lance Bennett. Social Media and the Organization of Collective Action: using Twitter to explore the ecologies of two climate change protests.
[8] https://wearesocial.com/blog/2018/01/global-digital-report-2018
[9] Richard D. Waters, Jia Y. Jamal. Tweet, tweet, tweet: A content analysis of nonprofit organizations' Twitter updates. Public Relations Review, Volume 37, Issue 3, September 2011, Pages 321-324.
[10] Vanessa Murdock, Sheila Kinsella.
Locating a user based on aggregated tweet content associated with a location. US8478701B2, US Grant.
[11] https://www.python.org/
[12] https://www.mongodb.com/
[13] https://api.mongodb.com/python/current/
[14] https://eurovision.tv/
[15] http://folium.readthedocs.io/en/latest/quickstart.html
[16] https://developer.twitter.com/en/developer-terms/geo-guidelines.html