
TweeLoc: A System for Geolocalizing Tweets at Fine-Grain
Pavlos Paraskevopoulos, Giovanni Pellegrini, Themis Palpanas

The recent rise in the use of social networks has made an abundance of information on different aspects of everyday social activity available online. When analyzing information that originates from social networks, and especially Twitter, an important aspect is the geographic coordinates of the relevant information, i.e., its geolocalization. A variety of applications can use geolocalized information to offer better or new services. However, only a small percentage of Twitter posts are geotagged, which restricts the applicability of location-based applications. In this work, we describe TweeLoc, our prototype system for geolocalizing tweets that are not geotagged, which can effectively estimate the tweet location at the level of a city neighborhood. TweeLoc employs a dashboard that visualizes the social activity of the geographic regions specified by the user and provides relevant, easy-to-access statistics. Moreover, it displays how these statistics evolve over time. Our system can help end-users and large-scale event organizers better plan and manage their activities, and it completes this task faster and more accurately than the alternative solutions that we compare against.


You can see a video of the TweeLoc system in action.

Source Code

You may freely use this code for research purposes, provided that you properly acknowledge the authors using the following references:

Zip file with source code for all the algorithms used in the paper (add your Twitter credentials in the Settings file, and follow the instructions in the Readme file). (Small sample data file to use for offline operation.)

Real Datasets

Our method for geolocalizing non-geotagged tweets was tested on three real datasets.
  1. The first dataset contains geotagged posts from Twitter that were generated in Italy between June 1 and June 20, 2016. The locations that we focused on were the neighbourhoods of the 7 Italian cities with the highest activity, namely Rome, Milan, Florence, Venice, Naples, Turin and Bologna. The total number of tweets is 218,572.
  2. The second dataset contains geotagged posts from Twitter that were generated in Germany. This dataset contains 325,120 tweets.
  3. The third dataset contains geotagged posts from Twitter that were generated in the Netherlands. This dataset contains 232,454 tweets.
The latter two datasets were both generated between August 10 and September 11, 2014.
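To give a feel for what neighbourhood-level activity means for geotagged tweets such as these, here is a minimal sketch that bins coordinates into square cells of roughly 1x1 km and counts the tweets in each cell. It is an illustration only and is not taken from the TweeLoc source code: the function names, the grid origin, and the flat-earth (equirectangular) approximation are all assumptions made for this example.

```python
import math
from collections import Counter

# Approximate length of one degree of latitude, in kilometres.
KM_PER_DEG_LAT = 111.32


def cell_id(lat, lon, origin_lat, origin_lon, cell_km=1.0):
    """Map a coordinate to the (row, col) index of a square grid cell.

    Hypothetical helper: the grid is anchored at (origin_lat, origin_lon)
    and uses an equirectangular approximation, which is reasonable only
    for city-sized areas.
    """
    # One degree of longitude shrinks with latitude; use the origin as reference.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    row = math.floor((lat - origin_lat) * KM_PER_DEG_LAT / cell_km)
    col = math.floor((lon - origin_lon) * km_per_deg_lon / cell_km)
    return row, col


def activity_per_cell(tweets, origin_lat, origin_lon, cell_km=1.0):
    """Count tweets per grid cell; `tweets` is an iterable of (lat, lon) pairs."""
    return Counter(
        cell_id(lat, lon, origin_lat, origin_lon, cell_km) for lat, lon in tweets
    )


if __name__ == "__main__":
    # Made-up coordinates near Rome, used purely for illustration.
    sample = [(41.8005, 12.4005), (41.8008, 12.4002), (41.8100, 12.4005)]
    for cell, count in sorted(activity_per_cell(sample, 41.80, 12.40).items()):
        print(cell, count)
```

Counts of this kind are the sort of per-cell quantity that an activity heatmap could display; the grid that TweeLoc actually uses may be defined differently.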

Other Publications

Pavlos Paraskevopoulos, Giovanni Pellegrini, Themis Palpanas. When a Tweet Finds its Place: Fine-Grained Tweet Geolocalisation. International Workshop on Data Science for Social Good (SoGood), in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Riva del Garda, Italy, September 2016.

Pavlos Paraskevopoulos, Themis Palpanas. Fine-Grained Geolocalisation of Non-Geotagged Tweets. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Paris, France, August 2015.

Pavlos Paraskevopoulos, Themis Palpanas. What do Geotagged Tweets Reveal about Mobility Behavior? International Workshop on Mobility Analytics for Spatio-Temporal and Social Data (MATES), in conjunction with the International Conference on Very Large Data Bases (VLDB), Munich, Germany, August 2017.

Pavlos Paraskevopoulos, Tanh-Cong Dinh, Zolzaya Dashdorj, Themis Palpanas, Luciano Serafini. Identification and Characterization of Human Behavior Patterns from Mobile Phone Data. International Conference on the Analysis of Mobile Phone Datasets (NetMob), Special Session on the Data for Development (D4D) Challenge, Cambridge, MA, USA, May 2013.


Below we depict some example screenshots from the operation of the TweeLoc system.

Figure 1. Country Activity Heatmap (Italy)

Figure 2. City Activity Heatmap (Rome)

Figure 3. Fine-Grain Location Activity (Rome, 1x1km neighbourhoods)

Figure 4. Activity Differential Heatmap (Rome, 1x1km neighbourhoods)

Figure 5. Check Tweet Details Interface

Figure 6. Activity time-series for Fine-Grain Location