TOURISTS’ DIGITAL FOOTPRINT IN CITIES: COMPARING BIG DATA SOURCES 

 
MARÍA HENAR SALAS-OLMEDO  
Departamento de Geografía Humana.  
Universidad Complutense de Madrid 

C/Profesor Aranguren, s/n. 28040 Madrid, Spain 
msalas01@ucm.es  

 
BORJA MOYA-GÓMEZ 
Departamento de Geografía Humana. Universidad Complutense de Madrid. 

C/Profesor Aranguren, s/n. 28040 Madrid, Spain 
bmoyagomez@ucm.es  

 
JUAN CARLOS GARCÍA-PALOMARES * 
Departamento de Geografía Humana.  
Universidad Complutense de Madrid 

C/Profesor Aranguren, s/n. 28040 Madrid, Spain 
jcgarcia@ucm.es 

 
JAVIER GUTIÉRREZ  

Departamento de Geografía Humana.  
Universidad Complutense de Madrid 

C/Profesor Aranguren, s/n. 28040 Madrid, Spain 
javiergutierrez@ghis.ucm.es 

 
*Corresponding author. 

 


Highlights 

- The paper analyses the digital footprint of urban tourists through Big Data.
- Panoramio, Foursquare and Twitter reflect different tourism activities.
- The methods used are density maps, OLS, spatial autocorrelation and cluster analysis.
- The results show different tourist spaces: multifunction (several activities) vs specialised.

 
 


 
Abstract.- There is little knowledge available on the spatial behaviour of urban tourists, and 

yet tourists generate an enormous quantity of data when they visit cities. These data sources 

can be used to track their presence through their activities. The aim of this paper is to analyse 

the digital footprint of urban tourists through Big Data. Unlike other papers that use a single 

data source, this article examines three sources of data to reflect different tourism activities in 

cities: Panoramio (sightseeing), Foursquare (consumption), and Twitter (being connected-

accommodation). The results show that the data from the three activities are partly spatially 

redundant and partly complementary, and allow the characterisation of multifunction tourist 

spaces and spaces specialising in one or various activities. The main conclusion is that it is not 

sufficient to use one data source to analyse the presence of tourists in cities; several must be 

used in a complementary manner.  

Key words.- Urban tourism, Big data, Photo-sharing services, Social networks, Spatial analysis, 

GIS 

 


 

1. INTRODUCTION 

The tourism product of large cities has enormous capacity and is highly diversified (Jansen-

Verbeke, 1986). Tourists make very selective use of the city. They reduce uncertainty in their 

exploration of an area by visiting sites perceived to give the greatest reward for effort (Cooper, 

1981). In fact, it is impossible for tourists to consume the entire urban tourism product on an 

average 2 to 3-day visit to such a city (Mazanec, 1997), so they must choose which of the 

attractions they wish to visit, and which to skip. The result is the creation of typical tourism-

product consumption patterns based on the preferences and limitations of different tourist 

types (Shoval & Raveh, 2004). Studies analysing spatial patterns of tourist mobility in cities 

show that they tend to be concentrated in specific areas of city centers (Shoval & Raveh, 2004; 

Hayllar & Griffin, 2009), where they find the main tourist attractions (historical buildings and 

parks, museums, theatres, concert halls, etc.), along with leisure and shopping facilities and 

accommodation services for tourists (Pearce, 1987). Not surprisingly, most tourists seek hotels 

or apartments that are within walking distance of major attractions in the city (Arbel & Pizam, 

1977) and spend a large share of their time budget in the immediate vicinity of the hotel 

(Shoval et al., 2011). As a result, city centers are profoundly transformed by the pressure of 

tourism. They become more and more oriented towards meeting the needs of tourists (hotels 

and apartments, souvenir shops, restaurants, etc.), and less toward residents’ needs, who tend 

to abandon central locations. These processes are known as tourismification (Jansen-Verbeke, 

1998) and tourism gentrification (Gotham, 2005).  

Traditionally the spatial behaviour of tourists in cities has been studied through surveys (for 

example, Cooper, 1981). However, surveys do not provide data with high spatial and temporal resolution on 

the spatial behaviour of tourists. Over the past nine years, the rapid advancement and 

availability of small, inexpensive and reliable tracking devices that draw on GPS technology is 

assisting researchers in developing new methods of spatial research (Edwards & Griffin, 2013). 

GPS allows for the precise and continuous tracking of individuals and provides spatially rich 

data, making it possible to accurately track the paths tourists take and to provide greater 

understanding of their socio-spatial behaviour (Asakura & Iryo, 2007). Not surprisingly, several 

studies have used GPS tracks in recent years to analyse the spatial behaviour of tourists (for 

example, Shoval et al., 2011). However, most of these studies use small samples, since the 

collaboration of tourists is necessary to obtain their tracks. 

Big Data offers new opportunities in tourism research by providing data with high spatial and temporal
resolution that make it possible to analyse the spatiotemporal patterns of large numbers of 

tourists. Big Data is a new concept that has become widely popularised in recent years to 

describe the production of massive quantities of data. Big Data covers a range of very different 

areas: Internet searches, bank card transactions, records of mobile phone activity, social 

networks, data on water and electricity consumption, meteorological data, images recorded 

with video cameras and many more. The main characteristics of these new data sources 

include particularly the following three Vs: volume, in terabytes or petabytes of data; velocity, 

created in or at near real time; and variety, taken from a wide variety of sources, either 

structured (data that can be stored in the form of tables), semistructured (HTML files) or 

unstructured (texts, photographs, videos) (McAfee et al., 2012; Kitchin, 2013; Sagiroglu & 

Sinanc, 2014).  

Big Data supplies a large quantity of information to complement the traditional sources. 

Tourists leave a digital “footprint” in most of their activities, and these new data sources now 

make it possible to analyse tourists' behaviour in the cities they visit. They take vast numbers 


 

of photographs and upload them to photo-sharing services, they make payments with bank 

cards, they talk and send messages via their mobile phones, they are active on social networks, 

and so on. All this activity produces an enormous quantity of digital data (Big Data) which can 

be analysed to study behaviour patterns (Shoval & Isaacson, 2007; Asakura & Iryo, 2007; 

Girardin et al., 2008a and b). Many of these data are geolocated, so tourists' activity can be 

analysed spatially. However, there are very few papers that apply Big Data to examine the 

spatial distribution of tourists in cities, probably due to the novelty of these information 

sources and the fact that some are difficult to access.  

Photo-sharing services provide very useful information for identifying the presence of tourists 

when they go sightseeing in cities. Although there are several photo-sharing communities that 

allow the geolocation of photos (such as Flickr or Instagram), Panoramio is probably the most 

useful service for measuring tourist hotspots, as this website shows photographs taken of 

places or landscapes when sightseeing, which are then posted online once they have been 

georeferenced. The records of geolocated photographs can be used not only to identify 

sightseeing spots (García-Palomares et al., 2015), but also to analyse the spatial and temporal 

patterns of tourist flows in cities (Girardin et al., 2008b).   

However, tourists not only visit the most photographed spaces. They also go shopping, go to 

restaurants and stay in hotels, and they leave their digital footprint in all these establishments 

when they pay with a bank card or check in at their location on social networks. These digital 

footprints of tourists offer information which is largely complementary to the data from 

photo-sharing social networks. The most photographed areas often offer very little 

accommodation and shopping. In the case of business tourists, their hotel and the spaces they 

frequent tend to be close to business sectors, and not necessarily in the most photographed 

areas. During their stay in the city, tourists also log onto the Internet to confirm details of their 

visit, check their e-mail, use social networks, and so on. This activity also leaves a 

digital footprint in many of the places they visit. Tourists often use the facilities in hotels, 

hostels, restaurants and certain open spaces to connect to the Internet through free WiFi
networks, so their activity on social networks may particularly reflect these types of spaces.  

The main aim of this paper is to compare three geolocated data sources to identify the 

presence of tourists in cities in terms of their different activities: geolocated photographs from 

the Panoramio platform are used as a proxy for sightseeing, Foursquare check-ins as a proxy 

for consumption, and interaction on the social network Twitter as a proxy for being connected-

accommodation. The study area is the city of Madrid, one of the European cities with the 

highest volume of tourists.  

This paper contributes to the literature on Big Data and tourism activity from a threefold 

perspective: 1) Three different data sources are compared to obtain the most comprehensive 

analysis of locations of tourists. 2) The data for tourist activity (photos, check-ins, tweets) are 

not analysed directly as in previous works, but the tourists themselves are the unit of analysis. 

The data are processed to allow the number of tourists to be counted in each place in the city 

according to each data source, thus avoiding problems of multiple counting, and making the 

results comparable. 3) The information from the different data sources is integrated through 

cluster analysis and spatial autocorrelation analysis to characterise the areas of tourist 

concentration according to the type of activity.  

The remainder of this paper is structured as follows. Section 2 summarises the existing 

literature on the use of photo-sharing services, Foursquare check-ins and Twitter in urban 

studies, with a particular focus on tourism. Section 3 presents the study area, and Sections 4
and 5 describe the data and the methodology respectively. Section 6 describes and discusses
the results, and the final section presents the main conclusions and suggests further
directions for research.   

 
2. RELATED LITERATURE  

Photo-sharing services 

Sightseeing is one of the main tourist activities in cities, and leaves its digital footprint on social 

networks for photo-sharing such as Instagram, Flickr and Panoramio. All three offer the 

possibility of geolocating photographs, but Panoramio (http://www.panoramio.com) 

was particularly oriented towards georeferenced photos, as it focused on images of places or
landscapes shared by its users, which could be seen on the Panoramio website (until November
4, 2016) or through Google Earth and Google Maps. In fact, Panoramio was a Google service, 

and had over 120 million geolocated photographs (2015 data according to Panoramio data 

API). 

Photo-sharing services have been used for several purposes in the field of tourism, including 

identifying social events such as festivals, demonstrations, sporting events and so on (Sun and 

Fan, 2014), estimating tourist numbers (Koerbitz et al., 2013), identifying the presence of 

tourists (Girardin et al., 2008a; Kisilevich et al., 2013; Straumann et al., 2014; Gutiérrez et al., 

2017) and the most common trajectories followed by tourists (Girardin et al., 2008b), 

proposing or assessing tourist routes (Kurashima et al., 2013), suggesting tourist trips (Lu et al., 

2010) and planning trips lasting several days and places to visit (Li, 2013). These data sources 

make it possible to identify areas with a concentration of tourists in cities (sightseeing spots) 

through spatial statistical techniques (García-Palomares et al., 2015). Photographs taken by 

tourists can be differentiated from those taken by residents based on the time period in which 

the same user takes the photographs. The results obtained in the work of García-Palomares et 

al. (2015) show clearly differentiated spatial distributions for tourists (more concentrated in 

sightseeing spots) and local residents (more widely dispersed throughout the city) in eight 

European cities.  

Foursquare check-ins 

Tourists not only leave their digital footprint when they go sightseeing in the city. They also 

leave a digital trail when they engage in consumption-related activities (shopping, restaurants 

and so on) and pay with bank cards (Sobolevsky et al., 2014a, 2014b and 2015). However, bank 

card transactions are a data source that has been very little used in tourist studies, due to the 

difficulty of accessing these databases. An alternative data source for analysing consumption 

activities is the Foursquare social network. This service enables users to inform their friends
of their location by generating check-ins, to rate and review the venues they visit, and to read other 

users’ reviews. Each check-in contains information that reveals who (which user) spends time 

where (at what location), when (what time of day, what day of week), and doing what 

(according to the kind of venue) (Çelikten et al., 2016). 

Foursquare data have been used for analysing the geographic distribution of venues across
cities (Çelikten et al., 2016), discovering functional urban areas (Vaca et al., 2015), analysing
movement patterns and the popularity of areas in cities (Silva et al., 2013), studying traffic
conditions (Ribeiro et al., 2014), and performing trade area analysis (Qu et al., 2013). However, 

there are few studies using Foursquare data in the field of tourism. One exception is Ferreira et 

al. (2014), who analyse the spatio-temporal characteristics of the behaviour of tourists and 


 

residents in a set of cities, identifying the most visited places and the temporal distributions of 

both groups of visitors. Another exception is the paper by Serrano-Estrada et al. (2016), who map 

the most relevant venues in terms of number of visitors within a tourist city.  

Twitter  

Most of the studies done with mass data from social networks have used Twitter (Murthy, 

2013), not only because this platform has global coverage, but also because its data (the 

tweets) are freely available on the Internet the instant they are produced, that is, in real time. 

Each geolocated tweet leaves the digital footprint of the place and the time it was sent. If the 

data are processed according to their user identifier, they provide an approximation of the 

places visited by each user at different times of the day and days of the week, that is, their 

spatial and temporal profile.  

Activity on social networks can be used as a proxy to analyse the changing population densities 

in the city throughout the day (Ciuccarelli et al., 2014) and the population's mobility patterns 

(Wu et al., 2014). Twitter’s daily use profiles serve to classify the space according to the type of 

dominant activity, whether business, leisure/weekend, nightlife or residential (Frias-Martinez
et al., 2012). Geolocated tweets have also been used to analyse the degree of social mixing in 

the use of space by tracking the movement of social groups in highly segregated cities such as 

Rio de Janeiro (Netto et al., 2015) and Louisville (Shelton et al., 2015). Unlike the information 

supplied by official sources, which offer data relating to place of residence, these studies apply 

indicators of multiculturalness and mixing to examine the use of space throughout the day. For 

example, there are studies on linguistic diversity in cities and regions based on the language 

used in tweets as an indicator of cultural diversity (Mocanu et al., 2013). Elsewhere, Takhteyev 

et al. (2012) studied the role of distance, languages and borders in configuring the networks of 

tweets, and concluded that the relations established over long distances are similar to those 

observed in air transport flows.  

Papers on tourism using Twitter data are particularly scant. The very few works that use 

geolocated tweets in the field of tourism tend to focus on comparing visitors' spatial behaviour 

between cities on the national or global scale (Bassolas et al., 2016; Hawelka et al., 2014; 

Sobolevsky et al., 2015), but do not address the spatial patterns within the city.  

 
3. STUDY AREA: MADRID 

The area selected for study was the city of Madrid. Madrid is a historic city that has a large 

concentration of tourists. Delimitation of the study area was based on the administrative 

reference unit: the municipality of Madrid. This is a relatively compact area of 604.3 km² 

around the traditional city. With 3,160,000 inhabitants in 2016, the population density of the 

municipality of Madrid is high, with more than 5,200 inhabitants per km². 

According to the National Statistics Institute, Madrid received 9 million tourists in 2016. Its 

popularity rose considerably in the last decade. In 2007, the number of tourists totalled 7.3 

million, so there was an increase of 24% in the last ten years. International visitors increased 

even more, from 3.4 million in 2007 to 4.6 in 2017 (an increase of 35%). This huge influx of 

visitors has an enormous economic and social impact on the city, generating more than 7,600
million euros in 2016. Figure 1 shows the main points of tourist attraction in the city of 

Madrid. 


 

Figure 1: The city of Madrid 

 
4. DESCRIBING AND PRE-PROCESSING THE DATA 

Panoramio 

The records of 307,062 geolocated photographs uploaded to Panoramio between 2006 and 

2014 covering the municipality of Madrid were downloaded from the Panoramio API. This 

dataset contains information about the geographic coordinates, the ID of the owner of the 

photograph, a URL link to the photograph, and the date on which it was uploaded (day, month 

and year).  


 

This downloading generated “.csv” files. The geographical coordinates of each photograph 

were used to create a layer for each location using ArcGIS 10.4. This data source does not 

include a field to record the user’s nationality. However, it was possible to “identify” each user 

as a tourist or a resident using the user’s ID and the date of his/her photograph. Using 

MongoDB, if a user had taken pictures in Madrid over a period exceeding one week per year, 

the photographs were attributed to residents; if the period was less than one week per year, 

then they were attributed to tourists (see footnote 1). This methodology is similar to that used
by Fischer for his Geotaggers' World Atlas and by García-Palomares et al. (2015). A total of 27,573 photographs 

were assigned to tourists as a result of this process (Figure 2a).  
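
As an illustration only, the one-week rule could be expressed as follows in Python. This is a minimal sketch, not the authors' MongoDB query: the function name `classify_users`, the `(user_id, date)` record layout and the reading of "per year" as "within a single calendar year" are assumptions made here for the example.

```python
from datetime import date, timedelta

ONE_WEEK = timedelta(days=7)


def classify_users(records, max_span=ONE_WEEK):
    """Label each user as 'tourist' or 'resident'.

    records: iterable of (user_id, date) pairs, one per photo or tweet.
    Assumption: a user is a resident if, within any single calendar year,
    the time between the first and last record exceeds max_span.
    """
    first_last = {}  # (user_id, year) -> [earliest date, latest date]
    for user_id, day in records:
        key = (user_id, day.year)
        if key not in first_last:
            first_last[key] = [day, day]
        else:
            span = first_last[key]
            span[0] = min(span[0], day)
            span[1] = max(span[1], day)

    labels = {}
    for (user_id, _year), (earliest, latest) in first_last.items():
        if latest - earliest > max_span:
            labels[user_id] = "resident"          # one long year is enough
        else:
            labels.setdefault(user_id, "tourist")  # never overwrites 'resident'
    return labels


sample = [
    ("a", date(2013, 5, 1)), ("a", date(2013, 5, 4)),   # 3-day visit
    ("b", date(2013, 1, 10)), ("b", date(2013, 9, 2)),  # active all year
]
print(classify_users(sample))
```

Under this reading, a user who appears for a few days in two different years remains a tourist, whereas a user whose activity within one year spans more than seven days is treated as a resident.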

 
Twitter and Foursquare 

One of the most widely used social networks is Twitter, a platform that allows users to send 

messages with a maximum of 140 characters –known as tweets– which by default are public. 

This service has 500 million users all over the world and generates around 65 million tweets a 

day. Some of these (around 3% until April 2015 and currently around 1%) are geolocated; that 

is, messages in which the sender’s location is known from their geographic coordinates. Given 

the vast number of messages sent on social networks, this percentage represents an 

enormous quantity of tweets. 

Our study was conducted using a dataset of geolocated tweets sent from Madrid between 

2012 and 2014. The dataset contains information on the geographic coordinates, the owner’s 

ID, the language, the date on which it was uploaded (minute, hour, day, month and year), any 

messages included, etc. This file underwent a similar treatment to the Panoramio file. First, 

tourists were identified using MongoDB to extract users that had tweeted for a week or less 

per year. Then, all tweets from those users within the municipality of Madrid were used to 

generate a layer of tweets with ArcGIS using the coordinates of the logs. A total of 234,159 

tweets were identified as sent by tourists visiting Madrid.  

Foursquare is a community of 50 million users allowing tourists to check in to the 

establishments they visit and providing personalized recommendations of the places to go 

(restaurants, nightlife spots, shops and other places of interest) in the surrounding area. These 

check-ins are private by default, but they become publicly accessible, for example, when users 

opt to share their check-ins publicly via Twitter (Çelikten et al., 2016). Thus, it was possible to 

extract Foursquare data by selecting Foursquare check-ins in our Twitter dataset. From a total 

of 234,159 tweets assigned to tourists, 20,076 were Foursquare check-ins (Figure 2b) and the 

rest (214,083) were considered ordinary tweets (Figure 2c).  
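
The paper does not state the exact criterion used to recognise check-ins inside the Twitter dataset. One plausible approach, sketched below in Python under that caveat, relies on the `source` field of a tweet, which names the client application that posted it. The function name `split_checkins`, the dictionary layout and the substring test are assumptions for illustration.

```python
def split_checkins(tweets):
    """Split tweets into Foursquare check-ins and ordinary tweets.

    tweets: iterable of dicts; the optional 'source' key holds the name
    of the posting client (assumed criterion, not confirmed by the paper).
    """
    checkins, ordinary = [], []
    for tweet in tweets:
        source = (tweet.get("source") or "").lower()
        if "foursquare" in source:
            checkins.append(tweet)
        else:
            ordinary.append(tweet)
    return checkins, ordinary


demo = [
    {"id": 1, "source": '<a href="http://foursquare.com">Foursquare</a>'},
    {"id": 2, "source": "Twitter for iPhone"},
]
checkins, ordinary = split_checkins(demo)
print(len(checkins), len(ordinary))
```

Whatever criterion is used, the two subsets partition the tourist tweets, which is consistent with the reported figures (20,076 + 214,083 = 234,159).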

                                                           
1 We are aware that the one-week limit used to differentiate between tourists and residents can lead to two
types of errors: omission errors (some tourists can stay for more than one week in the city and be
considered residents), and commission errors (some residents may tweet or take photos for less than
one week, and therefore be considered tourists). The probability associated with the first type of
error seems to be very low, since the average length of stay of tourists in Madrid is 2.1 nights (Madrid
City Council). The magnitude of the second type of error is difficult to estimate; however, the
comparison between the identified spatial clusters and a map of the city (OpenStreetMap) indicates
that our results are consistent, since the spatial clusters coincide with areas in the city that are known to
attract the highest concentration of tourists. This consistency was observed in previous papers analysing
the footprint of tourists in cities, such as García-Palomares et al. (2015). 


 

Figure 2: a) Photographs taken by tourists. b) Foursquare check-ins by tourists. c) Tweets sent by tourists.
Figure 2a references: 1) Royal Palace; 2) Puerta del Sol square; 3) Plaza de Cibeles square; 4) Plaza Mayor square; 5) Las Ventas bullring; 6) Temple of Debod; 7) Real Madrid football stadium; 8) Atlético de Madrid football stadium; 9) Atocha – Reina Sofía Museum; 10) Cuatro Torres; 11) Torres Kio.
Figure 2c references: 1) Historic centre; 2) Paseo de la Castellana axis; 3) Salamanca district; 4) Madrid Río axis.
Background map: OpenStreetMap. 


 

Foursquare data and tweets exhibit very different temporal distribution patterns (Figure 3). 

Foursquare check-ins are more evenly distributed throughout the day (check-ins in restaurants,
shops and other places), while tweets tend to be more concentrated between 18:00 and 21:00
(which suggests that the use of Twitter is predominantly related to the location of
accommodation). 

 
Figure 3: Temporal distribution of Foursquare and Twitter data of tourists in Madrid 

 
5. METHODOLOGY 

The following methodology was used to analyse the spatial distribution of tourists in Madrid: 

a) Number of tourists per census tract.- The tourists' photos, Foursquare check-ins and tweets
were assigned to census tracts, and the number of unique tourists in each census tract was
counted for each data source using the user ID. The raw data were thus converted to unique
tourists through a spatial join with the census tracts, which yields the number of unique users
in each census tract. 

b) Tourist density by census tract.- The number of tourists in each census tract depends on the 

actual concentration of tourists in the census tract and its size. Larger census tracts tend to 

register a greater number of tourists. To mitigate this problem, an instance of the Modifiable
Areal Unit Problem (MAUP) (Openshaw and Taylor, 1981), the tourist density per census tract
was obtained for each data source. In contrast with regular units (grids), census tracts provide
relatively homogeneous spatial zones from the point of view of land use. Census tracts also make
it possible to identify reference areas such as parks, football stadiums and large shopping centres.  

c) Rescaling of the data.- A very different number of tourists was detected through the three 

data sources. The density data of visitors by census tract were rescaled to a scale of 0 to 1,000 

(by means of a linear transformation) in order to eliminate the effect of different ranges in the 

data sources and to make density data comparable.  

d) Density maps and descriptive statistics.- Rescaled data aggregated by census tract were 

used to obtain density maps and descriptive statistics. Density maps provided an initial visual 


 

overview of the density distribution of tourists in Madrid according to the three datasets. 

Descriptive statistics were useful to compare the degree of concentration of visitors according 

to the data sources used.  

e) OLS analysis.- Results obtained for the three data sources were compared using bivariate 

Ordinary Least Squares (OLS). The coefficient of determination reveals the common part of 

variation between each pair of data sources, and the differences between each pair of sources 

can be analysed using the maps of standardised residuals. 

f) Cluster analysis.- Cluster analysis was used to integrate the information from the three data 

sources and characterise the census tracts according to the tourist activities performed in 

them. The K-means clustering algorithm was used. This method looks for a solution where all 

the features within each group are as similar as possible, and all the groups themselves are as 

different as possible.  

g) Spatial autocorrelation analysis.- Unlike previous analyses, spatial autocorrelation 

techniques do not consider each location in an isolated way, but in relation to the locations in 

its environment (Anselin, 1995). Global Moran's I statistic and Anselin Local Moran's I (LISA 

statistic) were calculated separately for the three data sources, using the IDW (inverse distance 

weighted) method with a 500 m radius. The LISA analysis identified High-High clusters of 

tourists (a high value surrounded primarily by high values). The results were combined in order 

to determine the specialisation of each census tract, which can be classified as areas of 

sightseeing (Panoramio High-High clusters), tourist consumption (Foursquare High-High 

clusters), Internet consumption (Twitter High-High clusters) or a combination of two or three 

types (for example, consumption and sightseeing). A census tract classified as an HH cluster 

according to the three data sources means that in a radius of 500 m there is high potential for 

the three types of activities.  
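
The authors computed these statistics in ArcGIS. Purely to make the definitions concrete, the following Python sketch computes a global Moran's I and a local Moran's I with inverse-distance weights inside a 500 m radius. It assumes tract centroids in projected metric coordinates, uses unstandardised weights, and flags High-High locations by sign only, without the permutation-based significance test that a full LISA analysis applies. All function names are hypothetical.

```python
from math import hypot


def idw_weights(points, radius=500.0):
    """w[i][j] = 1 / d_ij for 0 < d_ij <= radius, otherwise 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = hypot(points[i][0] - points[j][0],
                          points[i][1] - points[j][1])
                if 0 < d <= radius:
                    w[i][j] = 1.0 / d
    return w


def global_morans_i(values, w):
    """I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # total weight; zero if nobody has neighbours
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def local_morans_i(values, w):
    """Return one (I_i, is_high_high) pair per location (no significance test)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    result = []
    for i in range(n):
        lag = sum(w[i][j] * z[j] for j in range(n))  # weighted neighbour deviation
        result.append((z[i] * lag / m2, z[i] > 0 and lag > 0))
    return result


# Six hypothetical centroids 100 m apart; high values cluster on the left.
pts = [(x, 0.0) for x in (0.0, 100.0, 200.0, 300.0, 400.0, 500.0)]
vals = [900, 950, 1000, 10, 20, 0]
w = idw_weights(pts)
print(global_morans_i(vals, w) > 0)
print([hh for _, hh in local_morans_i(vals, w)])
```

In this toy example the third location has a high value but borders the low-value group, so its weighted neighbourhood deviation is negative and it is not flagged as High-High.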

All these calculations and maps were made using ArcGIS 10.4 software. 
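
Steps a) to c) and f) can also be illustrated outside ArcGIS with a short Python sketch that uses only the standard library. It assumes that each record has already been assigned to a census tract by a spatial join, that tract areas are given in hectares, and that the rescaling is a min-max transformation. The function names and the farthest-point seeding of K-means are choices made for this example, not the authors' implementation.

```python
from collections import defaultdict


def unique_tourists_per_tract(records):
    """Step a): records are (user_id, tract_id) pairs; count distinct users."""
    users = defaultdict(set)
    for user_id, tract_id in records:
        users[tract_id].add(user_id)
    return {tract: len(ids) for tract, ids in users.items()}


def rescaled_density(counts, areas_ha, top=1000.0):
    """Steps b) and c): tourists per hectare, min-max rescaled to [0, top]."""
    density = {t: counts.get(t, 0) / area for t, area in areas_ha.items()}
    lo, hi = min(density.values()), max(density.values())
    if hi == lo:  # degenerate case: all tracts have the same density
        return {t: 0.0 for t in density}
    return {t: top * (d - lo) / (hi - lo) for t, d in density.items()}


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Step f): plain Lloyd's algorithm with deterministic farthest-point seeds."""
    centroids = [tuple(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(_dist2(r, c) for c in centroids))
        centroids.append(tuple(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: _dist2(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


counts = unique_tourists_per_tract([("u1", "A"), ("u1", "A"), ("u2", "A"), ("u3", "B")])
print(rescaled_density(counts, {"A": 1.0, "B": 2.0, "C": 4.0}))
```

For the cluster analysis, each row passed to `kmeans` would hold the three rescaled densities of one census tract (Panoramio, Foursquare, Twitter), so that tracts are grouped by their mix of activities.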

 
6. RESULTS 

Tourist density maps and descriptive statistics 

Figure 4 shows tourist density maps according to census tracts, with rescaled data and at the 

same intervals in the three maps. Panoramio (Figure 4a) clearly shows a high spatial 

concentration of visitors in the historic centre and along the north-south axis of the city (Paseo 

de La Castellana). The areas of greatest density denote the sightseeing spots in the city, for 

example Plaza de Cibeles, Puerta de Alcalá, Puerta del Sol, Plaza Mayor, the Royal Palace, the 

Temple of Debod, Plaza de España, Reina Sofía Museum, Atocha Station, Gran Vía street, the 

Real Madrid and Atlético de Madrid stadiums, the Las Ventas bullring, Torres Kio, Cuatro 

Torres, among others. Some census tracts have a high density of photos, not because they 

themselves contain a sightseeing spot, but because they offer a good vantage point for taking 

photographs of a sightseeing spot located in an adjacent census tract. The prohibition against 

taking photographs inside some monuments explains the relatively low density of photos in 

the census tracts containing them (for example, the Prado Museum and the Royal Palace).  

The digital footprint of tourists in consumption activities (Figure 4b) is particularly dense in the 

historic centre (for example, the shopping area of Gran Vía street) and, to a lesser extent, in 

surrounding areas (for example, the area of luxury shops known as the Golden Mile in Barrio 


 

de Salamanca). It is also sporadically explained by the presence of some shopping centres 

(such as the AZCA centre) or singular places (Real Madrid Stadium museum and shopping 

area).  

Finally, tourist density identified according to Twitter (Figure 4c) is more dispersed. It is 

particularly high in the historic centre, and it tends to spread throughout surrounding census 

tracts, and along the Paseo de la Castellana axis. Most of Madrid's tourist accommodation 

supply (hotels, hostels, tourist apartments, etc.) is located in these areas (see, for example, 

booking.com or http://insideairbnb.com/madrid/).  

The descriptive statistics of tourist density according to the data sources (Table 1) show that a 

much higher number of tourists was detected from Twitter than from either of the other
two sources, thus justifying the rescaling of the three variables. The coefficient of variation of
the Foursquare data reveals that, as suggested by the maps, consumption activities exhibit the 

highest spatial concentration.  

 
Table 1: Descriptive statistics

Tourist density/ha according to census tracts

                             Panoramio   Foursquare      Twitter
Count                             2415         2415         2415
Minimum                              0            0            0
Maximum                           3.83         8.25        13.86
Sum                             312.95       424.53      2491.02
Mean                              0.13        0.176         1.03
Standard deviation                0.33        0.550         1.31
Variation coefficient (%)        251.0       312.80        127.1

Tourist density/ha according to census tracts: rescaled data

                             Panoramio   Foursquare      Twitter
Count                             2415         2415         2415
Minimum                              0            0            0
Maximum                           1000         1000         1000
Sum                           81625.85     51420.18    179479.38
Mean                             33.80        21.29        74.32
Standard deviation               84.86        66.60        94.51
Variation coefficient (%)        251.0       312.80        127.1
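
The identical variation coefficients in the two halves of Table 1 follow directly from the rescaling. Assuming the linear transformation is the usual min-max rescaling (the paper only states that it is linear and maps densities to the range 0 to 1,000), and given that the minimum density is 0 for all three sources, the transformation reduces to multiplication by a constant, which leaves the ratio of standard deviation to mean unchanged:

```latex
x'_i = 1000\,\frac{x_i - x_{\min}}{x_{\max} - x_{\min}}
     = \frac{1000}{x_{\max}}\,x_i \quad (x_{\min} = 0),
\qquad
CV' = \frac{s'}{\bar{x}'}
    = \frac{(1000/x_{\max})\,s}{(1000/x_{\max})\,\bar{x}}
    = \frac{s}{\bar{x}} = CV .
```

For Foursquare, for example, the mean of 0.176 becomes 0.176 × 1000 / 8.25 ≈ 21.3, in line with the rescaled mean of 21.29 reported in the table.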

 
 

Figure 4: Tourist density according to: a) Panoramio, b) Foursquare, c) Twitter. The references
to the numbers in Figure 4a can be seen in Figure 1. 

 
Comparison between data sources: OLS analysis 

To determine the degree of association between the three distributions, the coefficient of 

determination was calculated between each pair of data sources (Table 2). The standardised 

residuals of the bivariate regressions reveal where the data sources present the greatest 

differences (Figure 5).  

The correlation analysis points to a medium positive correlation between the tourist densities 

provided by the Twitter-Foursquare data sources. The correlation between the number of 

tourists provided by the Panoramio-Twitter and Panoramio-Foursquare data sources is low, 

indicating a greater complementarity. The main discrepancies between these three data 

sources can be analysed through the residuals between the variables in the regression 

analysis: 

- Panoramio has more tourists than expected according to Foursquare and Twitter (positive 

residuals in Figures 5a and 5b) in the historic centre and in the main sightseeing spots in the 

city (football stadiums, the bullring, Retiro Park, Torres Kio, Cuatro Torres, Temple of Debod), 

but fewer than expected (negative residuals) around the city centre.  

- Foursquare shows more tourists than expected according to Twitter (Figure 5c) in the historic 

centre, the Salamanca district, in the AZCA shopping and business centre, the large shopping 

centre of La Vaguada, Real Madrid Stadium, and Atocha station, but fewer tourists than 

expected in less central areas.  
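
The pairwise comparison described above can be sketched as a bivariate OLS fit that returns the adjusted coefficient of determination and the standardised residuals. This is only an illustration: the sample values are invented, the function name is hypothetical, and residuals are standardised here by the residual standard error, which may differ from the exact standardisation applied by the GIS software used in the study.

```python
from math import sqrt


def bivariate_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return fit diagnostics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    resid_se = sqrt(sse / (n - 2))  # residual standard error
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
        "adj_r2": adj_r2,
        "residuals": residuals,
        "std_residuals": [e / resid_se for e in residuals],
    }


# Illustrative rescaled densities for six tracts (not the study data)
foursquare = [0, 10, 20, 30, 40, 50]   # explanatory variable
panoramio = [5, 8, 30, 25, 90, 40]     # dependent variable
fit = bivariate_ols(foursquare, panoramio)

# Tract with the largest positive standardised residual: more Panoramio
# tourists than expected from its Foursquare density
top_tract = fit["std_residuals"].index(max(fit["std_residuals"]))
```

Mapping the standardised residuals by tract is what allows areas with more (positive) or fewer (negative) tourists than expected to be located.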

 

Table 2: Coefficients of determination (adjusted R²) of tourist distribution according to data 
sources (OLS)  

              Panoramio   Foursquare   Twitter
Panoramio     1
Foursquare    0.27**      1
Twitter       0.31**      0.52**       1

** Significant at the 0.01 level 

 
Figure 5: Standardised residuals of bivariate regressions: a) Foursquare-Panoramio; b) Twitter-Panoramio;

c) Twitter-Foursquare. The references to the numbers in Figure 5a can be seen in 

Figure 1. 

 
Types of spaces according to tourist activities: cluster analysis  

Cluster analysis enables the census tracts to be typified according to tourist density per activity 

(sightseeing, consumption or Internet activity). The clusters were calculated using the K-means 

algorithm and rescaled density values. Table 3 shows the means and standard deviations 

that characterise each group, and the parallel box plot graph summarises both the groups and 

the variables within them (Figure 6). Figure 7 shows the spatial distribution of the six groups 

established: 

‐ Groups of census tracts with a predominance of tourists related to sightseeing: groups 

6 (coffee) and 2 (red). The differences are based on the intensity of the number of 

tourists. Group 6 contains tracts with a very high number of tourists identified in 

Panoramio, and high in Twitter and Foursquare users, and corresponds to spaces such 

as the bullring, Puerta de Alcalá arch, Glorieta de Atocha and Plaza de España square. 

Group 2 also contains tracts linked to sightseeing, but with a lower number of tourists: 



the parks of the Temple of Debod and Retiro, the Royal Palace and the surrounding 

area, the Atlético de Madrid stadium, Atocha station and singular buildings such as 

Torres Kio and Cuatro Torres. 

‐ Groups of census tracts with a predominance of tourists linked to consumption: groups 

5 (violet) and 3 (green). Group 5 contains the most commercial tracts in the historic 

centre (Gran Vía, Puerta del Sol square), which also have a very high number of 

tourists according to Twitter and Panoramio. Group 3 comprises tracts with a high 

tourist density related to consumer activities and Twitter, but much less in relation to 

Panoramio. These correspond to the historic centre (except the most commercial 

area), with a decrease in specialisation in Twitter towards the periphery, but also 

commercial areas like Golden Mile-Salamanca district, AZCA and retail spaces in the 

centre. 

‐ Groups of census tracts with a predominance of tourists related to Internet activities: 

Group 1 (blue), with a lower number of tourists, occupies the limits of the historic 

centre. 

‐ Census tracts with a low presence of tourists: group 4 (yellow). These are the non-

tourist spaces in the city and generally correspond to peripheral census tracts.  
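
The grouping described above can be illustrated with a small K-means (Lloyd's algorithm) implementation applied to rescaled densities. This is a sketch under stated assumptions: the sample tracts are invented, three groups are used instead of the six of the study to keep the example short, and the farthest-first seeding is a choice made here for reproducibility, since the initialisation used in the study is not stated.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from the seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Illustrative tracts: (Panoramio, Foursquare, Twitter) rescaled densities
tracts = [
    (450, 90, 200), (480, 100, 220), (430, 80, 190),   # sightseeing-oriented
    (140, 180, 280), (150, 200, 300), (130, 170, 270),  # consumption-oriented
    (8, 4, 36), (10, 3, 30), (5, 5, 40),                # low tourist presence
]
labels, centroids = kmeans(tracts, k=3)
```

The centroid of each group plays the role of the group means reported in Table 3: it shows which activity dominates in the tracts assigned to that group.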

 
Table 3: Means and standard deviations of the groups of tracts. 

                     Tourists according to   Tourists according to   Tourists according to
                     Panoramio               Foursquare              Twitter
Group   Number of    Mean       Sd           Mean       Sd           Mean       Sd
        tracts
1          331        23.9      30.0          33.2      35.1         161.3      60.1
2          151       148.5      56.0          21.4      27.7         110.5      66.1
3           76       141.5      82.9         182.6      72.5         282.6     102.1
4         1795         8.2      17.5          3.82       9.2          36.1      25.2
5           26       343.0     217.3         498.7     165.1         599.9     178.1
6           36       466.6     168.2          96.9      78.8         208.9     119.6
Total     2415        33.8      84.8          21.3      66.6          74.3      94.5

 

Figure 6: Standardised means for the groups of tracts (parallel box plot graph). 



Figure 7: Groups of tracts according to tourist activities 

 
Analysis of spatial autocorrelation 

Spatial autocorrelation allows the data from each census tract to be analysed not in isolation 

(as in the previous subsection) but in relation to the data in the census tracts in its environment. 

Using the IDW (inverse distance weighted) procedure with a distance threshold of 500 m, 

Global Moran’s Index shows a positive spatial autocorrelation in all cases (positive Moran I 

values), lower for Panoramio than for Twitter and Foursquare (Table 4). Anselin Local Moran’s I 

statistic reveals the spatial cluster distribution (significant at the 0.01 level) (Figure 8). HH 

census tracts (high values in a variable surrounded by high values in that same variable) tend 

to form a single cluster in the case of Foursquare (historic centre and Salamanca district) 

(Figure 8b), but several clusters in geolocated photographs (historic centre, Real Madrid 

stadium, Torres Kio-Cuatro Torres) (Figure 8a).  
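
As an illustration of the global statistic, the sketch below computes Global Moran's I with inverse-distance weights cut off at a distance threshold. Several points are assumptions made for the example: tracts are represented by centroid coordinates in metres, weights are not row-standardised, and the sample values are invented. The z-score, the p-value and the local (LISA) statistic are omitted.

```python
from math import hypot


def idw_weights(coords, threshold):
    """Inverse-distance weights: w_ij = 1/d_ij if 0 < d_ij <= threshold, else 0."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
                if 0 < d <= threshold:
                    weights[i][j] = 1.0 / d
    return weights


def global_morans_i(values, weights):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean_v = sum(values) / n
    z = [v - mean_v for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(zi * zi for zi in z))


# Six illustrative tract centroids on a line, 300 m apart (coordinates in metres)
centroids_xy = [(300.0 * k, 0.0) for k in range(6)]
w = idw_weights(centroids_xy, threshold=500.0)

i_clustered = global_morans_i([10, 10, 10, 0, 0, 0], w)    # similar values adjoin
i_alternating = global_morans_i([10, 0, 10, 0, 10, 0], w)  # dissimilar values adjoin
```

With the 500 m threshold only adjacent centroids (300 m apart) are neighbours in this toy layout. The clustered pattern gives a positive index and the alternating pattern a negative one, which is the sense in which the positive values in Table 4 indicate spatial clustering.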



Table 4: Global Moran’s I statistics (distance threshold = 500m) 

 Panoramio Foursquare Twitter 

Global Moran’s Index 0.571 0.692 0.741 

z-score 37.88 46.18 49.02 

p-value 0.0000 0.0000 0.0000 

 
Figure 8: Results of the LISA analysis (distance threshold = 500m). a) Panoramio, b) Foursquare, 

c) Twitter. The references to the numbers in Figure 8a can be seen in Figure 1. 

 
The results of the previous univariate analysis have been cross-referenced to incorporate the 

information from the three data sources. Figure 9 shows a classification of the census tracts 

considering the HH clusters of the three data sources jointly. When a census tract forms part of 

an HH cluster in the three sources, it indicates that within a radius of 500 m there is a high density of 

opportunities for sightseeing, consumption and connecting to the internet. This 

cross-referencing therefore gives a more complete multivariate LISA view than would be obtained 

by using bivariate LISA analysis. The results show census tracts that form part of the HH cluster 

in the three data sources (historic centre), or in only one or two of them: for example, areas 

specialised in sightseeing (such as Las Ventas bullring), in consumption (Salamanca district) or 

in both (AZCA shopping and business centre). In general, the resulting map shows how tourist 

specialisation tends to increase radiating outwards from the historic centre towards the 

periphery: the census tracts in the centre have a mixed character (multifunctional), and are 

surrounded by others which generally have two activities, while the most peripheral tend to 

specialise in one activity.  
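
The cross-referencing of HH clusters can be sketched as a simple combination of three yes/no flags per census tract. The flags would come from a separate Local Moran's I run for each source; here they are given directly, and the tract names, flag values and label wording are hypothetical.

```python
def hh_typology(hh_panoramio, hh_foursquare, hh_twitter):
    """Name the activities for which a tract belongs to an HH cluster."""
    activities = []
    if hh_panoramio:
        activities.append("sightseeing")
    if hh_foursquare:
        activities.append("consumption")
    if hh_twitter:
        activities.append("being connected")
    return " + ".join(activities) if activities else "none"


# Hypothetical HH flags per tract: (Panoramio, Foursquare, Twitter)
hh_flags = {
    "tract_a": (True, True, True),     # multifunctional
    "tract_b": (True, False, False),   # specialised in sightseeing
    "tract_c": (False, True, False),   # specialised in consumption
    "tract_d": (True, True, False),    # two activities
    "tract_e": (False, False, False),  # not in any HH cluster
}
typology = {tract: hh_typology(*flags) for tract, flags in hh_flags.items()}
n_activities = {tract: sum(flags) for tract, flags in hh_flags.items()}
```

Counting the activities per tract gives a direct measure of the multifunctional versus specialised character discussed in the text.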

 

Figure 9: Typology combining the HH cluster of the three data sources 

 
7. DISCUSSION 

The work tracks the digital footprint of urban tourists using several data sources and taking the 

city of Madrid as the study area. Unlike the paper by García-Palomares et al. (2015), which compared 

different cities using the same data source (Panoramio), we compare 

several data sources within the same city. Three data sources are used to identify the places 

where tourists carry out their activities: geolocated photographs from the Panoramio platform 

(sightseeing), Foursquare check-ins (consumption), and interaction on the social network 

Twitter (being connected-accommodation). This article measures not only the density of digital 

footprints (for example, number of photographs or tweets), but also the density of tourists. 

This mitigates problems of possible bias (compulsive photographers, Foursquare users or 

tweeters) and allows the three data sources to be compared.  

Previous literature (Girardin et al., 2008a; García-Palomares et al., 2015) identified visitors by 

dividing the study period into one-month blocks. If a user had taken pictures over a period 

exceeding one month, the photographs were attributed to residents; if the period was 

less than one month, they were attributed to tourists. We have reduced this time 

limit to one week, more in line with the typical length of the stay in urban tourism, usually 2 or 

3 days (Mazanec, 1997). In fact, the average length of stay of tourists in Madrid is 2.1 nights 

(Madrid City Council). The same time limit is considered in the three data sources, in order to 

establish comparisons between them. 
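
The one-week rule and the counting of tourists rather than footprints can be sketched as follows. This is an illustration under assumptions: the span is measured between each user's first and last record in the city, a span of up to seven days counts as a tourist, and the records, identifiers and dates are invented.

```python
from collections import defaultdict
from datetime import date


def classify_users(records, max_days=7):
    """Label each user by the time span between first and last footprint."""
    first, last = {}, {}
    for user, day, _tract in records:
        first[user] = min(first.get(user, day), day)
        last[user] = max(last.get(user, day), day)
    return {
        user: "tourist" if (last[user] - first[user]).days <= max_days else "resident"
        for user in first
    }


def tourists_per_tract(records, classes):
    """Count distinct tourists per tract, so repeated footprints count once."""
    users_by_tract = defaultdict(set)
    for user, _day, tract in records:
        if classes[user] == "tourist":
            users_by_tract[tract].add(user)
    return {tract: len(users) for tract, users in users_by_tract.items()}


# Illustrative footprints: (user id, date, census tract)
records = [
    ("u1", date(2014, 5, 1), "A"),
    ("u1", date(2014, 5, 1), "A"),   # second footprint, same user and tract
    ("u1", date(2014, 5, 3), "B"),
    ("u2", date(2014, 1, 10), "A"),
    ("u2", date(2014, 6, 2), "A"),   # active over several months: resident
    ("u3", date(2014, 7, 7), "A"),
]
classes = classify_users(records)
counts = tourists_per_tract(records, classes)
```

Dividing each count by the tract area in hectares would give the tourist density used throughout the analysis; tract "A" has four footprints from tourists and residents combined but only two distinct tourists.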



The data sources and methodological approach used in this paper contribute to a better 

understanding of tourism activities in cities, not yet sufficiently developed (Pearce, 2001; 

Shoval & Raveh, 2004). These results are consistent with previous studies (for example, Shoval 

& Raveh, 2004; Ashworth & Page, 2011) showing that tourists visiting a city are selective and 

make use of only a very small portion of all that the city has on offer. The findings reveal that it 

is not sufficient to use a single data source to understand the spatial distribution of tourists in 

cities, as tourists engage in different activities in different spaces. Not surprisingly, the three 

data sources show a high tourist density in some areas of the historic centre, where there is a 

significant concentration of monuments, shops, hotels, restaurants and more. Spatial clusters 

identified in this paper confirm this trend, which is consistent with previous literature (Pearce, 

1987; Hayllar & Griffin, 2009). These central areas can be considered as multifunctional tourist 

areas, since they exhibit a high concentration of tourist attractions, facilities and 

accommodations.  

However, tourists’ digital footprint also extends throughout other areas of the city (football 

stadiums, bullring, commercial districts, large parks), which is consistent with previous 

literature (for example, Shoval & Raveh, 2004). Contrary to the multifunctional character of the 

most central tourist areas, the spatial clusters identified outside the historical centre tend to 

be more specialised from the tourist point of view, so that tourist specialisation tends to 

increase, radiating outwards from the historic centre towards the periphery.  

Spatial clusters identified by using these three data sources exhibit some similarities, but also 

clear differences. The areas of tourist concentration identified using Panoramio data coincide 

with the main sightseeing points of the city. These results are in line with those obtained in 

previous studies with data from social networks of geolocated photographs (e.g., Girardin et 

al., 2008; Straumann et al., 2014; García-Palomares et al., 2015). An exception is the Prado 

Museum, the most visited museum in the city, which presents a low concentration of tourists 

according to Panoramio, since it is prohibited to take pictures inside. As in Barcelona (Gutiérrez 

et al., 2017), the high univariate Global Moran Index value obtained in Madrid shows that this 

activity is very concentrated spatially. 

As expected, the use of Twitter seems to be related to the location of accommodations, since 

Twitter data cover areas with a wide supply of accommodation and are temporally 

concentrated between 18:00 and 22:00. The current high telephone fees for international calls 

encourage tourists to use Twitter by taking advantage of the WiFi networks in their accommodation. 

As expected, Twitter data tend to be concentrated within the historic centre and its 

surroundings. These results are consistent with previous work on location of hotels (Arbel & 

Pizam, 1977) and P2P accommodations (Gutiérrez et al., 2017), suggesting that proximity to 

tourist attractions is an important factor in the tourist's decision to choose accommodation. 

After the abolition of roaming charges between European Union countries in 

2017, this variable could lose its interest as a proxy for this type of tourist activity.  

Finally, Foursquare check-ins are particularly dense in areas specialised in consumption 

(restaurants and shopping), not only in the historic centre of the city, but also in luxury 

shopping areas and shopping centres, suggesting that this social network can be a good proxy 

for locating tourists in activities related to consumption. 

8. CONCLUSIONS 

The goal of this paper is to track the activities of tourists by comparing different Big Data 

sources: Panoramio, Foursquare and Twitter. These data sources provide massive information 



on spatial and temporal patterns of tourists in cities. We perform different types of analyses 

with our datasets. The bivariate correlation analysis reveals little similarity between the 

distributions of tourist density on Panoramio and Twitter, on the one hand, and Panoramio 

and Foursquare, on the other, but a certain similarity between the spatial patterns of tourists 

according to Foursquare and Twitter. In a second step, cluster analysis was applied to classify 

the city’s tourist areas based on the intensity of the digital footprint left by tourists as being 

more oriented to sightseeing, consumption or Internet connection. Finally, the spatial 

autocorrelation analysis shows tourist activity by considering each census tract, not in an 

isolated way, but in relation to the census tracts in its surroundings. The global analysis 

confirms that Panoramio reveals the lowest spatial autocorrelation (with several spatial 

clusters). Areas allowing a wide range of different activities can be differentiated from those 

with a more specialised nature by combining the results of the three data sources. The degree 

of specialisation tends to increase from the historic centre radiating outwards towards the 

periphery.  

As in other papers that use Big Data, there is also an underlying problem of bias. Most tourists 

do not upload their photos to photo-sharing communities like Panoramio, and some do not 

even take photographs. Photographs do not always properly reflect all the monuments in the 

city, due to the prohibition against taking photographs in some monuments, and particularly in 

museums. Many tourists do not use social networks like Twitter or Foursquare and only a small 

proportion of Twitter users send geolocated tweets. The source bias is unquestionably difficult 

to identify and correct. In this paper the bias has been mitigated by working with the density of 

tourists rather than with the density of their footprints (photographs, Foursquare check-ins or 

tweets), which avoids counting the same tourist several times; this is especially important in the 

case of compulsive users. In addition, by comparing different data sources we consider 

different tourist activities, and this partly offsets the bias caused by working with only one 

source.  

Knowledge of the spatial distribution of urban tourists is extremely important for public 

policies. For example, in spaces with a high prevalence of tourists, the local authorities 

may envisage actions to improve the experience of the tourists, such as creating pedestrian-

only streets or widening pavements, extending public spaces with free WiFi, locating new 

tourist information points, among others. The spatial distribution of the photographs taken by 

tourists shows there are spaces with a high tourist potential that are still underexploited 

(for example, the Madrid Río axis). The results may also be important for pinpointing new 

business opportunities for the private sector, for example by identifying areas with 

economies for locating retail tourism where there are still opportunities for expansion.  

Research in tourism using social network data is a promising field. However, the use of these 

data sources in research raises privacy and ethical issues (Moreno et al., 2013). Researchers 

should not only preserve the anonymity of users, but should also avoid using information from 

social networks inappropriately, for example, revealing uncomfortable information about 

social groups (Zwitter, 2014). 

 
Acknowledgments.- We gratefully acknowledge the financial support of the European Union's 

Seventh Framework Programme (INSIGHT project), the Madrid regional government 

(SOCIALBIGDATA-CM S2015/HUM-3427), and the Ministerio de Economía y Competitividad of 

Spain and Universidad Complutense de Madrid for the funding provided for a post-doctoral 



fellowship (FPDI-2013-17001). The authors would also like to thank Luca Piovano from CEDINT 

(Polytechnic University of Madrid) for help in downloading the Panoramio data. 

 
REFERENCES 

Anselin, L. (1995). Local Indicators of Spatial Association — LISA. Geographical Analysis 27 (2): 
93–115. doi:10.1111/j.1538-4632.1995.tb00338.x. 

Arbel, A., & Pizam, A. (1977). Some determinants of urban hotel location: The tourists' 
inclinations. Journal of Travel Research, 15(3), 18–22. 

Asakura, Y., & Iryo, T. (2007). Analysis of tourist behaviour based on the tracking data collected 

using a mobile communication instrument. Transportation Research Part A: Policy and Practice 

41, 684–690.  

Bassolas, A., Lenormand, M., Gonçalves, B., Tugores, A., & Ramasco, J.J. (2016). Touristic site 

attractiveness seen through Twitter. EPJ Data Science 5. doi:10.1140/epjds/s13688-016-0073-5 

Çelikten, E., Falher, G. L., & Mathioudakis, M. (2016). Extracting Patterns of Urban Activity from 
Geotagged Social Data. arXiv preprint arXiv:1604.04649. 

Cooper, C. P. (1981). Spatial and temporal patterns of tourist behaviour. Regional Studies, 
15(5), 359-371. 

Edwards, D., & Griffin, T. (2013). Understanding tourists’ spatial behaviour: GPS tracking as an 
aid to sustainable destination management. Journal of Sustainable Tourism, 21(4), 580-595. 

Ferreira, A. P. G., Silva, T. H., & Loureiro, A. A. F. (2014). You are your check-in: Understanding 
the behavior of tourists and residents using data from foursquare. In Proceedings of the 20th 
Brazilian Symposium on Multimedia and the Web (pp. 103-110). ACM. 

Frias-Martinez, V., Soto, V., Hohwald, H., & Frias-Martinez, E. (2012). Characterizing urban 
landscapes using geolocated tweets. In Privacy, Security, Risk and Trust (PASSAT), 2012 
International Conference on and 2012 International Conference on Social Computing 
(SocialCom) (pp. 239-248). IEEE. 

García-Palomares, J.C., Gutiérrez, J., & Mínguez, C. (2015). Identification of tourist hot spots 
based on social networks: a comparative analysis of European metropolises using photo-
sharing services and GIS. Applied Geography, 63, 408–417. 

Girardin, F., Calabrese, F., Fiore, F.D., Ratti, C., & Blat, J. (2008a). Digital Footprinting: 

Uncovering Tourists with User-Generated Content. IEEE Pervasive Computing 7, 36–43. 

doi:10.1109/MPRV.2008.71 

Girardin, F., Fiore, F. D., Ratti, C., & Blat, J. (2008b). Leveraging explicitly disclosed location 
information to understand tourist dynamics: a case study. Journal of Location Based Services, 
2(1), 41-56. 

Gotham, K. (2005). Tourism gentrification: The case of New Orleans' Vieux carre (French 
quarter). Urban Studies, 42(7), 1099–1121. 

Gutiérrez, J., García-Palomares, J. C., Romanillos, G., & Salas-Olmedo, M. H. (2017). The 

eruption of Airbnb in tourist cities: Comparing spatial patterns of hotels and peer-to-peer 

accommodation in Barcelona. Tourism Management, 62, 278-291. 

Hawelka, B., Sitko, I., Beinat, E., Sobolevsky, S., Kazakopoulos, P., & Ratti, C. (2014). Geo-



located Twitter as proxy for global mobility patterns. Cartography and Geographic Information 

Science 41, 260–271. doi:10.1080/15230406.2014.890072 

Hayllar, B., & Griffin, T. (2009). Urban tourist precincts as sites of play. In G. Maciocco, and S. 
Serreli, (Eds.), Enhancing the city, new perspectives for tourism and leisure. Urban and 
Landscape Perspectives, 6(1), 65-81. 

Jansen-Verbeke, M. (1986). Inner-city tourism: resources, tourists and promoters. Annals of 

Tourism Research, 13(1), 79-100. 

Jansen-Verbeke, M. (1998). Tourismification of historical cities. Annals of Tourism Research, 
25(3), 739-742. 

Kisilevich, S., Keim, D., Andrienko, N., & Andrienko, G. (2013). Towards acquisition of semantics 

of places and events by multi-perspective analysis of geotagged photo collections. In A. Moore, 

& Drecki (Eds.), Geospatial Visualisation, Lecture Notes in Geoinformation and Cartography. 

Berlin Heidelberg: Springer-Verlag. 

Kitchin, R. (2013). Big Data and human geography: Opportunities, challenges and risks. 

Dialogues in human geography, 3(3), 262-267. 

Koerbitz, W., Önder, I., & Hubmann-Haidvogel, A. C. (2013). Identifying Tourist Dispersion in 

Austria by Digital Footprints (pp. 495-506). Springer Berlin Heidelberg. 

Kurashima, T., Iwata, T., Irie, G., & Fujimura, K. (2013). Travel route recommendation using 

geotagged photos. Knowledge and information systems, 37(1), 37–60. 

Lamsfus, C., Martín, D., Alzua-Sorzabal, A., & Torres-Manzanera, E. (2015). Smart Tourism 

Destinations: An Extended Conception of Smart Cities Focusing on Human Mobility, in: 

Tussyadiah, I., & Inversini, A. (Eds.), Information and Communication Technologies in Tourism 

2015. Proceedings of the International Conference in Lugano, Switzerland, February 3 - 6, 

2015. Springer International Publishing, Cham, pp. 363–375. doi:10.1007/978-3-319-14343-9_27 

Li, X. (2013). Multi-day and multi-stay travel planning using geo-tagged photos. In Proceedings 

of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered 

Geographic Information (pp. 1-8). ACM. 

Lu, X., Wang, C., Yang, J. M., Pang, Y., & Zhang, L. (2010). Photo2trip: generating travel routes 

from geo-tagged photos for trip planning. In Proceedings of the international conference on 

Multimedia (pp. 143–152). ACM. 

Mahadevan, P., Prajish Kuman, P., & Vimeesh, M. (2013). Role of social networking in tourism 

marketing: a case study of travel industry in Kerala. International Journal of Multidisciplinary 

Educational Research 2, 107–117. 

Mazanec, J. A. (1997). Segmenting city tourists into vacation styles. International city tourism: 

Analysis and strategy, 114-128. 

McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D. J., & Barton, D. (2012). Big data. The 

management revolution. Harvard Bus Rev, 90(10), 61-67. 

Mocanu, D., Baronchelli, A., Perra, N., Gonçalves, B., Zhang, Q., & Vespignani, A. (2013). The 

twitter of babel: Mapping world languages through microblogging platforms. PloS one, 8(4), 

e61981. 



Moreno, M. A., Goniu, N., Moreno, P. S., & Diekema, D. (2013). Ethics of social media research: 

common concerns and practical considerations. Cyberpsychology, Behavior, and Social 

Networking, 16(9), 708-713. 

Murthy, D. (2013). Twitter: Social communication in the Twitter age. John Wiley & Sons. 

Netto, V. M., Pinheiro, M., Meirelles, J. V., & Leite, H. (2015). Digital footprints in the cityscape. 

International Conference on Social Networks, Athens, USA. 

Openshaw, S., & Taylor, P.J. (1981). The modifiable areal unit problem. In: Quantitative 

geography: a British View, (eds) N. Wrigley and R.J. Bennett, (Routledge and Kegan Paul: 

London), pp 60-70. 

Pearce, D. G. (1987). Tourism today: A geographical analysis. Burnt Mill: Longman. 

Pearce, D. G. (2001). An integrative framework for urban tourism research. Annals of Tourism 

Research, 28(4), 926-946. 

Qu, Y., & Zhang, J. (2013). Trade area analysis using user generated mobile location data. In 

Proceedings of the 22nd international conference on World Wide Web (pp. 1053-1064). ACM. 

Ribeiro, A. I. J. T., Silva, T. H., Duarte-Figueiredo, F., & Loureiro, A. A. (2014). Studying traffic 

conditions by analyzing foursquare and instagram data. In Proceedings of the 11th ACM 

symposium on Performance evaluation of wireless ad hoc, sensor, & ubiquitous networks (pp. 

17-24). ACM. 

Sagiroglu, S., & Sinanc, D. (2013). Big data: A review. In Collaboration Technologies and 

Systems (CTS), 2013 International Conference on (pp. 42-47). IEEE. 

Serrano-Estrada, L., Martí, P., & Nolasco-Cirugeda, A. (2016). Reading the social preferences of 

tourist destinations through social media data. In Back to the Sense of the City: International 

Monograph Book (pp. 1065-1075). Centre de Política de Sòl i Valoracions. 

Shelton, T., Poorthuis, A., & Zook, M. (2015). Social media and the city: Rethinking urban socio-

spatial inequality using user-generated geographic information. Landscape and Urban 

Planning, 142, 198-211. 

Shoval, N., & Raveh, A. (2004). Categorization of tourist attractions and the modelling of 
tourist cities: Based on the co-plot method of multivariate analysis. Tourism Management, 
25(6), 741–750. 

Shoval, N., & Isaacson, M. (2007). Tracking tourists in the digital age. Annals of Tourism 

Research 34, 141–159. doi:10.1016/j.annals.2006.07.007 

Shoval, N., McKercher, B., Ng, E., & Birenboim, A. (2011). Hotel location and tourist activity in 
cities. Annals of Tourism Research, 38(4), 1594-1612. 

Silva, T. H., Vaz de Melo, P. O., Almeida, J. M., Salles, J., & Loureiro, A. A. (2013). A comparison 

of foursquare and instagram to the study of city dynamics and urban social behavior. In 

Proceedings of the 2nd ACM SIGKDD international workshop on urban computing (p. 4). ACM. 

Sobolevsky S, Sitko I, Grauwin S, Des Combes RT, Hawelka B, et al. (2014a) Mining Urban 

Performance: Scale-Independent Classification of Cities based on Individual Economic 

Transactions. In: Big Data Science and Computing, 2014 ASE International Conference on, May 

27–31, Stanford University. p. 10. 



Sobolevsky S, Sitko I, Tachet des Combes R, Hawelka B, Murillo Arias J, et al. (2014b) Money on 

the Move: Big Data of Bank Card Transactions as the New Proxy for Human Mobility Patterns 

and Regional Delineation. The Case of Residents and Foreign Visitors in Spain. In: Big Data 

(BigData Congress), 2014 IEEE International Congress on, Jun 27-Jul 2, Anchorage, AK. pp.136–

143. doi:10.1109/BigData.Congress.2014.28. 

Sobolevsky S, Bojic I, Belyi A, Sitko I, Hawelka B, et al. (2015) Scaling of city attractiveness for 

foreign visitors through big data of human economical and social media activity. 

arXiv:1504.06003. 

Straumann, R. K., Çöltekin, A., & Andrienko, G. (2014). Towards (Re) constructing narratives 

from georeferenced photographs through visual analytics. The Cartographic Journal, 51(2), 

152-165. 

Sun, Y., & Fan, H. (2014). Event Identification from Georeferenced Images. In Connecting a 

Digital Europe through Location and Place. (pp. 73-88). Springer International Publishing. 

Takhteyev, Y., Gruzd, A., & Wellman, B. (2012). Geography of Twitter networks. Social 

networks, 34(1), 73-81. 

Vaca, C., Quercia, D., Bonchi, F., & Fraternali, P. (2015, May). Taxonomy-based discovery and 

annotation of functional areas in the city. In Ninth international AAAI conference on web and 

social media. AAAI Publications. 

http://ai2-s2-pdfs.s3.amazonaws.com/218f/70bd5459cb0388247eaeae5a4f5d38173d7f.pdf 

 Wu, L., Zhi, Y., Sui, Z., & Liu, Y. (2014). Intra-urban human mobility and activity transition: 

evidence from social media check-in data. PloS one, 9(5), e97010. 

Zwitter, A. (2014). Big data ethics. Big Data & Society, 1(2), 2053951714559253. 

 


Dr. María Henar Salas-Olmedo is a geographer and post-doc researcher at Complutense University 
Madrid with an international background acquired in EU funded research projects and some research 
stays in Europe and Latin America. Her research interests are the analysis of accessibility, mobility and 
land use relationships at a variety of scales and topics to help design better urban and regional 
policies. Her research methodologies include spatial analysis with the use of Geographic Information 
Systems (GIS) combining geolocated regular and big data. She has co-authored several papers in top 
scientific journals especially in the areas of geography, economics and transportation, and she regularly 
attends international conferences. She also teaches in graduate and postgraduate programs and 
organizes seminars on GIS and transport issues. 

 
Borja Moya-Gómez is a researcher at Research Group tGIS (Transport, Infrastructure and Territory). He 

is a PhD candidate in Geography at Universidad Complutense de Madrid (UCM). He graduated as Civil 

Engineer, major in Transport and Urban Services, and M.Sc. Logistics, Transport and Mobility at 

Universitat Politècnica de Catalunya – BarcelonaTech (UPC). His research interests are the traffic 

congestion, transport, mobility, accessibility, geographic information systems (GIS), and big data on 

urban and social studies. He has published 4 papers in scientific journals and contributed to several 

technical reports. He also teaches in graduate and postgraduate programs. 

 
Juan Carlos Garcia-Palomares is Associate Professor of Human Geography at the University Complutense 
of Madrid. He obtained his PhD in Human Geography with Honours. His research activity focuses on 
urban studies, big data and Geographic Information Systems (GIS). He has published 47 papers in 
scientific journals, has contributed to more than 25 books, book chapters and research monographs, and 
he has participated in a number of national and international conferences. He has developed an 
extensive teaching activity, focusing on the application of GIS, urban geography, transportation and 
mobility.  

 
Javier Gutiérrez Puebla is Full Professor of Human Geography at the Universidad Complutense de 
Madrid (UCM), director of the Research Group tGIS (Transport, Infrastructure and Space) and head of 
the Department of Human Geography. He specializes in GIS (Geographic Information Systems), mobility 
and urban studies. He has written more than 100 scientific publications, with more than 3,000 citations 
in Google Scholar. He has directed 10 doctoral theses and numerous research projects. 

 