Using crowdsourced spatial data from Flickr vs. PPGIS for understanding nature's contribution to people in Southern Norway

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. People and Nature published by John Wiley & Sons Ltd on behalf of British Ecological Society
1 Department of Arctic and Marine Biology, UiT The Arctic University of Norway, Tromsø, Norway
2 Department of Natural Resource Management and Environmental Sciences, California Polytechnic State University, San Luis Obispo, CA, USA
3 Department of Oceanography, Dalhousie University, Halifax, NS, Canada

relationships that people have with nature, but the potential and limits of using crowdsourcing data to generate maps for conservation purposes need further research.
2. Passive crowdsourcing tools include social media platforms where photos and user-generated tags are shared among users, whereas active crowdsourcing, such as public participatory geographic information system (PPGIS), provides an online platform for mapping place attributes such as values, experiences and preferences.
3. In this study, we assess the spatial information gained for conservation planning from Flickr (a photo sharing platform) and PPGIS (an online mapping platform), to understand differences and similarities in the spatial distribution of values captured by the two platforms, and to identify which environmental and infrastructure variables correlate best with the distribution of values. We test these tools in Southern Norway, including protected areas and their surrounding zones.
4. We analysed non-spatial (using chi-square and Spearman rank correlation) and spatial (using clustering, Maxent and distribution overlap) data to identify differences between the two datasets and the values represented therein.
5. We found large differences in spatial distribution using these two datasets, with Flickr data concentrated outside the protected areas and near roads, whereas PPGIS provided more fine-scale data on diverse values in locations inaccessible by roads within the protected areas. Flickr can be used for generating regional scale data of scenic landscapes or routes, but PPGIS performs better for management of nature qualities appreciated by different user groups within protected areas.
We discuss the pros and cons of using each data source and when each dataset is more suitable to be used in protected area management.

| INTRODUCTION
As anthropogenic pressures on nature increase across the globe, raising awareness of nature's contribution to people (NCP) has become one of the approaches for integrating conservation into policy (Pascual et al., 2017). Despite the growing body of research on the non-material contribution of nature to a good quality of life (Hirons, Comberti, & Dunford, 2016), tools for mainstreaming non-material contributions into ecosystem services assessments and decision-making are still under development (Costanza et al., 2017; Small, Munday, & Durance, 2017). The natural processes and features appreciated by people that positively contribute to their life are often referred to as nature qualities (Arler, 2000; Van den Bosch, Östergren, Grahn, Skärbäck, & Währborg, 2015) and are a central component of NCP. Bringing diverse perspectives and values into conservation planning is costly, time-consuming and logistically challenging, but is important for finding solutions that balance the needs of people with conservation objectives.
A wide range of methods and approaches have been used to elucidate the diverse perspectives on the cultural benefits provided by nature (Small et al., 2017; Teff-Seker & Orenstein, 2019; Tew, Simmons, & Sutherland, 2019). Among these are crowdsourcing methods, which have the potential to deliver spatial information on NCP from a diverse range of citizens at a large scale of relevance to conservation (Bubalo, van Zanten, & Verburg, 2019). Two main crowdsourcing approaches have gained popularity in recent years: passive and active crowdsourcing. Passive crowdsourcing derives data from users leaving traces online on location and activity by sharing material on social media or by simply using their cell phones (Birenboim & Shoval, 2016; See et al., 2016). Social media data derived from people sharing text or photos on an online platform, such as Flickr, have become particularly important for mapping recreation and aesthetic values appreciated by people in nature (Richards & Friess, 2015; van Zanten et al., 2016). Combining several content sharing platforms has been suggested for monitoring protected area popularity and temporal visitation patterns, using, for example, Instagram, Twitter and Flickr (Tenkanen et al., 2017). Active crowdsourcing, on the other hand, depends on users actively contributing data through online platforms specifically designed to collect data about users or nature qualities (Ridding et al., 2018; Wolf, Brown, & Wohlfart, 2018). Although social media and online PPGIS platforms have both been shown to be useful tools for assessing the spatial distribution of values, each has its pros and cons. Social media data are less costly to collect and therefore allow the elicitation of values from a much larger pool of potential users on a broader scale (Toivonen et al., 2019).
Social media data have been used to quantify nature-based tourism and recreation (Wood, Guerry, Silver, & Lacayo, 2013), tourism flows (Hawelka et al., 2014) or for mapping destinations and events that are highly visited by the public (Kisilevich, Keim, Andrienko, & Andrienko, 2013). The tags can also inform about how people value nature, how those values are distributed and the contribution of nature to the qualities appreciated by people (van Zanten et al., 2016). The photos can represent diverse activities and values including aesthetics, recreation, wildlife viewing and bio-cultural heritage (Toivonen et al., 2019). Moreover, photos taken by several people at a specific location can be associated with specific environmental characteristics of that area (Dunkel, 2015). Content analysis of photographs shared on social media has also been used to model the spatial distribution of values and non-material benefits with respect to landscape characteristics and infrastructure, and to indicate how changes in the landscape and infrastructure development can affect the overall visitor experience and distribution (Tenerelli, Demšar, & Luque, 2016; Walden-Schreiner, Leung, & Tateosian, 2018). However, social media have been shown to be unreliable at capturing some indirect-use and non-use values, whereas PPGIS is capable of capturing a wide range of values (Levin, Lechner, & Brown, 2017). The primary benefit of PPGIS surveys is the possibility to customize the tool to collect information on spatial values, preferences and experiences that are of direct relevance to protected area management (e.g. Brown & Weber, 2011). For example, PPGIS has been used to identify areas of value hotspots and the overlap of different user groups, to understand land use preferences, to address conflicts between different user groups, and to monitor tourism development preferences (Brown & Weber, 2013; Engen et al., 2018; Muñoz, Hausner, Brown, Runge, & Fauchald, 2019; Wolf et al., 2018).
Participatory mapping surveys are customized for each case, which makes them suitable for surveying a wide range of people, which can include stakeholders, locals, visitors, experts, the general public and decision-makers (Brown & Kyttä, 2014). Thus, PPGIS can include voluntary participation (similar to social media), as well as targeted recruitment of a representative sample.
While the use of social media data has been compared to visitor data on a regional scale previously (Graham & Eigenbrod, 2019; Tenkanen et al., 2017), spatial data and the values identified using passive and active crowdsourcing tools have not been extensively evaluated using the same location. One exception is Levin et al. (2017) who compared the visitor density and values mapped by crowdsourcing tools in multiple protected areas. To date, no study has compared the potential of active and passive crowdsourcing tools to provide spatial information of nature qualities on a finer scale of relevance to protected area management (i.e. within protected areas).
KEYWORDS
cluster analysis, management, maxent, nature qualities, protected area, social media, values, visitors
The spatial distribution at this scale will depend on the profile of users captured by the different tools, the values people ascribe to nature and the spatial accuracy of the geolocations mapped using different platforms. If these tools are to be used to guide protected area management, it is important to understand the conditions that influence the results generated by each tool at this scale.
Here we examine the spatial distribution and the type of values generated by the two crowdsourced tools (Flickr and

| Flickr
Flickr is a free photo management and sharing platform where users can upload their pictures, geotag them and share them privately or publicly (Flickr, 2019). We retrieved information associated with 6,255 publicly available geotagged Flickr images on 4 April 2016 for our study area using the flickRgeotag R package (Daigle & Dunnington, 2018). The metadata that accompanied the images included de-identified (key-coded) photo and user ID codes, the country of origin of the Flickr user, text-based tags associated with each photo (which can be either user-specified or selected by Flickr's automated tagging algorithm), the coordinates (latitude and longitude in WGS84) of the image and the URL link to the photo. For the purpose of this study, we used the country of origin, the coordinates and the photo URL. For those users that did not report their country of origin (268 users), we estimated the contributors' home country from the median coordinates of all uploaded pictures. The  (Tenkanen et al., 2017), we aggregated the data from 9 years for this study as Flickr data are temporally sparse in this region, so that we could ensure a sufficient sample to draw robust conclusions. Also, we were not focusing on the changes over time in this study, but on values that change more slowly (see Brown & Weber, 2012).
3. Scenic landscapes: The dominant feature of the picture is an important place that is scenic, a distinctive landscape, wilderness or natural setting (could include people, but not as the main focus). For example, scenic drives, scenic cruises, mountains, fjord, wilderness. Could be symbolic/spiritual values, which need to be determined ad hoc.

| Public participation geographic information system
The PPGIS is a GIS tool to map spatial attributes and important locations in an area. We conducted two online PPGIS surveys: a household survey combined with voluntary participation of locals, and a visitor survey with in-situ recruitment in the study area in October-December 2014 and July-September 2015, respectively.
For the first survey, we invited a randomly selected set of 10% of the households in the municipalities in the study area to participate in the web-PPGIS study, contacting them by regular post. A reminder letter was sent 2 weeks after the first contact. Additionally, we used local organizations, newspapers and social media to recruit volunteers. During the peak visitor season to our study area in 2015, we recruited respondents to the second survey at recreational parking spots, either by direct contact or by leaflets placed on cars. Two reminders were sent by email to visitors recruited in the field.
In the PPGIS survey, we asked respondents to drag and drop georeferenced markers that represent one of the 12 values (see List 2 for the full list of values) onto a Google® map view, zooming in and out as needed. People could place as many markers as they wanted, but were encouraged to place at least 20.
They were free to place markers for as many, or as few, values as they wished. We refer to 'mapped value' as the georeferenced marker placed by participants on the map. We piloted the surveys on park managers whose feedback was used to improve the consent for participation that respondents had to accept before completing the survey, where we informed participants about the purpose of the study and explained that data would be treated confidentially. Also, participants were informed that the study was voluntary, and that they could withdraw from it at any time or contact us through the provided email in case of any concerns regarding the study. For additional details about the survey, see Muñoz et al. (2019).
From the 12 values included in the mapping activity, four were comparable to the categories obtained by coding Flickr images: biological diversity, scenic landscapes, social value and recreation (values 1-4 in List 2). We used all values mapped in Flickr and PPGIS to identify the potential differences between international and domestic visitors for each platform (i.e. difference in clustering and ranking between user groups). We used the subset of four values that were comparable for PPGIS and the Flickr coding (see above) to compare the difference in spatial information obtained from these two platforms. When discussing results, we refer to either 'all values' (8 values in Flickr and 12 values in PPGIS) or 'four common values' (i.e. the ones that are comparable between the two datasets).
List 2 The values used in the PPGIS survey adapted from Brown and Reed (2000) to the Norwegian context.
10. Therapeutic: Areas that are valuable because they make me feel better, either because they provide opportunities for physical activities important for my health and/or because they give me peace, harmony and therapy.
11. Wilderness and undisturbed nature: Areas that are relatively untouched, providing for peace and quiet without too many disturbances.
12. Special place: Please describe why these places are special to you.

| Density-based clustering for hotspot mapping
We conducted a density-based cluster analysis of all the values mapped to compare the areas with the highest density of values (hotspots) in each dataset and to quantify the number of hotspots. To accomplish this, we used the 'density-based spatial clustering of applications with noise' (DBSCAN) algorithm (Ester, Kriegel, Sander, & Xiaowei, 1996) with a minimum of 10 neighbouring points within a 1,000 m search radius. In DBSCAN, points represent the geographical location of each Flickr photo or the mapped value location in PPGIS. This algorithm detects points that form clusters with irregular shapes and discards sparse points (Ester et al., 1996). The search radius was determined by visual inspection of the threshold of the k-nearest neighbour distances plot. DBSCAN forms clusters with core and border points. Core points are those that are surrounded by at least 10 points within the search radius. A minimum of 10 points was selected to capture a diversity of values inside each cluster. Border points are those that belong to a cluster because they are located inside the search radius of a core point, but do not themselves qualify as core points (i.e. they do not meet the requirement of a minimum of 10 points in a 1,000 m search radius). The points that are not classified as either core or border points are discarded from the clusters. The resulting clusters are point clouds containing core and border points.
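The core, border and noise logic described above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the implementation used in the study. It assumes that point coordinates are already projected to metres and that a point counts towards its own neighbourhood (a common convention in DBSCAN implementations that the text does not confirm); the function names are hypothetical.

```python
import math


def region_query(points, i, eps):
    """Return indices of all points within eps metres of point i (i included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps=1000.0, min_pts=10):
    """Label each point with a cluster id (0, 1, ...) or -1 for discarded noise.

    Assumption: a point is a core point when its eps-neighbourhood,
    counting the point itself, holds at least min_pts points.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels
```

Counting the distinct non-negative labels gives the number of hotspots, and the share of points labelled -1 corresponds to the points remaining outside clusters.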

| Maximum entropy modelling for environmental and infrastructure variables
The purpose of the modelling was to test whether Flickr and PPGIS data are correlated with the same environmental and infrastructure characteristics. We developed the following 18 models to analyse the distribution of values: two overall models for all values in each dataset separately (i.e. Flickr and PPGIS), and 16 models, one for each unique combination of the four common values (the first four values in List 1 and List 2), user group (domestic and international; n = 2) and dataset (Flickr and PPGIS; n = 2).
We selected the covariates based on previous research demonstrating how nature tourism is related to human infrastructure and environmental characteristics (Bagstad, Semmens, Ancona, & Sherrouse, 2016; Richards & Tunçer, 2018; Walden-Schreiner et al., 2018). Values were modelled against nine environmental and infrastructure variables (hereafter referred to as covariates): eight continuous variables (distance from trails, roads, touristic cabins, buildings [other infrastructures, e.g. houses, bridges], rivers, lakes, and mountain tops and glaciers, plus vegetation cover percentage) and one categorical variable (altitude, divided in 500-m elevation intervals; see Supporting Information Table S1).
We extracted covariates from the N500 database developed by the Norwegian Mapping Authority (Kartverket), which contains, among other things, landscape characteristics and infrastructure (Kartverket, 2015). Mountain tops were manually georeferenced based on the protected area brochures published by the Norwegian Environmental Agency. Vegetation cover percentage was derived from CORINE2006 data (European Environmental Agency, 2015).
We reclassified the CORINE map by assigning 100% cover to vegetated areas, 50% cover to sparsely vegetated areas and 0% cover to artificial surfaces, rocks, non-vegetated areas and water bodies. The values for each pixel were interpolated using the nearest neighbour approach with a 3 × 3 kernel. We rasterized covariates in a 1,210,000 pixel raster with a 116.1 m pixel size. The raster layers provided distances to natural and human-made features, and these were square root transformed to reduce skewness towards the right end (long distances). We tested for correlation between covariates and found no indication to discard any of the covariates (Supporting Information Table S2).
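As a rough illustration of the two covariate transformations described above, the following Python sketch reclassifies a small land-cover grid to vegetation cover percentages and square-root transforms a grid of distances. The class labels are hypothetical placeholders, since the actual CORINE class codes are not listed in the text, and the interpolation step is omitted.

```python
import math

# Hypothetical class labels standing in for CORINE land-cover classes;
# the percentages follow the reclassification rule described in the text.
COVER_BY_CLASS = {
    "vegetated": 100.0,
    "sparsely_vegetated": 50.0,
    "artificial": 0.0,
    "rock": 0.0,
    "non_vegetated": 0.0,
    "water": 0.0,
}


def reclassify(land_cover_grid):
    """Map each land-cover class in a 2-D grid to a vegetation cover percentage."""
    return [[COVER_BY_CLASS[cell] for cell in row] for row in land_cover_grid]


def sqrt_transform(distance_grid):
    """Square-root transform distances (in metres) to compress long distances."""
    return [[math.sqrt(d) for d in row] for row in distance_grid]
```

The square root leaves short distances nearly unchanged in relative terms while pulling in the long right tail, which is the stated motivation for the transformation.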
We developed the 18 maximum entropy models using MaxEnt software version 3.4.0 (Phillips, Dudík, & Schapire [Internet]). Briefly, maximum entropy modelling compares the distribution of presences (e.g. sighting of a species) in environmental space (the set of covariates) against the background distribution of those covariates (Elith et al., 2011). The model compares the presence of points (i.e. values) against a set of randomly distributed background points to estimate the influence of environmental characteristics on the value distribution. Because MaxEnt works with presence data, we removed duplicate points before modelling. During the internal validation of the model, 25% of the presence points were randomly selected as a test set. We selected a random subset of 10,000 background points from the 1,210,000 grid cells in our study region. MaxEnt selected the regularization values and feature types (hinge, product, linear and quadratic) that best fit the model. The output is a model that can predict the suitability of other areas for the values mapped by users. To identify those covariates that best explain the distribution of each value, we examined the permutation importance, which is a measure calculated by randomly selecting values for each of the covariates for each permutation during the training of the model, independent of the model path followed. The permutation importance measures how much the model relies on the given variable, normalized to percentages. In other words, the permutation importance is a measure of the contribution of a variable to the predictive ability of the model. We used these models to predict the suitability of the study area to contain the four common values. To assess how alike the predictions were for values mapped in different platforms (i.e. Flickr and PPGIS), we used the suitability maps for each value to calculate the niche overlap between the two datasets.
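The text does not state which overlap statistic was used. As one possible illustration, the sketch below computes Schoener's D, a commonly used niche overlap measure, from two suitability maps flattened to lists over the same grid cells. The choice of statistic and the function name are assumptions made for this example only.

```python
def schoeners_d(suitability_a, suitability_b):
    """Schoener's D between two suitability maps over the same grid cells.

    Each map is normalized to sum to 1; D = 1 - 0.5 * sum(|p_a - p_b|),
    ranging from 0 (no overlap) to 1 (identical distributions).
    """
    if len(suitability_a) != len(suitability_b):
        raise ValueError("suitability maps must cover the same cells")
    total_a = sum(suitability_a)
    total_b = sum(suitability_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each suitability map must have a positive sum")
    return 1.0 - 0.5 * sum(
        abs(a / total_a - b / total_b)
        for a, b in zip(suitability_a, suitability_b)
    )
```

Here one list would hold the predicted suitability from the Flickr model and the other the prediction from the PPGIS model for the same value, with cells listed in the same order.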
MaxEnt is suitable for use with presence-only data such as that generated by Flickr or PPGIS, where the photo or PPGIS locations indicate the 'presence' of a value, but unmapped areas cannot be assumed to indicate the 'absence' of a value. Maximum entropy modelling has previously been used to model species distribution (Phillips & Dudík, 2008). All analyses were conducted using the R software version 3.4.1 (R Core Team, 2019), with the 'dismo' package for the Maxent model (Hijmans, Phillips, Leathwick, & Elith, 2017).

| Comparing domestic and international visitors
We used exploratory analyses to describe and summarize differences in the mapped values by different user groups for each citizen-generated dataset. First, we identified differences between the values mapped by domestic and international visitors within each dataset using chi-square tests and then Spearman rank correlation tests. For each mapped common value, we compared standardized chi-square residuals of the proportion of values mapped relative to the total number of values for domestic and international visitors, identifying those values with residuals outside the range −2 to 2 as being mapped significantly less or more often than by the other group. We used the Spearman rank correlation to show the degree to which the two user groups (i.e. domestic and international visitors) are similar in their perception of value importance based on the ranks of mapped value frequencies (based on 8 data points in Flickr and 12 data points in PPGIS).
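A minimal Python sketch of the two statistics is given below for illustration; it is not the code used in the study. It assumes that 'standardized residuals' means the adjusted residuals (observed minus expected, divided by an estimate of its standard error), which is an assumption about the exact formula, and it computes the Spearman coefficient as the Pearson correlation of average ranks. It also assumes a table with no empty rows or columns and rankings that are not constant.

```python
import math


def standardized_residuals(table):
    """Adjusted standardized residuals for a contingency table (list of rows).

    Rows could be user groups and columns values; cells with residuals
    outside -2 to 2 would be flagged as mapped less or more than expected.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(out_row)
    return residuals


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For the comparison described above, `x` and `y` would be the mapped value frequencies for domestic and international visitors, listed in the same value order.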

| RESULTS
In the Flickr dataset, 479 users geotagged a picture related to nature qualities inside the study area, of whom 177 were domestic (Norwegians), 284 were international visitors and 18 had an unknown origin. Of the 479 users, 268 did not report their origin. Using the median distance of all the photos that each of these individuals uploaded, we concluded that 100 were domestic visitors and 150 were international visitors, while 18 remained with no clear origin. From the 4,038 uploaded images, photos related to nature qualities primarily showed scenic landscapes (3,008 photos) and recreation (601).
The median number of nature-related photos uploaded by each user was 2, and 16 users uploaded more than 50 pictures inside the study area (3.3% of users; Supporting Information Figure S1). In the PPGIS dataset, 468 respondents were recruited, split between 332 domestic (Norwegians) and 136 international visitors. From 3,873 mapped values, the most commonly mapped value was recreation (1,176 markers) followed by scenic landscapes (1,070). The median number of mapped values by each user was 5, and five users (1%) were identified as 'supermappers' (those who mapped more than 50 values; Figure S1).
We tested differences in the spatial distribution of all values for the two datasets by creating density-based clusters to identify hotspots of values. The density cluster analysis resulted in 51 hotspots for the Flickr database and 36 hotspots for the PPGIS database (Figure 2), with 19.7% and 35.9% of the points, respectively, remaining outside clusters. Figure 2 shows that places attractive to visitors are located along roads in the Flickr dataset, but are predominantly located inside protected areas in the PPGIS dataset (values inside PAs: 32.3% in Flickr and 77.4% in PPGIS).
We compared 18 MaxEnt models to determine differences in the two datasets concerning the environmental and infrastructure covariates that explain the distribution of values. We used the permutation importance metric to understand the contribution of each covariate to the MaxEnt model, which, unlike the per cent contribution, does not depend on the order in which the covariates are entered into the model (Kalle, Ramesh, Qureshi, & Sankar, 2013) (Table 2).
We used chi-square tests to assess differences in values between domestic and international visitors within the two datasets (Table 3). In the Flickr dataset, domestic visitors uploaded more im-

| DISCUSSION
We found large differences in the spatial data generated by passive versus active crowdsourcing methods. Flickr and PPGIS datasets differ substantially in both the types and locations of values mapped. Values represented in Flickr photos were located closer to roads than those mapped in the PPGIS dataset, which were predominantly located inside PAs and often associated with trails, mountain tops and glaciers. Despite these differences, the predicted spatial distribution of values generated by models applied to these two datasets showed substantial overlap, especially for scenic and recreational values, indicating that both datasets capture similar environmental and landscape characteristics. However,

TABLE 3
Standardized residuals for chi-square tests. Numbers below −2 or above 2 (shaded) indicate that a value was mapped by domestic or international visitors significantly less or more often than would be expected within each dataset. In brackets, the percentage of times a value was photographed/mapped by domestic and international visitors in each dataset.
Differences among domestic and international visitors with respect to the use and appreciation of nature qualities within protected areas have previously been documented (Shultis, 1989; Tyrväinen, Mäntymaa, & Ovaskainen, 2014) Brown et al. (2015). Our study shows that PPGIS captures the differences between domestic and international visitors better than Flickr does, and will likely be more useful when developing strategies for tourism development and management.

| ADDITIONAL ADVANTAGES AND LIMITATIONS OF FLICKR AND PPGIS
As previous studies have concluded, crowdsourced data are a valuable source for assessing NCP. However, each method has its advantages and limitations that need to be carefully considered depending on the research questions to be addressed. Hausmann et al. (2018) found that Flickr users post more pictures related to biodiversity than Instagram users, who post more photos of people. However, Instagram performs better at estimating visitor rates than Flickr and Twitter (Tenkanen et al., 2017).
In our case, there were no visitor data available to assess whether the PPGIS data were biased towards middle-aged males and educated participants, as shown in similar studies (Bubalo et al., 2019). Sampling design plays a crucial role in capturing a representative sample of the population or a targeted population segment (Brown, 2017; Brown & Kyttä, 2014; Brown et al., 2019).
However, although data on visitation and visitor distribution provided by social media have previously been validated against local knowledge and field surveys (Kim et al., 2019), there is no available true representation of the spatial distribution of values with which Flickr and PPGIS data can be assessed.

| CONCLUSIONS
Crowdsourced data from passive and active sources can be a useful tool to inform managers about the spatial distribution of NCP in protected areas. Our results show that crowdsourced data provide fine-scale information on a diversity of values that people associate with protected areas, and on the differences between user groups that are relevant for management. The methods differ in the distribution of values people ascribe to nature, for example in PPGIS a high proportion of values is located inside protected areas, whereas in To overcome some of the limitations of crowdsourcing data, combining these tools with field surveys could combine the benefits of both approaches, delivering large-scale datasets from a broad user sample along with more detailed and specific information on NCP and the nature qualities that are valued by different groups of people.

ACKNOWLEDGEMENTS
This study was funded by CultES-Assessing spatially explicit cultural ecosystem services for adaptive management in the Alpine North, Environmental Research Program, Norwegian Research Council, 230330/E50/2014. We would like to thank the two anonymous reviewers and the associate editor for their constructive comments on earlier versions of the draft.

CONFLICT OF INTEREST
Nothing to declare.

DATA AVAILABILITY STATEMENT
The data are publicly available at DataverseNO https://doi.org/10.18710/VQLTM8 (Muñoz, Hausner, Runge, Brown, & Daigle, 2020).