
1 Introduction

Cultural heritage collections are maintained by public and private institutions and described in digital catalogues, often in the form of Web-based anthologies. The advent of digital photography has led to a significant increase – and Web availability – of the wealth of images depicting cultural heritage assets. However, given a large image collection of cultural heritage assets, the issue arises of how to analyze and process the set of photos without resorting to expensive manual work. Human Computation [1] has emerged as a successful paradigm to face this challenge at a lower cost. Therefore, the following questions emerge regarding the adoption of a Human Computation approach: (1) given a set of images of a cultural heritage asset, is it possible to identify the most representative or the most recognizable photo? (2) given a collection of images of different cultural heritage assets, is it possible to tell apart the most popular assets from those that could benefit from promotion campaigns?

This paper contributes the design, implementation and evaluation of Indomilando, an application aimed at ranking a heterogeneous set of images depicting the cultural heritage assets of Milan. Indomilando is designed as a Web-based Game with a Purpose that involves players in guessing an asset from a set of photos. We evaluate the ability of Indomilando to achieve its ranking purpose and to engage users. Moreover, given the cultural flavour of our game, we also investigate whether Indomilando achieves the “collateral effect” of giving new knowledge back to users.

2 Related Work

Human Computation [1] and Crowdsourcing [2] are different approaches to involve people in mixed human-machine computational systems to collect and process information. In the cultural heritage field, there are studies and surveys [3, 4] that explore how to apply crowdsourcing-like methods to achieve different tasks [5]. Games with a Purpose (GWAP [6]) emerged as an interesting Human Computation method to collect information from game players, who are often unaware that their playful activities hide the achievement of a task. Among possible purposes, ranking items in a collection is explored in some GWAP cases [7–9]. Several GWAPs in the literature are based on multimedia elements [7, 10]; among those, some applications are specifically related to the cultural heritage domain [11]. In this paper, we describe a GWAP aimed at ranking images of cultural heritage assets.

The main challenge in those Human Computation applications is modeling and predicting user engagement [12, 13]. Therefore, it is important to design suitable incentive schemes to foster user participation [14]; it has been demonstrated that gamification and furtherance incentives improve both the quality and the quantity of task execution [15]. One possible incentive is an educational stimulus [16]: users are encouraged to participate not only because they have fun and/or get paid, but also because they acquire new, interesting knowledge. This type of incentive is particularly interesting in our cultural heritage case. However, balancing the purpose of a GWAP with educational-style incentives seems to be far from trivial: either an expensive training effort is needed to ensure result quality [17], or the learning stimulus negatively impacts the purpose quality [18]. In our work, we aim to analyze and evaluate the interplay of purpose achievement, user engagement and educational-like incentives in a cultural heritage GWAP.

3 Design and Development of the Indomilando Game

We developed Indomilando (cf. http://bit.ly/indomilando), a GWAP [6] that engages players in a game to leverage their human contributions to rank cultural heritage photos. Indomilando makes use of Lombardy region’s SIRBeC data related to the architectural assets of the city of Milan: 2,104 photos depicting 685 cultural heritage assets of Milan. The purpose of Indomilando is to rank the assets’ photos according to their popularity; by popularity we mean both how famous an asset is (with respect to the other assets) and how recognizable or representative a photo is (with respect to the other photos of the same asset). To achieve this purpose, we show some photos of different assets and ask the players to guess which one corresponds to a given asset.

Indomilando can be played in rounds composed of levels; in each level, the player is presented with the name of an architectural asset and 4 photos. The game goal is to identify the correct photo for the given asset; for every right choice, the player score increases, and it increases more for consecutive right answers. Gaining points in levels and rounds lets the player climb the leaderboard and obtain badges. Every time a user completes a level by making a choice, Indomilando highlights the correct/incorrect answer and shows the 4 names of the assets depicted in the photos. Moreover, the game lets the user learn more about the assets: each photo is associated with a link to the corresponding SIRBeC catalogue report. At the end of the round, the player can also explore a map of Milan showing the location of all the assets the user played with. Figure 1 shows some screenshots of Indomilando gameplay.

Fig. 1. Screenshots of the Indomilando game.

Indomilando was published online in mid October 2015 and promoted via social media campaigns. A link to the game was recently added to several culture-related official portals of Lombardy Region. In the following sections, we present our analysis and evaluation of Indomilando based on the data collected in the last three months. Further details and graphics regarding this evaluation are available at http://swa.cefriel.it/urbangames/indomilando/icwe2016.html.

4 Game Purpose Analysis

Since Indomilando is a GWAP [6], we first analyze its ability to achieve its goal: ranking the full set of photos and ranking the depicted assets.

4.1 Photo Ranking

Whenever an asset photo is presented to a player to be guessed, the user can either correctly choose it or wrongly select one of the distracting images. To create a numeric score to rank the photos, instead of relying on the ratio between the number of times the photo was correctly chosen and the number of times the photo was visualized to be guessed (in the following simply named chosen/visualized ratio), we assign a score to the photos by using the Wilson score confidence interval [19] for a Bernoulli parameter; in our case, a trial is the visualization of a photo to be guessed, and its possible outcomes are that the photo is correctly chosen (success) or wrongly discarded (failure). We choose the lower bound of the Wilson interval as a conservative measure of the score w. We compute the w score for each asset photo.
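As an illustration, the lower bound of the Wilson score interval can be computed as in the following Python sketch; the function name and the 95 % confidence level are our assumptions, since the paper does not specify the implementation:

```python
import math

def wilson_lower_bound(chosen: int, visualized: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli proportion.

    chosen:     times the photo was correctly chosen (successes)
    visualized: times the photo was shown to be guessed (trials)
    z:          normal quantile; 1.96 is a 95 % confidence level (our assumption)
    """
    if visualized == 0:
        return 0.0
    p = chosen / visualized
    n = visualized
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

# e.g. a photo correctly chosen 8 times out of 10 visualizations:
w = wilson_lower_bound(8, 10)  # ~0.49, more conservative than the raw ratio 0.8
```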

Given that a player has to identify the correct image in a set of four photos, the success (or failure) is not only dependent on the “correct” picture, but also on the three distracting ones. The distracting options are selected among the assets belonging to the same category as the one to be guessed, but some categories are more recognizable than others. We correct the w score to take this effect into account, by applying a standardization by asset type, as follows: \(\widetilde{w} = (w - \bar{X}_t)/{S_t}\), where t indicates the photo asset type, and \(\bar{X}_t\) and \(S_t\) are the sample mean and the sample standard deviation, respectively, of the w scores of the same type t. We adopt \(s_{photo}=\widetilde{w}\) as the metric to rank Indomilando asset photos.
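A minimal sketch of the per-type standardization, assuming the w scores and the asset types are available as dictionaries keyed by photo id (the data structures are hypothetical):

```python
from collections import defaultdict
from statistics import mean, stdev

def standardize_by_type(w_scores: dict, photo_type: dict) -> dict:
    """Return s_photo = (w - mean_t) / stdev_t, standardized within each asset type."""
    by_type = defaultdict(list)
    for pid, w in w_scores.items():
        by_type[photo_type[pid]].append(w)
    # sample mean and sample standard deviation per type
    # (assumes each type contains at least two photos)
    stats = {t: (mean(ws), stdev(ws)) for t, ws in by_type.items()}
    return {pid: (w - stats[photo_type[pid]][0]) / stats[photo_type[pid]][1]
            for pid, w in w_scores.items()}
```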

4.2 Asset Ranking

The score of an asset could be defined as \(\bar{s}_{photo}\), the average score of all the photos depicting that asset, but a simple mean is sensitive to the characteristics of the set of images. Experimentally, we observe that the mean score of the asset’s photos \(\bar{s}_{photo}\) increases with the number \(N_a\) of photos in the set. This result could be caused by the “learning” effect that a high number of photos has on the players: over time, a user could learn to identify an asset he/she had to recognize or discard in multiple previous game rounds. We introduce an adjustment for the number of photos, derived from the linear regression line of \(\bar{s}_{photo}\) on \(N_a\), as follows: \(adj_{num} = \beta _0 + \beta _1 \cdot N_a\).

To take into account the effect of the inhomogeneity of the photo set, we consider the variance \(v_a\) of the chosen/visualized ratio across the photos belonging to the same asset and we introduce the following adjustment: \(adj_{set} = (v_a - \bar{X})/S\), where the variance \(v_a\) is standardized with regard to the mean \(\bar{X}\) and the standard deviation S of v across all assets.

We finally define the asset score as: \(s_{asset} = \bar{s}_{photo} - adj_{num} - adj_{set}\).
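Putting the pieces together, a sketch of the asset-score computation under the definitions above; the regression coefficients and the input structures are illustrative assumptions, not the actual implementation:

```python
from statistics import mean, stdev

def asset_score(photo_scores, ratio_variances, n_photos, beta0, beta1, asset_id):
    """s_asset = mean(s_photo) - adj_num - adj_set for one asset.

    photo_scores:    {asset_id: [s_photo, ...]} standardized photo scores
    ratio_variances: {asset_id: v_a} variance of the chosen/visualized ratio
    n_photos:        {asset_id: N_a} number of photos of the asset
    beta0, beta1:    coefficients of the regression of mean score on N_a
    """
    s_mean = mean(photo_scores[asset_id])
    adj_num = beta0 + beta1 * n_photos[asset_id]
    # standardize v_a against the distribution of v over the whole collection
    all_v = list(ratio_variances.values())
    adj_set = (ratio_variances[asset_id] - mean(all_v)) / stdev(all_v)
    return s_mean - adj_num - adj_set
```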

4.3 Results

We collect the game log information and compute the \(s_{photo}\) score and the \(s_{asset}\) score, taking into consideration only the images that were played by at least three players. The results are shown in the following table.

No. of players | Total effective played time | Completed photos | Completed assets | Throughput (tasks/hour) | ALP (time/player)
72             | 8 h 58 m 20 s               | 1,397 (66.4 %)   | 524 (76.5 %)     | 155.22                  | 7 m 29 s

The total effective time includes only the time needed to choose an image in each game level; thus, in around 9 h, 72 players were able to complete the ranking of 66.4 % of the photos and 76.5 % of the assets. The main metrics for GWAP evaluation [6] are also displayed: the average lifetime play (ALP) is 7.5 min and measures the time spent on average by each user playing the game, thus providing a measure of Indomilando’s engagement; the throughput measures how many “tasks” are completed per unit of time. These metrics can be used to estimate how much time and how many users are needed to complete the tasks at hand. In the Indomilando case, we can therefore estimate that the whole set of 2,104 photos and 685 assets could be ranked in around 13.5 h by fewer than 110 players.
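As a back-of-the-envelope check of this estimate, using the throughput and ALP values reported in the table above:

```python
throughput = 155.22   # completed photo tasks per hour
alp_min = 7.5         # average lifetime play per player, in minutes
total_photos = 2104

hours_needed = total_photos / throughput       # ~13.6 h
players_needed = hours_needed * 60 / alp_min   # ~108 players
print(f"{hours_needed:.1f} h, ~{players_needed:.0f} players")
```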

4.4 Ground Truth Evaluation

To evaluate the ability of Indomilando to achieve its purpose, we need to assess both the photo ranking and the asset ranking. From our manual inspection of the rankings, we can say that the photos depicting the same asset, as well as the assets belonging to the same type, are indeed ordered from the most to the least representative.

To obtain a more objective evaluation of our scores, we set up a term of comparison for our rankings. We search for the cultural heritage assets in the Italian version of Wikipedia, checking whether they have a dedicated page: among the 685 assets of Milan, we find 111 pages. Then, we derive a rank by sorting the set of assets by decreasing number of Wikipedia visits (cf. http://stats.grok.se/), which can be interpreted as a measure of the “fame” of the asset. We first evaluate Indomilando through the rank correlation between the asset score computed by the game and the Wikipedia visits. We compute both the Spearman \(\rho \) and Kendall \(\tau \) rank correlation indicators [20]. There is indeed a positive correlation (\(\rho =0.20\), \(\tau =0.15\)) that is stronger when focusing on the 10 most visited pages in Wikipedia (\(\rho =0.60\), \(\tau =0.556\)).
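For reference, the two rank-correlation indicators can be computed with SciPy as in this sketch; the paired score arrays are hypothetical placeholders, not the actual game data:

```python
from scipy.stats import spearmanr, kendalltau

# hypothetical paired observations: game score and Wikipedia visits per asset
game_scores = [0.84, 0.31, 0.77, 0.12, 0.55]
wiki_visits = [5400, 1200, 4100, 2600, 300]

rho, _ = spearmanr(game_scores, wiki_visits)
tau, _ = kendalltau(game_scores, wiki_visits)
print(f"Spearman rho={rho:.2f}, Kendall tau={tau:.2f}")
```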

We build a second reference “ground truth” by manually ordering the ten assets with the most Wikipedia visits: we involve 12 participants who are asked to rank them by their recognisability. Then we aggregate the manual ranks using a weighted brute-force algorithm [21]; the weighting is applied both to users (incorporating their level of familiarity with Milan) and to individual ranks (considering the possible lack of knowledge about any asset). We again compute the \(\rho \) and \(\tau \) rank correlation indicators between the Indomilando asset rank and the ordering obtained by aggregating the manual tests. In this case, the correlation values are much higher (\(\rho =0.806\), \(\tau =0.60\)), indicating that Indomilando is able to approximate the reference ranking with a high level of accuracy.

5 Game Engagement Analysis

We propose two types of evaluation: an objective analysis of played time and a subjective survey of participants’ experience.

Fig. 2. Distribution of the total time per user.

5.1 Analysis of Played Time

In the experimentation period, a total of 72 users played the Indomilando game, with an ALP of 7.5 min. Figure 2 shows the distribution of the total effective played time, i.e. the time actually spent trying to guess the right answer, from the beginning to the end of a level. This specific right-skewed shape is recurring in engagement measures of casual games. Although the curve decreases very quickly, we can distinguish a group of players characterized by a different behaviour, corresponding to the small rise between 10 and 20 min.

We therefore suppose the existence of two sets of players: the first and larger one includes users with a total played time of around 2.5 min; the second, less numerous one includes users who played approximately 7–8 times longer. The empirical distribution of the total number of levels played per user displays very similar behaviour to the played-time distribution and confirms our analysis, also regarding the existence of different groups of players.

5.2 Subjective Analysis of Engagement

We also conduct a second type of engagement assessment, by setting up an online evaluation questionnaire and asking the Indomilando players to fill it in. Some of the questions were explicitly aimed at understanding the game’s reception (cf. Fig. 3). The results show that Indomilando is indeed perceived as a fun game with a simple and intuitive gameplay. We can conclude that Indomilando has a good engagement capability.

Fig. 3. Survey questions related to the game engagement; from left to right: “did you like playing?”, “are the rules clear, simple and intuitive?”, “is the user interface clear, simple and intuitive?”; scores range from 1 (not at all) to 5 (very much).

6 Game Cultural Incentive Analysis

Given the cultural topic of the game, we hypothesize that one incentive to play is the interest in learning something new about Milan.

Fig. 4. Distribution of the time spent between levels of the same round per user.

6.1 Analysis of “Learning” Time

During gameplay, the players can learn more about the assets they are playing with: at the end of each level, when the names of the 4 depicted assets are displayed, and at the end of the round, when the assets are visualized on a map. We analyze the time spent between levels of the same round and the time spent between consecutive rounds; we can consider those intervals as the actual learning time. Figure 4 shows the distribution of the time between levels (summed over the round); the graph is cut at 2 min, but the between-levels time goes up to almost 7 min. The curve decreases very quickly (in most cases only 2–3 s are spent between levels), but we notice that users sometimes spent considerably longer. We can speculate that, in those cases, the players followed the links to the catalogue records.

Whenever a user plays two game rounds within an interval of 15 min, we consider those rounds consecutive. The time-between-rounds distribution shows the same tendency observed above, with a visible group of between-rounds intervals of around 40 s, which we can attribute to the most “curious” users, who spent some time exploring the asset map.

6.2 Analysis of Learning Effect

It is worth noting that, at the end of each game level, the player not only learns whether his/her answer was correct but can also see the asset names for the four photos (cf. top-right screenshot in Fig. 1). Since the same image can appear more than once as a distracting option, over time the player can learn to recognize some of the assets. We evaluate this possible learning effect by measuring the average number of correct guesses per round as a function of the number of rounds played. We notice that the players’ precision indeed improves along with the game rounds; using a simple linear regression model, we can estimate this improvement at 1.1 % per played round.
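A minimal sketch of how such a slope can be estimated with NumPy; the per-round precision values are illustrative, not the actual game data:

```python
import numpy as np

# illustrative data: average fraction of correct guesses at each round index
round_index = np.array([1, 2, 3, 4, 5, 6])
precision = np.array([0.62, 0.63, 0.64, 0.65, 0.66, 0.675])

slope, intercept = np.polyfit(round_index, precision, 1)  # simple linear regression
print(f"improvement per played round: {slope * 100:.1f} %")  # ~1.1 %
```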

6.3 Subjective Analysis of the Educational Incentive

Some of the questions in the aforementioned survey are aimed at evaluating the educational incentive of the game, as shown in Fig. 5. The question displayed on the left investigates the incentives to continue playing as perceived by the users. While the main game features are among the most highly rated incentives, it is interesting to note that 27 % of players stated that learning new things is a stimulus to play the game. This is further supported by the question on the right in Fig. 5, asking the participants whether they learned anything new about Milan while playing. The distribution of responses clearly shows that an “educational” effect was strongly perceived by Indomilando players. We can conclude that Indomilando has an evident educational “collateral effect” that makes players acquire new knowledge about cultural heritage assets.

Fig. 5. Survey questions related to the game incentives; on the left “what motivated you to play the game?”; on the right “did you learn anything new about Milan?” (1 = not at all, 5 = very much).

7 Conclusions

In this paper, we presented the design, development and evaluation of Indomilando, a cultural heritage GWAP aimed at ranking the assets and the photos of the collection of historical-artistic architectures of Milan. Our evaluation results support our claims that: (1) the game is effective in achieving its ranking purpose, because the resulting rank is highly correlated with a ground truth and this outcome is achieved in a very limited time; (2) Indomilando shows good engagement potential, because most players find the game fun and we also notice a group of users who spent a significantly long time playing; and (3) the game also leads to a learning/educational effect, because players are motivated to acquire new knowledge about Milan’s cultural heritage. The interplay between the ranking purpose and the educational incentive, however, remains to be further investigated.