Ingmar Weber
Saarland University, Computer Science, Faculty Member
- Alexander von Humboldt Professor in AI
Extant literature has explored the social integration process of migrants settling in host communities. However, this literature typically takes a migrant-centric view, implicitly putting the burden of a successful integration on the migrant, and trying to identify the factors that lead to integration along various dimensions. In this paper, we flip this point of view by studying the attributes of natives that govern their propensity to form social ties with migrants. We do so by using anonymous and aggregate social network data provided by Facebook’s advertising platform. More specifically, we look at factors that influence the propensity for a likely-to-be non-Muslim Facebook user to have at least one social connection to a Facebook user who celebrates Ramadan. Given that, in the European context, following Islam is predominantly tied to a migration background, this gives us a lens into cross-cultural native-migrant connectivity. Our study considers demographic attributes of the ho...
... 3. QUERY EXPANSION VIA PREFIX COMPLETION The key idea is to add the information we have about related terms as artificial words to ... and tested our feature for two collections: the TREC Robust collection (1.5 GB, 556,078 documents), and the English Wikipedia (8 GB ...
ABSTRACT Suppose your sole interest in recommending a product to me is to maximize the amount paid to you by the seller for a sequence of recommendations. How should you recommend optimally if I become more inclined to ignore you with each irrelevant recommendation you make? Finding an answer to this question is a key challenge in all forms of marketing that rely on and explore social ties, ranging from personal recommendations to viral marketing. We prove that even if the recommendee regains her initial trust on each successful recommendation, the expected revenue the recommender can make over an infinite period due to payments by the seller is bounded. This can only be overcome when the recommendee also incrementally regains trust during periods without any recommendation. Here, we see a connection to "banner blindness," suggesting that showing fewer ads can lead to higher long-term revenue.
Research Interests: Engineering, Information Retrieval, Probability, DISTRIBUTION, Information Processing, Assignment, Probability Distribution & Applications, Mathematical Sciences, Load Balancing, Makespan, Large classes, Web Search Engine, Load Balance, Randomized Algorithm, Lower Bound, Assignment Problem, File, and Distribution
ABSTRACT Given only the URL of a Web page, can we identify its language? In this article we examine this question. URL-based language classification is useful when the content of the Web page is not available or downloading the content is a waste of bandwidth and time. We built URL-based language classifiers for English, German, French, Spanish, and Italian by applying a variety of algorithms and features. As algorithms we used machine learning algorithms which are widely applied for text classification and state-of-the-art algorithms for language identification of text. As features we used words, n-grams of various sizes, and custom-made features (our novel feature set). We compared our approaches with two baseline methods, namely classification by country code top-level domains and classification by IP addresses of the hosting Web servers. We trained and tested our classifiers in a 10-fold cross-validation setup on a dataset obtained from the Open Directory Project and from querying a commercial search engine. We obtained the lowest F1-measure for English (94) and the highest F1-measure for German (98) with the best performing classifiers. We also evaluated the performance of our methods: (i) on a set of Web pages written in Adobe Flash and (ii) as part of a language-focused crawler. In the first case, the content of the Web page is hard to extract, and in the second case, downloading pages of the “wrong” language constitutes a waste of bandwidth. In both settings the best classifiers have a high accuracy with an F1-measure between 95 (for English) and 98 (for Italian) for the Adobe Flash pages and a precision between 90 (for Italian) and 97 (for French) for the language-focused crawler.
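The country-code baseline and the word and n-gram features mentioned in this abstract can be illustrated with a minimal Python sketch. The TLD-to-language table, the function names, and the choice of character trigrams are assumptions made for illustration; the abstract does not specify the actual feature extraction or the custom-made features.

```python
import re
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete mapping from country-code TLDs to languages.
CCTLD_TO_LANGUAGE = {
    "de": "German", "at": "German",
    "fr": "French",
    "es": "Spanish",
    "it": "Italian",
    "uk": "English", "us": "English",
}

def cctld_baseline(url):
    """Guess the language from the country-code TLD; None if the TLD is not in the table."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return CCTLD_TO_LANGUAGE.get(tld)

def url_tokens(url):
    """Split a URL into lowercase ASCII-alphabetic tokens (the 'word' features)."""
    return re.findall(r"[a-z]+", url.lower())

def char_ngrams(token, n=3):
    """Return the character n-grams of a token (the token itself if it is shorter than n)."""
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]

def url_features(url, n=3):
    """Collect word and character n-gram features that a text classifier could consume."""
    tokens = url_tokens(url)
    return {"words": tokens, "ngrams": [g for t in tokens for g in char_ngrams(t, n)]}
```

For a URL such as `http://www.beispiel.de/nachrichten/wetter`, the baseline answers from the `.de` suffix alone, whereas the feature dictionary also exposes tokens like `nachrichten` that a trained classifier could use when the TLD is uninformative (for example `.com`).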
The internet is often thought of as a democratizer, enabling equality in aspects such as pay, as well as a tool introducing novel communication and monetization opportunities. In this study we examine athletes on Cameo, a website that enables bi-directional fan-celebrity interactions, questioning whether the well-documented gender pay gaps in sports persist in this digital setting. Traditional studies of gender pay gaps in sports are mostly set in a centralized setting where an organization decides the pay for the players, while Cameo facilitates grassroots fan engagement where fans pay for video messages from their preferred athletes. The results show that even on such a platform gender pay gaps persist, both in terms of cost per message and in the number of requests, proxied by the number of ratings. For instance, we find that female athletes have a median pay of $30 per video, while the same statistic is $40 for men. The results also contribute to the study of parasocial relationships and personalized fan engagement over a distance, which has become more relevant during the ongoing COVID-19 pandemic, when in-person fan engagement has often been limited.
Background: Brief intervention is a critical method for identifying patients with problematic substance use in primary care settings and for motivating them to consider treatment options. However, despite considerable evidence of delay discounting in patients with substance use disorders, most brief advice by physicians focuses on the long-term negative medical consequences, which may not be the best way to motivate patients to seek treatment information. Objective: Identification of the specific symptoms that most motivate individuals to seek treatment information may offer insights for further improving brief interventions. To this end, we used anonymized internet search engine data to investigate which medical conditions and symptoms preceded searches for 12-step meeting locators and general 12-step information. Methods: We extracted all queries made by people in the United States on the Bing search engine from November 2016 to July 2017. These queries were filtered for those who mention...
Facebook, the most popular social network with over one billion daily users, provides rich opportunities for its use in the health domain. Though much of Facebook's data are not available to outsiders, the company provides a tool for estimating the audience of Facebook advertisements, which includes aggregated information on the demographics and interests, such as weight loss or dieting, of Facebook users. This paper explores the potential uses of Facebook ad audience estimates for eHealth by studying the following: (1) for what type of health conditions prevalence estimates can be obtained via social media and (2) what type of marker interests are useful in obtaining such estimates, which can then be used for recruitment within online health interventions. The objective of this study was to understand the limitations and capabilities of using Facebook ad audience estimates for public health monitoring and as a recruitment tool for eHealth interventions. We use the Facebook Mark...
Social media platforms provide several social interactional features. Due to the large-scale reach of social media, these interactional features help enable various types of political discourse. Constructive and diversified discourse is important for sustaining healthy communities and reducing the impact of echo chambers. In this paper, we empirically examine the role of a newly introduced Twitter feature, 'quote retweets' (or 'quote RTs'), in political discourse, specifically whether it has led to improved, civil, and balanced exchange. Quote RTs allow users to quote the tweet they retweet, while adding a short comment. Our analysis using content, network, and crowd-labeled data indicates that the feature has increased political discourse and its diffusion, compared to existing features. We discuss the implications of our findings in understanding and reducing online polarization.
In recent years, the Middle East’s information and communication landscape has changed dramatically. Increasingly, states, businesses, and citizens are capitalising on the opportunities offered by new technologies, the fast pace of digitisation, and enhanced connectivity. These changes are far from turning Middle Eastern nations into network societies, but their impact is significant. The growing adoption of a wide variety of technologies in everyday life has given rise to complex dynamics that beg for a better understanding. Digital Middle East sheds a critical light on the continuing changes closely intertwined with the adoption of information and communication technologies in the region. Drawing on case studies from throughout the Middle East, the contributors explore how these digital transformations are playing out in the social, cultural, political, and economic spheres, exposing the various disjunctions and discordances that have marked the advent of the digital Middle East.
Research Interests: History, Information Technology, Information Society, Digital Media, Iranian Studies, Digital Culture, Digitization, E Government, Egypt, Media, Activism, Global South, Information and Communication Technologies, Information and Communications Technology, Digitisation, Cyber Politics, Digital Transformations, Digital Era, Arab Youth, and E Commerce
On social media platforms like Twitter, users are often interested in gaining more influence and popularity by growing their set of followers, aka their audience. Several studies have described the properties of users on Twitter based on static snapshots of their follower network. Other studies have analyzed the general process of link formation. Here, rather than investigating the dynamics of this process itself, we study how the characteristics of the audience and follower links change as the audience of a user grows in size on the road to the user's popularity. To begin with, we find that the early followers tend to be more elite users than the late followers, i.e., they are more likely to have verified and expert accounts. Moreover, the early followers are significantly more similar to the person that they follow than the late followers. Namely, they are more likely to share time zone, language, and topics of interest with the followed user. To some extent, these phenomena are...
While the coronavirus disease 2019 (COVID-19) pandemic wreaked havoc across the globe, we have witnessed substantial mis- and disinformation regarding various aspects of the disease. We conducted a cross-sectional study using a self-administered questionnaire for the general public (recruited via social media) and healthcare workers (recruited via email) from the State of Qatar and the Middle East and North Africa region to understand the knowledge of and anxiety levels around COVID-19 during the early stage of the pandemic (April–June 2020). The final dataset used for the analysis comprised 1658 questionnaires (53.0% of 3129 received questionnaires; 1337 [80.6%] from the general public survey and 321 [19.4%] from the healthcare survey). Knowledge about COVID-19 was significantly different across the two survey populations, with a much higher proportion of healthcare workers possessing better COVID-19 knowledge than the general public (62.9% vs. 30.0%, p < 0.0001). A reverse ...
We developed Political Insights, an online searchable database of politically charged queries, which allows you to obtain topical insights into partisan concern. In this paper we demonstrate how you can discover such political queries and how to lay bare which issues are most salient to political audiences. We employ anonymized search engine queries resulting in a click on U.S. political blogs to calculate the probability that a query will land on blogs of a particular leaning. We are thus able to ‘charge’ queries politically and to group them along opposing partisan lines. Finally, by comparing the zip codes of users submitting these queries with election results, we find that the leaning of blogs people read correlates well with their likely voting behavior.
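The idea of 'charging' a query by the probability that it lands on blogs of a particular leaning can be sketched as a plain relative-frequency estimate. The `(query, leaning)` log format, the function name, and the absence of any smoothing or minimum-volume threshold are assumptions; the abstract does not say how the probability is actually estimated.

```python
from collections import Counter, defaultdict

def query_leanings(click_log):
    """Estimate, for each query, the fraction of its clicks that landed on blogs
    of each leaning.

    click_log: iterable of (query, blog_leaning) pairs, one pair per click.
    Returns {query: {leaning: fraction of that query's clicks}}.
    """
    counts = defaultdict(Counter)
    for query, leaning in click_log:
        counts[query][leaning] += 1
    return {
        query: {
            leaning: clicks / sum(by_leaning.values())
            for leaning, clicks in by_leaning.items()
        }
        for query, by_leaning in counts.items()
    }
```

Queries whose estimated probability is far from an even split could then be grouped along partisan lines; with few clicks per query the raw fractions are noisy, so a real system would likely need some form of smoothing or a volume cutoff.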
Abstract. Multimedia annotation is central to its organization and retrieval – a task which tag recommendation systems attempt to simplify. We propose a photo tag recommendation system which automatically extracts semantics from visual and meta-data features to complement existing tags. Compared to standard content/tag-based models, these automatic tags provide a richer description of the image and especially improve performance in the case of the “cold start problem”.
We present a system for personalized tag suggestion for Flickr: While the user is entering/selecting new tags for a particular picture, the system is suggesting related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe three algorithms which can be applied to this problem. In experiments, our best-performing method yields an improvement in precision of 10-15% over a baseline method very similar to the system currently used by Flickr. Our system is accessible at
We present a system for personalized tag suggestion for Flickr: While the user is entering/selecting new tags for a particular picture, the system is suggesting related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe three algorithms which can be applied to this problem. In experiments, our best-performing method yields an improvement in precision of 10-15% over a baseline method very similar to the system currently used by Flickr. Our system is accessible at http://ltaa5.epfl.ch/flickr-tags/. To the best of our knowledge, this is the first study on tag suggestion in a setting where (i) no full text information is available, such as for blogs, (ii) no item has been tagged by more than one person, such as for social bookmarking sites, and (iii) suggestions are dynamically updated, requiring efficient yet effect...
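Suggesting tags from past tag usage and the tags already entered can be illustrated with a simple co-occurrence ranking. This is a generic sketch, not one of the three algorithms from the paper, which the abstract does not describe. The function names are hypothetical, and personalization (weighting the user's own past tags differently from other people's) is omitted.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(tagged_photos):
    """Count how often each ordered pair of distinct tags appears on the same photo."""
    cooc = defaultdict(Counter)
    for tags in tagged_photos:
        for a, b in permutations(set(tags), 2):
            cooc[a][b] += 1
    return cooc

def suggest_tags(cooc, entered, k=3):
    """Rank candidate tags by their summed co-occurrence with the tags entered so far."""
    scores = Counter()
    for tag in entered:
        scores.update(cooc.get(tag, {}))
    for tag in entered:
        scores.pop(tag, None)  # never suggest a tag that is already entered
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]
```

Calling `suggest_tags` again after each newly entered tag gives the dynamic updating described above; since each call only sums a few counters, it stays cheap enough for interactive use on small vocabularies.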
In The Clash of Civilizations, Samuel Huntington argued that the primary axis of global conflict was no longer ideological or economic but cultural and religious, and that this division would characterize the “battle lines of the future.” In contrast to the "top down" approach in previous research focused on the relations among nation states, we focused on the flows of interpersonal communication as a bottom-up view of international alignments. To that end, we mapped the locations of the world’s countries in global email networks to see if we could detect cultural fault lines. Using IP-geolocation on a worldwide anonymized dataset obtained from a large Internet company, we constructed a global email network. In computing email flows we employ a novel rescaling procedure to account for differences due to uneven adoption of a particular Internet service across the world. Our analysis shows that email flows are consistent with Huntington’s thesis. In addition to locatio...
Is it possible to "hack" an image of an international entity by driving international and domestic media? Here, we present an image/brand monitoring tool for a country, Qatar, which presents an overview of the contexts and references to media in which it is mentioned on social media. Tracking dozens of languages, this tool allows a global understanding of the perceptions and concerns Twitter users associate with Qatar, and which mainstream media may be driving these sentiments.
We present a system that visualizes geo-temporal Twitter activity. The distinguishing features our system offers include (i) a large degree of user freedom in specifying the subset of data to visualize and (ii) a focus on *discriminative* patterns rather than high-volume patterns. Tweets with precise GPS coordinates are assigned to geographical cells and grouped by (i) tweet language, (ii) tweet topic, (iii) day of week, and (iv) time of day. The spatial resolution of the cells is determined in a data-driven manner using quad-trees and recursive splitting. The user can then choose to see data for, say, English tweets on weekend evenings for the topic "party". This system has been implemented for 1.8 million geo-tagged tweets from Qatar (http://qtr.qcri.org/) and for 4.8 million geo-tagged tweets from New York City (http://nyc.qcri.org/) and can be easily extended to other cities/countries.
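The data-driven choice of cell resolution via quad-trees and recursive splitting can be sketched as follows. The splitting threshold `max_points`, the depth limit, and the function name are assumptions made for illustration; the abstract does not state the actual stopping criterion.

```python
def split_cells(points, bounds, max_points=2, max_depth=8):
    """Recursively split a bounding box into quadrants until each cell holds at
    most max_points points or max_depth splits have been made.

    points: list of (x, y) coordinates, e.g. (longitude, latitude).
    bounds: (x_min, y_min, x_max, y_max).
    Returns a list of (bounds, points_in_cell) for the non-empty leaf cells.
    """
    if len(points) <= max_points or max_depth == 0:
        return [(bounds, points)]
    x_min, y_min, x_max, y_max = bounds
    x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
    # Assign every point to one of the four quadrants of the current cell.
    quadrants = {}
    for x, y in points:
        quadrants.setdefault((x >= x_mid, y >= y_mid), []).append((x, y))
    cells = []
    for (east, north), members in quadrants.items():
        sub_bounds = (
            x_mid if east else x_min,
            y_mid if north else y_min,
            x_max if east else x_mid,
            y_max if north else y_mid,
        )
        cells.extend(split_cells(members, sub_bounds, max_points, max_depth - 1))
    return cells
```

Dense areas end up covered by many small cells and sparse areas by a few large ones, so each cell contains a comparable number of tweets; empty quadrants are simply not returned in this sketch.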