Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
Elections and opinion polls often have many candidates, with the aim to either rank the candidate... more Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters’ preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guaranties which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning for arbitrary voter noise models. Finally, we use d...
Manufacturing & Service Operations Management, 2021
Problem definition: Platforms critically rely on rating systems to learn the quality of market pa... more Problem definition: Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, ratings are often highly inflated and therefore, not very informative. In this paper, we first investigate whether the platform can obtain less inflated, more informative ratings by altering the meaning and relative importance of the levels in the rating system. Second, we seek a principled approach for the platform to make these choices in the design of the rating system. Academic/practical relevance: Platforms critically rely on rating systems to learn the quality of market participants, and so, ensuring these ratings are informative is of first-order importance. Methodology: We analyze the results of a randomized, controlled trial on an online labor market in which an additional question was added to the feedback form. Between treatment conditions, we vary the question phrasing and answer choices; in particular, the treatment conditions include severa...
We present a system for personalized tag suggestion for Flickr: While the user is entering/select... more We present a system for personalized tag suggestion for Flickr: While the user is entering/selecting new tags for a particular picture, the system is suggesting related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe three algorithms which can be applied to this problem. In experiments, our best-performing method yields an improvement in precision of 10-15% over a baseline method very similar to the system currently used by Flickr. Our system is accessible at http://ltaa5.epfl.ch/flickr-tags/. To the best of our knowledge, this is the first study on tag suggestion in a setting where (i) no full text information is available, such as for blogs, (ii) no item has been tagged by more than one person, such as for social bookmarking sites, and (iii) suggestions are dynamically updated, requiring efficient yet effect...
We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and us... more We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in mono-lingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language.
A major difficulty in applying deep learning in novel domains is the expense associated with acqu... more A major difficulty in applying deep learning in novel domains is the expense associated with acquiring sufficient training data. In this work, we extend literature in deep transfer learning by studying the role of initializing the embedding matrix with word vectors from GLoVe on a target dataset before training models with data from another domain. We study transfer learning on variants of four models (2 RNNs, a CNN, and an LSTM) and three datasets. We conclude that 1) the simple idea of initializing word vectors significantly and robustly improves transfer learning performance, 2) cross-domain learning occurs in fewer iterations than in-domain learning, considerably reduces train time, and 3) blending various out-of-domain datasets before training improves transfer learning. We then apply our models to a dataset of over 400k tweets by politicians, classifying sentiment and subjectivity vs. objectivity. This dataset was provided unlabelled, motivating an unsupervised and transfer le...
Modern online platforms rely on effective rating systems to learn about items. We consider the op... more Modern online platforms rely on effective rating systems to learn about items. We consider the optimal design of rating systems that collect binary feedback after transactions. We make three contributions. First, we formalize the performance of a rating system as the speed with which it recovers the true underlying ranking on items (in a large deviations sense), accounting for both items' underlying match rates and the platform's preferences. Second, we provide an efficient algorithm to compute the binary feedback system that yields the highest such performance. Finally, we show how this theoretical perspective can be used to empirically design an implementable, approximately optimal rating system, and validate our approach using real-world experimental data collected on Amazon Mechanical Turk.
In this section, we expand upon the results discussed in Section 5. We design and run an experime... more In this section, we expand upon the results discussed in Section 5. We design and run an experiment that a real platform may run to design a rating system. We follow the general framework in Section 4. We first run an experiment to estimate a ψ(θ, y), the probability at which each item with quality θ receives a positive answer under different questions y. Then, we design H(y), using our optimal β for various settings (different objectives w and matching rates g). Then, we simulate several markets (using the various matching rates g) and measure the performance of the different rating system designs H, as measured by various objective functions (2).
Many societal decision problems lie in high-dimensional continuous spaces not amenable to the vot... more Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose a algorithm called Iterative Local Voting for collective decision-making in this setting. In this algorithm, voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm, with the size of the ball shrinking at a specified rate. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to Pareto optimal solutions with desirable fairness properties in certain natural settings: when the voters' utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across d...
Proceedings of the National Academy of Sciences of the United States of America, Apr 17, 2018
Word embeddings are a powerful machine-learning framework that represents each English word by a ... more Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts-e.g., the women's movement in the 1960s and Asian immigration into the United States-and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection betwe...
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
Elections and opinion polls often have many candidates, with the aim to either rank the candidate... more Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters’ preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guaranties which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning for arbitrary voter noise models. Finally, we use d...
Manufacturing & Service Operations Management, 2021
Problem definition: Platforms critically rely on rating systems to learn the quality of market pa... more Problem definition: Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, ratings are often highly inflated and therefore, not very informative. In this paper, we first investigate whether the platform can obtain less inflated, more informative ratings by altering the meaning and relative importance of the levels in the rating system. Second, we seek a principled approach for the platform to make these choices in the design of the rating system. Academic/practical relevance: Platforms critically rely on rating systems to learn the quality of market participants, and so, ensuring these ratings are informative is of first-order importance. Methodology: We analyze the results of a randomized, controlled trial on an online labor market in which an additional question was added to the feedback form. Between treatment conditions, we vary the question phrasing and answer choices; in particular, the treatment conditions include severa...
We present a system for personalized tag suggestion for Flickr: While the user is entering/select... more We present a system for personalized tag suggestion for Flickr: While the user is entering/selecting new tags for a particular picture, the system is suggesting related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe three algorithms which can be applied to this problem. In experiments, our best-performing method yields an improvement in precision of 10-15% over a baseline method very similar to the system currently used by Flickr. Our system is accessible at http://ltaa5.epfl.ch/flickr-tags/. To the best of our knowledge, this is the first study on tag suggestion in a setting where (i) no full text information is available, such as for blogs, (ii) no item has been tagged by more than one person, such as for social bookmarking sites, and (iii) suggestions are dynamically updated, requiring efficient yet effect...
We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and us... more We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in mono-lingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language.
A major difficulty in applying deep learning in novel domains is the expense associated with acqu... more A major difficulty in applying deep learning in novel domains is the expense associated with acquiring sufficient training data. In this work, we extend literature in deep transfer learning by studying the role of initializing the embedding matrix with word vectors from GLoVe on a target dataset before training models with data from another domain. We study transfer learning on variants of four models (2 RNNs, a CNN, and an LSTM) and three datasets. We conclude that 1) the simple idea of initializing word vectors significantly and robustly improves transfer learning performance, 2) cross-domain learning occurs in fewer iterations than in-domain learning, considerably reduces train time, and 3) blending various out-of-domain datasets before training improves transfer learning. We then apply our models to a dataset of over 400k tweets by politicians, classifying sentiment and subjectivity vs. objectivity. This dataset was provided unlabelled, motivating an unsupervised and transfer le...
Modern online platforms rely on effective rating systems to learn about items. We consider the op... more Modern online platforms rely on effective rating systems to learn about items. We consider the optimal design of rating systems that collect binary feedback after transactions. We make three contributions. First, we formalize the performance of a rating system as the speed with which it recovers the true underlying ranking on items (in a large deviations sense), accounting for both items' underlying match rates and the platform's preferences. Second, we provide an efficient algorithm to compute the binary feedback system that yields the highest such performance. Finally, we show how this theoretical perspective can be used to empirically design an implementable, approximately optimal rating system, and validate our approach using real-world experimental data collected on Amazon Mechanical Turk.
In this section, we expand upon the results discussed in Section 5. We design and run an experime... more In this section, we expand upon the results discussed in Section 5. We design and run an experiment that a real platform may run to design a rating system. We follow the general framework in Section 4. We first run an experiment to estimate a ψ(θ, y), the probability at which each item with quality θ receives a positive answer under different questions y. Then, we design H(y), using our optimal β for various settings (different objectives w and matching rates g). Then, we simulate several markets (using the various matching rates g) and measure the performance of the different rating system designs H, as measured by various objective functions (2).
Many societal decision problems lie in high-dimensional continuous spaces not amenable to the vot... more Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose a algorithm called Iterative Local Voting for collective decision-making in this setting. In this algorithm, voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm, with the size of the ball shrinking at a specified rate. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to Pareto optimal solutions with desirable fairness properties in certain natural settings: when the voters' utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across d...
Proceedings of the National Academy of Sciences of the United States of America, Apr 17, 2018
Word embeddings are a powerful machine-learning framework that represents each English word by a ... more Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts-e.g., the women's movement in the 1960s and Asian immigration into the United States-and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection betwe...
Uploads
Papers