Abstract
The problem of “approximating the crowd” is that of estimating the crowd’s majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, “CrowdSense,” that works in an online fashion where items come one at a time. CrowdSense dynamically samples subsets of the crowd based on an exploration/exploitation criterion. The algorithm produces a weighted combination of the subset’s votes that approximates the crowd’s opinion. We then introduce two variations of CrowdSense that make various distributional approximations to handle distinct crowd characteristics. In particular, the first algorithm makes a statistical independence approximation of the labelers for large crowds, whereas the second algorithm finds a lower bound on how often the current subcrowd agrees with the crowd’s majority vote. Our experiments on CrowdSense and several baselines demonstrate that we can reliably approximate the entire crowd’s vote by collecting opinions from a representative subset of the crowd.
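The online loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' CrowdSense algorithm: the function name `approximate_crowd`, the parameters `subset_size` and `smoothing`, the smoothed agreement-rate weights, and the "top labelers plus one random labeler" selection rule are all assumptions chosen for illustration. The sketch only shows the general idea of querying a small subset per item, combining its votes with quality weights, and updating those weights as items arrive.

```python
import random


def approximate_crowd(votes, subset_size=3, smoothing=1.0, seed=0):
    """Illustrative sketch only (hypothetical, not the paper's method).

    votes: list of items; each item is a list of +1/-1 labels, one per labeler.
    Returns (predictions, total_number_of_votes_queried).
    """
    rng = random.Random(seed)
    n_labelers = len(votes[0])
    if not 1 <= subset_size <= n_labelers:
        raise ValueError("subset_size must be between 1 and the crowd size")

    agreed = [0] * n_labelers  # times a labeler matched our prediction
    asked = [0] * n_labelers   # times a labeler was queried
    predictions, queried_total = [], 0

    def quality(i):
        # Smoothed agreement rate; a never-queried labeler starts at 0.5.
        return (agreed[i] + smoothing) / (asked[i] + 2 * smoothing)

    for item in votes:  # items arrive one at a time
        ranked = sorted(range(n_labelers), key=quality, reverse=True)
        chosen = ranked[:subset_size - 1]        # exploit: best so far
        rest = ranked[subset_size - 1:]
        chosen.append(rng.choice(rest))          # explore: one random labeler

        # Weighted combination of the queried subset's votes.
        score = sum(quality(i) * item[i] for i in chosen)
        pred = 1 if score >= 0 else -1

        # Update quality estimates only for labelers we actually queried.
        for i in chosen:
            asked[i] += 1
            agreed[i] += int(item[i] == pred)

        predictions.append(pred)
        queried_total += len(chosen)

    return predictions, queried_total
```

With a crowd of five labelers and `subset_size=3`, each item costs three votes instead of five; how closely the predictions track the full crowd's majority depends on the data and on the selection rule, which here is deliberately simplistic.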
Additional information
Responsible editor: Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, Filip Zelezny.
Cite this article
Ertekin, Ş., Rudin, C. & Hirsh, H. Approximating the crowd. Data Min Knowl Disc 28, 1189–1221 (2014). https://doi.org/10.1007/s10618-014-0354-1
DOI: https://doi.org/10.1007/s10618-014-0354-1