In 2021, Google announced they would disable third-party cookies in the Chrome browser in order to improve user privacy. They proposed FLoC as an alternative, meant to enable interest-based advertising while mitigating risks of individualized user tracking. The FLoC algorithm assigns users to 'cohorts' that represent groups of users with similar browsing behaviors so that third parties can serve users ads based on their group. After testing FLoC in a real-world trial, Google canceled the proposal, with little explanation, in favor of new alternatives to third-party cookies. In this work, we offer a post-mortem analysis of how FLoC balanced utility and privacy. In particular, we analyze two potential problems raised by privacy advocates: FLoC (1) allows individualized user tracking rather than preventing it and (2) risks revealing sensitive user demographic information, presenting a new privacy risk. We test these problems by implementing FLoC and compute cohorts for u...
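The abstract does not say how cohorts are computed. As a rough illustration only, the sketch below assumes a SimHash-style locality-sensitive hash over the set of visited domains, so that similar browsing histories tend to land on nearby cohort IDs. The function names, the 16-bit width, and the example domains are all hypothetical and are not taken from the paper.

```python
import hashlib


def simhash(domains, bits=16):
    """Return a `bits`-bit SimHash of a collection of visited domains.

    Assumption: each domain is hashed with SHA-256 and only the first
    8 bytes are used, so `bits` must be at most 64.
    """
    counts = [0] * bits
    for domain in domains:
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for b in range(bits):
            # Each domain votes +1 or -1 on every output bit.
            counts[b] += 1 if (value >> b) & 1 else -1
    # A bit is set in the cohort ID when the votes for it are positive.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a, b):
    """Number of bit positions in which two cohort IDs differ."""
    return bin(a ^ b).count("1")


# Hypothetical browsing histories that share most of their domains.
alice = {"news.example", "recipes.example", "cycling.example"}
bob = {"news.example", "recipes.example", "knitting.example"}
print(simhash(alice), simhash(bob), hamming(simhash(alice), simhash(bob)))
```

In a sketch like this, the cohort ID depends only on the set of domains, not on visit order. A real deployment would also need a step that enforces a minimum cohort size, which is omitted here.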
A common technique to improve learning performance in deep reinforcement learning (DRL) and many other machine learning algorithms is to run multiple learning agents in parallel. A neglected component in the development of these algorithms has been how best to arrange the learning agents involved to improve distributed search. Here we draw upon results from the networked optimization literature suggesting that arranging learning agents in communication networks other than fully connected topologies (the topology in which agents are commonly, if implicitly, arranged) can improve learning. We explore the relative performance of four popular families of graphs and observe that one such family (Erdős-Rényi random graphs) empirically outperforms the de facto fully-connected communication topology across several DRL benchmark tasks. Additionally, we observe that 1000 learning agents arranged in an Erdős-Rényi graph can perform as well as 3000 agents arranged in the standard fully-connected topology, showi...
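To make the two topologies concrete, here is a minimal sketch that builds a fully-connected communication graph and an Erdős-Rényi G(n, p) graph over the same agents and compares how many communication links each needs. The helper names, the agent count of 100, and the edge probability p = 0.2 are illustrative assumptions; the abstract does not state the density used in the experiments.

```python
import random
from itertools import combinations


def fully_connected(n):
    """Every agent communicates with every other agent."""
    return {i: set(range(n)) - {i} for i in range(n)}


def erdos_renyi(n, p, seed=0):
    """G(n, p): keep each possible undirected edge independently with probability p."""
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def edge_count(neighbors):
    """Number of undirected edges (each edge is stored at both endpoints)."""
    return sum(len(v) for v in neighbors.values()) // 2


n_agents = 100  # hypothetical population size
full = fully_connected(n_agents)
sparse = erdos_renyi(n_agents, p=0.2, seed=0)  # p is an assumed density
print("fully connected edges:", edge_count(full))  # n * (n - 1) / 2 = 4950
print("Erdős-Rényi edges:", edge_count(sparse))
```

With p well below 1, the random graph uses only a fraction of the links of the complete graph. A sketch like this does not guarantee that the sampled graph is connected, so a practical setup would likely check for that or resample.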
High-resolution individual geolocation data passively collected from mobile phones is increasingly sold in private markets and shared with researchers. This data poses significant security, privacy, and ethical risks: it's been shown that users can be re-identified in such datasets, and its collection rarely involves their full consent or knowledge. This data is valuable to private firms (e.g. for targeted marketing) but also presents clear value as a public good. Recent public interest research has demonstrated that high-resolution location data can more accurately measure segregation in cities and provide inexpensive transit modeling. But as data is aggregated to mitigate its re-identifiability risk, its value as a good diminishes. How do we reconcile the clear security and safety risks of this data, its high market value, and its potential as a resource for public good? We extend the recently proposed concept of a tradeoff curve that illustrates the relationship between dataset uti...
It's long been known that humans, like many animals, exhibit patterns of behavior that appear to balance exploration of new opportunities and resources with exploitation of already-found safe bets. Humans seem to leverage exploration not only to find quality resources, but also to find quality sources of information, such as people or communities. In this thesis, I explore how exploration behavior and the information diversity afforded by such behavior relate to learning and discovery. I first take a theoretical and algorithmic approach to show how considering exploration behavior and information diversity in deep reinforcement learning systems can lead to improved learning. I then present brief observational studies of exploration behavior in two real-world human systems: a social trading network and human mobility in a major U.S. metro area. In the social trading network, I show that users who fail to seek out diverse information far from their local network are more likely to ...
We identify key factors that influence the social role of students in an online class and predict student grades and roles using communication features. Our results show that students who communicate less with others overall are more likely to work alone on future projects, while future group leaders are more likely to have high rates of communication with other study groups. Interestingly, social patterns seem to be as important as learning performance and more important than demographic information in predicting a student’s social role, and integral in predicting a student’s grade.
We present Breakout, a group interaction platform for online courses that enables the creation and measurement of face-to-face peer learning groups in online settings. Breakout is designed to help students easily engage in synchronous, video-based peer learning in breakout sessions, in settings that otherwise force students to rely on asynchronous text-based communication. The platform also offers data collection and intervention tools for studying the communication patterns inherent in online learning environments. The goals of the system are twofold: to enhance student engagement in online learning settings and to create a platform for research into the relationship between distributed group interaction patterns and learning outcomes.
A common technique to improve speed and robustness of learning in deep reinforcement learning (DRL) and many other machine learning algorithms is to run multiple learning agents in parallel. A neglected component in the development of these algorithms has been how best to arrange the learning agents involved to better facilitate distributed search. Here we draw upon results from the networked optimization and collective intelligence literatures suggesting that arranging learning agents in less-than-fully-connected topologies (as opposed to the fully connected topology in which agents are commonly, if implicitly, arranged) can improve learning. We explore the relative performance of four popular families of graphs and observe that one such family (Erdős-Rényi random graphs) empirically outperforms the standard fully-connected communication topology across several DRL benchmark tasks. We observe that 1000 learning agents arranged in an Erdős-Rényi graph can perform as well as 3000 agents arranged in the standard fully-connected top...
We draw upon a largely untapped literature on human collective intelligence as a source of inspiration for improving deep learning. Implicit in many algorithms that attempt to solve Deep Reinforcement Learning (DRL) tasks is the network of processors along which parameter values are shared. So far, existing approaches have implicitly utilized fully-connected networks, in which all processors are connected. However, the scientific literature on human collective intelligence suggests that complete networks may not always be the most effective information network structures for distributed search through complex spaces. Here we show that alternative topologies can improve deep neural network training: we find that sparser networks reach higher rewards faster, leading to learning improvements at lower communication costs.
We present Open Badges, an open-source framework and toolkit for measuring and shaping face-to-face social interactions using either custom hardware devices or smartphones, and real-time web-based visualizations. Open Badges is a modular system that allows researchers to monitor and collect interaction data from people engaged in real-life social settings. In this paper we describe the technical aspects of the Open Badges project and the motivation for its creation.
In this empirical paper, we investigate how learning agents can be arranged in more efficient communication topologies for improved learning. This is an important problem because a common technique to improve speed and robustness of learning in deep reinforcement learning and many other machine learning algorithms is to run multiple learning agents in parallel. The standard communication architecture typically involves all agents intermittently communicating with each other (fully connected topology) or with a centralized server (star topology). Unfortunately, optimizing the topology of communication over the space of all possible graphs is a hard problem, so we borrow results from the networked optimization and collective intelligence literatures, which suggest that certain families of network topologies can lead to strong improvements over fully-connected networks. We start by introducing alternative network topologies to DRL benchmark tasks under the Evolution Strategies paradigm ...
Traditional understanding of urban income segregation is largely based on static coarse-grained residential patterns. However, these do not capture the income segregation experience implied by the rich social interactions that happen in places that may relate to individual choices, opportunities, and mobility behavior. Using a large-scale high-resolution mobility data set of 4.5 million mobile phone users and 1.1 million places in 11 large American cities, we show that income segregation experienced in places and by individuals can differ greatly even within close spatial proximity. To further understand these fine-grained income segregation patterns, we introduce a Schelling extension of a well-known mobility model, and show that experienced income segregation is associated with an individual’s tendency to explore new places (place exploration) as well as places with visitors from different income groups (social exploration). Interestingly, while the latter is more strongly associa...
Press releases (PR) serve as an important tool by which political figures (and businesses) communicate their messages to news editors and journalists, who in turn deliver them to their audience. Indeed, much of the media coverage of political events is based on, responds to, or quotes press releases submitted by politicians. With the increasing number of PR communications, it is important to provide journalists with tools for easy processing of press releases at large scale. In this paper we present a system for automatic discovery of agenda-setting efforts, framing strategies, and political spin as evident in a large corpus of press releases. The system combines topic models, sentiment analysis, and autoregressive distributed-lag models. We automatically analyze over 130,000 PR communications released by members of the House and the Senate in the years 2010-2013 and find significant differences in topic ownership and sentiment, as well as significant evidence for coordinated campaigns for agenda setting and political spin. We provide a detailed analysis of agenda-setting campaigns related to two important issues in American politics: health care reform and energy-related issues.
Framing is a sophisticated form of discourse in which the speaker tries to induce a cognitive bias through consistent linkage between a topic and a specific context (frame). We build on political science and communication theory and use probabilistic topic models combined with time series regression analysis (autoregressive distributed-lag models) to gain insights about the language dynamics in the political processes. Processing four years of public statements issued by members of the U.S. Congress, our results provide a glimpse into the complex dynamic processes of framing, attention shifts and agenda setting, commonly known as "spin". We further provide new evidence for the divergence in party discipline in U.S. politics.
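For readers unfamiliar with autoregressive distributed-lag models, the general textbook form is sketched below. The abstract does not give the lag orders or the variables used, so reading y_t as one group's attention to a topic at time t and x_t as another group's attention to the same topic is only an assumed interpretation; p, q, and all coefficient symbols are generic notation.

```latex
% General ADL(p, q) model: the outcome depends on its own past values
% and on current and past values of an explanatory series.
y_t = \alpha + \sum_{i=1}^{p} \phi_i \, y_{t-i}
            + \sum_{j=0}^{q} \beta_j \, x_{t-j} + \varepsilon_t
```

Under that reading, nonzero lagged coefficients \beta_j with j \ge 1 would indicate that shifts in x tend to precede shifts in y, which is the kind of lead-lag pattern an agenda-setting analysis looks for.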
The Hollywood Blacklist was based on a series of interviews conducted by the House Committee on Un-American Activities (HUAC), which sought to identify members of the Communist Party. We use various NLP algorithms to automatically analyze a large corpus of interview transcripts and construct a network of the industry members and their 'naming' relations. We further use sentiment analysis algorithms to add a psychological dimension to the edges in the network. In particular, we test how different types of connections are manifested by different sentiment types and attitudes of the interviewees. Analysis of the language used in the hearings can shed new light on the motivation and role of network members.
Recent scholarship has explored text reuse in legislation and speeches in order to track the flow of policy ideas through the U.S. Congress. This has allowed scholars a much richer view of patterns of cooperation and inspiration within Congress. We expand on this work by introducing the use of topic models, a machine learning paradigm that allows automatic unsupervised topic detection and document clustering. By applying topic models to the public statements of members of Congress, we can reconstruct influence and cooperation on the basis of ideological similarity and mutual interest in a specific domain, rather than on the coarser distinction between individuals. Exploring text reuse based on topic models along with committee membership and political affiliation provides a richer view of the interpersonal and cross-party networks of influence, cooperation, and contention in the U.S. Congress.