FakeNewsNet Big Data
All content following this page was uploaded by Kai Shu on 09 June 2020.
Abstract
Social media has become a popular means for people to consume and share news. At the same
time, however, it has also enabled the wide dissemination of fake news, i.e., news with intentionally false
information, causing significant confusion and disruption in society. To mitigate this problem,
research on (computational) fake news detection has recently received a lot of attention. Despite several
existing computational solutions for detecting fake news, the lack of comprehensive and
community-driven fake news benchmark datasets has become one of the major roadblocks. Not only are existing
datasets scarce, they also lack many of the features often required in such studies, such as news
content, social context, and spatiotemporal information. Therefore, in this paper, to facilitate fake news
related research, we present a fake news benchmark data repository named FakeNewsNet, which contains
two comprehensive datasets with diverse features in news content, social context, and spatiotemporal
information. We present a detailed description of FakeNewsNet, demonstrate an exploratory analysis
of the two datasets from varying perspectives, and discuss the benefits of FakeNewsNet for potential
applications in fake news research on social media. The latest version of FakeNewsNet is available at:
https://github.com/KaiDMML/FakeNewsNet
1 Introduction
Social media has become a primary source of news consumption. It is free, easy
to access, and can disseminate posts rapidly. Hence, it is an excellent way for individuals to post and/or
consume information. For example, the time individuals spend on social media is continually increasing¹. As
another example, a study from the Pew Research Center shows that around 68% of Americans got some of their
news on social media in 2018², a share that has increased steadily since 2016. Since there is no regulatory
authority on social media, the quality of news pieces spread on social media is often lower than that of traditional
news sources. In other words, social media also enables the wide spread of fake news. Fake news [18, 31]
is false information that is spread deliberately to deceive people. Fake news affects individuals
as well as society as a whole. First, fake news can disturb the authenticity balance of the news ecosystem.
Second, fake news persuades consumers to accept false or biased stories. For example, some individuals and
organizations spread fake news on social media for financial and political gain [1, 2]. It has also been reported that
fake news influenced the 2016 US presidential election³. Finally, fake news may have significant
effects on real-world events. For example, “Pizzagate”, a piece of fake news from Reddit, led to a real
shooting⁴. Thus, fake news detection is a critical issue that needs to be addressed.
1 https://www.socialmediatoday.com/marketing/how-much-time-do-people-spend-social-media-infographic
2 http://www.journalism.org/2018/09/10/news-use-across-social-media-platforms-2018/
3 https://www.independent.co.uk/life-style/gadgets-and-tech/news/tumblr-russian-hacking-us-presidential-election-fake-news-internet-research-agency-propaganda-bots-a8274321.html
4 https://www.rollingstone.com/politics/politics-news/anatomy-of-a-fake-news-scandal-125877/
Detecting fake news on social media presents unique challenges. First, fake news pieces are intentionally
written to mislead consumers, which makes it difficult to spot fake news from the news content alone.
Thus, we need to explore information in addition to news content, such as user engagements and the social
behaviors of users on social media. For example, a credible user’s comment that “This is fake news” is
a strong signal that the news may be fake. Second, the research community lacks datasets that contain
spatiotemporal information for understanding how fake news propagates over time in different regions, how users
react to fake news, and how we can extract useful temporal patterns for (early) fake news detection and
intervention. Thus, it is necessary to have comprehensive datasets that include news content, social context,
and spatiotemporal information to facilitate fake news research. However, to the best of our knowledge,
existing datasets only cover one or two of these aspects.
Therefore, in this paper, we construct and publicize a multi-dimensional data repository, FakeNewsNet⁵,
which currently contains two datasets with news content, social context, and spatiotemporal information.
The dataset is constructed using an end-to-end system, FakeNewsTracker⁶ [27]. The constructed FakeNewsNet
repository has the potential to boost the study of various open research problems related to fake news.
First, the rich set of features in the datasets provides an opportunity to experiment with different
approaches for fake news detection, to understand the diffusion of fake news in social networks, and to intervene in
it. Second, the temporal information enables the study of early fake news detection by generating synthetic
user engagements from historical temporal user engagement patterns in the dataset [15]. Third, we can
investigate the fake news diffusion process by identifying provenances and persuaders, and develop better fake
news intervention strategies [21]. Our data repository can serve as a starting point for many exploratory
studies of fake news, and provide a better, shared insight into disinformation tactics. We aim to continuously
update this data repository, expand it with new sources and features, and maintain its completeness.
The main contributions of the paper are:
• We construct and publicize a multi-dimensional data repository to facilitate fake news
research such as fake news detection, evolution, and mitigation;
• We conduct an exploratory analysis of the datasets from different perspectives to demonstrate the
quality of the datasets, understand their characteristics, and provide baselines for future fake news
detection; and
• We discuss the benefits of FakeNewsNet and provide insight for potential fake news studies on social media with
FakeNewsNet.
• BuzzFeedNews⁷: This dataset comprises a complete sample of news published on Facebook by 9
news agencies over a week close to the 2016 U.S. election, from September 19 to 23 and September 26
and 27. Every post and the linked article were fact-checked claim-by-claim by 5 BuzzFeed journalists.
It contains 1,627 articles: 826 mainstream, 356 left-wing, and 545 right-wing articles.
5 https://github.com/KaiDMML/FakeNewsNet
6 http://blogtrackers.fulton.asu.edu:3000/#/about
7 https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/tree/master/data
• LIAR⁸: This dataset [26] is collected from the fact-checking website PolitiFact. It has 12.8K
human-labeled short statements, and the statements are labeled into six categories
ranging from completely false to completely true: pants on fire, false, barely true, half true, mostly
true, and true.
• BS Detector⁹: This dataset is collected from a browser extension called BS Detector, developed for
checking news veracity. It searches all links on a given web page for references to unreliable sources by
checking against a manually compiled list of domains. The labels are the outputs of the BS Detector,
rather than of human annotators.
• CREDBANK¹⁰: This is a large-scale crowd-sourced dataset [13] of around 60 million tweets that
cover 96 days starting from Oct. 2015. The tweets are related to over 1,000 news events. Each event
is assessed for credibility by 30 annotators from Amazon Mechanical Turk.
• BuzzFace¹¹: This dataset [17] is collected by extending the BuzzFeed dataset with comments related
to news articles on Facebook. The dataset contains 2263 news articles and 1.6 million comments.
• FacebookHoax¹²: This dataset [23] comprises information related to posts from Facebook pages
related to scientific news (non-hoax) and conspiracy pages (hoax), collected using the Facebook Graph
API. The dataset contains 15,500 posts from 32 pages (14 conspiracy and 18 scientific) with more than
2,300,000 likes.
We provide a comparison in Table 1 to show that no existing public dataset provides all the features of
news content, social context, and spatiotemporal information. Existing datasets have some limitations that
FakeNewsNet addresses. For example, BuzzFeedNews only contains headlines and text for each news piece
and covers news articles from very few news agencies. The LIAR dataset contains mostly short statements
instead of entire news articles with meta attributes. The BS Detector data is collected and annotated by
a news veracity checking tool, rather than by human expert annotators. The CREDBANK dataset
was originally collected for evaluating tweet credibility; the tweets in the dataset are not related to
fake news articles and hence cannot be effectively used for fake news detection. The BuzzFace dataset has
basic news content and social context information, but it does not capture temporal information. The
FacebookHoax dataset consists of very few instances about conspiracy theories and scientific news.
To address the disadvantages of existing fake news detection datasets, the proposed FakeNewsNet repository
collects multi-dimensional information on news content, social context, and spatiotemporal information
from different types of news domains, such as political and entertainment sources.
8 https://www.cs.ucsb.edu/~william/software.html
9 https://github.com/bs-detector/bs-detector
10 http://compsocial.github.io/CREDBANK-data/
11 https://github.com/gsantia/BuzzFace
12 https://github.com/gabll/some-like-it-hoax
Figure 1: The flowchart of dataset integration process for FakeNewsNet. It mainly describes the collection
of news content, social context and spatiotemporal information.
3 Dataset Integration
In this section, we introduce a process that integrates datasets to construct the FakeNewsNet repository.
We demonstrate (see Figure 1) how we collect news contents with reliable ground truth labels and how we
obtain additional social context and spatiotemporal information.
news articles are generally written to reflect the fact and so may not be used directly. For example, one of
the headlines, “Jennifer Aniston NOT Wearing Brad Pitts Engagement Ring, Despite Report”, states the
fact instead of the original news article’s title. We use heuristics to extract proper headlines, such
as i) using the text in quoted strings and ii) removing negative sentiment words. For example, some headlines
include quoted strings that are exact text from the original news source. In this case, we extract the
named entities from the headline using the CoreNLP tool [12] and combine them with the quoted strings to form
the search query. For example, from the headline Jennifer Aniston, Brad Pitt NOT “Just Married” Despite Report,
we extract the named entities Jennifer Aniston and Brad Pitt and the quoted string Just Married, and form the search
query “Jennifer Aniston Brad Pitt Just Married”, because the quoted text together with the named entities
mostly provides the context of the original news. As another example, some headlines are written in the
negative sense to correct the false information, e.g., “Jennifer Aniston NOT Wearing Brad Pitts Engagement
Ring, Despite Report”. In this case, we remove negative sentiment words retrieved from SentiWordNet [3] and some
hand-picked words from the headline to form the search query, e.g., “Jennifer Aniston Wearing Brad Pitts
Engagement Ring”.
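The two heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list NEGATIVE_WORDS is a hypothetical stand-in for the SentiWordNet-derived and hand-picked words, the function name build_search_query is ours, and the named entities are passed in by the caller rather than produced by CoreNLP.

```python
import re

# Hypothetical stand-in for the negative-sentiment and hand-picked words;
# the actual list used to build the dataset is not given in the text.
NEGATIVE_WORDS = {"not", "despite", "report"}


def build_search_query(headline, named_entities=()):
    """Turn a fact-checking headline into a search query.

    named_entities is assumed to come from an external NER tool.
    """
    # Heuristic i: if the headline quotes the original story, combine the
    # named entities with the quoted text (straight or curly quotes).
    quoted = re.findall(r'[“"]([^”"]+)[”"]', headline)
    if quoted:
        return " ".join(list(named_entities) + quoted)
    # Heuristic ii: otherwise drop the negative / corrective words.
    tokens = re.findall(r"[\w']+", headline)
    return " ".join(t for t in tokens if t.lower() not in NEGATIVE_WORDS)


query_quoted = build_search_query(
    "Jennifer Aniston, Brad Pitt NOT “Just Married” Despite Report",
    named_entities=["Jennifer Aniston", "Brad Pitt"],
)
query_negated = build_search_query(
    "Jennifer Aniston NOT Wearing Brad Pitts Engagement Ring, Despite Report"
)
print(query_quoted)
print(query_negated)
```

With this toy word list, both calls reproduce the example queries given in the text; a real word list would need to be considerably larger.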
(a) PolitiFact Fake News (b) PolitiFact Real News
update newly coming news articles, so we dynamically collect these newly added news pieces and update
the FakeNewsNet repository as well. In addition, we periodically collect the user engagements for all the news
pieces in the FakeNewsNet repository, such as recent social media posts and second-order
user behaviors such as replies, likes, and retweets. For example, we run the news content crawler and update
the tweet collector daily. The spatiotemporal information provides useful and comprehensive information for
studying the fake news problem from a temporal perspective.
4 Data Analysis
FakeNewsNet has multi-dimensional information related to news content, social context, and spatiotemporal
information. In this section, we first provide some preliminary quantitative analysis to illustrate the features
of FakeNewsNet. We then perform fake news detection using several state-of-the-art models to evaluate the
quality of the FakeNewsNet repository. The detailed statistics of the FakeNewsNet repository are shown in
Table 2.
4.2 Comparing Social Contexts of Fake and Real News
Social context represents the news proliferation process over time, which provides useful auxiliary information
to infer the veracity of news articles. Generally, there are three major aspects of the social media context
that we want to represent: user profiles, user posts, and network structures. Next, we perform an exploratory
study of these aspects on FakeNewsNet and introduce the potential usage of these features to help fake news
detection.
User profiles on social media have been shown to be correlated with fake news detection [22]. Research
has also shown that fake news pieces are likely to be created and spread by non-human accounts, such as
social bots or cyborgs [18, 20]. We will illustrate some user profile features in FakeNewsNet repository.
First, we explore whether the creation times of user accounts differ between fake news and real news.
We compute the time elapsed between each account’s registration time and the current date, and the results are
shown in Figure 3. We can see that the account creation time distribution of users posting fake news is significantly
different from that of users posting real news, with p-value < 0.05 under a t-test. We also notice that
neither older nor newer accounts consistently post fake or real news more often.
For example, the mean time since account creation for users posting fake news (2214.09) is greater than that for
real news (2166.84) in PolitiFact, while we see the opposite case in the GossipCop dataset.
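A comparison of this kind can be sketched as follows. The text does not say which t-test variant was used, so this sketch assumes Welch's unequal-variance test and approximates the p-value with a normal distribution, which is reasonable only for large samples. The sample values are hypothetical and are not taken from the dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, two-sided p-value) for two independent samples.

    The p-value uses a normal approximation to the t distribution, so it
    should only be trusted when both samples are large.
    """
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical account ages (time since registration), one value per user.
fake_ages = [2100 + 5 * i for i in range(60)]
real_ages = [2000 + 5 * i for i in range(60)]

t_stat, p_value = welch_t_test(fake_ages, real_ages)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, significant = {p_value < 0.05}")
```

For small samples, an exact t distribution (for example from a statistics library) would be the safer choice.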
Figure 4: A comparison of bot scores on users related to fake and real news on PolitiFact dataset.
Next, we take a deeper look into the user profiles and assess the effects of social bots. We randomly selected
10,000 users who posted fake and real news and performed bot detection using Botometer [5], one of the
state-of-the-art bot detection algorithms. Botometer²⁰ takes a Twitter username as input, utilizes various
features extracted from metadata, and outputs a probability score in [0, 1] indicating how likely the user
is a bot. We set a threshold of 0.5 on the bot score returned by Botometer to determine
bot accounts. Figure 4 shows the ratio of bot and human users involved in tweets related to fake and
real news. We can see that bots are more prevalent among users tweeting about fake news than among users
tweeting about real news. For example, almost 22% of users involved in fake news are bots, while only around
9% of users are predicted to be bots for real news. Similar results were observed with different thresholds on
bot scores on both datasets. This indicates that there are bots on Twitter spreading fake news, which is
consistent with the observation in [20]. In addition, most users that spread fake news (around 78%) are still
more likely to be humans than bots (around 22%), which is also consistent with the findings in [24].
(a) PolitiFact dataset (b) GossipCop dataset
Figure 5: Ternary plots of the ratio of the positive, neutral and negative sentiment replies for fake and real
news.
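The thresholding step can be illustrated with a short sketch. It assumes the bot scores have already been obtained and stored as a mapping from username to score. Treating a score at or above the threshold as a bot is our assumption, and the scores below are hypothetical.

```python
def bot_ratio(bot_scores, threshold=0.5):
    """Fraction of users whose bot score is at or above the threshold."""
    if not bot_scores:
        return 0.0
    bots = sum(1 for score in bot_scores.values() if score >= threshold)
    return bots / len(bot_scores)


# Hypothetical bot scores in [0, 1] for users engaging with each news type.
fake_news_users = {"user_a": 0.91, "user_b": 0.12, "user_c": 0.50, "user_d": 0.20}
real_news_users = {"user_e": 0.05, "user_f": 0.30, "user_g": 0.75, "user_h": 0.10}

fake_ratio = bot_ratio(fake_news_users)
real_ratio = bot_ratio(real_news_users)
print(f"bots among fake-news users: {fake_ratio:.0%}")
print(f"bots among real-news users: {real_ratio:.0%}")
```

Re-running bot_ratio with other threshold values is a simple way to check whether the comparison is sensitive to the choice of 0.5.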
(a) PolitiFact dataset (b) GossipCop dataset
Figure 6: Ternary plots of the ratio of likes, retweet and reply of tweets related to fake and real news
(a) Follower count of users in PolitiFact dataset (b) Followee count of users in PolitiFact dataset
(c) Follower count of users in GossipCop dataset (d) Followee count of users in GossipCop dataset
Figure 7: The distribution of the count of followers and followees related to fake and real news
users’ beliefs characteristics. FakeNewsNet provides real-world datasets to understand the social factors of
user engagements and underlying social science as well.
4.2.3 Networks
Users tend to form different networks on social media in terms of interests, topics, and relations, which serve
as the fundamental paths for information diffusion [18]. Fake news dissemination processes tend to form an
echo chamber cycle, highlighting the value of extracting network-based features to represent these types of
network patterns for fake news detection [6].
We look at the social network statistics of all the users that spread fake news or real news. Social
network features such as follower count and followee count can be used to estimate how far
fake news can spread on social media. We plot the distributions of follower count and followee count of users
in Figure 7. We can see that: i) the follower and followee counts of the users generally follow a power-law
distribution, which is commonly observed in social network structures; ii) there is a spike in the followee
count distribution of both groups of users, which is due to the restriction imposed by Twitter²² that an
account can follow at most 5000 accounts until it has attracted enough followers.
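One way to check a power-law claim like the one above numerically is to estimate the exponent from the counts. The sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), on synthetic data. It is not the analysis performed for Figure 7, and x_min = 1 is an assumption.

```python
import math
import random


def power_law_exponent(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha).

    Only values >= x_min are used. Raises ZeroDivisionError if every kept
    value equals x_min, since the estimate is undefined in that case.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic "follower counts" drawn from a power law with a known exponent
# by inverse-transform sampling (hypothetical data, not from the dataset).
rng = random.Random(0)
true_alpha = 2.5
samples = [
    (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(20000)
]

estimated_alpha = power_law_exponent(samples)
print(f"true alpha = {true_alpha}, estimated alpha = {estimated_alpha:.2f}")
```

Real follower counts are integers and often follow a power law only in the tail, so a discrete estimator and a fitted x_min would be more appropriate in practice.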
(a) Temporal user engagements of fake news (b) Temporal user engagements of real news
Figure 8: The comparison of temporal user engagements of fake and real news
Recent research has shown that users’ temporal responses can be modeled using deep neural networks to
help detect fake news [16], and that deep generative models can generate synthetic user engagements to help
early fake news detection [11]. The spatiotemporal information in FakeNewsNet depicts the temporal user
engagements for news articles, which provides the necessary information to further study the utility of using
spatiotemporal information to detect fake news.
First, we investigate whether temporal user engagements such as posts, replies, and retweets differ between
fake news and real news with similar topics, e.g., the fake news “TRUMP APPROVAL RATING Better than
Obama and Reagan at Same Point in their Presidencies” from June 9, 2018 to June 13, 2018 and the real news
“President Trump in Moon Township Pennsylvania” from March 10, 2018 to March 20, 2018. As shown in
Figure 8, we can observe that: i) for fake news, there is a sudden increase in the number of retweets, which
then remains roughly constant after a short time, whereas for real news there is a steady increase in
the number of retweets; ii) fake news pieces tend to receive fewer replies than real news. We have similar
observations in Table 2: replies account for 5.76% of all tweets for fake news and 7.93% for real
news. The differences in diffusion patterns of temporal user engagements have the potential to determine
the threshold time for early fake news detection. For example, if we can predict the sudden increase in user
engagements, we should use the user engagements before that time point to detect fake news accurately and
limit the extent of fake news spreading [21].
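The idea of locating a sudden increase in engagements can be sketched by bucketing timestamps into hours and flagging the first bucket that grows sharply over the previous one. The growth factor of 3 and the timestamps are hypothetical, and this simple rule is only an illustration, not the method used to produce Figure 8.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count engagements per hour; returns (hour, count) pairs sorted by hour."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sorted(buckets.items())


def first_burst(counts, factor=3.0):
    """Return the first hour whose count is >= factor times the count of the
    preceding non-empty hour, or None if there is no such hour."""
    for (_, previous), (hour, current) in zip(counts, counts[1:]):
        if current >= factor * previous:
            return hour
    return None


# Hypothetical retweet timestamps: 1 in the first hour, 2 in the second,
# and 10 in the third.
retweets = (
    [datetime(2018, 6, 9, 8, 15)]
    + [datetime(2018, 6, 9, 9, m) for m in (5, 40)]
    + [datetime(2018, 6, 9, 10, m) for m in range(0, 50, 5)]
)

counts = hourly_counts(retweets)
burst_hour = first_burst(counts)
print(counts)
print("burst starts at:", burst_hour)
```

Engagements observed before burst_hour would then be the input to an early-detection model.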
Next, we demonstrate the geo-location distribution of users engaging in fake and real news (see Figure 9
for the PolitiFact dataset). We show the locations explicitly provided by users in their profiles, and we can see
that users in the PolitiFact dataset who post fake news have a different distribution than those posting real
news. Since locations explicitly provided by users are usually sparse, we can further consider the location
information attached to tweets, and even utilize existing approaches for inferring locations [28]. It
would be interesting to explore how users are geographically distributed using the FakeNewsNet repository from
different perspectives.
(a) Spatial distribution for fake news (b) Spatial distribution for real news
Figure 9: Spatial distribution of users posting tweets related to fake and real news in PolitiFact dataset.
22 https://help.twitter.com/en/using-twitter/twitter-follow-limit
• News content: To evaluate the news contents, the text contents of the source news articles are represented
as one-hot encoded vectors, and we then apply standard machine learning models including
support vector machines (SVM), logistic regression (LR), Naive Bayes (NB), and CNN. For SVM, LR,
and NB, we use the default settings provided in scikit-learn and do not tune parameters. For
CNN, we use the standard implementation with default settings²³. We also evaluate the classification
of news articles using the Social Article Fusion (SAF /S) [27] model, which utilizes an autoencoder to learn
features from news articles to classify news articles as fake or real.
• Social context: In order to evaluate the social context, we utilize a variant of the SAF model [27], i.e.,
SAF /A, which utilizes the temporal pattern of the user engagements to detect fake news.
• News content and social context: The Social Article Fusion (SAF) model, which combines SAF /S and
SAF /A, is used. This model uses an autoencoder with 2 layers of LSTM cells for the encoder as well as the
decoder, and the temporal pattern of the user engagements is captured using another network
of LSTM cells with 2 layers.
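To make the news content baseline concrete, the sketch below builds one-hot (word present or absent) vectors and trains a small Bernoulli Naive Bayes classifier with Laplace smoothing. It is a dependency-free stand-in written for illustration, not the scikit-learn models used in the experiments, and the four training texts are invented.

```python
import math


def one_hot(text, vocab):
    """Binary vector marking which vocabulary words occur in the text."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocab]


def train_bernoulli_nb(vectors, labels):
    """Return {label: (log prior, per-word presence probabilities)}."""
    model = {}
    for label in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        log_prior = math.log(len(rows) / len(vectors))
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        probs = [
            (sum(row[j] for row in rows) + 1) / (len(rows) + 2)
            for j in range(len(vectors[0]))
        ]
        model[label] = (log_prior, probs)
    return model


def predict(model, vector):
    """Return the label with the highest log posterior for one vector."""
    def log_posterior(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(vector, probs)
        )
    return max(model, key=log_posterior)


# Invented toy corpus (not from FakeNewsNet).
train_texts = [
    "shocking secret cure doctors hate",
    "senate passes budget bill",
    "shocking celebrity secret revealed",
    "president signs budget bill",
]
train_labels = ["fake", "real", "fake", "real"]

vocab = sorted({w for text in train_texts for w in text.lower().split()})
model = train_bernoulli_nb([one_hot(t, vocab) for t in train_texts], train_labels)

pred_a = predict(model, one_hot("shocking secret", vocab))
pred_b = predict(model, one_hot("budget bill", vocab))
print(pred_a, pred_b)
```

Swapping in a library classifier only requires replacing train_bernoulli_nb and predict; the one-hot representation stays the same.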
The experimental results are shown in Table 3. We can see that: i) among news content-based methods,
SAF /S performs better in terms of accuracy and F1 score in most cases. SAF /A provides a similar result
of around 66.7% accuracy to SAF /S. The compared baseline models provide reasonably good performance
for fake news detection, where accuracy is mostly around 65% on PolitiFact; ii) we observe that
SAF achieves relatively better accuracy than both SAF /S and SAF /A on both datasets. For example, SAF
has around 5.65% and 3.60% performance improvement over SAF /S and SAF /A on PolitiFact in terms of
accuracy. This indicates that user engagements can help fake news detection in addition to news articles on
the PolitiFact dataset.
In summary, FakeNewsNet provides multiple dimensions of information that have the potential to help
researchers develop novel algorithms for fake news detection.
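For reference, the two metrics reported above can be computed as in the following sketch. Treating "fake" as the positive class for the F1 score is our assumption, and the labels and predictions are invented.

```python
def accuracy_and_f1(y_true, y_pred, positive="fake"):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return correct / len(pairs), f1


# Invented ground truth and predictions.
y_true = ["fake", "fake", "real", "real"]
y_pred = ["fake", "real", "real", "real"]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")
```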
5 Data Structure
In this section, we describe the structure of FakeNewsNet in detail. We introduce the data format
and provide API interfaces that allow for efficient downloading of the dataset under the policies of social media
platforms.
23 https://github.com/dennybritz/cnn-text-classification-tf
5.1 API Interfaces
The full dataset is massive, and the actual content cannot be directly distributed because of Twitter’s sharing
policy²⁴. The dataset²⁵ is referenced using a DOI²⁶ and adheres to the FAIR Data Principles²⁷. The APIs are
provided in the form of multiple well-documented Python scripts, and a CSV file with news content
URLs and associated tweet ids is provided as well. To initiate the download, the user simply needs to
run the main.py file with the required configuration. The APIs make use of Twitter access tokens
to fetch information related to tweets. These APIs can help download specific subsets of the dataset, such
as linguistic content, tweet information, retweet information, user information, and social network. Since
Twitter does not provide APIs to download replies and likes of tweets, web scraping tools can be used.
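As an illustration of how such a CSV file might be consumed, the sketch below parses an in-memory sample into a mapping from news id to URL and tweet ids. The column names (id, news_url, title, tweet_ids) and the tab-separated tweet ids are assumptions made for this example, so the files in the repository should be checked before relying on them.

```python
import csv
import io

# Hypothetical sample. The column names and the tab-separated tweet ids are
# assumptions about the CSV layout, not details stated in the paper.
SAMPLE_CSV = (
    "id,news_url,title,tweet_ids\n"
    "news_1,https://example.com/story,Example headline,101\t102\t103\n"
    "news_2,https://example.com/other,Another headline,\n"
)


def read_news_index(handle):
    """Map each news id to its source URL and list of tweet ids."""
    index = {}
    for row in csv.DictReader(handle):
        index[row["id"]] = {
            "url": row["news_url"],
            # split() with no argument handles tabs and empty fields alike.
            "tweet_ids": row["tweet_ids"].split(),
        }
    return index


news_index = read_news_index(io.StringIO(SAMPLE_CSV))
for news_id, entry in news_index.items():
    print(news_id, entry["url"], len(entry["tweet_ids"]))
```

The tweet ids obtained this way are what a downloader would pass to the Twitter API to fetch the full tweet objects.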
• news article.json includes all the meta information of the news articles collected using the provided
news source URLs. This is a JSON object with attributes including:
• tweets folder contains the metadata of the list of tweets associated with the news article. Each file in
this folder contains the tweet objects returned by the Twitter API.
• retweets folder includes a list of files containing the retweets of tweets posting the news articles. Each
file is named <tweet id>.json and has a list of retweet objects collected using the Twitter API.
• replies folder contains files including replies and conversation threads of tweets sharing the news such
as reply text, user details and reply timestamps.
• likes folder comprises files containing a list of IDs for users who have liked each of the tweets sharing
the news article.
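A reader for this per-article layout could look like the sketch below. It assumes one directory per news article containing the tweets, retweets, replies, and likes folders described above, with one JSON file per tweet id. The directory name example_news and the file contents are hypothetical, and the layout is built in a temporary directory so the example is self-contained.

```python
import json
import tempfile
from pathlib import Path

ENGAGEMENT_FOLDERS = ("tweets", "retweets", "replies", "likes")


def load_json_folder(folder):
    """Map each file stem (e.g. a tweet id) to its parsed JSON content.

    A missing folder yields an empty dict.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return {}
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.json"))
    }


def load_news_item(news_dir):
    """Load all engagement folders found under one news article directory."""
    return {
        name: load_json_folder(Path(news_dir) / name)
        for name in ENGAGEMENT_FOLDERS
    }


# Build a tiny hypothetical layout and read it back.
with tempfile.TemporaryDirectory() as tmp:
    news_dir = Path(tmp) / "example_news"
    (news_dir / "tweets").mkdir(parents=True)
    (news_dir / "retweets").mkdir()
    (news_dir / "tweets" / "101.json").write_text(
        json.dumps({"id": 101, "text": "example"}), encoding="utf-8"
    )
    (news_dir / "retweets" / "101.json").write_text(
        json.dumps([{"id": 201}]), encoding="utf-8"
    )
    item = load_news_item(news_dir)

print(sorted(item["tweets"]), len(item["retweets"]["101"]))
```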
In addition, we store the metadata of all users, including profiles, historical tweets, followers, and followees,
in the following folders. Each of these folders contains files named <user id>.json, each holding the details of a
particular user. Note that we only show the metadata of 5000 users in the provided link due to space
limitations.
• user profiles folder includes files containing all the metadata of the users in the dataset. Each file
in this directory is a JSON object collected from the Twitter API containing information about the user,
including profile creation time, geolocation of the user, profile image URL, followers count, followees
count, number of tweets posted, and number of tweets favorited.
• user timeline tweets folder includes JSON files containing the list of at most 200 recent tweets posted
by the user. This includes the complete tweet object with all information related to tweet.
• user followers folder includes JSON files containing a list of user IDs of users following a particular
user.
• user following folder includes JSON files containing a list of user IDs a particular user follows.
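The user profile objects can be turned into simple features such as account age and follower/followee counts. The sketch below assumes the field names of the Twitter REST API v1.1 user object (created_at, followers_count, friends_count, statuses_count, favourites_count); whether the stored files use exactly these names is an assumption, and the profile shown is invented.

```python
from datetime import datetime, timezone

# Format of the created_at field in Twitter API v1.1 user objects,
# e.g. "Mon Jan 01 00:00:00 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def profile_features(profile, now):
    """Extract a few numeric features from one user profile object."""
    created = datetime.strptime(profile["created_at"], TWITTER_TIME_FORMAT)
    return {
        "account_age_days": (now - created).days,
        "followers": profile["followers_count"],
        "followees": profile["friends_count"],
        "tweets_posted": profile["statuses_count"],
        "tweets_favorited": profile["favourites_count"],
    }


# Invented profile for illustration only.
sample_profile = {
    "created_at": "Mon Jan 01 00:00:00 +0000 2018",
    "location": "Example City",
    "followers_count": 120,
    "friends_count": 80,
    "statuses_count": 450,
    "favourites_count": 30,
}

features = profile_features(
    sample_profile, now=datetime(2018, 1, 11, tzinfo=timezone.utc)
)
print(features)
```

Passing the reference date explicitly keeps the account age reproducible, which matters when comparing users whose profiles were collected at different times.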
24 https://developer.twitter.com/en/developer-terms/agreement-and-policy
25 To access the dataset, we have published code implementation available at https://github.com/KaiDMML/FakeNewsNet
6 Potential Applications
FakeNewsNet contains information from multiple dimensions, which could be useful for many applications. We
believe FakeNewsNet will benefit the research community in studying various topics such as (early) fake
news detection, fake news evolution, fake news mitigation, and malicious account detection.
of these trajectories. Second, for a specific news event, the related topics may keep changing over time and
be diverse for fake news and real news. FakeNewsNet is dynamically collecting associated user engagements
and allows us to perform comparison analysis (e.g., see Figure 8), and further investigate distinct temporal
patterns to detect fake news [16]. Moreover, statistical time series models such as temporal point processes
can be used to characterize different stages of user activities of news engagements [7]. FakeNewsNet enables
the temporal modeling from real-world datasets, which is otherwise impossible from synthetic datasets.
FakeNewsNet repository can be integrated with front-end software to build an end-to-end system for fake
news study.
Acknowledgments
This material is in part supported by the NSF awards #1909555, #1614576, #1742702, #1820609, and
#1915801.
References
[1] Srijan Kumar and Neil Shah. 2018. False information on web and social media: A survey. arXiv
preprint arXiv:1804.08559 (2018).
[2] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the web: Impact, characteristics,
and detection of Wikipedia hoaxes. In WWW’16.
[3] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical
resource for sentiment analysis and opinion mining. In LREC, Vol. 10. 2200–2204.
[4] Carlos Castillo, Mohammed El-Haddad, Jürgen Pfeffer, and Matt Stempeck. Characterizing the life cycle
of online news stories using social media reactions. In CHI’14.
[5] Clayton Allen Davis, Onur Varol, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. BotOrNot:
A system to evaluate social bots. In WWW’16.
[6] Michela Del Vicario, Gianna Vivaldo, Alessandro Bessi, Fabiana Zollo, Antonio Scala, Guido Caldarelli,
and Walter Quattrociocchi. 2016. Echo chambers: Emotional contagion and group polarization on
Facebook. Scientific Reports 6 (2016), 37825.
[7] Mehrdad Farajtabar, Jiachen Yang, Xiaojing Ye, Huan Xu, Rakshit Trivedi, Elias Khalil, Shuang Li, Le
Song, and Hongyuan Zha. 2017. Fake news mitigation via point process based intervention. arXiv preprint
arXiv:1703.07823 (2017).
[8] CJ Hutto and Eric Gilbert. VADER: A parsimonious rule-based model for sentiment analysis of social media
text. In ICWSM’14.
[9] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. News Verification by Exploiting Conflicting
Social Viewpoints in Microblogs. In AAAI’16.
[10] Antino Kim and Alan R Dennis. 2017. Says Who?: How News Presentation Format Influences Perceived
Believability and the Engagement Level of Social Media Users. (2017).
[11] Yang Liu and Yi-fang Brook Wu. Early Detection of Fake News on Social Media Through Propagation
Path Classification with Recurrent and Convolutional Networks. In AAAI’18.
[12] Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky.
2014. The Stanford CoreNLP natural language processing toolkit. In ACL’14. 55–60.
[13] Tanushree Mitra and Eric Gilbert. CREDBANK: A Large-Scale Social Media Corpus With Associated
Credibility Annotations. In ICWSM’15.
[14] Vahed Qazvinian, Emily Rosengren, Dragomir R Radev, and Qiaozhu Mei. Rumor has it:
Identifying misinformation in microblogs. In EMNLP’11.
[15] Feng Qian, ChengYue Gong, Karishma Sharma, and Yan Liu. Neural User Response Generator: Fake
News Detection with Collective User Intelligence. In IJCAI’18.
[16] Natali Ruchansky, Sungyong Seo, and Yan Liu. CSI: A hybrid deep model for fake news detection. In
CIKM’17.
[17] Giovanni C Santia and Jake Ryland Williams. BuzzFace: A News Veracity Dataset with Facebook User
Commentary and Egos. In ICWSM’18.
[18] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake news detection on social
media: A data mining perspective. ACM SIGKDD Explorations Newsletter 19, 1 (2017), 22–36.
[19] Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. Hoaxy: A
platform for tracking online misinformation. In WWW’16.
[20] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Alessandro Flammini, and Filippo Menczer.
2017. The spread of fake news by social bots. arXiv preprint arXiv:1707.07592 (2017).
[21] Kai Shu, H. Russell Bernard, and Huan Liu. 2018. Studying Fake News via Network Analysis: Detection
and Mitigation. CoRR abs/1804.10233 (2018).
[22] Kai Shu, Suhang Wang, and Huan Liu. 2018. Understanding user profiles on social media for fake news
detection. In 2018 IEEE MIPR. IEEE, 430–435.
[23] Eugenio Tacchini, Gabriele Ballarin, Marco L Della Vedova, Stefano Moret, and Luca de Alfaro.
2017. Some like it hoax: Automated fake news detection in social networks. arXiv preprint
arXiv:1704.07506 (2017).
[24] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359,
6380 (2018), 1146–1151.
[25] Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. dEFEND: Explainable fake news
detection. In KDD 2019.
[26] William Yang Wang. 2017. “Liar, liar pants on fire”: A new benchmark dataset for fake news detection.
arXiv preprint arXiv:1705.00648 (2017).
[27] Kai Shu, Deepak Mahudeswaran, and Huan Liu. FakeNewsTracker: a tool for fake news collection,
detection, and visualization. In CMOT’18.
[28] Arkaitz Zubiaga, Alex Voss, Rob Procter, Maria Liakata, Bo Wang, and Adam Tsakalidis. 2017. Towards
real-time, country-level location classification of worldwide tweets. IEEE Transactions on Knowledge and
Data Engineering 29, 9 (2017), 2053–2066.
[29] Mustafa Alassad, Muhammad Nihal Hussain, and Nitin Agarwal. 2019. Finding Fake News Key Spreaders
in Complex Social Networks by Using Bi-Level Decomposition Optimization Method. In International
Conference on Modelling and Simulation of Social-Behavioural Phenomena in Creative Societies.
[30] Gisel Bastidas Guacho, Sara Abdali, Neil Shah, and Evangelos E Papalexakis. 2018. Semi-supervised
Content-based Detection of Misinformation via Tensor Embeddings. In ASONAM.
[31] Kai Shu and Huan Liu. Detecting fake news on social media. In Synthesis Lectures on Data Mining
and Knowledge Discovery, 2019.
[32] Seyedmehdi Hosseinimotlagh and Evangelos E Papalexakis. 2018. Unsupervised Content-Based
Identification of Fake News Articles with Tensor Decomposition Ensembles. (2018).
[33] Hamid Karimi, Proteek Roy, Sari Saba-Sadiya, and Jiliang Tang. 2018. Multi Source Multi-Class Fake
News Detection. In COLING.
[34] Kai Shu, Suhang Wang, and Huan Liu. Beyond News Contents: The Role of Social Context for Fake
News Detection. In WSDM’19.
[35] Hamid Karimi and Jiliang Tang. 2019. Learning Hierarchical Discourse-level Structure for Fake News
Detection. arXiv preprint arXiv:1903.07389 (2019).