Named Entity Recognition (NER) is one of the fundamental problems in Information Extraction; the task is to find the entities mentioned in text. Over the years there has been significant progress in NER research for resource-rich languages such as English, Chinese, and Italian. Although there are a number of studies on Bangla NER, most of them were conducted almost a decade ago and focused on a single geographical location (i.e., India). Therefore, in this paper, we present a corpus annotated with seven named-entity types, with a particular focus on Bangladeshi Bangla. It is part of the development of the Bangla Content Annotation Bank (B-CAB). We also present baseline results, which can be useful for future research. For the baselines, we employed word-level, POS, gazetteer, and contextual features along with Conditional Random Fields (CRFs). Our study also explores deep neural networks. Additionally, we investigated another large corpus from a different geographical location (i.e., India) and concluded on the importance of geography-specific NER for a language.
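The baseline above combines word-level, POS, gazetteer, and contextual features for a CRF tagger. As a minimal sketch, the per-token feature extraction might look like the following; the feature names and the toy gazetteer are illustrative assumptions, not the paper's actual feature set.

```python
# Illustrative per-token feature extraction for CRF-based NER.
# The gazetteer and feature names below are hypothetical examples.

PERSON_GAZETTEER = {"rahim", "karim"}  # toy gazetteer of known person names

def token_features(tokens, pos_tags, i):
    """Build word-level, POS, gazetteer, and contextual features for token i."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),        # word-level features
        "word.isdigit": word.isdigit(),
        "word.suffix2": word[-2:],
        "pos": pos_tags[i],                # POS feature
        "in.person.gazetteer": word.lower() in PERSON_GAZETTEER,
    }
    # contextual features: neighbouring words within a +/-1 window
    feats["prev.word"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.word"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats

tokens = ["Rahim", "lives", "in", "Dhaka"]
pos = ["NNP", "VBZ", "IN", "NNP"]
features = [token_features(tokens, pos, i) for i in range(len(tokens))]
print(features[0]["in.person.gazetteer"])  # True
```

Feature dictionaries of this shape are what CRF toolkits such as sklearn-crfsuite consume, one sequence of dictionaries per sentence.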
The emergence of neural machine translation techniques has opened up a new era for developing translation systems. However, they require very large parallel corpora, which are scarce for many under-resourced languages, e.g., Bangla. Currently, there is a lack of publicly available collaborative systems for developing such corpora. In this paper, we report an online collaborative system for the development of parallel corpora. The system is designed to support any language; however, we evaluated it only for developing a Bangla–English parallel corpus. In a task-completion evaluation experiment, the system outperforms the widely used offline system OmegaT.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
The automatic identification of harmful content online is of major concern for social media platforms, policymakers, and society. Researchers have studied textual, visual, and audio content, but typically in isolation. Yet, harmful content often combines multiple modalities, as in the case of memes. With this in mind, here we offer a comprehensive survey with a focus on harmful memes. Based on a systematic analysis of recent literature, we first propose a new typology of harmful memes, and then we highlight and summarize the relevant state of the art. One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism, partly due to the lack of suitable datasets. We further find that existing datasets mostly capture multi-class scenarios, which are not inclusive of the affective spectrum that memes can represent. Another observation is that memes can propagate globally through repackaging in different languages and that th…
People increasingly use microblogging platforms such as Twitter during natural disasters and emergencies. Research has revealed the usefulness of the data available on Twitter for several disaster-response tasks. However, making sense of social media data is challenging for several reasons, such as the limitations of available tools for analyzing high-volume, high-velocity data streams and the difficulty of dealing with information overload. To address these limitations, in this work we first show that textual and imagery content on social media provide complementary information useful for improving situational awareness. We then explore ways in which Artificial Intelligence techniques from the Natural Language Processing and Computer Vision fields can exploit such complementary information generated during disaster events. Finally, we propose a methodological approach that combines several computational techniques in a unified framework to help humanitarian organizations in their relief efforts. We conduct extensive experiments using textual and imagery content from millions of tweets posted during three major disaster events of the 2017 Atlantic hurricane season. Our study reveals that the distributions of various types of useful information can inform crisis managers and responders, and facilitate the development of future automated systems for disaster management.
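The abstract above argues that textual and imagery content carry complementary signals. One simple way such signals can be combined is late fusion of per-modality classifier confidences; the averaging rule and category names below are an illustrative assumption, not the paper's actual method.

```python
# Hypothetical late-fusion sketch: combining confidence scores from a text
# classifier and an image classifier for the same tweet. Simple weighted
# averaging is used purely for illustration.

def fuse(text_probs, image_probs, weight=0.5):
    """Weighted average of per-class probabilities from the two modalities."""
    assert text_probs.keys() == image_probs.keys()
    return {
        label: weight * text_probs[label] + (1 - weight) * image_probs[label]
        for label in text_probs
    }

def predict(fused):
    return max(fused, key=fused.get)

# Toy scores for hypothetical humanitarian categories
text_probs = {"infrastructure_damage": 0.2, "rescue_effort": 0.7, "not_relevant": 0.1}
image_probs = {"infrastructure_damage": 0.6, "rescue_effort": 0.3, "not_relevant": 0.1}

fused = fuse(text_probs, image_probs)
print(predict(fused))  # rescue_effort (fused scores 0.4 vs 0.5 vs 0.1)
```

Here the image evidence favors infrastructure damage while the text favors a rescue effort; the fused score lets neither modality dominate.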
International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020
During a disaster event, images shared on social media help crisis managers gain situational awareness and assess incurred damage, among other response tasks. Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification for a number of tasks, including detecting crisis incidents, filtering irrelevant images, classifying images into specific humanitarian categories, and assessing the severity of damage. Despite several efforts, past work mainly suffers from the limited resources (i.e., labeled images) available to train robust deep learning models. In this study, we propose new datasets for disaster-type detection, informativeness classification, and damage-severity assessment. Moreover, we relabel existing publicly available datasets for new tasks. We identify exact and near-duplicates to form non-overlapping data splits, and finally consolidate the datasets to create larger ones. In our extensive experiments, we benchmark several state-of-the-art deep learning models and achieve promising results. We release our datasets and models publicly, aiming to provide proper baselines as well as to spur further research in the crisis informatics community.
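Identifying duplicates before splitting, as described above, prevents the same image from appearing in both train and test sets. A minimal sketch of the exact-duplicate step is grouping byte-identical files by content hash; real near-duplicate detection would instead use perceptual hashing or embedding similarity, which this toy example does not attempt.

```python
# Sketch of exact-duplicate detection before creating data splits, so that
# a whole duplicate group is assigned to a single split. Near-duplicate
# detection (perceptual hashing, embeddings) is omitted here.

import hashlib

def content_key(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def group_duplicates(images):
    """Map content hash -> list of image ids sharing identical bytes."""
    groups = {}
    for image_id, data in images.items():
        groups.setdefault(content_key(data), []).append(image_id)
    return groups

images = {
    "a.jpg": b"\x01\x02\x03",
    "b.jpg": b"\x01\x02\x03",  # exact duplicate of a.jpg
    "c.jpg": b"\x09\x08",
}
groups = group_duplicates(images)
# duplicate groups stay together, so a split assigns a whole group at once
dupes = [ids for ids in groups.values() if len(ids) > 1]
print(dupes)  # [['a.jpg', 'b.jpg']]
```

Splitting over the hash groups rather than individual files is what makes the resulting train/dev/test partitions non-overlapping.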
Bangla, ranked as the 6th most widely spoken language in the world, with 230 million native speakers, is still considered a low-resource language in the natural language processing (NLP) community. Despite three decades of research, Bangla NLP (BNLP) is still lagging behind, mainly due to the scarcity of resources and the challenges that come with it. There is sparse work in different areas of BNLP; however, a thorough survey reporting previous work and recent advances had yet to be done. In this study, we first provide a review of the Bangla NLP tasks, resources, and tools available to the research community; we then benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing monolingual and multilingual models of varying sizes. We report results using both individual and consolidated datasets and provide data splits for future research. We reviewed a total of 108 papers and conducted 175 sets of experiments. Our results show promising performance using transformer-based models while highlighting the trade-off with computational cost. We hope that such a comprehensive survey will motivate the community to build on and further advance research on Bangla NLP.
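Comparing monolingual and multilingual models across nine tasks, as the survey above does, requires a common evaluation metric; macro-averaged F1 is a typical choice for imbalanced classification benchmarks. Below is a generic metric sketch, not the paper's evaluation code, with toy labels for illustration.

```python
# Minimal macro-averaged F1 computation of the kind typically used to
# compare models across classification tasks. Generic sketch only.

from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    f1s = []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
print(round(macro_f1(gold, pred), 3))  # 0.733
```

Because each class contributes equally regardless of frequency, macro F1 penalizes models that ignore rare classes, which matters when consolidating datasets of very different sizes.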
Papers by Firoj Alam