Imen Bizid

    During crisis events such as disasters, real-time information retrieval (IR) from microblogs becomes essential. However, the huge volume and variety of the information shared in real time during such events complicate this task. Unlike existing IR approaches based on content analysis, we propose to tackle this problem with a user-centric IR approach that identifies and tracks prominent microblog users who are likely to share relevant and exclusive information at an early stage of each analyzed event phase. This approach ensures real-time access to the valuable microblog information required by emergency teams. Within this approach, we propose a phase-aware probabilistic model for predicting and ranking prominent microblog users over time according to their behavior, using Mixture of Gaussians Hidden Markov Models (MoG-HMM). The model utilizes a new user representation which takes into account both the user and the event specificities over time. Thi...
    Text detection and recognition in natural environments are key components of many applications, ranging from business card digitization to indexing shops along a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as content gathered from Internet media and from modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC), which has been held since 2003 both at ICDAR and online. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge extends the previous RRC editions in several respects: multi-lingual text, dataset size, multi-oriented text, and a wide variety of scenes. The dataset comprises 18,000 images containing text in 9 languages. The challenge comprises three tasks related to text detection and sc...
    Content shared in microblogs during disasters is expressed in various formats and languages. This diversity makes the information retrieval process more complex and computationally infeasible in real time. To address this, we propose a classification model for identifying prominent users who share relevant and exclusive information during a disaster. Users who have shared at least one tweet about the disaster are modeled using three kinds of time-sensitive features: topical, social and geographical. These users are then classified into two classes by a linear Support Vector Machine (SVM) trained on the extracted features, in order to identify the most prominent ones. Initial results on a real dataset show that our model achieves high accuracy, detecting most of the prominent users. Moreover, we demonstrate that all the features used by our model are indispensable to achieving this accuracy.
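The classification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three feature scores per user (topical, social, geographical), their values, and the hyperparameters are all invented for illustration, and the time-sensitive aspect of the features is ignored. The linear SVM is trained here by plain batch subgradient descent on the regularised hinge loss, chosen only to keep the example dependency-free.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=3000):
    """Fit a linear SVM by batch subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss. Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point lies inside the margin: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (prominent) or -1 (not prominent)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical users: [topical, social, geographical] scores in [0, 1].
X = [
    [0.9, 0.8, 0.7], [0.8, 0.9, 0.9], [0.7, 0.7, 0.8],  # prominent
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2],  # not prominent
]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
new_user = [0.85, 0.9, 0.8]  # hypothetical unseen user
print("prominent" if predict(w, b, new_user) == 1 else "not prominent")
```

Because the model is linear, each learned weight in `w` can be read as the contribution of one feature family, which is one way to examine the abstract's claim that every feature group matters (for example, by retraining with one group removed and comparing accuracy).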
    During crisis events such as disasters, the need for real-time information retrieval (IR) from microblogs remains essential. However, the huge volume and variety of the information shared in real time during such events complicate this task. Unlike existing IR approaches based on content analysis, we propose to tackle this problem with user-centric IR approaches that address the wide spectrum of methodological and technological barriers inherent to: 1) the collection of the evaluated users' data, 2) the modeling of user behavior, 3) the analysis of user behavior, and 4) the prediction and tracking of prominent users in real time. In this context, we detail the different approaches proposed in this dissertation, leading to the prediction of prominent users who are likely to share the targeted relevant and exclusive information on the one hand, and enabling emergency responders to have real-time access to the required information in all formats (i.e. text, image, video, l...