Ferda Ofli

    I am currently a scientist at the Qatar Computing Research Institute, an institute that strives to pursue world-class...
    Recent research in disaster informatics demonstrates a practical and important use case of artificial intelligence to save human lives and reduce suffering during natural disasters based on social media content (text and images). While notable progress has been made using texts, research on exploiting images remains relatively under-explored. To advance the image-based approach, we propose MEDIC, the largest social media image classification dataset for humanitarian response, consisting of 71,198 images to address four different tasks in a multitask learning setup. This is the first dataset of its kind at the intersection of social media imagery, disaster response, and multi-task learning research. An important property of this dataset is its high potential to contribute to research on multi-task learning, which has recently received much interest from the machine learning community and has shown remarkable results in terms of memory, inference speed, performance, and generalization capability. Therefore,...
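    As a concrete illustration of the multitask setup described above, the sketch below shows one shared CNN backbone with a separate classification head per task. This is a minimal, assumed layout, not the authors' released MEDIC code; the task names and class counts are illustrative placeholders.

```python
# Hypothetical multi-task image classifier: shared backbone, one head per task.
import torch
import torch.nn as nn
from torchvision import models

TASKS = {  # task name -> number of classes (illustrative values)
    "disaster_types": 7,
    "informativeness": 2,
    "humanitarian": 10,
    "damage_severity": 3,
}

class MultiTaskClassifier(nn.Module):
    def __init__(self, tasks=TASKS):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # strip the ImageNet head
        self.backbone = backbone
        self.heads = nn.ModuleDict(
            {name: nn.Linear(feat_dim, n) for name, n in tasks.items()}
        )

    def forward(self, x):
        feats = self.backbone(x)             # shared features for all tasks
        return {name: head(feats) for name, head in self.heads.items()}

model = MultiTaskClassifier()
logits = model(torch.randn(4, 3, 224, 224))  # one logit tensor per task
loss = sum(nn.functional.cross_entropy(out, torch.zeros(4, dtype=torch.long))
           for out in logits.values())       # summed per-task losses
```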
    Natural disasters, such as floods, tornadoes, or wildfires, are increasingly pervasive as the Earth undergoes global warming. It is difficult to predict when and where an incident will occur, so timely emergency response is critical to saving the lives of those endangered by destructive events. Fortunately, technology can play a role in these situations. Social media posts can be used as a low-latency data source to understand the progression and aftermath of a disaster, yet parsing this data is tedious without automated methods. Prior work has mostly focused on text-based filtering, yet image and video-based filtering remains largely unexplored. In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories. We provide details of the dataset construction, statistics and potential biases; introduce and train a model for incident detection; and perform image-filtering experiments on millions ...
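    The multi-label formulation above (an image can carry several incident and place labels at once) is typically trained with one sigmoid output per label. Below is a minimal sketch under that assumption; it is not the authors' released model.

```python
# Multi-label incident + place detection sketch: sigmoid per label with BCE.
import torch
import torch.nn as nn
from torchvision import models

N_INCIDENTS, N_PLACES = 43, 49  # category counts from the dataset description

backbone = models.resnet50(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, N_INCIDENTS + N_PLACES)

images = torch.randn(8, 3, 224, 224)
targets = torch.zeros(8, N_INCIDENTS + N_PLACES)   # multi-hot label vectors
logits = backbone(images)
loss = nn.BCEWithLogitsLoss()(logits, targets)     # independent binary loss per label
probs = torch.sigmoid(logits)                      # per-class scores, not a softmax
```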
    This paper summarizes the recent progress we have made on computer vision technologies for physical therapy with accessible and affordable devices. We first introduce the remote health coaching system we built with the Microsoft Kinect. Since the motion data captured by the Kinect is noisy, we investigate its accuracy with respect to a high-accuracy motion capture system. We also propose an outlier removal algorithm based on the data distribution. In order to generate kinematic parameters from the noisy data captured by the Kinect, we propose a kinematic filtering algorithm based on the Unscented Kalman Filter and the kinematic model of the human skeleton. The proposed algorithm obtains smooth kinematic parameters with reduced noise compared to those generated from the raw Kinect motion data.
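    For the kinematic filtering step, a simplified stand-in is sketched below: it smooths a single noisy Kinect joint coordinate with filterpy's Unscented Kalman Filter under a constant-velocity model. The paper's filter uses the full kinematic model of the human skeleton; the constant-velocity state here is an assumption for illustration.

```python
# UKF smoothing of one noisy joint coordinate (simplified stand-in).
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

dt = 1.0 / 30.0  # Kinect frame rate

def fx(x, dt):   # state transition for [position, velocity]
    return np.array([x[0] + dt * x[1], x[1]])

def hx(x):       # measurement model: only position is observed
    return x[:1]

points = MerweScaledSigmaPoints(n=2, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=2, dim_z=1, dt=dt, fx=fx, hx=hx, points=points)
ukf.x = np.array([0.0, 0.0])
ukf.R *= 0.05    # measurement noise (sensor jitter)
ukf.Q *= 0.01    # process noise

noisy = np.sin(np.linspace(0, 2 * np.pi, 90)) + 0.05 * np.random.randn(90)
smoothed = []
for z in noisy:
    ukf.predict()
    ukf.update(np.array([z]))
    smoothed.append(ukf.x[0])  # filtered joint position per frame
```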
    Images shared on social media help crisis managers gain situational awareness and assess incurred damages, among other response tasks. Because the volume and velocity of such content are high, real-time image classification has become an urgent need for faster response. Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification for a number of tasks, including detecting crisis incidents, filtering irrelevant images, classifying images into specific humanitarian categories, and assessing the severity of the damage. Developing robust real-time models requires understanding the capability of the publicly available pretrained models for these tasks, which remains under-explored in the current state of the art of crisis informatics. In this study, we address this limitation. We investigate ten different architectures for four different tasks using the largest publicly av...
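    Benchmarking pretrained architectures on a crisis task typically means swapping each model's ImageNet head for a task-specific one. Below is a hedged sketch of that adaptation step for three torchvision backbones; the model choices and the three-class damage task are illustrative, not the paper's exact benchmark.

```python
# Adapting ImageNet-pretrained backbones to one crisis task (illustrative).
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # e.g., damage severity: severe / mild / none

def adapt(name):
    # Each architecture family exposes its final layer differently.
    if name == "resnet50":
        m = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    elif name == "densenet121":
        m = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
        m.classifier = nn.Linear(m.classifier.in_features, NUM_CLASSES)
    elif name == "mobilenet_v2":
        m = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
        m.classifier[-1] = nn.Linear(m.classifier[-1].in_features, NUM_CLASSES)
    else:
        raise ValueError(name)
    return m

candidates = {n: adapt(n) for n in ["resnet50", "densenet121", "mobilenet_v2"]}
```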
    Multimedia content in social media platforms provides significant information during disaster events. The types of information shared include reports of injured or deceased people, infrastructure damage, and missing or found people, among others. Although many studies have shown the usefulness of both text and image content for disaster response purposes, past research has mostly focused on analyzing the text modality alone. In this paper, we propose to use both text and image modalities of social media data to learn a joint representation using state-of-the-art deep learning techniques. Specifically, we utilize convolutional neural networks to define a multimodal deep learning architecture with a modality-agnostic shared representation. Extensive experiments on real-world disaster datasets show that the proposed multimodal architecture yields better performance than models trained using a single modality (e.g., either text or image).
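    A minimal sketch of such a modality-agnostic shared representation is given below: each modality is encoded separately, projected into a common space, and fused before classification. The layer sizes and the additive fusion are assumptions for illustration, not the paper's exact architecture.

```python
# Two-branch multimodal network with a shared projection space (sketch).
import torch
import torch.nn as nn
from torchvision import models

class MultimodalNet(nn.Module):
    def __init__(self, vocab=20000, emb=100, shared=256, classes=2):
        super().__init__()
        img = models.resnet18(weights=None)
        img_dim = img.fc.in_features
        img.fc = nn.Identity()
        self.image_enc = img
        self.embed = nn.Embedding(vocab, emb)
        self.text_conv = nn.Conv1d(emb, 128, kernel_size=3, padding=1)
        self.img_proj = nn.Linear(img_dim, shared)
        self.txt_proj = nn.Linear(128, shared)
        self.classifier = nn.Linear(shared, classes)

    def forward(self, image, tokens):
        v = self.img_proj(self.image_enc(image))
        t = self.embed(tokens).transpose(1, 2)               # (B, emb, seq)
        t = torch.relu(self.text_conv(t)).max(dim=2).values  # global max-pool
        t = self.txt_proj(t)
        joint = torch.relu(v + t)                            # shared representation
        return self.classifier(joint)

net = MultimodalNet()
out = net(torch.randn(2, 3, 224, 224), torch.randint(0, 20000, (2, 30)))
```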
    Food is an integral part of our life and what and how much we eat crucially affects our health. Our food choices largely depend on how we perceive certain characteristics of food, such as whether it is healthy, delicious or if it qualifies as a salad. But these perceptions differ from person to person and one person's "single lettuce leaf" might be another person's "side salad". Studying how food is perceived in relation to what it actually is typically involves a laboratory setup. Here we propose to use recent advances in image recognition to tackle this problem. Concretely, we use data for 1.9 million Instagram images from the US to look at systematic differences in how a machine would objectively label an image compared to how a human subjectively does. We show that this difference, which we call the "perception gap", relates to a number of health outcomes observed at the county level. To the best of our knowledge, this is the first time...
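    The core computation behind the perception gap can be illustrated with toy data: per county, take the difference between machine-assigned and human-assigned label shares, then correlate it with a county-level health outcome. Everything below is invented for illustration; it is not the study's data or pipeline.

```python
# Toy perception-gap computation with invented county-level data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
counties = 50
machine_share = rng.uniform(0.1, 0.6, counties)  # model: fraction labeled "salad"
human_share = rng.uniform(0.1, 0.6, counties)    # users: fraction tagged #salad
perception_gap = machine_share - human_share
obesity_rate = rng.uniform(0.2, 0.4, counties)   # toy county-level outcome

r, p = pearsonr(perception_gap, obesity_rate)
print(f"gap vs outcome: r={r:.2f}, p={p:.3f}")
```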
    Human Activity Recognition (HAR) is a powerful tool for understanding human behaviour. Applying HAR to wearable sensors can provide new insights by enriching the feature set in health studies, and enhance the personalisation and effectiveness of health, wellness, and fitness applications. Wearable devices provide an unobtrusive platform for user monitoring, and due to their increasing market penetration, feel intrinsic to the wearer. The integration of these devices in daily life provides a unique opportunity for understanding human health and wellbeing. This is referred to as the "quantified self" movement. The analysis of complex health behaviours such as sleep traditionally requires time-consuming manual interpretation by experts. This manual work is necessary due to the erratic periodicity and persistent noisiness of human behaviour. In this paper, we present a robust automated human activity recognition algorithm, which we call RAHAR. We test our algorithm in the app...
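    For context, a generic sliding-window HAR baseline looks like the sketch below: window the sensor stream, extract simple statistics per window, and train a classifier. This is an assumed baseline for illustration, not the RAHAR algorithm itself, and the signal and labels are synthetic.

```python
# Generic sliding-window HAR pipeline on a toy accelerometer stream.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def windows(signal, size=128, step=64):
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

def features(w):
    # Simple per-window statistics: level, variability, jerkiness, range.
    return [w.mean(), w.std(), np.abs(np.diff(w)).mean(), w.max() - w.min()]

rng = np.random.default_rng(1)
stream = rng.standard_normal(10_000)              # toy accelerometer magnitude
X = np.array([features(w) for w in windows(stream)])
y = rng.integers(0, 2, len(X))                    # toy labels: active / sedentary
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```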
    People increasingly use microblogging platforms such as Twitter during natural disasters and emergencies. Research studies have revealed the usefulness of the data available on Twitter for several disaster response tasks. However, making sense of social media data is a challenging task for several reasons, such as the limitations of available tools for analyzing high-volume and high-velocity data streams. This work presents an extensive multidimensional analysis of textual and multimedia content from millions of tweets shared on Twitter during three disaster events. Specifically, we employ various Artificial Intelligence techniques from the Natural Language Processing and Computer Vision fields, which exploit different machine learning algorithms to process the data generated during the disaster events. Our study reveals the distributions of various types of useful information that can inform crisis managers and responders as well as facilitate the development of future automated systems...
    Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its significantly large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., size, duplicate content) and less suitable to support more advanced and data-hungry deep learning models. In this paper, we present a new large-scale dataset with ∼77K human-labeled tweets, sampled from a pool of ∼24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collec...
    Wearable devices with a wide range of sensors have contributed to the rise of the Quantified Self movement, where individuals log everything ranging from the number of steps they have taken, to their heart rate, to their sleeping patterns. Sensors do not, however, typically sense the social and ambient environment of the users, such as general lifestyle attributes or information about their social network. This means that the users themselves, and the medical practitioners privy to the wearable sensor data, only have a narrow view of the individual, limited mainly to certain aspects of their physical condition. In this paper we describe a number of use cases for how social media can be used to complement check-up data and sensor data to gain a more holistic view of individuals' health, a perspective we call the 360 Quantified Self. Health-related information can be obtained from sources as diverse as food photo sharing, location check-ins, or profile pictures. Addit...
    Having reliable and up-to-date poverty data is a prerequisite for monitoring the United Nations Sustainable Development Goals (SDGs) and for planning effective poverty reduction interventions. Unfortunately, traditional data sources are often outdated or lack appropriate disaggregation. As a remedy, satellite imagery has recently become prominent in obtaining geographically fine-grained and up-to-date poverty estimates. Satellite data can pick up signals of economic activity by detecting light at night; it can pick up development status by detecting infrastructure such as roads; and it can pick up signals of individual household wealth by detecting different building footprints and roof types. It cannot, however, look inside households and pick up signals from individuals. On the other hand, alternative data sources such as audience estimates from Facebook's advertising platform provide insights into the devices and internet connection types used by individuals in diffe...
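    The combination of the two source types can be illustrated as a single regression over concatenated features. The sketch below uses entirely invented toy variables (night-time luminosity, road density, device and connection shares) and is not the paper's model.

```python
# Toy wealth-index regression over satellite + Facebook-audience features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
nightlight = rng.gamma(2.0, 1.0, n)      # satellite: night-time luminosity
road_density = rng.uniform(0, 1, n)      # satellite: infrastructure proxy
ios_share = rng.uniform(0, 0.5, n)       # Facebook: share of iOS devices
wifi_share = rng.uniform(0, 1, n)        # Facebook: share of WiFi connections

X = np.column_stack([nightlight, road_density, ios_share, wifi_share])
wealth = 0.5 * nightlight + 0.8 * ios_share + 0.1 * rng.standard_normal(n)
print(cross_val_score(Ridge(), X, wealth, cv=5, scoring="r2").mean())
```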
    During natural and man-made disasters, people use social media platforms such as Twitter to post textual and multimedia content to report updates about injured or dead people, infrastructure damage, and missing or found people, among other information types. Studies have revealed that this online information, if processed timely and effectively, is extremely useful for humanitarian organizations to gain situational awareness and plan relief operations. In addition to the analysis of textual content, recent studies have shown that imagery content on social media can boost disaster response significantly. Despite extensive research that mainly focuses on textual content to extract useful information, limited work has focused on the use of imagery content or the combination of both content types. One of the reasons is the lack of labeled imagery data in this domain. Therefore, in this paper, we aim to tackle this limitation by releasing a large multi-modal dataset collected from T...
    This article describes a method for early detection of disaster-related damage to cultural heritage. It is based on data from social media, a timely and large-scale data source that is nevertheless quite noisy. First, we collect images posted on social media that may refer to a cultural heritage site. Then, we automatically categorize these images according to two dimensions: whether they are indeed a photo in which a cultural heritage resource is the main subject, and whether they represent damage. Both categorizations are challenging image classification tasks, given the ambiguity of these visual categories; we tackle both tasks using a convolutional neural network. We test our methodology on a large collection of thousands of images from the web and social media, which exhibit the diversity and noise typical of these sources, and contain buildings and other architectural elements, heritage and non-heritage, damaged by disasters as well as intact. Our results show that whi...
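    Since both categorizations are binary image classification tasks over the same photos, one simple realization is a shared backbone with two binary heads, sketched below. This is an assumed layout for illustration; the article trains convolutional networks for the two tasks, but this is not its exact configuration.

```python
# Assumed two-head layout: shared CNN backbone, two binary classifiers.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)
feat_dim = backbone.fc.in_features
backbone.fc = nn.Identity()
heritage_head = nn.Linear(feat_dim, 2)   # cultural-heritage photo: yes / no
damage_head = nn.Linear(feat_dim, 2)     # visible damage: yes / no

feats = backbone(torch.randn(4, 3, 224, 224))
heritage_logits, damage_logits = heritage_head(feats), damage_head(feats)
```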
    Extended Abstract

    People increasingly use social media such as Facebook and Twitter during disasters and emergencies. Research studies have demonstrated the usefulness of social media information for a number of humanitarian relief operations ranging from situational awareness to actionable information extraction. Moreover, the use of social media platforms during sudden-onset disasters could potentially bridge the information scarcity issue, especially in the early hours when few other information sources are available. In this work, we analyzed Twitter content (textual messages and images) posted during the recent devastating hurricanes, namely Harvey and Maria. We employed state-of-the-art artificial intelligence techniques to process millions of textual messages and images shared on Twitter to understand the types of information available on social media and how emergency response organizations can leverage this information to aid their relief operations. Furthermore, we employed deep neural network techniques to analyze the imagery content to assess the severity of damage shown in the images. Damage severity assessment is one of the core tasks for many humanitarian organizations.

    To perform data collection and analysis, we employed our Artificial Intelligence for Digital Response (AIDR) technology. AIDR combines human computation and machine learning techniques to train machine learning models specialized to fulfill specific information needs of humanitarian organizations. Many humanitarian organizations such as UN OCHA and UNICEF have used the AIDR technology during major disasters in the past, including the 2015 Nepal earthquake and the 2014 typhoons Hagupit and Ruby, among others. Next, we provide a brief overview of our analysis during the two aforementioned hurricanes.

    Hurricane Harvey Case Study

    Hurricane Harvey was an extremely devastating storm that made landfall at Port Aransas and Port O'Connor, Texas, in the United States on August 24-25, 2017. We collected and analyzed around 4 million Twitter messages to determine how many of these messages report, for example, some kind of infrastructure damage, injured or dead people, missing or found people, displacements and evacuations, or donations and volunteering. Furthermore, we also analyzed geotagged tweets to determine the types of information originating from the disaster-hit areas compared to neighboring areas. For instance, we generated maps of different cities in the US in and around the hurricane-hit areas. Figure 1 shows the map of geotagged tweets reporting different types of useful information from Florida, USA. According to the results obtained from the AIDR classifiers, both the caution and advice and the sympathy and support categories are more prominent than other informational categories such as donation and volunteering.

    In addition to the textual content processing of the collected tweets, we performed automatic image processing to collect and analyze imagery content posted on Twitter during Hurricane Harvey. For this purpose, we employed state-of-the-art deep learning techniques. One of the classifiers deployed in this case was damage-level assessment, which aims to predict the level of damage as one of three damage levels, i.e., SEVERE damage, MILD damage, and NO damage. Our analysis revealed that most of the images (∼86%) do not contain any damage signs or are considered irrelevant, containing advertisements, cartoons, banners, and other irrelevant content. Of the remaining set, 10% of the images contain MILD damage, and only ∼4% of them show SEVERE damage. However, finding these 10% (MILD) or 4% (SEVERE) useful images is like finding a needle in a giant haystack. Artificial intelligence techniques such as those employed by the AIDR platform are hugely useful to overcome such information overload issues and help decision-makers process large amounts of data in a timely manner.

    Fig. 1: Geotagged tweets from Florida, USA.

    Hurricane Maria Case Study

    An even more devastating hurricane than Harvey was Hurricane Maria, which hit Puerto Rico and nearby areas. Damaged roofs, uprooted trees, and widespread flooding were among the scenes on the path of Hurricane Maria, a Category 5 hurricane that slammed Dominica and Puerto Rico and caused at least 78 deaths, including 30 in Dominica and 34 in Puerto Rico, and left many more without homes, electricity, food, and drinking water.

    We activated AIDR on September 20, 2017 to collect tweets related to Hurricane Maria. More than 2 million tweets were collected. Figure 2 shows the distribution of the daily tweet counts. To understand what these tweets are about, we applied our tweet text classifier, which was originally trained (F1 = 0.64) on more than 30k human-labeled tweets from a number of past disasters. AIDR's image processing pipeline was also activated to identify images that show infrastructure damage due to Hurricane Maria. Around 80k tweets contained…
    Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modelling of the head, hand, and arm gestures of a speaker have been studied extensively in (3)-(6), and these gestures were shown to carry linguistic information (7),(8). A typical example is the head gesture while saying "yes". In this project, the correlation between gestures and speech is investigated. Speech features are selected as Mel-Frequency Cepstral Coefficients (MFCCs). Gesture features are composed of the positions of the hand and elbow and global motion parameters calculated across the head region. In this sense, prior to the detection of gestures, a discrete symbol set for gestures is determined manually, and for each s...
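    The speech-feature extraction step can be sketched with librosa as a generic stand-in (not the project's original tooling); the synthetic audio and the zeroed gesture track below are placeholders for real recordings and tracking output.

```python
# MFCC extraction plus a placeholder gesture track (illustrative stand-in).
import numpy as np
import librosa

sr = 16000
audio = np.random.randn(sr * 2).astype(np.float32)      # 2 s of noise standing in for speech
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)  # shape: (13, frames)

# Hypothetical gesture features: x/y positions of hand and elbow per frame,
# zeroed here as a placeholder for real tracking output.
gesture = np.zeros((4, mfcc.shape[1]))
combined = np.vstack([mfcc, gesture])  # joint audio-gesture feature matrix
```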
    This paper presents a framework for audio-driven human body motion analysis and synthesis. We address the problem in the context of a dance performance, where gestures and movements of the dancer are mainly driven by a musical piece and characterized by the repetition of a set of dance figures. The system is trained in a supervised manner using
    The Microsoft Kinect camera is becoming increasingly popular in many areas aside from entertainment, including human activity monitoring and rehabilitation. Many people, however, fail to consider the reliability and accuracy of Kinect human pose estimation when they depend on it as a measuring system. In this paper, we compare Kinect pose estimation (skeletonization) with more established techniques for pose estimation from motion capture data, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions. We have evaluated six physical exercises aimed at coaching the elderly population. Experimental results present pose estimation accuracy rates and corresponding error bounds for the Kinect system.
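    The accuracy comparison reduces to a per-joint position error between Kinect joints and time-aligned motion-capture ground truth. Here is a toy sketch with invented data; the real evaluation uses recorded exercises, not random poses.

```python
# Mean per-joint Euclidean error between Kinect and mocap (toy data).
import numpy as np

rng = np.random.default_rng(3)
frames, joints = 500, 20                      # Kinect v1 reports 20 joints
mocap = rng.standard_normal((frames, joints, 3))          # ground truth (m)
kinect = mocap + 0.03 * rng.standard_normal(mocap.shape)  # noisy estimate

per_joint_err = np.linalg.norm(kinect - mocap, axis=2)    # (frames, joints)
print("mean error per joint (m):", per_joint_err.mean(axis=0).round(3))
print("overall mean error (m):", per_joint_err.mean().round(3))
```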
    Although the positive effects of exercise on the well-being and quality of independent living for older adults are well-accepted, many elderly individuals lack access to exercise facilities, or the skills and motivation to perform exercise at home. To provide a more engaging environment that promotes physical activity, various fitness applications have been proposed. Many of the available products, however, are geared toward a younger population and are not appropriate or engaging for an older population. To address these issues, we developed an automated interactive exercise coaching system using the Microsoft Kinect. The coaching system guides users through a series of video exercises, tracks and measures their movements, provides real-time feedback, and records their performance over time. Our system consists of exercises to improve balance, flexibility, strength, and endurance, with the aim of reducing fall risk and improving performance of daily activities. In this paper, we report on the development of the exercise system, discuss the results of our recent field pilot study with six independently-living elderly individuals, and highlight the lessons learned relating to the in-home system setup, user tracking, feedback, and exercise performance evaluation.
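    One building block of such real-time feedback is comparing a joint angle computed from tracked skeleton points against a target range. Below is a toy sketch with hypothetical joint positions and an invented threshold; it is not the coaching system's actual feedback logic.

```python
# Joint-angle feedback check with hypothetical skeleton points.
import numpy as np

def joint_angle(a, b, c):
    """Angle at b (degrees) formed by points a-b-c."""
    v1, v2 = a - b, c - b
    cosang = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

shoulder, elbow, wrist = (np.array(p, float) for p in
                          [(0, 1.4, 0), (0.3, 1.1, 0), (0.55, 1.3, 0)])
angle = joint_angle(shoulder, elbow, wrist)
print("raise your arm higher" if angle < 150 else "good form")
```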
    We present a framework for selecting the best audio features for audiovisual analysis and synthesis of dance figures. Dance figures are performed synchronously with the musical rhythm. They can be analyzed through the audio spectra using spectral and rhythmic musical features. In the proposed audio feature evaluation system, dance figures are manually labeled over the video stream. The music segments, which
    We aim to learn correlation models between music and dance performances to synthesize music-driven dance choreographies. The proposed framework learns statistical mappings from musical measures to dance figures using musical measure models, an exchangeable figures model, a choreography model, and dance figure models. Alternative dance choreographies are synthesized based on these statistical mappings. Objective and subjective evaluation results
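    The statistical mapping idea can be illustrated with a toy conditional distribution: estimate P(dance figure | musical measure) from co-occurrence counts and sample figures for a new measure sequence. The labels and counts below are invented; the paper's models (measure, exchangeable-figure, choreography, and figure models) are richer than this.

```python
# Toy conditional mapping from musical measures to dance figures.
import numpy as np

figures = ["step", "turn", "dip"]
measures = ["m1", "m2"]
counts = np.array([[8, 1, 1],      # measure m1 -> figure co-occurrence counts
                   [2, 3, 5]])     # measure m2 -> figure co-occurrence counts
probs = counts / counts.sum(axis=1, keepdims=True)  # P(figure | measure)

rng = np.random.default_rng(4)
sequence = ["m1", "m2", "m2", "m1"]
choreo = [rng.choice(figures, p=probs[measures.index(m)]) for m in sequence]
print(choreo)
```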
    Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modelling of head, hand and arm
    Abstract—The goal of this project is to convert a given speaker's speech (the Source speaker) into another identified voice (the Target speaker) as well as analysing the face animation of the source to animate a 3D avatar imitating the source facial movements. We assume we have at our disposal a large amount of speech samples from the source and target voices with a reasonable amount of parallel data. Speech and video are processed separately and recombined at the end. Voice conversion is obtained in two steps: a
    In this paper we present a framework for analysis of dance figures from audio-visual data. Our audio-visual data is the multiview video of a dancing actor, which is acquired using 8 synchronized cameras. The multi-camera motion capture technique of this framework is based on 3D tracking of the markers attached to the dancer's body, using stereo color information. The extracted
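    The 3D tracking in such a rig rests on triangulating each marker from at least two calibrated views. Below is a minimal two-camera sketch with assumed projection matrices; a real system calibrates all 8 cameras.

```python
# Two-view marker triangulation with assumed camera matrices.
import numpy as np
import cv2

# Projection matrices: one reference camera and one translated 0.5 m on x.
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0], [0]])])

marker = np.array([[0.1], [0.2], [2.0], [1.0]])   # true 3D point (homogeneous)
pix1 = (P1 @ marker)[:2] / (P1 @ marker)[2]       # projected 2D observations
pix2 = (P2 @ marker)[:2] / (P2 @ marker)[2]

X = cv2.triangulatePoints(P1, P2, pix1, pix2)     # 4x1 homogeneous result
print((X[:3] / X[3]).ravel())                     # recovers ~ (0.1, 0.2, 2.0)
```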
    This paper presents a framework for audio-driven human body motion analysis and synthesis. The video is analyzed to capture the time-varying posture of the dancer's body whereas the musical audio signal is processed to extract the beat information. The human body posture is extracted from multiview video information without any human intervention using a novel marker-based algorithm based on annealing
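    The beat-extraction step can be approximated with librosa's generic beat tracker, shown below as a stand-in for the paper's audio analysis; the synthetic click track is a placeholder for a real music recording.

```python
# Beat tracking on a synthetic click track (generic librosa stand-in).
import numpy as np
import librosa

sr = 22050
# Clicks every 0.5 s, i.e., a 120 BPM pulse, as a stand-in for music.
y = librosa.clicks(times=np.arange(0, 10, 0.5), sr=sr)

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(f"estimated tempo: {float(tempo):.1f} BPM, {len(beat_times)} beats")
```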
