As an information dissemination tool, Twitter has proved instrumental in generating user-curated content within short spans of time. People usually tweet in reaction to events or speeches, or about a service or product, and in some cases these tweets assign blame for various aspects of, say, an event. This work in progress details how we plan to collect such informal texts, clean them, and extract features for blame detection. We are interested in augmenting Recurrent Neural Networks (RNNs) with self-developed association rules to get the most out of the data for training and evaluation. We aim to test the performance of our approach on a corpus of tweets related to human-induced terror events. The model could also be tailored to natural disaster scenarios.
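The abstract does not describe the cleaning steps or the self-developed association rules. The following is therefore only a minimal sketch under stated assumptions. It cleans tweets by stripping URLs and @mentions and keeping hashtag words, then scores a candidate rule such as {"failed"} -> BLAME using the standard support and confidence measures from association rule mining. The helper names, the `BLAME` marker item, and the example tweets are all hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def clean_tweet(text: str) -> list[str]:
    """Lowercase a tweet, drop URLs and @mentions, keep hashtag words, tokenize."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    return TOKEN_RE.findall(text)


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical labelled tweets: (text, expresses_blame)
tweets = [
    ("The government failed us again #attack http://t.co/x", True),
    ("Police are to blame for this failure @news", True),
    ("Praying for the victims tonight", False),
    ("Security failed at the gate", False),
]
# Each tweet becomes a transaction; the upper-case BLAME item marks the label.
transactions = [set(clean_tweet(t)) | ({"BLAME"} if b else set()) for t, b in tweets]
rule_conf = confidence(transactions, {"failed"}, {"BLAME"})
print(rule_conf)  # 0.5
```

In a real pipeline, rules whose confidence clears a threshold could be turned into extra input features for the RNN. That pairing is an assumption, not the authors' stated design.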
Text categorization entails deciding whether a document belongs to one of a set of pre-specified classes. This can be done in a supervised way, as in classification tasks, or in an unsupervised way, as in clustering-related tasks. Categorization can be challenging, especially when the number of discriminating words is large. K-Nearest Neighbor (KNN) is an instance-based learning algorithm that has proven effective in such classification tasks, including document classification. The key element of this algorithm is its similarity measure, which can identify the neighbors of a particular document with high accuracy. The main drawback of this approach is that all features are weighted when determining the distance between the documents in question. This is not only time-consuming but also wastes computing resources without adding anything substantial to the overall results. In our approach (Attribute Distance Weighted KNN), we do not make use of all features in the c...
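The abstract is cut off before it explains how Attribute Distance Weighted KNN chooses its features. This sketch only illustrates the general idea of computing KNN distances over a reduced feature set instead of the full vocabulary. The selection rule in `select_features` (keep the terms whose per-class document frequency varies most) is hypothetical, and so are all function names and example documents. The sketch uses cosine similarity over raw term counts.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Bag-of-words term counts for each document (whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in docs]


def select_features(vectors, labels, m):
    """HYPOTHETICAL rule: keep the m terms whose per-class document frequency
    varies the most (ties broken alphabetically)."""
    classes = sorted(set(labels))
    vocab = set().union(*vectors)

    def spread(term):
        rates = []
        for c in classes:
            members = [v for v, y in zip(vectors, labels) if y == c]
            rates.append(sum(term in v for v in members) / len(members))
        return max(rates) - min(rates)

    return set(sorted(vocab, key=lambda t: (-spread(t), t))[:m])


def cosine(a, b, features):
    """Cosine similarity computed only over the selected features."""
    dot = sum(a[f] * b[f] for f in features)
    na = math.sqrt(sum(a[f] ** 2 for f in features))
    nb = math.sqrt(sum(b[f] ** 2 for f in features))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, vectors, labels, features, k=3):
    """Majority vote among the k training documents most similar to the query."""
    scored = sorted(
        ((cosine(query, v, features), y) for v, y in zip(vectors, labels)),
        reverse=True,
    )
    return Counter(y for _, y in scored[:k]).most_common(1)[0][0]


train = ["goal match striker", "match referee goal", "striker scores goal",
         "election vote party", "party leader vote", "vote ballot election"]
labels = ["sport"] * 3 + ["politics"] * 3
vectors = term_vectors(train)
features = select_features(vectors, labels, m=4)  # {'goal', 'vote', 'election', 'match'}
query = term_vectors(["the striker missed a goal"])[0]
prediction = knn_predict(query, vectors, labels, features, k=3)
print(prediction)  # sport
```

Restricting `cosine` to `features` means each distance costs O(m) instead of O(|vocabulary|). That is the kind of saving the abstract motivates.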
Content comprehension in text is one of the challenges in natural language processing. Understanding text at a low level has become increasingly relevant due to the surge in the amount of content on the web, most of which is stream data. In our case, data streams are considered to be ordered sequences of short, noisy textual messages, such as tweets, that are read once or only a few times. Our approach entails processing and interpreting streaming texts at the document level in mini-batches via deep convolutional networks for opinion, semantic, or relationship analysis. Training our model is iterative and incremental: documents are learnt by understanding sentence structure and content in vector form, based on a known offline model. The model, however, incrementally adapts to changing textual patterns. Our conceptual framework design is distributed in nature, such that a pipeline of inputs, deep processing framework and output will be coordinated by the A...
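The abstract's model is a deep convolutional network initialised from an offline model, which is too large to reproduce here. As a stand-in, this sketch swaps in a plain logistic regression over bag-of-words counts. It illustrates only the streaming loop: each message is read once, grouped into mini-batches, and the model takes one gradient step per batch. Previously unseen tokens get fresh weights, so the model can adapt to changing textual patterns. The class name, learning rate, and example stream are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import islice


def featurize(text):
    """Bag-of-words token counts for one message."""
    return Counter(text.lower().split())


def mini_batches(stream, size):
    """Yield consecutive mini-batches from an iterator; each message is read once."""
    it = iter(stream)
    while batch := list(islice(it, size)):
        yield batch


class OnlineLogReg:
    """Stand-in for the deep model: logistic regression updated once per mini-batch."""

    def __init__(self, lr=0.5):
        self.w = defaultdict(float)  # tokens seen for the first time start at weight 0
        self.b = 0.0
        self.lr = lr

    def prob(self, vec):
        z = self.b + sum(self.w.get(t, 0.0) * x for t, x in vec.items())
        z = max(min(z, 35.0), -35.0)  # guard math.exp against overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, batch):
        """One averaged gradient step of the log loss on (text, label) pairs."""
        grads, gb = defaultdict(float), 0.0
        for text, y in batch:
            vec = featurize(text)
            err = self.prob(vec) - y
            gb += err
            for t, x in vec.items():
                grads[t] += err * x
        for t, g in grads.items():
            self.w[t] -= self.lr * g / len(batch)
        self.b -= self.lr * gb / len(batch)


stream = [("great service today", 1), ("awful delay again", 0)] * 50
model = OnlineLogReg()
for batch in mini_batches(stream, size=10):
    model.update(batch)
print(model.prob(featurize("great service")) > 0.5)  # True
```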
Social media mining can provide insights into a community's perceptions that conventional approaches cannot observe. In this paper, we perform a sentiment analysis on Twitter data to measure long-term trends in public opinion on the 2016 Indian demonetisation policy. We compare our findings with prior research and with reports retrieved from the media and other sources. We apply the RapidMiner sentiment classifier to tweets posted after the deadline for depositing the forfeited banknotes was extended. The results indicate predominantly continuing opposition to the implementation of the demonetisation policy. Based on this study, we recommend that future work employ multilingual sentiment analysis to process non-polarised tweets in local languages.
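RapidMiner is operated through its own tooling, so this sketch does not reproduce the classifier. It assumes each tweet already carries a sentiment label from some classifier. It then shows one simple way to turn those labels into a trend: the daily share of polarised tweets that are negative. Non-polarised (neutral) tweets are left out of that share. The function name, dates, and labels below are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def daily_opposition_share(labelled):
    """Map each day to the fraction of its polarised tweets labelled negative.

    labelled: iterable of (day, label) with label in {"positive", "negative", "neutral"}.
    Neutral (non-polarised) tweets are excluded from both numerator and denominator.
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [negative, polarised]
    for day, label in labelled:
        if label == "neutral":
            continue
        counts[day][1] += 1
        if label == "negative":
            counts[day][0] += 1
    return {d: neg / total for d, (neg, total) in sorted(counts.items())}


# Hypothetical classifier output
labelled = [
    (date(2016, 12, 30), "negative"), (date(2016, 12, 30), "positive"),
    (date(2016, 12, 30), "neutral"),
    (date(2016, 12, 31), "negative"), (date(2016, 12, 31), "negative"),
    (date(2016, 12, 31), "positive"),
]
trend = daily_opposition_share(labelled)
print(trend)
```

A share that stays above 0.5 across days would correspond to the "continuing to oppose" pattern described above. Any such reading depends on the classifier's accuracy, which this sketch does not model.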
Online data is growing exponentially with each passing minute. A bulk of this data is generated by short-text social media platforms such as Twitter. Such platforms are fundamental to social media knowledge-based applications such as recommender systems. Twitter, for example, provides rich real-time streaming information. Because of the platform's streaming nature, extracting knowledge from such short texts without automated support is not feasible. Many knowledge systems therefore need an automated method for comprehending patterns in such text. This paper provides solutions for generating topics from Twitter data. We present several topic modelling techniques for identifying topics of interest in short texts. Topic modelling is inherently problematic for short texts, which have very sparse vocabularies and are disseminated in informal language. Such findings are informative for knowledge extraction in social media-based recommender systems as well...
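The abstract does not name the specific techniques it compares. This sketch therefore shows standard Latent Dirichlet Allocation (LDA) fitted by collapsed Gibbs sampling, a common baseline for topic modelling, and does not claim it is the paper's method. The hyperparameters `alpha` and `beta`, the iteration count, and the example tweets are assumptions. With only a handful of tokens per document, the per-document counts `ndk` stay small and noisy, which is one concrete form of the sparsity problem described above.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists.
    Returns (vocab, nkw) where nkw[t][v] counts word v assigned to topic t."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(k)]  # tokens per (topic, word)
    nk = [0] * k                       # tokens per topic
    z = []                             # current topic of every token

    def assign(d, v, t, delta):
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):  # random initial assignments
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            assign(d, widx[w], t, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = widx[w]
                assign(d, v, z[d][i], -1)  # remove the token's current assignment
                # p(z = j | all other assignments), up to a normalising constant
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                assign(d, v, z[d][i], +1)
    return vocab, nkw


def top_words(vocab, nkw, n=3):
    """The n words with the highest counts in each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


tweets = ["flood rain river warning", "rain flood warning tonight", "river flood rain",
          "match goal team fans", "team goal win", "goal match fans win"]
docs = [t.split() for t in tweets]
vocab, nkw = lda_gibbs(docs, k=2, iters=100)
print(top_words(vocab, nkw))
```

The output depends on the random seed. On real tweets, results are usually poor unless short texts are first pooled (for example by hashtag or user), which is one of several remedies discussed in the short-text topic modelling literature.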
The advent of social computing brought with it different social networking platforms. The idea of surfers socializing with people of different backgrounds and geographical regions is quite fascinating. In our approach, we delved deeper into disaster discovery: we extracted panic-related attributes and trained classifiers on them using real data from three disaster scenarios in different parts of the world. Fine-tuning the final attributes led to accuracies above 91%, showing that with proper attribute selection and proper handling of sparse data and class balance, it is possible to detect related disasters as soon as related tweets appear. We believe we are the first to apply a probabilistic classifier approach together with NLP specifically to detecting human-induced terror attacks, as no known system currently caters solely for these.
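The abstract names probabilistic classifiers but does not list the panic-related attributes. This sketch therefore uses Multinomial Naive Bayes with add-one (Laplace) smoothing, a standard probabilistic text classifier, trained directly on tweet tokens. The class name, labels, and training tweets are hypothetical. Smoothing is one simple way to cope with sparse data, since words absent from one class's training tweets do not drive that class's probability to zero.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """log P(c) + sum of log P(token | c), up to a constant shared by all classes."""
        V = len(self.vocab)
        score = self.prior[c]
        for tok in doc:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + V))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Hypothetical training tweets
train = ["explosion heard near the mall people running",
         "gunshots and screaming help",
         "blast at the station many injured",
         "lovely sunny day at the mall",
         "great match at the station tonight",
         "coffee with friends"]
labels = ["attack"] * 3 + ["normal"] * 3
model = MultinomialNB().fit([t.split() for t in train], labels)
print(model.predict("loud explosion people injured".split()))  # attack
```

Class priors come from label frequencies, so a heavily imbalanced training set shifts predictions toward the majority class. This is why balancing matters for the rare "attack" tweets the abstract targets.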