    Predicting an individual's risk of primary stroke is an important tool that can help to lower the burden of stroke for both the individual and society. There are a number of risk models and risk scores in existence but no review or classification designed to help the reader better understand how models differ and the reasoning behind these differences. In this paper we review the existing literature on primary stroke risk prediction models. From our literature review we identify key similarities and differences in the existing models. We find that models can differ in a number of ways, including the event type, the type of analysis, the model type and the time horizon. Based on these similarities and differences we have created a set of questions and a system to help answer those questions that modelers and readers alike can use to help classify and better understand the existing models as well as help to make necessary decisions when creating a new model.
    This volume contains the papers presented at the Fourth ACL-SIGSEM Workshop on Prepositions. This workshop is endorsed by the ACL Special Interest Group on Semantics (ACL-SIGSEM), and is hosted in conjunction with ACL 2007, taking place on 28th June, 2007 in Prague, the Czech Republic. Prepositions, postpositions and other adpositions have received a considerable amount of interest in recent years. Researchers from linguistics, artificial intelligence and psycholinguistics have examined spatial and temporal aspects of prepositions, their cross-linguistic differences, monolingual and cross-linguistic contrasts, the role of prepositions in syntactic alternations and their semantics in situated dialog. In languages like English and German, phrasal verbs have also been the subject of considerable research, ranging from the development of techniques for their automatic extraction from corpora to methods for determining their semantics. In other languages, like Romance languages or Hindi, the focus has been either on the incorporation of the preposition or its inclusion in the prepositional phrase. All these configurations are important both semantically and syntactically in natural language understanding and processing. This workshop builds on the success of three previous workshops on prepositions (held in Toulouse, 2003, Colchester, 2005 and Trento, 2006) in providing a forum for researchers to present their current work on these areas. The aim of these workshops has been to bring together researchers from a variety of backgrounds to discuss the syntax, semantics, description, representation and cross-linguistic aspects of prepositions in order to promote collaboration.
    BACKGROUND Topic modeling and word embedding studies of Twitter data related to COVID-19 have been extensively reported. Another social media platform that experienced a tremendous increase in new users and posts due to COVID-19 was Reddit, which offers a much less explored alternative, especially its submission titles, because of their format (≤ 300 characters) and content rules. The positivity of self-presentation on social media has an influence on both the quantity and quality of reactions (upvotes) from other social media contacts. OBJECTIVE 1) Expand on the concept of resilience by identifying possible related topics, considering their number of upvotes and their closest terms, and 2) associate specific emotions obtained from the state-of-the-art literature with their closest terms in order to relate such emotions to experienced situations. METHODS Reddit data were collected from pushshift.io with the pushshiftr R package; data cleaning and preprocessing were performed using the quanteda, tidyverse and tidytext R packages. A word2vec model (W2V) was trained on submission titles with the wordVectors R package; preliminary validation was performed using a subset of Mikolov's analogies and a COVID-19 glossary. Main topics (represented as sets of words), with the number of upvotes as a covariate, were extracted using structural topic modelling (STM) with the spectral method in the stm R package. Topic validation was performed using semantic coherence and exclusivity. Clusters were assessed using the Dunn index. RESULTS We collected all 374,421 titles submitted by 104,351 different redditors to the r/Coronavirus subreddit between 20th January 2020 and 14th May 2021. We trained W2V and identified more than 20 valid analogies (e.g. doctor – hospital + teacher = school). We further validated W2V with representative terms extracted from a COVID-19 glossary; all closest terms retrieved by W2V were verified against state-of-the-art publications. STM retrieved 20 topics (with 20 words each) ordered by their number of upvotes. We ran W2V on a representative topic (addressing vaccines), using two terms as seeds that led to other related terms (represented using cluster analysis), which we validated against scientific publications. STM did not retrieve any topic containing the term "resilience", which hardly appeared in the titles (less than 0.02%). Nevertheless, we identified several closest terms (e.g. wellbeing, roadmap) and combined terms (e.g. resilience and elderly, resilience and indigenous), as well as specific emotions that W2V related to lived experiences (e.g. the emotion of gratitude associated with applause and balconies). CONCLUSIONS We applied, for the first time, the combination of STM and a word2vec model trained on a relatively small Coronavirus dataset of Reddit titles, leading to immediate and accurate terms that can be used to expand our knowledge of topics associated with the pandemic (e.g. vaccines) or of specific aspects such as resilience.
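The analogy check mentioned in this abstract (doctor – hospital + teacher = school) can be illustrated with a minimal sketch. It reads the analogy as "doctor is to hospital as teacher is to ?" and uses the common vector-offset form with cosine similarity. The three-dimensional vectors are invented for illustration and are not taken from the trained W2V model; the study itself used the wordVectors R package, whereas this sketch is plain Python.

```python
import math

# Toy 3-d vectors, invented for illustration only (not from the trained model).
VECTORS = {
    "doctor":   [0.9, 0.1, 0.0],
    "hospital": [0.9, 0.1, 1.0],
    "teacher":  [0.1, 0.9, 0.0],
    "school":   [0.1, 0.9, 1.0],
    "vaccine":  [0.7, 0.0, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# "doctor is to hospital as teacher is to ?"
result = analogy("doctor", "hospital", "teacher", VECTORS)
print(result)
```

With real embeddings the nearest neighbour is rarely an exact match, so an analogy is usually counted as valid when the expected word is the top-ranked candidate.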
    In this paper, we extend Recurrent Neural Network Language Models (RNN-LMs) with an attention mechanism. We show that an “attentive” RNN-LM (with 11M parameters) achieves a better perplexity than larger RNN-LMs (with 66M parameters) and achieves performance comparable to an ensemble of 10 similar sized RNN-LMs. We also show that an “attentive” RNN-LM needs less contextual information to achieve similar results to the state-of-the-art on the wikitext2 dataset.
    This archive contains a collection of computational models called word embeddings. These are vectors that contain numerical representations of words. They have been trained on pseudo-sentences generated artificially from a random walk over the English WordNet taxonomy, and thus reflect taxonomic (rather than contextual) knowledge about words.
    This article draws on the sociology of expectations to examine the construction of expectations of ‘ethical AI’ and considers the implications of these expectations for communication governance. We first analyse a range of public documents in the EU, the UK and Ireland to identify the key actors, mechanisms and issues which structure societal expectations around AI and an emerging discourse on ethics. We then explore expectations of AI and ethics through a survey of members of the public. We conclude that discourses of ‘ethical AI’ are generically performative, but to become more effective in practice we need to acknowledge the limitations of contemporary AI and the requirement for extensive human labour to deploy AI in specific societal contexts. An effective ethics of AI requires domain appropriate AI tools, updated professional practices, dignified places of work and robust regulatory and accountability frameworks.
    SONAS: Multimodal, Multi-User Interaction with a Modelled Environment. John Kelleher, Tom Doris, Qamir Hussain, DCU; Seán Ó Nualláin, NOUS Research. Introduction: This paper describes some of the work being carried out by the IME (Interaction with Modelled Environments) ...
    This archive contains a collection of language corpora. These are text files that contain samples of text collected from the English Wikipedia.
    This archive contains a collection of computational models called word embeddings. These are vectors that contain numerical representations of words. They have been trained on real language sentences collected from the English Wikipedia. As such, they contain contextual (thematic) rather than taxonomic knowledge about words.
    While age is an important risk factor, there are some disadvantages to including it in a stroke risk model: age can dominate the risk score and lead to over- or under-predictions in some age groups. There is evidence to suggest that some of these disadvantages are due to the non-proportionality of other risk factors with age, i.e., risk factors contribute differently to stroke risk depending on an individual’s age. In this paper, we present a framework to test whether risk factors are proportional with age. We then apply the framework to a set of risk factors using Framingham Heart Study data from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center to determine whether we can find evidence of non-proportionality. Using our framework, we find that a number of risk factors (diastolic blood pressure, total cholesterol, BMI, sex, high blood pressure treatment) may be non-proportional to age. This suggests that testing for the proportionality of risk factors with age should...
    One of the more interesting ideas for achieving personalized, preventive, and participatory medicine is the concept of a digital twin. A digital twin is a personalized computer model of a patient. So far, digital twins have been constructed using either mechanistic models, which can simulate the trajectory of physiological and biochemical processes in a person, or machine learning models, which can be used, for example, to estimate the risk of having a stroke given a cross-section profile at a given timepoint. These two modelling approaches have complementary strengths which can be combined into a hybrid model. However, even though hybrid modelling that combines mechanistic modelling and machine learning has been proposed, there are few, if any, real examples of hybrid digital twins available. We now present such a hybrid model for the simulation of ischemic stroke. On the mechanistic side, we develop a new model for blood pressure and integrate this with an existing multi-...
    This article examines the basis of Natural Language Understanding of transformer-based language models, such as BERT. It does this through a case study on idiom token classification. We use idiom token identification as a basis for our analysis because of the variety of information types that have previously been explored in the literature for this task, including topic, lexical, and syntactic features. This variety of relevant information types means that the task of idiom token identification enables us to explore the forms of linguistic information that a BERT language model captures and encodes in its representations. The core of this article presents three experiments. The first experiment analyzes the effectiveness of BERT sentence embeddings for creating a general idiom token identification model, and the results indicate that the BERT sentence embeddings outperform Skip-Thought. In the second and third experiments we use the game theory concept of Shapley Values to rank the u...
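As a generic illustration of the Shapley value concept named in this abstract, the sketch below computes exact Shapley values for a tiny cooperative game by averaging each player's marginal contribution over all orderings. The abstract is cut off before it says what is ranked, so the two "players" and their scores are hypothetical and do not reproduce the article's experiments.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical value function: a made-up score for each subset of two feature groups.
SCORES = {
    frozenset(): 0.0,
    frozenset({"topic"}): 0.6,
    frozenset({"lexical"}): 0.4,
    frozenset({"topic", "lexical"}): 0.8,
}

phi = shapley_values(["topic", "lexical"], SCORES.__getitem__)
# Sort players by contribution, largest first.
ranking = sorted(phi, key=phi.get, reverse=True)
print(phi, ranking)
```

Exact enumeration grows factorially with the number of players, so with many players the values are normally approximated by sampling orderings instead.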
    Tracking developments in the highly dynamic data-technology landscape is vital to keeping up with novel technologies and tools in the various areas of Artificial Intelligence (AI). However, it is difficult to keep track of all the relevant technology keywords. In this paper, we propose a novel system that addresses this problem. This tool is used to automatically detect the existence of new technologies and tools in text, and to extract the terms used to describe these new technologies. The extracted terms can be logged as new AI technologies as they are found on-the-fly on the web. They can subsequently be classified into the relevant semantic labels and AI domains. Our proposed tool is based on a two-stage cascading model: the first stage classifies whether the sentence contains a technology term, and the second stage identifies the technology keyword in the sentence. We obtain a competitive accuracy for both tasks of sentence classification and text identification.
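The control flow of a two-stage cascade like the one described above can be sketched as follows. Only the cascade structure comes from the abstract; `toy_stage1`, `toy_stage2`, the cue-word list and the example sentences are hypothetical rule-based stand-ins for the paper's learned models, which the abstract does not specify.

```python
from typing import Callable, Optional

def cascade(sentence: str,
            contains_term: Callable[[str], bool],
            extract_term: Callable[[str], Optional[str]]) -> Optional[str]:
    """Run stage 2 (term extraction) only on sentences that stage 1 flags."""
    if not contains_term(sentence):
        return None
    return extract_term(sentence)

# Toy stand-ins for the two learned models (hypothetical rules, illustration only).
CUE_WORDS = {"framework", "library", "toolkit"}

def toy_stage1(sentence: str) -> bool:
    """Flag a sentence if it contains one of the cue words."""
    return any(w.strip(".,").lower() in CUE_WORDS for w in sentence.split())

def toy_stage2(sentence: str) -> Optional[str]:
    """Return the first capitalised token that is not sentence-initial."""
    tokens = [w.strip(".,") for w in sentence.split()]
    for tok in tokens[1:]:
        if tok[:1].isupper():
            return tok
    return None

found = cascade("We trained the model with the Foo framework.", toy_stage1, toy_stage2)
skipped = cascade("The weather was pleasant.", toy_stage1, toy_stage2)
print(found, skipped)
```

One reason to cascade in this way is that the second, finer-grained stage never sees sentences the first stage has already rejected.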
    Agent-based models can be used to better understand the impacts of lifting restrictions or implementing interventions during a pandemic. However, agent-based models are computationally expensive, and running a model of a large population can result in a simulation taking too long to run for the model to be a useful analysis tool during a public health crisis. To reduce computing time and power while running a detailed agent-based model for the spread of COVID-19 in the Republic of Ireland, we introduce a scaling factor that equates 1 agent to 100 people in the population. We present the results from model validation and show that the scaling factor increases the variability in the model output, but the average model results are similar in scaled and un-scaled models of the same population, and the scaled model is able to accurately simulate the number of cases per day in Ireland during the autumn of 2020. We then test the usability of the model by using the model to explore the like...
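The bookkeeping behind a scaling factor of 1 agent to 100 people can be sketched as below. Only the factor of 100 comes from the abstract; the population figure, the round-up rule and the helper names are assumptions made for illustration, not details of the published model.

```python
SCALE = 100  # people represented by one agent, as stated in the abstract

def agents_needed(population: int, scale: int = SCALE) -> int:
    """Number of agents representing a population (rounding up is an assumption)."""
    return -(-population // scale)

def to_population(agent_count: int, scale: int = SCALE) -> int:
    """Convert a count of agents (e.g. newly infected agents) to people."""
    return agent_count * scale

agents = agents_needed(5_000_000)  # hypothetical round population figure
daily_cases = to_population(7)     # 7 infected agents stand for 700 people
print(agents, daily_cases)
```

Under this scheme every reported count is a multiple of the scaling factor, so a change of a single agent moves the output by 100 people.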
    Current state-of-the-art neural machine translation (NMT) architectures usually do not take document-level context into account. However, the document-level context of a source sentence to be translated could encode valuable information to guide the MT model to generate a better translation. In recent times, MT researchers have turned their focus to this line of MT research. As an example, hierarchical attention network (HAN) models use document-level context for translation prediction. In this work, we studied translations produced by the HAN-based MT systems. We examined how contextual information improves translation in document-level NMT. More specifically, we investigated why context-aware models such as HAN perform better than vanilla baseline NMT systems that do not take context into account. We considered Hindi-to-English, Spanish-to-English and Chinese-to-English for our investigation. We experimented with the formation of conditional context (i.e., neighbouring sentences) ...
    In our work we address limitations in the state-of-the-art in idiom type identification. We investigate different approaches for a lexical fixedness metric, a component of the state-of-the-art model. We also show that our Machine Learning based approach to the idiom type identification task achieves an F1-score of 0.85, an improvement of 11 points over the state-of-the-art.
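For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows the calculation; the counts are hypothetical and were chosen only so that the result equals 0.85, and they are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 85 true positives, 15 false positives, 15 false negatives.
f1 = f1_score(tp=85, fp=15, fn=15)
print(round(f1, 2))
```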
    Background The etiology of ANCA-associated vasculitis (AAV) and triggers of relapse are poorly understood. Vitamin D (vitD) is an important immunomodulator, potentially responsible for the observed latitudinal differences between granulomatous and non-granulomatous AAV phenotypes. A narrow ultraviolet B spectrum induces vitD synthesis (vitD-UVB) via the skin. We hypothesised that prolonged periods of low ambient UVB (and by extension vitD deficiency) are associated with the granulomatous form of the disease and an increased risk of AAV relapse. Methods Patients with AAV recruited to the Irish Rare Kidney Disease (RKD) (n = 439) and UKIVAS (n = 1961) registries were studied. Exposure variables comprised latitude and measures of ambient vitD-UVB, including cumulative weighted UVB dose (CW-D-UVB), a well-validated vitD proxy. An n-of-1 study design was used to examine relapse risk. Multi-level models and logistic regression were used to examine the effect of predictors on AAV relapse r...
    BACKGROUND Stroke is a major worldwide cause of serious long-term disability. Most previous studies addressing functional independence included only inpatients with limited follow-up. OBJECTIVE To identify novel classes of patients having similar temporal patterns in motor functional independence and relate them to baseline clinical features. METHODS Retrospective observational cohort study; data were obtained for n = 428 adult patients with ischemic stroke admitted to rehabilitation (March 2005-March 2020), including baseline clinical features and follow-ups of motor Functional Independence Measure (mFIM) categorized as poor, fair or good. Growth mixture models (GMMs) were fitted to identify classes of patients with similar mFIM trajectories. RESULTS GMM identified three classes of trajectories (1,664 mFIM assessments): C1 (11.2%), 97.9% having poor admission mFIM, at 4.93 years 61.1% still poor, with the largest percentage of hypertension, neglect, dysphagia, diabetes and dyslipid...
