David F. Nettleton is Senior Data Mining Analyst at IRIS Technology Solutions. He also collaborates with the Web Scie...
One of the difficulties for data analysts of online social networks is obtaining publicly available data while respecting the privacy of the users. One alternative is to use synthetically generated data [1]. However, this presents a series of challenges related to generating a realistic dataset in terms of topologies, attribute values, communities, data distributions, and so on. In the following we present an approach for generating a graph topology and populating it with synthetic data for an online social network.
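As a rough illustration of the seed-and-propagate idea described above (not the actual generator code), the following hypothetical Java sketch assigns an attribute value to one seed node per community and then spreads it to neighbors with a fixed probability; the graph, community labels, values and probability are all invented for illustration.

```java
import java.util.*;

// Minimal sketch: populate a community-labelled graph with one attribute
// by seeding a value in each community and propagating it to neighbors.
public class SeedPropagation {
    public static void main(String[] args) {
        int n = 12;
        // adjacency list of a toy graph
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        int[][] edges = {{0,1},{1,2},{2,3},{3,0},{4,5},{5,6},{6,7},{7,4},
                         {8,9},{9,10},{10,11},{11,8},{3,4},{7,8}};
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        int[] community = {0,0,0,0, 1,1,1,1, 2,2,2,2};   // precomputed labels (e.g. from Louvain)
        String[] values = {"music", "sport", "travel"};  // one seed value per community
        String[] attr = new String[n];

        Random rnd = new Random(42);
        double pSpread = 0.7; // probability a neighbor copies the seed's value
        for (int c = 0; c < values.length; c++) {
            int seed = pickSeed(community, c, rnd);
            attr[seed] = values[c];
            // one-hop probabilistic propagation (a real generator would iterate further)
            for (int nb : adj.get(seed))
                if (attr[nb] == null && rnd.nextDouble() < pSpread) attr[nb] = values[c];
        }
        for (int i = 0; i < n; i++)
            System.out.printf("node %d community %d attr %s%n", i, community[i], attr[i]);
    }

    // choose a random member of community c as the seed
    static int pickSeed(int[] community, int c, Random rnd) {
        List<Integer> members = new ArrayList<>();
        for (int i = 0; i < community.length; i++) if (community[i] == c) members.add(i);
        return members.get(rnd.nextInt(members.size()));
    }
}
```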
This is the first of four chapters that deal with the analysis of data on the Internet and in an online environment. This chapter gives an introduction to website analysis and Internet search using two contrasting case studies: first, the chapter discusses how to analyze the transactional data from customer visits to a business's website, and second, it explores how Internet search can be used as a market research tool. The examples serve to illustrate how the Internet can be used as a tool for individual marketing, mass marketing, and marketing sentiment surveys. The examples also illustrate the following two business objectives: (i) analyzing activity on a website to adapt the website's commercial offering at both the general and individual levels, and (ii) gathering commercial information on the Internet from a diversity of sources in order to analyze and understand the marketplace. From a data mining perspective (and recalling the data sources in Chapter 3), throughout this chapter the Internet can be considered a meta data source born from a company's Internet presence. Following each case study, details are given of which techniques are relevant and which software applications could be used for the examples.
The area of CRM (Customer Relationship Management) has attracted a lot of attention, and many businesses that are end users of IT solutions have spent considerable amounts of money on implementing CRM systems integrated to a greater or lesser extent with their operational and business processes. However, what should be kept in mind is that CRM is a basic, common-sense idea that can be put into practice with nothing more than a spreadsheet and a modest database. This chapter introduces the reader to CRM in terms of recency, frequency, and latency of customer activity, and in terms of the client life cycle: capturing new clients, developing and retaining existing clients, and winning back ex-clients. The chapter then discusses the relation of data analysis to each of the CRM phases and considers customer satisfaction and integrated CRM systems. Next, it briefly describes the characteristics of commercial CRM software products, and finally, the chapter examines example screens and functionality from a simple CRM application.
This chapter discusses data quality, which is a preliminary consideration for any commercial data analysis project; the definition of quality includes the availability or accessibility of data. The chapter examines typical problems that can occur with data, including errors in the data content (textual and numerical data) and the relevance and reliability of the data, as well as how to quantitatively evaluate data quality. Finally, some typical errors due to data extraction and how to avoid them are discussed by examining a practical case study.
When evaluating variable data for a given business intelligence objective, we may observe that the relevant variables are not reliable or that the reliable ones are not relevant. Here's how to address this situation. Available at: http://tdwi.org/articles/2014/05/13/data-quality-relevance-vs-reliability.aspx
A key aspect of data mining and its success in extracting useful knowledge is the way in which the data is represented. In this paper we propose representing the relations inherent in an e-commerce bookstore search log as a graph, which allows us to apply and customize graph metrics and algorithms in order to identify structures and key elements. This approach complements traditional transactional mining by facilitating the identification of underlying structural interrelations.
In this paper we propose a classification of different observable trends over time in user web queries. The focus is on the identification of general collective trends, based on search query keywords, of the Internet user community and how they behave over a given time period. We give some representative examples of real search queries and their tendencies. From these examples we define a set of descriptive features which can be used as inputs for data modelling. Then we use a selection of unsupervised (clustering) and supervised modelling techniques to classify the trends. The results show that it is relatively easy to classify the basic hypothetical trends we have defined, and we identify which of the chosen learning techniques are best able to model the data. However, the presence of more complex, noisy or mixed trends makes the classification more difficult.
In this paper we describe the functionality of a decision support modelling approach to select appropriate biomaterial blends depending on their mechanical/chemical properties on the one hand and their biodegradation behaviour on the other. Firstly, a Case Based Reasoning (CBR) approach is applied to predict the expected biodegradation behaviour over time, based on historical examples and using a weighted distance metric on the material properties in order to calculate the trend curve of the new case. Secondly, a Multi-Agent System (MAS) is applied to dynamically simulate the biodegradation curve, in which the two main agents, bacteria and plastic, interact to reproduce the biodegradation kinetics over time. The results of the interpolation are very promising, with a good approximation to the real curve time series and % biodegradation, and the Multi-Agent System successfully simulates the different trend curves over time. The system has been confirmed as useful by the materials-expert end users who participated in the project, as a way to evaluate new blends a priori "in silico" and to identify and select the most promising ones before conducting the long-duration biodegradation experiments in the real environment.
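To make the CBR retrieval step concrete, here is a minimal, hypothetical Java sketch of nearest-case retrieval with a weighted distance over material properties; the property vectors, weights and curves are invented for illustration and are not the project's data.

```java
import java.util.*;

// Minimal CBR retrieval sketch: find the historical blend whose material
// properties are closest (weighted distance) to a new case, then reuse its
// stored biodegradation curve as the prediction. All data are illustrative.
public class CbrRetrieval {
    record Case(String name, double[] props, double[] curve) {}
    public static void main(String[] args) {
        List<Case> base = List.of(
                new Case("blendA", new double[]{1.2, 0.8, 30}, new double[]{5, 20, 55, 80}),
                new Case("blendB", new double[]{0.9, 1.1, 45}, new double[]{2, 10, 30, 60}));
        double[] weights = {0.5, 0.3, 0.2};          // property importance (assumed)
        double[] newBlend = {1.1, 0.9, 33};          // the new case to evaluate
        Case best = base.stream()
                .min(Comparator.comparingDouble(c -> distance(c.props(), newBlend, weights)))
                .orElseThrow();
        System.out.println("nearest case: " + best.name()
                + " predicted curve " + Arrays.toString(best.curve()));
    }
    // weighted L1 distance over the property vectors
    static double distance(double[] a, double[] b, double[] w) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += w[i] * Math.abs(a[i] - b[i]);
        return d;
    }
}
```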
This article presents the results of applying artificial intelligence (AI), in the form of machine learning algorithms, to identifying and predicting anomalies for corrective maintenance in a water for injection (WFI) processing plant. The aim is to avoid the yearly stoppage of the WFI plant for preventive maintenance activities, common in the industry, and to use a more scientific approach to setting the time between stoppages, which is expected to be longer after the study, thus saving money and increasing productivity.
Among digital technologies, Artificial Intelligence (AI) and Big Data (BD) have a proven capability to support different processes, mainly in discrete manufacturing. Despite the fact that a number of AI and BD literature reviews exist, no comprehensive review is available for the Process Industry (i.e. cement, chemicals, steel, and mining). This paper aims to provide a comprehensive review of the AI and BD literature to gain insights into their evolution in supporting the operational phases of the Process Industry. The results define the areas where AI/BD are proven to have greater impact as well as areas with gaps, for example process control (predictive models), machine learning and cyber-physical systems technologies. The sectors lagging behind are ceramics, cement and non-ferrous metals. Areas to be studied in the future include the interaction between intelligent systems, humans and the external environment; the implementation of AI for the monitoring and optimization of parameters of different operations; and ethical and social impact.
The motivation for the work in this paper is the need, in research and applied fields, for synthetic social network data, owing to (i) the difficulty of obtaining real data and (ii) the data privacy issues of real data. The issues to address are, first, to obtain a graph with a social-network-type structure and to label it with communities. The main focus is the generation of realistic data and its assignment to and propagation within the graph. The main aim of this work is to implement an easy-to-use standalone end-user application which addresses the aforementioned issues. The methods used are the R-MAT and Louvain algorithms, with some modifications, for graph generation and community labeling, respectively, together with a Java-based system for data generation using an original seed assignment algorithm followed by a second algorithm for weighted and probabilistic data propagation to neighbors and other nodes. The results show that a close fit can be achieved between the initial user specification and the generated data, and that the algorithms have potential for scale-up. The system is made publicly available as a GitHub Java project.
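For orientation, the sketch below shows the core of the generic R-MAT edge sampler (recursively choosing a quadrant of the adjacency matrix with probabilities a, b, c, d); it is a plain-vanilla illustration using the commonly quoted default probabilities, not the modified version used in this system.

```java
import java.util.Random;

// Generic R-MAT edge sampler: each edge is placed by recursively choosing one
// of the four quadrants of the adjacency matrix with probabilities a, b, c, d.
public class RmatSketch {
    static final double A = 0.57, B = 0.19, C = 0.19; // D = 1 - A - B - C
    public static void main(String[] args) {
        int scale = 8;               // 2^8 = 256 nodes
        int numEdges = 1024;
        Random rnd = new Random(7);
        for (int e = 0; e < numEdges; e++) {
            int src = 0, dst = 0;
            for (int level = 0; level < scale; level++) {
                double r = rnd.nextDouble();
                int quadrant = r < A ? 0 : r < A + B ? 1 : r < A + B + C ? 2 : 3;
                src = (src << 1) | (quadrant >> 1); // top/bottom half
                dst = (dst << 1) | (quadrant & 1);  // left/right half
            }
            System.out.println(src + "\t" + dst);
        }
    }
}
```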
***BEST PAPER AWARD, SIMULTECH 2021***
The exceptionally high virulence of COVID-19 and the patient's precondition seem to constitute primary factors in how pro-inflammatory cytokine production evolves during the course of an infection. We present a System Dynamics Model approach for simulating the patient's reaction using two key control parameters: (i) virulence, which can be "moderate" or "high", and (ii) patient precondition, which can be "healthy", "not so healthy" or "serious preconditions". In particular, we study the behaviour of inflammatory (M1) alveolar macrophages, IL-6 and the active adaptive immune system as indicators of the immune system response, together with the COVID viral load over time. The results show that it is possible to build an initial model of the system to explore the behaviour of the key attributes involved in the patient condition, virulence and response. The model suggests aspects that need further study so that it can then assist in choosing the correct immunomodulatory treatment, for instance the regime of application of an Interleukin-6 (IL-6) inhibitor (tocilizumab) that corresponds to the projected immune status of the patients. We introduce machine learning techniques to corroborate aspects of the model and propose that a dynamic model and machine learning techniques could provide a decision support tool for ICU physicians.
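To show what a system-dynamics simulation of this kind looks like mechanically, here is a deliberately toy Java sketch with three stocks (viral load, M1 macrophages, IL-6) integrated by an Euler step; every equation and constant is an invented placeholder, and none of it reflects the calibrated model in the paper.

```java
// Toy system-dynamics sketch: three stocks (viral load V, inflammatory
// macrophages M1, IL-6) integrated with a simple Euler step. All equations
// and constants are illustrative placeholders, not the paper's model.
public class CytokineToyModel {
    public static void main(String[] args) {
        double virulence = 1.5;      // "moderate" vs "high" control parameter (assumed scale)
        double precondition = 0.8;   // 1.0 = healthy, lower = weaker clearance (assumed)
        double V = 1.0, M1 = 0.1, IL6 = 0.0;
        double dt = 0.1;
        for (double t = 0; t <= 20.0; t += dt) {
            double dV   = virulence * V * (1 - V / 100.0) - 0.4 * precondition * M1 * V;
            double dM1  = 0.05 * V - 0.1 * M1;           // recruitment minus decay
            double dIL6 = 0.3 * M1 - 0.2 * IL6;          // secretion minus clearance
            V += dt * dV; M1 += dt * dM1; IL6 += dt * dIL6;
            if (Math.round(t / dt) % 20 == 0)            // print every ~2 time units
                System.out.printf("t=%5.1f  V=%7.2f  M1=%6.2f  IL6=%6.2f%n", t, V, M1, IL6);
        }
    }
}
```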
In the increasingly pressing context of improving recycling, optical technologies present a broad potential to support the adequate sorting of plastics. Nevertheless, the commercially available solutions (for example, employing near-infrared spectroscopy) generally focus on identifying mono-materials of a few selected types which currently have market interest as secondary materials. Current progress in the photonic sciences, together with advanced data analysis such as artificial intelligence, makes it possible to address practical challenges that were previously not feasible, for example classifying more complex materials. In the present paper, the different techniques are initially reviewed based on their main characteristics. Then, based on the academic literature, their suitability for monitoring the composition of multi-materials, such as different types of multi-layered packaging and fibre-reinforced polymer composites as well as black plastics used in the motor vehicle industry, is discussed. Finally, some commercial systems with applications in those sectors are also presented. This review mainly focuses on the materials identification step (taking place after waste collection and before sorting and reprocessing) but, in outlook, further insights on sorting are given as well as future prospects which can contribute to increasing the circularity of the plastic composites' value chains.
This paper describes an application, called Medici, designed to produce synthetic data for social network graphs, which can be used for analysis, hypothesis testing and application development by researchers and practitioners in the field. It builds on previous work by providing an integrated system and a user-friendly screen interface. It can be run with default values to produce graph data and statistics, which can then be used for further processing. The system is made publicly available as a GitHub Java project. The annex provides a user manual with a screen-by-screen guide.
Environmental impacts and consumer concerns have necessitated the study of bio-based materials as alternatives to petrochemicals for packaging applications. The purpose of this review is to summarize synthetic and non-synthetic materials feasible for packaging and textile applications, routes of upscaling, (industrial) applications, evaluation of sustainability, and end-of-life options. The outlined bio-based materials include polylactic acid, polyethylene furanoate, polybutylene succinate, and non-synthetically produced polymers such as polyhydroxyalkanoates, cellulose, starch, proteins, lipids, and waxes. Further emphasis is placed on modification techniques (coating and surface modification), biocomposites, multilayers, and additives used to adjust properties, especially barriers to gas and moisture, and to tune biodegradability. Overall, this review provides a holistic view of bio-based packaging materials, including processing and an evaluation of sustainability and recycling options. Thus, this review contributes to increasing the knowledge of available sustainable bio-based packaging materials and enhancing the transfer of scientific results into applications.
There is exciting news in recent developments suggesting the potential to treat some human cancers by stimulating the patient's own immune system. However, there is still much to understand; therefore, modelling the battle between the constituent cells of the human immune system and tumorous cells can provide significant insights, as mathematical modelling has done for the immune system's behaviour against virus infections. In this paper we innovate in two directions. First, we move the modelling of immune struggles from the sphere of ordinary differential equation models to modelling by multi-agent simulation. We highlight the advantages of multi-agent simulation, for example the consideration of elaborate spatial proximity interactions. Secondly, we move away from the realm of infectious diseases to the complex modelling of the stimulation of T-cells and their participation in fighting cancerous cell tumours.
Background: In this study, we compared four models for predicting rice blast disease: two operational process-based models (Yoshino and the Water Accounting Rice Model (WARM)) and two approaches based on machine learning algorithms (M5Rules and Recurrent Neural Networks (RNN)), the former inducing a rule-based model and the latter building a neural network. In situ telemetry is important to obtain quality in-field data for predictive models, and this was a key aspect of the RICE-GUARD project on which this study is based. To the authors' knowledge, this is the first time process-based and machine learning modelling approaches for supporting plant disease management have been compared. Results: The results clearly showed that the models succeeded in providing a warning of rice blast onset and presence, thus representing suitable solutions for preventive remedial actions targeting the mitigation of yield losses and the reduction of fungicide use. All methods gave significant "signals" during the "early warning" period, with a similar level of performance. M5Rules and WARM gave the maximum average normalized scores of 0.80 and 0.77, respectively, whereas Yoshino gave the best score for one site (Kalochori 2015). The best average values of r, r² and %MAE (Mean Absolute Error) for the machine learning models were 0.70, 0.50 and 0.75, respectively, and for the process-based models the corresponding values were 0.59, 0.40 and 0.82. Thus, the machine learning models were found to be competitive with the process-based models. This result has relevant implications for the operational use of the models, since most of the available studies are limited to the analysis of the relationship between the model outputs and the incidence of rice blast. The results also showed that machine learning methods approximated the performance of two process-based models used for years in operational contexts. Conclusions: Process-based and data-driven models can be used to provide early warnings to anticipate rice blast and detect its presence, thus supporting fungicide applications. Data-driven models derived from machine learning methods are a viable alternative to process-based approaches and - in cases where training datasets are available - offer a potentially greater adaptability to new contexts.
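As a quick reference for the agreement statistics quoted above, this small Java sketch computes Pearson's r, r² and a range-normalized mean absolute error for paired observed/predicted series; the formulas are the generic textbook ones (the paper's exact %MAE normalization may differ), and the data are made up.

```java
// Generic agreement statistics used when comparing model output against
// observations: Pearson's r, r^2, and MAE as a fraction of the observed range.
public class ModelScores {
    public static void main(String[] args) {
        double[] obs  = {0.10, 0.40, 0.35, 0.80, 0.90, 0.60};
        double[] pred = {0.15, 0.35, 0.40, 0.70, 0.85, 0.65};
        double r = pearson(obs, pred);
        System.out.printf("r=%.3f  r2=%.3f  %%MAE=%.3f%n", r, r * r, maePct(obs, pred));
    }
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }
    static double maePct(double[] obs, double[] pred) {
        double mae = 0, min = obs[0], max = obs[0];
        for (int i = 0; i < obs.length; i++) {
            mae += Math.abs(obs[i] - pred[i]);
            min = Math.min(min, obs[i]); max = Math.max(max, obs[i]);
        }
        return (mae / obs.length) / (max - min); // normalized by the observed range
    }
}
```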
We consider the re-identification of users of on-line social networks when they participate in several different on-line social networks, potentially using several different accounts. The re-identification of users serves several purposes: (i) commercial use, so as to avoid redundant mailing to the same user; (ii) enhancement of the information available about these users by unifying information from different sources; (iii) consolidation of accounts by on-line social network providers; (iv) identification of potentially malicious users and/or bots. We highlight that all this should occur within the bounds of data protection and privacy laws, as well as the users' expectations on such matters, to avoid backlash. In this paper, we explore this situation first by a formalization using the SAN model to conceptually structure information as a graph, which includes user and attribute type nodes. This formalization enables us to reason about two issues: first, how to identify that two or more user accounts belong to the same user; second, what gains in predictability are obtained after re-identification. For the first issue, we show that a set-difference approach is remarkably effective. For the second issue, we explore the impact of re-identification on the predictability of two different machine learning algorithms: C4.5 (decision tree induction) and SVM-SMO (a Support Vector Machine trained with Sequential Minimal Optimization). Our results show that as predictability improves, in some cases different SAN metrics emerge as predictors.
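The paper's set-difference matcher is not reproduced here, but its gist (scoring how little the attribute sets attached to two accounts differ) can be sketched as follows; the attribute encoding and the normalization are simplified stand-ins.

```java
import java.util.*;

// Simplified set-difference comparison of two user accounts: the smaller the
// symmetric difference of their attribute sets, the more likely they share
// one owner. Attributes and the scoring rule are illustrative.
public class AccountMatch {
    public static void main(String[] args) {
        Set<String> acctA = new HashSet<>(Arrays.asList(
                "age:30-39", "city:Barcelona", "likes:jazz", "lang:en"));
        Set<String> acctB = new HashSet<>(Arrays.asList(
                "age:30-39", "city:Barcelona", "likes:jazz", "likes:chess"));
        System.out.printf("difference score = %.2f%n", diffScore(acctA, acctB));
    }
    static double diffScore(Set<String> a, Set<String> b) {
        Set<String> symDiff = new HashSet<>(a);
        symDiff.addAll(b);                       // union
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);                      // intersection
        symDiff.removeAll(inter);                // union minus intersection
        return (double) symDiff.size() / (a.size() + b.size()); // 0 = identical sets
    }
}
```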
From their origins in the sociological field, memes have recently become of interest in the context of 'viral' transmission of basic information units (memes) in online social networks. However, much work still needs to be done in terms of metrics and practical data processing issues. In this paper we define a theoretical basis and a processing system for extracting and matching memes from free-format text. The system facilitates the work of a text analyst in extracting this type of data structure from online text corpora and in performing empirical experiments in a controlled manner. The general aspects of the solution are the automatic processing of unstructured text without the need for preprocessing (such as labelling and tagging), the identification of co-occurrences of concepts and their corresponding relations, the construction of semantic networks, and the selection of the top memes. The system integrates these processes, which are generally separate in other state-of-the-art systems. The proposed system is important because unstructured online text content is growing at a greater rate than other content (e.g. semi-structured, structured), and integrated, automated systems for knowledge extraction from this content will be increasingly important in the future. To illustrate the method and metrics we process several real online discussion forums, extracting the principal concepts and relations, building the memes and then identifying the key memes for each document corpus using a sophisticated matching process. The results show that our method can automatically extract coherent key knowledge from free text, which is corroborated by benchmarking against a set of other text analysis approaches, as well as by a user study evaluation.
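As a toy illustration of the co-occurrence step (concepts appearing within the same sliding window become linked, with counts as edge weights in a rudimentary semantic network), consider the following Java sketch; real concept extraction is reduced here to a stop-word filter, which is a simplification the full system does not rely on.

```java
import java.util.*;

// Toy co-occurrence extraction: tokens appearing within a sliding window of
// each other are linked, and the counts become the edge weights of a
// rudimentary semantic network.
public class CooccurrenceSketch {
    static final Set<String> STOP = new HashSet<>(Arrays.asList(
            "the", "a", "of", "and", "in", "is", "to"));
    public static void main(String[] args) {
        String text = "the network of users shares memes and memes spread in the network";
        String[] tokens = text.toLowerCase().split("\\W+");
        int window = 3;
        Map<String, Integer> edges = new TreeMap<>();
        for (int i = 0; i < tokens.length; i++) {
            if (STOP.contains(tokens[i])) continue;
            for (int j = i + 1; j < Math.min(i + 1 + window, tokens.length); j++) {
                if (STOP.contains(tokens[j]) || tokens[i].equals(tokens[j])) continue;
                // order the pair so (a,b) and (b,a) share one edge key
                String key = tokens[i].compareTo(tokens[j]) < 0
                        ? tokens[i] + "--" + tokens[j] : tokens[j] + "--" + tokens[i];
                edges.merge(key, 1, Integer::sum);
            }
        }
        edges.forEach((k, v) -> System.out.println(k + " : " + v));
    }
}
```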
Two of the difficulties for data analysts of online social networks are (1) obtaining publicly available data and (2) respecting the privacy of the users. One possible solution to both of these problems is to use synthetically generated data. However, this presents a series of challenges related to generating a realistic dataset in terms of topologies, attribute values, communities, data distributions, correlations and so on. In the following work, we present and validate an approach for populating a graph topology with synthetic data which approximates an online social network. The empirical tests confirm that our approach generates a dataset which is both diverse and a good fit to the target requirements, with realistic modeling of noise and fitting to communities. A good match is obtained between the generated data and the target profiles and distributions, which is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated control parameter set for different "similarity/diversity" levels.
Given that exact pair-wise graph matching has a high computational cost, different representational schemes and matching methods have been devised in order to make matching more efficient. Such methods include representing the graphs as tree structures, transforming the structures into strings and then calculating the edit distance between those strings. However, many coding schemes are complex and computationally expensive. In this paper, we present a novel coding scheme for unlabeled graphs and perform some empirical experiments to evaluate its precision and cost for the matching of neighborhood subgraphs in online social networks. We call our method OSG-L (Ordered String Graph-Levenshtein). Some key advantages of the pre-processing phase are its simplicity, compactness and lower execution time. Furthermore, our method is able to match both non-isomorphisms (near matches) and isomorphisms (exact matches), also taking into account the degrees of the neighbors, which is adequate for social network graphs.
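The exact OSG-L encoding is defined in the paper; the sketch below only captures the flavor under simplified assumptions: a neighborhood is summarized as a sorted neighbor-degree string, and two neighborhoods are compared by the standard Levenshtein distance between those strings.

```java
import java.util.Arrays;

// Simplified flavor of ordered-string graph matching: encode a node's
// neighborhood as its sorted neighbor-degree sequence, then compare two
// encodings with the standard Levenshtein distance.
public class OsgSketch {
    public static void main(String[] args) {
        int[] degreesA = {1, 4, 2, 3};   // neighbor degrees of node A (made up)
        int[] degreesB = {2, 4, 3, 3};   // neighbor degrees of node B (made up)
        System.out.println("distance = " + levenshtein(encode(degreesA), encode(degreesB)));
    }
    static String encode(int[] neighborDegrees) {
        int[] d = neighborDegrees.clone();
        Arrays.sort(d);
        StringBuilder sb = new StringBuilder();
        for (int x : d) sb.append((char) ('a' + Math.min(x, 25))); // bucket degree to a symbol
        return sb.toString();
    }
    static int levenshtein(String s, String t) {
        int[][] dp = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= t.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= s.length(); i++)
            for (int j = 1; j <= t.length(); j++)
                dp[i][j] = Math.min(
                        dp[i - 1][j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1),
                        Math.min(dp[i - 1][j], dp[i][j - 1]) + 1);
        return dp[s.length()][t.length()];
    }
}
```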
In recent years, online social networks have become a part of everyday life for millions of individuals, and data analysts have found a fertile field for analyzing user behavior at individual and collective levels, for academic and commercial reasons. On the other hand, there are many risks for user privacy, as information a user may wish to keep private can become evident upon analysis. However, when data is anonymized to make it safe for publication in the public domain, information is inevitably lost with respect to the original version, a significant aspect of social networks being the local neighborhood of a user and its associated data. Current anonymization techniques are good at identifying risks and minimizing them, but not so good at maintaining the local contextual data which relate users in a social network. Thus, improving this aspect will have a high impact on the data utility of anonymized social networks. Also, there is a lack of systems which facilitate the work of a data analyst in anonymizing these types of data structures and in performing empirical experiments in a controlled manner on different datasets. Hence, in the present work we address these issues by designing and implementing a sophisticated synthetic data generator together with an anonymization processor with strict privacy guarantees which takes the local neighborhood into account when anonymizing. All this is done for a complex dataset which can be fitted to a real dataset in terms of data profiles and distributions. In the empirical section we perform experiments to demonstrate the scalability of the method and the improvement in terms of reduced information loss with respect to approaches which do not consider the local neighborhood context when anonymizing.
Approximate sub-graph matching is important in many graph data mining fields. At present, current solutions can be difficult to implement, have an expensive pre-processing phase, or only work for given types of graph. In this paper a novel generic approach is presented which addresses these issues. An approximate sub-graph matcher (A-SGM) calculates the distance between the topological characteristics (footprint) of the sub-graphs to be matched, applying a weighting to the different sub-graph characteristics and those of neighbor nodes. The weights are calibrated for each dataset with a simulated annealing process using sample sets of graph nodes to reduce computational cost, and an exact isomorphism matcher as a fitness function which takes into account how well the match maintains the neighboring node degree distributions. Benchmarking is performed on several state of the art methods and real and synthetic graph datasets to evaluate the precision, recall and computational cost. The results show that the A-SGM is competitive with state of the art methods in terms of precision, recall and execution time.
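A hedged illustration of the footprint idea: each sub-graph is reduced to a small vector of topological statistics, and two footprints are compared with a weighted distance. The features shown (degree, clustering coefficient, mean neighbor degree) and the weights are assumptions for illustration; in the paper the weights are calibrated by simulated annealing rather than hand-set.

```java
import java.util.*;

// Illustrative "footprint" comparison for approximate sub-graph matching:
// summarize each node's neighborhood by a few topological statistics and
// compare footprints with a weighted L1 distance.
public class FootprintMatch {
    public static void main(String[] args) {
        // toy undirected graph as an adjacency map
        Map<Integer, Set<Integer>> g = new HashMap<>();
        int[][] edges = {{0,1},{0,2},{1,2},{2,3},{3,4},{4,5},{5,3},{1,4}};
        for (int[] e : edges) {
            g.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            g.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        double[] w = {0.5, 0.3, 0.2}; // assumed weights: degree, clustering, mean nbr degree
        double[] f1 = footprint(g, 2), f2 = footprint(g, 4);
        double d = 0;
        for (int i = 0; i < w.length; i++) d += w[i] * Math.abs(f1[i] - f2[i]);
        System.out.printf("footprint distance(2,4) = %.3f%n", d);
    }
    static double[] footprint(Map<Integer, Set<Integer>> g, int v) {
        Set<Integer> nbrs = g.get(v);
        int deg = nbrs.size(), links = 0, nbrDegSum = 0;
        for (int a : nbrs) {
            nbrDegSum += g.get(a).size();
            for (int b : nbrs) if (a < b && g.get(a).contains(b)) links++;
        }
        double cc = deg < 2 ? 0 : 2.0 * links / (deg * (deg - 1)); // clustering coefficient
        return new double[]{deg, cc, (double) nbrDegSum / deg};
    }
}
```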
Internet users in general, and on-line social network users in particular, are becoming more savvy about masking data they consider private. However, some of this masked data may be inferable from other data the user has not masked. Furthermore, even if a user masks all of their data, it may still be inferable from the unmasked data of their friends, due to affinities in likes and personal attributes. In contrast to the conventional data mining approach, in which a model is built for all users, we build a rule set which is individualized for each user. In this paper we propose a novel rule induction approach (incorporating predictive metrics) which enables a user to evaluate the potential risk incurred by unmasked attributes, friends' attributes and also the risk of befriending new users. We find that all of these risks are quantifiable and that a risk ranking of attributes and friends/potential friends can be individualized for each user. We give examples and use cases and confirm the effectiveness of the approach, using sophisticated synthetic OSN data to define risk attribute and user combinations which coincide with the risk ranking produced by our algorithm.
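As a schematic of per-user risk ranking (not the paper's induction algorithm), the following Java sketch stores inference rules of the form "these visible attributes imply that hidden attribute with confidence c" and ranks a user's unmasked attributes by the strongest inference they enable; the rules, attributes and confidences are all invented.

```java
import java.util.*;

// Schematic per-user risk ranking: each rule says that a set of visible
// attributes lets an adversary infer a hidden one with some confidence; an
// unmasked attribute's risk score is the strongest inference it enables.
public class PrivacyRiskRank {
    record Rule(Set<String> ifVisible, String infers, double confidence) {}
    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Set.of("city", "employer"), "income-band", 0.80),
                new Rule(Set.of("likes:band-x", "age"), "politics", 0.60),
                new Rule(Set.of("city"), "timezone", 0.95));
        Set<String> unmasked = Set.of("city", "employer", "age"); // this user's profile
        Map<String, Double> risk = new HashMap<>();
        for (Rule r : rules)
            if (unmasked.containsAll(r.ifVisible())) {            // rule fires for this user
                System.out.println("inferable: " + r.infers() + " (conf " + r.confidence() + ")");
                for (String a : r.ifVisible())
                    risk.merge(a, r.confidence(), Math::max);     // worst case per attribute
            }
        risk.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getKey() + " risk " + e.getValue()));
    }
}
```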


Key Features

- Illustrates cost-benefit evaluation of potential projects

- Includes vendor-agnostic advice on what to look for in off-the-shelf solutions as well as tips on building your own data mining tools

- Approachable reference that can be read from cover to cover by readers of all experience levels

- Includes practical examples and case studies as well as actionable business insights from the author's own experience

Description

Whether you are brand new to data mining or working on your tenth predictive analytics project, Commercial Data Mining will be there for you as an accessible reference outlining the entire process and related themes. In this book, you'll learn that your organization does not need a huge volume of data or a Fortune 500 budget to generate business using existing information assets. Expert author David Nettleton guides you through the process from beginning to end and covers everything from business objectives and data sources to data selection, analysis, and predictive modeling.

Commercial Data Mining includes case studies and practical examples from Nettleton's more than 20 years of commercial experience. Real-world cases covering customer loyalty, cross-selling, and audience prediction in industries including insurance, banking, and media illustrate the concepts and techniques explained throughout the book.

Readership

Data mining professionals in business & IT.
The book is aimed at people who, for professional or academic reasons, need to analyze patient data in order to produce a diagnosis or a prognosis. The various statistical and machine learning techniques are explained in detail for application to the analysis of clinical data. In addition, the book describes, in a structured way, a series of adapted techniques and original approaches, based on the author's experience and collaborations in this field.

SUMMARY CONTENTS: Introduction. Concepts and techniques. The fuzzy perspective. Clinical diagnosis and prognosis. Diagnosis of sleep apnea syndrome. Representation, comparison and processing of data of different types. Techniques. Summary of the key aspects in adapting and implementing the techniques. Application of the techniques to real cases. Prognosis of patients at the ICU of the Parc Taulí Hospital, Sabadell, etc.
This book is aimed both at people with no background in commercial data analysis and at those already engaged in it to a greater or lesser degree who are looking for a straightforward reference covering the whole process and related topics. The author draws on more than 20 years of business experience, as well as his various research projects, to enrich the content, which offers an original approach to the subject. In the appendices, practical case studies derived from real projects serve to illustrate the concepts and techniques explained throughout the book.

Practically all of the methods, techniques and ideas presented, for example 'data quality', 'data mart', 'CRM (customer relationship management)', 'different data sources' and 'Internet search', can be exploited by the owner of a micro-business or a self-employed professional as much as by a medium-sized or large company. A large volume of data is not essential, and analysis tools are available at a price accessible to all.
A malevolent data miner can use data mining techniques to learn confidential information about social networking site users that the users did not disclose, and thereby breach the individual privacy of a social networking site user. However, the information items in a social network are not only the attributes of users but also the relationships. The attributes of the neighbours and the characteristics of the connections can also determine a user profile, even when very little or no information has been shared. Thus, it is a challenge to empower users by alerting them to unmasked attributes disclosed by a particular user or their neighbour connections. This approach gathers information for SNS users and applies the proposed Cum_Sensitivity and Total_Count algorithms to find sensitive rules and their corresponding unmasked attributes (i.e. those used in the conjunctive rule). Then, it suggests the user suppress those high-risk attributes or some of their values. In addition, the potential risk incurred by friends' attributes is also quantifiable, and a risk ranking of attributes and friends can be individualized for each user.
In this presentation two themes are considered:

(i) A personalized privacy tool for online social network users
and (ii) a generator for synthetic online social network graph data.
It is widely accepted that the field of Data Analytics has entered the era of Big Data. In particular, it has to deal with so-called Big Graph Data, which is the focus of this paper. Graph data is present in many fields, such as social networks, biological networks, computer networks, and so on. It is recognized that data analysts benefit from interactive real-time data exploration techniques, such as clustering and zoom capabilities on the clusters. However, although clustering is one of the key aspects of graph data analysis, there is a lack of scalable graph clustering algorithms which would support interactive techniques. This paper presents an approach based on combining graph clustering and graph coordinate system embedding, which shows promising results in initial experiments. Our approach also incorporates both structural and attribute information, which can lead to a more meaningful clustering.
In this brief presentation on free text document sanitization, we perform a multi-step semi-automatic sanitization process and evaluate the information loss using information retrieval metrics. The Wikileaks document corpus is used for testing.
In this brief presentation on graph anonymization, we look at some graph modifier operators and different types of adversary information queries.
In this brief presentation we give an overview of some of the issues and work related to data privacy of on-line social network data represented as graphs. Among the issues considered are adversaries, protection methods (link addition and clustering) and data processing.
This poster gives an overview of an approach for anonymizing online social networks represented as graphs: (i) the end user of the data is able to specify the utility requirements; (ii) we are able to define potential adversary queries on the data. These two aspects condition the way in which we anonymize the graph, and from them we derive measures for information loss, risk and privacy levels.
In this brief talk we describe an approach for anonymizing online social networks represented as graphs: (i) the end user of the data is able to specify the utility requirements; (ii) we are able to define potential adversary queries on the data. These two aspects condition the way in which we anonymize the graph, and from them we derive measures for information loss, risk and privacy levels.
This poster gives an overview of some of the issues which graph data miners may encounter when analyzing Online Social Networks represented as graphs. Such issues include representing an OSN as a graph, the elicitation of a community structure, finding similar subgraphs and computational cost issues.
This brief talk will consider some of the issues which graph data miners may encounter when analyzing Online Social Networks represented as graphs. Such issues include representing an OSN as a graph, the elicitation of a community structure, finding similar subgraphs and computational cost issues.
The present invention proposes a new approximate sub-graph matching method with the advantage of being relatively simple to implement, requiring a worst-case runtime computational cost of O(N²). The present invention refers to a similarity metric which approximates a modified isomorphism matcher for local neighbourhood sub-graphs, the matcher consisting of a distance metric with weighted characteristics in terms of sub-graph statistics and statistics of neighbour node degrees. The weights of the metric are calibrated using a simulated annealing process which uses as a fitness function a modified isomorphism matcher that takes into account how well the match maintains the neighbouring node degree distributions. The learned weights provide additional information useful for interpreting the relative importance of each characteristic.
This unclassified report consists of three testing and performance studies of the IBM 3081 mainframe which provided computer services to the AERE (Atomic Energy Research Establishment) Harwell site: (i) a job test stream for the batch system; (ii) a performance comparison of an indexed VTOC vs the OS VTOC; (iii) a system response time analysis using two different performance monitoring systems.
In this document we review the state of the art in graph privacy, with special emphasis on applications to online social networks, and we review how six different operators modify local topologies when activity data is included. We consider an aspect which has not been greatly covered in the specialized literature on graph privacy: the adding, deleting and disaggregation of nodes. We also cover the following key considerations: (i) the choice of six different operators to modify the graph; (ii) simulated annealing to find the optimum graph, using a fitness function based on information loss and disclosure risk; (iii) the use of heuristics to choose the graph elements (nodes, edges) to be modified, as a probability weighted by the distribution of an element's statistical characteristics (degree, clustering coefficient and path length) in the original graph; (iv) the re-linking of nodes: a heuristic which finds the topology whose statistical characteristics are closest to those of the original neighborhood; (v) in the case of the aggregation of two nodes, choosing adjacent nodes rather than isomorphic topologies, in order to maintain the overall structure of the graph; (vi) the incorporation of network activity as a weight on the topology characteristics; (vii) a statistically knowledgeable attacker who is able to search for regions of the graph based on statistical characteristics and map those onto a given node and its immediate neighborhood.
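Item (ii) follows the standard simulated-annealing pattern; a generic Java skeleton looks like the following, with a stand-in fitness in place of the information-loss/disclosure-risk measure and a placeholder mutation in place of the six graph operators.

```java
import java.util.Random;

// Generic simulated-annealing skeleton: repeatedly apply a random modifier,
// accept worse states with a temperature-dependent probability, and cool
// down. Fitness and mutation are placeholders for the information-loss/risk
// measure and the six graph operators described above.
public class AnnealSkeleton {
    static Random rnd = new Random(1);
    public static void main(String[] args) {
        double state = rnd.nextDouble() * 10;       // stand-in for a graph
        double temp = 1.0, cooling = 0.995;
        double best = state, bestFit = fitness(state);
        for (int step = 0; step < 5000; step++) {
            double candidate = mutate(state);
            double delta = fitness(candidate) - fitness(state);
            // accept improvements always, worse moves with probability e^(-delta/T)
            if (delta < 0 || rnd.nextDouble() < Math.exp(-delta / temp)) state = candidate;
            if (fitness(state) < bestFit) { best = state; bestFit = fitness(state); }
            temp *= cooling;
        }
        System.out.printf("best=%.4f fitness=%.4f%n", best, bestFit);
    }
    // placeholder fitness: imagine infoLoss(graph) + disclosureRisk(graph)
    static double fitness(double s) { return (s - 3.7) * (s - 3.7); }
    // placeholder for choosing and applying one of the six graph operators
    static double mutate(double s) { return s + (rnd.nextDouble() - 0.5); }
}
```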
This document describes the first version (V1.0) of the graph privacy software suite. It consists of some initial assumptions, together with a textual description of the main routine (simulated annealing) and the six graph modifier operators. This is followed by a structure diagram of the whole system and the pseudo code of each of the main functions, organized in a modular design. A companion document [TR-IIIA-2010-04] details the theoretical background to the work.
Brief description: Two datasets are included which represent a graph, containing 11,580 user records (nodes) and 87,322 link records (edges), respectively. We have used as an (empty) topology the Amazon product co-purchasing network and ground-truth communities dataset, which was collected by crawling the Amazon website by Yang and Leskovec (2012) and is available from the SNAP online repository (https://snap.stanford.edu/data/). We used the version which has the top 5,000 communities. The graph structure was then populated with data by choosing seeds in each community and propagating from them. This follows a method outlined in [1]. The method has also been used to create a synthetic dataset for use in a data privacy study [2].
50K link records (edges) - corresponds to the 1K user records (nodes) file in this same section.
1K user records (nodes) - corresponds to the edges file in this same section.
Two datasets are included, representing a graph that contains approx. 1K user records (nodes) and 50K link records (edges), respectively. We followed a two-step process: (1) generate a topology using R-MAT; apply Louvain to identify some communities; then apply Louvain recursively to selected communities to obtain some smaller ones, giving a total of 10 communities; (2) populate the graph structure with data by choosing seeds in each community and propagating from them. This follows a method outlined in [1]. A new, more sophisticated version of this method (datasets and code) will be made available soon.
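A minimal sketch of step (1), assuming networkx ≥ 2.8 for louvain_communities. networkx has no built-in R-MAT generator, so a small recursive-quadrant version is hand-rolled here; the quadrant probabilities and the community-size threshold for re-splitting are illustrative assumptions.

```python
# Hedged sketch of the two-step topology pipeline: R-MAT generation,
# then Louvain applied recursively to oversized communities.
import random

import networkx as nx

def rmat_edge(n_bits: int, p=(0.57, 0.19, 0.19, 0.05)) -> tuple[int, int]:
    """Draw one edge by recursively choosing adjacency-matrix quadrants."""
    u = v = 0
    a, b, c, _d = p
    for _ in range(n_bits):
        r = random.random()
        u, v = u << 1, v << 1
        if r < a:
            pass                 # top-left quadrant
        elif r < a + b:
            v |= 1               # top-right
        elif r < a + b + c:
            u |= 1               # bottom-left
        else:
            u, v = u | 1, v | 1  # bottom-right
    return u, v

def rmat_graph(n_bits: int = 10, n_edges: int = 50_000) -> nx.Graph:
    """Approx. 1K nodes (2**10) and 50K edges; duplicates are retried."""
    g = nx.Graph()
    g.add_nodes_from(range(1 << n_bits))
    while g.number_of_edges() < n_edges:
        u, v = rmat_edge(n_bits)
        if u != v:
            g.add_edge(u, v)
    return g

def communities_recursive(g: nx.Graph, max_size: int = 200) -> list[set]:
    """Louvain, then re-apply Louvain to communities larger than max_size."""
    result = []
    for comm in nx.community.louvain_communities(g):
        if len(comm) > max_size:
            result.extend(nx.community.louvain_communities(g.subgraph(comm)))
        else:
            result.append(comm)
    return result
```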
Please reference the paper [1] when using this data and publishing results in your work. Please also give me your feedback on your analysis/use of this data, and any suggestions for improvement.
[1] Nettleton, D.F. (2015). Generating synthetic online social network graph data and topologies. 3rd Workshop on Graph-based Technologies and Applications (Graph-TA), UPC, Barcelona, Spain, March 18, 2015.
In this presentation, preliminary results are given for the modeling and calibration of two different industrial winding MIMO (Multiple Input Multiple Output) processes using machine learning techniques. In contrast to previous approaches, which have typically used "black-box" linear statistical methods together with a definition of the mechanical behavior of the process, the present work builds a model using non-linear machine learning algorithms together with a "white-box" rule induction technique to create a supervised model of the fitting error between the expected and real force measures. The final objective is to build a precise model of the winding process in order to control the tension of the material being wound in the first case, and the friction of the material passing through the die in the second case.
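A hedged sketch of the two-stage idea on synthetic data: a gradient-boosted regressor stands in for the non-linear "black-box" model, and a shallow decision tree printed as rules stands in for the "white-box" rule-induction step. The feature names, the data and the scikit-learn choices are assumptions, not the methods actually used in the presentation.

```python
# Hedged sketch: non-linear regressor for the force, then a shallow
# decision tree as a white-box model of the fitting error.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # hypothetical: roll speed, torque, diameter
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)  # "measured" force

black_box = GradientBoostingRegressor().fit(X, y)
error = y - black_box.predict(X)     # fitting error: expected vs. real force

white_box = DecisionTreeRegressor(max_depth=3).fit(X, error)
print(export_text(white_box, feature_names=["speed", "torque", "diameter"]))
```

The printed tree gives human-readable rules describing under which operating conditions the black-box model over- or under-estimates the force, which is the kind of insight a white-box error model provides.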
Github Java source code of MEDICI: A simple to use synthetic social network data generator

https://github.com/dnettlet/MEDICI
The main project folder includes the corresponding paper (please reference it if you include MEDICI in your research) and the user manual.

The paper preprint reference is: https://arxiv.org/abs/2101.01956

Overview:
The Java and JavaFX source code corresponds to the MEDICI application, designed to produce synthetic data for social network graphs, which can be used for analysis, hypothesis testing and application development by researchers and practitioners in the field. It builds on previous work by providing an integrated system and a user-friendly screen interface. It can be run with default values to produce graph data and statistics, which can then be used for further processing. The system is made publicly available as a GitHub Java project. The annex provides a user manual with a screen-by-screen guide.
Repast (ReLogo) source code of paper "Multi-Agent Modeling Simulation of In-Vitro T-Cells for Immunologic Alternatives to Cancer Treatment"
Language: Repast (ReLogo)
Repository: https://github.com/dnettlet/AgentSim1
License: GNU GENERAL PUBLIC LICENSE Version 3
Python source code of a project to extract memes (compact semantic network structures) representing key knowledge circulating in online discussion forums.
Languages: Python
Repository: https://github.com/dnettlet/memes
License: GNU GENERAL PUBLIC LICENSE Version 3
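For orientation only, a minimal sketch of the general idea (not the repository's actual pipeline): build a term co-occurrence network from forum posts and keep small, sufficiently frequent connected substructures as candidate memes. The toy posts and thresholds are assumptions.

```python
# Hedged sketch: term co-occurrence network -> dense substructures as memes.
from collections import Counter
from itertools import combinations

import networkx as nx

posts = [
    "battery life of the new phone is poor",
    "new phone battery drains fast",
    "screen quality is great but battery life poor",
]

# Count pairwise co-occurrences of terms within each post.
cooc = Counter()
for post in posts:
    terms = sorted(set(post.lower().split()))
    cooc.update(combinations(terms, 2))

# Build the semantic network, keeping only sufficiently frequent pairs.
g = nx.Graph()
g.add_edges_from((a, b, {"weight": w}) for (a, b), w in cooc.items() if w >= 2)

# Candidate memes: small connected substructures of recurring terms.
memes = [c for c in nx.connected_components(g) if len(c) >= 3]
print(memes)
```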
This program takes an empty graph (just nodes and links) and a community labelling (e.g. generated by Gephi's Louvain) and fills it with data, one record per node. The generated data reflects typical social network properties: neighbors tend to be similar, users tend to form communities, node degree has a long-tailed distribution, clustering coefficients follow realistic distributions, and so on (a brief verification sketch follows this entry). Please reference the associated paper
"A synthetic data generator for online social network graphs",
Social Network Analysis and Mining, Dec. 2016, 6:44

and the GitHub code reference when you use/adapt/improve it!

https://github.com/dnettlet/SynthOSNdataGenerator

This version has no overlapping communities :)
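As mentioned above, a brief verification sketch for a populated graph, assuming networkx; the attribute name "age" used in the usage comment is hypothetical. It checks two of the listed properties: neighbor similarity (homophily) and a long-tailed degree distribution.

```python
# Hedged sketch: quick checks of two properties of a populated graph.
import networkx as nx

def neighbor_similarity(g: nx.Graph, attr: str) -> float:
    """Fraction of edges whose endpoints share the same attribute value."""
    edges = list(g.edges())
    same = sum(1 for u, v in edges if g.nodes[u].get(attr) == g.nodes[v].get(attr))
    return same / max(len(edges), 1)

def degree_tail(g: nx.Graph, top: int = 10) -> list[int]:
    """The highest degrees; a long tail shows as a few very large values."""
    return sorted((d for _, d in g.degree()), reverse=True)[:top]

# Usage (hypothetical attribute): neighbor_similarity(g, "age"), degree_tail(g)
```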
This Master's Thesis dissertation describes my final project work for the M.Sc. in Computer Software and System Design, a one-year intensive course at the Computing Laboratory of the University of Newcastle Upon Tyne, during 1984-1985. The work was motivated by the need at the time for higher-level programming languages that allowed the programmer to define and control computer operating system functions, rather than writing directly in (sequential) low-level machine and assembly code. It also provided an abstraction for addressing key issues such as concurrency, parallelism, reliability, security, the I/O disk interface, streams and queuing procedures, among others, and for implementing at different levels (from the user interface level down to the disk interface level, for example). Unix was used as the underlying system, running on a PDP-11/34 minicomputer. The main areas of work were: setting up the standalone Concurrent Euclid (CE) software on the PDP-11/34 hardware; developing a disk interface written in CE; developing different operating system functions, some rewritten from an existing SOLO operating system (Brinch Hansen) written in Sequential Pascal; and a comparative study of the CE language with Concurrent Pascal, Modula-2 and Edison-11.
In this paper a brief description is given of the implementation of a 'Pepper's Ghost' apparatus for creating an optical illusion. The result is a purely non-digital effect, using only light reflection, an appropriate lighting arrangement and a suitable background. A second chamber is added, which makes it possible to project a secondary independent image superimposed on the primary one. As part of the testing of the apparatus, different objects (a cup, a bag) are made to appear and disappear, and by varying the incident light intensity, spurious visual artefacts are minimized.