
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 8 July 2019 | doi:10.20944/preprints201907.0110.v1

© 2019 by the author(s). Distributed under a Creative Commons CC BY license.

Article

Approximate and Situated Causality in Deep Learning

Jordi Vallverdú 1,*

1 Philosophy Department, Universitat Autònoma de Barcelona (UAB); jordi.vallverdu@uab.cat
* Correspondence: jordi.vallverdu@uab.cat; Tel.: +345811618

Abstract: Causality is the most important topic in the history of Western science, and since the beginning of the statistical paradigm its meaning has been reconceptualized many times. Causality entered the realm of multi-causal and statistical scenarios some centuries ago. Despite widespread criticism, today's deep learning and machine learning advances are not weakening causality but are creating a new way of finding indirect correlations between factors. This process allows us to talk about an approximate causality, as well as about a situated causality.

Keywords: causality; deep learning; machine learning; counterfactual; explainable AI; blended cognition; mechanisms; system

1. Causalities in the 21st Century.

In classic Western philosophies, causality was treated as a self-evident observation of the divine regularities ruling Nature. From a dyadic truth perspective, some events were true while others were false, and the true ones strictly followed Heaven's will. That ontological perspective allowed early Greek philosophers (inspired by Mesopotamian, Egyptian, and Indian scientists) to define causal models of reality in which causal relations were deciphered from a single origin, the arche (ἀρχή). Anaximander, Anaximenes, Thales, Plato, or Aristotle, among others, created different models of causality, all connected by the same idea: chance or nothingness was not possible. Although the opposite view was defended by the atomists (who conceived a Nature with both chance and void), almost every trace of their ideas was erased from later research. Eastern philosophers, on the other hand, started from the opposite ontological point of view: in the beginning was nothingness, and the only true reality is the continuous change of things [1]. For Buddhist (using a four-valued logic), Hindu, Confucian, or Taoist philosophers, causality was a reconstruction of the human mind, which is itself a non-permanent entity. Therefore, the notion of causality is ontologically determined by situated perspectives on informational values [2], which allowed and fed different and fruitful heuristic approaches to reality [3], [4]. Such situated contexts of thinking shape the ways in which people perform epistemic and cognitive tasks [5]–[7]. These ontological variations can be justified and fully understood once we assume the Duhem-Quine thesis, that is, that it is impossible to test a scientific hypothesis in isolation, because an empirical test of the hypothesis requires one or more background assumptions (also called auxiliary assumptions or auxiliary hypotheses). The history of the idea of causality therefore changes coherently across geographies and historical periods, entering the realm of statistics during the late 19th century and, later in the 20th century, that of multi-causal perspectives [8]. The statistical nature of contemporary causality has been caught up in debates between schools, mainly Bayesians and a broad range of frequentist variations.
At the same time, epistemic thresholds have been changing, as the recent debate about statistical significance has shown, desacralizing the p-value. The most recent and detailed academic debate on statistical significance took place in Supplement 1 of Volume 73 (2019) of the journal The American Statistician, released on March 20th, 2019. But during the last decades of the 20th century and the beginning of the 21st, computational tools have become the backbone of cutting-edge scientific research [9], [10]. After the great advances produced by machine learning techniques (henceforth, ML), several authors have asked themselves whether ML can contribute to the creation of causal knowledge. We will answer this question in the next section.

2. Deep Learning, Counterfactuals, and Causality

It is in this context, where statistical analysis rules the study of causal relationships, that we find the attack on machine learning and deep learning as unsuitable tools for the advance of causal and scientific knowledge. The best-known and most debated arguments come from the eminent statistician Judea Pearl [11], [12], and have been widely accepted. The main idea is that machine learning cannot create causal knowledge because it lacks the skill of managing counterfactuals; in his exact words ([11], page 7):

"Our general conclusion is that human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models. Data science is only as much of a science as it facilitates the interpretation of data – a two-body problem, connecting data to reality. Data alone are hardly a science, regardless how big they get and how skillfully they are manipulated."

What he is describing is the well-known problem of the black-box model: we use machines that process very complex amounts of data and provide some extractions at the end. This has been called a GIGO (Garbage In, Garbage Out) process [13], [14]. It could be claimed that GIGO problems are computational versions of the Chinese room thought experiment [15]: the machine can find patterns, but without real and detailed causal meaning. This is what Pearl means: the blind use of data to establish statistical correlations instead of describing causal mechanisms. But is it true? In a nutshell: not at all. I will explain the reasons.

2.1. Deep learning is not a data-driven but a context-driven technology: made by humans for humans.

Most epistemic criticisms of AI keep repeating the same idea: machines are still not able to operate as humans do [16], [17]. The claim is always the same: computers operate on data from a semantically blind perspective that makes it impossible for them to understand the causal connections between data. This is the definition of a black-box model [18], [19]. But here the first problem appears: deep learning (henceforth, DL) is not the result of automated machines creating search algorithms by themselves and then evaluating both the algorithms and their results. DL is designed by humans, who select the data, evaluate the results, and decide the next step in the chain of possible actions.
At the epistemic level, the decision about how to interpret the validity of DL results remains under human evaluation; DL is a complex technique, but still only a technique [20]. Even the latest trends in AGI design include causal thinking, as the DeepMind team has recently detailed [21], and with explainable properties. The exponential growth of data and their correlations has been affecting several fields, especially epidemiology [22], [23]. Initially this may be perceived by the agents of a scientific community as a great challenge, in the same way that astronomical statistics modified the Aristotelian-Newtonian idea of physical cause, but with time the research field accepts new ways of thinking. Consider also the revolution of computer proofs in mathematics and the debates these techniques generated among experts [24], [25]. In that sense, DL is just providing a situated approximation to reality using correlational coherence parameters designed by the communities that use them. It is beyond the nature of any kind of machine learning to solve problems that belong purely to human epistemic envisioning: take the long, unfinished, and even distasteful debates among the experts of the different statistical schools [8]. And this is true because data do not provide or determine epistemology, in the same sense that groups of data do not provide the syntax and semantics of the possible organizational systems to which they can be assigned. Any connection between the complex dimensions of an event expresses a possible epistemic approach, which is a (necessary) working simplification. We cannot understand the world using the world itself, in the same way that the best map is not a 1:1 scale map, as Borges wrote (1946, On Exactitude in Science):

"…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography."

Thus, DL cannot follow a different information-processing procedure, one completely different from those run by humans. Like any other epistemic activity, DL must include different levels of uncertainty if we want to use it [26]. Uncertainty is a reality for any cognitive system, and consequently DL must be prepared to deal with it. Computer vision is a clear example of that set of problems [27]. Kendall and Gal have even coined new concepts to introduce uncertainty into DL: homoscedastic and heteroscedastic uncertainties (both aleatoric) [28]. The way such uncertainties are integrated can determine the epistemic model (which is a real cognitive algorithmic extension of ourselves). For example, the Bayesian approach provides an efficient way to avoid overfitting, allows working with multi-modal data, and makes it possible to use these models in real-time scenarios (as compared to Monte Carlo approaches) [29]; even better, some authors are envisioning Bayesian deep learning [30]. Dimensionality is a related question that also has a computational solution, as Yoshua Bengio has been exploring for decades [31]–[33]. In any case, we cannot escape the formal informational paradoxes, well known at the logical and mathematical level since Gödel explained them; they simply re-emerge in this computational scenario, showing that artificial learnability can also be undecidable [34]. Machine learning deals with a rich set of statistical problems, problems that even at the biological level are computed only approximately [35]–[37], heuristics that are now being implemented in machines as well. This open range of possibilities, and the existence of mechanisms like informational selection procedures (induction, deduction, abduction), makes it possible to use DL at a controlled but creative operational level [38].
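To make the uncertainty discussion concrete, here is a minimal sketch (my own illustration, not code from any of the cited works) of Monte Carlo dropout, one common approximation to Bayesian inference in deep networks: dropout is kept active at prediction time, and the spread of repeated stochastic forward passes is read as epistemic uncertainty. It assumes PyTorch; the class and function names are hypothetical.

```python
# A minimal sketch (not from the paper) of Monte Carlo dropout, a common
# proxy for epistemic uncertainty in deep models. Heteroscedastic aleatoric
# uncertainty would instead be predicted as a second output head.
import torch
import torch.nn as nn

class MCDropoutRegressor(nn.Module):
    def __init__(self, in_dim: int = 1, hidden: int = 64, p: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def predict_with_uncertainty(model, x, n_samples: int = 100):
    """Keep dropout active at test time and sample repeatedly.

    The spread of the sampled predictions approximates the model's
    epistemic uncertainty: what it does not know for lack of data.
    """
    model.train()  # leaves dropout switched on, unlike model.eval()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

if __name__ == "__main__":
    model = MCDropoutRegressor()
    x = torch.linspace(-3, 3, 50).unsqueeze(1)
    mean, std = predict_with_uncertainty(model, x)
    # After training, std should grow far from the training data: the
    # model "knows that it does not know", the epistemic point at issue.
    print(mean.shape, std.shape)
```

Kendall and Gal's heteroscedastic aleatoric uncertainty would instead be modeled by an extra output predicting a per-input noise variance; the sampling trick above targets only the model's own ignorance, which is precisely the component that more data can reduce.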
2.2. Deep learning is already running counterfactual approaches.

The second big Pearl criticism of DL concerns its supposed incapacity to integrate counterfactual heuristics. First, we must note that counterfactuals do not guarantee any epistemic model with precision; they merely add some value (or not). From a classic epistemic point of view, counterfactuals do not provide more robust scientific knowledge: a quick look at the last two thousand years of both Western and Eastern sciences supports this view [4]. Going even further, I claim that a counterfactual can block thinking once it is structurally related to a closed domain or paradigm of well-established rules; otherwise it is just fiction or an empty thought experiment. Counterfactuals are a fundamental aspect of human reasoning [39]–[42], and their algorithmic integration is a good idea [43]. But at the same time, due to underdetermination [44]–[46], counterfactual thinking can express completely wrong ideas about reality. DL cannot have an objective ontology that would allow it to be designed as a perfect epistemological tool, both because of the huge complexity of the data involved and because of the necessary situatedness of any cognitive system. Uncertainty would not form part of such counterfactual operability [47], since it would have to be ascribed to some not-well-known but domesticable aspect of reality; nonetheless, some new ideas fit neither the whole set of known facts (the current paradigm) nor the set of new ones. This would leave us in a sterile no man's land, or even block any sound epistemic movement. Yet humans are able to deal with this, and even to go beyond it [48]. Opportunistic blending and creative innovation are among our most valuable cognitive skills [49].
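As a minimal illustration of what "managing counterfactuals" amounts to algorithmically, the following toy structural causal model (my sketch, with invented coefficients, not Pearl's own code) walks through his three-step procedure of abduction, action, and prediction:

```python
# A toy structural causal model (illustrative only) showing Pearl's
# three-step counterfactual procedure: abduction, action, prediction.
# The SCM: X := U_x ;  Y := 2*X + U_y  (coefficients are made up).

def scm_y(x, u_y):
    return 2.0 * x + u_y

# Observed world: we saw X = 1 and Y = 3.5.
x_obs, y_obs = 1.0, 3.5

# 1. Abduction: infer the exogenous noise consistent with the observation.
u_y = y_obs - 2.0 * x_obs          # here U_y = 1.5

# 2. Action: intervene, do(X = 0), replacing the mechanism for X.
x_cf = 0.0

# 3. Prediction: propagate the *same* noise through the modified model.
y_cf = scm_y(x_cf, u_y)

print(f"Factual Y = {y_obs}; counterfactual Y had X been 0: {y_cf}")
# -> 1.5. No amount of pattern-matching on (X, Y) pairs alone yields this
# answer; it needs the structural assumptions written above.
```

The point of the sketch is that the counterfactual answer depends entirely on the structural assumptions supplied by humans, which is exactly why counterfactual machinery inherits, rather than escapes, the underdetermination discussed above.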
2.3. DL is not Magic Algorithmic Thinking (MAT).

Our third and last analysis of DL characteristics concerns its explainability. Despite the evidence that the causal debate is beyond any possible resolution provided by DL, because it belongs to ontological perspectives that require a different, holistic analysis, it is clear that the results provided by DL must be not only coherent but also explainable; otherwise we would be facing a new algorithmic form of magic thinking. For the same reasons that DL cannot be merely a complex way of curve fitting, it cannot become a fuzzy domain beyond human understanding. Some attempts are being made to prevent this, most of them run by DARPA: Big Mechanisms [50] or XAI (eXplainable Artificial Intelligence) [51], [52].

[Figure 1. Explainability in DL. Figure 2. XAI, image from the DARPA website.]

Again, these approaches answer a request: how to adapt new epistemic tools to our cognitive performative thresholds and characteristics. There is no bigger revolution, from a conceptual perspective, than those that happened during the Renaissance with the use of telescopes and microscopes. DL systems are not running by themselves, interacting with the world, automatically selecting the informational events to be studied, or evaluating them against some whole universal paradigm of semantic values. Neither do humans. Singularity debates are useful for exploring possible conceptual frameworks and must be held [53]–[55], but at the same time they cannot become fallacious fatalist arguments against current knowledge. Today, DL is a tool used by experts to map new connections between sets of data. Epistemology is not an automated process, despite minor and naïve attempts to make it one [56], [57]. Knowledge is a complex set of explanations related to different systems, integrated dynamically by networks of epistemic (still human) agents working with AI tools. Machines could postulate their own models, true, but the mechanisms to verify or refine them would be no different from those previously used by humans: data do not express some pure nature by themselves, but offer different system properties that need to be classified in order to obtain knowledge. And this obtaining is, in a way, a creation based on the epistemic and bodily situatedness of the system.

3. Extending bad and/or good human cognitive skills through DL.

It is beyond any doubt that DL is contributing to improving knowledge in several areas, some of them very difficult to interpret because of the nature of the obtained data, like neuroscience [58]. These advances are expanding the frontiers of verifiable knowledge beyond classic human standards. But even in that sense, they are still explainable. In any case, it is humans who feed DL systems with scientific goals, provide data (from which to learn patterns), and define quantitative metrics (in order to know how close the system is getting to success). At the same time, are we sure that it is not precisely our biased way of dealing with cognitive processes that allows us to be creative? For this reason, some attempts to reintroduce human biased reasoning into machine learning are being explored [59]. This re-biasing [60] even extends to replicating emotion-like reasoning mechanisms [61], [62].
My suggestion is that, after great achievements following classic formal algorithmic approaches, it is now time for DL practitioners to expand their horizons by looking into the great power of cognitive biases. For example, machine learning models with human cognitive biases are already capable of learning from small and biased datasets [63]. This process recalls the role of Student's t-test in relation to frequentist ideas, which always demanded large sets of data until the t-test was created, but now in the context of machine learning. In [63] the authors developed a method to reduce the inferential gap between human beings and machines by utilizing cognitive biases. They implemented a human cognitive model in machine learning algorithms and compared its performance with the currently most popular methods: naïve Bayes, support vector machines, neural networks, logistic regression, and random forests (the shape of that comparison is sketched below). This could even make one-shot learning systems possible [64].
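A rough sketch of that experimental setup, assuming scikit-learn (it reproduces only the baselines and the small-and-biased data condition from [63]; their actual cognitive-bias model is not reimplemented here):

```python
# A sketch in the spirit of [63]: train the standard baselines they
# benchmarked against on a small, class-imbalanced sample ("biased data")
# and measure performance. Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# A deliberately small and class-imbalanced training set.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=60, stratify=y, random_state=0)  # only 60 examples

models = {
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:>20}: test accuracy = {model.score(X_test, y_test):.3f}")
```

With only sixty imbalanced training examples the standard baselines typically degrade, and this is exactly the regime in which [63] report their cognitively biased model holding up.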
Approximate computing can boost the potential of DL, reducing the computational power required by such systems as well as adding new heuristic approaches to information analysis [35], [65]–[67]. Finally, a completely different but also important type of problem is how to reduce the biased datasets or heuristics we feed our DL systems [68], as well as how to control the biases that keep us from interpreting DL results properly [69]. Obviously, if there is any malicious value attached to such a bias, it must also be controlled [70].

4. Causality in DL: the epidemiological case study

Several attempts have been made to implement causal models in DL, like [71] and its Structural Causal Model (SCM) ("as an abstraction over a specific aspect of the CNN. We also formulate a method to quantitatively rank the filters of a convolution layer according to their counterfactual importance"), or the Temporal Causal Discovery Framework (TCDF, "a deep learning framework that learns a causal graph structure by discovering causal relationships in observational time series data") by [72]. But my attempt here is twofold: (1) first, to consider the value of "causal data" for epistemic decisions in epidemiology; and (2) second, to look at how DL could fit, or not, with those causal claims within the epidemiological field.

4.1. Does causality affect epidemiological debates at all?

According to the reference handbook of the field [73], MacMahon and Pugh [74] created one of the most frequently used definitions of epidemiology: "Epidemiology is the study of the distribution and determinants of disease frequency in man". Note the absence of the term 'causality' and, instead, the use of 'determinant'. This is the result of the classic prejudices of Hill in his 1965 paper:

"I have no wish, nor the skill, to embark upon philosophical discussion of the meaning of 'causation'. The 'cause' of illness may be immediate and direct; it may be remote and indirect underlying the observed association. But with the aims of occupational, and almost synonymous preventive, medicine in mind the decisive question is whether the frequency of the undesirable event B will be influenced by a change in the environmental feature A. How such a change exerts that influence may call for a great deal of research. However, before deducing 'causation' and taking action we shall not invariably have to sit around awaiting the results of the research. The whole chain may have to be unraveled or a few links may suffice. It will depend upon circumstances."

After this philosophical epistemic positioning, Hill enumerated his nine general qualitative association factors, commonly called "Hill's criteria" or even, which is frankly sardonic, "Hill's Criteria of Causation". Because of such epistemic reluctance, epidemiologists abandoned the term "causation" and embraced other terms like "determinant" [75], "determining conditions" [76], or "active agents of change" [77]. For that reason, recent research has called for a pluralistic approach to such complex analyses [78]. As a consequence, we can see that even in a very narrow, specialized field like epidemiology the meaning of cause is somewhat fuzzy. Once medical evidence showed that causality was not always mono-causality [22], [79] but, instead, the result of the sum of several causes/factors/determinants, the necessity of clarifying multi-causality emerged as a first-line epistemic problem. It was described as a "web of causation" [80]. Debates about the logic of causation, including some Popperian interpretations, went on for two decades [81], [82]. Pearl himself provided a graphic way to adapt human visual cognitive skills to this new epidemiological multi-causal reasoning [83], as did do-calculus [84], [85], and directed acyclic graphs (DAGs) are becoming a fundamental tool [86], [87]. DAGs are commonly related to randomized controlled trials (RCTs) for assessing causality. But RCTs are not a gold standard beyond all criticism [88], [89], because, as [90] put it, RCTs are often flawed, mostly useless, yet clearly indispensable (it is not so uncommon that the same author argues against the classic p-value, suggesting a new 0.005 threshold [91]). Krauss has even defended the impossibility of running RCTs without biases [92], although some authors argue that DAGs can reduce RCT biases [93].

But there is a real case that can show us the weight of causality in actual scientific debates: the controversy over the relation between smoking and lung cancer. As early as 1950, the causal connections between smoking and lung cancer were explained [94]. But far from being accepted, these results were contested by the tobacco industry with counterarguments dressed up as experimental science. Perhaps the most famous generator of silly counterarguments was R. A. Fisher, the most important frequentist researcher of the 20th century. In 1958 he published a paper in Nature [95] in which he affirmed that all connections between tobacco smoking and lung cancer were due to a false correlation. Even more: from the same data it could be inferred that "smoking cigarettes was a cause of considerable prophylactic value in preventing the disease, for the practice of inhaling is rarer among patients with cancer of the lung than with others" (p. 596). That same year he published similarly silly claims in the same highly rated journal [96]. He even affirmed that Hill was trying to plant fear in good citizens by means of propaganda, thereby straying misleadingly into overconfidence. The point is: did Fisher have real epistemic reasons for not accepting the huge amount of existing causal evidence against tobacco smoking? No. And we are not affirming the consequent after collecting more data that was unavailable during Fisher's lifetime: he had strong causal evidence, but he did not want to accept it. Still today, there is evidence showing how causal claims are biased by field interests, again with tobacco or the new e-cigarettes [97]–[99].
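Fisher's alternative was precisely a confounding story: some hidden "constitution" causing both the urge to smoke and the cancer. A toy simulation (my illustration, with made-up numbers) shows how a DAG-style backdoor adjustment separates the two readings:

```python
# A toy simulation (illustrative, with made-up numbers) of Fisher's
# "constitutional hypothesis": a hidden common cause C (e.g. a genetic
# disposition) raising both smoking and disease rates. The DAG is
# C -> S, C -> D, S -> D. Adjusting for C (the backdoor criterion)
# recovers the causal effect that the crude association overstates.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

c = rng.random(n) < 0.3                      # hidden disposition
s = rng.random(n) < np.where(c, 0.7, 0.2)    # C raises the smoking rate
p_d = 0.02 + 0.10 * s + 0.05 * c             # S and C both raise disease risk
d = rng.random(n) < p_d

# Crude (confounded) risk difference:
crude = d[s].mean() - d[~s].mean()

# Backdoor adjustment: stratify on C, then average over its distribution.
adjusted = sum(
    (d[s & (c == k)].mean() - d[~s & (c == k)].mean()) * (c == k).mean()
    for k in (True, False)
)
print(f"crude risk difference:    {crude:.3f}")    # inflated by C
print(f"adjusted risk difference: {adjusted:.3f}")  # ~0.10, the simulated effect
```

The crude risk difference is inflated by the hidden disposition, while stratifying on it recovers the simulated effect of roughly 0.10. A constitutional hypothesis can only be tested against structural assumptions plus data, never against data alone, which is why Fisher could hide behind "mere correlation" for so long.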
As a section conclusion, it can be affirmed that causality has strong specialized meanings and can be studied with a broad range of conceptual tools. The real example of the tobacco controversies offers a long-running illustration of this.

4.2. Can DL be of some utility for the epidemiological debates on causality?

The second part of my argumentation tries to elucidate whether DL can be useful for resolving debates about causality in epidemiological controversies. The answer is easy and clear: yes. But it is directly tied to a specific idea of causality, as well as of demonstration. For example, a machine learning approach has been proposed to enable evidence-based oncology practice [100]. Thus, digital epidemiology is a robust update of previous epidemiological studies [101], [102]. The possibility of finding new causal patterns using bigger sets of data is surely the greatest advantage of using DL for epidemiological purposes [103]. Besides, such data are the result of integrating multimodal sources, like visual data combined with classic informational sources [104], and the future, with more and more data-capture devices, could integrate smell, taste, the movements of agents, and so on. Deep convolutional neural networks can help us, for example, to estimate environmental exposures using images and other complementary data sources such as cell phone mobility and social media information (a schematic example follows below). Combining fields such as computer vision and natural language processing, DL can provide a way to explore new interactions that are still opaque to us [105], [106].

Despite the possible benefits, it is also true that the use of DL in epidemiological analysis carries a dangerous potential for unethical use, as well as formal problems [107], [108]. But again, it will be the evaluation by the expert agents involved that settles whether such difficulties are things to be solved or huge obstacles to the advancement of the field.
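As a schematic example of the multimodal idea just mentioned (my sketch, assuming PyTorch, not a published model): a small CNN encodes an image, an MLP encodes tabular covariates such as mobility features, and the fused representation regresses an exposure estimate.

```python
# A schematic two-branch network (illustrative only): an image branch
# (e.g. a street-level photo) fused with a tabular branch (e.g. cell
# phone mobility covariates) to regress an environmental exposure.
import torch
import torch.nn as nn

class ExposureNet(nn.Module):
    def __init__(self, n_tabular: int = 8):
        super().__init__()
        self.cnn = nn.Sequential(                 # image branch
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mlp = nn.Sequential(                 # tabular branch
            nn.Linear(n_tabular, 32), nn.ReLU(),
        )
        self.head = nn.Linear(32 + 32, 1)         # fused regression head

    def forward(self, image, tabular):
        fused = torch.cat([self.cnn(image), self.mlp(tabular)], dim=1)
        return self.head(fused)

if __name__ == "__main__":
    model = ExposureNet()
    image = torch.randn(4, 3, 64, 64)    # batch of 4 RGB images
    tabular = torch.randn(4, 8)          # 8 covariates per observation
    print(model(image, tabular).shape)   # -> torch.Size([4, 1])
```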
5. Conclusion: causal evidence is not a result, but a process.

The author has made an overall reply to the main criticisms of deep learning (and machine learning) as a reliable epistemic tool. The basic arguments of Judea Pearl have been criticized using real examples of DL, but also through a more general epistemic and philosophical analysis. The systemic nature of knowledge, also situated and even biased, has been identified as the fundamental aspect of a new algorithmic era for the advance of knowledge using DL tools. If formal systems have structural dead-ends like incompleteness, the bioinspired path to machine learning and DL becomes a reliable way [109], [110] to improve, once more, our algorithmic approach to nature. Finally, thanks to the short case study of the epidemiological debates on causality and their use of DL tools, we have seen a real implementation of this epistemic mechanism. The advantages of DL for multi-causal analysis using multi-modal data have been explored, as well as some possible criticisms.

Funding: This work has been funded by the Ministry of Science, Innovation and Universities within the State Subprogram of Knowledge Generation through the research project FFI2017-85711-P, "Epistemic innovation: the case of cognitive sciences". This work is also part of the consolidated research network "Grup d'Estudis Humanístics de Ciència i Tecnologia" (GEHUCT) ("Humanistic Studies of Science and Technology Research Group"), recognised and funded by the Generalitat de Catalunya, reference 2017 SGR 568.

Acknowledgments: I thank Mr. Isard Boix for his support all throughout this research. Best moments are those without words, and sometimes this lack of meaningfulness entails unique meanings.

Conflicts of Interest: The author declares no conflict of interest.

References

[1] J. W. Heisig, Philosophers of Nothingness: An Essay on the Kyoto School. University of Hawai'i Press, 2001.
[2] J. Vallverdú, "The Situated Nature of Informational Ontologies," in Philosophy and Methodology of Information, World Scientific, 2019, pp. 353–365.
[3] M. J. Schroeder and J. Vallverdú, "Situated phenomenology and biological systems: Eastern and Western synthesis," Prog. Biophys. Mol. Biol., vol. 119, no. 3, pp. 530–537, Dec. 2015.
[4] J. Vallverdú and M. J. Schroeder, "Lessons from culturally contrasted alternative methods of inquiry and styles of comprehension for the new foundations in the study of life," Prog. Biophys. Mol. Biol., 2017.
[5] A. Carstensen, J. Zhang, G. D. Heyman, G. Fu, K. Lee, and C. M. Walker, "Context shapes early diversity in abstract thought," Proc. Natl. Acad. Sci., pp. 1–6, Jun. 2019.
[6] A. Norenzayan and R. E. Nisbett, "Culture and causal cognition," Curr. Dir. Psychol. Sci., vol. 9, no. 4, pp. 132–135, 2000.
[7] R. E. Nisbett, The Geography of Thought: How Asians and Westerners Think Differently... and Why. New York: Free Press (Simon & Schuster, Inc.), 2003.
[8] J. Vallverdú, Bayesians versus Frequentists: A Philosophical Debate on Statistical Reasoning. Springer, 2016.
[9] D. Casacuberta and J. Vallverdú, "E-science and the data deluge," Philos. Psychol., vol. 27, no. 1, pp. 126–140, 2014.
[10] J. Vallverdú Segura, "Computational epistemology and e-science: A new way of thinking," Minds Mach., vol. 19, no. 4, pp. 557–567, 2009.
[11] J. Pearl, "Theoretical Impediments to Machine Learning," arXiv preprint, 2018.
[12] J. Pearl and D. Mackenzie, The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
[13] S. Hillary and S. Joshua, "Garbage in, garbage out (How purportedly great ML models can be screwed up by bad data)," in Proceedings of Blackhat 2017, 2017.
[14] I. Askira Gelman, "GIGO or not GIGO," J. Data Inf. Qual., 2011.
[15] J. Moural, "The Chinese room argument," in John Searle, 2003.
[16] H. L. Dreyfus, What Computers Can't Do: A Critique of Artificial Reason, 1972.
[17] H. L. Dreyfus, S. E. Dreyfus, and L. A. Zadeh, "Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer," IEEE Expert, vol. 2, no. 2, pp. 237–264, 1987.
[18] W. N. Price, "Big data and black-box medical algorithms," Sci. Transl. Med., 2018.
[19] J. Vallverdú, "Patenting logic, mathematics or logarithms? The case of computer-assisted proofs," Recent Patents Comput. Sci., vol. 4, no. 1, pp. 66–70, 2011.
[20] F. Gagliardi, "The necessity of machine learning and epistemology in the development of categorization theories: A case study in prototype-exemplar debate," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009.
[21] T. Everitt, R. Kumar, V. Krakovna, and S. Legg, "Modeling AGI Safety Frameworks with Causal Influence Diagrams," Jun. 2019.
[22] M. Susser and E. Susser, "Choosing a future for epidemiology: II. From black box to Chinese boxes and eco-epidemiology," Am. J. Public Health, vol. 86, no. 5, pp. 674–677, 1996.
[23] A. Morabia, "Hume, Mill, Hill, and the sui generis epidemiologic approach to causal inference," Am. J. Epidemiol., vol. 178, no. 10, pp. 1526–1532, Nov. 2013.
[24] T. C. Hales, "Historical overview of the Kepler conjecture," Discret. Comput. Geom., vol. 36, no. 1, pp. 5–20, 2006.
[25] N. Robertson, D. P. Sanders, P. D. Seymour, and R. Thomas, "The Four-Colour Theorem," J. Comb. Theory, Ser. B, vol. 70, pp. 2–44, 1997.
[26] Y. Gal, "Uncertainty in Deep Learning," PhD thesis, 2017.
[27] A. G. Kendall, "Geometry and Uncertainty in Deep Learning for Computer Vision," 2017.
[28] A. Kendall and Y. Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?," Mar. 2017.
[29] J. Piironen and A. Vehtari, "Comparison of Bayesian predictive methods for model selection," Stat. Comput., 2017.
[30] N. G. Polson and V. Sokolov, "Deep learning: A Bayesian perspective," Bayesian Anal., 2017.
[31] Y. Bengio and Y. LeCun, "Scaling Learning Algorithms towards AI," in Large-Scale Kernel Machines, New York, 2007.
[32] J. P. Cunningham and B. M. Yu, "Dimensionality reduction for large-scale neural recordings," Nat. Neurosci., vol. 17, no. 11, pp. 1500–1509, 2014.
[33] S. Bengio and Y. Bengio, "Taking on the curse of dimensionality in joint distributions using neural networks," IEEE Trans. Neural Networks, 2000.
[34] S. Ben-David, P. Hrubeš, S. Moran, A. Shpilka, and A. Yehudayoff, "Learnability can be undecidable," Nat. Mach. Intell., 2019.
[35] C. B. Anagnostopoulos, Y. Ntarladimas, and S. Hadjiefthymiades, "Situational computing: An innovative architecture with imprecise reasoning," J. Syst. Softw., vol. 80, no. 12, pp. 1993–2014, 2007.
[36] K. Friston, "Functional integration and inference in the brain," Prog. Neurobiol., vol. 68, no. 2, pp. 113–143, 2002.
[37] S. Schirra, "Approximate decision algorithms for approximate congruence," Inf. Process. Lett., vol. 43, no. 1, pp. 29–34, 1992.
[38] L. Magnani, "AlphaGo, Locked Strategies, and Eco-Cognitive Openness," Philosophies, 2019.
[39] A. A. Baird and J. A. Fugelsang, "The emergence of consequential thought: Evidence from neuroscience," Philos. Trans. R. Soc. B Biol. Sci., 2004.
[40] T. Gomila, Verbal Minds: Language and the Architecture of Cognition. Elsevier Science, 2011.
[41] J. Y. Halpern, Reasoning About Uncertainty, 2003.
[42] N. Van Hoeck, "Cognitive neuroscience of human counterfactual reasoning," Front. Hum. Neurosci., 2015.
[43] J. Pearl, "The algorithmization of counterfactuals," Ann. Math. Artif. Intell., 2011.
[44] D. Tulodziecki, "Underdetermination," in The Routledge Handbook of Scientific Realism, 2017.
[45] L. Laudan, "Demystifying Underdetermination," in Philosophy of Science: The Central Issues, 1990.
[46] D. Turner, "Local Underdetermination in Historical Science," Philos. Sci., 2005.
[47] D. Lewis, "Counterfactual Dependence and Time's Arrow," Noûs, 2006.
[48] M. Ramachandran, "A counterfactual analysis of causation," Mind, 2004.
[49] J. Vallverdú and V. C. Müller, Blended Cognition: The Robotic Challenge. Springer, 2019.
[50] A. Rzhetsky, "The Big Mechanism program: Changing how science is done," in CEUR Workshop Proceedings, 2016.
[51] A. Wodecki et al., "Explainable Artificial Intelligence (XAI): The Need for Explainable AI," 2017.
[52] T. Ha, S. Lee, and S. Kim, "Designing Explainability of an Artificial Intelligence System," 2018.
[53] R. Kurzweil, The Singularity Is Near: When Humans Transcend Biology, 2005.
[54] R. V. Yampolskiy, "Leakproofing the Singularity: Artificial Intelligence Confinement Problem," J. Conscious. Stud., vol. 19, no. 1–2, pp. 194–214, 2012.
[55] J. Vallverdú, "The Emotional Nature of Post-Cognitive Singularities," in The Technological Singularity, V. Callaghan, J. Miller, R. Yampolskiy, and S. Armstrong, Eds. Springer Berlin Heidelberg, 2017, pp. 193–208.
[56] R. D. King et al., "The automation of science," Science, 2009.
[57] A. Sparkes et al., "Towards Robot Scientists for autonomous scientific discovery," Automated Experimentation, 2010.
[58] A. Iqbal, R. Khan, and T. Karayannis, "Developing a brain atlas through deep learning," Nat. Mach. Intell., vol. 1, no. 6, pp. 277–287, Jun. 2019.
[59] D. D. Bourgin, J. C. Peterson, D. Reichman, T. L. Griffiths, and S. J. Russell, "Cognitive Model Priors for Predicting Human Decisions."
[60] J. Vallverdú, "Re-embodying cognition with the same 'biases'?," Int. J. Eng. Futur. Technol., vol. 15, no. 1, pp. 23–31, 2018.
[61] A. Leukhin, M. Talanov, J. Vallverdú, and F. Gafarov, "Bio-plausible simulation of three monoamine systems to replicate emotional phenomena in a machine," Biologically Inspired Cognitive Architectures, 2018.
[62] J. Vallverdú, M. Talanov, S. Distefano, M. Mazzara, A. Tchitchigin, and I. Nurgaliev, "A cognitive architecture for the implementation of emotions in computing systems," Biol. Inspired Cogn. Archit., vol. 15, pp. 34–40, 2016.
[63] H. Taniguchi, H. Sato, and T. Shirakawa, "A machine learning model with human cognitive biases capable of learning from small and biased datasets," Sci. Rep., 2018.
[64] B. M. Lake, R. R. Salakhutdinov, and J. B. Tenenbaum, "One-shot learning by inverting a compositional causal process," in Advances in Neural Information Processing Systems 26 (NIPS 2013), 2013.
[65] Q. Xu, T. Mytkowicz, and N. S. Kim, "Approximate Computing: A Survey," IEEE Des. Test, vol. 33, no. 1, pp. 8–22, 2016.
[66] C.-Y. Chen, J. Choi, K. Gopalakrishnan, V. Srinivasan, and S. Venkataramani, "Exploiting approximate computing for deep learning acceleration," in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp. 821–826.
[67] J. Choi and S. Venkataramani, "Approximate Computing Techniques for Deep Neural Networks," in Approximate Circuits, Cham: Springer International Publishing, 2019, pp. 307–329.
[68] M. A. Gianfrancesco, S. Tamang, J. Yazdany, and G. Schmajuk, "Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data," JAMA Intern. Med., 2018.
[69] T. Kliegr, Š. Bahník, and J. Fürnkranz, "A review of possible effects of cognitive biases on interpretation of rule-based machine learning models," Apr. 2018.
[70] B. Shneiderman, "Opinion: The dangers of faulty, biased, or malicious algorithms requires independent oversight," Proc. Natl. Acad. Sci., 2016.
[71] T. Narendra, A. Sankaran, D. Vijaykeerthy, and S. Mani, "Explaining Deep Learning Models using Causal Inference," Nov. 2018.
[72] M. Nauta, D. Bucur, and C. Seifert, "Causal Discovery with Attention-Based Convolutional Neural Networks," Mach. Learn. Knowl. Extr., vol. 1, no. 1, pp. 312–340, Jan. 2019.
[73] W. Ahrens and I. Pigeot, Handbook of Epidemiology, 2nd ed., 2014.
[74] B. MacMahon and T. F. Pugh, Epidemiology: Principles and Methods. Little, Brown, 1970.
[75] R. Lipton and T. Ødegaard, "Causal thinking and causal language in epidemiology: it's in the details," Epidemiol. Perspect. Innov., vol. 2, p. 8, 2005.
[76] P. Vineis and D. Kriebel, "Causal models in epidemiology: Past inheritance and genetic future," Environmental Health: A Global Access Science Source, 2006.
[77] J. S. Kaufman and C. Poole, "Looking Back on 'Causal Thinking in the Health Sciences'," Annu. Rev. Public Health, 2002.
[78] J. P. Vandenbroucke, A. Broadbent, and N. Pearce, "Causality and causal inference in epidemiology: The need for a pluralistic approach," Int. J. Epidemiol., vol. 45, no. 6, pp. 1776–1786, 2016.
[79] M. Susser, Causal Thinking in the Health Sciences: Concepts and Strategies of Epidemiology, 1973.
[80] N. Krieger, "Epidemiology and the web of causation: Has anyone seen the spider?," Soc. Sci. Med., vol. 39, no. 7, pp. 887–903, 1994.
[81] M. Susser, "What is a cause and how do we know one? A grammar for pragmatic epidemiology," Am. J. Epidemiol., 1991.
[82] C. Buck, "Popper's philosophy for epidemiologists," Int. J. Epidemiol., 1975.
[83] S. Greenland, J. Pearl, and J. M. Robins, "Causal diagrams for epidemiologic research," Epidemiology, 1999.
[84] D. Gillies, "Judea Pearl, Causality: Models, Reasoning, and Inference, Cambridge: Cambridge University Press, 2000," Br. J. Philos. Sci., 2001.
[85] R. R. Tucci, "Introduction to Judea Pearl's Do-Calculus," Apr. 2013.
[86] S. Greenland, J. Pearl, and J. M. Robins, "Causal diagrams for epidemiologic research," Epidemiology, vol. 10, no. 1, pp. 37–48, 1999.
[87] T. J. VanderWeele and J. M. Robins, "Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect," Am. J. Epidemiol., 2007.
[88] C. Boudreau, E. C. Guinan, K. R. Lakhani, and C. Riedl, "The Novelty Paradox & Bias for Normal Science: Evidence from Randomized Medical Grant Proposal Evaluations," 2016.
[89] R. M. Daniel, B. L. De Stavola, and S. Vansteelandt, "Commentary: The formal approach to quantitative causal inference in epidemiology: Misguided or misrepresented?," Int. J. Epidemiol., vol. 45, no. 6, pp. 1817–1829, 2016.
[90] J. P. A. Ioannidis, "Randomized controlled trials: Often flawed, mostly useless, clearly indispensable: A commentary on Deaton and Cartwright," Soc. Sci. Med., vol. 210, pp. 53–56, Aug. 2018.
[91] J. P. A. Ioannidis, "The proposal to lower P value thresholds to .005," JAMA, 2018.
[92] A. Krauss, "Why all randomised controlled trials produce biased results," Ann. Med., vol. 50, no. 4, pp. 312–322, May 2018.
[93] I. Shrier and R. W. Platt, "Reducing bias through directed acyclic graphs," BMC Med. Res. Methodol., 2008.
[94] R. Doll and A. B. Hill, "Smoking and Carcinoma of the Lung," Br. Med. J., vol. 2, no. 4682, pp. 739–748, Sep. 1950.
[95] R. A. Fisher, "Cancer and Smoking," Nature, vol. 182, no. 4635, p. 596, Aug. 1958.
[96] R. A. Fisher, "Lung Cancer and Cigarettes?," Nature, vol. 182, no. 4628, p. 108, Jul. 1958.
[97] B. Wynne, "When doubt becomes a weapon," Nature, vol. 466, no. 7305, pp. 441–442, 2010.
[98] C. Pisinger, N. Godtfredsen, and A. M. Bender, "A conflict of interest is strongly associated with tobacco industry–favourable results, indicating no harm of e-cigarettes," Prev. Med., vol. 119, pp. 124–131, Feb. 2019.
[99] T. Grüning, A. B. Gilmore, and M. McKee, "Tobacco industry influence on science and scientists in Germany," Am. J. Public Health, vol. 96, no. 1, pp. 20–32, Jan. 2006.
[100] N. Ramarajan, R. A. Badwe, P. Perry, G. Srivastava, N. S. Nair, and S. Gupta, "A machine learning approach to enable evidence based oncology practice: Ranking grade and applicability of RCTs to individual patients," J. Clin. Oncol., vol. 34, no. 15_suppl, pp. e18165–e18165, May 2016.
[101] M. Salathé, "Digital epidemiology: what is it, and where is it going?," Life Sci. Soc. Policy, vol. 14, no. 1, p. 1, Dec. 2018.
[102] E. Velasco, "Disease detection, epidemiology and outbreak response: the digital future of public health practice," Life Sci. Soc. Policy, vol. 14, no. 1, p. 7, Dec. 2018.
[103] C. Bellinger, M. S. Mohomed Jabbar, O. Zaïane, and A. Osornio-Vargas, "A systematic review of data mining and machine learning for air pollution epidemiology," BMC Public Health, 2017.
[104] S. Weichenthal, M. Hatzopoulou, and M. Brauer, "A picture tells a thousand…exposures: Opportunities and challenges of deep learning image analyses in exposure science and environmental epidemiology," Environ. Int., 2019.
[105] G. Eraslan, Ž. Avsec, J. Gagneur, and F. J. Theis, "Deep learning: new computational modelling techniques for genomics," Nat. Rev. Genet., vol. 20, no. 7, pp. 389–403, Jul. 2019.
[106] M. J. Cardoso et al., Eds., Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, vol. 10553. Cham: Springer International Publishing, 2017.
[107] C. Kreatsoulas and S. V. Subramanian, "Machine learning in social epidemiology: Learning from experience," SSM - Popul. Health, vol. 4, pp. 347–349, Apr. 2018.
[108] "In the age of machine learning randomized controlled trials are unethical." [Online]. Available: https://towardsdatascience.com/in-the-age-of-machine-learning-randomized-controlled-trials-are-unethical-74acc05724af. [Accessed: 03-Jul-2019].
[109] T. F. Drumond, T. Viéville, and F. Alexandre, "Bio-inspired Analysis of Deep Learning on Not-So-Big Data Using Data-Prototypes," Front. Comput. Neurosci., vol. 12, p. 100, Jan. 2019.
[110] K. Charalampous and A. Gasteratos, "Bio-inspired deep learning model for object recognition," in 2013 IEEE International Conference on Imaging Systems and Techniques (IST), 2013, pp. 51–55.