How to Find Productive Causes in Big Data: An Information Transmission Account
It has been argued that the use of big data in scientific research obviates the need for causal knowledge in making sound predictions and interventions. Whilst few accept this claim, there is an ongoing discussion about what effect, if any, big data has on scientific methodology and, in particular, on the search for causes. One response has been to show that the automated analysis of big data by a computer program can find causes in addition to mere correlations. So far, however, this has been demonstrated only for difference-making causes. Yet it is widely acknowledged that scientists need evidence of both “difference-making” and “production” in order to infer a genuine causal link. This paper fills that gap by outlining how computer-assisted discovery in big data can find productive causes. It does so by developing an inference rule based on a little-known causal process theory called the information transmission account.