Event Report | Evidence: An Interdisciplinary Conversation about Knowing and Certainty

April 21-22, 2017, Jerome Greene Hall 103, Columbia Law School

EVIDENCE began with an introduction and insights from the conference directors: Pamela Smith (Seth Low Professor of History, Columbia University), Stuart Firestein (Professor of Biological Sciences, Columbia University), and Jeremy Kessler (Associate Professor of Law, Columbia University). Professor Smith outlined the key research questions and goals that guided the conference organization before ushering in the first session.

Session 1:   Reproducibility
Session 2:   Regulatory Policy
Session 3:   Humanities
Session 4:   Federal Funding
Session 5:   Medicine and Public Health
Session 6:   Journalism
Keynote:     Paradox of Evidence
Session 7:   Philosophy
Session 8:   Law 
Session 9:   Economics
Session 10: History and Social Sciences
Session 11: Big Data
Keynote:     Expert Evidence


Session 1: Reproducibility

Veronica Vieland (Pediatrics and Statistics, Ohio State University), Niall Bolger (Psychology, Columbia University), and Shai Silberberg (Extramural Research Program, National Institute of Neurological Disorders and Stroke at the National Institutes of Health) served as speakers during the first session, moderated by chair Hasok Chang (History and Philosophy of Science, Cambridge University) and joined by panelists Jeremy Kessler (Law, Columbia University) and Jennifer Manly (Neurology, Columbia University). Professor Vieland opened by highlighting the importance of reproducibility within research, but noted that the disconnect between the approaches of scientists and statisticians can create difficulties. Her talk focused mainly on ascertainment bias and how it can obscure the underlying distribution of the data. In her closing remarks, Professor Vieland emphasized that the connection between evidence and reproducibility is tangential and needs to be handled with care. Niall Bolger then traced the role of statistical evidence through social psychology, which has been transformed within the last five years. With every published paper, authors are now expected to include their data and computer code to facilitate secondary analysis and cross-checking. While psychology was quick to adopt statistical evidence, the discipline continues to struggle with its interpretation. By emphasizing deviation from the mean, statistical evidence can reduce populations to an average without taking individual experience into account. To illustrate this point, Niall Bolger described one of his own experiments, which he recreated a few days later and which revealed substantial individual variability. The final speaker of the morning session, Shai Silberberg, informed the audience that he preferred not to use the term “reproducibility,” because it masks the various reasons why certain experiments cannot be replicated. Instead, he explained why “rigor and transparency” in scientific research should take precedence over reproducibility. Addressing the speakers, panelists Jeremy Kessler and Jennifer Manly raised important questions about reproducibility as a reliable form of evidence. The panel closed with a discussion about data sharing and its best practices, in which participants weighed the costs and benefits of allowing open access to raw evidence.


Session 2: Regulatory Policy

This session explored models for regulating scientific standards of evidence within the governmental paradigm. Jennifer Mnookin (Law, University of California at Los Angeles) chaired the session, while Wendy Wagner (Law, University of Texas at Austin), David Adelman (Law, University of Texas at Austin), and Naomi Schrag (Office of the Executive Vice President for Research, Columbia University) spoke alongside panel members Paul Appelbaum (Psychiatry and Law, Columbia University) and Frances Champagne (Psychology, Columbia University). Analyzing five decades of standards for scientific evidence in United States regulatory policy, Wendy Wagner noted a shift from internal standards guided by scientists to external standards influenced by non-scientists. This shift has propagated the notion that scientists are biased and need externally imposed standards to ensure transparency. Focusing on the environmental domain of regulatory policy, David Adelman concluded that regulating science as a coherent entity has its own challenges. Professor Adelman attributed these difficulties to the enormous variability in the work, the limitations and cost of data collection, and gaps in scientific knowledge. The final speaker of the second session, Naomi Schrag, focused on research misconduct and offered an example of an alleged fabrication of data in a Ph.D. thesis, which highlighted the importance of including raw data in research projects. The session concluded with remarks from panelists Frances Champagne and Paul Appelbaum. Deliberating on regulatory practices related to environmental health, Frances Champagne questioned the existing standards and their interpretation by regulators and policy makers. Assessing the impact of legal regulation on mental health and health in general, Paul Appelbaum added that evidence should inform policy and regulation, but that this is not always the case: political considerations imposed on the regulatory process will almost always supersede evidence.


Session 3: Humanities

Within this session, speakers Barbara Shapiro (Rhetoric, University of California at Berkeley, Emerita) and Jenny Davidson (English and Comparative Literature, Columbia University) were joined by panelists Stuart Firestein (Biology, Columbia University) and Jeffrey Fagan (Law, Columbia University), and chair Nick Lemann (Journalism, Columbia University). Barbara Shapiro pointed to the way in which early natural-historical investigations borrowed from European law in their examination and categorization of evidence (especially witness statements), while Jenny Davidson suggested the two fields shared a common language in part because the same people often practiced across them in the seventeenth, eighteenth, and early nineteenth centuries. Both the humanities and the sciences seek to create “understanding” (rather than merely data, information, or knowledge), and therefore require scholars to narrativize their work. Stuart Firestein showed how witness statements are often doubted by neuroscientists because the brain tries to construct plausible narratives to connect disjointed events and actions within environments. Jeffrey Fagan provided a legal perspective on the issue of evidence in the humanities. In particular, he introduced the pros and cons of the narrative evidence widely used in the field of history and deliberated on its use and application today.


Session 4: Federal Funding

Matt Connelly (History, Columbia University) chaired the session on federal funding alongside speakers Frances Champagne (Psychology, Columbia University), Stuart Firestein (Biology, Columbia University), and Shai Silberberg (Extramural Research Program, National Institute of Neurological Disorders and Stroke at the National Institutes of Health), and panelists Niall Bolger (Psychology, Columbia University) and Kristen Underhill (Law, Columbia University). Frances Champagne explained how research funding proposals are evaluated on five factors: significance (the importance and impact of the work), innovation (the creativity of the proposed approach), approach (the crux of the proposal: the experimental design), investigators (and their previous record of funding and publication), and environment (the prior reputation of the institution and lab, including available equipment). Addressing the intricate nature of federal funding and research evaluation, Stuart Firestein argued that universities generally place more emphasis on the funding received than on the knowledge gained from research projects, highlighting a problem with the current funding system. As the final speaker of the session, Shai Silberberg echoed his earlier comments about the importance of rigor and transparency in scientific research. In his closing statement, he outlined the correlation between rushed research designs and the expectation to publish influential, high-impact papers within the first few years of receiving grant funding. Panelists Niall Bolger and Kristen Underhill joined the conversation with their questions and comments about evidence in federal funding. Niall Bolger argued that the “all-or-nothing” system of federal funding should be replaced with a model that distributes the overall available funding across more research projects and includes funding for administrative support. Kristen Underhill concluded the discussion with comments about the transparency of NIH proposals, and raised questions about how the priorities of scientists compare with the priorities of the general public in future NIH-funded research.


Session 5: Medicine and Public Health

Speakers Paul Appelbaum (Psychiatry and Law, Columbia University) and Kristen Underhill (Law, Columbia University), chair Pamela Smith (History, Columbia University), and panelists Kavita Sivaramakrishnan (Sociomedical Sciences, Columbia University) and Wendy Wagner (Law, University of Texas at Austin) participated in the fifth session of the two-day conference. Paul Appelbaum discussed the role of evidence as a validating criterion for updating and adapting the Diagnostic and Statistical Manual of Mental Disorders (DSM). Currently there are eleven validators, divided into three types, which are used to exercise judgment on additions to the DSM. However, the threshold of sufficient evidence can be subjective, and validators are often in conflict. Kristen Underhill spoke about the use of systematic review methods in public health research and the challenges of devising an aggregate metric for meta-analysis. Using one of her own research projects as an example, she demonstrated the potential difficulties of incorporating quantitative and qualitative metrics into a single study. Kavita Sivaramakrishnan responded with several questions about the roles of translation, history, and trust among the public, collectors of data, and government agencies. Wendy Wagner, meanwhile, focused on the pros and cons of self-policing scientific communities and the potential incentives for keeping the discipline of medicine and public health internally regulated.


Session 6: Journalism

Led by chair Matteo Farinella (Presidential Scholar in Society and Neuroscience, Columbia University), speakers Meehan Crist (Writer-in-Residence, Biology, Columbia University) and Nick Lemann (Journalism and Dean Emeritus, Columbia University) and panelists Jenny Davidson (English and Comparative Literature, Columbia University) and Shai Silberberg (Extramural Research Program, National Institute of Neurological Disorders and Stroke at the National Institutes of Health) discussed the role of evidence in journalism, especially in a twenty-first-century context. Nick Lemann stated that, compared to other fields, the discussion of what counts as evidence is in its infancy in journalism. Journalists must weigh the conflicting priorities of informing the public and generating interest in the story. As the general public increasingly looks to journalism to justify its own views, evidence can become malleable in the fight to increase viewership. Journalists also face time pressure to release stories quickly, often while the evidence is still developing. Panelists Jenny Davidson and Shai Silberberg both questioned how journalists could convey uncertainty while resisting simplification of the narrative. In particular, they addressed the oversimplification of scientific ideas in journalism, and how to maintain the integrity and complexity of an issue given the inherent time constraints on publishing.


Keynote: Paradox of Evidence

Keynote speaker Annie Duke, a graduate of Columbia University and a World Series of Poker champion, unpacked the feedback (evidence) mechanisms in poker and their role in decision making. Poker, which served as a basis for the creation of game theory, provides an excellent environment for gathering data. Learning occurs when there is abundant feedback (evidence) tied closely in time to decisions and actions. By this standard, poker is a fitting environment for learning, as the game offers many opportunities for quick feedback; everyday environments, by contrast, have much longer feedback cycles. The upside of so much evidence is that it becomes apparent very quickly that there is an “up and down equity” to even the smallest executional decision. However, due to the large quantities of quickly gathered data, there is little time to recover emotionally between pieces of evidence or to examine the causal connections deeply. The evidence is also distorted by a self-serving bias that will generally supersede factual analysis. Annie Duke concluded her keynote with examples of strategies used by professional poker players and explained the cognitive process behind each strategy.


Session 7: Philosophy

Chair Stuart Firestein (Biology, Columbia University), speakers Hasok Chang (History and Philosophy of Science, Cambridge University), John Krakauer (Neurology, Johns Hopkins University), and Christia Mercer (Philosophy, Columbia University), and panelists Dan Kahan (Law and Psychology, Yale University) and Veronica Vieland (Pediatrics and Statistics, Ohio State University) explored the challenge of defining evidence in philosophy. Hasok Chang opened the session by maintaining that the evidential value of a body of data depends on its provenance: the same observation, made in different ways, can carry different evidential value for the same hypothesis. John Krakauer offered his perspective on the use of neuroscience to justify cultural and philosophical claims, and Christia Mercer introduced the concept of “direction of fit” and its use as evidence in philosophy. During the discussion, participants debated the meaning of “evidence” and the pros and cons of adopting a clear definition of the term in philosophy. By keeping the terminology ambiguous, the idea of evidence can be adapted to the context, the hypothesis, and the planned use of the results. Hasok Chang favored “cleaning up” the language of evidence, while other participants contemplated alternative definitions. The discussion concluded without a clear consensus on the definition of evidence in philosophy.


Session 8: Law

Speakers Jeffrey Fagan (Law, Columbia University) and Anna Lvovsky (Academic Fellow, Columbia Law School), alongside chair Veronica Vieland (Pediatrics and Statistics, Ohio State University) and panelists Nick Lemann (Journalism, Columbia University), Barbara Shapiro (Rhetoric, University of California at Berkeley, Emerita), and Jeremy Kessler (Law, Columbia University), offered their perspectives on the use of evidence within the courtroom. Jeffrey Fagan opened the second session of the morning with a talk about the use of scientific conventions in law. In particular, Professor Fagan addressed the issue of hypothesis testing in law and the evaluation of evidence with type I and type II errors in mind, arguing that courts should tolerate more type II errors in order to allow fewer type I errors. Anna Lvovsky, who trained as a historian, explained how standards of evidence change in the courtroom and in judicial reasoning. She provided a historical overview of the use of police testimony as evidence in the courtroom, which led to the creation of a “police expertise” that has now permeated many aspects of law. The discussion raised many questions about “expertise” and “evidence” in law. There is a high margin for error in jury trials, as jurors often struggle to understand the law and the standards of evidence as communicated to them by judges and lawyers. The law struggles to adapt scientific hypothesis testing to the courtroom context: there is no legal equivalent to scientific standards of proof, because the facts are always in dispute. Additionally, probable cause has different connotations and standards in different times, places, and contexts. How does the notion of preponderance of the evidence in a courtroom relate to scientific standards of evidence? Currently, jurors are asked to navigate these questions and weigh complex evidence against complicated standards, which can lead to confusion and error in the courtroom.


Session 9: Economics

Chaired by Niall Bolger (Psychology, Columbia University), the session included speakers Alessandra Casella (Economics, Columbia University) and Suresh Naidu (Economics, Columbia University) and panelists Hasok Chang (History and Philosophy of Science, Cambridge University) and Matt Connelly (History, Columbia University), who explored the role of evidence within the field of economics. The disciplines of economics and physics face similar challenges in experimental design and evidence evaluation, both conducted within the confines of controlled conditions. Professor Casella explained that forms of evidence in economics include statistical analysis, the mathematical logic of models, and randomized controlled trials. According to Suresh Naidu, the main issues in collecting and interpreting data under these conditions are measurement and causality. Using the example of the effect of minimum wage on employment (a long-held debate among economists), Suresh Naidu explained that data needed to be limited in order to create control groups. While studying the possible correlation between minimum wage, labor unions, and decreased employment, he used counties located near state borders as a control group for comparing the effects of minimum wage changes. By using smaller, more focused sets of evidence, economists can better control the boundaries of their experiments and studies.


Session 10: History and Social Sciences

Chair John Krakauer (Neurology, Johns Hopkins University) was joined by speakers Kavita Sivaramakrishnan (Sociomedical Sciences, Columbia University) and Zoe Crossland (Anthropology, Columbia University) and panelists David Adelman (Law, University of Texas at Austin) and Pamela Smith (History, Columbia University) to discuss evidence within the social sciences and the discipline of history. Within anthropology, “testimony” is a common metaphor for evidence, used in forensic anthropology within the trope of the “speaking corpse.” These terms substantiate the claim that facts speak for themselves within the discipline, obscuring the role of the anthropologist in evidence interpretation. Additionally, within the social sciences there is a struggle between the particularity and the universality of data: statistical and numerical analysis is equated with scientific rigor, and consequently endowed with more universal application and weight. Zoe Crossland also raised the question of who and what can establish evidentiary validity, mentioning the divide between community-defined metrics and metrics imposed on a community by outside experts. How evidence is defined within a study or a discipline legitimizes certain forms of data, influencing knowledge production and analysis.


Session 11: Big Data

In the final session of EVIDENCE, speakers Matt Connelly (History, Columbia University) and David Madigan (Statistics, Columbia University), chair Suresh Naidu (Economics, Columbia University), and panelists Stuart Firestein (Biology, Columbia University) and Alessandra Casella (Economics, Columbia University) questioned how to select and evaluate qualitative data in historical research. David Madigan noted that generating reliable evidence from large-scale data sources is not an easy task, but is a growing aspect of historical analysis. Major research journals now use these large datasets to look for causality. However, contradictory interpretations of the evidence are very common, with different analysts uncovering completely different results. David Madigan acknowledged that useful analysis and evidence are being generated from these datasets, but that academics are “wildly overstating our confidence in the results.” Big data has earned its place in historical analysis, but its methods of collection and interpretation still require improvement.


Keynote: Expert Evidence

In EVIDENCE’s second keynote event, Jennifer Mnookin (Law, University of California at Los Angeles) explored the mismatch between widely held cultural perceptions of forensic science and its actual epistemological authority. Despite being sold in courtrooms as having a zero error rate, these methods of analyzing evidence are subjective and rarely empirically tested. She described fingerprint analysis as more of a “leap of faith” than a numerical measurement of data. However, despite sustained criticism of the use of such so-called evidence within the legal system, there has been very little success in implementing change. A 2009 report from the National Academy of Sciences (NAS), titled “Strengthening Forensic Science in the United States,” chastised current evidentiary practices but refrained from explicitly offering recommendations to the court system. The report did, however, generate press interest and usher in a new era of debate about what kinds of scientific testing should be used to substantiate the forensic sciences and who should be the validating authority. The current system of forensic science turns courtrooms into adversarial testing spaces for evidence.

© 2018 The Center for Science and Society at Columbia University