forum 9: week of 12 March: Fisher and the design of experiments

Fragment of a discussion from Course talk:Phil440A
Edited by author.
Last edit: 21:45, 17 March 2012

One individual asked me after class about "the black swan problem" and whether the Fisherian way of testing hypotheses relates to it. Before I respond, I should clarify that "the black swan problem" is about falsification of hypotheses as a 'solution' to the problem of induction (at least, that is how I take it). With that in place, my short answer is that Fisher's significance tests do NOT provide a means to falsify hypotheses. Yes, Fisher says that there is a chance of disproving the null hypothesis (and that the null hypothesis can never be proved), but this does NOT (necessarily) mean that the primary objective of significance tests is to falsify the null hypothesis! My longer answer follows, if anyone cares to read it. Deborah Mayo would agree with me that significance tests should NOT be used to falsify hypotheses in the way Popper describes falsification. In fact, I quote verbatim an excerpt from Mayo's 1996 book, Error and the Growth of Experimental Knowledge (page 2):

For Popper, learning is a matter of deductive falsification. In a nutshell, hypothesis H is deductively falsified if H entails experimental outcome O, while in fact the outcome is ~O. What is learned is that H is false. ... We cannot know, however, which of several auxiliary hypotheses is to blame, which needs altering. Often H entails, not a specific observation, but a claim about the probability of an outcome. With such a statistical hypothesis H, the nonoccurrence of an outcome does not contradict H, even if there are no problems with the auxiliaries or the observation.

As such, for a Popperian falsification to get off the ground, additional information is needed to determine (1) what counts as observational, (2) whether auxiliary hypotheses are acceptable and alternatives are ruled out, and (3) when to reject statistical hypotheses. Only with (1) and (2) does an anomalous observation O falsify hypotheses H, and only with (3) can statistical hypotheses be falsifiable. Because each determination is fallible, Popper and, later, Imre Lakatos regard their acceptance as decisions, driven more by conventions than by experimental evidence.
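To make point (3) concrete, here is a minimal sketch in Python. It is not taken from Mayo or Fisher; the coin, the ten tosses, and the 0.05 cutoff are all hypothetical choices made for illustration. It shows that under a statistical hypothesis every outcome keeps a positive probability, so no outcome deductively contradicts H, and that any rejection depends on a separately chosen rule.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n trials."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


# Hypothetical statistical hypothesis H: the coin is fair (p = 0.5), 10 tosses.
n, p = 10, 0.5

# Every possible outcome has positive probability under H,
# so no single outcome logically contradicts H.
all_possible = all(binomial_pmf(heads, n, p) > 0 for heads in range(n + 1))

# Hypothetical observation: 10 heads out of 10.
observed = 10
tail = upper_tail(observed, n, p)

# Rejection needs an extra rule; 0.05 is a convention, not a deduction from H.
alpha = 0.05
reject = tail <= alpha

print(f"all outcomes possible under H: {all_possible}")
print(f"P(at least {observed} heads | H) = {tail}")
print(f"reject H at level {alpha}: {reject}")
```

Even ten heads in a row is merely improbable under H, not impossible, which is why the decision to reject has to come from a rule such as the cutoff above.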

Mayo later states in the same book, "A genuine account of learning from error shows where and how to justify Popper's 'risky decisions.' The result, let me be clear, is not a filling-in of the Popperian (or the Lakatosian) framework, but a wholly different picture of learning from error, and with it a different program for explaining the growth of scientific knowledge" (page 4, emphasis mine). In other words, Popperian falsification is NOT the right way to think about hypothesis tests! Hypothesis tests, whether Fisherian or from the Neyman-Pearson methodology, are NOT about falsifying statistical claims. Hence, the Fisherian way of testing hypotheses does NOT apply to the black swan problem. I quoted Deborah Mayo's view on Popper's falsification to show that my view against Popper's falsification came from her work. My term paper for PHIL 440 will address how to correctly interpret Fisher's significance tests. I hope this helps. Are there any questions about my answer to this person's question?

04:36, 16 March 2012


Nicole, do the chart above and the associated text below, both copied from the experiment described on page 248 of the Boeing Advanced Quality System Tools document, satisfy your concept of significance test methods, and that of Deborah Mayo and Sir Ronald A. Fisher?

Robust design: testing process parameters

Parts in a heat-treat process were experiencing unpredictable growth, causing some parts to grow outside of the specification limits and be rejected as scrap. It was surmised by the engineering team that irregular growth was due to the orientation of the part in the oven and the part’s location in the oven. Since it was desirable to heat treat a maximum number of parts in each oven load, it was important to be able to determine a set of heat-treat processing conditions that would result in minimum growth for heat-treated parts in both a horizontal and vertical orientation, and at both the top and bottom locations in the oven.

Four process factors were identified: hold temperature, dwell time, gas flow rate, and temperature at removal. The team defined two settings for each of the process factors. The experiment used eight runs of the oven, as shown in figure 2.7 (a fractional factorial design, that is, a particular selection of half of the 16 possibilities defined by all combinations of the process factors at two settings). For each oven run, parts were placed at both the top and the bottom of the oven and in both orientations.
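For readers unfamiliar with the term, a half fraction of a four-factor, two-level design can be sketched as below. The quoted text does not say which eight of the sixteen combinations were run, so the generator used here (fourth factor set to the product of the first three) is only an assumption, as are the identifier names and the -1/+1 coding of the two settings.

```python
from itertools import product

# Factor names follow the quoted text; the identifiers are illustrative.
FACTORS = ["hold_temperature", "dwell_time", "gas_flow_rate", "removal_temperature"]


def half_fraction():
    """Build 8 of the 16 two-level combinations of four factors.

    Settings are coded -1 (low) and +1 (high). The first three factors run
    through all 8 combinations; the fourth is ASSUMED to be their product,
    which is one common way to pick a half, not necessarily the one used
    in the Boeing experiment.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        d = a * b * c  # assumed generator for the fourth factor
        runs.append(dict(zip(FACTORS, (a, b, c, d))))
    return runs


runs = half_fraction()
for number, run in enumerate(runs, start=1):
    print(number, run)
```

With this choice each factor appears at its low and high setting in exactly four of the eight oven runs, so the design stays balanced even though half of the combinations are never run.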

The experimental results indicated an unsuspected effect due to oven location, with parts in the bottom of the oven experiencing less growth than those in the top of the oven. The analysis indicated that a particular combination of hold temperature and dwell time would result in part growth that is insensitive (or robust) to part orientation and part location. Furthermore, the experiment indicated that temperature at removal did not affect part growth, leading to the conclusion that parts could be removed from the oven at a higher temperature; thus resulting in savings in run time.

02:13, 17 March 2012

Unless I have access to the analysis of variance (ANOVA) table, I cannot comment on the last paragraph (pertaining to what the results indicated, or the conclusion that was drawn from the experiment). Also, I should mention that ANOVA is a separate technique in itself, distinct from the type of significance tests that Fisher introduced in Chapter 2 of his book (i.e., the reading for this past week). To answer your question, page 248 of the Boeing Advanced Quality System Tools document does not satisfy my concept of significance test methods, insofar as it does not relate at all to the significance test introduced in my presentation earlier this week. I hope this helps. If you have any more questions about the example from the Boeing Advanced Quality System Tools document, I suggest we communicate not on this forum but in other ways that will not disturb the focus of the discussions in this course.

20:41, 17 March 2012