forum 9: week of 12 March: Fisher and the design of experiments
Now that my presentation is finished, I will take any additional questions you may have about Fisher's work on significance tests and the design of experiments. I will answer them to the best of my ability.
Could you comment on Fisher's statements on page 7, under heading 4, "The Logic of the Laboratory": "Inductive inference is the only process known to us by which essentially new knowledge comes into the world." And on page 8: "Experimental observations are only experience carefully planned in advance, and designed to form a secure basis of new knowledge; that is, they are systematically related to the body of knowledge already acquired, and the results are deliberately observed, and put on record accurately."
In reply to the concern over the need to know in advance all possibilities in order to learn something from an experiment: I think the problem might lie in the fact that in the paper there is no distinction between 'learning' and 'contributing to scientific knowledge'. We may well learn that under certain experimental conditions a possibility that we hadn't foreseen does in fact obtain, and use this result as a basis for further investigation. But for the purposes of gleaning some legitimate scientific knowledge, those results are irrelevant because they don't substantiate either of the hypotheses in the experiment.
Thomas, in the 2011 NOVA film series The Fabric of the Cosmos, physicist Dr. Leonard Susskind argues from the perspective that there are 10^500 different string theories. He claims this is exactly what cosmologists are looking for: it fits the idea of a multiverse, a huge number of universes, each different. In his 2006 book The Cosmic Landscape (page 381), Dr. Susskind draws a distinction between the terms he uses in the book, landscape and megaverse. In the film he used multiverse in place of megaverse. Of the megaverse [multiverse] he wrote, "The megaverse [multiverse], by contrast is quite real. The pocket universes that fill it are actual existing places, not hypothetical possibilities."
I think any testing so devised challenges Dr. Fisher's requirement that all possible results be forecast in advance.
Someone asked me after class about "the black swan problem" and whether the Fisherian way of testing hypotheses relates to it. Before responding, I should clarify that I take "the black swan problem" to be about the falsification of hypotheses (http://en.wikipedia.org/wiki/Falsifiability#Inductive_categorical_inference) as a 'solution' to the problem of induction. With that in mind, my short answer is that Fisher's significance tests do NOT provide a means to falsify hypotheses. Yes, Fisher says that there is a chance of disproving the null hypothesis (and that the null hypothesis can never be proved), but this does NOT (necessarily) mean that the primary objective of significance tests is to falsify the null hypothesis! My longer answer follows, if anyone cares to read it. Deborah Mayo would agree with me that significance tests should NOT be used to falsify hypotheses in the way Popper describes falsification. In fact, I quote verbatim an excerpt from Mayo's 1996 book, Error and the Growth of Experimental Knowledge (page 2):
For Popper, learning is a matter of deductive falsification. In a nutshell, hypothesis H is deductively falsified if H entails experimental outcome O, while in fact the outcome is ~O. What is learned is that H is false. ... We cannot know, however, which of several auxiliary hypotheses is to blame, which needs altering. Often H entails, not a specific observation, but a claim about the probability of an outcome. With such a statistical hypothesis H, the nonoccurrence of an outcome does not contradict H, even if there are no problems with the auxiliaries or the observation.
As such, for a Popperian falsification to get off the ground, additional information is needed to determine (1) what counts as observational, (2) whether
auxiliary hypotheses are acceptable and alternatives are ruled out, and (3) when to reject statistical hypotheses. Only with (1) and (2) does an anomalous observation O falsify hypotheses H, and only with (3) can statistical hypotheses be falsifiable. Because each determination is fallible, Popper and, later, Imre Lakatos regard their acceptance as decisions, driven more by conventions than by experimental evidence.
Mayo later states in the same book, "A genuine account of learning from error shows where and how to justify Popper's 'risky decisions.' The result, let me be clear, is not a filling-in of the Popperian (or the Lakatosian) framework, but a wholly different picture of learning from error, and with it a different program for explaining the growth of scientific knowledge" (page 4, emphasis mine). In other words, Popperian falsification is NOT the right way to think about hypothesis tests! Hypothesis tests, whether Fisherian or from the Neyman-Pearson methodology, are NOT about falsifying statistical claims. Hence, the Fisherian way of testing hypotheses does NOT apply to the black swan problem. I quoted Deborah Mayo's view on Popper's falsification to show that my view against Popper's falsification came from her work. My term paper for PHIL 440 will address how to correctly interpret Fisher's significance tests. I hope this helps. Are there any questions about my answer to this person's question?
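To make concrete what a Fisherian significance test actually does (assess evidence against a null hypothesis, rather than falsify it in Popper's sense), here is a minimal sketch of the lady-tasting-tea experiment from Chapter 2 of Fisher's book. The setup and the 1/70 figure are Fisher's; the code itself is my illustrative reconstruction using the hypergeometric counting he describes.

```python
from math import comb

# Lady tasting tea (Fisher, The Design of Experiments, ch. 2):
# 8 cups, 4 milk-first and 4 tea-first; the subject must identify
# the 4 milk-first cups. Under the null hypothesis (no ability to
# discriminate), every choice of 4 cups out of 8 is equally likely.
def p_value(correct, cups=8, milk_first=4):
    """Probability, under the null, of doing at least this well by chance."""
    total = comb(cups, milk_first)  # 70 equally likely selections
    # count selections containing at least `correct` milk-first cups
    favorable = sum(
        comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
        for k in range(correct, milk_first + 1)
    )
    return favorable / total

# Only a perfect selection reaches Fisher's 5% level:
print(p_value(4))  # 1/70, about 0.014
print(p_value(3))  # 17/70, about 0.243 -- not significant
```

Note that a non-significant result does not "falsify" anything; the null is simply not disproved, which is exactly the asymmetry Fisher insists on.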
Nicole, do the chart and the associated text below, copied from the experiment described on page 248 of the Boeing Advanced Quality System Tools document, satisfy your concept of significance test methods, and the significance test methods of Deborah Mayo and Sir Ronald A. Fisher?
Robust design: testing process parameters

Parts in a heat-treat process were experiencing unpredictable growth, causing some parts to grow outside of the specification limits and be rejected as scrap. It was surmised by the engineering team that irregular growth was due to the orientation of the part in the oven and the part’s location in the oven. Since it was desirable to heat treat a maximum number of parts in each oven load, it was important to be able to determine a set of heat-treat processing conditions that would result in minimum growth for heat-treated parts in both a horizontal and vertical orientation, and at both the top and bottom locations in the oven.
Four process factors were identified: hold temperature, dwell time, gas flow rate, and temperature at removal. The team defined two settings for each of the process factors. The experiment used eight runs of the oven, as shown in figure 2.7 (a fractional factorial design, that is, a particular selection of half of the 16 possibilities defined by all combinations of the process factors at two settings). For each oven run, parts were placed at both the top and the bottom of the oven and in both orientations.
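For anyone unfamiliar with how a fractional factorial design selects its half of the full factorial, here is a small sketch. The four factor names come from the Boeing description; the particular half-fraction (the standard defining relation in which the fourth factor is set equal to the three-way interaction of the other three) is my illustrative assumption, since the document excerpt does not specify which eight runs were used.

```python
from itertools import product

# Half-fraction of a 2^4 factorial: 8 of the 16 possible runs.
# Factors are coded -1/+1 for their low/high settings. The fourth
# factor is aliased with the three-way interaction of the first
# three (defining relation D = ABC), a standard 2^(4-1) design.
factors = ["hold_temp", "dwell_time", "gas_flow", "removal_temp"]

runs = []
for a, b, c in product((-1, +1), repeat=3):
    d = a * b * c  # removal_temp setting determined by the other three
    runs.append(dict(zip(factors, (a, b, c, d))))

for run in runs:
    print(run)
print(len(runs))  # 8 distinct runs instead of the full 16
```

The price of halving the run count is aliasing: some interaction effects cannot be distinguished from one another, which is why such designs are chosen when higher-order interactions are assumed negligible.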
The experimental results indicated an unsuspected effect due to oven location, with parts in the bottom of the oven experiencing less growth than those in the top of the oven. The analysis indicated that a particular combination of hold temperature and dwell time would result in part growth that is insensitive (or robust) to part orientation and part location. Furthermore, the experiment indicated that temperature at removal did not affect part growth, leading to the conclusion that parts could be removed from the oven at a higher temperature; thus resulting in savings in run time.
Unless I have access to the analysis of variance (ANOVA) table, I cannot comment on the last paragraph (pertaining to what the results indicated, or the conclusion that was drawn from the experiment). I should also mention that ANOVA (e.g., see http://www.stat.columbia.edu/~gelman/research/unpublished/econanova.pdf) is a technique in its own right, distinct from the type of significance tests that Fisher introduced in Chapter 2 of his book (i.e., the reading for this past week). To answer your question: page 248 of the Boeing Advanced Quality System Tools document does not satisfy my concept of significance test methods, insofar as it does not relate at all to the significance test introduced in my presentation earlier this week. I hope this helps. If you have further questions about the Boeing example, I suggest we communicate outside this forum so as not to disturb the focus of the discussions in this course.
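For those curious about the ANOVA technique mentioned above, here is a bare-bones sketch of a one-way ANOVA F statistic computed from scratch. The data are made up purely for illustration (they are not the Boeing experiment's data, which the document excerpt does not provide).

```python
# One-way ANOVA F statistic: ratio of between-group to within-group
# mean squares. A large F suggests the group means differ by more
# than within-group variability would explain.
def f_statistic(groups):
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand = sum(sum(g) for g in groups) / n
    # between-group sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical part-growth measurements (arbitrary units) at the
# top vs. bottom of an oven:
top = [5.1, 5.4, 5.0, 5.3]
bottom = [4.2, 4.0, 4.3, 4.1]
print(f_statistic([top, bottom]))
```

In practice one would compare the F statistic against the F distribution with (k-1, n-k) degrees of freedom to obtain a p-value; the point here is only to show that ANOVA partitions variance across factors, which is a different exercise from the single-null significance test discussed in my presentation.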