How Do You Know Whether to Reject or Not to Reject?
We saw in Chapter 3 that the mean of a sample has a standard error, and a mean that departs by more than twice its standard error from the population mean would be expected by chance only in about 5% of samples. Likewise, the difference between the means of two samples has a standard error. We do not usually know the population mean, so we may suppose that the mean of one of our samples estimates it. The sample mean may happen to be identical with the population mean, but it more probably lies somewhere above or below the population mean, and there is a 95% chance that it is within 1.96 standard errors of it.
Consider now the mean of the second sample. If the sample comes from the same population its mean will also have a 95% chance of lying within 1.96 standard errors of the population mean, but since we do not know the population mean we have only the means of our samples to guide us. Therefore, if we want to know whether they are likely to have come from the same population, we ask whether they lie within a certain range, represented by their standard errors, of each other.
Large sample standard error of difference between means
If SD1 represents the standard deviation of sample 1 and SD2 the standard deviation of sample 2, n1 the number in sample 1 and n2 the number in sample 2, the formula denoting the standard error of the difference between two means is:
SE(diff) = √(SD1²/n1 + SD2²/n2)     (5.1)
The computation is straightforward.
Square the standard deviation of sample 1 and divide by the number of observations in the sample:
SD1²/n1     (1)
Square the standard deviation of sample 2 and divide by the number of observations in the sample:
SD2²/n2     (2)
Add (1) and (2).
Take the square root, to give equation 5.1. This is the standard error of the difference between the two means.
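These steps can also be written as a few lines of code. The sketch below is not part of the original text; it is a minimal Python illustration of equation 5.1, with purely illustrative input values:

```python
import math

def se_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two means (equation 5.1):
    square each standard deviation, divide by its sample size, add, take the square root."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Arbitrary illustrative values, not taken from the text
print(round(se_difference(2.5, 40, 3.0, 50), 2))
```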
Large sample confidence interval for the difference in two means
From the data in table 3.1 the general practitioner wants to compare the mean of the printers' blood pressures with the mean of the farmers' blood pressures. The figures are set out first as in table 5.1 (which repeats table 3.1).
Table 5.1 Mean diastolic blood pressures of printers and farmers

            Number    Mean diastolic blood pressure (mmHg)    Standard deviation (mmHg)
Printers    72        88                                       4.5
Farmers     48        79                                       4.2
Analysing these figures in accordance with the formula given above, we have:

SE(diff) = √(4.5²/72 + 4.2²/48) = 0.81 mmHg
The difference between the means is 88 – 79 = 9 mmHg.
For large samples we can calculate a 95% confidence interval for the difference in means as
9 – 1.96 × 0.81 to 9 + 1.96 × 0.81, which is 7.41 to 10.59 mmHg.
For a small sample we need to modify this procedure, as described in Chapter 7.
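As a check on this arithmetic, here is a minimal Python sketch (not part of the original text) that reproduces the large sample 95% confidence interval from the difference of 9 mmHg and its standard error of 0.81 mmHg:

```python
diff = 88 - 79      # difference between the mean blood pressures (mmHg)
se_diff = 0.81      # standard error of the difference (mmHg)

lower = diff - 1.96 * se_diff
upper = diff + 1.96 * se_diff
print(f"95% CI: {lower:.2f} to {upper:.2f} mmHg")  # 7.41 to 10.59 mmHg
```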
Null hypothesis and type I error
In comparing the mean blood pressures of the printers and the farmers we are testing the hypothesis that the two samples came from the same population of blood pressures. The hypothesis that there is no difference between the population from which the printers' blood pressures were drawn and the population from which the farmers' blood pressures were drawn is called the null hypothesis.
But what do we mean by "no difference"? Chance alone will almost certainly ensure that there is some difference between the sample means, for they are most unlikely to be identical. Consequently we set limits within which we shall regard the samples as not having any significant difference. If we set the limits at twice the standard error of the difference, and regard a mean outside this range as coming from another population, we shall on average be wrong about once in 20 if the null hypothesis is in fact true. If we do obtain a mean difference bigger than two standard errors we are faced with two choices: either an unusual event has happened, or the null hypothesis is wrong. Imagine tossing a coin five times and getting the same face each time. This has nearly the same probability (2 × (1/2)^5 = 1/16, about 6.3%) as obtaining a mean difference bigger than two standard errors when the null hypothesis is true. Do we regard it as a lucky event or suspect a biased coin? If we are unwilling to believe in unlucky events, we reject the null hypothesis, in this case that the coin is a fair one.
To reject the null hypothesis when it is true is to make what is known as a type I error. The level at which a result is declared significant is known as the type I error rate, often denoted by α. We try to show that a null hypothesis is unlikely, not its converse (that it is likely), so a difference which is greater than the limits we have set, and which we therefore regard as "significant", makes the null hypothesis unlikely. However, a difference within the limits we have set, and which we therefore regard as "non-significant", does not make the hypothesis likely.
A range of not more than two standard errors is often taken as implying "no difference", but there is nothing to stop investigators choosing a range of three standard errors (or more) if they want to reduce the chances of a type I error.
Testing for differences of two means
To find out whether the difference in blood pressure of printers and farmers could have arisen by chance the general practitioner erects the null hypothesis that there is no significant difference between them. The question is, how many multiples of its standard error does the difference in means represent? Since the difference in means is 9 mmHg and its standard error is 0.81 mmHg, the answer is: 9/0.81 = 11.1. We usually denote the ratio of an estimate to its standard error by "z", that is, z = 11.1. Reference to table A (Appendix) shows that z is far beyond the figure of 3.291 standard deviations, representing a probability of 0.001 (or 1 in 1000). The probability of a difference of 11.1 standard errors or more occurring by chance is therefore exceedingly low, and correspondingly the null hypothesis that these two samples came from the same population of observations is exceedingly unlikely. The probability is known as the P value and may be written P < 0.001.
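In code, the same test statistic and a two sided P value can be obtained from the normal distribution. This is a sketch only, not part of the original text, and it assumes the scipy library is available:

```python
from scipy import stats

z = 9 / 0.81                               # difference divided by its standard error
p_two_sided = 2 * stats.norm.sf(abs(z))    # two sided P value from the standard normal
print(round(z, 1), p_two_sided)            # z is about 11.1, so P is far below 0.001
```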
It is worth recapping this procedure, which is at the heart of statistical inference. Suppose that we have samples from two groups of subjects, and we wish to see if they could plausibly come from the same population. The first approach would be to calculate the difference between two statistics (such as the means of the two groups) and calculate the 95% confidence interval. If the two samples were from the same population we would expect the confidence interval to include zero 95% of the time, so if the confidence interval excludes zero we suspect that they are from a different population. The other approach is to compute the probability of getting the observed value, or one that is more extreme, if the null hypothesis were correct. This is the P value. If this is less than a specified level (usually 5%) then the result is declared significant and the null hypothesis is rejected. These two approaches, the estimation and hypothesis testing approaches, are complementary. Imagine if the 95% confidence interval just captured the value zero: what would be the P value? A moment's thought should convince one that it is 2.5%. This is known as a one sided P value, because it is the probability of getting the observed result or one bigger than it. However, the 95% confidence interval is two sided, because it excludes not only the 2.5% above the upper limit but also the 2.5% below the lower limit. To support the complementarity of the confidence interval approach and the null hypothesis testing approach, most authorities double the one sided P value to obtain a two sided P value (see below for the distinction between one sided and two sided tests).
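The correspondence between the two approaches can be checked numerically. The sketch below (Python with scipy, not part of the original text) shows that an estimate lying exactly 1.96 standard errors from zero, so that the 95% confidence interval just touches zero, gives a one sided P value of about 2.5% and a two sided P value of about 5%:

```python
from scipy import stats

z = 1.96                           # estimate exactly 1.96 standard errors from zero
one_sided = stats.norm.sf(z)       # probability beyond the upper limit
two_sided = 2 * one_sided          # doubled to give the two sided P value
print(round(one_sided, 3), round(two_sided, 3))  # 0.025 0.05
```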
Sometimes an investigator knows a mean from a very large number of observations and wants to compare the mean of her sample with it. We may not know the standard deviation of the large number of observations or the standard error of their mean, but this need not hinder the comparison if we can assume that the standard error of the mean of the large number of observations is near zero, or at least very small in relation to the standard error of the mean of the small sample.
This is because in equation 5.1 for calculating the standard error of the difference between the two means, when n1 is very large then SD1²/n1 becomes so small as to be negligible. The formula thus reduces to

√(SD2²/n2)

which is the same as that for the standard error of the sample mean, namely

SD/√n

Consequently we find the standard error of the mean of the sample and divide it into the difference between the means.
For example, a large number of observations has shown that the mean count of erythrocytes in men is 5.5 × 10¹²/l. In a sample of 100 men a mean count of 5.35 × 10¹²/l was found with standard deviation 1.1 × 10¹²/l. The standard error of this mean is SD/√n = 1.1/√100 = 0.11. The difference between the two means is 5.5 – 5.35 = 0.15. This difference, divided by the standard error, gives z = 0.15/0.11 = 1.36. This figure is well below the 5% level of 1.96 and in fact is below the 10% level of 1.645 (see table A). We therefore conclude that the difference could have arisen by chance.
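A short Python sketch (not part of the original text) of this comparison of a sample mean against a mean known from a very large series, with the standard error of the large series treated as negligible:

```python
import math

pop_mean = 5.5          # erythrocyte count known from a very large number of observations
sample_mean = 5.35      # mean count in the sample of 100 men
sd, n = 1.1, 100

se = sd / math.sqrt(n)                 # standard error of the sample mean: 0.11
z = (pop_mean - sample_mean) / se      # 0.15 / 0.11, about 1.36
print(round(se, 2), round(z, 2))       # below the 5% level of 1.96
```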
Alternative hypothesis and type II error
It is important to realise that when we are comparing two groups a non-significant result does not mean that we have proved the two samples come from the same population – it simply means that we have failed to prove that they do not come from the same population. When planning studies it is useful to think of what differences are likely to arise between the two groups, or what would be clinically worthwhile; for example, what do we expect to be the improved benefit from a new treatment in a clinical trial? This leads to a study hypothesis, which is a difference we would like to demonstrate. To contrast the study hypothesis with the null hypothesis, it is often called the alternative hypothesis. If we do not reject the null hypothesis when in fact there is a difference between the groups we make what is known as a type II error. The type II error rate is often denoted as β. The power of a study is defined as 1 – β and is the probability of rejecting the null hypothesis when it is false. The most common reason for type II errors is that the study is too small.
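As a rough illustration of power, the sketch below (Python with scipy, not part of the original text) uses a simple normal approximation rather than the formal sample size methods of Chapter 13; the assumed true difference of 2 mmHg is purely illustrative:

```python
from scipy import stats

def approx_power(true_diff, se_diff, alpha=0.05):
    """Approximate power of a two sided z test to detect a true difference `true_diff`
    when the standard error of the difference is `se_diff`.
    Normal approximation; the tiny probability in the opposite tail is ignored."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(z_crit - abs(true_diff) / se_diff)

# Purely illustrative: a true difference of 2 mmHg with a standard error of 0.81 mmHg
print(round(approx_power(2.0, 0.81), 2))  # roughly 0.7
```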
The concept of power is really only relevant when a study is being planned (see Chapter 13 for sample size calculations). After a study has been completed, we wish to make statements not about hypothetical alternative hypotheses but about the data, and the way to do this is with estimates and confidence intervals.(1)
Common questions
Why is the P value not the probability that the null hypothesis is true?
A moment's reflection should convince you that the P value could not be the probability that the null hypothesis is true. Suppose we got exactly the same value for the mean in two samples (if the samples were small and the observations coarsely rounded this would not be uncommon; the difference between the means is zero). The probability of getting the observed result (zero) or a result more extreme (a result that is either positive or negative) is unity, that is, we can be certain that we must obtain a result which is positive, negative or zero. However, we can never be certain that the null hypothesis is true, especially with small samples, so clearly the statement that the P value is the probability that the null hypothesis is true is in error. We can think of it as a measure of the strength of evidence against the null hypothesis, but since it is critically dependent on the sample size we should not compare P values to argue that a difference found in one group is more "significant" than a difference found in another.
References
1. Gardner MJ, Altman DG, editors. Statistics with Confidence. London: BMJ Publishing Group.
Exercises
5.1 In one group of 62 patients with iron deficiency anaemia the haemoglobin level was 12.2 g/dl, standard deviation 1.8 g/dl; in another group of 35 patients it was 10.9 g/dl, standard deviation 2.1 g/dl.
What is the standard error of the difference between the two means, and what is the significance of the difference? What is the difference? Give an approximate 95% confidence interval for the difference.

5.2 If the mean haemoglobin level in the general population is taken as 14.4 g/dl, what is the standard error of the difference between the mean of the first sample and the population mean, and what is the significance of this difference?
Source: https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/5-differences-between-means-type-i-an