It would be useful to have a measure of scatter that has the following properties:
The variance (σ²) is a measure of how far each value in the data set is from the mean. Here is how it is defined:
To write the equation that defines the variance, it is simplest to use the summation operator, Σ. The summation operator is just a shorthand way to write, "Take the sum of a set of numbers." As an example, we'll show how we would use the summation operator to write the equation for calculating the mean value of data set 1. We'll start by assigning each number to a variable, X1–X6, like this:
| data set 1 | |
| variable | value |
| X1 | 3 |
| X2 | 4 |
| X3 | 4 |
| X4 | 5 |
| X5 | 6 |
| X6 | 8 |
Think of the variable (X) as the measured quantity from your experiment—like number of leaves per plant—and think of the subscript as indicating the trial number (1–6). To calculate the average number of leaves per plant, we first have to add up the values from each of the six trials. Using the summation operator, we'd write it like this:
$$\sum_{i=1}^{6} X_i$$
which is equivalent to:
$$X_1 + X_2 + X_3 + X_4 + X_5 + X_6$$
or:
$$3 + 4 + 4 + 5 + 6 + 8$$
Obviously the sum is a lot more compact to write with the summation operator. Here is the equation for calculating the mean, μ, of our data set using the summation operator:
$$\mu = \frac{\sum_{i=1}^{6} X_i}{6} = \frac{30}{6} = 5$$
The general equation for calculating the mean, μ, of a set of numbers, X1 – XN, would be written like this:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{\sum X}{N}$$
Sometimes, for simplicity, the subscripts are left out, as we did on the right, above. Doing away with the subscripts makes the equations less cluttered, but it is still understood that you are adding up all the values of X.
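The summation and the mean can also be checked in a few lines of code. This is a minimal sketch in Python using the values of data set 1; the names `data_set_1` and `mean` are chosen here for illustration and are not part of the original guide.

```python
# Data set 1 from the table above: the values X1 through X6.
data_set_1 = [3, 4, 4, 5, 6, 8]


def mean(values):
    """Return the sum of the values divided by the number of values (N)."""
    return sum(values) / len(values)


# sum() plays the role of the summation operator: X1 + X2 + ... + X6.
total = sum(data_set_1)
mu = mean(data_set_1)

print("sum of X:", total)
print("mean:", mu)
```

The built-in `sum` does the same job as Σ: it adds up every value without you having to write out each term.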
Now that you know how the summation operator works, you can understand the equation that defines the variance:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
The variance (σ²) is defined as the sum of the squared distances of each term in the distribution from the mean (μ), divided by the number of terms in the distribution (N).
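As a rough illustration, the definition translates directly into code: find the mean, square each value's distance from it, add up the squares, and divide by N. The function name `variance_by_definition` is an assumption made for this sketch.

```python
def variance_by_definition(values):
    """Variance as the mean of the squared distances from the mean.

    Divides by N (the number of values), matching the definition in the text.
    """
    n = len(values)
    mu = sum(values) / n
    # Square the distance of every value from the mean, then add them up.
    squared_distances = [(x - mu) ** 2 for x in values]
    return sum(squared_distances) / n


data_set_1 = [3, 4, 4, 5, 6, 8]
print(round(variance_by_definition(data_set_1), 2))
```

For data set 1 the squared distances are 4, 1, 1, 0, 1, and 9, which add up to 16, and 16 divided by 6 is about 2.67.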
There's a more efficient way to calculate the variance (and, from it, the standard deviation) for a group of numbers, shown in the following equation:
$$\sigma^2 = \frac{\sum X^2}{N} - \mu^2$$
You take the sum of the squares of the terms in the distribution, and divide by the number of terms in the distribution (N). From this, you subtract the square of the mean (μ²). It's a lot less work to calculate the variance this way, and the standard deviation is then just the square root of the result.
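The shortcut just described can be sketched the same way: it only needs the sum of the values and the sum of their squares. The function name `variance_shortcut` is hypothetical, chosen for this example.

```python
def variance_shortcut(values):
    """Variance as (sum of squares / N) minus the square of the mean."""
    n = len(values)
    sum_x = sum(values)                   # sum of X
    sum_x2 = sum(x * x for x in values)   # sum of X squared
    mu = sum_x / n
    return sum_x2 / n - mu ** 2


data_set_1 = [3, 4, 4, 5, 6, 8]
print(round(variance_shortcut(data_set_1), 2))
```

For data set 1 the sum of the squares is 166, so the calculation is 166/6 − 25, which is about 2.67.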
It's easy to prove to yourself that the two equations are equivalent. Start with the definition for the variance (Equation 1, below). Expand the expression for squaring the distance of a term from the mean (Equation 2, below).
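$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \tag{1}$$
$$\sigma^2 = \frac{\sum (X^2 - 2\mu X + \mu^2)}{N} \tag{2}$$
$$\sigma^2 = \frac{\sum X^2}{N} - 2\mu\,\frac{\sum X}{N} + \frac{N\mu^2}{N} \tag{3}$$
$$\sigma^2 = \frac{\sum X^2}{N} - 2\mu^2 + \mu^2 \tag{4}$$
$$\sigma^2 = \frac{\sum X^2}{N} - \mu^2 \tag{5}$$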
Now separate the individual terms of the equation (the summation operator distributes over the terms in parentheses, see Equation 3, above). In the final term, the sum of μ²/N, taken N times, is just Nμ²/N.
Next, we can simplify the second and third terms in Equation 3. In the second term, you can see that ΣX/N is just another way of writing μ, the average of the terms. So the second term simplifies to −2μ² (compare Equations 3 and 4, above). In the third term, N/N is equal to 1, so the third term simplifies to μ² (compare Equations 3 and 4, above).
Finally, from Equation 4, you can see that the second and third terms can be combined, giving us the result we were trying to prove in Equation 5.
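As a quick numerical check of these steps, here are the three separated terms with the totals for data set 1 plugged in (N = 6, ΣX = 30, ΣX² = 166, μ = 5). This is a worked example added for illustration, with values rounded to two decimal places.

```latex
\begin{align*}
\sigma^2 &= \frac{\sum X^2}{N} - 2\mu\,\frac{\sum X}{N} + \frac{N\mu^2}{N} \\
         &= \frac{166}{6} - 2(5)\,\frac{30}{6} + \frac{6 \cdot 25}{6} \\
         &\approx 27.67 - 50 + 25 \\
         &= 2.67
\end{align*}
```

Working from the definition instead gives the same answer: the squared distances from the mean are 4, 1, 1, 0, 1, and 9, which sum to 16, and 16/6 ≈ 2.67.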
As an example, let's go back to the two distributions we started our discussion with:
We'll construct a table to calculate the values. You can use a similar table to find the variance and standard deviation for results from your experiments.
| data set | N | ΣX | ΣX² | μ | μ² | σ² | σ |
| 1 | 6 | 30 | 166 | 5 | 25 | 2.67 | 1.63 |
| 2 | 6 | 30 | 216 | 5 | 25 | 11.00 | 3.32 |
Although both data sets have the same mean (μ = 5), the variance (σ²) of the second data set, 11.00, is a little more than four times the variance of the first data set, 2.67. The standard deviation (σ) is the square root of the variance, so the standard deviation of the second data set, 3.32, is just over two times the standard deviation of the first data set, 1.63.
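If you would rather let a computer fill in the table, the sketch below rebuilds each row from N, ΣX, and ΣX² using the shortcut equation. The function name `row_from_sums` is an assumption made for this example. The totals for data set 1 are computed from its six values; the totals for data set 2 are copied from the table, since its individual values are not listed here.

```python
import math


def row_from_sums(n, sum_x, sum_x2):
    """Build one table row (mean, mean squared, variance, std. deviation)."""
    mu = sum_x / n
    variance = sum_x2 / n - mu ** 2   # shortcut equation for the variance
    sigma = math.sqrt(variance)       # standard deviation
    return {
        "N": n,
        "sum_x": sum_x,
        "sum_x2": sum_x2,
        "mu": mu,
        "mu2": mu ** 2,
        "variance": round(variance, 2),
        "sigma": round(sigma, 2),
    }


# Data set 1: totals computed from the raw values.
data_set_1 = [3, 4, 4, 5, 6, 8]
row_1 = row_from_sums(len(data_set_1),
                      sum(data_set_1),
                      sum(x * x for x in data_set_1))

# Data set 2: totals taken from the table (N = 6, sum X = 30, sum X^2 = 216).
row_2 = row_from_sums(6, 30, 216)

print(row_1)
print(row_2)
```

To use it with your own results, replace `data_set_1` with the measurements from your trials.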
The variance and the standard deviation give us a numerical measure of the scatter of a data set. These measures are useful for making comparisons between data sets that go beyond simple visual impressions.
Copyright © 2002-2008 Kenneth Lafferty Hess Family Charitable Foundation. All rights reserved.
Reproduction of material from this website without written permission is strictly prohibited.