Box plot
Figure 1. Box plot of data from the Michelson–Morley experiment.
Box plots can be useful for displaying differences between populations without making any assumptions about the underlying statistical distribution. The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and help identify outliers. Box plots can be drawn either horizontally or vertically.
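The link between box spacing and skewness can be sketched by comparing the two halves of the box. The helpers `box_spacings` and `skew_hint` below are hypothetical names used only for illustration. The sketch assumes quartiles computed with Python's `statistics.quantiles(method="inclusive")`; other software uses other quartile rules, and this is a rough visual heuristic rather than a formal skewness measure.

```python
from statistics import quantiles


def box_spacings(data):
    """Return (median - Q1, Q3 - median): the widths of the two halves of the box.

    Assumption: quartiles use statistics.quantiles(method="inclusive").
    """
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return med - q1, q3 - med


def skew_hint(data):
    """Read a rough skew direction off the box (hypothetical helper)."""
    lower, upper = box_spacings(data)
    if upper > lower:
        return "right"      # longer upper half: positively skewed
    if lower > upper:
        return "left"       # longer lower half: negatively skewed
    return "symmetric"      # equal halves: no skew visible in the box
```

For the hypothetical sample `[1, 2, 3, 4, 5, 9, 12, 15, 20]`, the quartiles are 3, 5 and 12, so the upper half of the box (7) is much wider than the lower half (2), which suggests right skew. The box alone can miss skew that lives only in the tails, which is one reason the whiskers and outliers are drawn as well.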
Construction
For a data set, one constructs a horizontal box plot in the following manner:
Example
A plain-text version might look like this:
                            +-----+-+
  *           o     |-------|     | |---|
                            +-----+-+
+---+---+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9  10  11  12
For this data set:
The horizontal lines (the "whiskers") extend at most 1.5 times the box width (the interquartile range) beyond either end of the box. Each whisker must end at an observed value, so it connects all the values outside the box that lie no more than 1.5 times the box width away from it. Three times the box width marks the boundary between "mild" and "extreme" outliers. In this box plot, "mild" and "extreme" outliers are differentiated by closed and open dots, respectively. Various software packages implement this detail of the box plot differently; for example, the whiskers may extend at most to the 5th and 95th (or more extreme) percentiles. Such approaches do not conform to Tukey's definition, with its emphasis on the median in particular and on counting methods in general, and they tend to produce "outliers" for every data set with more than ten values, regardless of the shape of the distribution.[1]
Alternative forms
Box-and-whisker plots are uniform in their use of the box: the bottom and top of the box are always the 25th and 75th percentiles (the lower and upper quartiles, respectively), and the band inside the box is always the 50th percentile (the median). However, the ends of the whiskers can represent several alternative values, among them:
Any data not included between the whiskers should be plotted as outliers with a dot, small circle, or star, but occasionally this is not done. Some box plots include an additional dot or cross inside the box to represent the mean of the data in addition to the median. On some box plots, a crosshatch is placed on each whisker before its end. Occasionally, box plots are presented with no whiskers at all. Because of this variability, the caption of a box plot should state the convention used for the whiskers and outliers.
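The whisker and outlier rules described above (Tukey's 1.5 and 3 times the interquartile range) can be sketched in a few lines. `tukey_box_plot` is a hypothetical helper, not a library function. The quartile convention is an assumption: it uses `statistics.quantiles(method="inclusive")`, while Tukey's own hinges and many plotting packages compute quartiles slightly differently, which can move the whiskers and change which points count as outliers.

```python
from statistics import mean, quantiles


def tukey_box_plot(data):
    """Compute the parts of a Tukey-style box plot (hypothetical helper).

    Assumption: quartiles come from statistics.quantiles(method="inclusive").
    """
    xs = sorted(data)
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1                                  # the box width
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)       # whiskers may not pass these
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)       # mild/extreme boundary
    # Whiskers end at the most extreme *observed* values within the inner limits.
    inside = [x for x in xs if inner[0] <= x <= inner[1]]
    mild = [x for x in xs
            if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    extreme = [x for x in xs if x < outer[0] or x > outer[1]]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
        "mean": mean(xs),  # optional dot or cross some box plots draw in the box
    }
```

For the hypothetical sample `[1, 2, 3, 4, 5, 6, 7, 15, 30]`, the quartiles are 3, 5 and 7 and the box width is 4, so the whiskers run from 1 to 7, 15 is a mild outlier (between 1.5 and 3 box widths above Q3) and 30 is an extreme one.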
Visualization
Figure 2. Box plot and probability density function (pdf) of a normal N(0, σ²) population.
The box plot is a quick graphic for examining one or more sets of data. Box plots may seem more primitive than a histogram or kernel density estimate, but they have some advantages. They take up less space and are therefore particularly useful for comparing distributions between several groups or sets of data (see Figure 1 for an example). The choice of the number and width of bins can heavily influence the appearance of a histogram, and the choice of bandwidth can heavily influence the appearance of a kernel density estimate; a box plot has no such tuning parameter. Because a plot of a statistical distribution is more intuitive than a box plot, comparing the box plot with the probability density function (theoretical histogram) of a normal N(0, σ²) distribution can help in understanding the box plot (Figure 2).
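The landmarks in a comparison like Figure 2 can also be computed directly. The sketch below uses Python's `statistics.NormalDist` to find the quartiles of a normal distribution, the 1.5 × IQR whisker limits, and the probability mass beyond them (the share of points a large normal sample would be expected to show as outliers). `normal_box_plot` is a hypothetical helper name, and the 1.5 × IQR rule is the Tukey convention assumed here.

```python
from statistics import NormalDist


def normal_box_plot(mu=0.0, sigma=1.0):
    """Box plot landmarks of a normal N(mu, sigma^2) distribution (sketch).

    Assumption: whiskers follow Tukey's 1.5 x IQR limit.
    """
    dist = NormalDist(mu, sigma)
    q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Probability of falling beyond either whisker limit.
    p_out = dist.cdf(lo) + (1.0 - dist.cdf(hi))
    return {"q1": q1, "median": dist.median, "q3": q3,
            "whisker_limits": (lo, hi), "p_outside": p_out}
```

For the standard normal, this gives quartiles of about ±0.674σ, whisker limits of about ±2.698σ, and roughly 0.7% of the probability mass outside the whiskers. Because every quantity scales with σ, the same proportions hold for any N(μ, σ²).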
Source: Wikipedia. The above article is available under the GNU FDL.
©2008-2009 TutorGig.com. All Rights Reserved.