# 3 Bar Graphs, Histograms, and Box Plots


of the independent, categorical variable (locality) and the y-axis represents the dependent

variable (mean snowfall in meters).

Figure 3.1 Clustered bar chart comparing the mean snowfall of alpine forests between

2013 and 2015 in Mammoth, CA; Mount Baker, WA; and Alyeska, AK.

Notice that Figure 3.1 gives a clear depiction of the differences in the mean snowfall at the

three localities. By adding error bars (standard deviations), the researcher is also able to

illustrate the variance in each one of the groups of data. For instance, the snowfall in

Alyeska, AK is less variable than the snowfall in Mount Baker, WA.
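The bar heights and error bars described above come from two summary statistics per group: the mean and the standard deviation. The following is a minimal sketch of that calculation. The snowfall values are hypothetical, chosen only for illustration, and are not the data behind Figure 3.1.

```python
from statistics import mean, stdev

# Hypothetical yearly snowfall totals (meters) for 2013-2015; NOT the book's data.
snowfall_m = {
    "Mammoth, CA": [8.0, 9.0, 10.0],
    "Mount Baker, WA": [14.0, 20.0, 17.0],
    "Alyeska, AK": [16.5, 17.0, 17.5],
}

def summarize(groups):
    """Return {group: (mean, sample standard deviation)} for each group."""
    return {name: (mean(values), stdev(values)) for name, values in groups.items()}

summary = summarize(snowfall_m)
for name, (avg, sd) in summary.items():
    # The bar height is the mean; the error bar spans mean - sd to mean + sd.
    print(f"{name}: mean = {avg:.2f} m, error bar = {avg - sd:.2f} to {avg + sd:.2f} m")
```

In this made-up example the two northern sites have the same mean, but the error bar for Alyeska, AK is much shorter, which is the kind of difference in variability that error bars make visible.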

Figure 3.2 Clustered bar chart comparing the mean snowfall of alpine forests between

2013 and 2015 in Mount Baker, WA and Alyeska, AK. An improperly scaled axis

exaggerates the differences between groups.

One of the most important considerations when displaying data with a bar graph is the

scaling of the axes. Unfortunately, graphs built in the programs Excel and Numbers are

often created with an improperly scaled y-axis. If the y-axis does not begin with zero, then

the differences between groups appear exaggerated. By “zooming in” on this smaller set of

y-axis values, the graph can be misleading. Take the previous example of snowfall

measurements. The mean snowfall values at Mount Baker, WA and Alyeska, AK

are very similar. However, if the graph of these two localities is built with a modified y-axis, as in Figure 3.2, with the minimum value set to 16.5, the differences appear dramatic,

when in reality they are not.
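The size of this distortion can be estimated by comparing how tall the bars are drawn under each axis setting. The sketch below assumes two hypothetical means (17.0 m and 17.5 m); only the axis minimum of 16.5 comes from the text. The function name `drawn_height_ratio` is made up for this illustration.

```python
def drawn_height_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_min."""
    if min(value_a, value_b) <= axis_min:
        raise ValueError("axis_min must be below both values")
    return (value_b - axis_min) / (value_a - axis_min)

# Hypothetical means in meters, chosen only for illustration.
mount_baker, alyeska = 17.0, 17.5

honest = drawn_height_ratio(mount_baker, alyeska)           # y-axis starts at 0
truncated = drawn_height_ratio(mount_baker, alyeska, 16.5)  # y-axis starts at 16.5

print(f"Axis from 0:    taller bar is {honest:.2f}x the shorter one")     # about 1.03x
print(f"Axis from 16.5: taller bar is {truncated:.2f}x the shorter one")  # 2.00x
```

With these assumed values, a difference of about 3% in the data is drawn as one bar being twice the height of the other.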

Figure 3.3 Clumped bar chart comparing the mean snowfall of alpine forests by year

(2013, 2014, and 2015) in Mammoth, CA; Mount Baker, WA; and Alyeska, AK.

Clumped Bar Charts

If this same researcher were interested in illustrating the trends in snowfall patterns over

a 3-year period, a clumped bar chart would be useful. Figure 3.3 shows snowfall patterns

within each locality over a 3-year period. By using a clumped bar chart, the researcher can

demonstrate trends within each category of data. For example, we can see that the

snowfall was exceptionally high in 2014 at the Mount Baker, WA location; however, the

snowfall at the Mammoth, CA location was fairly stable over time.

Stacked Bar Charts

Next the researcher wants to illustrate differences in the timing of snowfall by month

within each site. For this example, a stacked bar chart is helpful in illustrating the relative

contributions of parts to the whole. Figure 3.4 shows the amount of snow that fell within

the months of January, February, and March, 2015. Notice that in Mammoth, CA there

was zero snowfall in the month of January.
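In a stacked bar, each segment starts where the previous one ended, so the top of the last segment equals the total. A rough sketch of that layout follows. All amounts are hypothetical except the zero for Mammoth, CA in January, which is taken from the text. The helper `stack_segments` is an illustrative name, not part of any charting library.

```python
# Hypothetical monthly snowfall (meters) for 2015; only the zero for
# Mammoth, CA in January comes from the text.
monthly_m = {
    "Mammoth, CA": {"January": 0.0, "February": 1.5, "March": 0.5},
    "Mount Baker, WA": {"January": 2.0, "February": 3.0, "March": 3.0},
    "Alyeska, AK": {"January": 4.0, "February": 2.0, "March": 2.0},
}

def stack_segments(parts):
    """Return (label, bottom, top) for each segment of one stacked bar."""
    segments, bottom = [], 0.0
    for label, amount in parts.items():
        segments.append((label, bottom, bottom + amount))
        bottom += amount
    return segments

for site, parts in monthly_m.items():
    total = sum(parts.values())
    print(f"{site} (total {total:.1f} m)")
    for label, bottom, top in stack_segments(parts):
        share = (top - bottom) / total
        print(f"  {label}: {bottom:.1f} to {top:.1f} m ({share:.0%} of total)")
```

A month with zero snowfall produces a segment of zero height, so it simply does not appear in the bar.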

Figure 3.4 Stacked bar chart comparing the mean snowfall of alpine forests by month

(January, February, and March) for 2015 in Mammoth, CA; Mount Baker, WA; and

Alyeska, AK.

Figure 3.5 Histogram of seal size.

Histograms

Histograms are another form of bar chart used to display continuous categories, like a

consecutive range of values for age. If your data are made up of quantitative variables,

then consider constructing a histogram. The format is similar to that of a bar chart;

however, the categories along the bottom are represented with a set range of values.

Hence, both axes will be represented on a numerical scale. Also, the aesthetics are slightly

different because there are no spaces between the bars. In a histogram, there will never

be space between bars because the horizontal axis is representing continuous values

(Figure 3.5). If a space does exist between bars, then it means that there are no values for

that range.
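The binning behind a histogram can be sketched in a few lines: each value is assigned to an equal-width range, and the bar height is the count in that range. The seal lengths below are hypothetical and are not the data behind Figure 3.5. The bin start, bin width, and function name are likewise assumptions made for this example.

```python
def histogram_counts(values, bin_start, bin_width, n_bins):
    """Count values in n_bins equal-width bins of the form [low, low + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - bin_start) // bin_width)
        if 0 <= index < n_bins:  # values outside the binned range are ignored
            counts[index] += 1
    return counts

# Hypothetical seal lengths in centimeters; not the data behind Figure 3.5.
seal_lengths_cm = [112, 118, 121, 125, 127, 129, 131, 133, 134, 138, 155, 158]

counts = histogram_counts(seal_lengths_cm, bin_start=110, bin_width=10, n_bins=5)
for i, count in enumerate(counts):
    low = 110 + i * 10
    print(f"{low}-{low + 10} cm | {'#' * count}")
```

In this example the 140-150 cm bin is empty, so the printed chart shows a gap there. That gap means no values fall in that range; it is not spacing between categories.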

Box Plots

The box plot (also called a box and whisker plot) is a convenient way to illustrate several

key descriptive statistics from a dataset. Box plots show the median, as well as the

distribution of the data through the use of quartiles, which divide ranked data into four

equal groups, each consisting of a quarter of the data.

Consider the following dataset:

The first step in developing a box plot for these data is to define the quartiles. Several

methods are currently debated regarding how to define quartiles; the following example

uses the simplest and most intuitive method. In the sample dataset above, the numbers

must first be rearranged so that they are in order:

Second, find the median, which is also defined as the second quartile (Q2). In the current

example, there is an even number of data points, so the median is calculated as the

average of the middle two numbers (Q2 = 24). If there were an odd number of points, the

median would be excluded for the next step. Third, calculate the median of each half of

the data (on either side of the median); these medians are the first and third quartiles (Q1

and Q3):

The box component of a box plot spans the first quartile to the third quartile; the

length of the box is the interquartile range (IQR = Q3 − Q1). The median is shown inside the box at the

position of the second quartile, as illustrated in Figures 3.6 and 3.7.
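The three steps described above (sort, find the median, then take the median of each half) can be sketched as follows. The dataset here is hypothetical, since the book's example values are not reproduced in this text. The function name `quartiles` is chosen for this illustration. Other quartile definitions exist and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # skips the middle value when n is odd
    return median(lower), median(ordered), median(upper)

# Hypothetical dataset; the book's example values are not reproduced here.
data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

When the number of points is odd, the `n % 2` offset drops the median itself from both halves, matching the rule stated in the text.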

Figure 3.6 Example box plot showing the median, first and third quartiles, as well as the

whiskers.

Figure 3.7 Comparison of the box plot to the normal distribution of a sample population.

By showing the median, as well as the position of the first and third quartiles, box plots

give information about the degree of dispersion, as well as the skewness of the data. Box

plots often also have lines (the whiskers) extending from the box to represent the

variability of the data outside of the upper and lower quartiles. The whiskers usually mark

the minimum and maximum values for the dataset. However, if the dataset contains

outliers, the whiskers will extend only up to a certain point, defined as Q1 − 1.5 × IQR or

Q3 + 1.5 × IQR (Figure 3.7). Outliers will be depicted as points outside of the whiskers

(Figure 3.8).
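The whisker limits and outliers follow directly from Q1, Q3, and the IQR. The sketch below applies the 1.5 × IQR rule from the text to a hypothetical dataset whose quartiles (Q1 = 36, Q3 = 43) were obtained with the median-of-halves method. The function names are illustrative only.

```python
def outlier_fences(q1, q3):
    """Return (lower limit, upper limit) as Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the whisker limits."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data; Q1 = 36 and Q3 = 43 by the median-of-halves method.
data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
low, high = outlier_fences(36, 43)
print(f"Whisker limits: {low} to {high}")            # 25.5 to 53.5
print(f"Outliers: {find_outliers(data, 36, 43)}")    # [7, 15]
```

Many plotting programs draw each whisker to the most extreme data point that still lies within these limits rather than to the limit itself, so the drawn whisker may be shorter than the calculated value.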

Figure 3.8 Sample box plot with an outlier.

The box plots on the previous pages, Figures 3.6 and 3.7, have been drawn in a horizontal

orientation for illustrative purposes, but box plots are most often shown vertically, as in Figure 3.8.

In Figure 3.8, descriptive information from two groups of data is depicted. Although the

medians for the two groups are the same, the differences in the dispersion and skew of

the data are apparent. While group B shows a normal distribution, group A shows a

“positive skew,” with a tail that extends in the positive direction. The box plot for group A

also shows the position of an outlier, whose value is beyond the range of the whiskers.

Generating box plots is straightforward in both SPSS and R and is included in this book's

tutorials. However, generating box plots in Excel and Numbers is both lengthy and

complex, and involves manipulating stacked bar charts. If you do not have access to SPSS

or R, we recommend looking for a free, online box plot generator, which is an easy and

quick solution for creating box plots of your data.

Tutorials

How to Make a Bar Chart in Excel

The following tutorial will walk you through the construction of a bar chart (also known

as column graph or bar plot) using Excel. The data involve the number of rows of snail

*Data taken from the research of Vanessa C. Morales, Robert Candelaria, and

Dr. Kathleen Weaver.

Refer to Chapter 12 for tips and tools when using Excel.

Excel offers two methods to construct a simplified bar chart with error bars. While the

first method may be more challenging at first, the lessons learned will give you greater

mastery and flexibility. Calculate the average and standard deviation of the radula from

each population prior to beginning the tutorial.

Method 1

1. Arrange data in columns on the spreadsheet.

2. Click on an empty cell. Select Insert, Column, and select the first 2-D Column

option. There are several types of bar graphs available. Use the one appropriate for the

data you want to display.

3. A blank canvas will appear.

4. Right click on the blank canvas and choose the Select Data option.

5. Under Legend Entries, select Add.

Note: Add each data point as a separate series so that the standard deviation bars can be

entered separately.

6. Select the icon corresponding to the Series name subheading.

7. Select the first series title then click on the icon to the right.

8. Select the icon corresponding to the Series values.

9. Select the first value then click the icon on the right.

10. Click OK.

11. You will be directed to the original popup. Repeat steps 5–10 to input the remaining

values.

12. After the second variable is added, you should be left with a graph that looks like the

following.

After the third variable:

13. Once all the variables have been added to the graph, click OK.

14. A very basic column graph will appear, similar to the one below.

15. As a default, Excel labels the x-axis as “1.” To delete this label, select label “1.” A box

will appear. Then, press delete on your keyboard.
