Modify Bin Width For A Histogram In Excel
The original histogram had a bin width of 0.2. After collecting the information from that histogram, I am supposed to change it so that it has 10 bins. My professor taught us to type in the upper limits and then, when creating the histogram, to select those cells for the bin range and the cells that contain the data for the input range. I did create a histogram from the built-in charts, but under 'Format Axis' the 'bin' option does not appear.
Moderator comment: Split from this thread, which was for Excel for Windows. Your screenshot looks like Excel for Mac, so I'm moving it here.
The bins in a histogram can be adjusted under Axis Options, but only in limited ways: you can specify the number of bins, and you can use overflow and underflow bins to adjust where the histogram starts and ends. For more information, see the 'Configure histogram bins' section of the article 'Create a histogram in Excel'.
- Bin width: You tell Excel how many units each bin should cover. In our example below, we are telling Excel that we want each bin to cover 9,984 units.
- Number of bins: Excel automatically determines the size of each bin, but you can modify the number of bins.
When you create a histogram with statistical software, the software uses the data (including the sample size) to automatically choose the width and location of the histogram bins. The resulting histogram is an attempt to balance statistical considerations, such as estimating the underlying density, and 'human considerations,' such as choosing 'round numbers' for the location and width of bins for histograms. Common 'round' bin widths include 1, 2, 2.5, and 5, as well as these numbers multiplied by a power of 10.
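The idea of a 'round' bin width can be sketched in a few lines of Python. This is only an illustration of the rounding rule described above, not the algorithm that SAS or Excel uses; the function name `nice_bin_width` is made up for this example.

```python
import math

def nice_bin_width(raw_width):
    """Round a raw bin width up to the next 'round' value:
    1, 2, 2.5, or 5 times a power of 10."""
    if raw_width <= 0:
        raise ValueError("raw_width must be positive")
    # Power of 10 at or just below the raw width
    base = 10 ** math.floor(math.log10(raw_width))
    for multiplier in (1, 2, 2.5, 5):
        candidate = multiplier * base
        # Small tolerance so that exact matches are not skipped
        if candidate >= raw_width * (1 - 1e-9):
            return candidate
    return 10 * base

# A raw width of 17 rounds up to 20; a raw width of 23 rounds up to 25.
width_a = nice_bin_width(17)
width_b = nice_bin_width(23)
```

Rounding up rather than to the nearest value is a design choice made here so that the rounded bins never outnumber the bins implied by the raw width.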
The default bin width and locations tend to work well for 95% of the data that I plot, but sometimes I decide to override the default choices. This article describes how to set the width and location of bins in histograms that are created by the UNIVARIATE and SGPLOT procedures in SAS.
Why override the default bin locations?
The most common reason to override the default bin locations is that the data have special properties. For example, sometimes the data are measured in units for which the common 'round numbers' are not optimal:
- For a histogram of time measured in minutes, a bin width of 60 is a better choice than a width of 50. Bin widths of 15 and 30 are also useful.
- For a histogram of time measured in hours, 6, 12, and 24 are good bin widths.
- For days, a bin width of 7 is a good choice.
- For a histogram of age (or other values that are rounded to integers), the bins should align with integers.
You might also want to override the default bin locations when you know that the data come from a bounded distribution. If you are plotting a positive quantity, you might want to force the histogram to use 0 as the leftmost endpoint. If you are plotting percentages, you might want to force the histogram to choose 100 as the rightmost endpoint.
To illustrate these situations, let's manufacture some data with special properties. The following DATA step creates two variables. The T variable represents time measured in minutes. The program generates times that are normally distributed with a mean of 120 minutes, then rounds these times to the nearest five-minute mark. The U variable represents a proportion between 0 and 1; it is uniformly distributed and rounded to two decimal places.
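The DATA step itself is not reproduced in this text. As a rough stand-in, the following Python sketch generates data with the properties described above. The sample size, the standard deviation, and the seed are assumptions chosen for illustration; only the mean of 120 minutes and the two rounding rules come from the description.

```python
import random

def make_sample(n=200, mean=120.0, sd=30.0, seed=1):
    """Return (t, u): times rounded to the nearest 5 minutes and
    uniform proportions rounded to two decimal places.
    n, sd, and seed are assumed values, not taken from the article."""
    rng = random.Random(seed)
    # Normal times, rounded to the nearest five-minute mark
    t = [5 * round(rng.gauss(mean, sd) / 5) for _ in range(n)]
    # Uniform proportions, rounded to two decimal places
    u = [round(rng.random(), 2) for _ in range(n)]
    return t, u

t, u = make_sample()
```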
How do we control the location of histogram bins in SAS? Read on!
Custom bins with PROC UNIVARIATE: An example of a time variable
I create histograms with PROC UNIVARIATE when I am interested in also computing descriptive statistics such as means and quantiles, or when I want to fit a parametric distribution to the data. The following statements create the default histogram for the time variable, T:
The default bin width is 20 minutes, which is not horrible, but not as convenient as 15 or 30 minutes. The first bin is centered at 70 minutes; a better choice would be 60 minutes.
The HISTOGRAM statement in PROC UNIVARIATE supports two options for specifying the locations of bins. The ENDPOINTS= option specifies the endpoints of the bins; the MIDPOINTS= option specifies the midpoints of the bins. The following statements use these options to create two customized histograms for which the bin widths are 30 minutes:
The histogram on the left has bins that are centered at 30-minute intervals. This histogram makes it easy to estimate that about 40 observations are approximately 120 minutes. The counts for other half-hour increments are similarly easy to estimate. In contrast, the histogram on the right has bins whose endpoints are 60, 90, 120,... minutes. With this histogram, it is easy to see that about 35 observations have times that are between 90 and 120 minutes. Similarly, you can estimate the number of observations that are greater than three hours or less than 90 minutes.
Both histograms are equally correct. The one you choose should depend on the questions that you want to ask about the data. Use midpoints if you want to know how many observations have a value; use endpoints if you want to know how many observations are between two values.
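To make the difference between midpoints and endpoints concrete, here is a small Python sketch that counts the same values both ways. The sample times are invented, the helper names are hypothetical, and the bins are assumed to include their left endpoint and exclude their right endpoint.

```python
def endpoints_from_midpoints(midpoints):
    """Convert equally spaced bin midpoints into bin endpoints."""
    h = midpoints[1] - midpoints[0]
    return [midpoints[0] - h / 2] + [m + h / 2 for m in midpoints]

def bin_counts(values, endpoints):
    """Count values in the half-open bins [e[i], e[i+1])."""
    counts = [0] * (len(endpoints) - 1)
    for v in values:
        for i in range(len(counts)):
            if endpoints[i] <= v < endpoints[i + 1]:
                counts[i] += 1
                break
    return counts

times = [60, 90, 95, 105, 120, 120, 130, 150]  # made-up minutes

# Bins centered at 60, 90, 120, 150: "how many are about 120 minutes?"
by_midpoint = bin_counts(times, endpoints_from_midpoints([60, 90, 120, 150]))
# by_midpoint is [1, 2, 4, 1]

# Bins with endpoints 60, 90, ..., 180: "how many are between 90 and 120?"
by_endpoint = bin_counts(times, [60, 90, 120, 150, 180])
# by_endpoint is [1, 3, 3, 1]
```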
If you run the SAS statements that create the histogram on the right, you will see the warning message
WARNING: The ENDPOINTS= list was extended to accommodate the data.
This message informs you that you specified the last endpoint as 210, but that additional bins were created to display all of the data.
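The kind of adjustment that the warning describes can be pictured with the following Python sketch. It is a guess at the general idea, not SAS's actual procedure: it assumes equally spaced endpoints and bins that exclude their right endpoint, and the function name is made up.

```python
def extend_endpoints(endpoints, data_max):
    """Append equally spaced endpoints until data_max falls inside a bin.
    Assumes bins of the form [left, right)."""
    h = endpoints[-1] - endpoints[-2]
    extended = list(endpoints)
    while extended[-1] <= data_max:
        extended.append(extended[-1] + h)
    return extended

# Requested endpoints stop at 210, but suppose the largest time is 245.
extended = extend_endpoints([60, 90, 120, 150, 180, 210], 245)
# extended is [60, 90, 120, 150, 180, 210, 240, 270]
```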
Custom bins for a bounded variable
As mentioned earlier, if you know that values are constrained within some interval, you might want to choose histogram bins that incorporate that knowledge. The U variable has values that are in the interval [0,1), but of course PROC UNIVARIATE does not know that. The following statements create a histogram of the U variable with the default bin locations:
The default histogram shows seven bins with a bin width of 0.15. From a statistical point of view, this is an adequate histogram. The histogram indicates that the data are uniformly distributed and, although it is not obvious, the left endpoint of the first bin is at 0. However, from a 'human readable' perspective, this histogram can be improved. The following statements use the MIDPOINTS= and ENDPOINTS= options to create histograms that have bin widths of 0.2 units:
The histogram on the left is not optimal for these data. Because we created uniformly distributed data in [0,1], we know that the expected count in the leftmost bin (which is centered at 0) is half the expected count of an inner bin. Similarly, the expected count in the rightmost bin (which is centered at 1) is half the count of an inner bin because no value can exceed 1. Consequently, this choice of midpoints is not very good. For these data, the histogram on the right is better at revealing that the data are uniformly distributed and are within the interval [0,1).
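The 'half the expected count' argument can be checked with a short calculation. The Python sketch below computes the share of a continuous uniform sample on the unit interval that is expected to fall in each bin; it ignores the rounding to two decimal places, and the function name is hypothetical.

```python
def expected_fraction(lo, hi, a=0.0, b=1.0):
    """Expected share of Uniform(a, b) values that fall in the bin [lo, hi)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

h = 0.2

# Bins centered at 0, 0.2, ..., 1: the two outer bins hang over the edges
midpoints = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
by_midpoint = [expected_fraction(m - h / 2, m + h / 2) for m in midpoints]
# roughly 0.1, 0.2, 0.2, 0.2, 0.2, 0.1

# Bins with endpoints 0, 0.2, ..., 1: every bin lies inside the interval
endpoints = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
by_endpoint = [expected_fraction(lo, hi) for lo, hi in zip(endpoints, endpoints[1:])]
# roughly 0.2 for each of the five bins
```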
Custom bins with PROC SGPLOT
If you do not need the statistical power of the UNIVARIATE procedure, you might choose to create histograms with PROC SGPLOT. The SGPLOT procedure supports the BINWIDTH= and BINSTART= options on the HISTOGRAM statement. The BINWIDTH= option specifies the width for the bins. The BINSTART= option specifies the center of the first bin.
I recommend that you specify both the BINWIDTH= and BINSTART= options, and that you choose the bin width first. Be aware that not all specifications result in a valid histogram. If you make a mistake when specifying the bins, you might get the following warning
WARNING: The specified BINWIDTH= value will be ignored in order to accommodate the data.
That message usually means that the minimum value of the data was not contained in a bin. For a bin width of h, the BINSTART= value must be less than xmin + h/2, where xmin is the minimum value of the data.
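The rule above is easy to check before you run the procedure. The following Python sketch encodes the stated condition and suggests one starting value that always satisfies it; both function names and the sample minimum of 35 are hypothetical, and the suggestion (the largest multiple of the bin width that does not exceed the minimum) is only one reasonable choice.

```python
import math

def binstart_is_valid(binstart, binwidth, xmin):
    """Check the condition from the text: binstart < xmin + binwidth / 2."""
    return binstart < xmin + binwidth / 2

def suggest_binstart(binwidth, xmin):
    """Largest multiple of binwidth that does not exceed xmin.
    It is at most xmin, so it always satisfies the condition above."""
    return binwidth * math.floor(xmin / binwidth)

# Suppose the smallest time is 35 minutes and the bin width is 30.
ok_60 = binstart_is_valid(60, 30, 35)   # False: first bin would start at 45
ok_30 = binstart_is_valid(30, 30, 35)   # True: first bin covers 15 to 45
start = suggest_binstart(30, 35)        # 30
```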
By default, the axis does not show a tick mark for every bin, but you can force that behavior by using the SHOWBINS option. The following statements call the SGPLOT procedure to create histograms for the time-like variable, T. The results are again similar to the custom histograms that are shown in the previous section:
The following statements call the SGPLOT procedure to create histograms for the bounded variable, U. The results are similar to those created by the UNIVARIATE procedure:
In summary, for most data the default bin width and location result in a histogram that is both statistically useful and easy to read. However, the default choices can lead to a less-than-optimal visualization if the data have special properties, such as being time intervals or being bounded. In those cases, it makes sense to choose a bin width and a location of the first bin that reveal your data's special properties. For the UNIVARIATE procedure, use the MIDPOINTS= or ENDPOINTS= options on the HISTOGRAM statement. For the SGPLOT procedure, use the BINWIDTH= and BINSTART= options to create a histogram with custom bins.
What is a Histogram?
A histogram is a common chart used for data analysis at work or in study. It is a graphical display in which the data are grouped into ranges and plotted as bars. The height of each bar represents the number of data values in that range. A histogram looks similar to a column chart, but you can see their differences in the pictures below.
How to Make a Histogram in Excel 2016
If you are using Excel 2016, there is a built-in histogram chart type, which makes it easy and convenient to create a histogram.
Step 1: Input Data
The first thing you have to do is to input data into the worksheet. You can just type the data manually or import the data from outside sources.
Step 2: Create Your Histogram
Select the dataset, go to the Insert tab, and click the Insert Statistic Chart option in the Charts group.
Then click the first Histogram icon. The histogram is created from your dataset and shown on the worksheet.
Step 3: Customize Your Histogram
To customize a histogram, Excel 2016 offers four areas where you can change the styles, layout, and colors of the histogram, add chart elements, modify the axis options, or change the gap width between bars.
On the Design and Format tabs, or on the floating menus next to the histogram, you can modify the style, layout, and color options to change the appearance of the chart.
When you right-click different parts of the histogram, the contextual menus differ somewhat from each other. However, most of the options on these menus also appear on the Design tab, the Format tab, or the floating menus.
The options on the Format pane on the right also change immediately according to which part of the histogram you select. When you select a bar of the histogram, the Format pane provides an option for changing the gap width between bars.
Step 4: Change Histogram Bins
“Bins” are the value ranges that the bars of the histogram represent. If you want to increase or decrease the number of bins on the histogram, select the horizontal axis, and the axis options will appear on the Format pane on the right.
- By Category: This option is used when your horizontal categories are in text format. For example, if you have sales data of Smart Phone, Computer and Tablet, and you want to know the sales volume of each item, this option would be very helpful.
- Automatic: This option will automatically decide the number of bins in the histogram.
- Bin Width: This option allows you to define how big each bin should be.
- Number of bins: This option allows you to specify how many bins you want in the histogram.
- Overflow bin: When you want a single bin that counts all values above a certain value, tick this option and enter that value.
- Underflow bin: When you want a single bin that counts all values below a certain value, tick this option and enter that value.
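How these options relate to each other can be sketched in Python. This is not Excel's own algorithm: it assumes that regular bins include their upper limit, that the underflow bin counts values at or below its threshold, and that the overflow bin counts values above its threshold. The function names and sample values are invented for illustration.

```python
import math

def width_for_bin_count(values, n_bins):
    """Bin width that splits the data range into n_bins equal bins."""
    return (max(values) - min(values)) / n_bins

def counts_with_flow_bins(values, underflow, overflow, width):
    """Counts for: [underflow bin, regular bins of the given width, overflow bin]."""
    n_regular = math.ceil((overflow - underflow) / width - 1e-9)
    counts = [0] * (n_regular + 2)
    for v in values:
        if v <= underflow:
            counts[0] += 1
        elif v > overflow:
            counts[-1] += 1
        else:
            # Regular bin i covers (underflow + (i-1)*width, underflow + i*width]
            i = math.ceil((v - underflow) / width - 1e-9)
            counts[min(max(i, 1), n_regular)] += 1
    return counts

# A data range of 0 to 2 split into 10 bins gives a width of 0.2.
w = width_for_bin_count([0, 0.5, 1.3, 2], 10)

values = [1, 5, 12, 18, 20, 25, 31, 40, 55]
counts = counts_with_flow_bins(values, underflow=10, overflow=40, width=10)
# counts is [2, 3, 1, 2, 1]: two values <= 10, three bins of width 10, one value > 40
```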
Step 5: Move Your Histogram
To move your histogram to another place, first select the histogram, then click the Move Chart button on the Design tab or on the contextual menu of the Chart Area.
The Move Chart window then asks you to choose where to place your histogram.
If you choose New Sheet, the histogram is moved to a new sheet called Chart1 and is centered on that sheet; if you choose Object in, you can move the histogram into another worksheet.
How to Make a Histogram with Data Analysis ToolPak
Creating a histogram with the Data Analysis ToolPak works in all versions of Excel (including Excel 2016). However, if you’re using Excel 2016, I would recommend using the built-in histogram chart described in the section above.
Step 1: Install Data Analysis ToolPak
Go to the File tab and select Options; in the Excel Options window that pops up, select Add-ins.
Then in the Manage drop-down menu, select Excel Add-ins and click Go.
Next, in the Add-ins window, choose Analysis ToolPak and click OK.
The Analysis ToolPak add-in will be added to Excel, and you can access it in the Analysis group of the Data tab.
Step 2: Input Data & Add Bins
This step is the same as Step 1 in the first section: type or import the data.
After inputting the data, you also need to create data intervals in order to specify the histogram bin ranges. Bins are the numbers that represent the intervals into which you want to group the data. The intervals should be consecutive, non-overlapping, and usually of equal size.
Now specify the bins in an additional column next to the dataset.
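If you want a fixed number of equal-width bins, the values to type into the bins column can be worked out ahead of time. The Python sketch below lists the upper limit of each bin; the function name is made up, and treating each bin value as an upper limit is the convention assumed here.

```python
def upper_limits(data_min, data_max, n_bins):
    """Upper limit of each of n_bins equal-width bins spanning the data range."""
    width = (data_max - data_min) / n_bins
    # Round to suppress floating-point noise such as 0.6000000000000001
    return [round(data_min + width * (i + 1), 10) for i in range(n_bins)]

# Ten bins over a data range of 0 to 2 give a bin width of 0.2.
limits = upper_limits(0, 2, 10)
```

The last limit equals the maximum of the data, so every value falls into one of the listed bins.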
Step 3: Create Your Histogram
Go to the Data tab and click Data Analysis in the Analysis group. Then, in the Data Analysis dialog box, select Histogram from the list and click OK.
In the Histogram dialog box, you need to select the Input Range and Bin Range. You can leave the Labels checkbox unchecked if you don’t include labels in the data selection.
You can also choose where to place the histogram under Output options. Then, don’t forget to select Chart Output.
After that, click OK; the histogram is inserted on the worksheet together with the frequency distribution table.
Once you have created a histogram with the Data Analysis ToolPak, you cannot use Ctrl + Z to undo it. You have to delete the table and the chart manually.
How to Make a Histogram with FREQUENCY Function
Apart from the two methods above, you can also create a histogram by using the FREQUENCY function. The histogram will be dynamic, which means that when you change the data, the histogram updates accordingly.
Step 1: Input Data & Add Bins
Similarly, you need to input the data into the worksheet and then create the data intervals.
Step 2: Enter the Formula
Before you enter the frequency formula, you need to add a column named Frequency next to the Bins column.
The FREQUENCY formula has the following syntax:
FREQUENCY(data_array, bins_array)
In this example, the data array is B2:B17, bin array is D2:D6, so we get the frequency formula:
=FREQUENCY(B2:B17, D2:D6)
Because the FREQUENCY function returns an array, you need to enter it as an array formula by pressing Ctrl + Shift + Enter, not just Enter.
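To see what the formula computes, here is a small Python emulation of the counting rule. It assumes the bin values are sorted in ascending order, and the sample data are invented. Each count covers values that are greater than the previous bin value and less than or equal to the current one; the function returns one extra count for values above the last bin value, as Excel's FREQUENCY does.

```python
def frequency(data, bins):
    """Emulate FREQUENCY(data_array, bins_array) for ascending bins.
    Returns len(bins) + 1 counts; the last one is for values above the last bin."""
    counts = [0] * (len(bins) + 1)
    for v in data:
        for i, limit in enumerate(bins):
            if v <= limit:
                counts[i] += 1
                break
        else:
            # No bin value was large enough: count it in the extra element
            counts[-1] += 1
    return counts

scores = [55, 62, 70, 71, 88, 95, 100]   # made-up data
result = frequency(scores, [60, 70, 80, 90])
# result is [1, 2, 1, 1, 2]
```

If the output range has only as many cells as there are bin values, the final count is not displayed, so values above the last bin value go unreported unless you select one more cell.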
Here are the specific steps to get the frequency result from the dataset:
- Select the cells under the Frequency column, which are E2:E6 in this example.
- Press F2 to get into the edit mode for cell E2.
- Input the FREQUENCY formula.
- Press Ctrl + Shift + Enter to make sure the formula will be entered in cells (E2:E6) with the curly brackets.
Step 3: Create Your Histogram
If you are not using Excel 2016 or the premium version of Excel, you can’t make a histogram directly with the built-in templates. But once you get the frequency counts from your dataset by using the FREQUENCY function, you can create a histogram with a simple column chart.
Note: Because the FREQUENCY function in Excel is an array function, you cannot edit, move, add, or delete individual cells that are included in the formula. When you need to change the number of bin cells or the dataset, you have to delete the existing formula first, then add or delete the cells, select a new range of cells, and re-enter the formula.