Creating a Frequency Distribution with a Pivot Table

June 30, 2009 by datapig Leave a reply »

In honor of the upcoming 4th of July weekend, I’ve declared this week to be Pivot Table week. 

Afterall, what’s more American than slaving away on a pivot table during a holiday.

In today’s post, I’d like to share a trick that allows you to create a frequency distribution with a pivot table.

If you’ve created a frequency distribution with the FREQUENCY function, you’ll know it can turn into a hot mess.  The fact that it’s an array formula doesn’t help matters.  Then there’s that Histogram P.O.S. you find in the Analysis Tool Pack. That doesn’t make life much better.  Each time you want to change your Bins, you’ve got to run through the whole 25-step process again.

No. If I have a choice, I like to use a pivot table.  Here’s a quick example how.

First, you need to create a pivot table where the data values are plotted in the Row area (not the Data area).

Notice that the SalesAmount field is placed in the Row Labels area.

 

Next, right-click on any value in the Row area and select Group.

In the Group dialog box, set the start and end values, then set the intervals. This will essentially create your frequency distribution.

Once you click the OK button, you can leverage the result to create a distribution view of your data.

In this example, I’ve added the number of Transactions to the data area to get a frequency distribution of transactions by dollar amount.

The obvious benefit to this technique is you can interactively filter the data based on other dimensions like Region and Market.

Also, unlike the Analysis Tool Pack Histogram shwag, you can quickly adjust your frequency intervals by simply right-clicking on any number in the Row area and selecting Group.

RELATED STUFF

  1. Using Arrows to Edit a Named Range
  2. Outta My Way Formula Thingy!
  3. Zoom into Spreadsheet on Double-Click
  4. Find and Highlight Specific Values
  5. Creating Subtotals on Many Columns
Advertisement

20 Responses

  1. jaymz says:

    good skill

  2. datapig says:

    you good skill too

  3. jeffrey weir says:

    In honor of your upcoming 4th of July weekend, I’d like to propose a bacon recipe for Saturday.

    Take 10 rashers of bacon. Add a few token bits of bread, some shavings of onion, and surround with a large Turkey.

    Fire up the oven, and then Excel. Enter the Turkey into the oven. Enter the Turkey’s weight into cell A1 of Excel. (DO NOT mix up these steps. And don’t twitter about it if you do…we want to read about it HERE, not on Debra Dalgleish’s blog.)

    Enter the following cooking time formula into cell $A$2 (of Excel, not of your oven, Excel is probably not installed in your oven)

    =$A$1/20*4.5

    Note that this is an array formula. Or rather, it’s an array formula if (and only if) you’re sober enough to hold down Control + Shift + Enter after you’ve typed it into the cell. If you’re too drunk to hold down any 3 keys – let alone the 3 keys specified above, then just go ahead and press Enter. But then don’t blame me if you overtype your formula later.

    Now, go ahead and set your oven’s buzzer for the time specified in Cell $A$2 … assuming the format of Cell $A$2 remotely resembles a time of some sort. If it’s a percent, then your goose will probably be 100% cooked.

    Serve with lots of beer and low expectations.

  4. Jared says:

    I cannot get this to work in 2007. When I select Group I get “Cannot group that section”. Thoughts?

  5. DataPig says:

    Sounds like you’re trying to group dates that are not formatted as dates. Or you have a calculated item in your pivot table already. I don’t think you can group in a pivot table with calculated items.

  6. Jared says:

    No dates, no calculated items….no luck. Thanks anyway, I just suffer through the 25-step process.

  7. Ethan says:

    First off, great tip. Thanks!
    Second, unless there’s a convention I missed back in my Stats classes, don’t you’re bins need to be discrete and not overlap?

    This question lead me to a curious behavior in Excel. When I tried your trick I started with a simple dataset of whole numbers from 1 to 20 and created the pivot table with a group interval of 5 which gave me four discrete bins, i.e. 1-5, 6-10, etc. All is good.

    Next, I changed the first value in my dataset to 0.1 which changed the bins to 0.1-5.1, 5.1-10.1, 10.1-15.1, etc. Now my bins overlap…why and why now?

    Then, with bin #1 = 0.1-5.1 and bin #2 = 5.1-10.1 I changed the fifth value in the dataset from 5 to 5.1 tentatively expecting it to be included in bin #1, however Excel included it in bin #2. Is this expected and by specific design?

    Thanks

  8. General Ledger says:

    Jared,

    I got the exact same problem as you when using 2007 but fine with 2003. While in 2007, it seems the row data are left justified and being treated as text even though the raw data is formatted as Accounting. With the cursor on a row data and right click. The Field Setting option does not let you change the format of the numbers to Accounting, etc.

    However, the order of the row data suggests Excel knows they are numbers (1000 is after 300).

    Help!?!?

  9. Mathias says:

    Very nice trick – came at the right time, to, as I needed to generate the distribution of a few sets of data, thank you!
    And I got it to work with Excel 2007 with no issue, on a field which was calculated, and formatted as %. Jared, General Ledger, what are your data?

  10. Colin Banfield says:

    I’ve often wished for an option to show “empty” groups (in your example, if there was no data for the 200-300 group, the next row after 100-200 would be 300-400). Such an option would be useful for some types of frequency distributions e.g graphical stem and leaf plots.

  11. Mark Truss says:

    Very helpful – exactly what I needed – thank you.

  12. Gusto says:

    Excellent tip. Too bad I can’t set the interval manually with time data either.

  13. tani says:

    wonderful tip…just does what I want
    thanks very much

  14. Daryl says:

    A year later and still nice…
    Thanks for posting this

  15. Fanboi says:

    This allows me to bucket the values with an even distribution only… how can I set things up so that the bucket sizes are variable (different-size ranges)?

  16. jeff weir says:

    @Fanboi – Jon Peltier has some utilities on his site that will do this for you. Google his name and the phrase “Cascade Chart” and you should get there.

  17. jeff weir says:

    @Fanboi – in fact if you google this you’ll get some links from Jon’s site regarding this issue:
    “variable width column charts” “Jon Peltier”

  18. Windfery says:

    Hey good stuff, but what happens if i’ve got a gap in my x axis more than the “group by”?

    It appears to ignore the missing data range in the x axis. anyone know how to fix so I get a properly scaled axis?

  19. Niels says:

    This is an INCORRECT METHOD, if values are duplicated (usually with absolute values)

    e.g., I have the following set:

    1
    2
    2
    3
    4
    etc

    Bin 0-3 should give 4, but using this method, it gives three.

  20. Tom says:

    Great tip!

    I am getting the same “Cannot group that selection.” message. In my case, the first column in my pivot table is age, in whole numbers.
    As a work-around, select the categories that fit one of your bins sizes. Then go to the Options Tab, and use “Group by Selection”. Rename the GroupXX to your choice.
    It’s a bit tedious, but works and is recognized by the pivot chart. :)
    The simplest approach would be to find out why Excel 2010 will not allow me to “Group” that field.

Leave a Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

*

* Copy this password:

* Type or paste password here:

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>