logo
Archive

How To Use Grouper In Python

author
Bob Roberts
• Monday, 07 December, 2020
• 7 min read

Every once in a while it is useful to take a step back and look at pandas’ functions and see if there is a new or better way to do things. I was recently working on a problem and noticed that pandas had a Grouper function that I had never used before.

python pandas aggregation groupby grouping named agg update summarising analysis dataframe apply library aggregating using statistics sample 1024 ie multiple
(Source: www.shanelynn.ie)

Contents

I looked into how it can be used and it turns out it is useful for the type of summary analysis I tend to do on a frequent basis. In addition to functions that have been around a while, pandas continues to provide new and improved capabilities with every release.

The updated AGG function is another very useful and intuitive tool for summarizing data. This article will walk through how and why you may want to use the Grouper and AGG functions on your own data.

Pandas’ origins are in the financial industry so it should not be a surprise that it has robust capabilities to manipulate and summarize time series data. Just look at the extensive time series documentation to get a feel for all the options.

For this example, I’ll use my trusty transaction data that I’ve used in other articles. These strings are used to represent various common time frequencies like days vs. weeks vs. years.

Since group by is one of my standard functions, this approach seems simpler to me and it is more likely to stick in my brain. The nice benefit of this capability is that if you are interested in looking at data summarized in a different time frame, just change the freq parameter to one of the valid offset aliases.

pandas groupby split apply combine objects dataframe functions applying custom another
(Source: medium.com)

If your annual sales were on a non-calendar basis, then the data can be easily changed by modifying the freq parameter. When dealing with summarizing time series data, this is incredibly handy.

It is certainly possible (using pivot tables and custom grouping) but I do not think it is nearly as intuitive as the pandas approach. In pandas 0.20.1, there was a new AGG function added that makes it a lot simpler to summarize data in a manner similar to the group by API.

Fortunately we can pass a dictionary to AGG and specify what operations to apply to each column. In the past, I would run the individual calculations and build up the resulting data frame a row at a time.

The aggregate function using a dictionary is useful but one challenge is that it does not preserve order. This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.

Convention {‘start’, ‘end’, ‘e’, ‘s’} If grouper is PeriodIndex and freq parameter is passed. Base int, default 0 Only when freq parameter is passed.

plot format grouper plt convert readable pd human python months
(Source: stackoverflow.com)

For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. Loffset STR, Dateset, time delta object Only when freq parameter is passed.

Dropna built, default True If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Grouper recipe is an extended tool set made using an existing Liverpool as building blocks. If the iterable are of uneven length, missing values are filled-in with fill value.

The extended tools offer the same high performance as the underlying tool set. Code volume is kept small by linking the tools together in a functional style which helps eliminate temporary variables.

High speed is retained by preferring “vectorized” building blocks over the use of for-loops and generators which incur interpreter overhead. To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course.

analysis science python dataquest pandas io methods tutorial
(Source: www.pinterest.com)

These examples are extracted from open source projects. The reason that a DataFrameGroupBy object can be difficult to wrap your head around is that it’s lazy in nature.

It can be difficult to inspect of.group by(“state”) because it does virtually none of these things until you do something with the resulting object. It delays virtually every part of the split-apply-combine process until you invoke a method on it.

So, how can you mentally separate the split, apply, and combine stages if you can’t see any of them happening in isolation? One useful way to inspect a Pandas Group object and see the splitting in action is to iterate over it.

If you’re working on a challenging aggregation problem, then iterating over the Pandas Group object can be a great way to visualize the split part of split-apply-combine. Each value is a sequence of the index locations for the rows belonging to that particular group.

In the output above, 4, 19, and 21 are the first indices in of at which the state equals “PA.” It’s also worth mentioning that.group by() does do some, but not all, of the splitting work by building a Grouping class instance for each key that you pass.

pandas python grouping aggregating summarising aggregation analysis
(Source: www.shanelynn.ie)

However, many of the methods of the BaseGrouper class that holds these groupings are called lazily rather than at __unit__(), and many also use a cached property design. You can think of this step of the process as applying the same operation (or callable) to every “sub-table” that is produced by the splitting stage.

It simply takes the results of all the applied operations on all the sub-tables and combines them back together in an intuitive way. The dataset contains members’ first and last names, birthdate, gender, type (“rep” for House of Representatives or “sen” for Senate), U.S. state, and political party.

You can see that most columns of the dataset have the type category, which reduces the memory load on your machine. Now that you’re familiar with the dataset, you’ll start with a “Hello, World!” for the Pandas Group operation.

What is the count of Congressional members, on a state-by-state basis, over the entire history of the dataset? You call.group by() and pass the name of the column you want to group on, which is “state”.

As you’ll see next,.group by() and the comparable SQL statements are close cousins, but they’re often not functionally identical. As we developed this tutorial, we encountered a small but tricky bug in the Pandas source that doesn’t handle the observed parameter well with certain types of data.

python aggregation figure apply combine grouping split viewframes io source binning altair tutorial
(Source: viewframes.co)

This produces a Database with three columns and a Rangefinder, rather than a Series with a Multitude. In short, using as_index=False will make your result more closely mimic the default SQL output for a similar operation.

Related Videos

Other Articles You Might Be Interested In

01: The Grouper Naples Fl
02: The Jupiter Inlet Square Grouper Tiki Bar
03: The Man Who Died In His Boat
04: The Square Grouper Cudjoe Key Fl
05: The Square Grouper Restaurant Cudjoe Key Fl
06: The Ugly Grouper Menu Prices
07: Other Names For Goliath Grouper
08: Other Names For Grouper Fish
09: For Florida Fishing Dogs
10: For Florida Fishing Eso
Sources
1 elderscrollsonline.wiki.fextralife.com - https://elderscrollsonline.wiki.fextralife.com/Ultimate+Guide+to+fishing
2 myfwc.com - https://myfwc.com/license/recreational/do-i-need-one/
3 www.visitflorida.com - https://www.visitflorida.com/en-us/things-to-do/florida-fishing/florida-fishing-rules-regulations.html
4 myfwc.com - https://myfwc.com/license/
5 forums.elderscrollsonline.com - https://forums.elderscrollsonline.com/en/discussion/261671/best-fishing-spots