I looked into how Grouper can be used, and it turns out it is useful for the type of summary analysis I do on a frequent basis. In addition to functions that have been around a while, pandas continues to provide new and improved capabilities with every release.
The updated agg function is another very useful and intuitive tool for summarizing data. This article will walk through how and why you may want to use the Grouper and agg functions on your own data.
Pandas’ origins are in the financial industry so it should not be a surprise that it has robust capabilities to manipulate and summarize time series data. Just look at the extensive time series documentation to get a feel for all the options.
Pandas uses offset alias strings to represent common time frequencies such as days, weeks, and years. For example, if you were interested in summarizing all the sales by month, you could use the resample function.
Instead of having to play around with reindexing, we can use our normal groupby syntax but provide a little more info on how to group the data in the date column. Since groupby is one of my standard functions, this approach seems simpler to me and it is more likely to stick in my brain.
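As a minimal sketch of the two approaches, the snippet below summarizes sales by month with both resample and groupby plus pd.Grouper. The `sales` DataFrame and its `date` and `ext_price` columns are hypothetical stand-ins for real data, and the month-start alias "MS" is just one of the valid offset aliases.

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for illustration.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-03-15", "2024-03-28"]
    ),
    "ext_price": [100.0, 50.0, 75.0, 20.0, 30.0],
})

# Option 1: resample works on a DatetimeIndex, so move the date column
# into the index first.
monthly_resample = sales.set_index("date").resample("MS")["ext_price"].sum()

# Option 2: keep the normal groupby syntax and let pd.Grouper describe
# how the date column should be bucketed.
monthly_grouper = sales.groupby(pd.Grouper(key="date", freq="MS"))["ext_price"].sum()

print(monthly_grouper)
```

Both calls should produce the same monthly totals; the Grouper version simply avoids changing the index.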
The nice benefit of this capability is that if you are interested in looking at data summarized in a different time frame, just change the freq parameter to one of the valid offset aliases. If your annual sales were on a non-calendar basis, then the data can be easily changed by modifying the freq parameter.
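To illustrate how little has to change, here is a small sketch that wraps the grouping in a hypothetical helper, `summarize`, and calls it with two different offset aliases. The data and column names are again made up for the example.

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for illustration.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-02-03", "2024-04-10", "2024-07-01", "2024-07-02"]
    ),
    "ext_price": [100.0, 50.0, 75.0, 20.0, 30.0],
})


def summarize(freq):
    """Total ext_price per period; only the freq alias changes between calls."""
    return sales.groupby(pd.Grouper(key="date", freq=freq))["ext_price"].sum()


quarterly = summarize("QS")  # quarter-start buckets
yearly = summarize("YS")     # year-start buckets

print(quarterly)
print(yearly)
```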
When summarizing time series data, this is incredibly handy. It is certainly possible to do this with other tools (using pivot tables and custom grouping) but I do not think it is nearly as intuitive as the pandas approach.
In pandas 0.20.1, a new agg function was added that makes it a lot simpler to summarize data in a manner similar to the groupby API. Fortunately, we can pass a dictionary to agg and specify what operations to apply to each column.
I find this approach really handy when I want to summarize several columns of data. In the past, I would run the individual calculations and build up the resulting data frame a row at a time.
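A short sketch of the dictionary form, assuming a hypothetical `sales` DataFrame with `ext_price`, `quantity`, and `unit_price` columns:

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for illustration.
sales = pd.DataFrame({
    "ext_price": [100.0, 50.0, 75.0],
    "quantity": [10, 5, 15],
    "unit_price": [10.0, 10.0, 5.0],
})

# Map each column to the operations that make sense for it.
summary = sales.agg({
    "ext_price": ["sum", "mean"],
    "quantity": ["sum", "mean"],
    "unit_price": ["mean"],
})

print(summary)
```

Operations that were not requested for a column (here, the sum of `unit_price`) show up as NaN in the result.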
The aggregate function using a dictionary is useful but one challenge is that it does not preserve order. Each year, between August and September, Goliath Groupers migrate in by the hundreds to spawn around the wrecks and reefs surrounding Palm Beach County.
We have trips planned to our beautiful local reefs and wrecks throughout the month of August to get you up close, swimming among huge schools of these magnificent fish. Goliaths primarily feed on crustaceans, but they’ve been known to steal an opportunistic meal from an unsuspecting angler or diver, especially during mating aggregations.
Historically, fishermen loved to catch Goliath Grouper as they were considered to be of fine food quality. However, research now shows that the flesh of Goliath Grouper is too high in mercury content to be edible to humans.
To attempt a population recovery, a harvest ban was put into place in 1990 in Florida, in 1993 in the Caribbean, and is still in effect. The Goliath grouper is considered critically endangered by the IUCN and a long recovery time is suspected as these fish exhibit slow growth rates.
Goliath groupers are the largest predatory, reef-dwelling fish species in the Caribbean, weighing as much as 660 pounds (300 kg) when fully grown. Game fishermen in the 20th century hunted them to such extremes that the species was near extinction along the Florida coastline by the 1990s.
Those attempting to spot them in typical dive locations off the Palm Beach coast will be disappointed. Depending on the year, Goliath grouper numbers fluctuate between the different wrecks, with some enjoying a higher concentration than others.
Due to their sheer size, Goliath groupers often intimidate divers in the water. Divers who approach the fish should remain calm, not chase them, and swim slowly to avoid spooking them.
Following those rules and visiting the right sites at the right time of the year means a good chance to see one of the ocean’s truly spectacular spawning events. A PADI MSDT instructor, Chris graduated from university in 2016 with a degree in film and journalism.
His documentary depicting the adapting generation of whale hunters on the island of Pico won a Royal Television Award in 2017. I could not resist posting this short but awesome video of a Goliath grouper aggregation in the waters off Jupiter, Florida.
Well, they are more concerned with the larger fish, whose skeletal structure cannot handle the pressures of being out of the water. The FWC says on its regulations page that “removing smaller Goliath groupers from the water to remove hooks is not necessarily a bad practice, but this practice must be done with care”.
However, as photos across the internet show, many proud fishermen still take Goliath grouper (Epinephelus itajara) out of the water for a quick photo before releasing them back to the depths. These days, almost all bottom fishermen have experienced the sudden, heavy tug of a Goliath grouper as it ‘chomps’ down on the fish that is on their hook.
Fish Spawning Aggregations (FSA) represent a vulnerable ecological process. In many species, individuals travel long distances to reproduce in large groups that occur in a given place and time during short periods, especially in the days after the full moon.
In the State of Quintana Roo, Mexico, information on spawning aggregations has been recorded since 1955. At present a few of these FSA sites still exist and visual verification has been completed, while others remain unidentified.
Since 2008, Comunidad y Biodiversidad, A.C. has collaborated with fishing cooperatives to locate and visually confirm the FSAs. With the prior knowledge that these sites are often found close to reef structures with particular geophysical characteristics, the sites are located using bathymetric maps, the local ecological knowledge of the fishermen, and in situ exploration by SCUBA.
To date, 38 fishers from five cooperatives have been engaged in exploring 25 potential FSA sites in the center and south of the state. These sites had previously been roughly identified from the fishers’ knowledge; however, only once the precise location is known can management strategies be developed, for example through small marine reserves that can have large impacts on the populations of these commercially important fish.
One of the largest documented FSAs in Mexico was subjected to fishing pressure for more than 80 years, and today the aggregation no longer forms. During the monitoring of FSA sites, fishermen observe signs indicating that the fish are ready to spawn.
Fishermen record data during spawning periods in order to support the development of proposals for marine reserves that can contribute to population recovery. Fishermen and researchers have characterized 25 potential fish spawning aggregation sites during 140 hours of bathymetric surveys and over 150 SCUBA dives in Quintana Roo.
In some cases, this level of analysis may be sufficient to answer business questions.
In other instances, this activity might be the first step in a more complex data science analysis. However, they might be surprised at how useful complex aggregation functions can be for supporting sophisticated analysis.
In the context of this article, an aggregation function is one which takes multiple individual values and returns a summary. The most common aggregation functions are a simple average or summation of values.
A quick example is calculating the total and average fare using the Titanic dataset (loaded from seaborn). One area that needs to be discussed is that there are multiple ways to call an aggregation function.
You may pass a list of functions to apply to one or more columns of data. The tuple approach is limited by only being able to apply one aggregation at a time to a specific column.
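The calling styles can be sketched side by side. To keep the example self-contained, it uses a tiny hand-made stand-in for the Titanic data rather than loading it from seaborn; the rows and values are invented for illustration.

```python
import pandas as pd

# Tiny stand-in for the Titanic dataset; the values are invented.
titanic = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Cherbourg", "Cherbourg", "Queenstown"],
    "class": ["First", "Third", "First", "Third", "Third"],
    "fare": [50.0, 10.0, 80.0, 20.0, 8.0],
})

# Total and average fare across all rows.
overall = titanic["fare"].agg(["sum", "mean"])

# 1. A list of functions applied to one column.
by_list = titanic.groupby("embark_town")["fare"].agg(["sum", "mean"])

# 2. A dictionary mapping each column to one or more functions.
by_dict = titanic.groupby("embark_town").agg(
    {"fare": ["sum", "mean"], "class": "count"}
)

# 3. Tuples (named aggregation): one function per output column,
#    with the output name chosen up front.
by_named = titanic.groupby("embark_town").agg(
    total_fare=("fare", "sum"),
    avg_fare=("fare", "mean"),
)

print(by_named)
```

The list and dictionary forms return hierarchical column labels, while the tuple form yields flat, explicitly named columns.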
The most common built-in aggregation functions are basic math functions including sum, mean, median, minimum, maximum, standard deviation, variance, mean absolute deviation and product. As an aside, I have not found a good usage for the prod function which computes the product of all the values in a group.
After basic math, counting is the next most common aggregation I perform on grouped data. The major distinction to keep in mind is that count will not include NaN values whereas size will.
In addition, the nunique function will exclude NaN values in the unique counts. Keep reading for an example of how to include NaN in the unique value counts.
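A small sketch of the difference, using invented data with a missing `deck` value in each group. Passing `dropna=False` to nunique is one way (not necessarily the author's) to include NaN in the unique counts.

```python
import numpy as np
import pandas as pd

# Invented data: each town has one missing deck value.
df = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Southampton", "Cherbourg", "Cherbourg"],
    "deck": ["A", np.nan, "A", "B", np.nan],
})

grouped = df.groupby("embark_town")["deck"]

counts = pd.DataFrame({
    "count": grouped.count(),                            # ignores NaN
    "size": grouped.size(),                              # includes NaN
    "nunique": grouped.nunique(),                        # ignores NaN
    "nunique_with_nan": grouped.nunique(dropna=False),   # treats NaN as a value
})

print(counts)
```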
In this example, we can select the highest and lowest fare by embarked town. One important point to remember is that you must sort the data first if you want first and last to pick the max and min values.
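A minimal sketch of that idea, assuming an invented `titanic` frame with `embark_town` and `fare` columns. The data is sorted by fare in descending order first, so first and last line up with max and min.

```python
import pandas as pd

# Invented fares; deliberately unsorted.
titanic = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Cherbourg", "Southampton"],
    "fare": [10.0, 80.0, 50.0, 20.0, 30.0],
})

fares = (
    titanic.sort_values("fare", ascending=False)  # sort first, or first/last are arbitrary
    .groupby("embark_town")
    .agg(
        first_fare=("fare", "first"),  # highest fare, because of the descending sort
        last_fare=("fare", "last"),    # lowest fare, for the same reason
        max_fare=("fare", "max"),      # order-independent equivalents
        min_fare=("fare", "min"),
    )
)

print(fares)
```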
In the example above, I would recommend using max and min, but I am including first and last for the sake of completeness. The scipy.stats mode function returns the most frequent value as well as the count of occurrences.
This summary of the class and deck shows how this approach can be useful for some data sets. This is an area of programmer preference but I encourage you to be familiar with the options since you will encounter most of these in online solutions.
Like many other areas of programming, this is an element of style and preference but I encourage you to pick one or two approaches and stick with them for consistency. As shown above, there are multiple approaches to developing custom aggregation functions.
Using apply with groupby gives maximum flexibility over all aspects of the results. For the first example, we can figure out what percentage of the total fares sold can be attributed to each embark_town and class combination.
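One way to sketch this calculation is shown below. The `titanic` frame is an invented stand-in for the real dataset, and `pct_total_fare` is a name chosen here for the result.

```python
import pandas as pd

# Invented stand-in for the Titanic data; fares total 200 for easy checking.
titanic = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Southampton", "Cherbourg", "Cherbourg"],
    "class": ["First", "Third", "Third", "First", "Third"],
    "fare": [50.0, 10.0, 40.0, 80.0, 20.0],
})

total_fare = titanic["fare"].sum()

# apply receives the fares of each (embark_town, class) group and returns
# that group's share of the overall total.
pct_of_total = (
    titanic.groupby(["embark_town", "class"])["fare"]
    .apply(lambda fares: fares.sum() / total_fare)
    .rename("pct_total_fare")
)

print(pct_of_total)
```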
One important thing to keep in mind is that you can actually do this more simply using pd.crosstab, as described in my previous article. While we are talking about crosstab, a useful concept to keep in mind is that agg functions can be combined with pivot tables too.
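As a rough sketch, the same share-of-total calculation can be expressed with pd.crosstab, and a list of aggregation functions can be handed to pd.pivot_table. The data below is invented for illustration.

```python
import pandas as pd

# Invented stand-in for the Titanic data; fares total 200 for easy checking.
titanic = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Southampton", "Cherbourg", "Cherbourg"],
    "class": ["First", "Third", "Third", "First", "Third"],
    "fare": [50.0, 10.0, 40.0, 80.0, 20.0],
})

# Share of total fares for each embark_town / class cell.
shares = pd.crosstab(
    titanic["embark_town"],
    titanic["class"],
    values=titanic["fare"],
    aggfunc="sum",
    normalize="all",
)

# Several aggregation functions in one pivot table.
by_func = pd.pivot_table(
    titanic,
    index="embark_town",
    columns="class",
    values="fare",
    aggfunc=["sum", "mean"],
)

print(shares)
print(by_func)
```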
To understand this, you need to look at the quarter boundary (end of March through start of April) to get a good sense of what is going on. If you want to just get a cumulative quarterly total, you can chain multiple groupby functions.
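A minimal sketch of the chained approach follows, with invented sales that straddle the quarter boundary. The first groupby rolls rows up to daily totals; the second restarts a running total at each quarter. Column names are assumptions.

```python
import pandas as pd

# Invented sales around the end of March / start of April.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-03-30", "2024-03-30", "2024-03-31", "2024-04-01", "2024-04-02"]
    ),
    "ext_price": [4.0, 6.0, 20.0, 5.0, 7.0],
})

# Step 1: daily totals, renamed via named aggregation.
daily = sales.groupby(pd.Grouper(key="date", freq="D")).agg(
    daily_sales=("ext_price", "sum")
)

# Step 2: cumulative sum that resets at the start of every quarter.
quarterly_running = (
    daily.groupby(pd.Grouper(freq="QS"))["daily_sales"]
    .cumsum()
    .rename("quarterly_sales")
)

print(quarterly_running)
```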
In this example, I included the named aggregation approach to rename the variable to clarify that it is now daily sales. By default, pandas creates a hierarchical column index on the summary DataFrame.
At some point in the analysis process you will likely want to “flatten” the columns so that there is a single row of names. Just keep in mind that it will be easier for your subsequent analysis if the resulting column names do not have spaces.
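One common way to flatten the columns, sketched here on invented data, is to join the levels of each column label with an underscore so the resulting names contain no spaces.

```python
import pandas as pd

# Invented stand-in for the Titanic data.
titanic = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Cherbourg", "Cherbourg"],
    "fare": [50.0, 10.0, 80.0, 20.0],
})

# The dictionary form of agg produces two-level column labels
# such as ("fare", "sum").
summary = titanic.groupby("embark_town").agg({"fare": ["sum", "mean"]})

# Join the levels into single names like "fare_sum".
summary.columns = [
    "_".join(col).rstrip("_") for col in summary.columns.to_flat_index()
]
summary = summary.reset_index()

print(summary)
```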
Refer to the package documentation for more examples of how sidetable can summarize your data. There is a lot of detail here but that is due to how many uses there are for grouping and aggregating data with pandas.
My hope is that this post becomes a useful resource that you can bookmark and come back to when you get stuck with a challenging problem of your own. Home to a large diversity of marine life, these reefs function as a source of both food and tourism for residents of the three islands.
The Cayman Islands government also enacted size and daily count limits during the now-limited fishing season, and spear-fishing for Nassau groupers was prohibited altogether. A researcher swims in the midst of a Nassau grouper aggregation as part of an ongoing effort to ... track the critically endangered species’ numbers.
The team shared their data on the grouper aggregations with the Cayman Islands’ local communities and discussed next steps. In fact, the Project’s data-sharing is part of what moved the Cayman Islands government to enact the 2016 regulations to protect the Nassau grouper in the first place.
“You need to partner with groups and governments capable of turning science into conservation decisions that support the local community.” A previous version of this article erroneously said the Cayman Islands Department of Energy was involved in this project.
A previous version of this article erroneously said whale sharks and mantas feed on Cayman grouper eggs. Liz covers marine biology, ecology, and oceanography for Forbes Science and works as an environmental consultant in Northern California.