agate.aggregations

This module contains the Aggregation class and its various subclasses. Each of these classes processes a column’s data and returns some value(s). For instance, Mean, when applied to a column containing Number data, returns a single decimal.Decimal value which is the average of all values in that column.

Aggregations are applied to instances of Column using the Column.aggregate() method. Typically, the column is first retrieved using the Table.columns attribute.

Most aggregations can also be applied to instances of TableSet using the TableSet.aggregate() method, in which case the result will be a new Table with a column for each aggregation and a row for each table in the set.

class agate.aggregations.Aggregation

Bases: object

Base class defining an operation that can be executed on a column using Table.aggregate() or on a set of columns using TableSet.aggregate().

get_cache_key()

Aggregations can optionally define a cache key that uniquely identifies this operation. If they do, future invocations of this aggregation with the same cache key applied to the same column will use the cached value.

get_aggregate_data_type(column)

Get the data type that should be used when using this aggregation with a TableSet to produce a new column.

Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)

run(column)

Execute this aggregation on a given column and return the result.
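
To illustrate how a cache key could let repeated invocations reuse a result, here is a minimal sketch. SketchAggregation, SketchSum and SketchColumn are hypothetical stand-ins written for this example; they are not agate's classes, and agate's actual caching logic may differ.

```python
class SketchAggregation:
    """Hypothetical stand-in for an aggregation base class (not agate's)."""

    def get_cache_key(self):
        # Returning None is taken here to mean "do not cache this operation".
        return None

    def run(self, column):
        raise NotImplementedError


class SketchSum(SketchAggregation):
    """Sums the non-null values and identifies itself with a constant key."""

    def __init__(self):
        self.runs = 0  # counts real computations, for demonstration only

    def get_cache_key(self):
        return 'SketchSum'

    def run(self, column):
        self.runs += 1
        return sum(v for v in column.values if v is not None)


class SketchColumn:
    """Hypothetical stand-in for a column that caches aggregation results."""

    def __init__(self, values):
        self.values = tuple(values)
        self._cache = {}

    def aggregate(self, aggregation):
        key = aggregation.get_cache_key()
        if key is None:
            return aggregation.run(self)
        if key not in self._cache:
            self._cache[key] = aggregation.run(self)
        return self._cache[key]


column = SketchColumn([1, 2, None, 4])
summer = SketchSum()
first = column.aggregate(summer)
second = column.aggregate(summer)  # same key, same column: run() is not called again
```

In this sketch the second call returns the stored value, so summer.runs stays at 1.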

class agate.aggregations.Summary(data_type, func, cache_key=None)

Bases: agate.aggregations.Aggregation

An aggregation that can apply any function to a column.

get_aggregate_data_type(column)
get_cache_key()
run(column)
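
As a rough illustration of wrapping an arbitrary function as an aggregation, the sketch below mirrors the constructor signature shown above. SketchSummary is a hypothetical stand-in, a plain list stands in for a column, and the string 'number' is only a placeholder for a real data type object.

```python
class SketchSummary:
    """Hypothetical stand-in mirroring Summary(data_type, func, cache_key=None)."""

    def __init__(self, data_type, func, cache_key=None):
        self._data_type = data_type
        self._func = func
        self._cache_key = cache_key

    def get_aggregate_data_type(self, column):
        # The caller declares up front what type the function produces.
        return self._data_type

    def get_cache_key(self):
        return self._cache_key

    def run(self, column):
        # Delegate the whole computation to the user-supplied function.
        return self._func(column)


# A plain list stands in for a column; 'number' is a placeholder data type.
value_range = SketchSummary('number', lambda col: max(col) - min(col))
result = value_range.run([3, 9, 4])
```
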
class agate.aggregations.HasNulls

Bases: agate.aggregations.Aggregation

Returns True if the column contains null values.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:bool
class agate.aggregations.Any(test=None)

Bases: agate.aggregations.Aggregation

Returns True if any value in a column passes a truth test. The truth test may be omitted when testing Boolean data.

Parameters:test – A function that takes a value and returns True or False.
get_aggregate_data_type(column)
run(column)
Returns:bool
class agate.aggregations.All(test=None)

Bases: agate.aggregations.Aggregation

Returns True if all values in a column pass a truth test. The truth test may be omitted when testing Boolean data.

Parameters:test – A function that takes a value and returns True or False.
get_aggregate_data_type(column)
run(column)
Returns:bool
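
The behavior of Any and All, including the optional truth test, can be approximated in plain Python as follows. The function names any_passes and all_pass are made up for this sketch, plain lists stand in for columns, and the handling of null values is not addressed.

```python
def any_passes(values, test=None):
    """True if any value passes `test`; with no test, values are used directly."""
    if test is None:
        return any(values)
    return any(test(v) for v in values)


def all_pass(values, test=None):
    """True if every value passes `test`; with no test, values are used directly."""
    if test is None:
        return all(values)
    return all(test(v) for v in values)


flags = [True, False, True]  # Boolean data: the test can be omitted
ages = [17, 21, 35]          # other data: a test function is needed

any_flag = any_passes(flags)
all_flags = all_pass(flags)
any_adult = any_passes(ages, lambda age: age >= 18)
all_adults = all_pass(ages, lambda age: age >= 18)
```
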
class agate.aggregations.Length

Bases: agate.aggregations.Aggregation

Count the total number of values in the column.

Equivalent to calling len() on a Column.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:int
class agate.aggregations.Count(value)

Bases: agate.aggregations.Aggregation

Count the number of times a specific value occurs in a column.

If you want to count the total number of values in a column use Length.

Parameters:value – Any value to be counted, including None.
get_aggregate_data_type(column)
run(column)
Returns:int
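
The difference between Length and Count can be sketched with a plain list standing in for a column. The helper count_value is hypothetical, and treating nulls as part of the length is an assumption of this example.

```python
def count_value(values, value):
    """Number of times `value` occurs in `values`; `value` may be None."""
    return sum(1 for v in values if v == value)


values = ['a', None, 'b', 'a', None]

length = len(values)                    # every value, nulls included in this sketch
count_a = count_value(values, 'a')      # occurrences of one specific value
count_null = count_value(values, None)  # None can be counted like any other value
```
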
class agate.aggregations.Min

Bases: agate.aggregations.Aggregation

Compute the minimum value in a column. May be applied to columns containing DateTime or Number data.

get_aggregate_data_type(column)
run(column)
Returns:datetime.date for date data, or decimal.Decimal for Number data
class agate.aggregations.Max

Bases: agate.aggregations.Aggregation

Compute the maximum value in a column. May be applied to columns containing DateTime or Number data.

get_aggregate_data_type(column)
run(column)
Returns:datetime.date for date data, or decimal.Decimal for Number data
class agate.aggregations.MaxPrecision

Bases: agate.aggregations.Aggregation

Compute the most decimal places present for any value in this column.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
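
One way to compute "the most decimal places" for decimal.Decimal values is to inspect each value's exponent, as sketched below. The helper names are made up for this example, and two choices are assumptions rather than confirmed agate behavior: nulls are skipped, and trailing zeros are ignored (so 1.50 counts as one decimal place).

```python
from decimal import Decimal


def decimal_places(value):
    """Decimal places in a finite Decimal, ignoring trailing zeros."""
    exponent = value.normalize().as_tuple().exponent
    # A negative exponent counts digits after the decimal point.
    return max(0, -exponent)


def max_decimal_places(values):
    """Largest number of decimal places among the non-null values."""
    return max(decimal_places(v) for v in values if v is not None)


values = [Decimal('1.5'), Decimal('2.125'), Decimal('3'), None]
max_places = max_decimal_places(values)
```
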
class agate.aggregations.Sum

Bases: agate.aggregations.Aggregation

Compute the sum of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.Mean

Bases: agate.aggregations.Aggregation

Compute the mean value of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.Median

Bases: agate.aggregations.Aggregation

Compute the median value of a column containing Number data.

This is the 50th percentile. See Percentiles for implementation details.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.Mode

Bases: agate.aggregations.Aggregation

Compute the mode value of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.IQR

Bases: agate.aggregations.Aggregation

Compute the interquartile range of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.Variance

Bases: agate.aggregations.Aggregation

Compute the sample variance of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.PopulationVariance

Bases: agate.aggregations.Variance

Compute the population variance of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.StDev

Bases: agate.aggregations.Aggregation

Compute the sample standard deviation of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
class agate.aggregations.PopulationStDev

Bases: agate.aggregations.StDev

Compute the population standard deviation of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
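
The sample and population variants differ only in the divisor: n - 1 for a sample, n for a population. The sketch below shows this with decimal.Decimal values; the variance helper is written for this example and assumes nulls have already been removed.

```python
from decimal import Decimal


def variance(values, population=False):
    """Variance of a list of Decimal values (sample variance by default)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    # Population variance divides by n; sample variance divides by n - 1.
    return squared_deviations / (n if population else n - 1)


data = [Decimal(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]

sample_variance = variance(data)
population_variance = variance(data, population=True)

# Each standard deviation is the square root of the matching variance.
sample_stdev = sample_variance.sqrt()
population_stdev = population_variance.sqrt()
```

For this data the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.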
class agate.aggregations.MAD

Bases: agate.aggregations.Aggregation

Compute the median absolute deviation of a column containing Number data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:decimal.Decimal.
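
The median absolute deviation is the median of each value's absolute distance from the median. A minimal sketch using the standard library follows; the function name is made up for this example and nulls are assumed to have been removed.

```python
from decimal import Decimal
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


data = [Decimal(v) for v in (1, 1, 2, 2, 4, 6, 9)]
mad = median_absolute_deviation(data)
```

Here the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4 and 7, and their median is 1.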
class agate.aggregations.Percentiles

Bases: agate.aggregations.Aggregation

Divides a Number column into 100 equal-size groups using the “CDF” method.

See this explanation of the various methods for computing percentiles.

“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.

A reference implementation was provided by pycalcstats.

This aggregation cannot be applied to a TableSet.

get_cache_key()
run(column)
Returns:An array of decimal.Decimal.
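
The text above does not spell out the "CDF" method, so the following is only a sketch of one common rank-based reading of it: each percentile is looked up by position in the sorted data, and when the position falls exactly on the boundary between two values their average is used. The percentiles function is written for this example, assumes a non-empty list with nulls removed, and may not match agate's results in every case.

```python
from decimal import Decimal


def percentiles(values):
    """Return 101 values: index 0 is the minimum, index 100 the maximum."""
    data = sorted(values)
    n = len(data)

    result = [data[0]]  # "zeroth" percentile

    for p in range(1, 100):
        # Position n * p / 100, kept in integer arithmetic to avoid float error.
        whole, remainder = divmod(n * p, 100)

        if remainder == 0:
            # Exactly on a boundary: average the two neighbouring values.
            result.append((data[whole - 1] + data[whole]) / 2)
        else:
            # Otherwise round the position up and take that value.
            result.append(data[whole])

    result.append(data[-1])  # "hundredth" percentile
    return result


data = [Decimal(v) for v in range(1, 11)]  # 1 through 10
result = percentiles(data)
median_value = result[50]
```

With the values 1 through 10, the 50th percentile falls between 5 and 6 and is averaged to 5.5.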
class agate.aggregations.Quartiles

Bases: agate.aggregations.Aggregation

The quartiles of a Number column based on the 25th, 50th and 75th percentiles.

“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

get_cache_key()
run(column)
class agate.aggregations.Quintiles

Bases: agate.aggregations.Aggregation

The quintiles of a column based on the 20th, 40th, 60th and 80th percentiles.

“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

get_cache_key()
run(column)
class agate.aggregations.Deciles

Bases: agate.aggregations.Aggregation

The deciles of a column based on the 10th, 20th ... 90th percentiles.

“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

get_cache_key()
run(column)
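
Because quartiles, quintiles and deciles are all defined in terms of percentiles, each can be read as an evenly spaced selection from a list of 101 percentile values. The sketch below uses placeholder numbers in place of real percentile values, and the every_nth helper is hypothetical.

```python
# Placeholder for the 101 values a percentile computation would produce
# (index 0 = minimum, index 100 = maximum); real data would differ.
percentile_values = list(range(101))


def every_nth(percentiles, step):
    """Pick the percentiles at indexes 0, step, 2 * step, ..., 100."""
    return [percentiles[i] for i in range(0, 101, step)]


quartiles = every_nth(percentile_values, 25)  # 0th, 25th, 50th, 75th, 100th
quintiles = every_nth(percentile_values, 20)  # 0th, 20th, ..., 100th
deciles = every_nth(percentile_values, 10)    # 0th, 10th, ..., 100th

# The interquartile range is the third quartile minus the first.
iqr = quartiles[3] - quartiles[1]
```

Including the zeroth and final entries means that, for example, quartiles[1] is the first quartile and deciles[9] is the ninth decile.
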
class agate.aggregations.MaxLength

Bases: agate.aggregations.Aggregation

Calculates the length of the longest string in a column containing Text data.

get_aggregate_data_type(column)
get_cache_key()
run(column)
Returns:int.