agate.aggregations

This module contains the Aggregation class and its various subclasses. Each of these classes processes a column’s data and returns some value(s). For instance, Mean, when applied to a column containing Number data, returns a single decimal.Decimal value which is the average of all values in that column.

Aggregations are applied to single columns using the Table.aggregate() method. The result is a single value if a single aggregation was applied, or a tuple of values if a sequence of aggregations was applied.

Aggregations are applied to instances of TableSet using the TableSet.aggregate() method. The result will be a new Table with a column for each aggregation and a row for each table in the set.

class agate.aggregations.Aggregation

Bases: object

An operation that takes a table and produces a single value summarizing one of its columns. Aggregations are invoked with Table.aggregate() or TableSet.aggregate().

When implementing a custom subclass, ensure that the values returned by run() are of the type specified by get_aggregate_data_type(). This can be ensured by using the DataType.cast() method. See Formula for an example.

get_aggregate_data_type(table)

Get the data type that should be used when using this aggregation with a TableSet to produce a new column.

Should raise UnsupportedAggregationError if this aggregation does not support aggregation into a TableSet. (For example, if it does not return a single value.)

validate(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().

run(table)

Execute this aggregation on the given table and return the result.

class agate.aggregations.Summary(column_name, data_type, func)

Bases: agate.aggregations.Aggregation

An aggregation that can apply an arbitrary function to a column.

Parameters:
  • column_name – The column being summarized.
  • data_type – The return type of this aggregation.
  • func – A function which will be passed the column for processing.
get_aggregate_data_type(table)
run(table)
class agate.aggregations.HasNulls(column_name)

Bases: agate.aggregations.Aggregation

Returns True if the column contains null values.

get_aggregate_data_type(table)
run(table)
class agate.aggregations.Any(column_name, test=None)

Bases: agate.aggregations.Aggregation

Returns True if any value in a column passes a truth test. The truth test may be omitted when testing Boolean data.

Parameters: test – A function that takes a value and returns True or False.
get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.All(column_name, test=None)

Bases: agate.aggregations.Aggregation

Returns True if all values in a column pass a truth test. The truth test may be omitted when testing Boolean data.

Parameters: test – A function that takes a value and returns True or False.
get_aggregate_data_type(table)
validate(table)
run(table)
Returns: bool
class agate.aggregations.Count(column_name=None, value=<object object>)

Bases: agate.aggregations.Aggregation

Count values. If no arguments are specified, this is simply a count of the number of rows in the table. If only column_name is specified, this will count the number of non-null values in that column. If both column_name and value are specified, then occurrences of that specific value in the specified column will be counted.

Parameters:
  • column_name – A column to count values in.
  • value – Any value to be counted, including None.
get_aggregate_data_type(table)
run(table)
class agate.aggregations.Min(column_name)

Bases: agate.aggregations.Aggregation

Compute the minimum value in a column. May be applied to columns containing DateTime or Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.Max(column_name)

Bases: agate.aggregations.Aggregation

Compute the maximum value in a column. May be applied to columns containing DateTime or Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.MaxPrecision(column_name)

Bases: agate.aggregations.Aggregation

Compute the most decimal places present for any value in this column.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.Sum(column_name)

Bases: agate.aggregations.Aggregation

Compute the sum of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.Mean(column_name)

Bases: agate.aggregations.Aggregation

Compute the mean value of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.Median(column_name)

Bases: agate.aggregations.Aggregation

Compute the median value of a column containing Number data.

This is the 50th percentile. See Percentiles for implementation details.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.Mode(column_name)

Bases: agate.aggregations.Aggregation

Compute the mode value of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.IQR(column_name)

Bases: agate.aggregations.Aggregation

Compute the interquartile range of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.Variance(column_name)

Bases: agate.aggregations.Aggregation

Compute the sample variance of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.PopulationVariance(column_name)

Bases: agate.aggregations.Variance

Compute the population variance of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.StDev(column_name)

Bases: agate.aggregations.Aggregation

Compute the sample standard deviation of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
class agate.aggregations.PopulationStDev(column_name)

Bases: agate.aggregations.StDev

Compute the population standard deviation of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
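
The sample and population variants differ only in the divisor. The sketch below uses plain Python and the textbook definitions to show that difference; it is not agate's own code, and the data is invented.

```python
from decimal import Decimal

values = [Decimal(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]
n = len(values)

mean = sum(values) / n
# Sum of squared deviations from the mean.
squared_deviations = sum((v - mean) ** 2 for v in values)

sample_variance = squared_deviations / (n - 1)  # divides by n - 1
population_variance = squared_deviations / n    # divides by n

# Each standard deviation is the square root of the matching variance.
sample_stdev = sample_variance.sqrt()
population_stdev = population_variance.sqrt()

print(sample_variance, population_variance)
print(sample_stdev, population_stdev)
```

For the same data the sample figures are always at least as large as the population figures, because the divisor is smaller.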
class agate.aggregations.MAD(column_name)

Bases: agate.aggregations.Aggregation

Compute the median absolute deviation of a column containing Number data.

get_aggregate_data_type(table)
validate(table)
run(table)
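
The median absolute deviation is the median of each value's absolute distance from the column median. The following plain-Python sketch shows the textbook definition on invented data. It relies on the standard library's median rather than agate's percentile-based one, so results may not match agate exactly for every input.

```python
from decimal import Decimal
from statistics import median

values = [Decimal(v) for v in (1, 1, 2, 2, 4, 6, 9)]

center = median(values)
# Absolute distance of every value from the median.
deviations = [abs(v - center) for v in values]
mad = median(deviations)

print(center, mad)
```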
class agate.aggregations.Percentiles(column_name)

Bases: agate.aggregations.Aggregation

Divides a Number column into 100 equal-size groups using the “CDF” method.

See this explanation of the various methods for computing percentiles.

“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.

A reference implementation was provided by pycalcstats.

This aggregation cannot be applied to a TableSet.

validate(table)
run(table)
Returns: An instance of Quantiles.
class agate.aggregations.Quartiles(column_name)

Bases: agate.aggregations.Aggregation

The quartiles of a Number column based on the 25th, 50th and 75th percentiles.

“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

validate(table)
run(table)
Returns: An instance of Quantiles.
class agate.aggregations.Quintiles(column_name)

Bases: agate.aggregations.Aggregation

The quintiles of a column based on the 20th, 40th, 60th and 80th percentiles.

“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

validate(table)
run(table)
Returns: An instance of Quantiles.
class agate.aggregations.Deciles(column_name)

Bases: agate.aggregations.Aggregation

The deciles of a column based on the 10th, 20th ... 90th percentiles.

“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

validate(table)
run(table)
Returns: An instance of Quantiles.
class agate.aggregations.MaxLength(column_name)

Bases: agate.aggregations.Aggregation

Calculates the length of the longest string in a column containing Text data.

get_aggregate_data_type(table)
validate(table)
run(table)
Returns: int.