agate.aggregations

This module contains the Aggregation class and its various subclasses. Each of these classes processes a column’s data and returns some value(s). For instance, Mean, when applied to a column containing Number data, returns a single decimal.Decimal value which is the average of all values in that column.

Aggregations are applied to single columns using the Table.aggregate() method. The result is a single value if a single aggregation was applied, or a tuple of values if a sequence of aggregations was applied.

Aggregations are applied to instances of TableSet using the TableSet.aggregate() method. The result will be a new Table with a column for each aggregation and a row for each table in the set.
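
The two calling conventions described above can be pictured with a toy model. This is a plain-Python sketch, not agate code: ToyAggregation, aggregate and aggregate_set are hypothetical stand-ins, a table is modelled as a dict of column lists, and a table set as a dict of such tables.

```python
from decimal import Decimal


class ToyAggregation:
    """Hypothetical stand-in for an Aggregation: a column name plus a function."""

    def __init__(self, column_name, func):
        self.column_name = column_name
        self.func = func

    def run(self, table):
        # Summarize one column of the (dict-based) table.
        return self.func(table[self.column_name])


def aggregate(table, aggregations):
    """Single aggregation -> single value; sequence -> tuple of values."""
    if isinstance(aggregations, ToyAggregation):
        return aggregations.run(table)
    return tuple(agg.run(table) for agg in aggregations)


def aggregate_set(tables, aggregations):
    """One row per table: the table's key followed by one value per aggregation."""
    return [
        (key,) + tuple(agg.run(table) for agg in aggregations)
        for key, table in tables.items()
    ]


def mean(values):
    return sum(values) / len(values)


table = {"age": [Decimal("20"), Decimal("30"), Decimal("40")]}
single = aggregate(table, ToyAggregation("age", mean))
several = aggregate(table, [ToyAggregation("age", min), ToyAggregation("age", max)])

tables = {
    "a": {"age": [Decimal("1"), Decimal("3")]},
    "b": {"age": [Decimal("10")]},
}
rows = aggregate_set(tables, [ToyAggregation("age", mean), ToyAggregation("age", max)])
```

Here single is one Decimal, several is a tuple with one entry per aggregation, and rows holds one entry per table in the set.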

class agate.aggregations.Aggregation

Bases: object

Base class defining an operation that can be executed on a column using Table.aggregate() or TableSet.aggregate().

get_aggregate_data_type(table)

Get the data type that should be used when using this aggregation with a TableSet to produce a new column.

Should raise UnsupportedAggregationError if this aggregation cannot be applied to a TableSet (for example, because it does not return a single value).

run(table)

Execute this aggregation on a column of the given table and return the result.

class agate.aggregations.Summary(column_name, data_type, func, cache_key=None)

Bases: agate.aggregations.Aggregation

An aggregation that can apply an arbitrary function to a column.

get_aggregate_data_type(table)

run(table)

class agate.aggregations.HasNulls(column_name)

Bases: agate.aggregations.Aggregation

Returns True if the column contains null values.

get_aggregate_data_type(table)

run(table)

Returns: bool

class agate.aggregations.Any(column_name, test=None)

Bases: agate.aggregations.Aggregation

Returns True if any value in a column passes a truth test. The truth test may be omitted when testing Boolean data.

Parameters: test – A function that takes a value and returns True or False.

get_aggregate_data_type(table)

run(table)

Returns: bool

class agate.aggregations.All(column_name, test=None)

Bases: agate.aggregations.Aggregation

Returns True if all values in a column pass a truth test. The truth test may be omitted when testing Boolean data.
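
As a rough illustration of the semantics of Any and All, the sketch below models a column as a plain list. It is not agate's implementation, and any_value and all_values are hypothetical helper names. Omitting the test falls back to the truthiness of each value, which is why doing so only makes sense for Boolean data.

```python
def any_value(values, test=None):
    # Without a test, rely on the values themselves (Boolean data).
    if test is None:
        return any(values)
    return any(test(v) for v in values)


def all_values(values, test=None):
    # Same fallback as above, but every value must pass.
    if test is None:
        return all(values)
    return all(test(v) for v in values)


flags = [True, False, True]
ages = [17, 21, 35]

any_flag = any_value(flags)                       # at least one value is True
all_flags = all_values(flags)                     # one value is False
any_minor = any_value(ages, lambda a: a < 18)     # 17 passes the test
all_adult = all_values(ages, lambda a: a >= 18)   # 17 fails the test
```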

Parameters: test – A function that takes a value and returns True or False.

get_aggregate_data_type(table)

run(table)

Returns: bool

class agate.aggregations.Length

Bases: agate.aggregations.Aggregation

Count the total number of values in the column.

Equivalent to calling len() on a Column.

get_aggregate_data_type(table)

run(table)

Returns: int

class agate.aggregations.Count(column_name, value)

Bases: agate.aggregations.Aggregation

Count the number of times a specific value occurs in a column.

If you want to count the total number of values in a column, use Length.
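
The difference between counting one value and counting everything can be shown with a plain list standing in for a column. This is an illustrative sketch only and does not use agate.

```python
values = ["a", None, "b", "a", None, "a"]

count_a = values.count("a")                        # occurrences of one specific value
count_null = sum(1 for v in values if v is None)   # None can be counted as well
length = len(values)                               # every value, null or not
```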

Parameters: value – Any value to be counted, including None.

get_aggregate_data_type(table)

run(table)

Returns: int

class agate.aggregations.Min(column_name)

Bases: agate.aggregations.Aggregation

Compute the minimum value in a column. May be applied to columns containing DateTime or Number data.

get_aggregate_data_type(table)

run(table)

Returns: A single value whose type is dependent on the type of the column.

class agate.aggregations.Max(column_name)

Bases: agate.aggregations.Aggregation

Compute the maximum value in a column. May be applied to columns containing DateTime or Number data.

get_aggregate_data_type(table)

run(table)

Returns: A single value whose type is dependent on the type of the column.

class agate.aggregations.MaxPrecision(column_name)

Bases: agate.aggregations.Aggregation

Compute the greatest number of decimal places present in any value in this column.
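
One way to picture this computation is with the exponent of each decimal.Decimal value. The sketch below is an assumption about the idea, not agate's code: decimal_places and max_precision are hypothetical helpers, nulls are skipped, and trailing zeros count as decimal places.

```python
from decimal import Decimal


def decimal_places(value):
    # A negative exponent is the number of digits after the decimal point.
    exponent = value.as_tuple().exponent
    return max(0, -exponent)


def max_precision(values):
    # Ignore nulls and keep the largest number of decimal places seen.
    return max(decimal_places(v) for v in values if v is not None)


values = [Decimal("1"), Decimal("2.50"), Decimal("3.125"), None]
precision = max_precision(values)
```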

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.Sum(column_name)

Bases: agate.aggregations.Aggregation

Compute the sum of a column containing Number data.

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.Mean(column_name)

Bases: agate.aggregations.Aggregation

Compute the mean value of a column containing Number data.
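
Sum and Mean both return decimal.Decimal values. The standard-library sketch below (no agate involved) shows the arithmetic on a list of Decimal values and why exact decimal arithmetic matters compared with binary floats.

```python
from decimal import Decimal

values = [Decimal("0.1"), Decimal("0.2")]

total = sum(values)           # exact decimal addition
mean = total / len(values)    # Decimal divided by int stays a Decimal

float_total = 0.1 + 0.2       # binary floats cannot represent 0.3 exactly
```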

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.Median(column_name)

Bases: agate.aggregations.Aggregation

Compute the median value of a column containing Number data.

This is the 50th percentile. See Percentiles for implementation details.
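
For intuition, the conventional definition of the median can be tried with the standard library's statistics.median, used here as a stand-in. agate derives its value from Percentiles, so treat this as an illustration of the definition rather than of the implementation.

```python
from decimal import Decimal
from statistics import median

odd = [Decimal(v) for v in (7, 1, 3)]
even = [Decimal(v) for v in (7, 1, 3, 5)]

median_odd = median(odd)     # the middle value after sorting
median_even = median(even)   # the average of the two middle values
```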

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.Mode(column_name)

Bases: agate.aggregations.Aggregation

Compute the mode value of a column containing Number data.
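
The mode is the most frequently occurring value. A minimal standard-library sketch follows; how agate breaks ties between equally frequent values is not stated here, so the example avoids ties.

```python
from collections import Counter
from decimal import Decimal

values = [Decimal("1"), Decimal("2"), Decimal("2"), Decimal("3")]

# most_common(1) yields a single (value, count) pair for the top value.
mode, occurrences = Counter(values).most_common(1)[0]
```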

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.IQR(column_name)

Bases: agate.aggregations.Aggregation

Compute the interquartile range of a column containing Number data.

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.Variance(column_name)

Bases: agate.aggregations.Aggregation

Compute the sample variance of a column containing Number data.

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.PopulationVariance(column_name)

Bases: agate.aggregations.Variance

Compute the population variance of a column containing Number data.
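
The difference between the sample and population variants is only the divisor: n - 1 for a sample, n for a population. The sketch below uses the textbook formulas on Decimal values; sample_variance and population_variance are hypothetical helpers, not agate functions.

```python
from decimal import Decimal


def population_variance(values):
    n = len(values)
    mean = sum(values) / n
    # Divide the sum of squared deviations by n.
    return sum((v - mean) ** 2 for v in values) / n


def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    # Divide by n - 1 (Bessel's correction) for a sample.
    return sum((v - mean) ** 2 for v in values) / (n - 1)


values = [Decimal(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]
pop_var = population_variance(values)   # 32 / 8
samp_var = sample_variance(values)      # 32 / 7
```

The sample variance is always the larger of the two for the same data, because the same sum is divided by a smaller number.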

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.StDev(column_name)

Bases: agate.aggregations.Aggregation

Compute the sample standard deviation of a column containing Number data.

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.PopulationStDev(column_name)

Bases: agate.aggregations.StDev

Compute the population standard deviation of a column containing Number data.

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.MAD(column_name)

Bases: agate.aggregations.Aggregation

Compute the median absolute deviation of a column containing Number data.
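
The median absolute deviation is the median of each value's absolute distance from the column's median. The sketch below follows that textbook definition with the standard library; median_absolute_deviation is a hypothetical helper and the code is not taken from agate.

```python
from decimal import Decimal
from statistics import median


def median_absolute_deviation(values):
    center = median(values)
    # Median of the absolute distances from the median.
    return median(abs(v - center) for v in values)


values = [Decimal(v) for v in (1, 1, 2, 2, 4, 6, 9)]
mad = median_absolute_deviation(values)
```

For this data the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4 and 7, and their median is 1.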

get_aggregate_data_type(table)

run(table)

Returns: decimal.Decimal

class agate.aggregations.Percentiles(column_name)

Bases: agate.aggregations.Aggregation

Divides a Number column into 100 equal-size groups using the “CDF” method.

See this explanation of the various methods for computing percentiles.

“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.

A reference implementation was provided by pycalcstats.

This aggregation can not be applied to a TableSet.
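
One common reading of the "CDF" method is the averaged inverse of the empirical distribution function: when n * p / 100 is a whole number k, average the k-th and (k+1)-th sorted values, otherwise take the next value up. The sketch below assumes that reading and may differ in detail from agate's implementation. cdf_percentiles is a hypothetical helper that returns a plain list rather than a Quantiles instance.

```python
from decimal import Decimal


def cdf_percentiles(values):
    data = sorted(values)
    n = len(data)
    result = [data[0]]                 # "zeroth" percentile: the minimum
    for p in range(1, 100):
        whole, remainder = divmod(n * p, 100)
        if remainder == 0:
            # n * p / 100 is a whole number k: average the k-th and
            # (k+1)-th values (1-based positions).
            value = (data[whole - 1] + data[whole]) / 2
        else:
            # Otherwise round the position up to the next value.
            value = data[whole]
        result.append(value)
    result.append(data[-1])            # "hundredth" percentile: the maximum
    return result


values = [Decimal(v) for v in range(1, 11)]
percentiles = cdf_percentiles(values)
quartiles = percentiles[::25]          # indexes 0, 25, 50, 75 and 100
```

Including the minimum and maximum at both ends means the list has 101 entries, so percentiles[50] is the median without any index arithmetic.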

run(table)

Returns: An instance of Quantiles.

class agate.aggregations.Quartiles(column_name)

Bases: agate.aggregations.Aggregation

The quartiles of a Number column based on the 25th, 50th and 75th percentiles.

“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation can not be applied to a TableSet.

run(table)

Returns: An instance of Quantiles.

class agate.aggregations.Quintiles(column_name)

Bases: agate.aggregations.Aggregation

The quintiles of a column based on the 20th, 40th, 60th and 80th percentiles.

“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation can not be applied to a TableSet.

run(table)

Returns: An instance of Quantiles.

class agate.aggregations.Deciles(column_name)

Bases: agate.aggregations.Aggregation

The deciles of a column based on the 10th, 20th ... 90th percentiles.

“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation can not be applied to a TableSet.

run(table)

Returns: An instance of Quantiles.

class agate.aggregations.MaxLength(column_name)

Bases: agate.aggregations.Aggregation

Calculates the length of the longest string in a column containing Text data.
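
Because the result is an int, what comes back is a length rather than the string itself. A plain-Python sketch of the idea follows, with a list standing in for the column. Skipping nulls is an assumption of this example.

```python
values = ["apple", "fig", None, "banana"]

# Length of the longest non-null string, not the string itself.
max_length = max(len(v) for v in values if v is not None)
```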

get_aggregate_data_type(table)

run(table)

Returns: int