# Aggregations

Aggregations create a new value by summarizing a `Column`. For example, `Mean`, when applied to a column containing `Number` data, returns a single `decimal.Decimal` value which is the average of all values in that column.

Aggregations can be applied to single columns using the `Table.aggregate()` method. The result is a single value if one aggregation was applied, or a tuple of values if a sequence of aggregations was applied.

Aggregations can be applied to instances of `TableSet` using the `TableSet.aggregate()` method. The result is a new `Table` with a column for each aggregation and a row for each table in the set.
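The result shapes described above can be sketched in plain Python. This is an illustration only, not agate's implementation: `aggregate`, `aggregate_set` and `mean` are hypothetical stand-ins, a column is modeled as a list of `Decimal` values, and a table set as a dict mapping a name to a column.

```python
from decimal import Decimal


def mean(column):
    """Average of a list of Decimal values."""
    return sum(column) / len(column)


def aggregate(column, aggregations):
    """Return one value for a single aggregation, a tuple for a sequence."""
    if callable(aggregations):
        return aggregations(column)
    return tuple(aggregation(column) for aggregation in aggregations)


def aggregate_set(table_set, aggregations):
    """Return one row per table, with one column per aggregation."""
    return [
        (name,) + aggregate(column, aggregations)
        for name, column in table_set.items()
    ]


salaries = [Decimal("10"), Decimal("20"), Decimal("60")]
single = aggregate(salaries, mean)         # a single Decimal
several = aggregate(salaries, [min, max])  # a tuple of two Decimals

by_region = {"east": [Decimal("1"), Decimal("3")], "west": [Decimal("10")]}
summary_rows = aggregate_set(by_region, [mean, max])
```

Here `summary_rows` holds one tuple per region, which mirrors the "row for each table, column for each aggregation" layout of the `Table` returned by `TableSet.aggregate()`.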

| Class | Description |
| --- | --- |
| `agate.Aggregation` | Aggregations create a new value by summarizing a `Column`. |
| `agate.Summary` | Apply an arbitrary function to a column. |

## Basic aggregations

| Class | Description |
| --- | --- |
| `agate.All` | Check if all values in a column pass a test. |
| `agate.Any` | Check if any value in a column passes a test. |
| `agate.Count` | Count occurrences of a value or values. |
| `agate.HasNulls` | Check if the column contains null values. |
| `agate.Min` | Find the minimum value in a column. |
| `agate.Max` | Find the maximum value in a column. |
| `agate.MaxPrecision` | Find the most decimal places present for any value in this column. |

## Statistical aggregations

| Class | Description |
| --- | --- |
| `agate.Deciles` | Calculate the deciles of a column based on its percentiles. |
| `agate.IQR` | Calculate the interquartile range of a column. |
| `agate.MAD` | Calculate the median absolute deviation of a column. |
| `agate.Mean` | Calculate the mean of a column. |
| `agate.Median` | Calculate the median of a column. |
| `agate.Mode` | Calculate the mode of a column. |
| `agate.Percentiles` | Divide a column into 100 equal-size groups using the “CDF” method. |
| `agate.PopulationStDev` | Calculate the population standard deviation of a column. |
| `agate.PopulationVariance` | Calculate the population variance of a column. |
| `agate.Quartiles` | Calculate the quartiles of a column based on its percentiles. |
| `agate.Quintiles` | Calculate the quintiles of a column based on its percentiles. |
| `agate.StDev` | Calculate the sample standard deviation of a column. |
| `agate.Sum` | Calculate the sum of a column. |
| `agate.Variance` | Calculate the sample variance of a column. |

## Text aggregations

| Class | Description |
| --- | --- |
| `agate.MaxLength` | Find the length of the longest string in a column. |

## Detailed list

class `agate.Aggregation`

Bases: `object`

Aggregations create a new value by summarizing a `Column`.

Aggregations are applied with `Table.aggregate()` and `TableSet.aggregate()`.

When creating a custom aggregation, ensure that the values returned by `Aggregation.run()` are of the type specified by `Aggregation.get_aggregate_data_type()`. This can be ensured by using the `DataType.cast()` method. See `Summary` for an example.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
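The three hooks described above can be sketched as follows. This is a minimal stand-in, not agate's code: the `Aggregation` base class below only mirrors the method names from this page, `Range` is a hypothetical custom aggregation, a table is modeled as a dict of column name to list, `Decimal` stands in for a data type, and `TypeError` stands in for whatever error a real aggregation would raise.

```python
from decimal import Decimal


class Aggregation:
    """Stand-in base class exposing the three hooks named in the text."""

    def get_aggregate_data_type(self, table):
        raise NotImplementedError

    def validate(self, table):
        pass

    def run(self, table):
        raise NotImplementedError


class Range(Aggregation):
    """Hypothetical aggregation: maximum minus minimum of a numeric column."""

    def __init__(self, column_name):
        self._column_name = column_name

    def get_aggregate_data_type(self, table):
        # The declared result type; run() must return values of this type.
        return Decimal

    def validate(self, table):
        # Reject columns that hold anything other than Decimal or None.
        for value in table[self._column_name]:
            if value is not None and not isinstance(value, Decimal):
                raise TypeError("Range requires a column of Decimal values.")

    def run(self, table):
        values = [v for v in table[self._column_name] if v is not None]
        # Cast so the result matches get_aggregate_data_type().
        return Decimal(max(values) - min(values))


def aggregate(table, aggregation):
    """Call validate() first, then run(), as the text describes."""
    aggregation.validate(table)
    return aggregation.run(table)


table = {
    "name": ["a", "b", "c"],
    "price": [Decimal("1.5"), None, Decimal("4")],
}
price_range = aggregate(table, Range("price"))
```

Keeping the checks in `validate()` means an unsuitable column is rejected before any computation starts.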

class `agate.All`(column_name, test)

Bases: `agate.aggregations.base.Aggregation`

Check if all values in a column pass a test.

Parameters:

- column_name – The name of the column to check.
- test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns `True` or `False`.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Returns: `bool`

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Any`(column_name, test)

Bases: `agate.aggregations.base.Aggregation`

Check if any value in a column passes a test.

Parameters:

- column_name – The name of the column to check.
- test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns `True` or `False`.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
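The two forms of the `test` argument accepted by `All` and `Any` (a plain value compared for equality, or a function returning `True` or `False`) can be sketched in plain Python. The helper names `as_test`, `all_pass` and `any_pass` are hypothetical, and a column is modeled as a list.

```python
def as_test(test):
    """Turn a plain value into an equality test; pass functions through."""
    if callable(test):
        return test
    return lambda value: value == test


def all_pass(column, test):
    """True if every value in the column passes the test."""
    check = as_test(test)
    return all(check(value) for value in column)


def any_pass(column, test):
    """True if at least one value in the column passes the test."""
    check = as_test(test)
    return any(check(value) for value in column)


ages = [2, 4, 6]
every_even = all_pass(ages, lambda value: value % 2 == 0)  # function form
every_two = all_pass(ages, 2)                              # value form
some_six = any_pass(ages, 6)                               # value form
```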

class `agate.Count`(column_name=None, value=<object object>)

Bases: `agate.aggregations.base.Aggregation`

Count occurrences of a value or values.

This aggregation can be used in three ways:

1. If no arguments are specified, then it will count the number of rows in the table.
2. If only `column_name` is specified, then it will count the number of non-null values in that column.
3. If both `column_name` and `value` are specified, then it will count occurrences of a specific value.

Parameters:

- column_name – The column containing the values to be counted.
- value – Any value to be counted, including `None`.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.
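The three modes listed for `Count` can be sketched with a sentinel default, which is presumably why the signature shows `value=<object object>`: a sentinel lets `None` itself be passed as the value to count. This is an illustrative sketch, not agate's code. The function `count`, the name `_NOT_GIVEN`, and the list-of-dicts table are all assumptions.

```python
_NOT_GIVEN = object()  # sentinel: distinguishes "no value" from value=None


def count(rows, column_name=None, value=_NOT_GIVEN):
    """Count rows, non-null values, or occurrences of a specific value."""
    if column_name is None:
        # Mode 1: no arguments, count the rows.
        return len(rows)
    column = [row[column_name] for row in rows]
    if value is _NOT_GIVEN:
        # Mode 2: only a column, count its non-null values.
        return sum(1 for item in column if item is not None)
    # Mode 3: column and value, count matches (None is a valid value).
    return sum(1 for item in column if item == value)


rows = [
    {"name": "a", "age": 30},
    {"name": "b", "age": None},
    {"name": "c", "age": 30},
]
row_count = count(rows)
non_null_ages = count(rows, "age")
thirties = count(rows, "age", 30)
missing_ages = count(rows, "age", None)
```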

class `agate.Deciles`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the deciles of a column based on its percentiles.

Deciles will be equivalent to the 10th, 20th … 90th percentiles.

“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.

See `Percentiles` for implementation details.

This aggregation cannot be applied to a `TableSet`.

Parameters: column_name – The name of a column containing `Number` data.

`run`(table)

Returns: An instance of `Quantiles`.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.HasNulls`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Check if the column contains null values.

Parameters: column_name – The name of the column to check.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

class `agate.IQR`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the interquartile range of a column.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.MAD`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the median absolute deviation of a column.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
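For reference, the median absolute deviation is the median of each value's absolute distance from the column's median. The sketch below uses Python's standard `statistics` module rather than agate, so the function name `median_absolute_deviation` and the handling of `None` are assumptions made for illustration.

```python
from decimal import Decimal
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median, skipping None."""
    data = [value for value in values if value is not None]
    center = statistics.median(data)
    return statistics.median(abs(value - center) for value in data)


values = [Decimal(n) for n in (1, 1, 2, 2, 4, 6, 9)]
mad = median_absolute_deviation(values)
```

In this sample the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4 and 7, and their median is 1.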

class `agate.Min`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Find the minimum value in a column.

This aggregation can be applied to columns containing `Date`, `DateTime`, or `Number` data.

Parameters: column_name – The name of the column to be searched.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Max`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Find the maximum value in a column.

This aggregation can be applied to columns containing `Date`, `DateTime`, or `Number` data.

Parameters: column_name – The name of the column to be searched.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.MaxLength`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Find the length of the longest string in a column.

Note: On Python 2.7 this function may miscalculate the length of unicode strings that contain “wide characters”. For details see this StackOverflow answer: http://stackoverflow.com/a/35462951

Parameters: column_name – The name of a column containing `Text` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Returns: `int`.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.MaxPrecision`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Find the most decimal places present for any value in this column.

Parameters: column_name – The name of the column to be searched.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
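One way to read "most decimal places" is through the exponent of each `decimal.Decimal` value. The sketch below is an assumption about the idea, not agate's implementation: the function name `max_precision` is hypothetical, trailing zeros are dropped before counting, and values are assumed to be finite.

```python
from decimal import Decimal


def max_precision(values):
    """Largest number of digits after the decimal point, skipping None."""
    places = 0
    for value in values:
        if value is None:
            continue
        # normalize() strips trailing zeros, so 1.50 counts as one place.
        exponent = value.normalize().as_tuple().exponent
        # Whole numbers have an exponent >= 0 and contribute no places.
        places = max(places, -exponent)
    return places


prices = [Decimal("1.50"), Decimal("0.125"), None, Decimal("100")]
most_places = max_precision(prices)
```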

class `agate.Mean`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the mean of a column.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Median`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the median of a column.

Median is equivalent to the 50th percentile. See `Percentiles` for implementation details.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Mode`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the mode of a column.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Percentiles`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Divide a column into 100 equal-size groups using the “CDF” method.

See this explanation of the various methods for computing percentiles.

“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.

A reference implementation was provided by pycalcstats.

This aggregation cannot be applied to a `TableSet`.

Parameters: column_name – The name of a column containing `Number` data.

`run`(table)

Returns: An instance of `Quantiles`.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
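To make the indexing concrete, here is a sketch of one common CDF-style rule: for percentile `p` of `n` sorted values, let `k = n * p / 100`; if `k` is fractional take the value at rank `ceil(k)`, and if `k` is whole average the values at ranks `k` and `k + 1`. Whether this matches agate's output in every case is an assumption. The function name `percentiles` and the plain list return value are hypothetical.

```python
from decimal import Decimal


def percentiles(values):
    """Return 101 values; index p holds the p-th percentile."""
    data = sorted(value for value in values if value is not None)
    if not data:
        return []
    n = len(data)
    result = [data[0]]  # "zeroth" percentile: the minimum
    for p in range(1, 100):
        # k = n * p / 100, kept exact with integer arithmetic.
        whole, remainder = divmod(n * p, 100)
        if remainder:
            # k is fractional: rank ceil(k) is whole + 1, index whole.
            result.append(data[whole])
        else:
            # k is whole: average ranks k and k + 1.
            result.append((data[whole - 1] + data[whole]) / 2)
    result.append(data[-1])  # "hundredth" percentile: the maximum
    return result


small = percentiles([Decimal(n) for n in (1, 2, 3, 4, 5)])
large = percentiles([Decimal(n) for n in range(1, 101)])
```

Including the minimum and maximum at the ends is what lets `small[50]` be read directly as the median.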

class `agate.PopulationStDev`(column_name)

Bases: `agate.aggregations.stdev.StDev`

Calculate the population standard deviation of a column.

For the sample standard deviation see `StDev`.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.PopulationVariance`(column_name)

Bases: `agate.aggregations.variance.Variance`

Calculate the population variance of a column.

For the sample variance see `Variance`.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
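The difference between the population and sample statistics is the divisor: the population forms divide the sum of squared deviations by `n`, the sample forms by `n - 1`. The sketch below shows this with Python's standard `statistics` module, not with agate, so it illustrates the definitions rather than the library's code.

```python
from decimal import Decimal
import statistics

values = [Decimal(n) for n in (2, 4, 4, 4, 5, 5, 7, 9)]

# Mean is 5 and the squared deviations sum to 32.
population_variance = statistics.pvariance(values)  # 32 / 8
sample_variance = statistics.variance(values)       # 32 / 7

# Each standard deviation is the square root of the matching variance.
population_stdev = statistics.pstdev(values)
sample_stdev = statistics.stdev(values)
```

For the same data the sample statistics are always slightly larger than the population ones, because of the smaller divisor.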

class `agate.Quartiles`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the quartiles of a column based on its percentiles.

Quartiles will be equivalent to the 25th, 50th and 75th percentiles.

“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.

See `Percentiles` for implementation details.

This aggregation cannot be applied to a `TableSet`.

Parameters: column_name – The name of a column containing `Number` data.

`run`(table)

Returns: An instance of `Quantiles`.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.
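Because quartiles, quintiles and deciles are all defined in terms of percentiles, each can be read off a 101-element percentile list by taking evenly spaced entries. The sketch below illustrates that relationship with a hypothetical helper, `pick_quantiles`, and a placeholder list in which index `p` simply holds the number `p`. It is not agate's `Quantiles` class.

```python
def pick_quantiles(percentiles, groups):
    """Select the cut points that split 100 percentiles into equal groups."""
    if len(percentiles) != 101:
        raise ValueError("Expected 101 values (0th through 100th).")
    step, remainder = divmod(100, groups)
    if remainder:
        raise ValueError("groups must divide 100 evenly.")
    return percentiles[::step]


# Placeholder data: the p-th percentile is just the number p.
percentile_list = list(range(101))

quartiles = pick_quantiles(percentile_list, 4)   # 0th, 25th, ..., 100th
quintiles = pick_quantiles(percentile_list, 5)   # 0th, 20th, ..., 100th
deciles = pick_quantiles(percentile_list, 10)    # 0th, 10th, ..., 100th

# With the zeroth entry included, quartiles[1] is the first quartile.
interquartile_range = quartiles[3] - quartiles[1]
median = quartiles[2]
```

Keeping the minimum at index 0 is what makes the indexing intuitive: the first quartile sits at index 1, the third at index 3.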

class `agate.Quintiles`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the quintiles of a column based on its percentiles.

Quintiles will be equivalent to the 20th, 40th, 60th and 80th percentiles.

“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.

See `Percentiles` for implementation details.

This aggregation cannot be applied to a `TableSet`.

Parameters: column_name – The name of a column containing `Number` data.

`run`(table)

Returns: An instance of `Quantiles`.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.StDev`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the sample standard deviation of a column.

For the population standard deviation see `PopulationStDev`.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Sum`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the sum of a column.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.

class `agate.Summary`(column_name, data_type, func, cast=True)

Bases: `agate.aggregations.base.Aggregation`

Apply an arbitrary function to a column.

Parameters:

- column_name – The name of a column to be summarized.
- data_type – The return type of this aggregation.
- func – A function which will be passed the column for processing.
- cast – If `True`, each return value will be cast to the specified `data_type` to ensure it is valid. Only disable this if you are certain your summary always returns the correct type.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.
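The role of the `cast` flag can be sketched in plain Python. This is an illustration under assumptions, not agate's code: `summarize` is a hypothetical function, a column is modeled as a list, and a plain callable (`Decimal`) stands in for a data type's cast step.

```python
from decimal import Decimal


def summarize(column, cast_to, func, cast=True):
    """Apply func to the whole column, optionally casting the result."""
    result = func(column)
    if cast:
        # Coerce the result to the declared type so callers can rely on it.
        return cast_to(result)
    return result


column = [Decimal("1.5"), Decimal("2.5"), Decimal("4")]

# len() returns an int; casting turns it into the declared Decimal type.
casted = summarize(column, Decimal, len)
uncasted = summarize(column, Decimal, len, cast=False)
```

Leaving `cast` enabled costs one conversion per result, which is why the text suggests disabling it only when the function is known to return the right type already.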

class `agate.Variance`(column_name)

Bases: `agate.aggregations.base.Aggregation`

Calculate the sample variance of a column.

For the population variance see `PopulationVariance`.

Parameters: column_name – The name of a column containing `Number` data.

`get_aggregate_data_type`(table)

Get the data type that should be used when using this aggregation with a `TableSet` to produce a new column.

Should raise `UnsupportedAggregationError` if this column does not support aggregation into a `TableSet`. (For example, if it does not return a single value.)

`run`(table)

Execute this aggregation on a given column and return the result.

`validate`(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by `Table.aggregate()` before `run()`.