Aggregations

Aggregations create a new value by summarizing a Column. For example, Mean, when applied to a column containing Number data, returns a single decimal.Decimal value which is the average of all values in that column.

Aggregations can be applied to single columns using the Table.aggregate() method. The result is a single value if one aggregation was applied, or a tuple of values if a sequence of aggregations was applied.

Aggregations can be applied to instances of TableSet using the TableSet.aggregate() method. The result is a new Table with a column for each aggregation and a row for each table in the set.

agate.Aggregation Aggregations create a new value by summarizing a Column.
agate.Summary Apply an arbitrary function to a column.

Basic aggregations

agate.All Check if all values in a column pass a test.
agate.Any Check if any value in a column passes a test.
agate.Count Count occurrences of a value or values.
agate.HasNulls Check if the column contains null values.
agate.Min Find the minimum value in a column.
agate.Max Find the maximum value in a column.
agate.MaxPrecision Find the most decimal places present for any value in this column.

Statistical aggregations

agate.Deciles Calculate the deciles of a column based on its percentiles.
agate.IQR Calculate the interquartile range of a column.
agate.MAD Calculate the median absolute deviation of a column.
agate.Mean Calculate the mean of a column.
agate.Median Calculate the median of a column.
agate.Mode Calculate the mode of a column.
agate.Percentiles Divide a column into 100 equal-size groups using the “CDF” method.
agate.PopulationStDev Calculate the population standard deviation of a column.
agate.PopulationVariance Calculate the population variance of a column.
agate.Quartiles Calculate the quartiles of a column based on its percentiles.
agate.Quintiles Calculate the quintiles of a column based on its percentiles.
agate.StDev Calculate the sample standard deviation of a column.
agate.Sum Calculate the sum of a column.
agate.Variance Calculate the sample variance of a column.

Text aggregations

agate.MaxLength Find the length of the longest string in a column.

Detailed list

class agate.Aggregation

Bases: object

Aggregations create a new value by summarizing a Column.

Aggregations are applied with Table.aggregate() and TableSet.aggregate().

When creating a custom aggregation, ensure that the values returned by Aggregation.run() are of the type specified by Aggregation.get_aggregate_data_type(). This can be ensured by using the DataType.cast() method. See Summary for an example.

get_aggregate_data_type(table)

Get the data type that should be used when using this aggregation with a TableSet to produce a new column.

Should raise UnsupportedAggregationError if this aggregation does not support aggregation into a TableSet. (For example, if it does not return a single value.)

run(table)

Execute this aggregation on the given table and return the result.

validate(table)

Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().

class agate.All(column_name, test)

Bases: agate.aggregations.base.Aggregation

Check if all values in a column pass a test.

Parameters:
  • column_name – The name of the column to check.
  • test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns True or False.
run(table)
Returns:bool
class agate.Any(column_name, test)

Bases: agate.aggregations.base.Aggregation

Check if any value in a column passes a test.

Parameters:
  • column_name – The name of the column to check.
  • test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns True or False.
class agate.Count(column_name=None, value=<object object>)

Bases: agate.aggregations.base.Aggregation

Count occurrences of a value or values.

This aggregation can be used in three ways:

  1. If no arguments are specified, then it will count the number of rows in the table.
  2. If only column_name is specified, then it will count the number of non-null values in that column.
  3. If both column_name and value are specified, then it will count occurrences of a specific value.
Parameters:
  • column_name – The column containing the values to be counted.
  • value – Any value to be counted, including None.
class agate.Deciles(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the deciles of a column based on its percentiles.

Deciles will be equivalent to the 10th, 20th … 90th percentiles.

“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

Parameters:column_name – The name of a column containing Number data.
run(table)
Returns:An instance of Quantiles.
class agate.HasNulls(column_name)

Bases: agate.aggregations.base.Aggregation

Check if the column contains null values.

Parameters:column_name – The name of the column to check.
class agate.IQR(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the interquartile range of a column.

Parameters:column_name – The name of a column containing Number data.
class agate.MAD(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the median absolute deviation of a column.

Parameters:column_name – The name of a column containing Number data.
class agate.Min(column_name)

Bases: agate.aggregations.base.Aggregation

Find the minimum value in a column.

This aggregation can be applied to columns containing Date, DateTime, or Number data.

Parameters:column_name – The name of the column to be searched.
class agate.Max(column_name)

Bases: agate.aggregations.base.Aggregation

Find the maximum value in a column.

This aggregation can be applied to columns containing Date, DateTime, or Number data.

Parameters:column_name – The name of the column to be searched.
class agate.MaxLength(column_name)

Bases: agate.aggregations.base.Aggregation

Find the length of the longest string in a column.

Note: On Python 2.7 this function may miscalculate the length of unicode strings that contain “wide characters”. For details see this StackOverflow answer: http://stackoverflow.com/a/35462951

Parameters:column_name – The name of a column containing Text data.
run(table)
Returns:int.
class agate.MaxPrecision(column_name)

Bases: agate.aggregations.base.Aggregation

Find the most decimal places present for any value in this column.

Parameters:column_name – The name of the column to be searched.
class agate.Mean(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the mean of a column.

Parameters:column_name – The name of a column containing Number data.
class agate.Median(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the median of a column.

Median is equivalent to the 50th percentile. See Percentiles for implementation details.

Parameters:column_name – The name of a column containing Number data.
class agate.Mode(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the mode of a column.

Parameters:column_name – The name of a column containing Number data.
class agate.Percentiles(column_name)

Bases: agate.aggregations.base.Aggregation

Divide a column into 100 equal-size groups using the “CDF” method.

See this explanation of the various methods for computing percentiles.

“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.

A reference implementation was provided by pycalcstats.

This aggregation cannot be applied to a TableSet.

Parameters:column_name – The name of a column containing Number data.
run(table)
Returns:An instance of Quantiles.
class agate.PopulationStDev(column_name)

Bases: agate.aggregations.stdev.StDev

Calculate the population standard deviation of a column.

For the sample standard deviation see StDev.

Parameters:column_name – The name of a column containing Number data.
class agate.PopulationVariance(column_name)

Bases: agate.aggregations.variance.Variance

Calculate the population variance of a column.

For the sample variance see Variance.

Parameters:column_name – The name of a column containing Number data.
class agate.Quartiles(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the quartiles of a column based on its percentiles.

Quartiles will be equivalent to the 25th, 50th and 75th percentiles.

“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

Parameters:column_name – The name of a column containing Number data.
run(table)
Returns:An instance of Quantiles.
class agate.Quintiles(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the quintiles of a column based on its percentiles.

Quintiles will be equivalent to the 20th, 40th, 60th and 80th percentiles.

“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.

See Percentiles for implementation details.

This aggregation cannot be applied to a TableSet.

Parameters:column_name – The name of a column containing Number data.
run(table)
Returns:An instance of Quantiles.
class agate.StDev(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the sample standard deviation of a column.

For the population standard deviation see PopulationStDev.

Parameters:column_name – The name of a column containing Number data.
class agate.Sum(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the sum of a column.

Parameters:column_name – The name of a column containing Number data.
class agate.Summary(column_name, data_type, func, cast=True)

Bases: agate.aggregations.base.Aggregation

Apply an arbitrary function to a column.

Parameters:
  • column_name – The name of a column to be summarized.
  • data_type – The return type of this aggregation.
  • func – A function which will be passed the column for processing.
  • cast – If True, each return value will be cast to the specified data_type to ensure it is valid. Only disable this if you are certain your summary always returns the correct type.
class agate.Variance(column_name)

Bases: agate.aggregations.base.Aggregation

Calculate the sample variance of a column.

For the population variance see PopulationVariance.

Parameters:column_name – The name of a column containing Number data.