Aggregations
Aggregations create a new value by summarizing a Column. For example, Mean, when applied to a column containing Number data, returns a single decimal.Decimal value which is the average of all values in that column.
Aggregations can be applied to single columns using the Table.aggregate() method. The result is a single value if one aggregation was applied, or a tuple of values if a sequence of aggregations was applied.
Aggregations can be applied to instances of TableSet using the TableSet.aggregate() method. The result is a new Table with a column for each aggregation and a row for each table in the set.
agate.Aggregation | Aggregations create a new value by summarizing a Column.
agate.Summary | Apply an arbitrary function to a column.
Basic aggregations
agate.All | Check if all values in a column pass a test.
agate.Any | Check if any value in a column passes a test.
agate.Count | Count occurrences of a value or values.
agate.HasNulls | Check if the column contains null values.
agate.Min | Find the minimum value in a column.
agate.Max | Find the maximum value in a column.
agate.MaxPrecision | Find the most decimal places present for any value in this column.
Statistical aggregations
agate.Deciles | Calculate the deciles of a column based on its percentiles.
agate.IQR | Calculate the interquartile range of a column.
agate.MAD | Calculate the median absolute deviation of a column.
agate.Mean | Calculate the mean of a column.
agate.Median | Calculate the median of a column.
agate.Mode | Calculate the mode of a column.
agate.Percentiles | Divide a column into 100 equal-size groups using the “CDF” method.
agate.PopulationStDev | Calculate the population standard deviation of a column.
agate.PopulationVariance | Calculate the population variance of a column.
agate.Quartiles | Calculate the quartiles of a column based on its percentiles.
agate.Quintiles | Calculate the quintiles of a column based on its percentiles.
agate.StDev | Calculate the sample standard deviation of a column.
agate.Sum | Calculate the sum of a column.
agate.Variance | Calculate the sample variance of a column.
Text aggregations
agate.MaxLength | Find the length of the longest string in a column.
Detailed list
class agate.Aggregation
Bases: object
Aggregations create a new value by summarizing a Column.
Aggregations are applied with Table.aggregate() and TableSet.aggregate().
When creating a custom aggregation, ensure that the values returned by Aggregation.run() are of the type specified by Aggregation.get_aggregate_data_type(). This can be ensured by using the DataType.cast() method. See Summary for an example.
Methods:
- get_aggregate_data_type(table) – Get the data type that should be used when using this aggregation with a TableSet to produce a new column. Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- run(table) – Execute this aggregation on a given column and return the result.
- validate(table) – Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
class agate.All(column_name, test)
Bases: agate.aggregations.base.Aggregation
Check if all values in a column pass a test.
Parameters:
- column_name – The name of the column to check.
- test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns True or False.
class agate.Any(column_name, test)
Bases: agate.aggregations.base.Aggregation
Check if any value in a column passes a test.
Parameters:
- column_name – The name of the column to check.
- test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns True or False.
class agate.Count(column_name=None, value=<object object>)
Bases: agate.aggregations.base.Aggregation
Count occurrences of a value or values.
This aggregation can be used in three ways:
- If no arguments are specified, then it will count the number of rows in the table.
- If only column_name is specified, then it will count the number of non-null values in that column.
- If both column_name and value are specified, then it will count occurrences of a specific value.
Parameters:
- column_name – The column containing the values to be counted.
- value – Any value to be counted, including None.
class agate.Deciles(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the deciles of a column based on its percentiles.
Deciles will be equivalent to the 10th, 20th … 90th percentiles.
“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.
See Percentiles for implementation details.
This aggregation cannot be applied to a TableSet.
Parameters: column_name – The name of a column containing Number data.
class agate.HasNulls(column_name)
Bases: agate.aggregations.base.Aggregation
Check if the column contains null values.
Parameters: column_name – The name of the column to check.
class agate.IQR(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the interquartile range of a column.
Parameters: column_name – The name of a column containing Number data.
class agate.MAD(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the median absolute deviation of a column.
Parameters: column_name – The name of a column containing Number data.
class agate.Min(column_name)
Bases: agate.aggregations.base.Aggregation
Find the minimum value in a column.
This aggregation can be applied to columns containing Date, DateTime, or Number data.
Parameters: column_name – The name of the column to be searched.
class agate.Max(column_name)
Bases: agate.aggregations.base.Aggregation
Find the maximum value in a column.
This aggregation can be applied to columns containing Date, DateTime, or Number data.
Parameters: column_name – The name of the column to be searched.
class agate.MaxLength(column_name)
Bases: agate.aggregations.base.Aggregation
Find the length of the longest string in a column.
Note: On Python 2.7 this function may miscalculate the length of unicode strings that contain “wide characters”. For details see this StackOverflow answer: http://stackoverflow.com/a/35462951
Parameters: column_name – The name of a column containing Text data.
class agate.MaxPrecision(column_name)
Bases: agate.aggregations.base.Aggregation
Find the most decimal places present for any value in this column.
Parameters: column_name – The name of the column to be searched.
class agate.Mean(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the mean of a column.
Parameters: column_name – The name of a column containing Number data.
class agate.Median(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the median of a column.
Median is equivalent to the 50th percentile. See Percentiles for implementation details.
Parameters: column_name – The name of a column containing Number data.
class agate.Mode(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the mode of a column.
Parameters: column_name – The name of a column containing Number data.
class agate.Percentiles(column_name)
Bases: agate.aggregations.base.Aggregation
Divide a column into 100 equal-size groups using the “CDF” method.
See this explanation of the various methods for computing percentiles.
“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.
A reference implementation was provided by pycalcstats.
This aggregation cannot be applied to a TableSet.
Parameters: column_name – The name of a column containing Number data.
class agate.PopulationStDev(column_name)
Bases: agate.aggregations.stdev.StDev
Calculate the population standard deviation of a column.
For the sample standard deviation see StDev.
Parameters: column_name – The name of a column containing Number data.
class agate.PopulationVariance(column_name)
Bases: agate.aggregations.variance.Variance
Calculate the population variance of a column.
For the sample variance see Variance.
Parameters: column_name – The name of a column containing Number data.
class agate.Quartiles(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the quartiles of a column based on its percentiles.
Quartiles will be equivalent to the 25th, 50th and 75th percentiles.
“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.
See Percentiles for implementation details.
This aggregation cannot be applied to a TableSet.
Parameters: column_name – The name of a column containing Number data.
class agate.Quintiles(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the quintiles of a column based on its percentiles.
Quintiles will be equivalent to the 20th, 40th, 60th and 80th percentiles.
“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.
See Percentiles for implementation details.
This aggregation cannot be applied to a TableSet.
Parameters: column_name – The name of a column containing Number data.
class agate.StDev(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the sample standard deviation of a column.
For the population standard deviation see PopulationStDev.
Parameters: column_name – The name of a column containing Number data.
class agate.Sum(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the sum of a column.
Parameters: column_name – The name of a column containing Number data.
class agate.Summary(column_name, data_type, func, cast=True)
Bases: agate.aggregations.base.Aggregation
Apply an arbitrary function to a column.
Parameters:
- column_name – The name of a column to be summarized.
- data_type – The return type of this aggregation.
- func – A function which will be passed the column for processing.
- cast – If True, each return value will be cast to the specified data_type to ensure it is valid. Only disable this if you are certain your summary always returns the correct type.
class agate.Variance(column_name)
Bases: agate.aggregations.base.Aggregation
Calculate the sample variance of a column.
For the population variance see PopulationVariance.
Parameters: column_name – The name of a column containing Number data.