Common descriptive and aggregate statistics are included with the core agate library. For additional statistical methods beyond the scope of agate consider using the agate-stats extension or integrating with scipy.
agate includes a full set of standard descriptive statistics that can be applied to any column containing
column = table.columns['salary'] column.aggregate(Sum()) column.aggregate(Min()) column.aggregate(Max()) column.aggregate(Mean()) column.aggregate(Median()) column.aggregate(Mode()) column.aggregate(Variance()) column.aggregate(StDev()) column.aggregate(MAD())
You can also generate aggregate statistics for subsets of data (sometimes colloquially referred to as “rolling up”.
doctors = patients.group_by('doctor') patient_ages = doctors.aggregate([ ('age', agate.Length(), 'patient_count') ('age', agate.Mean(), 'age_mean'), ('age', agate.Median(), 'age_median') ])
The resulting table will have four columns:
The agate-stats extension adds methods for finding outliers.
import agatestats agatestats.patch() outliers = table.stdev_outliers('salary', deviations=3, reject=False)
reject=True you can instead return a table including only those values not identified as outliers.
not_outliers = table.stdev_outliers('salary', deviations=3, reject=True)
The second, more robust, method for identifying outliers is by identifying values which are more than some number of “median absolute deviations” from the median (typically 3).
outliers = table.mad_outliers('salary', deviations=3, reject=False)
As with the first example, you can specify
reject=True to exclude outliers in the resulting table.