Compute new values

Change

new_table = table.compute([
    ('2000_change', Change('2000', '2001')),
    ('2001_change', Change('2001', '2002')),
    ('2002_change', Change('2002', '2003'))
])

Or, better yet, compute the whole decade using a loop:

computations = []

for year in range(2000, 2010):
    change = Change(year, year + 1)
    computations.append(('%i_change' % year, change))

new_table = table.compute(computations)

Percent

Calculate the percentage for each value in a column with Percent. Values are divided into the sum of the column by default.

columns = ('value',)
rows = ([1],[2],[2],[5])
new_table = agate.Table(rows, columns)

new_table = new_table.compute([
    ('percent', agate.Percent('value'))
])

new_table.print_table()

|--------+----------|
|  value | percent  |
|--------+----------|
|      1 |      10  |
|      2 |      20  |
|      2 |      20  |
|      5 |      50  |
|--------+----------|

Override the denominator with a keyword argument.

new_table = new_table.compute([
    ('percent', agate.Percent('value', 5))
])

new_table.print_table()

|--------+----------|
|  value | percent  |
|--------+----------|
|      1 |      20  |
|      2 |      40  |
|      2 |      40  |
|      5 |     100  |
|--------+----------|

Percent change

Want percent change instead of value change? Just swap out the Computation:

computations = []

for year in range(2000, 2010):
    change = PercentChange(year, year + 1)
    computations.append(('%i_change' % year, change))

new_table = table.compute(computations)

Indexed/cumulative change

Need your change indexed to a starting year? Just fix the first argument:

computations = []

for year in range(2000, 2010):
    change = Change(2000, year + 1)
    computations.append(('%i_change' % year, change))

new_table = table.compute(computations)

Of course you can also use PercentChange if you need percents rather than values.

Round to two decimal places

agate stores numerical values using Python’s decimal.Decimal type. This data type ensures numerical precision beyond what is supported by the native float() type, however, because of this we can not use Python’s builtin round() function. Instead we must use decimal.Decimal.quantize().

We can use Table.compute() to apply the quantize to generate a rounded column from an existing one:

from decimal import Decimal

number_type = agate.Number()

def round_price(row):
    return row['price'].quantize(Decimal('0.01'))

new_table = table.compute([
    ('price_rounded', Formula(number_type, round_price))
])

To round to one decimal place you would simply change 0.01 to 0.1.

Difference between dates

Calculating the difference between dates (or dates and times) works exactly the same as it does for numbers:

new_table = table.compute([
    ('age_at_death', Change('born', 'died'))
])

Levenshtein edit distance

The Levenshtein edit distance is a common measure of string similarity. It can be used, for instance, to check for typos between manually-entered names and a version that is known to be spelled correctly.

Implementing Levenshtein requires writing a custom Computation. To save ourselves building the whole thing from scratch, we will lean on the python-Levenshtein library for the actual algorithm.

import agate
from Levenshtein import distance
import six

class LevenshteinDistance(agate.Computation):
    """
    Computes Levenshtein edit distance between the column and a given string.
    """
    def __init__(self, column_name, compare_string):
        self._column_name = column_name
        self._compare_string = compare_string

    def get_computed_data_type(self, table):
        """
        The return value is a numerical distance.
        """
        return agate.Number()

    def validate(self, table):
        """
        Verify the column is text.
        """
        column = table.columns[self._column_name]

        if not isinstance(column.data_type, agate.Text):
            raise agate.DataTypeError('Can only be applied to Text data.')

    def run(self, table):
        """
        Find the distance, returning null when the input column was null.
        """
        new_column = []

        for row in table.rows:
          val = row[self._column_name]

          if val is None:
              new_column.append(None)
          else:
              new_column.append(distance(val, self._compare_string))

        return new_column

This code can now be applied to any Table just as any other Computation would be:

new_table = table.compute([
    ('distance', LevenshteinDistance('column_name', 'string to compare'))
])

The resulting column will contain an integer measuring the edit distance between the value in the column and the comparison string.

USA Today Diversity Index

The USA Today Diversity Index is a widely cited method for evaluating the racial diversity of a given area. Using a custom Computation makes it simple to calculate.

Assuming that your data has a column for the total population, another for the population of each race and a final column for the hispanic population, you can implement the diversity index like this:

class USATodayDiversityIndex(agate.Computation):
    def get_computed_data_type(self, table):
        return agate.Number()

    def run(self, table):
        new_column = []

        for row in table.rows:
          race_squares = 0

          for race in ['white', 'black', 'asian', 'american_indian', 'pacific_islander']:
              race_squares += (row[race] / row['population']) ** 2

          hispanic_squares = (row['hispanic'] / row['population']) ** 2
          hispanic_squares += (1 - (row['hispanic'] / row['population'])) ** 2

          new_column.append((1 - (race_squares * hispanic_squares)) * 100)

        return new_column

We apply the diversity index like any other computation:

with_index = table.compute([
    ('diversity_index', USATodayDiversityIndex())
])

Simple Moving Average

A simple moving average is the average of some number of prior values in a series. It is typically used to smooth out variation in time series data.

The following custom Computation will compute a simple moving average. This example assumes your data is already sorted.

class SimpleMovingAverage(agate.Computation):
    """
    Computes the simple moving average of a column over some interval.
    """
    def __init__(self, column_name, interval):
        self._column_name = column_name
        self._interval = interval

    def get_computed_data_type(self, table):
        """
        The return value is a numerical average.
        """
        return agate.Number()

    def validate(self, table):
        """
        Verify the column is numerical.
        """
        column = table.columns[self._column_name]

        if not isinstance(column.data_type, agate.Number):
            raise agate.DataTypeError('Can only be applied to Number data.')

    def run(self, table):
        new_column = []

        for i, row in enumerate(table.rows):
            if i < self._interval:
                new_column.append(None)
            else:
                values = tuple(r[self._column_name] for r in table.rows[i - self._interval:i])

                if None in values:
                    new_column.append(None)
                else:
                    new_column.append(sum(values) / self._interval)

        return new_column

You would use the simple moving average like so:

with_average = table.compute([
    ('six_month_moving_average', SimpleMovingAverage('price', 6))
])