================== Compute new values ================== Change ====== .. code-block:: python new_table = table.compute([ ('2000_change', agate.Change('2000', '2001')), ('2001_change', agate.Change('2001', '2002')), ('2002_change', agate.Change('2002', '2003')) ]) Or, better yet, compute the whole decade using a loop: .. code-block:: Python computations = [] for year in range(2000, 2010): change = agate.Change(year, year + 1) computations.append(('%i_change' % year, change)) new_table = table.compute(computations) Percent ======= Calculate the percentage for each value in a column with :class:`.Percent`. Values are divided into the sum of the column by default. .. code-block:: python columns = ('value',) rows = ([1],[2],[2],[5]) new_table = agate.Table(rows, columns) new_table = new_table.compute([ ('percent', agate.Percent('value')) ]) new_table.print_table() | value | percent | | ----- | ------- | | 1 | 10 | | 2 | 20 | | 2 | 20 | | 5 | 50 | Override the denominator with a keyword argument. .. code-block:: python new_table = new_table.compute([ ('percent', agate.Percent('value', 5)) ]) new_table.print_table() | value | percent | | ----- | ------- | | 1 | 20 | | 2 | 40 | | 2 | 40 | | 5 | 100 | Percent change ============== Want percent change instead of value change? Just swap out the :class:`.Computation`: .. code-block:: Python computations = [] for year in range(2000, 2010): change = agate.PercentChange(year, year + 1) computations.append(('%i_change' % year, change)) new_table = table.compute(computations) Indexed/cumulative change ========================= Need your change indexed to a starting year? Just fix the first argument: .. code-block:: Python computations = [] for year in range(2000, 2010): change = agate.Change(2000, year + 1) computations.append(('%i_change' % year, change)) new_table = table.compute(computations) Of course you can also use :class:`.PercentChange` if you need percents rather than values. Round to two decimal places =========================== agate stores numerical values using Python's :class:`decimal.Decimal` type. This data type ensures numerical precision beyond what is supported by the native :func:`float` type, however, because of this we can not use Python's builtin :func:`round` function. Instead we must use :meth:`decimal.Decimal.quantize`. We can use :meth:`.Table.compute` to apply the quantize to generate a rounded column from an existing one: .. code-block:: python from decimal import Decimal number_type = agate.Number() def round_price(row): return row['price'].quantize(Decimal('0.01')) new_table = table.compute([ ('price_rounded', agate.Formula(number_type, round_price)) ]) To round to one decimal place you would simply change :code:`0.01` to :code:`0.1`. .. _difference_between_dates: Difference between dates ======================== Calculating the difference between dates (or dates and times) works exactly the same as it does for numbers: .. code-block:: python new_table = table.compute([ ('age_at_death', agate.Change('born', 'died')) ]) Levenshtein edit distance ========================= The Levenshtein edit distance is a common measure of string similarity. It can be used, for instance, to check for typos between manually-entered names and a version that is known to be spelled correctly. Implementing Levenshtein requires writing a custom :class:`.Computation`. To save ourselves building the whole thing from scratch, we will lean on the `python-Levenshtein `_ library for the actual algorithm. .. code-block:: python import agate from Levenshtein import distance class LevenshteinDistance(agate.Computation): """ Computes Levenshtein edit distance between the column and a given string. """ def __init__(self, column_name, compare_string): self._column_name = column_name self._compare_string = compare_string def get_computed_data_type(self, table): """ The return value is a numerical distance. """ return agate.Number() def validate(self, table): """ Verify the column is text. """ column = table.columns[self._column_name] if not isinstance(column.data_type, agate.Text): raise agate.DataTypeError('Can only be applied to Text data.') def run(self, table): """ Find the distance, returning null when the input column was null. """ new_column = [] for row in table.rows: val = row[self._column_name] if val is None: new_column.append(None) else: new_column.append(distance(val, self._compare_string)) return new_column This code can now be applied to any :class:`.Table` just as any other :class:`.Computation` would be: .. code-block:: python new_table = table.compute([ ('distance', LevenshteinDistance('column_name', 'string to compare')) ]) The resulting column will contain an integer measuring the edit distance between the value in the column and the comparison string. USA Today Diversity Index ========================= The `USA Today Diversity Index `_ is a widely cited method for evaluating the racial diversity of a given area. Using a custom :class:`.Computation` makes it simple to calculate. Assuming that your data has a column for the total population, another for the population of each race and a final column for the hispanic population, you can implement the diversity index like this: .. code-block:: python class USATodayDiversityIndex(agate.Computation): def get_computed_data_type(self, table): return agate.Number() def run(self, table): new_column = [] for row in table.rows: race_squares = 0 for race in ['white', 'black', 'asian', 'american_indian', 'pacific_islander']: race_squares += (row[race] / row['population']) ** 2 hispanic_squares = (row['hispanic'] / row['population']) ** 2 hispanic_squares += (1 - (row['hispanic'] / row['population'])) ** 2 new_column.append((1 - (race_squares * hispanic_squares)) * 100) return new_column We apply the diversity index like any other computation: .. code-block:: Python with_index = table.compute([ ('diversity_index', USATodayDiversityIndex()) ]) Simple Moving Average ===================== A simple moving average is the average of some number of prior values in a series. It is typically used to smooth out variation in time series data. The following custom :class:`.Computation` will compute a simple moving average. This example assumes your data is already sorted. .. code-block:: python class SimpleMovingAverage(agate.Computation): """ Computes the simple moving average of a column over some interval. """ def __init__(self, column_name, interval): self._column_name = column_name self._interval = interval def get_computed_data_type(self, table): """ The return value is a numerical average. """ return agate.Number() def validate(self, table): """ Verify the column is numerical. """ column = table.columns[self._column_name] if not isinstance(column.data_type, agate.Number): raise agate.DataTypeError('Can only be applied to Number data.') def run(self, table): new_column = [] for i, row in enumerate(table.rows): if i < self._interval: new_column.append(None) else: values = tuple(r[self._column_name] for r in table.rows[i - self._interval:i]) if None in values: new_column.append(None) else: new_column.append(sum(values) / self._interval) return new_column You would use the simple moving average like so: .. code-block:: Python with_average = table.compute([ ('six_month_moving_average', SimpleMovingAverage('price', 6)) ])