agate.table

This module contains the Table object, which is the central data structure in agate. Tables are created by supplying row data, column names and subclasses of DataType to the constructor. Once instantiated tables are immutable. This concept is central to agate. The table of the data may not be accessed or modified directly.

Various methods on the Table simulate “SQL-like” operations. For example, the Table.select() method reduces the table to only the specified columns. The Table.where() method reduces the table to only those rows that pass a truth test. And the Table.order_by() method sorts the rows in the table. In all of these cases the output is new Table and the existing table remains unmodified.

Tables are not themselves iterable, but the columns of the table can be accessed via Table.columns and the rows via Table.rows.

agate.table.allow_tableset_proxy(func)

Decorator to flag that a given Table method can be proxied as a TableSet method.

class agate.table.Table(rows, column_info)

A dataset consisting of rows and columns.

Parameters:
  • rows – The data as a sequence of any sequences: tuples, lists, etc.
  • column_info – A sequence of pairs of column names and types. The latter must be instances of DataType.
Attr columns:

A ColumnMapping for accessing the columns in this table.

Attr rows:

A RowSequence for accessing the rows in this table.

classmethod from_csv(path, column_info, header=True, **kwargs)

Create a new table for a CSV. This method will use csvkit if it is available, otherwise it will use Python’s builtin csv module.

kwargs will be passed through to csv.reader().

If you are using Python 2 and not using csvkit, this method is not unicode-safe.

Parameters:
  • path – Filepath or file-like object from which to read CSV data.
  • column_info – A sequence of pairs of column names and types. The latter must be instances of DataType. Or, an instance of TypeTester to infer types.
  • header – If True, the first row of the CSV is assumed to contains headers and will be skipped.
to_csv(path, **kwargs)

Write this table to a CSV. This method will use csvkit if it is available, otherwise it will use Python’s builtin csv module.

kwargs will be passed through to csv.writer().

If you are using Python 2 and not using csvkit, this method is not unicode-safe.

Parameters:path – Filepath or file-like object to write to.
column_types

Get an ordered list of this table’s column types.

Returns:A tuple of Column instances.
column_names

Get an ordered list of this table’s column names.

Returns:A tuple of strings.
rows

Get this tables RowSequence.

columns

Get this tables ColumnMapping.

data

Get the data underlying this table.

select(column_names)

Reduce this table to only the specified columns.

Parameters:column_names – A sequence of names of columns to include in the new table.
Returns:A new Table.
where(test)

Filter a to only those rows where the row passes a truth test.

Parameters:test (function) – A function that takes a Row and returns True if it should be included.
Returns:A new Table.
find(test)

Find the first row that passes a truth test.

Parameters:test (function) – A function that takes a Row and returns True if it matches.
Returns:A single Row or None if not found.
order_by(key, reverse=False)

Sort this table by the key. This can be either a column_name or callable that returns a value to sort by.

Parameters:
  • key – Either the name of a column to sort by or a function that takes a row and returns a value to sort by.
  • reverse – If True then sort in reverse (typically, descending) order.
Returns:

A new Table.

limit(start_or_stop=None, stop=None, step=None)

Filter data to a subset of all rows.

See also: Python’s slice().

Parameters:
  • start_or_stop – If the only argument, then how many rows to include, otherwise, the index of the first row to include.
  • stop – The index of the last row to include.
  • step – The size of the jump between rows to include. (step=2 will return every other row.)
Returns:

A new Table.

distinct(key=None)

Filter data to only rows that are unique.

Parameters:key – Either 1) the name of a column to use to identify unique rows or 2) a function that takes a row and returns a value to identify unique rows or 3) None, in which case the entire row will be checked for uniqueness.
Returns:A new Table.
join(right_table, left_key, right_key=None, inner=False)

Performs the equivalent of SQL’s “left outer join”, combining columns from this table and from right_table anywhere that the output of left_key and right_key are equivalent.

Where there is no match for left_key the left columns will be included with the right columns set to None unless the inner argument is specified. (See arguments for more.)

If left_key and right_key are column names, only the left column will be included in the output table.

Column names from the right table which also exist in this table will be suffixed “2” in the new table.

Parameters:
  • right_table – The “right” table to join to.
  • left_key – Either the name of a column from the this table to join on, or a function that takes a row and returns a value to join on.
  • right_key – Either the name of a column from :code:table` to join on, or a function that takes a row and returns a value to join on. If None then left_key will be used for both.
  • inner – Perform a SQL-style “inner join” instead of a left outer join. Rows which have no match for left_key will not be included in the output table.
Returns:

A new Table.

classmethod merge(tables)

Merge an array of tables with identical columns into a single table. Each table must have exactly the same column types. Their column names need not be identical. The first table’s column names will be the ones which are used.

Parameters:tables – An array of Table.
Returns:A new Table.
group_by(key, key_name=None, key_type=None)

Create a new Table for unique value and return them as a TableSet. The key can be either a column name or a function that returns a value to group by.

Note that when group names will always be coerced to a string, regardless of the format of the input column.

Parameters:
  • key – Either the name of a column from the this table to group by, or a function that takes a row and returns a value to group by.
  • key_name – A name that describes the grouped properties. Defaults to the column name that was grouped on or “group” if grouping with a key function. See TableSet for more.
  • key_type – An instance some subclass of DataType. If not provided it will default to a :class`.Text`.
Returns:

A TableSet mapping where the keys are unique values from the key and the values are new Table instances containing the grouped rows.

compute(computations)

Compute new columns by applying one or more Computation to each row.

Parameters:computations – An iterable of pairs of new column names and Computation instances.
Returns:A new Table.
counts(key, key_name=None, key_type=None)

Count the number of occurrences of each distinct value in a column. Creates a new table with only the value and the count. This is effectively equivalent to doing a Table.group_by() followed by an TableSet.aggregate() with a Length aggregator.

The resulting table will have two columns. The first will have the name and type of the specified key column or key_name and key_type, if specified. The second will be named count and will be of type Number.

Parameters:
  • key – Either the name of a column from the this table to count, or a function that takes a row and returns a value to count.
  • key_name – A name that describes the counted properties. Defaults to the column name that was counted or “group” if counting with a key function.
  • key_type – An instance some subclass of DataType. If not provided it will default to a :class`.Text`.
bins(column_name, count=10, start=None, end=None)

Generates (approximately) evenly sized bins for the values in a column. Bins may not be perfectly even if the spread of the data does not divide evenly, but all values will always be included in some bin.

The resulting table will have two columns. The first will have the same name as the specified column, but will be type Text. The second will be named count and will be of type Number.

Parameters:
  • column_name – The name of the column to bin. Must be of type Number
  • count – The number of bins to create. If not specified then each value will be counted as its own bin.
  • start – The minimum value to start the bins at. If not specified the minimum value in the column will be used.
  • end – The maximum value to end the bins at. If not specified the maximum value in the column will be used.
Returns:

A new Table.

print_table(max_rows=None, max_columns=None, output=<open file '<stdout>', mode 'w'>)

Print a well-formatted preview of this table to the console or any other output.

Parameters:
  • max_rows – The maximum number of rows to display before truncating the data.
  • max_columns – The maximum number of columns to display before truncating the data.
  • output – A file-like object to print to. Defaults to sys.stdout.
print_bars(label_column_name, value_column_name, domain=None, width=120, output=<open file '<stdout>', mode 'w'>)

Print a text-based bar chart of the columns names label_column_name and value_column_name.

Parameters:
  • label_column_name – The column containing the label values.
  • value_column_name – The column containing the bar values.
  • domain – A 2-tuple containing the minimum and maximum values for the chart’s x-axis. The domain must be large enough to contain all values in the column.
  • width – The width, in characters, to use for the bar chart. Defaults to 120.
  • output – A file-like object to print to. Defaults to sys.stdout.
monkeypatch(patch_cls)

Dynamically add patch_cls as a base class of this class.

Parameters:patch_cls – The class to be patched on.