This module contains the TableSet class, which abstracts a set of related tables into a single data structure. The most common way of creating a TableSet is the Table.group_by() method, which is similar to SQL’s GROUP BY keyword. The resulting tables all have an identical column structure.

TableSet functions as a dictionary. Individual tables in the set can be accessed by using their name as a key. If the table set was created using Table.group_by() then the names of the tables will be the group factors found in the original data.
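The dictionary-like behavior can be pictured with a small pure-Python model. This is only a sketch of the idea, not agate's implementation: the `group_rows` helper and the sample rows are hypothetical, and plain lists of tuples stand in for Table instances.

```python
from collections import OrderedDict


def group_rows(rows, key_index):
    """Split rows into an ordered mapping keyed by the value at key_index.

    Hypothetical stand-in for Table.group_by(); each value is a plain
    list of rows rather than a real Table.
    """
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return groups


# Made-up sample data: (name, amount).
rows = [
    ("a", 2),
    ("b", 3),
    ("a", 4),
]

groups = group_rows(rows, 0)

# Group names come from the values found in the grouped column,
# in order of first appearance.
print(list(groups.keys()))

# Each group is looked up by name, like a dictionary entry.
print(groups["a"])
```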

TableSet replicates the majority of the features of Table. When methods such as TableSet.where() or TableSet.order_by() are used, the operation is applied to each table in the set, and the result is a new TableSet instance made up of entirely new Table instances.

TableSet instances can also contain other TableSets. This means you can chain calls to Table.group_by() and TableSet.group_by() and end up with data grouped across multiple dimensions. Calling TableSet.aggregate() on nested TableSets then aggregates across all of those dimensions.
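Grouping across two dimensions can be modeled with nested ordered dictionaries. The sketch below is an illustration under stated assumptions, not agate code: `group_rows` is a hypothetical helper, the state/year/amount rows are invented, and a plain sum stands in for an aggregation.

```python
from collections import OrderedDict


def group_rows(rows, key_index):
    """Hypothetical helper: group rows by the value at key_index."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return groups


# Made-up sample data: (state, year, amount).
rows = [
    ("NY", 2014, 10),
    ("NY", 2015, 20),
    ("CA", 2014, 5),
    ("NY", 2014, 1),
]

# First grouping: by state. Second grouping: each state's rows by year.
nested = OrderedDict(
    (state, group_rows(state_rows, 1))
    for state, state_rows in group_rows(rows, 0).items()
)

# Aggregating the nested groups yields one output row per (state, year) pair.
totals = [
    (state, year, sum(row[2] for row in year_rows))
    for state, by_year in nested.items()
    for year, year_rows in by_year.items()
]
print(totals)
```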

class agate.tableset.TableMethodProxy(tableset, method_name)

A proxy for TableSet methods that converts them to individual calls on each Table in the set.
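The proxy pattern described here can be sketched in a few lines. The `MethodProxy` and `TinyTable` classes below are hypothetical stand-ins written for illustration; they do not reproduce the internals of TableMethodProxy or Table.

```python
from collections import OrderedDict


class MethodProxy:
    """Call one named method on every value in a mapping (hypothetical model)."""

    def __init__(self, tables, method_name):
        self.tables = tables
        self.method_name = method_name

    def __call__(self, *args, **kwargs):
        # Forward the same arguments to each table and collect the
        # results under the original names, preserving order.
        results = OrderedDict()
        for name, table in self.tables.items():
            results[name] = getattr(table, self.method_name)(*args, **kwargs)
        return results


class TinyTable:
    """Minimal stand-in for a table: an immutable sequence of numbers."""

    def __init__(self, values):
        self.values = tuple(values)

    def where(self, test):
        # Return a new TinyTable rather than modifying this one.
        return TinyTable(v for v in self.values if test(v))


tables = OrderedDict([("a", TinyTable([1, 5])), ("b", TinyTable([7, 2]))])

# One call on the proxy becomes one call per table in the set.
filtered = MethodProxy(tables, "where")(lambda v: v > 3)
print({name: table.values for name, table in filtered.items()})
```

Returning a fresh mapping of fresh objects mirrors the documented behavior that operations produce new instances instead of changing the originals.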

class agate.tableset.TableSet(tables, keys, key_name='group', key_type=None)

A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.

TableSet is implemented as a subclass of MappedSequence.

  • tables – A sequence of Table instances.
  • keys – A sequence of keys corresponding to the tables. These may be any type except int.
  • key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.
  • key_type – An instance of some subclass of DataType. If not provided, it defaults to Text.

keys()

Equivalent to collections.OrderedDict.keys().


Get the name of the key this TableSet is grouped by. (If created using Table.group_by() then this is the original column name.)


Get the DataType this TableSet is grouped by. (If created using Table.group_by() then this is the original column type.)

classmethod from_csv(dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs)

Create a new TableSet from a directory of CSVs.

See Table.from_csv() for additional details.

  • dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.
  • column_names – See Table.__init__().
  • column_types – See Table.__init__().
  • row_names – See Table.__init__().
  • header – See Table.from_csv().

to_csv(dir_path, **kwargs)

Write each table in this set to a separate CSV in a given directory.

See Table.to_csv() for additional details.

Parameters: dir_path – Path to the directory to write the CSV files to.
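
The one-file-per-table layout can be illustrated with the standard library alone. This is a sketch under assumptions, not agate's code: `write_groups` and `read_groups` are hypothetical helpers, and naming each file after its group key (`<name>.csv`) is an assumption made for the example.

```python
import csv
import os
import tempfile
from collections import OrderedDict


def write_groups(groups, dir_path, header):
    """Write each group to <dir_path>/<name>.csv (naming is an assumption)."""
    os.makedirs(dir_path, exist_ok=True)
    for name, rows in groups.items():
        path = os.path.join(dir_path, "%s.csv" % name)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)


def read_groups(dir_path):
    """Load every CSV in dir_path, keyed by file name without extension."""
    groups = OrderedDict()
    for file_name in sorted(os.listdir(dir_path)):
        name, ext = os.path.splitext(file_name)
        if ext != ".csv":
            continue
        with open(os.path.join(dir_path, file_name), newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            groups[name] = [tuple(row) for row in reader]
    return groups


# Made-up sample groups; values are strings because CSV stores text.
groups = OrderedDict([
    ("a", [("a", "2"), ("a", "4")]),
    ("b", [("b", "3")]),
])

with tempfile.TemporaryDirectory() as dir_path:
    write_groups(groups, dir_path, ["name", "amount"])
    written = sorted(os.listdir(dir_path))
    loaded = read_groups(dir_path)

print(written)
print(loaded == groups)
```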

to_json(dir_path, **kwargs)

Write each table in this set to a separate JSON file in a given directory.

See Table.to_json() for additional details.

Parameters: dir_path – Path to the directory to write the JSON files to.

Get an ordered list of this TableSet’s column types.

Returns: A tuple of DataType instances.

Get an ordered list of this TableSet’s column names.

Returns: A tuple of strings.

Convert this TableSet into a single table. This is the inverse of Table.group_by().

Any row_names set on the merged tables will be lost in this process.

Returns: A new Table.
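
Conceptually, merging flattens the groups back into one sequence of rows. The sketch below models only that flattening with plain Python data. The groups are hypothetical, and details of the real method, such as whether a key column is added to the merged table, are not modeled.

```python
from collections import OrderedDict

# Hypothetical groups: group name -> list of (name, amount) rows.
groups = OrderedDict([
    ("a", [("a", 2), ("a", 4)]),
    ("b", [("b", 3)]),
])

# Concatenate the groups in key order into a single list of rows.
merged = [row for rows in groups.values() for row in rows]
print(merged)
```

Rows come back in group order, which may differ from their order in the data before it was grouped.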

aggregate(aggregations)

Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.

aggregations must be a sequence of tuples, where each has two parts: a new_column_name and an Aggregation instance.

The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.

Parameters: aggregations – A list of tuples in the format (new_column_name, aggregation).
Returns: A new Table.
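
The shape of the (new_column_name, aggregation) tuples and of the resulting table can be sketched in plain Python. In this illustration ordinary callables stand in for Aggregation instances, and the `aggregate` function, the sample groups, and the list-based result are all hypothetical.

```python
from collections import OrderedDict

# Hypothetical groups: group name -> list of (name, amount) rows.
groups = OrderedDict([
    ("a", [("a", 2), ("a", 4)]),
    ("b", [("b", 3)]),
])

# Each entry pairs a new column name with a callable applied to one group.
aggregations = [
    ("count", len),
    ("total", lambda rows: sum(row[1] for row in rows)),
]


def aggregate(groups, aggregations, key_name="group"):
    """Return (header, body) with one output row per group."""
    # The key name becomes the header of the first column.
    header = [key_name] + [name for name, _ in aggregations]
    body = [
        [key] + [func(rows) for _, func in aggregations]
        for key, rows in groups.items()
    ]
    return header, body


header, body = aggregate(groups, aggregations)
print(header)
print(body)
```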

count(value) → integer -- return number of occurrences of value

Retrieve the contents of this column as a collections.OrderedDict.

get(key, default=None)

Equivalent to collections.OrderedDict.get().

index(value) → integer -- return first index of value.

Raises ValueError if the value is not present.


items()

Equivalent to collections.OrderedDict.items().


Dynamically add patch_cls as a base class of this class.

Parameters: patch_cls – The class to be patched on.

values()

Equivalent to collections.OrderedDict.values().