This module contains the TableSet class, which abstracts a set of related tables into a single data structure. The most common way of creating a TableSet is using the Table.group_by() method, which is similar to SQL’s GROUP BY keyword. The resulting tables all have an identical column structure.

TableSet functions as a dictionary. Individual tables in the set can be accessed by using their name as a key. If the table set was created using Table.group_by() then the names of the tables will be the group factors found in the original data.
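
As a rough illustration of the dictionary-like behaviour described above, the following plain-Python sketch models grouping rows by a column and then looking up one group by name. It uses only the standard library. The group_rows helper and the sample data are hypothetical and are not part of agate.

```python
from collections import OrderedDict

def group_rows(rows, key):
    """Group a list of row dicts by the value found under `key`."""
    groups = OrderedDict()
    for row in rows:
        # The group factor found in the data becomes the group's name.
        groups.setdefault(row[key], []).append(row)
    return groups

rows = [
    {'state': 'IL', 'city': 'Chicago'},
    {'state': 'TX', 'city': 'Austin'},
    {'state': 'IL', 'city': 'Peoria'},
]

by_state = group_rows(rows, 'state')
illinois = by_state['IL']  # access one group by name, as with a dict
```

Groups appear in the order their factors are first seen, and every group keeps the same columns as the original rows.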

TableSet replicates the majority of the features of Table. When methods such as TableSet.where() or TableSet.order_by() are used, the operation is applied to each table in the set and the result is a new TableSet instance made up of entirely new Table instances.

TableSet instances can also contain other TableSets. This means you can chain calls to Table.group_by() and TableSet.group_by() and end up with data grouped across multiple dimensions. TableSet.aggregate() on nested TableSets will then group across multiple dimensions.
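
To make the idea of grouping across multiple dimensions concrete, here is a small standard-library sketch that nests one grouping inside another and then aggregates over both levels. It is a conceptual model only: group_rows, the column names, and the data are hypothetical, and this is not how agate implements nesting.

```python
from collections import OrderedDict

def group_rows(rows, key):
    """Group a list of row dicts by the value found under `key`."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups

rows = [
    {'state': 'IL', 'year': 2014, 'cases': 3},
    {'state': 'IL', 'year': 2015, 'cases': 5},
    {'state': 'TX', 'year': 2014, 'cases': 2},
    {'state': 'IL', 'year': 2014, 'cases': 4},
]

# First dimension: state. Second dimension: year within each state.
nested = OrderedDict(
    (state, group_rows(state_rows, 'year'))
    for state, state_rows in group_rows(rows, 'state').items()
)

# Aggregating the nested groups yields one result per (state, year) pair.
totals = OrderedDict(
    ((state, year), sum(r['cases'] for r in year_rows))
    for state, by_year in nested.items()
    for year, year_rows in by_year.items()
)
```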

class agate.tableset.TableMethodProxy(tableset, method_name)

A proxy for TableSet methods that converts them to individual calls on each Table in the set.
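
The proxy idea can be sketched in a few lines of plain Python. The MiniTable and MethodProxy classes below are hypothetical stand-ins written for this illustration; they only show the general pattern of forwarding one method call to every table in a set and collecting the results, and they make no claim about agate's actual implementation.

```python
from collections import OrderedDict

class MiniTable:
    """Hypothetical stand-in for a table: a list of row dicts."""
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, test):
        # Return a new table rather than modifying this one.
        return MiniTable(r for r in self.rows if test(r))

class MethodProxy:
    """Forward one named method call to every table in a set."""
    def __init__(self, tables, method_name):
        self.tables = tables
        self.method_name = method_name

    def __call__(self, *args, **kwargs):
        # Call the named method on each table and keep the original keys.
        return OrderedDict(
            (name, getattr(table, self.method_name)(*args, **kwargs))
            for name, table in self.tables.items()
        )

tables = OrderedDict([
    ('IL', MiniTable([{'pop': 10}, {'pop': 50}])),
    ('TX', MiniTable([{'pop': 70}])),
])

filtered = MethodProxy(tables, 'where')(lambda r: r['pop'] > 20)
```

The result has the same keys as the input, each paired with a new table, and the original tables are left untouched.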

class agate.tableset.TableSet(tables, keys, key_name='group', key_type=None)

A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.

TableSet is implemented as a subclass of MappedSequence.

  • tables – A sequence of Table instances.
  • keys – A sequence of keys corresponding to the tables. These may be any type except int.
  • key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.
  • key_type – An instance of some subclass of DataType. If not provided, it will default to Text.

Equivalent to collections.OrderedDict.keys().


Get the name of the key this TableSet is grouped by. (If created using Table.group_by() then this is the original column name.)


Get the DataType this TableSet is grouped by. (If created using Table.group_by() then this is the original column type.)

classmethod from_csv(dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs)

Create a new TableSet from a directory of CSVs.

See Table.from_csv() for additional details.

  • dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.
  • column_names – See Table.__init__().
  • column_types – See Table.__init__().
  • row_names – See Table.__init__().
  • header – See Table.from_csv().

to_csv(dir_path, **kwargs)

Write each table in this set to a separate CSV in a given directory.

See Table.to_csv() for additional details.

Parameters: dir_path – Path to the directory to write the CSV files to.
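
The directory-of-CSVs layout can be modelled with the standard csv module. This sketch is not agate code: write_csv_dir and read_csv_dir are hypothetical helpers, and the file naming scheme (one file per table, named after its key) is an assumption made for the illustration.

```python
import csv
import os
import tempfile
from collections import OrderedDict

def write_csv_dir(tables, dir_path):
    """Write each (name, rows) pair to its own CSV file in dir_path."""
    os.makedirs(dir_path, exist_ok=True)
    for name, rows in tables.items():
        # Assumed naming scheme: one file per table, named after its key.
        path = os.path.join(dir_path, '%s.csv' % name)
        with open(path, 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

def read_csv_dir(dir_path):
    """Load every CSV file in dir_path, keyed by file name without extension."""
    tables = OrderedDict()
    for filename in sorted(os.listdir(dir_path)):
        if filename.endswith('.csv'):
            with open(os.path.join(dir_path, filename), newline='') as f:
                # The csv module returns every value as a string.
                tables[filename[:-4]] = list(csv.DictReader(f))
    return tables

tables = OrderedDict([
    ('IL', [{'city': 'Chicago', 'pop': '10'}]),
    ('TX', [{'city': 'Austin', 'pop': '70'}]),
])

with tempfile.TemporaryDirectory() as dir_path:
    write_csv_dir(tables, dir_path)
    files = sorted(os.listdir(dir_path))
    round_trip = read_csv_dir(dir_path)
```

Unlike this sketch, which leaves all values as strings, the column_types argument described above controls the types of loaded columns.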

classmethod from_json(path, column_names=None, column_types=None, keys=None, **kwargs)

Create a new TableSet from a directory of JSON files or from a single JSON object containing one key/value pair (a Table key and a list of row objects) for each Table.

See Table.from_json() for additional details.

  • path – Path to a directory containing JSON files, or a file path or file-like object for a nested JSON file.
  • keys – A list of keys of the top-level dictionaries for each file. If specified, length must be equal to number of JSON files in path.
  • column_types – See Table.__init__().

to_json(path, nested=False, indent=None, **kwargs)

Write this TableSet either as one JSON file per table or as a single nested JSON file.

See Table.to_json() for additional details.

  • path – Path to the directory to write the JSON file(s) to. If nested is True, this should be a file path or file-like object to write to.
  • nested – If True, the output will be a single nested JSON file with each Table’s key paired with a list of row objects. Otherwise, the output will be one file per table. Defaults to False.
  • indent – See Table.to_json().
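
The nested layout described above, a single JSON object pairing each table's key with a list of row objects, can be illustrated with the standard json module. This is a sketch of the data shape only; the sample data is hypothetical and no agate functions are called.

```python
import io
import json
from collections import OrderedDict

tables = OrderedDict([
    ('IL', [{'city': 'Chicago', 'pop': 10}]),
    ('TX', [{'city': 'Austin', 'pop': 70}]),
])

# Nested layout: one JSON object mapping each table's key to a list of
# row objects.
buffer = io.StringIO()  # stands in for any text-mode file-like object
json.dump(tables, buffer, indent=4)

# Reading the object back preserves the order of the top-level keys.
buffer.seek(0)
loaded = json.load(buffer, object_pairs_hook=OrderedDict)
```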

Get an ordered list of this TableSet’s column types.

Returns: A tuple of DataType instances.

Get an ordered list of this TableSet’s column names.

Returns: A tuple of strings.

merge(groups=None, group_name=None, group_type=None)

Convert this TableSet into a single table. This is the inverse of Table.group_by().

Any row_names set on the merged tables will be lost in this process.

  • groups – A list of grouping factors to add to merged rows in a new column. If specified, it should have exactly one element per Table in the TableSet. If not specified or None, the grouping factor will be the name of the Row’s original Table.
  • group_name – This will be the column name of the grouping factors. If None, defaults to the TableSet.key_name.
  • group_type – This will be the column type of the grouping factors. If None, defaults to the TableSet.key_type.

Returns: A new Table.
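
The effect of merging can be sketched with plain dictionaries. The merge_groups helper below is hypothetical and only models the behaviour described for the groups and group_name parameters; placing the grouping column first is an assumption of this sketch.

```python
from collections import OrderedDict

def merge_groups(tables, groups=None, group_name='group'):
    """Flatten named groups of row dicts into one list of rows."""
    if groups is None:
        # Default: label each row with the name of its original group.
        groups = list(tables.keys())
    elif len(groups) != len(tables):
        raise ValueError('groups must have exactly one element per table')

    merged = []
    for factor, rows in zip(groups, tables.values()):
        for row in rows:
            # Assumption: the grouping column is placed first.
            new_row = OrderedDict([(group_name, factor)])
            new_row.update(row)
            merged.append(new_row)
    return merged

tables = OrderedDict([
    ('IL', [{'city': 'Chicago'}, {'city': 'Peoria'}]),
    ('TX', [{'city': 'Austin'}]),
])

merged = merge_groups(tables, group_name='state')
relabelled = merge_groups(tables, groups=['Illinois', 'Texas'])
```

Grouping the merged rows by the new column would recover the original groups, which is the sense in which merging inverts grouping.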


Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.

aggregations must be a sequence of tuples, where each has two parts: a new_column_name and an Aggregation instance.

The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.

Parameters: aggregations – A list of tuples in the format (new_column_name, aggregation).
Returns: A new Table.
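
The shape of the aggregations argument and of the result can be modelled as follows. This is a standard-library sketch rather than agate code: the aggregate function here is hypothetical, and plain callables stand in for Aggregation instances.

```python
from collections import OrderedDict

def aggregate(tables, aggregations):
    """Reduce each group to one summary row.

    `aggregations` is a sequence of (new_column_name, function) tuples,
    where each function maps a list of row dicts to a single value.
    """
    result = OrderedDict()
    for name, rows in tables.items():
        # One output row per group, keyed by the group's name.
        result[name] = OrderedDict(
            (new_column_name, func(rows))
            for new_column_name, func in aggregations
        )
    return result

tables = OrderedDict([
    ('IL', [{'pop': 10}, {'pop': 50}]),
    ('TX', [{'pop': 70}]),
])

summary = aggregate(tables, [
    ('count', len),
    ('total_pop', lambda rows: sum(r['pop'] for r in rows)),
])
```

Each group contributes exactly one row to the result, with one column per aggregation.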

count(value) → integer -- return number of occurrences of value

Retrieve the contents of this sequence as a collections.OrderedDict.

get(key, default=None)

Equivalent to collections.OrderedDict.get().

index(value) → integer -- return first index of value.

Raises ValueError if the value is not present.


Equivalent to collections.OrderedDict.items().


Dynamically add patch_cls as a base class of this class.

Parameters: patch_cls – The class to be patched on.

print_structure(max_rows=20, output=sys.stdout)

Print the keys and row counts of each table in the tableset.

  • max_rows – The maximum number of rows to display before truncating the data. Defaults to 20.
  • output – The output used to print the structure of the Table.



Equivalent to collections.OrderedDict.values().