agate.tableset
This module contains the TableSet class, which abstracts a set of related tables into a single data structure. The most common way of creating a TableSet is the Table.group_by() method, which is similar to SQL’s GROUP BY keyword. The resulting tables all have an identical column structure.
TableSet functions as a dictionary. Individual tables in the set can be accessed by using their name as a key. If the table set was created using Table.group_by(), then the names of the tables will be the group factors found in the original data.
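
The dictionary-like behavior described above can be pictured with a small plain-Python sketch. It does not use agate: the `group_by` helper and the sample rows are hypothetical stand-ins, assuming a table is just a list of row dictionaries.

```python
from collections import OrderedDict


def group_by(rows, key):
    """Split a list of row dicts into an ordered mapping of group value -> rows.

    Hypothetical helper for illustration; this is not agate's implementation.
    """
    groups = OrderedDict()
    for row in rows:
        # The group factor found in the data becomes the name of the table.
        groups.setdefault(row[key], []).append(row)
    return groups


# Invented sample data.
rows = [
    {"color": "red", "value": 1},
    {"color": "blue", "value": 2},
    {"color": "red", "value": 3},
]

by_color = group_by(rows, "color")

# Individual "tables" are looked up by name, as with a dictionary.
red_values = [row["value"] for row in by_color["red"]]
```

Every group keeps the same keys in its rows, which mirrors the identical column structure of the tables in a TableSet.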
TableSet replicates the majority of the features of Table. When methods such as TableSet.select(), TableSet.where() or TableSet.order_by() are used, the operation is applied to each table in the set and the result is a new TableSet instance made up of entirely new Table instances.
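
As a rough illustration of that per-table behavior, the following plain-Python sketch applies a row filter to every group and returns a new mapping built from new lists. The `where` function here is a hypothetical stand-in written for this example, not agate's method, and the data is invented.

```python
from collections import OrderedDict


def where(table_set, test):
    """Apply a row filter to every table; return a new mapping of new lists.

    Hypothetical stand-in for illustration; this is not agate's code.
    """
    return OrderedDict(
        (name, [row for row in rows if test(row)])
        for name, rows in table_set.items()
    )


# Invented sample data: two named groups of rows.
table_set = OrderedDict([
    ("a", [{"value": 1}, {"value": 10}]),
    ("b", [{"value": 20}]),
])

large = where(table_set, lambda row: row["value"] > 5)
```

The original mapping is left untouched, so `table_set["a"]` still holds both rows while `large["a"]` holds only the matching one.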
TableSet instances can also contain other TableSets. This means you can chain calls to Table.group_by() and TableSet.group_by() and end up with data grouped across multiple dimensions. Calling TableSet.aggregate() on nested TableSets then produces results grouped across all of those dimensions.
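
Nested grouping can be sketched in plain Python as a mapping of mappings. Nothing below comes from agate: `group_by`, `group_nested` and the sample rows are hypothetical, and a simple sum stands in for an aggregation.

```python
from collections import OrderedDict


def group_by(rows, key):
    """Group a list of row dicts by the value found under `key`."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


def group_nested(groups, key):
    """Regroup every table in a mapping, producing a mapping of mappings."""
    return OrderedDict(
        (name, group_by(rows, key)) for name, rows in groups.items()
    )


# Invented sample data.
rows = [
    {"color": "red", "year": 2014, "count": 3},
    {"color": "red", "year": 2015, "count": 5},
    {"color": "blue", "year": 2014, "count": 2},
    {"color": "red", "year": 2014, "count": 4},
]

nested = group_nested(group_by(rows, "color"), "year")

# Aggregating across both dimensions yields one total per (color, year) pair.
totals = OrderedDict(
    ((color, year), sum(row["count"] for row in table))
    for color, by_year in nested.items()
    for year, table in by_year.items()
)
```

Each key of `totals` combines one value from every grouping level, which is the sense in which the result is grouped across multiple dimensions.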
- class agate.tableset.TableMethodProxy(tableset, method_name)
  A proxy for TableSet methods that converts them to individual calls on each Table in the set.
- class agate.tableset.TableSet(tables, keys, key_name='group', key_type=None)
  A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.
  TableSet is implemented as a subclass of MappedSequence.
  Parameters:
  - tables – A sequence of Table instances.
  - keys – A sequence of keys corresponding to the tables. These may be any type except int.
  - key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.
  - key_type – An instance of some subclass of DataType. If not provided, it defaults to Text.
- keys()
  Equivalent to collections.OrderedDict.keys().
- key_name
  Get the name of the key this TableSet is grouped by. (If created using Table.group_by(), this is the original column name.)
- key_type
  Get the DataType this TableSet is grouped by. (If created using Table.group_by(), this is the original column type.)
- classmethod from_csv(dir_path, column_info, row_names=None, header=True, **kwargs)
  Create a new TableSet from a directory of CSVs. This method will use csvkit if it is available, otherwise it will use Python’s builtin csv module. kwargs will be passed through to csv.reader().
  If you are using Python 2 and not using csvkit, this method is not unicode-safe.
  Parameters:
  - dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.
  - column_info – A sequence of pairs of column names and types. The latter must be instances of DataType. Alternatively, an instance of TypeTester to infer types.
  - row_names – See Table.__init__().
  - header – If True, the first row of the CSV is assumed to contain headers and will be skipped.
- to_csv(dir_path, **kwargs)
  Write each table in this set to a separate CSV in a given directory. This method will use csvkit if it is available, otherwise it will use Python’s builtin csv module. kwargs will be passed through to csv.writer().
  If you are using Python 2 and not using csvkit, this method is not unicode-safe.
  Parameters:
  - dir_path – Path to the directory to write the CSV files to.
- column_types
  Get an ordered list of this TableSet’s column types.
  Returns: A tuple of Column instances.
- merge()
  Convert this TableSet into a single table. This is the inverse of Table.group_by().
  Any row_names set on the merged tables will be lost in this process.
  Returns: A new Table.
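
The idea of merging as the inverse of grouping can be sketched in plain Python. This is only an illustration under stated assumptions: the `merge` function below is hypothetical, tables are modeled as lists of row dictionaries, and each row is assumed to still carry its grouping column. It says nothing about how agate builds the merged Table.

```python
from collections import OrderedDict


def merge(table_set):
    """Concatenate every table's rows, group by group, into a single list.

    Hypothetical stand-in for illustration; this is not agate's code.
    """
    merged = []
    for rows in table_set.values():
        merged.extend(rows)
    return merged


# Invented sample data, already grouped by the "group" column.
table_set = OrderedDict([
    ("a", [{"group": "a", "value": 1}, {"group": "a", "value": 3}]),
    ("b", [{"group": "b", "value": 2}]),
])

merged = merge(table_set)
```

In this sketch the merged rows come out in group order, which may differ from the order the rows had before they were grouped.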
- aggregate(aggregations=[])
  Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.
  aggregations must be a list of tuples, where each has three parts: a column_name, an Aggregation instance and a new_column_name.
  The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.
  Parameters:
  - aggregations – A list of triples in the format (column_name, aggregation, new_column_name).
  Returns: A new Table.
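
The shape of the aggregations argument can be illustrated with a plain-Python sketch. It is not agate code: the `aggregate` function is hypothetical, and ordinary callables such as `sum` and `max` stand in for Aggregation instances.

```python
from collections import OrderedDict


def aggregate(table_set, aggregations):
    """Build one summary row per group.

    `aggregations` is a list of (column_name, function, new_column_name)
    triples. Hypothetical stand-in for illustration; not agate's code.
    """
    summary = OrderedDict()
    for name, rows in table_set.items():
        summary[name] = OrderedDict(
            (new_column_name, function([row[column_name] for row in rows]))
            for column_name, function, new_column_name in aggregations
        )
    return summary


# Invented sample data.
table_set = OrderedDict([
    ("a", [{"count": 3}, {"count": 5}]),
    ("b", [{"count": 2}]),
])

summary = aggregate(table_set, [
    ("count", sum, "count_sum"),
    ("count", max, "count_max"),
])
```

The group keys end up as the keys of `summary`, loosely mirroring how the documented method uses them as the row_names of the resulting table.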
- count(value) → integer
  Return the number of occurrences of value.
- dict()
  Retrieve the contents of this sequence as a collections.OrderedDict.
- get(key, default=None)
  Equivalent to collections.OrderedDict.get().
- index(value) → integer
  Return the first index of value. Raises ValueError if the value is not present.
- items()
  Equivalent to collections.OrderedDict.items().
- monkeypatch(patch_cls)
  Dynamically add patch_cls as a base class of this class.
  Parameters:
  - patch_cls – The class to be patched on.
- values()
  Equivalent to collections.OrderedDict.values().