TableSet
The TableSet class collects a set of related tables in a single data structure. The most common way of creating a TableSet is using the Table.group_by() method, which is similar to SQL’s GROUP BY keyword. The resulting tables all have an identical column structure.
TableSet functions as a dictionary. Individual tables in the set can be accessed by using their name as a key. If the table set was created using Table.group_by() then the names of the tables will be the grouping factors found in the original data.
TableSet replicates the majority of the features of Table. When methods such as TableSet.select(), TableSet.where() or TableSet.order_by() are used, the operation is applied to each table in the set and the result is a new TableSet instance made up of entirely new Table instances.
TableSet instances can also contain other TableSets. This means you can chain calls to Table.group_by() and TableSet.group_by() and end up with data grouped across multiple dimensions. Calling TableSet.aggregate() on nested TableSets will then aggregate across all of those dimensions.
agate.TableSet |
A group of named tables with identical column definitions. |
Properties
agate.TableSet.key_name |
Get the name of the key this TableSet is grouped by. |
agate.TableSet.key_type |
Get the DataType this TableSet is grouped by. |
agate.TableSet.column_types |
Get an ordered list of this TableSet’s column types. |
agate.TableSet.column_names |
Get an ordered list of this TableSet’s column names. |
Creating
agate.TableSet.from_csv |
Create a new TableSet from a directory of CSVs. |
agate.TableSet.from_json |
Create a new TableSet from a directory of JSON files or from a single JSON object that pairs each Table key with a list of row objects. |
Saving
agate.TableSet.to_csv |
Write each table in this set to a separate CSV in a given directory. |
agate.TableSet.to_json |
Write TableSet to either a set of JSON files for each table or a single nested JSON file. |
Processing
agate.TableSet.aggregate |
Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table. |
agate.TableSet.having |
Create a new TableSet with only those tables that pass a test. |
agate.TableSet.merge |
Convert this TableSet into a single table. |
Previewing
agate.TableSet.print_structure |
Print the keys and row counts of each table in the tableset. |
Charting
agate.TableSet.bar_chart |
Render a lattice/grid of bar charts using leather.Lattice. |
agate.TableSet.column_chart |
Render a lattice/grid of column charts using leather.Lattice. |
agate.TableSet.line_chart |
Render a lattice/grid of line charts using leather.Lattice. |
agate.TableSet.scatterplot |
Render a lattice/grid of scatterplots using leather.Lattice. |
Table Proxy Methods
agate.TableSet.bins |
Calls Table.bins() on each table in the TableSet. |
agate.TableSet.compute |
Calls Table.compute() on each table in the TableSet. |
agate.TableSet.denormalize |
Calls Table.denormalize() on each table in the TableSet. |
agate.TableSet.distinct |
Calls Table.distinct() on each table in the TableSet. |
agate.TableSet.exclude |
Calls Table.exclude() on each table in the TableSet. |
agate.TableSet.find |
Calls Table.find() on each table in the TableSet. |
agate.TableSet.group_by |
Calls Table.group_by() on each table in the TableSet. |
agate.TableSet.homogenize |
Calls Table.homogenize() on each table in the TableSet. |
agate.TableSet.join |
Calls Table.join() on each table in the TableSet. |
agate.TableSet.limit |
Calls Table.limit() on each table in the TableSet. |
agate.TableSet.normalize |
Calls Table.normalize() on each table in the TableSet. |
agate.TableSet.order_by |
Calls Table.order_by() on each table in the TableSet. |
agate.TableSet.pivot |
Calls Table.pivot() on each table in the TableSet. |
agate.TableSet.select |
Calls Table.select() on each table in the TableSet. |
agate.TableSet.where |
Calls Table.where() on each table in the TableSet. |
Detailed list
-
class
agate.
TableSet
(tables, keys, key_name='group', key_type=None, _is_fork=False) Bases: agate.mapped_sequence.MappedSequence
A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.
TableSet is implemented as a subclass of MappedSequence.
Parameters:
- tables – A sequence of Table instances.
- keys – A sequence of keys corresponding to the tables. These may be any type except int.
- key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.
- key_type – An instance of some subclass of DataType. If not provided, it will default to Text.
- _is_fork – Used internally to skip certain validation steps when data is propagated from an existing tableset.
-
aggregate
(aggregations) Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.
aggregations must be a sequence of tuples, where each has two parts: a new_column_name and an Aggregation instance.
The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.
Parameters: aggregations – A list of tuples in the format (new_column_name, aggregation), where each aggregation is an instance of Aggregation.
Returns: A new Table.
-
bar_chart
(label=0, value=1, path=None, width=None, height=None) Render a lattice/grid of bar charts using leather.Lattice.
Parameters:
- label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.
- value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.
- path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
- width – The width of the output SVG.
- height – The height of the output SVG.
-
bins
(*args, **kwargs) Calls Table.bins() on each table in the TableSet.
-
column_chart
(label=0, value=1, path=None, width=None, height=None) Render a lattice/grid of column charts using leather.Lattice.
Parameters:
- label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.
- value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.
- path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
- width – The width of the output SVG.
- height – The height of the output SVG.
-
column_types
Get an ordered list of this TableSet’s column types.
Returns: A tuple of DataType instances.
-
compute
(*args, **kwargs) Calls Table.compute() on each table in the TableSet.
-
count
(value) → integer. Return the number of occurrences of value.
-
denormalize
(*args, **kwargs) Calls Table.denormalize() on each table in the TableSet.
-
dict
() Retrieve the contents of this sequence as a collections.OrderedDict.
-
distinct
(*args, **kwargs) Calls Table.distinct() on each table in the TableSet.
-
exclude
(*args, **kwargs) Calls Table.exclude() on each table in the TableSet.
-
find
(*args, **kwargs) Calls Table.find() on each table in the TableSet.
-
classmethod
from_csv
(dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs) Create a new TableSet from a directory of CSVs.
See Table.from_csv() for additional details.
Parameters:
- dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.
- column_names – See Table.__init__().
- column_types – See Table.__init__().
- row_names – See Table.__init__().
- header – See Table.from_csv().
-
classmethod
from_json
(path, column_names=None, column_types=None, keys=None, **kwargs) Create a new TableSet from a directory of JSON files or from a single JSON object that pairs each Table key with a list of row objects.
See Table.from_json() for additional details.
Parameters:
- path – Path to a directory containing JSON files, or a file path or file-like object for a nested JSON file.
- keys – A list of keys of the top-level dictionaries for each file. If specified, its length must equal the number of JSON files in path.
- column_types – See Table.__init__().
-
get
(key, default=None) Equivalent to collections.OrderedDict.get().
-
group_by
(*args, **kwargs) Calls Table.group_by() on each table in the TableSet.
-
having
(aggregations, test) Create a new TableSet with only those tables that pass a test.
This works by applying a sequence of Aggregation instances to each table. The resulting dictionary of properties is then passed to the test function.
This method does not modify the underlying tables in any way.
Parameters:
- aggregations – A list of tuples in the format (name, aggregation), where each aggregation is an instance of Aggregation.
- test (function) – A function that takes a dictionary of aggregated properties and returns True if the table should be included in the new TableSet.
Returns: A new TableSet.
-
homogenize
(*args, **kwargs) Calls Table.homogenize() on each table in the TableSet.
-
index
(value[, start[, stop]]) → integer. Return the first index of value. Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
-
items
() Equivalent to collections.OrderedDict.items().
-
join
(*args, **kwargs) Calls Table.join() on each table in the TableSet.
-
key_name
Get the name of the key this TableSet is grouped by. (If created using Table.group_by() then this is the original column name.)
-
key_type
Get the DataType this TableSet is grouped by. (If created using Table.group_by() then this is the original column type.)
-
keys
() Equivalent to collections.OrderedDict.keys().
-
limit
(*args, **kwargs) Calls Table.limit() on each table in the TableSet.
-
line_chart
(x=0, y=1, path=None, width=None, height=None) Render a lattice/grid of line charts using leather.Lattice.
Parameters:
- x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.
- y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.
- path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
- width – The width of the output SVG.
- height – The height of the output SVG.
-
merge
(groups=None, group_name=None, group_type=None) Convert this TableSet into a single table. This is the inverse of Table.group_by().
Any row_names set on the merged tables will be lost in this process.
Parameters:
- groups – A list of grouping factors to add to merged rows in a new column. If specified, it should have exactly one element per Table in the TableSet. If not specified or None, the grouping factor will be the name of the Row’s original Table.
- group_name – This will be the column name of the grouping factors. If None, defaults to the TableSet.key_name.
- group_type – This will be the column type of the grouping factors. If None, defaults to the TableSet.key_type.
Returns: A new Table.
-
normalize
(*args, **kwargs) Calls Table.normalize() on each table in the TableSet.
-
order_by
(*args, **kwargs) Calls Table.order_by() on each table in the TableSet.
-
pivot
(*args, **kwargs) Calls Table.pivot() on each table in the TableSet.
-
print_structure
(max_rows=20, output=sys.stdout) Print the keys and row counts of each table in the tableset.
Parameters:
- max_rows – The maximum number of rows to display before truncating the data. Defaults to 20.
- output – The file-like object to which the structure of the TableSet is written. Defaults to standard output.
Returns: None
-
scatterplot
(x=0, y=1, path=None, width=None, height=None) Render a lattice/grid of scatterplots using leather.Lattice.
Parameters:
- x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.
- y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.
- path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
- width – The width of the output SVG.
- height – The height of the output SVG.
-
select
(*args, **kwargs) Calls Table.select() on each table in the TableSet.
-
to_csv
(dir_path, **kwargs) Write each table in this set to a separate CSV in a given directory.
See Table.to_csv() for additional details.
Parameters: dir_path – Path to the directory to write the CSV files to.
-
to_json
(path, nested=False, indent=None, **kwargs) Write TableSet to either a set of JSON files for each table or a single nested JSON file.
See Table.to_json() for additional details.
Parameters:
- path – Path to the directory to write the JSON file(s) to. If nested is True, this should be a file path or file-like object to write to.
- nested – If True, the output will be a single nested JSON file with each Table’s key paired with a list of row objects. Otherwise, the output will be a set of files for each table. Defaults to False.
- indent – See Table.to_json().
-
values
() Equivalent to collections.OrderedDict.values().
-
where
(*args, **kwargs) Calls Table.where() on each table in the TableSet.