Transform

Pivot by a single column

The Table.pivot() method is a general process for grouping data by row and, optionally, by column, and then calculating some aggregation for each group. Consider the following table:

name	race	gender	age
Joe	white	female	20
Jane	asian	male	20
Jill	black	female	20
Jim	latino	male	25
Julia	black	female	25
Joan	asian	female	25

In the simplest case, this table can be pivoted to count the number of occurrences of each value in a column:

transformed = table.pivot('race')

Result:

race	pivot
white	1
asian	2
black	2
latino	1
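To make the grouping concrete, here is a minimal plain-Python sketch of the same count. It is not how agate implements Table.pivot(); the `rows` list and the `race_counts` name are illustrative assumptions, with the data copied from the table above.

```python
from collections import Counter

# Rows copied from the example table: (name, race, gender, age).
rows = [
    ("Joe", "white", "female", 20),
    ("Jane", "asian", "male", 20),
    ("Jill", "black", "female", 20),
    ("Jim", "latino", "male", 25),
    ("Julia", "black", "female", 25),
    ("Joan", "asian", "female", 25),
]

# Group by the race column and count the rows in each group.
# Counter keeps keys in first-seen order, like the result table above.
race_counts = Counter(race for _name, race, _gender, _age in rows)

for race, count in race_counts.items():
    print(race, count, sep="\t")
```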

Pivot by multiple columns

You can pivot by multiple columns, either as additional row groups or as intersecting columns. For additional row groups, pass a list of column names. Given the table in the previous example:

transformed = table.pivot(['race', 'gender'])

Result:

race	gender	pivot
white	female	1
asian	male	1
black	female	2
latino	male	1
asian	female	1

For the column version, pass the second column as a separate argument:

transformed = table.pivot('race', 'gender')

Result:

race	male	female
white	0	1
asian	1	1
black	0	2
latino	1	0
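The difference between the two forms can be sketched in plain Python. This is an illustrative equivalent under stated assumptions, not agate's implementation: `pair_counts` and `crosstab` are hypothetical names, and the column order in `genders` is chosen by hand to match the table above.

```python
from collections import Counter

# Rows copied from the example table: (name, race, gender, age).
rows = [
    ("Joe", "white", "female", 20),
    ("Jane", "asian", "male", 20),
    ("Jill", "black", "female", 20),
    ("Jim", "latino", "male", 25),
    ("Julia", "black", "female", 25),
    ("Joan", "asian", "female", 25),
]

# Row-group form: one output row per (race, gender) pair that occurs.
pair_counts = Counter((race, gender) for _name, race, gender, _age in rows)

# Column form: one output row per race, one column per gender.
# Pairs that never occur are filled with 0 (Counter returns 0 for missing keys).
races = list(dict.fromkeys(race for _name, race, _gender, _age in rows))
genders = ["male", "female"]  # assumed column order, matching the table above
crosstab = {race: [pair_counts[(race, g)] for g in genders] for race in races}

print("race", *genders, sep="\t")
for race, counts in crosstab.items():
    print(race, *counts, sep="\t")
```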

Pivot to sum

The default pivot aggregation is Count, but you can also supply other aggregations. For example, to sum the ages in each group:

transformed = table.pivot('race', 'gender', aggregation=agate.Sum('age'))

Result:

race	male	female
white	0	20
asian	20	25
black	0	45
latino	25	0
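As a rough illustration of swapping the aggregation, the following plain-Python sketch sums ages per (race, gender) group instead of counting rows. It is an assumption-laden equivalent, not agate's code: `age_sums` and `sum_table` are hypothetical names, and empty cells are filled with 0 only to match the table above.

```python
from collections import defaultdict

# Rows copied from the example table: (name, race, gender, age).
rows = [
    ("Joe", "white", "female", 20),
    ("Jane", "asian", "male", 20),
    ("Jill", "black", "female", 20),
    ("Jim", "latino", "male", 25),
    ("Julia", "black", "female", 25),
    ("Joan", "asian", "female", 25),
]

# Accumulate the sum of ages for each (race, gender) group.
age_sums = defaultdict(int)
for _name, race, gender, age in rows:
    age_sums[(race, gender)] += age

# Lay the sums out with one row per race and one column per gender.
races = list(dict.fromkeys(race for _name, race, _gender, _age in rows))
genders = ["male", "female"]  # assumed column order, matching the table above
sum_table = {race: [age_sums.get((race, g), 0) for g in genders] for race in races}

print("race", *genders, sep="\t")
for race, sums in sum_table.items():
    print(race, *sums, sep="\t")
```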

Pivot to percent of total

Pivot allows you to apply a Computation to each row of aggregated results prior to returning the table. Use the stringified name of the aggregation as the column argument to your computation:

transformed = table.pivot('race', 'gender', aggregation=agate.Sum('age'), computation=agate.Percent('sum'))

Result:

race	male	female
white	0	14.8
asian	14.8	18.5
black	0	33.3
latino	18.5	0

Note: actual computed percentages will be much more precise.

When constructing these cases, it helps to think of all the cells in the pivot table as a single sequence: each percentage is that cell's share of the total across every cell, not of its own row or column.
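The "single sequence" idea can be checked with a short plain-Python sketch: sum the ages per group, then divide each group's sum by the grand total over all groups. This is an illustration under stated assumptions rather than agate's implementation; `age_sums`, `total`, and `percents` are hypothetical names.

```python
from collections import defaultdict

# Rows copied from the example table: (name, race, gender, age).
rows = [
    ("Joe", "white", "female", 20),
    ("Jane", "asian", "male", 20),
    ("Jill", "black", "female", 20),
    ("Jim", "latino", "male", 25),
    ("Julia", "black", "female", 25),
    ("Joan", "asian", "female", 25),
]

# Step 1: aggregate, here the sum of ages for each (race, gender) group.
age_sums = defaultdict(int)
for _name, race, gender, age in rows:
    age_sums[(race, gender)] += age

# Step 2: treat every aggregated cell as one sequence and take each
# cell's share of the grand total.
total = sum(age_sums.values())
percents = {group: 100 * value / total for group, value in age_sums.items()}

for (race, gender), percent in percents.items():
    print(race, gender, round(percent, 1), sep="\t")
```

Because every cell is divided by the same total, the percentages across the whole table add up to 100.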

Denormalize key/value columns into separate columns

It’s common for very large datasets to be distributed in a “normalized” format, such as:

name	property	value
Jane	gender	female
Jane	race	black
Jane	age	24
…	…	…

The Table.denormalize() method can be used to transform the table so that each unique property has its own column.

transformed = table.denormalize('name', 'property', 'value')

Result:

name	gender	race	age
Jane	female	black	24
Jack	male	white	35
Joe	male	black	28
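For intuition, the reshaping can be sketched in plain Python. This is a hypothetical equivalent, not agate's implementation. Jane's rows come from the normalized table above; Jack's rows are an assumption inferred from the result table, since the source elides them. Values are kept as strings for simplicity.

```python
# Normalized input: (name, property, value).
normalized = [
    ("Jane", "gender", "female"),
    ("Jane", "race", "black"),
    ("Jane", "age", "24"),
    # Assumed rows for Jack, inferred from the result table above.
    ("Jack", "gender", "male"),
    ("Jack", "race", "white"),
    ("Jack", "age", "35"),
]

wide = {}        # name -> {property: value}
properties = []  # property names in first-seen order; these become columns

for name, prop, value in normalized:
    wide.setdefault(name, {})[prop] = value
    if prop not in properties:
        properties.append(prop)

# One output row per name; a property missing for a name becomes None.
denormalized = [
    (name, *(fields.get(prop) for prop in properties))
    for name, fields in wide.items()
]

print("name", *properties, sep="\t")
for row in denormalized:
    print(*row, sep="\t")
```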

Normalize separate columns into key/value columns

Sometimes you have a dataset where each property has its own column, but your analysis would be easier if all properties were stored together. Consider this table:

name	gender	race	age
Jane	female	black	24
Jack	male	white	35
Joe	male	black	28

The Table.normalize() method can be used to transform the table so that all the properties and their values share two columns.

transformed = table.normalize('name', ['gender', 'race', 'age'])

Result:

name	property	value
Jane	gender	female
Jane	race	black
Jane	age	24
…	…	…
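The inverse reshaping can be sketched the same way. The following plain-Python version is an illustration under stated assumptions, not agate's implementation: `key`, `properties`, and `normalized` are hypothetical names, and the rows are copied from the wide table above.

```python
# Wide input, copied from the table above.
column_names = ("name", "gender", "race", "age")
rows = [
    ("Jane", "female", "black", 24),
    ("Jack", "male", "white", 35),
    ("Joe", "male", "black", 28),
]

key = "name"                          # column that identifies each row
properties = ["gender", "race", "age"]  # columns to fold into key/value pairs

key_index = column_names.index(key)

# Emit one (key, property, value) row for each property of each input row.
normalized = [
    (row[key_index], prop, row[column_names.index(prop)])
    for row in rows
    for prop in properties
]

print("name", "property", "value", sep="\t")
for row in normalized:
    print(*row, sep="\t")
```

Each input row expands into one output row per listed property, so three rows with three properties yield nine rows.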