Type inference

class agate.TypeTester(force={}, limit=None, types=None, null_values=('', 'na', 'n/a', 'none', 'null', '.'))

Bases: object

Control how data types are inferred for columns in a given set of data.

This class is used by passing it to the column_types argument of the Table constructor, or to the same argument of any other method that creates a Table.

Type inference can be a slow process. To limit the number of rows of data to be tested, pass the limit argument. Note that this may cause errors if your data contains different types of values after the specified number of rows.

By default, data types will be tested against each column in this order:

  1. Boolean
  2. Number
  3. TimeDelta
  4. Date
  5. DateTime
  6. Text

The type of an individual column may be specified using the force argument. The type order can be changed, or entire types disabled, by using the types argument. Beware that changing the order of the types may cause unexpected behavior.

Parameters:
  • force – A dictionary where each key is a column name and each value is a DataType instance that overrides inference.
  • limit – An optional limit on how many rows to evaluate before selecting the most likely type. Note that applying a limit may cause errors when the data is cast, if the guess proves incorrect for later rows.
  • types – A sequence of possible types to test against. This can be used to specify which data formats you want to test against. For instance, you may want to exclude TimeDelta from testing. It can also be used to pass options such as locale to Number or cast_nulls to Text. Take care when specifying the order of the list, because the types are tested in that order. Text should always be last.
  • null_values – If types is None, a sequence of values which should be cast to None when encountered by the default data types.

run(rows, column_names)

Apply type inference to the provided data and return an array of column types.

Parameters:
  • rows – The data as a sequence of any sequences: tuples, lists, etc.