journalism 0.4.0 (alpha)

About

journalism is a Python library that takes the horror out of basic data analysis and manipulation. It is an alternative to numpy and pandas that is optimized to make humans faster.

It is inspired by underscore.js and all the other libraries that know how to get the hell out of the way and let us do Journalism.

Why journalism?

Why use journalism?

  • A clean, readable API.
  • Optimized for exploratory use in the shell.
  • A full set of SQL-like operations.
  • Full unicode support.
  • Decimal precision everywhere.
  • Pure Python. It works everywhere.
  • 100% test coverage.
  • Extensive user documentation.
  • Access to the full power of Python in every command.
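As a small illustration of why "Decimal precision everywhere" matters, the sketch below compares binary floats with Python's standard-library Decimal type. It deliberately uses only the decimal module and not journalism's own API, so the variable names are illustrative assumptions rather than anything the library defines.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly,
# so the representation error shows up in the running total.
float_total = sum([0.1, 0.2, 0.3])

# Decimals built from strings keep the exact base-10 values.
decimal_total = sum(
    [Decimal("0.1"), Decimal("0.2"), Decimal("0.3")],
    Decimal("0"),
)

print(float_total == 0.6)                # False
print(decimal_total == Decimal("0.6"))   # True
```

The trade-off is speed: Decimal arithmetic is slower than float arithmetic, which is the usual price for exact base-10 results.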

Authors

The following individuals have contributed code to journalism:

  • Mick O’Brien
  • Christopher Groskopf
  • Jeff Larson
  • Eric Sagara
  • John Heasly

License

The MIT License

Copyright (c) 2014 Christopher Groskopf and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Changelog

0.4.0

  • Upgrade to python-dateutil 2.2. (#134)
  • Wrote introductory tutorial. (#133)
  • Reorganize documentation. (#132)
  • Add John Heasly to AUTHORS.
  • Implement percentile. (#35)
  • no_null_computations now accepts args. (#122)
  • Table.z_scores implemented. (#123)
  • DateTimeColumn implemented. (#23)
  • Column.counts now returns dict instead of Table. (#109)
  • ColumnType.create_column renamed _create_column. (#118)
  • Added Mick O’Brien to AUTHORS. (#121)
  • Pearson correlation implemented. (#103)

0.3.0

  • DateType.date_format implemented. (#112)
  • Create ColumnType classes to simplify data parsing.
  • DateColumn implemented. (#7)
  • Cookbook: Excel pivot tables. (#41)
  • Cookbook: statistics, including outlier detection. (#82)
  • Cookbook: emulating Underscore’s any and all. (#107)
  • Parameter documentation for method parameters. (#108)
  • Table.rank now accepts a column name or key function.
  • Optionally use cdecimal for improved performance. (#106)
  • Smart naming of aggregate columns.
  • Duplicate column names are now an error. (#92)
  • BooleanColumn implemented. (#6)
  • TextColumn.max_length implemented. (#95)
  • Table.find implemented. (#14)
  • Better error handling in Table.__init__. (#38)
  • Collapse IntColumn and FloatColumn into NumberColumn. (#64)
  • Table.mad_outliers implemented. (#93)
  • Column.mad implemented. (#93)
  • Table.stdev_outliers implemented. (#86)
  • Table.group_by implemented. (#3)
  • Cookbook: emulating R. (#81)
  • Table.left_outer_join now accepts column names or key functions. (#80)
  • Table.inner_join now accepts column names or key functions. (#80)
  • Table.distinct now accepts a column name or key function. (#80)
  • Table.order_by now accepts a column name or key function. (#80)
  • Table.rank implemented. (#15)
  • Reached 100% test coverage. (#76)
  • Tests for Column._cast methods. (#20)
  • Table.distinct implemented. (#83)
  • Use assertSequenceEqual in tests. (#84)
  • Docs: features section. (#87)
  • Cookbook: emulating SQL. (#79)
  • Table.left_outer_join implemented. (#11)
  • Table.inner_join implemented. (#11)

0.2.0

  • Python 3.2, 3.3 and 3.4 support. (#52)
  • Documented supported platforms.
  • Cookbook: csvkit. (#36)
  • Cookbook: glob syntax. (#28)
  • Cookbook: filter to values in range. (#30)
  • RowDoesNotExistError implemented. (#70)
  • ColumnDoesNotExistError implemented. (#71)
  • Cookbook: percent change. (#67)
  • Cookbook: sampling. (#59)
  • Cookbook: random sort order. (#68)
  • Eliminate Table.get_data.
  • Use tuples everywhere. (#66)
  • Fixes for Python 2.6 compatibility. (#53)
  • Cookbook: multi-column sorting. (#13)
  • Cookbook: simple sorting.
  • Destructive Table ops now deepcopy row data. (#63)
  • Non-destructive Table ops now share row data. (#63)
  • Table.sort_by now accepts a function. (#65)
  • Cookbook: pygal.
  • Cookbook: Matplotlib.
  • Cookbook: VLOOKUP. (#40)
  • Cookbook: Excel formulas. (#44)
  • Cookbook: Rounding to two decimal places. (#49)
  • Better repr for Column and Row. (#56)
  • Cookbook: Filter by regex. (#27)
  • Cookbook: Underscore filter & reject. (#57)
  • Table.limit implemented. (#58)
  • Cookbook: writing a CSV. (#51)
  • Kill Table.filter and Table.reject. (#55)
  • Column.map removed. (#43)
  • Column instance & data caching implemented. (#42)
  • Table.select implemented. (#32)
  • Eliminate repeated column index lookups. (#25)
  • Precise DecimalColumn tests.
  • Use Decimal type everywhere internally.
  • FloatColumn converted to DecimalColumn. (#17)
  • Added Eric Sagara to AUTHORS. (#48)
  • NumberColumn.variance implemented. (#1)
  • Cookbook: loading a CSV. (#37)
  • Table.percent_change implemented. (#16)
  • Table.compute implemented. (#31)
  • Table.filter and Table.reject now take funcs. (#24)
  • Column.count implemented. (#12)
  • Column.counts implemented. (#8)
  • Column.all implemented. (#5)
  • Column.any implemented. (#4)
  • Added Jeff Larson to AUTHORS. (#18)
  • NumberColumn.mode implemented. (#18)

0.1.0

  • Initial prototype.
