What Pandas 1.0 brought us

On January 9, 2020, the first release candidate of Pandas 1.0 (1.0.0rc0) was published. The previous version of the library was 0.25.

The first major release contains many great new features, including improved automatic dataframe summarization, more output formats, new data types, and even a new documentation site.

The full list of changes can be viewed here; in this article we will limit ourselves to a short, less technical overview of the most important ones.

You can install the library as usual with pip, but since Pandas 1.0 is still a release candidate at the time of writing, you need to specify the version explicitly:

pip install --upgrade pandas==1.0.0rc0

Be careful: since this is a major release, the update may break existing code!

By the way, support for Python 2 has been dropped completely as of this version (which may be a good reason to upgrade; translator's note). Pandas 1.0 requires Python 3.6 or later, so if you are not sure, check which version you have installed:

$ pip --version
pip 19.3.1 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)

$ python --version
Python 3.7.5

The easiest way to check the Pandas version is this:

>>> import pandas as pd
>>> pd.__version__
'1.0.0rc0'
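
If a script depends on features that only exist from 1.0 onward, the same attribute can be checked in code. This is a minimal sketch: the helper name pandas_at_least is hypothetical, and the parsing assumes the usual major.minor.patch version layout.

```python
import pandas as pd


def pandas_at_least(major, minor=0):
    """Return True if the installed pandas is at least major.minor."""
    # "1.0.0rc0" splits into ["1", "0", "0rc0"]; only the first two parts
    # are compared, so release-candidate suffixes do not get in the way.
    parts = pd.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


if pandas_at_least(1, 0):
    print("pandas 1.0 features are available")
else:
    print("pandas is older than 1.0:", pd.__version__)
```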

Improved auto-summarization with DataFrame.info

My favorite innovation is the updated DataFrame.info method. Its output has become much more readable, which makes exploring data even easier:

>>> df = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': ["goodbye", "cruel", "world"],
...     'C': [False, True, False]
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      object
 2   C       3 non-null      bool
dtypes: bool(1), int64(1), object(1)
memory usage: 179.0+ bytes
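
The report does not have to go to the console. As a small sketch, the same summary can be captured in a text buffer through the buf parameter, for example to write it to a log; the variable names here are only illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["goodbye", "cruel", "world"],
    "C": [False, True, False],
})

# info() prints to stdout by default; buf redirects the report
# into any writable text buffer instead.
buffer = io.StringIO()

# memory_usage="deep" asks pandas to measure the real size of object
# columns instead of reporting a lower bound.
df.info(buf=buffer, memory_usage="deep")

summary = buffer.getvalue()
print(summary)
```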

Outputting tables in Markdown format

An equally pleasant innovation is the ability to export dataframes to Markdown tables using DataFrame.to_markdown (it relies on the optional tabulate package).

>>> print(df.to_markdown())
|    |   A | B       | C     |
|---:|----:|:--------|:------|
|  0 |   1 | goodbye | False |
|  1 |   2 | cruel   | True  |
|  2 |   3 | world   | False |

This makes it much easier to publish tables on sites like Medium using GitHub gists.
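
To get the table into a gist or a README, the returned string can simply be written to a file. A minimal sketch, assuming the optional tabulate package may or may not be installed; the file name table.md is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["goodbye", "cruel", "world"]})

try:
    # to_markdown() returns the whole table as one string; it needs the
    # third-party "tabulate" package and raises ImportError without it.
    table = df.to_markdown()
except ImportError:
    table = None
    print("DataFrame.to_markdown needs tabulate: pip install tabulate")

if table is not None:
    # "table.md" is a hypothetical output path used for illustration.
    with open("table.md", "w", encoding="utf-8") as handle:
        handle.write(table + "\n")
```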

New types for strings and booleans

The Pandas 1.0 release also adds new experimental types. Their API may still change, so use them with caution. In general, though, Pandas recommends using the new types wherever it makes sense.

For now, the cast needs to be done explicitly:

>>> B = pd.Series(["goodbye", "cruel", "world"], dtype="string")
>>> C = pd.Series([False, True, False], dtype="boolean")
>>> df["B"] = B
>>> df["C"] = C
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      string
 2   C       3 non-null      boolean
dtypes: boolean(1), int64(1), string(1)
memory usage: 182.0 bytes

Notice how the Dtype column now displays the new types, string and boolean (the nullable type, distinct from NumPy's bool).
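
Existing columns can also be converted in one step with astype. The sketch below uses made-up data with missing entries to show that the nullable string and boolean types keep the gaps as missing values; the column names are only illustrative.

```python
import pandas as pd

# In pandas 1.0 both columns are inferred as object because of the None.
df = pd.DataFrame({
    "B": ["goodbye", None, "world"],
    "C": [False, None, True],
})

# The cast is explicit: map each column to the dtype it should have.
converted = df.astype({"B": "string", "C": "boolean"})

print(converted.dtypes)
# Missing entries survive the cast and are still reported by isna().
print(converted["C"].isna().tolist())
```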

The most useful feature of the new string type is the ability to select only the string columns of a dataframe. This can make working with text data much easier:

df.select_dtypes("string")

Previously, string columns shared the generic object dtype, so they could not be selected reliably without naming them explicitly.
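
The difference is easiest to see on a frame that mixes real text with arbitrary Python objects. This is a small sketch with invented data; column D exists only to show what stays behind in the object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": pd.Series(["goodbye", "cruel", "world"], dtype="string"),
    # Arbitrary Python objects, which pandas stores as object.
    "D": [{"k": 1}, [2, 3], None],
})

# Only the column declared with the dedicated string dtype is selected.
text_columns = df.select_dtypes("string").columns.tolist()

# Selecting by "object" returns the catch-all column instead.
object_columns = df.select_dtypes("object").columns.tolist()

print(text_columns)
print(object_columns)
```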

You can read more about new types here.

Thank you for reading! The full list of changes, as already mentioned, can be viewed here.

Source: habr.com