On January 9, Pandas 1.0.0rc0, the first release candidate for version 1.0, was released. The previous version of the library was 0.25.
The first major release contains many great new features, including improved automatic dataframe summarization, more output formats, new data types, and even a new documentation site.
All changes can be viewed in the official release notes.
You can install the library as usual with pip, but since Pandas 1.0 is still a release candidate at the time of writing, you need to specify the version explicitly:
pip install --upgrade pandas==1.0.0rc0
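Be careful: since this is a major release, the update may break existing code!
By the way, support for Python 2 has been dropped completely as of this version (which could be a good reason to upgrade). Pandas 1.0 requires Python 3.6.1 or later, so make sure that is what your pip and python actually use: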
$ pip --version
pip 19.3.1 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
$ python --version
Python 3.7.5
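The same check can also be made from inside the interpreter. The snippet below is only a minimal sketch: the helper name python_is_supported is hypothetical, and the 3.6.1 floor is the minimum Python version stated for Pandas 1.0.

```python
import sys

# Minimum Python version required by Pandas 1.0
MIN_PYTHON = (3, 6, 1)


def python_is_supported(version_info=None):
    """Return True if a (major, minor, micro) tuple meets the minimum."""
    if version_info is None:
        # Default to the interpreter that is running this code
        version_info = sys.version_info
    return tuple(version_info[:3]) >= MIN_PYTHON


print(python_is_supported())
```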
The easiest way to check the Pandas version is this:
>>> import pandas as pd
>>> pd.__version__
'1.0.0rc0'
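If a script depends on the features described in this article, it may be worth checking the version in code as well. This is a rough sketch, not an official recipe: major_version is a hypothetical helper that simply reads the number before the first dot.

```python
import pandas as pd


def major_version(version):
    """Extract the leading major number from a string such as '1.0.0rc0'."""
    return int(version.split(".")[0])


if major_version(pd.__version__) < 1:
    print("Pandas is older than 1.0: the features below are not available")
else:
    print("Pandas", pd.__version__, "is recent enough")
```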
Improved auto-summarization with DataFrame.info
My favorite innovation is the update to the DataFrame.info method. Its output has become much more readable, which makes data exploration even easier:
>>> df = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': ["goodbye", "cruel", "world"],
...     'C': [False, True, False]
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      object
 2   C       3 non-null      bool
dtypes: bool(1), int64(1), object(1)
memory usage: 200.0+ bytes
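The Non-Null Count column is most useful when the data has gaps. As a small illustration (the sample data is made up), the summary can also be captured as text through the buf parameter of DataFrame.info, for example to write it to a log:

```python
import io

import pandas as pd

# Hypothetical sample data with a few missing values
df_missing = pd.DataFrame({
    "A": [1.0, None, 3.0],
    "B": ["goodbye", None, None],
})

# info() prints to stdout by default; buf redirects the text elsewhere
buffer = io.StringIO()
df_missing.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

Here column A should report 2 non-null values and column B only 1.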
Outputting tables in Markdown format
An equally pleasant innovation is the ability to export dataframes to Markdown tables with DataFrame.to_markdown (it relies on the optional tabulate package).
>>> print(df.to_markdown())
|    |   A | B       | C     |
|---:|----:|:--------|:------|
|  0 |   1 | goodbye | False |
|  1 |   2 | cruel   | True  |
|  2 |   3 | world   | False |
This makes it much easier to publish tables on sites like Medium using GitHub Gists.
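To paste the table into a gist, it is convenient to save it to a file first. The following is a sketch under a few assumptions: the file name table.md is hypothetical, and the ImportError guard is there because to_markdown depends on the optional tabulate package.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["goodbye", "cruel", "world"]})

try:
    table = df.to_markdown()
except ImportError:
    # tabulate is an optional dependency of Pandas
    table = None
    print("Install tabulate to use DataFrame.to_markdown")

if table is not None:
    # "table.md" is a hypothetical file name
    with open("table.md", "w", encoding="utf-8") as handle:
        handle.write(table + "\n")
```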
New types for strings and booleans
The Pandas 1.0 release also adds new experimental types. Their API may still change, so use them with caution. In general, though, Pandas recommends using the new types wherever it makes sense.
For now, the cast needs to be done explicitly:
>>> B = pd.Series(["goodbye", "cruel", "world"], dtype="string")
>>> C = pd.Series([False, True, False], dtype="boolean")
>>> df["B"], df["C"] = B, C
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      string
 2   C       3 non-null      boolean
dtypes: boolean(1), int64(1), string(1)
memory usage: 200.0+ bytes
Notice how the Dtype column now shows the new types: string and boolean.
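Existing columns can be cast in one step with DataFrame.astype. The example below is a minimal sketch with made-up data; it also includes missing values, which the new string and boolean types can hold without turning the whole column into object.

```python
import pandas as pd

# Hypothetical data with gaps in both columns
raw = pd.DataFrame({
    "B": ["goodbye", None, "world"],
    "C": [False, True, None],
})

# Cast both columns explicitly to the new extension types
converted = raw.astype({"B": "string", "C": "boolean"})

print(converted.dtypes)
print(converted["C"].isna())
```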
The most useful feature of the new string type is the ability to select only the string columns of a dataframe. This can make parsing text data much easier:
df.select_dtypes("string")
Previously, string columns could not be selected without listing their names explicitly.
You can read more about the new types in the Pandas documentation.
Thank you for reading! The full list of changes, as already mentioned, can be viewed in the official release notes.
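As an illustration of why this is handy, the sketch below picks out the string columns and applies a text operation to all of them at once. The dataframe repeats the one used earlier, and upper-casing is just an arbitrary example of text processing.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": pd.Series(["goodbye", "cruel", "world"], dtype="string"),
    "C": [False, True, False],
})

# Keep only the columns that use the new string dtype
text_columns = df.select_dtypes("string")

# Apply a string method to every selected column
shouting = text_columns.apply(lambda column: column.str.upper())
print(shouting)
```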
Source: habr.com