Practicing data grouping and visualization in Python

Hello, Habr!

Today we will practice using data grouping and visualization tools in Python. Using the provided dataset hosted on GitHub, we will analyze several characteristics and build a set of visualizations.

As usual, let's start by defining the goals:

  • Group the data by sex and year and visualize the overall birth dynamics for both sexes;
  • Find the most popular names of all time;
  • Split the whole time span in the data into 10 parts and, for each part, find the most popular name of each sex. For every name found, visualize its dynamics over the whole period;
  • For each year, count how many names cover 50% of people and visualize the result (this shows the diversity of names in each year);
  • Choose 4 years from the whole interval and, for each year, show the distribution of the first letter and of the last letter of the names;
  • Make a list of several famous people (presidents, singers, actors, movie characters) and assess their influence on name dynamics. Build a visualization.

Fewer words, more code!

So, let's go.

Let's group the data by sex and year and visualize the overall birth dynamics for both sexes:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt

years = np.arange(1880, 2011, 3)
datalist = 'https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/babynames/yob{year}.txt'
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe.assign(year=year))

result = pd.concat(dataframes)
sex = result.groupby('sex')
births_men = sex.get_group('M').groupby('year', as_index=False)
births_women = sex.get_group('F').groupby('year', as_index=False)
# Sum only the numeric 'count' column; summing the whole frame would also concatenate the name strings.
births_men_list = births_men['count'].sum()['count'].tolist()
births_women_list = births_women['count'].sum()['count'].tolist()

fig, ax = plt.subplots()
fig.set_size_inches(25,15)

index = np.arange(len(years))
stolb1 = ax.bar(index, births_men_list, 0.4, color='c', label='Men')
stolb2 = ax.bar(index + 0.4, births_women_list, 0.4, alpha=0.8, color='r', label='Women')

ax.set_title('Births by sex and year')
ax.set_xlabel('Years')
ax.set_ylabel('Births')
# Set the tick positions first, then their labels, so the two stay in sync.
# index + 0.2 is the midpoint of each pair of bars.
ax.set_xticks(index + 0.2)
ax.set_xticklabels(years)
ax.legend(loc=9)

fig.tight_layout()
plt.show()
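
The same per-year totals can also be obtained in one step with a pivot table. The sketch below is only an illustration on a small hand-made frame: the rows are invented for the example and are not taken from the dataset. It assumes the same `name`, `sex`, `count` and `year` columns as the listing above.

```python
import pandas as pd

# Invented sample rows with the same columns as the babynames frames above.
sample = pd.DataFrame({
    'name':  ['Mary', 'Anna', 'John', 'Mary', 'John', 'James'],
    'sex':   ['F', 'F', 'M', 'F', 'M', 'M'],
    'count': [10, 5, 8, 12, 9, 4],
    'year':  [1880, 1880, 1880, 1883, 1883, 1883],
})

# One row per year, one column per sex, each cell holding the summed births.
births = sample.pivot_table(index='year', columns='sex', values='count', aggfunc='sum')

births_women_list = births['F'].tolist()
births_men_list = births['M'].tolist()
print(births)
```

The two lists can then be passed to `ax.bar` exactly as in the listing above.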

Let's find the most popular names in history:

years = np.arange(1880, 2011)

dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe)

result = pd.concat(dataframes)
# Select 'count' before summing so the 'sex' strings are not concatenated.
names = result.groupby('name', as_index=False)['count'].sum().sort_values('count', ascending=False)
names.head(10)
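
When only the top few rows are needed, `nlargest` avoids sorting the whole table. A minimal sketch on invented rows (the names and counts below are made up for the example):

```python
import pandas as pd

# Invented sample rows; 'Leslie' appears under both sexes on purpose.
sample = pd.DataFrame({
    'name':  ['Mary', 'John', 'Mary', 'John', 'Leslie', 'Leslie'],
    'sex':   ['F', 'M', 'F', 'M', 'F', 'M'],
    'count': [10, 8, 12, 9, 3, 2],
})

# Sum only the numeric column per name, then keep the two largest totals.
totals = sample.groupby('name')['count'].sum()
top = totals.nlargest(2)
print(top)
```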

Let's split the whole time span in the data into 10 parts and, for each part, find the most popular name of each sex. For every name found, we visualize its dynamics over the whole period:

years = np.arange(1880, 2011)
part_size = int((years[years.size - 1] - years[0]) / 10) + 1
parts = {}
def GetPart(year):
    return int((year - years[0]) / part_size)
for year in years:
    index = GetPart(year)
    # Subtract 1 so the end year is inclusive and adjacent parts do not share a boundary year.
    r = years[0] + part_size * index, min(years[years.size - 1], years[0] + part_size * (index + 1) - 1)
    parts[index] = str(r[0]) + '-' + str(r[1])

dataframe_parts = []
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframe_parts.append(dataframe.assign(years=parts[GetPart(year)]))
    dataframes.append(dataframe.assign(year=year))
    
result_parts = pd.concat(dataframe_parts)
result = pd.concat(dataframes)

result_parts_sums = result_parts.groupby(['years', 'sex', 'name'], as_index=False).sum()
# For every (years, sex) pair, keep the row with the largest total count.
result_parts_names = result_parts_sums.loc[result_parts_sums.groupby(['years', 'sex'])['count'].idxmax()]
result_sums = result.groupby(['year', 'sex', 'name'], as_index=False).sum()

# Group the yearly totals once, outside the loop, instead of regrouping for every name.
sums_by_name = result_sums.groupby(['name', 'sex'])
for group_name in result_parts_names.groupby(['name', 'sex']).groups:
    group = sums_by_name.get_group(group_name)
    fig, ax = plt.subplots(1, 1, figsize=(18,10))

    ax.set_xlabel('Years')
    ax.set_ylabel('Births')
    # group_name is a (name, sex) tuple; the name alone is used as the legend label.
    ax.plot(group['year'], group['count'], label=group_name[0], color='b', ls='-')
    ax.legend(loc=9, fontsize=11)

    plt.show()
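
The splitting of the time span into parts can be checked in isolation. The helper below is a hypothetical restatement of the same integer arithmetic (the name `part_bounds` and its default arguments are assumptions made for this sketch); it returns inclusive start and end years for the part containing a given year.

```python
def part_bounds(year, first=1880, last=2010, parts=10):
    """Return the inclusive (start, end) years of the part that contains `year`."""
    size = (last - first) // parts + 1   # 14 years for the 1880..2010 span
    index = (year - first) // size       # zero-based part number
    start = first + size * index
    end = min(last, start + size - 1)    # the final part is cut off at `last`
    return start, end


# Collect the distinct parts over the whole span.
all_parts = sorted({part_bounds(y) for y in range(1880, 2011)})
print(all_parts)
```

With these defaults the last part is shorter than the others, because 131 years do not divide evenly into 10 parts of 14 years.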

For each year, we count how many names cover 50% of people and visualize the result:

rows = []
years = np.arange(1880, 2011)
for year in years:
    dataset = datalist.format(year=year)
    csv = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    # Total births per name, regardless of sex.
    names = csv.groupby('name', as_index=False)['count'].sum()
    names['percent'] = names['count'] / names['count'].sum() * 100
    names = names.sort_values('percent', ascending=False)
    names['cum_perc'] = names['percent'].cumsum()
    # Names whose cumulative share stays within 50% of all births that year.
    names_filtered = names[names['cum_perc'] <= 50]
    rows.append({'year': year, 'count': names_filtered.shape[0]})

# DataFrame.append was removed in pandas 2.0, so the frame is built from a list of rows.
dataframe = pd.DataFrame(rows)

fig, ax1 = plt.subplots(1, 1, figsize=(22,13))
ax1.set_xlabel('Years', fontsize = 12)
ax1.set_ylabel('Name diversity', fontsize = 12)
# A label is needed here, otherwise the legend below has nothing to show.
ax1.plot(dataframe['year'], dataframe['count'], color='r', ls='-', label='Names covering 50% of births')
ax1.legend(loc=9, fontsize=12)

plt.show()
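
The per-year calculation can also be written as a small standalone function. This is a sketch under one assumption that differs slightly from the listing above: it counts the name that crosses the threshold as well, so it returns the smallest number of names that reach the requested share. The function name `names_to_cover` is hypothetical.

```python
import numpy as np


def names_to_cover(counts, share=0.5):
    """Smallest number of most popular names whose births reach `share` of the total."""
    ordered = np.sort(np.asarray(counts, dtype=float))[::-1]   # descending order
    cumulative = np.cumsum(ordered) / ordered.sum()
    # Position of the first cumulative share that is >= the threshold.
    return int(np.searchsorted(cumulative, share, side='left')) + 1


print(names_to_cover([40, 30, 20, 10]))
```

A larger result for a given year means births are spread over more names, that is, higher name diversity.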

Let's choose 4 years from the whole interval and, for each year, show the distribution of the first letter and of the last letter of the names:

from string import ascii_uppercase

fig_first, ax_first = plt.subplots(1, 1, figsize=(14,10))
fig_last, ax_last = plt.subplots(1, 1, figsize=(14,10))

index = np.arange(len(ascii_uppercase))
years = [1944, 1978, 1991, 2003]
colors = ['r', 'g', 'b', 'y']
# bar_width was undefined in the original listing; 0.2 lets four bars fit within one letter slot.
bar_width = 0.2
for n, year in enumerate(years):
    dataset = datalist.format(year=year)
    csv = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    names = csv.groupby('name', as_index=False)['count'].sum()
    count = names.shape[0]

    # Share of distinct names (in %) that start or end with each letter.
    frequency_first = [names.name.str.startswith(letter).sum() / count * 100 for letter in ascii_uppercase]
    frequency_last = [names.name.str.endswith(letter.lower()).sum() / count * 100 for letter in ascii_uppercase]

    ax_first.bar(index + bar_width * n, frequency_first, bar_width, alpha=0.5, color=colors[n], label=year)
    ax_last.bar(index + bar_width * n, frequency_last, bar_width, alpha=0.5, color=colors[n], label=year)

ax_first.set_xlabel('Letter of the alphabet')
ax_first.set_ylabel('Frequency, %')
ax_first.set_title('First letter of the name')
ax_first.set_xticks(index)
ax_first.set_xticklabels(ascii_uppercase)
ax_first.legend()

ax_last.set_xlabel('Letter of the alphabet')
ax_last.set_ylabel('Frequency, %')
ax_last.set_title('Last letter of the name')
ax_last.set_xticks(index)
ax_last.set_xticklabels(ascii_uppercase)
ax_last.legend()

fig_first.tight_layout()
fig_last.tight_layout()

plt.show()
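
The letter frequencies can also be computed without an explicit loop over the alphabet, using string indexing and `value_counts`. A minimal sketch on an invented list of names (as in the listing above, each distinct name counts once, regardless of how many births it has):

```python
import pandas as pd

# Invented sample of distinct names.
names = pd.Series(['Mary', 'Margaret', 'Anna', 'John', 'James', 'Emma'])

# normalize=True turns counts into fractions; multiply by 100 for percentages.
first = names.str[0].value_counts(normalize=True) * 100
last = names.str[-1].str.upper().value_counts(normalize=True) * 100

print(first)
print(last)
```

Letters that never occur are absent from the result, so `reindex(list(ascii_uppercase), fill_value=0)` may be needed before plotting all 26 bars.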

Let's make a list of several famous people (presidents, singers, actors, movie characters) and assess their influence on name dynamics:

celebrities = {'Frank': 'M', 'Britney': 'F', 'Madonna': 'F', 'Bob': 'M'}
# Reset the range: `years` was narrowed to four years in the previous example.
years = np.arange(1880, 2011)
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe.assign(year=year))

result = pd.concat(dataframes)

for celebrity, sex in celebrities.items():
    names = result[result.name == celebrity]
    dataframe = names[names.sex == sex]
    fig, ax = plt.subplots(1, 1, figsize=(16,8))

    ax.set_xlabel('Years', fontsize = 10)
    ax.set_ylabel('Births', fontsize = 10)
    ax.plot(dataframe['year'], dataframe['count'], label=celebrity, color='r', ls='-')
    ax.legend(loc=9, fontsize=12)

    plt.show()

As an exercise, you can add each celebrity's lifespan to the visualization from the last example to see their influence on name dynamics more clearly.
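
One possible way to approach this exercise is to shade the lifespan on the plot with matplotlib's `axvspan`. The sketch below is an assumption about how it could look, not part of the original article: the helper name `plot_with_lifespan`, the sample counts and the `born`/`died` years are all invented placeholders.

```python
import matplotlib.pyplot as plt


def plot_with_lifespan(ax, years, counts, born, died, label):
    """Plot a name's yearly births and shade the years between `born` and `died`."""
    ax.plot(years, counts, label=label, color='r', ls='-')
    # axvspan draws a vertical band across the full height of the axes.
    ax.axvspan(born, died, color='gray', alpha=0.2, label='lifespan')
    ax.set_xlabel('Years')
    ax.set_ylabel('Births')
    ax.legend(loc=9)
    return ax


# Invented numbers, used only to exercise the function.
years = [1900, 1910, 1920, 1930, 1940]
counts = [100, 150, 400, 300, 200]
fig, ax = plt.subplots(figsize=(8, 4))
plot_with_lifespan(ax, years, counts, born=1915, died=1935, label='Example')
```

Call `plt.show()` afterwards to display the figure. In the celebrity loop, `years` and `counts` would be replaced by `dataframe['year']` and `dataframe['count']`, with real birth and death years supplied per person.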

With that, all our goals have been achieved. We have practiced using data grouping and visualization tools in Python and will keep working with data. Everyone can draw their own conclusions from the finished, visualized data.

Knowledge to everyone!

Source: www.habr.com
