Practicing grouping and data visualization skills in Python

Hi, Habr!

Today we will practice using the tools for grouping and visualizing data in Python. Using the dataset published on GitHub, we will analyze several characteristics and build a set of visualizations.

As usual, let's start by stating the goals:

  • Group the data by sex and year and visualize the overall dynamics of births for both sexes;
  • Find the most popular names of all time;
  • Split the whole time span of the data into 10 parts and, for each part, find the most popular name of each sex. For every name found, visualize its dynamics over the whole period;
  • For each year, calculate how many names cover 50% of people and visualize the result (this shows how diverse the names are in each year);
  • Pick 4 years from the whole interval and, for each year, show the distribution of names by first letter and by last letter;
  • Make a list of several famous people (presidents, singers, actors, movie characters) and assess their influence on the dynamics of names. Build a visualization.

Fewer words, more code!

So, let's go.

Let's group the data by sex and year and visualize the overall dynamics of births for both sexes:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt

years = np.arange(1880, 2011, 3)
datalist = 'https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/babynames/yob{year}.txt'
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe.assign(year=year))

result = pd.concat(dataframes)
sex = result.groupby('sex')
births_men = sex.get_group('M').groupby('year', as_index=False)
births_women = sex.get_group('F').groupby('year', as_index=False)
# Sum only the numeric 'count' column for each year
births_men_list = births_men['count'].sum()['count'].tolist()
births_women_list = births_women['count'].sum()['count'].tolist()

fig, ax = plt.subplots()
fig.set_size_inches(25,15)

index = np.arange(len(years))
stolb1 = ax.bar(index, births_men_list, 0.4, color='c', label='Men')
stolb2 = ax.bar(index + 0.4, births_women_list, 0.4, alpha=0.8, color='r', label='Women')

ax.set_title('Births by sex and year')
ax.set_xlabel('Year')
ax.set_ylabel('Births')
# Set tick positions first (centered between the two bars), then their labels
ax.set_xticks(index + 0.2)
ax.set_xticklabels(years)
ax.legend(loc=9)

fig.tight_layout()
plt.show()
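
The grouping step above can also be written with `pivot_table`, which yields one column per sex directly. This is only a minimal sketch: the `sample` frame is a tiny made-up stand-in for `result`, not real data, and it is not the approach used in this article.

```python
import pandas as pd

# Tiny made-up sample in the same layout as `result` (name, sex, count, year)
sample = pd.DataFrame({
    'name':  ['Mary', 'Anna', 'John', 'Mary', 'John', 'James'],
    'sex':   ['F', 'F', 'M', 'F', 'M', 'M'],
    'count': [10, 5, 8, 12, 9, 4],
    'year':  [1880, 1880, 1880, 1881, 1881, 1881],
})

# One row per year, one column per sex, summing only the 'count' column
births = sample.pivot_table(index='year', columns='sex', values='count', aggfunc='sum')

births_men_list = births['M'].tolist()
births_women_list = births['F'].tolist()
print(births)
```

The two lists can then be passed to `ax.bar` exactly as in the example above.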

Let's find the most popular names of all time:

years = np.arange(1880, 2011)

dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe)

result = pd.concat(dataframes)
# Sum only 'count'; summing the whole frame would also try to add up the 'sex' strings
names = result.groupby('name', as_index=False)['count'].sum().sort_values('count', ascending=False)
names.head(10)
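
If only the top few rows are needed, `Series.nlargest` avoids sorting the whole table. A minimal sketch, assuming a small made-up `sample` frame in place of `result`:

```python
import pandas as pd

# Made-up sample; 'Leslie' appears under both sexes to show that counts are merged by name
sample = pd.DataFrame({
    'name':  ['Mary', 'John', 'Mary', 'John', 'Leslie', 'Leslie'],
    'sex':   ['F', 'M', 'F', 'M', 'F', 'M'],
    'count': [10, 8, 12, 9, 3, 2],
})

# Total births per name across all rows, then the two largest totals
totals = sample.groupby('name')['count'].sum()
top = totals.nlargest(2)
print(top)
```

Note that, as in the article's code, a name used for both sexes is counted once with both totals combined.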

Let's split the whole time span of the data into 10 parts and, for each part, find the most popular name of each sex. For every name found, we visualize its dynamics over the whole period:

years = np.arange(1880, 2011)
part_size = int((years[-1] - years[0]) / 10) + 1
parts = {}
def GetPart(year):
    # Index (0-9) of the period that the given year falls into
    return int((year - years[0]) / part_size)
for year in years:
    index = GetPart(year)
    # First and last year of the period (inclusive), clipped to the data range
    r = years[0] + part_size * index, min(years[-1], years[0] + part_size * (index + 1) - 1)
    parts[index] = str(r[0]) + '-' + str(r[1])

dataframe_parts = []
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframe_parts.append(dataframe.assign(years=parts[GetPart(year)]))
    dataframes.append(dataframe.assign(year=year))
    
result_parts = pd.concat(dataframe_parts)
result = pd.concat(dataframes)

result_parts_sums = result_parts.groupby(['years', 'sex', 'name'], as_index=False).sum()
# For every (period, sex) pair, keep the row with the largest total count
result_parts_names = result_parts_sums.loc[result_parts_sums.groupby(['years', 'sex'])['count'].idxmax()]
result_sums = result.groupby(['year', 'sex', 'name'], as_index=False).sum()

for groupName, groupLabels in result_parts_names.groupby(['name', 'sex']).groups.items():
    group = result_sums.groupby(['name', 'sex']).get_group(groupName)
    fig, ax = plt.subplots(1, 1, figsize=(18,10))

    ax.set_xlabel('Year')
    ax.set_ylabel('Births')
    # groupName is a (name, sex) tuple; use the name as the legend label
    ax.plot(group['year'], group['count'], label=groupName[0], color='b', ls='-')
    ax.legend(loc=9, fontsize=11)

    plt.show()
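
The two key steps above, assigning each year to one of 10 periods and picking the most popular name per period and sex, can be checked in isolation on a small example. This is a sketch under stated assumptions: `period_label` is a hypothetical helper introduced here, and the `sample` frame is made up.

```python
import pandas as pd

def period_label(year, first_year=1880, last_year=2010, n_parts=10):
    """Return an inclusive 'start-end' label for the period that contains `year`."""
    part_size = (last_year - first_year) // n_parts + 1
    index = (year - first_year) // part_size
    start = first_year + part_size * index
    end = min(last_year, start + part_size - 1)
    return f'{start}-{end}'

# Made-up sample with years from two different periods
sample = pd.DataFrame({
    'name':  ['Mary', 'Anna', 'John', 'Mary', 'Linda', 'John'],
    'sex':   ['F', 'F', 'M', 'F', 'F', 'M'],
    'count': [10, 5, 8, 7, 20, 9],
    'year':  [1880, 1885, 1890, 1950, 1951, 1952],
})
sample['years'] = sample['year'].apply(period_label)

# Total per (period, sex, name), then the row with the largest total in each (period, sex)
sums = sample.groupby(['years', 'sex', 'name'], as_index=False)['count'].sum()
top_names = sums.loc[sums.groupby(['years', 'sex'])['count'].idxmax()]
print(top_names)
```

With 131 years and 10 parts, each period spans 14 years, so the last one (2006-2010) is shorter than the rest.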

For each year, we calculate how many names cover 50% of people and visualize this data:

rows = []
years = np.arange(1880, 2011)
for year in years:
    dataset = datalist.format(year=year)
    csv = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    names = csv.groupby('name', as_index=False)['count'].sum()
    # Share of each name in that year's births, in percent
    names['percent'] = names['count'] / names['count'].sum() * 100
    names = names.sort_values('percent', ascending=False)
    names['cum_perc'] = names['percent'].cumsum()
    # Most popular names whose cumulative share stays within 50%
    names_filtered = names[names['cum_perc'] <= 50]
    rows.append({'year': year, 'count': names_filtered.shape[0]})

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
dataframe = pd.DataFrame(rows)

fig, ax1 = plt.subplots(1, 1, figsize=(22,13))
ax1.set_xlabel('Year', fontsize = 12)
ax1.set_ylabel('Name diversity', fontsize = 12)
ax1.plot(dataframe['year'], dataframe['count'], color='r', ls='-', label='Names covering 50% of births')
ax1.legend(loc=9, fontsize=12)

plt.show()
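
The cumulative-share calculation from the loop above can be isolated into a small function and tried on a hand-checkable example. A minimal sketch: `names_within_share` is a hypothetical helper name, and the counts are made up.

```python
import pandas as pd

def names_within_share(counts, share=0.5):
    """Number of most popular names whose cumulative share stays within `share`."""
    ordered = counts.sort_values(ascending=False)
    cum_share = ordered.cumsum() / ordered.sum()
    return int((cum_share <= share).sum())

# Made-up counts: cumulative shares are 0.4, 0.7, 0.9 and 1.0
counts = pd.Series({'Mary': 40, 'John': 30, 'Anna': 20, 'James': 10})
print(names_within_share(counts))
```

Like the `cum_perc <= 50` filter in the article, this counts the names that fit within the threshold, so the name that crosses the 50% mark is not included.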

Let's pick 4 years from the whole interval and, for each year, show the distribution of names by first letter and by last letter:

from string import ascii_lowercase, ascii_uppercase

fig_first, ax_first = plt.subplots(1, 1, figsize=(14,10))
fig_last, ax_last = plt.subplots(1, 1, figsize=(14,10))

index = np.arange(len(ascii_uppercase))
years = [1944, 1978, 1991, 2003]
colors = ['r', 'g', 'b', 'y']
n = 0
for year in years:
    dataset = datalist.format(year=year)
    csv = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    names = csv.groupby('name', as_index=False)['count'].sum()
    count = names.shape[0]

    # Share of distinct names (in percent) that start / end with each letter
    frequency_first = []
    frequency_last = []
    for letter in ascii_uppercase:
        countFirst = names.name.str.startswith(letter).sum()
        countLast = names.name.str.endswith(letter.lower()).sum()
        frequency_first.append(countFirst / count * 100)
        frequency_last.append(countLast / count * 100)

    # DataFrame.append was removed in pandas 2.0, so build the frame in one step
    dataframe = pd.DataFrame({
        'letter': list(ascii_uppercase),
        'frequency_first': frequency_first,
        'frequency_last': frequency_last})

    ax_first.bar(index + 0.3 * n, dataframe['frequency_first'], 0.3, alpha=0.5, color=colors[n], label=year)
    ax_last.bar(index + 0.3 * n, dataframe['frequency_last'], 0.3, alpha=0.5, color=colors[n], label=year)
    n += 1

ax_first.set_xlabel('Letter of the alphabet')
ax_first.set_ylabel('Frequency, %')
ax_first.set_title('First letter of the name')
ax_first.set_xticks(index)
ax_first.set_xticklabels(ascii_uppercase)
ax_first.legend()

ax_last.set_xlabel('Letter of the alphabet')
ax_last.set_ylabel('Frequency, %')
ax_last.set_title('Last letter of the name')
ax_last.set_xticks(index)
ax_last.set_xticklabels(ascii_uppercase)
ax_last.legend()

fig_first.tight_layout()
fig_last.tight_layout()

plt.show()
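
The per-letter loop can also be expressed with the vectorized string accessor and `value_counts`. The following is only a sketch on a handful of made-up names, not the article's method; like the code above, it measures the share of distinct names rather than the share of births.

```python
import pandas as pd
from string import ascii_uppercase

# Made-up list of distinct names for one year
names = pd.Series(['Mary', 'Maria', 'John', 'James', 'Anna'])

# Percentage of names per first letter and per (upper-cased) last letter
first = names.str[0].value_counts(normalize=True) * 100
last = names.str[-1].str.upper().value_counts(normalize=True) * 100

# Reindex over the full alphabet so that letters with no names get 0
frequency_first = first.reindex(list(ascii_uppercase), fill_value=0.0)
frequency_last = last.reindex(list(ascii_uppercase), fill_value=0.0)
print(frequency_first[frequency_first > 0])
```

Both resulting series have 26 entries in alphabetical order, so they can be passed straight to `ax.bar` in place of the `frequency_first` and `frequency_last` columns.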

Let's make a list of several famous people (presidents, singers, actors, movie characters) and assess their influence on the dynamics of names:

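celebrities = {'Frank': 'M', 'Britney': 'F', 'Madonna': 'F', 'Bob': 'M'}
# Reset to the full range: the previous example left only four years in `years`
years = np.arange(1880, 2011)
dataframes = []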
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe.assign(year=year))

result = pd.concat(dataframes)

for celebrity, sex in celebrities.items():
    names = result[result.name == celebrity]
    dataframe = names[names.sex == sex]
    fig, ax = plt.subplots(1, 1, figsize=(16,8))

    ax.set_xlabel('Year', fontsize = 10)
    ax.set_ylabel('Births', fontsize = 10)
    ax.plot(dataframe['year'], dataframe['count'], label=celebrity, color='r', ls='-')
    ax.legend(loc=9, fontsize=12)
        
    plt.show()

As an exercise, you can add each celebrity's lifetime to the visualization from the last example, to see more clearly how they influenced the dynamics of the names.
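
One possible way to approach this exercise is to shade the relevant period with `Axes.axvspan`. This is a rough sketch only: the counts and the `life_from` / `life_to` years below are made-up placeholders, not real data about any person.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts for illustration only; in the article this would be the
# per-celebrity `dataframe` built in the last example
dataframe = pd.DataFrame({'year': [1950, 1960, 1970, 1980, 1990, 2000],
                          'count': [5, 8, 20, 60, 35, 15]})

# Hypothetical period to highlight; replace with the real years for the person
life_from, life_to = 1975, 1995

fig, ax = plt.subplots(1, 1, figsize=(16, 8))
ax.plot(dataframe['year'], dataframe['count'], label='Name', color='r', ls='-')
# Shade the chosen period behind the line
span = ax.axvspan(life_from, life_to, color='gray', alpha=0.2, label='Lifetime')
ax.set_xlabel('Year')
ax.set_ylabel('Births')
ax.legend(loc=9)
```

Call `plt.show()` afterwards, as in the examples above, to display the figure.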

With this, all our goals have been achieved. We have practiced using the tools for grouping and visualizing data in Python, and we will keep working with data. Everyone can draw their own conclusions from the ready-made, visualized data.

Knowledge to everyone!

Source: www.habr.com
