Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Hei Habr!

I tenei ra ka mahi maatau i te mohio ki te whakamahi taputapu mo te whakarōpū me te whakaata raraunga ki te Python. I roto i te whakaratohia raraunga i runga i Github Kia wetewetehia etahi ahuatanga ka hanga he huinga tirohanga.

E ai ki nga korero tuku iho, i te timatanga, me tautuhi nga whainga:

  • Whakarōpūhia nga raraunga ma te ira tangata me te tau me te tiro i te hihiri o te reanga whanau o nga tane e rua;
  • Kimihia nga ingoa rongonui o nga wa katoa;
  • Wehea te wa katoa i roto i nga raraunga kia 10 nga wahanga, mo ia, kimihia te ingoa rongonui o ia ira tangata. Mo ia ingoa ka kitea, tiro i ona hihiritanga i nga wa katoa;
  • Mo ia tau, tatauhia e hia nga ingoa e kapi ana i te 50% o nga tangata me te whakaata (ka kite tatou i nga momo ingoa mo ia tau);
  • Tīpakohia nga tau 4 mai i te waahi katoa ka whakaatu mo ia tau te tohatoha ma te reta tuatahi o te ingoa me te reta whakamutunga i te ingoa;
  • Hangaia he rarangi o etahi tangata rongonui (perehitini, kaiwaiata, kaiwhakaari, tangata kiriata) me te arotake i o raatau awe ki runga i te hihiri o nga ingoa. Hangaia he tirohanga.

He iti ake nga kupu, he nui ake te waehere!

Na, me haere.

Me whakarōpūhia nga raraunga ma te ira tangata me te tau, me te tiro i te hihiri o te reanga whanau o nga tane e rua:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt

years = np.arange(1880, 2011, 3)
datalist = 'https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/babynames/yob{year}.txt'
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe.assign(year=year))

result = pd.concat(dataframes)
sex = result.groupby('sex')
births_men = sex.get_group('M').groupby('year', as_index=False)
births_women = sex.get_group('F').groupby('year', as_index=False)
births_men_list = births_men.aggregate(np.sum)['count'].tolist()
births_women_list = births_women.aggregate(np.sum)['count'].tolist()

fig, ax = plt.subplots()
fig.set_size_inches(25,15)

index = np.arange(len(years))
stolb1 = ax.bar(index, births_men_list, 0.4, color='c', label='Мужчины')
stolb2 = ax.bar(index + 0.4, births_women_list, 0.4, alpha=0.8, color='r', label='Женщины')

ax.set_title('Рождаемость по полу и годам')
ax.set_xlabel('Года')
ax.set_ylabel('Рождаемость')
ax.set_xticklabels(years)
ax.set_xticks(index + 0.4)
ax.legend(loc=9)

fig.tight_layout()
plt.show()

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Kia kimihia nga ingoa rongonui o te hitori:

years = np.arange(1880, 2011)

dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe)

result = pd.concat(dataframes)
names = result.groupby('name', as_index=False).sum().sort_values('count', ascending=False)
names.head(10)

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Me wehewehe te wa katoa o te raraunga ki nga wahanga 10 ka kitea e ia te ingoa rongonui o ia ira tangata. Mo ia ingoa ka kitea, ka tiro tatou i ona hihiri i nga wa katoa:

years = np.arange(1880, 2011)
part_size = int((years[years.size - 1] - years[0]) / 10) + 1
parts = {}
def GetPart(year):
    return int((year - years[0]) / part_size)
for year in years:
    index = GetPart(year)
    r = years[0] + part_size * index, min(years[years.size - 1], years[0] + part_size * (index + 1))
    parts[index] = str(r[0]) + '-' + str(r[1])

dataframe_parts = []
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframe_parts.append(dataframe.assign(years=parts[GetPart(year)]))
    dataframes.append(dataframe.assign(year=year))
    
result_parts = pd.concat(dataframe_parts)
result = pd.concat(dataframes)

result_parts_sums = result_parts.groupby(['years', 'sex', 'name'], as_index=False).sum()
result_parts_names = result_parts_sums.iloc[result_parts_sums.groupby(['years', 'sex'], as_index=False).apply(lambda x: x['count'].idxmax())]
result_sums = result.groupby(['year', 'sex', 'name'], as_index=False).sum()

for groupName, groupLabels in result_parts_names.groupby(['name', 'sex']).groups.items():
    group = result_sums.groupby(['name', 'sex']).get_group(groupName)
    fig, ax = plt.subplots(1, 1, figsize=(18,10))

    ax.set_xlabel('Года')
    ax.set_ylabel('Рождаемость')
    label = group['name']
    ax.plot(group['year'], group['count'], label=label.aggregate(np.max), color='b', ls='-')
    ax.legend(loc=9, fontsize=11)

    plt.show()

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Mo ia tau, ka tatauhia e hia nga ingoa e kapi ana i te 50% o nga tangata me te whakaata i enei raraunga:

dataframe = pd.DataFrame({'year': [], 'count': []})
years = np.arange(1880, 2011)
for year in years:
    dataset = datalist.format(year=year)
    csv = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    names = csv.groupby('name', as_index=False).aggregate(np.sum)
    names['sum'] = names.sum()['count']
    names['percent'] = names['count'] / names['sum'] * 100
    names = names.sort_values(['percent'], ascending=False)
    names['cum_perc'] = names['percent'].cumsum()
    names_filtered = names[names['cum_perc'] <= 50]
    dataframe = dataframe.append(pd.DataFrame({'year': [year], 'count': [names_filtered.shape[0]]}))

fig, ax1 = plt.subplots(1, 1, figsize=(22,13))
ax1.set_xlabel('Года', fontsize = 12)
ax1.set_ylabel('Разнообразие имен', fontsize = 12)
ax1.plot(dataframe['year'], dataframe['count'], color='r', ls='-')
ax1.legend(loc=9, fontsize=12)

plt.show()

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Me whiriwhiri e 4 tau mai i te wa katoa ka whakaatu mo ia tau te tohatoha ma te reta tuatahi o te ingoa me te reta whakamutunga o te ingoa:

from string import ascii_lowercase, ascii_uppercase

fig_first, ax_first = plt.subplots(1, 1, figsize=(14,10))
fig_last, ax_last = plt.subplots(1, 1, figsize=(14,10))

index = np.arange(len(ascii_uppercase))
years = [1944, 1978, 1991, 2003]
colors = ['r', 'g', 'b', 'y']
n = 0
for year in years:
    dataset = datalist.format(year=year)
    csv = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    names = csv.groupby('name', as_index=False).aggregate(np.sum)
    count = names.shape[0]

    dataframe = pd.DataFrame({'letter': [], 'frequency_first': [], 'frequency_last': []})
    for letter in ascii_uppercase:
        countFirst = (names[names.name.str.startswith(letter)].count()['count'])
        countLast = (names[names.name.str.endswith(letter.lower())].count()['count'])

        dataframe = dataframe.append(pd.DataFrame({
            'letter': [letter],
            'frequency_first': [countFirst / count * 100],
            'frequency_last': [countLast / count * 100]}))

    ax_first.bar(index + 0.3 * n, dataframe['frequency_first'], 0.3, alpha=0.5, color=colors[n], label=year)
    ax_last.bar(index + bar_width * n, dataframe['frequency_last'], 0.3, alpha=0.5, color=colors[n], label=year)
    n += 1

ax_first.set_xlabel('Буква алфавита')
ax_first.set_ylabel('Частота, %')
ax_first.set_title('Первая буква в имени')
ax_first.set_xticks(index)
ax_first.set_xticklabels(ascii_uppercase)
ax_first.legend()

ax_last.set_xlabel('Буква алфавита')
ax_last.set_ylabel('Частота, %')
ax_last.set_title('Последняя буква в имени')
ax_last.set_xticks(index)
ax_last.set_xticklabels(ascii_uppercase)
ax_last.legend()

fig_first.tight_layout()
fig_last.tight_layout()

plt.show()

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Me hanga he rarangi o etahi tangata rongonui (perehitini, kaiwaiata, kaiwhakaari, tangata kiriata) me te arotake i o raatau awe ki te kaha o nga ingoa:

celebrities = {'Frank': 'M', 'Britney': 'F', 'Madonna': 'F', 'Bob': 'M'}
dataframes = []
for year in years:
    dataset = datalist.format(year=year)
    dataframe = pd.read_csv(dataset, names=['name', 'sex', 'count'])
    dataframes.append(dataframe.assign(year=year))

result = pd.concat(dataframes)

for celebrity, sex in celebrities.items():
    names = result[result.name == celebrity]
    dataframe = names[names.sex == sex]
    fig, ax = plt.subplots(1, 1, figsize=(16,8))

    ax.set_xlabel('Года', fontsize = 10)
    ax.set_ylabel('Рождаемость', fontsize = 10)
    ax.plot(dataframe['year'], dataframe['count'], label=celebrity, color='r', ls='-')
    ax.legend(loc=9, fontsize=12)
        
    plt.show()

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Te mahi i runga i te pukenga ki te whakamahi whakarōpūtanga me te whakaata raraunga i roto i te Python

Mo te whakangungu, ka taea e koe te taapiri i te waa ora o te tangata rongonui ki te tirohanga mai i te tauira whakamutunga kia pai ai te aromatawai i o raatau awe ki te hihiri o nga ingoa.

Na tenei, i tutuki katoa a matou whainga me te tutuki. Kua whakawhanakehia e matou te mohio ki te whakamahi taputapu mo te whakarōpū me te whakaata raraunga i roto i te Python, a ka mahi tonu matou me nga raraunga. Ka taea e te katoa te whakatau whakatau i runga i nga raraunga kua oti te hanga, kua ata tirohia.

Te matauranga ki te katoa!

Source: will.com

Tāpiri i te kōrero