Making a home library with Notion and Python

I have always wondered how best to organize the books in my electronic library. In the end I settled on this setup, which automatically calculates page counts and a few other goodies. If you are interested, read on.

Part 1. Dropbox

All my books live in Dropbox. I sort everything into 4 categories: Textbooks, Reference, Fiction and Non-fiction. I don't add reference books to the table, though.

Most of the books are .epub and the rest are .pdf, so the final solution has to handle both formats.
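Telling the two formats apart does not have to rely on the file extension alone. The sketch below is one possible approach, not part of the original script: `detect_book_format` is a hypothetical helper that inspects file contents. It assumes the PDF header `%PDF-` sits at the very start of the file (true for typical PDFs), and it uses the EPUB container convention that an EPUB is a ZIP archive whose `mimetype` entry contains `application/epub+zip`.

```python
import zipfile


def detect_book_format(path):
    """Return "pdf", "epub" or None, judging by file contents rather than the extension."""
    with open(path, "rb") as f:
        head = f.read(5)
    # Assumption: the PDF header is at offset 0, as in most real-world files
    if head == b"%PDF-":
        return "pdf"
    # An EPUB is a ZIP archive with a "mimetype" entry naming the EPUB media type
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            try:
                mimetype = archive.read("mimetype").decode("ascii").strip()
            except (KeyError, UnicodeDecodeError):
                return None
        if mimetype == "application/epub+zip":
            return "epub"
    return None
```

This catches misnamed files, at the cost of opening every file once.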

The paths to my books look like this (in English: Books/Non-fiction/New/Design/Yuri Gordon/The Book of Letters from A to Z.epub):

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the genre level is left out (that is, "Дизайн" (Design) in the path above).

I decided not to bother with the Dropbox API, since I have the Dropbox app, which keeps the folder synced locally. So the plan is: take the books from the folder, run each one through a word counter, then add it to Notion.
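Because the Dropbox app keeps a local copy, the "take the books from the folder" step is just a directory walk. A minimal sketch, assuming only .epub and .pdf files matter; `find_books` and `BOOK_EXTENSIONS` are hypothetical names, not from the original script.

```python
from pathlib import Path

# Assumption: only these two formats are counted
BOOK_EXTENSIONS = {".epub", ".pdf"}


def find_books(books_dir):
    """Yield (folder relative to books_dir, file name) for every EPUB/PDF below books_dir."""
    root = Path(books_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in BOOK_EXTENSIONS:
            # as_posix() keeps "/" as the separator, matching the paths shown above
            yield path.parent.relative_to(root).as_posix(), path.name
```

The relative folder is exactly the part of the path that later gets split into tags.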

Part 2. Adding rows

The table itself needs one column for each field the code below fills in: title, what, state, tags, author, hours, pages and words. NOTE: it is best to give the columns Latin-alphabet names.


We will use the unofficial API (the notion-py library), because at the time of writing the official one had not been released yet.


Open Notion in the browser, press Ctrl + Shift + J, go to Application -> Cookies, copy the value of token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. We call that NOTION.
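Since token_v2 is a session cookie that grants access to your Notion workspace, it is safer not to paste it into the script itself. A minimal sketch that reads both values from environment variables instead; the variable names `NOTION_TOKEN` and `NOTION_URL` and the helper `load_notion_settings` are hypothetical.

```python
import os


def load_notion_settings(environ=os.environ):
    """Return (TOKEN, NOTION) from the environment, or exit with a clear message."""
    try:
        return environ["NOTION_TOKEN"], environ["NOTION_URL"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set
        raise SystemExit(f"Set the {missing.args[0]} environment variable") from None
```

Usage: `TOKEN, NOTION = load_notion_settings()` at the top of the script.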

Then we write the code that connects to Notion.

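from notion.client import NotionClient
client = NotionClient(token_v2=TOKEN)  # TOKEN: value of the token_v2 cookie
database = client.get_collection_view(NOTION)  # NOTION: link to the table page
current_rows = database.default_query().execute()  # rows already in the table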

Next, let's write a function that adds a row to the table.

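def add_row(path, file, words_count, pages_count, hours):
    # Create a new row in the Notion table; the file name becomes its title
    row = database.collection.add_row()
    row.title = file

    # "/Нехудожественное/Новое/Дизайн/Юрий Гордон" -> folder names as tags;
    # strip("/") drops the leading slash left after removing BOOKS_DIR
    tags = [tag for tag in path.strip("/").split("/") if tag]

    if len(tags) >= 1:
        row.what = tags[0]  # category: fiction, non-fiction or textbooks

    if len(tags) >= 2:
        row.state = tags[1]  # second folder, e.g. "Новое" (new)

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no genre level, so the third folder is the author
            row.author = tags[2]

        elif tags[0] in ("Нехудожественное", "Учебники"):
            # Non-fiction and textbooks: the third folder is the genre tag
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    # Statistics computed in Part 3
    row.hours = hours
    row.pages = pages_count
    row.words = words_count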

What happens here: first we add a new row to the table and set its title. Then we split the path on "/" to get the tags. Tags here means things like "Fiction" or "Design", who the author is, and so on. Finally, we fill in all the required fields of the table.
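The path-to-fields logic can be pulled out of `add_row` into a pure function, which makes it easy to check without touching Notion. A minimal sketch mirroring the rules above; `parse_book_path` is a hypothetical name, and the Russian folder names are the ones the article's code compares against.

```python
FICTION = "Художественное"
WITH_GENRE = ("Нехудожественное", "Учебники")  # non-fiction and textbooks


def parse_book_path(local_root):
    """Map a folder path relative to the books folder onto table field names."""
    tags = [tag for tag in local_root.strip("/").split("/") if tag]
    fields = {}
    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == FICTION:
            fields["author"] = tags[2]  # fiction: no genre level
        elif tags[0] in WITH_GENRE:
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]
    return fields
```

With notion-py rows, the result could be applied with `for name, value in parse_book_path(path).items(): setattr(row, name, value)`.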

Part 3. Counting words, hours and other goodies

This is the tricky part. As we remember, we have two formats: epub and pdf. With epub everything is clear, since the words are actually in the file. With pdf it is far less clear: the file may consist of nothing but scanned page images.

So our PDF word-count function works like this: take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

WORDS_PER_PAGE = 300  # rough average for an A4 page

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is about 300.

Now let's write a function that counts pages. We will use PyPDF2.

import os
from PyPDF2 import PdfReader

def get_pdf_pages_number(path, filename):
    # PdfReader opens the file itself (older PyPDF2: PdfFileReader + getNumPages())
    return len(PdfReader(os.path.join(path, filename)).pages)

Next, we write a function that counts pages in an EPUB, using the epub-conversion package. Here we take the book, convert it into lines of text, count the words in each line and divide the total by WORDS_PER_PAGE.

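import os
from epub_conversion.utils import open_book, convert_epub_to_lines

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)

    # split() with no argument ignores repeated spaces and empty strings,
    # unlike split(" "), which would count them as words
    words_count = sum(len(line.split()) for line in lines)

    # Express the result in "pages" so EPUB and PDF are measured the same way
    return round(words_count / WORDS_PER_PAGE)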
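If you would rather avoid a third-party dependency, the same word count can be approximated with the standard library alone. This is a swapped-in technique, not what the article uses: a minimal sketch that treats every .xhtml/.html file inside the EPUB (a ZIP archive) as book text and strips the markup with `html.parser`. `count_epub_words` is a hypothetical name; it will also count navigation and table-of-contents pages, so expect a slight overcount.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text content, skipping <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def count_epub_words(path):
    """Count whitespace-separated words in every (X)HTML document of an EPUB."""
    words = 0
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                # Join chunks with spaces so text from adjacent tags is not glued together
                words += len(" ".join(parser.parts).split())
    return words
```

Dividing the result by 300 words per page gives the same kind of "page" figure as the function above.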

Now let's calculate the reading time. We take our beloved word count and divide it by the reading speed in words per minute, then by 60 to get hours, rounded to one decimal place.

WORDS_PER_MINUTE = 250  # assumed reading speed; not stated in the article

def get_reading_time(words_count):
    return round(words_count / WORDS_PER_MINUTE / 60, 1)  # hours, one decimal
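To see what the two constants mean in practice, here is a worked estimate for a 250-page PDF. A minimal sketch: 300 words per page is the article's figure, while 250 words per minute is an assumed reading speed, and `estimate_pdf` is a hypothetical helper.

```python
WORDS_PER_PAGE = 300    # average words on an A4 page, as in the article
WORDS_PER_MINUTE = 250  # assumed reading speed; adjust to your own


def estimate_pdf(pages):
    """Return (estimated words, reading hours rounded to 0.1) for a PDF."""
    words = pages * WORDS_PER_PAGE
    hours = round(words / WORDS_PER_MINUTE / 60, 1)
    return words, hours


print(estimate_pdf(250))  # → (75000, 5.0)
```

That is 250 × 300 = 75,000 words, 75,000 / 250 = 300 minutes, or 5 hours.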

Part 4. Putting it all together

We need to walk through every path in our books folder and check whether the book is already in Notion: if it is, there is no need to create a row.
Otherwise, we determine the file type, count the words accordingly, and finally add the book.

Here is the code we end up with:

import os

for root, subdirs, files in os.walk(BOOKS_DIR):
    # Skip empty folders; check_for_exclusion is True for folders to process
    if files and check_for_exclusion(root):
        local_root = root.replace(BOOKS_DIR, "")

        for file in files:
            # "Book.name.epub" -> ("Book.name", ".epub"); dots in the name are safe
            filename, extension = os.path.splitext(file)
            filetype = extension.lower().lstrip(".")

            # Only PDF and EPUB are counted; skip anything else (.DS_Store etc.)
            if filetype not in ("pdf", "epub"):
                continue

            print("Dir: {}, file: {}".format(local_root, file))

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)
                else:
                    count = get_epub_pages_number(root, file)

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
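The exclusion helper called by the main loop is not shown in the article; we only know that reference books stay out of the table. A minimal sketch under that assumption: "Справочники" as the reference folder's name and the `BOOKS_DIR` location are both hypothetical, and the function returns True for folders that should be processed, matching how the loop uses it.

```python
import os

# Hypothetical values: adjust to your own Dropbox layout
BOOKS_DIR = os.path.expanduser("~/Dropbox/Книги")
EXCLUDED_FOLDERS = {"Справочники"}  # assumed name of the reference-books folder


def check_for_exclusion(root, books_dir=BOOKS_DIR):
    """Return True if root should be processed, i.e. is not under an excluded top-level folder."""
    relative = os.path.relpath(root, books_dir)
    top_level = relative.split(os.sep)[0]
    return top_level not in EXCLUDED_FOLDERS
```

Checking only the top-level folder keeps every subfolder of an excluded category out as well.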

And the function that checks whether a book has already been added looks like this:

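def check_for_existence(filename):
    # A book counts as added if its title and the file name contain one another.
    # Empty titles are skipped: "" is a substring of every string.
    for row in current_rows:
        if row.title and (row.title in filename or filename in row.title):
            return True

    return False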
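Plain substring matching misses names that differ only in case, spacing or Unicode form. The last one matters for Cyrillic titles: macOS can store file names decomposed (NFD), so "й" becomes "и" plus a combining breve and no longer matches the title typed in Notion. A minimal sketch of a more forgiving comparison; `normalize_title` and `is_already_added` are hypothetical names, and `titles` would be something like `[row.title for row in current_rows]`.

```python
import unicodedata


def normalize_title(title):
    """Apply NFC, casefold and collapse whitespace so near-identical names compare equal."""
    return " ".join(unicodedata.normalize("NFC", title).casefold().split())


def is_already_added(filename, titles):
    """Substring match in either direction, like check_for_existence, on normalized text."""
    name = normalize_title(filename)
    for title in map(normalize_title, titles):
        if title and (title in name or name in title):
            return True
    return False
```

Normalizing the titles once before the main loop would avoid repeating the work for every file.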

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

source: www.habr.com
