Building a book library with Notion and Python

I have always wondered how best to organize the books in my electronic library. In the end I settled on this setup, which counts the number of pages and has a few other nice features. If you are interested, read on.

Part 1. Dropbox

All of my books live in Dropbox. I sort everything into 4 categories: Textbooks, Reference, Fiction, Non-fiction. I do not add the reference books to the table, though.

Most of the books are .epub, the rest are .pdf. So the final solution has to handle both formats.

My book paths look like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the category level ("Дизайн", i.e. "Design", in the example above) is dropped.

I decided not to bother with the Dropbox API, because I already have their app, which syncs the folder to my machine. So the plan is: we take the books from the folder, run each book through a word counter, and add it to Notion.

Part 2. Adding a row

The table itself should look roughly like this. NOTE: it is better to name the columns in Latin characters.

We will use the unofficial Notion API, because an official one has not been released yet.

Open Notion, press Ctrl + Shift + J, go to Application -> Cookies, copy the value of token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. We call that NOTION.

Then we write the code that connects to Notion.

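from notion.client import NotionClient  # unofficial notion-py client
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()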

Next, we write a function that adds a row to the table.

def add_row(path, file, words_count, pages_count, hours):
    # Create an empty row in the table and use the file name as its title.
    row = database.collection.add_row()
    row.title = file

    # The folder names in the path act as tags.
    tags = path.split("/")

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no category folder, so the third part is the author.
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    # Numeric columns computed elsewhere.
    row.hours = hours
    row.pages = pages_count
    row.words = words_count

Here is what happens. The first line of the function adds a new row to the table. Then we split the path on "/" to get the tags. The tags are things like "Fiction" or "Design", the author's name, and so on. Finally we fill in all the fields of the row.
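
The tag logic can be tried out without touching Notion. Below is a minimal sketch that returns a plain dict instead of writing to a row. The function name parse_book_path is hypothetical, and the sketch assumes the path is relative to the library folder and may start with a "/", so empty parts are dropped before indexing.

```python
def parse_book_path(path):
    """Map a library-relative folder path to the fields that add_row fills in."""
    # Drop empty parts so a leading or trailing "/" does not shift the indexes.
    tags = [part for part in path.split("/") if part]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no category folder, so the third part is the author.
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


example = parse_book_path("/Нехудожественное/Новое/Дизайн/Юрий Гордон")
print(example)
```

For the example path from Part 1 this gives the type, the state, the "Дизайн" tag and the author as separate fields.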

Part 3. Counting words, hours, and other delights

This task is harder. As we remember, there are two formats: epub and pdf. With epub everything is clear, since the words are right there in the file. With pdf it is much less clear, because the file may just be a set of images glued together.

So for PDF our word count works like this: we take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.

Now let's write a function that counts pages. We will use PyPDF2.

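import os
from PyPDF2 import PdfFileReader  # legacy name; PyPDF2 3.x renamed it to PdfReader

def get_pdf_pages_number(path, filename):
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()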

Next, we write a function that counts the pages in an epub. We use epub_converter. Here we open the book, convert it to lines, count the words in each line, and divide the total by WORDS_PER_PAGE to get a page count.

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() with no argument ignores repeated spaces and empty lines
        words_count += len(line.split())

    return round(words_count / WORDS_PER_PAGE)
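
The word count is only as good as the lines it is fed. If the converted lines still contain HTML markup (an assumption here, we have not checked what the converter returns for every book), tags get counted as words. A rough sketch of a stricter counter, with a hypothetical helper name count_words and a deliberately simple regular expression for tags:

```python
import re

# Naive tag matcher: good enough for a rough count, not a real HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def count_words(lines):
    """Count whitespace-separated words after replacing HTML tags with spaces."""
    total = 0
    for line in lines:
        text = TAG_RE.sub(" ", line)
        total += len(text.split())
    return total


sample = ["<p>Hello,  world</p>", "", "<br/>", "one two three"]
print(count_words(sample))  # 5
```

On the same sample, splitting on a single space without stripping tags reports 8, because the empty line, the lone tag and the doubled space each add a spurious "word".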

Now let's calculate the time. We take the word count and divide it by the reading speed (WORDS_PER_MINUTE).

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
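
To see how the numbers fit together, here is a small worked example. WORDS_PER_PAGE comes from the text above, while the value of WORDS_PER_MINUTE is an assumption made for this sketch only, so plug in your own reading speed.

```python
WORDS_PER_PAGE = 300    # from the text: roughly one A4 page
WORDS_PER_MINUTE = 200  # assumption for this example, not a value from the article


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # words -> minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


pages = 240
words = get_words_count(pages)
hours = get_reading_time(words)
print(pages, words, hours)  # 240 72000 6.0
```

A 240-page PDF is estimated at 72,000 words, which is 360 minutes, or 6.0 hours, at the assumed speed.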

Part 4. Putting all the parts together

We need to walk every path inside our library folder. For each file we check whether the book is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the number of words. At the end we add the book.

This is the code we ended up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # splitext removes only the final extension, so dots inside the name are safe
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                elif filetype == "epub":
                    count = get_epub_pages_number(root, file)

                else:
                    # Skip anything that is neither a pdf nor an epub
                    continue

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
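
The directory walk itself can be checked in isolation before any Notion calls are involved. The sketch below builds a throwaway folder and lists only the epub and pdf files in it. The find_books helper and the sample file names are made up for illustration.

```python
import os
import tempfile

BOOK_TYPES = {".epub", ".pdf"}  # the two formats mentioned in the text


def find_books(books_dir):
    """Yield (relative_dir, filename, extension) for every epub/pdf under books_dir."""
    for root, _subdirs, files in os.walk(books_dir):
        for file in sorted(files):
            filename, extension = os.path.splitext(file)
            if extension.lower() in BOOK_TYPES:
                yield os.path.relpath(root, books_dir), filename, extension.lower()


# Build a temporary library with two books and one unrelated file.
with tempfile.TemporaryDirectory() as books_dir:
    folder = os.path.join(books_dir, "Fiction", "New")
    os.makedirs(folder)
    for name in ("Book.One.epub", "scan.PDF", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = list(find_books(books_dir))

print(found)
```

Using os.path.relpath instead of a string replace keeps the relative folder free of a leading separator, and os.path.splitext copes with dots inside the file name.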

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
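
One caveat with substring matching: a short title matches every longer title that contains it, and a row with an empty title matches everything. If that becomes a problem, a possible alternative is an exact comparison on normalized titles. This is only a sketch: normalize and already_added are hypothetical helpers, the titles are passed in as plain strings rather than Notion rows, and the English titles are invented test data.

```python
def normalize(title):
    # Case-insensitive comparison with runs of whitespace collapsed.
    return " ".join(title.casefold().split())


def already_added(filename, existing_titles):
    """Return True only when the normalized file name equals a known title."""
    known = {normalize(title) for title in existing_titles}
    return normalize(filename) in known


titles = ["Книга про буквы от А до Я", "Dune"]
print(already_added("dune", titles))          # True
print(already_added("Dune Messiah", titles))  # False
```

The substring version would treat "Dune Messiah" as already present because "Dune" occurs inside it.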

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
