Building a home library with Notion and Python

I have always been interested in finding the best way to organize the books in my electronic library. In the end I settled on the setup described here, with automatic page counting and other goodies. Everyone interested is welcome to read on.

Part 1. Dropbox

All my books live on Dropbox. I sort everything into four sections: Textbooks, Reference, Fiction and Non-fiction. I do not add the reference books to the table, though.

Most of the books are .epub, the rest are .pdf. So the final solution has to cover both formats somehow.

My book paths look like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the category level ("Дизайн", that is "Design", in the example above) is omitted.

I decided not to bother with the Dropbox API, since I already have their application, which syncs the folder to disk. So the plan is: take the books from the folder, run each book through a word counter, and add it to Notion.

Part 2. Adding a row

The table itself needs the columns that the code below fills in: title, what, state, author, tags, hours, pages and words. NOTE: it is better to name the columns in Latin characters.

We will use the unofficial Notion API, because the official one had not been released at the time of writing.

Open Notion in the browser, press Ctrl + Shift + J, go to Application -> Cookies, copy the value of token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. We will call it NOTION.

Then we write the code that connects to Notion.

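from notion.client import NotionClient  # the unofficial notion-py package
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()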
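
Since token_v2 is effectively a login credential, it may be worth keeping it out of the script itself. The sketch below shows one possible way to do that by reading both values from environment variables. The variable names NOTION_TOKEN_V2 and NOTION_LIBRARY_URL and the helper load_notion_settings are assumptions made for this example; they are not part of the original script.

```python
import os


def load_notion_settings(environ=os.environ):
    """Return (TOKEN, NOTION) read from environment variables.

    NOTION_TOKEN_V2 and NOTION_LIBRARY_URL are hypothetical names
    chosen for this sketch.
    """
    token = environ.get("NOTION_TOKEN_V2", "").strip()
    url = environ.get("NOTION_LIBRARY_URL", "").strip()
    if not token or not url:
        raise RuntimeError("Set NOTION_TOKEN_V2 and NOTION_LIBRARY_URL first")
    return token, url
```

With this in place, TOKEN, NOTION = load_notion_settings() could replace the hard-coded values.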

Next, let's write a function that adds a row to the table.

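def add_row(path, file, words_count, pages_count, hours):
    # Create an empty row in the Notion table and set its title
    row = database.collection.add_row()
    row.title = file

    # Folder names become tags; empty parts (e.g. from a leading "/") are dropped
    tags = [tag for tag in path.split("/") if tag]

    if len(tags) >= 1:
        row.what = tags[0]    # top-level section

    if len(tags) >= 2:
        row.state = tags[1]   # second-level folder, e.g. "Новое"

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no category level, so the third folder is the author
            row.author = tags[2]

        elif tags[0] in ("Нехудожественное", "Учебники"):
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count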

What is going on here? The first line creates a new row in the table. Then we split the path on "/" to get the tags. The tags are things like "Fiction" or "Design", the author's name, and so on. Finally we fill in all the fields the table needs.
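
To check the path logic without touching Notion, the same branching can be tried on a plain dictionary. This is only a sketch: parse_book_path is a hypothetical helper that is not in the original script, and dropping empty segments (such as the one produced by a leading slash) is an assumption of this example.

```python
def parse_book_path(path):
    """Split a folder path into the fields that add_row fills in.

    Follows the same branching as add_row, but returns a plain dict
    so the result can be inspected without a Notion connection.
    """
    tags = [part for part in path.split("/") if part]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


print(parse_book_path("Нехудожественное/Новое/Дизайн/Юрий Гордон"))
```

For the non-fiction path above, the third folder ends up in tags and the fourth in author; for a fiction path with only three levels, the third folder is taken as the author.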

Part 3. Counting words, hours and other goodies

This task is harder. As you remember, we have two formats: epub and pdf. With epub things are fairly clear, since the text is most likely there. With pdf they are not: the file may consist of nothing but scanned images.

So our function for counting words in a PDF works like this: take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is 300.

Now let's write a function that counts the pages. We will use PyPDF2.

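import os
from PyPDF2 import PdfReader  # replaces PdfFileReader, which PyPDF2 3.0 removed

def get_pdf_pages_number(path, filename):
    with open(os.path.join(path, filename), "rb") as pdf_file:
        return len(PdfReader(pdf_file).pages)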

Next, we write the function that counts pages in an EPUB. We use epub_converter. Here we open the book, convert it to lines, and count the words in each line.

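def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() without arguments ignores repeated spaces and empty lines
        words_count += len(line.split())

    # Convert the word total into an equivalent number of pages
    return round(words_count / WORDS_PER_PAGE)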
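
If installing an extra package is not an option, a rough word count can also be obtained with the standard library alone, because an EPUB is a ZIP archive of (X)HTML files. The sketch below is a swapped-in alternative and not the method used in this article: count_epub_words and _TextExtractor are hypothetical names, and the result is only an estimate (it also counts text inside elements such as title or style).

```python
import re
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an (X)HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_epub_words(epub_path):
    """Estimate the word count of an EPUB using only the standard library."""
    total = 0
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            # Chapters are stored as (X)HTML files inside the ZIP container
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            extractor = _TextExtractor()
            extractor.feed(archive.read(name).decode("utf-8", errors="ignore"))
            extractor.close()
            text = " ".join(extractor.chunks)
            # \w+ matches runs of letters and digits, so punctuation is skipped
            total += len(re.findall(r"\w+", text))
    return total
```

Dividing the result by WORDS_PER_PAGE would give the same kind of page estimate as get_epub_pages_number.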

Now let's calculate the reading time. We take the word count and divide it by your reading speed (WORDS_PER_MINUTE).

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
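
To make the arithmetic concrete, here is a small worked example that combines the two estimates. WORDS_PER_PAGE is the value given above; the article does not state a value for WORDS_PER_MINUTE, so the 200 used here is purely an assumption for illustration and should be replaced with your own reading speed.

```python
WORDS_PER_PAGE = 300     # value given in the article for an A4 page
WORDS_PER_MINUTE = 200   # assumed reading speed, for this example only


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # Minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


words = get_words_count(240)      # 240 pages * 300 words = 72000 words
hours = get_reading_time(words)   # 72000 / 200 = 360 minutes = 6.0 hours
print(words, hours)
```

Under these assumptions a 240-page PDF is estimated at 72000 words, or about six hours of reading.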

Part 4. Putting all the parts together

We need to walk through every path in our books folder. For each book, check whether it is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the number of words. Finally, we add the book.

This is the code we end up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    # check_for_excusion is the author's folder filter; its body is not shown in this article
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # Split "name.ext" into the bare name and a lower-cased extension
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                elif filetype == "epub":
                    count = get_epub_pages_number(root, file)

                else:
                    # Skip anything that is neither a PDF nor an EPUB
                    continue

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
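
Before letting the loop write to Notion, it can be useful to preview which files it would pick up. The following is a minimal dry-run sketch using pathlib; find_books and SUPPORTED_TYPES are hypothetical names introduced for this example, and the sketch does not apply the author's folder filter.

```python
from pathlib import Path

SUPPORTED_TYPES = {".pdf", ".epub"}  # the two formats the article handles


def find_books(books_dir):
    """Return (relative_folder, title, filetype) for every PDF/EPUB found."""
    base = Path(books_dir)
    books = []
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_TYPES:
            folder = path.parent.relative_to(base).as_posix()
            books.append((folder, path.stem, path.suffix.lower().lstrip(".")))
    return books
```

Each returned tuple carries the relative folder and the title, which correspond to the first two arguments passed to add_row, so the list can be reviewed by eye before anything is uploaded.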

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
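
One thing to watch with the substring test above: an untitled row in the table has an empty title, and in Python "" in filename is always True, so a single blank row would make every book look as if it had already been added. A possible variant that guards against this and ignores differences in case and spacing is sketched below; normalize_title and is_already_added are hypothetical helpers, not part of the original script.

```python
def normalize_title(title):
    """Lower-case a title and collapse whitespace so near-identical names match."""
    return " ".join(title.lower().split())


def is_already_added(filename, existing_titles):
    """Return True if filename matches one of the titles already in the table."""
    wanted = normalize_title(filename)
    if not wanted:
        return False
    for title in existing_titles:
        known = normalize_title(title)
        # Skip blank titles: "" is a substring of every string in Python
        if known and (known in wanted or wanted in known):
            return True
    return False
```

It could be called as is_already_added(filename, [row.title for row in current_rows]).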

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
