Building a home library with Notion and Python

I have long wondered how best to organize the books in my electronic library. In the end I settled on this setup, which automatically calculates page counts and adds a few other nice things. Everyone who is interested is welcome to read on.

Part 1. Dropbox

All my books live in Dropbox. I sort everything into 4 categories: Textbooks, Reference, Fiction and Non-fiction. I do not add reference books to the table, though.

Most of the books are .epub, the rest are .pdf. So the final solution has to cover both formats somehow.

My paths to the books look like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the category level (that is, "Дизайн", meaning "Design", in the path above) is left out.

I decided not to bother with the Dropbox API, because I have their desktop app, which syncs the folder. So the plan is: take the books from the folder, run each book through a word counter, and add it to Notion.

Part 2. Adding a row

The table itself needs columns that match the properties set in the code below: title, what, state, author, tags, hours, pages and words. NOTE: it is better to name the columns in Latin characters.

We will use the unofficial Notion API, because the official one has not been released yet.

Go to Notion, press Ctrl + Shift + J, open Application -> Cookies, copy the value of token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. Call it NOTION.

Then we write the code that connects to Notion.

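from notion.client import NotionClient  # the unofficial notion-py package
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()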
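
Because token_v2 is effectively a login session, it may be worth keeping it out of the script itself. A minimal sketch, assuming two environment variables whose names (NOTION_TOKEN_V2 and NOTION_LIBRARY_URL) are made up for this example and do not come from the original article:

```python
import os


def load_settings(environ=os.environ):
    """Read the Notion token and table URL from environment variables.

    The variable names are hypothetical; use whatever suits your setup.
    """
    token = environ.get("NOTION_TOKEN_V2")
    url = environ.get("NOTION_LIBRARY_URL")
    if not token or not url:
        raise RuntimeError("Set NOTION_TOKEN_V2 and NOTION_LIBRARY_URL first")
    return token, url
```

With this in place, the two constants could be filled in with TOKEN, NOTION = load_settings().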

Next, let's write a function that adds a row to the table.

def add_row(path, file, words_count, pages_count, hours):
    row = database.collection.add_row()
    row.title = file

    tags = path.split("/")

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count

What is going on here? In the first line we add a new row to the table. Then we split the path on "/" and get the tags. The tags are, loosely speaking, things like "Fiction", "Design", the author's name and so on. After that we fill in all the fields the row needs.
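
To see the mapping without touching Notion, the same branching can be sketched as a pure function that returns a dictionary. The name parse_tags is hypothetical, and unlike add_row this version drops empty path pieces, which is an assumption about how a leading "/" should be handled:

```python
def parse_tags(path):
    """Map a relative folder path to the fields that add_row fills in."""
    # Drop empty pieces so a leading or trailing "/" does not shift the tags.
    tags = [part for part in path.split("/") if part]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]      # top-level category
    if len(tags) >= 2:
        fields["state"] = tags[1]     # e.g. "Новое" (new)
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            fields["author"] = tags[2]   # fiction has no topic level
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]     # topic, e.g. "Дизайн"
    if len(tags) >= 4:
        fields["author"] = tags[3]
    return fields


print(parse_tags("/Нехудожественное/Новое/Дизайн/Юрий Гордон"))
print(parse_tags("Художественное/Новое/Some Author"))
```

For the non-fiction path the third level becomes the topic and the fourth the author; for fiction the third level is already the author.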

Part 3. Counting words, hours and other delights

This task is harder. As we remember, we have two formats: epub and pdf. With epub everything is clear, because the words are right there. With pdf things are less clear: the file may consist of embedded images.

So our word count for a PDF will work like this: we take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.

Now let's write a function that counts pages. We will use PyPDF2 (PdfFileReader is imported from it).

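def get_pdf_pages_number(path, filename):
    # PdfFileReader and getNumPages are the PyPDF2 names from before version 3.0.
    # The with-block closes the file once the page count has been read.
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()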

Next, we will write the page counter for epub. We use epub_converter. Here we take the book, convert it into lines, and count the words in each line.

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() without arguments ignores empty strings from repeated spaces
        words_count += len(line.split())

    return round(words_count / WORDS_PER_PAGE)
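
Splitting on a single space and splitting on any whitespace give different totals as soon as the text contains blank lines or doubled spaces. A small illustration with made-up sample lines (count_words and count_words_naive are hypothetical helper names):

```python
def count_words(lines):
    """Count words, treating any run of whitespace as one separator."""
    return sum(len(line.split()) for line in lines)


def count_words_naive(lines):
    """Count pieces after splitting on a single space, empty pieces included."""
    return sum(len(line.split(" ")) for line in lines)


sample = ["Hello,  world", "", "  one two three "]
print(count_words(sample))        # 5
print(count_words_naive(sample))  # 10
```

The naive version counts the empty strings produced by extra spaces and empty lines, so it overestimates the length of a book with a lot of blank lines.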

Now let's calculate the time. We take our beloved word count and divide it by your reading speed.

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
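
As a worked example of the two estimates together: WORDS_PER_PAGE is the value from the text, while WORDS_PER_MINUTE = 200 is only an assumed reading speed, so substitute your own.

```python
WORDS_PER_PAGE = 300     # value given above for an A4 page
WORDS_PER_MINUTE = 200   # assumed reading speed, adjust to taste


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


words = get_words_count(320)     # 320 pages * 300 words
hours = get_reading_time(words)  # 96000 / 200 / 60
print(words, hours)              # 96000 8.0
```

Under these assumptions a 320-page PDF comes out at about 96,000 words, or 8 hours of reading.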

Part 4. Putting all the pieces together

We need to walk through every path inside our books folder. For each file we check whether the book is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the words. Finally, we add the book.

This is the code we end up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # Split "name.ext" into the title and the extension without the dot.
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".")
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                else:
                    count = get_epub_pages_number(root, file)

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
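
The loop calls check_for_excusion, which is not shown in the article. One possible shape for it, purely as an assumption: return True when the folder should be processed, that is, when no part of its path is on a skip list. The folder name in EXCLUDED_DIRS is a placeholder, not the author's actual folder name.

```python
import os

# Hypothetical skip list: folders whose books should not go into the table.
EXCLUDED_DIRS = {"Reference"}


def check_for_excusion(root):
    """Return True when no component of the path is in EXCLUDED_DIRS."""
    parts = os.path.normpath(root).split(os.sep)
    return not any(part in EXCLUDED_DIRS for part in parts)


print(check_for_excusion(os.path.join("Books", "Fiction", "New")))       # True
print(check_for_excusion(os.path.join("Books", "Reference", "Python")))  # False
```

Matching on whole path components rather than on substrings avoids skipping a folder only because its name happens to contain an excluded word.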

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
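
Because the check is a substring match in both directions, a short file name that happens to occur inside an existing title also counts as "already added". The sketch below shows this with stand-in rows (SimpleNamespace objects instead of real Notion rows) and adds a stricter variant; check_for_existence_exact is a hypothetical alternative, not part of the original script.

```python
from types import SimpleNamespace

# Stand-ins for Notion rows; real rows come from the query executed earlier.
current_rows = [SimpleNamespace(title="Книга про буквы от А до Я")]


def check_for_existence(filename):
    """Same logic as above: substring match in either direction."""
    for row in current_rows:
        if row.title in filename or filename in row.title:
            return True
    return False


def check_for_existence_exact(filename):
    """Stricter variant: only an identical title counts as a match."""
    return filename in {row.title for row in current_rows}


print(check_for_existence("Книга про буквы от А до Я"))  # True
print(check_for_existence("Книга"))                       # True, substring hit
print(check_for_existence_exact("Книга"))                 # False
```

The loose match is forgiving when a title was edited by hand in Notion; the exact match is safer when several books share words in their names.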

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
