Building a home library with Notion and Python

I have always wondered how best to organize the books in my electronic library. In the end I settled on the setup below, which automatically calculates page counts and a few other nice things. If you're interested, read on.

Part 1. Dropbox

All my books live on Dropbox. I split everything into 4 categories: Textbooks, Reference, Fiction, Non-fiction. I don't add the reference books to the table, though.

Most of the books are .epub, the rest are .pdf. So the final solution has to cover both formats.

The paths to the books look like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the topic folder (that is, "Дизайн", "Design", in the path above) is omitted.
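To make the folder convention concrete, here is a minimal sketch of how such a path could be split into its parts. It assumes the path is given relative to the library root (without the leading "Книги" folder); parse_book_path and the dictionary keys are hypothetical names used only for illustration, not part of the script described later.

```python
def parse_book_path(path):
    """Split a book path (relative to the library root) into labelled parts."""
    parts = [part for part in path.split("/") if part]
    *folders, filename = parts
    info = {
        "title": filename.rsplit(".", 1)[0],  # file name without extension
        "category": None,
        "state": None,
        "topic": None,
        "author": None,
    }
    if len(folders) >= 1:
        info["category"] = folders[0]
    if len(folders) >= 2:
        info["state"] = folders[1]
    if info["category"] == "Художественное":
        # Fiction has no topic folder: category/state/author
        if len(folders) >= 3:
            info["author"] = folders[2]
    else:
        # Other categories: category/state/topic/author
        if len(folders) >= 3:
            info["topic"] = folders[2]
        if len(folders) >= 4:
            info["author"] = folders[3]
    return info


example = parse_book_path(
    "Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub"
)
print(example)
```

For the example path this gives the category "Нехудожественное", the state "Новое", the topic "Дизайн" and the author "Юрий Гордон".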

I decided not to bother with the Dropbox API, since I already have their desktop app syncing the folder. So the plan is: take the books from the local folder, run each book through a word counter, and add it to Notion.

Part 2. Adding a row

The table needs the columns that the code fills in: title, what, state, author, tags, hours, pages and words. WARNING: it is better to name the columns in Latin characters.

We will use the unofficial Notion API, because at the time of writing an official one had not been released.

Open Notion in the browser, press Ctrl + Shift + J, go to Application -> Cookies, copy the value of token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. Call it NOTION.

Next, we write the code that connects to Notion.

from notion.client import NotionClient
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()

Next, let's write a function that adds a row to the table.

def add_row(path, file, words_count, pages_count, hours):
    row = database.collection.add_row()
    row.title = file

    tags = [tag for tag in path.split("/") if tag]  # drop empty parts left by a leading "/"

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count

What is happening here? In the first line we add a new row to the table. Then we split the path on "/" and get the tags. The tags are the folder names: the category (for example "Художественное", fiction), the state, the topic (for example "Дизайн", design), the author, and so on. Finally we fill in all the required fields of the row.

Part 3. Counting words, hours and other goodies

This is the harder task. As you remember, we have two formats: epub and pdf. With epub everything is clear, since the words are certainly there. With pdf it is less clear: the file may consist of scanned images stitched together.

So our word count for a PDF works like this: we take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.
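To put numbers on it, here is a small worked example. The value 300 comes from the estimate above; the 250-page book is just an illustrative assumption.

```python
WORDS_PER_PAGE = 300  # rough average for an A4 page


def get_words_count(pages_number):
    # Estimate the word count from the page count alone
    return pages_number * WORDS_PER_PAGE


estimated_words = get_words_count(250)
print(estimated_words)  # 75000
```

The result is only an estimate: a PDF full of diagrams or a small-format book will be off, but it is good enough for planning reading time.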

Now let's write the function that counts pages. We will use PyPDF2.

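from PyPDF2 import PdfFileReader  # PyPDF2 < 3.0 API

def get_pdf_pages_number(path, filename):
    # Open in binary mode; the with block closes the file after counting
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()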

Next, we write the page counter for EPUB. We use epub_converter. Here we open the book, convert it to lines, count the words in each line, and divide the total by WORDS_PER_PAGE.

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        words_count += len(line.split())  # split() ignores repeated spaces and empty lines

    return round(words_count / WORDS_PER_PAGE)
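The counting step can be tried without an EPUB file at hand. Below is a minimal sketch that assumes the lines have already been extracted from the book; count_words and words_to_pages are hypothetical helper names, and the sample lines are made up for illustration.

```python
WORDS_PER_PAGE = 300  # same rough A4 average as for PDFs


def count_words(lines):
    # str.split() with no argument collapses runs of whitespace,
    # so double spaces and empty lines do not inflate the count
    return sum(len(line.split()) for line in lines)


def words_to_pages(words_count):
    # Convert a word count into an estimated number of pages
    return round(words_count / WORDS_PER_PAGE)


sample_lines = ["word " * 350, "word " * 250, ""]
words = count_words(sample_lines)
pages = words_to_pages(words)
print(words, pages)  # 600 2
```

Note that splitting on a single space, as in `"a  b".split(" ")`, yields an empty string for every extra space, which is why the sketch uses the argument-free form.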

Now let's calculate the reading time. We take the word count and divide it by your reading speed.

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
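Here is a quick check of the formula. The article does not give a value for WORDS_PER_MINUTE, so 200 below is an assumption; replace it with your own measured reading speed.

```python
WORDS_PER_MINUTE = 200  # assumed reading speed, not taken from the article


def get_reading_time(words_count):
    # words -> minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


print(get_reading_time(90000))  # 7.5
```

A 300-page book at about 300 words per page is 90,000 words, which at this speed comes to 450 minutes, or 7.5 hours.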

Part 4. Putting all the pieces together

We need to walk through every path in our books folder and check whether the book is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the words. Finally, we add the book.

Here is the code we end up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # splitext strips only the last extension, even if the name contains dots
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            if not check_for_existence(filename):
                print("Dir: {}, file: {}".format(local_root, file))

                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                else:
                    count = get_epub_pages_number(root, file)

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
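The loop calls check_for_excusion, which the article does not show. Judging by how it is used, it returns True for folders that should be processed. A minimal sketch under that assumption follows; the folder name in EXCLUDED_DIRS is a guess at how the reference category might be named on disk, so adjust it to your own layout.

```python
# Hypothetical folder names to skip; adjust to your own library layout.
EXCLUDED_DIRS = {"Справочные"}


def check_for_excusion(root):
    """Return True if the directory should be processed (is not excluded)."""
    # Normalize Windows separators so the check works on any platform
    parts = root.replace("\\", "/").split("/")
    return not any(part in EXCLUDED_DIRS for part in parts)


print(check_for_excusion("/books/Нехудожественное/Новое"))  # True
print(check_for_excusion("/books/Справочные/Python"))  # False
```

Checking every path component, rather than only the last one, also skips subfolders nested inside an excluded category.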

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
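To see how this check behaves, here is a small sketch that runs the same logic against stand-in rows instead of a live Notion table. The SimpleNamespace objects are an assumption made for the demo: they only mimic the title attribute of a real row.

```python
from types import SimpleNamespace

# Stand-ins for Notion rows; only the title attribute is used here.
current_rows = [SimpleNamespace(title="Книга про буквы от А до Я")]


def check_for_existence(filename):
    for row in current_rows:
        # Match in either direction: title inside filename or the reverse
        if row.title in filename or filename in row.title:
            return True
    return False


print(check_for_existence("Книга про буквы от А до Я"))  # True
print(check_for_existence("Книга"))  # True
print(check_for_existence("Другая книга"))  # False
```

The second call shows the trade-off of substring matching: a short file name that happens to be part of an existing title counts as already added, so an exact comparison may be safer if your titles overlap.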

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
