Building a home library with Notion and Python

I have always wondered how best to organize the books in my electronic library. In the end I settled on this setup, with automatic page counting and other goodies. If you are interested, read on.

Part 1. Dropbox

All my books live in Dropbox. I sort everything into 4 categories: Textbooks, Reference, Fiction, Non-fiction. I do not add reference books to the table, though.

Most of the books are .epub, and the rest are .pdf. So the final solution has to handle both formats in some way.

My book paths look like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the category level (that is, "Дизайн" in the example above) is left out.

I decided not to bother with the Dropbox API, since I have their desktop app, which syncs the folder. So the plan is: take the books from the folder, run each one through a word counter, and add it to Notion.

Part 2. Adding a row

The table itself needs a title plus the columns the script fills in: what, state, author, tags, hours, pages and words. NOTE: it is better to name the columns in Latin characters.

We will use the unofficial Notion API, because the official one has not been released yet.

Go to Notion, press Ctrl + Shift + J, open Application -> Cookies, copy token_v2 and call it TOKEN. Then go to the page that holds the library table and copy its link. Call it NOTION.

Then we write the code that connects to Notion.

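from notion.client import NotionClient  # unofficial notion-py package
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()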

Next, let's write a function that adds a row to the table.

def add_row(path, file, words_count, pages_count, hours):
    row = database.collection.add_row()
    row.title = file

    # Drop empty parts so a leading or trailing "/" does not shift the levels
    tags = [tag for tag in path.split("/") if tag]

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count

Here is what happens. In the first line we create a new row in the table. Then we split our path on "/" and get the tags. The tags are things like "Fiction" or "Design", who the author is, and so on. Finally we set all the required fields of the row.
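The path-to-fields mapping can also be tried out without touching Notion. The sketch below mirrors the branching in add_row but returns a plain dict; parse_book_path is a hypothetical helper name, not part of the original script, and the fiction author is a placeholder.

```python
def parse_book_path(path):
    """Map a library-relative folder path to the fields that add_row sets."""
    # Ignore empty parts produced by a leading or trailing "/"
    tags = [part for part in path.split("/") if part]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no category level, so the third part is the author
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


non_fiction = parse_book_path("/Нехудожественное/Новое/Дизайн/Юрий Гордон")
fiction = parse_book_path("/Художественное/Новое/Some Author")
print(non_fiction)
print(fiction)
```

Keeping the parsing separate from the Notion call makes it easy to check a few real folder paths before any rows are created.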

Part 3. Counting words, hours and other goodies

This is the harder task. As we remember, we have two formats: epub and pdf. With epub everything is clear, because the words are definitely there. With pdf it is less clear: the file may consist of nothing but scanned images glued together.

So our word-count function for PDF will work like this: take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.

Now let's write a function that counts pages. We will use PyPDF2.

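from PyPDF2 import PdfFileReader  # PyPDF2 < 3.0; newer releases use PdfReader and len(reader.pages)

def get_pdf_pages_number(path, filename):
    # Open in binary mode and close the file once the page count is read
    with open(os.path.join(path, filename), 'rb') as f:
        return PdfFileReader(f).getNumPages()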

Next, we will write something to count pages in an epub. We use epub_converter. Here we take the book, convert it into lines, and count the words in each line.

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() with no argument ignores repeated spaces and empty lines
        words_count += len(line.split())

    return round(words_count / WORDS_PER_PAGE)

Now let's calculate the time. We take our word count and divide it by the reading speed.

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
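To make the arithmetic concrete, here is a small worked example that reuses the two functions above. WORDS_PER_PAGE = 300 comes from the article; WORDS_PER_MINUTE = 200 is only an assumed reading speed, since the article does not state the value it uses.

```python
WORDS_PER_PAGE = 300    # from the article: rough figure for an A4 page
WORDS_PER_MINUTE = 200  # assumption: the article does not give its reading speed


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # Minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


pages = 240
words = get_words_count(pages)   # 240 * 300 = 72000 words
hours = get_reading_time(words)  # 72000 / 200 = 360 minutes = 6.0 hours
print("Pages: {}, Words: {}, Hours: {}".format(pages, words, hours))
```

With these assumed constants, a 240-page PDF is estimated at 72000 words and 6.0 hours of reading.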

Part 4. Putting all the parts together

We need to walk through every path inside our books folder and check whether the book is already in Notion: if it is, there is no need to create a row.
Then we need to determine the file type and, depending on it, count the words. Finally, we add the book.

This is the code we end up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # Split "name.ext" on the last dot only
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            # Skip anything that is neither a PDF nor an EPUB (for example .DS_Store)
            if filetype not in ("pdf", "epub"):
                continue

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                else:
                    count = get_epub_pages_number(root, file)

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
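The loop calls check_for_excusion, which the article does not show. A minimal sketch of what it might look like, assuming it returns True for folders that should be processed and that the reference books are what gets skipped. The folder name in EXCLUDED_DIRS is a placeholder, not the author's actual folder name.

```python
import os

# Hypothetical folder names: replace with the folders you want to skip
EXCLUDED_DIRS = {"Reference"}


def check_for_excusion(root):
    """Return True when no component of the folder path is excluded."""
    parts = os.path.normpath(root).split(os.sep)
    return not any(part in EXCLUDED_DIRS for part in parts)


print(check_for_excusion(os.path.join("Books", "Fiction", "New")))
print(check_for_excusion(os.path.join("Books", "Reference", "Dictionaries")))
```

Matching on whole path components rather than substrings avoids skipping a folder just because its name happens to contain an excluded word.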

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
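One caveat with the substring check: a row with an empty title matches every file, because "" in filename is always True, and a short title can match unrelated books. A stricter variant is sketched below; normalize_title, build_title_index and title_exists are hypothetical names, and exact matching is a design choice that swaps tolerance for precision.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def build_title_index(titles):
    # Empty titles are skipped so a blank row cannot match every book
    return {normalize_title(t) for t in titles if t and t.strip()}


def title_exists(filename, title_index):
    return normalize_title(filename) in title_index


index = build_title_index(["Книга про буквы от А до Я", "", "Some  Other Book"])
print(title_exists("книга про буквы от а до я", index))
print(title_exists("Книга", index))
```

In the script above, the index could be built once from the titles in current_rows and extended each time a row is added, so a book that appears twice in the folder is not inserted twice.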

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
