Building a home library with Notion and Python

I have always wondered how best to organize the books in my electronic library. In the end I settled on this setup, with automatic page counting and other goodies. If you are interested, read on below the cut.

Part 1. Dropbox

All my books live in Dropbox. I sort everything into 4 categories: Textbooks, Reference, Fiction and Non-fiction. Reference books, however, I do not add to the table.

Most of the books are .epub; the rest are .pdf. So the final solution has to cover both formats.

The path to a book looks something like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the category level (that is, "Дизайн", meaning "Design", in the example above) is dropped.

I decided not to bother with the Dropbox API, because I have their app, which synchronizes the folder to my disk. So the plan is this: take the books from the folder, run each one through a word counter, and add it to Notion.

Part 2. Adding a row

The table itself needs a column for every field that the code below fills in: title, what, state, author, tags, hours, pages and words. NOTE: it is better to give the columns Latin names.


We will use the unofficial Notion API, because the official one has not been released yet.


Open Notion, press Ctrl + Shift + J, go to Application -> Cookies, copy token_v2 and call it TOKEN. Then go to the page that holds the library table and copy its link. Call it NOTION.

Then we write the code that connects to Notion.

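from notion.client import NotionClient  # unofficial notion-py package
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()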

Next, let's write a function that adds a row to the table.

def add_row(path, file, words_count, pages_count, hours):
    # Create an empty row, then fill in its columns one by one
    row = database.collection.add_row()
    row.title = file

    # Drop empty segments caused by a leading or trailing "/"
    tags = [tag for tag in path.split("/") if tag]

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no category folder, so the third level is the author
            row.author = tags[2]

        elif tags[0] in ("Нехудожественное", "Учебники"):
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count

Here is what happens. In the first line we add a new row to the table. Next, we split the path on "/" and get the tags. Tags here means things like "Fiction", "Design", the author's name, and so on. Then we set all the required fields of the row.
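To make the mapping from path to columns concrete, here is a minimal sketch that mirrors the logic of add_row without touching Notion. The name `path_to_fields` and the returned dictionary are illustrative and not part of the original script; the sketch also assumes that empty path segments (from a leading "/") should be ignored.

```python
def path_to_fields(path):
    """Map a library-relative folder path to table fields (illustrative helper)."""
    # Drop empty segments produced by leading or trailing slashes
    tags = [tag for tag in path.split("/") if tag]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no category level, so the third folder is the author
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


print(path_to_fields("/Нехудожественное/Новое/Дизайн/Юрий Гордон"))
```

For the example path from Part 1 this yields the category, the state, the "Дизайн" tag and the author as four separate fields.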

Part 3. Counting words, hours and other delights

This is a harder task. As we remember, we have two formats: epub and pdf. With epub everything is clear, because the words are right there. With pdf nothing is certain: the file may consist of nothing but glued-together images.

So our function for counting the words in a PDF works like this: we take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.

Now let's write a function that counts the pages. We will use PyPDF2.

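import os
from PyPDF2 import PdfReader  # PyPDF2 >= 2.0; older releases call it PdfFileReader

def get_pdf_pages_number(path, filename):
    with open(os.path.join(path, filename), "rb") as pdf_file:
        return len(PdfReader(pdf_file).pages)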

Next, we write the equivalent for counting pages in an EPUB. We use epub_converter. Here we take the book, convert it to lines, and count the words in each line.

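# Assuming the epub-conversion package, which provides these two helpers
from epub_conversion.utils import open_book, convert_epub_to_lines

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() without arguments ignores repeated spaces and blank lines
        words_count += len(line.split())

    return round(words_count / WORDS_PER_PAGE)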
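The counting step can be tried on its own, without an EPUB file. The sketch below uses a made-up list of lines in place of the converter's output, and `count_words` is a hypothetical helper name. It also shows how `split()` without arguments differs from `split(" ")`, which counts the empty strings between repeated spaces and on blank lines.

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    return sum(len(line.split()) for line in lines)


# Made-up sample standing in for the lines extracted from a book
sample_lines = ["Call me  Ishmael.", "", "Some years ago"]

print(count_words(sample_lines))                           # 6
print(sum(len(line.split(" ")) for line in sample_lines))  # 8
```

The difference is small per line, but over a whole book the inflated count shifts the page estimate.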

Now let's compute the time. We take our beloved word count and divide it by your reading speed.

def get_reading_time(words_count):
    # Reading time in hours, rounded to one decimal place
    return round(words_count / WORDS_PER_MINUTE / 60, 1)
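As a worked example of the whole estimate, take a 240-page PDF at 300 words per page. The reading speed of 200 words per minute below is an assumed value for illustration only, since the value of WORDS_PER_MINUTE is not stated here; substitute your own. The function names are likewise illustrative.

```python
WORDS_PER_PAGE = 300    # rough average for an A4 page
WORDS_PER_MINUTE = 200  # assumed reading speed; replace with your own


def estimate_words(pages_number):
    """Estimate the word count from the number of pages."""
    return pages_number * WORDS_PER_PAGE


def estimate_hours(words_count):
    """Convert a word count to reading hours, rounded to one decimal place."""
    return round(words_count / WORDS_PER_MINUTE / 60, 1)


words = estimate_words(240)
print(words)                  # 72000
print(estimate_hours(words))  # 6.0
```

That is 72000 words, 360 minutes of reading, or 6 hours.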

Part 4. Putting it all together

We need to walk every path in our books folder and check whether each book is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the words. Finally, we add the book.

Here is the code we end up with:

import os

for root, subdirs, files in os.walk(BOOKS_DIR):
    # check_for_excusion is not shown in this article; it presumably returns
    # True for folders that should be processed (for example, not reference books)
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # Split "name.ext" safely, even if the name itself contains dots
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                elif filetype == "epub":
                    count = get_epub_pages_number(root, file)

                else:
                    # Skip anything that is neither PDF nor EPUB (e.g. .DS_Store)
                    continue

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
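Before letting the script write to Notion, it can help to do a dry run that only lists what would be processed. The sketch below is an illustration under my own assumptions, not part of the original script: `find_books` is a hypothetical helper, only .epub and .pdf files are considered, and a throwaway folder tree is built so that the example runs anywhere.

```python
import os
import tempfile


def find_books(books_dir):
    """Return sorted (relative_folder, title, filetype) tuples for EPUB and PDF files."""
    found = []
    for root, _subdirs, files in os.walk(books_dir):
        for file in files:
            # splitext only splits on the last dot, so dots in titles survive
            title, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            if filetype in ("epub", "pdf"):
                found.append((os.path.relpath(root, books_dir), title, filetype))
    return sorted(found)


# Build a tiny throwaway library to demonstrate the walk
with tempfile.TemporaryDirectory() as books_dir:
    folder = os.path.join(books_dir, "Fiction", "New", "Some Author")
    os.makedirs(folder)
    for name in ("Vol. 1.epub", "Scan.PDF", ".DS_Store"):
        open(os.path.join(folder, name), "w").close()

    books = find_books(books_dir)
    for book in books:
        print(book)
```

Hidden files such as .DS_Store have no extension as far as `os.path.splitext` is concerned, so they are skipped rather than being mistaken for books.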

And the function that checks whether a book has already been added looks like this:

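def check_for_existence(filename):
    for row in current_rows:
        # Skip untitled rows: an empty title is a substring of every filename
        if not row.title:
            continue

        # A substring match in either direction counts as "already added"
        if row.title in filename or filename in row.title:
            return True

    return False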
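The matching rule is easy to try on plain strings. In this sketch `is_already_listed` is a hypothetical stand-in that takes a list of titles instead of Notion rows. Substring matching is loose by design: a very short title would match many filenames, so treat it as a heuristic rather than an exact duplicate check.

```python
def is_already_listed(filename, titles):
    """Return True if filename and some non-empty title contain one another."""
    for title in titles:
        if not title:
            # An empty title is a substring of everything, so ignore it
            continue
        if title in filename or filename in title:
            return True
    return False


titles = ["Книга про буквы от А до Я", ""]

print(is_already_listed("Книга про буквы от А до Я", titles))  # True
print(is_already_listed("Another Book", titles))               # False
```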

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.hab.com
