Building a Home Library with Notion and Python

I have always been interested in how best to organize the books in my electronic library. In the end I arrived at this setup, which counts pages automatically and adds a few other goodies. If that sounds interesting, read on.

Part 1. Dropbox

All my books live in Dropbox. I sort everything into four categories: Textbooks, Reference, Fiction, Non-fiction. I do not add reference books to the table, though.

Most of the books are .epub, the rest are .pdf. So the final solution has to handle both formats.

My paths to books look something like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the topic folder (that is, “Дизайн”, Design, in the example above) is omitted.

I decided not to bother with the Dropbox API, since I have their app, which keeps the folder in sync. So the plan is this: take the books from the folder, run each book through a word counter, and add it to Notion.
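The first step of that plan, collecting the book files from the synced folder, can be sketched with the standard library alone. The function name `find_books` and the `BOOK_EXTENSIONS` set are illustrative names, not part of the original script.

```python
import os

# Illustrative assumption: only these two formats count as books.
BOOK_EXTENSIONS = {".epub", ".pdf"}


def find_books(books_dir):
    """Return sorted (relative_folder, file_name) pairs for every book under books_dir."""
    found = []
    for root, _subdirs, files in os.walk(books_dir):
        for name in files:
            extension = os.path.splitext(name)[1].lower()
            if extension in BOOK_EXTENSIONS:
                # Store the folder relative to books_dir, not the absolute path
                found.append((os.path.relpath(root, books_dir), name))
    return sorted(found)
```

Working with relative folders keeps the later tag parsing independent of where the Dropbox folder happens to live on disk.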

Part 2. Adding a row

The table itself needs the columns that the code below fills in: title, what, state, tags, author, hours, pages and words. NOTE: it is best to name the columns in Latin characters.

We will use the unofficial Notion API, since the official one has not been released yet.

Open Notion, press Ctrl + Shift + J, go to Application -> Cookies, copy the value of token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. Call that NOTION.

Then we write the code that connects to Notion.

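from notion.client import NotionClient

client = NotionClient(token_v2=TOKEN)  # log in with the cookie value copied above
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()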

Next, let's write a function that adds a row to the table.

def add_row(path, file, words_count, pages_count, hours):
    row = database.collection.add_row()
    row.title = file

    tags = [part for part in path.split("/") if part]  # ignore empty parts from leading or trailing slashes

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count

What is going on here? In the first line we add a new row to the table. Next, we split the path on "/" to get the tags. Tags here means things like “Fiction”, “Design”, the author, and so on. Then we fill in all the required fields of the row.
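To make the mapping from folders to columns concrete, here is a minimal sketch of the same branching as a pure function that returns a dictionary instead of writing to Notion. The name `parse_tags` and the dictionary keys are illustrative; the category names are the ones used in `add_row`, and “Автор” is just a placeholder author folder.

```python
def parse_tags(path):
    """Map a folder path like "Category/State/Topic/Author" to table fields."""
    # Ignore empty parts caused by leading or trailing slashes
    tags = [part for part in path.split("/") if part]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no topic folder, so the third part is the author
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


print(parse_tags("Нехудожественное/Новое/Дизайн/Юрий Гордон"))
print(parse_tags("Художественное/Новое/Автор"))
```

Keeping the parsing separate from the Notion call makes it easy to check the folder layout before any rows are created.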

Part 3. Counting words, hours and other delights

This is the harder task. As we recall, we have two formats: epub and pdf. With epub everything is clear, because the text is actually in there. With pdf it is not: the file may simply be a stack of glued-together images.

So our function for counting words in a PDF works like this: take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.

Now let's write a function that counts pages. We will use PyPDF2.

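import os
from PyPDF2 import PdfFileReader  # legacy name; PyPDF2 3.x renamed it to PdfReader

def get_pdf_pages_number(path, filename):
    # Open in binary mode and close the file once the page count is read
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()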

Next, we will write something similar to count pages in an epub. We use epub_converter (its open_book and convert_epub_to_lines helpers). Here we take the book, convert it to lines, count the words in each line, and divide the total by WORDS_PER_PAGE.

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        words_count += len(line.split())

    return round(words_count / WORDS_PER_PAGE)
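If the lines returned by the converter still carry HTML markup, tags would be counted as words. Below is a hedged sketch of a counter that strips anything tag-like first. The name `count_words` and the regular expression are illustrative, and a simple regex is only an approximation of real HTML parsing.

```python
import re

# Rough pattern for anything that looks like an HTML/XML tag
TAG_PATTERN = re.compile(r"<[^>]+>")


def count_words(lines):
    """Count whitespace-separated words in lines, ignoring anything inside <...>."""
    total = 0
    for line in lines:
        text = TAG_PATTERN.sub(" ", line)
        # str.split() with no argument collapses repeated whitespace
        total += len(text.split())
    return total
```

Splitting without an argument also means empty lines contribute zero words, whereas splitting on a single space counts each empty line as one word.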

Now let's calculate the time. We take our word count and divide it by the reading speed, then convert minutes to hours and round to one decimal place.

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
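A quick worked example of the arithmetic. WORDS_PER_PAGE follows the figure of about 300 mentioned earlier; WORDS_PER_MINUTE is not stated in the article, so the 200 below is only an assumed reading speed, and `estimate_reading_hours` is an illustrative name.

```python
WORDS_PER_PAGE = 300      # approximate figure from the article
WORDS_PER_MINUTE = 200    # assumption: substitute your own reading speed


def estimate_reading_hours(pages_number):
    """Estimate reading time in hours, rounded to one decimal place."""
    words_count = pages_number * WORDS_PER_PAGE
    minutes = words_count / WORDS_PER_MINUTE
    return round(minutes / 60, 1)


# A 320-page book: 320 * 300 = 96,000 words; 96,000 / 200 = 480 minutes
print(estimate_reading_hours(320))  # 8.0
```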

Part 4. Putting all the parts together

We need to walk through every path in our books folder. For each book, we check whether it is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the words. Finally, we add the book.
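A small sketch of the file-type step, using pathlib so that names with several dots, upper-case extensions, and hidden system files are handled predictably. The helper name `detect_book_type` is illustrative.

```python
from pathlib import Path


def detect_book_type(file_name):
    """Return "pdf" or "epub" for supported books, or None for anything else."""
    # Path.suffix is the part after the last dot; dotfiles have no suffix
    suffix = Path(file_name).suffix.lower().lstrip(".")
    return suffix if suffix in ("pdf", "epub") else None
```

Returning None for unknown files lets the caller skip them instead of passing them to the epub reader by default.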

Here is the code we end up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    # check_for_excusion is the author's own folder filter; it is not shown in the article
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # Split "name.ext" once, at the last dot
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            # Only books that are not in Notion yet get a new row
            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                elif filetype == "epub":
                    count = get_epub_pages_number(root, file)

                else:
                    # Neither pdf nor epub (for example a system file): skip it
                    continue

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)

And the function that checks whether a book has already been added looks like this:

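def check_for_existence(filename):
    for row in current_rows:
        # An empty title is a substring of everything, so skip untitled rows
        if not row.title:
            continue

        # Match in either direction, so partial titles still count as the same book
        if row.title in filename or filename in row.title:
            return True

    return False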
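The substring comparison can be tried without a Notion connection. The sketch below applies the same two-way check to a plain list of titles; `is_already_added` and `existing_titles` are illustrative names. It skips empty titles, since an empty string is a substring of every string and an untitled row would otherwise make every book look like a duplicate.

```python
def is_already_added(filename, existing_titles):
    """Return True if filename and any non-empty existing title contain one another."""
    for title in existing_titles:
        if not title:
            # "" is a substring of every string, so untitled rows are ignored
            continue
        if title in filename or filename in title:
            return True
    return False
```

Note that a substring match is deliberately loose: a short title such as "Book" will also match "Book (2nd edition)", which may or may not be what you want.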

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
