ืžืึทื›ืŸ ืึท ื”ื™ื™ื ื‘ื™ื‘ืœื™ืึธื˜ืขืง ืžื™ื˜ ื ืึธื˜ื™ืึธืŸ ืื•ืŸ ืคึผื™ื˜ื”ืึธืŸ

I have always been interested in how best to organize the books in my electronic library. In the end I settled on this setup, with automatic counting of pages and other goodies. If you're interested, read on.

Part 1. Dropbox

All my books are on Dropbox. I split everything into 4 categories: textbooks, reference, fiction and non-fiction. I don't add reference books to the table, though.

Most of the books are .epub and the rest are .pdf, so the final solution has to handle both formats.

My book paths look something like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub

If the book is fiction, the topic folder (that is, "Дизайн" (Design) in the example above) is omitted.
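With this layout, the folders between the library root and the file carry all the metadata, and the file name is the title. A minimal sketch of splitting such a path with the standard `pathlib` module, assuming `/Книги` is the library root; `split_book_path` is a hypothetical helper, not part of the original code:

```python
from pathlib import PurePosixPath


def split_book_path(full_path, books_dir):
    """Split a book path into (folders, title, extension), relative to books_dir."""
    relative = PurePosixPath(full_path).relative_to(books_dir)
    # Every folder level between the library root and the file
    folders = list(relative.parent.parts)
    # The stem (file name without its last extension) is used as the title
    return folders, relative.stem, relative.suffix.lstrip(".")


folders, title, ext = split_book_path(
    "/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub", "/Книги")
# folders == ["Нехудожественное", "Новое", "Дизайн", "Юрий Гордон"], ext == "epub"
```

For a fiction book the list is one element shorter, because the topic level is missing.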

ืื™ืš ื‘ืึทืฉืœืึธืกืŸ ื ื™ืฉื˜ ืฆื• ืึทืจืŸ ืžื™ื˜ ื“ื™ ื“ืจืึธืคึผื‘ืึธืงืก ืึทืคึผื™, ื•ื•ื™ื™ึทืœ ืื™ืš ื”ืึธื‘ืŸ ื–ื™ื™ืขืจ ืึทืคึผืœืึทืงื™ื™ืฉืึทืŸ ื•ื•ืึธืก ืกื™ื ื’ืงืจืึทื ื™ื™ื– ื“ื™ ื˜ืขืงืข. ื“ืึธืก ื”ื™ื™ืกื˜, ื“ืขืจ ืคึผืœืึทืŸ ืื™ื– ื“ืึธืก: ืžื™ืจ ื ืขืžืขืŸ ื‘ื™ื›ืขืจ ืคื•ืŸ ื“ืขืจ ื˜ืขืงืข, ืœื•ื™ืคืŸ ื™ืขื“ืขืจ ื‘ื•ืš ื“ื•ืจืš ืึท ื•ื•ืึธืจื˜ ื˜ืึธืžื‘ืึทื ืง ืื•ืŸ ืœื™ื™ื’ืŸ ืขืก ืฆื• Notion.

Part 2. Adding a row

The table itself should look something like this. Note: it's better to give the columns Latin names.
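For reference, these are the columns that `add_row()` in this part writes to. The Notion property types below are my assumption, and `missing_columns` is a hypothetical helper for checking a table before running the import. The title column is left out because `row.title` writes to the table's title column whatever it is named:

```python
# Columns written by add_row(); the property types are assumptions
LIBRARY_COLUMNS = {
    "what": "select",        # category, e.g. Нехудожественное
    "state": "select",       # second folder level, e.g. Новое
    "tags": "multi_select",  # topic, e.g. Дизайн
    "author": "text",
    "pages": "number",
    "words": "number",
    "hours": "number",
}


def missing_columns(existing_names):
    """Return the expected column names that the table does not have yet."""
    existing = {name.lower() for name in existing_names}
    return sorted(name for name in LIBRARY_COLUMNS if name not in existing)
```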

ืžืึทื›ืŸ ืึท ื”ื™ื™ื ื‘ื™ื‘ืœื™ืึธื˜ืขืง ืžื™ื˜ ื ืึธื˜ื™ืึธืŸ ืื•ืŸ ืคึผื™ื˜ื”ืึธืŸ

We'll use the unofficial Notion API, since the official one hasn't been released yet.

ืžืึทื›ืŸ ืึท ื”ื™ื™ื ื‘ื™ื‘ืœื™ืึธื˜ืขืง ืžื™ื˜ ื ืึธื˜ื™ืึธืŸ ืื•ืŸ ืคึผื™ื˜ื”ืึธืŸ

Go to Notion, press Ctrl + Shift + J, open Application -> Cookies, copy token_v2 and call it TOKEN. Then go to the page with the library table and copy its link. Call it NOTION.
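Hardcoding TOKEN in the script is risky: token_v2 is your logged-in session cookie, so whoever has it can act as you in Notion. A minimal sketch that reads both values from environment variables instead; the variable names `NOTION_TOKEN_V2` and `NOTION_LIBRARY_URL` and the helper `load_notion_settings` are hypothetical:

```python
import os


def load_notion_settings(environ=os.environ):
    """Return (TOKEN, NOTION): the token_v2 cookie and the URL of the library table page."""
    token = environ.get("NOTION_TOKEN_V2")
    url = environ.get("NOTION_LIBRARY_URL")
    if not token or not url:
        raise RuntimeError("Set NOTION_TOKEN_V2 and NOTION_LIBRARY_URL first")
    return token, url
```

Usage: `TOKEN, NOTION = load_notion_settings()` at the top of the script.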

ื“ืขืจื ืึธืš ืžื™ืจ ืฉืจื™ื™ึทื‘ืŸ ื“ื™ ืงืึธื“ ืฆื• ืคืึทืจื‘ื™ื ื“ืŸ ืฆื• ื ืึธื˜ื™ืึธืŸ.

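from notion.client import NotionClient  # notion-py, the unofficial client (pip install notion)
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()  # rows already in the table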

Next, let's write a function that adds a row to the table.

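def add_row(path, file, words_count, pages_count, hours):
    # Create an empty row in the table and name it after the book file
    row = database.collection.add_row()
    row.title = file

    # "/Нехудожественное/Новое/Дизайн/Юрий Гордон" -> ["Нехудожественное", "Новое", "Дизайн", "Юрий Гордон"]
    # strip("/") keeps a leading slash from producing an empty first tag
    tags = path.strip("/").split("/")

    if len(tags) >= 1:
        row.what = tags[0]   # category

    if len(tags) >= 2:
        row.state = tags[1]  # e.g. "Новое"

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no topic folder, so the third level is the author
            row.author = tags[2]

        elif tags[0] in ("Нехудожественное", "Учебники"):
            # Non-fiction and textbooks: the third level is the topic
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count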

Here's what happens. First we create a new row in the table. Then we split our path on "/" and get the tags: the category (such as "Художественное", fiction), the topic (such as "Дизайн", design), the author, and so on. Finally we fill in all the required fields of the row.
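The path-to-columns mapping is the part most likely to need tweaking, so it can help to keep it separate from the Notion calls and check it without touching the table. A sketch that follows the same rules as `add_row()` and returns a plain dict; `path_to_properties`, `FICTION` and `TOPIC_CATEGORIES` are hypothetical names:

```python
FICTION = "Художественное"
# Categories whose third folder level is a topic rather than an author
TOPIC_CATEGORIES = {"Нехудожественное", "Учебники"}


def path_to_properties(path):
    """Map a relative folder path to column values, using the same rules as add_row()."""
    parts = [p for p in path.split("/") if p]  # drop empty pieces from leading/trailing "/"
    props = {}
    if len(parts) >= 1:
        props["what"] = parts[0]
    if len(parts) >= 2:
        props["state"] = parts[1]
    if len(parts) >= 3:
        if parts[0] == FICTION:
            props["author"] = parts[2]
        elif parts[0] in TOPIC_CATEGORIES:
            props["tags"] = parts[2]
    if len(parts) >= 4:
        props["author"] = parts[3]
    return props
```

`add_row()` could then loop over the dict with `setattr(row, name, value)`.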

Part 3. Counting words, hours and other delights

ื“ืึธืก ืื™ื– ืึท ืžืขืจ ืฉื•ื•ืขืจ ืึทืจื‘ืขื˜. ื•ื•ื™ ืžื™ืจ ื’ืขื“ืขื ืงืขืŸ, ืžื™ืจ ื”ืึธื‘ืŸ ืฆื•ื•ื™ื™ ืคึฟืึธืจืžืึทื˜ื™ืจื•ื ื’ืขืŸ: epub ืื•ืŸ pdf. ืื•ื™ื‘ ืึทืœืฅ ืื™ื– ืงืœืึธืจ ืžื™ื˜ ื“ื™ ืขืคึผื•ื‘ - ื“ื™ ื•ื•ืขืจื˜ืขืจ ื–ืขื ืขืŸ ืžื™ืกื˜ืึธืžืข ื“ืึธืจื˜, ื“ืขืžืึธืœื˜ ืึทืœืฅ ืื™ื– ื ื™ืฉื˜ ืึทื–ื•ื™ ืงืœืึธืจ ื•ื•ืขื’ืŸ ื“ื™ ืคึผื“ืฃ: ืขืก ืงืขืŸ ืคืฉื•ื˜ ืฆื•ื ื•ื™ืคืฉื˜ืขืœื  ื–ื™ืš ืคื•ืŸ ื’ืœื•ื“ ื‘ื™ืœื“ืขืจ.

So our function for counting words in a PDF works like this: we take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is about 300.

ื™ืขืฆื˜ ืœืืžื™ืจ ืฉืจื™ื™ื‘ืŸ ื ืคื•ื ืงืฆื™ืข ืฆื• ืฆื™ื™ืœืŸ ื‘ืœืขื˜ืขืจ. ืžื™ืจ ื•ื•ืขืœืŸ ื ื•ืฆืŸ pyPDF2.

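import os
from PyPDF2 import PdfReader  # PyPDF2 >= 2.0; older releases call it PdfFileReader

def get_pdf_pages_number(path, filename):
    with open(os.path.join(path, filename), "rb") as f:  # the file is closed afterwards
        return len(PdfReader(f).pages)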

Next, we'll write the page counter for epub, using epub_conversion. Here we open the book, convert it into lines, and count the words in each line.

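import os
from epub_conversion.utils import open_book, convert_epub_to_lines

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() without arguments ignores repeated spaces and yields nothing for empty lines
        words_count += len(line.split())

    # Express the word count as "pages" of WORDS_PER_PAGE words, like the PDF estimate
    return round(words_count / WORDS_PER_PAGE)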
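If you'd rather not add a dependency for this step: an .epub file is a ZIP archive of XHTML documents, so the standard library alone can produce a comparable word count. This is a swapped-in approach, not the author's; `count_epub_words` and `TextCollector` are hypothetical names, the sketch reads every (X)HTML entry instead of following the book's reading order, and splitting on whitespace makes the result an estimate:

```python
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of an (X)HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def count_epub_words(epub_path):
    """Estimate the word count of an .epub by reading every (X)HTML file in the archive."""
    words = 0
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                # Join with spaces so text from adjacent elements doesn't merge into one word
                words += len(" ".join(parser.chunks).split())
    return words
```

The page count then follows the same way: `round(count_epub_words(path) / WORDS_PER_PAGE)`.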

ื™ืขืฆื˜ ืœืืžื™ืจ ืื•ื™ืกืจืขื›ืขื ืขืŸ ื“ื™ ืฆื™ื™ื˜. ืžื™ืจ ื ืขืžืขืŸ ืื•ื ื“ื–ืขืจ ื‘ืึทืœื™ื‘ืกื˜ืข ื•ื•ืึธืจื˜ ืฆื™ื™ืœืŸ ืื•ืŸ ื˜ื™ื™ืœืŸ ืขืก ื“ื•ืจืš ื“ื™ื™ืŸ ืœื™ื™ืขื ืขืŸ ื’ื™ื›ืงื™ื™ึทื˜.

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
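To see what the numbers look like end to end, here is a worked example with a hypothetical reading speed of 250 words per minute (the article doesn't give its WORDS_PER_MINUTE) and a made-up epub of 91,234 words. It also shows a side effect of the main loop: for an epub, the word count is turned into pages and back, so the stored value is rounded to the nearest multiple of WORDS_PER_PAGE:

```python
WORDS_PER_PAGE = 300    # from the text: roughly 300 words per A4 page
WORDS_PER_MINUTE = 250  # hypothetical reading speed; substitute your own


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # words -> minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


# Hypothetical epub, followed through the same steps as the main loop
raw_words = 91_234
pages = round(raw_words / WORDS_PER_PAGE)  # what get_epub_pages_number() returns: 304
words_count = get_words_count(pages)       # 91,200: a multiple of 300
hours = get_reading_time(words_count)      # 91,200 / 250 = 364.8 min, about 6.1 h
```

The rounding error is at most half a page (150 words), which is negligible for a reading-time estimate.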

Part 4. Connecting all the parts

ืžื™ืจ ื“ืึทืจืคึฟืŸ ืฆื• ื’ื™ื™ืŸ ื“ื•ืจืš ืึทืœืข ืžืขื’ืœืขืš ืคึผืึทื˜ืก ืื™ืŸ ืื•ื ื“ื–ืขืจ ื‘ื™ื›ืขืจ ื˜ืขืงืข. ืงื•ืง ืื•ื™ื‘ ื“ืขืจ ื‘ื•ืš ืฉื•ื™ืŸ ื™ื’ื–ื™ืกืฅ ืื™ืŸ ื ืึธื˜ื™ืึธืŸ: ืื•ื™ื‘ ืึทื–ื•ื™, ืžื™ืจ ื ื™ื˜ ืžืขืจ ื“ืึทืจืคึฟืŸ ืฆื• ืฉืึทืคึฟืŸ ืึท ืฉื•ืจื”.
ื“ืขืžืึธืœื˜ ืžื™ืจ ื“ืึทืจืคึฟืŸ ืฆื• ื‘ืึทืฉืœื™ืกืŸ ื“ื™ ื˜ืขืงืข ื˜ื™ืคึผ, ื“ื™ืคึผืขื ื“ื™ื ื’ ืื•ื™ืฃ ื“ืขื, ืฆื™ื™ืœืŸ ื“ื™ ื ื•ืžืขืจ ืคื•ืŸ ื•ื•ืขืจื˜ืขืจ. ืœื™ื™ื’ ืึท ื‘ื•ืš ืื™ืŸ ื“ื™ ืกื•ืฃ.

ื“ืึธืก ืื™ื– ื“ืขืจ ืงืึธื“ ื•ื•ืึธืก ืžื™ืจ ื‘ืึทืงื•ืžืขืŸ:

for root, subdirs, files in os.walk(BOOKS_DIR):
    # check_for_exclusion() is not shown in the article; it returns False for folders to skip
    if len(files) > 0 and check_for_exclusion(root):
        for file in files:
            # "Книга.epub" -> ("Книга", ".epub"); only the last extension is split off
            filename, extension = os.path.splitext(file)
            filetype = extension.lower().lstrip(".")
            # Folder path relative to the library, e.g. "/Нехудожественное/Новое/Дизайн/Юрий Гордон"
            local_root = root.replace(BOOKS_DIR, "")

            # Only books that are not in Notion yet get a new row
            if not check_for_existence(filename):
                print("Dir: {}, file: {}".format(local_root, file))

                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                elif filetype == "epub":
                    count = get_epub_pages_number(root, file)

                else:
                    # Skip anything that is neither pdf nor epub
                    continue

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
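The loop relies on an exclusion check, `check_for_exclusion(root)`, that the article never shows. Since reference books are not added to the table, a plausible version skips any folder inside the reference-books folder. A sketch only: the value of `BOOKS_DIR` and the folder name "Справочники" are hypothetical, so adjust both to your library:

```python
import os

BOOKS_DIR = "/home/user/Dropbox/Книги"  # hypothetical library root
EXCLUDED_FOLDERS = {"Справочники"}      # hypothetical name of the reference-books folder


def check_for_exclusion(root):
    """Return True if books under `root` should be added, False for excluded folders."""
    relative = os.path.relpath(root, BOOKS_DIR)
    return not any(part in EXCLUDED_FOLDERS for part in relative.split(os.sep))
```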

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
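Matching substrings in both directions can mark a new book as already present: an existing row titled "Python" would hide a new "Python Cookbook", and a row with an empty title matches every file, because the empty string is contained in any string. A sketch of an exact, normalized comparison instead; `normalize_title`, `build_title_index` and `is_in_library` are hypothetical names, and the trade-off is that slightly renamed titles no longer match:

```python
def normalize_title(title):
    """Casefold and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.split()).casefold()


def build_title_index(rows):
    """Normalized titles of the rows already in the table, computed once."""
    return {normalize_title(row.title) for row in rows if row.title}


def is_in_library(filename, title_index):
    """Exact match on the normalized title instead of a substring test."""
    return normalize_title(filename) in title_index
```

Usage: build `title_index = build_title_index(current_rows)` once, then test `is_in_library(filename, title_index)` inside the loop.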

ืกืึธืฃ

Thanks to everyone who read this article. I hope it helps you read more :)

Source: www.habr.com
