Building a Home Library with Notion and Python

๋‚˜๋Š” ํ•ญ์ƒ ๋‚ด ์ „์ž ๋„์„œ๊ด€์—์„œ ์ฑ…์„ ๊ฐ€์žฅ ์ž˜ ๋ฐฐํฌํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๊ด€์‹ฌ์ด ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๊ฒฐ๊ตญ ๋‚˜๋Š” ํŽ˜์ด์ง€ ์ˆ˜์™€ ๊ธฐํƒ€ ํ˜œํƒ์„ ์ž๋™์œผ๋กœ ๊ณ„์‚ฐํ•˜์—ฌ ์ด ์˜ต์…˜์„ ์„ ํƒํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ณ ์–‘์ด ์•„๋ž˜์— ๊ด€์‹ฌ์žˆ๋Š” ๋ชจ๋“  ๋ถ„๋“ค๊ป˜ ๋ฌป์Šต๋‹ˆ๋‹ค.

Part 1. Dropbox

All of my books live in Dropbox. I split everything into four categories: textbooks, reference, fiction, and non-fiction. I do not add reference books to the table, though.

Most of the books are .epub and the rest are .pdf, so the final solution has to handle both formats one way or another.

The paths to my books look like this:

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub

์ฑ…์ด ์†Œ์„ค์ธ ๊ฒฝ์šฐ ์นดํ…Œ๊ณ ๋ฆฌ(์ฆ‰, ์œ„์˜ ๊ฒฝ์šฐ '๋””์ž์ธ')๊ฐ€ ์ œ๊ฑฐ๋ฉ๋‹ˆ๋‹ค.

Because the Dropbox application already syncs the folder to disk, I decided not to use the Dropbox API. So the plan is: take the books from the local folder, run each one through a word counter, and add it to Notion.

Part 2. Adding a row

The table itself needs one column for each field the script fills in: title, what, state, author, tags, hours, pages, and words. Note: it is best to give the columns Latin names, since the code refers to them as attributes such as row.author.

An official API has not been released yet, so we will use the unofficial Notion API.


Open Notion in the browser, press Ctrl + Shift + J, go to Application -> Cookies, copy the value of token_v2, and call it TOKEN. Then open the page that holds the library table and copy its link. We will call it NOTION.

Then we write the code that connects to Notion.

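from notion.client import NotionClient  # unofficial client (notion-py)
client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()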

Next, let's write a function that adds a row to the table.

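def add_row(path, file, words_count, pages_count, hours):
    # Create an empty row in the Notion table and set its title
    row = database.collection.add_row()
    row.title = file

    # Each folder level of the relative path becomes one tag
    tags = path.split("/")

    # Level 1: top-level category (fiction, non-fiction, textbooks)
    if len(tags) >= 1:
        row.what = tags[0]

    # Level 2: state, for example "Новое" (new)
    if len(tags) >= 2:
        row.state = tags[1]

    # Level 3: author for fiction, subject category otherwise
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    # Level 4: author, when a subject category precedes it
    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count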

What is going on here? The first line adds a new row to the table. Next we split the path on "/" to get the tags, meaning things like "Художественное" (fiction), "Дизайн" (design), the author, and so on. Then we set all the fields the row needs.
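To make the mapping from folder path to columns concrete, here is a small sketch that mirrors the branching in add_row but returns a plain dictionary instead of writing to Notion. The helper name path_to_fields is hypothetical, and the sketch assumes the path is already relative to the books folder, with no leading slash.

```python
def path_to_fields(path):
    """Hypothetical helper: map a relative folder path to table fields."""
    tags = path.split("/")
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        # Fiction has no subject level, so level 3 is the author
        if tags[0] == "Художественное":
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


print(path_to_fields("Нехудожественное/Новое/Дизайн/Юрий Гордон"))
```

For the example path from Part 1 this yields the top-level category, the state, "Дизайн" as the tag, and "Юрий Гордон" as the author.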

Part 3. Counting words, hours, and other delights

This is the harder part. As we remember, there are two formats: epub and pdf. With epub everything is clear, since the words are certainly there. With pdf things are less clear, because the file may consist of nothing but scanned images glued together.

So the word count for a PDF is an estimate: we multiply the number of pages by a constant, the average number of words per page.

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

For an A4 page, WORDS_PER_PAGE is roughly 300.

Now let's write a function that counts the pages. We will use PyPDF2.

import os
from PyPDF2 import PdfFileReader  # legacy PyPDF2 name (pre-3.0)

def get_pdf_pages_number(path, filename):
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()

Next comes the page count for EPUB files. We use epub_converter. Here we open the book, convert it to lines, and count the words in each line. Dividing the total by WORDS_PER_PAGE gives the estimated number of pages.

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        words_count += len(line.split(" "))

    return round(words_count / WORDS_PER_PAGE)
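One detail worth checking in the loop above: line.split(" ") produces empty strings when a line is empty or contains consecutive spaces, which inflates the count. Below is a minimal sketch of a stricter counter, assuming the lines are plain strings. The name count_words is hypothetical and the sample lines are made up for illustration.

```python
def count_words(lines):
    """Count whitespace-separated words, ignoring empty fragments."""
    # str.split() with no argument collapses runs of whitespace
    return sum(len(line.split()) for line in lines)


sample = ["Chapter  one", "", "It was a dark night."]

# Same counting rule as the loop in get_epub_pages_number
naive = sum(len(line.split(" ")) for line in sample)

print(naive, count_words(sample))  # prints: 9 7
```

On real books the difference depends on how much blank or padded text the converter emits, so it may or may not matter for a page estimate.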

Now let's calculate the time. We take the word count we have grown so fond of and divide it by the reading speed.

def get_reading_time(words_count):
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10
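As a quick sanity check of the arithmetic, here is a worked example that chains the two estimates. WORDS_PER_PAGE is the value given above. WORDS_PER_MINUTE is an assumption, since the article does not state the value it uses; 200 is only a placeholder, so substitute your own reading speed.

```python
WORDS_PER_PAGE = 300    # value given in the article for an A4 page
WORDS_PER_MINUTE = 200  # assumption: the article does not state this value


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # words -> minutes -> hours, rounded to one decimal place
    return round(((words_count / WORDS_PER_MINUTE) / 60) * 10) / 10


words = get_words_count(300)
print(words, get_reading_time(words))  # prints: 90000 7.5
```

With these constants a 300-page PDF comes out at 90,000 words and 7.5 hours of reading.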

Part 4. Putting all the parts together

We need to walk through every path inside the books folder and check whether each book is already in Notion. If it is, there is no need to create another row.
Otherwise we determine the file type and, depending on it, count the words. Finally, we add the book.

This is the code we end up with:

for root, subdirs, files in os.walk(BOOKS_DIR):
    # Skip empty folders and folders rejected by the exclusion check
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # The extension is whatever follows the last dot
            filetype = file.split(".")[-1]
            filename = file.replace("." + filetype, "")
            # Path relative to the books folder; add_row derives the tags from it
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            # Only process books that are not in Notion yet
            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                else:
                    # Everything that is not a PDF is treated as an EPUB
                    count = get_epub_pages_number(root, file)

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
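The loop calls check_for_excusion, which the article does not show. Given that reference books are kept out of the table, one plausible sketch is a filter that accepts a folder only when none of its path components is on an exclusion list. Both the implementation and the EXCLUDED_DIRS value are assumptions: the folder name used here for reference books is a guess and should be replaced with the real one.

```python
import os

# Assumption: folder names to skip; "Справочники" (reference books) is a guess
EXCLUDED_DIRS = {"Справочники"}


def check_for_excusion(root):
    """Return True if the folder should be processed (hypothetical version)."""
    parts = os.path.normpath(root).split(os.sep)
    return not any(part in EXCLUDED_DIRS for part in parts)


print(check_for_excusion(os.path.join("Книги", "Учебники", "Новое")))
print(check_for_excusion(os.path.join("Книги", "Справочники")))
```

Note the direction of the result: the loop processes a folder when the function returns True, so an excluded folder has to return False.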

And the function that checks whether a book has already been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
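Because the check above matches substrings in both directions, a row with an empty title would match every file, and a short title such as "Python" would match any longer filename that contains it. If exact matching suits your library better, a stricter variant could look like the sketch below. The name book_exists and the stand-in rows are hypothetical; real rows would come from current_rows.

```python
from types import SimpleNamespace


def book_exists(filename, rows):
    """Hypothetical stricter check: exact match on normalized titles."""
    wanted = filename.strip().casefold()
    return any((row.title or "").strip().casefold() == wanted for row in rows)


# Stand-in rows; in the article these come from the Notion query
rows = [SimpleNamespace(title="Python Tricks"), SimpleNamespace(title="")]

print(book_exists("python tricks", rows))
print(book_exists("Python", rows))
```

The trade-off is that exact matching will create a second row if a file is renamed even slightly, which the substring version tolerates.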

Conclusion

Thanks to everyone who read this article. I hope it helps you read more :)

Source: habr.com
