Making a Home Library with Notion and Python

I have long wondered how best to organize the books in my electronic library. I ended up with this setup, which counts pages automatically and has a few other perks. Details below for anyone interested.

Part 1. Dropbox

All my books live in my Dropbox. I split everything into four categories: Textbooks, Reference, Fiction, and Non-fiction. Reference books are not added to the table, though.

Most of the books are .epub and the rest are .pdf, so the final solution has to handle both formats.

The paths to my books look something like this (Books / Non-fiction / New / Design / Yuri Gordon / the book file):

/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub 

If the book is fiction, the category level ("Дизайн", i.e. "Design", in the path above) is omitted.

I decided not to bother with the Dropbox API, since the Dropbox app already syncs the folder to my disk. So the plan is: take the books from the local folder, run each one through the word counter, and add it to Notion.

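Part 2. Adding a row

The table needs the title plus a column for every field the code below fills in: what, state, author, tags, hours, pages and words. NOTE: give the columns Latin-alphabet names, since the code accesses them as attributes such as row.author.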

We will use Notion's unofficial API, because the official one had not been released at the time of writing.

Open Notion in the browser, press Ctrl + Shift + J to open the developer tools, go to Application -> Cookies, copy the value of token_v2 and store it as TOKEN. Then open the page that holds the library table, copy its link, and store it as NOTION.
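
Since token_v2 is effectively a login session, it may be safer to keep it out of the script itself. Here is a minimal sketch, assuming the two values are stored in environment variables. The variable names NOTION_TOKEN_V2 and NOTION_LIBRARY_URL and the helper load_settings are made up for this example and are not part of Notion or any library:

```python
import os


def load_settings(env=os.environ):
    """Return (TOKEN, NOTION) read from a mapping of environment variables.

    NOTION_TOKEN_V2 and NOTION_LIBRARY_URL are hypothetical names chosen
    for this sketch only.
    """
    try:
        return env["NOTION_TOKEN_V2"], env["NOTION_LIBRARY_URL"]
    except KeyError as error:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError("Missing environment variable: {}".format(error.args[0]))
```

With both variables set, `TOKEN, NOTION = load_settings()` would give the two values used in the rest of the script.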

Then we write the code to connect to Notion.

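from notion.client import NotionClient  # unofficial notion-py package

client = NotionClient(token_v2=TOKEN)
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()

Next, let's write a function to add a row to the table.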

def add_row(path, file, words_count, pages_count, hours):
    row = database.collection.add_row()
    row.title = file

    # Drop empty parts so a leading or trailing "/" does not shift the levels
    tags = [tag for tag in path.split("/") if tag]

    if len(tags) >= 1:
        row.what = tags[0]

    if len(tags) >= 2:
        row.state = tags[1]

    if len(tags) >= 3:
        if tags[0] == "Художественное":
            row.author = tags[2]

        elif tags[0] == "Нехудожественное":
            row.tags = tags[2]

        elif tags[0] == "Учебники":
            row.tags = tags[2]

    if len(tags) >= 4:
        row.author = tags[3]

    row.hours = hours
    row.pages = pages_count
    row.words = words_count

What's going on here? We add a new row to the table and set its title to the file name. Next, we split the path by "/" to get the tags: the top-level category (fiction, non-fiction, textbooks), the state, the subject such as "Design", the author, and so on. Then we fill in the remaining fields of the row.
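
To see the mapping without touching Notion, the same branching can be sketched as a pure function that returns a dictionary. The name path_to_fields is hypothetical, and the sketch assumes empty path parts should be ignored so that a leading "/" does not shift the levels:

```python
def path_to_fields(path):
    """Map a library-relative folder path to the columns add_row fills in."""
    tags = [tag for tag in path.split("/") if tag]
    fields = {}

    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":  # fiction: the third level is the author
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):  # non-fiction, textbooks
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]

    return fields


print(path_to_fields("/Нехудожественное/Новое/Дизайн/Юрий Гордон"))
```

For the example path from Part 1 this gives what = Нехудожественное, state = Новое, tags = Дизайн and author = Юрий Гордон.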

Part 3. Counting words, hours and other delights

This is a harder task. As you remember, there are two formats: epub and pdf. With epub things are straightforward, since the text is almost certainly there, but pdf is trickier: a file may consist entirely of scanned page images with no text at all.

So for PDF the word count is an estimate: we take the number of pages and multiply it by a constant (the average number of words per page).

Here it is:

def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE

WORDS_PER_PAGE is about 300 for an A4 page.

Now let's write a function for counting pages. We will use PyPDF2.

import os
from PyPDF2 import PdfFileReader  # PyPDF2 3.x renamed this to PdfReader / len(reader.pages)

def get_pdf_pages_number(path, filename):
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()

Next, a small function for counting pages in an EPUB. We use epub_converter: open the book, convert it to lines, count the words in each line, and divide the total by WORDS_PER_PAGE.

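# Assumed source of these helpers: the epub-conversion package
from epub_conversion.utils import open_book, convert_epub_to_lines

def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0

    for line in lines:
        # split() without arguments ignores repeated spaces and empty lines
        words_count += len(line.split())

    return round(words_count / WORDS_PER_PAGE)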
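
How a line is split affects the total: split(" ") produces an empty string for every extra space and counts an empty line as one word, while split() with no argument splits on any run of whitespace. A small sketch on made-up lines (count_words is a hypothetical helper, not part of epub_converter):

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    return sum(len(line.split()) for line in lines)


sample_lines = ["The quick  brown fox", "", "jumps over"]

# Splitting on a single space also counts the empty strings it produces.
naive_count = sum(len(line.split(" ")) for line in sample_lines)

print(naive_count)                # 8
print(count_words(sample_lines))  # 6
```

On a whole book the difference is likely small compared with the roughness of the 300-words-per-page estimate, but it costs nothing to avoid.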

Now let's estimate the reading time: take the word count and divide it by your reading speed in words per minute.

def get_reading_time(words_count):
    # Minutes -> hours, rounded to one decimal place
    return round(words_count / WORDS_PER_MINUTE / 60, 1)
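
As a worked example of the whole chain from pages to hours, here is a self-contained sketch. WORDS_PER_PAGE is the 300 mentioned above; WORDS_PER_MINUTE = 200 is only an assumed reading speed, since the article does not give a value, and estimate is a hypothetical helper:

```python
WORDS_PER_PAGE = 300    # value quoted above for an A4 page
WORDS_PER_MINUTE = 200  # assumed reading speed, adjust to your own


def estimate(pages_number):
    """Return (words, hours) estimated from a page count."""
    words = pages_number * WORDS_PER_PAGE
    hours = round(words / WORDS_PER_MINUTE / 60, 1)
    return words, hours


print(estimate(300))  # (90000, 7.5)
```

So a 300-page PDF comes out at roughly 90,000 words, or about seven and a half hours of reading at that speed.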

Part 4. Connecting all the parts

We need to walk every path in the books folder and check whether each book is already in Notion; if it is, there is no need to create a row.
Otherwise we determine the file type, count the words accordingly, and finally add the book.

This is the code we get:

for root, subdirs, files in os.walk(BOOKS_DIR):
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # splitext copes with file names that contain extra dots
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            local_root = root.replace(BOOKS_DIR, "")

            print("Dir: {}, file: {}".format(local_root, file))

            if filetype not in ("pdf", "epub"):
                continue  # skip anything that is not a book, e.g. .DS_Store

            if not check_for_existence(filename):
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)

                else:
                    count = get_epub_pages_number(root, file)

                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
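
The traversal can be tried out on its own, without Notion or any books, by listing what would be processed. The sketch below is only an illustration: find_books and its excluded parameter are hypothetical (the article does not show how check_for_excusion works), and the folder names are invented for a temporary directory:

```python
import os
import tempfile


def find_books(books_dir, excluded=()):
    """Yield (relative_dir, filename, filetype) for each PDF or EPUB found."""
    for root, subdirs, files in os.walk(books_dir):
        # Pruning subdirs in place stops os.walk from descending into them.
        subdirs[:] = [name for name in subdirs if name not in excluded]
        for file in sorted(files):
            filename, extension = os.path.splitext(file)
            filetype = extension.lstrip(".").lower()
            if filetype in ("pdf", "epub"):
                yield os.path.relpath(root, books_dir), filename, filetype


with tempfile.TemporaryDirectory() as books_dir:
    # Build a tiny fake library: one book, one excluded folder, one stray file.
    for folder, name in [("Fiction/New", "novel.epub"),
                         ("Reference", "dictionary.pdf"),
                         ("Fiction/New", ".DS_Store")]:
        os.makedirs(os.path.join(books_dir, folder), exist_ok=True)
        open(os.path.join(books_dir, folder, name), "w").close()

    found = list(find_books(books_dir, excluded=("Reference",)))
    print(found)
```

Only novel.epub is listed: the Reference folder is pruned and the stray .DS_Store file has no book extension. Note that this simple check prunes any folder with an excluded name, at any depth.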

And the function to check if a book has been added looks like this:

def check_for_existence(filename):
    for row in current_rows:
        if row.title in filename:
            return True

        elif filename in row.title:
            return True

    return False
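
Substring matching is forgiving, but it has two side effects worth knowing about: a row with an empty title matches every file (an empty string is contained in any string), and a short title such as "It" matches any longer name that contains it. A stricter alternative, sketched here with hypothetical helper names, compares normalized titles exactly:

```python
def normalize(title):
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def build_title_index(titles):
    """Return a set of normalized titles, skipping empty ones."""
    return {normalize(title) for title in titles if title.strip()}


def already_added(filename, title_index):
    """Return True if exactly this (normalized) title is already known."""
    return normalize(filename) in title_index


index = build_title_index(["It", "", "War and Peace"])

print("" in "Italian Grammar")                  # True
print(already_added("Italian Grammar", index))  # False
print(already_added("war  and peace", index))   # True
```

In the script the index could be built once from the titles in current_rows and extended after each added row, so that duplicates within a single run are caught as well. The trade-off is that renamed or slightly differently titled files are no longer treated as the same book.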

Conclusion

Thanks to everyone who read this article. Hope it helps you read more 🙂

Source: habr.com
