I have always wondered how best to organize the books in my e-library. In the end I settled on this setup, with automatic page counting and other goodies. If you are interested, read on below the cut.
Part 1. Dropbox
All my books live in Dropbox. I sort everything into 4 categories: Textbooks, Reference, Fiction, Non-fiction. I do not add reference books to the table, though.
Most of the books are .epub, the rest are .pdf. So the final solution has to handle both formats one way or another.
The paths to my books look like this:
/Книги/Нехудожественное/Новое/Дизайн/Юрий Гордон/Книга про буквы от А до Я.epub
If the book is fiction, the genre folder ("Дизайн", i.e. Design, in the path above) is omitted.
I decided not to bother with the Dropbox API, since I have their app, which keeps the folder in sync. So the plan is this: take the books from the folder, run each book through a word counter, and then add it to Notion.
Part 2. Adding a row
The table itself should look roughly like this (the code in this part fills the columns title, what, state, tags, author, hours, pages and words). NOTE: it is best to name the columns in Latin characters.
We will use the unofficial API, because the official one has not been released yet.
Go to Notion, press Ctrl + Shift + J, go to Application -> Cookies, copy token_v2 and call it TOKEN. Then open the page that holds the library table and copy its link. We call it NOTION.
Then we write the code to connect to Notion.
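from notion.client import NotionClient
client = NotionClient(token_v2=TOKEN)  # authenticate with the cookie value
database = client.get_collection_view(NOTION)
current_rows = database.default_query().execute()  # rows already in the table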
Next, let's write a function that adds a row to the table.
def add_row(path, file, words_count, pages_count, hours):
    row = database.collection.add_row()
    row.title = file
    # Strip a leading "/" so the first tag is the category, not an empty string
    tags = path.strip("/").split("/")
    if len(tags) >= 1:
        row.what = tags[0]
    if len(tags) >= 2:
        row.state = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction has no genre folder, so the third part is the author
            row.author = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            row.tags = tags[2]
    if len(tags) >= 4:
        row.author = tags[3]
    row.hours = hours
    row.pages = pages_count
    row.words = words_count
What is going on here? The first line adds a new row to the table. After that, we split the path on "/" and get the tags. Tags are things like "Fiction" or "Design", the author, and so on. Then we set all the fields the table needs.
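The path-to-fields mapping can also be sketched as a pure function, which makes it easy to check without touching Notion. This is only an illustration: parse_book_path is a hypothetical helper that is not part of the original script, and the dictionary keys simply mirror the column names used in add_row.

```python
def parse_book_path(path):
    """Map a relative folder path to table fields (hypothetical helper)."""
    # Drop leading/trailing slashes so "/A/B" and "A/B" behave the same
    tags = [t for t in path.strip("/").split("/") if t]
    fields = {}
    if len(tags) >= 1:
        fields["what"] = tags[0]
    if len(tags) >= 2:
        fields["state"] = tags[1]
    if len(tags) >= 3:
        if tags[0] == "Художественное":
            # Fiction: no genre folder, the third part is the author
            fields["author"] = tags[2]
        elif tags[0] in ("Нехудожественное", "Учебники"):
            fields["tags"] = tags[2]
    if len(tags) >= 4:
        fields["author"] = tags[3]
    return fields


print(parse_book_path("/Нехудожественное/Новое/Дизайн/Юрий Гордон"))
```

For the example path from Part 1 this yields the category, the state, the genre tag and the author as four separate fields.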
Part 3. Counting words, hours, and other delights
This is the harder task. As we remember, we have two formats: epub and pdf. With epub everything is clear, since the words are certainly in there. With pdf it is less obvious: the file may consist of nothing but pasted images.
So our function for counting words in a PDF works like this: we take the number of pages and multiply it by a constant (the average number of words per page).
Here it is:
def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE
WORDS_PER_PAGE for an A4 page is roughly 300.
Now let's write a function to count pages. We will use PyPDF2.
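import os
from PyPDF2 import PdfFileReader  # pre-3.0 PyPDF2 name; newer releases call it PdfReader
def get_pdf_pages_number(path, filename):
    # Close the file as soon as the page count has been read
    with open(os.path.join(path, filename), 'rb') as pdf_file:
        return PdfFileReader(pdf_file).getNumPages()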
Next, we will write a function to count pages in an EPUB. We use the open_book and convert_epub_to_lines helpers from the epub_conversion package.
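from epub_conversion.utils import open_book, convert_epub_to_lines
def get_epub_pages_number(path, filename):
    book = open_book(os.path.join(path, filename))
    lines = convert_epub_to_lines(book)
    words_count = 0
    for line in lines:
        # split() without arguments skips empty strings between repeated spaces
        words_count += len(line.split())
    return round(words_count / WORDS_PER_PAGE)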
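One caveat: depending on the book, the extracted lines may still contain HTML markup, which would inflate the count. Below is a minimal sketch of a counter that strips tags first, assuming the lines are plain strings. The count_words name and the regular expression are illustrative and not part of the original script.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive tag matcher, fine for a rough estimate


def count_words(lines):
    """Count whitespace-separated words after removing HTML-like tags."""
    total = 0
    for line in lines:
        text = TAG_RE.sub(" ", line)
        total += len(text.split())  # split() ignores repeated spaces
    return total


sample = ["<p>Hello,  world</p>", "", "<h1>Chapter <b>one</b></h1>"]
print(count_words(sample))  # 4
```

Since the page count is only an estimate anyway, a rough tag filter like this is usually enough.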
Now let's calculate the time. We take our beloved word count and divide it by your reading speed.
def get_reading_time(words_count):
    # Minutes to hours, rounded to one decimal place
    return round(words_count / WORDS_PER_MINUTE / 60, 1)
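A quick worked example of the arithmetic, as a sanity check. WORDS_PER_MINUTE = 250 is an assumed reading speed chosen purely for illustration (the article does not state one); WORDS_PER_PAGE = 300 is the rough A4 figure mentioned earlier.

```python
WORDS_PER_PAGE = 300    # from the article: rough figure for an A4 page
WORDS_PER_MINUTE = 250  # assumption: substitute your own reading speed


def get_words_count(pages_number):
    return pages_number * WORDS_PER_PAGE


def get_reading_time(words_count):
    # Minutes to hours, rounded to one decimal place
    return round(words_count / WORDS_PER_MINUTE / 60, 1)


pages = 300
words = get_words_count(pages)   # 300 * 300 = 90000
hours = get_reading_time(words)  # 90000 / 250 / 60 = 6.0
print(pages, words, hours)
```

So under these assumptions a 300-page book comes out at about six hours of reading.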
Part 4. Putting all the parts together
We need to walk every path inside our books folder and check whether each book is already in Notion: if it is, there is no need to create a row.
Then we determine the file type and, depending on it, count the words. Finally, we add the book.
This is the code we end up with:
for root, subdirs, files in os.walk(BOOKS_DIR):
    if len(files) > 0 and check_for_excusion(root):
        for file in files:
            # splitext cuts only the final extension, even if the name contains dots
            filename, extension = os.path.splitext(file)
            filetype = extension[1:].lower()
            local_root = root.replace(BOOKS_DIR, "")
            if not check_for_existence(filename):
                print("Dir: {}, file: {}".format(local_root, file))
                if filetype == "pdf":
                    count = get_pdf_pages_number(root, file)
                else:
                    count = get_epub_pages_number(root, file)
                words_count = get_words_count(count)
                hours = get_reading_time(words_count)
                print("Pages: {}, Words: {}, Hours: {}".format(count, words_count, hours))
                add_row(local_root, filename, words_count, count, hours)
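The check_for_excusion helper is not shown in the article. Since reference books are not added to the table, one plausible version returns True for folders that should be processed and False for excluded ones. The sketch below is a guess at that behavior: the folder name in EXCLUDED_DIRS is a placeholder, and the books_dir and excluded parameters are added only to keep the example self-contained.

```python
import os

# Placeholder folder name; replace with the real top-level folders to skip
EXCLUDED_DIRS = {"Reference"}


def check_for_excusion(root, books_dir, excluded=EXCLUDED_DIRS):
    """Return True if `root` should be processed (hypothetical helper)."""
    relative = os.path.relpath(root, books_dir)
    top_level = relative.split(os.sep)[0]
    return top_level not in excluded


print(check_for_excusion("/books/Fiction/New", "/books"))
print(check_for_excusion("/books/Reference/Python", "/books"))
```

Checking only the top-level folder keeps the rule simple: everything under an excluded category is skipped, however deeply it is nested.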
And the function that checks whether a book has already been added looks like this:
def check_for_existence(filename):
    for row in current_rows:
        # Skip empty titles: "" is a substring of every filename
        if row.title and (row.title in filename or filename in row.title):
            return True
    return False
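The substring check can be tried out without Notion by passing plain titles instead of rows. In this sketch, is_already_added is a hypothetical variant; the case-insensitive comparison via casefold() is an addition of mine, not something the original script does.

```python
def is_already_added(filename, titles):
    """Return True if filename and any non-empty title contain one another."""
    name = filename.casefold()
    for title in titles:
        t = title.casefold()
        # Skip empty titles: "" is a substring of every string
        if t and (t in name or name in t):
            return True
    return False


titles = ["Книга про буквы от А до Я", ""]
print(is_already_added("книга про буквы от а до я", titles))
print(is_already_added("Another book", titles))
```

Keep in mind that a two-way substring match is deliberately loose: a short title such as "It" would match many unrelated filenames.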
Conclusion
Thanks to everyone who read this article. I hope it helps you read more :)
source: www.habr.com