Python - a helper for finding cheap flights for those who love to travel

The author of the article we translated today says that its goal is to describe how to build a web scraper in Python, using Selenium, that searches for airfare prices. The ticket search uses flexible dates (±3 days around the specified dates). The scraper saves the search results to an Excel file and sends the person who ran the search an email summarizing what it found. The aim of the project is to help travelers find the best deals.

If you feel lost while working through the material, take a look at this article.

What are we looking for?

You are free to use the system described here however you like. I, for example, used it to search for weekend trips and for tickets to my hometown. If you are serious about finding good deals, you can run the script on a server (a simple server, at about 130 rubles a month, is quite suitable for this) and have it run once or twice a day. The search results will be emailed to you. I also recommend setting everything up so that the script saves the Excel file with the search results to a Dropbox folder, which lets you view those files from anywhere at any time.

Figure: I haven't found any error fares yet, but I believe it's possible

As already mentioned, the search uses "flexible dates": the script finds offers that fall within three days of the specified dates. A single run searches for offers on one route only, but the script is easy to modify so that it collects data for several routes. It can even be used to hunt for error fares, and such finds can be very interesting.
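To make the "flexible dates" idea concrete, here is a minimal sketch that lists the dates falling within three days of a chosen date. The helper name `flexible_window` is hypothetical and not part of the original script, which leaves this expansion to the site by adding a `-flexible` suffix to the dates in the search URL.

```python
from datetime import date, timedelta


def flexible_window(center, days=3):
    """Return every date from `center - days` to `center + days`, inclusive."""
    return [center + timedelta(days=offset) for offset in range(-days, days + 1)]


window = flexible_window(date(2019, 8, 21))
print(window[0], window[-1], len(window))
```

A window of ±3 days around one date therefore covers seven candidate days for each leg of the trip.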

Why do you need another web scraper?

When I first started web scraping, I was, frankly, not particularly interested in it. I wanted to do more projects in predictive modeling, financial analysis and, possibly, sentiment analysis of texts. But it turned out to be very interesting to figure out how to build a program that collects data from websites. As I dug into the topic, I realized that web scraping is the "engine" of the internet.

You may think this is too bold a statement. But consider that Google started out with a web scraper that Larry Page built using Java and Python. Google's robots have been crawling the internet ever since, trying to give users the best answers to their questions. Web scraping has endless uses, and even if you are interested in some other area of data science, you will need some scraping skills to get the data you want to analyze.

I found some of the techniques used here in a wonderful book on web scraping that I bought recently. It contains many simple examples and ideas for applying what you learn in practice. It also has a very interesting chapter on bypassing reCaptcha checks. This was news to me, since I did not even know that there were special tools, and even entire services, for solving such problems.

Do you like to travel?!

The simple and rather harmless question in the title of this section often gets a positive answer, along with a couple of stories from the travels of the person asked. Most of us would agree that travel is a great way to immerse yourself in new cultural environments and broaden your horizons. But ask someone whether they like searching for airline tickets, and I am sure the answer will be far less positive. This is where Python comes to our aid.

The first task we need to solve on the way to building a flight search system is choosing a suitable platform to collect information from. This was not easy for me, but in the end I chose Kayak. I tried Momondo, Skyscanner, Expedia and a few other services, but the bot protection on those sites was impenetrable. After several attempts, during which I had to deal with traffic lights, pedestrian crossings and bicycles while trying to convince the systems that I was human, I decided that Kayak suited me best, even though it too starts running checks if you load too many pages in a short time. I managed to make the bot send requests to the site at intervals of 4 to 6 hours, and everything worked fine. Difficulties do come up from time to time when working with Kayak, but if it starts pestering you with checks, you either need to deal with them manually and then start the bot, or wait a few hours until the checks stop. If necessary, you can easily adapt the code to another platform, and if you do, you can report it in the comments.

If you are just getting started with web scraping and do not know why some websites fight it, then before starting your first project in this area, do yourself a favor and search Google for "web scraping etiquette". Your experiments may end sooner than you think if you scrape unwisely.

Getting started

Here is a general overview of what will happen in our web scraper code:

  • Import the required libraries.
  • Open a Google Chrome tab.
  • Call the function that starts the bot, passing it the cities and dates to use in the ticket search.
  • This function takes the first search results, sorted by best match, and clicks a button to load more results.
  • Another function collects data from the whole page and returns a data frame.
  • The two previous steps are repeated with the results sorted by ticket price (cheapest) and by flight speed (fastest).
  • The user of the script is sent an email with a summary of ticket prices (cheapest ticket and average price), and a data frame with the information sorted by the three criteria mentioned above is saved as an Excel file.
  • All of the above is repeated in a loop after a specified interval.

It should be noted that every Selenium project starts with a web driver. I use ChromeDriver because I work with Google Chrome, but there are other options. PhantomJS and Firefox are also popular. After downloading the driver, you need to place it in the appropriate folder, and that completes the preparation for using it. The first lines of our script open a new Chrome tab.

Keep in mind that I am not trying to break new ground in finding great airfare deals here. There are much more advanced ways of searching for such offers. I just want to give readers of this material a simple but practical way to solve the problem.

Here is the code we talked about above.

from time import sleep, strftime
from random import randint
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import smtplib
from email.mime.multipart import MIMEMultipart

# Use your own path to chromedriver here!
chromedriver_path = 'C:/{YOUR PATH HERE}/chromedriver_win32/chromedriver.exe'

# Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
driver = webdriver.Chrome(executable_path=chromedriver_path) # This command opens a Chrome window
sleep(2)

At the beginning of the code you can see the package imports that are used throughout the project. For example, randint is used to make the bot "fall asleep" for a random number of seconds before it starts a new search operation. Usually no bot can do without this. If you run the code above, a Chrome window will open, which the bot will use to work with sites.
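The randomized pause can be wrapped in a small helper, sketched below. The name `random_pause` and the `sleeper` parameter are hypothetical additions for illustration; the original script simply calls `sleep(randint(a, b))` inline.

```python
from random import randint
from time import sleep


def random_pause(low, high, sleeper=sleep):
    """Sleep for a random whole number of seconds in [low, high] and return it."""
    seconds = randint(low, high)
    sleeper(seconds)  # `sleeper` can be swapped out so tests do not actually wait
    return seconds
```

Calling `random_pause(45, 60)` would then pause for somewhere between 45 and 60 seconds, so requests do not arrive at perfectly regular intervals.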

Let's do a little experiment and open kayak.com in a separate window. Choose the city you are flying from, the city you want to get to, and the flight dates. When choosing the dates, make sure the ±3 day range is used. I wrote the code with the site's response to exactly this kind of request in mind. If, for example, you need to search for tickets on specific dates only, it is highly likely that you will have to modify the bot code. I give explanations as I go through the code, but if you get confused, let me know.

Now click the search button and look at the link in the address bar. It should look similar to the link I use in the example below, where the variable kayak, which stores the URL, is declared and the web driver's get method is used. After you click the search button, the results should appear on the page.
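The link in the address bar can be assembled from the search parameters, as in the sketch below. The helper name `build_kayak_url` is hypothetical; the URL shape is copied from the start_kayak function shown later in this article, and the site may have changed its URL scheme since then.

```python
def build_kayak_url(city_from, city_to, date_start, date_end, sort="bestflight_a"):
    """Assemble a flexible-date search URL in the shape used by start_kayak."""
    return (
        "https://www.kayak.com/flights/"
        f"{city_from}-{city_to}/"
        f"{date_start}-flexible/{date_end}-flexible"
        f"?sort={sort}"
    )


kayak = build_kayak_url("LIS", "SIN", "2019-08-21", "2019-09-07")
print(kayak)
# With a Selenium driver already open, the page would then be loaded with:
# driver.get(kayak)
```

City codes are IATA codes and dates use the YYYY-MM-DD format, matching the docstring of start_kayak.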

When I used the get command more than two or three times within a few minutes, I was asked to complete a reCaptcha check. You can pass this check manually and continue experimenting until the system decides to run a new check. When I tested the script, the first search session always seemed to go smoothly, so if you want to experiment with the code, you only need to solve the check by hand from time to time and leave the code running, with long intervals between search sessions. And if you think about it, a person is unlikely to need ticket prices collected at 10-minute intervals.

Working with the page using XPath

So, we have opened a window and loaded the site. To get the prices and other information, we need to use XPath or CSS selectors. I decided to stick with XPath and felt no need for CSS selectors, but it is quite possible to work that way too. Navigating a page with XPath can be tricky, and while writing this article, in which I relied on copying the corresponding identifiers from the page code, I realized that this is actually not the best way to reach the elements you need. By the way, the book mentioned above has an excellent explanation of the basics of working with pages using XPath and CSS selectors. The web driver has corresponding methods for both approaches: find_element_by_xpath and find_element_by_css_selector, along with their find_elements_ counterparts.

Now let's continue working on the bot. Let's use the program to select the cheapest tickets. In the following image the XPath selector code is highlighted in red. To view the code, right-click the page element you are interested in and choose Inspect from the menu that appears. This command can be invoked for different page elements, whose code will then be displayed and highlighted in the code viewer.

Figure: Viewing the page code

To back up my reasoning about the disadvantages of copying selectors from the page code, consider the following.

Here is what you get when you copy the code:

//*[@id="wtKI-price_aTab"]/div[1]/div/div/div[1]/div/span/span

To copy something like this, right-click the section of code you are interested in and choose Copy > Copy XPath from the menu that appears.

Here is what I used to define the Cheapest button:

cheap_results = '//a[@data-code = "price"]'

Figure: The Copy > Copy XPath command

It is quite obvious that the second option looks much simpler. It searches for an a element whose data-code attribute equals price. The first option searches for an element whose id equals wtKI-price_aTab, and the XPath path from there to the target element is /div[1]/div/div/div[1]/div/span/span. An XPath query like this will do the trick, but only once. I can say right now that the id will change the next time the page loads. The character sequence wtKI changes dynamically on every page load, so code that uses it becomes useless after the next reload. So take the time to understand XPath. This knowledge will serve you well.
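The difference between the two selectors can be reproduced without a browser. The sketch below uses the standard library's ElementTree, which supports a small subset of XPath, on two hypothetical snapshots of the same markup; the second id prefix (xQ7p) is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical snapshots of the same page: only the generated id prefix differs.
PAGE_FIRST_LOAD = '<div><a id="wtKI-price_aTab" data-code="price">Cheapest</a></div>'
PAGE_RELOADED = '<div><a id="xQ7p-price_aTab" data-code="price">Cheapest</a></div>'

BRITTLE = ".//*[@id='wtKI-price_aTab']"  # copied selector, tied to a generated id
STABLE = ".//a[@data-code='price']"      # hand-written selector, tied to a stable attribute


def count_matches(markup, xpath):
    """Count the elements that `xpath` selects in `markup`."""
    return len(ET.fromstring(markup).findall(xpath))


for name, page in [("first load", PAGE_FIRST_LOAD), ("reloaded", PAGE_RELOADED)]:
    print(name, count_matches(page, BRITTLE), count_matches(page, STABLE))
```

The id-based selector matches only the first snapshot, while the attribute-based one matches both, which is the behavior described above.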

It should be noted, however, that copying XPath selectors can be handy when working with fairly simple sites, and if that suits you, there is nothing wrong with it.

Now let's think about what to do if you need to get all the search results as several strings inside a list. It is very simple. Each result sits inside an object with the class resultWrapper. All the results can be loaded in a loop similar to the one shown below.

Note that if you understand the above, you should easily understand most of the code we are going to go through. As this code runs, we reach what we need (in fact, the element that the result is wrapped in) using a path-specifying mechanism (XPath). This is done in order to get the text of the element and put it into an object from which the data can be read (first flight_containers is used, then flights_list).
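A minimal sketch of that step is shown below. The names flight_containers and flights_list and the class resultWrapper come from the text; the exact XPath expression, the helper `collect_result_texts` and the `FakeDriver` and `FakeElement` stand-ins are assumptions added so the snippet runs without a browser. A real Selenium driver could be passed in instead, since `find_elements("xpath", ...)` is the generic form of the locator call.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement; only the .text attribute is needed here."""
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""
    def __init__(self, texts):
        self._texts = texts

    def find_elements(self, by, value):
        return [FakeElement(text) for text in self._texts]


def collect_result_texts(driver, xpath='//*[@class = "resultWrapper"]'):
    """Return the visible text of every element matched by `xpath`."""
    # "xpath" is the string value behind Selenium's By.XPATH constant.
    flight_containers = driver.find_elements("xpath", xpath)
    flights_list = [flight.text for flight in flight_containers]
    return flights_list


flights_list = collect_result_texts(FakeDriver(["result 1", "result 2", "result 3", "result 4"]))
print(flights_list[0:3])
```

Each entry in flights_list is one long string holding everything shown for a single search result.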

The first three rows are displayed, and we can clearly see everything we need. However, there are more interesting ways of getting the information. We need to take the data from each element separately.

Let's get to work!

The easiest function to write is the one that loads additional results, so that is where we will start. I would like to maximize the number of flights the program gets information about without arousing suspicion in the service and triggering a check, so I click the Load more results button once each time the page is displayed. In this code, pay attention to the try block, which I added because the button sometimes fails to load properly. If you run into this too, comment out the calls to this function in the code of the start_kayak function, which we will look at below.

# Load more results to maximize the amount of data collected
def load_more():
    try:
        more_results = '//a[@class = "moreButton"]'
        driver.find_element_by_xpath(more_results).click()
        # Printing these notes while the program runs helps me quickly see what it is doing
        print('sleeping.....')
        sleep(randint(45,60))
    except Exception:
        # The button sometimes fails to load; skip this step when that happens
        pass

Now, after a long analysis of this function (sometimes I get carried away), we are ready to declare the function that will scrape the page.

I have already collected most of what is needed in the following function, called page_scrape. Sometimes the data returned for the legs of the route comes back merged, so I use a simple method to separate it, for example when I first use the variables section_a_list and section_b_list. Our function returns the data frame flights_df, which lets us keep the results obtained with different sort orders separate and combine them later.

def page_scrape():
    """This function takes care of the scraping part"""
    
    xp_sections = '//*[@class="section duration"]'
    sections = driver.find_elements_by_xpath(xp_sections)
    sections_list = [value.text for value in sections]
    section_a_list = sections_list[::2] # this is how we separate the information about the two flights
    section_b_list = sections_list[1::2]
    
    # If you run into reCaptcha, you may need to do something about it.
    # You will know that something went wrong because the lists above are empty
    # this if statement lets you terminate the program or do something else
    # you could pause here, which would let you pass the check and continue scraping
    # I use SystemExit here because I want to test everything from the very beginning
    if section_a_list == []:
        raise SystemExit
    
    # I will use the letter A for outbound flights and B for inbound ones
    a_duration = []
    a_section_names = []
    for n in section_a_list:
        # Get the time
        a_section_names.append(''.join(n.split()[2:5]))
        a_duration.append(''.join(n.split()[0:2]))
    b_duration = []
    b_section_names = []
    for n in section_b_list:
        # Get the time
        b_section_names.append(''.join(n.split()[2:5]))
        b_duration.append(''.join(n.split()[0:2]))

    xp_dates = '//div[@class="section date"]'
    dates = driver.find_elements_by_xpath(xp_dates)
    dates_list = [value.text for value in dates]
    a_date_list = dates_list[::2]
    b_date_list = dates_list[1::2]
    # Get the day of the week
    a_day = [value.split()[0] for value in a_date_list]
    a_weekday = [value.split()[1] for value in a_date_list]
    b_day = [value.split()[0] for value in b_date_list]
    b_weekday = [value.split()[1] for value in b_date_list]
    
    # Get the prices
    xp_prices = '//a[@class="booking-link"]/span[@class="price option-text"]'
    prices = driver.find_elements_by_xpath(xp_prices)
    prices_list = [price.text.replace('$','') for price in prices if price.text != '']
    prices_list = list(map(int, prices_list))

    # stops is a big list in which the first leg of the trip sits at the even index and the second at the odd index
    xp_stops = '//div[@class="section stops"]/div[1]'
    stops = driver.find_elements_by_xpath(xp_stops)
    stops_list = [stop.text[0].replace('n','0') for stop in stops]
    a_stop_list = stops_list[::2]
    b_stop_list = stops_list[1::2]

    xp_stops_cities = '//div[@class="section stops"]/div[2]'
    stops_cities = driver.find_elements_by_xpath(xp_stops_cities)
    stops_cities_list = [stop.text for stop in stops_cities]
    a_stop_name_list = stops_cities_list[::2]
    b_stop_name_list = stops_cities_list[1::2]
    
    # carrier information, departure and arrival times for both flights
    xp_schedule = '//div[@class="section times"]'
    schedules = driver.find_elements_by_xpath(xp_schedule)
    hours_list = []
    carrier_list = []
    for schedule in schedules:
        hours_list.append(schedule.text.split('\n')[0])
        carrier_list.append(schedule.text.split('\n')[1])
    # split the time and carrier information between flights a and b
    # (even positions belong to flight a, odd positions to flight b)
    a_hours = hours_list[::2]
    a_carrier = carrier_list[::2]
    b_hours = hours_list[1::2]
    b_carrier = carrier_list[1::2]

    
    cols = (['Out Day', 'Out Time', 'Out Weekday', 'Out Airline', 'Out Cities', 'Out Duration', 'Out Stops', 'Out Stop Cities',
            'Return Day', 'Return Time', 'Return Weekday', 'Return Airline', 'Return Cities', 'Return Duration', 'Return Stops', 'Return Stop Cities',
            'Price'])

    flights_df = pd.DataFrame({'Out Day': a_day,
                               'Out Weekday': a_weekday,
                               'Out Duration': a_duration,
                               'Out Cities': a_section_names,
                               'Return Day': b_day,
                               'Return Weekday': b_weekday,
                               'Return Duration': b_duration,
                               'Return Cities': b_section_names,
                               'Out Stops': a_stop_list,
                               'Out Stop Cities': a_stop_name_list,
                               'Return Stops': b_stop_list,
                               'Return Stop Cities': b_stop_name_list,
                               'Out Time': a_hours,
                               'Out Airline': a_carrier,
                               'Return Time': b_hours,
                               'Return Airline': b_carrier,                           
                               'Price': prices_list})[cols]
    
    flights_df['timestamp'] = strftime("%Y%m%d-%H%M") # time the data was collected
    return flights_df

I tried to name the variables so that the code is easy to understand. Remember that variables starting with a belong to the first leg of the trip, and those starting with b to the second. Let's move on to the next function.
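The even/odd slicing used throughout page_scrape can be seen on a small example. The strings below are hypothetical; the real text returned by the page may be laid out differently, so treat this only as an illustration of the `[::2]` / `[1::2]` split and the `split()` and `join()` parsing.

```python
# Hypothetical texts in the order the page yields them: outbound, return, outbound, return...
sections_list = [
    "20h 35m LIS - SIN",
    "17h 50m SIN - LIS",
    "16h 05m LIS - SIN",
    "19h 10m SIN - LIS",
]

section_a_list = sections_list[::2]   # even positions: outbound legs
section_b_list = sections_list[1::2]  # odd positions: return legs


def split_section(text):
    """Split '20h 35m LIS - SIN' into a duration and a route, as page_scrape does."""
    words = text.split()
    return "".join(words[0:2]), "".join(words[2:5])


a_parsed = [split_section(s) for s in section_a_list]
b_parsed = [split_section(s) for s in section_b_list]
print(a_parsed)
print(b_parsed)
```

Because every result contributes exactly one outbound and one return entry, the two sliced lists stay aligned row by row.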

Helper mechanisms

We now have a function that loads additional search results and a function that processes those results. The article could have ended here, since these two functions provide everything you need to scrape pages that you open yourself. But we have not yet looked at some of the helper mechanisms discussed above, for example the code that sends emails and a few other things. All of this can be found in the start_kayak function, which we will look at now.

This function needs information about the cities and dates. Using this information, it builds the link in the kayak variable, which is used to go to a page containing search results sorted by best match. After the first scraping session, we work with the prices in the table at the top of the page. Namely, we find the minimum ticket price and the average price. All of this, together with the prediction issued by the site, is sent by email. On the page, the corresponding table should be in the upper left corner. Working with this table can, by the way, cause an error when searching with exact dates, since in that case the table is not displayed on the page.

def start_kayak(city_from, city_to, date_start, date_end):
    """City codes - it's the IATA codes!
    Date format -  YYYY-MM-DD"""
    
    kayak = ('https://www.kayak.com/flights/' + city_from + '-' + city_to +
             '/' + date_start + '-flexible/' + date_end + '-flexible?sort=bestflight_a')
    driver.get(kayak)
    sleep(randint(8,10))
    
    # sometimes a popup appears; a try block can be used to check for it and close it
    try:
        xp_popup_close = '//button[contains(@id,"dialog-close") and contains(@class,"Button-No-Standard-Style close ")]'
        driver.find_elements_by_xpath(xp_popup_close)[5].click()
    except Exception as e:
        pass
    sleep(randint(60,95))
    print('loading more.....')
    
#     load_more()
    
    print('starting first scrape.....')
    df_flights_best = page_scrape()
    df_flights_best['sort'] = 'best'
    sleep(randint(60,80))
    
    # Take the lowest price from the table at the top of the page
    matrix = driver.find_elements_by_xpath('//*[contains(@id,"FlexMatrixCell")]')
    matrix_prices = [price.text.replace('$','') for price in matrix]
    matrix_prices = list(map(int, matrix_prices))
    matrix_min = min(matrix_prices)
    matrix_avg = sum(matrix_prices)/len(matrix_prices)
    
    print('switching to cheapest results.....')
    cheap_results = '//a[@data-code = "price"]'
    driver.find_element_by_xpath(cheap_results).click()
    sleep(randint(60,90))
    print('loading more.....')
    
#     load_more()
    
    print('starting second scrape.....')
    df_flights_cheap = page_scrape()
    df_flights_cheap['sort'] = 'cheap'
    sleep(randint(60,80))
    
    print('switching to quickest results.....')
    quick_results = '//a[@data-code = "duration"]'
    driver.find_element_by_xpath(quick_results).click()  
    sleep(randint(60,90))
    print('loading more.....')
    
#     load_more()
    
    print('starting third scrape.....')
    df_flights_fast = page_scrape()
    df_flights_fast['sort'] = 'fast'
    sleep(randint(60,80))
    
    # Save the new frame to an Excel file whose name reflects the cities and dates
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    final_df = pd.concat([df_flights_cheap, df_flights_best, df_flights_fast])
    final_df.to_excel('search_backups//{}_flights_{}-{}_from_{}_to_{}.xlsx'.format(strftime("%Y%m%d-%H%M"),
                                                                                   city_from, city_to, 
                                                                                   date_start, date_end), index=False)
    print('saved df.....')
    
    # You can track how the prediction given by the site compares with reality
    xp_loading = '//div[contains(@id,"advice")]'
    loading = driver.find_element_by_xpath(xp_loading).text
    xp_prediction = '//span[@class="info-text"]'
    prediction = driver.find_element_by_xpath(xp_prediction).text
    print(loading+'\n'+prediction)
    
    # sometimes the loading variable contains this string, which later causes problems with sending the email
    # if that happens, change it to "Not sure"
    weird = '¯\\_(ツ)_/¯'
    if loading == weird:
        loading = 'Not sure'
    
    # The addresses below are placeholders; replace them with your own
    username = 'YOUR_EMAIL@hotmail.com'
    password = 'YOUR PASSWORD'

    server = smtplib.SMTP('smtp.outlook.com', 587)
    server.ehlo()
    server.starttls()
    server.login(username, password)
    msg = ('Subject: Flight Scraper\n\n'
           'Cheapest Flight: {}\nAverage Price: {}\n\nRecommendation: {}\n\nEnd of message'
           .format(matrix_min, matrix_avg, (loading+'\n'+prediction)))
    message = MIMEMultipart()
    message['From'] = 'YOUR_EMAIL@hotmail.com'
    message['to'] = 'RECIPIENT_EMAIL@example.com'
    server.sendmail('YOUR_EMAIL@hotmail.com', 'RECIPIENT_EMAIL@example.com', msg)
    print('sent email.....')

I tested this script using an Outlook account (hotmail.com). I have not tested it with a Gmail account; that email system is quite popular, but there are many possible options. If you use a Hotmail account, all you need to do for everything to work is enter your details in the code.

If you want to understand exactly what particular sections of this function's code do, you can copy them and experiment with them. Experimenting with code is the only way to truly understand it.
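One part of start_kayak that is easy to experiment with in isolation is the price summary. The sketch below is a slightly more defensive variant than the original, not the author's code: the helper name `summarize_prices` is hypothetical, and skipping blank cells and stripping thousands separators are assumptions about what the price table might contain.

```python
def summarize_prices(price_texts):
    """Return (minimum, average) for price strings such as '$512', skipping blanks."""
    prices = [
        int(text.replace("$", "").replace(",", ""))
        for text in price_texts
        if text.strip()
    ]
    if not prices:
        return None, None
    return min(prices), sum(prices) / len(prices)


matrix_min, matrix_avg = summarize_prices(["$512", "", "$1,040", "$698"])
print(matrix_min, matrix_avg)
```

Returning `(None, None)` for an empty table gives the caller a way to handle searches with exact dates, where the table is not shown at all.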

The finished system

Now that everything we talked about is done, we can create a simple loop that calls our functions. The script asks the user for the cities and dates. When you are testing and constantly restarting the script, you are unlikely to want to enter this data by hand every time, so for the duration of testing you can comment out the corresponding lines and uncomment the lines below them, in which the data the script needs is hardcoded.

city_from = input('From which city? ')
city_to = input('Where to? ')
date_start = input('Search around which departure date? Please use YYYY-MM-DD format only ')
date_end = input('Return when? Please use YYYY-MM-DD format only ')

# city_from = 'LIS'
# city_to = 'SIN'
# date_start = '2019-08-21'
# date_end = '2019-09-07'

for n in range(0,5):
    start_kayak(city_from, city_to, date_start, date_end)
    print('iteration {} was complete @ {}'.format(n, strftime("%Y%m%d-%H%M")))
    
    # Wait 4 hours
    sleep(60*60*4)
    print('sleep finished.....')

This is what a test run of the script looks like.
Figure: Test run of the script

Results

If you have made it this far, congratulations! You now have a working web scraper, although I can already see many ways to improve it. For example, it could be integrated with Twilio so that it sends text messages instead of emails. You could use a VPN or something similar to get results from several servers at once. There is also the recurring problem of checks on whether the site's user is human, but that problem can be solved as well. In any case, you now have a base that you can extend if you wish. For example, you could make the script send the Excel file to the user as an email attachment.
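As a starting point for the attachment idea, here is a minimal sketch that builds such a message with the standard library's EmailMessage class. This is not the author's code: the function name `build_report_email`, the example addresses and the placeholder bytes are all assumptions, and the message is only built, not sent.

```python
from email.message import EmailMessage


def build_report_email(sender, recipient, summary, excel_bytes, filename):
    """Build (but do not send) a message carrying the Excel report as an attachment."""
    message = EmailMessage()
    message["Subject"] = "Flight Scraper"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(summary)
    message.add_attachment(
        excel_bytes,
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        filename=filename,
    )
    return message


report = build_report_email(
    "sender@example.com",
    "recipient@example.com",
    "Cheapest Flight: 512\nAverage Price: 750.0",
    b"placeholder bytes standing in for the .xlsx file",
    "flights.xlsx",
)
print(report["Subject"], [part.get_filename() for part in report.iter_attachments()])
# Sending would then be one call on the logged-in SMTP session: server.send_message(report)
```

In the real script the bytes would come from reading the saved .xlsx file in binary mode.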


Source: habr.com