Which language to choose for working with data: R or Python? Both! Migrating from pandas to tidyverse and data.table and back

A web search for "R or Python" turns up millions of articles and miles of discussion about which of the two is better, faster and more convenient for working with data. Unfortunately, all of those articles and arguments are not particularly useful.

The goal of this article is to compare the basic data processing techniques in the most popular packages of both languages, and to help readers quickly pick up whichever one they do not know yet. If you write in Python, you will see how to do the same thing in R, and vice versa.

Along the way we will look at the syntax of the most popular R packages: those that make up the tidyverse library, plus the data.table package. We will compare their syntax with pandas, the most popular data analysis package in Python.

We will walk step by step through the whole data analysis path, from loading the data to running analytical window functions, in both Python and R.

Contents

This article can also serve as a cheat sheet if you have forgotten how to perform some data processing operation in one of the packages covered.

  1. Main syntax differences between R and Python
    1.1. Accessing package functions
    1.2. Assignment
    1.3. Indexing
    1.4. Methods and OOP
    1.5. Pipelines
    1.6. Data structures
  2. A few words about the packages we will use
    2.1. tidyverse
    2.2. data.table
    2.3. pandas
  3. Installing packages
  4. Loading data
  5. Creating dataframes
  6. Selecting the columns you need
  7. Filtering rows
  8. Grouping and aggregation
  9. Vertical union of tables (UNION)
  10. Horizontal join of tables (JOIN)
  11. Basic window functions and calculated columns
  12. Correspondence table between data processing methods in R and Python
  13. Conclusion
  14. A short poll on which package you use

If you are interested in data analysis, you may find my Telegram and YouTube channels useful. Most of the content is devoted to the R language.

Main syntax differences between R and Python

To make the move from Python to R (or the other way round) easier, here are a few key points to keep in mind.

Accessing package functions

Once a package has been loaded into R, you do not need to specify the package name to call its functions. Qualifying calls with the package name is not customary in R in most cases, but it is allowed. You do not even have to load a package if you only need one of its functions: just call it by giving both the package name and the function name. The separator between package and function names in R is a double colon: package_name::function_name().

In Python, by contrast, the classic style is to call a package's functions with the package name stated explicitly. When a package is imported it is usually given a short alias; pandas, for example, is normally imported under the alias pd. A package function is accessed through a dot: package_name.function_name().
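
As a small illustration of the Python side, here is a minimal sketch that uses only the standard library. The modules math and statistics are chosen purely for the example and are not part of the toolset discussed in this article; the point is the three common ways of reaching a package's functions.

```python
# Three ways to reach a function that lives in a package (module).
import math                    # full name: math.sqrt()
import statistics as st        # alias, in the same spirit as "import pandas as pd"
from math import floor         # import a single name, no prefix needed afterwards

root = math.sqrt(16)           # package_name.function_name()
avg = st.mean([1, 2, 3, 4])    # alias.function_name()
whole = floor(2.7)             # bare function name

print(root, avg, whole)
```

The alias form is the one you will see throughout the pandas examples in this article.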

Assignment

In R it is customary to assign a value to an object with an arrow: obj_name <- value. A single equals sign is also allowed, but in R it is mainly used to pass values to function arguments.

In Python, assignment is done exclusively with a single equals sign: obj_name = value.

Indexing

There are fairly significant differences here as well. In R, indexing starts at one, and a range includes every element specified, including the last one.

In Python, indexing starts at zero and the selected range does not include the last element specified. So x[i:j] in Python will not include element j.

Negative indexing differs too. In R, x[-1] returns all elements of the vector except the first. In Python, the same notation returns only the last element.
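
The Python half of these rules can be checked with a plain list. This is only a sketch; the list x and the variable names are made up for the illustration.

```python
x = [10, 20, 30, 40, 50]

first = x[0]            # indexing starts at zero
middle = x[1:3]         # elements at positions 1 and 2; position 3 is NOT included
last = x[-1]            # a negative index counts from the end
all_but_first = x[1:]   # closest Python equivalent of R's x[-1]

print(first, middle, last, all_but_first)
```

In R, the same vector would be indexed as x[1] for the first element and x[2:3] for the second and third.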

Methods and OOP

R implements OOP in its own way; I wrote about this in the article "OOP in the R language (part 1): S3 classes". In general, R is a functional language, and everything in it is built on functions. That is why, for Excel users for example, moving to tidyverse will be easier than moving to pandas. Although this may just be my subjective opinion.

In short, objects in R do not have methods (at least when we talk about S3 classes; other OOP implementations exist but are much less common). There are only generic functions, which process an object differently depending on its class.
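
For readers coming from Python, the closest standard-library analogue of an S3 generic function is probably functools.singledispatch. The sketch below is an analogy only, not how R implements S3; the function name describe is hypothetical.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    # Fallback used when no more specific implementation is registered,
    # similar in spirit to a *.default method in S3.
    return "generic object"


@describe.register
def _(obj: int):
    # Implementation chosen when the argument is an int.
    return "an integer"


@describe.register
def _(obj: list):
    # Implementation chosen when the argument is a list.
    return f"a list of {len(obj)} items"


print(describe(1), describe([1, 2]), describe(1.5))
```

As with S3, the function is called the same way every time, and the implementation is picked from the class of the first argument.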

Pipelines

This name may not be entirely accurate for pandas, but I will try to explain the idea.

To avoid storing intermediate calculations and creating unnecessary objects in the working environment, you can use a kind of pipeline. That is, you pass the result of one function straight to the next one and never save the intermediate results.

Take the following code example, in which we store intermediate calculations in separate objects:

temp_object  <- func1()
temp_object2 <- func2(temp_object)
obj          <- func3(temp_object2)

We performed 3 operations in sequence, and the result of each was saved in a separate object. In practice, though, we do not need these intermediate objects.

Or, even worse, but more familiar to Excel users:

obj  <- func3(func2(func1()))

In this case we did not save any intermediate results, but code with nested function calls is very inconvenient to read.

We will look at several approaches to data processing in R; they perform similar operations in different ways.

In the tidyverse library, pipelines are implemented with the %>% operator.

obj <- func1() %>% 
            func2() %>%
            func3()

Here we take the result of func1() and pass it as the first argument to func2(), then pass the result of that call as the first argument to func3(). Finally, the result of the whole chain is written to the object obj.

All of the above is illustrated better than words by a meme (image not preserved).

In data.table, chaining works in a similar way.

newDT <- DT[where, select|update|do, by][where, select|update|do, by][where, select|update|do, by]

In each pair of square brackets you can use the result of the previous operation.

In pandas, such operations are separated by a dot.

obj = df.fun1().fun2().fun3()

That is, we take our table df and call its method fun1(), then apply the method fun2() to the result, and then fun3(). The final result is saved in the object obj.
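
To make the placeholder names fun1(), fun2() and fun3() concrete, here is a minimal sketch with real pandas methods. The tiny table and its column names are invented for the illustration and are not the article's datasets.

```python
import pandas as pd

# Hypothetical demo table, not one of the article's datasets.
df = pd.DataFrame({"source": ["google", "bing", "google"],
                   "sessions": [12, 5, 30]})

# Each method returns a new DataFrame, so the calls can be chained with dots.
obj = (
    df.sort_values("sessions", ascending=False)  # order rows by sessions
      .head(2)                                   # keep the two largest rows
      .reset_index(drop=True)                    # renumber rows from zero
)

print(obj)
```

Wrapping the chain in parentheses is one way to split it across lines without any continuation characters.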

Data structures

Data structures in R and Python are similar, but they go by different names.

| Description | Name in R | Name in Python/pandas |
| --- | --- | --- |
| Table structure | data.frame, data.table, tibble | DataFrame |
| One-dimensional list of values | Vector | Series in pandas, or list in pure Python |
| Multi-level non-tabular structure | List | Dictionary (dict) |
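
The Python column of this correspondence can be sketched in a few lines. All names and values below are made up for the illustration.

```python
import pandas as pd

values = [1, 2, 3]                               # plain list (R: vector)
series = pd.Series(values, name="id")            # pandas Series (R: vector)
nested = {"name": "demo", "tags": ["a", "b"]}    # dict (R: list)
table = pd.DataFrame({"id": values,              # DataFrame (R: data.frame,
                      "gender": ["f", "m", "f"]})  # data.table or tibble)

print(type(series).__name__, type(table).__name__, nested["tags"])
```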

We will look at some other features and syntax differences further on.

A few words about the packages we will use

First, a little about the packages you will meet in this article.

tidyverse

Official website: tidyverse.org

The tidyverse library was written by Hadley Wickham, a senior research scientist at RStudio. tidyverse consists of an impressive set of packages that simplify data processing; 5 of them are among the top 10 downloads from the CRAN repository.

The core of the library consists of the following packages: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats. Each of these packages is designed to solve a specific problem. For example, dplyr was created for data manipulation, tidyr brings data into a tidy form, stringr simplifies working with strings, and ggplot2 is one of the most popular data visualization tools.

The advantage of tidyverse is its simple, easy-to-read syntax, which in many ways resembles the SQL query language.

data.table

Official website: r-datatable.com

The author of data.table is Matt Dowle of H2O.ai.

The first release of the library took place in 2006.

The package syntax is not as convenient as in tidyverse and is closer to classic dataframes in R, but it offers significantly extended functionality.

All table manipulations in this package are written inside square brackets, and if you translate the data.table syntax into SQL you get something like this: data.table[ WHERE, SELECT, GROUP BY ]

The strength of this package is the speed with which it processes large amounts of data.

pandas

Official website: pandas.pydata.org

The name of the library comes from the econometric term "panel data", which is used to describe multidimensional structured sets of information.

The author of pandas is the American Wes McKinney.

When it comes to data analysis in Python, pandas has no equal. It is a very versatile, high-level package that lets you perform any manipulation on data, from loading it from any source through to visualization.

Installing additional packages

The packages discussed in this article are not included in the base distributions of R and Python. There is one small caveat: if you installed the Anaconda distribution, you do not need to install pandas separately.

Installing packages in R

If you have opened the RStudio development environment at least once, you probably already know how to install a package in R. To install packages, use the standard install.packages() command, running it directly in R itself.

# install the packages
install.packages("vroom")
install.packages("readr")
install.packages("dplyr")
install.packages("data.table")

After installation the packages need to be attached, which in most cases is done with the library() command.

# attach (import) the packages into the working environment
library(vroom)
library(readr)
library(dplyr)
library(data.table)

Installing packages in Python

If you have plain Python installed, you need to install pandas manually. Open a command prompt or a terminal, depending on your operating system, and enter the following command.

pip install pandas

Then go back to Python and import the installed package with the import command.

import pandas as pd

Loading data

Getting the data is one of the most important steps in data analysis. Both Python and R give you ample means of obtaining data from any source: local files, files on the internet, websites, and all kinds of databases.

Throughout the article we will use several datasets:

  1. Two exports from Google Analytics.
  2. The Titanic passenger dataset.

All the data is on my GitHub in the form of csv and tsv files, and that is where we will request it from.

Loading data into R: tidyverse, vroom, readr

For loading data, the tidyverse library has two packages: vroom and readr. vroom is the more modern one, but the packages may be merged in the future.

A quote from the official vroom documentation:

vroom vs readr
What does the release of vroom mean for readr? For now we plan to let the two packages evolve separately, but likely we will unite the packages in the future. One disadvantage to vroom's lazy reading is certain data problems can't be reported up front, so how best to unify them requires some thought.

In this article we will look at both data loading packages:

Loading data into R: the vroom package

# install.packages("vroom")
library(vroom)

# Read the data
## vroom
ga_nov  <- vroom("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_nowember.csv")
ga_dec  <- vroom("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_december.csv")
titanic <- vroom("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/titanic.csv")

Loading data into R: readr

# install.packages("readr")
library(readr)

# Read the data
## readr
ga_nov  <- read_tsv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_nowember.csv")
ga_dec  <- read_tsv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_december.csv")
titanic <- read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/titanic.csv")

In the vroom package, loading is done by the function of the same name, vroom(), regardless of whether the data is in csv or tsv format. In the readr package we use a different function for each format: read_tsv() and read_csv().

Loading data into R: data.table

data.table has the fread() function for loading data.

Loading data into R: the data.table package

# install.packages("data.table")
library(data.table)

## data.table
ga_nov  <- fread("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_nowember.csv")
ga_dec  <- fread("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_december.csv")
titanic <- fread("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/titanic.csv")

Loading data in Python: pandas

Compared with the R packages, the syntax of pandas here is closest to readr, because pandas can request data from anywhere and has a whole family of read_*() functions.

  • read_csv()
  • read_excel()
  • read_sql()
  • read_json()
  • read_html()

There are many more functions for reading data from various formats. For our purposes, though, read_table() or read_csv() with the sep argument to specify the column separator is enough.

Loading data in Python: pandas

import pandas as pd

ga_nov  = pd.read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/russian_text_in_r/ga_nowember.csv", sep = "\t")
ga_dec  = pd.read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/russian_text_in_r/ga_december.csv", sep = "\t")
titanic = pd.read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/russian_text_in_r/titanic.csv")
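
If you want to try the sep argument without any network access, a small sketch like the following reads tab-separated text from memory. The sample rows and column names are invented for the illustration and only loosely mimic the Google Analytics exports.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; not the article's real data.
raw = (
    "date\tsource\tsessions\n"
    "2019-11-01\tgoogle\t12\n"
    "2019-11-02\tbing\t5\n"
)

# sep="\t" tells read_csv that columns are separated by tab characters.
demo = pd.read_csv(io.StringIO(raw), sep="\t")

print(demo)
```

read_csv accepts a file path, a URL or any file-like object, which is why the same call works for the GitHub links above.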

Creating dataframes

The titanic table that we loaded has a Sex field, which stores the passenger's gender identifier.

For a more convenient presentation of the data by passenger gender, though, it is better to use the gender name rather than the code.

To do this, we will create a small lookup table with just 2 columns (code and gender name) and, accordingly, 2 rows.

Creating a dataframe in R: tidyverse, dplyr

In the code example below, we create the required dataframe with the tibble() function.

Creating a dataframe in R: dplyr

## dplyr
### create the lookup table
gender <- tibble(id = c(1, 2),
                 gender = c("female", "male"))

Creating a dataframe in R: data.table

## data.table
### create the lookup table
gender <- data.table(id = c(1, 2),
                    gender = c("female", "male"))

Creating a dataframe in Python: pandas

In pandas, a dataframe is created in several steps: first we create a dictionary, and then we convert the dictionary into a dataframe.

Creating a dataframe in Python: pandas

# create the data for the dataframe as a dictionary
gender_dict = {'id': [1, 2],
               'gender': ["female", "male"]}
# convert the dictionary into a dataframe
gender = pd.DataFrame.from_dict(gender_dict)

Selecting columns

The tables you work with may contain dozens or even hundreds of columns. As a rule, though, an analysis does not need every column available in the source table.

So one of the first operations you will perform on the source table is to strip out unneeded information and free the memory it occupies.

Selecting columns in R: tidyverse, dplyr

The dplyr syntax is very similar to the SQL query language; if you know SQL, you will master this package quickly.

To select columns, use the select() function.

Below are code examples that select columns in the following ways:

  • By listing the names of the required columns
  • By matching column names against regular expressions
  • By data type or any other property of the data contained in the column

Selecting columns in R: dplyr

# Selecting the required columns
## dplyr
### select by column name
select(ga_nov, date, source, sessions)
### exclude by column name
select(ga_nov, -medium, -bounces)
### select by regular expression: columns whose names end in s
select(ga_nov, matches("s$"))
### select by condition: integer columns only
select_if(ga_nov, is.integer)

Selecting columns in R: data.table

In data.table the same operations are done slightly differently. At the beginning of the article I described which arguments go inside the square brackets in data.table.

DT[i,j,by]

Where:
i - where, i.e. filtering rows
j - select|update|do, i.e. selecting columns and transforming them
by - grouping the data

Selecting columns in R: data.table

## data.table
### select by column name
ga_nov[ , .(date, source, sessions) ]
### exclude by column name
ga_nov[ , .SD, .SDcols = ! names(ga_nov) %like% "medium|bounces" ]
### select by regular expression
ga_nov[, .SD, .SDcols = patterns("s$")]

The .SD variable gives you access to all columns, and .SDcols filters them down to the ones you need, using regular expressions or any other function that filters column names.

Selecting columns in Python: pandas

To select columns by name in pandas, it is enough to pass a list of their names. To select or exclude columns by name or by regular expression, use the drop() and filter() functions together with the axis=1 argument, which indicates that columns rather than rows should be processed.

To select fields by data type, use the select_dtypes() function, and pass to its include or exclude argument a list of the data types that the selected fields should (or should not) have.

Selecting columns in Python: pandas

# Select fields by name
ga_nov[['date', 'source', 'sessions']]
# Exclude by name
ga_nov.drop(['medium', 'bounces'], axis=1)
# Select by regular expression
ga_nov.filter(regex="s$", axis=1)
# Select numeric fields
ga_nov.select_dtypes(include=['number'])
# Select text fields
ga_nov.select_dtypes(include=['object'])
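
The text mentions both include and exclude, while the listing above only uses include. The sketch below shows the exclude side on a small made-up table (the column names mimic the Google Analytics export but the data is invented), together with drop(columns=...), which is an alternative spelling of drop(..., axis=1).

```python
import pandas as pd

# Hypothetical demo table; not the article's real data.
demo = pd.DataFrame({"date": ["2019-11-01", "2019-11-02"],
                     "source": ["google", "bing"],
                     "sessions": [12, 5],
                     "bounces": [3, 1]})

dropped = demo.drop(columns=["bounces"])             # same as drop(["bounces"], axis=1)
ends_s = demo.filter(regex="s$", axis=1)             # column names ending in "s"
non_numeric = demo.select_dtypes(exclude=["number"])  # everything except numeric columns

print(list(dropped.columns), list(ends_s.columns), list(non_numeric.columns))
```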

Filtering rows

For example, the source table may contain several years of data while you only need to analyze the last month. Again, the extra rows will slow down processing and clog up the computer's memory.

Filtering rows in R: tidyverse, dplyr

In dplyr, rows are filtered with the filter() function. It takes a dataframe as its first argument, followed by the filtering conditions.

When writing logical expressions to filter a table, refer to columns by name, without quotes and without the table name.

When filtering on several logical expressions, use the following operators:

  • & or a comma - logical AND
  • | - logical OR

Filtering rows in R: dplyr

# filtering rows
## dplyr
### filter rows by a single condition
filter(ga_nov, source == "google")
### filter by two conditions joined with a logical AND
filter(ga_nov, source == "google" & sessions >= 10)
### filter by two conditions joined with a logical OR
filter(ga_nov, source == "google" | sessions >= 10)

Filtering rows in R: data.table

As I wrote above, in data.table the data transformation syntax lives inside square brackets.

DT[i,j,by]

Where:
i - where, i.e. filtering rows
j - select|update|do, i.e. selecting columns and transforming them
by - grouping the data

Rows are filtered with the i argument, which takes the first position inside the square brackets.

Columns are referenced in logical expressions without quotes and without the table name.

Logical expressions are combined in the same way as in dplyr, with the & and | operators.

Filtering rows in R: data.table

## data.table
### filter rows by a single condition
ga_nov[source == "google"]
### filter by two conditions joined with a logical AND
ga_nov[source == "google" & sessions >= 10]
### filter by two conditions joined with a logical OR
ga_nov[source == "google" | sessions >= 10]

Filtering rows in Python: pandas

Filtering rows in pandas is similar to filtering in data.table and is also done inside square brackets.

Here, columns must be referenced through the name of the dataframe; the column name can then be given either in quotes inside square brackets (for example df['col_name']) or without quotes after a dot (for example df.col_name).

If you need to filter a dataframe on several conditions, each condition must be wrapped in parentheses. Logical conditions are combined with the & and | operators.

Filtering rows in Python: pandas

# Filtering table rows
### filter rows by a single condition
ga_nov[ ga_nov['source'] == "google" ]
### filter by two conditions joined with a logical AND
ga_nov[(ga_nov['source'] == "google") & (ga_nov['sessions'] >= 10)]
### filter by two conditions joined with a logical OR
ga_nov[(ga_nov['source'] == "google") | (ga_nov['sessions'] >= 10)]
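
The parentheses are needed because & and | bind more tightly than == and >= in Python. The sketch below repeats the same filters on a tiny invented table and also shows DataFrame.query(), an alternative not used in this article that lets you write the condition as a string with bare column names, closer to the dplyr and data.table style.

```python
import pandas as pd

# Hypothetical demo table; not the article's real data.
demo = pd.DataFrame({"source": ["google", "bing", "google"],
                     "sessions": [12, 5, 3]})

# Bracket notation and dot notation refer to the same columns.
both = demo[(demo["source"] == "google") & (demo["sessions"] >= 10)]
either = demo[(demo.source == "google") | (demo.sessions >= 10)]

# query() evaluates the expression against the column names.
same = demo.query('source == "google" and sessions >= 10')

print(both.index.tolist(), either.index.tolist(), same.index.tolist())
```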

Grouping and aggregating data

Grouping and aggregation is one of the most frequently used operations in data analysis.

The syntax for these operations differs across all the packages we are looking at.

As an example we will take the titanic dataframe and calculate the number of tickets and the average ticket price for each cabin class.

Grouping and aggregating data in R: tidyverse, dplyr

In dplyr, grouping is done with the group_by() function and aggregation with summarise(). In fact dplyr has a whole family of summarise_*() functions, but the goal of this article is to compare basic syntax, so we will not venture into that jungle.

Basic aggregation functions:

  • sum() - sum
  • min() / max() - minimum and maximum value
  • mean() - mean
  • median() - median
  • length() - count

Grouping and aggregation in R: dplyr

## dplyr
### grouping and aggregating rows
group_by(titanic, Pclass) %>%
  summarise(passangers = length(PassengerId),
            avg_price  = mean(Fare))

We passed the titanic table to group_by() as its first argument and then named the Pclass field, by which the table is grouped. The result of this operation was passed by the %>% operator as the first argument to summarise(), where we added 2 fields: passangers and avg_price. In the first, length() counts the number of tickets; in the second, mean() gives the average ticket price.

Grouping and aggregating data in R: data.table

In data.table, aggregation is done in the j argument, which takes the second position inside the square brackets, and grouping in by or keyby, which take the third position.

The list of aggregation functions is the same as the one given for dplyr, because these are functions from base R.

Grouping and aggregation in R: data.table

## data.table
### grouping and aggregating rows
titanic[, .(passangers = length(PassengerId),
            avg_price  = mean(Fare)),
        by = Pclass]

Grouping and aggregating data in Python: pandas

Grouping in pandas is similar to dplyr, but aggregation resembles neither dplyr nor data.table.

To group, use the groupby() method, passing it a list of the columns by which the dataframe should be grouped.

For aggregation you can use the agg() method, which accepts a dictionary. The dictionary keys are the columns to aggregate, and the values are the names of the aggregation functions to apply.

Aggregation functions:

  • sum() - sum
  • min() / max() - minimum and maximum value
  • mean() - mean
  • median() - median
  • count() - count

The reset_index() function in the example below resets the nested indexes that pandas creates by default after aggregating data.

The \ symbol lets you continue a statement on the next line.

Grouping and aggregation in Python: pandas

# grouping and aggregating data
titanic.groupby(["Pclass"]).\
    agg({'PassengerId': 'count', 'Fare': 'mean'}).\
        reset_index()
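
The same grouping can also be written with named aggregation, which lets you choose the output column names, as summarise() does in dplyr. The sketch below is illustrative only: the miniature titanic frame and its values are made up, and parentheses are used instead of \ for line continuation.

```python
import pandas as pd

# Hypothetical miniature of the titanic table; the values are made up.
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Pclass":      [1, 1, 2, 3, 3],
    "Fare":        [80.0, 60.0, 20.0, 8.0, 10.0],
})

# Named aggregation: output_column=(input_column, aggregation_function).
summary = (
    titanic
    .groupby("Pclass")
    .agg(passengers=("PassengerId", "count"),
         avg_price=("Fare", "mean"))
    .reset_index()  # turn the Pclass index back into an ordinary column
)

print(summary)
```

Wrapping the chain in parentheses is a matter of taste; it avoids a syntax error when trailing whitespace follows a backslash.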

Vertical join of tables

An operation in which you stack two or more tables that share the same structure. The data we loaded contains the tables ga_nov and ga_dec. These tables are identical in structure, i.e. they have the same columns and the same data types in those columns.


These are exports from Google Analytics for November and December; in this section we will combine them into a single table.

Vertical join of tables in R: tidyverse, dplyr

In dplyr you can combine 2 tables into one with the bind_rows() function, passing the tables as its arguments.

Vertical join of tables in R: dplyr

# Vertical join of tables
## dplyr
bind_rows(ga_nov, ga_dec)

Vertical join of tables in R: data.table

Nothing complicated here either: we use rbind().

Vertical join of tables in R: data.table

## data.table
rbind(ga_nov, ga_dec)

Vertical join of tables in Python: pandas

In pandas, tables are stacked with the concat() function, which takes a list of data frames to combine.

Vertical join of tables in Python: pandas

# vertical join of tables
pd.concat([ga_nov, ga_dec])
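
One detail worth knowing: by default concat() keeps each frame's original row labels, so the combined index can contain duplicates. The sketch below illustrates this with two made-up stand-ins for ga_nov and ga_dec (the column names and values are assumptions, not the real export) and shows the ignore_index parameter, which renumbers the rows.

```python
import pandas as pd

# Hypothetical stand-ins for the ga_nov and ga_dec exports; values are made up.
ga_nov = pd.DataFrame({"date": ["2019-11-01", "2019-11-02"], "sessions": [120, 95]})
ga_dec = pd.DataFrame({"date": ["2019-12-01", "2019-12-02"], "sessions": [130, 110]})

# Default behaviour: the row labels of each frame are kept as they were.
stacked = pd.concat([ga_nov, ga_dec])

# ignore_index=True builds a fresh 0..n-1 index for the combined table.
renumbered = pd.concat([ga_nov, ga_dec], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```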

Horizontal join of tables

An operation in which columns from a second table are added to the first table by key. It is often used to enrich a fact table (for example, a table with sales data) with reference data (for example, product cost).


There are several types of join; the most common are inner, left, right and full (outer).

The titanic table we loaded earlier has a Sex column, which holds the passenger's gender code:

1 - female
2 - male

We also created a lookup table, gender. To present the passengers' gender more conveniently, we need to add the gender name from the gender lookup table to the titanic table.

Horizontal join of tables in R: tidyverse, dplyr

dplyr has a whole family of functions for horizontal joins:

  • inner_join()
  • left_join()
  • right_join()
  • full_join()
  • semi_join()
  • nest_join()
  • anti_join()

The one I use most often in practice is left_join().

As their first two arguments, the functions listed above take the two tables to join; the third argument, by, specifies the columns to join on.

Horizontal join of tables in R: dplyr

# join the tables
left_join(titanic, gender,
          by = c("Sex" = "id"))

Horizontal join of tables in R: data.table

In data.table, tables are joined by key with the merge() function.

Arguments of the merge() function in data.table

  • x, y - the tables to join
  • by - the key column for the join, if it has the same name in both tables
  • by.x, by.y - the names of the columns to join on, if they are named differently in the two tables
  • all, all.x, all.y - the join type: all returns all rows from both tables, all.x corresponds to a LEFT JOIN (keeps all rows of the first table), all.y corresponds to a RIGHT JOIN (keeps all rows of the second table).

Horizontal join of tables in R: data.table

# join the tables
merge(titanic, gender, by.x = "Sex", by.y = "id", all.x = TRUE)

Horizontal join of tables in Python: pandas

As in data.table, tables in pandas are joined with the merge() function.

Arguments of the merge() function in pandas

  • how - the join type: left, right, outer, inner
  • on - the key column, if it has the same name in both tables
  • left_on, right_on - the names of the key columns, if they are named differently in the two tables

Horizontal join of tables in Python: pandas

# join by key
titanic.merge(gender, how = "left", left_on = "Sex", right_on = "id")
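
To see how the how argument changes the result, here is a small self-contained sketch. The two frames are made up, the lookup column name gender is an assumption, and code 3 is deliberately missing from the lookup table to show what happens to unmatched rows.

```python
import pandas as pd

# Hypothetical miniature tables; Sex code 3 has no match in the lookup.
titanic = pd.DataFrame({"PassengerId": [1, 2, 3], "Sex": [2, 1, 3]})
gender = pd.DataFrame({"id": [1, 2], "gender": ["female", "male"]})

# Left join: every row of titanic is kept; unmatched rows get NaN.
left = titanic.merge(gender, how="left", left_on="Sex", right_on="id")

# Inner join: only rows whose key exists in both tables are kept.
inner = titanic.merge(gender, how="inner", left_on="Sex", right_on="id")

print(left)
print(inner)
```

With a left join the row count of the fact table is preserved as long as the keys in the lookup table are unique, which is usually what you want when enriching data.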

Basic window functions and calculated columns

Window functions are similar in meaning to aggregation functions and are also used frequently in data analysis. Unlike aggregation functions, however, window functions do not change the number of rows in the resulting data frame.


In essence, a window function splits the incoming data frame into parts by some criterion, i.e. by the value of one field or several fields, and performs arithmetic within each window. The result is returned on every row, so the total number of rows in the table does not change.

As an example, take the titanic table. We can calculate what share of its cabin class's total fare each ticket accounts for.

To do this, each row needs the total ticket cost for the cabin class that the row's ticket belongs to; we then divide each ticket's cost by the total cost of all tickets in the same cabin class.
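
Before turning to the packages, the calculation can be sketched in plain Python to make the two steps explicit. The tickets list and its values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical tickets as (cabin class, fare) pairs; the values are made up.
tickets = [(1, 80.0), (1, 20.0), (2, 30.0), (3, 5.0), (3, 15.0)]

# Step 1: total fare per cabin class (one value per window).
class_total = defaultdict(float)
for pclass, fare in tickets:
    class_total[pclass] += fare

# Step 2: every row stays in place and gets its share of its class total.
shares = [fare / class_total[pclass] for pclass, fare in tickets]

print(shares)
```

The output has exactly as many elements as the input, which is the defining property of a window function as opposed to an aggregation.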

Window functions in R: tidyverse, dplyr

To add new columns without collapsing rows, dplyr provides the mutate() function.

You can solve the problem described above by grouping the data by the Pclass field and summing the Fare field into a new column. Then ungroup the table and divide the values of the Fare field by the result of the previous step.

Window functions in R: dplyr

group_by(titanic, Pclass) %>%
  mutate(Pclass_cost = sum(Fare)) %>%
  ungroup() %>%
  mutate(ticket_fare_rate = Fare / Pclass_cost)

Window functions in R: data.table

The algorithm is the same as in dplyr: split the table into windows by the Pclass field, write the group total that corresponds to each row into a new column, and add a column that holds each ticket's share of the cost within its group.

To add new columns, data.table provides the := operator. Below is an example of solving the problem with the data.table package.

Window functions in R: data.table

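# Pclass_cost does not exist yet while j is evaluated, so the share is computed from sum(Fare) directly
titanic[, c("Pclass_cost", "ticket_fare_rate") := .(sum(Fare), Fare / sum(Fare)),
        by = Pclass]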

Window functions in Python: pandas

One way to add a new column in pandas is the assign() function. To sum the ticket cost by cabin class without collapsing rows, we use the transform() function.

Below is an example solution that adds the same 2 columns to the titanic table.

Window functions in Python: pandas

titanic.assign(Pclass_cost      = titanic.groupby('Pclass').Fare.transform('sum'),
               ticket_fare_rate = lambda x: x['Fare'] / x['Pclass_cost'])
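
The following self-contained sketch runs the same pattern on a made-up miniature frame (the values are assumptions) and shows two properties that are easy to miss: the row count does not change, and assign() returns a new frame rather than modifying titanic in place.

```python
import pandas as pd

# Hypothetical miniature of the titanic table; the values are made up.
titanic = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3],
    "Fare":   [80.0, 20.0, 30.0, 5.0, 15.0],
})

result = titanic.assign(
    # transform() returns one value per original row, aligned by index.
    Pclass_cost=titanic.groupby("Pclass")["Fare"].transform("sum"),
    # The callable receives the frame that already contains Pclass_cost.
    ticket_fare_rate=lambda df: df["Fare"] / df["Pclass_cost"],
)

print(result)
```

Because assign() does not change the original frame, store the result in a variable (or reassign it to titanic) if you need the new columns later.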

Function and method correspondence table

Below is a table of corresponding methods for performing various data operations in the packages we have covered.

Description | tidyverse | data.table | pandas
Loading data | vroom() / readr::read_csv() / readr::read_tsv() | fread() | read_csv()
Creating data frames | tibble() | data.table() | dict() + from_dict()
Selecting columns | select() | argument j, second position in the square brackets | pass the list of required columns in square brackets / drop() / filter() / select_dtypes()
Filtering rows | filter() | argument i, first position in the square brackets | list the filter conditions in square brackets / filter()
Grouping and aggregation | group_by() + summarise() | arguments j + by | groupby() + agg()
Vertical join of tables (UNION) | bind_rows() | rbind() | concat()
Horizontal join of tables (JOIN) | left_join() / *_join() | merge() | merge()
Basic window functions and adding calculated columns | group_by() + mutate() | argument j with the := operator + argument by | transform() + assign()

Conclusion

The implementations described in this article may not be the most optimal ones, so I would be glad if you corrected my mistakes in the comments, or simply supplemented the article with other techniques for working with data in R / Python.

As I wrote above, the aim of the article was not to impose an opinion about which language is better, but to make it easier to learn both languages or, if necessary, to migrate between them.

If you liked the article, I would be glad to have new subscribers on my YouTube and Telegram channels.

Poll

Which of the following packages do you use in your work?

In the comments you can explain the reason for your choice.


Which data processing package do you use? (you can choose several options)

  • tidyverse - 45.2% (19 votes)

  • data.table - 33.3% (14 votes)

  • pandas - 54.8% (23 votes)

42 users voted. 9 users abstained.

Source: www.habr.com
