Working with dates in the R language (basic features, as well as the lubridate and timeperiodsR packages)

In any programming language, getting the current date is an operation on par with "Hello world!". The R language is no exception.

In this article, we will look at how dates work in base R, and also consider two useful packages that extend its capabilities when working with dates:

  • lubridate - a package that lets you perform arithmetic on dates;
  • timeperiodsR - a package for working with time intervals and their components.

Content

If you are interested in data analysis, and the R language in particular, you may like my Telegram and YouTube channels. Most of their content is devoted to R.

  1. Working with dates in basic R syntax
    1.1. Convert text to date
    1.2. Extracting date components in basic R
  2. Working with dates with the lubridate package
    2.1. Convert text to date with lubridate
    2.2. Extracting date components with the lubridate package
    2.3. Arithmetic operations with dates
  3. Simplified work with periods, package timeperiodsR
    3.1. Time intervals in timeperiodsR
    3.2. Filtering a vector of dates with timeperiodsR
  4. Conclusion

Working with dates in basic R syntax

Convert text to date

Base R has a set of functions for working with dates. The drawback of the base syntax is that the naming of functions and arguments is inconsistent (letter case in particular) and follows almost no logic. However, you need to know the base functions of the language, so we will start with them.

Most often, when you load data into R from CSV files or other sources, the date arrives as text. To convert this text to the correct data type, use the as.Date() function.

# create a character vector of dates
my_dates <- c("2019-09-01", "2019-09-10", "2019-09-23")

# check the data type
class(my_dates)

#> [1] "character"

# convert the text to dates
my_dates <- as.Date(my_dates)

# check the data type again
class(my_dates)

#> [1] "Date"

By default, as.Date() accepts dates in two formats: YYYY-MM-DD or YYYY/MM/DD.
If the dates in your dataset are in some other format, describe that format with the format argument.

as.Date("September 26, 2019", format = "%B %d, %Y")

The format argument takes a string of conversion specifications, each denoting a date component and how it is written. The most commonly used ones are shown in the table below:

Format  Description
%d      Day of the month
%a      Abbreviated name of the day of the week
%A      Full name of the day of the week
%w      Number of the day of the week (0-6, where 0 is Sunday)
%m      Two-digit month (01-12)
%b      Abbreviated month name (Apr, Mar, ...)
%B      Full month name
%y      Two-digit year
%Y      Four-digit year
%j      Day of the year (001-366)
%U      Week of the year (00-53), week starts on Sunday
%W      Week of the year (00-53), week starts on Monday

Accordingly, "September 26, 2019" consists of the full month name, the day and the year. You can describe this date format with the specifications "%B %d, %Y".

Where:

  • %B - full name of the month
  • %d - day of the month
  • %Y - four-digit year

When describing a date format, it is important to include every extra character from your string, such as dashes, commas, periods, spaces, and so on. In my example, "September 26, 2019", there is a comma after the day number, so the format description must contain a comma in the same place: "%B %d, %Y".
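As a brief illustration of the point about separators, the sketch below parses the same day written in three layouts. The input strings are made up for this example, and only numeric format codes are used so that the result does not depend on the locale.

```r
# Three hypothetical spellings of the same day, September 26, 2019
d1 <- as.Date("26.09.2019", format = "%d.%m.%Y")  # dots as separators
d2 <- as.Date("09/26/19",   format = "%m/%d/%y")  # slashes, two-digit year
d3 <- as.Date("2019, 269",  format = "%Y, %j")    # year, comma, day of the year

# All three describe the same Date value
print(c(d1, d2, d3))

# A format that does not match the string yields NA rather than an error
d_bad <- as.Date("26.09.2019", format = "%d/%m/%Y")
print(d_bad)
```

Note that the separators in each format string mirror the separators in the input exactly, which is why the last call returns NA.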

Sometimes you receive a date that not only differs from the standard formats (YYYY-MM-DD or YYYY/MM/DD), but is also written in a language other than the default of your operating system. For example, you loaded data where the date looks like this: "Декабрь 15, 2019 г." (Russian for "December 15, 2019"). Before converting this string to a date, you need to change the locale.

# Change the locale ("Russian" is the Windows name;
# on Linux and macOS it is usually "ru_RU.UTF-8")
Sys.setlocale("LC_TIME", "Russian")
# Convert the string to a date
as.Date("Декабрь 15, 2019 г.", format = "%B %d, %Y")

Extracting date components in basic R

Base R offers only a few functions for extracting a part of a date from an object of class Date:

current_date <- Sys.Date() # current date
weekdays(current_date)     # name of the day of the week
months(current_date)       # name of the month
quarters(current_date)     # quarter of the year ("Q1" to "Q4")
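The functions above return names rather than numbers. When numeric components are needed, one option in base R is to combine format() with the conversion specifications from the table earlier in the article. This is only a sketch, and the date is fixed so that the results are reproducible.

```r
# A fixed date is used so the results are reproducible
d <- as.Date("2019-09-26")

# format() returns text, so convert to integer where a number is needed
year_num  <- as.integer(format(d, "%Y"))  # 2019
month_num <- as.integer(format(d, "%m"))  # 9
day_num   <- as.integer(format(d, "%d"))  # 26
wday_num  <- as.integer(format(d, "%w"))  # 4 (0 is Sunday, so 4 is Thursday)
yday_num  <- as.integer(format(d, "%j"))  # 269

print(c(year_num, month_num, day_num, wday_num, yday_num))
```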

Besides the main Date class, base R has two more data types that store a timestamp: POSIXlt and POSIXct. Their main difference from Date is that they store the time in addition to the date.

# get the current date and time
current_time <- Sys.time()

# check the class of the current_time object
class(current_time)

#> [1] "POSIXct" "POSIXt"

The Sys.time() function returns the current date and time as a POSIXct object. This format is equivalent in meaning to UNIXTIME: it stores the number of seconds since the start of the Unix epoch (midnight UTC at the turn from December 31, 1969 to January 1, 1970).
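To make the "number of seconds" point concrete, the following sketch strips the class from a timestamp one day after the epoch. The time zone is set to UTC explicitly so that the result is the same on any machine.

```r
# One day after the epoch, expressed in UTC
t1 <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")

# Converting to a plain number reveals the stored seconds
secs <- as.numeric(t1)
print(secs)  # 86400

# Date objects work the same way, but count days instead of seconds
days <- as.numeric(as.Date("1970-01-02"))
print(days)  # 1
```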

The POSIXlt class also stores the date and time, but keeps all of their components separately. It is therefore an object with a more complex structure, from which any component of the date and time is easy to get. In fact, a POSIXlt object is a list.

# Get the current date and time
current_time_ct <- Sys.time()

# Convert to POSIXlt
current_time_lt <- as.POSIXlt(current_time_ct)

# extract the date and time components
current_time_lt$sec   # seconds
current_time_lt$min   # minutes
current_time_lt$hour  # hours
current_time_lt$mday  # day of the month
current_time_lt$mon   # month (0-11)
current_time_lt$year  # year (counted from 1900)
current_time_lt$wday  # day of the week (0-6, 0 is Sunday)
current_time_lt$yday  # day of the year (0-365)
current_time_lt$zone  # time zone
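Several POSIXlt fields are zero-based or offset, which is easy to miss. The sketch below uses a fixed timestamp (chosen for this illustration) to show the adjustments needed to get calendar values.

```r
lt <- as.POSIXlt("2019-10-03 06:37:19", tz = "UTC")

mon_raw   <- lt$mon          # 9, months are counted from 0 (January)
mon_cal   <- lt$mon + 1      # 10, the calendar month number
year_raw  <- lt$year         # 119, years since 1900
year_cal  <- lt$year + 1900  # 2019
wday_raw  <- lt$wday         # 4, counted from 0 (Sunday), so Thursday
yday_raw  <- lt$yday         # 275, days of the year are counted from 0

# The object is a list underneath, so its fields can be listed by name
field_names <- names(unclass(lt))
print(field_names)
```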

Numeric and text data are converted to the POSIX* formats by the functions as.POSIXct() and as.POSIXlt(). These functions have a small set of arguments:

  • x - the number, string, or Date object to be converted;
  • tz - the time zone; the default, "", means the current time zone of the system;
  • format - a description of the date format in which the data passed to x is written;
  • origin - used only when converting a number to POSIX; pass the date and time from which the seconds are counted. Typically used to convert from UNIXTIME.

If your date and time data is stored as UNIXTIME, you can convert it to a readable date as follows:

# Convert UNIXTIME to a readable date
as.POSIXlt(1570084639,  origin = "1970-01-01")

In origin you can specify any timestamp. For example, if the date and time in your data are given as the number of seconds since September 15, 2019, 12:15 pm, then to convert them to a date, use:

# Convert seconds to a date, counting from September 15, 2019 12:15
as.POSIXlt(1546123,  origin = "2019-09-15 12:15:00")
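The printed result of the two calls above depends on the time zone of your machine. As a sketch, the same conversions with tz = "UTC" set explicitly give a reproducible result; format() is used here only to show the value as text.

```r
# Seconds since the Unix epoch, shown in UTC for a reproducible result
t_unix <- as.POSIXct(1570084639, origin = "1970-01-01", tz = "UTC")
unix_txt <- format(t_unix, "%Y-%m-%d %H:%M:%S")
print(unix_txt)    # "2019-10-03 06:37:19"

# Seconds counted from a custom starting point
t_custom <- as.POSIXct(1546123, origin = "2019-09-15 12:15:00", tz = "UTC")
custom_txt <- format(t_custom, "%Y-%m-%d %H:%M:%S")
print(custom_txt)  # "2019-10-03 09:43:43"
```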

Working with dates with the lubridate package

lubridate is perhaps the most popular package for working with dates in R. It adds three more classes:

  • durations - the exact number of seconds between two timestamps;
  • periods - allow calculations between dates in human-friendly units: days, weeks, months, and so on;
  • intervals - objects defined by a start point and an end point in time.

Additional packages are installed in R with the standard function install.packages().

To install lubridate:

install.packages("lubridate")
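Once the package is installed, the three classes listed above can be tried out on a pair of fixed dates. This is a minimal sketch; the dates are arbitrary and the variable names are chosen only for this example.

```r
library(lubridate)

# Interval: anchored to a concrete start and end
int <- interval(ymd("2019-09-01"), ymd("2019-09-23"))

# Duration: the exact length of that interval, stored in seconds
dur <- as.duration(int)
dur_days <- as.numeric(dur, "days")      # 22

# Period: calendar units; dividing an interval by a period counts them
n_days  <- int / days(1)                 # 22
n_weeks <- int / weeks(1)                # about 3.14

# Intervals can also be used for membership checks
inside <- ymd("2019-09-10") %within% int # TRUE

print(c(dur_days, n_days, n_weeks))
print(inside)
```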

Convert text to date with lubridate

The lubridate functions greatly simplify converting text into dates, and also let you perform arithmetic on dates and times.

To get the current date, or the current date and time, use the functions today() and now().

library(lubridate)

today() # current date
now()   # current date and time

To convert a string to a date, lubridate has a whole family of functions whose names (three letters in most cases) spell out the order of the date components:

  • y - year
  • m - month
  • d - day

Functions for converting text to a date with lubridate:

  • ymd()
  • ydm()
  • mdy()
  • myd()
  • dmy()
  • dym()
  • yq()

Some examples for converting strings to dates:

ymd("2017 jan 21")
mdy("March 20th, 2019")
dmy("1st april of 2018")

As you can see, lubridate is much better at recognizing dates written as text, and lets you convert text to dates without describing the format with additional specifications.
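A short sketch of the same idea with purely numeric inputs, which avoids any dependence on the locale. The strings are hypothetical; the last two lines show what happens with unparseable text and with a date-time string.

```r
library(lubridate)

# The same day written with different separators and component orders
d_ymd <- ymd("20190926")
d_dmy <- dmy("26.09.2019")
d_mdy <- mdy("09/26/2019")
print(c(d_ymd, d_dmy, d_mdy))

# Text that cannot be parsed becomes NA (lubridate also emits a warning)
d_na <- suppressWarnings(ymd("not a date"))
print(d_na)

# Variants with a time part exist as well, for example ymd_hms()
dt_hms <- ymd_hms("2019-09-26 14:30:00")
print(dt_hms)
```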

Extracting date components with the lubridate package

With lubridate you can also extract any component of a date:

dt <- ymd("2017 jan 21")

year(dt)  # year
month(dt) # month
mday(dt)  # day of the month
yday(dt)  # day of the year
wday(dt)  # day of the week
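One detail worth checking when using wday() is which day the week starts on. The sketch below assumes the lubridate default (week starting on Sunday, unless the lubridate.week.start option has been changed) and shows the week_start argument as an alternative.

```r
library(lubridate)

dt <- ymd("2017-01-21")                  # a Saturday

wd_default <- wday(dt)                   # 7 when the week starts on Sunday
wd_monday  <- wday(dt, week_start = 1)   # 6 when counting from Monday
qtr        <- quarter(dt)                # 1

print(c(wd_default, wd_monday, qtr))

# label = TRUE returns the day name instead of its number
print(wday(dt, label = TRUE))
```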

Arithmetic operations with dates

However, the most important feature of lubridate is the ability to perform arithmetic on dates.

Dates are rounded with three functions:

  • floor_date - rounds down, to the nearest earlier unit boundary
  • ceiling_date - rounds up, to the nearest later unit boundary
  • round_date - rounds to the nearest unit boundary

Each of these functions has a unit argument that sets the rounding unit: second, minute, hour, day, week, month, bimonth, quarter, season, halfyear, year.

dt <- ymd("2017 jan 21")

round_date(dt, unit = "month")    # round to the month
round_date(dt, unit = "3 month")  # round to 3 months
round_date(dt, unit = "quarter")  # round to the quarter
round_date(dt, unit = "season")   # round to the season
round_date(dt, unit = "halfyear") # round to the half-year
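To see how the three rounding functions differ, it helps to apply them to the same date with the same unit. A minimal sketch:

```r
library(lubridate)

dt <- ymd("2017-01-21")

d_floor   <- floor_date(dt, unit = "month")    # "2017-01-01"
d_ceiling <- ceiling_date(dt, unit = "month")  # "2017-02-01"
d_round   <- round_date(dt, unit = "month")    # "2017-02-01"

print(c(d_floor, d_ceiling, d_round))
```

January 21 is closer to February 1 than to January 1, so round_date() agrees with ceiling_date() here; for a date in the first half of the month it would agree with floor_date() instead.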

Now let's see how to get the date that falls 8 days after the current one, and how to perform other arithmetic on dates.

today() + days(8)   # the date 8 days from now
today() - months(2) # the date 2 months ago
today() + weeks(12) # the date 12 weeks from now
today() - years(2)  # the date 2 years ago
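Two further cases come up often in practice: the difference between two dates, and month arithmetic that lands on a day that does not exist. The sketch below uses fixed dates; %m+% is lubridate's operator for adding months with a rollback to the last valid day.

```r
library(lubridate)

# Difference between two dates, in days
d_start <- ymd("2019-09-01")
d_end   <- ymd("2019-09-23")
n_days  <- as.numeric(d_end - d_start)   # 22

# February 31 does not exist, so plain period arithmetic returns NA
jan31 <- ymd("2019-01-31")
naive <- jan31 + months(1)               # NA

# %m+% rolls back to the last existing day of the target month
safe <- jan31 %m+% months(1)             # "2019-02-28"

print(n_days)
print(naive)
print(safe)
```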

Simplified work with periods, package timeperiodsR

timeperiodsR is a relatively new package for working with dates, published on CRAN in September 2019.

To install timeperiodsR:

install.packages("timeperiodsR")

The main purpose of the package is to quickly determine a time interval relative to a given date. For example, with its functions you can easily:

  • Get the previous week, month, quarter or year.
  • Get a specified number of periods relative to a date, such as the past 4 weeks.
  • Extract the components of the resulting time interval: the start and end dates, the number of days in the interval, and the full sequence of dates it contains.

The names of all functions in timeperiodsR are intuitive and consist of two parts, direction_interval, where:

  • direction - where to move relative to the given date: last_n, previous, this, next, next_n.
  • interval - the time unit used to calculate the period: day, week, month, quarter, year.

Full list of functions:

  • last_n_days()
  • last_n_weeks()
  • last_n_months()
  • last_n_quarters()
  • last_n_years()
  • previous_week()
  • previous_month()
  • previous_quarter()
  • previous_year()
  • this_week()
  • this_month()
  • this_quarter()
  • this_year()
  • next_week()
  • next_month()
  • next_quarter()
  • next_year()
  • next_n_days()
  • next_n_weeks()
  • next_n_months()
  • next_n_quarters()
  • next_n_years()
  • custom_period()

Time intervals in timeperiodsR

These functions are useful when you need to build reports on data from the past week or month. To get the previous month, use the function of the same name, previous_month():

library(timeperiodsR)

prmonth <- previous_month()

This gives you an object prmonth of class tpr, from which you can easily get the following components:

  • the start date of the period (in our example, the previous month)
  • the end date of the period
  • the number of days in the period
  • the sequence of dates in the period

Each component can be obtained in more than one way:

# first day of the period
prmonth$start
start(prmonth)

# last day of the period
prmonth$end
end(prmonth)

# sequence of dates
prmonth$sequence
seq(prmonth)

# number of days in the period
prmonth$length
length(prmonth)

You can also get any of the components with the part argument, which is present in each of the package's functions. Possible values: start, end, sequence, length.

previous_month(part = "start")    # start of the period
previous_month(part = "end")      # end of the period
previous_month(part = "sequence") # sequence of dates
previous_month(part = "length")   # number of days in the period

Let's go through all the arguments available in the timeperiodsR functions:

  • x - the reference date from which the period is calculated; by default the current date;
  • n - the number of intervals to include in the period, for example 3 previous weeks;
  • part - which component of the tpr object to return; by default all of them;
  • week_start - present only in the functions that work with weeks; sets the number of the day of the week that is treated as its beginning. By default the week starts on Monday, but you can set any day from 1 (Monday) to 7 (Sunday).

Thus, you can calculate any time period relative to the current date or any other given date. Here are a few more examples:

# get the 3 previous weeks
# relative to October 6, 2019
# the week starts on Monday
last_n_weeks(x = "2019-10-06", 
             n = 3, 
             week_start = 1)

 Time period: from  9 September of 2019, Monday to 29 September of 2019, Sunday

October 6, 2019 is a Sunday.
We need the period that covers the 3 weeks preceding October 6, not counting the week that contains October 6 itself. Accordingly, this is the period from September 9 to September 29.
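The same boundaries can be reproduced by hand in base R, which may help to check the logic. This is only a sketch of the calculation described above, not the way timeperiodsR implements it; the variable names are chosen for this example.

```r
ref <- as.Date("2019-10-06")

# ISO weekday number: 1 is Monday, 7 is Sunday
iso_wday <- as.integer(format(ref, "%u"))

# Monday of the week that contains the reference date
this_monday <- ref - (iso_wday - 1)

# Three full weeks before that Monday
period_start <- this_monday - 3 * 7
period_end   <- this_monday - 1

print(c(period_start, period_end))  # "2019-09-09" "2019-09-29"
```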

# get the month that lies 4 months back
# from September 16, 2019
previous_month(x = "2019-09-16", n = 4)

 Time period: from  1 May of 2019, Wednesday to 31 May of 2019, Friday

In this example, we want the month that was 4 months before September 16, 2019, which is May 2019.
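For comparison, here is a base R sketch that arrives at the same month by stepping back from the first day of the reference month. It illustrates the calculation only and makes no claim about how the package computes it internally.

```r
ref <- as.Date("2019-09-16")

# First day of the month that contains the reference date
first_of_ref_month <- as.Date(format(ref, "%Y-%m-01"))

# Step back 4 months to find the start of the target month
period_start <- seq(first_of_ref_month, by = "-4 months", length.out = 2)[2]

# The last day is the day before the first day of the following month
period_end <- seq(period_start, by = "1 month", length.out = 2)[2] - 1

# Full sequence of dates and its length
period_days <- seq(period_start, period_end, by = "day")

print(c(period_start, period_end))  # "2019-05-01" "2019-05-31"
print(length(period_days))          # 31
```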

Filtering a vector of dates with timeperiodsR

timeperiodsR has several operators for filtering dates:

  • %left_out% - compares two objects of class tpr and returns the dates from the left object that are missing from the right one.
  • %left_in% - compares two objects of class tpr and returns the dates from the left object that are present in the right one.
  • %right_out% - compares two objects of class tpr and returns the dates from the right object that are missing from the left one.
  • %right_in% - compares two objects of class tpr and returns the dates from the right object that are present in the left one.

period1 <- this_month("2019-11-07")
period2 <- previous_week("2019-11-07")

period1 %left_in% period2   # dates from period1 that are in period2
period1 %left_out% period2  # dates from period1 that are not in period2
period1 %right_in% period2  # dates from period2 that are in period1
period1 %right_out% period2 # dates from period2 that are not in period1
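The set logic behind these operators can be sketched in base R with plain date sequences. The two sequences below are an assumption about what the periods above cover (November 2019, and the week from October 28 to November 3 with Monday as the week start); the code does not use timeperiodsR itself.

```r
# Assumed contents of the two periods, as plain Date vectors
p1 <- seq(as.Date("2019-11-01"), as.Date("2019-11-30"), by = "day")
p2 <- seq(as.Date("2019-10-28"), as.Date("2019-11-03"), by = "day")

left_in   <- p1[p1 %in% p2]     # dates of p1 that are also in p2
left_out  <- p1[!(p1 %in% p2)]  # dates of p1 that are not in p2
right_in  <- p2[p2 %in% p1]     # dates of p2 that are also in p1
right_out <- p2[!(p2 %in% p1)]  # dates of p2 that are not in p1

print(left_in)    # November 1 to 3
print(right_out)  # October 28 to 31
```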

The timeperiodsR package has an official Russian-language YouTube playlist.

Conclusion

We have examined in detail the object classes that R provides for working with dates. You can now also perform arithmetic on dates and quickly get any time period with the timeperiodsR package.

If you are interested in the R language, I invite you to subscribe to my Telegram channel R4marketing, where I share useful material every day about using R to solve my everyday tasks.

Source: habr.com
