Testing it on ourselves: how Document Management is set up and administered inside the 1C company

At 1C we make wide use of our own products to organize the company's work, in particular 1C:Document Management 8. Beyond document management (as the name suggests), it is also a modern ECM system (Enterprise Content Management) with a wide range of functionality: mail, employee work calendars, shared access to resources (for example, booking meeting rooms), time tracking, a corporate forum, and much more.

More than a thousand 1C employees use Document Management. The database has grown massive (over 11 billion records), which means it needs careful maintenance and fairly powerful hardware.

In this article we will describe how our system works, what difficulties we face when maintaining the database, and how we solve them (we use MS SQL Server as the DBMS).

For those reading about 1C products for the first time: 1C:Document Management is an applied solution (a configuration) built on top of the 1C:Enterprise platform, a framework for developing business applications.

1C:Document Management 8 (abbreviated DO) automates work with documents in an enterprise. One of the main communication tools for employees is email. In addition to mail, DO also handles other tasks:

  • Working time tracking
  • Employee absence tracking
  • Courier/transport requests
  • Employee work calendars
  • Registration of correspondence
  • Employee contacts (address book)
  • Corporate forum
  • Meeting room booking
  • Event planning
  • CRM
  • Collaborative work on files (with file version history)
  • and more

Depending on the situation, we access Document Management through the thin client (a native executable application for Windows, Linux, and macOS), the web client (from a browser), or the mobile client.

Thanks to another of our products connected to Document Management, the Interaction System, we also get messenger functionality directly inside DO: chats, audio and video calls (including group calls, which has become especially important now, including from the mobile client), quick file exchange, plus the ability to write chat bots that simplify work with the system. Another advantage of the Interaction System (compared to third-party messengers) is the ability to hold contextual discussions tied to specific DO objects such as documents and events. In other words, the Interaction System is deeply integrated with the target application rather than acting as a separate "button on the side".

The number of letters in our DO has already exceeded 100 million, and the DBMS as a whole contains more than 11 billion records. In total, the system uses almost 30 TB of storage: the database itself is 7.5 TB, and the files for collaborative work are stored separately and occupy another 21 TB.

To be more specific, here are the current counts of letters and files:

  • Outgoing emails - 14.7 million
  • Incoming emails - 85.4 million
  • File versions - 70.8 million
  • Internal documents - 30.6 thousand

DO contains more than just mail and files. Below are the counts of other accounting objects:

  • Meeting room reservations - 52,126
  • Weekly reports - 153
  • Daily reports - 628,153
  • Approval visas - 11,821
  • Incoming documents - 79,677
  • Outgoing documents - 28,357
  • Entries about events in users' work calendars - 168
  • Courier requests - 21,883
  • Counterparties - 81,029
  • Records of work with counterparties - 45
  • Contact persons of counterparties - 41,795
  • Events - 10
  • Projects - 6,320
  • Employee tasks - 245
  • Forum posts - 26
  • Chat messages - 891,095
  • Business processes - 109. Employees interact with each other through processes: approval, execution, review, registration, signing, and so on. We measure process duration, the number of cycles, the number of participants, the number of returns, and the number of requests to change deadlines. Analyzing this information is very useful for understanding what processes are running in the enterprise and for improving the efficiency of teamwork.

On what equipment do we process all this?

These figures point to an impressive workload, so we had to allocate fairly powerful hardware for the internal DO. Its current characteristics are 38 cores, 240 GB of RAM, and 26 TB of disks, spread across several servers.

In the future, we plan to increase the capacity of the equipment.

How loaded are the servers?

Network activity has never been a problem for us or for our customers. As a rule, the weak points are the processor and the disks, because everyone already knows how to deal with a lack of memory. Here are screenshots of our servers from Resource Monitor, which show that there is no alarming load; in fact it is quite modest.

For example, the first screenshot shows the SQL server with the CPU at 23% load. This is a very good figure (for comparison, if the load approaches 70%, employees will most likely experience quite noticeable slowdowns).


The second screenshot shows the application server running the 1C:Enterprise platform; it serves only user sessions. Here the processor load is somewhat higher, 38%, but it is smooth and steady. There is some disk activity, but it is acceptable.


The third screenshot shows another 1C:Enterprise server (the second of the two in our cluster). The previous one serves only users, while this one runs robots: they receive mail, route documents, perform data exchanges, recalculate access rights, and so on. All this background activity amounts to roughly 90-100 background jobs, and this server is loaded quite heavily, at 88%. But this does not affect users; it simply carries all the automation that Document Management is supposed to perform.


What metrics do we use to measure performance?

Our DO has a serious subsystem for measuring performance indicators and calculating various metrics. It lets us understand, both at the current moment and in historical perspective, what is happening in the system, what is getting worse and what is getting better. The monitoring tools (metrics and time measurements) are included in the standard delivery of 1C:Document Management 8. The metrics require some customization during implementation, but the mechanism itself is standard.

Metrics are measurements of various business indicators taken at certain points in time (for example, the average mail delivery time over the last 10 minutes).
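
As a rough, purely illustrative sketch (not DO's actual implementation, and with made-up names), such a metric can be viewed as a series of timestamped samples from which a value over the last 10 minutes is computed:

```python
# Illustration only: a metric as timestamped samples plus a rolling average
# over the last 10 minutes. Names and structure are invented for the example.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

@dataclass
class Sample:
    metric: str      # e.g. "mail_delivery_seconds"
    at: datetime     # when the measurement was taken
    value: float     # measured value

samples: list[Sample] = []

def record(metric: str, value: float) -> None:
    samples.append(Sample(metric, datetime.now(), value))

def rolling_average(metric: str, window: timedelta = timedelta(minutes=10)) -> float:
    cutoff = datetime.now() - window
    recent = [s.value for s in samples if s.metric == metric and s.at >= cutoff]
    return mean(recent) if recent else 0.0
```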

One of the metrics shows the number of active users in the database. On average, there are 1000-1400 of them during the day. The graph shows that at the time of the screenshot, there were 2144 active users in the database.


There are more than 30 such user actions tracked; the full list is below.

  • Login to the system
  • Sign Out
  • Loading mail
  • Changing the reality of an object
  • Changing access rights
  • Changing the subject of the process
  • Change the workgroup of an object
  • Changing the composition of the kit
  • File change
  • File import
  • Sending by mail
  • Moving files
  • Task redirection
  • Signing with an electronic signature (ES)
  • Search by details
  • Full text search
  • Getting a file
  • Process interruption
  • Review
  • Decryption
  • Document registration
  • Scanning
  • Unmark deletion
  • Create an object
  • Saving to disk
  • Process start
  • Deleting User Log Entries
  • Removing an electronic signature (ES)
  • Setting a deletion flag
  • Encryption
  • Export folder

The week before last, average user activity increased one and a half times (shown in red on the graph); this is due to most employees switching to remote work (because of well-known events). The number of active users also tripled (shown in blue), as employees began actively using the mobile client: each mobile client creates its own connection to the server, and now there are on average two server connections per employee.


For us as administrators, this is a signal to pay closer attention to performance and check whether anything has gotten worse. We watch this from several angles. For example, how the mail delivery time for internal routing changes (shown in blue in the screenshot below): it used to jump around until this year, but is now stable, which for us is a sign that the system is in order.


Another applied metric for us is the average waiting time for letters to be downloaded from the mail server (shown in red in the screenshot): roughly speaking, how long a letter travels across the Internet before it reaches our employee. The screenshot shows that this time has not changed recently. There are occasional spikes, but they are caused not by delays on our side but by time lost on the mail servers.


Or, for example, another metric (shown in blue in the screenshot): refreshing the letters in a folder. Opening an email folder is a very common operation and needs to be fast, so we measure how quickly it is performed. This indicator is measured for each client, and you can see both the overall picture for the company and the dynamics for an individual employee. The screenshot shows that before this year the metric was erratic; then we made a number of improvements, and now it is no longer getting worse - the graph is almost flat.


Metrics are essentially an administrator's tool for monitoring the system and quickly reacting to any changes in its behavior. The screenshot shows the metrics of the internal DO over a year. The jump in the graphs is due to the fact that we were given the task of developing the internal DO further.


Here is a list of some more metrics:

  • User activity
  • Active Users
  • Active processes
  • Number of files
  • File size (MB)
  • Number of documents
  • Number of objects to be sent to recipients
  • Number of counterparties
  • Backlog of tasks
  • Average waiting time for messages to be downloaded from the mail server in the last 10 minutes
  • External data buffer: number of files
  • Border delay from the current date
  • Long queue
  • Operational queue
  • Raw account age by external routing
  • Internal Routing Receive Queue Size (Long Queue)
  • Internal Routing Receive Queue Size (Fast Queuing)
  • Mail delivery time for internal routing (long queue)
  • Mail delivery time for internal routing (fast queue)
  • Mail delivery time by external routing (average)
  • Number of "Booking" documents
  • Number of "Absence" documents
  • Number of "Record of work with a counterparty" documents
  • Mail: updating emails in a folder
  • Mail: opening a letter card
  • Mail: moving mail to a folder
  • Mail: navigating folders

Our system measures more than 150 indicators around the clock, but not all of them need to be monitored in real time. Many come in handy later, in historical perspective, and for day-to-day monitoring you can focus on the ones most important for the business.

In one implementation, for example, only 5 indicators were selected. The customer set out to define a minimal set of indicators that would still cover the main work scenarios. Including all 150 indicators in the acceptance certificate would be unjustified, because even within one enterprise it is hard to agree on which values to consider acceptable. Those 5 indicators were known and stated as requirements for the system even before the implementation project started, as part of the tender documentation: card opening time of no more than 3 seconds, execution time of a task with a file of no more than 5 seconds, and so on. DO already had metrics that reflected the customer's original requirements (TOR) very closely.

We also have profile-based analysis of performance measurements. Performance indicators record the duration of every executed operation (writing a letter to the database, sending a letter to the mail server, etc.). These are used exclusively by technical specialists. There are a lot of performance indicators in the program: we currently measure approximately 1500 key operations, grouped into profiles.
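
As a simple illustration of the idea (not the 1C:Enterprise platform's actual measurement mechanism, and with invented names), per-operation durations could be collected like this:

```python
# Illustration only: timing "key operations" with a decorator and storing the
# durations per operation name. Not the platform's real mechanism.
import time
from collections import defaultdict
from functools import wraps

durations: dict[str, list[float]] = defaultdict(list)

def key_operation(name: str):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                durations[name].append(time.perf_counter() - start)
        return wrapper
    return decorator

@key_operation("Mail: update emails in a folder")
def update_folder(folder_id: str) -> None:
    ...  # the operation being measured
```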


One of the most important profiles for us is the list of key mail operations from the user's perspective. This profile includes, for example, the following indicators:

  • Command execution: Select by tag
  • Opening Form: List Form
  • Command execution: Filter by folder
  • Show email in the reading pane
  • Save email to your favorite folder
  • Search for letters by details
  • Writing a letter

If we see that a metric for some business indicator has become too large (for example, letters from a particular user have started taking a very long time to arrive), we begin to investigate and turn to the timing of technical operations. Say we have a technical operation "Archiving letters on the mail server" and we see that its time was exceeded over the last period. This operation, in turn, breaks down into other operations, for example, establishing a connection to the mail server. We see that for some reason it suddenly became very slow (we keep all measurements for a month, so we can compare: last week it took 10 milliseconds, now it takes 1000). And we understand that something is broken here and needs to be fixed.
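
Continuing the same illustration, the week-over-week comparison described above could be sketched roughly as follows (the threshold and data layout are arbitrary assumptions, not part of DO):

```python
# Illustration: flag technical operations whose average duration has grown
# sharply compared to last week. Threshold and data layout are arbitrary.
from statistics import mean

def regressions(last_week: dict[str, list[float]],
                this_week: dict[str, list[float]],
                factor: float = 5.0) -> list[str]:
    flagged = []
    for op, current in this_week.items():
        previous = last_week.get(op)
        if previous and current and mean(current) > factor * mean(previous):
            flagged.append(op)  # e.g. connection setup: 10 ms last week, 1000 ms now
    return flagged
```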

How do we maintain such a large database?

Our internal DO is an example of a genuinely working high-load project. Let's talk about the technical details of maintaining its database.

How long does it take to restructure large database tables?

SQL Server requires periodic maintenance to keep the tables in order. Ideally, this should be done at least once a day, and even more often for heavily used tables. But when the database is large (and we have already passed 11 billion records), taking care of it is not easy.

We last restructured the tables 6 years ago; after that, the operation began taking so long that it no longer fit into the night window. And since these operations load the SQL server heavily, it cannot serve users well while they run.

So now we have to resort to various tricks. For example, we cannot run these procedures over the complete datasets. Instead, we update statistics with a sample of 500,000 rows, which takes 14 minutes. This does not recompute statistics over all of the table's data: it selects half a million rows, calculates statistics from them, and uses the result for the entire table. This is an approximation, but we are forced to accept it, because collecting statistics over a table's entire billion rows would take an unacceptably long time.
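
As an illustration, a sampled statistics update of this kind can be issued with UPDATE STATISTICS ... WITH SAMPLE; the sketch below assumes a connection through pyodbc, and the DSN and table name are placeholders rather than our actual configuration:

```python
# Minimal sketch: sampled statistics update on SQL Server via pyodbc.
# The DSN and table name below are placeholders, not our real configuration.
import pyodbc

conn = pyodbc.connect("DSN=docflow_db", autocommit=True)  # hypothetical DSN
cursor = conn.cursor()

# Update statistics from a 500,000-row sample instead of a full scan:
# SQL Server extrapolates the sampled distribution to the whole table.
cursor.execute("UPDATE STATISTICS dbo._Document123 WITH SAMPLE 500000 ROWS")

conn.close()
```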

We also optimized other maintenance operations, making them partial.

Maintaining a DBMS is generally a difficult task. When employees interact actively, the database grows quickly, and it becomes harder and harder for administrators to maintain it: updating statistics, defragmenting, rebuilding indexes. Different strategies are needed here; we know how to do this, we have the experience, and we are happy to share it.

How is a backup implemented with such volumes?

A full DBMS backup is performed once a day at night, and an incremental one every hour. A catalog of the files is also built every day; it serves as part of the incremental backup of the file storage.
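
The article does not say which tooling drives these backups. As a minimal sketch, assuming plain SQL Server native backups driven from a script (DSN, database name, and paths are placeholders, and the hourly "incremental" backup is shown as a SQL Server differential backup, its usual equivalent), it could look like this:

```python
# Sketch of a nightly full + hourly differential backup using SQL Server
# native backups. DSN, database name, and paths are placeholders.
import pyodbc

def backup(kind: str) -> None:
    conn = pyodbc.connect("DSN=docflow_db", autocommit=True)  # hypothetical DSN
    if kind == "full":
        sql = ("BACKUP DATABASE DocFlow "
               "TO DISK = N'E:\\backup\\DocFlow_full.bak' WITH INIT")
    else:
        # SQL Server has no "incremental" backup as such; a differential backup
        # (changes since the last full backup) is the usual hourly equivalent.
        sql = ("BACKUP DATABASE DocFlow "
               "TO DISK = N'E:\\backup\\DocFlow_diff.bak' WITH DIFFERENTIAL")
    cur = conn.cursor()
    cur.execute(sql)
    while cur.nextset():  # drain informational messages until the backup finishes
        pass
    conn.close()

backup("full")  # nightly
backup("diff")  # hourly
```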

How long does a full backup take?

To hard disk, a full backup takes three hours, and a partial one takes an hour. Writing to tape takes longer (tape is a device that writes the backup copy to a cassette stored outside the office; this removable copy will survive even if, for example, the server room burns down). The backup is made on the same SQL server whose parameters were given above, the one running at around 20% CPU load. During the backup the system, of course, performs noticeably worse, but it remains functional.


Is there deduplication?

File deduplication exists: we are running it on ourselves, and it will soon be included in a new version of Document Management. We are also testing a counterparty deduplication mechanism. There is no deduplication of records at the DBMS level, since it is not needed: the 1C:Enterprise platform stores objects in the DBMS, and only the platform can be responsible for their consistency.
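
The article does not describe how the file deduplication works internally. As a generic illustration only (not DO's actual mechanism), here is a sketch of the common content-hash approach, where identical file contents are stored once and referenced by their hash:

```python
# Generic illustration of content-hash file deduplication (not DO's actual
# mechanism): identical file contents are stored once, keyed by their hash.
import hashlib
from pathlib import Path

STORE = Path("file_store")  # hypothetical storage directory
STORE.mkdir(exist_ok=True)

def put_file(data: bytes) -> str:
    """Store file content once; return the key under which it is stored."""
    key = hashlib.sha256(data).hexdigest()
    target = STORE / key
    if not target.exists():  # identical content is written only once
        target.write_bytes(data)
    return key

def get_file(key: str) -> bytes:
    return (STORE / key).read_bytes()
```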

Are there read-only nodes?

There are no read-only nodes (dedicated nodes that serve those who only need to read data). DO is not an accounting system that calls for a separate BI node. There is, however, a separate node for the development department, with which messages are exchanged in JSON format; the typical replication lag is from a few seconds to a few tens of seconds. That node is still small, about 800 million records, but it is growing quickly.

And letters marked for deletion are never actually deleted?

Not yet. We have no pressing need to slim down the database. There have been several rather serious cases when we had to go back to letters marked for deletion, including ones from 2009. So for now we have decided to keep everything; when the cost of doing so becomes unjustified, we will think about deleting. However, if a specific letter needs to be removed from the database for good, without a trace, this can be done by special request.

Why keep it all? Are there statistics on accesses to old documents?

There are no such statistics. More precisely, they exist in the form of a user activity log, but it is not kept for long: records older than a year are deleted from the log.

There have been situations when we needed to dig up old correspondence from five or even ten years ago. This was never done out of idle curiosity, but to make difficult business decisions. There was a case when, without the correspondence history, the wrong business decision would have been made.

How do you assess the value of documents and destroy them when their retention periods expire?

For paper documents this is done in the usual traditional way, as everywhere else. For electronic documents we don't do it at all; let them sit there. There is space, there is benefit, and everyone is fine.

What are the development prospects?

Today our DO handles about 30 internal tasks, some of which we listed at the beginning of the article. DO is also used to prepare the conferences we hold twice a year for our partners: the entire program, all talks, all parallel tracks and halls are entered into DO and then exported from it to produce the printed program.

There are several more tasks coming up for DO, in addition to those it already handles. Some are company-wide, others are unique and rare tasks needed by only one department. We need to help them too, which means expanding the "geography" of the system within 1C: broadening the scope and solving the problems of every department. This would be the best possible test of performance and reliability. We would like to see the system running on trillions of records and petabytes of information.

Source: habr.com
