A little more about bad testing

One day I stumbled on some code that a user had written to measure RAM performance inside his virtual machine. I won't reproduce it in full (it is a wall of text) and will keep only the essentials. So, code to the studio!

#include <sys/time.h>
#include <string.h>
#include <iostream>

#define CNT 1024
#define SIZE (1024*1024)

int main() {
	struct timeval start;
	struct timeval end;
	long millis;
	double gbs;
	char ** buffers;
	// Allocate CNT buffers of SIZE bytes each: 1 GiB in total.
	buffers = new char*[CNT];
	for (int i=0;i<CNT;i++) {
		buffers[i] = new char[SIZE];
	}
	// Time a single memset pass over all buffers.
	gettimeofday(&start, NULL);
	for (int i=0;i<CNT;i++) {
		memset(buffers[i], 0, SIZE);
	}
	gettimeofday(&end, NULL);
	millis = (end.tv_sec - start.tv_sec) * 1000 +
		(end.tv_usec - start.tv_usec) / 1000;
	// 1 GiB was written in `millis` ms, so the rate is 1000 / millis.
	gbs = 1000.0 / millis;
	std::cout << gbs << " GB/s\n";
	// Arrays allocated with new[] must be released with delete[].
	for (int i=0;i<CNT;i++) {
		delete[] buffers[i];
	}
	delete[] buffers;
	return 0;
}

It's simple: we allocate memory and write one gigabyte to it. So what does this test show?

$ ./memtest
4.06504 GB/s

Roughly 4 GB/s.

What?!?!

How?!?!?

This is a Core i7 (albeit not the newest) with DDR4, and the processor is almost idle. WHY?!?!

The answer, as always, is utterly mundane.

The new operator (like the malloc function, by the way) does not actually allocate physical memory. On each call, the allocator looks through its list of free regions in the memory pool and, if nothing fits, asks the kernel for more address space with sbrk() (glibc, for one, uses mmap() instead for large blocks), then returns to the program a pointer into the newly obtained region.

The problem is that the region obtained this way is purely virtual. No physical memory pages have been assigned to it yet.

When each page of that region is accessed for the first time, the MMU raises a page fault, and only then does the kernel back the virtual page with a physical one.

So what we are really measuring is not the performance of the memory bus and RAM modules but the performance of the MMU and the operating system's virtual memory manager (VMM). To measure the real performance of RAM, we just need to initialize the allocated regions once before timing anything. For example, like this:

#include <sys/time.h>
#include <string.h>
#include <iostream>

#define CNT 1024
#define SIZE (1024*1024)

int main() {
	struct timeval start;
	struct timeval end;
	long millis;
	double gbs;
	char ** buffers;
	buffers = new char*[CNT];
	for (int i=0;i<CNT;i++) {
		// FIXED HERE!!! The parentheses value-initialize the buffer to zeros,
		// so every page is touched before the timer starts.
		buffers[i] = new char[SIZE](); // Add the parentheses, &$# !!!
	}
	// Time a single memset pass over the already-touched buffers.
	gettimeofday(&start, NULL);
	for (int i=0;i<CNT;i++) {
		memset(buffers[i], 0, SIZE);
	}
	gettimeofday(&end, NULL);
	millis = (end.tv_sec - start.tv_sec) * 1000 +
		(end.tv_usec - start.tv_usec) / 1000;
	// 1 GiB was written in `millis` ms, so the rate is 1000 / millis.
	gbs = 1000.0 / millis;
	std::cout << gbs << " GB/s\n";
	// Arrays allocated with new[] must be released with delete[].
	for (int i=0;i<CNT;i++) {
		delete[] buffers[i];
	}
	delete[] buffers;
	return 0;
}

That is, we simply value-initialize the allocated buffers (every char is set to 0).

Let's check:

$ ./memtest
28.5714 GB/s

That's more like it.

The moral: if you need large buffers to be fast from the very first use, don't forget to initialize them.

Source: habr.com
