Data structures for storing graphs: an overview of existing ones and two "almost new" ones

Hello.

In this note, I decided to list the main data structures used to store graphs in computer science, and also talk about a couple more such structures that I somehow "crystallized" by itself.

So, let's begin. But not from the very beginning - I think what a graph is and what they are (directed, undirected, weighted, unweighted, with or without multiple edges and loops), we all already know.

So let's go. What are the options for data structures for "graph storage" we have.

1. Matrix data structures

1.1 Adjacency matrix. The adjacency matrix is ​​a matrix where the row and column headings correspond to the numbers of the graph vertices, and the value of each of its elements a(i,j) is determined by the presence or absence of edges between the vertices i and j (it is clear that for an undirected graph such a matrix will be symmetrical, or we can agree that we store all values ​​​​only above the main diagonal). For unweighted graphs, a(i,j) can be set by the number of edges from i to j (if there is no such edge, then a(i,j) = 0), and for weighted graphs, also by the weight (total weight) of the mentioned edges.

1.2 Incidence matrix. In this case, our graph is also stored in a table, in which, as a rule, the numbers of rows correspond to the numbers of its vertices, and the numbers of columns correspond to pre-numbered edges. If a vertex and an edge are incident to each other, then a non-zero value is written in the corresponding cell (for undirected graphs, 1 is written if the vertex and the edge are incident, for directed graphs - "1" if the edge "leaves" the vertex and "-1" if it “includes” in it (it is remembered quite easily, because the “minus” sign is also, as it were, “included” in the number “-1”)). For weighted graphs, again, instead of 1 and -1, you can specify the total weight of the edge.

2. Enumerated data structures

2.1 adjacency list. Well, here, it seems, everything is simple. Each vertex of the graph can, in general, be associated with any enumerative structure (list, vector, array, ...), which will store the numbers of all vertices adjacent to the given one. For directed graphs, we will include in such a list only those vertices that have a “directed” edge from the feature vertex. For weighted graphs, the implementation will be more complex.

2.2 List of ribs. Quite a popular data structure. The list of edges, as Captain Evidence tells us, is actually a list of graph edges, each of which is specified by the start vertex, end vertex (for undirected graphs, the order is not important here, although various rules can be used for unification, for example, specify the vertices in the order increasing) and weight (only for weighted graphs).

You can see in more detail (and with illustrations) about the matrix lists listed above, for example, here.

2.3 Adjacency array. Not the most common structure. At its core, it is a form of "packing" adjacency lists into one enumerative structure (array, vector). The first n (according to the number of graph vertices) elements of such an array contain the starting indices of the same array, starting from which all vertices adjacent to the given array are written in a row.

Here I found the most understandable (for myself) explanation: ejuo.livejournal.com/4518.html

3. Adjacency Vector and Associative Adjacency Array

It so happened that the author of these lines, not being a professional programmer, but who periodically dealt with graphs, most often dealt with lists of edges. Indeed, it is convenient if the graph has multiple loops and edges. And so, in the development of the classic lists of edges, I propose to pay attention to their “development / branch / modification / mutation”, namely: the adjacency vector and the associative adjacency array.

3.1 Adjacency vector

Case (a1): unweighted graph

We will call the adjacency vector for an unweighted graph an ordered set of an even number of integers (а[2i], a[2i+1],…, where i is numbered from 0), in which each pair of numbers a[2i], a[2i+1 ] specifies the edge of the graph between the vertices a[2i] and a[2i+1], respectively.
This record format does not contain information whether the graph is directed (both options are possible). When using the format for a digraph, it is considered that the edge is directed from a[2i] to a[2i+1]. Here and below: for undirected graphs, if necessary, the requirements for the order of recording vertices can be applied (for example, that the vertex with the lower value of the number assigned to it goes first).

In C++, it makes sense to define an adjacency vector using std::vector, hence the name of this data structure was chosen.

Case (a2): unweighted graph, edge weights are integer

By analogy with case (a1), we will call the adjacency vector for a weighted graph with integer weights of edges an ordered set (dynamic array) of numbers (а[3i], a[3i+1], a[3i+2],…, where i is numbered c 0), where each “triplet” of numbers a[3i], a[3i+1], a[3i+2] defines a graph edge between vertices numbered a[3i] and a[3i+1], respectively, and the value a [3i+2] is the weight of this edge. Such a graph may or may not be directed.

Case (b): unweighted graph, edge weights non-integer

Since heterogeneous elements cannot be stored in one array (vector), the following implementation is possible, for example. The graph is stored in a pair of vectors, in which the first vector is the adjacency vector of the graph without weights, and the second vector contains the corresponding weights (possible implementation for C++: std::pair ). Thus, for an edge defined by a pair of vertices at indices 2i, 2i+1 of the first vector, the weight will be equal to the element at index i of the second vector.

Well, why is it necessary?

Well, it seemed quite useful to the author of these lines for solving a number of problems. Well, from a formal point of view, there will be such advantages:

  • The adjacency vector, like any other "enumerative" structure, is quite compact, takes up less memory than the adjacency matrix (for sparse graphs), and is relatively easy to implement.
  • The vertices of the graph, in principle, can be marked with negative numbers. Suddenly, yes, such a “perversion” is needed.
  • Graphs can contain multiple edges and multiple loops, with different weights (positive, negative, even zero). There are no restrictions here.
  • You can also assign different properties to edges - but see Section 4 for more on that.

However, it must be admitted that this “list” does not imply fast access to the edge. And here the Associative Adjacency Array comes to the rescue, which is discussed below.

3.2 Associative adjacency array

So, if access to a particular edge, its weight and other properties is critical for us, and memory requirements do not allow us to use the adjacency matrix, then let's think about how we can change the adjacency vector to solve this problem. So, the key is the edge of the graph, which can be specified as an ordered pair of integers. What does it look like? Whether on a key in an associative array? And if so, why don't we implement it? Let us have such an associative array, where each key - an ordered pair of integers - will be assigned a value - an integer or real number that specifies the weight of the edge. In C++, it is expedient to implement this structure based on the container std::map (std::map , int> or std::map , double>), or std::multimap if multiple edges are expected. Well, here we have a graph storage structure that takes up less memory than “matrix” structures, can define graphs with multiple loops and edges, and does not even have strict requirements for the non-negativity of vertex numbers (I don’t know who needs it, but still).

4. At least “fill in” data structures, but something is missing

And the truth is: when solving a number of problems, we may need to assign some features to the edges of the graph and, accordingly, store them. If it is possible to unambiguously reduce these features to integers, then it is possible to store such "complementary feature graphs" using extended versions of the adjacency vector and adjacency associative array.

So, let us have an unweighted graph, for each edge of which it is necessary to store, for example, 2 additional features specified by integers. In this case, it is possible to define its adjacency vector as an ordered set of not “pairs”, but “quartets” of integers (а[2i], a[2i+1], a[2i+2], a[2i+3]…) , where a[2i+2] and a[2i+3] will define the features of the corresponding edge. For a graph with integer weights of edges, the order is generally similar (the only difference is that the features will follow the weight of the edge and are given by the elements a[2i+3] and a[2i+4], while the edge itself will be given by not 4, but 5 ordered numbers). And for a graph with non-integer edge weights, features can be written in its unweighted component.

When using an associative adjacency array for graphs with integer edge weights, it is possible to set as a value not a single number, but an array (vector) of numbers that specify, in addition to the edge weight, all its other necessary features. At the same time, an inconvenience for the case of non-integer weights will be the need to set the sign as a floating point number (yes, this is an inconvenience, but if there are not so many such signs, and if you don’t set them too “tricky” double, then it may be nothing) . And so, in C++, extended associative adjacency arrays can be defined as follows: std::map , std::vector> or std::map , std::vector, with the first value in the "value-by-key-vector" being the weight of the edge, followed by the numerical designations of its attributes.

References:

About graphs and algorithms in general:

1. Kormen, Thomas H., Leiserson, Charles I., Rivest, Ronald L., Stein, Clifford. Algorithms: construction and analysis, 2nd edition: Per. from English. – M.: Williams Publishing House, 2011.
2. Harari Frank. Graph theory. M.: Mir, 1973.
The author's report about these same vector and associative adjacency array:
3. Chernoukhov S.A. Adjacency vector and associative adjacency array as ways of representing and storing graphs / SA Chernouhov. Adjacency vector and adjacency map as data structures to represent a graph // Collection of articles of the International Scientific and Practical Conference "Problems of implementing the results of innovative developments and ways to solve them" (Saratov, 14.09.2019). – Sterlitamak: AMI, 2019, p. 65-69
Useful online sources on the topic:
4. prog-cpp.ru/data-graph
5. ejuo.livejournal.com/4518.html

Source: habr.com

Add a comment