LINQ entered .NET as a powerful new data manipulation language. LINQ to SQL as part of it allows you to communicate with the DBMS quite conveniently using, for example, the Entity Framework. However, using it quite often, developers forget to look at what kind of SQL query the queryable provider will generate, in your case, the Entity Framework.
Let's look at two main points with an example.
To do this, in SQL Server we will create a database Test, and in it we will create two tables using the following query:
Creating tables
USE [TEST]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Ref](
[ID] [int] NOT NULL,
[ID2] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[InsertUTCDate] [datetime] NOT NULL,
CONSTRAINT [PK_Ref] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Ref] ADD CONSTRAINT [DF_Ref_InsertUTCDate] DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO
USE [TEST]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Customer](
[ID] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[Ref_ID] [int] NOT NULL,
[InsertUTCDate] [datetime] NOT NULL,
[Ref_ID2] [int] NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Customer] ADD CONSTRAINT [DF_Customer_Ref_ID] DEFAULT ((0)) FOR [Ref_ID]
GO
ALTER TABLE [dbo].[Customer] ADD CONSTRAINT [DF_Customer_InsertUTCDate] DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO
Now let's populate the Ref table by running the following script:
Filling the table Ref
USE [TEST]
GO
DECLARE @ind INT=1;
WHILE(@ind<1200000)
BEGIN
INSERT INTO [dbo].[Ref]
([ID]
,[ID2]
,[Name])
SELECT
@ind
,@ind
,CAST(@ind AS NVARCHAR(255));
SET @ind=@ind+1;
END
GO
Let's populate the Customer table in the same way using the following script:
Populating the Customer table
USE [TEST]
GO
DECLARE @ind INT=1;
DECLARE @ind_ref INT=1;
WHILE(@ind<=12000000)
BEGIN
IF(@ind%3=0) SET @ind_ref=1;
ELSE IF (@ind%5=0) SET @ind_ref=2;
ELSE IF (@ind%7=0) SET @ind_ref=3;
ELSE IF (@ind%11=0) SET @ind_ref=4;
ELSE IF (@ind%13=0) SET @ind_ref=5;
ELSE IF (@ind%17=0) SET @ind_ref=6;
ELSE IF (@ind%19=0) SET @ind_ref=7;
ELSE IF (@ind%23=0) SET @ind_ref=8;
ELSE IF (@ind%29=0) SET @ind_ref=9;
ELSE IF (@ind%31=0) SET @ind_ref=10;
ELSE IF (@ind%37=0) SET @ind_ref=11;
ELSE SET @ind_ref=@ind%1190000;
INSERT INTO [dbo].[Customer]
([ID]
,[Name]
,[Ref_ID]
,[Ref_ID2])
SELECT
@ind,
CAST(@ind AS NVARCHAR(255)),
@ind_ref,
@ind_ref;
SET @ind=@ind+1;
END
GO
Thus, we got two tables, one of which has more than 1 million rows of data, and the other has more than 10 million rows of data.
Now in Visual Studio, you need to create a test project Visual C# Console App (.NET Framework):
Next, you need to add a library for the Entity Framework to interact with the database.
To add it, right-click on the project and select Manage NuGet Packages from the context menu:
Then, in the NuGet package management window that appears, in the search box, enter the word "Entity Framework" and select the Entity Framework package and install it:
Next, in the App.config file, after closing the configSections element, add the following block:
<connectionStrings>
<add name="DBConnection" connectionString="data source=ΠΠΠ―_ΠΠΠΠΠΠΠΠ―Π Π_MSSQL;Initial Catalog=TEST;Integrated Security=True;" providerName="System.Data.SqlClient" />
</connectionStrings>
In connectionString you need to enter the connection string.
Now let's create 3 interfaces in separate files:
- Implementation of the IBaseEntityID interface
namespace TestLINQ { public interface IBaseEntityID { int ID { get; set; } } }
- Implementation of the IBaseEntityName interface
namespace TestLINQ { public interface IBaseEntityName { string Name { get; set; } } }
- Implementation of the IBaseNameInsertUTCDate interface
namespace TestLINQ { public interface IBaseNameInsertUTCDate { DateTime InsertUTCDate { get; set; } } }
And in a separate file, we will create a BaseEntity base class for our two entities, which will include common fields:
Implementation of the base class BaseEntity
namespace TestLINQ
{
public class BaseEntity : IBaseEntityID, IBaseEntityName, IBaseNameInsertUTCDate
{
public int ID { get; set; }
public string Name { get; set; }
public DateTime InsertUTCDate { get; set; }
}
}
Next, in separate files, we will create our two entities:
- Ref class implementation
using System.ComponentModel.DataAnnotations.Schema; namespace TestLINQ { [Table("Ref")] public class Ref : BaseEntity { public int ID2 { get; set; } } }
- Implementation of the Customer class
using System.ComponentModel.DataAnnotations.Schema; namespace TestLINQ { [Table("Customer")] public class Customer: BaseEntity { public int Ref_ID { get; set; } public int Ref_ID2 { get; set; } } }
Now let's create a UserContext context in a separate file:
Implementation of the UserContex class
using System.Data.Entity;
namespace TestLINQ
{
public class UserContext : DbContext
{
public UserContext()
: base("DbConnection")
{
Database.SetInitializer<UserContext>(null);
}
public DbSet<Customer> Customer { get; set; }
public DbSet<Ref> Ref { get; set; }
}
}
We got a ready-made solution for conducting optimization tests with LINQ to SQL via EF for MS SQL Server:
Now in the Program.cs file, enter the following code:
Program.cs file
using System;
using System.Collections.Generic;
using System.Linq;
namespace TestLINQ
{
class Program
{
static void Main(string[] args)
{
using (UserContext db = new UserContext())
{
var dblog = new List<string>();
db.Database.Log = dblog.Add;
var query = from e1 in db.Customer
from e2 in db.Ref
where (e1.Ref_ID == e2.ID)
&& (e1.Ref_ID2 == e2.ID2)
select new { Data1 = e1.Name, Data2 = e2.Name };
var result = query.Take(1000).ToList();
Console.WriteLine(dblog[1]);
Console.ReadKey();
}
}
}
}
Next, let's run our project.
At the end of the work, the following will be displayed on the console:
Generated SQL query
SELECT TOP (1000)
[Extent1].[Ref_ID] AS [Ref_ID],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent1]
INNER JOIN [dbo].[Ref] AS [Extent2] ON ([Extent1].[Ref_ID] = [Extent2].[ID]) AND ([Extent1].[Ref_ID2] = [Extent2].[ID2])
That is, in general, the LINQ query generated a SQL query to the MS SQL Server DBMS quite well.
Now let's change the AND condition to OR in the LINQ query:
LINQ query
var query = from e1 in db.Customer
from e2 in db.Ref
where (e1.Ref_ID == e2.ID)
|| (e1.Ref_ID2 == e2.ID2)
select new { Data1 = e1.Name, Data2 = e2.Name };
Let's run our application again.
The execution will crash with an error related to the command execution time exceeding 30 seconds:
If you look at which query was generated by LINQ:
, then you can make sure that the selection occurs through the Cartesian product of two sets (tables):
Generated SQL query
SELECT TOP (1000)
[Extent1].[Ref_ID] AS [Ref_ID],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent1]
CROSS JOIN [dbo].[Ref] AS [Extent2]
WHERE [Extent1].[Ref_ID] = [Extent2].[ID] OR [Extent1].[Ref_ID2] = [Extent2].[ID2]
Let's rewrite the LINQ query like this:
Optimized LINQ query
var query = (from e1 in db.Customer
join e2 in db.Ref
on e1.Ref_ID equals e2.ID
select new { Data1 = e1.Name, Data2 = e2.Name }).Union(
from e1 in db.Customer
join e2 in db.Ref
on e1.Ref_ID2 equals e2.ID2
select new { Data1 = e1.Name, Data2 = e2.Name });
Then we get the following SQL query:
SQL query
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[C2] AS [C2],
[Limit1].[C3] AS [C3]
FROM ( SELECT DISTINCT TOP (1000)
[UnionAll1].[C1] AS [C1],
[UnionAll1].[Name] AS [C2],
[UnionAll1].[Name1] AS [C3]
FROM (SELECT
1 AS [C1],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent1]
INNER JOIN [dbo].[Ref] AS [Extent2] ON [Extent1].[Ref_ID] = [Extent2].[ID]
UNION ALL
SELECT
1 AS [C1],
[Extent3].[Name] AS [Name],
[Extent4].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent3]
INNER JOIN [dbo].[Ref] AS [Extent4] ON [Extent3].[Ref_ID2] = [Extent4].[ID2]) AS [UnionAll1]
) AS [Limit1]
Alas, in LINQ queries there can be only one join condition, therefore it is possible to make an equivalent query through two queries for each condition, followed by their union through Union to remove duplicates among rows.
Yes, the queries will generally be non-equivalent, given that complete duplicate rows may be returned. However, in real life, full duplicate lines are not needed and they are trying to get rid of them.
Now let's compare the execution plans of these two queries:
- for CROSS JOIN, the average execution time is 195 sec:
- for INNER JOIN-UNION the average execution time is less than 24 sec:
As can be seen from the results, for two tables with millions of records, the optimized LINQ query is many times faster than the non-optimized one.
For the variant with AND in the conditions of a LINQ query of the form:
LINQ query
var query = from e1 in db.Customer
from e2 in db.Ref
where (e1.Ref_ID == e2.ID)
&& (e1.Ref_ID2 == e2.ID2)
select new { Data1 = e1.Name, Data2 = e2.Name };
almost always a correct SQL query will be generated, which will run on average for about 1 second:
Also for LINQ to Objects manipulations instead of querying the view:
LINQ query (1st option)
var query = from e1 in seq1
from e2 in seq2
where (e1.Key1==e2.Key1)
&& (e1.Key2==e2.Key2)
select new { Data1 = e1.Data, Data2 = e2.Data };
You can use a query like:
LINQ query (2st option)
var query = from e1 in seq1
join e2 in seq2
on new { e1.Key1, e1.Key2 } equals new { e2.Key1, e2.Key2 }
select new { Data1 = e1.Data, Data2 = e2.Data };
where:
Defining two arrays
Para[] seq1 = new[] { new Para { Key1 = 1, Key2 = 2, Data = "777" }, new Para { Key1 = 2, Key2 = 3, Data = "888" }, new Para { Key1 = 3, Key2 = 4, Data = "999" } };
Para[] seq2 = new[] { new Para { Key1 = 1, Key2 = 2, Data = "777" }, new Para { Key1 = 2, Key2 = 3, Data = "888" }, new Para { Key1 = 3, Key2 = 5, Data = "999" } };
, and the Para type is defined as follows:
Para type definition
class Para
{
public int Key1, Key2;
public string Data;
}
Thus, we have considered some aspects in optimizing LINQ queries to MS SQL Server.
Unfortunately, even experienced and leading .NET developers forget that it is necessary to understand what the instructions they use are doing behind the scenes. Otherwise, they become configurators and can lay a time bomb in the future, both when scaling a software solution, and with minor changes in external environmental conditions.
There was also a small review
Sources for the test - the project itself, creating tables in the TEST database, as well as filling these tables with data is located
Also in this repository in the Plans folder are plans for executing queries with OR conditions.
Source: habr.com