LINQ entered .NET as a powerful new data manipulation language. LINQ to SQL as part of it allows you to communicate with the DBMS quite conveniently using, for example, the Entity Framework. However, using it quite often, developers forget to look at what kind of SQL query the queryable provider will generate, in your case, the Entity Framework.
Let's look at two main points with an example.
To do this, in SQL Server we will create a database Test, and in it we will create two tables using the following query:
Creating tables
USE [TEST]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Ref](
[ID] [int] NOT NULL,
[ID2] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[InsertUTCDate] [datetime] NOT NULL,
CONSTRAINT [PK_Ref] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Ref] ADD CONSTRAINT [DF_Ref_InsertUTCDate] DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO
USE [TEST]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Customer](
[ID] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[Ref_ID] [int] NOT NULL,
[InsertUTCDate] [datetime] NOT NULL,
[Ref_ID2] [int] NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Customer] ADD CONSTRAINT [DF_Customer_Ref_ID] DEFAULT ((0)) FOR [Ref_ID]
GO
ALTER TABLE [dbo].[Customer] ADD CONSTRAINT [DF_Customer_InsertUTCDate] DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO
Now let's populate the Ref table by running the following script:
Filling the table Ref
USE [TEST]
GO
DECLARE @ind INT=1;
WHILE(@ind<1200000)
BEGIN
INSERT INTO [dbo].[Ref]
([ID]
,[ID2]
,[Name])
SELECT
@ind
,@ind
,CAST(@ind AS NVARCHAR(255));
SET @ind=@ind+1;
END
GO
Let's populate the Customer table in the same way using the following script:
Populating the Customer table
USE [TEST]
GO
DECLARE @ind INT=1;
DECLARE @ind_ref INT=1;
WHILE(@ind<=12000000)
BEGIN
IF(@ind%3=0) SET @ind_ref=1;
ELSE IF (@ind%5=0) SET @ind_ref=2;
ELSE IF (@ind%7=0) SET @ind_ref=3;
ELSE IF (@ind%11=0) SET @ind_ref=4;
ELSE IF (@ind%13=0) SET @ind_ref=5;
ELSE IF (@ind%17=0) SET @ind_ref=6;
ELSE IF (@ind%19=0) SET @ind_ref=7;
ELSE IF (@ind%23=0) SET @ind_ref=8;
ELSE IF (@ind%29=0) SET @ind_ref=9;
ELSE IF (@ind%31=0) SET @ind_ref=10;
ELSE IF (@ind%37=0) SET @ind_ref=11;
ELSE SET @ind_ref=@ind%1190000;
INSERT INTO [dbo].[Customer]
([ID]
,[Name]
,[Ref_ID]
,[Ref_ID2])
SELECT
@ind,
CAST(@ind AS NVARCHAR(255)),
@ind_ref,
@ind_ref;
SET @ind=@ind+1;
END
GO
Thus, we got two tables, one of which has more than 1 million rows of data, and the other has more than 10 million rows of data.
Now in Visual Studio, you need to create a test project Visual C# Console App (.NET Framework):
Next, you need to add a library for the Entity Framework to interact with the database.
To add it, right-click on the project and select Manage NuGet Packages from the context menu:
Then, in the NuGet package management window that appears, in the search box, enter the word "Entity Framework" and select the Entity Framework package and install it:
Next, in the App.config file, after closing the configSections element, add the following block:
<connectionStrings>
<add name="DBConnection" connectionString="data source=ΠΠΠ―_ΠΠΠΠΠΠΠΠ―Π Π_MSSQL;Initial Catalog=TEST;Integrated Security=True;" providerName="System.Data.SqlClient" />
</connectionStrings>
In connectionString you need to enter the connection string.
Now let's create 3 interfaces in separate files:
- Implementation of the IBaseEntityID interface
namespace TestLINQ { public interface IBaseEntityID { int ID { get; set; } } }
- Implementation of the IBaseEntityName interface
namespace TestLINQ { public interface IBaseEntityName { string Name { get; set; } } }
- Implementation of the IBaseNameInsertUTCDate interface
namespace TestLINQ { public interface IBaseNameInsertUTCDate { DateTime InsertUTCDate { get; set; } } }
And in a separate file, we will create a BaseEntity base class for our two entities, which will include common fields:
Implementation of the base class BaseEntity
namespace TestLINQ
{
public class BaseEntity : IBaseEntityID, IBaseEntityName, IBaseNameInsertUTCDate
{
public int ID { get; set; }
public string Name { get; set; }
public DateTime InsertUTCDate { get; set; }
}
}
Next, in separate files, we will create our two entities:
- Ref class implementation
using System.ComponentModel.DataAnnotations.Schema; namespace TestLINQ { [Table("Ref")] public class Ref : BaseEntity { public int ID2 { get; set; } } }
- Implementation of the Customer class
using System.ComponentModel.DataAnnotations.Schema; namespace TestLINQ { [Table("Customer")] public class Customer: BaseEntity { public int Ref_ID { get; set; } public int Ref_ID2 { get; set; } } }
Now let's create a UserContext context in a separate file:
Implementation of the UserContex class
using System.Data.Entity;
namespace TestLINQ
{
public class UserContext : DbContext
{
public UserContext()
: base("DbConnection")
{
Database.SetInitializer<UserContext>(null);
}
public DbSet<Customer> Customer { get; set; }
public DbSet<Ref> Ref { get; set; }
}
}
We got a ready-made solution for conducting optimization tests with LINQ to SQL via EF for MS SQL Server:
Now in the Program.cs file, enter the following code:
Program.cs file
using System;
using System.Collections.Generic;
using System.Linq;
namespace TestLINQ
{
class Program
{
static void Main(string[] args)
{
using (UserContext db = new UserContext())
{
var dblog = new List<string>();
db.Database.Log = dblog.Add;
var query = from e1 in db.Customer
from e2 in db.Ref
where (e1.Ref_ID == e2.ID)
&& (e1.Ref_ID2 == e2.ID2)
select new { Data1 = e1.Name, Data2 = e2.Name };
var result = query.Take(1000).ToList();
Console.WriteLine(dblog[1]);
Console.ReadKey();
}
}
}
}
Next, let's run our project.
At the end of the work, the following will be displayed on the console:
Generated SQL query
SELECT TOP (1000)
[Extent1].[Ref_ID] AS [Ref_ID],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent1]
INNER JOIN [dbo].[Ref] AS [Extent2] ON ([Extent1].[Ref_ID] = [Extent2].[ID]) AND ([Extent1].[Ref_ID2] = [Extent2].[ID2])
That is, in general, the LINQ query generated a SQL query to the MS SQL Server DBMS quite well.
Now let's change the AND condition to OR in the LINQ query:
LINQ query
var query = from e1 in db.Customer
from e2 in db.Ref
where (e1.Ref_ID == e2.ID)
|| (e1.Ref_ID2 == e2.ID2)
select new { Data1 = e1.Name, Data2 = e2.Name };
Let's run our application again.
The execution will crash with an error related to the command execution time exceeding 30 seconds:
If you look at which query was generated by LINQ:
, then you can make sure that the selection occurs through the Cartesian product of two sets (tables):
Generated SQL query
SELECT TOP (1000)
[Extent1].[Ref_ID] AS [Ref_ID],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent1]
CROSS JOIN [dbo].[Ref] AS [Extent2]
WHERE [Extent1].[Ref_ID] = [Extent2].[ID] OR [Extent1].[Ref_ID2] = [Extent2].[ID2]
Let's rewrite the LINQ query like this:
Optimized LINQ query
var query = (from e1 in db.Customer
join e2 in db.Ref
on e1.Ref_ID equals e2.ID
select new { Data1 = e1.Name, Data2 = e2.Name }).Union(
from e1 in db.Customer
join e2 in db.Ref
on e1.Ref_ID2 equals e2.ID2
select new { Data1 = e1.Name, Data2 = e2.Name });
Then we get the following SQL query:
SQL query
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[C2] AS [C2],
[Limit1].[C3] AS [C3]
FROM ( SELECT DISTINCT TOP (1000)
[UnionAll1].[C1] AS [C1],
[UnionAll1].[Name] AS [C2],
[UnionAll1].[Name1] AS [C3]
FROM (SELECT
1 AS [C1],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent1]
INNER JOIN [dbo].[Ref] AS [Extent2] ON [Extent1].[Ref_ID] = [Extent2].[ID]
UNION ALL
SELECT
1 AS [C1],
[Extent3].[Name] AS [Name],
[Extent4].[Name] AS [Name1]
FROM [dbo].[Customer] AS [Extent3]
INNER JOIN [dbo].[Ref] AS [Extent4] ON [Extent3].[Ref_ID2] = [Extent4].[ID2]) AS [UnionAll1]
) AS [Limit1]
Alas, in LINQ queries there can be only one join condition, therefore it is possible to make an equivalent query through two queries for each condition, followed by their union through Union to remove duplicates among rows.
Yes, the queries will generally be non-equivalent, given that complete duplicate rows may be returned. However, in real life, full duplicate lines are not needed and they are trying to get rid of them.
Now let's compare the execution plans of these two queries:
- for CROSS JOIN, the average execution time is 195 sec:
- for INNER JOIN-UNION the average execution time is less than 24 sec:
.
Also in this repository in the Plans folder are plans for executing queries with OR conditions.
Source: habr.com