EF Core 6 comes with some GroupBy queries improvements. In this post I wanna talk about the improvements related to “group by top(n)” queries.
Let’s say we have the following table Documents:
CREATE TABLE [dbo].[Documents](
[Id] [integer] PRIMARY KEY,
[UserId] [integer] NOT NULL,
[Title] [nvarchar](50) NOT NULL,
[Body] [nvarchar](250) NOT NULL,
[CreatedOn] [datetime] NOT NULL
)
and that we want to get the two most recent documents for each user. For instance, if we have the following records:
the query should return
To do this query using EF Core and LINQ we can try to group the Documents by UserId and then sort each group by the CreatedOn column to pick the first two documents for each user.
We can start by trying to group all documents for each user
var usersDocs = await ctx
.Documents
.GroupBy(doc => doc.UserId)
and then for each group we try to sort its elements by the CreatedOn column
var usersDocs = await ctx
.Documents
.GroupBy(doc => doc.UserId)
.SelectMany(userDocs => userDocs.OrderByDescending(doc => doc.CreatedOn).Take(2))
.ToArrayAsync();
If we try to execute this query using a previous version of EF Core 6 we get the following error:
.OrderByDescending(doc => doc.CreatedOn)' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.'
This is because the previous versions of EF don’t know how to translate the GroupBy inner expressions to sql. After following their suggestion I ended up with this query
var usersDocs = await ctx
.Documents
.Select(doc => doc.UserId)
.Distinct()
.SelectMany(userId =>
ctx
.Documents
.Where(doc => doc.UserId == userId)
.OrderByDescending(doc => doc.CreatedOn)
.Take(2)
)
.ToArrayAsync();
However this query has two main issues. It does a distinct over the UserId column, loads all the user ids to application memory and then it does a query for each user resulting in a n+1 query problem.
One way to solve these issues is to forget LINQ and rewrite the query using raw sql with a partition by UserId and the ROW_NUMBER() window funtion, doing a CTE .
Now, with the EF Core 6 we can use the first version of the query, since the EF Core team has added the support for translating some GroupBy inner expressions and it can translate this LINQ query to a single sql query.
var usersDocs = await ctx
.Documents
.GroupBy(doc => doc.UserId)
.SelectMany(userDocs => userDocs.OrderByDescending(doc => doc.CreatedOn).Take(2))
.ToArrayAsync();
You can read more about these features and other improvements added to GroupBy queries here and check the related github issues 12088 13805
Happy coding!