Wednesday, March 26, 2008

ROW_NUMBER() Sucks?

Is it just me, or does the SQL Server 2005 ROW_NUMBER() function's performance suck? I always used to complain about SQL Server's lack of a LIMIT clause like MySQL has. I was thrilled when 2005 came out with the ROW_NUMBER() function.

When we switched to SQL 2005, I dutifully recoded the main search query to use the ROW_NUMBER() function, only to find out that it hurt performance by almost 3 fold! In addition, using ROW_NUMBER() doesn't give you an easy way to get the total number of records a query would return. (Helpful to display something like, "Viewing records 51-100 of 2,120")

I should put a caveat here: My testing shows that ROW_NUMBER() is actually much faster when doing very simple queries with large numbers of rows. For example, if you need to quickly grab 10 rows in the middle of all 10,000 rows in a single table with no joins, ROW_NUMBER() is probably for you. However, if you're doing a more complex query like to search for specific records or joining with many other tables, my testing shows ROW_NUMBER() is a significant slow-down. Here are the results for my specific query:

Description Rows Estimated Cost Actual Time Slowdown
My original query 12,547 3.701 0.729 -
Using ROW_NUMBER() 12,547 6.616 2.094 287.1%!

So, if not ROW_NUMBER(), then what? My solution is to declare a temporary table and insert all of your result IDs into that table, along with the ordering. You can then join with that table to grab only the results you need. For example:

DECLARE @FirstRecord Int
SET @FirstRecord=0
DECLARE @LastRecord Int
SET @LastRecord=49

--First make a table to hold the order of the final sorted results
DECLARE @ResultOrder TABLE  
(     OrderID int identity(0,1) not null primary key clustered,
    ResultID int    
);

--The main "search" query... this can be pretty slow with lots of rows
INSERT INTO @ResultOrder (ProductID)
    SELECT p.ProductID
    FROM Products p
    WHERE p.Cost < 100 AND p.CategoryID=4 --Or whatever search you're doing

--A very fast way to get the total number of results
SELECT COUNT(*) FROM @ResultOrder; 

--A very fast way to get the details... you can even join with many more tables
SELECT p.Name, p.Cost, c.Name AS CategoryName, s.Name AS SalesPerson
FROM @ResultOrder ro, Products p, Categories c, SalesPeople s
WHERE p.ProductID=ro.ProductID
    AND p.CategoryID=c.CategoryID
    AND p.SalesPersonID=s.SalesPersonID
ORDER BY ro.OrderID

Obviously, that is a simple example, but I suspect as your query gets more complicated (especially the 2nd details part) you're way better off not using ROW_NUMBER().

As an ADDITIONAL speedup, that first result set is potentially PRIME cacheing material. For example, if a user (or a crawler) hits the first page of results, then clicks "next" a few times. You do the expensive search only once, then quickly pull up the details for each page on subsequent requests. You might have to analyze your usage to see how often and how close together different pages of the same search are requested.

How do you best cache those Result IDs you ask? And how do you get the IDs to and from SQL? Well, I'm about to implement that myself so hopefully I'll let you know in a subsequent blog post.

No comments: