So you have a large SQL dataset? Say 1,000,000+ records? Need to convey simple trends in data over vast historical periods on-demand? Sounds like a great candidate for data decimation (aka Down Sampling). In some recent Silverlight development, this very issue was raised and solved using examples provided in this post. The visualization project in question required that users be able to visualize trends in statistics data over significant historical periods (5+ years, 200+ points a day), support user selectable date ranges, and refresh UI visualization controls within an application mandated 3-5 second response window. Decimation proved to be both valueable in meeting project spec and allowed users to target specific ranges based on trends for further investigation (drill down) that are generally not available using Skip() & Take() methods.
For this post, we'll generate 1,000,000+ records of sample data and walk through the steps of creating a simple data decimation solution that accepts custom date range information and evenly displays decimated data for the requested date range. The T-SQL syntax can easily be ported into Stored Procedures, Views and more. Time for data generation!
Creating The Sample Data Set
For the purpose of this post, we'll add the following table schema to a stock AdventureWorks2008R2 SQL database, though any SQL database will do. We'll use this new schema to house 1,000,000 sample date and float value records for use with our sample data decimation solution.
CREATE TABLE [dbo].[SampleStats] (
[ID] [int] IDENTITY(1,1) NOT NULL,
[MeanDate] [date] NULL,
[MeanValue] [float] NULL
)
Now that we have our T-SQL table in place, we'll make use of Redgate's SQL Data Generator utility to rapidly fill our table with 1,000,000 randomly generated records for a five year period (Jan 1, 2006 - Dec 31, 2011).

Defining The Base Query
Over the course of the example, we'll make use of EXPR1 and EXPR2 table aliases as we add additional logic. For now, our base query is meat of our syntax designed to query Date and Mean value data from our SampleStats schema via what would be user-supplied @startDate and @endDate parameter values. This currently returns our 1,000,000 records.
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '2006-01-01'
SET @endDate = '2011-12-31'
SELECT
MeanDate ,
MeanValue
FROM
SampleStats
WHERE
MeanDate >= @startDate AND
MeanDate <= @endDate
Augmenting with ROW_NUMBER()
Next we'll add the all important RowNumber field ordering by our MeanDate field to ensure decimated data remains evenly spaced out by MeanDate for the queried period. The RowNumber field will be used in combination with a SQL Modulo operator in our next step.
...
MeanValue ,
ROW_NUMBER() OVER ( ORDER BY MeanDate ) AS RowNumber
FROM
SampleStats
...
Integrating SQL Modulo (%) and Nth Record Logic
In this step, we add a few things: EXPR1 alias, an @nthRecord parameter and a SQL Modulo operator. A Modulo (%) provides the remainder of one number divided by another. In this example, we return the MOD of each RowNumber using our @nthRecord value (1000) as the divisor.
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @nthRecord INT
SET @startDate = '2006-01-01'
SET @endDate = '2011-12-31'
SET @nthRecord = 1000
SELECT
EXPR1.MeanDate ,
EXPR1.MeanValue ,
EXPR1.RowNumber ,
(EXPR1.RowNumber % @nthRecord) AS RowMod
FROM
(SELECT
MeanDate ,
MeanValue ,
ROW_NUMBER() OVER ( ORDER BY MeanDate ) AS RowNumber
FROM
SampleStats
WHERE
MeanDate >= @startDate AND
MeanDate <= @endDate
) EXPR1
The @nthRecord value controls the amount of data returned. Our example makes use of a fairly divisible value (1000). This value would normally be the result of a custom algorithm based on any number of factors (i.e.: total table record count, max records returned, etc.). Our query now returns the following (note the RowMod field change):

Querying by the RowMod Field
Finally, we alias our current SELECT clause (EXPR2) and query for records where our aliased RowMod field equals 0 and order by RowNumber. The final example now looks like the following:
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @nthRecord INT
SET @startDate = '2006-01-01'
SET @endDate = '2011-12-31'
SET @nthRecord = 1000
SELECT
EXPR2.*
FROM
(SELECT
EXPR1.MeanDate ,
EXPR1.MeanValue ,
EXPR1.RowNumber ,
(EXPR1.RowNumber % @nthRecord) AS RowMod
FROM
(SELECT
MeanDate ,
MeanValue ,
ROW_NUMBER() OVER ( ORDER BY MeanDate ) AS RowNumber
FROM
SampleStats
WHERE
MeanDate >= @startDate AND
MeanDate <= @endDate
) EXPR1
) EXPR2
WHERE
EXPR2.RowMod = 0
ORDER BY
EXPR2.RowNumber
With this final step, our data decimation solution is now complete as we're querying for a mere fraction of our 1,000,000 records (every 1000th record) spanning a period of five years evenly spaced out thanks in part to the ordering by MeanDate syntax. One million records decimated down to 1,000 records in a mere two seconds. Extreme decimation, but you get the idea.

From an application perspective, we're returning a smaller number of records processed ideally, by a more powerful SQL host rather than selecting all records an iterating through them in application code. This is great for QUICKLY identifying trends in larger data sets prompting the user for more focused querying that is an advantage over regular Skip() / Take() methods of reviewing data in grids, etc. Cheers!