Traditionally, businesses have had separate systems for operational (i.e. OLTP) and analytics workloads. For such systems, Extract, Transform, and Load (ETL) jobs regularly move the data from the operational store to an analytics store. The analytics data is usually stored in a data warehouse or data mart dedicated to running analytics queries. While this solution has been the standard, it has these three key challenges:
SQL Server 2016 introduces Real-time operational analytics, the ability to run both analytics and OLTP workloads on the same database tables at the same time. Besides running analytics in real-time, you can also eliminate the need for ETL and a data warehouse. Real-Time Operational Analytics enables running analytics queries directly on your operational workload using Columnstore indexes thereby eliminating any data latency.
Real-time analytics uses an updateable Columnstore index on a rowstore table. The Columnstore index maintains a copy of the data, so the OLTP and analytics workloads run against separate copies of the data. This minimizes the performance impact of both workloads running at the same time. SQL Server automatically maintains index changes so OLTP changes are always up-to-date for analytics. With this design, it is possible and practical to run analytics in real-time on up-to-date data. This works for both disk-based and memory-optimized tables. The image below shows one possible configuration using Analysis Server in Direct Query mode, but you can use any analytics tool or custom solution at your disposal. This solution addresses the drawbacks mentioned above since the analysis happens directly on the operational data.
The schema for OLTP databases is highly normalized (i.e. with minimal data duplication) which when used for analytic queries could lead to poor performance primarily because of complexity of joins between larger number of tables. The significant query speed up made possible by Columnstore index (i.e. NCCI) can overcome the complexity of query and still deliver most analytics queries in few seconds. At this point, it is important to emphasize the analytics query performance with real-time operational analytics will not be as fast as you can get with a dedicated DW but the key benefit is the ability to do analytics in real-time.
SELECT DatabasePropertyEx(DB_NAME(), 'IsXTPSupported');
NOTE: If you do not have any database you can use sample database Adventureworks2016CTP3.
For downloading sample database, you can refer to steps in 4.1.1.
NOTE: For reference, we are using Adventureworks2016CTP3
database throughout the document.
Download AdventureWorks sample databases and restore it to SQL Server 2016 instance.
Note: The provided link is mapped to SQL 2016 CTP3 version downloads. Look for the latest version.
Select top 10 * from Sales.Customer
Post Condition
SQL Server 2016 CTP3 is up and running in the HSP server.
If you have downloaded the sample database then from SQL management studio, you can see the AdventureWorks2016CTP3 database created on server.
Figure 3 Customer records displayed through the Management Services
Figure 3 executes a simple SELECT query to retrieve data from the restored database.
SQL Server 2016 supports real-time operational analytics both for disk-based (i.e. traditional tables) and memory-optimized tables without requiring any changes to your transactional workload. For disk-based tables, you can do it by creating a non-clustered Columnstore index (NCCI).
We'll first walkthrough scenarios implementing real time analytics for disk based tables.
Script:
--Creating disk-based Orders Table
create table orders
(
AccountKey int not null,
customername nvarchar (50),
OrderNumber bigint,
PurchasePrice decimal (9,2),
OrderStatus smallint not NULL,
OrderStatusDesc nvarchar (50)
)
--Create a Non Clustered Columnstore Index (NCCI)
CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci ON orders (accountkey, customername, purchaseprice, orderstatus, orderstatusdesc)
NOTE: The Columnstore index on an in-memory table allows operational analytics by integrating in-memory OLTP and in-memory Columnstore technologies to deliver high performance for both OLTP and analytics workloads. The Columnstore index on an in-memory table must include all the columns.
Let us now take an example of how NCCI can be used on a regular rowstore table in the context of an Order Management application:
The key table for this application is 'orders' that tracks customer information, purchase price and the order status. It is expected that over time a large number of rows will have the order status as 'Order Received' (i.e. the order has been received by the customer. We don't expect any more changes to the order once it has been received by the customer). So, once the order has been received by the customer, it can be considered that this row is now 'cold'.
--Creating disk-based Orders Table
create table orders
(
AccountKey int not null,
customername nvarchar (50),
OrderNumber bigint,
PurchasePrice decimal (9,2),
OrderStatus smallint not NULL,
OrderStatusDesc nvarchar (50)
)
--Creating Clustered Index on OrderStatus
create
clustered
index orders_ci on orders(OrderStatus)
-- Loading 3 million rows with the data pattern that 95% of the orders have already been received by the customer.
declare @outerloop int = 0
declare @i int = 0
declare @purchaseprice decimal (9,2)
declare @customername nvarchar (50)
declare @accountkey int
declare @orderstatus smallint
declare @orderstatusdesc nvarchar(50)
declare @ordernumber bigint
while (@outerloop < 3000000)
begin
Select @i = 0
begin tran
while (@i < 2000)
begin
set @ordernumber = @outerloop + @i
set @purchaseprice = rand() * 1000.0
set @accountkey = convert (int, RAND ()*1000)
set @orderstatus = convert (smallint, RAND()*100)
if (@orderstatus >= 5)
set @orderstatus = 5
set @orderstatusdesc =
case @orderstatus
WHEN 0 THEN 'Order Started'
WHEN 1 THEN 'Order Closed'
WHEN 2 THEN 'Order Paid'
WHEN 3 THEN 'Order Fullfillment'
WHEN 4 THEN 'Order Shipped'
WHEN 5 THEN 'Order Received'
END
insert orders values (@accountkey, (convert (varchar (6), @accountkey) + 'firstname'),@ordernumber, @purchaseprice,@orderstatus, @orderstatusdesc)
set @i += 1;
end
commit
set @outerloop = @outerloop + 2000
set @i = 0
end
go
NOTE: It is just a DDL operation and similar to any other btree index that you would create on a rowstore table. No changes to the application needed because the NCCI is updateable in SQL Server 2016 so your transaction workload will continue to run.
--Create a Non Clustered Columnstore Index (NCCI)
CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci ON orders (accountkey, customername, purchaseprice, orderstatus, orderstatusdesc)
--Loading another 200k orders but these orders are not 'yet' received by the customers.
--insert additional 200k rows
declare @outerloop int = 3000000
declare @i int = 0
declare @purchaseprice decimal (9,2)
declare @customername nvarchar (50)
declare @accountkey int
declare @orderstatus smallint
declare @orderstatusdesc nvarchar(50)
declare @ordernumber bigint
while (@outerloop < 3200000)
begin
Select @i = 0
begin tran
while (@i < 2000)
begin
set @ordernumber = @outerloop + @i
set @purchaseprice = rand() * 1000.0
set @accountkey = convert (int, RAND ()*1000)
set @orderstatus = convert (smallint, RAND()*5)
set @orderstatusdesc =
case @orderstatus
WHEN 0 THEN 'Order Started'
WHEN 1 THEN 'Order Closed'
WHEN 2 THEN 'Order Paid'
WHEN 3 THEN 'Order Fulfillment'
WHEN 4 THEN 'Order Shipped'
WHEN 5 THEN 'Order Received'
END
insert orders values (@accountkey, (convert(varchar(6), @accountkey) + 'firstname'), @ordernumber, @purchaseprice, @orderstatus, @orderstatusdesc)
set @i += 1;
end
commit
set @outerloop = @outerloop + 2000
set @i = 0
end
go
Post Conditions
1. Once we have defined Columnstore indexes on the table(s) identified, we'll now query the table with and without NCCI (Non Clustered Columnstore index) defined on the table to see the performance.
1.1 CPU time when running Query with NCCI defined on Table:
1.2 CPU Time when running the same query without NCCI defined on the table:
NOTE: Note that the query with NCCI ran 25x faster and took 1/60th CPU resources. The query optimizer picks NCCI for analytics queries and provides significant speed up over traditional btree indexes. Also, note that the difference in performance will only widen as you add more data.
2. Now, let's check the details on the Columnstore index.
NOTE: You will see that 3 million rows are compressed while the new 200k rows that were added after defining the non-clustered Columnstore Index on the table are in the delta rowgroup. In SQL Server 2016, the delta rowgroups for Clustered Columnstore Index (CCI) and Non-Clustered Columnstore Index (NCCI) are uncompressed unlike in SQL Server 2014 where delta rowgroup was PAGE compressed.
NOTE: The query optimizer automatically picks Non Clustered ColumnStore Index for analytics queries and provides significant speed up.
To minimize the overhead of maintaining a non-clustered Columnstore index on an operational workload, you can use a filtered condition to create a non-clustered Columnstore index only on the warm or slowly changing data.
GOAL: To design the filtered condition and subsequent queries to access frequently changing "hot" data from the btree index, and to access the more stable "warm" data from the Columnstore index.
--Using a filtered condition to separate hot data in a rowstore table
-- From "warm" data in a Columnstore index.
CREATE TABLE
orders_filteredNCCI (
AccountKey int not null,
Customername nvarchar (50),
OrderNumber bigint,
PurchasePrice decimal (9,2),
OrderStatus smallint not null,
OrderStatusDesc nvarchar (50))
-- OrderStatusDesc
-- 0 => 'Order Started'
-- 1 => 'Order Closed'
-- 2 => 'Order Paid'
-- 3 => 'Order Fulfillment Wait'
-- 4 => 'Order Shipped'
-- 5 => 'Order Received'
--Create the Columnstore index with a filtered condition
CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci_filteredindex ON orders_filteredNCCI (accountkey, customername, purchaseprice, orderstatus)
where orderstatus = 5
NOTE: The query optimizer will consider, but not always choose, the Columnstore index for the query plan. When the query optimizer chooses the filtered Columnstore index, it transparently combines the rows both from Columnstore index as well as the rows that do not meet the filtered condition to allow real-time analytics.
2. This is all you need to do!
Let us now take an example of how NCCI can be used with predicate to improve query performance:
The example below shows how to use filtered indexes to minimize impact of non-clustered Columnstore index on transactional workload while still delivering analytics in real-time.
Taking the same example as above of Orders Table:
--Create Orders Tables
CREATE TABLE orders_filteredNCCI (
AccountKey int not null,
Customername nvarchar (50),
OrderNumber bigint,
PurchasePrice decimal (9,2),
OrderStatus smallint not null,
OrderStatusDesc nvarchar (50))
-- OrderStatusDesc
-- 0 => 'Order Started'
-- 1 => 'Order Closed'
-- 2 => 'Order Paid'
-- 3 => 'Order Fulfillment Wait'
-- 4 => 'Order Shipped'
-- 5 => 'Order Received'
--Creating Btree Indexes
CREATE CLUSTERED INDEX orders_ci_filteredindex ON orders_filteredNCCI(OrderStatus)
-- Loading 3 million rows with the data pattern that 95% of the orders have already been received by the customer.
declare @outerloop int = 0
declare @i int = 0
declare @purchaseprice decimal (9,2)
declare @customername nvarchar (50)
declare @accountkey int
declare @orderstatus smallint
declare @orderstatusdesc nvarchar(50)
declare @ordernumber bigint
while (@outerloop < 3000000)
begin
Select @i = 0
begin
tran
while (@i < 2000)
begin
set @ordernumber = @outerloop + @i
set @purchaseprice = rand() * 1000.0
set @accountkey = convert (int, RAND ()*1000)
set @orderstatus = convert (smallint, RAND()*100)
if (@orderstatus >= 5)
set @orderstatus = 5
set @orderstatusdesc =
case @orderstatus
WHEN 0 THEN 'Order Started'
WHEN 1 THEN 'Order Closed'
WHEN 2 THEN 'Order Paid'
WHEN 3 THEN 'Order Fulfillment'
WHEN 4 THEN 'Order Shipped'
WHEN 5 THEN 'Order Received'
END
insert orders values (@accountkey, (convert (varchar (6), @accountkey) + 'firstname'),@ordernumber, @purchaseprice,@orderstatus, @orderstatusdesc)
set @i += 1;
end
commit
--Create the columnstore index with a filtered condition
CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci_filteredindex ON orders_filteredNCCI (accountkey, customername, purchaseprice, orderstatus)
where orderstatus = 5
--Load another 200k orders but these orders are not 'yet' received by the customers.
-- in the same way as above
NOTE: To minimize the overhead of maintaining a non-clustered columnstore index on an operational workload, you can use a filtered condition to create a non-clustered columnstore index only on the warm or slowly changing data. For example, in an order management application, you can create a non-clustered columnstore index on the orders that have already been shipped. Once the order has shipped, it rarely changes and therefore can be considered warm data. With Filtered index, the data in non-clustered columnstore index requires fewer updates thereby lowering the impact on transactional workload.
Post Conditions
1. Once we have defined Columnstore indexes with a filtered condition, we'll now query the table to see the performance improvement.
Run the below query on Orders table:
-- The following query returns the total purchase done by customers for items > $100 .00
SELECT top 5 customername, sum (PurchasePrice)
FROM orders_filteredNCCI
WHERE purchaseprice > 100.0
Group By customername
RESULT:
The analytics query will execute with the following query plan. Also, the rows not meeting the filtered condition are accessed through clustered btree index.
NOTE: SQL Server uses the clustered index to quickly scan the rows that did not meet the filtered condition. Without this clustered index, a full table scan of the rowstore table will be required to find these rows which can negatively impact the performance of analytics query significantly. In the absence of clustered index, you could create a complementary filtered non-clustered btree index to identify such rows but it is not recommended because accessing large range of rows through non-clustered btree indexes is expensive.
SQL Server 2016 combines memory-optimized table for extreme OLTP and in-memory analytics using columnstore to deliver real-time operational analytics to customers. You can achieve it with no changes to your operational workload.
Following are the key differences from NCCI:
NOTE: A columnstore index on in-memory table can only be added as part of table creation.
--Create a memory-optimized table with a columnstore index
create table orders_Inmemory (AccountKey int not null PRIMARY KEY NONCLUSTERED,
customername nvarchar (50),
OrderNumber bigint,
PurchasePrice decimal (9,2),
OrderStatus smallint NOT NULL,
OrderStatusDesc nvarchar (50),
INDEX t_orders_cci CLUSTERED COLUMNSTORE)
WITH (MEMORY_OPTIMIZED = ON);
Post Conditions
Once we have defined Columnstore indexes on the memory optimized table(s), analytics queries provides significant speed up as shown for disk based tables.