SQL Server Statistics
About Statistics:
Statistics is nothing but just like data about data and containing information about the column values of table or index.
Many factors are important for Sql server performance like types of locking, Blocking, Lock Escalation ,Transaction Isolation Levels, data partitioning etc.
Statistics use inside of query optimizer for determines to create and decide the best exaction plan estimation like optimizer checking estimate the number of rows affected by case by case for an SQL query.
Statistics always create on single columns when its auto created, but in manual creation can include multiple columns.
Create table:
create table dbo.saleshead
(
id bigint primary key identity,
si_id bigint not null,
custledger_id int not null,
branch_id int not null,
bill_amt decimal(9,2),
gst_amt decimal(9,2),
net_amt decimal(18,2),
);
SQL Server automatically create statistics object when we creating index on the table or indexed view.
For that case primary key created on “ID” columns and Primary key default creating clustered index so that case statistics automatically created on Id column.
Just Check stats:
Exec sp_helpstats ‘saleshead’,’All’
For more info also can check :
select t.name,s.stats_id,s.name stats_name,c.name columnname,c.column_id stats_column_id
from sys.stats s
inner join sys.stats_columns sc on s.object_id = sc.object_id and s.stats_id = sc.stats_id
inner join sys.columns c on sc.object_id = c.object_id and sc.column_id = c.column_id
inner join sys.tables t on t.object_id=sc.object_id
where object_name(s.object_id) = ‘saleshead’
order by s.stats_id, sc.column_id;
Populate data:
insert into dbo.saleshead
select *,bill_amt*12/100,bill_amt+(bill_amt*12/100) from (
select ROW_NUMBER()over(ORDER by a.object_id)+1000 si_id,
(cast(cast(NEWID() as VARBINARY(5))as BIGINT)%10000+1) custledger_id,
CEILING( ROW_NUMBER()over(ORDER by a.object_id)*.3) %10+1 branch_id,
(cast(cast(NEWID() as VARBINARY(5))as BIGINT)%1000000+1)bill_amt
from sys.objects a ,sys.objects b ,sys.objects c)p where p.si_id<1000000
SQL Server automatically creating statistics objects when column use in where
clause and its database’s auto create stats is on and statistics not created on the particular column.
For Check auto create stats is ON or OFF:
select is_auto_create_stats_on,is_auto_update_stats_on from sys.databases
where name =’demo’
If Set Auto stats is OFF and want to ON then execute following query :
ALTER DATABASE demo SET AUTO_CREATE_STATISTICS ON
And want to OFF
ALTER DATABASE demo SET AUTO_CREATE_STATISTICS OFF
Query Life Cycle :
For example: I’m selecting record from saleshead table where si_id=1000 and statistics not created on si_id colums, when following query executing that time sql server optimizer itself create Statistics over on si_id column before making execution plan :
select * from saleshead where si_id=1000
When checking 2 records found 1 was create with primary key and other one _WA_Sys_00000002_38996AB5 object create on the si_id column when 1st time execute select statement where si_id used in where clause. Auto created Statistic’s naming format is prefixed like
_WA : code for Washington state in the America.(starting 3 letter)
_Sys_:prefixed +right(’00000000’+column_id,8) _hex code of table object id
Statistics always create on single column when its auto created, but in manual creation can include multiple columns.
Exp-1 : I’m Create index on si_id,branch_id so statistics will create auto so there 2 new objects will create and it will be delete when index will drop.
create index ix_saleshead_si_id_branch_id on saleshead(si_id,branch_id)
Now check:
Exp-2 : there index already created on brach_id but this one is composite so stats will again create on branch_id when following sql query executing .
select * from saleshead where branch_id=15 and bill_amt>1000 and gst_amt>0
Exp-3 : When Ist index created on single column and after that execute select statement that case auto stats will not create like
create index ix_saleshead_net_amt on saleshead(net_amt)
After index create on net_amt column executing select statement with where clause but new stats object not created on net_amt like :
select * from saleshead where net_amt=0
Creating Manual statistics:
Sql Server automatically creating stats objects but we also can create own statistics when:
- Optimizer showing warnings as missing statistics in execution plan
- Database Engine Tuning Advisor recommend
- Query retrieves subset of data (like filter index)
- Query’s predicate references multiple correlated columns in separate indexes.
Ex.1
CREATE STATISTICS stats_saleshead_netamt ON dbo.saleshead(net_amt) WHERE net_amt > 100 WITH FULLSCAN;
WITH FULLSCAN clause scan entire column in each stats object when updating the stats.
Ex.2
CREATE STATISTICS stats_saleshead_netamt ON dbo.saleshead(net_amt) WITH FULLSCAN;
Ex.3
CREATE STATISTICS stats_saleshead_netamt ON dbo.saleshead(branch_id,net_amt) WITH SAMPLE 10 PERCENT
Optimizer showing warnings as missing statistics in execution plan
For example I have create new demo1 database and set OFF Auto Stats creation and update
create database demo1
use demo1
alter database demo1 set auto_create_statistics off
alter database demo1 set auto_update_statistics off
Keep in mind stats always create when index will create and it’s no impact will come to auto stats creation is ON/OFF and yes when auto update state is OFF then its counter will not set automatically
1-Verify auto stats flag for testing purpose only:
select is_auto_create_stats_on,is_auto_update_stats_on from sys.databases where name =’demo1′
2-Recreate table and populating same as data
create table dbo.saleshead
(
id bigint primary key identity,
si_id bigint not null,
custledger_id int not null,
branch_id int not null,
bill_amt decimal(9,2),
gst_amt decimal(9,2),
net_amt decimal(18,2),
);
insert into dbo.saleshead
select ,bill_amt12/100,bill_amt+(bill_amt12/100) from ( select ROW_NUMBER()over(ORDER by a.object_id)+1000 si_id, (cast(cast(NEWID() as VARBINARY(5))as BIGINT)%10000+1) custledger_id, CEILING( ROW_NUMBER()over(ORDER by a.object_id).3) %10+1 branch_id,
(cast(cast(NEWID() as VARBINARY(5))as BIGINT)%1000000+1)bill_amt
from sys.objects a ,sys.objects b ,sys.objects c)p where p.si_id<1000000
3-Now Verify how many stats created: Their 1 state will found with Primary key
4-Now executing select statement and net_amt column use in where clause and checking graphical execution plan
select * from saleshead where net_amt=0
See their optimizer showing warning as No Statistics for net_amt column
Database Engine Tuning Advisor also recommend like :
Now recheck stats details their now new stats will found because Auto_create_Stats is OFF
Yes new stats no found
Warning will remove when stats manually on net_amt column just see-
Statistics Updating
SQL Server usually keeping up to date but in some circumstances we need to update manually like
- Queries are running slower and its modification date is very old
- When have identity column and lot of data/bulk inserting operation fire with identity inserting in ascending / descending order
modification_counter maintaining number of rows modified counter on column wise and its reset counter values as 0 and update last_updated date as now date time when its counter cross the threshold values
Threshold values calculation formula: 20% of Total rows +500
If TableA have 1000 rows then its 20% is 200 rows +500 =700 so counter will rest 0 when counter’s value>700 and after that particular column used in where clause of select statements.
Get more stats information using dynamic model views:
select object_name(s.object_id)tblname,
c.name column_name,s.stats_id ,s.name stats_name, sp.last_updated,
sp.rows,sp.rows_sampled,sp.unfiltered_rows, sp.modification_counter, s.filter_definition,s.auto_created, s.user_created ,case when s.auto_created=0 and s.user_created=0 then 1 else 0 end index__created
from sys.stats s
inner join sys.stats_columns sc on s.object_id = sc.object_id and s.stats_id = sc.stats_id
inner join sys.columns c on sc.object_id = c.object_id and sc.column_id = c.column_id
outer apply sys.dm_db_stats_properties(s.object_id,s.stats_id)sp
where s.object_id=object_id(‘saleshead’)
If saleshead table have 998999 rows then its 20% is 199799.8 rows +500 =200299.8 so counter will rest 0 when counter’s value>200300. Let’s check
Counter update the value and also cross the threshold values but its last update date not modify its showing same because after that select statement not fire:
Now selecting and branch_id column using where clause and recheck:
Yes, its counter is reset and last modify date also update
Keep in mind counter will not reset when we using multiple column in where clause
select * from saleshead where branch_id=15 and bill_amt>1000 and gst_amt>0
Specific Stats Update manually:
update statistics dbo.saleshead(ix_saleshead_si_id_branch_id) with fullscan, norecompute;
Update all Stats manually over on table :
database level all stats update:
EXEC sp_updatestats;
Statistics histograms
DBCC SHOW_STATISTICS (saleshead, ix_saleshead_si_id_branch_id)
Show_statistics: returns Default 3 result set :
Header:
- Name : Name of the statistics object
- Updated: stats last time update date and time
- Rows: no of rows in table
- Rows Sampled : no of rows as sampling, Sampled and Rows are same when stats is manual created
- Steps: number of steps within the histogram (3 no result )
- Density : density of values of this particular columns in table
- Average key length: Avg length of value
- String Index: its string index
- Filter Expression: if its filter expression used then print its alse show null
- Unfiltered Rows: Total records
Densities:
Density helps to determine the selectivity of the column values. The number of rows in this section depends on the number of participating columns.
Histogram:
Value distribution across the column (stats column) data, incremented in steps based on the number of rows, up to 200 steps per statistics object.
- RANGE_HI_KEY: show range/steps for histogram
- RANGE_ROWS :no of rows in the range like above histograms has 14 steps and 1st step is 101.00 & next is 30374.00 and range_rows is 26938 means total no of rows 26938 comes when execute following query select count(*) from saleshead where net_amt>101.00 and net_amt<30374.00 ;
- EQ_ROWS :show how many rows are equal in particular step like 7 values come for 30374.00 steps lets check select count(*) from saleshead where net_amt=30374.00 ;
- DISTINCT_RANGE_ROWS : Show how many unique values exists between the range select count(DISTINCT net_amt ) from saleshead where net_amt>101.00 and net_amt<30374.00 ;
- AVG_RANGE_ROWS :show avg no of rows come in between range like avg_range_rows=range_rows/distinct_range_rows
Recent Comments