SQL Server Statistics

About Statistics:

Statistics is nothing but just like data about data and containing information about the column values of table or index.

Many factors are important for Sql server performance like types of locking, Blocking, Lock Escalation ,Transaction Isolation Levels, data partitioning etc.

Statistics use inside of query optimizer for determines to create and decide the best exaction plan estimation like optimizer checking estimate the number of rows affected by case by case for an SQL query.

Statistics always create on single columns  when its auto created, but in manual creation can include multiple columns.

Create table:

create table dbo.saleshead
(
  id bigint primary key identity,
  si_id bigint not null,
  custledger_id int not null,
  branch_id int not null,
  bill_amt decimal(9,2),
  gst_amt decimal(9,2),
  net_amt decimal(18,2),
);

SQL Server automatically create statistics object when we creating index on the table or indexed view.

For that case primary key created on “ID” columns and Primary key  default creating clustered index so that case statistics automatically created  on Id column.

Just Check stats:

Exec sp_helpstats ‘saleshead’,’All’ 

Exec sp_helpstats 'saleshead','All'

For more info also can check :

select t.name,s.stats_id,s.name stats_name,c.name columnname,c.column_id stats_column_id
from sys.stats s
inner join sys.stats_columns sc on s.object_id = sc.object_id and s.stats_id = sc.stats_id
inner join sys.columns c on sc.object_id = c.object_id and sc.column_id = c.column_id
inner join sys.tables t on t.object_id=sc.object_id
where object_name(s.object_id) = ‘saleshead’
order by s.stats_id, sc.column_id;

 select t.name,s.stats_id,s.name stats_name,c.name columnname,c.column_id  stats_column_id 
 from sys.stats s 
 inner join sys.stats_columns sc on s.object_id = sc.object_id and s.stats_id = sc.stats_id
 inner join sys.columns c on sc.object_id = c.object_id and sc.column_id = c.column_id
 inner join sys.tables t on t.object_id=sc.object_id
 where object_name(s.object_id) = 'saleshead'
 order by s.stats_id, sc.column_id;

Populate data:

insert into dbo.saleshead
select *,bill_amt*12/100,bill_amt+(bill_amt*12/100) from (
select ROW_NUMBER()over(ORDER by a.object_id)+1000 si_id, 
          (cast(cast(NEWID() as VARBINARY(5))as BIGINT)%10000+1) custledger_id,
          CEILING( ROW_NUMBER()over(ORDER by a.object_id)*.3) %10+1 branch_id,
          (cast(cast(NEWID() as VARBINARY(5))as BIGINT)%1000000+1)bill_amt
from sys.objects a ,sys.objects b ,sys.objects c)p where p.si_id<1000000

SQL Server automatically creating statistics objects when column use in where clause and its database’s auto create stats is on and statistics not created on the particular column.

For Check auto create stats is ON or OFF:

select is_auto_create_stats_on,is_auto_update_stats_on from sys.databases
where name =’demo’

select is_auto_create_stats_on,is_auto_update_stats_on from sys.databases 
 where name ='demo'

If Set Auto stats is OFF and want to ON then execute following query :

ALTER DATABASE demo SET AUTO_CREATE_STATISTICS ON
And want to OFF
ALTER DATABASE demo SET AUTO_CREATE_STATISTICS OFF

Query Life Cycle :

Query Life Cycle

For example: I’m selecting record from saleshead table where si_id=1000 and statistics not created on si_id colums, when following query executing that time sql server optimizer itself create Statistics over on si_id column before making execution plan :

select * from saleshead where si_id=1000

select * from saleshead where si_id=1000

When checking 2 records found 1 was create with primary key and other one  _WA_Sys_00000002_38996AB5  object  create on the si_id column when 1st time execute select statement where  si_id used in where clause. Auto created Statistic’s naming format is prefixed like

_WA : code for Washington state in the America.(starting 3 letter)

_Sys_:prefixed +right(’00000000’+column_id,8) _hex code of table object id

Statistics always create on single column when its auto created, but in manual creation can include multiple columns.

Exp-1 : I’m Create index on si_id,branch_id so statistics will create auto so there 2 new objects will create and it will be delete when index will drop.

create index ix_saleshead_si_id_branch_id on saleshead(si_id,branch_id)

Now check:

new objects will create

Exp-2 : there index already created on brach_id  but this one is composite so stats will again create on branch_id when following sql query executing .

select * from saleshead where  branch_id=15  and bill_amt>1000 and gst_amt>0

 this one is composite so stats will again create on branch_id

Exp-3 : When Ist  index  created on single column and after that execute select  statement that case auto stats will not create like

create index ix_saleshead_net_amt on saleshead(net_amt)

After index create on net_amt column executing select statement with where clause but new stats object not created on net_amt like :

select * from saleshead  where  net_amt=0

After index create on net_amt column executing select statement with where clause but new stats object not created on net_amt like

Creating Manual statistics:

Sql Server automatically creating stats objects but we also can create own statistics when:

  • Optimizer showing warnings  as missing statistics in execution plan
  • Database Engine Tuning Advisor recommend
  • Query retrieves subset of data (like filter index)
  • Query’s predicate references multiple correlated columns in separate indexes.  

Ex.1   

CREATE STATISTICS stats_saleshead_netamt ON dbo.saleshead(net_amt) WHERE net_amt > 100 WITH FULLSCAN;

WITH FULLSCAN clause scan entire column in each stats object when updating the stats.

Ex.2

CREATE STATISTICS stats_saleshead_netamt ON dbo.saleshead(net_amt) WITH FULLSCAN;

Ex.3

CREATE STATISTICS stats_saleshead_netamt ON dbo.saleshead(branch_id,net_amt) WITH SAMPLE 10 PERCENT

Optimizer showing warnings as missing statistics in execution plan

For example I have create new demo1 database and set OFF Auto Stats creation and update

create database demo1
use demo1
 
alter database demo1 set auto_create_statistics off
alter database demo1 set auto_update_statistics off

Keep in mind stats always create when index will create and it’s no impact will come to auto stats creation is ON/OFF and yes when auto update state is OFF then its counter will not set automatically

1-Verify auto stats flag for testing purpose only:

select is_auto_create_stats_on,is_auto_update_stats_on from sys.databases where name =’demo1′

2-Recreate table and populating same as  data

create table dbo.saleshead
(
  id bigint primary key identity,
  si_id bigint not null,
  custledger_id int not null,
  branch_id int not null,
  bill_amt decimal(9,2),
  gst_amt decimal(9,2),
  net_amt decimal(18,2),
);

insert into dbo.saleshead
select ,bill_amt12/100,bill_amt+(bill_amt12/100) from ( select ROW_NUMBER()over(ORDER by a.object_id)+1000 si_id, (cast(cast(NEWID() as VARBINARY(5))as BIGINT)%10000+1) custledger_id, CEILING( ROW_NUMBER()over(ORDER by a.object_id).3) %10+1 branch_id,
(cast(cast(NEWID() as VARBINARY(5))as BIGINT)%1000000+1)bill_amt
from sys.objects a ,sys.objects b ,sys.objects c)p where p.si_id<1000000

3-Now Verify how many stats created: Their 1 state will found with Primary key

Now Verify how many stats created: Their 1 state will found with Primary key

4-Now executing select statement and net_amt column use in where clause and checking graphical execution plan

select * from saleshead  where  net_amt=0

Now executing select statement and net_amt column use in where clause and checking graphical execution plan

See their optimizer showing warning as No Statistics for net_amt column

Database Engine Tuning Advisor also recommend like :

Database Engine Tuning Advisor also recommend like

Now recheck stats details their now new stats will found because Auto_create_Stats is OFF 

Now recheck stats details their now new stats will found because Auto_create_Stats is OFF

Yes new stats no found

Warning will remove when stats manually on net_amt column just see-

Warning will remove when stats manually on net_amt column just see

Statistics Updating

SQL Server usually keeping up to date but in some circumstances we need to update manually like

  • Queries are running slower and its modification date is very old
  • When have identity column and lot of data/bulk inserting operation fire with identity inserting in ascending / descending order

modification_counter maintaining number of rows modified counter  on column wise and its reset counter values as 0 and update last_updated date as now date time when its counter cross the threshold values

Threshold values calculation formula: 20% of Total rows +500

If  TableA have 1000 rows then its 20% is 200 rows +500 =700 so counter will rest 0 when counter’s value>700 and after that particular column used in where clause of select statements.

Get more stats information using dynamic model views:

select object_name(s.object_id)tblname,
 c.name column_name,s.stats_id ,s.name stats_name, sp.last_updated,
 sp.rows,sp.rows_sampled,sp.unfiltered_rows, sp.modification_counter, s.filter_definition,s.auto_created, s.user_created ,case when s.auto_created=0 and s.user_created=0 then 1 else 0 end index__created
from sys.stats s
inner join sys.stats_columns sc on s.object_id = sc.object_id and s.stats_id = sc.stats_id
inner join sys.columns c on sc.object_id = c.object_id and sc.column_id = c.column_id
outer apply sys.dm_db_stats_properties(s.object_id,s.stats_id)sp
where s.object_id=object_id(‘saleshead’)

Get more stats information using dynamic model views

If  saleshead table have 998999 rows then its 20% is 199799.8 rows +500 =200299.8  so counter will rest 0 when counter’s value>200300. Let’s check

update statics column

Counter update the value and also cross the threshold values but its last update date not modify its showing same because after that select statement not fire:

Counter update the value and also cross the threshold values but its last update date not modify

Now selecting and branch_id column using where clause and recheck:

Now selecting and branch_id column using where clause and recheck:

Yes, its counter is reset and last modify date also update

Keep in mind counter will not reset when we using multiple column in where clause

select * from saleshead where  branch_id=15  and bill_amt>1000 and gst_amt>0

Specific Stats Update manually:

update statistics dbo.saleshead(ix_saleshead_si_id_branch_id) with fullscan, norecompute;

Specific Stats Update manually

Update all Stats manually over on table :

Update all Stats manually over on table

database level all stats update:

EXEC sp_updatestats;

Statistics histograms

DBCC SHOW_STATISTICS (saleshead, ix_saleshead_si_id_branch_id)

histogram

Show_statistics: returns Default 3 result set :

Header:  

  • Name : Name of the statistics object
  • Updated: stats last time update date and time     
  • Rows: no of rows in table
  • Rows Sampled : no of rows as sampling, Sampled and Rows are same when stats is manual created
  • Steps: number of steps within the histogram (3 no result )
  • Density : density of values of this particular columns in table
  • Average key length: Avg length of value
  • String Index:  its string index
  • Filter Expression:      if its filter expression used then print its alse show null
  • Unfiltered Rows: Total records

Densities:

Density helps to determine the selectivity of the column values. The number of rows in this section depends on the number of participating columns.

Histogram:

Value distribution across the column (stats column) data, incremented in steps based on the number of rows, up to 200 steps per statistics object.

  • RANGE_HI_KEY: show range/steps for histogram
  • RANGE_ROWS :no of rows in the range like above histograms has 14 steps and 1st step is 101.00 & next is 30374.00 and range_rows is 26938 means total no of rows 26938 comes when execute following query select count(*) from saleshead where net_amt>101.00 and net_amt<30374.00 ;
  • EQ_ROWS :show how many rows are equal in particular step like 7 values come for 30374.00 steps lets check select count(*) from saleshead where net_amt=30374.00 ;
  • DISTINCT_RANGE_ROWS : Show how many unique values exists between the range select count(DISTINCT net_amt ) from saleshead where net_amt>101.00 and net_amt<30374.00 ;
  •  AVG_RANGE_ROWS :show avg no of rows come in between range like avg_range_rows=range_rows/distinct_range_rows

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *