Window Functions

What is Window Functions

In simple terms window or ranking functions execute on set of rows and return a single aggregated value for each row.

Window functions using an OVER() clause but OVER() optional is some cases (in aggregate types function), it’s have 3 types . 

  1. Aggregate Window Functions like COUNT(),SUM(), MAX(), MIN(), AVG().
  2. Ranking Window Functions like RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
  3. Value Window Functions like LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

1. Aggregate Window Functions

Count (): Just count number records or rows, it’s also accept DISTINCT but can’t able to use when using OVER () clause.

  • Count(*): it will count Total number of rows,
    • Count(columnname):it will count Total Rows where have values means NULL not count  ,
    • Count(DISTINCT columnname): count unique column values  

SUM (): sum of numeric values, it’s also accept DISTINCT but can’t able to use when using OVER () clause.

  • SUM (columnname): it will add/sum Total values of specified column
    • Count(DISTINCT columnname): it will add/sum unique values of specified 

AVG (): Just calculating average of numeric column, it’s also accept DISTINCT but can’t able to use when using OVER () clause.

  • AVG (columnname): it will calculate average of specified column for total rows.
    • AVG (DISTINCT columnname): it will calculate average of specified column for unique values rows.

MIN (): retrieve a minimum values from specific column its use like MIN (columnname), its generally use in numeric column but it’s also can apply on string column there it will check 1st character and NULL excluding when minimum values retrieve.

MAX (): this same as MIN () but MAX () retrieving maximum values and MIN() is minimum value.

Note: Aggregate function’s always need group by clause when any one column not aggregating but there actual rows/data can’t display means always show Aggregated.

GroupBy

Just see you can’t see your actual data/row if we need to show Aggregated info with actual data then need to use OVER()clause like

group by with over

See in query there I’m aggregating data with vouchertypeid, datestamp columns but there group by not using because OVER () used with each aggregate function.

Just Notice there data aggregated on table based if want to any group based then need to use partition option in side of OVER () clause like

group by with partitioning

See in query there we have used OVER () clause with PARTITION (based on vouchertypeid column) so aggregating data according to vouchertypeid group.

When need to calculating running aggregation then also need to ORDER BY clause, like 

aggregation

2. Ranking Window Functions

RANKING functions similar aggregate function but ranking function making rank the row values of a specified column according to their rank, RANKING don’t work without OVER (ORDER BY <Columnname>) and PARTITION BY clause is optional for all RANKING functions.

ROW_NUMBER (): Used for give a unique row number to each record based on the specified column value, clause.  

RANK (): Its also use to give rank for each row but it have two records with same value then the RANK() function will assign the same rank to both records and it will skipping the next rank.

DENSE_RANK ():The DENSE_RANK()  same as RANK() but it does not skip any rank, means if 2 same value come then it will assign same rank but it will not skip in next rank .

NTILE (): NTILE divides rows into equal sized buckets. Like we have 10 rows and NTILE (5) then it will divide 10 into 5 buckets with 2 rows each. When use NTILE (6), then it will divided 6 buckets with 2 rows in starting 4 buckets and 1-1 row in last 2 bucket.

ranking functions

When using PARTITION BY clause then it makes groups ,in below example I have created partition on datestamp column and VoucherTypeID  column means it will making group on datestamp column and give ranking on VoucherTypeID like

ranking with partition

3. Value Window Functions

Lag function also similar to ranking functions but Lag helps to access values from previous records, and Lead function helps to access values from next records in the data set.

LAG (): Lag function retrieving values from previous without using self-join.

LEAD (): Lead function retrieving values from next without using self-join.

Syntax:

LAG / LEAD ([column name],[offset],[default value]) OVER([[PARTITION BY]  column name]| ORDER BY column name)

Offset:  Optional parameter, its default values is 1. It is decide how many rows travel from the current row in a result set.

Default: Optional parameter, its default values is null, if the offset goes out of the bounds from the result set then it will return default value PARTITION BY: This one also Optional parameter in OVER () clause and it is work same as Ranking and aggregate. 

LAG / LEAD
LAG / LEAD with over

-1 showing where was null value in above screen short and 2 row values jumping because there offset 2 set and default value is -1 assign.

LAG / LEAD with over function and jumping

There we have use PARTITION clause so there 2 groups created according to datestamp column

FIRST_VALUE (): returns the first value from result set.

FIRST_VALUE () ,LAST_VALUE (): returns the Last value from result set.

FIRST_VALUE and LAST_VALUE

See above screen short there first_value picked as minimum values from result set but last_values picked as id values because its checking max id as current rows value but we want to pick last id from this result set then need to s upper and lower bound like   

select id,datestamp,vouchertypeid, 
          first_value(id )over(order by id ) first_value, 
          last_value(id )over(order by id rows between unbounded preceding and unbounded following ) last_value
from voucher where datestamp =’20140418′

FIRST_VALUE and LAST_VALUE Example

When need to get first and last value according to group then we can use partition like

FIRST_VALUE and LAST_VALUE with partition

PERCENTILE_CONT():use for calculating a median but takes average of the two values closest to the middle.   

PERCENTILE_DISC(): use for calculating a median but will return a value that exists in the set. Ex.1

Ex.1

select vehicle_id , 
          percentile_cont(0.5) within group(order by vehicle_id)  over() as percentilecont,
    percentile_disc(0.5) within group(order by vehicle_id) over() as percentiledisc
from vehicles where company_id=1 and vehicle_id<22

percentile_cont

Formula 1: total_rows%2=1 then it will x= (total_rows/2)+1 row value print


Formula 2: total_rows%2=0 then row_value1=(total_rows/2), row_value2=(total_rows/2)+1, x=(row_value1+row_value2)/2) value print

 
For Ex1 have 10 rows so 10/2=0 so Formula 2 will apply rowval1=(10/2)=>5 and values of row 5 is 14 so rowval1=14 and rowval2=(10/2)=>5+1 means rownumber 6th values is 15 so (14+15)/2=14.5 so x=14.5 Is Percentile_Cont
=> Percentile_Cont value not exists in result set but 14 exists so 14 as Percentile Disc

Ex.2

percentile_dist

There formula1 will apply for percentilecount because total rows are 11 so 11%2=1 so 11/2=5+1 will select

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *