
Spark SQL PIVOT Clause


In this blog we will see how to use the PIVOT clause in Spark SQL with an example.

Pivot Clause:

The PIVOT clause rotates rows into columns. It computes aggregated values based on specific column values, and those values become multiple columns in the SELECT output. The PIVOT clause can be specified after the table name or subquery.

Syntax:

  PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , … ] FOR column_list IN ( expression_list ) )

Parameters:

aggregate_expression:

Specifies an aggregate expression (SUM(a), COUNT(DISTINCT b), etc.).

aggregate_expression_alias:

Specifies an alias for the aggregate expression.

column_list:

Contains columns from the FROM clause that we want to replace with new columns. We can use brackets to surround the columns, such as (c1, c2).

expression_list:

Specifies the new columns, whose names are matched against the values in column_list as the aggregating condition. We can also add aliases for them.
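Putting the parameters together, a pivot over a hypothetical sales table (with columns product, month, and sales, mirroring the data used in the steps below) might look like this sketch:

```sql
SELECT *
FROM sales
PIVOT (
    SUM(sales) AS total                          -- aggregate_expression with alias
    FOR month                                    -- column_list
    IN ('Jan' AS Jan, 'Feb' AS Feb, 'Mar' AS Mar)  -- expression_list with aliases
)
```

Any columns not mentioned in the PIVOT clause (here, product) act as implicit grouping columns in the result.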

1. First, we read and view the CSV data:

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("file:///home/skiganesh/codes/sources/telecomsales/telecomsale.csv"))
df.show()

2. After reading the CSV, create and view a temp view:

df.createOrReplaceTempView("sales")
spark.sql("SELECT * FROM sales").show()

3. Finally, we aggregate the required columns and view the pivot result:

spark.sql("SELECT * FROM sales PIVOT (SUM(sales) FOR month IN ('Jan' AS Jan, 'Feb' AS Feb, 'Mar' AS Mar))").show()

Reference:

https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-pivot.html


Copyright © Claypot Technologies 2021