Run Spark SQL On File Directly

Spark's built-in data sources include json, parquet, jdbc, orc, libsvm, csv, and text.

Instead of using the read API to load a file into a DataFrame and then querying it, you can query the file directly with SQL by putting the format name and the backtick-quoted path in the FROM clause.
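To make the difference concrete, here is a minimal sketch of both routes side by side. The function names `via_read_api` and `via_direct_sql` and the view name `emp` are illustrative choices, not part of Spark; `spark` is assumed to be an active SparkSession. Both routes use the CSV reader's default options, so a header row is treated as data and the columns come back named `_c0`, `_c1`, and so on.

```python
def via_read_api(spark, path, view_name="emp"):
    """Load the CSV with the read API, register a temp view, then query the view."""
    df = spark.read.csv(path)              # default options: no header, string columns
    df.createOrReplaceTempView(view_name)  # make the DataFrame visible to SQL
    return spark.sql(f"SELECT * FROM {view_name}")


def via_direct_sql(spark, path):
    """Query the same file in one step with the format.`path` syntax."""
    return spark.sql(f"SELECT * FROM csv.`{path}`")


# Usage, assuming PySpark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("direct-sql").getOrCreate()
#   via_direct_sql(spark, "file:///home/skiganesh/codes/sources/emp.csv").show()
```

The direct form saves the load and view-registration steps, which is handy for a quick look at a file. When you need reader options such as a header row or an explicit schema, the read API route is the more flexible one.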

To run SQL on a CSV file:

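from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in the pyspark shell
df = spark.sql("select * from csv.`file:///home/skiganesh/codes/sources/emp.csv`")
df.show()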

To run SQL on a JSON file:

df2 = spark.sql("select * from json.`file:///home/skiganesh/codes/sources/people.json`")
df2.show()

To run SQL on an ORC file:

df3 = spark.sql("select * from orc.`/user/hive/warehouse/salespartition_dynamic/state=Texas/000000_0`")
df3.show(5)
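The three queries above differ only in the format name and the path, so the pattern can be wrapped in a small helper. This is a sketch under assumptions: `file_query`, `FILE_FORMATS`, and the Parquet path `/data/users.parquet` are hypothetical names used for illustration, and the helper deliberately covers only the path-based formats (jdbc, for example, needs connection options rather than a file path).

```python
# File-based formats that can follow the format.`path` pattern shown above.
FILE_FORMATS = {"csv", "json", "orc", "parquet", "text"}


def file_query(fmt, path, columns="*"):
    """Build a Spark SQL statement that reads a file directly."""
    if fmt not in FILE_FORMATS:
        raise ValueError(f"unsupported file format: {fmt!r}")
    # A backtick inside a backtick-quoted identifier is escaped by doubling it.
    escaped = path.replace("`", "``")
    return f"SELECT {columns} FROM {fmt}.`{escaped}`"


# Usage, assuming `spark` is an active SparkSession and the file exists:
#   df4 = spark.sql(file_query("parquet", "/data/users.parquet"))
#   df4.show(5)
```

Building the statement as a plain string keeps the helper independent of Spark itself, so it can be checked without a running session.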

References:

https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html

Copyright © Claypot Technologies 2021