Spark's built-in data sources are json, parquet, jdbc, orc, libsvm, csv, and text.
Instead of using the read API to load a file into a DataFrame and then querying it, you can also query the file directly with SQL: the file format acts as the "table" prefix, and the file path goes inside backticks.
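As a sketch of the pattern, the query string has the shape `select * from <format>.`<path>``. The helper below (hypothetical, not part of Spark) just assembles that string; you would then pass it to `spark.sql(...)` in a running SparkSession:

```python
def direct_query(fmt, path):
    # Build a Spark SQL statement that queries a file directly:
    # the format (csv, json, orc, ...) is the "table" prefix and
    # the file path is wrapped in backticks.
    return f"select * from {fmt}.`{path}`"

query = direct_query("csv", "file:///home/skiganesh/codes/sources/emp.csv")
print(query)
# With a SparkSession named `spark`, you would then run:
#   spark.sql(query).show()
```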
To run SQL on CSV file:
df = spark.sql("select * from csv.`file:///home/skiganesh/codes/sources/emp.csv`")
df.show()
To run SQL on JSON file:
df2 = spark.sql("select * from json.`file:///home/skiganesh/codes/sources/people.json`")
df2.show()
To run SQL on ORC file:
df3 = spark.sql("select * from orc.`/user/hive/warehouse/salespartition_dynamic/state=Texas/000000_0`")
df3.show(5)