Run Spark SQL on a File Directly

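Spark's built-in data sources are json, parquet, jdbc, orc, libsvm, csv, and text. Instead of using the read API to load a file into a DataFrame and then querying it, you can query the file directly with SQL. In the FROM clause, write the format name, a dot, and the file path quoted in backticks. To run SQL on a CSV file, use SELECT * FROM csv.`/path/to/file.csv` (the path here is a placeholder). To run SQL on a JSON file, use json.`...` in the same position, and to run SQL on an ORC file, use orc.`...`.

References: https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html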
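The original snippets for the CSV, JSON, and ORC cases are not reproduced here, so the following is only a minimal PySpark sketch of the same idea. The helper `file_query`, the `EXAMPLES` mapping, the `/tmp/people.*` paths, and the app name are all hypothetical, chosen for illustration. `SparkSession.sql` and the format.`path` syntax come from the Spark documentation linked above.

```python
def file_query(fmt: str, path: str, columns: str = "*") -> str:
    """Build a SQL statement that reads `path` directly as data source `fmt`.

    Hypothetical helper: Spark itself only needs the final SQL string.
    """
    # Spark expects <format>.`<path>` in the FROM clause; backticks quote the path.
    return f"SELECT {columns} FROM {fmt}.`{path}`"


# Hypothetical file locations, one per format discussed in the post.
EXAMPLES = {
    "csv": "/tmp/people.csv",
    "json": "/tmp/people.json",
    "orc": "/tmp/people.orc",
}


def run_examples() -> None:
    """Run each query against a local Spark session (assumes pyspark is installed)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-on-files").getOrCreate()
    try:
        for fmt, path in EXAMPLES.items():
            # spark.sql returns a DataFrame; no explicit spark.read call is needed.
            spark.sql(file_query(fmt, path)).show()
    finally:
        spark.stop()
```

Calling `run_examples()` in an environment where pyspark is installed and the three files exist should print each file's contents. One caveat, to the best of my understanding: a file queried this way is read with the data source's default options, so a CSV header row is likely to be treated as data and the columns named `_c0`, `_c1`, and so on. If you need options such as `header`, the read API is probably the safer route.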