- Datasets are composed of typed objects, which means that both transformation syntax errors (like a typo in a method name) and analysis errors (like referencing a field that does not exist, or using it with the wrong type) can be caught at compile time.
- DataFrames are composed of untyped Row objects, which means that only syntax errors can be caught at compile time; analysis errors (like a misspelled column name) surface only at runtime.
- A Spark SQL query is a plain string, which means that both syntax errors and analysis errors are caught only at runtime.
| Error    | SQL      | DataFrames   | Datasets     |
|----------|----------|--------------|--------------|
| Syntax   | Runtime  | Compile time | Compile time |
| Analysis | Runtime  | Runtime      | Compile time |
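The table above can be sketched in Scala. This is a minimal illustration, assuming a local SparkSession and a hypothetical `Person` case class; the failing variants are left commented out so the snippet compiles:

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object TypeSafetyDemo extends App {
  val spark = SparkSession.builder.appName("demo").master("local[*]").getOrCreate()
  import spark.implicits._

  // Dataset[Person]: both method names and field accesses are compiler-checked.
  val ds = Seq(Person("Ana", 34), Person("Bo", 29)).toDS()
  // ds.fillter(_.age > 30)   // syntax error (typo in method name): compile time
  // ds.filter(_.agee > 30)   // analysis error (no such field): compile time
  ds.filter(_.age > 30).show()

  // DataFrame: method names are checked, but a column name is just a string.
  val df = ds.toDF()
  // df.fillter($"age" > 30)  // syntax error: still caught at compile time
  // df.filter($"agee" > 30)  // analysis error: compiles, fails only at runtime

  // Spark SQL: the whole query is a string, so nothing is checked until runtime.
  ds.createOrReplaceTempView("people")
  // spark.sql("SELECTT * FROM people")  // syntax error: fails only at runtime

  spark.stop()
}
```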
Also, note that Spark provides encoders for the predefined basic data types such as Int and String. To build a Dataset of a custom typed object that these do not cover, you have to supply a custom encoder.
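As a sketch of how an encoder is supplied (assuming a local SparkSession and a hypothetical `Sensor` case class): for case classes, importing `spark.implicits._` derives the encoder automatically, or a product encoder can be passed explicitly via `Encoders.product`:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// A custom typed object not covered by the basic Int/String encoders.
case class Sensor(id: String, reading: Double)

object EncoderDemo extends App {
  val spark = SparkSession.builder.appName("encoders").master("local[*]").getOrCreate()

  // Option 1: let spark.implicits._ derive the case-class encoder.
  import spark.implicits._
  val ds1 = Seq(Sensor("s1", 0.5)).toDS()

  // Option 2: build the encoder explicitly and pass it to createDataset.
  val sensorEnc: Encoder[Sensor] = Encoders.product[Sensor]
  val ds2 = spark.createDataset(Seq(Sensor("s2", 1.2)))(sensorEnc)

  ds1.union(ds2).show()
  spark.stop()
}
```

For types that are not case classes or standard primitives, Spark also offers fallbacks such as `Encoders.kryo[T]`, at the cost of losing the columnar optimizations of the typed encoders.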