- SQL `CACHE TABLE`, `DataFrame.cache()`, `spark.catalog.cacheTable()`
These persist data in both on-heap memory and on local SSDs using the MEMORY_AND_DISK storage level. You can inspect where each RDD partition is stored (in memory or on disk) from the Storage tab of the Spark UI. The in-memory portion is stored in a columnar format optimized for fast columnar aggregations and is automatically compressed to reduce memory footprint and GC pressure. Treat this cache as scratch/temporary space: the data will not survive a worker failure.
- `dbutils.fs.cacheTable()` and Table view -> Cache Table
These persist only to the local SSDs mounted at /local_disk. This cache survives cluster restarts.