Spark Dataframe Metadata

SIRIGIRI HARI KRISHNA
Published in Towards Dev
2 min read · Dec 26, 2022


A Spark DataFrame is structurally the same as a database table. However, Spark does not store the DataFrame's schema in a persistent metadata store. Instead, it keeps schema information in a runtime metadata catalog. The catalog plays the same role as a metadata store, but Spark creates it at runtime, and it lives only for the duration of the application.

There are two reasons why Spark keeps schema information in a runtime catalog rather than a persistent metadata store.

1. A Spark DataFrame is a runtime object – you create a DataFrame at runtime, and it stays in memory only until your program terminates. Once the program ends, the DataFrame is gone; it is purely an in-memory object.

2. A Spark DataFrame supports schema-on-read – a DataFrame does not have a fixed, predefined schema stored in a metadata store. Instead, you supply the schema when loading the data: Spark reads the file, applies the schema at read time, creates the DataFrame with that schema, and loads the data into it.


Data Engineer passionate about Spark, Azure, and the Cloud. Simplifying data complexities on my Medium blog. Let's dive into the world of data together!