Databricks display() function equivalent or alternative in Jupyter

First recommendation: in Jupyter, don't use df.show(). Use df.limit(10).toPandas().head() instead, which renders as a formatted table and arguably looks even better than the Databricks display().

Second recommendation: use a Zeppelin notebook, where you can just call z.show(df.limit(10)).

Additionally, in Zeppelin:

Register your DataFrame as a SQL table with df.createOrReplaceTempView('tableName').
Insert a new paragraph beginning with %sql, then query the table to get Zeppelin's built-in table and chart display.

In recent IPython, you can just use display(df) if df is a pandas DataFrame, and it will just work. On older versions you might need to run from IPython.display import display first. A DataFrame is also displayed automatically when it is the result of the last expression in a cell. For example, this notebook. Of course, the representation depends on the library you use to build your DataFrame. If you are using PySpark and it does not define a nice representation by default, you'll need to teach IPython how to display the Spark DataFrame. For example, here is a project that teaches IPython how to display Spark Contexts and Spark Sessions.
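As a rough sketch of what "teaching IPython" could look like, the snippet below registers an HTML formatter for Spark DataFrames by name, so PySpark does not have to be imported up front. The helper names spark_df_html and register_spark_formatter are made up for illustration, and the module path 'pyspark.sql.dataframe' is an assumption that may need adjusting for your Spark version or for Spark Connect.

```python
def spark_df_html(df, n=10):
    """Render the first n rows of a Spark DataFrame as an HTML table.

    Hypothetical helper: only n rows are pulled to the driver via limit().
    """
    return df.limit(n).toPandas().to_html(index=False)


def register_spark_formatter():
    """Register spark_df_html with IPython's HTML formatter, if available.

    Returns True when the formatter was registered, False otherwise.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    ip = get_ipython()
    if ip is None:  # not running inside an IPython/Jupyter session
        return False
    html_formatter = ip.display_formatter.formatters["text/html"]
    # Deferred registration by module and class name (assumed path).
    html_formatter.for_type_by_name(
        "pyspark.sql.dataframe", "DataFrame", spark_df_html
    )
    return True


register_spark_formatter()
```

After this runs in a notebook, a cell whose last expression is a Spark DataFrame should render as a table of its first 10 rows. Note that this triggers a Spark job every time such a cell is displayed.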

When you use Jupyter, use myDF.limit(10).toPandas().head() instead of df.show(). When you are working with many columns, pandas truncates the view, so set the pandas maximum-columns display option to None to show all of them.

# Alternative to the Databricks display() function.
import pandas as pd

# Show every column; the full option name avoids an ambiguous-key error in newer pandas.
pd.set_option('display.max_columns', None)

myDF.limit(10).toPandas().head()
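If you would rather not change the pandas option globally, one possible variation is to lift the column limit only while rendering. This is a minimal sketch, not part of the original answer: preview and render_all_columns are hypothetical helper names, and df is assumed to be a Spark DataFrame.

```python
import pandas as pd


def preview(df, n=10):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame."""
    # limit() is applied on the Spark side, so only n rows reach the driver.
    return df.limit(n).toPandas()


def render_all_columns(pdf):
    """Return the notebook HTML for a pandas DataFrame with no column truncation."""
    # option_context restores the previous setting when the block exits.
    with pd.option_context("display.max_columns", None):
        return pdf._repr_html_()
```

In a notebook, this could be used as HTML(render_all_columns(preview(myDF))), with HTML imported from IPython.display.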
