Databricks Markdown



Structuring Your Databricks Notebooks with Markdown, Titles, Widgets and Comments

Posted on November 28, 2019 by mrpaulandrew. Just a short post following a recent question I got from my delivery team.


Databricks supports Scala, SQL, Python and R. You can use multiple languages within a notebook, as well as shell, markdown, and file system commands. This Markdown cheat sheet provides a quick overview of all the Markdown syntax elements. It can't cover every edge case, so if you need more information about any of these elements, refer to the reference guides for basic syntax and extended syntax. For tabular output, koalas also provides databricks.koalas.DataFrame.to_markdown(buf=None, mode=None) → str, which prints a Series or DataFrame in Markdown-friendly format.

Just for this example, let's go back to using Scala. To read a table and display its contents, we can type out the following Scala code.
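A minimal sketch, assuming a table named "diamonds" already exists in the metastore:

```scala
// Read the table through the predefined SparkSession and render it
// with the Databricks notebook's built-in display() helper.
val df = spark.table("diamonds") // "diamonds" is a placeholder table name
display(df)
```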

This is part two in the Writing in Markdown series. If you prefer, read John Gruber's original guide. I will not be able to add anything new.

Lists come in two flavors in Markdown: unordered lists and ordered lists. The first item in a list must be preceded by an empty line. A list item can contain other Markdown formatting; however, the list bullet or item number cannot. It must be plain text.

An unordered list is a simple bullet list. There's not much to an unordered list. It must be separated from a text block by one blank line, and each list item must be preceded by a hyphen, plus sign, or asterisk. I use hyphen characters ('-') exclusively for bullet lists. I have run into problems when using a mix of hyphens and asterisks with lists, particularly when copying markdown into other applications.

To create nested lists, indent by one tab (or four spaces if you're antediluvian). Markdown processors will automatically vary the bullet character between list levels. That's the main reason it doesn't matter much whether I use an asterisk or dash for bullets.
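For example, this Markdown (hyphens throughout, as is my habit):

```markdown
- Talk to Luke about his father
    - Skip the part where I leave him for dead
    - Don't mention the youngling 'thing'
- Dinner with Yoda
    - Bring DEET
    - Bring Pepto
    - Dessert?
        - Wookie Pie
- Stop by to see Anakin on Death Star
- Submit restraining order against JarJar
```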

Is converted to this:

  • Talk to Luke about his father
    • Skip the part where I leave him for dead
    • Don't mention the youngling 'thing'
  • Dinner with Yoda
    • Bring DEET
    • Bring Pepto
    • Dessert?
      • Wookie Pie
  • Stop by to see Anakin on Death Star
  • Submit restraining order against JarJar

But of course sometimes the order of items is crucial. That's where ordered lists come in. The sequence of an ordered list is defined by the sequence of the list items, not by the number prefix of each item.

For example, even though the first line is prefixed as item #3 in the list, the Markdown is converted to display it as item #1. The actual number is irrelevant; it is only there to indicate that the line should be considered an ordered list item. Also note that unordered and ordered items can be commingled in the same list, but not at the same indentation level. An ordered list can contain a nested ordered list. But unordered list items are converted to numbered list items if they are at the same indentation level as another numbered item.
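Markdown along these lines (the scrambled numbers are illustrative; the point is that the first item is prefixed with 3):

```markdown
3. Stop by to see Anakin on Death Star
    1. Get a ride
    2. Wash robe
1. Clean blood (and hand) off Anakin's old light saber
2. Talk to Luke about his father
    - Skip the part where I leave him for dead
    - Don't mention the youngling 'thing'
5. Dinner with Yoda
    - Bring DEET
    - Bring Pepto
    - Dessert?
        - Wookie Pie
4. Submit restraining order against JarJar
```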

Is converted to the list below. The originally drafted sequence would be lost in converting the Markdown to HTML or PDF.

  1. Stop by to see Anakin on Death Star
    1. Get a ride
    2. Wash robe
  2. Clean blood (and hand) off Anakin's old light saber
  3. Talk to Luke about his father
    • Skip the part where I leave him for dead
    • Don't mention the youngling 'thing'
  4. Dinner with Yoda
    • Bring DEET
    • Bring Pepto
    • Dessert?
      • Wookie Pie
  5. Submit restraining order against JarJar

As with all Markdown, these prefixes are just tags that mark the start of a ul or ol HTML block. As long as the first item in a list is numbered, all subsequent list items will be interpreted as an ordered list.
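For example, this Markdown, where every item is deliberately prefixed with 1.:

```markdown
Pack Suitcase

1. Lightsaber
1. Leisure brown robe
1. Formal brown robe
1. Night time brown robe
1. Dress Sandals
```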

Generates this formatted list:

Pack Suitcase

  1. Lightsaber
  2. Leisure brown robe
  3. Formal brown robe
  4. Night time brown robe
  5. Dress Sandals

Unfortunately, Markdown never implemented an option to start an ordered (numbered) list at an arbitrary number. The only option I have found is to create the list manually by escaping the Markdown using a <pre> or <code> block. These blocks tell the Markdown processor to skip interpretation of the list altogether.

For example, to generate a list that starts at an arbitrary number:
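A sketch, with hypothetical items continuing the suitcase list at item 6:

```html
<!-- hypothetical items; everything inside <pre> is passed through untouched -->
<pre>
6. Spare lightsaber
7. Travel cloak
</pre>
```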

Alternatively, I can skip the <pre> tag by indenting each list item with one tab (or four spaces). This just makes it a code block which also escapes the Markdown processor.
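The same hypothetical items, indented instead:

```markdown
    6. Spare lightsaber
    7. Travel cloak
```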

There are many more details about extending lists to include multiple paragraphs as well as code blocks. I highly recommend reading from the original source as handed down by Gruber.


You can manage notebooks using the UI, the CLI, and by invoking the Workspace API. This article focuses on performing notebook tasks using the UI. For the other methods, see Databricks CLI and Workspace API.

Create a notebook

  1. Click the Workspace button or the Home button in the sidebar. Do one of the following:
    • Next to any folder, click the dropdown menu icon on the right side of the text and select Create > Notebook.

    • In the Workspace or a user folder, click the dropdown menu icon and select Create > Notebook.

  2. In the Create Notebook dialog, enter a name and select the notebook’s default language.
  3. If there are running clusters, the Cluster drop-down displays. Select the cluster you want to attach the notebook to.
  4. Click Create.

Open a notebook

In your workspace, click a notebook. The notebook path displays when you hover over the notebook title.

Delete a notebook

See Folders and Workspace object operations for information about how to access the workspace menu and delete notebooks or other items in the Workspace.

Copy notebook path

To copy a notebook file path without opening the notebook, right-click the notebook name or click the dropdown menu icon to the right of the notebook name and select Copy File Path.

Rename a notebook

To change the title of an open notebook, click the title and edit inline or click File > Rename.

Control access to a notebook

If your Azure Databricks account has the Azure Databricks Premium Plan, you can use Workspace access control to control who has access to a notebook.

Notebook external formats

Azure Databricks supports several notebook external formats:

  • Source file: A file containing only source code statements with the extension .scala, .py, .sql, or .r.
  • HTML: An Azure Databricks notebook with the extension .html.
  • DBC archive: A Databricks archive.
  • IPython notebook: A Jupyter notebook with the extension .ipynb.
  • RMarkdown: An R Markdown document with the extension .Rmd.


Import a notebook

You can import an external notebook from a URL or a file.

  1. Click the Workspace button or the Home button in the sidebar. Do one of the following:

    • Next to any folder, click the dropdown menu icon on the right side of the text and select Import.

    • In the Workspace or a user folder, click the dropdown menu icon and select Import.

  2. Specify the URL or browse to a file containing a supported external format.

  3. Click Import.

Export a notebook

In the notebook toolbar, select File > Export and a format.


Note

When you export a notebook as HTML, IPython notebook, or archive (DBC), and you have not cleared the results, the results of running the notebook are included.

Notebooks and clusters

Before you can do any work in a notebook, you must first attach the notebook to a cluster. This section describes how to attach and detach notebooks to and from clusters and what happens behind the scenes when you perform these actions.


Execution contexts

When you attach a notebook to a cluster, Azure Databricks creates an execution context. An execution context contains the state for a REPL environment for each supported programming language: Python, R, Scala, and SQL. When you run a cell in a notebook, the command is dispatched to the appropriate language REPL environment and run.

You can also use the REST 1.2 API to create an execution context and send a command to run in the execution context. Similarly, the command is dispatched to the language REPL environment and run.
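A hedged sketch of that flow against the REST 1.2 endpoints; the host, token, and cluster ID are placeholders:

```python
import requests

host = "https://<databricks-instance>"        # placeholder workspace URL
headers = {"Authorization": "Bearer <token>"} # placeholder personal access token
cluster_id = "<cluster-id>"                   # placeholder cluster ID

# Create an execution context for the Python REPL on the cluster.
ctx = requests.post(
    f"{host}/api/1.2/contexts/create",
    headers=headers,
    json={"language": "python", "clusterId": cluster_id},
).json()

# Dispatch a command to that context; it runs in the cluster's Python REPL.
requests.post(
    f"{host}/api/1.2/commands/execute",
    headers=headers,
    json={
        "language": "python",
        "clusterId": cluster_id,
        "contextId": ctx["id"],
        "command": "print(spark.version)",
    },
)
```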

A cluster has a maximum number of execution contexts (145). Once the number of execution contexts has reached this threshold, you cannot attach a notebook to the cluster or create a new execution context.

Idle execution contexts

An execution context is considered idle when the last completed execution occurred past a set idle threshold. Last completed execution is the last time the notebook completed execution of commands. The idle threshold is the amount of time that must pass between the last completed execution and any attempt to automatically detach the notebook. The default idle threshold is 24 hours.

When a cluster has reached the maximum context limit, Azure Databricks removes (evicts) idle execution contexts (starting with the least recently used) as needed. Even when a context is removed, the notebook using the context is still attached to the cluster and appears in the cluster’s notebook list. Streaming notebooks are considered actively running, and their context is never evicted until their execution has been stopped. If an idle context is evicted, the UI displays a message indicating that the notebook using the context was detached due to being idle.

If you attempt to attach a notebook to a cluster that has the maximum number of execution contexts and there are no idle contexts (or if auto-eviction is disabled), the UI displays a message saying that the maximum execution contexts threshold has been reached and the notebook will remain in the detached state.

If you fork a process, an idle execution context is still considered idle once execution of the request that forked the process returns. Forking separate processes is not recommended with Spark.

Configure context auto-eviction

You can configure context auto-eviction by setting the Spark property spark.databricks.chauffeur.enableIdleContextTracking.

  • In Databricks Runtime 5.0 and above, auto-eviction is enabled by default. You disable auto-eviction for a cluster by setting spark.databricks.chauffeur.enableIdleContextTracking to false.
  • In Databricks Runtime 4.3, auto-eviction is disabled by default. You enable auto-eviction for a cluster by setting spark.databricks.chauffeur.enableIdleContextTracking to true.
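Either way, the property goes in the cluster's Spark config, one key-value pair per line:

```
spark.databricks.chauffeur.enableIdleContextTracking false
```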

Attach a notebook to a cluster

To attach a notebook to a cluster:

  1. In the notebook toolbar, click Detached.
  2. From the drop-down, select a cluster.

Important

An attached notebook has the following Apache Spark variables defined.

Class                       Variable Name
SparkContext                sc
SQLContext/HiveContext      sqlContext
SparkSession (Spark 2.x)    spark

Do not create a SparkSession, SparkContext, or SQLContext. Doing so will lead to inconsistent behavior.
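A quick sketch using the predefined handles rather than constructing your own:

```python
# All three variables are created for you when the notebook attaches.
print(spark.version)              # SparkSession is available as `spark`
print(sc.defaultParallelism)      # SparkContext is available as `sc`
sqlContext.sql("SELECT 1").show() # SQLContext/HiveContext as `sqlContext`
```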

Determine Spark and Databricks Runtime version

To determine the Spark version of the cluster your notebook is attached to, run:
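In either Scala or Python, the predefined session exposes it directly:

```
spark.version
```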

To determine the Databricks Runtime version of the cluster your notebook is attached to, run:

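The notebook context carries a sparkVersion tag (referenced in the note below). In Scala:

```scala
// Returns the Databricks Runtime version string of the attached cluster.
dbutils.notebook.getContext.tags("sparkVersion")
```

In Python, the same value is exposed as a Spark conf:

```python
# Returns the Databricks Runtime version string of the attached cluster.
spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
```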

Note

Both this sparkVersion tag and the spark_version property required by the endpoints in the Clusters API and Jobs API refer to the Databricks Runtime version, not the Spark version.

Detach a notebook from a cluster

  1. In the notebook toolbar, click Attached.

  2. Select Detach.

You can also detach notebooks from a cluster using the Notebooks tab on the cluster details page.


When you detach a notebook from a cluster, the execution context is removed and all computed variable values are cleared from the notebook.

Tip


Azure Databricks recommends that you detach unused notebooks from a cluster. This frees up memory space on the driver.

View all notebooks attached to a cluster

The Notebooks tab on the cluster details page displays all of the notebooks that are attached to a cluster. The tab also displays the status of each attached notebook, along with the last time a command was run from the notebook.

Schedule a notebook

To schedule a notebook job to run periodically:

  1. In the notebook toolbar, click the schedule button at the top right.
  2. Click + New.
  3. Choose the schedule.
  4. Click OK.

Distribute notebooks

To allow you to easily distribute Azure Databricks notebooks, Azure Databricks supports the Databricks archive, which is a package that can contain a folder of notebooks or a single notebook. A Databricks archive is a JAR file with extra metadata and has the extension .dbc. The notebooks contained in the archive are in an Azure Databricks internal format.

Import an archive


  1. Click the dropdown menu icon to the right of a folder or notebook and select Import.
  2. Choose File or URL.
  3. Go to or drop a Databricks archive in the dropzone.
  4. Click Import. The archive is imported into Azure Databricks. If the archive contains a folder, Azure Databricks recreates that folder.


Export an archive

Click the dropdown menu icon to the right of a folder or notebook and select Export > DBC Archive. Azure Databricks downloads a file named <[folder|notebook]-name>.dbc.