Databricks is an industry-leading, cloud-based data engineering tool for processing and transforming massive quantities of data and for exploring that data through machine learning models. Offered on Azure as Azure Databricks, it is a recent addition to Microsoft's big data tooling.
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Azure Databricks Training Content – 35 hours
Introduction to Microsoft Azure
Free trial features
Which features are paid and which are available for free
Setting up your account for Azure
Which Azure features are used for Databricks (storage account)
Introduction to Databricks
Components of Databricks (Workspaces, Clusters, Workers)
Apache Spark
Spark APIs
How to use Databricks to accomplish data analytics
Using Spark APIs (SQL, Python and R) on Databricks
Creating notebooks in Databricks
Scheduling jobs in Databricks
Creating / updating tables in Databricks
Using flat files
Existing tables in Databricks
Importing data (from flat files, other databases) into the Databricks environment
Exporting data out of the Databricks environment (to flat files, other databases)
Connecting Databricks to tools like Tableau, Alteryx and Power BI
Connecting Databricks to other databases like Teradata, SQL Server
Best practices while using Databricks
Alternative ways / tools for accessing Databricks (other than the URL based approach)
Advanced analytics on Databricks / Spark