Skip to main content Microsoft Intelligent Data Platform Azure Arc Azure databases Power BI SQL Server 2025 SQL Server BI SQL Server 2022 SQL Server 2019 SQL Server 2017 SQL Server 2016 SQL Server 2005 - 2014 Downloads Community SQL End of Support Data Security - SQL Server Encryption SQL Server blog SQL Server and Azure SQL workshops Browse Microsoft Solutions Hub SQL Server Tech Community Azure Databases Tech Community Azure Synapse Analytics Tech Community Developer Find a partner Become a partner Partner resources Try SQL Server 2025 Microsoft Security Azure Dynamics 365 Microsoft 365 Microsoft Teams Windows 365 Microsoft AI Azure Space Mixed reality Microsoft HoloLens Microsoft Viva Quantum computing Sustainability Education Automotive Financial services Government Healthcare Manufacturing Retail Find a partner Become a partner Partner Network Microsoft Marketplace Marketplace Rewards Software development companies Blog Microsoft Advertising Developer Center Documentation Events Licensing Microsoft Learn Microsoft Research View Sitemap

Visual Studio Code: Develop PySpark jobs for SQL Server 2019 Big Data Clusters

Today we’re announcing the support in Visual Studio Code for SQL Server 2019 Big Data Clusters PySpark development and query submission. It provides complementary capabilities to Azure Data Studio for data engineers to author and productionize PySpark jobs after data scientist’s data explore and experimentation. The Visual Studio Code Apache Spark and Hive extension enables you to enjoy cross platform and enhanced light weight Python editing capabilities. It covers scenarios around Python authoring, debugging, Jupyter Notebook integration, and notebook like interactive query.

With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, language service, and so on. You can run current line, run selected lines of code, or run all for your PY file. You can import and export a .ipynb notebook and perform a notebook like query including Run Cell, Run Above, or Run Below. You can also enjoy a notebook like interactive experience that includes your source code and markdown comments along with the running results and output. You can remove the unneeded sections, enter comments, or type additional code in the interactive results window. Moreover, you can visualize your results in a graphic format through a matplotlib like Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters empowers you to quickly submit a PySpark batch job to the big data cluster and monitor job progress.

Highlights of key features

  • You can link to SQL Server: The toolkit enables you to connect and submit PySpark jobs to SQL Server 2019 Big Data Clusters.
  • Python editing: Develop PySpark applications with native Python authoring support (e.g. IntelliSense, auto format, error checking, etc.).
  • Jupyter Notebook integration: Import and export .ipynb files.
  • PySpark interactive: Run selected lines of code, or notebook like cell PySpark execution, and interactive visualizations.
  • PySpark batch: Submit PySpark applications to SQL Server 2019 Big Data Clusters.
  • PySpark monitoring: Integrate with the Apache Spark history server to view job history, debug, and diagnose Spark jobs.

How to install or update

First, install Visual Studio Code and download Mono 4.2.x for Linux or Mac. Then get the latest Apache Spark and Hive tools by going to the Visual Studio Code extension repository or the Visual Studio Code Marketplace and searching for Spark.

For more information about Apache Spark  and Hive tools for Visual Studio Code, please use the following resources:

If you have questions, feedback, comments, or bug reports, please use the comments below or send a note to hdivstool@microsoft.com.

English (United States)
Your Privacy Choices Opt-Out Icon Your Privacy Choices
Consumer Health Privacy Sitemap Contact Microsoft Privacy Manage cookies Terms of use Trademarks Safety & eco Recycling About our ads