David Vrba

Introduction to Spark / PySpark

  • Apache Spark is a distributed computing system that can be used for many purposes. In this lecture we will focus on one specific use: interactive data analysis. We will look at the DataFrame API, which is the most widely used API in the current version of Spark, and get acquainted with its basic concepts.
  • This API is supported in several languages (Python, Scala, Java, SQL, R). We will show the differences between them and briefly touch on how the API works under the hood. In the second part, we will show specific examples of how to use this API in Python for interactive data analysis with Spark running on a laptop.
  • Agenda
    • Dataframe API
    • Scala vs Python vs SQL
    • How DataFrames work under the hood
    • Spark and laptop environment
    • Interactive data analysis

About DevOps Artisan by Bittnet
We believe you can learn anything faster with hands-on practice and expert guidance. Since 2007, we have trained thousands of tech enthusiasts and professionals, helping companies leverage the true potential of their teams and enabling engineers to advance their careers through solid, specialised know-how.

We have worked with more than 20,000 IT&C professionals, developing DevOps seminars that range from Associate to Expert level and cover a wide range of skills, from network programming and operating systems to Machine Learning and Artificial Intelligence.


About David Vrba

David Vrba, Ph.D., is a Machine Learning Engineer at Emplifi. He works with predictive analytics and the processing of both small and large volumes of data, optimizes Spark jobs, and conducts training sessions focused on Spark. He works in a team where Spark integrates with other technologies and the world of data science meets the world of data engineering.