Learning PySpark

Building and deploying data-intensive applications at scale using Python and Apache Spark
Rating: 4.20 (189 reviews)
Platform: Udemy
Language: English
Category: Databases
Students: 649
Content: 2.5 hours
Last update: Apr 2018
Regular price: $19.99

Why take this course?

👩‍💻 Course Title: Learning PySpark - Building and Deploying Data-Intensive Applications at Scale using Python and Apache Spark


Course Headline:

Unlock the Power of Big Data with PySpark!


Introduction to Apache Spark and its Ecosystem

Welcome to our comprehensive guide on PySpark, where we will delve into the capabilities of Apache Spark, a powerful open-source distributed engine for data processing. In this course, you'll get a solid grasp of Spark's architecture and how to set up a Python environment tailored for Spark. 🛠️✨

Key Learning Points:

  • Apache Spark Overview: We start by exploring the core components of Apache Spark and its ecosystem, understanding how it simplifies big data processing and analytics.

  • Python in Spark: Discover the synergy between Python's simplicity and Spark's scalability. Learn how to harness Python within the Spark ecosystem to build robust, data-intensive applications.

  • Data Collection Techniques: Master various methods for collecting data efficiently, and understand the differences between them for better data processing decisions.

  • RDDs vs DataFrames: Get hands-on experience with Resilient Distributed Datasets (RDDs) and DataFrames. Learn their use cases and how to choose the right one for your application.

  • Lazy Execution in Spark: Understand lazy execution, one of Spark's most powerful features. Spark records transformations and defers the actual computation until an action is called, which lets it plan the work more efficiently for large datasets.

  • Transformations and Actions: Dive into transformations like map, filter, reduceByKey, and actions like collect and count. Learn how to apply these effectively in your PySpark scripts.

  • DataFrame Operations: Learn how to read data from various sources, including files and HDFS, specify schemas either by reflection or programmatically, and perform complex operations on DataFrames.

  • SQL with DataFrames: Utilize Spark SQL to execute SQL queries on DataFrames, making your data processing tasks more intuitive and straightforward.
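
To make the lazy execution and transformation/action points above more concrete, here is a minimal plain-Python sketch. It is a toy model and not PySpark itself: the LazyDataset class and its methods are hypothetical names chosen for illustration, and they only mimic the way Spark records transformations (map, filter) and runs them when an action (collect, count) is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable input, e.g. a list
        self._steps = tuple(steps)   # pending transformations, in order

    # Transformations: only record the step and return a new dataset.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        # Chain the recorded steps over the source using the built-in
        # map/filter iterators; elements are processed one at a time.
        stream = iter(self._source)
        for kind, fn in self._steps:
            stream = map(fn, stream) if kind == "map" else filter(fn, stream)
        return stream

    # Actions: these are the only methods that actually touch the data.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


# Building the pipeline runs nothing: both calls are transformations.
pipeline = LazyDataset([1, 2, 3, 4, 5]).map(square).filter(lambda v: v % 2 == 1)
calls_before_action = list(calls)

# The action triggers the recorded steps.
result = pipeline.collect()
calls_after_collect = list(calls)

# A second action replays the whole pipeline in this toy model.
odd_squares = pipeline.count()
calls_after_count = list(calls)
```

In this sketch, calls stays empty until collect() is invoked, and count() runs the pipeline again from the source. Real Spark behaves similarly when a dataset is not cached, although its scheduler and optimizer are far more involved than this model suggests.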

Practical Approach:

This course is designed to provide you with a practical approach to building PySpark applications. You'll engage with real-world datasets and scenarios that reflect the challenges of deploying large-scale applications.

Who Should Take This Course?

This course is ideal for data scientists, software engineers, data analysts, or anyone looking to leverage the power of Python and Apache Spark to handle big data at scale. No prior experience with Spark or PySpark is required, but familiarity with Python programming is assumed.


Instructor Profile: Tomasz Drabas

Tomasz Drabas, a seasoned Data Scientist with over 12 years of international experience in data analytics and data science, leads this course. His experience spans several sectors: advanced technology, telecommunications, finance, and consulting. 🧠💻

  • Professional Background: Tomasz started his career with LOT Polish Airlines and has since worked with top firms like Beyond Analysis Australia and Vodafone Hutchison Australia. Currently, he is a part of Microsoft's team in Seattle, where he continues to solve complex problems in high-dimensional spaces.

  • Academic Achievements: With a Master's degree in strategy management from the University of New South Wales and a doctoral degree in operations research from the School of Aviation, Tomasz has a strong academic foundation that complements his professional experience.

  • Research and Publications: His research contributions and publications reflect his deep understanding of data science and analytics, showcasing his ability to analyze and interpret large datasets effectively.

By the end of this course, you'll be equipped with the skills to process and analyze large volumes of data using PySpark. Join us on this journey to master big data processing and unlock the full potential of your data! 🚀📊

Udemy ID: 1594214
Course created date: 13/03/2018
Course indexed date: 22/06/2020
Course submitted by: Bot