Master Apache Spark using Spark SQL and PySpark 3
Master Apache Spark using Spark SQL as well as PySpark with Python 3, with complimentary lab access
Rating: 4.62 (2446 reviews)
Students: 18,283
Content: 32 hours
Last update: May 2024
Regular price: $74.99
What you will learn
Set up single-node Hadoop and Spark using Docker, locally or on AWS Cloud9
Review ITVersity Labs (exclusively for ITVersity Lab Customers)
HDFS commands relevant to validating files and folders in HDFS
Quick recap of the Python needed to learn Spark
Using Spark SQL to solve problems with SQL-style syntax
Using PySpark DataFrame APIs to solve problems in DataFrame style
Relevance of Spark Metastore to convert DataFrames into temporary views so that one can process data in DataFrames using Spark SQL
Apache Spark Application Development Life Cycle
Apache Spark Application Execution Life Cycle and Spark UI
Set up an SSH proxy to access Spark application logs
Deployment Modes of Spark Applications (Cluster and Client)
Passing Application Properties Files and External Dependencies while running Spark Applications
Comidoc Review
Our Verdict
The 'Master Apache Spark using Spark SQL and PySpark 3' course truly shines when it comes to providing in-depth knowledge of Spark SQL and PySpark. While there are minor issues concerning setup and accent comprehension, these drawbacks do not significantly impact the overall value. The course excels in its blend of theory and hands-on exercises while ensuring relevance in today's data engineering landscape by focusing on PySpark 3 – making it a worthy investment for both beginners and seasoned professionals.
What We Liked
- The course provides a comprehensive overview of both Spark SQL and PySpark 3, making it a one-stop solution for data engineers looking to enhance their skillset in these areas.
- Real-world examples, hands-on exercises, and theory blend seamlessly throughout the content, ensuring that learners can grasp complex concepts with ease.
- PySpark 3 focus keeps the course relevant, aligning with the current demands of data engineering projects.
- Instructor effectively breaks down complex topics into understandable components.
Potential Drawbacks
- Non-native English speakers may find the instructor's accent and fast speech challenging to follow.
- Instructions for AWS Cloud9 setup are not well defined, leading to confusion and wasting learners' time.
- Coverage of simple HDFS commands could be reduced or removed, allowing more focus on crucial topics.
- The three installation sections could be condensed, with some instructions combined into a single section.
Udemy ID: 1398116
Course created date: 17/10/2017
Course indexed date: 20/11/2019
Course submitted by: Bot