Differences between Python and PySpark



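| S.No. | Python | PySpark |
|---|---|---|
| 1 | Python is an interpreted, high-level, general-purpose programming language. | PySpark is the Python API for Apache Spark, i.e. the interface that gives access to Spark from Python. It also includes an interactive shell (the pyspark shell). |
| 2 | Plain Python runs on a single machine, so it is slower than PySpark on large datasets. | On large datasets it can be much faster than plain Python (figures such as 10 times depend on the workload) because the work is spread across cores and cluster nodes. On small data, Spark's startup and scheduling overhead can make it slower. |
| 3 | Comparatively easy to learn because of its simple syntax and large standard library. | Uses the same Python syntax, but is harder to master because you also need Spark concepts such as RDDs, DataFrames, lazy evaluation, and partitioning. |
| 4 | It is a dynamically typed language, so type errors only show up at run time. | PySpark code is still dynamically typed Python, not statically typed. DataFrames do carry a schema with explicit column types, so some type errors surface when Spark analyzes the query. |
| 5 | A plain Python program (one that does not use PySpark) runs locally on one machine and does not distribute its work across a Spark cluster. | A PySpark program can be submitted to a Spark cluster (for example with spark-submit) and run in a distributed manner. |
| 6 | Python ships with a large standard library and has many third-party packages, most of which can also be used in PySpark programs. | PySpark can be thought of as a set of libraries, since it has sub-packages such as Spark SQL (pyspark.sql) and Spark ML (pyspark.ml). |
| 7 | Python code is executed directly by the Python interpreter. | In PySpark, Python is mainly a front end to the JVM-based Spark engine. Built-in DataFrame and SQL operations run in the JVM without interpreting Python code for each row. Python UDFs and functions passed to RDD operations are still run by Python worker processes. |
| 8 | Eager constructs such as lists hold every element in memory at once, which can waste a lot of memory when iterating over large data (generators avoid this). | Transformations are evaluated lazily and data is processed partition by partition, so values are produced as they are needed and the full dataset does not have to sit in the driver's memory. |
| 9 | Standard CPython supports heavyweight multi-process parallelism (for example the multiprocessing module or pre-forking WSGI servers), but because of the global interpreter lock (GIL) its threads do not run Python bytecode in parallel. | Gets its parallelism from the Spark engine, which runs tasks concurrently across executor cores and cluster nodes. Akka actors belong to the Scala/JVM ecosystem and are not part of the PySpark API. |
| 10 | Plain Python has no RDD abstraction, so RDD operations are not available without PySpark. | RDD operations (for example map, filter, and reduce on a distributed dataset) are available. |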
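
Points 5 and 10 above can be illustrated with a small sketch. It is only an illustration, not a benchmark: the function names and the application name are made up for this example, and the PySpark version assumes that the pyspark package and a Java runtime are installed. The PySpark import is kept inside its function so that the plain Python part runs on its own.

```python
from functools import reduce


def sum_of_even_squares_plain(numbers):
    """Plain Python: runs in one local process, no Spark involved."""
    squares = map(lambda x: x * x, numbers)
    evens = filter(lambda x: x % 2 == 0, squares)
    return reduce(lambda a, b: a + b, evens, 0)


def sum_of_even_squares_rdd(numbers):
    """PySpark: the same pipeline written as RDD operations.

    Assumption: pyspark and a Java runtime are installed. The import
    lives inside the function so the plain version works without them.
    """
    from pyspark.sql import SparkSession

    # "local[*]" runs Spark on this machine using all available cores.
    # On a real cluster the master is normally supplied at submit time.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("even-squares-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(numbers)
        return (
            rdd.map(lambda x: x * x)
            .filter(lambda x: x % 2 == 0)
            .fold(0, lambda a, b: a + b)
        )
    finally:
        spark.stop()


plain_result = sum_of_even_squares_plain(range(1, 11))
print(plain_result)  # 220
```

With PySpark available, calling sum_of_even_squares_rdd(range(1, 11)) should give the same value, with the map and filter steps executed by Spark tasks instead of a single local loop.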

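The memory remark in point 8 can be seen in plain Python by comparing a list, which is built completely before it is used, with a generator, which produces one value at a time. This is only an analogy for Spark's lazy evaluation, not a measurement of Spark itself, and the variable names are chosen for this sketch.

```python
import sys

N = 1_000_000

# Eager: the list stores all one million results before the sum starts.
squares_list = [i * i for i in range(N)]

# Lazy: the generator yields one value at a time as sum() asks for it.
squares_gen = (i * i for i in range(N))

# getsizeof reports the size of the container object itself.
list_bytes = sys.getsizeof(squares_list)  # grows with N
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of N

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)

print(list_bytes > gen_bytes)             # True
print(total_from_list == total_from_gen)  # True
```

Both versions compute the same total; the difference is only in how many values are held in memory at the same time.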

Authors: A. Yoga Sai Satwik, Noble John Paul


Algae Services.