Spark Dataframe Cheat Sheet
Having a good cheatsheet at hand can significantly speed up the development process. One of the best cheatsheets I have come across is sparklyr’s cheatsheet.
For my work, I’m using Spark’s DataFrame API in Scala to create data transformation pipelines. These are some functions and design patterns that I’ve found to be extremely useful.
Load data
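A minimal sketch of loading a CSV file into a DataFrame, assuming a local SparkSession (the `local[*]` master and the app name `cheatsheet` are illustrative). To keep the snippet self-contained, it first writes a tiny hypothetical `people.csv` to a temporary file.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]") // run locally on all available cores
  .getOrCreate()

// Hypothetical sample data written to a temp file so the example runs anywhere.
val csvPath = Files.createTempFile("people", ".csv")
Files.write(csvPath, "name,age\nAlice,34\nBob,45\n".getBytes(StandardCharsets.UTF_8))
csvPath.toFile.deleteOnExit()

val df: DataFrame = spark.read
  .option("header", "true")      // first line holds the column names
  .option("inferSchema", "true") // scan the data to guess column types
  .csv(csvPath.toString)

// Other built-in readers follow the same pattern, e.g. spark.read.parquet(path) or spark.read.json(path).
```

Without `inferSchema`, every CSV column is read as a string.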
Get SparkContext information
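The `SparkContext` behind a `SparkSession` exposes basic runtime information. This sketch assumes the same illustrative local session as above and prints a few of its properties plus the configuration entries currently set on its `SparkConf`.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()

val sc = spark.sparkContext

println(s"App name:            ${sc.appName}")
println(s"Application id:      ${sc.applicationId}")
println(s"Master:              ${sc.master}")
println(s"Default parallelism: ${sc.defaultParallelism}")
println(s"Web UI:              ${sc.uiWebUrl.getOrElse("disabled")}")

// All configuration entries currently set on the SparkConf, sorted by key.
sc.getConf.getAll.sortBy(_._1).foreach { case (key, value) =>
  println(s"$key = $value")
}
```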
Get Spark version
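The running Spark version is available from both the session and its context. The Scala version is worth checking too, since Spark builds are tied to a specific Scala binary version. Local session settings are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()

// Both return the version string of the Spark build in use.
println(s"Spark version: ${spark.version}")
println(s"Spark version (via SparkContext): ${spark.sparkContext.version}")

// Version of the Scala library on the classpath.
println(s"Scala version: ${scala.util.Properties.versionNumberString}")
```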
Get number of partitions
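A DataFrame's partition count is read from its underlying RDD. This sketch uses `spark.range` as stand-in data and shows the count before and after an explicit `repartition`; the initial number depends on your cluster's default parallelism.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()

val df = spark.range(0, 100).toDF("id")

// Partition count as decided by Spark (depends on default parallelism).
println(s"Initial partitions: ${df.rdd.getNumPartitions}")

// repartition(n) shuffles the data into exactly n partitions.
val repartitioned = df.repartition(4)
println(s"After repartition(4): ${repartitioned.rdd.getNumPartitions}")
```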
Count number of rows
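`count()` is an action: it triggers a Spark job and returns a `Long`. A minimal sketch on stand-in data from `spark.range`, counting all rows and then only the rows that pass a filter:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()

val df = spark.range(0, 1000).toDF("id")

// Counts every row in the DataFrame.
val total: Long = df.count()

// Counts only the rows that survive the filter.
val evens: Long = df.filter(col("id") % 2 === 0).count()

println(s"total = $total, evens = $evens")
```

Because each `count()` recomputes the DataFrame, consider caching it first if you count it repeatedly.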
Print schema
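`printSchema()` prints the schema as a tree, while `schema` returns it as a `StructType` you can inspect in code. The two-column `people` DataFrame below is hypothetical sample data built from a Scala `Seq`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType}

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // enables .toDF on local Scala collections

val people = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")

people.printSchema()
// root
//  |-- name: string (nullable = true)
//  |-- age: integer (nullable = false)

// Programmatic access to the same information.
val nameType = people.schema("name").dataType
val ageField = people.schema("age")
println(s"name: $nameType, age: ${ageField.dataType} (nullable = ${ageField.nullable})")
```

The `age` column is non-nullable here because it comes from a primitive Scala `Int`.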
Preview top 20 rows
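`show()` prints the first 20 rows by default and truncates cell values longer than 20 characters. A short sketch of the common variants on stand-in data, plus `take` for when you want the rows back on the driver instead of printed:

```scala
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()

val df = spark.range(0, 100).toDF("id")

df.show()         // first 20 rows, long strings truncated
df.show(5)        // first 5 rows
df.show(5, false) // first 5 rows, full cell contents

// Returns rows to the driver as an Array[Row] rather than printing them.
val firstRows: Array[Row] = df.take(20)
```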
Design pattern for constructing a data transformation pipeline
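One common way to structure a pipeline (shown here as an assumption, not necessarily the exact pattern described above) is to write each step as a `DataFrame => DataFrame` function and chain the steps with `Dataset.transform`. The step names `filterAdults`, `upperCaseNames` and `withCountry` are hypothetical. Parameterised steps are curried so that, once their parameters are supplied, they still fit `transform`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, upper}

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical step: keep only rows with age >= 18.
def filterAdults(df: DataFrame): DataFrame =
  df.filter(col("age") >= 18)

// Hypothetical step: normalise names to upper case.
def upperCaseNames(df: DataFrame): DataFrame =
  df.withColumn("name", upper(col("name")))

// Hypothetical parameterised step: curried so withCountry("NL") is a DataFrame => DataFrame.
def withCountry(country: String)(df: DataFrame): DataFrame =
  df.withColumn("country", lit(country))

val raw = Seq(("alice", 34), ("bob", 12), ("carol", 45)).toDF("name", "age")

// Each step reads top to bottom and can be unit-tested on its own.
val result = raw
  .transform(filterAdults)
  .transform(upperCaseNames)
  .transform(withCountry("NL"))

result.show()
```

Keeping every step a small, pure function makes it easy to reorder, reuse, and test steps independently.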
Drop duplicate rows
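`dropDuplicates()` with no arguments removes rows that are identical in every column (the same as `distinct()`). Passing column names deduplicates on those columns only. The data below is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cheatsheet")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val data = Seq(
  ("Alice", 34),
  ("Alice", 34), // exact duplicate of the previous row
  ("Alice", 35), // same name, different age
  ("Bob", 45)
).toDF("name", "age")

// Removes exact duplicates: 3 rows remain.
val noExactDupes = data.dropDuplicates()

// Keeps one row per name: 2 rows remain.
// Which "Alice" row survives is not guaranteed.
val onePerName = data.dropDuplicates("name")

noExactDupes.show()
onePerName.show()
```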
For an exhaustive list of the functions, you can check out Spark’s Dataset class documentation.
Hope you’ve found this cheatsheet useful. Thank you!