PySpark | Cookbook

Ralph/ March 31, 2020/ Apache Spark, Cookbook, PySpark

Websites

The Blaze Ecosystem (Blaze)

Dask: Flexible library for parallel computing in Python.
DataShape: Data layout language for array programming.
Odo: Shapeshifting for your data. It efficiently migrates data from the source to the target through a network of conversions.

Reading Textfiles

Read CSV file with known structure