Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code
Mastering Large Datasets with Python teaches you to write code that can handle datasets of any size. You’ll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You’ll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you’ll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3.
What’s inside
- An introduction to the map and reduce paradigm
- Parallelization with the multiprocessing module and pathos framework
- Hadoop and Spark for distributed computing
- Running AWS jobs to process large datasets
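
As a rough illustration of the map and reduce paradigm named above, here is a minimal sketch that is not taken from the book. It counts words by mapping a function over independent lines of text and then reducing the partial results into one total. The function names (`count_words`, `merge_counts`, `word_count`) and the `processes` parameter are hypothetical and chosen only for this example.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_words(line):
    """Map step: turn one line of text into its own word counts."""
    return Counter(line.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


def word_count(lines, processes=None):
    """Count words across lines, optionally mapping in worker processes."""
    if processes is None:
        # Sequential map: same logic, one line at a time.
        partials = map(count_words, lines)
    else:
        # Parallel map: each line is handled by a worker process.
        with Pool(processes) as pool:
            partials = pool.map(count_words, lines)
    # Fold the partial counts into a single result.
    return reduce(merge_counts, partials, Counter())
```

Because each line is counted independently, the map step can likely be moved onto worker processes without changing the reduce step. To try the parallel path, call `word_count(lines, processes=4)` from inside an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start workers by spawning a fresh interpreter.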

English | 2020-01-21 | ISBN: 1617296236 | 350 Pages | PDF, ePUB | 16.8 Mb – 7.1 Mb
