Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code
All offers (1)
| Price | Condition | Seller |
|---|---|---|
| $64.10 (best price) | New | Basi6 International LLC |
About this book
Summary

Modern data science solutions need to be clean, easy to read, and scalable. In *Mastering Large Datasets with Python*, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You’ll explore methods and built-in Python tools that lend themselves to clarity and scalability, like the high-performing parallelism method, as well as distributed technologies that allow for high data throughput. The abundant hands-on exercises in this practical tutorial will lock in these essential skills for any large-scale data science project.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology

Programming techniques that work well on laptop-sized data can slow to a crawl, or fail altogether, when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change.

About the book

*Mastering Large Datasets with Python* teaches you to write code that can handle datasets of any size. You’ll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You’ll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you’ll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3.

What's inside

- An introduction to the map and reduce paradigm
- Parallelization with the multiprocessing module and pathos framework
- Hadoop and Spark for distributed computing
- Running AWS jobs to process large datasets

About the reader

For Python programmers who need to work faster with more data.

About the author

J.T. Wolohan is a lead data scientist at Booz Allen Hamilton, and a PhD researcher at Indiana University, Bloomington.

Table of Contents

PART 1
1. Introduction
2. Accelerating large dataset work: Map and parallel computing
3. Function pipelines for mapping complex transformations
4. Processing large datasets with lazy workflows
5. Accumulation operations with reduce
6. Speeding up map and reduce with advanced parallelization

PART 2
7. Processing truly big datasets with Hadoop and Spark
8. Best practices for large data with Apache Streaming and mrjob
9. PageRank with map and reduce in PySpark
10. Faster decision-making with machine learning and PySpark

PART 3
11. Large datasets in the cloud with Amazon Web Services and S3
12. MapReduce in the cloud with Amazon’s Elastic MapReduce
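For readers unfamiliar with the map and reduce paradigm named in the description, the following is a minimal Python sketch of the general idea. It is not taken from the book: the function names (`count_words`, `merge_counts`, `word_count`) and the sample lines are hypothetical, chosen only for illustration. It assumes a word-count task, where the map step processes each line independently and the reduce step merges the partial results.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_words(line):
    """Map step (hypothetical helper): count the words in one line of text."""
    return Counter(line.lower().split())


def merge_counts(left, right):
    """Reduce step (hypothetical helper): combine two partial counts."""
    return left + right


def word_count(lines, mapper=map):
    """Apply the map step with any map-like callable, then reduce the results."""
    partial_counts = mapper(count_words, lines)
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    lines = ["big data big ideas", "small data", "big plans"]

    # Serial version using the built-in map.
    print(word_count(lines))

    # Parallel version: Pool.map takes the same (function, iterable) arguments
    # as map, so only the mapper changes.
    with Pool(processes=2) as pool:
        print(word_count(lines, mapper=pool.map))
```

Because each line is processed independently in the map step, the work can be split across processes without changing the reduce step, which is the property the description points to when it mentions breaking large tasks into smaller ones that run simultaneously.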
Details
Categories
Computers, Languages, Python, Data Science