
Data Wrangling with Python: Tips and Tools to Make Your Life Easier, by Jacqueline Kazil & Katharine Jarmul


Information about the book:

Title: Data Wrangling with Python: Tips and Tools to Make Your Life Easier

Author: Jacqueline Kazil & Katharine Jarmul

Size: 9.7

Format: PDF

Year: 2016

Pages: 501

Book Contents:

1. Introduction to Python
Why Python
Getting Started with Python 
Which Python Version 
Setting Up Python on Your Machine 
Test Driving Python 
Install pip 
Install a Code Editor 
Optional: Install IPython
2. Python Basics
Basic Data Types 
Strings 
Integers and Floats 
Data Containers 
Variables 
Lists 
Dictionaries 
What Can the Various Data Types Do? 
String Methods: Things Strings Can Do 
Numerical Methods: Things Numbers Can Do 
List Methods: Things Lists Can Do 
Dictionary Methods: Things Dictionaries Can Do 
Helpful Tools: type, dir, and help 
type
dir 
help 
Putting It All Together 
What Does It All Mean? 
3. Data Meant to Be Read by Machines
CSV Data 
How to Import CSV Data 
Saving the Code to a File; Running from Command Line 
JSON Data 
How to Import JSON Data 
XML Data 
How to Import XML Data
4. Working with Excel Files
Installing Python Packages 
Parsing Excel Files 
Getting Started with Parsing
5. PDFs and Problem Solving in Python
Avoid Using PDFs! 
Programmatic Approaches to PDF Parsing 
Opening and Reading Using slate 
Converting PDF to Text 
Parsing PDFs Using pdfminer 
Learning How to Solve Problems 
Exercise: Use Table Extraction, Try a Different Library 
Exercise: Clean the Data Manually 
Exercise: Try Another Tool 
Uncommon File Types 
6. Acquiring and Storing Data
Not All Data Is Created Equal 
Fact Checking 
Readability, Cleanliness, and Longevity 
Where to Find Data 
Using a Telephone 
US Government Data
Government and Civic Open Data Worldwide 
Organization and Non-Government Organization (NGO) Data 
Education and University Data 
Medical and Scientific Data 
Crowdsourced Data and APIs 
Case Studies: Example Data Investigation 
Ebola Crisis 
Train Safety 
Football Salaries 
Child Labor 
Storing Your Data: When, Why, and How? 
Databases: A Brief Introduction 
Relational Databases: MySQL and PostgreSQL 
Non-Relational Databases: NoSQL 
Setting Up Your Local Database with Python 
When to Use a Simple File 
Cloud-Storage and Python 
Local Storage and Python 
Alternative Data Storage 
7. Data Cleanup: Investigation, Matching, and Formatting
Why Clean Data? 
Data Cleanup Basics 
Identifying Values for Data Cleanup 
Formatting Data 
Finding Outliers and Bad Data 
Finding Duplicates 
Fuzzy Matching 
RegEx Matching 
What to Do with Duplicate Records
8. Data Cleanup: Standardizing and Scripting
Normalizing and Standardizing Your Data 
Saving Your Data 
Determining What Data Cleanup Is Right for Your Project 
Scripting Your Cleanup 
Testing with New Data
9. Data Exploration and Analysis
Exploring Your Data 
Importing Data 
Exploring Table Functions 
Joining Numerous Datasets 
Identifying Correlations 
Identifying Outliers 
Creating Groupings 
Further Exploration 
Analyzing Your Data 
Separating and Focusing Your Data 
What Is Your Data Saying? 
Drawing Conclusions 
Documenting Your Conclusions 
10. Presenting Your Data
Avoiding Storytelling Pitfalls 
How Will You Tell the Story? 
Know Your Audience 
Visualizing Your Data 
Charts 
Time-Related Data 
Maps 
Interactives 
Words 
Images, Video, and Illustrations 
Presentation Tools 
Publishing Your Data 
Using Available Sites 
Open Source Platforms: Starting a New Site 
Jupyter (Formerly Known as IPython Notebooks)
11. Web Scraping: Acquiring and Storing Data from the Web
What to Scrape and How 
Analyzing a Web Page 
Inspection: Markup Structure 
Network/Timeline: How the Page Loads 
Console: Interacting with JavaScript 
In-Depth Analysis of a Page 
Getting Pages: How to Request on the Internet
Reading a Web Page with Beautiful Soup 
Reading a Web Page with LXML 
A Case for XPath
12. Advanced Web Scraping: Screen Scrapers and Spiders
Browser-Based Parsing 
Screen Reading with Selenium 
Screen Reading with Ghost.Py 
Spidering the Web 
Building a Spider with Scrapy 
Crawling Whole Websites with Scrapy 
Networks: How the Internet Works and Why It’s Breaking Your Script 
The Changing Web (or Why Your Script Broke) 
A (Few) Word(s) of Caution 
13. APIs
API Features 
REST Versus Streaming APIs 
Rate Limits 
Tiered Data Volumes 
API Keys and Tokens 
A Simple Data Pull from Twitter’s REST API 
Advanced Data Collection from Twitter’s REST API 
Advanced Data Collection from Twitter’s Streaming API
14. Automation and Scaling
Why Automate? 
Steps to Automate 
What Could Go Wrong? 
Where to Automate 
Special Tools for Automation 
Using Local Files, argv, and Config Files 
Using the Cloud for Data Processing 
Using Parallel Processing 
Using Distributed Processing 
Simple Automation 
CronJobs 
Web Interfaces 
Jupyter Notebooks 
Large-Scale Automation 
Celery: Queue-Based Automation 
Ansible: Operations Automation 
Monitoring Your Automation 
Python Logging 
Adding Automated Messaging 
Uploading and Other Reporting 
Logging and Monitoring as a Service 
No System Is Foolproof 
15. Conclusion
Duties of a Data Wrangler 
Beyond Data Wrangling 
Become a Better Data Analyst 
Become a Better Developer 
Become a Better Visual Storyteller 
Become a Better Systems Architect 
Where Do You Go from Here? 
A. Comparison of Languages Mentioned
B. Python Resources for Beginners
C. Learning the Command Line
D. Advanced Python Setup
E. Python Gotchas
F. IPython Hints
G. Using Amazon Web Services
