Information about the book:
Title: Understanding Machine Learning: From Theory to Algorithms
Authors: Shalev-Shwartz, S. & Ben-David, S.
Size: 3 MB
Format: PDF
Year: 2014
Pages: 416
Book Contents:
1 Introduction
1.1 What Is Learning?
1.2 When Do We Need Machine Learning?
1.3 Types of Learning
1.4 Relations to Other Fields
1.5 How to Read This Book
1.6 Notation
Part 1 Foundations
2 A Gentle Start
2.1 A Formal Model – The Statistical Learning Framework
2.2 Empirical Risk Minimization
2.3 Empirical Risk Minimization with Inductive Bias
2.4 Exercises
3 A Formal Learning Model
3.1 PAC Learning
3.2 A More General Learning Model
3.3 Summary
3.4 Bibliographic Remarks
3.5 Exercises
4 Learning via Uniform Convergence
4.1 Uniform Convergence Is Sufficient for Learnability
4.2 Finite Classes Are Agnostic PAC Learnable
4.3 Summary
4.4 Bibliographic Remarks
4.5 Exercises
5 The Bias-Complexity Tradeoff
5.1 The No-Free-Lunch Theorem
5.2 Error Decomposition
5.3 Summary
5.4 Bibliographic Remarks
5.5 Exercises
6 The VC-Dimension
6.1 Infinite-Size Classes Can Be Learnable
6.2 The VC-Dimension
6.3 Examples
6.4 The Fundamental Theorem of PAC Learning
6.5 Proof of Theorem 6.7
6.6 Summary
6.7 Bibliographic Remarks
6.8 Exercises
7 Nonuniform Learnability
7.1 Nonuniform Learnability
7.2 Structural Risk Minimization
7.3 Minimum Description Length and Occam’s Razor
7.4 Other Notions of Learnability – Consistency
7.5 Discussing the Different Notions of Learnability
7.6 Summary
7.7 Bibliographic Remarks
7.8 Exercises
8 The Runtime of Learning
8.1 Computational Complexity of Learning
8.2 Implementing the ERM Rule
8.3 Efficiently Learnable, but Not by a Proper ERM
8.4 Hardness of Learning
8.5 Summary
8.6 Bibliographic Remarks
8.7 Exercises
Part 2 From Theory to Algorithms
9 Linear Predictors
9.1 Halfspaces
9.2 Linear Regression
9.3 Logistic Regression
9.4 Summary
9.5 Bibliographic Remarks
9.6 Exercises
10 Boosting
10.1 Weak Learnability
10.2 AdaBoost
10.3 Linear Combinations of Base Hypotheses
10.4 AdaBoost for Face Recognition
10.5 Summary
10.6 Bibliographic Remarks
10.7 Exercises
11 Model Selection and Validation
11.1 Model Selection Using SRM
11.2 Validation
11.3 What to Do If Learning Fails
11.4 Summary
11.5 Exercises
12 Convex Learning Problems
12.1 Convexity, Lipschitzness, and Smoothness
12.2 Convex Learning Problems
12.3 Surrogate Loss Functions
12.4 Summary
12.5 Bibliographic Remarks
12.6 Exercises
13 Regularization and Stability
13.1 Regularized Loss Minimization
13.2 Stable Rules Do Not Overfit
13.3 Tikhonov Regularization as a Stabilizer
13.4 Controlling the Fitting-Stability Tradeoff
13.5 Summary
13.6 Bibliographic Remarks
13.7 Exercises
14 Stochastic Gradient Descent
14.1 Gradient Descent
14.2 Subgradients
14.3 Stochastic Gradient Descent (SGD)
14.4 Variants
14.5 Learning with SGD
14.6 Summary
14.7 Bibliographic Remarks
14.8 Exercises
15 Support Vector Machines
15.1 Margin and Hard-SVM
15.2 Soft-SVM and Norm Regularization
15.3 Optimality Conditions and “Support Vectors”
15.4 Duality
15.5 Implementing Soft-SVM Using SGD
15.6 Summary
15.7 Bibliographic Remarks
15.8 Exercises
16 Kernel Methods
16.1 Embeddings into Feature Spaces
16.2 The Kernel Trick
16.3 Implementing Soft-SVM with Kernels
16.4 Summary
16.5 Bibliographic Remarks
16.6 Exercises
17 Multiclass, Ranking, and Complex Prediction Problems
17.1 One-versus-All and All-Pairs
17.2 Linear Multiclass Predictors
17.3 Structured Output Prediction
17.4 Ranking
17.5 Bipartite Ranking and Multivariate Performance Measures
17.6 Summary
17.7 Bibliographic Remarks
17.8 Exercises
18 Decision Trees
18.1 Sample Complexity
18.2 Decision Tree Algorithms
18.3 Random Forests
18.4 Summary
18.5 Bibliographic Remarks
18.6 Exercises
19 Nearest Neighbor
19.1 k Nearest Neighbors
19.2 Analysis
19.3 Efficient Implementation
19.4 Summary
19.5 Bibliographic Remarks
19.6 Exercises
20 Neural Networks
20.1 Feedforward Neural Networks
20.2 Learning Neural Networks
20.3 The Expressive Power of Neural Networks
20.4 The Sample Complexity of Neural Networks
20.5 The Runtime of Learning Neural Networks
20.6 SGD and Backpropagation
20.7 Summary
20.8 Bibliographic Remarks
20.9 Exercises
Part 3 Additional Learning Models
21 Online Learning
21.1 Online Classification in the Realizable Case
21.2 Online Classification in the Unrealizable Case
21.3 Online Convex Optimization
21.4 The Online Perceptron Algorithm
21.5 Summary
21.6 Bibliographic Remarks
21.7 Exercises
22 Clustering
22.1 Linkage-Based Clustering Algorithms
22.2 k-Means and Other Cost Minimization Clusterings
22.3 Spectral Clustering
22.4 Information Bottleneck
22.5 A High Level View of Clustering
22.6 Summary
22.7 Bibliographic Remarks
22.8 Exercises
23 Dimensionality Reduction
23.1 Principal Component Analysis (PCA)
23.2 Random Projections
23.3 Compressed Sensing
23.4 PCA or Compressed Sensing?
23.5 Summary
23.6 Bibliographic Remarks
23.7 Exercises
24 Generative Models
24.1 Maximum Likelihood Estimator
24.2 Naive Bayes
24.3 Linear Discriminant Analysis
24.4 Latent Variables and the EM Algorithm
24.5 Bayesian Reasoning
24.6 Summary
24.7 Bibliographic Remarks
24.8 Exercises
25 Feature Selection and Generation
25.1 Feature Selection
25.2 Feature Manipulation and Normalization
25.3 Feature Learning
25.4 Summary
25.5 Bibliographic Remarks
25.6 Exercises
Part 4 Advanced Theory
26 Rademacher Complexities
26.1 The Rademacher Complexity
26.2 Rademacher Complexity of Linear Classes
26.3 Generalization Bounds for SVM
26.4 Generalization Bounds for Predictors with Low Norm
26.5 Bibliographic Remarks
27 Covering Numbers
27.1 Covering
27.2 From Covering to Rademacher Complexity via Chaining
27.3 Bibliographic Remarks
28 Proof of the Fundamental Theorem of Learning Theory
28.1 The Upper Bound for the Agnostic Case
28.2 The Lower Bound for the Agnostic Case
28.3 The Upper Bound for the Realizable Case
29 Multiclass Learnability
29.1 The Natarajan Dimension
29.2 The Multiclass Fundamental Theorem
29.3 Calculating the Natarajan Dimension
29.4 On Good and Bad ERMs
29.5 Bibliographic Remarks
29.6 Exercises
30 Compression Bounds
30.1 Compression Bounds
30.2 Examples
30.3 Bibliographic Remarks
31 PAC-Bayes
31.1 PAC-Bayes Bounds
31.2 Bibliographic Remarks
31.3 Exercises
Appendix A Technical Lemmas
Appendix B Measure Concentration
B.1 Markov’s Inequality
B.2 Chebyshev’s Inequality
B.3 Chernoff’s Bounds
B.4 Hoeffding’s Inequality
B.5 Bennett's and Bernstein's Inequalities
B.6 Slud’s Inequality
B.7 Concentration of χ2 Variables
Appendix C Linear Algebra
C.1 Basic Definitions
C.2 Eigenvalues and Eigenvectors
C.3 Positive Definite Matrices
C.4 Singular Value Decomposition (SVD)