You can reach us by phone call or SMS at the mobile numbers below:

09117307688
09117179751

If there is no answer, please contact support via SMS.

Unlimited access: for registered users

Money-back guarantee: if the book does not match its description

Support: available from 7 AM to 10 PM

Download the book Probabilistic Machine Learning: An Introduction

Book Details

Probabilistic Machine Learning: An Introduction

Edition:
Authors:
Series:
ISBN: 2021027430, 9780262369305
Publisher: MIT Press
Publication year: 2022
Number of pages:
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 on request)
File size: 24 MB

Book price (Toman): 47,000



Average rating:
Number of ratings: 14


To have the file of Probabilistic Machine Learning: An Introduction converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.

Note that this is the original English edition of Probabilistic Machine Learning: An Introduction, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.


About the Book (original language)



Table of Contents

Cover
Brief Contents
Contents
Preface
Introduction
	What is machine learning?
	Supervised learning
		Classification
		Regression
		Overfitting and generalization
		No free lunch theorem
	Unsupervised learning
		Clustering
		Discovering latent "factors of variation"
		Self-supervised learning
		Evaluating unsupervised learning
	Reinforcement learning
	Data
		Some common image datasets
		Some common text datasets
		Preprocessing discrete input data
		Preprocessing text data
		Handling missing data
	Discussion
		The relationship between ML and other fields
		Structure of the book
		Caveats
I Foundations
	Probability: Univariate Models
		Introduction
			What is probability?
			Types of uncertainty
			Probability as an extension of logic
		Random variables
			Discrete random variables
			Continuous random variables
			Sets of related random variables
			Independence and conditional independence
			Moments of a distribution
			Limitations of summary statistics *
		Bayes' rule
			Example: Testing for COVID-19
			Example: The Monty Hall problem
			Inverse problems *
		Bernoulli and binomial distributions
			Definition
			Sigmoid (logistic) function
			Binary logistic regression
		Categorical and multinomial distributions
			Definition
			Softmax function
			Multiclass logistic regression
			Log-sum-exp trick
		Univariate Gaussian (normal) distribution
			Cumulative distribution function
			Probability density function
			Regression
			Why is the Gaussian distribution so widely used?
			Dirac delta function as a limiting case
		Some other common univariate distributions *
			Student t distribution
			Cauchy distribution
			Laplace distribution
			Beta distribution
			Gamma distribution
			Empirical distribution
		Transformations of random variables *
			Discrete case
			Continuous case
			Invertible transformations (bijections)
			Moments of a linear transformation
			The convolution theorem
			Central limit theorem
			Monte Carlo approximation
		Exercises
	Probability: Multivariate Models
		Joint distributions for multiple random variables
			Covariance
			Correlation
			Uncorrelated does not imply independent
			Correlation does not imply causation
			Simpson's paradox
		The multivariate Gaussian (normal) distribution
			Definition
			Mahalanobis distance
			Marginals and conditionals of an MVN *
			Example: conditioning a 2d Gaussian
			Example: Imputing missing values *
		Linear Gaussian systems *
			Bayes rule for Gaussians
			Derivation *
			Example: Inferring an unknown scalar
			Example: inferring an unknown vector
			Example: sensor fusion
		The exponential family *
			Definition
			Example
			Log partition function is cumulant generating function
			Maximum entropy derivation of the exponential family
		Mixture models
			Gaussian mixture models
			Bernoulli mixture models
		Probabilistic graphical models *
			Representation
			Inference
			Learning
		Exercises
	Statistics
		Introduction
		Maximum likelihood estimation (MLE)
			Definition
			Justification for MLE
			Example: MLE for the Bernoulli distribution
			Example: MLE for the categorical distribution
			Example: MLE for the univariate Gaussian
			Example: MLE for the multivariate Gaussian
			Example: MLE for linear regression
		Empirical risk minimization (ERM)
			Example: minimizing the misclassification rate
			Surrogate loss
		Other estimation methods *
			The method of moments
			Online (recursive) estimation
		Regularization
			Example: MAP estimation for the Bernoulli distribution
			Example: MAP estimation for the multivariate Gaussian *
			Example: weight decay
			Picking the regularizer using a validation set
			Cross-validation
			Early stopping
			Using more data
		Bayesian statistics *
			Conjugate priors
			The beta-binomial model
			The Dirichlet-multinomial model
			The Gaussian-Gaussian model
			Beyond conjugate priors
			Credible intervals
			Bayesian machine learning
			Computational issues
		Frequentist statistics *
			Sampling distributions
			Gaussian approximation of the sampling distribution of the MLE
			Bootstrap approximation of the sampling distribution of any estimator
			Confidence intervals
			Caution: Confidence intervals are not credible
			The bias-variance tradeoff
		Exercises
	Decision Theory
		Bayesian decision theory
			Basics
			Classification problems
			ROC curves
			Precision-recall curves
			Regression problems
			Probabilistic prediction problems
		Choosing the "right" model
			Bayesian hypothesis testing
			Bayesian model selection
			Occam's razor
			Connection between cross validation and marginal likelihood
			Information criteria
			Posterior inference over effect sizes and Bayesian significance testing
		Frequentist decision theory
			Computing the risk of an estimator
			Consistent estimators
			Admissible estimators
		Empirical risk minimization
			Empirical risk
			Structural risk
			Cross-validation
			Statistical learning theory *
		Frequentist hypothesis testing *
			Likelihood ratio test
			Null hypothesis significance testing (NHST)
			p-values
			p-values considered harmful
			Why isn't everyone a Bayesian?
		Exercises
	Information Theory
		Entropy
			Entropy for discrete random variables
			Cross entropy
			Joint entropy
			Conditional entropy
			Perplexity
			Differential entropy for continuous random variables *
		Relative entropy (KL divergence) *
			Definition
			Interpretation
			Example: KL divergence between two Gaussians
			Non-negativity of KL
			KL divergence and MLE
			Forward vs reverse KL
		Mutual information *
			Definition
			Interpretation
			Example
			Conditional mutual information
			MI as a "generalized correlation coefficient"
			Normalized mutual information
			Maximal information coefficient
			Data processing inequality
			Sufficient Statistics
			Fano's inequality *
		Exercises
	Linear Algebra
		Introduction
			Notation
			Vector spaces
			Norms of a vector and matrix
			Properties of a matrix
			Special types of matrices
		Matrix multiplication
			Vector–vector products
			Matrix–vector products
			Matrix–matrix products
			Application: manipulating data matrices
			Kronecker products *
			Einstein summation *
		Matrix inversion
			The inverse of a square matrix
			Schur complements *
			The matrix inversion lemma *
			Matrix determinant lemma *
			Application: deriving the conditionals of an MVN *
		Eigenvalue decomposition (EVD)
			Basics
			Diagonalization
			Eigenvalues and eigenvectors of symmetric matrices
			Geometry of quadratic forms
			Standardizing and whitening data
			Power method
			Deflation
			Eigenvectors optimize quadratic forms
		Singular value decomposition (SVD)
			Basics
			Connection between SVD and EVD
			Pseudo inverse
			SVD and the range and null space of a matrix *
			Truncated SVD
		Other matrix decompositions *
			LU factorization
			QR decomposition
			Cholesky decomposition
		Solving systems of linear equations *
			Solving square systems
			Solving underconstrained systems (least norm estimation)
			Solving overconstrained systems (least squares estimation)
		Matrix calculus
			Derivatives
			Gradients
			Directional derivative
			Total derivative *
			Jacobian
			Hessian
			Gradients of commonly used functions
		Exercises
	Optimization
		Introduction
			Local vs global optimization
			Constrained vs unconstrained optimization
			Convex vs nonconvex optimization
			Smooth vs nonsmooth optimization
		First-order methods
			Descent direction
			Step size (learning rate)
			Convergence rates
			Momentum methods
		Second-order methods
			Newton's method
			BFGS and other quasi-Newton methods
			Trust region methods
		Stochastic gradient descent
			Application to finite sum problems
			Example: SGD for fitting linear regression
			Choosing the step size (learning rate)
			Iterate averaging
			Variance reduction *
			Preconditioned SGD
		Constrained optimization
			Lagrange multipliers
			The KKT conditions
			Linear programming
			Quadratic programming
			Mixed integer linear programming *
		Proximal gradient method *
			Projected gradient descent
			Proximal operator for ℓ1-norm regularizer
			Proximal operator for quantization
			Incremental (online) proximal methods
		Bound optimization *
			The general algorithm
			The EM algorithm
			Example: EM for a GMM
		Blackbox and derivative free optimization
		Exercises
II Linear Models
	Linear Discriminant Analysis
		Introduction
		Gaussian discriminant analysis
			Quadratic decision boundaries
			Linear decision boundaries
			The connection between LDA and logistic regression
			Model fitting
			Nearest centroid classifier
			Fisher's linear discriminant analysis *
		Naive Bayes classifiers
			Example models
			Model fitting
			Bayesian naive Bayes
			The connection between naive Bayes and logistic regression
		Generative vs discriminative classifiers
			Advantages of discriminative classifiers
			Advantages of generative classifiers
			Handling missing features
		Exercises
	Logistic Regression
		Introduction
		Binary logistic regression
			Linear classifiers
			Nonlinear classifiers
			Maximum likelihood estimation
			Stochastic gradient descent
			Perceptron algorithm
			Iteratively reweighted least squares
			MAP estimation
			Standardization
		Multinomial logistic regression
			Linear and nonlinear classifiers
			Maximum likelihood estimation
			Gradient-based optimization
			Bound optimization
			MAP estimation
			Maximum entropy classifiers
			Hierarchical classification
			Handling large numbers of classes
		Robust logistic regression *
			Mixture model for the likelihood
			Bi-tempered loss
		Bayesian logistic regression *
			Laplace approximation
			Approximating the posterior predictive
		Exercises
	Linear Regression
		Introduction
		Least squares linear regression
			Terminology
			Least squares estimation
			Other approaches to computing the MLE
			Measuring goodness of fit
		Ridge regression
			Computing the MAP estimate
			Connection between ridge regression and PCA
			Choosing the strength of the regularizer
		Lasso regression
			MAP estimation with a Laplace prior (ℓ1 regularization)
			Why does ℓ1 regularization yield sparse solutions?
			Hard vs soft thresholding
			Regularization path
			Comparison of least squares, lasso, ridge and subset selection
			Variable selection consistency
			Group lasso
			Elastic net (ridge and lasso combined)
			Optimization algorithms
		Regression splines *
			B-spline basis functions
			Fitting a linear model using a spline basis
			Smoothing splines
			Generalized additive models
		Robust linear regression *
			Laplace likelihood
			Student-t likelihood
			Huber loss
			RANSAC
		Bayesian linear regression *
			Priors
			Posteriors
			Example
			Computing the posterior predictive
			The advantage of centering
			Dealing with multicollinearity
			Automatic relevancy determination (ARD) *
		Exercises
	Generalized Linear Models *
		Introduction
		Examples
			Linear regression
			Binomial regression
			Poisson regression
		GLMs with non-canonical link functions
		Maximum likelihood estimation
		Worked example: predicting insurance claims
III Deep Neural Networks
	Neural Networks for Tabular Data
		Introduction
		Multilayer perceptrons (MLPs)
			The XOR problem
			Differentiable MLPs
			Activation functions
			Example models
			The importance of depth
			The "deep learning revolution"
			Connections with biology
		Backpropagation
			Forward vs reverse mode differentiation
			Reverse mode differentiation for multilayer perceptrons
			Vector-Jacobian product for common layers
			Computation graphs
		Training neural networks
			Tuning the learning rate
			Vanishing and exploding gradients
			Non-saturating activation functions
			Residual connections
			Parameter initialization
			Parallel training
		Regularization
			Early stopping
			Weight decay
			Sparse DNNs
			Dropout
			Bayesian neural networks
			Regularization effects of (stochastic) gradient descent *
		Other kinds of feedforward networks *
			Radial basis function networks
			Mixtures of experts
		Exercises
	Neural Networks for Images
		Introduction
		Common layers
			Convolutional layers
			Pooling layers
			Putting it all together
			Normalization layers
		Common architectures for image classification
			LeNet
			AlexNet
			GoogLeNet (Inception)
			ResNet
			DenseNet
			Neural architecture search
		Other forms of convolution *
			Dilated convolution
			Transposed convolution
			Depthwise separable convolution
		Solving other discriminative vision tasks with CNNs *
			Image tagging
			Object detection
			Instance segmentation
			Semantic segmentation
			Human pose estimation
		Generating images by inverting CNNs *
			Converting a trained classifier into a generative model
			Image priors
			Visualizing the features learned by a CNN
			Deep Dream
			Neural style transfer
	Neural Networks for Sequences
		Introduction
		Recurrent neural networks (RNNs)
			Vec2Seq (sequence generation)
			Seq2Vec (sequence classification)
			Seq2Seq (sequence translation)
			Teacher forcing
			Backpropagation through time
			Vanishing and exploding gradients
			Gating and long term memory
			Beam search
		1d CNNs
			1d CNNs for sequence classification
			Causal 1d CNNs for sequence generation
		Attention
			Attention as soft dictionary lookup
			Kernel regression as non-parametric attention
			Parametric attention
			Seq2Seq with attention
			Seq2vec with attention (text classification)
			Seq+Seq2Vec with attention (text pair classification)
			Soft vs hard attention
		Transformers
			Self-attention
			Multi-headed attention
			Positional encoding
			Putting it all together
			Comparing transformers, CNNs and RNNs
			Transformers for images *
			Other transformer variants *
		Efficient transformers *
			Fixed non-learnable localized attention patterns
			Learnable sparse attention patterns
			Memory and recurrence methods
			Low-rank and kernel methods
		Language models and unsupervised representation learning
			ELMo
			BERT
			GPT
			T5
			Discussion
IV Nonparametric Models
	Exemplar-based Methods
		K nearest neighbor (KNN) classification
			Example
			The curse of dimensionality
			Reducing the speed and memory requirements
			Open set recognition
		Learning distance metrics
			Linear and convex methods
			Deep metric learning
			Classification losses
			Ranking losses
			Speeding up ranking loss optimization
			Other training tricks for DML
		Kernel density estimation (KDE)
			Density kernels
			Parzen window density estimator
			How to choose the bandwidth parameter
			From KDE to KNN classification
			Kernel regression
	Kernel Methods *
		Mercer kernels
			Mercer\'s theorem
			Some popular Mercer kernels
		Gaussian processes
			Noise-free observations
			Noisy observations
			Comparison to kernel regression
			Weight space vs function space
			Numerical issues
			Estimating the kernel
			GPs for classification
			Connections with deep learning
			Scaling GPs to large datasets
		Support vector machines (SVMs)
			Large margin classifiers
			The dual problem
			Soft margin classifiers
			The kernel trick
			Converting SVM outputs into probabilities
			Connection with logistic regression
			Multi-class classification with SVMs
			How to choose the regularizer C
			Kernel ridge regression
			SVMs for regression
		Sparse vector machines
			Relevance vector machines (RVMs)
			Comparison of sparse and dense kernel methods
		Exercises
	Trees, Forests, Bagging, and Boosting
		Classification and regression trees (CART)
			Model definition
			Model fitting
			Regularization
			Handling missing input features
			Pros and cons
		Ensemble learning
			Stacking
			Ensembling is not Bayes model averaging
		Bagging
		Random forests
		Boosting
			Forward stagewise additive modeling
			Quadratic loss and least squares boosting
			Exponential loss and AdaBoost
			LogitBoost
			Gradient boosting
		Interpreting tree ensembles
			Feature importance
			Partial dependency plots
V Beyond Supervised Learning
	Learning with Fewer Labeled Examples
		Data augmentation
			Examples
			Theoretical justification
		Transfer learning
			Fine-tuning
			Adapters
			Supervised pre-training
			Unsupervised pre-training (self-supervised learning)
			Domain adaptation
		Semi-supervised learning
			Self-training and pseudo-labeling
			Entropy minimization
			Co-training
			Label propagation on graphs
			Consistency regularization
			Deep generative models *
			Combining self-supervised and semi-supervised learning
		Active learning
			Decision-theoretic approach
			Information-theoretic approach
			Batch active learning
		Meta-learning
			Model-agnostic meta-learning (MAML)
		Few-shot learning
			Matching networks
		Weakly supervised learning
		Exercises
	Dimensionality Reduction
		Principal components analysis (PCA)
			Examples
			Derivation of the algorithm
			Computational issues
			Choosing the number of latent dimensions
		Factor analysis *
			Generative model
			Probabilistic PCA
			EM algorithm for FA/PPCA
			Unidentifiability of the parameters
			Nonlinear factor analysis
			Mixtures of factor analysers
			Exponential family factor analysis
			Factor analysis models for paired data
		Autoencoders
			Bottleneck autoencoders
			Denoising autoencoders
			Contractive autoencoders
			Sparse autoencoders
			Variational autoencoders
		Manifold learning *
			What are manifolds?
			The manifold hypothesis
			Approaches to manifold learning
			Multi-dimensional scaling (MDS)
			Isomap
			Kernel PCA
			Maximum variance unfolding (MVU)
			Local linear embedding (LLE)
			Laplacian eigenmaps
			t-SNE
		Word embeddings
			Latent semantic analysis / indexing
			Word2vec
			GloVe
			Word analogies
			RAND-WALK model of word embeddings
			Contextual word embeddings
		Exercises
	Clustering
		Introduction
			Evaluating the output of clustering methods
		Hierarchical agglomerative clustering
			The algorithm
			Example
			Extensions
		K means clustering
			The algorithm
			Examples
			Vector quantization
			The K-means++ algorithm
			The K-medoids algorithm
			Speedup tricks
			Choosing the number of clusters K
		Clustering using mixture models
			Mixtures of Gaussians
			Mixtures of Bernoullis
		Spectral clustering *
			Normalized cuts
			Eigenvectors of the graph Laplacian encode the clustering
			Example
			Connection with other methods
		Biclustering *
			Basic biclustering
			Nested partition models (Crosscat)
	Recommender Systems
		Explicit feedback
			Datasets
			Collaborative filtering
			Matrix factorization
			Autoencoders
		Implicit feedback
			Bayesian personalized ranking
			Factorization machines
			Neural matrix factorization
		Leveraging side information
		Exploration-exploitation tradeoff
	Graph Embeddings *
		Introduction
		Graph Embedding as an Encoder/Decoder Problem
		Shallow graph embeddings
			Unsupervised embeddings
			Distance-based: Euclidean methods
			Distance-based: non-Euclidean methods
			Outer product-based: Matrix factorization methods
			Outer product-based: Skip-gram methods
			Supervised embeddings
		Graph Neural Networks
			Message passing GNNs
			Spectral Graph Convolutions
			Spatial Graph Convolutions
			Non-Euclidean Graph Convolutions
		Deep graph embeddings
			Unsupervised embeddings
			Semi-supervised embeddings
		Applications
			Unsupervised applications
			Supervised applications
Notation
	Introduction
	Common mathematical symbols
	Functions
		Common functions of one argument
		Common functions of two arguments
		Common functions of >2 arguments
	Linear algebra
		General notation
		Vectors
		Matrices
		Matrix calculus
	Optimization
	Probability
	Information theory
	Statistics and machine learning
		Supervised learning
		Unsupervised learning and generative models
		Bayesian inference
	Abbreviations
Index
Bibliography



