I generally learn by reading and then, if possible, doing. Thus, for me a key step in learning a new subject is to find a well written book that’s on the right topic and set at the right level. Below I list some of the books I’ve read along with comments on their level, what they cover, their style of presentation etc.
Data Mining: Practical Machine Learning Tools and Techniques
I. H. Witten and E. Frank
This is a gentle and well written introduction to machine learning. The emphasis is on the essential ideas behind various learning algorithms and how to apply these algorithms, rather than getting into all the theory behind them. As such, only basic undergraduate mathematics is required. The book has an emphasis on classification, that is, algorithms that learn to automatically decide which class something belongs to based on various features. The book makes extensive use of WEKA: a fairly comprehensive open source machine learning system written in Java that can be freely downloaded. If you’re new to machine learning, this is the place to start.
Pattern Recognition and Machine Learning
C. Bishop
This is a very well written introduction to the theory of machine learning. The book is written for people wanting to get into research, for example PhD students. You will need solid undergraduate mathematics and mathematical statistics. More advanced topics, such as the calculus of variations, are explained in the text as required. The main area it doesn’t cover is reinforcement learning, though given that it’s already over 700 pages I can understand why! The style is mathematical, but not formal. For example, he does not adopt a theorem/proof format. Even on topics that I knew fairly well, the way in which he put all the key ideas together and quickly proved and motivated some of the most important results was wonderfully clear and concise. If you’re serious about machine learning, I highly recommend this book.
Reinforcement Learning
R. S. Sutton and A. G. Barto
This is the standard introductory textbook on reinforcement learning. It’s easy to read and not very technical. However, I don’t think it’s as clearly written as it could be. This might be due to the informal style they adopted. I’m not sure. Anyway, if you’re new to reinforcement learning this is the best place to start. At the other extreme is the very technical book Neuro-Dynamic Programming by Bertsekas and Tsitsiklis.
Foundations of Statistical Natural Language Processing
C. D. Manning and H. Schütze
If you want to apply machine learning to language, for example information retrieval, this is a clearly written book that covers many of the basics of the area and doesn’t assume anything more than basic undergraduate mathematics.
Introduction to the Theory of Computation
M. Sipser
This is my favourite book on theoretical computer science. It covers automata through to complexity classes. The format is theorem/proof with plenty of supporting text. I find the style of writing and explanation to be exceptionally clear and easy to understand for what can, at points, be a difficult subject. In particular, I’m a big fan of how he often gives a “proof sketch” before the formal proof in order to give you the intuitive idea of how the proof works.
An Introduction to Kolmogorov Complexity and Its Applications
M. Li and P. M. Vitányi
For Kolmogorov complexity and its various applications, this is the book. In particular, it covers algorithmic probability theory and Solomonoff induction in greater depth than any other book that I know of. The format is lemma/theorem/proof with a fair amount of supporting text. You’ll need solid undergraduate mathematics and theoretical computer science in order to tackle this book. In my opinion, its main fault is that it contains too many mistakes. The first edition was really bad in this respect, the second edition is better, and there is currently a third edition being prepared which I hope will be mostly bug free. For another book on Kolmogorov complexity that has a more mathematical style and focus (connections to randomness, real analysis etc. but not artificial intelligence) see also the book by C. S. Calude.
Universal Artificial Intelligence
M. Hutter
If you want to study all of the technical details of AIXI, this is the book. If you don’t know, AIXI is a theoretical model of a universally “optimal” artificial intelligence. While it’s super powerful in theory, the catch is that it’s not computable and difficult to computationally approximate. Nevertheless, for those with a mathematical inclination, the fact that such a model can be defined at all is quite something. But be warned: it’s not an easy read. I’d recommend people interested in this book first read the 65 page paper Universal Algorithmic Intelligence: A mathematical top -> down approach. If that’s no problem and you want the 300 page version with all the technical details and extensions, go for the book. The book and paper are essentially error free, as far as I know. For a 30 page gentle introduction to AIXI, try Chapter 2 of my PhD thesis.
Probability with Martingales
D. Williams
Have you ever wondered why in statistics you were given theorems of probability to use without ever really being told where these came from? After reading this book you’ll understand why they didn’t tell you: about 70 pages of equations into the book they finally get to defining p(X|Y). So be warned, it’s a textbook written for mathematicians, which makes sense given that generally only mathematicians need to understand probability theory in this much technical depth. Its main limitation is that it doesn’t cover stochastic processes in continuous time. However, once you understand the theory of martingales well in discrete time, going to continuous time isn’t too much of a step.