This book introduces the mathematical foundations of statistical modeling of natural language. The author attempts to explain several statistical power laws satisfied by texts in natural language in terms of discrete stochastic processes that are neither Markovian nor hidden Markovian and that exhibit some form of long-range dependence. To achieve this, he uses various concepts and technical tools from information theory and the theory of probability measures. After an introductory chapter, the first half of the book presents probability measures, information theory, ergodic decomposition, and Kolmogorov complexity, material included to make the book relatively self-contained. This half also covers less standard concepts and results, such as excess entropy and the generalization of conditional mutual information to σ-fields. The second half of the book discusses results concerning power laws for mutual information and maximal repetition, such as the theorems about facts and words. A separate chapter presents toy examples of stochastic processes, which should inspire future work in statistical language modeling.