Parameter estimation and hypothesis testing are the basic tools of statistical inference. These techniques occur in many applications of data processing, and Monte Carlo methods have become an essential tool for assessing performance. For pedagogical purposes, the book includes several computational problems and exercises. To prevent students from getting stuck on the exercises, detailed corrections are provided.
Cover
Title
Copyright
Preface
Notations and Abbreviations
A Few Functions of Python®
1 Useful Maths
1.1. Basic concepts on probability
1.2. Conditional expectation
1.3. Projection theorem
1.4. Gaussianity
1.5. Random variable transformation
1.6. Fundamental theorems of statistics
1.7. A few probability distributions
2 Statistical Inferences
2.1. First step: visualizing data
2.2. Reduction of dataset dimensionality
2.3. Some vocabulary
2.4. Statistical model
2.5. Hypothesis testing
2.6. Statistical estimation
3 Inferences on HMM
3.1. Hidden Markov models (HMM)
3.2. Inferences on HMM
3.3. Filtering: general case
3.4. Gaussian linear case: Kalman algorithm
3.5. Discrete finite Markov case
4 Monte-Carlo Methods
4.1. Fundamental theorems
4.2. Stating the problem
4.3. Generating random variables
4.4. Variance reduction
5 Hints and Solutions
5.1. Useful maths
5.2. Statistical inferences
5.3. Inferences on HMM
5.4. Monte-Carlo methods
Bibliography
Index
End User License Agreement
Maurice Charbit
First published 2017 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2017
The rights of Maurice Charbit to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2016955620
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-126-0
This book addresses the fundamental bases of statistical inferences. We shall presume throughout that readers have a good working knowledge of Python® language and of the basic elements of digital signal processing.
The most recent version is Python® 3.x, but many people still work with Python® 2.x versions. All code provided in this book works with both versions. The official home page of the Python® programming language is https://www.python.org/. Spyder® is a useful open-source integrated development environment (IDE) for programming in Python®. In short, we suggest using the Anaconda Python distribution, which includes both Python® and Spyder®. The Anaconda Python distribution is available at https://www.continuum.io/downloads/.
Most of the examples given in this book use the modules NumPy, which provides powerful numerical array objects, SciPy, with high-level data processing routines such as optimization, regression and interpolation, and Matplotlib for plotting curves, histograms, box-and-whisker plots, etc. See the list of useful functions on p. xiii.
A brief outline of the contents of the book is given below.
In the first chapter, a short review of probability theory is presented, focusing on conditional probability, the projection theorem and random variable transformation. A number of statistical elements are also presented, including the law of large numbers and the central limit theorem.
The second chapter is devoted to statistical inference, which consists of deducing features of interest from a set of observations with a given level of confidence. This covers a variety of techniques. In this chapter, we mainly focus on hypothesis testing, regression analysis, parameter estimation and the determination of confidence intervals. Key notions include the Cramer–Rao bound, the Neyman–Pearson theorem, likelihood ratio tests, the least squares method for linear models, the method of moments and the maximum likelihood approach. The least squares method is a standard approach in regression analysis, and it is discussed in detail.
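As a quick illustration of least squares fitting of a linear model, here is a minimal sketch using NumPy (this is not the book's own code; the model y = 2x + 1 and the noise level are chosen arbitrarily):

```python
import numpy as np

# Hypothetical data from the linear model y = a*x + b + noise
np.random.seed(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.05 * np.random.randn(50)

# Least squares estimate of (a, b) using the design matrix [x, 1]
H = np.column_stack([x, np.ones_like(x)])
a_hat, b_hat = np.linalg.lstsq(H, y, rcond=None)[0]
```

With this low noise level, the estimates (a_hat, b_hat) fall close to the true values (2, 1).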
In many problems, the variables of interest are only partially observed. Hidden Markov models (HMM) are well suited to this kind of problem. Their applications cover a wide range of fields, such as speech processing, handwriting recognition, DNA analysis, and monitoring and control. HMM inference raises several issues. The key algorithms are the well-known Kalman filter, the Baum–Welch algorithm and the Viterbi algorithm, to name only the most famous.
Monte-Carlo methods refer to a broad class of algorithms used to compute quantities of interest. Typically, these quantities are integrals, i.e. expectations of a given function. The key idea is to use random sequences instead of deterministic sequences to achieve this result. The main issues are, first, the choice of the most appropriate random mechanism and, second, how to generate such a mechanism. In Chapter 4, the acceptance–rejection method, the Metropolis–Hastings algorithm, the Gibbs sampler, the importance sampling method, etc., are presented.
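The key idea can be sketched in a few lines: to estimate an expectation E[f(X)], average f over random draws of X. As an illustrative example (not taken from the book), for X ~ N(0, 1) and f = cos, the exact value is exp(-1/2):

```python
import numpy as np

# Monte Carlo estimate of E[cos(X)] with X ~ N(0, 1); exact value is exp(-1/2)
np.random.seed(0)
samples = np.random.randn(100000)
estimate = np.mean(np.cos(samples))
exact = np.exp(-0.5)
```

The estimation error decreases like 1/sqrt(N), so with N = 100000 the estimate agrees with the exact value to about two decimal places.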
Maurice CHARBIT
October 2016
To get function documentation, use .__doc__, e.g. print(range.__doc__); help, e.g. help(zeros) or help('def'); or, in IPython, the ? operator, e.g. range.count?
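For instance, the docstring can be inspected directly in plain Python, without an interactive shell:

```python
# .__doc__ holds the same text that help(range) displays
doc = range.__doc__
print(doc.splitlines()[0])  # first line of the docstring
```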
– def: introduces a function definition
– if, else, elif: an if statement consists of a Boolean expression followed by one or more statements
– for: executes a sequence of statements multiple times
– while: repeats a statement or group of statements while a given condition is true
– 1j or complex: returns a complex value, e.g. a=1.3+1j*0.2 or a=complex(1.3,0.2)
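A small sketch (ours, not the book's) exercising the keywords listed above:

```python
# def / for / if: count the strictly positive entries of a sequence
def count_positive(values):
    n = 0
    for v in values:
        if v > 0:
            n += 1
    return n

# while: repeated halving until the value drops below a threshold
x = 1.0
steps = 0
while x > 0.1:
    x /= 2
    steps += 1

# complex values, built in two equivalent ways
a = 1.3 + 1j * 0.2
b = complex(1.3, 0.2)
```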
Methods:
– type A=array([0,4,12,3]), then type A. and press Tab: a list of the available methods appears, e.g. the argument of the maximum with A.argmax(). For help, type e.g. A.dot?
Functions:
– int: converts a number or string to an integer
– len: returns the number of items in a container
– range: returns an object that produces a sequence of integers
– type: returns the object type
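The built-ins above in one short example (ours, for illustration):

```python
n = int("42")           # string to integer
items = list(range(5))  # range produces the integers 0..4
count = len(items)      # number of items in the container
kind = type(items)      # the object's type (here, list)
```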
From numpy:
– abs: returns the absolute value of the argument
– arange: returns evenly spaced values within a given interval
– argwhere: finds the indices of array elements that are non-zero, grouped by element
– array: creates an array
– cos, sin, tan: respectively calculate the cosine, the sine and the tangent
– cosh: calculates the hyperbolic cosine
– cumsum: calculates the cumulative sum of array elements
– diff: calculates the n-th discrete difference along a given axis
– dot: product of two arrays
– exp, log: respectively calculate the exponential and the logarithm
– fft: calculates the fast Fourier transform
– isinf: tests element-wise for positive or negative infinity
– isnan: tests element-wise for NaN
– linspace: returns evenly spaced numbers over a specified interval
– loadtxt: loads data from a text file
– matrix: returns a matrix from an array-like object, or from a string of data
– max: returns the maximum of an array or the maximum along an axis
– mean, std: respectively return the arithmetic mean and the standard deviation
– min: returns the minimum of an array or the minimum along an axis
– nanmean, nanstd: respectively return the arithmetic mean and the standard deviation along a given axis, ignoring NaNs
– nansum: sum of array elements over a given axis, ignoring NaNs
– ones: returns a new array of given shape and type, filled with ones
– pi: 3.141592653589793
– setdiff1d: returns the sorted, unique values of one array that are not in the other
– size: returns the number of elements along a given axis
– sort: returns a sorted copy of an array
– sqrt: computes the positive square root of an array
– sum: sum of array elements over a given axis
– zeros: returns a new array of given shape and type, filled with zeros
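A few of these functions at work on a small array (an illustrative sketch of ours):

```python
import numpy as np

A = np.array([1.0, 3.0, 2.0, 5.0])
total = A.sum()                  # sum of the elements: 11.0
gaps = np.diff(A)                # first discrete differences
running = np.cumsum(A)           # cumulative sums
grid = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points from 0 to 1
m, s = A.mean(), A.std()         # arithmetic mean and standard deviation
```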
From numpy.linalg:
– eig: computes the eigenvalues and right eigenvectors of a square array
– pinv: computes the (Moore–Penrose) pseudo-inverse of a matrix
– inv: computes the (multiplicative) inverse of a matrix
– svd: computes the singular value decomposition
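For example (a sketch of ours, with matrices chosen for readability):

```python
import numpy as np

# Eigendecomposition of a diagonal 2x2 matrix: eigenvalues 2 and 3
M = np.array([[2.0, 0.0],
              [0.0, 3.0]])
w, V = np.linalg.eig(M)  # eigenvalues in w, eigenvectors in the columns of V

# Moore-Penrose pseudo-inverse of a tall, full-column-rank matrix
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Hp = np.linalg.pinv(H)   # 2x3 matrix; Hp @ H is the identity
```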
From numpy.random:
– rand: draws random samples from a uniform distribution over (0, 1)
– randn: draws random samples from the “standard normal” distribution
– randint: draws random integers from ‘low’ (inclusive) to ‘high’ (exclusive)
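A short sketch of these three generators (sample sizes are arbitrary):

```python
import numpy as np

np.random.seed(0)                  # fix the seed for reproducibility
u = np.random.rand(1000)           # uniform samples over (0, 1)
g = np.random.randn(1000)          # standard normal samples
k = np.random.randint(0, 10, 100)  # integers from 0 (inclusive) to 10 (exclusive)
```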
From scipy.stats:
(for the random distributions, use the methods .pdf, .cdf, .isf, .ppf, etc.)
– norm: Gaussian random distribution
– gamma: gamma random distribution
– f: Fisher random distribution
– t: Student’s random distribution
– chi2: chi-squared random distribution
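For example, the methods on the standard normal distribution (a quick sketch of ours):

```python
from scipy.stats import norm

p = norm.cdf(1.96)  # cumulative distribution, about 0.975
q = norm.ppf(0.975) # quantile (inverse cdf), about 1.96
d = norm.pdf(0.0)   # density at 0, about 0.3989
```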
From scipy.linalg:
– sqrtm: computes the matrix square root
From matplotlib.pyplot:
– box, boxplot, clf, figure, hist, legend, plot, show, subplot
– text, title, xlabel, xlim, xticks, ylabel, ylim, yticks
Datasets:
– statsmodels.api.datasets.co2, statsmodels.api.datasets.nile, statsmodels.api.datasets.star98, statsmodels.api.datasets.heart
– sklearn.datasets.load_boston, sklearn.datasets.load_diabetes
– scipy.misc.ascent
From sympy:
– Symbol, Matrix, diff, Inverse, trace, simplify
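A short symbolic example using these names (ours, for illustration):

```python
from sympy import Symbol, Matrix, diff, simplify

x = Symbol('x')
expr = diff(x**3, x)                # symbolic derivative: 3*x**2
M = Matrix([[1, 2], [3, 4]])
tr = M.trace()                      # trace: 1 + 4 = 5
s = simplify((x**2 - 1) / (x - 1))  # simplifies to x + 1
```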