The Univariate Kalman Filter
Consider a linear dynamical system observed in Gaussian noise:
\[s_{t+1} = as_t + w_t\\ o_{t} = cs_t + v_t\\ w_t \sim N(0,\sigma_w^2)\\ v_t \sim N(0,\sigma_v^2)\\ s_0 \sim \mathbf{b}_0 = N(\hat{s}_0, \Sigma_0)\]with state and observation spaces \(S = O = \mathbb{R}\), discrete time \(t=0,1,...\), and coefficients \(a,c \in \mathbb{R}\). We assume: (i) \(v_t\) and \(v_k\) are independent for any \(t \neq k\); (ii) \(w_t\) and \(w_k\) are independent for any \(t \neq k\); (iii) \((v_t)_{t \geq 0}\) and \((w_t)_{t \geq 0}\) are independent processes; and (iv) \(s_0\) is independent of \(v_t\) and \(w_t\) for any \(t\).
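To make the setup concrete, here is a minimal NumPy simulation of such a system. The particular values of \(a\), \(c\), \(\sigma_w\), \(\sigma_v\), \(\hat{s}_0\), and \(\Sigma_0\) are illustrative choices, not part of the model; later snippets reuse these variables.

```python
import numpy as np

rng = np.random.default_rng(0)
a, c = 0.9, 0.5               # transition and observation coefficients
sigma_w, sigma_v = 0.3, 0.4   # process and measurement noise std. devs.
s0_hat, Sigma0 = 0.0, 1.0     # parameters of the prior belief b_0

T = 100
s = rng.normal(s0_hat, np.sqrt(Sigma0))    # draw s_0 from the prior
states, obs = [], []
for t in range(T):
    o = c * s + rng.normal(0.0, sigma_v)   # o_t = c s_t + v_t
    states.append(s)
    obs.append(o)
    s = a * s + rng.normal(0.0, sigma_w)   # s_{t+1} = a s_t + w_t
states, obs = np.array(states), np.array(obs)
```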
The optimal filtering recursion is:
\[\mathbf{b}_{t+1}(s_{t+1}) = \frac{z(o_{t+1} \mid s_{t+1}) \int_{S}p(s_{t+1} \mid s_t)\mathbf{b}_t(s_t)ds_t}{\int_{S}z(o_{t+1} \mid s_{t+1})\int_{S}p(s_{t+1} \mid s_t)\mathbf{b}_t(s_t)ds_tds_{t+1}}\]and the optimal filtered state estimate at time \(t+1\) is
\[\hat{s}_{t+1|t+1} = \mathbb{E}[s_{t+1} \mid o_{0:t+1}] = \int_{S}s\,\mathbf{b}_{t+1}(s)ds.\]
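Before specializing to the Gaussian case, note that the recursion can be evaluated numerically by discretizing \(S\) on a grid. The sketch below (continuing the simulation above, with ad hoc grid bounds and resolution) applies the prediction integral and the Bayes update at each step; here the prior \(\mathbf{b}_0\) is first conditioned on \(o_0\), matching the convention used in the derivation below. We compare its output against the Kalman filter at the end of the post.

```python
grid = np.linspace(-5.0, 5.0, 1001)   # truncated, discretized version of S
ds = grid[1] - grid[0]

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# trans[i, j] approximates the transition kernel p(grid[i] | grid[j])
trans = gauss_pdf(grid[:, None], a * grid[None, :], sigma_w ** 2)

b = gauss_pdf(grid, s0_hat, Sigma0)   # prior belief b_0 on the grid
grid_means = []
for t, o in enumerate(obs):
    if t > 0:
        b = trans @ b * ds                         # predict: integrate p(s_t | s_{t-1}) b(s_{t-1})
    b = b * gauss_pdf(o, c * grid, sigma_v ** 2)   # update: multiply by z(o_t | s_t)
    b = b / (b.sum() * ds)                         # divide by the normalizing constant
    grid_means.append((grid * b).sum() * ds)       # filtered estimate E[s_t | o_{0:t}]
```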
The univariate Kalman filter is defined as
\[\mathbf{b}_{t} = N(\hat{s}_{t|t}, \Sigma_{t|t})\\ \hat{s}_{t|t} = \hat{s}_{t|t-1} + b_t(o_t-c\hat{s}_{t|t-1})\\ \Sigma_{t|t} = \Sigma_{t|t-1} - cb_t\Sigma_{t|t-1}\\ b_t = \frac{c\Sigma_{t|t-1}}{c^2\Sigma_{t|t-1} + \sigma_v^2}\\ \hat{s}_{t|t-1} = a\hat{s}_{t-1|t-1}\\ \Sigma_{t|t-1} = a^2\Sigma_{t-1|t-1} + \sigma^2_w\]where \(\mathbf{b}_{t}\) is the posterior belief at time \(t\), \(b_t\) is the Kalman gain, and \(\hat{s}_{t|t}\) is the filtered state estimate at time \(t\).
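These equations translate directly into code. A minimal sketch, reusing the simulated trajectory from above; as in the derivation below, the prior belief is first conditioned on \(o_0\) before the first prediction step.

```python
def kalman_filter(obs, a, c, sigma_w, sigma_v, s_hat, Sigma):
    """Univariate Kalman filter: returns filtered means and variances."""
    means, variances = [], []
    for t, o in enumerate(obs):
        if t > 0:                                  # predict step
            s_hat = a * s_hat                      # s_hat_{t|t-1}
            Sigma = a ** 2 * Sigma + sigma_w ** 2  # Sigma_{t|t-1}
        gain = c * Sigma / (c ** 2 * Sigma + sigma_v ** 2)  # gain b_t
        s_hat = s_hat + gain * (o - c * s_hat)     # s_hat_{t|t}
        Sigma = Sigma - c * gain * Sigma           # Sigma_{t|t}
        means.append(s_hat)
        variances.append(Sigma)
    return np.array(means), np.array(variances)

means, variances = kalman_filter(obs, a, c, sigma_w, sigma_v, s0_hat, Sigma0)
```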
To show that these equations are correct, we derive the Kalman filter from the Bayesian recursion above. We start by noting some properties of the system.
Lemma 1: At any time \(t \geq 0\), \(s_t\) and \(o_t\) are Gaussian random variables.
Proof:
By definition,
\[s_1 = as_0 + w_0\\ s_2 = as_1 + w_1 = a(as_0 + w_0) + w_1 = a^2s_0 + aw_0 + w_1\\ s_3 = as_2 + w_2 = a(a^2s_0 + aw_0 + w_1) + w_2 = a^3s_0 + a^2w_0 + aw_1 + w_2\\ \vdots\\ s_t = a^ts_0 + \sum_{k=0}^{t-1}a^{t-k-1}w_k\]Hence, \(s_t\) is a linear combination of the random variables \(s_0, w_0, w_1,...,w_{t-1}\). Since all of these variables are individually Gaussian and independent, they are jointly Gaussian. As Gaussians are closed under linear transformations, it follows that \(s_t\) is Gaussian.
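As a quick numerical sanity check of the unrolled expression, iterating the difference equation and evaluating the closed form on the same noise draws give identical values (continuing the simulation above):

```python
t = 7
w = rng.normal(0.0, sigma_w, size=t)            # fresh draws of w_0, ..., w_{t-1}
s_rec = states[0]
for k in range(t):
    s_rec = a * s_rec + w[k]                    # iterate s_{k+1} = a s_k + w_k
s_closed = a ** t * states[0] + sum(a ** (t - k - 1) * w[k] for k in range(t))
assert np.isclose(s_rec, s_closed)
```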
Now consider the measurement \(o_t\). By definition,
\[o_t = cs_t + v_t\]Since \(s_t\) and \(v_t\) are independent Gaussians, \(o_t\) is a linear combination of jointly Gaussian random variables and is thus also Gaussian. Q.E.D.
The system can equivalently be written in terms of transition and observation kernels (Gaussian densities) rather than a difference equation:
\[s_{t+1} \sim p(s_{t+1} \mid s_t) = \frac{1}{\sigma_w\sqrt{2\pi}}\operatorname{exp}\left(-\frac{1}{2}\left(\frac{s_{t+1} - as_t}{\sigma_w}\right)^2\right)\\ o_{t} \sim z(o_{t} \mid s_t) = \frac{1}{\sigma_v\sqrt{2\pi}}\operatorname{exp}\left(-\frac{1}{2}\left(\frac{o_t - cs_t}{\sigma_v}\right)^2\right)\]
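For completeness, the two kernels as ordinary density functions; scipy's Gaussian density matches the formulas above (same illustrative parameters as before):

```python
from scipy.stats import norm

def p(s_next, s):   # transition kernel p(s_{t+1} | s_t)
    return norm.pdf(s_next, loc=a * s, scale=sigma_w)

def z(o, s):        # observation kernel z(o_t | s_t)
    return norm.pdf(o, loc=c * s, scale=sigma_v)
```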
Another important property of the system is the following.
Lemma 2: \(s_t\) and \(o_t\) are jointly Gaussian at any time \(t \geq 0\).
Proof:
By Lemma 1 we know that
\[s_t = a^ts_0 + \sum_{k=0}^{t-1}a^{t-k-1}w_k\\ o_t = c\left(a^ts_0 + \sum_{k=0}^{t-1}a^{t-k-1}w_k\right) + v_t\]Since the random variables \((s_0, w_0,w_1,...,w_t,v_0,v_1,...,v_t)\) are jointly Gaussian (see Lemma 1), \(s_t\) and \(o_t\) are both linear combinations of a set of jointly Gaussian random variables. Let \(\mathbf{x}=(s_0, w_0,w_1,...,w_t,v_0,v_1,...,v_t)\) and let \(\mathbf{n},\mathbf{m}\) denote the coefficients of the linear combinations for \(s_t\) and \(o_t\) respectively. Any linear combination \(\alpha\mathbf{n}^T\mathbf{x} + \beta\mathbf{m}^T\mathbf{x} = (\alpha\mathbf{n} + \beta\mathbf{m})^T\mathbf{x}\) is again a linear combination of jointly Gaussian variables and hence Gaussian. Now consider the vector \((\mathbf{n}^T\mathbf{x}, \mathbf{m}^T\mathbf{x})\). Since any linear combination of its components is Gaussian, \((\mathbf{n}^T\mathbf{x}, \mathbf{m}^T\mathbf{x})\) is by definition a multivariate Gaussian. Q.E.D.
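Lemma 2 can also be checked by Monte Carlo: over many independent runs of the system, the sample covariance of \((s_t, o_t)\) matches the covariance implied by the unrolled linear combinations. The choice of \(t\) and the sample size below are arbitrary.

```python
t, n = 5, 200_000
s = rng.normal(s0_hat, np.sqrt(Sigma0), size=n)     # n independent copies of s_0
for _ in range(t):
    s = a * s + rng.normal(0.0, sigma_w, size=n)    # propagate to s_t
o = c * s + rng.normal(0.0, sigma_v, size=n)        # observe o_t

var_s = a ** (2 * t) * Sigma0 + sigma_w ** 2 * sum(a ** (2 * j) for j in range(t))
analytic = np.array([[var_s, c * var_s],
                     [c * var_s, c ** 2 * var_s + sigma_v ** 2]])
print(np.cov(np.vstack([s, o])))   # should agree with `analytic` up to MC error
print(analytic)
```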
Derivation of the univariate Kalman filter
We derive the Kalman filter equations using mathematical induction. We start with \(t=1\). Using Lemmas 1–2 we obtain
\[\begin{bmatrix} s_0\\ o_0 \end{bmatrix} \sim N\left( \begin{bmatrix} \hat{s}_0\\ c\hat{s}_0 \end{bmatrix}, \begin{bmatrix} \Sigma_0 & c\Sigma_0\\ c\Sigma_0 & c^2\Sigma_0 + \sigma_v^2 \end{bmatrix} \right)\\ \begin{bmatrix} s_1\\ o_0 \end{bmatrix} \sim N\left( \begin{bmatrix} a\hat{s}_0\\ c\hat{s}_0 \end{bmatrix}, \begin{bmatrix} a^2\Sigma_0 + \sigma_w^2 & ac\Sigma_0\\ ac\Sigma_0 & c^2\Sigma_0 + \sigma_v^2 \end{bmatrix} \right)\\ \begin{bmatrix} o_1\\ o_0 \end{bmatrix} \sim N\left( \begin{bmatrix} ac\hat{s}_0\\ c\hat{s}_0 \end{bmatrix}, \begin{bmatrix} c^2(a^2\Sigma_0 + \sigma_w^2) + \sigma_v^2 & ac^2\Sigma_0\\ ac^2\Sigma_0 & c^2\Sigma_0 + \sigma_v^2 \end{bmatrix} \right)\]We know from probability calculus that the conditional of a multivariate Gaussian is also Gaussian: if \((x, y)\) is jointly Gaussian, then \(x \mid y \sim N\left(\mu_x + \frac{\Sigma_{xy}}{\Sigma_{yy}}(y - \mu_y),\; \Sigma_{xx} - \frac{\Sigma_{xy}^2}{\Sigma_{yy}}\right)\). Applying this rule, we obtain
\[p(s_1 \mid o_0) = N(\hat{s}_{1|0}, \Sigma_{1|0})\\ p(o_1 \mid o_0) = N(c\hat{s}_{1|0}, c^2\Sigma_{1|0} + \sigma_v^2)\]where
\[\hat{s}_{1|0} = a\hat{s}_0 + \frac{ac\Sigma_0}{c^2\Sigma_0 + \sigma^2_v}(o_0 - c\hat{s}_0) = a\hat{s}_{0|0}\\ \Sigma_{1|0} = a^2\Sigma_0 + \sigma^2_w - \frac{(ac\Sigma_0)^2}{c^2\Sigma_0 + \sigma_v^2} = a^2\Sigma_{0|0} + \sigma_w^2\]and the expectation and variance of the predicted observation are
\[\mathbb{E}[o_1 \mid o_0] = ac\hat{s}_0 + \frac{ac^2\Sigma_0}{c^2\Sigma_0 + \sigma^2_v}(o_0 - c\hat{s}_0) = c\hat{s}_{1|0}\\ Var(o_1 \mid o_0) = c^2(a^2\Sigma_0 + \sigma^2_w) + \sigma_v^2 - \frac{(ac^2\Sigma_0)^2}{c^2\Sigma_0 + \sigma_v^2} = c^2\Sigma_{1|0} + \sigma_v^2\]Furthermore,
\[\begin{bmatrix} s_1\\ o_1 \end{bmatrix} \mid o_0 \sim N\left( \begin{bmatrix} \hat{s}_{1|0}\\ c\hat{s}_{1|0} \end{bmatrix}, \begin{bmatrix} \Sigma_{1|0} & c\Sigma_{1|0}\\ c\Sigma_{1|0} & c^2\Sigma_{1|0} + \sigma_v^2 \end{bmatrix} \right)\]This means that
\[p(s_1 \mid o_0, o_1) = N(\hat{s}_{1|1}, \Sigma_{1|1})\]where
\[\hat{s}_{1|1} = \hat{s}_{1|0} + \frac{c\Sigma_{1|0}}{c^2\Sigma_{1|0} + \sigma^2_v}(o_1 - c\hat{s}_{1|0})\\ \Sigma_{1|1} = \Sigma_{1|0} - \frac{(c\Sigma_{1|0})^2}{c^2\Sigma_{1|0} + \sigma_v^2}\]This proves the inductive base case.
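The base case can also be verified numerically. The helper below implements the bivariate conditioning rule quoted earlier and reproduces \(\hat{s}_{1|1}\) and \(\Sigma_{1|1}\) from the joint distribution of \((s_1, o_1)\) given \(o_0\), agreeing with the `kalman_filter` function defined above.

```python
def condition(mu, Sigma, y):
    """Mean and variance of x | y for (x, y) jointly Gaussian."""
    mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y - mu[1])
    var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
    return mean, var

g0 = c * Sigma0 / (c ** 2 * Sigma0 + sigma_v ** 2)   # gain b_0
s00 = s0_hat + g0 * (obs[0] - c * s0_hat)            # s_hat_{0|0}
S00 = Sigma0 - c * g0 * Sigma0                       # Sigma_{0|0}
s10, S10 = a * s00, a ** 2 * S00 + sigma_w ** 2      # predicted moments
joint = np.array([[S10, c * S10],
                  [c * S10, c ** 2 * S10 + sigma_v ** 2]])
s11, S11 = condition(np.array([s10, c * s10]), joint, obs[1])
assert np.isclose(s11, means[1]) and np.isclose(S11, variances[1])
```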
Now assume by induction that the Kalman filter equations hold for \(t=k-1\). Applying the exact same calculations as above, we obtain
\[p(s_k \mid \mathbf{o}_{0:k-1}) = N(\hat{s}_{k|k-1}, \Sigma_{k|k-1})\\ p(o_k \mid \mathbf{o}_{0:k-1}) = N(c\hat{s}_{k|k-1}, c^2\Sigma_{k|k-1} + \sigma_v^2)\]where
\[\hat{s}_{k|k-1} = a\hat{s}_{k-1|k-1}\]and
\[\Sigma_{k|k-1} = a^2\Sigma_{k-1|k-1} + \sigma_w^2.\]Further
\[\begin{bmatrix} s_k\\ o_k \end{bmatrix} \mid \mathbf{o}_{0:k-1} \sim N\left( \begin{bmatrix} \hat{s}_{k|k-1}\\ c\hat{s}_{k|k-1} \end{bmatrix}, \begin{bmatrix} \Sigma_{k|k-1} & c\Sigma_{k|k-1}\\ c\Sigma_{k|k-1} & c^2\Sigma_{k|k-1} + \sigma_v^2 \end{bmatrix} \right)\]which means that
\[p(s_k \mid \mathbf{o}_{0:k}) = N(\hat{s}_{k|k}, \Sigma_{k|k}),\]where
\[\hat{s}_{k|k} = \hat{s}_{k|k-1} + b_k(o_k-c\hat{s}_{k|k-1})\\ \Sigma_{k|k} = \Sigma_{k|k-1} - cb_k\Sigma_{k|k-1}\\ b_k = \frac{c\Sigma_{k|k-1}}{c^2\Sigma_{k|k-1} + \sigma_v^2}\]which are exactly the Kalman filter equations. This completes the induction step. Q.E.D.
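As a closing check that everything ties together, the grid-based evaluation of the Bayesian recursion and the Kalman filter produce the same filtered means up to discretization error, and both typically estimate the state far better than naively inverting the observation equation:

```python
kf_means, _ = kalman_filter(obs, a, c, sigma_w, sigma_v, s0_hat, Sigma0)
print(np.max(np.abs(np.array(grid_means) - kf_means)))    # small discretization error
print(np.sqrt(np.mean((obs / c - states) ** 2)),           # RMSE of naive inversion o_t / c
      np.sqrt(np.mean((kf_means - states) ** 2)))          # RMSE of the Kalman estimate
```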