Stochastic LQR: Part 2 – Bellman’s Principle of Optimality (+ some life updates)


Wow, it’s been a month already… a lot has happened. Here is a partial snapshot:

  • Got my phone and wallet robbed in Chicago. A lot to tell, but I’m saving it for later.
  • Moved into a new apartment! I had many problems at my old place… well, that’s what happens when you’re in a rush to find one.
  • My committee has been formed. It was a relatively smooth experience for me!

Okay, so I am going to continue from where we left off last time.

In our last post, we defined our dynamics and the cost as follows:

\[x_{k+1} = Ax_{k} + Bu_{k} + w_{k}.\] \[J := \min_{u_{0}, \cdots, u_{N-1}} \mathbb{E}\Big[ \sum_{k = 0}^{N-1} ( x_{k}^{\top} Q x_{k} + u_{k}^{\top} R u_{k}) + x_{N}^{\top} Q_{f} x_{N} \Big].\]
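(To make this concrete, here is a minimal NumPy sketch that simulates these dynamics and estimates the expected cost by Monte Carlo. All the problem data below, \(A\), \(B\), \(Q\), \(R\), \(Q_{f}\), \(\Sigma_{w}\), and the horizon \(N\), are made-up illustrative values, and the noise is assumed zero-mean Gaussian.)

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # made-up dynamics matrix
B = np.array([[0.0], [0.1]])            # made-up input matrix
Q = np.eye(2)                           # state cost weight
R = np.array([[0.1]])                   # input cost weight
Qf = np.eye(2)                          # terminal cost weight
Sigma_w = 0.01 * np.eye(2)              # noise covariance (zero-mean Gaussian assumed)
N = 50                                  # horizon

def rollout_cost(x0, policy, n_samples=1000):
    """Monte Carlo estimate of E[ sum_k (x'Qx + u'Ru) + x_N' Qf x_N ]."""
    total = 0.0
    for _ in range(n_samples):
        x, cost = x0.copy(), 0.0
        for k in range(N):
            u = policy(x, k)
            cost += x @ Q @ x + u @ R @ u
            w = rng.multivariate_normal(np.zeros(2), Sigma_w)
            x = A @ x + B @ u + w  # x_{k+1} = A x_k + B u_k + w_k
        cost += x @ Qf @ x         # terminal cost
        total += cost
    return total / n_samples

# Example: expected cost of the (suboptimal) zero-input policy
print(rollout_cost(np.array([1.0, 0.0]), lambda x, k: np.zeros(1)))
```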

Similarly, we can also define a quantity called the value function, again as an expectation:

\[V_{k}(z) := \min_{u_{k}, \cdots, u_{N-1}} \mathbb{E}\Big[ \sum_{\ell = k}^{N-1} ( x_{\ell}^{\top} Q x_{\ell} + u_{\ell}^{\top} R u_{\ell}) + x_{N}^{\top} Q_{f} x_{N} \;\Big|\; x_{k} = z\Big],\]

which encodes the minimum expected cost-to-go starting from the state \(z\) at time \(k\).

By Bellman’s Principle of Optimality, the value function satisfies the recursion:

\[V_{k}(z) = \min_{u} \Big\{\underbrace{z^{\top}Qz + u^{\top}Ru}_{\text{cost at state } x_{k}=z} + \underbrace{\mathbb{E}_{w_{k}}\big[ V_{k+1}(\underbrace{Az + Bu + w_{k}}_{=x_{k+1}})\big]}_{\text{expected future cost}}\Big\}.\]

At this point, we sort of take a leap of faith (?) for a moment…

Assume an ansatz (German for an “educated guess”) of the form \(V_{k}(z) = z^{\top}P_{k}z + q_{k}\), with boundary conditions \(P_{N} = Q_{f}\) and \(q_{N} = 0\), so that \(V_{N}(z) = z^{\top}P_{N}z = z^{\top}Q_{f}z\).

Well, is this “ansatz” actually valid? To check, we can do an induction backwards in time: assume the form holds at \(k+1\) (it does at \(k+1 = N\)) and show it holds at \(k\).

(Hint: see below)
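To sketch the key step: substituting the ansatz for \(V_{k+1}\) into the recursion, and assuming (as in the setup) that \(w_{k}\) is zero-mean with covariance \(\Sigma_{w}\), the expected future cost expands as

\[\mathbb{E}_{w_{k}}\big[V_{k+1}(Az + Bu + w_{k})\big] = (Az + Bu)^{\top}P_{k+1}(Az + Bu) + \operatorname{Tr}(P_{k+1}\Sigma_{w}) + q_{k+1},\]

because the cross term \(2(Az+Bu)^{\top}P_{k+1}\mathbb{E}[w_{k}]\) vanishes and \(\mathbb{E}[w_{k}^{\top}P_{k+1}w_{k}] = \operatorname{Tr}(P_{k+1}\Sigma_{w})\).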

…To cut to the chase, collecting terms in \(z\) and \(u\), we get the following, which we can use to solve for the minimizer \(u^{*}\):

\[V_{k}(z) = \min_{u} \Big\{\begin{bmatrix}z \\ u \end{bmatrix}^{\top}\begin{bmatrix}A^{\top}P_{k+1}A + Q & A^{\top}P_{k+1}B \\ B^{\top}P_{k+1}A & B^{\top}P_{k+1}B + R\end{bmatrix} \begin{bmatrix}z \\ u\end{bmatrix}+ \underbrace{\operatorname{Tr}(P_{k+1} \Sigma_{w}) + q_{k+1}}_{\text{constant with respect to }u}\Big\}.\]

The objective is a convex quadratic in \(u\) (since \(B^{\top}P_{k+1}B + R \succ 0\)), so setting its gradient with respect to \(u\) to zero gives the minimizer:

\[u^{*}(z) = -\underbrace{(B^{\top}P_{k+1}B + R)^{-1}(B^{\top}P_{k+1}A)}_{=:\, K_{k}}z,\]

or simply, \(u^*(z) = -K_{k}z\). Notice how the gain \(K_{k}\) depends on the matrix variable \(P_{k+1}\).

The matrix \(P_{k}\) is in fact generated by the Riccati recursion, found by plugging \(u^*(z)\) back into \(V_{k}(z)\) (which also completes the induction and confirms the ansatz):

\[P_{k} = Q + A^{\top}P_{k+1}A - A^{\top}P_{k+1}B(B^{\top}P_{k+1}B + R)^{-1}B^{\top}P_{k+1}A, \qquad q_{k} = q_{k+1} + \operatorname{Tr}(P_{k+1}\Sigma_{w}).\]
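Here is a minimal sketch of the backward pass (the function is self-contained; the example call reuses the made-up matrices from the earlier snippet):

```python
import numpy as np

def riccati_backward(A, B, Q, R, Qf, N):
    """Backward Riccati pass: starts from P_N = Q_f and returns
    the cost-to-go matrix P_0 and the gains K_0, ..., K_{N-1}."""
    P = Qf
    gains = []
    for _ in range(N):
        # K_k = (B' P_{k+1} B + R)^{-1} B' P_{k+1} A
        K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
        # P_k = Q + A' P_{k+1} A - A' P_{k+1} B K_k
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    return P, gains[::-1]  # reorder so gains[k] == K_k

# Reusing the made-up A, B, Q, R, Qf, N from the earlier snippet:
P0, Ks = riccati_backward(A, B, Q, R, Qf, N)
```

Note that \(\Sigma_{w}\) never enters the gain computation: the noise only shifts the constant \(q_{k}\), so the optimal feedback is the same as in the deterministic LQR (the certainty-equivalence property).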

With the boundary conditions we assumed, and a steady-state assumption (letting the horizon grow, under which \(P_{k}\) converges given standard stabilizability and detectability conditions), we can find the steady-state solution of the Discrete-time Algebraic Riccati Equation (DARE).

Then, at steady state, the optimal policy becomes time-invariant:

\(u^{*}(z) = -Kz\),

where \(K = (B^{\top}PB + R)^{-1}(B^{\top}PA)\), and \(P\) is the steady-state DARE solution.
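If SciPy is available, the steady-state \(P\) (and hence \(K\)) can be computed directly; a sketch, again reusing the made-up matrices from the first snippet:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Reusing the made-up A, B, Q, R from the first snippet:
P = solve_discrete_are(A, B, Q, R)                 # steady-state DARE solution
K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)  # K = (B'PB + R)^{-1} B'PA
print(K)
```

Running the backward recursion above for a long horizon should converge to the same \(P\) and \(K\).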

We will stop here for today. In the next post, I am planning to discuss the duality principle and look at the Kalman “Filter” that mirrors the “Controller” we just derived.