Stochastic LQR: Part 2 – Bellman’s Principle of Optimality (+ some life updates)
Wow, it’s been a month already… A lot has happened. Here is a partial snapshot:
- Got robbed of my phone and wallet in Chicago. A lot to tell, but I’m saving it for later.
- Moved into a new apartment! Had many problems at my old one…well, I was in a rush when finding it.
- My committee has been formed. It was a relatively smooth experience for me!
Okay, so I am going to continue from where we left off last time.
In our last post, we defined our dynamics and the cost as follows:
\[x_{k+1} = Ax_{k} + Bu_{k} + w_{k},\] \[J := \min_{u_{0}, \cdots, u_{N-1}} \mathbb{E}\left[ \sum_{k = 0}^{N-1} ( x_{k}^{\top} Q x_{k} + u_{k}^{\top} R u_{k}) + x_{N}^{\top} Q_{f} x_{N} \right].\]
Similarly, we can define a quantity called the value function, now with a conditional expectation:
\[V_{k}(z) := \min_{u_{k}, \cdots, u_{N-1}} \mathbb{E}\left[ \sum_{\ell = k}^{N-1} ( x_{\ell}^{\top} Q x_{\ell} + u_{\ell}^{\top} R u_{\ell}) + x_{N}^{\top} Q_{f} x_{N} \,\middle|\, x_{k} = z \right],\]
which encodes the minimum expected cost starting from the state \(z\) at time \(k\).
By Bellman’s Principle of Optimality, we can also express the value function recursively:
\[V_{k}(z) = \min_{u} \Big\{\underbrace{z^{\top}Qz + u^{\top}Ru}_{\text{Cost at state } x_{k}=z} + \underbrace{\mathbb{E}_{w_{k}}\big[ V_{k+1}(\underbrace{Az + Bu + w_{k}}_{=x_{k+1}}) \big]}_{\text{Expected future cost}}\Big\}.\]
At this point, we sort of take a leap of faith (?) for a moment…
Assume an ansatz (German for “educated guess”) of the form \(V_{k}(z) = z^{\top}P_{k}z + q_{k}\), with boundary conditions \(P_{N} = Q_{f}\) and \(q_{N} = 0\), so that \(V_{N}(z) = z^{\top}P_{N}z = z^{\top}Q_{f}z\).
Well, is this “ansatz” actually valid? To check, we can do an induction backwards in time: assume the form holds at step \(k+1\) (it does at \(k+1 = N\)) and show it holds at step \(k\).
(Hint: see below)
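The key step, assuming \(w_{k}\) is zero-mean with covariance \(\Sigma_{w}\) and independent of \(x_{k}\) and \(u_{k}\), is expanding the expected future cost under the ansatz:
\[\mathbb{E}_{w_{k}}\left[ V_{k+1}(Az + Bu + w_{k}) \right] = (Az + Bu)^{\top} P_{k+1} (Az + Bu) + \operatorname{Tr}(P_{k+1} \Sigma_{w}) + q_{k+1},\]
where the cross terms vanish since \(\mathbb{E}[w_{k}] = 0\), and the quadratic noise term gives \(\mathbb{E}[w_{k}^{\top} P_{k+1} w_{k}] = \operatorname{Tr}(P_{k+1} \Sigma_{w})\).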
…To cut to the chase, we get something like this, which we can use to solve for the minimizer \(u^{*}\):
\[V_{k}(z) = \min_{u} \Big\{\begin{bmatrix}z \\ u \end{bmatrix}^{\top}\begin{bmatrix}A^{\top}P_{k+1}A + Q & A^{\top}P_{k+1}B \\ B^{\top}P_{k+1}A & B^{\top}P_{k+1}B + R\end{bmatrix} \begin{bmatrix}z \\ u\end{bmatrix} + \underbrace{\operatorname{Tr}(P_{k+1} \Sigma_{w}) + q_{k+1}}_{\text{Constant with respect to }u}\Big\}.\]
The minimizer has the following form:
\[u^{*}(z) = -\underbrace{(B^{\top}P_{k+1}B + R)^{-1}(B^{\top}P_{k+1}A)}_{=:\, K_{k}}\,z,\]
or simply, \(u^{*}(z) = -K_{k}z\). Notice how the gain \(K_{k}\) depends on the matrix variable \(P_{k+1}\).
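To see where this comes from: the objective above is a convex quadratic in \(u\) (since \(B^{\top}P_{k+1}B + R \succ 0\) whenever \(R \succ 0\)), so setting its gradient with respect to \(u\) to zero gives
\[2(B^{\top}P_{k+1}B + R)\,u + 2B^{\top}P_{k+1}Az = 0,\]
which rearranges to the expression above.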
The matrix \(P_{k}\) is in fact the solution to a Riccati recursion, found by plugging \(u^{*}(z)\) back into \(V_{k}(z)\).
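Carrying out that substitution (a sketch, under the same zero-mean noise assumption as above) gives the backward recursion
\[P_{k} = Q + A^{\top}P_{k+1}A - A^{\top}P_{k+1}B(B^{\top}P_{k+1}B + R)^{-1}B^{\top}P_{k+1}A, \qquad q_{k} = q_{k+1} + \operatorname{Tr}(P_{k+1} \Sigma_{w}),\]
run backwards from \(P_{N} = Q_{f}\) and \(q_{N} = 0\). Note that the noise only enters through the constant \(q_{k}\): the gains \(K_{k}\) are the same as in the deterministic problem (this is the certainty-equivalence property).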
With the boundary conditions we assumed, and a steady-state assumption (i.e., \(P_{k}\) converging to a constant \(P\) as the recursion runs backwards), we can find the steady-state solution to the Discrete-time Algebraic Riccati Equation (DARE).
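Concretely, dropping the time indices in the recursion above gives the fixed-point equation
\[P = Q + A^{\top}PA - A^{\top}PB(B^{\top}PB + R)^{-1}B^{\top}PA.\]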
Then, at steady state, the optimal policy becomes time-invariant:
\(u^{*}(z) = -Kz\),
where \(K = (B^{\top}PB + R)^{-1}(B^{\top}PA)\).
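To make this concrete, here is a minimal numerical sketch (not from the original derivation; it assumes numpy, and the double-integrator matrices are made up for illustration) that solves the DARE by simply iterating the Riccati recursion to a fixed point, then forms the steady-state gain:

```python
import numpy as np

def solve_dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Iterate the backward Riccati recursion until P stops changing.

    Converges to the DARE solution under standard assumptions
    (e.g., (A, B) stabilizable); here we start from P_N = Q_f = Q.
    """
    P = Q.copy()
    for _ in range(max_iter):
        # K = (B'PB + R)^{-1} B'PA, via a linear solve instead of an inverse
        K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
        # One backward Riccati step: P <- Q + A'PA - A'PB K
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    else:
        raise RuntimeError("Riccati recursion did not converge")
    # Steady-state gain from the converged P
    K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    return P, K

# Hypothetical example: a double integrator discretized with step dt
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.eye(2)          # state cost
R = np.array([[0.1]])  # input cost

P, K = solve_dare_by_iteration(A, B, Q, R)
print("Steady-state P:\n", P)
print("Steady-state gain K:", K)  # time-invariant policy: u = -K z
```

As a sanity check, one could compare the result against scipy.linalg.solve_discrete_are, which solves the same equation directly.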
We will stop here for today. In the next post, I am planning to discuss the duality principle and look at the Kalman “Filter” that mirrors the “Controller” we just derived.