## Iterated Expectations, Thrm 10.1

Hello,

When rereading the lecture notes about coordinate descent, I struggle to understand how to recurse the final step using iterated expectations: I feel like, to obtain the aforementioned result, I would need

\(f(x_{t+1}) - f^{*} \leq \mathbb{E}[f(x_{t+1}) - f^{*}]\)

which is only true for a constant random variable, and that is definitely not the case here. I therefore assume that there must be another way, but I can't find it (the reference paper does not mention it either).

Thank you in advance for any help!

Best

Yann Mentha

Hi Yann, sorry for the late response. Could you please add a reference to the particular result / derivation you are referring to?

Hello Thijs,

I was referring to the last equality in the proof of theorem 10.1 (p.148 of lecture notes):

\(\mathbb{E}[f(x_{t+1}) - f^*] \leq (1-\frac{\mu}{dL})[f(x_t)-f^*]\) (1)

In fact the step I was not seeing is

\(\mathbb{E}[f(x_{t+1}) - f^*] \leq (1-\frac{\mu}{dL})[f(x_t)-f^*] \Rightarrow \mathbb{E}[f(x_{t+1}) - f^*] \leq (1-\frac{\mu}{dL})\mathbb{E}[f(x_t)-f^*]\) (2)

In the recursion we then apply (2) t-1 times and (1) once to obtain the result, if I got it right? Correct me if I'm wrong :)

Best,

Yann

Hi Yann, indeed. You have to be careful about which random variable the expectations are taken over. The randomness in subsequent optimization steps is independent, so you can combine the expectations easily.
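To spell the recursion out (my own sketch of the missing step, using the tower property of conditional expectation): the per-step bound holds conditionally on \(x_t\), and taking the expectation over the earlier random coordinate choices gives

\(\mathbb{E}[f(x_{t+1}) - f^*] = \mathbb{E}\big[\,\mathbb{E}[f(x_{t+1}) - f^* \mid x_t]\,\big] \leq (1-\frac{\mu}{dL})\,\mathbb{E}[f(x_t) - f^*]\)

Unrolling this T times, and using that \(x_0\) is deterministic so \(\mathbb{E}[f(x_0) - f^*] = f(x_0) - f^*\), yields

\(\mathbb{E}[f(x_T) - f^*] \leq (1-\frac{\mu}{dL})^{T}\,(f(x_0) - f^*)\)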
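As an independent numerical sanity check (my own sketch, not from the lecture notes; the diagonal quadratic and all names below are made up for illustration), one can run random coordinate descent many times and verify that the Monte Carlo average of the gap stays below the \((1-\frac{\mu}{dL})^T\) bound:

```python
import numpy as np

# Sketch: random coordinate descent on f(x) = 0.5 * sum_i a_i * x_i^2,
# minimized at x* = 0 with f* = 0. For this diagonal quadratic the
# coordinate smoothness constant is L = max_i a_i and the strong
# convexity constant is mu = min_i a_i.
rng = np.random.default_rng(0)
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # per-coordinate curvatures
d, L, mu = len(a), a.max(), a.min()
T, n_runs = 200, 300  # horizon, and Monte Carlo runs to approximate E[.]

def f(x):
    return 0.5 * np.sum(a * x**2)

gap0 = f(np.ones(d))  # deterministic start x_0 = (1, ..., 1)
final_gaps = []
for _ in range(n_runs):
    x = np.ones(d)
    for _ in range(T):
        i = rng.integers(d)         # pick a coordinate uniformly at random
        x[i] -= (a[i] * x[i]) / L   # step 1/L along coordinate i's gradient
    final_gaps.append(f(x))

avg_gap = np.mean(final_gaps)          # approximates E[f(x_T) - f^*]
bound = (1 - mu / (d * L))**T * gap0   # rate from the theorem
print(avg_gap <= bound)
```

The averaged gap sits (comfortably) below the theoretical bound, since the bound only needs to hold for the worst case.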
