Given that performance is so close to the same, my only question now is whether there's a measurable difference in power usage between shift and multiply.
Most processors decode each instruction into micro-ops, and guess what that decode phase can do? It can check "is this operand a power of 2? Great, emit a shift instead."
Only on super simple processors (think embedded systems) would this ever actually make a difference. On anything else, this is an optimization that your processor is going to do for you automatically.
Presumably the GP was talking about multiplying by an immediate value embedded in the instruction. That would be possible, but I doubt there are any current processors that do so.
You can do it in decode if the relevant operand is encoded as an immediate. I'm not aware of any processors that do any of the strength reductions discussed here in that way.