Sounds like, based on what you guys are saying, you can't get to a lot of the optimisation until you know what the data is going to be, i.e. you are going to make your code brittle at that point. And you guys are saying you can't easily future-proof your code, so you have to wait until there is some awareness of the data.

Also, how much does traditional academic optimisation come into it, i.e. using divide and conquer vs. linear, etc.?
Just attach + debug break it :) Kidding here, but it has worked for me in quite a lot of cases (especially for tools).
I've found that knowing when and how much memory you need through a frame is important. Often most data does not need to be persistent, which opens up memory sharing.
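To make the per-frame memory point concrete, here is a minimal sketch of a linear "frame arena", assuming a fixed byte budget per frame. The name `FrameArena` and its methods are hypothetical, not from any particular engine. Transient data is bump-allocated during the frame and dropped in one go at the end, so the same block of memory is reused frame after frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-frame linear allocator. Names and sizes are illustrative.
class FrameArena {
public:
    explicit FrameArena(std::size_t capacity)
        : buffer_(capacity), offset_(0), high_water_(0) {}

    // Bump-allocate `size` bytes aligned to `align` (must be a power of two).
    // Returns nullptr when the frame budget is exhausted.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        // Round the current address up to the next multiple of `align`.
        const std::uintptr_t aligned =
            (current + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;
        }
        offset_ = start + size;
        if (offset_ > high_water_) {
            high_water_ = offset_;
        }
        return buffer_.data() + start;
    }

    // Call once at the end of a frame: all transient data is released at once.
    void reset() { offset_ = 0; }

    // Bytes consumed so far in the current frame (including alignment padding).
    std::size_t used() const { return offset_; }

    // Largest per-frame usage seen so far, useful for sizing the budget.
    std::size_t high_water() const { return high_water_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
    std::size_t high_water_;
};
```

Tracking `high_water()` over a play session is one way to learn how much memory a frame really needs. Nothing allocated from the arena may be kept past `reset()`, and no destructors are run, so this only suits plain data that lives for a single frame.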
BTW I find the non-live nature of these hangouts annoying, i.e. whenever you start viewing, it starts from the beginning.
I have been teaching DOD at work, and what I have found is that while it is critical to learn about profiling, people just can't interpret the profiling data correctly if they don't understand the hardware architecture. For example, some people get fixated on instructions retired and don't see the LLC misses, which dominate performance. I would say understanding the hardware (at the very least the latencies) is critical to doing something useful with profiling data.
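A rough back-of-envelope sketch of why the latencies matter when reading those counters. Everything here is an assumption for illustration: the struct and function names are made up, and the IPC and miss-latency figures are placeholders you would replace with numbers for your own hardware. The model also ignores overlap from out-of-order execution and prefetching, so it overstates stalls; it only shows the order of magnitude.

```cpp
#include <cassert>
#include <cstdint>

// Counter values as they might be read from a profiler (hypothetical struct).
struct ProfileSample {
    std::uint64_t instructions_retired;
    std::uint64_t llc_misses;
};

// Assumed hardware characteristics; real values depend on the machine.
struct CostModel {
    double instructions_per_cycle;   // sustained IPC when not stalled
    double llc_miss_latency_cycles;  // cost of one miss that goes to DRAM
};

struct CycleEstimate {
    double compute_cycles;
    double stall_cycles;

    // Share of the estimated total spent waiting on memory, in [0, 1].
    double stall_fraction() const {
        const double total = compute_cycles + stall_cycles;
        return total > 0.0 ? stall_cycles / total : 0.0;
    }
};

// Naive serial model: every LLC miss pays the full latency, with no overlap.
CycleEstimate estimate(const ProfileSample& sample, const CostModel& model) {
    assert(model.instructions_per_cycle > 0.0);
    CycleEstimate result;
    result.compute_cycles =
        static_cast<double>(sample.instructions_retired) /
        model.instructions_per_cycle;
    result.stall_cycles =
        static_cast<double>(sample.llc_misses) * model.llc_miss_latency_cycles;
    return result;
}
```

With an assumed IPC of 2 and an assumed 200 cycles per miss, 1,000,000 instructions retired cost about 500,000 cycles, while just 10,000 LLC misses cost about 2,000,000 cycles. Under this crude model the misses are roughly 80% of the time, even though they are only 1% of the instruction count.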
