I've only had a few minutes to check out the document, but it looks pretty interesting. Existing performance analysis tools, like Intel's VTune and AMD's CodeAnalyst, generally impose significant overhead when gathering performance information. They usually need code that runs in supervisor mode, for example, and they're meant for developers, not for use in production systems.
LWP lets applications gather their own performance data in real time with new user-mode instructions. This should make it possible for applications to adapt their execution behavior to maximize performance from moment to moment even while other software is running.
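To make the "adapt from moment to moment" idea a bit more concrete, here's a rough sketch of the general pattern in Python. It does not use LWP or any hardware counters; an ordinary software timer stands in for the performance data, and the names (`pick_fastest`, `variants`, `clock`) are made up for illustration. The point is only the shape of the loop: measure yourself cheaply from user mode, then choose the code path that's doing best right now.

```python
import time


def pick_fastest(variants, workload, clock=time.perf_counter):
    """Time each candidate once on the workload; return the quickest one's name.

    `variants` maps a name to a callable taking the workload.
    `clock` is a stand-in for whatever measurement source is available
    (here a software timer, NOT LWP hardware counters).
    """
    timings = {}
    for name, fn in variants.items():
        start = clock()
        fn(workload)
        timings[name] = clock() - start
    # The variant with the smallest elapsed time wins.
    return min(timings, key=timings.get)


def sum_with_loop(values):
    total = 0
    for v in values:
        total += v
    return total


def sum_with_builtin(values):
    return sum(values)


if __name__ == "__main__":
    data = list(range(100_000))
    choice = pick_fastest(
        {"loop": sum_with_loop, "builtin": sum_with_builtin}, data
    )
    print("Using variant:", choice)
```

A real program would re-run the measurement periodically, since the best choice can change as other software competes for the machine. That's where low measurement overhead matters most.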
I'll have to wait to see what software developers say about this proposal, but I suspect it'll be well received by the developer community. We'll also have to see if Intel accepts the proposal as-is, rejects it outright, or suggests some kind of alternative.
AMD scored big points by defining a practical 64-bit x86 instruction set before Intel could, which shut down Intel's parallel effort before it was ever announced. (Rumors persist that the "Prescott" version of Intel's Pentium 4 was initially designed with Intel-proprietary 64-bit extensions that gave way to an AMD64-compatible implementation later.)
LWP is a small thing by comparison, but AMD could regain a bit of that AMD64 luster if this proposal is accepted.