Given that most visually impaired users probably already have some
screen-reading technology, perhaps the html->audio service would be more
useful if scoped as text augmentation to aid the screen reader?
Just thinking from the point of view of end-user benefits, there seem
to be two broad classes of service - those that could in principle be
performed by client software, and those that can't.
Services that can be local:
Services such as transcoding/translation, ad-stripping, and content
resizing or refactoring can all potentially be performed on the client.
Reasons why users may prefer to use proxy-based services include:
Processor power (especially for edge devices - PDAs and the like).
Cost (per use service charge rather than paying for a rarely used
Mongolian<->English translation module).
Remote administration (many people would rather phone their ISP than try
to configure text-to-speech software themselves).
Security
Most of these points have been raised elsewhere, but I don't recall
discussion of the benefit of remote administration. For example, many
people have anti-virus software but fail to keep it up to date. Remote
administration of upgrades to anti-virus, transcoding and translation
software would be a compelling benefit for many users (and hence a
product differentiator for ISPs providing the service).
Proxy-only services:
Services that can in principle only be performed on the proxy include
performance-related services such as caching and content-stripping for
low-bandwidth connections, and services that provide users with
information not otherwise available. For example, a search engine over
locally cached pages could be enhanced by changing the search field on
popular search engines into a drop-down box displaying recently used
keywords (with obvious privacy caveats).
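
To make the search-field idea a bit more concrete, here is a rough sketch
of what the proxy-side rewrite might look like. Everything named here is
an assumption for illustration only: the function add_recent_keywords,
the field name "q", and the idea that the proxy already holds a list of
recently used keywords. A regular expression stands in for a real HTML
parser, so this would not survive arbitrary markup.

```python
import html
import re

# Assumption: the search box is a text <input> whose name attribute is "q".
_SEARCH_INPUT = re.compile(r'<input\b[^>]*\bname="q"[^>]*>', re.IGNORECASE)


def add_recent_keywords(page, recent_keywords):
    """Replace the first matching search field with a drop-down of keywords.

    Hypothetical helper: `page` is the HTML text passing through the proxy,
    `recent_keywords` is a list of strings the proxy has recorded.
    """
    # Escape each keyword so it cannot inject markup into the page.
    options = "".join(
        '<option value="{0}">{0}</option>'.format(html.escape(k, quote=True))
        for k in recent_keywords
    )
    select = '<select name="q">' + options + "</select>"
    # A callable replacement avoids backslash processing in the new text.
    return _SEARCH_INPUT.sub(lambda match: select, page, count=1)


if __name__ == "__main__":
    form = '<form action="/search"><input type="text" name="q"></form>'
    print(add_recent_keywords(form, ["proxy cache", "transcoding"]))
```

Note that swapping the text field for a plain drop-down removes free-text
entry, so a real service would presumably keep both. The keyword list is
also exactly where the privacy caveats bite, since it has to be stored
per user on the proxy.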