<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Productive Waste of Time: Gradients and Altivec</title>
	<atom:link href="http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/</link>
	<description>On Mac OS X programming</description>
	<pubDate>Tue, 02 Dec 2008 12:24:07 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: mr_noodle</title>
		<link>http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/#comment-15</link>
		<dc:creator>mr_noodle</dc:creator>
		<pubDate>Fri, 06 Oct 2006 15:09:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/#comment-15</guid>
		<description>Thanks for the very informative comment. So it seems I was right on both counts. My test was sub-optimal but in the end it probably doesn't matter. I probably won't be spending much more time on this (the VRSAVE suggestion is beyond the scope of my knowledge on the subject at this point), but in the end, it's always good when I can learn from my failed experiments.</description>
		<content:encoded><![CDATA[<p>Thanks for the very informative comment. So it seems I was right on both counts. My test was sub-optimal but in the end it probably doesn&#8217;t matter. I probably won&#8217;t be spending much more time on this (the VRSAVE suggestion is beyond the scope of my knowledge on the subject at this point), but in the end, it&#8217;s always good when I can learn from my failed experiments.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: alexr</title>
		<link>http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/#comment-13</link>
		<dc:creator>alexr</dc:creator>
		<pubDate>Fri, 06 Oct 2006 00:53:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/#comment-13</guid>
		<description>The faster way to do this function would be to either know that in is aligned and do one load, or do two loads and a vec_perm(vec1,vec2,vec_lvsl(0,in)) to get the misaligned data. Loading four floats and storing them to memory, then reloading them as a vector, is slower.

You need to "Show Assembly Code" from Xcode to see that the compiler is generating the instructions you expect. Pointer aliasing and other extra memory operations become evident this way.

As you've noted, the real problem here is that the overhead of the callback mechanism swamps the work done in the function. If you can be 100% sure that CG calls this function back in the same thread as you, you could compile the function to not update VRSAVE in the prolog/epilog and then set VRSAVE to 0xFFFFFFFF before calling CG to run the function. That would save a few more instructions, but the overhead of the callbacks is probably still too high.</description>
		<content:encoded><![CDATA[<p>The faster way to do this function would be to either know that in is aligned and do one load, or do two loads and a vec_perm(vec1,vec2,vec_lvsl(0,in)) to get the misaligned data. Loading four floats and storing them to memory, then reloading them as a vector, is slower.</p>
<p>You need to &#8220;Show Assembly Code&#8221; from Xcode to see that the compiler is generating the instructions you expect. Pointer aliasing and other extra memory operations become evident this way.</p>
<p>As you&#8217;ve noted, the real problem here is that the overhead of the callback mechanism swamps the work done in the function. If you can be 100% sure that CG calls this function back in the same thread as you, you could compile the function to not update VRSAVE in the prolog/epilog and then set VRSAVE to 0xFFFFFFFF before calling CG to run the function. That would save a few more instructions, but the overhead of the callbacks is probably still too high.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
