<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Productive Waste of Time: Gradients and Altivec</title>
	<atom:link href="http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/</link>
	<description>On Mac OS X programming</description>
	<lastBuildDate>Sat, 07 Jan 2012 04:23:42 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: mr_noodle</title>
		<link>http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/comment-page-1/#comment-15</link>
		<dc:creator>mr_noodle</dc:creator>
		<pubDate>Fri, 06 Oct 2006 15:09:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/#comment-15</guid>
		<description>Thanks for the very informative comment. It seems I was right on both counts: my test was sub-optimal, but in the end it probably doesn&#039;t matter. I probably won&#039;t spend much more time on this (the VRSAVE suggestion is beyond my current knowledge of the subject), but it&#039;s always good when I can learn from my failed experiments.</description>
		<content:encoded><![CDATA[<p>Thanks for the very informative comment. It seems I was right on both counts: my test was sub-optimal, but in the end it probably doesn&#8217;t matter. I probably won&#8217;t spend much more time on this (the VRSAVE suggestion is beyond my current knowledge of the subject), but it&#8217;s always good when I can learn from my failed experiments.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: alexr</title>
		<link>http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/comment-page-1/#comment-13</link>
		<dc:creator>alexr</dc:creator>
		<pubDate>Fri, 06 Oct 2006 00:53:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.noodlesoft.com/blog/2006/10/05/productive-waste-of-time-gradients-and-altivec/#comment-13</guid>
		<description>The faster way to write this function would be either to know that &quot;in&quot; is aligned and do a single load, or to do two loads and combine them with vec_perm(vec1,vec2,vec_lvsl(0,in)) to get the misaligned data. Loading four floats individually, storing them to memory, and then reloading them as a vector is slower.

You need to use &quot;Show Assembly Code&quot; in Xcode to verify that the compiler is generating the instructions you expect. Extra memory operations, such as those caused by pointer aliasing, become evident this way.

As you&#039;ve noted, the real problem here is that the overhead of the callback mechanism swamps the work done in the function. If you can be 100% sure that CG calls this function back on the same thread as you, you could compile the function so that it does not update VRSAVE in its prolog/epilog, and then set VRSAVE to 0xFFFFFFFF (marking all vector registers as in use) before calling CG to run the function. That would save a few more instructions, but the overhead of the callbacks is probably still too high.</description>
		<content:encoded><![CDATA[<p>The faster way to write this function would be either to know that <code>in</code> is aligned and do a single load, or to do two loads and combine them with <code>vec_perm(vec1,vec2,vec_lvsl(0,in))</code> to get the misaligned data. Loading four floats individually, storing them to memory, and then reloading them as a vector is slower.</p>
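<p>To make the two-load idiom concrete, here is a minimal sketch in portable C that models what <code>vec_ld</code>, <code>vec_lvsl</code> and <code>vec_perm</code> do to the bytes. It is an illustration, not real AltiVec code: the <code>vec16</code> type and the <code>model_*</code> and <code>load_unaligned</code> helpers are hypothetical stand-ins so the logic can be compiled and checked on any machine. On a PowerPC with AltiVec, <code>load_unaligned(p)</code> corresponds to <code>vec_perm(vec_ld(0, p), vec_ld(15, p), vec_lvsl(0, p))</code>.</p>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector register (16 bytes, big-endian order). */
typedef struct { uint8_t b[16]; } vec16;

/* Models lvx / vec_ld: the low 4 address bits are ignored, so this
   always loads the aligned 16-byte block that contains p. */
static vec16 model_lvx(const uint8_t *p) {
    vec16 v;
    const uint8_t *aligned = p - ((uintptr_t)p & 15);
    memcpy(v.b, aligned, 16);
    return v;
}

/* Models lvsl / vec_lvsl: builds the permute control {sh, sh+1, ..., sh+15},
   where sh is the misalignment of p (address & 15). */
static vec16 model_lvsl(const uint8_t *p) {
    vec16 v;
    unsigned sh = (unsigned)((uintptr_t)p & 15);
    for (unsigned i = 0; i < 16; i++) v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* Models vperm / vec_perm: each control byte (low 5 bits) selects one byte
   from the 32-byte concatenation of a followed by b. */
static vec16 model_vperm(vec16 a, vec16 b, vec16 ctl) {
    vec16 r;
    for (unsigned i = 0; i < 16; i++) {
        unsigned idx = ctl.b[i] & 31;
        r.b[i] = idx < 16 ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Loads the 16 bytes starting at a possibly misaligned address p using
   two aligned loads and one permute. Like the real instructions, this reads
   the whole aligned blocks around p, so the buffer must cover them. */
static vec16 load_unaligned(const uint8_t *p) {
    vec16 lo = model_lvx(p);       /* vec_ld(0, p)  */
    vec16 hi = model_lvx(p + 15);  /* vec_ld(15, p) */
    return model_vperm(lo, hi, model_lvsl(p));
}
```

<p>Using an offset of 15 rather than 16 for the second load means that when <code>p</code> is already aligned, both loads hit the same block and the permute simply returns it, so the code never touches the following 16 bytes. This replaces the per-element load, store and reload sequence with two vector loads and one permute.</p>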
<p>You need to use &#8220;Show Assembly Code&#8221; in Xcode to verify that the compiler is generating the instructions you expect. Extra memory operations, such as those caused by pointer aliasing, become evident this way.</p>
<p>As you&#8217;ve noted, the real problem here is that the overhead of the callback mechanism swamps the work done in the function. If you can be 100% sure that CG calls this function back on the same thread as you, you could compile the function so that it does not update VRSAVE in its prolog/epilog, and then set VRSAVE to 0xFFFFFFFF (marking all vector registers as in use) before calling CG to run the function. That would save a few more instructions, but the overhead of the callbacks is probably still too high.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

