Archive: A question of speed.


6th July 2004 10:49 UTC

A question of speed.
When I build superscopes with individual coords for each point, I mostly use bnot() to do this.
Example: superscope has three points.
Coords are: x1, x2, x3 and y1, y2, y3
In perpoint textfield:
cnt=i*(n-1);
x=bnot(cnt)*x1+bnot(cnt-1)*x2+bnot(cnt-2)*x3;
y=bnot(cnt)*y1+bnot(cnt-1)*y2+bnot(cnt-2)*y3;

So I recently read that multiplications are not good for this, because they're too slow.
Is a multiplication by 1 or 0 as slow as, let's say, .693482*1.593782?
I mean 'whatever'*1 is 'whatever' and 'whatever'*0 is 0.
Is it faster to use IFs in this case?


6th July 2004 13:14 UTC

For only three points in a superscope, I don't think any optimization is really needed ;)

About multiplication: unless the compiler tries to optimize it, the time needed for the computation will be the same whatever values are multiplied.

For a real optimization, you had better use a series of if(), because AVS only executes the necessary code (in your example, it always executes 3 bnot() and 3 multiplications; using if() it executes only 1 to 3 bnot(), and that's all).
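
A minimal sketch of what that could look like for the three-point example. It assumes x1..x3 and y1..y3 are set somewhere else (per frame, for instance) and that if() really does evaluate only the branch it picks, as described above. The // comments need an AVS build that accepts comments; delete them otherwise.

```
// per point; x1..x3 and y1..y3 are assumed to be set elsewhere
cnt=i*(n-1);
// first point costs one bnot(), the other two cost two
x=if(bnot(cnt),x1,if(bnot(cnt-1),x2,x3));
y=if(bnot(cnt),y1,if(bnot(cnt-1),y2,y3));
```

No multiplications are left, and the third point is simply the "else" case, so it needs no test of its own.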


6th July 2004 16:24 UTC

why not use ifs?
i think they're the fastest. for less than... let's say 200 points you really don't have to worry whether you use ifs or multiplications
:p
just don't use divisions =)


6th July 2004 16:41 UTC

Is there no case differentiation in the multiplication algo, that makes multiplication with whole numbers faster?
I prefer the bnot() method, because it's easier to read the source.
Having tons of if(b,if(1-c,if(a,0,d),1),5); things sux.
I gave up using division too.
Though /3 is shorter than *.333. :)


6th July 2004 22:35 UTC

if you want easy to read just do this:
frame:
a=0;
point:
a=a+1;
x1=if(equal(a,1),point position on x axis,x1);
y1=if(equal(a,1),point position on y axis,y1);
x1=if(equal(a,2),point position on x axis,x1);
etc
x=x1;
y=y1;


7th July 2004 20:45 UTC

I've been experimenting along these lines & I've been using i & if statements.

For your scope, you have 3 points, so n=3. The i "points" are located at 0, .5 and 1

With this in mind, using an if statement shortcut, you can skip the equal or above conditions & just use values of i.

I'm having trouble explaining it, but in the end you'll come up with:

x=if(i,if(i-.5,x3,x2),x1);
y=if(i,if(i-.5,y3,y2),y1);

I've done larger statements like these where n=9;

x=if(i,if(i-1/8,if(i-2/8,if(i-3/8,if(i-4/8,if(i-5/8,if(i-6/8,if(i-7/8,x9,x8),x7),x6),x5),x4),x3),x2),x1)


7th July 2004 21:09 UTC

Looks interesting nemo, but get rid of all those needless divisions. Makes it much slower.

1/8 = 0.125
2/8 = 0.25
3/8 = 0.375
etc
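
With the fractions written out as decimals, the n=9 line from the post above would look roughly like this. It is only a sketch: x1..x9 are assumed to be set elsewhere, and it relies on i landing exactly on multiples of 1/8, which holds for n=9.

```
// per point, n=9; x1..x9 are assumed to be set elsewhere
x=if(i,if(i-.125,if(i-.25,if(i-.375,if(i-.5,if(i-.625,if(i-.75,if(i-.875,x9,x8),x7),x6),x5),x4),x3),x2),x1);
```

The y line is the same with y1..y9. The comment line needs an AVS build that accepts // comments; delete it otherwise.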


8th July 2004 12:46 UTC

Originally posted by your-dentist
Is there no case differentiation in the multiplication algo, that makes multiplication with whole numbers faster?
By "whole numbers" I think you mean integer values (like 1 or 3, but not 1.5 or 3.05). Integer computations are faster than floating point computations (I'm not so sure about current processors), but these are two types of data that are stored differently in memory and use different instructions.
Unfortunately, AVS only uses floating point numbers, so 0, 1 and 12546.4567862 are all the same ... ;)

8th July 2004 16:31 UTC

Originally posted by Raz
Looks interesting nemo, but get rid of all those needless divisions. Makes it much slower.

1/8 = 0.125
2/8 = 0.25
3/8 = 0.375
etc
Of course, but I wanted to point out the relationship between the fractions and n.

Although, are you sure this is correct? I remember somebody mentioning how 1/3 is faster than .3333. Maybe I'm wrong.

8th July 2004 21:50 UTC

Another way, using megabuf().
I doubt that it's faster, because it needs to access the cache.

All coords are put into the megabuf.
Like this:
assign(megabuf(0),x1);assign(megabuf(1),x2);
assign(megabuf(0+yoffset),y1);assign(megabuf(1+yoffset),y2);

Then used like this:
Per point:
cnt=i*(n-1);
x=megabuf(cnt);
y=megabuf(cnt+yoffset);

The var 'yoffset' can be chosen freely, though it must be at least as big as the number of individual points, so the x and y entries don't overlap.
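
Put together for the three-point scope, the pieces above might be arranged like this. It is a sketch only: yoffset=100 is an arbitrary pick, and x1..x3 / y1..y3 are assumed to be computed before the assigns run.

```
Init:
n=3; yoffset=100;

Per frame:
// x1..x3 and y1..y3 are assumed to be set before this point
assign(megabuf(0),x1);assign(megabuf(1),x2);assign(megabuf(2),x3);
assign(megabuf(0+yoffset),y1);assign(megabuf(1+yoffset),y2);assign(megabuf(2+yoffset),y3);

Per point:
cnt=i*(n-1);
x=megabuf(cnt);
y=megabuf(cnt+yoffset);
```

I have not checked how megabuf() rounds a fractional index, so for values of n where i*(n-1) is not exact, a per-point counter may be the safer index. The // comment needs an AVS build that accepts comments.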


9th July 2004 16:28 UTC

Originally posted by NemoOrange
Although, are you sure this is correct? I remember somebody mentioning how 1/3 is faster than .3333. Maybe I'm wrong.
Test setup: a single blank ssc (not even an n=1 or anything in it), code in the per-frame box, window size 256x256.

loop(2000,
loop(2000,
assign(a,a/3)
)
);

8.8 FPS

loop(2000,
loop(2000,
assign(a,a*.333)
)
);

11.1 FPS

yes, it does make a difference


[edit]
@yourdentist, not that new an idea, and apparently slower too.
It's really useful in TomyLobo's triangle ape, if it gets released you'll see my examples and way of using gmegabuf :p

9th July 2004 17:57 UTC

1/3 is more accurate than .3333, but not faster

*however*

unless you're going over a division 4,000,000 times, you're not going to lose 2.3 fps over it. and it would take running over a division 100,000 times to even lose .1 fps. it's really not that big of a deal, and in many cases using the division is a lot more convenient than the decimal.
so jaak, for the love of god, stop obsessing over divisions. there are more important optimisations.


your-dentist: avs is really quite shitty and not well optimised, and afaik x*1 and x*0 are no faster than x*1.23456789. the bottom line is, ifs are faster than mults, and the difference in speed is not quite as negligible as that between a division and a multiplication.
(so yeah, this is one of those important optimisations)


10th July 2004 09:13 UTC

Originally posted by Atero
so jaak, for the love of god, stop obsessing over divisions. there are more important optimisations.
yes, it doesn't matter *that* much, but in quite heavy scopes it will matter, besides it always feels good to write optimized code :p

13th July 2004 20:44 UTC

about the speed difference between using megabuf() and if/equal/bnot/whatever:
megabuf() is most probably faster in terms of memory access because it accesses memory only 4 times:
1. fetch the index (if it isnt already in the fpu from previous calculations) (prolly fld [e*x])

2. store the index to a temporary integer (fist [ebp+/-something])

3. put the index into a cpu register (mov e*x, [ebp+/-something])

4. get the value from the megabuf using that index and the base address of the megabuf (fld [e*x+e*x])

that would be a complexity of 4 which is a constant cost while using if()s, equal()s or bnot()s would require a linear amount of time which would be slower if you have more than 3 or 4 or so...
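
For a larger scope the table can be filled in a loop, so the per-point code stays at two lookups no matter how big n gets. A rough sketch, not tested: n=50, yoffset=1000 and the circle formula are placeholders for whatever coords you really want, and it assumes exec3() is available in your AVS version. A per-point counter p is used as the index instead of i*(n-1) to stay clear of rounding.

```
Init:
n=50; yoffset=1000;

Per frame:
// placeholder coords: points on a circle of radius 0.8
k=0;
loop(n, exec3(
  assign(megabuf(k), 0.8*sin(k*6.2831853/(n-1))),
  assign(megabuf(k+yoffset), 0.8*cos(k*6.2831853/(n-1))),
  assign(k,k+1)
));
p=0;

Per point:
x=megabuf(p);
y=megabuf(p+yoffset);
p=p+1;
```

With a chain of if()s, the point at position k would pass through up to k tests first, which is where the linear cost comes from. The // comment needs an AVS build that accepts comments.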


14th July 2004 15:42 UTC

Sounds plausible.
So for high amounts of points, using megabuf() is faster.
Whoa, you really seem to have become an expert at AVS code!


17th July 2004 15:40 UTC

a little ASM knowledge helps ;)


28th July 2004 20:01 UTC

btw, if you want to know whats fastest you should get the winamp sdk and look at the evallib source. the actual functions are in nseel-cfunc.c, the pro/epilogs are in the nseel-addfuncs.h file

its all pretty interesting but you need to be a real tech head to understand 90% of the rest of it cos its written in c and asm and in a code style almost totally unreadable to me (and therefore by logical extension everyone :P).