Archive for June, 2009

The quest for graphics performance: part I


While developing and optimizing Lightspark, the modern Flash player, I have been greatly expanding my knowledge and understanding of GPU internals. In the last few days I found a couple of nice tricks that boosted performance and, as nice side effects, saved CPU time and added features :-)

First of all, I’d like to introduce a bit of the Lightspark graphics architecture.

The project is designed from the ground up to make use of the features offered by modern graphics hardware, namely 3D acceleration and programmable shaders. The Flash file format encodes the geometries to be drawn as sets of edges. This representation is quite different from the one understood by GPUs, so the geometries are first triangulated (reduced to a set of triangles). This operation is done on the CPU and is quite computationally intensive, but the results are cached, so overall it does not hurt performance.
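As a rough illustration of the "triangulate once, then reuse" idea, here is a minimal Python sketch. It is not Lightspark's code: the function name `triangulate_fan` is made up for this example, and it uses simple fan triangulation, which only works for convex polygons, whereas real Flash shapes need a more general algorithm. The cache is just Python's standard `functools.lru_cache`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def triangulate_fan(polygon):
    """Split a convex polygon into triangles (hypothetical helper).

    `polygon` is a tuple of (x, y) vertices. Every triangle shares the
    first vertex, so this is only valid for convex shapes.
    """
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    first = polygon[0]
    return tuple(
        (first, polygon[i], polygon[i + 1])
        for i in range(1, len(polygon) - 1)
    )


square = ((0, 0), (1, 0), (1, 1), (0, 1))
triangles = triangulate_fan(square)        # computed on the first call
triangles_again = triangulate_fan(square)  # served from the cache
```

The second call does no geometric work at all, which is why an expensive triangulation step can still be cheap overall when the same shapes are drawn frame after frame.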

Moreover, Flash offers several different fill styles that can be applied to geometry, for example solid color and various kinds of gradients. Lightspark handles all those possibilities using a single fragment shader, a little piece of code that is invoked on every pixel to compute the desired color. Of course the shader has to know about the current fill style. This information, along with several other parameters, can be passed in different ways. More on this in the next installment.

There is one peculiar thing about the shader, though. Let’s look at some simple pseudocode:
gl_FragColor=solid_color()*selector[0]+linear_gradient()*selector[1]+circular_gradient()*selector[2]...;

Selector is a binary array: the only allowed values are zero and one, and exactly one entry is set to one. This means that the color of the current fragment (pixel) is computed for every possible fill style, and only afterward is the correct result selected. This may look like a waste of computing power, but it is actually more efficient than something like this:

if(selector[0])
       gl_FragColor=solid_color();
else if(selector[1])
       gl_FragColor=linear_gradient();
...

This counterintuitive fact comes from the nature of graphics hardware. GPUs are very different from CPUs and are capable of crunching tons of vector operations blindingly fast, but they are brought to their knees when they encounter branches in the code. This is actually quite common on deeply pipelined architectures that lack complex branch prediction circuitry: not only GPUs but also number-crunching devices and multimedia monsters such as the IBM Cell. Keep this in mind when working on such platforms.
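To make the two formulations concrete, here is a small CPU-side sketch in Python that checks they produce the same color. It says nothing about speed, since the performance argument is specific to GPU hardware. The three fill functions are placeholders invented for this example (the real shader functions are not shown in this post), and the parameter `t` is an assumed gradient coordinate in [0, 1].

```python
def solid_color(t):
    # Placeholder fill: constant opaque red, as an RGBA tuple.
    return (1.0, 0.0, 0.0, 1.0)


def linear_gradient(t):
    # Placeholder fill: black-to-white ramp along t.
    return (t, t, t, 1.0)


def circular_gradient(t):
    # Placeholder fill: white at the centre (t = 0), black at the rim (t = 1).
    return (1.0 - t, 1.0 - t, 1.0 - t, 1.0)


FILLS = (solid_color, linear_gradient, circular_gradient)


def shade_branchless(selector, t):
    """Evaluate every fill style, then keep the one whose selector entry is 1."""
    color = (0.0, 0.0, 0.0, 0.0)
    for fill, weight in zip(FILLS, selector):
        value = fill(t)
        color = tuple(c + v * weight for c, v in zip(color, value))
    return color


def shade_branching(selector, t):
    """Pick the fill style with an if/else chain, evaluating only one of them."""
    if selector[0]:
        return solid_color(t)
    elif selector[1]:
        return linear_gradient(t)
    else:
        return circular_gradient(t)


selectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
results = [
    (shade_branchless(s, 0.25), shade_branching(s, 0.25)) for s in selectors
]
```

Because exactly one selector entry is one, every other term is multiplied by zero and drops out of the sum, so both versions agree for each fill style.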


Lightspark second technical demo announcement


I’m currently finishing some last cleanups and enhancements before releasing a second technical demo of the Lightspark project. A lot of time has passed since the first demo, and the project is growing healthily. This release aims at rendering the following movie, selected from Adobe’s demos. The results may not look very impressive, but many things are going on under the hood.

The most interesting features in this release are:

  • GLSL-based rendering of fill styles (e.g. gradients)
  • LLVM-based ActionScript execution. Code is compiled just in time
  • A few tricks are also played to decrease the stack traffic typical of stack machines
  • First, although simple, framerate timing
  • Framework to handle ActionScript asynchronous events. Currently only the enterFrame event works, as the input subsystem is not yet in place. But stay tuned, as I have some nice plans for that.
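To give an idea of what "stack traffic" means, here is a toy Python sketch. It is only an assumption about the general kind of trick involved, not the actual Lightspark code: the opcodes (`push_local`, `push_const`, `add`, `mul`) and both function names are made up for this example. A naive interpreter pushes and pops at run time for every instruction. A compiler can instead simulate the stack at compile time, tracking operand names rather than values, and emit register-style instructions that never touch a run-time stack.

```python
def run_on_stack(code, local_vars):
    """Interpret the toy bytecode naively, counting every push and pop."""
    stack, traffic = [], 0
    for op, arg in code:
        if op == "push_local":
            stack.append(local_vars[arg])
            traffic += 1
        elif op == "push_const":
            stack.append(arg)
            traffic += 1
        elif op in ("add", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs if op == "add" else lhs * rhs)
            traffic += 3  # two pops and one push
        else:
            raise ValueError(f"unknown opcode {op!r}")
    result = stack.pop()
    return result, traffic + 1  # the final pop counts too


def lower_to_registers(code):
    """Simulate the stack at compile time and emit register-style code."""
    names, emitted, counter = [], [], 0
    for op, arg in code:
        if op == "push_local":
            names.append(arg)        # a local is referenced by name
        elif op == "push_const":
            names.append(repr(arg))  # a constant becomes an immediate operand
        elif op in ("add", "mul"):
            rhs = names.pop()
            lhs = names.pop()
            dest = f"t{counter}"
            counter += 1
            emitted.append((dest, op, lhs, rhs))
            names.append(dest)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return emitted, names.pop()


# (a + 2) * b in the toy bytecode
program = [
    ("push_local", "a"),
    ("push_const", 2),
    ("add", None),
    ("push_local", "b"),
    ("mul", None),
]
value, traffic = run_on_stack(program, {"a": 3, "b": 4})
emitted, result_name = lower_to_registers(program)
```

Here the interpreter performs ten stack operations to evaluate the expression, while the lowered form is just two three-operand instructions whose temporaries can live in registers.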

The code will be released in a couple more days, or at least I hope so :-)


ActionScript meets LLVM: part II

Just a quick update. The nice tricks I’m playing to build a fast ActionScript VM using LLVM are now the topic of my bachelor’s thesis, which will still take a handful of months to complete. If you are interested in the development, you can follow the git changelog here or contact me privately.

