Archive for category Insane Projects

RFC: is OpenGL not the Right Thing (TM)?

It’s late at night and I’m back in Pisa for the last year of my master’s degree. And, as often happens, a weird idea struck me. What if OpenGL is not the right thing for Lightspark? No, I’m not talking about dropping hardware-accelerated rendering, as that’s surely the right way to go, but using OpenGL really looks unnatural. In the design of the advanced graphics engine, OpenGL is basically used only to upload images rendered with cairo to VRAM, and to blit and composite all the rendered chunks together... do we really need all of OpenGL’s complexity to do this?

Well... OpenGL is basically the only thing we have, the only way to talk to the graphics hardware. But here comes the Gallium project! Since Gallium splits the API from the driver, it could be possible, in theory, to write a specialized Gallium state tracker that does only the work we need... and maybe does it better.

I’m writing here first because I’m not (yet) experienced enough with the Gallium platform to know if the idea is sane, and second because I somehow feel the same approach could be useful for other apps: for example, Lightspark and compositing window managers have similar needs. So I’d like to have some feedback about writing a small API and a Gallium state tracker to do:

  • DMA-accelerated transfers of rendered image data
  • Blitting and compositing of such data on screen
  • Notifying the application when asynchronous work (such as DMA transfers) has ended (BTW: what’s the right way of doing this in OpenGL?)
  • Enqueuing to-be-uploaded-to-VRAM images and having them sequentially transferred
  • Applying simple but programmable (shader-like) transformations to pixel data

Big disclaimer: I’ve not yet started working on this idea. I’ve not even seriously thought about its feasibility. I’d just like to have some feedback on this.
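To make the wish list a bit more concrete, here is a minimal C++ sketch of what the application-facing side of such an API might look like. Everything in it (`UploadQueue`, `enqueue`, the callback signature) is hypothetical: there is no Gallium code here, the "VRAM" is ordinary memory, and a `memcpy` on a worker thread stands in for the asynchronous DMA transfer. It only illustrates the contract of the queueing and completion-notification bullets: images are transferred sequentially, in order, and the application is told when each one is done.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical upload queue: images are transferred one after another by a
// worker thread, and a callback reports the completion of each transfer.
class UploadQueue {
public:
    using Callback = std::function<void(std::size_t id)>;

    UploadQueue() : worker_([this] { run(); }) {}

    // Drains all pending transfers, then stops the worker.
    ~UploadQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // `dst` (the stand-in for VRAM) must stay valid until `onDone` runs.
    void enqueue(std::size_t id, std::vector<std::uint8_t> image, std::uint8_t* dst,
                 Callback onDone) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push_back(Job{id, std::move(image), dst, std::move(onDone)});
        }
        cv_.notify_one();
    }

private:
    struct Job {
        std::size_t id;
        std::vector<std::uint8_t> data;
        std::uint8_t* dst;
        Callback onDone;
    };

    void run() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            std::memcpy(job.dst, job.data.data(), job.data.size());  // stand-in for DMA
            job.onDone(job.id);  // completion notification, on the worker thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Job> jobs_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

Because the callbacks run on the worker thread, a real application would probably forward them to its own event loop rather than touching rendering state directly.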


Lightspark 0.4.0 released

Just a quick update. I’ve released version 0.4.0 of Lightspark, a free Flash player implementation. This release focused on improving stability, so all the crashes found by many testers should be fixed now. Thanks a lot for testing: several issues were related to particular graphics hardware and I would never have found them without your collaboration. Please keep testing and reporting any issues.

Now the focus shifts to YouTube support, which was lost after one of the latest updates to YouTube’s infrastructure. And believe me, we’re not far! I’m attaching a screenshot of the current status (in Git master) as proof. Full support will be delivered with release 0.5.0.


The quest for graphics performance: part II

I’d like to talk a bit about the architecture I’ve been using to efficiently render the video stream in Lightspark. As often happens, the key in high-performance computing is using the right tool for each job. First of all, video decoding and rendering are asynchronous and executed by different threads.

Decoding itself is done by the well-known FFmpeg; no special tricks are played here. So the starting point of the optimized fast path is a decoded frame data structure. This data structure is short-lived and is overwritten by the next decoded frame, so it must be copied to a more stable buffer. The decoding thread maintains a short array of decoded frames ready to be rendered, to account for variance in the decoding delay. The decoded frame is in YUV420 format: the color (chroma) planes have half the horizontal and half the vertical resolution of the luminance plane. FFmpeg returns the data as 3 distinct buffers, one for each of the Y, U and V channels, so we actually save 3 buffers per frame. This copy is necessary, and it’s the only one that will be done on the data.
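The hand-off between the decoding and rendering threads can be sketched as a small bounded queue of copied frames. This is a minimal C++ sketch, not Lightspark’s code: `YuvFrame`, `copyYuv420` and `FrameQueue` are hypothetical names, the plane pointers and strides mimic the layout of FFmpeg’s `AVFrame::data`/`linesize` without depending on FFmpeg, and positive strides are assumed. Note how each chroma plane is halved in both dimensions.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <mutex>
#include <vector>

// One decoded YUV420 frame, copied out of the decoder's short-lived buffers.
struct YuvFrame {
    std::size_t width = 0, height = 0;
    std::vector<std::uint8_t> y, u, v;  // tightly packed planes
};

// The single copy of the pipeline: three planes, row by row (strides assumed positive).
YuvFrame copyYuv420(const std::uint8_t* const src[3], const int stride[3],
                    std::size_t width, std::size_t height) {
    YuvFrame f;
    f.width = width;
    f.height = height;
    const std::size_t cw = (width + 1) / 2, ch = (height + 1) / 2;  // chroma: half in each dimension
    auto copyPlane = [](std::vector<std::uint8_t>& dst, const std::uint8_t* s, int srcStride,
                        std::size_t w, std::size_t h) {
        dst.resize(w * h);
        for (std::size_t row = 0; row < h; ++row)
            std::memcpy(dst.data() + row * w, s + row * static_cast<std::size_t>(srcStride), w);
    };
    copyPlane(f.y, src[0], stride[0], width, height);
    copyPlane(f.u, src[1], stride[1], cw, ch);
    copyPlane(f.v, src[2], stride[2], cw, ch);
    return f;
}

// Bounded queue between the decoding thread (producer) and the rendering
// thread (consumer); its capacity absorbs jitter in decoding time.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(YuvFrame frame) {  // blocks while the queue is full
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return frames_.size() < capacity_; });
        frames_.push_back(std::move(frame));
        notEmpty_.notify_one();
    }

    YuvFrame pop() {  // blocks while the queue is empty
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return !frames_.empty(); });
        YuvFrame f = std::move(frames_.front());
        frames_.pop_front();
        notFull_.notify_one();
        return f;
    }

private:
    std::size_t capacity_;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<YuvFrame> frames_;
};
```

For a 4×4 frame this stores 16 luma bytes and 4 bytes for each chroma plane, i.e. 1.5 bytes per pixel overall.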

Rendering is done using a textured quad, and texture data is loaded using OpenGL pixel buffer objects (PBOs). PBOs are memory buffers managed by the GL, and it’s possible to load texture data from them. Unfortunately, they must be explicitly mapped into the client address space to be accessed, and unmapped when the update is done. The advantage is that data transfers between PBOs and video or texture memory are done by the GL using asynchronous DMA. Using 2 PBOs it’s possible to guarantee a continuous stream of data to video memory: while one PBO is being copied to texture memory by DMA, new data is being computed and written to the other one by the CPU. This usage pattern is called streaming textures.
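The two-PBO schedule can be modelled without a GL context. In the minimal C++ sketch below, plain vectors stand in for the two PBOs and the texture, and the comments name the GL calls (`glTexSubImage2D` with a bound `GL_PIXEL_UNPACK_BUFFER`, `glMapBuffer`, `glUnmapBuffer`) that a real implementation would issue at each step. The class name and its structure are assumptions for illustration, not Lightspark’s code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// GL-free model of the "streaming textures" pattern with two PBOs.
// Each uploadFrame() call mirrors what real code would do, in this order:
//   1. bind pbo[uploadIdx] as GL_PIXEL_UNPACK_BUFFER and call glTexSubImage2D
//      with a 0 offset: the GL starts a DMA copy of that PBO into the texture;
//   2. bind pbo[fillIdx], glMapBuffer it, write the new frame with the CPU,
//      then glUnmapBuffer it.
// Here the "DMA" is a vector copy and "mapping" is direct access.
class StreamingTexture {
public:
    explicit StreamingTexture(std::size_t bytes) : texture_(bytes) {
        pbo_[0].resize(bytes);
        pbo_[1].resize(bytes);
    }

    void uploadFrame(const std::vector<std::uint8_t>& frame) {
        const std::size_t uploadIdx = frame_ % 2;      // filled during the previous call
        const std::size_t fillIdx = (frame_ + 1) % 2;  // the other PBO, free for the CPU
        texture_ = pbo_[uploadIdx];                    // step 1: "DMA" PBO -> texture
        pbo_[fillIdx] = frame;                         // step 2: CPU writes the next frame
        ++frame_;
    }

    const std::vector<std::uint8_t>& texture() const { return texture_; }

private:
    std::array<std::vector<std::uint8_t>, 2> pbo_;
    std::vector<std::uint8_t> texture_;
    std::size_t frame_ = 0;
};
```

The model makes the trade-off of the pattern visible: the texture always shows the frame written during the previous call, so one frame of latency is accepted in exchange for overlapping the CPU work with the DMA transfer.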

In this case, such data is the next frame, taken from the decoded frames buffer. To be uploaded as a single texture, the data must be in packed (interleaved) form rather than planar form, so we must pack a 1-buffer-per-channel frame into a single buffer. This can be done in a zero-copy fashion using instructions provided by the SSE2 extension. Data is loaded in 128-bit chunks from each of the Y, U and V channels, then correctly packed and padded using register-only operations. Results are written back using non-temporal moves. These stores avoid polluting the cache, and the processor is free to postpone the actual commitment of data to memory, for example to exploit burst transfers on the bus. If we ever want to be sure that the changes have been committed to memory, we have to issue the sfence instruction. For more information see the Intel reference manuals on movapd, movntpd, sfence and punpcklbw.

The result is a single buffer in YUV0 format. The padding byte is added to increase texture transfer efficiency, as video cards internally work with 32-bit data anyway. The destination buffer is one of the PBOs so, at the end of the conversion routine, data will be transferred to video memory using DMA.
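A minimal sketch of such a packing routine with SSE2 intrinsics follows. It is an illustration under assumptions, not the Lightspark routine: it uses the integer forms of the instructions (`_mm_unpacklo_epi8` is punpcklbw, `_mm_stream_si128` is movntdq, `_mm_sfence` is sfence), loads 16 luma bytes and 8 bytes of each chroma plane per iteration, duplicates each chroma sample for two neighbouring pixels, and writes 16 YUV0 pixels with non-temporal stores. Streaming stores need a 16-byte aligned destination, so the sketch falls back to a scalar loop for unaligned rows, row tails and non-SSE2 builds; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (also provides _mm_sfence)
#endif

// Scalar packing of pixels [begin, width) of one row into YUV0 (4 bytes per pixel).
static void packRowScalar(const std::uint8_t* y, const std::uint8_t* u, const std::uint8_t* v,
                          std::uint8_t* out, std::size_t begin, std::size_t width) {
    for (std::size_t x = begin; x < width; ++x) {
        out[4 * x + 0] = y[x];
        out[4 * x + 1] = u[x / 2];  // one chroma sample covers two horizontal pixels
        out[4 * x + 2] = v[x / 2];
        out[4 * x + 3] = 0;         // padding byte
    }
}

// Packs a planar YUV420 frame into a YUV0 buffer of width * height * 4 bytes.
void packYuv420ToYuv0(const std::uint8_t* yPlane, const std::uint8_t* uPlane,
                      const std::uint8_t* vPlane, std::size_t width, std::size_t height,
                      std::size_t yStride, std::size_t uvStride, std::uint8_t* dst) {
    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* y = yPlane + row * yStride;
        const std::uint8_t* u = uPlane + (row / 2) * uvStride;  // one chroma row per two luma rows
        const std::uint8_t* v = vPlane + (row / 2) * uvStride;
        std::uint8_t* out = dst + row * width * 4;
        std::size_t x = 0;
#if defined(__SSE2__)
        if (reinterpret_cast<std::uintptr_t>(out) % 16 == 0) {
            const __m128i zero = _mm_setzero_si128();
            for (; x + 16 <= width; x += 16) {
                __m128i Y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y + x));
                __m128i U = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(u + x / 2));
                __m128i V = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(v + x / 2));
                U = _mm_unpacklo_epi8(U, U);                // u0 u0 u1 u1 ... u7 u7
                V = _mm_unpacklo_epi8(V, V);                // v0 v0 v1 v1 ... v7 v7
                __m128i yuLo = _mm_unpacklo_epi8(Y, U);     // y0 u0 y1 u0 ... (pixels 0-7)
                __m128i yuHi = _mm_unpackhi_epi8(Y, U);     // pixels 8-15
                __m128i v0Lo = _mm_unpacklo_epi8(V, zero);  // v0 0 v0 0 ... (pixels 0-7)
                __m128i v0Hi = _mm_unpackhi_epi8(V, zero);  // pixels 8-15
                __m128i* o = reinterpret_cast<__m128i*>(out + 4 * x);
                // Interleaving 16-bit pairs gives "y u v 0" for 4 pixels per store.
                _mm_stream_si128(o + 0, _mm_unpacklo_epi16(yuLo, v0Lo));
                _mm_stream_si128(o + 1, _mm_unpackhi_epi16(yuLo, v0Lo));
                _mm_stream_si128(o + 2, _mm_unpacklo_epi16(yuHi, v0Hi));
                _mm_stream_si128(o + 3, _mm_unpackhi_epi16(yuHi, v0Hi));
            }
        }
#endif
        packRowScalar(y, u, v, out, x, width);  // row tail, or the whole row without SSE2
    }
#if defined(__SSE2__)
    _mm_sfence();  // order the non-temporal stores before the buffer is handed to the GL
#endif
}
```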

Using the streaming texture technique and SSE2 data packing, we managed to efficiently move the frame data to texture memory, but it’s still in YUV format. Conversion to the RGB color space is basically a linear algebra operation (an offset followed by a 3×3 matrix multiplication per pixel), so it’s ideal to offload this computation to a pixel shader.
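The per-pixel work such a shader would do is small enough to show as a CPU reference in C++. This sketch rests on an assumption the post does not state: it uses the BT.601 limited-range coefficients (Y in [16, 235], chroma centred on 128), which are common for standard-definition video; other streams may need a different matrix. The function name is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// CPU reference for the YUV -> RGB step: subtract the offsets, multiply by a
// 3x3 matrix, round and clamp. Assumes BT.601 limited-range input.
std::array<std::uint8_t, 3> yuvToRgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    // Rows produce R, G, B; columns multiply (Y - 16), (U - 128), (V - 128).
    static const double m[3][3] = {
        {1.164,  0.000,  1.596},
        {1.164, -0.392, -0.813},
        {1.164,  2.017,  0.000},
    };
    const double in[3] = {y - 16.0, u - 128.0, v - 128.0};
    std::array<std::uint8_t, 3> rgb{};
    for (int i = 0; i < 3; ++i) {
        const double s = m[i][0] * in[0] + m[i][1] * in[1] + m[i][2] * in[2];
        const long r = std::lround(s);
        rgb[i] = static_cast<std::uint8_t>(std::max(0L, std::min(255L, r)));  // clamp to [0, 255]
    }
    return rgb;
}
```

In a shader the same computation is a single matrix-vector product on normalized values, which is why it maps so naturally to the GPU.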


Lightspark gets video streaming

Just a brief piece of news. It’s been a long way, and today I’m very proud to announce video streaming support for Lightspark, the efficient open source Flash player. Moreover, performance looks very promising. I’m not going to publish any results right now, as I’d like to do some testing first. Anyway, Lightspark seems to be outperforming Adobe’s player by a good margin, at least on Linux.

In the next post I’ll talk a bit about some of the performance tricks that made these results possible.


Lightspark’s news

Lightspark’s progress has never been so good. The latest achievement was to correctly load, execute and partially render the YouTube player. As you may have seen, YouTube has recently switched to Flash 10 and ActionScript 3.0 to serve some HD content, while keeping the old AS2-based player for lower quality videos. The old player is supported by Gnash but, until now, there were no open source alternatives to play the newer, high-definition content. As Lightspark’s AS3 engine matures, that gap is almost closed. Stay tuned, as I’m planning to release a new technology demo very soon.

UPDATE: Demo tarball released on SourceForge


Extreme FLEXibility

Although there has been no official news about Lightspark for several months, I’ve been doing a great deal of work under the hood. As my bachelor’s thesis, I’ve mostly completed and thoroughly tested my LLVM-based ActionScript 3.0 JIT engine and, during the last few days, I’ve been polishing the virtual machine a bit. I’m proud to announce that, in a few days, a new technical demo of Lightspark will be released, and this time we’re not talking about basic examples. Lightspark is now mature enough to run a simple application based on Flex.

Flex is a rich open source framework written in ActionScript and developed mainly by Adobe. Even if right now the test application features only a progress bar and a square, there is a lot of stuff being done by the framework under the hood.

If the framework works, it means that the engine is now stable enough to move from pre-alpha to alpha status. The design is also now satisfying enough for me to allow other people to join the project and work on subsystems without knowing the internal details of everything. As an added bonus, preliminary support for the Windows platform will be included in the release.

The screenshot above is the result of the execution of my test application. For the curious, the Flash file is generated with the mxmlc compiler from the following source file:

<?xml version="1.0" encoding="utf-8"?>
<mx:Application
    xmlns:mx="http://www.adobe.com/2006/mxml"
    horizontalAlign="center" verticalAlign="middle">
    <mx:VBox x="0" y="0" width="201" height="200" backgroundColor="0x0080C0" alpha="0.8"/>
</mx:Application>


Lightspark second technical demo announcement


I’m currently finishing some last cleanups and enhancements before releasing a second technical demo of the Lightspark Project. Much time has passed since the first demo, and the project is growing healthily. This release aims at rendering the following movie, selected from Adobe’s demos. The results may not be very impressive, but many things are going on under the hood.

The most interesting features in this release are:

  • GLSL-based rendering of fill styles (e.g. gradients)
  • LLVM-based ActionScript execution: code is compiled just in time
  • A few tricks are also played to decrease the stack traffic typical of stack machines
  • First (although simple) framerate timing
  • A framework to handle ActionScript asynchronous events. Currently only the enterFrame event works, as the input subsystem is not yet in place. But stay tuned, as I have some nice plans for that.

The code will be released in a couple more days, or at least I hope so :-)


ActionScript meets LLVM: part II

Just a quick update. The nice tricks I’m playing to build a fast ActionScript VM using LLVM are now the topic of my bachelor’s thesis, which will still take a handful of months to complete. If you are interested in the development, you may follow the git changelog here or contact me privately.


ActionScript meets LLVM: part I

One of the major challenges in the design of Lightspark is the ActionScript execution engine. Most of the more recent Flash content is almost completely built out of ActionScript, which with version 3.0 matured enough to become a foundational block of the current, and probably future, web. The same technology is also going to become widespread offline if the Adobe AIR platform succeeds as a cross-platform application framework.

But what is ActionScript? Basically, it is an almost ECMAScript-compliant language; the specification covers the language itself, a huge library of components and the bytecode format that is used to deliver code to the clients, usually as part of a SWF (Flash) file.

The bytecode models a stack machine, as most of the arguments are passed on the stack and not as operands in the code. This operational description, although quite dense, requires a lot of stack traffic, even for simple computations. It should be noted that modern x86/amd64 processors employ dedicated stack engines to speed up such traffic, but this is highly architecture-dependent and not guaranteed.

LLVM (which stands for Low-Level Virtual Machine) is, on the other hand, based on an intermediate language in SSA (static single assignment) form. This means that each symbol can be assigned only once. This form is extremely useful when optimizing code. LLVM offers a nice interface to a bunch of features, most notably sophisticated code optimization and just-in-time compilation to native assembly.

The challenge is: how can we exploit LLVM’s power to build a fast ActionScript engine?

The answer is, as usual, a matter of compromises. Quite a lot of common usage patterns of the stack machine can be heavily optimized with limited work; for example, most of the data pushed on the stack is going to be used right away! More details on this in the next installment...
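One general way to exploit that pattern is to simulate the operand stack at compile time: the stack holds SSA value numbers instead of runtime values, so every push becomes a named value and every pop just picks up a name, which is the kind of code an SSA builder such as LLVM’s `IRBuilder` expects. The C++ sketch below shows the idea on a hypothetical three-opcode toy bytecode (loosely inspired by AVM2’s push/add/multiply style, not its actual encoding). It is not Lightspark’s translator, and it ignores branches, where the compile-time stacks of different paths would have to be merged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical toy stack bytecode.
enum class Op { PushInt, Add, Mul };
struct Insn { Op op; int imm = 0; };

// Three-address instruction in SSA form: each value %dst is assigned exactly once.
struct SsaInsn { Op op; int imm; std::size_t dst, lhs, rhs; };

// Translates by simulating the operand stack at compile time. The stack holds
// SSA value ids, so a value that is pushed and immediately consumed never
// touches a runtime stack.
std::vector<SsaInsn> toSsa(const std::vector<Insn>& code, std::size_t& result) {
    std::vector<SsaInsn> out;
    std::vector<std::size_t> stack;  // compile-time stack of SSA value ids
    std::size_t next = 0;
    for (const Insn& in : code) {
        if (in.op == Op::PushInt) {
            out.push_back({Op::PushInt, in.imm, next, 0, 0});
            stack.push_back(next++);
        } else {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            const std::size_t rhs = stack.back(); stack.pop_back();
            const std::size_t lhs = stack.back(); stack.pop_back();
            out.push_back({in.op, 0, next, lhs, rhs});
            stack.push_back(next++);
        }
    }
    if (stack.size() != 1) throw std::runtime_error("expected exactly one value on the stack");
    result = stack.back();
    return out;
}

// Evaluates the SSA form with one slot per value, to check the translation.
int evalSsa(const std::vector<SsaInsn>& ssa, std::size_t result) {
    std::vector<int> values(ssa.size());
    for (const SsaInsn& s : ssa) {
        switch (s.op) {
            case Op::PushInt: values[s.dst] = s.imm; break;
            case Op::Add: values[s.dst] = values[s.lhs] + values[s.rhs]; break;
            case Op::Mul: values[s.dst] = values[s.lhs] * values[s.rhs]; break;
        }
    }
    return values[result];
}
```

For `(2 + 3) * 4`, the five stack instructions become five single-assignment values, the addition reads `%0` and `%1` directly, and the final result is `%4`.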


The Lightspark Project, a modern flash player implementation

When some months ago Adobe released the complete SWF file format specification, I thought it would be nice to develop a well-designed open source Flash player. I’ve now been working on this idea for some time and I’ve recently released the code on SourceForge.

The project objectives are quite ambitious, as the Flash specifications are really complex. The project is designed to take advantage of the features present on modern hardware, so it is not supposed to run on older machines. All the graphics rendering is done using OpenGL, and in the future programmable shaders will be used to offload even more calculations to the GPU. Extensive multithreading is employed to make use of multicore and hyper-threading processors. I’ll soon write a more detailed post about some tricky and interesting parts of the project.
