Glamo Xrender Benchmark with Expedite
[ openmoko efl ]
Yesterday I tested the XRender engine in Evas using the current EXA acceleration found in glamo (that is, solid fills and surface blitting). Sadly the test took ages, and even after walking away and leaving it running the whole night it didn’t finish but hung on the text test.
So I wanted to test just the glue found in XRender and its implementation on top of EXA, without painting anything: only the memory moves from system memory to VRAM and the necessary logic in Evas’ XRender engine. So I “removed” the functions from the xf86-video-glamo driver (they just return TRUE) and... here are the results:
Benchmark | Software X11 | XRender without painting |
Image Blend Unscaled | 2.76 | ???? |
Image Blend Solid Unscaled | 12.69 | 13.72 |
Image Blend Nearest Scaled | 1.56 | 18.14 |
Image Blend Nearest Solid Scaled | 8.77 | 18.00 |
Image Blend Smooth Scaled | 0.45 | 18.22 |
Image Blend Smooth Solid Scaled | 5.93 | 17.59 |
Image Blend Nearest Same Scaled | 5.02 | 21.26 |
Image Blend Nearest Solid Same Scaled | 22.05 | 17.73 |
Image Blend Smooth Same Scaled | 1.27 | 20.96 |
Image Blend Smooth Solid Same Scaled | 11.84 | 17.76 |
Image Blend Border | 0.51 | 1.83 |
Image Blend Solid Border | 6.67 | 1.97 |
Image Blend Border Recolor | 0.44 | 1.23 |
Image Quality Scale | 4.29 | 1.97 |
Image Data ARGB | 7.22 | 3.71 |
Image Data ARGB Alpha | 4.89 | 1.70 |
Image Data YCbCr 601 Pointer List | 6.54 | 3.16 |
Image Data YCbCr 601 Pointer List Wide Stride | 6.04 | 5.40 |
Image Crossfade | 6.67 | 4.61 |
Text Basic | 9.28 | 2.25 |
Text Styles | 1.05 | 0.17 |
Text Styles Different Strings | 0.79 | 0.14 |
Text Change | 5.64 | 1.86 |
Textblock Basic | 5.67 | 1.50 |
Textblock Intl | 4.67 | 2.46 |
Rect Blend | 1.81 | 9.66 |
Rect Solid | 9.57 | 18.02 |
Rect Blend Few | 69.84 | ????? |
Rect Solid Few | 84.22 | 61.79 |
Image Blend Occlude 1 Few | 41.09 | 196.75 |
Image Blend Occlude 2 Few | 24.00 | 47.37 |
Image Blend Occlude 3 Few | 17.50 | 70.32 |
Image Blend Occlude 1 | 43.26 | 26.20 |
Image Blend Occlude 2 | 14.59 | 14.03 |
Image Blend Occlude 3 | 4.87 | 21.06 |
Image Blend Occlude 1 Many | 27.31 | 12.14 |
Image Blend Occlude 2 Many | 6.81 | 4.61 |
Image Blend Occlude 3 Many | 2.21 | ???? |
Image Blend Occlude 1 Very Many | 3.79 | 1.54 |
Image Blend Occlude 2 Very Many | 0.66 | 0.43 |
Image Blend Occlude 3 Very Many | 0.36 | 0.58 |
Polygon Blend | 3.51 | 1.69 |
EVAS SPEED | 11.86 | 18.66 |
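For anyone wondering what the “without painting” column means in practice: the hooks accept every request and then touch no pixels. Here is a rough sketch of that idea, using a simplified, hypothetical hook table rather than the real EXA structures. The names `accel_hooks`, `stub_*` and `fill_rect` are made up for illustration, and the actual driver hooks take pixmap pointers and more arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an EXA-style hook table. */
typedef struct {
    bool (*prepare_solid)(unsigned int fg_color);
    void (*solid)(int x1, int y1, int x2, int y2);
    void (*done_solid)(void);
} accel_hooks;

/* Counts how often the glue code reached the (empty) fill hook. */
unsigned long solid_calls = 0;

bool stub_prepare_solid(unsigned int fg_color)
{
    (void)fg_color;
    return true;            /* claim the fill is accelerated */
}

void stub_solid(int x1, int y1, int x2, int y2)
{
    (void)x1; (void)y1; (void)x2; (void)y2;
    solid_calls++;          /* no pixel is written */
}

void stub_done_solid(void)
{
    /* nothing to flush */
}

const accel_hooks stub_hooks = {
    stub_prepare_solid, stub_solid, stub_done_solid
};

/* Hypothetical caller playing the role of the acceleration core:
   returns false when the driver refuses and software must take over. */
bool fill_rect(const accel_hooks *hooks, unsigned int color,
               int x, int y, int w, int h)
{
    assert(hooks != NULL);
    if (!hooks->prepare_solid(color))
        return false;
    hooks->solid(x, y, x + w, y + h);
    hooks->done_solid();
    return true;
}
```

With stubs like these, whatever time a benchmark takes is pure overhead: pixmap migration, bookkeeping and the engine's own logic.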
The results are very disappointing. There are several places where drawing in software is faster than just running the XRender/EXA logic for the same result without drawing anything. And in the tests where XRender/EXA is faster, the speedup isn’t worth much, as the real drawing will surely be slower. Note that the Glamo chip can only do raster operations into a destination surface of format RGB565, which means that there won’t be any acceleration even where the blending is possible in hardware, as Evas uses premultiplied ARGB8888.
So how can we improve the rendering speed of Evas specifically for this chip? The path through XRender/EXA is worthless; is there any other way? Well, one possibility is to use Evas’ software_16 engine (a destination surface of format RGB565) to reduce the bandwidth needed, but how do we match that with the XRender API?
Another solution could be to abandon the xf86-video-glamo acceleration effort and just build a specific Evas engine for glamo: mmap the whole framebuffer memory and manage it through Eina’s memory pool manager, handle the surfaces ourselves, and mix software_16 with this specific engine. A lot of work, yes, but it looks like the only solution (X away) that can give us some results. But there’s a problem: how do we send the changes to the displayed X window? Our engine would use a VRAM backbuffer, and an X client can’t know the physical memory of the area where the window is being drawn. So we’d have a round trip here, physical memory (our glamo surface) -> virtual memory (XShm/X memory) -> physical memory (destination framebuffer), which will surely remove any speedup.
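To make the format mismatch concrete, here is a minimal sketch of what it costs to get from ARGB8888 to an RGB565 destination: every pixel has to be repacked on the CPU, and the alpha channel is simply lost, so any blending must already have happened. The function names are mine, not Evas or glamo API, and the sketch assumes the pixel is fully composited before packing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one ARGB8888 pixel into RGB565 by keeping the top 5/6/5 bits
   of red/green/blue. The alpha byte is dropped. */
uint16_t argb8888_to_rgb565(uint32_t argb)
{
    uint32_t r = (argb >> 16) & 0xFFu;
    uint32_t g = (argb >> 8) & 0xFFu;
    uint32_t b = argb & 0xFFu;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a run of n pixels; src and dst must not overlap. */
void convert_span(const uint32_t *src, uint16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = argb8888_to_rgb565(src[i]);
}
```

A loop like this runs once per pixel per update, which is exactly the kind of CPU work the hardware path was supposed to avoid.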
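As a rough idea of what “handle the surfaces ourselves” could look like, here is a sketch of carving RGB565 surfaces out of one big mapped region. It is only an illustration: it uses a trivial bump allocator instead of Eina’s memory pool, the `vram_pool` and `surface16` names are hypothetical, and the region is whatever pointer you pass in (in the real engine it would come from mmap on the framebuffer device; here any ordinary buffer works).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool over one contiguous mapped region. */
typedef struct {
    uint8_t *base;  /* start of the region (would come from mmap) */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} vram_pool;

/* Hypothetical RGB565 surface living inside the pool. */
typedef struct {
    uint16_t *pixels;
    int       width, height;
    size_t    stride;  /* bytes per row */
} surface16;

void vram_pool_init(vram_pool *p, void *base, size_t size)
{
    assert(p != NULL && base != NULL);
    p->base = (uint8_t *)base;
    p->size = size;
    p->used = 0;
}

/* Bump-allocate a w x h surface. Returns 0 on success, -1 if it does
   not fit. No freeing and no overflow checks: this is only a sketch. */
int surface16_alloc(vram_pool *p, int w, int h, surface16 *out)
{
    assert(p != NULL && out != NULL);
    if (w <= 0 || h <= 0)
        return -1;
    size_t stride = (size_t)w * sizeof(uint16_t);
    size_t bytes  = stride * (size_t)h;
    /* Keep every surface 4-byte aligned relative to the base. */
    size_t start  = (p->used + 3u) & ~(size_t)3u;
    if (start > p->size || bytes > p->size - start)
        return -1;
    out->pixels = (uint16_t *)(void *)(p->base + start);
    out->width  = w;
    out->height = h;
    out->stride = stride;
    p->used = start + bytes;
    return 0;
}
```

This only covers the allocation side. The hard part described above, getting such a surface onto the X window without the extra copies, is untouched by it.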
Suggestions?