EDIT: This article was originally published on my personal blog in January ’09. Since then, UtopiaGL has been further developed and used in most of the work we’ve done here at Ideal Binary. I’m republishing this article (and its follow-up on performance) here to serve as a base for future posts on UtopiaGL.
Last month I completed my middleware 3D engine, UtopiaGL for iPhone. I’d never actually used a Mac before this project and I wasn’t sure what kind of effort would be required to get up to speed on OSX and the Mac development process (short answer – about 2 hours of effort!). Here are some notes on my experience.
The Engine: UtopiaGL
It’s a C++, Shader-based OpenGL ES engine and tool chain, similar in feature set to PC games from around 5 years ago. All OpenGL ES features supported by the iPhone are exposed through the Shader system, so you can do anything the PVR hardware can do. The pipeline is optimized according to the PVR recommendations, as well as my own testing. The helper tools include a Shader Compiler, Font Compiler, Model Compiler, 3DS Exporter and State Machine Generator.
The core engine and tools took 8 weeks to write. I wrote it in C++ on Vista using an OpenGL ES emulation layer. I’m very familiar with Visual Studio – Xcode is cool, but at the time I just wasn’t comfortable with it so I battled on in Windows. I wrote the engine to be portable across OS platforms so it has a System layer to abstract away the Platform specifics.
Once I had the engine ready and running well on Windows I moved it over to Xcode and rolled up my sleeves, expecting quite a bit of work. Amazingly, it only took a few hours to get working, and most of that was learning the ins and outs of Xcode. When I ran it on a device using a test scene it initially ran at 30 fps. I was pretty disappointed at that speed given the optimizations that were being used. I took a quick look over the code and saw that the Framebuffer was being created as a 32-bit RGB buffer – I had just taken the setup code from the OpenGL ES Hello World app to get up and running. I set it to RGB565 and sure enough it shot up to 60 fps. It does drop to 40/45 fps when you turn on Framebuffer effects (like Bloom glow).
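For reference, RGB565 packs a pixel into 16 bits instead of 32, halving framebuffer bandwidth. A quick sketch of the packing itself – nothing engine-specific, just the format:

```cpp
#include <cstdint>

// Pack 8-bit-per-channel RGB into a 16-bit RGB565 word:
// 5 bits of red, 6 bits of green, 5 bits of blue.
inline uint16_t PackRGB565(uint8_t r, uint8_t g, uint8_t b)
{
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Green gets the extra bit because the eye is most sensitive to it, which is why the quality loss is usually hard to spot in practice.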
EDIT: After a few weeks of further optimizations and testing I’ve managed to squeeze quite a bit more GL performance out of the device (to my sheer astonishment). For example Framebuffer effects now run at a constant 60 fps, details coming soon…
I wasn’t sure what to make of Xcode initially. After getting used to it I’m quite happy with it as a development environment. It feels a bit minimalist – not as advanced as Visual Studio – but at the same time you feel like you’re getting more signal and less noise. (Actually Mac OSX in general feels quite minimalist, in a good way.) One thing I find annoying is that there is no clean, simple way of including one iPhone project in another – this has been the single biggest influence on my view of Xcode. Visual Studio does this extremely well with ‘Solutions’ which wrap Projects, and when you’re writing a middleware engine it’s something of a requirement. Right now I’m including all the engine source directly in the Application projects in Xcode, whereas in Visual Studio I have a neat Application Solution that simply includes the engine library project. You can get Xcode to play ball, and a few people have blogged about how, but I’m not happy with the way it works – it should be a trivial thing and it certainly is not. It involves many, many steps when it should require only one: inclusion of a reference to the library project.
EDIT: When I wrote the above, there was no Static Library project template in Xcode, and I didn’t know enough about the process on Mac to set it up quickly. The Static Library project template does exist now (and has done for a while) and works extremely well, allowing exactly the same library project dependency set-up I was used to in Visual Studio. Thanks to Simon for pointing me towards this in the comments to this post.
If you haven’t written Objective C before, it can look a bit weird at first if you’re coming from a more traditional C-derived language background. It is growing on me, and I really do like the fact that the Apple Framework APIs look super clean. I have just one problem with it: the way parameters are interleaved with Message names. The idea is sound, but I don’t like the way it’s done in Objective C. Many languages have done it well, by allowing you to name parameters as you pass them, like MessageName( FirstParam=47, SecondParam="Hello" ). The reason I’m not keen on the Objective C way is that I find I need to stare at the code to extract the meaning – it takes effort. This is something I’ll probably get used to, but right now it’s a bit annoying. I think it’s that there are no brackets delineating the parameter list – my eyes scan for them and can’t find them, which is a bit jarring. If I had been proficient with Objective C and moved to C++ I might find C++ equally jarring. I’ve learned many new languages over the years, but I’ve never found any so odd – not even Lisp or Prolog. Maybe it’s just me.
In any case, I was able to avoid Objective C almost entirely. The only Objective C is in the Platform code to set things up, drive the message pump, and provide implementations of system functions like getting the App path. Maybe 200 lines of trivial code. Everything else is portable C++.
Garbage Collection has been disabled on the iPhone (a good thing – full-blown Garbage Collection is expensive, and arguably a nonsense feature to have on a mobile, power-constrained device). I wrote 3 memory management systems for UtopiaGL, all very simple and very specialized.

The first is for the core engine back-end. It is very light, extremely fast and results in zero fragmentation. It’s not Garbage Collected, but it does allow you to nail memory leaks immediately. You can also use it in the front-end Application code for making gross allocations, like loading large App-specific data.

The second is exclusively for the Geometry processing engine and is super simple and extremely fast – an allocate-and-forget system in which allocations are never individually freed; the whole pool gets zapped at the end of each scene render.

Lastly, there’s the client memory manager (the client being an Application written using the engine). This is designed exclusively for the front-end client code to use and is Garbage Collected (reference counted, but it detects cyclic refs). It’s also extremely fast (nowhere near as expensive as power-hungry Mark and Sweep GC, for example), and is actually a relatively tiny piece of code. Unlike with traditional GC, objects are deleted immediately when they are no longer referenced. Objects allocated with this system are only visible through reference pointer objects and reference pointer array objects (like in Java or C#). You can create arrays (even multi-dimensional arrays) of object references that behave exactly like the Java or C# equivalents, complete with a .length member per dimension. You never have to explicitly delete anything, space permitting. All allocations go through the ‘placement’ new operator, so it is very comfortably integrated with C++.
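To illustrate the reference-counting behaviour described above (immediate deletion, no deferred sweep), here is a minimal intrusive ref-count sketch. It omits the cycle detection, reference arrays and placement-new integration the real system has, and all names are mine, not the engine’s:

```cpp
// Minimal intrusive reference counting: the object dies the instant
// the last reference to it is released, unlike a traditional GC sweep.
class RefCounted
{
public:
    RefCounted() : m_refs(0) {}
    virtual ~RefCounted() {}
    void AddRef()  { ++m_refs; }
    void Release() { if (--m_refs == 0) delete this; }
private:
    int m_refs;
};

// Smart reference: the only way client code sees a managed object.
template <typename T>
class Ref
{
public:
    Ref() : m_ptr(nullptr) {}
    explicit Ref(T* p) : m_ptr(p) { if (m_ptr) m_ptr->AddRef(); }
    Ref(const Ref& other) : m_ptr(other.m_ptr) { if (m_ptr) m_ptr->AddRef(); }
    ~Ref() { if (m_ptr) m_ptr->Release(); }
    Ref& operator=(const Ref& other)
    {
        if (other.m_ptr) other.m_ptr->AddRef();  // add first: handles self-assignment
        if (m_ptr) m_ptr->Release();
        m_ptr = other.m_ptr;
        return *this;
    }
    T* operator->() const { return m_ptr; }
    T& operator*()  const { return *m_ptr; }
private:
    T* m_ptr;
};

// Hypothetical test type: counts live instances so deletion is observable.
struct Node : RefCounted
{
    static int s_live;
    Node()  { ++s_live; }
    ~Node() { --s_live; }
};
int Node::s_live = 0;
```

When the last `Ref<Node>` goes out of scope the `Node` is destroyed on the spot – there is no pause while a collector runs, which is exactly the property you want on a power-constrained device.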
Touch UI and the Accelerometer
These are the most interesting new features you get to play with. The Accelerometer is impressive – it’s extremely sensitive, way more than I was expecting. It provides a three-component vector with the force in each spatial dimension – simple and to the point. You can control the rate at which the accelerometer feeds samples to your app, but for most interactive apps you’ll want that to be as close to 60Hz as possible. To extract meaningful information from the accelerometer data you will eventually need to apply some kind of filtering. Some knowledge of Digital Signal Processing is very useful here, but not required (a High or Low Pass filter can be written in a few lines of code and is very straightforward to understand).

The other input method is of course the Touch interface. Your app gets a handful of messages informing it of touch events (when they start, move, end etc.). Each individual touch is tracked: pressing your finger on the screen creates a ‘Touch’ object, and as you move your finger that particular Touch object is updated with new position information and a ‘Phase’ field that reflects the current phase of the touch event: Began, Moved, Ended etc. When you finally lift your finger, the Touch object gets its Phase set to Ended and, after you have been informed about it, the Touch object is recycled by the system. One Touch object exists for each point of contact on screen and is updated as its point of contact changes. Tap events are modeled as touches also, with the tapCount field indicating the number of taps that have occurred.
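The accelerometer filtering mentioned above really is only a few lines. Here’s a one-pole low-pass filter of the kind commonly applied per axis (the smoothing factor is a hypothetical tuning value; the corresponding high-pass signal is just the raw sample minus the low-pass output):

```cpp
// One-pole (exponential) low-pass filter. Each new sample is blended
// with the running value; smaller alpha filters more aggressively.
struct LowPassFilter
{
    float value;
    float alpha;
    explicit LowPassFilter(float a) : value(0.0f), alpha(a) {}
    float Apply(float sample)
    {
        value = alpha * sample + (1.0f - alpha) * value;
        return value;
    }
};
```

Run one of these per axis: the low-pass output tracks gravity (device orientation), while subtracting it from the raw sample isolates sharp motions like shakes.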
I have abstracted Touch events essentially identically in UtopiaGL, except each Touch can be identified with an ID. UtopiaGL Touches can track several seconds of movement (you can configure this), unlike the raw events which just give you a current and previous position. This eases the burden on Gesture Recognition somewhat.
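A fixed-size ring buffer is one simple way to keep a short movement history per touch. This sketch is illustrative only – the capacity and names are mine, not UtopiaGL’s:

```cpp
#include <cstddef>

struct TouchPoint { float x, y; };

// Ring buffer of recent touch positions; old samples are overwritten
// once capacity is reached, so memory use stays fixed per touch.
class TouchHistory
{
public:
    static const std::size_t kCapacity = 8;  // hypothetical history length
    TouchHistory() : m_head(0), m_count(0) {}
    void Push(float x, float y)
    {
        m_samples[m_head].x = x;
        m_samples[m_head].y = y;
        m_head = (m_head + 1) % kCapacity;
        if (m_count < kCapacity) ++m_count;
    }
    std::size_t Count() const { return m_count; }
    // 0 is the most recent sample, 1 the one before it, and so on.
    const TouchPoint& SampleAgo(std::size_t ago) const
    {
        return m_samples[(m_head + kCapacity - 1 - ago) % kCapacity];
    }
private:
    TouchPoint  m_samples[kCapacity];
    std::size_t m_head;
    std::size_t m_count;
};
```

A gesture recogniser can then inspect `SampleAgo(0)` through `SampleAgo(Count() - 1)` to classify swipes or flicks, instead of reconstructing movement from only a current and previous position.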
UtopiaGL has its own event system, so in the platform-specific code, native events are translated and fed to UtopiaGL’s system object in a format it recognises. From there, they are distributed to the rest of the engine in an entirely platform-independent way.
OpenGL ES on iPhone and iPod Touch
The features exposed by the PVR hardware are excellent. There was one disappointing omission, and that was the Vertex Program extension. The PVR hardware supports Dot3 blending, which means you can do Bump/Normal mapping. Unfortunately, without the Vertex Program extension you are forced to enlist the CPU if you’re doing Tangent Space bump mapping, at a cost of a matrix multiply per vertex. If you’re doing Object Space bump mapping you don’t need to do that, but you are stuck with rigid models that can’t deform without breaking their lighting. Another reason I was hoping the VP extension would be exposed is to supply my own lighting equation. The standard OpenGL ES lighting system is expensive; all you need in most cases is a very trivial ambient + diffuse lighting equation. Without the VP extension you either bite the bullet and use the standard OpenGL ES lighting model (which allows you to store your geometry in video ram), or write your own simplified lighting code and upload the vertex colors per lighting change. If your models require CPU work – for example if you’re applying some kind of CPU-based deformation to them – then it may make sense to implement the lighting on the CPU and upload everything in one go. If your geometry is entirely static, it may make sense to just use the OpenGL ES lighting model. It’s not a clear-cut situation. Right now I’m using the OpenGL ES lighting pipeline for all lighting, but I have left a stub for CPU-based lighting – I’ll be experimenting with this shortly.
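The “very trivial ambient + diffuse” equation amounts to a clamped Lambert term per vertex. A CPU-side sketch – names are illustrative, not the engine’s API:

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Scalar light intensity in [0, 1]: ambient term plus the clamped
// Lambert (N . L) diffuse term. lightDir must be normalized and point
// from the surface towards the light.
inline float AmbientDiffuse(const Vec3& normal, const Vec3& lightDir,
                            float ambient, float diffuse)
{
    float ndotl = std::max(0.0f, Dot(normal, lightDir));
    return std::min(1.0f, ambient + diffuse * ndotl);
}
```

Multiply the result into each vertex color and you have the whole lighting model – one dot product, a multiply and an add per vertex, against the full material/attenuation/spotlight machinery the fixed-function pipeline evaluates.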
EDIT: The shared memory model on the iPhone and the fact that the VBO extension offers no speed up really mean you’re faced with a more level playing field: you just operate off system memory vertices.
The Framebuffer Object (FBO) extension is supported, and is actually the primary way you render to screen. It opens up a wealth of possibilities (via render-to-texture) and I was delighted to see it on the device.
Performance is excellent, above what I was expecting from a non-dedicated games device.
The Shader system is a pass-based renderer, which is configured by a compiled shader script. The compiled shaders are tiny, usually around 150 bytes. The reason I compile them offline is to remove the burden of parsing them on-device at run time. They’re not a million miles from Quake3 shaders, but they give lower-level control, for example you can fully control multi-texture (2 TMUs are expected) and the texture combiners.
I wrote a packing system for vertex attributes to minimize VGP and CPU cache misses. It’s very straightforward and works like this: Shaders expect certain attributes to be present in a vertex in order to execute, e.g. if a shader does multi-texture mapping and lighting then it needs XYZ, Normal, TC0 and TC1 attributes. You have two options when representing these attributes in ram: a Structure of Arrays (SOA) or an Array of Structures (AOS).

SOA is conceptually easier: you have an array of XYZs, an array of Normals, and so on. This has the advantage that if you are performing CPU-based deformation on any attribute array, you limit cache misses. You pay for it when the GPU gets to work, though, because the cache is under-utilized as vertex information is pulled from very different locations in ram. The alternative is to interleave the attributes, AOS-style. In this case you have an array of vertex structures, each of which holds the XYZ, Normal, TC0 and TC1 attributes. This means you under-utilize the cache if you need to perform CPU-based deformation on any attribute, but the cache is well utilized when the GPU is pulling in vertices.

My solution was to use neither method but a hybrid of both. Any static, non-changing attributes get interleaved into a per-vertex structure. This doesn’t incur any cache abuse w.r.t. the CPU, because it never looks at them, and the GPU maximizes cache hits as it pulls in the vertices. Any volatile attributes that need to be processed by the CPU are arranged into arrays for quick processing, and I take the cache miss on the GPU end, which is essentially limited to those specific attributes. You get the best of both worlds.
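A minimal sketch of the hybrid layout, assuming positions are the only volatile attribute (all names are mine, not the engine’s):

```cpp
#include <cstddef>
#include <vector>

// Static attributes are interleaved per vertex: the CPU never touches
// them, and the GPU reads each vertex's data from one contiguous spot.
struct StaticVertexAttribs
{
    float normal[3];
    float tc0[2];
    float tc1[2];
};

struct Mesh
{
    std::vector<float>               positions;  // x,y,z per vertex: CPU-deformed
    std::vector<StaticVertexAttribs> statics;    // one entry per vertex: GPU-only
};

// A CPU deformation pass touches only the tightly packed position
// array, walking contiguous memory and leaving the interleaved data cold.
void OffsetY(Mesh& m, float dy)
{
    for (std::size_t i = 1; i < m.positions.size(); i += 3)
        m.positions[i] += dy;
}
```

At draw time the two blocks are bound as separate attribute pointers, so the GPU-side cache miss is confined to the one volatile stream.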
Before the geometry ever gets to the Shader system, I apply a reordering algorithm first to the triangles and then to the vertices to ensure maximum VGP cache usage. It’s a fast process and is performed offline in a Model Compilation tool which is part of the engine tool chain.
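The vertex half of such a reordering pass can be as simple as renumbering vertices in the order the (already-reordered) triangle list first uses them, so the GPU walks the vertex array mostly sequentially. A sketch of that idea, not the engine’s actual algorithm:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Renumber vertices in first-use order. Returns the rewritten index
// list; remap[i] gives the new position of old vertex i, so the caller
// can shuffle the vertex array to match.
std::vector<uint16_t> ReorderVerticesByFirstUse(
    const std::vector<uint16_t>& indices,
    std::size_t vertexCount,
    std::vector<uint16_t>& remap)
{
    const uint16_t kUnmapped = 0xFFFF;
    remap.assign(vertexCount, kUnmapped);
    uint16_t next = 0;
    std::vector<uint16_t> out;
    out.reserve(indices.size());
    for (uint16_t idx : indices)
    {
        if (remap[idx] == kUnmapped)   // first time this vertex is referenced
            remap[idx] = next++;
        out.push_back(remap[idx]);
    }
    return out;
}
```

Because it runs offline in the model compiler, the cost is paid once at build time rather than on the device.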
EDIT: There seems to be an advantage to using strips with a call to glDrawElements: see my more recent post on gl performance.
One thing I found odd about writing the shader code was the PVR compressed texture support. PVR compressed textures appear flipped along the y axis – they load upside down! The apparent reason for this is to maintain consistency with the render-to-texture support. That doesn’t make any sense to me! Anyway, our lives are now slightly harder – that’s how it is, so unfortunately you need to add an entirely redundant step into your content build process to manually flip textures before PVR-compressing them, or resort to other hacks as you load the textures. Bad smell. Apart from that oddity, the compression is excellent and the quality is likewise impressive.
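The manual flip itself is trivial – just reverse the row order of the raw image before handing it to the PVR compressor. A sketch for 8-bit-per-channel data:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Flip an image upside down in place by swapping rows top-for-bottom.
// width is in pixels; channels is bytes per pixel (e.g. 4 for RGBA).
void FlipImageVertically(std::vector<uint8_t>& pixels,
                         int width, int height, int channels)
{
    const int rowBytes = width * channels;
    for (int y = 0; y < height / 2; ++y)
    {
        std::swap_ranges(pixels.begin() + y * rowBytes,
                         pixels.begin() + (y + 1) * rowBytes,
                         pixels.begin() + (height - 1 - y) * rowBytes);
    }
}
```

Dropping this into the content build before the compression step keeps the hack out of the run-time loading path.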
EDIT: Since writing this, the PVR utilities were updated to allow you to parametrically flip-on-compress, which pretty much removes the issue.
iPhone development has been fun so far. I’ve done a lot of work in mobile game development, and the iPhone is easily the best thing I’ve ever experienced in a mobile device. I’ll be submitting my first applications built using UtopiaGL to Apple soon, with a little luck. Time permitting, I will blog about specific aspects of the engine in detail, and iPhone development in general.