iPhone 3D Engine Development


EDIT: This article was originally published on my personal blog in January ’09.  Since then, UtopiaGL has been further developed and used in most of the work we’ve done here at Ideal Binary.  I’m republishing this article (and its follow-up on performance) here to serve as a base for future posts on UtopiaGL.

Last month I completed my middleware 3D engine, UtopiaGL for iPhone.  I’d never actually used a Mac before this project and I wasn’t sure what kind of effort would be required to get up to speed on OSX and the Mac development process (short answer – about 2 hours of effort!).  Here are some notes on my experience.

The Engine: UtopiaGL

It’s a C++, Shader-based OpenGL ES engine and tool chain, similar in feature set to PC games from around 5 years ago. All OpenGL ES features supported by the iPhone are exposed through the Shader system, so you can do anything the PVR hardware can do. The pipeline is optimized according to the PVR recommendations, as well as my own testing. The helper tools include a Shader Compiler, Font Compiler, Model Compiler, 3DS Exporter and State Machine Generator.


The core engine and tools took 8 weeks to write.  I wrote it in C++ on Vista using an OpenGL ES emulation layer.  I’m very familiar with Visual Studio – Xcode is cool, but at the time I just wasn’t comfortable with it so I battled on in Windows.  I wrote the engine to be portable across OS platforms so it has a System layer to abstract away the Platform specifics.

Once I had the engine ready and running well on Windows I moved it over to Xcode and rolled up my sleeves, expecting quite a bit of work.  Amazingly, it only took a few hours to get working, and most of that was learning the ins and outs of Xcode.  When I ran it on a device using a test scene it initially ran at 30 fps.  I was pretty disappointed at that speed given the optimizations that were being used.  I took a quick look over the code and saw that the Framebuffer was being created as a 32-bit RGB buffer – I had just taken the setup code from the OpenGL ES Hello World app to get up and running.  I set it to RGB565 and sure enough it shot up to 60 fps.  It does drop down to 40/45 fps when you turn on Framebuffer effects (like Bloom glow).

EDIT: After a few weeks of further optimizations and testing I’ve managed to squeeze quite a bit more GL performance out of the device (to my sheer astonishment).  For example Framebuffer effects now run at a constant 60 fps, details coming soon…


I wasn’t sure what to make of Xcode initially. After getting used to it I’m quite happy with it as a development environment. A few things feel a bit minimalist about it – it doesn’t feel as advanced as Visual Studio, but at the same time you do feel like you’re getting more signal and less noise. (Actually Mac OSX in general feels quite minimalist, in a good way.) One thing I find annoying is that there is no clean, simple way of including one iPhone project in another (this has been the single biggest influence on my view of Xcode). VS does this extremely well with ‘Solutions’ that wrap Projects. Obviously, when you’re writing a middleware engine this is something of a requirement. Right now I’m including all the engine source directly in the Application projects in Xcode, unlike in Visual Studio, where I have a neat Application Solution that simply includes the engine library project. You can get Xcode to play ball, and a few people have blogged about it, but I’m not happy with the way it works – it should be a trivial thing and it certainly is not. It involves many, many steps when it should require only one: inclusion of a reference to the library project.

EDIT: When I wrote the above, there was no Static Library project template in Xcode, and I didn’t know enough about the process on Mac to set it up quickly.  The Static Library project template does exist now (and has done for a while) and works extremely well, allowing exactly the same library project dependency set-up I was used to in Visual Studio.  Thanks to Simon for pointing me towards this in the comments to this post.

Objective-C

If you haven’t written Objective-C before, it can look a bit weird at first if you’re coming from a more traditional C-derived language background. It is growing on me, and I really do like the fact that the Apple Framework APIs look super clean. I have just one problem with it: the way parameters are interleaved with Message names. The idea is sound, but I don’t like the way it’s done in Objective-C. Many languages have done this well by allowing you to name parameters as you pass them, like MessageName( FirstParam=47, SecondParam="Hello" ). The reason I’m not keen on the Objective-C way is that I find I need to stare at the code to extract the meaning – it takes effort. This is something I’ll probably get used to, but right now it’s a bit annoying. I think it’s that there are no brackets delineating the parameter list – my eyes scan for it and can’t find it, which makes it a bit jarring. If I had been proficient in Objective-C and then moved to C++ I might find C++ equally jarring. I’ve learned many new languages over the years, but I’ve never found any so odd – not even Lisp or Prolog. Maybe it’s just me.

In any case, I was able to avoid Objective-C almost entirely. The only Objective-C is in the Platform code to set things up, drive the message pump, and provide implementations of system functions like getting the App path. Maybe 200 lines of trivial code. Everything else is portable C++.

Memory Management

Garbage Collection is disabled on the iPhone (a good thing – full-blown Garbage Collection is expensive, and arguably a nonsense feature to have on a mobile, power-constrained device). I wrote 3 memory management systems for UtopiaGL, all very simple and very specialized. The first is for the core engine back-end; it is very light, extremely fast and results in zero fragmentation. It’s not Garbage Collected, but it does let you nail memory leaks immediately. You can also use it in the front-end Application code for making gross allocations, like loading large App-specific data. The second one is exclusively for the Geometry processing engine and is super simple and extremely fast – an allocate-and-forget system where nothing is freed individually; the whole pool gets zapped at the end of each scene render. Lastly, there’s the client memory manager (the client being an Application written using the engine). This is designed exclusively for the front-end client code and is Garbage Collected (reference counted, but it detects cyclic refs). It’s also extremely fast (nowhere near as expensive as power-hungry Mark and Sweep GC, for example), and is actually a relatively tiny piece of code. When objects are no longer referenced they are deleted immediately, unlike with traditional GC. Objects allocated with this system are only visible through reference pointer objects and reference pointer array objects (like in Java or C#). You can create arrays (even multi-dimensional arrays) of object references that behave exactly like the Java or C# equivalents, complete with a .length member per dimension. You never have to explicitly delete anything, space permitting. All allocations go through the ‘placement’ new operator, so it is very comfortably integrated with C++.
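
To illustrate the reference-pointer idea, here is a minimal sketch (not UtopiaGL’s actual code) of a ref-counted handle that deletes its target the moment the last reference disappears. The real system layers the custom allocator, the placement-new integration, cycle detection and the .length-style array types on top of something like this.

    // Minimal sketch of a reference-pointer object: a shared, ref-counted
    // handle. Unlike a traditional mark-and-sweep collector, the target is
    // deleted immediately when the last reference goes away.
    // (Plain new/delete here; the real engine routes allocations through a
    // custom heap via placement new.)
    template <class T>
    class Ref
    {
    public:
        Ref() : mObj(0), mCount(0) {}
        explicit Ref(T* obj) : mObj(obj), mCount(new int(1)) {}

        Ref(const Ref& other) : mObj(other.mObj), mCount(other.mCount)
        {
            if (mCount) ++(*mCount);
        }

        Ref& operator=(const Ref& other)
        {
            if (this != &other)
            {
                Release();
                mObj = other.mObj;
                mCount = other.mCount;
                if (mCount) ++(*mCount);
            }
            return *this;
        }

        ~Ref() { Release(); }

        T* operator->() const { return mObj; }

    private:
        void Release()
        {
            if (mCount && --(*mCount) == 0)
            {
                delete mObj;      // freed immediately, no GC pause
                delete mCount;
            }
            mObj = 0;
            mCount = 0;
        }

        T*   mObj;
        int* mCount;
    };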

Touch UI and the Accelerometer

These are the most interesting new features you get to play with. The Accelerometer is impressive – it’s extremely sensitive, way more than I was expecting. It provides a 3-vector with the force in each spatial dimension – simple and to the point. You can control the rate at which the accelerometer feeds samples to your app, but for most interactive apps you’ll want that to be as close to 60Hz as possible. In order to extract meaningful information from the accelerometer data you will eventually need to apply some kind of filtering. Some knowledge of Digital Signal Processing is very useful here, but not required (a high- or low-pass filter can be written in a few lines of code and is very straightforward to understand). The other input method is of course the Touch interface. Your app gets a handful of messages informing it of touch events (when they start, move, end etc.). Each individual touch is tracked: if you press your finger on the screen, that creates a ‘Touch’ object; if you move your finger, that particular Touch object gets updated with new position information and a ‘Phase’ field that reflects the current phase of the touch event (Began, Moved, Ended etc.). When you finally lift your finger, the Touch object gets its Phase set to Ended, and after you have been informed about it, the Touch object gets recycled by the system. One Touch object exists for each point of contact on screen and is updated as its point of contact changes. Tap events are modeled as touches too, with the tapCount field indicating the number of taps that have occurred.
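
For the curious, the low-pass case really is just a couple of lines. This is a generic sketch (the smoothing factor is an arbitrary example value to tune per app), not anything specific to UtopiaGL:

    // Simple exponential low-pass filter for raw accelerometer samples.
    // Lower alpha = smoother but laggier output. A matching high-pass signal
    // is just the raw sample minus the filtered value.
    struct LowPassFilter3
    {
        float x, y, z;
        float alpha;   // e.g. 0.1f at a 60Hz sample rate (tune per app)

        explicit LowPassFilter3(float a) : x(0), y(0), z(0), alpha(a) {}

        void AddSample(float ax, float ay, float az)
        {
            x += alpha * (ax - x);
            y += alpha * (ay - y);
            z += alpha * (az - z);
        }
    };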

I have abstracted Touch events essentially identically in UtopiaGL, except each Touch can be identified with an ID. UtopiaGL Touches can track several seconds of movement (you can configure this), unlike the raw events which just give you a current and previous position.  This eases the burden on Gesture Recognition somewhat.
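
As a rough sketch of what such an abstraction can look like (the names, sizes and layout here are illustrative, not UtopiaGL’s actual types), a Touch that carries an ID and a ring buffer of recent positions might be:

    // Hypothetical engine-side touch object: a stable ID plus a small ring
    // buffer of recent positions that gesture-recognition code can inspect.
    enum TouchPhase { TOUCH_BEGAN, TOUCH_MOVED, TOUCH_ENDED };

    struct TouchPoint { float x, y; };

    struct EngineTouch
    {
        static const int HISTORY_SIZE = 64;   // roughly one second at 60Hz

        int         id;                       // stable ID for this point of contact
        TouchPhase  phase;
        int         tapCount;
        TouchPoint  history[HISTORY_SIZE];    // ring buffer of recent positions
        int         head;                     // index of the most recent sample
        int         count;                    // number of valid samples

        EngineTouch() : id(0), phase(TOUCH_BEGAN), tapCount(0), head(-1), count(0) {}

        void AddPosition(float px, float py)
        {
            head = (head + 1) % HISTORY_SIZE;
            history[head].x = px;
            history[head].y = py;
            if (count < HISTORY_SIZE) ++count;
        }
    };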

UtopiaGL has its own event system, so in the platform-specific code, native events are translated and fed to UtopiaGL’s system object in a format it recognises. From there, they are distributed to the rest of the engine in an entirely platform-independent way.
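
The translation layer itself is tiny. Something along these lines, where the namespace, types and function names are illustrative rather than the actual UtopiaGL API:

    #include <vector>

    // Illustrative platform-to-engine hand-off: the only code that knows
    // about native touch events lives in the platform layer and converts
    // them into the engine's own event format.
    namespace engine
    {
        enum EventType { EVENT_TOUCH, EVENT_ACCEL };

        struct SystemEvent
        {
            EventType type;
            int       touchId;
            float     x, y;
            int       phase;
        };

        // Queue drained once per frame by platform-independent engine code.
        static std::vector<SystemEvent> gEventQueue;

        inline void PostEvent(const SystemEvent& ev) { gEventQueue.push_back(ev); }
    }

    // Called from the Objective-C touch handlers in the platform layer.
    void PlatformFeedTouch(int nativeId, float x, float y, int phase)
    {
        engine::SystemEvent ev;
        ev.type    = engine::EVENT_TOUCH;
        ev.touchId = nativeId;
        ev.x       = x;
        ev.y       = y;
        ev.phase   = phase;
        engine::PostEvent(ev);
    }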

OpenGL ES on iPhone and iPod Touch

The features exposed by the PVR hardware are excellent. There was one disappointing omission: the Vertex Program extension. The PVR hardware supports Dot3 blending, which means you can do Bump/Normal mapping. Unfortunately, without the Vertex Program extension you are forced to enlist the CPU if you’re doing Tangent Space bump mapping, with a matrix multiply per vertex. If you’re doing Object Space bump mapping you don’t need to do that, but you are stuck with rigid models that can’t deform without breaking their lighting. Another reason I was hoping the VP extension would be exposed is to supply my own lighting equation. The standard OpenGL ES lighting system is expensive, and all you need in most cases is a very trivial ambient + diffuse lighting equation. Without the VP extension you either bite the bullet and use the standard OpenGL ES lighting model (which allows you to store your geometry in video RAM), or write your own simplified lighting code and upload the vertex colors on every lighting change. If your models require CPU work anyway, for example if you’re applying some kind of CPU-based deformation to them, then it may make sense to implement the lighting on the CPU and upload everything in one go. If your geometry is entirely static, it may make sense to just use the OpenGL ES lighting model. It’s not a clear-cut situation. Right now I’m using the OpenGL ES lighting pipeline for all lighting, but I have left a stub for CPU-based lighting – I’ll be experimenting with this shortly.
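
For reference, the kind of trivial ambient + diffuse equation I mean looks something like this when done on the CPU. It’s only a sketch: it assumes a single directional light already transformed into model space, and writes the result into a per-vertex colour array that gets re-uploaded whenever the lighting changes.

    // Minimal per-vertex ambient + diffuse lighting for one directional light,
    // evaluated on the CPU. Assumes normalized normals and a normalized light
    // direction pointing from the surface towards the light, in model space.
    struct Vec3 { float x, y, z; };

    inline float Dot(const Vec3& a, const Vec3& b)
    {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    void LightVertices(const Vec3* normals, int count,
                       const Vec3& lightDir,          // model space, normalized
                       float ambient, float diffuse,
                       unsigned char* outRGBA)        // count * 4 bytes
    {
        for (int i = 0; i < count; ++i)
        {
            float ndotl = Dot(normals[i], lightDir);
            if (ndotl < 0.0f) ndotl = 0.0f;

            float intensity = ambient + diffuse * ndotl;
            if (intensity > 1.0f) intensity = 1.0f;

            unsigned char c = (unsigned char)(intensity * 255.0f);
            outRGBA[i * 4 + 0] = c;
            outRGBA[i * 4 + 1] = c;
            outRGBA[i * 4 + 2] = c;
            outRGBA[i * 4 + 3] = 255;
        }
    }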

EDIT: The shared memory model on the iPhone and the fact that the VBO extension offers no speed up really mean you’re faced with a more level playing field: you just operate off system memory vertices.

The Framebuffer Object (FBO) extension is supported, and is actually the primary way you render to the screen. The FBO extension opens up a wealth of possibilities (using render-to-texture) and I was delighted to see it on the device.
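
Setting up a render-to-texture target with the OES flavour of the extension only takes a handful of calls. A bare-bones sketch (no depth attachment, minimal error handling):

    #include <OpenGLES/ES1/gl.h>
    #include <OpenGLES/ES1/glext.h>

    // Create a 128x128 render-to-texture target with GL_OES_framebuffer_object.
    // Returns the FBO id (0 on failure) and writes the colour texture to outTex.
    GLuint CreateRenderTexture(GLuint* outTex)
    {
        GLuint fbo = 0, tex = 0;

        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 128, 128, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, 0);

        glGenFramebuffersOES(1, &fbo);
        glBindFramebufferOES(GL_FRAMEBUFFER_OES, fbo);
        glFramebufferTexture2DOES(GL_FRAMEBUFFER_OES, GL_COLOR_ATTACHMENT0_OES,
                                  GL_TEXTURE_2D, tex, 0);

        if (glCheckFramebufferStatusOES(GL_FRAMEBUFFER_OES) != GL_FRAMEBUFFER_COMPLETE_OES)
            return 0;

        // Render the off-screen pass, then rebind the main framebuffer and use
        // 'tex' like any other texture (e.g. additively blended over the scene).
        *outTex = tex;
        return fbo;
    }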

Performance is excellent, above what I was expecting from a non-dedicated games device. 

UtopiaGL Shaders

The Shader system is a pass-based renderer configured by a compiled shader script. The compiled shaders are tiny, usually around 150 bytes. The reason I compile them offline is to remove the burden of parsing them on-device at run time. They’re not a million miles from Quake3 shaders, but they give lower-level control – for example, you can fully control multi-texturing (2 TMUs are expected) and the texture combiners.
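
To give a flavour of the GL state a two-TMU pass ultimately boils down to, here is a sketch in plain GL ES 1.1 calls (not the shader script syntax) that modulates a lightmap over the base texture using the combiners:

    #include <OpenGLES/ES1/gl.h>

    // Example of the state a simple two-TMU pass drives: unit 0 modulates the
    // base map with the vertex colour; unit 1 modulates the result of unit 0
    // with a lightmap via the texture combiners.
    void SetupLightmapPass(GLuint baseMap, GLuint lightMap)
    {
        glActiveTexture(GL_TEXTURE0);
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, baseMap);
        glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

        glActiveTexture(GL_TEXTURE1);
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, lightMap);
        glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE);
        glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB, GL_MODULATE);
        glTexEnvi(GL_TEXTURE_ENV, GL_SRC0_RGB,    GL_PREVIOUS);  // output of unit 0
        glTexEnvi(GL_TEXTURE_ENV, GL_SRC1_RGB,    GL_TEXTURE);   // the lightmap
    }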

I wrote a packing system for vertex attributes to minimize VGP and CPU cache misses. It’s very straightforward and works like this: Shaders expect certain attributes to be present in a vertex in order to execute, e.g. if a shader does multi-texture mapping and lighting then it needs XYZ, Normal, TC0 and TC1 attributes. You have two options when representing these attributes in RAM: a Structure of Arrays (SOA) or an Array of Structures (AOS). SOA is conceptually easier – essentially you have an array of XYZs, an array of Normals, and so on, each with one entry per vertex. This has the advantage that if you are performing CPU-based deformation on any attribute array, you limit cache misses. You pay for it when the GPU gets to work though, because each vertex’s attributes are pulled from very different locations in RAM. The alternative is to interleave the attributes, AOS-style. In this case you have an array of vertex structures, each of which holds the XYZ, Normal, TC0 and TC1 attributes. This means you under-utilize the cache if you need to perform CPU-based deformation on any attribute, but the cache is well utilized when the GPU is pulling in vertices. My solution was to use neither method exclusively, but a hybrid of both. Any static, non-changing attributes get interleaved into a per-vertex structure. This doesn’t incur any cache abuse on the CPU side because the CPU never looks at them, and the GPU maximizes cache hits as it pulls in the vertices. Any volatile attributes that need to be processed by the CPU are arranged into flat arrays for quick processing, and I take the cache miss on the GPU end, which is essentially limited to those specific attributes. You get the best of both worlds.
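
In code terms the hybrid looks roughly like this (an illustrative layout rather than UtopiaGL’s actual structures):

    // Hybrid vertex layout: static attributes interleaved (AOS) for good
    // fetch locality when the GPU pulls vertices, volatile attributes in flat
    // arrays (SOA) so the CPU can stream through them when deforming.
    struct StaticVertexAttribs    // written once at load time, never touched again
    {
        float u0, v0;             // TC0
        float u1, v1;             // TC1
    };

    struct MeshGeometry
    {
        int                  vertexCount;
        StaticVertexAttribs* staticAttribs;  // interleaved, one per vertex
        float*               positions;      // XYZ triples, CPU-deformable
        float*               normals;        // XYZ triples, CPU-deformable
    };

    // The GL vertex arrays then point at the two kinds of storage separately:
    //   glVertexPointer  (3, GL_FLOAT, 0, geo.positions);
    //   glNormalPointer  (   GL_FLOAT, 0, geo.normals);
    //   glTexCoordPointer(2, GL_FLOAT, sizeof(StaticVertexAttribs),
    //                     &geo.staticAttribs[0].u0);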

Before the geometry ever gets to the Shader system, I apply a reordering algorithm, first to the triangles and then to the vertices, to ensure maximum VGP cache usage. It’s a fast process and is performed offline in a Model Compilation tool which is part of the engine tool-chain.

EDIT: There seems to be an advantage to using strips with a call to glDrawElements: see my more recent post on gl performance.

One thing I found odd about writing the shader code was the PVR compressed texture support.  PVR compressed textures appear flipped along the y axis – they load upside down!  The apparent reason for this is to maintain consistency with the render-to-texture support.  That doesn’t make any sense to me!  Anyway, our lives are now slightly harder – that’s how it is, so unfortunately you need to add an entirely redundant step into your content build process to manually flip textures before PVR-compressing them, or resort to other hacks as you load the textures. Bad smell.  Apart from that oddity, the compression is excellent and the quality is likewise impressive.
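
The flip itself is trivial if you do it on the uncompressed image before handing it to the PVR compressor. A sketch, assuming tightly packed RGBA8 rows:

    #include <cstring>
    #include <vector>

    // Flip an uncompressed RGBA8 image vertically before PVR compression so
    // the compressed texture comes out the right way up. Assumes tightly
    // packed rows with no padding.
    void FlipImageVertically(unsigned char* pixels, int width, int height)
    {
        const int rowBytes = width * 4;
        std::vector<unsigned char> tmp(rowBytes);

        for (int y = 0; y < height / 2; ++y)
        {
            unsigned char* top    = pixels + y * rowBytes;
            unsigned char* bottom = pixels + (height - 1 - y) * rowBytes;
            std::memcpy(&tmp[0], top,     rowBytes);
            std::memcpy(top,     bottom,  rowBytes);
            std::memcpy(bottom,  &tmp[0], rowBytes);
        }
    }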

EDIT: Since writing this, the PVR utilities were updated to allow you to parametrically flip-on-compress, which pretty much removes the issue.


iPhone development has been fun so far.  I’ve done a lot of work in mobile game development, and the iPhone is easily the best thing I’ve ever experienced in a mobile device.  I’ll be submitting my first applications built using UtopiaGL to Apple soon, with a little luck.  Time permitting, I will blog about specific aspects of the engine in detail and iPhone development in general.


16 Responses to iPhone 3D Engine Development

  1. Paul says:

    Great !

    I’m a beginner with game engines and shading languages. Please let me know how to study shading languages for the PowerVR MBX. I’ve tried to find out what language to use on the PowerVR MBX, but there is no info.


  2. Kevin Doolan says:


    OpenGL ES 1.1 as it appears on the iPhone does not have any shader extensions (Vertex or Fragment Programs) although if you look at Instruments while profiling on the device you can see that a flavour of the Vertex Program extension appears to be present! – Maybe they’ll expose it in a future update…

    I wrote my own shader language for UtopiaGL, which is really just a simple configuration script that allows you to parameterise the OpenGL render state. There are no conditionals, loops etc. Ultimately it executes by setting the GL state i.e. issuing calls to glEnable/Disable, and the various other state functions.

    If you haven’t done so already, hit imgtec.com and pull down the PVR OpenGL ES 1.1 SDK – it contains tutorials and examples that will introduce you to the various concepts. You can also check out the 2.0 SDK, which will allow you to play with real Vertex and Fragment programs – although you won’t be able to run those on the current iPhone of course.

  3. Dennis says:

    Interesting read!

    I’ve got a question: Without vertex programs, how do you renormalize the light vector in tangent space for use with dot3 bumpmapping? In the fixed function days I remember doing this with a renormalization cubemap, but it seems cubemaps are not supported on the iPhone(?)

  4. Dennis says:

    Sorry, I meant: without fragment programs…

  5. Kevin Doolan says:

    I haven’t looked into it yet. I have previously used two methods: Cube maps as you mentioned and an nvidia register combiner trick that approximated renormalization but didn’t require any additional maps (I can’t remember exactly how it worked, except that it was trivial – I found it in an nvidia paper, I think circa 2002). It required full blown nvidia style combiners though. I’m not sure if it’s doable on the iPhone.

    The absence of cubemaps really complicates the general case. You either bite the bullet and suffer the artifacts (which can be awful, depending on the geometry) or you try to invest in some kind of multi-pass hack to patch things up, which may kill performance.

    There are of course cases where it’s not an issue – flat, planar geometry like walls, together with parallel light. No renormalization is required here.

  6. Dennis says:

    I think I’ll stick to vertex lighting then. In the scenes I will be using (e.g. point lights close to large planar surfaces) the artifacts are unacceptable.

    I’m still interested in how to do the approximated renormalization trick with the combiners though. :)

  7. Kevin Doolan says:

    If you’re coming from PC development (with a nice big screen) it makes sense to sit back and re-evaluate exactly how to make use of the screen space available on the iPhone.

    I have found the normal mapping doesn’t provide the same kind of impact that it does on a large screen. That said, it always depends on the app. If you’re rendering with point lights and large planar surfaces you could subdivide them – this can be extremely effective if it’s done correctly.

    Re your interest in the renormalization trick: Since I mentioned it I have been trying to locate it, with no joy unfortunately. I’ll keep my eyes open (I’m sure it’s buried on the nvidia dev site somewhere). If I locate it I’ll post it here…

  8. Patrick says:

    Hi again Kevin,

    I was hoping to quiz you a little on your experiences with FBO effects. I was playing around with it a little the other day trying to see if I could do a simple bloom filter. Performance was okay, although it doesn’t look all that great. I was wondering what kind of optimizations you found and how exactly you were implementing a bloom filter. I just tried rendering to a FBO with an attached texture (fairly small, 128 x 128 or such like) and then drawing it with an additive blend over the rendered scene. What comes out looks interesting enough, although it’s fairly blocky, even with linear filtering. I was pondering trying to read the pixels out and doing a blur on the CPU or something, but that would be incredibly slow I think. I could try and generate mipmaps and draw each mipmap over each other, but a fullscreen draw is actually quite time intensive, never mind doing it several times over.

    Any thoughts?

  9. Kevin Doolan says:

    I’ve had pretty much the same experience re quality. My conclusion with the work I’ve done on this is: Less is More! If you reduce the overall intensity of the bloom, and keep it subtle, you can mask the blockiness somewhat.

    I also settled on 128×128 – that seems to be the sweet spot.

    I tried a few different things, namely rendering the scene to a 256×256 texture, then applying the bloom to that, and finally rendering the result to the screen – this obviously has pretty big quality implications (so it’ll depend largely on your app as to whether it is acceptable) but it is fast and may be useful in some scenarios.

    Re doing CPU based filtering – I haven’t tried it, but I’m not sure it will help all that much. I agree – I think it would just kill performance.

    Not having the Fragment Program extension puts a pretty low ceiling on what you can do here, I think. It’s still possible to pull off something that looks nice, but you have to approach it from an app-specific point of view. I have found that full scene bloom doesn’t look as good as a constrained approach where only some of the scene actually contributes to the bloom effect – so when you’re rendering the bloom effect, you render contributors normally (or as white), and render everything else black. That way only certain parts of the scene glow – this helps to contain the artifacts.

    My current app doesn’t use it (at least not right now), so I haven’t played around with that much.

  10. Patrick says:

    Ah well. I guess we’ll have to hope the next generation iPhone is OpenGL ES 2.0 :) Thanks!

  11. Gable says:

    Hi Kevin,

    it’s great to read your articles about OGL ES and iPhone dev. I would be interested in some raw power benchmarks from your side.. currently I’m displaying a mesh with ~8500 triangles, using vertex, texcoord, normal arrays and vertex indices. One draw cycle happens in between 25-35 msecs.. I’m curious where it fits regarding an ‘optimized’ scale. Should I consider this slow and work on improving it? I’m not doing any shader trick, I’m just interested now in pure triangle-rendering perf.


  12. purpledog says:


    I can’t get the FBO working on the iPhone. Basically the only thing I manage to render into is the main FBO, nothing else.

    Did you succeed in allocating and using a brand new FBO (let’s say, 128×128)?


  13. Kevin Doolan says:


    I haven’t spent a huge amount of time benchmarking – my initial optimizations were derived from past experience (mostly from PC and a little from BREW), the Apple/PVR docs and some brute-force testing. At the end of my current App’s dev cycle I’ll look at optimizing again if necessary – I’ll let the App tell me where to optimize. Right now I’m not Fill or TnL limited, so I’m not focused on the graphics side.

    8500 tris at 30ms sounds in the ballpark – I did a quick test with one of my scenes and bumped some of the procedural geometry meshes up to ~8500 – the performance dipped to 30fps, which is in line with what you have. I’m doing other stuff though so it’s not a controlled test. I’m also mostly pre-caching lighting.


    FBO works fine for me – I’ve tested up to 512×512, no problems.

  14. Simon says:

    Just as an FYI – to reference one xcode project within another, just drag the project icon (letter A on a blue background) from the first project (ProjA) to a directory within the ‘Groups and Files’ section of the second project (ProjB). Xcode will reference (or copy, at your option) the project and any targets in ProjA become available to ProjB (as does dependency tracking) as if they were in ProjB.

    I have a games-server daemon project, a network packet library project, a network transport library project, a game-daemon project and two game-client (CLI and GUI) projects all linked up together. Works like a charm.


  15. Kevin Doolan says:


    Excellent, I’ll give it a shot. Thanks very much for the tip! Perfect timing too – I’m about to finish one app and start another, using the same core engine and tools.

  16. Pingback says:

    Pingback from iphonedevelopmentbits.com: Comparison of Game Engines available for iPhone and iPod Touch Games Development
