Archive for the 'E15' Category

E15: Visual Effects and Texture Management

Saturday, March 22nd, 2008

picture-1.png

I’ve been spending a lot of time implementing E15’s texture management. We wanted a way to upload textures asynchronously (lazy loading) so that applying filters to the bitmaps doesn’t block the whole application. That means threading, and I went through some rough waters with OpenGL and Cocoa threads.

The design is pretty simple. Whenever a new texture needs to be created, the bitmap is first split into a series of tiles, which are then uploaded to the GPU with mipmaps. For images that require filtering (with Core Image filters), we create a new thread that creates CIImage instances and applies a series of CIFilters. When the image has been processed, the CIImage is converted back into a bitmap and sent to a singleton texture manager, which periodically uploads everything in its queue on the main thread. There were some confusing implementation details, which I’ll write down after I describe the result.

So far, I have implemented a couple of different effects. The top picture shows blurring: elements look blurry when viewed from far away, but come into focus as you move closer. Using this zoomable-user-interface idea, I can show different information on the same quad depending on how far away the camera is. Here are some captures showing various web pages.
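Here is a minimal C sketch of the kind of distance-based selection behind this effect. The post doesn’t say how E15 chooses a blur amount or which representation to show, so both functions, their names, and their parameters (near/far distances, maximum radius, the thresholds array) are hypothetical. The first maps camera distance linearly to a blur radius, and the second picks a detail level from a list of distance thresholds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: blur radius for a quad seen from `distance`.
 * Sharp (0) up to near_dist, max_radius beyond far_dist, linear in between. */
float blur_radius_for_distance(float distance, float near_dist,
                               float far_dist, float max_radius)
{
    assert(far_dist > near_dist);
    if (distance <= near_dist)
        return 0.0f;
    if (distance >= far_dist)
        return max_radius;
    return max_radius * (distance - near_dist) / (far_dist - near_dist);
}

/* Hypothetical: which representation of a quad to show. thresholds[]
 * holds increasing camera distances; level 0 is the closest, most
 * detailed version, and level `count` is used beyond the last threshold. */
size_t detail_level_for_distance(float distance, const float *thresholds,
                                 size_t count)
{
    size_t level = 0;
    while (level < count && distance > thresholds[level])
        level++;
    return level;
}
```

Because re-running Core Image filters is expensive, one plausible design is to filter each page once per level on a worker thread and simply switch textures when the level changes.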

picture-3.png

picture-4.png

picture-5.png

…and here’s the obligatory Facebook shot

picture-6.png

Now for some implementation notes. For starters, remember that all OpenGL calls need to run on the main thread. So we spawn a new thread for each texture we want to apply filters to, and have a singleton TextureUploader that uploads the finished textures on the main thread. The real pain is working out how the images should be constructed and processed on a secondary thread.

I tried a few different approaches, and most of them resulted in unexpected crashes. The one I settled on is the following. During awakeFromNib, a static CIContext is created from the main CGContextRef:

CGContextRef cgContext = [[NSGraphicsContext currentContext] graphicsPort];
_drawingContext = [[CIContext contextWithCGContext:cgContext options:nil] retain];

This is the CIContext that renders every CIImage into a CGImageRef. This approach seems to be thread-friendly.

To create and upload the filtered image lazily, the initial bitmap texture that gets uploaded is first converted to a CIImage. This image is sent to a new thread, which applies the CIFilters. The finished image is then converted to a CGImageRef and drawn into a bitmap. The bitmaps are sent to a singleton class, TextureUploader, which uploads textures every 500 ms if there are any pending uploads. To convert a CIImage to a CGImageRef, just use this:

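- (CGImageRef)cgImageCreateFromCIImage:(CIImage *)cimg fromRect:(CGRect)rect
{
    // createCGImage:fromRect: returns a new CGImageRef owned by the caller,
    // so release it with CGImageRelease() when you are done with it.
    CGImageRef cgImage = [_drawingContext createCGImage:cimg fromRect:rect];
    if (cgImage == NULL) {
        NSLog(@"failed to create cgImage");
    }
    return cgImage;
}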
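The TextureUploader itself isn’t shown here, so the following is a minimal C sketch of the same hand-off pattern using POSIX threads. Worker threads only enqueue finished bitmaps and make no OpenGL calls, and the main thread periodically takes everything pending in one locked step. UploadQueue, PendingUpload, and the upload_queue_* functions are hypothetical names. They are not E15’s code, which is an Objective-C singleton.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical: one finished bitmap waiting to be uploaded by the GL thread. */
typedef struct PendingUpload {
    void *pixels;                 /* bitmap produced by a filter thread */
    int width, height;
    struct PendingUpload *next;
} PendingUpload;

typedef struct {
    pthread_mutex_t lock;
    PendingUpload *head, *tail;   /* FIFO: oldest upload first */
} UploadQueue;

void upload_queue_init(UploadQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Called from worker threads. Makes no OpenGL calls; just appends.
 * Returns 1 on success, 0 if allocation fails. */
int upload_queue_push(UploadQueue *q, void *pixels, int width, int height)
{
    assert(q != NULL);
    PendingUpload *item = malloc(sizeof *item);
    if (item == NULL)
        return 0;
    item->pixels = pixels;
    item->width = width;
    item->height = height;
    item->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    pthread_mutex_unlock(&q->lock);
    return 1;
}

/* Called on the main thread (e.g. from a timer). Detaches every pending
 * upload at once and returns the list, or NULL if nothing is pending.
 * The caller uploads each bitmap with OpenGL and frees the nodes. */
PendingUpload *upload_queue_take_all(UploadQueue *q)
{
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    PendingUpload *list = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return list;
}
```

In E15’s case the timer would fire every 500 ms. The main thread walks the returned list, runs the OpenGL upload code for each bitmap, and frees each node. Holding the lock only long enough to detach the list means filter threads never wait on GL work.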

This all seems simple, but it took a lot of trial and error. I hate threads…

Two OpenGL Optimizations

Wednesday, February 20th, 2008

Over the last week, I’ve been working on optimizing E15 for speed and efficiency. So far I’ve implemented two ways to increase performance: texture tiling and frustum culling. I’ll start with texture tiling.

Texture Tiling

We were using non-power-of-two (NPOT) textures in E15, since they are supported and they make everything easy: the source images for our textures don’t necessarily come in power-of-two (POT) dimensions. For most images this is fine, since they are small and manageable, and with OpenGL 2.1 most things work with NPOT textures. Performance issues arise with large textures, which in our case means rendered web pages. When web pages get turned into bitmaps, they become huge; blogs are especially large, easily reaching 15,000 pixels in height. Of course, textures that large are too big for OpenGL, so we decided to go back to POT and tile the images, subdividing each one into multiple textures applied to multiple quads.

Going back to POT was a good move, since on the ATI X1900 hardware mipmap generation seems to be supported only for POT textures (which is why we were originally using gluBuild2DMipmaps). Implementing this was relatively straightforward. Here’s what needs to be done:

  1. Obtain the source image, then create a new, blank image whose dimensions are rounded up to the next largest POT.
  2. Draw the original image onto the new image.
  3. Create textures by subdividing the new image into tiles of a predefined size.
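To make the padding arithmetic concrete, here is a small C sketch; the function names are hypothetical. Note that the method below actually pads each dimension up to a multiple of the tile size rather than to the next power of two overall. Since the tile size is itself a power of two (256 here), every tile texture is still POT. For a 1000×15000 page with 256-pixel tiles, that means padding to 1024×15104 (4 × 59 = 236 tiles), whereas rounding the whole page up to the next POT would give 1024×16384.

```c
#include <assert.h>

/* Smallest power of two >= n. */
unsigned next_pow2(unsigned n)
{
    assert(n >= 1 && n <= 0x80000000u);  /* result must fit in 32 bits */
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Number of tile_size tiles needed to cover n pixels (rounding up). */
unsigned tiles_needed(unsigned n, unsigned tile_size)
{
    assert(tile_size > 0);
    return (n + tile_size - 1) / tile_size;
}

/* n padded up to a whole number of tiles, which is what the
 * ceil()-based spacing computation in the method below produces. */
unsigned padded_to_tiles(unsigned n, unsigned tile_size)
{
    return tiles_needed(n, tile_size) * tile_size;
}
```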

All images are supplied as CGImageRefs, so I implemented a new method that carries out the steps above. It is pretty simple: you pass in a CGImageRef and a tile size, and it returns an array of OpenGL texture IDs.

- (GLuint *)createTiledTexturesFromCGImage:(CGImageRef)cgImage
                 tileSize:(int)newTileSize
{
  GLuint *textureNames = NULL;  // returned as NULL if cgImage is NULL
  if(cgImage) {
    float image_w = CGImageGetWidth(cgImage);
    float image_h = CGImageGetHeight(cgImage);
    float remain_x = image_w/newTileSize;
    float remain_y = image_h/newTileSize;
    float spacing_w = (ceil(remain_x)-remain_x)*(float)newTileSize;
    float spacing_h = (ceil(remain_y)-remain_y)*(float)newTileSize;
    float width = image_w + spacing_w;
    float height = image_h + spacing_h;

    void* tData = calloc(width * 4, height);
    CGRect rect = CGRectMake(0, spacing_h, image_w, image_h);
    CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
    CGContextRef myBitmapContext = CGBitmapContextCreate(
     tData, width, height, 8, width*4, color_space,
     kCGImageAlphaPremultipliedFirst);
    CGContextDrawImage(myBitmapContext, rect, cgImage);

    int perWidth = (int)ceil(width/newTileSize);
    int perHeight = (int)ceil(height/newTileSize);

    int numTextureNames = perWidth*perHeight;
    textureNames = malloc(sizeof(GLuint)*numTextureNames);

    textureType = GL_TEXTURE_2D;
    glEnable(textureType);
    glGenTextures(numTextureNames, textureNames);

    //backup default pixel store state
    glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);

    //setup bitmap attributes
    glPixelStorei(GL_UNPACK_ROW_LENGTH, width);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);

    int onY;
    for(onY = 0; onY < perHeight; onY++) {
      int onX;
      for(onX = 0; onX < perWidth; onX++) {
        int onTexture = onY*perWidth + onX;

        //setup offsets
        int x = onX*newTileSize;
        int y = onY*newTileSize;

        //setup extents
        int dx = MINOF2(width-x, newTileSize);
        int dy = MINOF2(height-y, newTileSize);

        //skip to x,y for read from bitmap
        glPixelStorei(GL_UNPACK_SKIP_PIXELS, x);
        glPixelStorei(GL_UNPACK_SKIP_ROWS, y);

        glBindTexture(textureType, textureNames[onTexture]);

        glTexParameteri(textureType, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
        glTexParameteri(textureType, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(textureType, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(textureType, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
        glTexParameteri(textureType, GL_TEXTURE_BASE_LEVEL, 0);
        glTexParameteri(textureType, GL_TEXTURE_MAX_LEVEL, 4);
        glTexParameteri(textureType, GL_TEXTURE_MIN_LOD, 0);
        glTexParameteri(textureType, GL_TEXTURE_MAX_LOD, 4);

        glTexParameteri(textureType, GL_GENERATE_MIPMAP, GL_TRUE);
        glTexImage2D(textureType, 0, GL_RGBA, newTileSize,
             newTileSize, 0, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8, NULL);
        glTexSubImage2D(textureType, 0, 0, 0, dx, dy, GL_BGRA,
                                                GL_UNSIGNED_INT_8_8_8_8, tData);
      }
    }

    //restore default pixel store state
    glPopClientAttrib();

    glDisable(textureType);

    // release
    CGColorSpaceRelease(color_space);
    CGContextRelease(myBitmapContext);
    free(tData);
  }
  return textureNames;
}

We call glTexImage2D with NULL data to allocate each full-size tile, then use glTexSubImage2D to fill in a dx × dy region, to account for the images at the edges. I’m not sure that was necessary. Now all we need to do is iterate through the textures and create the necessary quads in our scene. Initially, I made each quad the size of its texture (the tile size), but the quads were often too big, which caused rendering quirks where quads overlapped. The solution is to size the quads so that together they match the original image. Edge tiles therefore don’t get square quads: each one is only as large as needed to show its part of the original image, with its texture coordinates scaled to match. Here’s a code snippet from the scene:

unsigned i = 0;   // index into dob.textureIds, row by row
float x, y;

// y runs downward from the top edge of the page: 0, -textureSize, ...
for (y = 0; y > -dob.h; y -= textureSize) {
  float dy = MINOF2(dob.h + y, textureSize);   // height of this row of tiles
  glPushMatrix();
  glTranslatef(0, 2*y/mapScaler, 0);
  for (x = 0; x < dob.w; x += textureSize) {
    glPushMatrix();
    glTranslatef(2*x/mapScaler, 0, 0);
    if (dob.textureIds[i]) {
      if (renderMode == GL_SELECT) {
        glLoadName(j);   // j: this object's selection name (assumed to be set by the enclosing loop)
      }
      float dx = MINOF2(dob.w - x, textureSize);   // width of this tile
      glBindTexture(GL_TEXTURE_2D, dob.textureIds[i]);
      glBegin(GL_QUADS);
        //Page textures are flipped. Compensate for that.
        glTexCoord2f(0.0f, 0.0f);
        glVertex3f(0.0f, 0.0f, 0.0f);
        glTexCoord2f(dx/textureSize, 0.0f);
        glVertex3f(2*dx/mapScaler, 0.0f, 0.0f);
        glTexCoord2f(dx/textureSize, dy/textureSize);
        glVertex3f(2*dx/mapScaler, -2*dy/mapScaler, 0.0f);
        glTexCoord2f(0.0f, dy/textureSize);
        glVertex3f(0.0f, -2*dy/mapScaler, 0.0f);
      glEnd();
    }
    glPopMatrix();   // always balance the glPushMatrix above, even for empty tiles
    i++;
  }
  glPopMatrix();
}
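The per-tile arithmetic in the loop above can be pulled out into a small standalone C function, which makes the edge cases easy to check. This is a sketch: tile_quad and TileQuad are hypothetical names, and the scene code additionally scales the extents by 2/mapScaler. Interior tiles get full quads with texture coordinates running to 1.0. Tiles in the last column or row cover only the leftover pixels, and their texture coordinates stop short of 1.0, so the padding is never drawn.

```c
#include <assert.h>

/* Hypothetical: pixel extent and far-corner texture coordinates for the
 * tile at (col, row) of an image_w x image_h image split into
 * tile_size x tile_size textures. */
typedef struct {
    float width, height;  /* pixels of the original image this tile shows */
    float s_max, t_max;   /* texture coordinates of the far corner */
} TileQuad;

TileQuad tile_quad(float image_w, float image_h, float tile_size,
                   int col, int row)
{
    assert(tile_size > 0.0f);
    float x = col * tile_size;
    float y = row * tile_size;
    assert(x < image_w && y < image_h);   /* tile must overlap the image */
    TileQuad q;
    q.width  = (image_w - x < tile_size) ? image_w - x : tile_size;
    q.height = (image_h - y < tile_size) ? image_h - y : tile_size;
    q.s_max  = q.width / tile_size;
    q.t_max  = q.height / tile_size;
    return q;
}
```

For a 1000×600 image with 256-pixel tiles, tile (0, 0) is a full 256×256 quad, while the corner tile (3, 2) is only 232×88 pixels, with texture coordinates ending at 0.90625 and 0.34375.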

Now we can handle large sites, since we’re only ever rendering 256×256 textures.

Frustum Culling

Implementing frustum culling was pretty straightforward. My only problem was that I was applying my matrix transformations in the wrong order. Remember, matrix multiplication doesn’t commute! This is a good article that you can follow to implement it. Once it was in place, the performance boost was noticeable for examples using lots of large textures. Next, we need to work on a texture manager that does manual mipmapping, so we can display different images at different camera positions.
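For reference, here is a minimal C sketch of one common way to implement it. It extracts the six clipping planes from the combined clip matrix (the Gribb/Hartmann method) and tests each tile’s bounding box against them. This isn’t necessarily the approach in the article mentioned above or in E15, and the function names are hypothetical. Matrices are column-major, as returned by glGetFloatv(GL_PROJECTION_MATRIX, ...) and glGetFloatv(GL_MODELVIEW_MATRIX, ...). Order matters: the clip matrix is projection × modelview, and swapping the operands gives a different matrix.

```c
#include <assert.h>

/* 4x4 matrices are column-major: element (row, col) is m[col * 4 + row]. */

/* out = a * b. out must not alias a or b. */
void mat4_mul(const float a[16], const float b[16], float out[16])
{
    assert(out != a && out != b);
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}

/* Extract planes (a, b, c, d) from clip = projection * modelview, in the
 * order left, right, bottom, top, near, far. A point p is on the inside of
 * a plane when a*p.x + b*p.y + c*p.z + d >= 0. The planes are not
 * normalized, which is fine for the sign-only box test below. */
void frustum_planes(const float clip[16], float planes[6][4])
{
    for (int axis = 0; axis < 3; axis++) {      /* 0: x, 1: y, 2: z */
        for (int col = 0; col < 4; col++) {
            float w = clip[col * 4 + 3];        /* row 3 */
            float v = clip[col * 4 + axis];     /* row `axis` */
            planes[2 * axis][col]     = w + v;
            planes[2 * axis + 1][col] = w - v;
        }
    }
}

/* Returns 0 if the axis-aligned box [lo, hi] lies entirely outside some
 * plane, 1 otherwise. Conservative: a few boxes near the frustum's
 * corners are reported visible even though they are not. */
int box_in_frustum(float planes[6][4], const float lo[3], const float hi[3])
{
    for (int p = 0; p < 6; p++) {
        const float *pl = planes[p];
        /* The corner farthest along the plane normal; if even it is
         * outside, the whole box is. */
        float x = pl[0] >= 0.0f ? hi[0] : lo[0];
        float y = pl[1] >= 0.0f ? hi[1] : lo[1];
        float z = pl[2] >= 0.0f ? hi[2] : lo[2];
        if (pl[0] * x + pl[1] * y + pl[2] * z + pl[3] < 0.0f)
            return 0;
    }
    return 1;
}
```

In the scene, you would compute the clip matrix once per frame. Then you skip any tile quad whose bounding box, expressed in the same space the modelview matrix transforms from, fails box_in_frustum.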

Building a Console for E15

Saturday, February 16th, 2008

console.png

This week I worked on implementing a console for E15 that can display the contents of stdout while also accepting stdin. Until now, we have all been using the Xcode console, but if we want to distribute a binary, users are going to need a console of their own. The task wasn’t as trivial as I initially thought, and the result is a bit of a hack, but I thought I’d write it up anyway.

The console is just a simple NSTextView contained in an NSDrawer attached to the bottom of the OpenGLView. The first thing to do is to print the program’s output into that view. That’s easy enough: we just append the contents to the console’s NSTextStorage. This seemed to work fine, but the application would hang if we wrote to the console many times in quick succession. Something like:

for i in range(1000):
  print(i)

would hang at some arbitrary point. For a while I was convinced the problem was in appending strings to the text view, but even after trying every way of appending strings I could think of, it still didn’t work. In the end, the problem was threading. The Python interpreter runs on a secondary thread, and I had assumed it was fine to modify the GUI from a secondary thread, since drawing to a view from a secondary thread was safe. Anyway, as soon as I used

-[NSObject performSelectorOnMainThread:withObject:waitUntilDone:]

to write the contents, everything was good to go. It would be nice to know exactly which things must run on the main thread in Cocoa, but Apple’s documentation is pretty bad about that. So, lesson learned: when something barfs unexpectedly, try running it on the main thread!

The other part of the console is accepting user input and piping it to the Python interpreter’s stdin. This matters because sometimes you want to read input from the user with input() or raw_input(). Initially I thought NSTask and NSPipe would be the best approach, but that would have required running the Python interpreter as a subprocess. Kyle had already worked out all the details of Python, threading, and saving state, and I didn’t want to mess with that. I knew that in Python you can set any file object as sys.stdin, sys.stderr, or sys.stdout. So this is a total hack, and I don’t know if there is a nicer way to go about it, but the idea is to grab the contents of the last line of the console when the user presses Return or Enter, then write them to a temporary file.
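The Cocoa side of this hack isn’t shown, so here is a rough C sketch of its two steps, with the NSTextView plumbing omitted. The first step pulls the last line out of the console text. The second writes it to the file that console_input() polls (/tmp/STDIN.txt). Writing to a temporary file and renaming it over the target is an extra safeguard, not part of the approach described here. On POSIX systems such as Mac OS X, rename() replaces the file atomically, so the polling reader never sees a half-written line. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the last line of the console text (ignoring one trailing newline)
 * into out, truncated to fit out_size - 1 characters. Returns its length. */
size_t console_last_line(const char *text, char *out, size_t out_size)
{
    assert(out_size > 0);
    size_t end = strlen(text);
    if (end > 0 && text[end - 1] == '\n')
        end--;
    size_t start = end;
    while (start > 0 && text[start - 1] != '\n')
        start--;
    size_t len = end - start;
    if (len > out_size - 1)
        len = out_size - 1;
    memcpy(out, text + start, len);
    out[len] = '\0';
    return len;
}

/* Write `line` to `path` via "<path>.tmp" and rename(), so a reader
 * polling `path` only ever sees a complete line. Returns 0 on success. */
int write_stdin_file(const char *path, const char *line)
{
    char tmp[1024];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", line) >= 0;
    ok = (fclose(f) == 0) && ok;      /* fclose always runs */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```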

On the Python end, we would have to redefine raw_input(). Instead, I wrote a new function, console_input(), defined like this:

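import sys
import time

def console_input(message, magic_string='Yes'):
    print(message)
    # Reset the shared file so a stale answer isn't read back.
    with open("/tmp/STDIN.txt", "w") as f:
        f.write("# STDIN #")
    saved_state = sys.stdin
    sys.stdin = open("/tmp/STDIN.txt")
    line = sys.stdin.readline().strip()
    prev_result = line
    first_time = True
    # Poll the file (rewritten by the console on Return) until the user
    # types the magic string, compared case-insensitively.
    while line.upper() != magic_string.upper():
        if prev_result != line and not first_time:
            print('Please answer ' + magic_string)
            prev_result = line
        first_time = False
        sys.stdin.close()
        time.sleep(0.1)  # don't spin the CPU while waiting
        sys.stdin = open("/tmp/STDIN.txt")
        line = sys.stdin.readline().strip()
    sys.stdin.close()
    sys.stdin = saved_state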

So the function prints a message to the console, then waits for the user to enter magic_string; once they do, it returns. Ghetto, I know… but it works, and I can’t come up with a better solution.