Fading Memories

About

Ramblings about books and other things that will soon fade from my memory.

Boudewijn Rempt

index | rss1.0

There's more...

Creative Commons License
The original artwork is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.

Roundabout through identi.ca

    follow me on Identi.ca

    Categories, too

    Find


    Archives

    Other things here at valdyas.org

    2006-10-01

    Getting at pixels

    As discussed in my blog on graphics libraries, common schemes for storing large images are tiles, scanlines and blocks of memory as big as the width * height * size_of_pixel. Tiles are the most common, though. Photoshop, JAI, Krita, GIMP, GEGL, MosfetPaint -- these all break up the image in small squares.

    The big difference is in how the application or plugin developer accesses the data in the tiles.


    Tiles

    The easy way out for the application developer is to simply provide access to the tiles themselves. Photoshop does this, or at least, that was the case around Photoshop 6, the last version for which the sdk was freely available. To quote some of the comments (in the hope that this constitutes fair use) from the sample Dissolve filter:

    ...
        // we only need a tile per plane plus the maskData
        // inTileHeight and inTileWidth are invalid at this
        // point. Assume the tile size is 256 max.
    ...
        // round up to the nearest horizontal and vertical tile count
    ...
        // loop through each tile making sure we don't go over the bounds
        // of the rectHeight or rectWidth
    ...
    

    As you can see, this puts considerable strain on the plugin developer. It has the advantage that only one tile needs to be in memory at a time, or perhaps two when needing to go look over the fence: this minimizes memory requirements and makes it easy to develop multi-threaded applications that take advantage of many-core machines. (This is where Vips shines, even though it doesn't use tiles but mmaps the whole image.).

    Tiles & pseudo-scanlines

    The GIMP uses a similar technique as Photoshop, but couples it with pseudo scanlines: if we look at the bumpmap.c filter plugin, we see code to compute the number tiles to get one row, cache those tiles, and then code to retrieve the pixels in that row. This takes, obviously, more memory than the one tile at a time approach of Photoshop, but makes life a little easier, and looping through the pixels may well be faster:

      /* Set the tile cache size */
      /* Compute number of tiles needed for one row of the drawable */
      drawable_tiles_per_row =
        1
        + (sel_x2 + gimp_tile_width () - 1) / gimp_tile_width ()
        - sel_x1 / gimp_tile_width ();
    
      /* Compute number of tiles needed for one row of the bitmap */
      bm_tiles_per_row = (bm_width + gimp_tile_width () - 1) / gimp_tile_width ();
    
      /* Cache one row of source, destination and bitmap */
      gimp_tile_cache_ntiles (bm_tiles_per_row + 2 * drawable_tiles_per_row);
      
      /* Initialize row buffers */
      bm_row1 = g_new (guchar, bm_width * bm_bpp);
      
      /* Initialize pixel regions */
      gimp_pixel_rgn_init (&src_rgn, drawable,
                           sel_x1, sel_y1, sel_width, sel_height, FALSE, FALSE);
    

    Cobbling

    JAI -- the Java Advanced Imaging API works with tiles, too, but advocates the process of "cobbling", that is, to quote Bob Deen:

    Cobbling is simply the process of pulling together data from multiple tiles to make one big "tile".
    For example, a point operation would not need to do this, because to compute a tile you need just the corresponding input tile. However, for a convolution, each pixel needs a bunch of neighbors from the input image. Thus, computing everything in a tile requires some data from the surrounding tiles in order to correctly compute the edge pixels.
    Cobbling pulls all that data together into a single Raster for you, making it easier for the computeRect to do its work. Without cobbling, the method would have to look at multiple source tiles individually, which greatly complicates most algorithms.

    This is essentially an extension of the pseudo-scanline method: instead of just copying one line of data, you copy all data in the rectangle you are working with in one memory area. Krita provides the same possibility through the readBytes() and writeBytes functions. We'd like to deprecate those functions though. The big disadvantage of this technique is the possible insane memory requirements.

    Krita -- iterators

    Three years ago, Krita was still at the here-are-the-tiles-have-fun stage. Code to composite two layers together was fast, but completely impenetrable, and it was very hard to write filters, although probably pretty easy to convert Photoshop or GIMP filters.

    To state the problem in a generic way: we want to store our image in tiles (so we can swap unused tiles to disk, don't need a big contiguous chunk of memory for every layer and so we can easily store old tiles as undo information), but we don't want to have to reckon with tile boundaries when doing work. So, we need a generic, opaque (that is, you don't see the tiles) way to get at pixels, and preferably a fast way.

    Since we're writing in C++ iterators, defined as "a generalization of pointers", suggested themselves. I made the suggestion in 2004, Patrick Julien, the then-maintainer didn't object, Cyrille provided the first implementation, Casper improved it and Cyrille made using the iterators faster and provided more types of iterators.

    We now have a rich set of iterators, divided into two types: iterators that return a KisPixel object, and iterators that return a pointer to a chunk of bytes that represent one or more pixels. The latter kind is used together with our colorspace implementation. Colorspaces provide a rich set of algorithms that work on just a pointer to an array of bytes that the colorspace know how to interpret as pixels. Had we gone the STL way, we would have passed iterators, not pointers to the colorspaces.

    Krita iterators can return the current value of a pixel, the value of a pixel at a certain checkpoint and the value of the selection mask, if present, and the number of pixels that are stored consecutively in memory after the current pixel -- important for optimization

    These are the Krita iterators:

    • KisHLineIterator: iterate over pointers to pixels in rows. nextRow skips to the next row in the specified rect.
    • KisVLineIterator: iterate over pointers to pixels in columns, nextCol skips to the next column in the specified rect.
    • KisRectIterator: iterate over pointers to pixels in a rect we don't guarantee anything about the location of the next pixel you get, just that it's in the rectangle and that at the end, you've seen all pixels. This is because this iterator might be optimized to return the pixels per tile, to minimize tile faults.
    • KisRandomAccessor: allow random access to pointers to pixels by x,y coordinate. Much faster than calling getPixel on KisPaintDevice.
    • KisHLineIteratorPixel: iterate over pixels in rows.
    • KisVLineIteratorPixel: iterate over pixels in columns
    • KisRectIteratorPixel: iterate over pixels in a rectangle.
    • KisRandomAccessorPixel: allow random access to pixels by x,y coordinate. Much faster than calling getPixel on KisPaintDevice.

    However, we haven't completely followed the STL lead: the STL idiom is to create algorithm functions that take iterators as parameters. What we've done is create functions that take a KisPaintDevice (or two, or three) and inside the body create the iterators it needs:

    void KisFilterInvert::process(KisPaintDeviceSP src, KisPaintDeviceSP dst, const QRect& rect)
    {
        // XXX: Don't use rect iterators here: if the tiles of the two paint devices are not
        // aligned, we don't guarantee that the src pixel will be copied to the correct place
        // in the dst paint device.
        KisRectIteratorPixel dstIt = dst->createRectIterator(rect.x(), rect.y(), rect.width(), rect.height(), true );
        KisRectIteratorPixel srcIt = src->createRectIterator(rect.x(), rect.y(), rect.width(), rect.height(), false);
    
        int pixelsProcessed = 0;
    
        KisColorSpace * cs = src->colorSpace();
        Q_INT32 psize = cs->pixelSize();
    
        while( ! srcIt.isDone() )
        {
            if(srcIt.isSelected())
            {
                if (src!=dst)
                    memcpy(dstIt.rawData(), srcIt.oldRawData(), psize);
    
                cs->invertColor( dstIt.rawData(), 1);
            }
            ++srcIt;
            ++dstIt;
        }
    }
    

    Is this good or bad? I don't know. It fitted best with Krita's architecture at the time, because we already passed paint devices (encased in smart pointers) around, and we certainly didn't want to rework everything again. It has made development of plugins for Krita really, really easy. But there's a performance hit.

    Vigra

    Ullrich Köthe was the main reason I thought iterators were a viable option. In his thesis Generische Programmierung für die Bildverarbeitung, he already warned about the performance loss. Especially page 7 of STL-Style Generic Programming with Images is interesting.

    Köthe's Vigra library was a serious candidate for use in Krita at that time, despite being meant mainly for research in computer vision. The krita example above inverts an image: the Vigra equivalent looks like this:

    vigra::BRGBImage in(info.width(), info.height());
    vigra::BRGBImage out(info.width(), info.height());
               
    importImage(info, destImage(in)); // Reads the image from a file
                
    vigra::RGBValue offset(-255, -255, -255);
                
    // create a negative image by applying the expression
    //       newvalue = -1 * (oldvalue + RGBValue(-255, -255, -255))
    // to each pixel
    transformImage(srcImageRange(in), destImage(out),
                    vigra::linearIntensityTransform(-1, offset));
    

    Vigra iterators do not handle tiled images, as far as I can ascertain. This follows naturally from the goal of the library, but the library is designed to make it possible for Vigra's algorithms to work with a tiled datastructure by re-implementing the default iterators and accessors.

    Vigra has a number of very fun iterators:

    • ColumnIterator: Iterator adapter to linearly access ccolumns
    • CrackContourCirculato: Circulator that walks around a given region.
    • LineIterator: Iterator adapter to iterate along an arbitrary line on the image.
    • NeighborhoodCirculator: Circulator that walks around a given location in a given image.
    • RestrictedNeighborhoodCirculator: Circulator that walks around a given location in a given image, uusinga restricted neighborhood.
    • RowIterator: Iterator adapter to linearly access row.

    One disadvantage of Vigra, as far as I can tell, is that it is parametrized at compile time: that means that the algorithms and iterators are completely generic, as long as you know beforehand what colormodel and type of image you're going to mess with.

    GIL

    Adobe's GIL is relatively new: it proposes to do something similar to Vigra, namely provide a set of algorithms, colorspaces and iterators that make working with images an STL-like experience. GIL makes heavily use of Boost, which means that it's C++ can sometimes be a little hard to follow. And the apidox are a bit scarce. I will probably make a lot of mistakes in the following examination.

    The GIL tutorial starts out showing compile-time parametrized handling of images. Gil provides "views" on images -- temporary rectangles that can provide iterators over the pixels it refers to. The iterators themselves are little more than pointers, at least in the simplest use-cases:

    void x_gradient(const gray8c_view_t& src,
                    const gray8s_view_t& dst)
    {
        assert(src.dimensions() == dst.dimensions());
        for (int y = 0; y < src.height(); ++y) {
            gray8c_view_t::x_iterator src_it = src.row_begin(y);
            gray8s_view_t::x_iterator dst_it = dst.row_begin(y);
            for (int x = 1; x < src.width() - 1; ++x) {
                dst_it[x] = (src_it[x + 1] - src_it[x - 1]) / 2;
            }
        }
    }
    
    void ComputeXGradientGray8(const unsigned char* src_pixels,
                               ptrdiff_t src_row_bytes,
                               int w, int h,
                               signed short* dst_pixels,
                               ptrdiff_t dst_row_bytes)
    {
        gray8c_view_t src =
            interleaved_view(w, h,
                             (const gray8_pixel_t*)src_pixels,
                             src_row_bytes);
        gray8s_view_t dst =
            interleaved_view(w, h,
                             (gray8s_pixel_t*)dst_pixels,
                             dst_row_bytes);
        x_gradient(src,dst);
    }
    

    The ComputeXGradientGray8 binds the view object to the pixels (that need to be in big block in memory) and then calls the x_gradient methods with those views. This is comparable to Krita, apart from the fact that KisPaintDevice isn't a view, but the actual owner of the pixels. Inside the x_gradient function, the iterators are created. Again, that is similar to Krita in technique. What is different, is that our iterators do not care about the type of image they iterate over.

    However, Gil allows templating those parameters, so it is possible to create a generic algorithm, and since you can simply provide classes that conform to the Gil requirements, you can use your own colorspaces, image views and so on. Whether or not a class is conformant is determined using the boost::concept_check. A generic version of the above code would be:

    template 
    void x_gradient(const S_VIEW& src, const D_VIEW& dst) {
        for (int y=0; y < src.height(); ++y) {
            typename S_VIEW::x_iterator src_it = src.row_begin(y);
            typename D_VIEW::x_iterator dst_it = dst.row_begin(y);
    
            for (int x=1; x < src.width()-1; ++x)
                for (int c=0; c < S_VIEW::num_channels; ++c)
                    dst_it[x][c] = (src_it[x-1][c]- src_it[x+1][c])/2;
        }
    }
    

    The loop that iterates over the channels can also be templated out, which is nice because such loops typically are performance problems, unless unrolled, which is possible because the number of channels is known at compile time.

    However, Gil also supports writing generic operations that work with run-time parametrized images, like a new colorspace or a new pixel layout. For that, Gil provides any_image and any_image_view. Using this makes the code appreciably more complex, but there should be little performance loss.

    any_image takes a boost mpl vector of known image types:

    
    typedef mpl::vector<gray8_image_t, gray16_image_t, rgb8_image_t, rgb16_image_t> my_img_types;
    any_image<my_img_types> runtime_image;
    jpeg_read_image("input.jpg", runtime_image);
    

    I'm afraid I don't understand the example code for the generic algorithm well enough to be able to represent it here. I probably need a good book on Boost and STL...

    GEGL

    Pippin regularly regales us with tales about his babl, gggl and GEGL projects. Very instructive, but sometimes a little frustrating. GEGL basically wants image operator authors (and image operators include everything, even a layer in an image op in gegl) to work on just an chunk of bytes. GEGL provides the pointer and the length, the op then does its worst with it:

    static gboolean
    process (GEGLOperation *op,
             void          *in_buf,
             void          *out_buf,
             glong          samples) 
    {
      gint i;
      gfloat *in  = in_buf;
      gfloat *out = out_buf;
    
      for (i=0; i<samples; i++)
        {
          int  j;
          for (j=0; j<3; j++)
            {
              gfloat c;
              c = in[j];
              c = 1.0 - c;
              out[j] = c;
            }
          out[3]=in[3];
          in += 4;
          out+= 4;
        }
      return TRUE;
    }
    

    This is pretty simple. For point and composition operations, the library itself just shoves pointers to pixels into the method. Higher level operations get a GEGLBuffer. GEGL operations advertise which colorspace/channelsize they want to work on, and the input is converted before feeding into the operation. This means that plugin authors don't have to worry about tiles, and that operations are optimized for a particular colorspace.

    The GEGL architecture is heavily based on Michael Shantzis' A Model for Efficient and Flexible Image Computing, which, unfortunately, is only available from the ACM.

    Conclusion

    Interesting problems I haven't touched upon are things like network transparency, multi-processing, multi-threading, memory requirements, pyramids and so on -- it's all been just about getting at that pixel without undue pain. In the end, I think that the Krita team hasn't been doing badly at all: we tend to think of ourselves as amateurs, and we know that there's a lot of improvement possible and necessary. But I think that at the same time, we've got the most generic image library that makes it easiest to get at the pixels and do horrible things to the poor fellers.