*This is version 2 of bopti, included in the first fx-CG 50-compatible version of gint, version 2.0.*

## Introduction

The bitmap drawing module, *bopti*, renders images using direct bitwise operations on video RAM (vram) longwords. This method makes extensive use of the 4-alignment of gint's vram to operate on 32 pixels at a time and avoid costly single-bit operations.

In gint's development workflow, images in common formats are first converted to the *bopti* format at compile-time. The *bopti* format is designed for fast rendering: it consists of one or several monochrome bitmaps called *layers*, and a piece of assembler code that uses the layers' data to render the image.

## Performance (TODO)

Probably about 15 times as fast as MonochromeLib.

## Color profiles

When converting an image, *fxconv* first quantizes the colors by mapping transparent pixels to `alpha` and every other pixel to the closest of these four colors:

| Color name | Hexadecimal |
| ---------- | ----------- |
| `black`    | `#000000`   |
| `dark`     | `#555555`   |
| `light`    | `#aaaaaa`   |
| `white`    | `#ffffff`   |

Then the image is assigned the smallest color profile that can represent all the colors:

| Profile      | Supported colors                           |
| ------------ | ------------------------------------------ |
| `mono`       | `black`, `white`                           |
| `mono_alpha` | `black`, `white`, `alpha`                  |
| `gray`       | `black`, `white`, `light`, `dark`          |
| `gray_alpha` | `black`, `white`, `light`, `dark`, `alpha` |

## Layers

Each profile is associated with a certain number of *layers* and their *blitting methods*. During rendering, all of the layers are blit in order to produce the image. The number of layers in a profile is always minimal: it is $`\lceil \log_2 n \rceil`$ where $`n`$ is the number of supported colors.

On fx-9860G II, the vram is either monochrome or 4-color gray, so pixel colors can only take 2 or 4 different values.
This makes logical operations the natural way to implement blitting methods, because they extend effortlessly to several pixels at once.

The current version of *bopti* uses the following types of layers:

| Layer name  | Category   | Effect for 0-bits   | Effect for 1-bits      |
| ----------- | ---------- | ------------------- | ---------------------- |
| `fill`      | Monochrome | Paints white        | Paints black           |
| `white`     | Monochrome | -                   | Paints white           |
| `black`     | Monochrome | -                   | Paints black           |
| `lfill`     | Gray       | Clears light vram   | Paints light vram      |
| `dfill`     | Gray       | Clears dark vram    | Paints dark vram       |
| `light`     | Gray       | -                   | Paints light gray      |
| `dark`      | Gray       | -                   | Paints dark gray       |

When performing an operation, *bopti* takes mask data from the encoded image and applies bitwise operations for all layers. It then moves to a different part of the image. The previous version of *bopti* applied each layer independently, but the current version applies them all at once, saving a large amount of time.

Note that most functions do nothing on 0-bits; this is an optimization related to *rectangle masks*. When a VRAM longword is loaded to a register, the blitted image will often not cover all the bits. The pixels that must be preserved are represented in a structure called a rectangle mask. Having this neutral 0-bit makes it simple to preserve relevant pixels while drawing the image. See later for more details.

Here is the relationship between color profiles and their layers:

* The `mono` profile only has a `fill` layer.
* The `mono_alpha` profile starts with a `white` layer to clear the non-transparent region of the image, then blits a `black` layer to render the content.
* The `gray` profile has an `lfill` and a `dfill` layer. These two types of layer act on different VRAMs.
* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs, then adds a `light` layer and a `dark` layer.

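To make the layer effects and the rectangle mask more concrete, here is a minimal C sketch of how one 32-pixel longword could be blitted for the two monochrome profiles. It is an illustration only, not bopti's actual assembler code: the function names are hypothetical, and it assumes that the mask has a 1-bit for every pixel the image is allowed to modify (the real structure may use the opposite polarity). As in the rest of this page, a 1-bit in vram is a black pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: blit one `fill` longword (mono profile).
   0-bits paint white and 1-bits paint black, so masked pixels are simply
   replaced. `mask` has a 1-bit for each pixel that may be modified
   (assumed polarity). */
static uint32_t blit_mono(uint32_t vram, uint32_t fill, uint32_t mask)
{
    return (vram & ~mask) | (fill & mask);
}

/* Hypothetical helper: blit one position of a mono_alpha image.
   The `white` layer clears the opaque pixels, then the `black` layer
   paints the black ones. 0-bits are neutral in both layers, so ANDing the
   layer data with the mask is enough to preserve pixels outside of it. */
static uint32_t blit_mono_alpha(uint32_t vram, uint32_t white, uint32_t black,
    uint32_t mask)
{
    vram &= ~(white & mask);
    vram |= (black & mask);
    return vram;
}

/* Small usage example on an all-black longword */
static void blit_example(void)
{
    /* Only the middle 16 pixels are replaced by the fill data */
    assert(blit_mono(0xFFFF0000u, 0x0F0F0F0Fu, 0x00FFFF00u) == 0xFF0F0F00u);
    /* Only the rightmost 16 pixels may change */
    assert(blit_mono_alpha(0xFFFFFFFFu, 0x00FFFF00u, 0x000FF000u,
        0x0000FFFFu) == 0xFFFFF0FFu);
}
```

The `fill` layer needs an explicit `vram & ~mask` because both of its bit values paint something, whereas the `white` and `black` layers are protected just by masking their data.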
## Logical operations on pixels

As a reference, here are the logical operations used to blit layers in past and present versions of bopti. The $`x`$ parameter is a boolean; the transformation must happen iff $`x=1`$. The significance of $`x`$ appears when extending the logical operations to a longword: it allows controlling 32 pixels individually while still using only a couple of logical instructions.

```c
black   (data, x) = data |  x
white   (data, x) = data & ~x
inverse (data, x) = data ^  x
```

For gray images, we need to know that the gray engine produces an illusion of intermediate color by quickly alternating two buffers on the screen, with a different duration for each. This way, the proportion of time each pixel is black is one of four different values. Assuming `long` and `short` represent the value of a pixel in the buffer that stays longer and shorter on the screen, we have the following encoding:

| Color       | Value | `long` | `short` |
| ----------- | ----- | ------ | ------- |
| `white`     | 0     | 0      | 0       |
| `lightgray` | 1     | 0      | 1       |
| `darkgray`  | 2     | 1      | 0       |
| `black`     | 3     | 1      | 1       |

So operations on gray pixels will modify two VRAMs at once. In the formulas below, `light` is the buffer that stays shorter on the screen and `dark` is the one that stays longer. Among interesting operations, we have `lighten`, which shifts all values towards white (and white remains white), as if decrementing them, and `darken`, which shifts all values towards black (and black remains black), as if incrementing them.

```c
black   (light, dark, x) = (light |  x, dark |  x)
dark    (light, dark, x) = (light & ~x, dark |  x)
light   (light, dark, x) = (light |  x, dark & ~x)
white   (light, dark, x) = (light & ~x, dark & ~x)
inverse (light, dark, x) = (light ^  x, dark ^  x)
lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
darken  (light, dark, x) = ((light ^ x) | (dark & x),  dark | (light & x))
```

I'll leave it to you, if you want, to check that these functions do their job when $`x=1`$ and return their arguments unchanged when $`x=0`$.

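For readers who would rather let the compiler do that check, here is a small C sketch that transcribes `lighten` and `darken` to 32-bit longwords and tests them on all four colors. The type and function names (`gray_t`, `gray_lighten`, `gray_darken`, `gray_value`, `check_lighten_darken`) are made up for this example and are not part of gint.

```c
#include <assert.h>
#include <stdint.h>

/* A pair of vram longwords (hypothetical type): `light` is the buffer that
   stays shorter on the screen, `dark` the one that stays longer. */
typedef struct { uint32_t light, dark; } gray_t;

/* lighten: shifts the pixels selected by x one step towards white */
static gray_t gray_lighten(gray_t p, uint32_t x)
{
    gray_t r = {
        .light = (p.light ^ x) & (p.dark | ~x),
        .dark  = p.dark & (p.light | ~x),
    };
    return r;
}

/* darken: shifts the pixels selected by x one step towards black */
static gray_t gray_darken(gray_t p, uint32_t x)
{
    gray_t r = {
        .light = (p.light ^ x) | (p.dark & x),
        .dark  = p.dark | (p.light & x),
    };
    return r;
}

/* Color value (0 = white ... 3 = black) of pixel number `bit` */
static int gray_value(gray_t p, int bit)
{
    return (int)((((p.dark >> bit) & 1) << 1) | ((p.light >> bit) & 1));
}

/* Checks bit 0 for every color, with x=1 (transform) and x=0 (no change) */
static void check_lighten_darken(void)
{
    for(int v = 0; v < 4; v++)
    {
        gray_t p = { .light = (uint32_t)(v & 1), .dark = (uint32_t)(v >> 1) };

        assert(gray_value(gray_lighten(p, 1), 0) == (v > 0 ? v - 1 : 0));
        assert(gray_value(gray_darken(p, 1), 0) == (v < 3 ? v + 1 : 3));
        assert(gray_value(gray_lighten(p, 0), 0) == v);
        assert(gray_value(gray_darken(p, 0), 0) == v);
    }
}
```

Since every operation is bitwise, a check on a single bit carries over to all 32 pixels of a longword.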
## Assembler-driven rendering

The previous implementation of bopti was already fast, usually about 8 times as fast as MonochromeLib. Half of it was due to vram alignment, the other half was related to implementation and format. It had, however, two limiting factors:

1. The operation function was a generic function taking the color as argument, and it used a switch to decide which operation to apply;
2. Each layer was drawn independently, so the 2D structure of the image was unnecessarily traversed several times.

These two limitations are related, and both can be overcome by specializing the rendering code, which is the innermost part of the critical loop. The current version of *bopti* has one specialized rendering function per color profile, implemented in assembler.

## Image format

The conversion is performed by *fxconv* at compile-time and outputs a big-endian data structure that can be efficiently traversed from the add-in. The image is first extended to make its width a multiple of 32 pixels, then stored in row-major order:

        32       32       32
    +--------+--------+--------+
    |   1    |   2    |   3    |  1
    +--------+--------+--------+
    |   4    |   5    |   6    |  1
    +--------+--------+--------+

A set of 32 pixels as numbered on the diagram above is called a *position*. This is an important concept for the rendering algorithm. For each position, the data of all layers is stored in rendering order, so the layers are interwoven in the storage. It also means that the data for a position will consist of several longwords, not just one.

Note that extending the image to a multiple of 32 in width is not a hard requirement; it could be avoided by defining and implementing 16-bit and 8-bit positions, but this is currently not done.

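As an illustration of this layout, the following C sketch computes where the data for a given position would start, assuming one 32-bit longword per layer and per position, with no padding between positions. The profile constants and function names are hypothetical; only the layer counts (1, 2, 2 and 3) come from the profile descriptions on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile identifiers; the values used by gint may differ */
enum { PROFILE_MONO, PROFILE_MONO_ALPHA, PROFILE_GRAY, PROFILE_GRAY_ALPHA };

/* Number of layers per profile: fill / white+black / lfill+dfill /
   white+light+dark */
static int layer_count(int profile)
{
    static const int layers[] = { 1, 2, 2, 3 };
    return layers[profile];
}

/* Number of 32-pixel columns once the width is extended to a multiple of 32 */
static int column_count(int width)
{
    return (width + 31) / 32;
}

/* Byte offset of the position containing pixel (x, y): positions are stored
   in row-major order, each holding one longword per layer */
static size_t position_offset(int profile, int width, int x, int y)
{
    size_t index = (size_t)y * (size_t)column_count(width) + (size_t)(x / 32);
    return index * (size_t)layer_count(profile) * sizeof(uint32_t);
}

/* Total size of the layer data, in bytes */
static size_t data_size(int profile, int width, int height)
{
    return position_offset(profile, width, 0, height);
}

/* Usage example on the 96-pixel wide image of the diagram */
static void layout_example(void)
{
    /* Pixel (40, 1) is in position 5 of the diagram, zero-based index 4 */
    assert(position_offset(PROFILE_MONO_ALPHA, 96, 40, 1) == 4 * 2 * 4);
    /* 2 rows of 3 positions, one longword each for a mono image */
    assert(data_size(PROFILE_MONO, 96, 2) == 24);
}
```

Interleaving the layers this way means a renderer can read all the longwords it needs for one position from consecutive addresses.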
Along with this data, the image object contains some metadata:

```c
typedef struct
{
    /* Image can only be rendered with the gray engine */
    uint gray    :1;
    /* Left for future use */
    uint         :3;
    /* Image profile (uniquely identifies a rendering function) */
    uint profile :4;
    /* Full width, in pixels */
    uint width   :12;
    /* Full height, in pixels */
    uint height  :12;

    /* Raw layer data */
    uint8_t data[];

} GPACKED(4) image_t;
```

The first byte indicates the color profile and whether this profile is gray-only. `width` and `height` are the natural dimensions of the image, before width extension (which is only relevant for storage). The number of columns is deduced from the width.

## Rendering algorithm (TODO)