{"id":50,"date":"2013-09-13T15:23:42","date_gmt":"2013-09-13T15:23:42","guid":{"rendered":"https:\/\/blogs.scummvm.org\/richiesams\/?p=50"},"modified":"2022-05-22T20:21:36","modified_gmt":"2022-05-22T20:21:36","slug":"one-frame-at-a-time","status":"publish","type":"post","link":"https:\/\/blogs.scummvm.org\/richiesams\/2013\/09\/13\/one-frame-at-a-time\/","title":{"rendered":"One frame at a time"},"content":{"rendered":"<p>So, we&#8217;re entering into the final weekend before the soft pencil&#8217;s down for GSoC. It&#8217;s been a very busy couple of weeks since university here in the US started 3 weeks ago. So I&#8217;ve been juggling homework, labs, and working on this. But you&#8217;re not here to for that, so let&#8217;s launch into the actual post:<\/p>\n<p>Animations in-game come in two formats: AVI and a custom format called RLF. AVI is simple because I can use the ScummVM AVI decoder. But I had to reverse engineer the RLF file format so it can be played as a video or frame by frame.<\/p>\n<p>Before I go into the format of the file, I want to explain the general schema of animations, or more specifically, video frame compression techniques. (For another reference, the article\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Video_compression_picture_types\" target=\"_blank\" rel=\"noopener\">here<\/a>\u00a0is pretty good) The first frame of an animation has to include every pixel, aka, no compression. These frames are called I-frames or key frames. For the next frame, we\u00a0<i><b>could<\/b><\/i>\u00a0store every pixel, but that seems kind of wasteful. Instead, we store only the pixels that changed between the last frame and this frame. These are called P-frames. Optionally, a frame can also store the pixels that changed between the next frame and this frame. These are called B-frames. This allows animations to be played forwards\u00a0<i><b>or<\/b><\/i>\u00a0backwards. With P-frames, we can only go forwards. In order to seek within an animation, we have to find the closest I-frame, then add P\/B-frames until we&#8217;re at the frame we want. To make this less painful for long animations, video encoders insert I-frames every 120 frames or so. (About 1 I-frame every 5 seconds. Assuming 24 fps, 24 * 5 = 120).<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.scummvm.org\/richiesams\/wp-content\/uploads\/sites\/34\/2022\/05\/2000px-I_P_and_B_frames.svg_.png\" \/><\/p>\n<div class=\"separator\"><\/div>\n<p>RLF files only use I-frames and P-frames. If they need to go backwards, the whole animation is encoded both forwards and backwards. For example: 0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0<br \/>\nIt seems pretty wasteful in my opinion, but that&#8217;s what they do.<\/p>\n<p>The RLF file starts off with a header to describe the information it contains:<\/p>\n<pre class=\"brush:cpp\">bool RlfAnimation::readHeader() {\r\n    if (_file.readUint32BE() != MKTAG('F', 'E', 'L', 'R')) {\r\n        return false;\r\n    }\r\n\r\n    \/\/ Read the header\r\n    _file.readUint32LE();                \/\/ Size1\r\n    _file.readUint32LE();                \/\/ Unknown1\r\n    _file.readUint32LE();                \/\/ Unknown2\r\n    _frameCount = _file.readUint32LE();  \/\/ Frame count\r\n\r\n    \/\/ Since we don't need any of the data, we can just seek right to the\r\n    \/\/ entries we need rather than read in all the individual entries.\r\n    _file.seek(136, SEEK_CUR);\r\n\r\n    \/\/\/\/ Read CIN header\r\n    \/\/_file.readUint32BE();          \/\/ Magic number FNIC\r\n    \/\/_file.readUint32LE();          \/\/ Size2\r\n    \/\/_file.readUint32LE();          \/\/ Unknown3\r\n    \/\/_file.readUint32LE();          \/\/ Unknown4\r\n    \/\/_file.readUint32LE();          \/\/ Unknown5\r\n    \/\/_file.seek(0x18, SEEK_CUR);    \/\/ VRLE\r\n    \/\/_file.readUint32LE();          \/\/ LRVD\r\n    \/\/_file.readUint32LE();          \/\/ Unknown6\r\n    \/\/_file.seek(0x18, SEEK_CUR);    \/\/ HRLE\r\n    \/\/_file.readUint32LE();          \/\/ ELHD\r\n    \/\/_file.readUint32LE();          \/\/ Unknown7\r\n    \/\/_file.seek(0x18, SEEK_CUR);    \/\/ HKEY\r\n    \/\/_file.readUint32LE();          \/\/ ELRH\r\n\r\n    \/\/\/\/ Read MIN info header\r\n    \/\/_file.readUint32BE();          \/\/ Magic number FNIM\r\n    \/\/_file.readUint32LE();          \/\/ Size3\r\n    \/\/_file.readUint32LE();          \/\/ OEDV\r\n    \/\/_file.readUint32LE();          \/\/ Unknown8\r\n    \/\/_file.readUint32LE();          \/\/ Unknown9\r\n    \/\/_file.readUint32LE();          \/\/ Unknown10\r\n    _width = _file.readUint32LE();   \/\/ Width\r\n    _height = _file.readUint32LE();  \/\/ Height\r\n\r\n    \/\/ Read time header\r\n    _file.readUint32BE();                    \/\/ Magic number EMIT\r\n    _file.readUint32LE();                    \/\/ Size4\r\n    _file.readUint32LE();                    \/\/ Unknown11\r\n    _frameTime = _file.readUint32LE() \/ 10;  \/\/ Frame time in microseconds\r\n\r\n    return true;\r\n}\r\n<\/pre>\n<p>The magic number &#8216;FELR&#8217; refers to the run-length encoding used in the file. I&#8217;ll explain the specifics later on. I&#8217;m kind of curious what all the extra information in the header is used for, so if you guys have any ideas, I&#8217;m all ears. The useful information is pretty self-explanatory.<\/p>\n<p>After the header is the actual frame data. Each frame also has a header.<\/p>\n<pre class=\"brush:cpp\">RlfAnimation::Frame RlfAnimation::readNextFrame() {\r\n    RlfAnimation::Frame frame;\r\n\r\n    _file.readUint32BE();                        \/\/ Magic number MARF\r\n    uint32 size = _file.readUint32LE();          \/\/ Size\r\n    _file.readUint32LE();                        \/\/ Unknown1\r\n    _file.readUint32LE();                        \/\/ Unknown2\r\n    uint32 type = _file.readUint32BE();          \/\/ Either ELHD or ELRH\r\n    uint32 headerSize = _file.readUint32LE();    \/\/ Offset from the beginning of this frame to the frame data. Should always be 28\r\n    _file.readUint32LE();                        \/\/ Unknown3\r\n\r\n    frame.encodedSize = size - headerSize;\r\n    frame.encodedData = new int8[frame.encodedSize];\r\n    _file.read(frame.encodedData, frame.encodedSize);\r\n\r\n    if (type == MKTAG('E', 'L', 'H', 'D')) {\r\n        frame.type = Masked;\r\n    } else if (type == MKTAG('E', 'L', 'R', 'H')) {\r\n        frame.type = Simple;\r\n        _completeFrames.push_back(_lastFrameRead);\r\n    } else {\r\n        warning(\"Frame %u doesn't have type that can be decoded\", _lastFrameRead);\r\n    }\r\n\r\n    _lastFrameRead++;\r\n    return frame;\r\n}\r\n<\/pre>\n<p>If a frame is of type DHLE, it is a P-frame, if it is of type HRLE, it&#8217;s an I-frame. We hold off decoding until we actually need to render the frame. This allows for less memory use.<\/p>\n<p>So now we&#8217;ve read in all our data. How do we render a frame? The simplest case is to render the next frame. Note: _currentFrameBuffer is a Graphics::Surface that stores the current frame.<\/p>\n<pre class=\"brush:cpp\">const Graphics::Surface *RlfAnimation::getNextFrame() {\r\n    assert(_currentFrame + 1 &lt; (int)_frameCount);\r\n\r\n    if (_stream) {\r\n        applyFrameToCurrent(readNextFrame());\r\n    } else {\r\n        applyFrameToCurrent(_currentFrame + 1);\r\n    }\r\n\r\n    _currentFrame++;\r\n    return &amp;_currentFrameBuffer;\r\n}\r\n\r\nvoid RlfAnimation::applyFrameToCurrent(uint frameNumber) {\r\n    if (_frames[frameNumber].type == Masked) {\r\n        decodeMaskedRunLengthEncoding(_frames[frameNumber].encodedData, (int8 *)_currentFrameBuffer.getPixels(), _frames[frameNumber].encodedSize, _frameBufferByteSize);\r\n    } else if (_frames[frameNumber].type == Simple) {\r\n        decodeSimpleRunLengthEncoding(_frames[frameNumber].encodedData, (int8 *)_currentFrameBuffer.getPixels(), _frames[frameNumber].encodedSize, _frameBufferByteSize);\r\n    }\r\n}\r\n\r\nvoid RlfAnimation::applyFrameToCurrent(const RlfAnimation::Frame &amp;frame) {\r\n    if (frame.type == Masked) {\r\n        decodeMaskedRunLengthEncoding(frame.encodedData, (int8 *)_currentFrameBuffer.getPixels(), frame.encodedSize, _frameBufferByteSize);\r\n    } else if (frame.type == Simple) {\r\n        decodeSimpleRunLengthEncoding(frame.encodedData, (int8 *)_currentFrameBuffer.getPixels(), frame.encodedSize, _frameBufferByteSize);\r\n    }\r\n}\r\n<\/pre>\n<p>The decode&#8230;.() functions simultaneously decode the frame data we read in earlier, and then blit it directly on-top of the _currentFrameBuffer pixels. I&#8217;ll explain the details of each function further down.<\/p>\n<p>You might be wondering what the _stream variable refers to? I&#8217;ve created the RlfAnimation class so that it can decode in two different ways: it can load all the data from the file into memory and then do all decoding\/blitting from memory, or it can stream the data from file, one frame at a time. The first option allows you to seek within the animation, but it uses quite a bit of memory (roughly the size of the file). The second option uses far less memory, but you can only play the animation forwards and can not seek.<\/p>\n<p>On to the decoding functions:<\/p>\n<p>I-frames contain every single pixel within a frame. Again, we\u00a0<b><i>could<\/i><\/b>\u00a0store every one of these, but that would be kind of expensive. So we use a simple compression algorithm called\u00a0<i>Run Length Encoding<\/i>. (There are tons of frame compression algorithms out there. This is just the one they chose to use). Consider this image:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.scummvm.org\/richiesams\/wp-content\/uploads\/sites\/34\/2022\/05\/smilieface.png\" \/><\/p>\n<div class=\"separator\"><\/div>\n<p>And then, let&#8217;s choose a specific line of pixels:<\/p>\n<div class=\"separator\"><img decoding=\"async\" src=\"https:\/\/blogs.scummvm.org\/richiesams\/wp-content\/uploads\/sites\/34\/2022\/05\/smilieFaceLine.png\" \/><\/div>\n<p>If we were to encode each pixel we would need to store:<\/p>\n<div>YYYYYYYBYYYYYYBYYYYYYY<\/div>\n<p>where Y means yellow and B means black. That&#8217;s a lot of repeated yellows. Lets instead store this:<\/p>\n<div>7Y1B6Y1B<\/div>\n<p>The numbers represent how many of the following pixels are of the same color. So the decoder would interpret that as: render 7 yellow pixels, 1 black pixel, 6 yellow pixels, 1 black pixel, then 7 yellow pixels.<\/p>\n<p>The RLF files take this idea further. Consider this line of data, where G means green, R means red:<\/p>\n<div>YYYYYBGRYBYGBYYYYYY<\/div>\n<p>If we use the same method as before we get:<\/p>\n<div>5Y1B1G1R1Y1B1Y1G1B6Y<\/div>\n<p>It&#8217;s almost as long as the original data! If a color doesn&#8217;t have any repetition, using encoding actually takes up more space. To counter that, the RLF files do the following:<\/p>\n<div>5Y-8BGRYBYGB6Y<\/div>\n<p>If the number is negative, the next N pixels are copied directly to the destination. If it&#8217;s positive, the next N pixels are filled with the color directly following the number.<\/p>\n<p>Here&#8217;s that algorithm in code form:<\/p>\n<pre class=\"brush:cpp\">void RlfAnimation::decodeSimpleRunLengthEncoding(int8 *source, int8 *dest, uint32 sourceSize, uint32 destSize) const {\r\n    uint32 sourceOffset = 0;\r\n    uint32 destOffset = 0;\r\n\r\n    while (sourceOffset &lt; sourceSize) {\r\n        int8 numberOfSamples = source[sourceOffset];\r\n        sourceOffset++;\r\n\r\n        \/\/ If numberOfSamples is negative, the next abs(numberOfSamples) samples should\r\n        \/\/ be copied directly from source to dest\r\n        if (numberOfSamples &lt; 0) {\r\n            numberOfSamples = ABS(numberOfSamples);\r\n\r\n            while (numberOfSamples &gt; 0) {\r\n                if (sourceOffset + 1 &gt;= sourceSize) {\r\n                    return;\r\n                } else if (destOffset + 1 &gt;= destSize) {\r\n                    return;\r\n                }\r\n\r\n                byte r, g, b;\r\n                _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);\r\n                uint16 destColor = _pixelFormat565.RGBToColor(r, g, b);\r\n                WRITE_UINT16(dest + destOffset, destColor);\r\n\r\n                sourceOffset += 2;\r\n                destOffset += 2;\r\n                numberOfSamples--;\r\n            }\r\n\r\n        \/\/ If numberOfSamples is &gt;= 0, copy one sample from source to the \r\n        \/\/ next (numberOfSamples + 2) dest spots\r\n        } else {\r\n            if (sourceOffset + 1 &gt;= sourceSize) {\r\n                return;\r\n            }\r\n\r\n            byte r, g, b;\r\n            _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);\r\n            uint16 sampleColor = _pixelFormat565.RGBToColor(r, g, b);\r\n            sourceOffset += 2;\r\n\r\n            numberOfSamples += 2;\r\n            while (numberOfSamples &gt; 0) {\r\n                if (destOffset + 1 &gt;= destSize) {\r\n                    return;\r\n                }\r\n\r\n                WRITE_UINT16(dest + destOffset, sampleColor);\r\n                destOffset += 2;\r\n                numberOfSamples--;\r\n            }\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<p>To encode the P-frames, we use a similar method as above. Remember that P-frames are partial frames. They only include the pixels that changed from the last frame. An example pixel line could look like this, where O is a placeholder for empty space:<\/p>\n<div>OOOOBRGOOYYBRGOOOOO<\/div>\n<p>To encode this we do the following:<\/p>\n<div>4-3BRG2-5YYBRG5<\/div>\n<p>If the number read is positive, the next N pixels should be skipped. If the number is negative, the next N pixels should be copied directly to the destination.<\/p>\n<p>Here is that algorithm in code form:<\/p>\n<pre class=\"brush:cpp\">void RlfAnimation::decodeMaskedRunLengthEncoding(int8 *source, int8 *dest, uint32 sourceSize, uint32 destSize) const {\r\n    uint32 sourceOffset = 0;\r\n    uint32 destOffset = 0;\r\n\r\n    while (sourceOffset &lt; sourceSize) {\r\n        int8 numberOfSamples = source[sourceOffset];\r\n        sourceOffset++;\r\n\r\n        \/\/ If numberOfSamples is negative, the next abs(numberOfSamples) samples should\r\n        \/\/ be copied directly from source to dest\r\n        if (numberOfSamples &lt; 0) {\r\n            numberOfSamples = ABS(numberOfSamples);\r\n\r\n            while (numberOfSamples &gt; 0) {\r\n                if (sourceOffset + 1 &gt;= sourceSize) {\r\n                    return;\r\n                } else if (destOffset + 1 &gt;= destSize) {\r\n                    return;\r\n                }\r\n\r\n                byte r, g, b;\r\n                _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);\r\n                uint16 destColor = _pixelFormat565.RGBToColor(r, g, b);\r\n                WRITE_UINT16(dest + destOffset, destColor);\r\n\r\n                sourceOffset += 2;\r\n                destOffset += 2;\r\n                numberOfSamples--;\r\n            }\r\n\r\n        \/\/ If numberOfSamples is &gt;= 0, move destOffset forward ((numberOfSamples * 2) + 2)\r\n        \/\/ This function assumes the dest buffer has been memset with 0's.\r\n        } else {\r\n            if (sourceOffset + 1 &gt;= sourceSize) {\r\n                return;\r\n            } else if (destOffset + 1 &gt;= destSize) {\r\n                return;\r\n            }\r\n\r\n            destOffset += (numberOfSamples * 2) + 2;\r\n        }\r\n    }\r\n}<\/pre>\n<p>Whew! Almost there. The last thing to talk about is frame seeking. This requires that you&#8217;re\u00a0<i><b>not<\/b><\/i>\u00a0streaming directly from disk. (Well, you\u00a0<i>could<\/i>\u00a0do it, but it would probably be more trouble than it was worth). As we read in the frames, we stored which frames were I-frames. So to seek to a frame, we iterate through that list of I-frames and find the I-frame closest to our destination frame. Then we use applyFrameToCurrent() to move from the I-frame to the destination frame:<\/p>\n<pre class=\"brush:cpp\">void RlfAnimation::seekToFrame(int frameNumber) {\r\n    assert(!_stream);\r\n    assert(frameNumber &lt; (int)_frameCount || frameNumber &gt;= -1);\r\n\r\n    if (frameNumber == -1) {\r\n        _currentFrame = -1;\r\n        return;\r\n    }\r\n\r\n    int closestFrame = _currentFrame;\r\n    int distance = (int)frameNumber - _currentFrame;\r\n    for (Common::List&lt;uint&gt;::const_iterator iter = _completeFrames.begin(); iter != _completeFrames.end(); iter++) {\r\n        int newDistance = (int)frameNumber - (int)(*iter);\r\n        if (newDistance &gt; 0 &amp;&amp; (closestFrame == -1 || newDistance &lt; distance)) {\r\n            closestFrame = (*iter);\r\n            distance = newDistance;\r\n        }\r\n    }\r\n\r\n    for (int i = closestFrame; i &lt;= frameNumber; i++) {\r\n        applyFrameToCurrent(i);\r\n    }\r\n\r\n    _currentFrame = frameNumber;\r\n}\r\n<\/pre>\n<p>That&#8217;s it! If you want to look at the full class, you can find it\u00a0<a href=\"https:\/\/github.com\/RichieSams\/scummvm\/blob\/zengine\/engines\/zengine\/rlf_animation.h\" target=\"_blank\" rel=\"noopener\">here<\/a>\u00a0and\u00a0<a href=\"https:\/\/github.com\/RichieSams\/scummvm\/blob\/zengine\/engines\/zengine\/rlf_animation.cpp\" target=\"_blank\" rel=\"noopener\">here<\/a>. And as always, if you have ANY questions, feel free to comment. Happy coding!<\/p>\n<p>-RichieSams<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, we&#8217;re entering into the final weekend before the soft pencil&#8217;s down for GSoC. It&#8217;s been a very busy couple of weeks since university here in the US started 3 weeks ago. So I&#8217;ve been juggling homework, labs, and working on this. But you&#8217;re not here to for that, so let&#8217;s launch into the actual [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-50","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/posts\/50","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/comments?post=50"}],"version-history":[{"count":1,"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/posts\/50\/revisions"}],"predecessor-version":[{"id":54,"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/posts\/50\/revisions\/54"}],"wp:attachment":[{"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/media?parent=50"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/categories?post=50"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.scummvm.org\/richiesams\/wp-json\/wp\/v2\/tags?post=50"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}