Forum Casio - img2py and pyimage for MicroPython by Calamari

Inikiwi (Member), posted on 21/08/2022 14:22

-> 💼 KiwiSuite <-

Lephenixnoir (Administrator), posted on 21/08/2022 21:24

Looks good! From experience, importing modules is relatively slow, so once we commit to importing a file it needs to be worth it, e.g. contain a lot of useful data. I think it would be useful to have an automatic setup for the converter that converts a set of images and puts them in one or multiple imports based on the total file size.
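
One way to picture the "one or multiple imports based on total file size" idea is a simple grouping pass on the computer side. This is only a sketch: the function name `group_by_size`, the (name, size) input format and the 1000-byte budget are all hypothetical, and nothing here reflects how img2py actually works.

```python
def group_by_size(assets, budget):
    """Split (name, size_in_bytes) pairs into groups, one group per module.

    Assets are taken in order; a new group starts whenever adding the
    next asset would push the current group over the byte budget.
    """
    groups = []
    current, used = [], 0
    for name, size in assets:
        if current and used + size > budget:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups


# Hypothetical encoded sizes, with a hypothetical 1000-byte budget per module.
modules = group_by_size([("a", 600), ("b", 500), ("c", 300), ("d", 900)], 1000)
```

An asset larger than the budget still gets a module of its own, so the budget is a target rather than a hard limit.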

Colors might be useful on the fx-CG, although I don't see real demand for that at the moment.

Ultimately I think the best selling point would be to have it ready-to-use with a small tutorial or something similar.

My graph (28 January): (MPM ; serial gint ; (Rogue Life || HH2) ; PythonExtra ; ? ; Boson X ; gint 3 pass ; ...) || (shoutbox v5 ; v5)

Calamari (Member), posted on 21/08/2022 23:57

Lephenixnoir wrote:
Looks good! From experience, importing modules is relatively slow, so once we commit to importing a file it needs to be worth it, e.g. contain a lot of useful data. I think it would be useful to have an automatic setup for the converter that converts a set of images and puts them in one or multiple imports based on the total file size.

Colors might be useful on the fx-CG, although I don't see real demand for that at the moment.

Ultimately I think the best selling point would be to have it ready-to-use with a small tutorial or something similar.

I had not considered import speed. That enhancement will go on the TODO list. Glad you alluded to the 150 line editor limit (files can of course be much longer, but they're not editable), as it raised some questions for me. I previously performed some experiments appending bytes to a string where the memory ran out around 18000 bytes. However, as strings are immutable, it's possible that it was doing some kind of horrible copy, then throwing that away and fragmenting the memory in the process. A quick check of a="a"*60000 succeeded. I need to do more experiments. For example, if I'm done with an image and want to save memory, would, say, setting the variable to None allow that space to be freed? In a normal Python, I'd say yes, but in MicroPython I don't know... I hope so. I should try and find out. Could be useful if a game wants a fancy intro screen that is no longer needed once the game is started.

I considered the PRIZM, but there are so many more bits in a color image (24x as many!), and especially if it's noisy (JPG) I doubted RLE could do much to cut the size. Even on a busy mono image, the simple RLE was giving around 40% savings, so it was worth it for that. A lot of times the busy sections get hidden in a byte or two, then they are surrounded by white, which compresses well (although I have a couple ideas on how to improve the RLE further, and maybe use those savings to squeeze in a primitive token pass). I'm sure it'd be decently smaller with a proper LZ compression, although I'd have to weigh the module size increase. For the PRIZM, it'd probably require LZ and some kind of lossy support as well. It might make the module quite large. It still might be interesting, I've studied JPG a bit but I never ended up implementing something like it.

And, finally, I totally agree. The current tutorial doesn't cut it. I need to put in a lot more detail, syntax documentation and examples.

Great feedback all around, thank you!

“Remember to have fun doing this, or it ain't worth it.” — Robert Alan Koeneke
“They call me the king of the spreadsheets, got 'em all printed out on my bedsheets.” — “Weird Al” Yankovic

Lephenixnoir (Administrator), posted on 22/08/2022 00:01

Right. The line editor limit is pretty relevant IMO; no one wants to edit generated files on the calc. In the Bad Apple experiment I went as far as putting literal null bytes in the file since MicroPython accepts them, so they're really not editable.

I don't think the memory hit from constantly extending a string is too relevant because I assume the assets to be loaded once from code and kept unchanged afterwards. You should have access to the full 128 kiB heap or something along these lines.

To free the memory from an import you can del the object or module, which will free the memory, guaranteed. We do that in the bad apple program to free memory as we go from frame to frame.
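
A small sketch of how this could be checked. It assumes the `gc` module is importable, which is true of standard MicroPython builds but is not confirmed here for the calculator's Python; `gc.mem_free()` is MicroPython-specific, so the helper `free_memory` (a made-up name) falls back to `None` elsewhere.

```python
import gc


def free_memory():
    """Free heap bytes on MicroPython, or None where gc.mem_free is missing."""
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free else None


intro = bytes(20000)      # stand-in for a large intro-screen asset
before = free_memory()

del intro                 # drop the only reference to the asset
gc.collect()              # ask the collector to reclaim it right away

after = free_memory()
# On MicroPython, after - before should be roughly the size of the asset.
```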

Calamari (Member), posted on 22/08/2022 01:23

It's true, nobody is going to be editing these files on the calc. It's quite painful, really, especially in the SHELL where it doesn't offer the programming menus at all. Even Casio Basic isn't terribly easy to enter on the modern Casio; the enhanced functionality means that more stuff must be buried in the next pages of menus. It could have been a little better if Casio utilized Shift-3/5/6. It's much quicker to enter Basic programs on my ancient CFX-9800G... although of course you can do so much more on the modern calculator.

Thanks for the link. My limited experience on the calc lines up with that. Early on I had inadvertently included some Unicode chess chars and MicroPython didn't seem to mind at all. As long as I avoid quotation marks and backslashes and newlines (as I already am), the wasted bit space would go down a lot as I'll be able to use the full 8 bits (with escaping to cover the excluded bytes). I could scan for the least used byte and use that as the escape character to cover the full 8 bit range, and maybe keep another for RLE. I'm thinking maybe the 127 char per line limit might again only be for the editor. If I can avoid newlines I imagine it should be slightly faster for Python to parse and will save bytes.
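
A minimal sketch of the "least used byte as escape character" idea, not the actual img2py/mpfs encoding. The names `FORBIDDEN`, `pick_escape`, `escape` and `unescape` are made up for illustration, and excluding NUL and carriage return in addition to quotes, backslashes and newlines is an extra assumption. The chosen escape byte is stored as the first byte of the output so the decoder can recover it.

```python
# Bytes that must not appear raw inside the string literal (assumed set).
FORBIDDEN = (0x00, 0x0A, 0x0D, 0x22, 0x5C)  # NUL, LF, CR, '"', backslash


def pick_escape(data):
    """Return the least frequent byte value that is not itself forbidden."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    candidates = [b for b in range(256) if b not in FORBIDDEN]
    return min(candidates, key=lambda b: counts[b])


def escape(data):
    esc = pick_escape(data)
    specials = FORBIDDEN + (esc,)
    out = bytearray([esc])               # header byte: which byte is the escape
    for b in data:
        if b in specials:
            out.append(esc)
            out.append(0x30 + specials.index(b))  # '0'..'5' names the special
        else:
            out.append(b)
    return bytes(out)


def unescape(blob):
    esc = blob[0]
    specials = FORBIDDEN + (esc,)
    out = bytearray()
    i = 1
    while i < len(blob):
        b = blob[i]
        if b == esc:                     # next byte is a code, not data
            i += 1
            b = specials[blob[i] - 0x30]
        out.append(b)
        i += 1
    return bytes(out)


data = bytes(range(256)) * 2
encoded = escape(data)
```

Each escaped byte costs two bytes, which is why picking the rarest byte as the escape keeps the overhead low.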

That demo was quite interesting! Maybe I'm reinventing the wheel right now and my project should be abandoned...

EDIT: Came up with a related project idea where I think I could avoid duplicating existing efforts. Since open() doesn't work, I should make my own file object implementation. A compressed filesystem could be designed to hold the files, and the Python module could be used to read the files and provide the familiar interface to the programmer. The filesystem itself would of course be very limited (no attributes or permissions, all files contiguous, etc). The filesystem metadata itself could use Python structures, so hopefully it'd be simple. Off the top of my head I'm thinking a dict with filenames as keys and values holding references to the strings that actually have the file contents. Then maybe files could be "deleted" too as you mentioned above. Yes, this sounds like a fun project.
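
A rough sketch of that dict-backed design, with no compression at all. Everything here (`FILES`, `MemFile`, `mem_open`, `remove`) is a hypothetical name for illustration and not the eventual mpfs API.

```python
# Filesystem metadata: filename -> bytes holding the file contents.
FILES = {"hello.txt": b"Hello\nPlanet Casio!\n"}


class MemFile:
    """Read-only file object over an in-memory bytes value."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, n=-1):
        if n < 0:
            n = len(self.data) - self.pos
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        line = self.data[self.pos:end]
        self.pos = end
        return line

    # Iteration yields lines, like a regular file object.
    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    # Context-manager support so "with ... as f" works.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


def mem_open(name):
    return MemFile(FILES[name])


def remove(name):
    del FILES[name]   # "deleting" a file just drops the reference


f = mem_open("hello.txt")
first = f.read(5)
rest = list(f)
```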

Lephenixnoir (Administrator), posted on 22/08/2022 08:14

The demo is pretty limited: it only works for one type of video with one encoding, the setup isn't automated, and it's not maintained at all. It did do a lot of work towards finding compact/quick formats but that's about it. You already have some interesting ideas here such as designing the encoding to dodge escaped characters, which I think is quite brilliant.

As I mentioned the integration into projects' development is key IMO, so supporting a couple of image formats, or other resources like fonts, would be super useful. The compressed filesystem idea also sounds great in that respect. Whatever you go for, there is a lot of space to be covered.

Calamari (Member), posted on 23/08/2022 00:05

I appreciate your kind words of encouragement. I reworked to use 8 bits and escaping, which is of course a lot shorter. The usefulness of RLE dropped significantly, though. This makes sense in hindsight, but I did not predict it. The reasons are that with 6 bit there are only 64 possibilities, so the chance of accidentally repeated characters is higher than in 8-bit, giving a small boost. But additionally, the runs of repeated characters (such as in white background areas) are now shorter, and it's harder to break even. And breaking even is one byte harder: since I don't have any spare characters like I did in 6-bit mode, I had to reserve a character to indicate an RLE sequence, where that was automatic before. I think it's these last two that lessened the usefulness of the RLE the most.
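
To make the break-even point concrete, here is a small byte-oriented RLE sketch with a reserved marker byte. It is an assumed format, not the one used in the project: a run is stored as marker, count, value (3 bytes), so only runs of 4 or more save space, and a literal marker byte always has to be written as a run. `MARKER = 0xFF` is an arbitrary choice.

```python
MARKER = 0xFF  # reserved byte that introduces a (count, value) pair


def rle_encode(data):
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= 4 or b == MARKER:
            # 3 bytes for the run; literal MARKER bytes must take this path.
            out += bytes((MARKER, run, b))
        else:
            out += bytes([b]) * run      # short runs are cheaper as literals
        i += run
    return bytes(out)


def rle_decode(blob):
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == MARKER:
            out += bytes([blob[i + 2]]) * blob[i + 1]
            i += 3
        else:
            out.append(blob[i])
            i += 1
    return bytes(out)


sample = b"\x00" * 100 + b"abc" + b"\xff" + b"zzzz"
packed = rle_encode(sample)
```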

The result is still shorter than it was before, so overall it's a win, but I think it could be even better. I should investigate proper LZSS compression. As long as the decompression part isn't too taxing (and on a quick check it seems like that's the case), then it should be okay.

Lephenixnoir (Administrator), posted on 23/08/2022 15:34

Fascinating! Indeed RLE is quite strict for repeats which leaves few opportunities to use it. This reminds me of the QOI format which has some compact entries for "quite close to the previous pixel" with very low-precision difference fields. Not that it would be applicable here, though.

I agree that general-purpose compression at least needs to be tested. I didn't know about LZSS and it sounds delightfully simple; I was worried about how expensive decoding for general algorithms could be, but LZSS is easy enough.

Speaking of the cost of decoding, I believe as soon as you index a bytes object you get an integer and there is no optimisation for small integers to not use the heap. At least, that's the impression the Bad Apple stuff gave me. If so, then there might be ways to reassign integers or otherwise manipulate them in ways that avoid heap allocations, for speed.

Calamari (Member), posted on 24/08/2022 02:56

Lephenixnoir wrote:
I believe as soon as you index a bytes object you get an integer and there is no optimisation for small integers to not use the heap. At least, that's the impression the Bad Apple stuff gave me. If so, then there might be ways to reassign integers or otherwise manipulate them in ways that avoid heap allocations, for speed.

I'd be interested in learning more about which things are costly (in terms of memory) in MicroPython. The sliding window algorithm involves constantly shifting a buffer to the left and appending to the end. If I were writing this in C, I'd use a char array, as I could just shift it without needing to remove and add things to a linked list. It'd be slow from the shifting, but memory churn would be 0, and malloc and free can be slow. However, my C intuitions don't hold for Python, as strings are immutable, and I don't really have any idea when stack or heap might be used or when they might be freed.

Calamari (Member), posted on 24/08/2022 03:02

I've been playing around with LZSS compression. Many explanations for it are so academic and complex, but after trying things out and thinking about it a bit, it turns out that the key concept is really not that complicated at all. Here's my explanation (which maybe is complex as well, haha, oh well):

Say we're writing a file, and the next characters we need to write are "ABCDEFG". Instead of writing those out to the file, it first tries to find "A" in a buffer, then it tries to find "AB", then "ABC", etc, the longest it can find. You're probably asking yourself "what exactly is in this magical buffer"? And the answer is "the contents of the file that we've written up to now".

Now obviously the buffer might have more than one "A" in it. Let's say the buffer contains "ABZZZABCD". The first matching sequence it'd find is "AB", but if it keeps searching it will find "ABCD", and that's longer (and longer is good, because it means higher compression). That's easy to solve: as it finds each sequence, it throws it away if it's not longer than the current best, or if it's longer, it becomes the new best. So first it'd be (offset 0, length 2), and then it'd be replaced with (offset 5, length 4).

Eventually, it will reach the end of the buffer. Hopefully it found a match, or maybe it didn't. At the very beginning of the file the buffer will be empty and it can't find a match, but the farther it gets in the file the bigger the buffer gets and the better chance of finding a match.

If it didn't find something, it just appends the current character "A" to the buffer and also writes "A" to the file. But, say it found "ABCD" at offset 5, then it can instead write "5,4" to the file. That's smaller than "ABCD" so we just saved space. It also writes "ABCD" to the buffer. Again, after each byte the buffer is identical to the contents of the file up to this point.

And that's actually the main idea. Either it writes a character from the file, or it writes where it can find a sequence from the buffer. As things in the file repeat, then it'll find it in the buffer, and the longer those repeating sequences are, the more bytes are saved.

Okay that's the overall idea for encoding. Obviously some details were glossed over. Still, understanding that basic concept makes it easy to understand the rest.

First problem: we can't just let the buffer grow as big as it wants to, because that would use up all of the calculator's RAM on a large file, and it also takes longer and longer to search it (the search operation is O(n^2)). The solution is to restrict the size of the buffer. In reality, before it adds a byte to the end of the buffer, it gets rid of a byte from the front. This is called a "sliding window". It's still identical to the file's contents, but it's just the last "X" bytes of the file. Obviously, that means we won't be able to find some matches, because it could have had the matching sequence at the beginning of the file and that gets lost. However, on the plus side, we've limited the amount of RAM used and also we know the max number of bits it will take to represent the buffer offset and length. The latter is important, because the more bits it takes to store offset and length, the more bits we have to write for each match, making the file larger.

That's LZ77 compression in a nutshell. To make it LZSS, it just does one more simple thing: if writing that buffer offset and length would actually take more space than just writing out the uncompressed bytes directly, then it just writes the bytes. Example: say it found "CD", but it actually takes 3 bytes "9,2" to store the offset and length, then we wasted a byte. Note: obviously we wouldn't literally be storing "9,2" (there's no need to store the comma, and we'd be using binary rather than storing text numbers), this is just an example.

When decompressing, it builds the buffer exactly the same way, but instead of searching for sequences, it's just reading the ones stored in the file. So it's super trivial to decompress.
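
The explanation above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: tokens are kept as Python objects (an int for a literal byte, an (offset, length) tuple for a match) instead of packed bits, offsets count from the start of the window as in the example above, and `WINDOW`, `MIN_MATCH` and `MAX_MATCH` are arbitrary values. `MIN_MATCH = 3` stands in for the LZSS rule that a match is only used when it is cheaper than the raw bytes.

```python
WINDOW = 256     # how many previous bytes can be referenced
MIN_MATCH = 3    # shorter matches are written as literals (the LZSS rule)
MAX_MATCH = 18   # cap so the length fits a small fixed-width field


def lzss_encode(data):
    tokens = []
    pos = 0
    while pos < len(data):
        win_start = max(0, pos - WINDOW)
        best_off, best_len = 0, 0
        # Try every start position in the window and keep the longest match.
        for start in range(win_start, pos):
            length = 0
            while (length < MAX_MATCH and pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = start - win_start, length
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def lzss_decode(tokens):
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            start = max(0, len(out) - WINDOW) + off
            for k in range(length):      # byte by byte, so overlaps work
                out.append(out[start + k])
        else:
            out.append(tok)
    return bytes(out)


tokens = lzss_encode(b"ABZZZABCDABCDEFG")
```

With this input the encoder emits nine literals, then the pair (5, 4) for the second "ABCD", then the literals for "EFG", which lines up with the "(offset 5, length 4)" example.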

I have a chess board image where the squares are 7 bits wide. RLE performs poorly on that, because the only thing it can compress is the white space on the right side. I wasn't sure how LZSS would do because the squares aren't a clean 8 bits. However, LZSS still loves it. Sure, at the beginning there isn't anything being compressed, but as soon as it gets past the first line, that multibyte sequence repeats and now it can play back that same sequence. However, it gets even better: after it writes two lines, now it can repeat BOTH, so it's doubling in power each time. The best results are of course when the sliding window is unlimited in size, but even with a 256 byte window it's still really good. It compresses the chess board from 1024 bytes to 327 (or 233 with the infinite window). This is before escaping and such, so the efficiency will drop, but it gives a general idea. On a more complicated image, the compression is worse, but it still usually finds something.

I've still got some work to do on this to properly implement compression and decompression, but it's turning out to be a great project. General file compression has been a mystery to me for decades, so it was nice to finally sit down and understand it. I'm confident that the decompression is not going to be large at all.

Lephenixnoir (Administrator), posted on 24/08/2022 09:22

That's a pretty good summary! It is worth noting that the references can extend through references and even through themselves (thus repeating the pattern), which means you have to reconstruct the sliding window.

Calamari wrote:
First problem: we can't just let the buffer grow as big as it wants to, because that would use up all of the calculator's RAM on a large file, and it also takes longer and longer to search it (the search operation is O(n^2)).

I can't help but want to experiment with a variation (which might already exist!) which only searches for references within the literal bytes of the file. This way you could reference the compressed buffer directly, so you don't have to rebuild an entire window of decoded data. This would save time on the calculator, and also allow you to search for references in the entire file since the whole byte string will be loaded in RAM anyway. The drawback is that each reference "cuts" the literal stream so you wouldn't gain much on the chess board. It might not be too good...

Curiosity: I don't know if KMP's precomputed failure function can be maintained in constant time through window slides; if so that would avoid the O(n²) complexity for the search. If you have any intuition there I'd like to know.
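
For reference, this is the standard way to compute the KMP failure function for a fixed pattern. It only shows the static construction; it does not answer whether the table can be maintained cheaply as the window slides.

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next byte extends one.
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


table = failure_function(b"ABABAC")
```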

Good job so far! I'd be happy to help with building a benchmark or test setup to measure how good the algorithms are and how fast they run on the calculator. I'm sure there's a sweet spot to be found for programs to use and not worry about.

Lephenixnoir (Administrator), posted on 24/08/2022 09:24

Calamari wrote:
If I were writing this in C, I'd use a char array, as I could just shift it without needing to remove and add things to a linked list. It'd be slow from the shifting, but memory churn would be 0, and malloc and free can be slow.

In C you should either use a circular buffer to avoid the shifting at the cost of some extra operation during accesses, which you can often optimize away with careful loop writing; or have a buffer double the size, so for instance if the sliding window is 2048 bytes you can do 2048 appends before you have to do the first shift back (instead of doing 2048 shifts back).
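
The circular-buffer option carries over to Python with a preallocated `bytearray`, which avoids both the shifting and the reallocation. This is a sketch with a made-up class name (`RingWindow`); index 0 always refers to the oldest byte still in the window.

```python
class RingWindow:
    """Fixed-size sliding window stored in a preallocated bytearray."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.count = 0   # bytes currently held, capped at size
        self.head = 0    # slot where the next byte will be written

    def append(self, b):
        # Overwrites the oldest byte once the window is full.
        self.buf[self.head] = b
        self.head = (self.head + 1) % self.size
        if self.count < self.size:
            self.count += 1

    def __getitem__(self, i):
        if not 0 <= i < self.count:
            raise IndexError("window index out of range")
        # The extra modulo is the per-access cost of not shifting.
        return self.buf[(self.head - self.count + i) % self.size]


w = RingWindow(4)
for b in b"ABCDEF":
    w.append(b)
contents = bytes([w[i] for i in range(w.count)])
```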

Calamari (Member), posted on 24/08/2022 09:54

Lephenixnoir wrote:
That's a pretty good summary! It is worth noting that the references can extend through references and even through themselves (thus repeating the pattern), which means you have to reconstruct the sliding window.

Ahh, then maybe I'm doing it slightly differently? During compression, the window only contains data from the input file, not any of the compressed output data, it's just the last X bytes of input data. That seems to maybe be what you were looking for in the variant?

The window definitely needs to be reconstructed during decompression, though, as in that case the input file is a mix of raw bytes and offset/length pairs. So as it decompresses bytes those decompressed bytes become part of the window. The window contents during each step of compression and the window contents during each step of decompression must always be exactly the same, otherwise things go very badly!

The decompression can happen in small increments, as long as the input offset and window are kept alive. That way the entire decompressed file does not need to be in RAM, just the last part of it. I plan on serving bytes straight from the window, because it will provide a handy buffer for serving up output bytes 1 at a time when they do a read(1) and 3 input bytes suddenly decompress to 50 output bytes.
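
A sketch of incremental decompression along those lines. Assumptions: the input is a sequence of already-unpacked tokens (an int for a literal, an (offset, length) tuple for a match, with the offset counted from the start of the window), and the class name `LzssReader` is hypothetical. Decoded bytes that have not been requested yet wait in a small pending buffer, and only the last `WINDOW` bytes of output are kept.

```python
class LzssReader:
    WINDOW = 1024

    def __init__(self, tokens):
        self.tokens = iter(tokens)
        self.window = bytearray()    # last WINDOW bytes of decoded output
        self.pending = bytearray()   # decoded but not yet returned

    def _decode_next(self):
        """Decode one token; return False when the input is exhausted."""
        try:
            tok = next(self.tokens)
        except StopIteration:
            return False
        old_len = len(self.window)
        if isinstance(tok, tuple):
            off, length = tok
            for k in range(length):  # byte by byte, so overlaps work
                self.window.append(self.window[off + k])
        else:
            self.window.append(tok)
        self.pending += self.window[old_len:]
        # Trim only after the whole token, so offsets stay valid while copying.
        if len(self.window) > self.WINDOW:
            self.window = self.window[-self.WINDOW:]
        return True

    def read(self, n=1):
        while len(self.pending) < n and self._decode_next():
            pass
        chunk = bytes(self.pending[:n])
        self.pending = self.pending[n:]
        return chunk


r = LzssReader([65, 66, 67, (0, 3), (1, 5)])
a = r.read(1)
b = r.read(4)
c = r.read(100)
```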

I increased the window size from 256 to 1024, so now the max length is 66. The chess board now compresses to 268 bytes, and that's escaped and ready to import as a Python file (although it's not in the filesystem structure yet, I wanted to get LZSS working first). I plan to handle raw, RLE, and LZSS formats. It'll store whichever is smallest.

Calamari (Member), posted on 24/08/2022 09:57

Lephenixnoir wrote:
In C you should either use a circular buffer to avoid the shifting at the cost of some extra operation during accesses, which you can often optimize away with careful loop writing; or have a buffer double the size, so for instance if the sliding window is 2048 bytes you can do 2048 appends before you have to do the first shift back (instead of doing 2048 shifts back).

True. Guess I wasn't awake yet as I use circular buffers often.

Lephenixnoir (Administrator), posted on 24/08/2022 13:21

Calamari wrote:
Ahh, then maybe I'm doing it slightly differently? During compression, the window only contains data from the input file, not any of the compressed output data, it's just the last X bytes of input data. That seems to maybe be what you were looking for in the variant?

You're doing it correctly, I was indeed just imagining a variant where you only reference the compressed data itself to avoid reconstructing the window. It appears that it might be quite ineffective though.

Calamari wrote:
I increased the window size from 256 to 1024, so now the max length is 66. The chess board now compresses to 268 bytes, and that's escaped and ready to import as a Python file (although it's not in the filesystem structure yet, I wanted to get LZSS working first). I plan to handle raw, RLE, and LZSS formats. It'll store whichever is smallest.

Looking great! Having multiple formats is pretty smart.

Calamari (Member), posted on 25/08/2022 08:52

Making progress. Can create file systems: decided to call it mpfs. Can compress with RLE or LZSS, or it'll store raw if it didn't compress: the compression format I called czip. Yeah, I'm great at naming things.

Implemented the useful (or what I considered useful) io/os/os.path/iterator/with stuff. Only the read() is remaining. It was cool writing a "with open(...) as f" on something that was completely fake, haha.

Fortunately, __import__ works fine on MicroPython so I could implement a fake shell with dynamic filesystem importing. Now I can do an "ls", "ls -l", or "rm" inside a mpfs file, and eventually "cat" should work, but that needs read() of course.
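
A tiny sketch of what importing a filesystem module by name could look like. The helper name `load_attr` is made up, and the assumption that the generated module exposes a single top-level variable named `MPFS` is a guess for illustration; the real mount() may work differently.

```python
def load_attr(module_name, attr="MPFS"):
    """Import a module by its name (a string) and return one attribute."""
    module = __import__(module_name)
    return getattr(module, attr)


# Demonstrated with a standard module, since no generated file exists here.
# With a generated test.py one would write: fs = load_attr("test")
pi = load_attr("math", "pi")
```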

Currently I'm working on un-escaping code for the Czip class. Before the data can be decompressed the escaping first needs to be removed. Keeping that in a different class means that the Lzss class doesn't have to care about un-escaping, it can just read() a byte from Czip, and Czip in turn will be getting bytes from the actual string literal.

Ran into some trouble with the string literals, though. They work great on the calculator, but of course Python 3 is not happy. I built a native Linux MicroPython 1.9.4, hoping it'd work like the calculator, but of course not.

It's throwing away a lot of characters >= 0x80 , and I'm 99% sure that's due to some kind of Unicode conversion.

I found a workaround, but I'm not sure it's a good one: if I store b"" string literals, then it works on both MicroPythons, and as a bonus I get ints when I subscript the string, which are very nice to work with; I don't need a bunch of ord()'s everywhere. I'm worried though... what if accessing a b"" string literal pulls the entire thing into the heap? Any ideas on how to test that?

Calamari (Member), posted on 26/08/2022 07:23

Got an initial version of mpfs working!

I added a view command to the shell to display image files (and as a good binary test). Drawing is currently not very fast and definitely wouldn't be able to keep up with that Bad Apple animation. There are probably ways to improve the speed, though.

Just to show how mpfs might be used, this is how "cat" is displaying a file:

with mpfs.open(argv[0]) as f:
    for line in f:
        cprint(line, end="")

I redid my RLE implementation for mpfs, but it doesn't produce a file smaller than LZSS unless I'm forcing it with silly situations (like in hello.txt, below), so I might just remove it.

As I was minifying the sources I had a random thought: mpfs could be used to compress Python scripts. You'd read() them, then run them with exec(). Unlikely that'd ever be needed, but I suppose it's an option. Right now mpfs.py (the only script needed for mpfs support on the Casio) is <2kB minified, which isn't too bad.

Now comes the hard part: I need to comment the sources, test everything and create user documentation.

Here is a hexdump of the beginning of test.py, that I "mounted" in the screenshots below. The file is 855 bytes in size:

00000000  4d 50 46 53 3d 7b 22 68  65 6c 6c 6f 2e 74 78 74  |MPFS={"hello.txt|
00000010  22 3a 28 32 31 2c 31 2c  62 22 02 8a 03 8d 04 a2  |":(21,1,b"......|
00000020  05 dc 06 7f 08 80 09 81  22 2c 62 22 50 6c 61 6e  |........",b"Plan|
00000030  65 74 20 43 61 73 69 6f  21 06 07 02 22 29 2c 22  |et Casio!..."),"|
00000040  71 75 61 72 6b 2e 74 78  74 22 3a 28 34 32 2c 32  |quark.txt":(42,2|
00000050  2c 62 22 02 8a 03 8d 04  a2 05 dc 06 7f 07 80 08  |,b".............|
00000060  81 22 2c 62 22 53 65 65  20 42 72 61 6b 20 61 63  |.",b"See Brak ac|
00000070  71 75 69 72 65 2e 20 41  06 f7 c2 2c 06 e9 c1 2c  |quire. A...,...,|
00000080  06 e8 c4 21 02 22 29 2c  22 6b 69 74 74 79 2e 63  |...!."),"kitty.c|
00000090  69 6d 67 22 3a 28 34 33  35 2c 32 2c 62 22 13 8a  |img":(435,2,b"..|
000000a0  14 8d 17 a2 19 dc 1b 7f  21 8c 23 91 22 2c 62 22  |........!.#.",b"|

Here are some screenshots of the shell in action (the shell isn't part of mpfs, but it was fun). The file sizes displayed by "ls -l" are the original uncompressed sizes:

Lephenixnoir (Administrator), posted on 26/08/2022 09:44

What quick progress! Everything looks great too.

Let me quickly address your string problems:

Calamari wrote:
I found a workaround, but I'm not sure it's a good one: if I store b"" string literals, then it works on both MicroPythons, and as a bonus I get ints when I subscript the string, which are very nice to work with; I don't need a bunch of ord()'s everywhere. I'm worried though... what if accessing a b"" string literal pulls the entire thing into the heap? Any ideas on how to test that?

You should absolutely use bytes not strings; in fact I always assumed you were doing that. The Bad Apple demo does that, and indexing is indeed much faster. Note however that MicroPython accepts literal NUL bytes in b"" but CPython doesn't, which can be a deal breaker - it is common for uncompressed streams to have lots of them.
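
A short illustration of the indexing difference, using escape sequences so the snippet is valid in both CPython and MicroPython (the sample byte values are taken from the start of the table in the hexdump above).

```python
raw = b"\x02\x8a\x03\x8d"     # bytes: indexing yields an int directly
text = "\x02\x8a\x03\x8d"     # str: indexing yields a 1-character string

from_bytes = raw[1]           # no conversion needed
from_str = ord(text[1])       # needs ord() on every access

total = 0
for b in raw:                 # iterating over bytes also yields ints
    total += b
```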

Now as far as the filesystem is concerned, I think this has potential to be quite useful if handled well. I see two aspects to this:

First, your filesystem is obviously not permanent, and even if a program generates a file you cannot save it to the storage memory. However, you can always print lines to the shell, and have the user copy them with a keyboard shortcut and then paste them into the Python editor. Which means, for small files and fluent users at least, you have an opportunity to provide some kind of file storage. Considering the amount of applications that would like to write files, I think this is worth exploring for a bit, testing, and then polishing so we have a state-of-the-art of simulated file access.

Second, you probably want to stick to standard APIs, such that for instance you can substitute from os import * with from mpfs import * and then keep the same with open(...) as ... code. You seem to be doing that already, but I think it'd be cool if the documentation mentions what APIs you have so we can efficiently assess whether existing code has a chance to work.

Overall, super exciting! I'll be looking forward to your next move.

Calamari (Member), posted on 27/08/2022 05:31

Lephenixnoir wrote:
You should absolutely use bytes not strings; in fact I always assumed you were doing that. The Bad Apple demo does that, and indexing is indeed much faster. Note however that MicroPython accepts literal NUL bytes in b"" but CPython doesn't, which can be a deal breaker - it is common for uncompressed streams to have lots of them.

I forgot to follow up on this: I was able to compile the v1.9.4 tag of the micropython repo to act like the calculator (at least for NULs). I edited this line in ports/unix/mpconfigport.h:

#define MICROPY_PY_BUILTINS_STR_UNICODE (1)

I changed (1) to (0) and followed the compilation steps in README.md (the steps on the GitHub page are for the most recent version and don't work for that older version) and it built a MicroPython where I could store raw bytes and NUL bytes in any kind of strings (byte or otherwise) without Unicode getting in the way, just like on the calculator. I decided to stick with the byte strings, thank you for the speed tips about them.

Lephenixnoir wrote:
First, your filesystem is obviously not permanent, and even if a program generates a file you cannot save it to the storage memory. However, you can always print lines to the shell, and have the user copy them with a keyboard shortcut and then paste them into the Python editor. Which means, for small files and fluent users at least, you have an opportunity to provide some kind of file storage. Considering the amount of applications that would like to write files, I think this is worth exploring for a bit, testing, and then polishing so we have a state-of-the-art of simulated file access.

How can that be done? I couldn't get Shift-8 CLIP to work in input() or in the SHELL. I came up with a workaround, although it is a little tedious and requires a computer. If the program draws the bits to save as an image (plus, say, a 2-byte header giving the length), we could save up to 1022 bytes of data on the B&W model, and of course far more on the PRIZM. The only catch is that the program wanting to save data then needs to exit, because Shift-7 CAPTURE doesn't work while a script is running. The program can call show_screen() just before exiting. Once it exits, the user would CAPTURE the image and copy the capture to their computer, where a script could convert the image back to a file. It's a lot of steps, though. I think this could be acceptable for saving progress in an adventure game or RPG, but honestly I'd say just code the game in C; then you can save files without all the headaches, right? Speaking of C, can it access captures? Then the capture conversion could be done on the calculator, which would be quite nice.
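To make the capacity figure concrete, here is a rough sketch of the bit layout, assuming a 128x64 screen (1024 bytes of pixels, hence 1022 after the 2-byte header). The names `encode_screen` and `decode_screen` are made up for illustration; on the calculator the encoder would set pixels before show_screen() rather than return a list, and the decoder would run on the computer against the captured image.

```python
WIDTH, HEIGHT = 128, 64                  # assumed B&W screen size
SCREEN_BYTES = WIDTH * HEIGHT // 8       # 1024 bytes of pixel data
CAPACITY = SCREEN_BYTES - 2              # minus the 2-byte length header

def encode_screen(data):
    """Return HEIGHT rows of WIDTH bits (1 = black) holding header + data."""
    if len(data) > CAPACITY:
        raise ValueError("too much data for one screen")
    # Big-endian 2-byte length, then the data, then zero padding.
    payload = bytes([len(data) >> 8, len(data) & 0xFF]) + bytes(data)
    payload += bytes(SCREEN_BYTES - len(payload))
    rows = []
    for y in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            i = y * WIDTH + x            # bit index, most significant bit first
            row.append((payload[i >> 3] >> (7 - (i & 7))) & 1)
        rows.append(row)
    return rows

def decode_screen(rows):
    """Inverse of encode_screen: rebuild the bytes from the pixel rows."""
    bits = [b for row in rows for b in row]
    raw = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        raw.append(byte)
    length = (raw[0] << 8) | raw[1]
    return bytes(raw[2:2 + length])

saved = b"hp=12;room=3\x00"
assert decode_screen(encode_screen(saved)) == saved
```

One pixel per bit leaves no margin for error, which should be fine as long as the capture is copied losslessly.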

Lephenixnoir wrote:
Second, you probably want to stick to standard APIs, such that for instance you can substitute from os import * with from mpfs import * and then keep the same with open(...) as ... code. You seem to be doing that already, but I think it'd be cool if the documentation mentions what APIs you have so we can efficiently assess whether existing code has a chance to work.

I'm with you on that. I kept the names intact, although I did mix together the builtin stuff, os and os.path. To give yourself a nice illusion of a working open() function you could do a "from mpfs import open". Here's what's currently implemented. It's not much, but honestly a lot of stuff really isn't needed, as the compressed fs is read only and there is no directory tree or file attributes other than size:
* builtin stuff: mount, open, with, iteration (such as for loops)
* os/os.path: listdir, remove, exists, getsize. These methods are available once a filesystem has been mounted, using [mpfs.]mount(module)
* io: read, readline, readlines, seek, tell, close. These methods may be called on the Mpfs object returned from open()
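For readers wondering what such a read-only file object amounts to, here is a small stand-in built over a bytes object. It is not the actual Mpfs class (there is no decompression, and `ReadOnlyFile` is an invented name); it only illustrates how the listed io methods, `with`, and iteration fit together.

```python
class ReadOnlyFile:
    """Hypothetical stand-in for a read-only file object; not the real Mpfs."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size=-1):
        # A negative size means "read everything that is left".
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def readline(self):
        end = self._data.find(b"\n", self._pos)
        end = len(self._data) if end < 0 else end + 1
        line = self._data[self._pos:end]
        self._pos = end
        return line

    def readlines(self):
        return list(self)

    def seek(self, offset):
        # Absolute positioning only, clamped to the file bounds.
        self._pos = max(0, min(offset, len(self._data)))
        return self._pos

    def tell(self):
        return self._pos

    def close(self):
        self._data = b""
        self._pos = 0

    # Support for "with ... as f:" and "for line in f:".
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

with ReadOnlyFile(b"one\ntwo\nthree") as f:
    first = f.readline()
    rest = f.readlines()
assert first == b"one\n" and rest == [b"two\n", b"three"]
```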

“Remember to have fun doing this, or it ain't worth it.” — Robert Alan Koeneke
“They call me the king of the spreadsheets, got 'em all printed out on my bedsheets.” — “Weird Al” Yankovic

Lephenixnoir Offline Administrator Points: 25169 Challenges: 174 Message

Quote: Posted on 27/08/2022 14:13 | #

I forgot to follow up on this: I was able to compile the v1.9.4 tag of the micropython repo to act like the calculator (at least for NUL's).

Thanks for mentioning the version and option, we can never have too many notes on building MicroPython!

How can that be done? I couldn't seem to get Shift-8 CLIP to work in input() or in the SHELL.

I was really convinced you could... now I'm disappointed you can't!

Right, an add-in could access the captures (although only libfxcg has the APIs for it right now), so it's easier than going through the PC. But then if you have an add-in we might as well use a different version of Python. You see, when dealing with these kinds of problems in the old BASIC days, using add-ins wasn't an option because many models didn't support them. But now every officially-Python-supporting CASIO calculator also has add-ins, so we can just hammer away at our problems with C. x)

Ultimately I agree with you it's much simpler to save data from an add-in, it's just that not everyone knows how to write them.

Here's what's currently implemented. It's not much, but honestly a lot of stuff really isn't needed, as the compressed fs is read only and there is no directory tree or file attributes other than size:
* builtin stuff: open, with, iteration (such as for loops)
* os/os.path: listdir, remove, exists, getsize. These methods are available once a filesystem has been imported, such as: from filesystem_module_name import MPFS
* io: read, readall, readline, readlines, seek, tell, close. These methods may be called on the Mpfs object returned from open()

That's quite a bit actually! It also seems enough to implement robust asset loading with some sweet automation... and compression too, dang. :o
