Community on-calc programming tools
Posted on 10/01/2025 13:58
In the topic Les projets de Planète Casio pour 2025, Sabercat revived the idea of having good on-calc programming tools (in addition to 35+E II compatibility, but that will perhaps go in another topic).
I list here the messages from that discussion, with a summary.
Main messages:
#198728,
#198735,
#198761,
#198763,
#198767,
#198773,
#198774,
#198775,
#198777,
#198786,
#198788
What we could aim for as languages:
- Python: OK, PythonExtra
- LuaFX: to be ported
- Malical: to be ported (is there any demand?)
- Something for coding add-ins (C? Something else?)
- Basic: there is already C.Basic (integration probably impossible)
What we could aim for as an editor:
- In principle, a separate editor rather than one embedded in each app
- Kiwi Text: but copyright, no sources, apparently not completely stable
- Micropy: already exists and works, but it is based on the PrizmSDK, and language support remains to be written
- A new program based on gint + JustUI, like PythonExtra or text-viewer
Sabercat mentioned that it would be nice to be able to code add-ins on the calculator. I agree. However, having a compiler + linker on the calculator is veeery ambitious, and porting the GNU tools is not possible. Personally, I think it would be smarter to code add-ins on the calculator in a language other than C. I don't know what you all think...
Posted on 30/05/2025 20:10
As an editor there is also fxpyedit.
Couldn't one use the UBC as a makeshift error handling system? I don't actually know much about it but it feels like you could just use that and not have to make that sacrifice.
Posted on 30/05/2025 20:13
Add-ins run in privileged mode, so if they write somewhere they shouldn't it can break stuff (in the worst case brick the calculator).
Posted on 30/05/2025 20:20
I'm not convinced it's either possible or straightforward to sanitize the user code in a way that prevents crashes or malfunctions. There's just so many ways things can go wrong... which is not to say the assembler isn't worth making, but I'm doubtful the "sandbox" approach is tenable. For one thing, valid addresses wouldn't be a clean interval. Then many addresses are computed on-the-fly with non-trivial addressing modes so the assembler would have to rewrite these, which requires an extra register, so computations using that register would also have to be rewritten. Then you need to worry about what code gets called, cause if you can jump anywhere that won't do. Oh and also if you have a heap you could break it, but masking won't safeguard things properly the way e.g. AddressSanitizer would. Right now you can even just misalign the stack to cause a crash (gint doesn't realign it in interrupts and I don't believe the OS does either).
Basically I feel like it'd be easier to interpret the assembly than actually run it.
Posted on 31/05/2025 08:04
I'm not convinced it's either possible or straightforward to sanitize the user code in a way that prevents crashes or malfunctions.
...
Basically I feel like it'd be easier to interpret the assembly than actually run it.
You may be right about that. I think interpreting is appealing especially for things like single-stepping in a debugger which I think an on-calculator assembler would definitely need. It certainly solves a lot of problems. On the other hand, I think I have come up with a way to sanitize the code that might work. I put all of the instructions that seem relevant in a spreadsheet: SH4 instructions. Here's what I would do:
Code memory
- The assembler writes machine code to a part of RAM that the assembly code can't read or write.
- Branch instructions to fixed targets work without modification but must take a named label as an argument rather than a number, to ensure they jump to a valid instruction: BF, BF/S, BT, BT/S, BRA, BSR
For branches there are two possible options:
Option 1
- There is no way for the user to access the PR register directly.
- A macro like PUSH_PR pushes PR onto a stack that the user can't access. POP_PR does the opposite and checks for stack underflow.
- Instructions that load an address from a register are excluded: BRAF, BSRF, JMP, and JSR.
- JMP and JSR can be replaced with an alternate JMP and JSR that take a label name instead of a register if the +/-4K range of BRA and BSR isn't big enough.
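The PUSH_PR / POP_PR idea from Option 1 can be modelled in a few lines. This is only a sketch in Python of the behaviour the macros would need, not actual SH4 code: the class name, the fixed capacity, and the overflow check are assumptions added for illustration (the post only mentions an underflow check).

```python
class ShadowStackError(Exception):
    """Raised when the simulated PR stack underflows or overflows."""


class ShadowStack:
    """Model of a return-address stack kept outside the user's data memory."""

    def __init__(self, capacity=64):
        # Hypothetical fixed depth; a real assembler would pick its own limit.
        self.capacity = capacity
        self._slots = []

    def push_pr(self, pr):
        """Equivalent of PUSH_PR: save the return address where user code can't reach it."""
        if len(self._slots) >= self.capacity:
            # Assumption: overflow is checked too, although the post only names underflow.
            raise ShadowStackError("PR stack overflow")
        self._slots.append(pr)

    def pop_pr(self):
        """Equivalent of POP_PR: restore the return address, refusing to underflow."""
        if not self._slots:
            raise ShadowStackError("PR stack underflow")
        return self._slots.pop()
```

Since user code never sees the saved values, a return can only go back to an address that a BSR actually produced.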
Option 2
- PR returns an ID number instead of an address so no problem if the user accesses and manipulates the ID. The ID is an index into a table of target addresses created by the assembler.
- Loading a label into a register loads the label's ID instead of its real address.
- BRAF, BSRF, JMP, JSR, and RTS all expect an ID number and check that it's a valid table index before jumping. Supplying the wrong ID number jumps to somewhere unexpected but all ID numbers are valid targets so no crashes.
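As a rough illustration of Option 2, here is a Python model of the ID table the assembler would build. The names (JumpTable, add_label, resolve) are hypothetical, and raising an error on a bad ID is an assumption; the post only says the ID is checked before jumping.

```python
class JumpTable:
    """Maps small integer IDs to real code addresses known to the assembler."""

    def __init__(self):
        self._targets = []  # index = ID, value = real address
        self._ids = {}      # label name -> ID

    def add_label(self, name, address):
        """Register a label at assembly time and return its ID."""
        label_id = len(self._targets)
        self._targets.append(address)
        self._ids[name] = label_id
        return label_id

    def id_of(self, name):
        """What the user program gets when it loads a label into a register."""
        return self._ids[name]

    def resolve(self, label_id):
        """What an indirect jump would do: validate the ID, then fetch the address."""
        if not 0 <= label_id < len(self._targets):
            # Assumption: an invalid ID is reported rather than wrapped or clamped.
            raise ValueError(f"invalid jump ID: {label_id}")
        return self._targets[label_id]
```

A wrong but in-range ID still resolves to some label, which matches the "unexpected but valid target" behaviour described above.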
Data memory
- The assembled program is allocated a block of memory for data that it can read and write, aligned to its own size, i.e. a 64K block starting at 0x....0000.
- All constant data like tables and strings are copied into this block at startup. No constant data is stored in code memory since the assembled program has no way to access it.
- Addressing for the data memory starts at 0, so the address used by each instruction that accesses memory is masked (with 0xFFFF if data is 64K) and then added to the base address (0x....0000). There are a couple of ways to do the masking:
Option 1
- Two registers (R14 and R15 for example) are off limits. The assembler will error if either one is used in an instruction.
- One holds the mask and the other holds the base address. Address sanitizing is AND R14, Rm then ADD R15, Rm, so just two instructions in most cases and four instructions for @(R0,Rn) and @Rm+,@Rn+ addressing.
Option 2
- All registers are free for the user to use including R14 and R15.
- The masking code stashes R15 somewhere temporarily to free the register up. Is DBR free for this?
- Another possibility is GBR. The assembler can make a copy of GBR and restore it before calling any gint functions but not sure if GBR is also needed in interrupts.
- Another possibility is PR as long as it's free before BSR or JSR.
- After R15 is stashed somewhere, the mask and base are loaded from a constant pool. 6 instructions: store R15, load mask, AND with address, load base, add to address, restore R15.
- (Anywhere after an unconditional jump (BRA, BSR, JMP, JSR, RTS) is ok for the constant pool since all branches have to be to named labels so no way to jump between an unconditional jump and the next label and accidentally execute constants as code. In the worst case, output the constant pool with a jump over it.)
- Since branches only go to named labels, an easy optimization is to skip storing and restoring R15 until it's needed or there's a jump which brings the masking code from 6 to 4 instructions.
Option 3
- Like option 2, all registers including R14 and R15 are free.
- Use LDC/STC Rm,Rn_BANK to hold mask and base. Does gint use the other set of R0-R7 for exception handling?
- This should be faster since there's nothing to fetch from the constant pool. Also, running out of room and putting the constant pool in a random place will work but is annoying.
Instructions
- Instructions with Rm,@-Rn addressing decrement the address by up to 4 after masking, so 4 extra bytes before the data memory need to be available.
- Instructions with @(disp,Rm),Rn addressing increment the address by up to 60 bytes after masking, so 60 extra bytes after the data memory need to be available.
- The only logical place for GBR to point is the beginning of data memory which is offset zero, so GBR can be ignored even if gint doesn't need it in interrupts. @(R0,GBR) just becomes the equivalent of @(R0) and @(disp,GBR) just becomes @(disp).
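The guard margins above can be checked with a bit of arithmetic. The Python sketch below assumes accesses are naturally aligned (a misaligned one would fault anyway) and that @(disp,Rm) encodes a 4-bit displacement scaled by the access size; the function names and the example base address are made up for illustration.

```python
DISP_BITS = 4  # assumption: @(disp,Rm) uses a 4-bit displacement scaled by access size


def max_displacement(access_size):
    """Largest byte offset @(disp,Rm) can add for a 1-, 2- or 4-byte access."""
    return ((1 << DISP_BITS) - 1) * access_size


def reachable_range(base, size, access_size=4):
    """Return (lowest, end) byte addresses an aligned access can touch after masking.

    `lowest` is the first byte reachable through Rm,@-Rn, and `end` is one past
    the last byte reachable through @(disp,Rm),Rn.
    """
    lowest = base - access_size                # pre-decrement from offset 0
    last_aligned = base + size - access_size   # highest aligned masked address
    end = last_aligned + max_displacement(access_size) + access_size
    return lowest, end


# Hypothetical 64K data block.
low, end = reachable_range(0x12340000, 0x10000)
print(hex(low), hex(end))
```

For 4-byte accesses this gives 4 bytes before the block and 60 bytes after it, the same margins as in the list.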
Posted on 31/05/2025 10:21
Just as a quick reaction... I think you're getting near "safe" territory but at what cost? If you end up with a reduced programming model and significant expansions for many instructions it's not quite the original assembler. I'm not saying it's not fine, just not quite the same result.
Anyway, on a technical level... for masks you can't just modify the target register. If I do mov r0, r4 to back up a string pointer then go @r4+ a bunch of times, I can measure the length of my run as r4-r0, but not if you modify r4. You need another register and you need to rewrite the access to use that register. (You can save on r14 if the mask is 0xffff as you can do extu.w, but that's only for 64 kiB specifically.) For @(r0,rm) and related addressing modes you don't need to AND twice. Just ADD, AND, ADD, which also doesn't modify r0 (still modifies rm but that's a start).
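To make the point about the target register concrete, here is a small Python simulation of 32-bit registers. BASE and the helper names are hypothetical, and the loop only imitates a run of @r4+ accesses; it is a sketch of the argument, not of any real assembler output.

```python
MASK = 0xFFFF
BASE = 0x12340000   # hypothetical 64K-aligned data block
WORD = 0xFFFFFFFF   # registers are 32 bits wide


def sanitize(addr, mask=MASK, base=BASE):
    """AND with the mask, then ADD the base."""
    return ((addr & mask) + base) & WORD


def extu_w(value):
    """Zero-extend the low 16 bits, as extu.w does; same result as AND 0xFFFF."""
    return value & 0xFFFF


def sanitize_indexed(r0, rm, mask=MASK, base=BASE):
    """@(r0,rm): ADD the two registers, AND once, ADD the base."""
    return sanitize((r0 + rm) & WORD, mask, base)


def walk_bytes(start, count, in_place):
    """Imitate `count` accesses through @r4+ and return r4 - r0 afterwards."""
    r0 = r4 = start
    for _ in range(count):
        if in_place:
            r4 = sanitize(r4)        # user-visible register gets overwritten
        else:
            scratch = sanitize(r4)   # the access would go through a scratch register
        r4 = (r4 + 1) & WORD         # post-increment
    return (r4 - r0) & WORD


print(walk_bytes(0x20, 5, in_place=False))       # length survives
print(hex(walk_bytes(0x20, 5, in_place=True)))   # base leaks into the difference
```

With a scratch register the program still sees its own offsets; with in-place masking the base address ends up in r4 and the subtraction no longer gives the run length.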
For stashing registers using other registers creates risks for your own program to fail. Using the stack can work. A better approach IMO is to stash in a small permanent struct lying around pointed to at all times by e.g. r12. You can keep r14/r15 stashed (spilled, basically) at all times and just get them out for computations. You can even statically analyze whether there are multiple memory accesses, or multiple computations involving r14/r15 in a row, and optimize the stores and reloads, if performance is a concern. I believe this is similar to your Option 2 optimization.
I'd advise against using "rare" registers randomly. GBR can point anywhere, limiting it to 0 seems silly to me, in fact having it point to after the constants would probably be useful. DBR could be used by gint. The alternate bank is currently not used but I would like to use it to improve interrupt performance in the future.
As far as jumps are concerned, I find the limitations a bit dizzying. Why not just replace potentially-arbitrary jumps (jmp, jsr, rts, braf, bsrf) with a short call that validates that the target address is in the correct range? Remember that control flow doesn't necessarily follow functions (longjmp). braf is also used for switch. I'd rather take a small performance hit than constrain the programming model this much.
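A range-validating helper of that kind could look roughly like this, written in Python as a model. The function name and the code bounds are hypothetical. The alignment check is an extra assumption (instructions are 16 bits wide), and being in range does not prove the target is an instruction rather than a constant-pool entry.

```python
class BadJump(Exception):
    """Raised when an indirect jump targets an address outside the user code."""


def check_jump_target(target, code_start, code_end):
    """Validate the target of jmp/jsr/rts/braf/bsrf before taking the jump.

    code_start is inclusive and code_end exclusive; both are hypothetical bounds
    that the assembler would know for the program it generated.
    """
    if not code_start <= target < code_end:
        raise BadJump(f"target {target:#x} outside user code")
    if target % 2:
        # Assumption: reject odd addresses, since instructions are 2 bytes each.
        raise BadJump(f"target {target:#x} is not instruction-aligned")
    return target
```

Unlike the label-only schemes, this keeps computed jumps (switch tables, longjmp-style control flow) usable at the cost of one check per indirect jump.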
(FYI half the things you're discussing, and most of my response, are in the scope of a thought experiment I attempted about making a g1a emulator on CG.)
Posted on 31/05/2025 16:53
Just as a quick reaction... I think you're getting near "safe" territory but at what cost? If you end up with a reduced programming model and significant expansions for many instructions it's not quite the original assembler. I'm not saying it's not fine, just not quite the same result.
Hmm, I don't get what you mean about extu.w. The 0xFFFF mask is just an example. As long as the base is aligned to the size of the data memory, the data memory can be any size that is a power of 2. The mask is just size-1. For a 32K block, the base might be 0x12348000 (note that it is aligned to a 32K boundary) and the mask would be 0x7FFF. ANDing and ADDing means any input address will always fall in the correct 32K range between 0x12348000 and 0x1234FFFF.
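The 32K example can be checked numerically. This is a minimal Python sketch: make_sandbox is a made-up helper name, and the validation of size and alignment is added for illustration.

```python
def make_sandbox(base, size):
    """Return (mask, sanitize) for a data block of `size` bytes at `base`."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of 2")
    if base % size:
        raise ValueError("base must be aligned to the block size")
    mask = size - 1

    def sanitize(addr):
        # AND with the mask, then ADD the base: the result always lands in the block.
        return (addr & mask) + base

    return mask, sanitize


mask, sanitize = make_sandbox(0x12348000, 0x8000)
print(hex(mask))                  # 0x7fff
print(hex(sanitize(0)))           # 0x12348000
print(hex(sanitize(0xFFFFFFFF)))  # 0x1234ffff
```

Because the base is aligned to the block size, the ADD never carries into the masked bits, so it behaves like an OR and the result cannot leave the block.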