(July 2021)
It was raining hard, a week ago.
And what could you possibly do on a rainy Saturday afternoon?
Well... I was inspired by a series of articles on Hackaday to...
...then put it inside a 1.5$ Blue Pill microcontroller...
...then repeatedly squash it until it fits inside an Arduino UNO...
...i.e. inside 2K of RAM!
I haven't done anything even remotely close to this in decades.
I loved building it :-)
The rainy afternoon turned into a week-long hackfest (was looking forward every day to the post-work FORTH-tinkering in the afternoon...)
The result: a tiny, mini-Forth. In portable C++ :-)
It has...
Here's a recording of it in action:
...and another one, where a small Forth program blinks the Arduino LED:
I meant it when I said "portable" - the source code compiles as-is for both native and Arduino builds.
Another part of my reasoning was that in addition to targeting multiple platforms (e.g. BluePill and Arduino) I wanted to be able to use Valgrind and AddressSanitizer to detect - in the host! - any issues I have with my memory handling. And just as important, debugging with GDB in the native host is much easier/faster.
Since I had embedded targets in mind, I tried ArduinoSTL - but it was too wasteful memory-wise. It also made the build process significantly slower. I therefore built my own memory pool, as well as list, tuple and string-like C++ templates. It was a nice challenge, re-inventing a tiny C++ STL...
And I understand STL a lot better now, after building small pieces of it myself :-)
I setup simulation via simavr. This tremendously improved my developing speed, since a simulator spawns and runs much faster than the real board. Due to the code being portable, debugging took place mostly in the host GDB; and after Valgrind and AddressSanitizer gave their blessing, I usually found out that the simulator (and the real board) worked fine as well.
Thanks to ArduinoSTL, I quickly reached the point of running inside the BluePill. The 1.5$ mini-monster has 10 times more SRAM than an Arduino UNO; so in a couple of days, I had a working branch.
Before the Arduino UNO, I run my mini-FORTH inside the 1.5$ Blue Pill
But as said above, that wasn't nearly enough to make it work in my Arduino UNO. That required far more work (see below).
As for the BluePill, I should note that, as in all my other embedded targets,
I prefer a development workflow that is based on normal bootloaders
(not on programmers). I therefore burned the
stm32duino
bootloader on the BluePill, which allowed me to easily program it
in subsequent iterations via the USB connection (and a simple make upload
).
The same USB connection would then function as a serial port immediately afterwards - allowing me to interact with the newly uploaded Forth in the BluePill.
The screenshot below is from a tmux
: on the left, the output from make upload
;
and on the right, I used picocom
to interact with my mini-Forth
over the serial port:
That covered the first two days.
But when I tried compiling for the Arduino UNO, I realised that the ArduinoSTL was not enough. I run out of memory...
So I built my own mini-STL, and tightly controlled all memory utilisation.
I also used macro-magic to move all strings to Flash at compile-time
(see dprintf
in the code)... And saved memory everywhere I could,
re-using error messages across various operations - and storing the
entire array of native operations in Flash.
Nothing flexes your coding muscles as much as optimising; whether it is for speed or for space. See the implementation of ".S" for example, where the (obvious) stack reversal code is also the most wasteful... Changing it to a slower but memory-preserving algorithm allowed me to use ".S" even when almost all my memory is full.
I know that many developers hate C++. I even wrote a blog post about it.
And I understand why - they see code like this...
#include "mini_stl.h"
template<class T>
typename forward_list<T>::box *forward_list<T>::_freeList = NULL;
...and they start screaming - "what the hell is that", "incomprehensible madness", etc.
But there are very important benefits in using C++ - and templates in particular. You write less code, with no additional run-time or memory overhead compared to C, and with a lot more compile-time checks that watch your back (for things that would otherwise blow up in your face).
See my Optional
Optional<int> Forth::needs_a_number(const __FlashStringHelper *msg)
{
if (_stack.empty())
return error(emptyMsgFlash, msg);
auto topVal = *_stack.begin();
if (topVal._kind == StackNode::LIT)
return topVal._u.intVal;
else
return FAILURE;
}
You can't "forget" to check the potential for a failure coded inside your returned value - because your code has to "unwrap" it. I could have done this better, but I chose to implement it via simple tuples (this was a one-weeks-afternoons hack, after all :-)
As for the template "magic" incantation above - it is true magic: My
forward_list
template is using free-lists to store the pop_front
-ed
elements and reuse them in subsequent allocations. I wanted these free-lists to
be global (i.e. static members) because lists of the same type must re-use a
single, commonly-shared free-list. The magic spell tells the compiler I want to
instantiate these globals once, for each type T that I use in any
lists in my code.
Yep, FizzBuzz - we are fully Turing complete. And would surely pass Joel's interview :-)
." Reset... " RESET
." Check comments... " \ Yes, we support the new-style comments :-)
." Computing simple addition of 3 + 4... " 3 4 + .
." Is 1 = 2 ?... " 1 2 = .
." Is 1 > 2 ?... " 1 2 > .
." Is 1 < 2 ?... " 1 2 < .
." Define pi at double-word precision... " : pi 355 113 */ ;
." Use definition to compute 10K times PI... " 10000 pi .
." Check: 23 mod 7... " 23 7 MOD .
." Defining 1st level function1... " : x2 2 * ;
." Defining 1st level function2... " : p4 4 + ;
." 2nd level word using both - must print 24... " 10 x2 p4 .
." Defining a variable with value 123... " 123 variable ot3
." Printing variable's value... " ot3 @ .
." Defining The Constant (TM)... " 42 constant lifeUniverse
." Printing The Constant (TM)... " lifeUniverse .
." Setting the variable to The Constant (TM)... " lifeUniverse ot3 !
." Printing variable's value... " ot3 @ .
." Setting the variable to hex 0x11... " $11 ot3 !
." Printing variable's value... " ot3 @ .
." Defining helper... " : p5 5 U.R . ;
." Defining 3 times loop... " : x3lp 3 0 DO I p5 LOOP ;
." Calling loop... " x3lp
." Defining loop calling loop 2 times... " : x6lp 2 0 DO x3lp LOOP ;
." Nested-looping 2x3 times... " x6lp
." Inline: " : m 3 1 DO 3 1 DO CR J p5 I p5 ." = " J I * p5 LOOP LOOP ;
." Use inline loops with two indexes... " m
." Make multiples of 7 via DUP... " : m7s 10 0 DO DUP I * . LOOP DROP ;
." Print them and DROP the 7... " 7 m7s
." Reset... " RESET
\ Time for Turing completeness...
." Let's do Fizz-Buzz! " \ Turing Completeness check...
\ fizz ( n -- 0_or_1 n )
." Define fizz... " : fizz DUP 3 MOD 0 = IF ." fizz " 1 ELSE 0 THEN SWAP ;
\ buzz ( n -- 0_or_1 n )
." Define buzz... " : buzz DUP 5 MOD 0 = IF ." buzz " 1 ELSE 0 THEN SWAP ;
\ emitNum ( 0_or_1 0_or_1 n -- )
." Define emitNum... " : emitNum ROT ROT + 0 = if . ELSE DROP THEN ;
\ mainloop ( n -- )
." Define mainloop... " : mainloop fizz buzz emitNum ;
\ fb ( -- )
." Define fizzbuzz... " : fb 37 1 DO I mainloop LOOP ;
." Run it! " fb
." Report memory usage... " .S
I am a strong believer in automation. The final form of my Makefile
therefore has many rules - e.g. make arduino-sim
- that automate
various parts of the workflow.
Here's what they do:
arduino: Compiles the code for Arduino UNO - builds src/tmp/myforth.ino.{elf,hex}
arduino-sim: After building, launches the compiled mini-Forth in simduino
.
upload: After building, uploads to an Arduino attached to the port
configured inside config.mk
.
terminal: After uploading, launches a picocom
terminal with
all appropriate settings to interact with my Forth.
x86: Builds for x86. Actually, should easily build for any native target (ARM, etc).
test-address-sanitizer: Uses the x86 binary to test the code, executing all steps of the scenario shown above. The binary is built with the address sanitizer enabled (to detect memory issues).
test-valgrind: Same, but with Valgrind.
test-simulator: Spawns simavr
and sends the entire test scenario shown
above to it - while showing the responses received from it.
test-arduino: Sends the entire test scenario shown above to an
Arduino Uno connected to the port specified in config.mk
and shows the responses received over that serial port.
Another example of automation - the complete test scenario shown in the previous section, is not just an example in the documentation; it is extracted automatically from this text and fed into the Valgrind and AddressSanitizer tests... and also into the Python testing script that sends the data to the board in real-time.
DRY, folks.
I thoroughly enjoyed building this. I know full well that Forths are not supposed to be built in C++; they are supposed to be built in assembly, and also, utilise the Flash to store the user-compiled code at run-time.
But that wasn't the point of this - the point was to have fun and learn Forth.
What better way to learn a language... than to actually implement it! :-)
And... as a child of the 80s... I now know first-hand what Jupiter Ace was about :-)
Fork the code, and enjoy tinkering with it!
Index | CV | Updated: Sun Nov 19 23:06:10 2023 |