Engineering Montage#
Over the past few weeks, I have been working on a technical project that requires a ton of (computer) engineering. I think it’s time to talk about what I learned from it.
Note
Fun Not so fun fact:
You may find the title of this post a bit awkward (lame?). It’s probably because I borrowed the word “montage” from what I’ve also been working on—writing college essays. Specifically, I have been attending College Essay Guy’s course on how to write the personal statement and it mentioned “montage” as a possible structure.
Important
I am not in any way affiliated with College Essay Guy.
Nevertheless, I believe that the word “montage” captures the essence of this post pretty well, as it is indeed a, well, montage of many unrelated bits of engineering.
Tcl#
Let’s start with what I’ve already mentioned I would be working on.
I previously said that Tcl was a string-based language. I take that back. Actually no, just take it with a grain of salt. While diving into the internals of Tcl (to chase down an obscure, related error; long story), I discovered that while strings are essential to Tcl, there's a lot more to it.
Consider the following declaration of a `Tcl_Obj` (taken from Tcl 8.6.14 source code):

```c
typedef struct Tcl_Obj {
    int refCount;               /* When 0 the object will be freed. */
    char *bytes;                /* This points to the first byte of the
                                 * object's string representation. The array
                                 * must be followed by a null byte (i.e., at
                                 * offset length) but may also contain
                                 * embedded null characters. The array's
                                 * storage is allocated by ckalloc. NULL means
                                 * the string rep is invalid and must be
                                 * regenerated from the internal rep. Clients
                                 * should use Tcl_GetStringFromObj or
                                 * Tcl_GetString to get a pointer to the byte
                                 * array as a readonly value. */
    int length;                 /* The number of bytes at *bytes, not
                                 * including the terminating null. */
    const Tcl_ObjType *typePtr; /* Denotes the object's type. Always
                                 * corresponds to the type of the object's
                                 * internal rep. NULL indicates the object has
                                 * no internal rep (has no type). */
    union {                     /* The internal representation: */
        long longValue;         /*   - an long integer value. */
        double doubleValue;     /*   - a double-precision floating value. */
        void *otherValuePtr;    /*   - another, type-specific value,
                                       not used internally any more. */
        Tcl_WideInt wideValue;  /*   - a long long value. */
        struct {                /*   - internal rep as two pointers.
                                 *     the main use of which is a bignum's
                                 *     tightly packed fields, where the alloc,
                                 *     used and signum flags are packed into
                                 *     ptr2 with everything else hung off ptr1. */
            void *ptr1;
            void *ptr2;
        } twoPtrValue;
        struct {                /*   - internal rep as a pointer and a long,
                                       not used internally any more. */
            void *ptr;
            unsigned long value;
        } ptrAndLongRep;
    } internalRep;
} Tcl_Obj;
```
We can see that there's a large `union` at the bottom. This is the internal (alternative) representation of a Tcl object, as opposed to its string representation: `char *bytes`. In this case, it can either be a `long`, a `double`, a pointer, a `long long`, two pointers, or a pointer and a `long`. Phew!
This means that in some corner cases, thinking that every Tcl object is a string can actually be erroneous. (no example here since I haven’t fully understood it myself)
Furthermore, the conversion between the internal representation and the string representation is problematic and caused me a ton of trouble. So the fact that Tcl objects have an internal representation isn't going to change my opinion of Tcl. (-:
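To make the two representations a bit more concrete, here is a minimal sketch of the idea in plain C. It is not Tcl's code: `MiniObj`, `mini_set_long` and `mini_get_string` are hypothetical names made up for illustration, and the only internal rep supported is a `long`. It only mirrors the rule from the comment above that a `NULL` string rep means "invalid, regenerate from the internal rep".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for Tcl_Obj: one integer
 * internal rep plus a lazily generated string rep. Not Tcl's code. */
typedef struct MiniObj {
    char *bytes;     /* String rep; NULL means "invalid, regenerate". */
    int length;      /* Number of bytes at *bytes, excluding the NUL. */
    long longValue;  /* Internal rep. */
} MiniObj;

/* Store a new integer and invalidate the cached string rep. */
void mini_set_long(MiniObj *obj, long value) {
    free(obj->bytes);
    obj->bytes = NULL;
    obj->length = 0;
    obj->longValue = value;
}

/* Return the string rep, regenerating it from the internal rep if needed.
 * Returns NULL only if memory allocation fails. */
const char *mini_get_string(MiniObj *obj) {
    if (obj->bytes == NULL) {
        char buf[32];  /* Large enough for any 64-bit long plus the NUL. */
        int n = snprintf(buf, sizeof buf, "%ld", obj->longValue);
        obj->bytes = malloc((size_t)n + 1);
        if (obj->bytes == NULL) {
            return NULL;
        }
        memcpy(obj->bytes, buf, (size_t)n + 1);
        obj->length = n;
    }
    return obj->bytes;
}
```

Starting from a zero-initialized `MiniObj`, calling `mini_set_long(&obj, 42)` leaves `obj.bytes` as `NULL`, and the string `"42"` only comes into existence on the first `mini_get_string(&obj)`. The real thing has to keep many more internal rep types in sync with the string, which is presumably where the trouble comes from.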
Makefile#
I have been interacting with `Makefile`s quite a lot these days. If you want to learn more about `Makefile`s, Makefile Tutorial is definitely one of the best resources. But here is what I found most useful:
- Recursive `make`:

  ```makefile
  target:
  	cd subdir && $(MAKE)
  ```

- `:=`, `=` and `?=`

  Basically, `:=` is the normal assignment: the right-hand side is expanded once, immediately. `=` is expanded lazily, every time the variable is used, and `?=` only assigns when the variable is not already defined. See here for a more detailed explanation.

- `.PHONY`: run the target even when a file with the same name is present

  Caution

  This can sometimes cause the build process to be excessively long, especially if used together with dependencies, like this:

  ```makefile
  .PHONY: b
  b:
  	<build b>

  .PHONY: a
  a: b
  	<build a>
  ```

  `b` will build again if `make a` is invoked, even when `b` was already built previously, for example, with a `make b`.

  One possible solution would be to remove the build dependency.

- Ignore errors by adding a `-` to the beginning of a command: `make` then carries on even if that command fails
Fortran#
When some legacy Fortran 77 code needs to be linked to C code, `f2c` is still an option. Just make sure that you use the latest version, since it includes important fixes (e.g. building on 64-bit systems).

If you can modify the Fortran code (and have the time to do it), the modern Fortran `bind(c)` is probably the way to go.
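For a feel of what the C side looks like with `f2c`, here is a small hand-written sketch. It is not actual `f2c` output: `scale_` is a stand-in for what a hypothetical `SUBROUTINE SCALE(N, A, X)` would roughly turn into (lowercase name with a trailing underscore, every argument passed by pointer, an `int` return value), and plain `int` and `double` are used in place of the `integer` and `doublereal` typedefs from `f2c.h` so that the snippet stands on its own. `scale_vector` is an assumed convenience wrapper.

```c
/* Hand-written stand-in with the shape f2c gives a translated
 * SUBROUTINE SCALE(N, A, X): multiplies the first *n elements of x
 * by *a. Everything is passed by reference, Fortran style. */
int scale_(int *n, double *a, double *x) {
    for (int i = 0; i < *n; i++) {
        x[i] *= *a;
    }
    return 0;
}

/* Hypothetical C-side wrapper that hides the by-reference convention
 * so the rest of the C code can pass plain values. */
void scale_vector(int n, double a, double *x) {
    scale_(&n, &a, x);
}
```

Wrapping each translated routine like this keeps the pointer-heavy calling convention in one place.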
C#
Several random takeaways.
`long` and `long long` are especially tricky#
At least when you're dealing with legacy 32-bit code. Just use the fixed-width versions such as `int32_t` or `uint64_t` from `<stdint.h>`.
Also, when dealing with pointers, `size_t` and `ptrdiff_t` (from `<stddef.h>`) are really useful.
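As a small illustration of the types mentioned above, here is a sketch with two made-up helpers, `sum_samples` and `element_distance`. The point is only that the widths are spelled out in the type names, so nothing depends on how wide `long` happens to be on a given platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit samples into a 64-bit accumulator. The widths are the same
 * on every platform, unlike long. size_t is the natural type for counts. */
int64_t sum_samples(const int32_t *samples, size_t count) {
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += samples[i];
    }
    return total;
}

/* Distance, in elements, between two pointers into the same array.
 * Pointer subtraction yields a ptrdiff_t, which is signed. */
ptrdiff_t element_distance(const int32_t *from, const int32_t *to) {
    return to - from;
}
```

Summing two `INT32_MAX` values this way does not overflow, because the accumulator is guaranteed to be 64 bits wide.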
`printf` format specifiers you've never heard of#
You probably need to use the proper `printf` specifier for the aforementioned types. You can find the full details on cppreference.com.
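For the types from the previous section, the specifiers look roughly like this. The `PRId32` and `PRIu64` macros come from `<inttypes.h>`, while `%zu` and `%td` are the standard C99 specifiers for `size_t` and `ptrdiff_t`. `format_values` is just a made-up helper for the sketch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one value of each type into buf, separated by spaces.
 * Returns whatever snprintf returns (the length it wanted to write). */
int format_values(char *buf, size_t size,
                  int32_t a, uint64_t b, size_t n, ptrdiff_t d) {
    return snprintf(buf, size,
                    "%" PRId32 " %" PRIu64 " %zu %td",
                    a, b, n, d);
}
```

The `PRI*` macros expand to string literals, which is why they sit outside the quotes and rely on adjacent string literal concatenation.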
You should never name your symbol after a `libc` symbol#
This will cause you a ton of pain during linking. What I did was rename every clashing symbol to something else. You could probably make it work with compiler flags, but honestly, I think that way is just too hacky.
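A minimal sketch of the renaming approach, with made-up names: a helper that deletes an element from an array would naturally be called `remove`, but that name already belongs to `remove()` from `<stdio.h>`. Giving it a project prefix (here the hypothetical `mont_`) sidesteps the clash.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete the element at pos from an int array and
 * return the new length. Naming it plain `remove` would clash with
 * remove() from <stdio.h>, so it gets a project-specific prefix. */
size_t mont_remove(int *items, size_t count, size_t pos) {
    if (pos >= count) {
        return count;  /* Nothing to delete. */
    }
    /* Shift the tail of the array one slot to the left. */
    memmove(&items[pos], &items[pos + 1],
            (count - pos - 1) * sizeof items[0]);
    return count - 1;
}
```

A consistent prefix also makes it obvious at a glance which symbols belong to the project and which come from the C library.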
What’s Next#
I’m still working on the project. Expect a post on lexer and parser generators and probably something else as well.