Welcome to CCjs
Let’s imagine what C transformed into JavaScript might look like.
We’ll start with a very simple C program, primes.c, that prints the first N prime numbers:
Here’s how you would compile and run it from a command prompt:
cc primes.c -o primes
primes 10
the first 10 primes are: 2 3 5 7 11 13 17 19 23 29
Here’s what an equivalent JavaScript version, primes.js, might look like:
The JavaScript version works just like the original. Below is a window containing the live output of the command “c/tests/primes/primes.js 100”:
NOTE: The windows created for these demos are intended to operate like small “terminal” windows, managed by a Terminal class that allows you to run CCjs apps as if they were built-in commands (eg,
c/tests/primes/primes.js
). The Terminal class also supports ArrowUp and ArrowDown to recall previous commands and Ctrl-C to terminate a running app. A CCjs app runs in a separate JavaScript worker thread, and all other commands are passed to a dedicated “background” worker thread that callseval()
and returns the result.
The above demo works only because primes.js was written by hand. The challenge of CCjs is converting C to JavaScript automatically.
The first step is the CCjs Parser. It has a long way to go, but it is far enough long that it can parse an entire C application into tokens. It can be run from the command-line (using Node) or on a web page (like this one).
Here’s primes.c converted into tokens, by running “src/parse.js c/tests/primes/primes.c”:
The CCjs Parser includes a complete C preprocessor and should be capable of dealing with any valid ANSI C syntax, along with a few C99 extensions (eg, long long support). In fact, I can run all the C code from the SimH PDP10 Emulator through the CCjs Parser, compile the resulting token stream, and produce an identical binary.
The next steps are building an AST (Abstract Syntax Tree) from the token stream and then rewriting the AST as JavaScript. And if that weren’t enough work, other major obstacles include properly supporting all C data types:
- Numbers (JavaScript has only one numeric type)
- Characters and Strings (JavaScript uses UTF-8, not ASCII)
- Pointers, Arrays, Structures, and memory accesses in general
Even our simple demo poses some challenges. For example, char *argv[] refers to an array of pointers to characters. So, when calling main(), it would be natural to create the argv array like this:
primes.main(2, ["primes", "100"]);
but that will only work if we know that nothing is done with the argv pointers other than passing them to functions that expect char * parameters. More generally, we might have to do something like this:
primes.main(2, [new pointer("primes"), new pointer("100")]);
And even more generally, if argv is defined as char **, then we might be required to do this:
primes.main(2, new pointer(new pointer("primes"), new pointer("100")));
which means that any references to, say, argv[1] must be transformed into something like argv.getElement(1).
Pointers (and arrays) will have to be implemented as JavaScript objects with references to a buffer (up to a specified size) and position (initially 0). If no buffer is provided, then it’s effectively a “null pointer.” If a string initializer is provided, a buffer must be initialized with the contents of the string, including a terminating null.
Later, when one of those pointers is passed to a CCjs library function, like stdlib.atoi(), if the library function is expecting a char * pointer, it will generally call the pointer’s getString() method to obtain a JavaScript string, so that it can implement the function using JavaScript operations.
Pointer objects will also have methods that enable the following operations:
- Initializing a pointer to zero
- Assigning a pointer to another pointer
- Adding or subtracting an integer to/from a pointer
- Subtracting or comparing two pointers
“Run-time” pointer assignments will likely only verify that the buffer type of the source pointer (1, 2, 4, or 8-byte elements) matches that of the target pointer; most verification checks will be performed at “compile-time” (i.e., verifying that the data type of the source and target pointers match).
Motivation
To be able to run the SimH PDP10 Emulator in a browser.
Resources
Since I want to start simple, and I have a natural fondness for old architectures, I’m targeting the ANSI C language specification, finalized in the mid-1980’s and memorialized in The C Programming Language, Second Edition.
I’ll worry about newer features and data types (like long long) later, and as the code bases that I’m working with require them.
License
Copyright © 2017 Jeff Parsons and the PCjs Project.
CCjs, a software translation project, is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
You are required to include the above copyright notice in every modified copy of this work and to display that notice whenever the software is started.
See LICENSE for details.