Letter to a Young PL Enthusiast

jules · on June 21, 2010

> Ignore the siren calls of the virtual machines. You have to be the master of your domain. Use C or assembler to implement your language. The free Unices are your friends. Learn about the internals of your operating system: how it compiles, links, loads, and runs executables, the fruits of your labor. Learn about calling conventions, system calls, and the god-given hierarchy of the memory.

Might want to skip the low level stuff and quickly get to business with high level libraries like the Dynamic Language Runtime.

lallysingh · on June 21, 2010

Consider C--[1] as a first-pass approximation. It'll probably treat you pretty well. Or LLVM[2], actually, the new hotness[3]. Let the hard-but-shared-across-languages optimization part be done by existing systems.

[1] http://en.wikipedia.org/wiki/C--
[2] http://www.llvm.org
[3] Yes, I've got a thing for LLVM. Can you blame me?

pwpwp · on June 21, 2010

Sure. But IME, somewhat paradoxically, the lower you go, the more freedom you gain. Not to speak of understanding.

jules · on June 21, 2010

True, but if you have a great idea for a new language you really don't want to spend years implementing your own code generator, garbage collector, runtime system and standard library; you want to get a useful language as quickly as possible.

As an educational experience doing everything yourself from the machine code layer up is great, but for getting stuff off the ground it's not.

prog · on June 21, 2010

I agree that the VM can be a lot of work. However, some language features may need support from the VM. E.g. Scala and Clojure don't have full TCO support, as the JVM doesn't support it (yet!). If a language uses an existing VM, it may need to work around the VM's limitations. I suspect that's the reason Jython is slower than CPython even though the JVM is much faster than the CPython VM.

IMO the big advantage of using a mature existing VM is the library. It should be possible to have at least a usable GC (say simple mark-and-sweep), code generator etc. without too much effort.
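To give a feel for how small a "simple mark-and-sweep" collector can be, here is a toy sketch in JavaScript. It only simulates a heap: objects are hypothetical `{ name, refs, marked }` records and the root set is an explicit `Set`, whereas a real collector would scan stack frames, registers and raw memory.

```javascript
// Toy mark-and-sweep collector over a simulated heap.
// Assumption: each heap object is { name, refs: [objects], marked: false }.
class ToyHeap {
  constructor() {
    this.objects = []; // every allocated object
    this.roots = new Set(); // objects directly reachable from the "stack"
  }

  alloc(name) {
    const obj = { name, refs: [], marked: false };
    this.objects.push(obj);
    return obj;
  }

  // Mark phase: iterative depth-first traversal starting from the roots.
  mark() {
    const stack = [...this.roots];
    while (stack.length > 0) {
      const obj = stack.pop();
      if (obj.marked) continue;
      obj.marked = true;
      stack.push(...obj.refs);
    }
  }

  // Sweep phase: keep marked objects, drop the rest, reset the marks.
  sweep() {
    const freed = this.objects.filter((o) => !o.marked).map((o) => o.name);
    this.objects = this.objects.filter((o) => o.marked);
    for (const o of this.objects) o.marked = false;
    return freed;
  }

  collect() {
    this.mark();
    return this.sweep();
  }
}

const heap = new ToyHeap();
const a = heap.alloc("a");
const b = heap.alloc("b");
const c = heap.alloc("c"); // never referenced
const d = heap.alloc("d");
const e = heap.alloc("e");
a.refs.push(b);
d.refs.push(e);
e.refs.push(d); // a cycle that no root can reach
heap.roots.add(a);
console.log(heap.collect()); // [ 'c', 'd', 'e' ]
```

Unlike reference counting, tracing from the roots also reclaims the unreachable `d`/`e` cycle.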

Scriptor · on June 21, 2010

It'd be a good idea to decide which features you want most and whether the disadvantages of a VM offset its advantages. Although Clojure lacks TCO, it has gotten very popular very quickly, maybe more so than other Lisps. Some hard data on the JVM's role in this is the poll posted not long ago, which showed that many Clojure programmers were former Java programmers.

I'm also working on a cross-language compiler and have a question about TCO, specifically tail recursion. Currently my language compiles a tail-recursed function's body into a while loop. Don't Scala and Clojure do the same, except using the actual bytecode?
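For readers unfamiliar with the transformation Scriptor describes, here is a hand-written before/after sketch in JavaScript. `sumTo` is a hypothetical example, not code from Scriptor's compiler: the self-call in tail position becomes "rebind the parameters, jump back to the top of the loop".

```javascript
// Tail-recursive version: the recursive call is the last thing evaluated,
// so nothing in the current frame is needed after it returns.
function sumToRec(n, acc = 0) {
  if (n === 0) return acc;
  return sumToRec(n - 1, acc + n);
}

// What a compiler can emit instead: rebind the parameters and loop.
function sumToLoop(n, acc = 0) {
  while (true) {
    if (n === 0) return acc;
    // Compute all new argument values before rebinding, because a new
    // value may depend on an old one (here nextAcc uses the old n).
    const nextN = n - 1;
    const nextAcc = acc + n;
    n = nextN;
    acc = nextAcc;
  }
}

console.log(sumToRec(100)); // 5050
console.log(sumToLoop(1000000)); // 500000500000, in constant stack space
```

Calling `sumToRec(1000000)` would typically throw a `RangeError` (stack overflow) in engines that do not implement proper tail calls, such as V8; the loop version has no such limit.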

prog · on June 21, 2010

> I'm also working on a cross-language compiler and have a question about TCO, specifically tail recursion. Currently my language compiles a tail-recursed function's body into a while loop. Don't Scala and Clojure do the same, except using the actual bytecode?

I know Scala does it at the function level (i.e. when the function calls itself in tail position). I think Clojure has a 'recur' special form to similar effect. The issue with the JVM is that if f() calls g() in tail position and g() calls f() in tail position, it can't be optimized away (at least not without an undue amount of work, so the advantage is lost). Clojure uses a trampoline[1]-based approach to handle such a situation. I think Scala 2.8 also adds support for that. This runs in constant space; the only issue is the performance hit, since it's not done by the VM.

[1] http://richhickey.github.com/clojure/clojure.core-api.html#c...
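A minimal trampoline sketch in JavaScript for the mutual-recursion case prog mentions. It mirrors the idea behind Clojure's `trampoline` (keep calling while the result is a function), but `trampoline`, `isEven` and `isOdd` here are illustrative names, not any library's API.

```javascript
// Keep calling while the result is a thunk (a zero-argument function that
// stands for "the next call to make"). Each bounce returns to this loop,
// so the JS stack never grows. Assumption: real results are never functions.
function trampoline(fn, ...args) {
  let result = fn(...args);
  while (typeof result === "function") {
    result = result();
  }
  return result;
}

// Mutually recursive functions: instead of calling each other directly in
// tail position, each returns a thunk describing that call.
function isEven(n) {
  return n === 0 ? true : () => isOdd(n - 1);
}

function isOdd(n) {
  return n === 0 ? false : () => isEven(n - 1);
}

console.log(trampoline(isEven, 1000000)); // true
console.log(trampoline(isOdd, 7)); // true
```

Every bounce allocates a closure and goes through the driver loop, which is the performance cost of doing this outside the VM.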

dfox · on June 21, 2010

Compiling a self-recursive function into a loop catches many cases of tail recursion, but certainly not all. Real TCO requires some support from the VM to be efficient (if you don't care about efficiency, it is possible to fake it with exceptions).
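One way to read "fake it with exceptions", sketched in JavaScript under stated assumptions: a tail call throws an object describing the call, which unwinds the caller's frame, and a driver catches it and makes the call from its own frame. `TailCall`, `tailCall` and `runWithTailCalls` are hypothetical names, and the helper is only correct when used in genuine tail position, since throwing discards any pending work in the caller.

```javascript
// Describes a call that should happen after the current frame is gone.
class TailCall {
  constructor(fn, args) {
    this.fn = fn;
    this.args = args;
  }
}

// Instead of making a tail call, throw a description of it.
function tailCall(fn, ...args) {
  throw new TailCall(fn, args);
}

// Driver: run fn; whenever a TailCall escapes, perform it from here,
// so stack depth stays constant however many tail calls are made.
function runWithTailCalls(fn, ...args) {
  for (;;) {
    try {
      return fn(...args);
    } catch (e) {
      if (!(e instanceof TailCall)) throw e; // a real error: re-raise it
      fn = e.fn;
      args = e.args;
    }
  }
}

function countDown(n) {
  if (n === 0) return "done";
  return tailCall(countDown, n - 1); // throws, so this never returns
}

console.log(runWithTailCalls(countDown, 100000)); // prints: done
```

Throwing and catching on every call is far slower than a plain call or a trampoline bounce, which is why this only makes sense when efficiency is not a concern.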

pwpwp · on June 21, 2010

I don't know; a new language will take you years anyway before it's useful. And it's a very cool hobby, so I'm not even interested in getting it done as quickly as possible. ;) I want to make it as good as I can.

And I get a warm fuzzy feeling from doing stuff myself (although I'll use the Boehm GC and compile to C for my next language).

chc · on June 21, 2010

How confident are you that a novice language designer will be able to do better cross-platform codegen than LLVM?

pwpwp · on June 21, 2010

I would rather that language designers get basic stuff like lexical scope right, before they care about performance.

And LLVM is effectively a huge black box, which I would caution any new language implementor against using. Sure, it may get you off the ground more easily, but that's because you'll no longer be standing on the ground; you'll be standing on LLVM, a massive codebase you don't understand anything about.

jules · on June 21, 2010

That's going to be the case regardless. If you compile to x86 machine code you're standing on a huge black box, but this time it's also an ugly, platform specific one.

If you think people should worry about getting lexical scope right first, why should they worry about the low-level details? Sure, they need at least some idea of how things work at the lowest level, but they don't have to know all the stupid man-given details. Choosing LLVM over x86 assembly is good for performance and productivity: performance with LLVM will be better unless you are going to spend an extraordinary amount of time building better low-level optimizations, register allocation, and code generation than LLVM has.

chc · on June 21, 2010

I agree entirely that language designers should focus on getting language issues like scopes right — and that's why I don't think they should waste their time reinventing codegen over and over again unless there's a compelling need for it. I mean, if you're just playing around and don't really want to make a language, fine. But wasting your time on details that aren't useful is the best way to make sure your project never amounts to anything.

dedward · on June 21, 2010

Speaking from the systems side of things: it's plainly obvious when you get a piece of software whose developers don't understand the system level at all. They only understand things at an abstract, programming level, and don't really understand how their software is going to work in the real world. (The software will do what it's supposed to, and they may have implemented some fancy algorithms, but it will be a PITA to debug, a PITA to install, a PITA for every sysadmin who has to touch it, and a PITA to design systems around.)

The point of doing the low-level projects is to learn for yourself, not to literally create the best new language (but you never know.)

chc · on June 21, 2010

Sure, like I said, if your goal is just idle curiosity and a desire to learn, that's fine. But it's kind of moving the goalposts to frame that as "creating a programming language" rather than "fruitlessly messing around with the science and techniques behind programming languages."

mdon · on June 21, 2010

Kind of funny that the only negative listed for C# is "(it) hails from the evil Northwest."

A backhanded way of saying C# is really nice.

pwpwp · on June 21, 2010

You are right. C# is a cool language, especially v4.0, and originally, I wanted to add that to the post.

speek · on June 20, 2010

This is inspiring to me, as a young computer scientist. Maybe I'll go off to create my own language and OS.

nostrademons · on June 21, 2010

Be aware that it is a giant rathole. If you're looking for something commercially viable that might actually make a difference in the world, you're better off in a field like information retrieval, machine learning, geo, or image/audio processing.

There seems to be a siren's call of language/OS/editor development, though. If you really can't resist it, go do it. At the very least, you'll learn a lot, and it beats CRUDscreen Web2.0 apps as a mental exercise. But other fields of CS are much, much more useful.

speek · on June 21, 2010

I'm a systems guy who happens to be obsessed with biologically-inspired machine learning stuff. There's just something about my genetic makeup that makes me love PL/OSes, though they're more of a means to an end than anything else.

FraaJad · on June 21, 2010

In your opinion, what languages have the best return on investment when it comes to IR, ML and related data sciences?

nostrademons · on June 21, 2010

In my opinion, languages don't matter as much as the underlying algorithms. Learn the math, and then you can implement them in any language.

...but if you had to choose, I'd say to learn Python so you can prototype quickly, and then C++ so you can make it run fast in production. The two also have the nice benefit of working quite well together, so that you can push things into the C++ layer as you understand them better, and keep experimenting by gluing together those libraries with Python.

dedward · on June 21, 2010

And we used to say prototype in C++ and then re-implement in C to make it fast in production.

The math and the algorithms are important, but one shouldn't dismiss a fundamental understanding of the lower levels of the system, even though they'll change over time and you probably won't have to "go there". Real-world software runs on real-world systems, and there is no reason for a budding computer scientist to deprive himself of at least a cursory understanding of how things work underneath. You never know when he'll want to break out of the toolset Vendor X provides him and do something radical and new (like implementing something in hardware, or recognizing some feature there that he can use to massive real-world benefit).

pwpwp · on June 21, 2010

Thanks! Comments like this make blogging even more worthwhile.

And go for it. There's no better way to learn CS.

gruseom · on June 21, 2010

Well now, this gem was unexpected:

> you must create a programming language, or be enslav'd by another man's.

William Blake allusion FTW!

Nothing at all is lost.

arethuza · on June 21, 2010

The original is:

"I must Create a System, or be enslav'd by another Man's; I will not Reason and Compare: my business is to Create"

From "Jerusalem The Emanation of The Giant Albion":

http://www.blakearchive.org/exist/blake/archive/work.xq?work...

kabdib · on June 21, 2010

Write the debugger first.

You'll need it anyway. And it will provide /much/ insight into your mistakes.

thunk · on June 21, 2010

It's odd that he dismisses most existing Lisps, then advises the reader to create their own. I wasn't sure if he meant it pedagogically or pragmatically or both.

pwpwp · on June 21, 2010

Good question. I think that the crux is that if you're enthusiastic about PLs, you have to create your own or be enslaved by another man's. That's my feeling at least. And on the way to creating your own PL, you'll also learn to appreciate the existing PLs and implementations better, and learn to live with them, warts and all.

thunk · on June 21, 2010

Gotcha. And an influx of new and experimental Lisps is always welcome.

pwpwp · on June 21, 2010

Right on :)

skybrian · on June 21, 2010

I can see it as purely an academic exercise. But what good is inventing a language that nobody else knows? How do we communicate? How do we share code?

Languages are important to the extent that people decide to learn them and use them, and that way lies politics.

pwpwp · on June 21, 2010

Some people, like me, simply can't not create PLs. This post is for them. Politics doesn't even enter the picture.

teaspoon · on June 21, 2010

Question from someone who maybe doesn't spend enough time examining PLs: How is JavaScript "like assembler with hashtables"? What "messed-up-ness" are Python and Ruby legendary for?

pwpwp · on June 21, 2010

First, you should take the whole post with more than a grain of salt. ;)

Re JavaScript, I'm referring to the low-level nature of many of its constructs:

- its impoverished way to pass parameters (i.e. there are no keyword parameters; you don't get an error if you pass too many or too few arguments)

- its impoverished exception handling (i.e. you can't catch exceptions of a specified type)

- its impoverished standard library and built-in data structures

- its low-level and complicated OOP system

There's more, see http://pwpwp.blogspot.com/2009/08/awesome-helma-and-lacking-...
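To make the parameter-passing complaint concrete, here is a small JavaScript sketch. `describe` and `strictArity` are hypothetical names; the wrapper shows the kind of arity check you have to write yourself, relying on `fn.length`, which counts the declared parameters before any default or rest parameter.

```javascript
// JavaScript does not check argument counts: missing parameters are
// simply undefined and extra arguments are silently ignored.
function describe(name, age) {
  return `${name} is ${age}`;
}

console.log(describe("Ada")); // Ada is undefined
console.log(describe("Ada", 36, "extra")); // Ada is 36

// Hypothetical wrapper that adds the check the language omits.
function strictArity(fn) {
  return function (...args) {
    if (args.length !== fn.length) {
      throw new TypeError(
        `${fn.name} expects ${fn.length} arguments, got ${args.length}`
      );
    }
    return fn.apply(this, args);
  };
}

const checkedDescribe = strictArity(describe);
console.log(checkedDescribe("Ada", 36)); // Ada is 36
// checkedDescribe("Ada") would throw a TypeError.
```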

Re Python and Ruby, I'm mostly referring to the fact that both languages started out with broken lexical scoping, and had to change their scoping rules repeatedly, which is a huge red warning sign. Additionally, it seems very hard to implement both languages so that they run fast, another factor. Ruby also has an array of different kinds of first-class functions (blocks, procs, lambdas, I lost track), with different capabilities and restrictions, which is a design failure to me. That said, I think they're acceptable languages, even if they force you to memorize a lot of irrelevant stuff.

nostrademons · on June 21, 2010

JavaScript has a certain elegance to the minimalism, though, much like Scheme:

- You can emulate keyword parameters by passing an 'options' dict, and you can emulate defaults with 'var myOpt = options.myOpt || default'. You can also define a function, like $.extend, to do this for you.

- You can catch exceptions of a specified type by checking the type and rethrowing if it's not appropriate. And you can define a function to do this for you:

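  // Catch only exceptions that are instances of exc_type; re-throw the rest.
  function try_catch_if(exc_type, body, exc_handler) {
    try {
      return body();
    } catch (e) {
      if (e instanceof exc_type) {
        return exc_handler(e); // pass the caught exception to the handler
      }
      throw e; // not the requested type: propagate it unchanged
    }
  }

  // Usage: handle TypeErrors, let everything else escape.
  try_catch_if(TypeError,
               function () { return null.foo; },
               function (e) { console.log("caught: " + e.message); });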

- You can define your own standard library à la jQuery or YUI, or even modify the built-in one à la Prototype (though I don't recommend this). And the built-in data structures aren't all that bad.

- You can build whatever OOP system you want on top of prototypes, and many libraries do just that. (In this respect, it's quite similar to Scheme, where every programmer starts by defining his own incompatible object system).
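A sketch of the options-dict idea from the first point, assuming a hypothetical `connectUrl` function and option names. It uses the built-in `Object.assign` to merge caller options over defaults (the same job `$.extend({}, defaults, options)` does in jQuery) and shows why the `||` idiom needs care.

```javascript
// Emulating keyword parameters with an options object.
// The function and option names are hypothetical.
const DEFAULTS = { protocol: "http", port: 80, retries: 3 };

function connectUrl(host, options) {
  // Object.assign copies own properties left to right, so caller-supplied
  // options override the defaults; an undefined `options` is skipped.
  const opts = Object.assign({}, DEFAULTS, options);
  return `${opts.protocol}://${host}:${opts.port} retries=${opts.retries}`;
}

console.log(connectUrl("example.com"));
// http://example.com:80 retries=3
console.log(connectUrl("example.com", { protocol: "https", retries: 0 }));
// https://example.com:80 retries=0

// The `options.x || default` idiom is shorter, but it treats every falsy
// value (0, "", false) as missing:
function retriesWithOr(options) {
  return options.retries || 3;
}
console.log(retriesWithOr({ retries: 0 })); // 3, not 0
```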

Most of the sucky parts of JavaScript come from it introducing things that weren't really thought through, e.g. the 'this' keyword nonsense is ridiculous, as is the lack of argument-checking by default.
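A short illustration of the 'this' complaint: in JavaScript, `this` is determined by how a function is called, not where it is defined. The `counter` object is a hypothetical example.

```javascript
const counter = {
  count: 0,
  increment() {
    this.count += 1;
    return this.count;
  },
};

// Called as a method, `this` is counter.
console.log(counter.increment()); // 1

// The same function called with a different receiver updates that object.
const other = { count: 100 };
console.log(counter.increment.call(other)); // 101

// Passing the bare method as a callback loses the receiver:
// [1, 2].map(counter.increment) would run with `this` undefined in strict
// mode (a TypeError) or the global object in sloppy mode.
// bind() pins `this` to a fixed object instead.
const inc = counter.increment.bind(counter);
console.log([1, 2].map(() => inc())); // [ 2, 3 ]
```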

_ivvf · on June 21, 2010

You can build your own keyword-argument passing style, exception handling, standard library, and OOP system in assembly as well. Most of us would rather start with a language that got these features right in the first place than defend languages that didn't.

pwpwp · on June 21, 2010

Totally agree.

devinj · on June 21, 2010

Really? You're calling Python a messed-up programming language because ten years ago it got lexical scoping? I don't know about Ruby, maybe it's had a rougher road, but that's really the only scoping rule change Python got. It recently got an additional keyword, nonlocal, to allow you to rebind variables in outer scopes, but otherwise nothing has changed since then. If this is a warning sign, what on earth am I supposed to be watching out for? What's this danger it's warning me against?

sprout · on June 21, 2010

You're taking this entirely too seriously. There is no such thing as a perfect language. Most flaws are features when looked at in the right light. And obviously the reverse is also true.

devinj · on June 21, 2010

I never said Python was perfect. I consider it seriously flawed in many respects. I'm not defending Python as a perfect language.

I'm just saying this "flaw" doesn't make any sense as such. If you want to tell me that with some perspective it's a flaw, show me the perspective, because I'm not seeing it at all. It looks pretty ridiculous on my end.

mhansen · on June 21, 2010

It's not really assembler with hashtables, at all. It's got closures FFS.

However the hashtables are broken: they say they're empty, but they have 'constructor' as a key when you create them.
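The surprise mhansen describes comes from prototype inheritance: an object literal inherits from `Object.prototype`, so lookups find keys the object never had. A short sketch, including two standard workarounds (`Object.create(null)` and own-property checks); the variable names are illustrative.

```javascript
// A plain object literal inherits from Object.prototype.
const table = {};
console.log(Object.keys(table).length); // 0: no own keys
console.log("constructor" in table); // true: found via the prototype
console.log(table["constructor"] === Object); // true

// Object.create(null) makes an object with no prototype at all,
// which behaves like a genuinely empty dictionary.
const dict = Object.create(null);
console.log("constructor" in dict); // false

// Own-property checks ignore inherited keys.
console.log(Object.prototype.hasOwnProperty.call(table, "constructor")); // false
```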

j_baker · on June 21, 2010

This is a decently thought-out article, but it's important to remember that programming language design is as much an art as any other form of programming, if not more so.

aerique · on June 21, 2010

Funny, I don't understand the only one that sort of applies to me: "Syntax is the Maya of programming".

rntz · on June 21, 2010

http://en.wikipedia.org/wiki/Maya_(illusion)

I'm not sure which part you didn't understand; this is my best guess. I believe the author is trying to say that syntax is a superficial appearance, and not relevant to - indeed, a distraction from - understanding the true beauty or ugliness, utility or disutility, of the concepts embodied in a language.

aerique · on June 21, 2010

At first I didn't understand the whole paragraph (I had just woken up :-) ) and then it was just the Maya part. I didn't understand his use of Maya.

Thanks for the explanation. It seems the author and I have a similar view on syntax (how could we not, since we both seem to admire Alan Perlis!).

kunjaan · on June 21, 2010

"Syntax is the Viet Nam of programming languages" - Matthias Felleisen.