Letter to a Young PL Enthusiast (axisofeval.blogspot.com)
113 points by vamsee on June 20, 2010 | 47 comments


> Ignore the siren calls of the virtual machines. You have to be the master of your domain. Use C or assembler to implement your language. The free Unices are your friends. Learn about the internals of your operating system, how it compiles, links, loads, and runs executables, the fruits of your labor. Learn about calling conventions, system calls, and the god-given hierarchy of the memory.

Might want to skip the low level stuff and quickly get to business with high level libraries like the Dynamic Language Runtime.


Consider C--[1] as a first-pass approximation. It'll probably treat you pretty well. Or LLVM[2], actually, the new hotness[3]. Let the hard-but-shared-across-languages optimization part be done by existing systems.

[1] http://en.wikipedia.org/wiki/C-- [2] http://www.llvm.org [3] Yes, I've got a thing for LLVM. Can you blame me?


Sure. But IME, somewhat paradoxically, the lower you go, the more freedom you gain. Not to speak of understanding.


True, but if you have a great idea for a new language you really don't want to spend years implementing your own code generator, garbage collector, runtime system and standard library; you want to get a useful language as quickly as possible.

As an educational experience doing everything yourself from the machine code layer up is great, but for getting stuff off the ground it's not.


I agree that the VM can be a lot of work. However, some language features may need support from the VM. E.g. Scala and Clojure don't have full TCO support as the JVM doesn't support that (yet!). If a language uses an existing VM, it may need to work around the VM's limitations. I suspect that's the reason Jython is slower than CPython even though the JVM is much faster than the CPython VM.

IMO the big advantage of using a mature existing VM is the library. It should be possible to have at least a usable GC (say simple mark-and-sweep), code generator etc. without too much effort.


It'd be a good idea to decide on what features you want most and whether the disadvantages of a VM offset its advantages. Although Clojure lacks TCO, it's already gotten very popular very quickly, maybe more than other lisps. Hard data for the JVM's role in this is the poll posted not long ago that showed many Clojure programmers were former Java programmers.

I'm also working on a cross-language compiler and have a question about TCO, specifically tail recursion. Currently my language compiles a tail-recursed function's body into a while loop. Don't Scala and Clojure do the same, except using the actual bytecode?
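For anyone following along, here is a minimal sketch of the transformation being described, in JavaScript only because that's the language used elsewhere in this thread. The names `sumTo` and `sumToLoop` are made up for illustration, and this is not a claim about what Scala, Clojure, or the poster's compiler actually emit.

```javascript
// Tail-recursive form: the recursive call is the last thing the function does.
function sumTo(n, acc) {
  if (n === 0) {
    return acc;
  }
  return sumTo(n - 1, acc + n);
}

// Loop form: rebind the parameters and jump back to the top.
function sumToLoop(n, acc) {
  while (true) {
    if (n === 0) {
      return acc;
    }
    // Compute all new arguments before assigning any of them,
    // so a later argument never sees an already-updated value.
    var nextN = n - 1;
    var nextAcc = acc + n;
    n = nextN;
    acc = nextAcc;
  }
}
```

The loop version runs in constant stack space, so it should cope with inputs that would exhaust the stack in the recursive version.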


> I'm also working on a cross-language compiler and have a question about TCO, specifically tail recursion. Currently my language compiles a tail-recursed function's body into a while loop. Don't Scala and Clojure do the same, except using the actual bytecode?

I know Scala does it at the function level (i.e. when a function calls itself in tail position). I think Clojure has a 'recur' keyword to similar effect. The issue with the JVM is that if f() calls g() in tail position and g() calls f() in tail position, it can't be optimized away (at least not without an undue amount of work, so the advantage is lost). Clojure uses a trampoline[1] based approach to handle such a situation. I think Scala 2.8 also adds support for that. This works in constant space; the only issue is that it's a performance hit, as it's not done by the VM.

[1] http://richhickey.github.com/clojure/clojure.core-api.html#c...
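To make the trampoline idea concrete, here is a rough sketch in JavaScript. It follows the general technique (return a thunk instead of making the tail call, and let a driver loop keep invoking thunks), not Clojure's actual implementation. `trampoline`, `isEven` and `isOdd` are illustrative names.

```javascript
// Driver loop: call f, then keep calling the result for as long as it is a function (a thunk).
function trampoline(f) {
  var result = f.apply(null, Array.prototype.slice.call(arguments, 1));
  while (typeof result === "function") {
    result = result();
  }
  return result;
}

// Mutually recursive functions that return a thunk instead of calling each other directly.
function isEven(n) {
  return n === 0 ? true : function () { return isOdd(n - 1); };
}

function isOdd(n) {
  return n === 0 ? false : function () { return isEven(n - 1); };
}
```

Something like `trampoline(isEven, 100000)` then runs without growing the stack. The cost is one closure allocation and one extra call per step, which is the performance hit mentioned above. A trampoline written this way also can't return a function as its final value, since it would just call it.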


Compiling a self-recursive function into a loop catches many cases of tail recursion, but certainly not all. Real TCO requires some support from the VM to be efficient (if you don't care about efficiency, it is possible to fake it with exceptions).
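One way the exception trick might look, as a sketch in JavaScript under my own assumptions: `TailCall`, `tailCall` and `run` are hypothetical names, and this is just one possible encoding, not a description of any particular implementation.

```javascript
// Hypothetical marker object, thrown in place of making a tail call.
function TailCall(fn, args) {
  this.fn = fn;
  this.args = args;
}

// Throwing unwinds the current frame before the next call starts.
function tailCall(fn) {
  throw new TailCall(fn, Array.prototype.slice.call(arguments, 1));
}

// Driver loop: run a function; if it "tail calls" by throwing, run the target instead.
function run(fn) {
  var args = Array.prototype.slice.call(arguments, 1);
  while (true) {
    try {
      return fn.apply(null, args);
    } catch (e) {
      if (!(e instanceof TailCall)) {
        throw e; // a real error, not a tail call
      }
      fn = e.fn;
      args = e.args;
    }
  }
}

function countDown(n) {
  if (n === 0) {
    return "done";
  }
  tailCall(countDown, n - 1);
}
```

`run(countDown, 50000)` never holds more than one `countDown` frame at a time, but every step pays for a throw and a catch, which is why this only makes sense if you don't care about efficiency.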


I don't know, a new language will take you years anyway until it's useful. And it's a very cool hobby, so I'm not even interested in getting it done as quickly as possible. ;) I want to make it as good as I can.

And I get a warm fuzzy feeling from doing stuff myself (although I'll use the Boehm GC and compile to C for my next language).


How confident are you that a novice language designer will be able to do better cross-platform codegen than LLVM?


I would rather that language designers get basic stuff like lexical scope right, before they care about performance.

And LLVM is effectively a huge black box, which I would caution any new language implementor against using. Sure, it may get you off the ground easier, but that's because you'll no longer be standing on the ground, you'll be standing on LLVM, a massive codebase you understand nothing about.


That's going to be the case regardless. If you compile to x86 machine code you're standing on a huge black box, but this time it's also an ugly, platform specific one.

If you think people should rather worry about getting lexical scope right, then why should they worry about the low level details? Sure, they have to have at least some idea about how it works on the lowest level, but they don't have to know all the stupid man-given details. Choosing LLVM over x86 assembly is good for performance and productivity. Performance with LLVM will be better unless you are going to spend an extraordinary amount of time building better low level optimizations, register allocation and code generation than LLVM's.


I agree entirely that language designers should focus on getting language issues like scopes right — and that's why I don't think they should waste their time reinventing codegen over and over again unless there's a compelling need for it. I mean, if you're just playing around and don't really want to make a language, fine. But wasting your time on details that aren't useful is the best way to make sure your project never amounts to anything.


Speaking from the systems side of things - it's plainly obvious when you get a piece of software where the developers don't understand the system level at all - it's obvious they only understand things at an abstract, programming level, and don't really understand how their software is going to work in the real world. (The software will do what it's supposed to, and they may have implemented some fancy algorithms, but it will be a PITA to debug, PITA to install, PITA for every sysadmin who has to touch it, and PITA to try to design systems to support it.)

The point of doing the low-level projects is to learn for yourself, not to literally create the best new language (but you never know.)


Sure, like I said, if your goal is just idle curiosity and a desire to learn, that's fine. But it's kind of moving the goalposts to frame that as "creating a programming language" rather than "fruitlessly messing around with the science and techniques behind programming languages."


Kind of funny that the only negative listed for C# is "(it) hails from the evil Northwest."

A backhanded way of saying C# is really nice.


You are right. C# is a cool language, especially v4.0, and originally, I wanted to add that to the post.


This is inspiring to me, as a young computer scientist. Maybe I'll go off to create my own language and OS.


Be aware that it is a giant rathole. If you're looking for something commercially viable that might actually make a difference in the world, you're better off in a field like information retrieval, machine learning, geo, or image/audio processing.

There seems to be a siren's call of language/OS/editor development, though. If you really can't resist it, go do it. At the very least, you'll learn a lot, and it beats CRUDscreen Web2.0 apps as a mental exercise. But other fields of CS are much, much, more useful.


I'm a systems guy who happens to be obsessed with biologically-inspired machine learning stuff. There's just something about my genetic makeup that makes me love PL/OSes, though they're more of a means to an end than anything else.


In your opinion, what languages have the best return on investment when it comes to IR, ML and related data sciences?


In my opinion, languages don't matter as much as the underlying algorithms. Learn the math, and then you can implement them in any language.

...but if you had to choose, I'd say to learn Python so you can prototype quickly, and then C++ so you can make it run fast in production. The two also have the nice benefit of working quite well together, so that you can push things into the C++ layer as you understand them better, and keep experimenting by gluing together those libraries with Python.


And we used to say prototype in C++ and then re-implement in C to make it fast in production.

The math and the algorithms are important, but one shouldn't dismiss a fundamental understanding of the lower levels of the system, even though they'll change over time and you probably won't have to "go there". Real-world software runs on real-world systems, and there is no reason for a budding computer scientist to deprive himself of at least a cursory understanding of how things work underneath. You never know when he'll want to break out of the toolset Vendor X provides him and do something radical and new (like implementing something in hardware, or recognizing that there is some feature there he can use to massive real-world benefit).


Thanks! Comments like this make blogging even more worthwhile.

And go for it. There's no better way to learn CS.


Well now, this gem was unexpected:

you must create a programming language, or be enslav'd by another man's.

William Blake allusion FTW!

Nothing at all is lost.


The original is:

"I must Create a System, or be enslav'd by another Man's; I will not Reason and Compare: my business is to Create"

From "Jerusalem The Emanation of The Giant Albion":

http://www.blakearchive.org/exist/blake/archive/work.xq?work...


Write the debugger first.

You'll need it anyway. And it will provide /much/ insight into your mistakes.


It's odd that he dismisses most existing Lisps, then advises the reader to create their own. I wasn't sure if he meant it pedagogically or pragmatically or both.


Good question. I think that the crux is that if you're enthusiastic about PLs, you have to create your own or be enslaved by another man's. That's my feeling at least. And on the way to creating your own PL, you'll also learn to appreciate the existing PLs and implementations better, and learn to live with them, warts and all.


Gotcha. And an influx of new and experimental Lisps is always welcome.


Right on :)


I can see it as purely an academic exercise. But what good is inventing a language that nobody else knows? How do we communicate? How do we share code?

Languages are important to the extent that people decide to learn them and use them, and that way lies politics.


Some people, like me, simply can't not create PLs. This post is for them. Politics doesn't even enter the picture.


Question from someone who maybe doesn't spend enough time examining PLs: How is JavaScript "like assembler with hashtables"? What "messed-up-ness" are Python and Ruby legendary for?


First, you should take the whole post with more than a grain of salt. ;)

Re JavaScript, I'm referring to the lowlevel nature of many of its constructs:

- its impoverished way to pass parameters (i.e. there are no keyword parameters; you don't get an error if you pass too many or too few arguments)

- its impoverished exception handling (i.e. you can't catch exceptions of a specified type)

- its impoverished standard library and built-in data structures

- its lowlevel and complicated OOP system
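A small illustration of the first two points, assuming nothing beyond plain JavaScript (`add` is just an example function):

```javascript
function add(a, b) {
  return a + b;
}

var tooFew = add(1);        // b is undefined, so this computes 1 + undefined
var tooMany = add(1, 2, 3); // the extra argument is silently ignored

// A catch clause receives whatever was thrown, whatever its type.
var caught = [];
[new TypeError("bad type"), new RangeError("bad range"), "just a string"].forEach(function (thrown) {
  try {
    throw thrown;
  } catch (e) {
    caught.push(e);
  }
});
```

Neither call to `add` raises an error, and the single `catch` sees all three thrown values, including the plain string.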

There's more, see http://pwpwp.blogspot.com/2009/08/awesome-helma-and-lacking-...

Re Python and Ruby, I'm mostly referring to the fact that both languages started out with broken lexical scoping, and had to change their scoping rules repeatedly, which is a huge red warning sign. Additionally, it seems very hard to implement both languages so that they run fast, another factor. Ruby also has an array of different kinds of first-class functions (blocks, procs, lambdas, I lost track), with different capabilities and restrictions, which is a design failure to me. That said, I think they're acceptable languages, even if they force you to memorize a lot of irrelevant stuff.


JavaScript has a certain elegance to the minimalism, though, much like Scheme:

- You can emulate keyword parameters by passing an 'options' dict, and you can emulate defaults with 'var myOpt = options.myOpt || defaultValue'. You can also define a function, like $.extend, to do this for you.

- You can catch exceptions of a specified type by checking the type and rethrowing if it's not appropriate. And you can define a function to do this for you:

  function try_catch_if(exc_type, body, exc_handler) {
    try {
      body();
    } catch (e) {
      if (e instanceof exc_type) {
        // Matching type: pass the exception to the handler.
        exc_handler(e);
      } else {
        // Anything else propagates unchanged.
        throw e;
      }
    }
  }

- You can define your own standard library a la jQuery or YUI, or even modify the built-in one a la Prototype (though I don't recommend this). And the built-in data structures aren't all that bad.

- You can build whatever OOP system you want on top of prototypes, and many libraries do just that. (In this respect, it's quite similar to Scheme, where every programmer starts by defining his own incompatible object system).
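As a sketch of the options-dict approach from the first bullet: `withDefaults` and `connect` are hypothetical names, and the helper only does roughly what a merge function like $.extend is used for here. It is not that function's implementation.

```javascript
// Hypothetical helper: copy the defaults, then overwrite with whatever the caller supplied.
function withDefaults(options, defaults) {
  var result = {};
  var key;
  for (key in defaults) {
    if (Object.prototype.hasOwnProperty.call(defaults, key)) {
      result[key] = defaults[key];
    }
  }
  for (key in options) {
    if (Object.prototype.hasOwnProperty.call(options, key)) {
      result[key] = options[key];
    }
  }
  return result;
}

// "Keyword parameters": callers name only the settings they care about.
function connect(options) {
  var opts = withDefaults(options || {}, { host: "localhost", port: 80, retries: 3 });
  return opts.host + ":" + opts.port + " (retries: " + opts.retries + ")";
}
```

Calling `connect({ port: 8080 })` leaves `host` and `retries` at their defaults. Merging by key also avoids a pitfall of the `||` idiom, which would replace a deliberate `retries: 0` with the default because 0 is falsy.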

Most of the sucky parts of JavaScript come from it introducing things that weren't really thought through, eg. the 'this' keyword nonsense is ridiculous, as is the lack of argument-checking by default.
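For readers who haven't been bitten by 'this' yet, a short sketch of the usual trap (`counter` is an illustrative object; `bind` assumes an ES5 environment):

```javascript
var counter = {
  count: 0,
  increment: function () {
    "use strict"; // without this, a detached call would see the global object as `this`
    this.count += 1;
    return this.count;
  }
};

var viaMethod = counter.increment(); // called as a method: `this` is counter

var detached = counter.increment;    // same function, no longer attached to counter
var detachedError = null;
try {
  detached();                        // `this` is undefined here
} catch (e) {
  detachedError = e;
}

var bound = counter.increment.bind(counter); // bind fixes `this` explicitly
var viaBound = bound();
```

The value of `this` depends on how the function is called, not on where it was defined, so passing `counter.increment` around as a callback loses the object unless you bind it or wrap it in a closure.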


You can build your own keyword-argument passing style, exception handling, standard library, and OOP system with assembly as well. Most of us would rather start with a language that got these features right in the first place, rather than defending languages that didn't.


Totally agree.


Really? You're calling Python a messed up programming language because ten years ago it got lexical scoping? I don't know about Ruby, maybe it's had a rougher road, but that's really the only scoping rule change Python got. It got an additional keyword -- nonlocal -- recently to allow you to rebind variables in outer scopes, but otherwise no change has been had since then. If this is a warning sign, what on earth am I supposed to be watching out for? What's this danger it's warning me against?


You're taking this entirely too seriously. There is no such thing as a perfect language. Most flaws are features when looked at in the right light. And obviously the reverse is also true.


I never said Python was perfect. I consider it seriously flawed in many respects. I'm not defending Python as a perfect language.

I'm just saying this "flaw" doesn't make any sense as such. If you want to tell me that with some perspective it's a flaw, show me the perspective, because I'm not seeing it at all. It looks pretty ridiculous on my end.


It's not really assembler with hashtables, at all. It's got closures FFS.

However the hashtables are broken: they say they're empty, but they have 'constructor' as a key when you create them.
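A quick sketch of what that looks like in practice. The `Object.create(null)` workaround at the end assumes an ES5 environment, which may not be available everywhere.

```javascript
var table = {};

// Enumeration finds nothing...
var ownKeys = [];
for (var key in table) {
  ownKeys.push(key);
}

// ...but names inherited from Object.prototype still resolve.
var hasConstructor = "constructor" in table;
var lookedUp = typeof table["constructor"];
var ownConstructor = Object.prototype.hasOwnProperty.call(table, "constructor");

// Assumption: ES5. An object created with a null prototype inherits nothing.
var bare = Object.create(null);
var bareHasConstructor = "constructor" in bare;
```

So a freshly created `{}` enumerates as empty, yet a lookup of "constructor" finds a function. Checking with `hasOwnProperty`, or using a prototype-less object where available, avoids confusing inherited names with real entries.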


This is a decently thought-out article, but it's important to remember that programming language design is as much an art as any other form of programming, if not more so.


Funny, I don't understand the only one that sort of applies to me: "Syntax is the Maya of programming".


http://en.wikipedia.org/wiki/Maya_(illusion)

I'm not sure which part you didn't understand; this is my best guess. I believe the author is trying to say that syntax is a superficial appearance, and not relevant to - indeed, a distraction from - understanding the true beauty or ugliness, utility or disutility, of the concepts embodied in a language.


At first I didn't understand the whole paragraph (I had just woken up :-) ) and then it was just the Maya part. I didn't understand his use of Maya.

Thanks for the explanation. It seems the author and I have got a similar view on syntax (how can we not since we both seem to admire Alan Perlis!).


"Syntax is the Viet Nam of programming languages" - Matthias Felleisen.



