It seems to be a meme on HN that C doesn't reflect hardware, and now you're extending that to assembly. It seems silly to me. C was always an approximation of what happens under the hood, but I think the concepts of pointers, variable sizes and memory layout of structs all represent the machine at some level.
For example, C has pointer provenance, so pointers aren't just addresses. That's why type punning is such a mess. If a language claims to be super close to the hardware, this seems like a very weird thing.
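To make the type punning point concrete, here is a minimal sketch, assuming a 32-bit IEEE 754 `float` (the function names are made up for illustration). On the hardware, a float and an integer of the same width are just the same bits in memory, but in C the portable way to look at them is to copy the bytes rather than cast the pointer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is 32 bits wide. */
_Static_assert(sizeof(uint32_t) == sizeof(float), "float assumed to be 32 bits");

/* Reinterpret the bits of a float as a 32-bit unsigned integer.
   Copying the bytes with memcpy is well-defined; dereferencing
   (uint32_t *)&f instead would break the effective-type (aliasing) rules. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Usage, assuming IEEE 754 binary32: 1.0f is sign 0, exponent 127, mantissa 0. */
void float_bits_example(void)
{
    assert(float_bits(1.0f) == 0x3F800000u);
}
```

Compilers generally turn the `memcpy` into a single register move, so the detour costs nothing at runtime; it only exists to satisfy the language rules.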
C is super close to the hardware in that it works exactly like the abstract C machine, which is a kind of generalization of the common subset of a lot of machines, invented to make the language portable, i.e. straightforward to implement on various architectures. For example, pointer provenance makes it work on machines with segmented storage, where objects can be placed anywhere, so there is no guarantee that addresses beyond a single allocation are expressible or meaningful.
What makes C feel free for programming is that instead of prescribing an implementation paradigm, it exposes a computing model and then lets the programmer write whatever is possible with that (and also what is not -- UB). A lot of higher level abstractions are quickly implemented in C, e.g. inheritance and polymorphism, but you can still use them however you like: you are not limited to pure class inheritance, but can get creative with a vtable, or just use another vtable with the same object. These are things you can't do when the classes are a language construct.
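A rough sketch of the vtable idea, with hypothetical names (`shape`, `shape_vtable`, `RECT_VT`, `TRIANGLE_VT`) invented for this example. Because the vtable pointer is an ordinary struct member, nothing stops you from pointing the same object at a different vtable:

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* The "class": a table of function pointers (hypothetical example type). */
struct shape_vtable {
    double (*area)(const struct shape *self);
};

/* The "object": data plus an explicit, reassignable vtable pointer. */
struct shape {
    const struct shape_vtable *vt;
    double w, h;
};

static double rect_area(const struct shape *s)     { return s->w * s->h; }
static double triangle_area(const struct shape *s) { return 0.5 * s->w * s->h; }

const struct shape_vtable RECT_VT     = { rect_area };
const struct shape_vtable TRIANGLE_VT = { triangle_area };

/* Dynamic dispatch, written out by hand. */
double shape_area(const struct shape *s)
{
    assert(s != NULL && s->vt != NULL);
    return s->vt->area(s);
}
```

With `struct shape s = { &RECT_VT, 4.0, 3.0 };`, `shape_area(&s)` goes through `rect_area`; after `s.vt = &TRIANGLE_VT;` the very same object dispatches to `triangle_area`.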
The C abstract machine is exactly the important part. There is a difference between saying C is close to "the hardware" and C is close to the C abstract machine. The latter like you described has a few concepts that allow for abstraction and thus portability but obviously they lead to situations where the "maps to the hardware" doesn't seem to hold true.
My gripe is only with people acting like the C abstract machine doesn't exist and C is just syntax sugar for a bit of assembly. It's a bit more involved than that.
> The C abstract machine is exactly the important part. ... My gripe is only with people acting like the C abstract machine doesn't exist and C is just syntax sugar for a bit of assembly. It's a bit more involved than that.
Most people have no understanding of an abstract machine, though the very idea of a high-level programming language is based on it.
The C Language Standard itself specifies "Program Execution" only on an "Abstract Machine". Mapping that abstract machine to an ISA/Memory on real hardware is the task of the C compiler. It can do this in any manner as long as the observable behaviour of the program is "as-if" it ran on the abstract machine.
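Relevant quote (this particular wording is from the C++ standard; the C standard states the same "as-if" rule in its own words in section 5.1.2.3, "Program execution"):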
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.
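A small illustration of the "as-if" rule, as a sketch rather than a statement about any particular compiler (`sum_to` is a made-up name). On the abstract machine the function below performs n additions one after another. An optimizer is free to emit something entirely different, for example a closed-form multiply, as long as the returned value is the same:

```c
#include <assert.h>

/* Sum of 1..n. The abstract machine executes the loop step by step;
   the generated code only has to produce the same result. */
unsigned sum_to(unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++)
        total += i + 1;   /* unsigned arithmetic wraps, so no UB here */
    return total;
}

/* Usage: the observable result is fixed, whatever instructions were emitted. */
void sum_to_example(void)
{
    assert(sum_to(10) == 55);
}
```

Comparing the output at different optimization levels is a quick way to see how loosely the source and the instructions can be related.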
> the concepts of pointers, variable sizes and memory layout of structs all represent the machine at some level.
Exactly.
Everything in assembly is still one-to-one in terms of functional/stateful behavior to actual execution. Runtime hardware optimization (pinhole instruction decomposition and reordering, speculative branching, automated caching, etc.) gives a performance boost but does not change the model. Doing so would mean it didn't work!
And C is still very close to the assembly in terms of basic operations, even if a compiler is able to map the same C operations to different instructions (e.g. regular, SIMD, etc.).
You keep making these sorts of comments on various threads which tells me that perhaps you are not clear on the idea of an "Abstract machine" which underpins all high-level languages.
The gap between the "C Abstract Machine" and the actual Hardware underneath is smaller than most other high-level languages. This comment by user haberman puts it very nicely - https://news.ycombinator.com/item?id=46910015
Yes, most languages allow C type code, if that’s what you are trying to do.
Java with only primitive values, arrays, and classes only with fields and static methods.
But that wouldn’t be idiomatic Java, so typically non-explicit abstractions such as polymorphism have code generated for them that you don’t have explicit control over.
C is consistently low level because that’s all you get. Down to direct access to addressing and RAM, the stack frame, etc. as with assembly.
I am puzzled by the claim that C and assembly are not relatively close.
Note here “close” being used in the injective, not bijective, sense. (Scratch out “one-to-one” in my earlier comment.)
And “closer” lowers the bar here too. C isn’t simply decorated assembly. But closer to it.
And “close” being used informally. Arguments for closeness are several and strong (I think), but a bit of a hodgepodge.
In terms of non-bijectivity, for systems programming and performance choices C makes it easy to drop into assembly. But the former are uniquely application specific. And the latter doesn’t make the C version less like the assembly it maps onto - whether the compiler uses the more performant instructions for the context or not.
C’s convenient assembly inlining, and the handoff in both directions being smoothed by an assembly friendly model of the C code around it, are both a part of the “closeness”.
But C is generally “close” to assembly, because its data types emphasize types handled natively, compound types reflect RAM layout, and pointers are explicit addresses to data and code. And those address values can be constructed and operated on just like any other data.
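As a sketch of what "compound types reflect RAM layout" and "addresses can be operated on like data" look like in practice (the `packet` type and `read_length` are hypothetical, made up for this example), a member can be reached with base-plus-displacement arithmetic, much as an assembly programmer would do it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct packet {          /* hypothetical example type */
    uint8_t  tag;
    uint32_t length;     /* the ABI usually inserts padding before this */
    uint16_t checksum;
};

/* Read p->length by doing byte arithmetic on the struct's address. */
uint32_t read_length(const struct packet *p)
{
    assert(p != NULL);
    const unsigned char *base = (const unsigned char *)p;
    return *(const uint32_t *)(base + offsetof(struct packet, length));
}
```

`offsetof(struct packet, length)` is the displacement the compiler itself uses for `p->length`; the exact value depends on the target's alignment rules, so the sketch does not assume a particular number.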
C is objectively closer to assembly than languages with strongly required abstractions. (E.g., Java classes, Lisp S-exp's/cons cells, etc.)
C is more “strictly closer” to assembly than languages with more optional abstractions, even if they also allow for relatively low level coding.
Functions could be viewed as a preferred abstraction, but they have a clear assembly level model accessible directly with pointer arithmetic. And they don’t get in the way of directly encoding custom argument passing schemes, and using goto’s and zero argument functions and tail calls as atomic assembly calls for function and jumps for continuations.
Types are a significant non-assembly abstraction, but are zero-cost in that they don't separate C from assembly, but C from C, as a code safety mechanism that is easily escaped.
It is often easy to add abstractions, via regular C, or macros, but you have to provide an explicit implementation for them in the source or compiled library.
(However, if macros, with their mixed logical, symbol, text and file “data” model, are viewed as C source instead of as a C source construction language, then C becomes a very wacky abstraction language with behavior and rules that look nothing like simple assembly.)
> I am puzzled by the claim that C and assembly are not relatively close.
Did anyone say that?
I think the point is not that it is not "close", but that C is not equivalent to ASM: C has its own abstractions, and there are things you can do in assembly that you can't express in C.
The other low level languages such as C++, Rust, Zig, ... are equally close, since you can express the same things. In some respects they are even closer, since they have built-in features for things modern assembly can do that were not part of the design of C (SIMD, threading, ...).
Modern languages also have extra abstractions that make programming easier without compromising on cost. There are more abstractions than in C, but they are also optional. (Just like you could use goto instead of a while or for loop, but you're happy these abstractions exist. We could also use function pointers in C++ instead of virtual functions, but why would we, if the language provides tools that make programming easier for the same result?)
> The other low level languages such as C++, Rust, Zig, ... are equally close since you can express the same things.
C is not just low level friendly, but low level out of the box. That is the level that all C must be written in, even when creating higher abstractions.
Some higher level languages are also low level friendly, but not low level strict, which is a kind of dual.
I would argue that what makes C lower level, is that it comes in at, or under, the low levels of other languages, and its high bar comes in much lower than the abstractions built into other languages.
Forth is a good candidate for being even lower level.
But if someone else doesn't see things that way, that is fine. It is just one lens for comparing languages.
> C is not just low level friendly, but low level out of the box. That is the level that all C must be written in
No, it is not:
- People use for/while loops, for example, instead of the "low level" 'goto'.
- C compilers reason about pointer aliasing, assume signed operations don't overflow, etc., in order to optimise your code: what you write doesn't translate directly to assembly.
- Some low level operations cannot even be represented in pure C (without using the __asm__ extension as an escape hatch).
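The overflow point in the list above can be sketched like this (function names are hypothetical). A check such as `a + b < a` relies on wraparound, which is undefined for signed integers, so the compiler is allowed to assume it never happens and drop the check. The well-defined version tests the operands before adding:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a + b would not fit in an int. No overflowing arithmetic is
   performed, so there is nothing for the optimizer to assume away. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* Adds only when the result is representable; reports failure otherwise. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if (add_would_overflow(a, b)) return false;
    *out = a + b;
    return true;
}
```

In assembly the same check would be one add followed by a branch on the overflow flag; here it has to be rephrased so that the overflow never actually occurs.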
There is no "C's convenient inline assembly": that is a vendor extension, if available, and its convenience could vary considerably.
The manipulation of memory by C programs is close semantically to the manipulation of memory by assembly programs. Memory accessed through pointers is similarly "external" to both assembly language and C programs.
The evaluation of C program code is not close to assembly language. C programs cannot reflect on themselves portably; features like parameter passing, returning, and allocating local storage during procedure activation, are not in the programming model.
C loses access to detailed machine state. Errors that machine language can catch, like overflows, division by zero and whatnot, are "undefined behavior". An assembly language program can easily add two integers together and then two more integers which include the carry out from the previous addition. Not so in C.
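For the carry example, a minimal sketch of what portable C has to do instead (the `u128` type and `add_u128` are made-up names). There is no carry flag to read, so the carry out of the low word is reconstructed with a comparison:

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit limbs (hypothetical type for this sketch). */
struct u128 { uint64_t lo, hi; };

struct u128 add_u128(struct u128 a, struct u128 b)
{
    struct u128 r;
    r.lo = a.lo + b.lo;             /* unsigned wraparound is well-defined */
    uint64_t carry = r.lo < a.lo;   /* 1 if the low addition wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Usage: UINT64_MAX + 1 carries into the high limb. */
void add_u128_example(void)
{
    struct u128 a = { UINT64_MAX, 0 };
    struct u128 b = { 1, 0 };
    struct u128 r = add_u128(a, b);
    assert(r.lo == 0 && r.hi == 1);
}
```

A good compiler may recognize the pattern and emit an add followed by an add-with-carry, but that is an optimization you hope for, not something the language lets you ask for directly.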
Assembly language instruction set designs (with some exceptions) tend to bend over backwards to preserve the functioning of existing binary programs, by maintaining the illusion that instructions are executed in sequence as if there were no pipelining or speculative execution, or register renaming, etc.
Meanwhile, C compiler vendors bend over backwards to prove that code you wrote 17 years ago was wrong and make it fail. C is full of unspecified evaluation orders and various kinds of undefined behavior in just the basic evaluation model of its syntactic, built-in constructs; and then some more in the use of libraries.
In assembly language, you would never have doubt about the order of evaluation of arguments for a procedure.
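A quick way to see the argument-order issue, as a sketch with made-up names. Both calls to `next_id()` happen, but the standard leaves unspecified which one runs first, so the function may return either -1 or +1 depending on the compiler and target:

```c
#include <assert.h>

static int calls;   /* how many times next_id() has run */

static int next_id(void) { return ++calls; }

/* first - second: the sign reveals which argument was evaluated first. */
static int diff(int first, int second) { return first - second; }

int probe_order(void)
{
    calls = 0;
    /* Unspecified order: either argument's call may run first.
       The two calls do not overlap, so this is not undefined behavior. */
    int r = diff(next_id(), next_id());
    assert(calls == 2);
    return r;
}
```

A result of -1 means the arguments were evaluated left to right, +1 means right to left; portable code cannot rely on either.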
Even when it comes to memory, where C and assembly language agree in many points, there are some subtle ways C can screw you. In assembly language, you would never wonder whether copying a structure from one memory location to another included the alignment padding bytes. In C you also don't have to wonder, if you use memcpy. Oh, but if you use memset to clear some memory which you don't touch afterward and which goes out of scope, the compiler can optimize that away, oops!
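Both memory points can be sketched as follows (the `rec` type and both function names are hypothetical). `memcpy` copies the whole object representation, padding included. For the clearing case, one common workaround is to write through a volatile pointer, which compilers in practice do not elide; C23 also adds `memset_explicit` for this purpose:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rec {     /* hypothetical type; most ABIs pad after 'c' */
    char c;
    int  i;
};

/* Copies every byte of the object, padding included. Plain assignment
   (*dst = *src) is allowed to leave the padding bytes unspecified. */
void copy_rec(struct rec *dst, const struct rec *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Zero a buffer through a volatile pointer so the stores are treated as
   observable and are not removed as dead (a widely used idiom, and an
   assumption about compiler behavior rather than a hard guarantee). */
void secure_clear(void *buf, size_t n)
{
    volatile unsigned char *p = buf;
    while (n--)
        *p++ = 0;
}
```

Calling `secure_clear(key, sizeof key)` just before `key` goes out of scope is the case where a plain `memset` is most likely to be optimized away.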