Rethinking Files (devever.net)
110 points by UkiahSmith on June 2, 2019 | hide | past | favorite | 68 comments


Nice to see folks rethinking files, as they're a scourge on the planet and an antiquated anti-pattern that has been holding back the industry pretty much since its inception. I don't know how anyone could take a look at /etc, for example, and consider it anything but archaic. The adduser command is some 1130 lines long, and all it does is CRUD on files, to name just one example. Then there are countless config files that just have to be edited by hand and happily accept syntax errors and logical errors. No modern system would tolerate this.

The root of the problem with files is that they lack an information model, beyond just a sequence of bytes. They are unopinionated to a fault. All files have structure. Even if that structure is a "non-structure" like "all these files are just a random sequence of meaningless bytes", then that is their structure. But this information isn't present in the system, nor can it be enforced or constrained when that is desirable.
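This can be made concrete in a few lines (a hypothetical illustration in Python; the file name is invented): the filesystem happily persists a "config" that no parser will ever accept, because the byte-sequence model enforces nothing.

```python
import json
import os
import tempfile

# Write a syntactically broken "JSON config"; the filesystem accepts it,
# because to the OS a file is just a byte sequence with no information model.
path = os.path.join(tempfile.mkdtemp(), "app.conf")
with open(path, "w") as f:
    f.write('{"port": 8080,,,')   # invalid JSON, but write() succeeds

# The structure violation only surfaces later, in whatever program parses it.
try:
    with open(path) as f:
        json.load(f)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

print(parsed_ok)  # False: nothing at write time caught the error
```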

To me, the obvious alternative is the database, aka "everything is a row". We have used the database (relational or otherwise, but mostly relational) to successfully model many, many domains, and bring coherence and clarity to them. The cool thing about the relational database is that it's based on an underlying relational algebra. The syntax of data in an RDBMS is really just one manifestation of a deeper layer of structure that is syntax-free, and these abstract structures can be (and are) manifested in multiple coexisting syntaxes.
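As a minimal sketch of what "everything is a row" buys you, here is the same kind of config modeled with Python's built-in sqlite3 (the table and constraint are invented for illustration): logical errors are rejected at write time rather than discovered later.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE config (
        key  TEXT PRIMARY KEY,
        port INTEGER NOT NULL CHECK (port BETWEEN 1 AND 65535)
    )
""")

# Valid data is accepted...
db.execute("INSERT INTO config VALUES ('web', 8080)")

# ...but a logical error is rejected at the moment of the write.
try:
    db.execute("INSERT INTO config VALUES ('bad', 99999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True: the information model is enforced by the system
```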

I'm exploring this pattern ("datafication", headshake) with Aquameta (http://aquameta.org/) and I've written a lot more about why file-centric is holding us back (http://blog.aquameta.com/intro-chpater2-filesystem/). Boot to PostgreSQL! :)


It's something that felt right in COBOL (and COBOL rarely feels right in this day and age). The file IO is record-based at the core, so opening a file is basically a crude (npi) SQL statement. It makes a lot of tiny things simpler IMO.


Files in /etc can be modified by any text editor. It sounds like you are advocating for a system similar to the Windows registry. It can easily corrupt and can't be fixed by live CDs or other operating systems, even ones that have the filesystem driver. It would be a massive step backwards.


I agree the system you describe would be a massive step backwards. :)

No, I'm saying the OS needs an information model as a first-class citizen. But since you brought up data corruption: This hypothetical OS could also benefit from having transaction support up in application space -- to avoid data corruption -- something most "modern" programs don't even have, even though most file systems do.
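The transactional idiom most programs skip can be sketched in a few lines: the classic write-to-temp-then-rename pattern (Python; POSIX rename atomicity assumed), which gives readers all-or-nothing visibility of an update.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Update a file so readers see either the old or the new contents,
    never a torn mixture: write a sibling temp file, then rename over
    the target (os.replace is atomic on POSIX, even if the target exists)."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make sure the bytes hit the disk first
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Usage would be e.g. `atomic_write("settings.conf", b"port=8080")`; if the process dies mid-write, the old file is still intact.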

To be fair, I do think the database needs a way to edit the contents of a field using your favorite text editor. But we've got a pgfs plugin that uses FUSE to make the database accessible as a filesystem as well.


> People like Unix's “everything is a file” approach because what it really means is “everything is exposed to the same nexus”. It means you need only ssh to a system and you have all the power to reshape all aspects of that system with a single interface, the command line, using a common set of highly composable tools

But at least in Linux there are a ton of files that are not exposed to the "same nexus", i.e. the filesystem. The most common example is network sockets. They are files, but do not exist anywhere in the filesystem. In Linux, a file is more of an object handle.

https://yarchive.net/comp/linux/everything_is_file.html

http://events17.linuxfoundation.org/sites/events/files/slide...
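The socket point is easy to demonstrate: the kernel treats a socket as a file in the descriptor sense, yet it has no name anywhere in the tree (a quick Python sketch).

```python
import os
import socket
import stat

# A socket is a "file" in the descriptor sense: it has an fd that works
# with generic descriptor calls like os.fstat()...
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()
is_sock = stat.S_ISSOCK(os.fstat(fd).st_mode)
print(is_sock)  # True: the kernel knows this descriptor is a socket

# ...but there is no path anywhere in the filesystem hierarchy that you
# could open() to reach this object; it exists only as a handle.
s.close()
```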


Whenever somebody talks about a "Unix philosophy" they are almost certainly alluding to some revisionist history or whitewashing of the truth. The only real "philosophy" of Unix that is grounded in reality is the "worse is better" idea.

One generation throws some shit at the wall, some of it sticks. Time passes and a few elders talk up their achievements in grandiose terms. With time, people begin to forget the truth and view the artifacts from the past as products of pure enlightenment. Shit-throwers are retconned into being master architects. The just-world fallacy kicks in and people mistake 'passing the test of time' for proof of quality, then they spin legends to fill that narrative.


See PipesFS: Fast Linux I/O in the Unix Tradition [1]

"PipesFS replaces socket-specific data access calls (like send and recvmsg) with basic reading from and writing to pipes, whereby the location of the pipe identifies the socket. The actual path for sockets is long, consisting of 8 filters for the reception end, but users are easily shielded from this complexity through symbolic links"

[1] https://ts.data61.csiro.au/publications/papers/deBruijn_Bos_...


Yes, people call it the "Unix philosophy" not the "Linux philosophy." And the Plan 9 folks (nee Unix folks) also thought everything in Unix should be a file and not everything was.

[edited for clarity]


It is not the "Linux philosophy"; the socket API originated in BSD, and BSD is considered Unix. Based on my experience, no one in the plan9 community would call it Unix.


The socket API was bolted onto Unix as a port from TOPS-20. It might have become an accepted part of Unix, but so was XTI, which in retrospect was probably better.


Instead of calling everything a "file", you might as well call everything an "object" then. It also seems more correct.


Calling it an object would mean you have different methods of accessing it. But files have a fixed set of methods/properties.


Not sure what you mean, because e.g. the ioctl() call has many functions which are not sensible for regular files.

In any case, the article has 17 occurrences for "file(s)" versus 28 for "object(s)", so the author seems to agree with me :)


Not every file responds to ioctl(), and that's not what people mean when they say "unix philosophy". Yes, in a sense ioctl() does model the object nature, but ioctls aren't discoverable and are rather ad hoc; the article points to PowerShell and its capability for reflection as an example of how OOP-based resource access would look.


What's the difference?

Perhaps a file never meant just something you have in a directory, but rather "a stream of bytes", and the bigger thing was always unifying away the difference of whether those bytes are read from a magnetic tape or received wirelessly over the internet.


A file is just an object with a name in a namespace. Without a name, it's not really very much of a file.


In Unix, some files even don't have names. See for instance the pipe() call.
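A quick illustration of such a nameless file (Python): pipe() hands back two connected descriptors that appear in no directory anywhere.

```python
import os
import stat

# pipe() creates two connected descriptors with no name in any directory:
# perfectly good "files" that exist nowhere in the tree.
r, w = os.pipe()
is_fifo = stat.S_ISFIFO(os.fstat(r).st_mode)

os.write(w, b"hello")
data = os.read(r, 5)
os.close(r)
os.close(w)

print(is_fifo, data)  # True b'hello'
```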


I found the URL addressing scheme of Redox to be fascinating, if perhaps slightly less user-friendly compared to files and file paths.

https://doc.redox-os.org/book/design/url/urls.html


Personally, Redox's use of URLs seemed like really bad design to me. It doesn't get simpler than the Unix path syntax.

Having a scheme:// makes sense for URLs because you don't otherwise have any contextual information indicating how to access a resource. But this isn't the case for something like a virtual filesystem, where the total set of filesystems mounted under it - and their types - are all known to the system. There's no need for disk://foo when you can just have /dev/disk/foo.
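The claim that the system already has the context can be sketched as a longest-prefix-match dispatcher, which is roughly what a VFS does at mount boundaries (the mount table and handler names here are invented for illustration):

```python
# Hypothetical mount table: path prefix -> filesystem handler name.
mounts = {
    "/":         "rootfs",
    "/dev":      "devtmpfs",
    "/dev/disk": "diskfs",
}

def resolve(path: str) -> str:
    """Pick the handler with the longest matching mount prefix, the way a
    VFS dispatches lookups -- no scheme:// needed, the table is the context."""
    best = max(
        (p for p in mounts if path == p or path.startswith(p.rstrip("/") + "/")),
        key=len,
    )
    return mounts[best]

print(resolve("/dev/disk/foo"))  # diskfs
print(resolve("/dev/null"))      # devtmpfs
print(resolve("/etc/passwd"))    # rootfs
```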


That's true when the namespace covers objects that are very similar to access, ideally identical.

If that's not the case, I have found the scheme to be helpful to indicate what's going on.


On *nix, you can always figure out what type of filesystem is mounted at a given prefix by typing `mount`.

What the use of schemes does is make things needlessly inflexible, and embeds a dependency on the name of a filesystem provider inside consumers of that filesystem. It's akin to a Unix where filesystems can only be mounted in top-level directories /mnt, but not /mnt/foo, etc.; I don't see the appeal.


I prefer to use `df -T /path/to/mount`, personally.


> I prefer to use `df -T /path/to/mount`, personally.

Why?


Lets me specify a single file system; I don't think `mount` does that (unless I'm blind; possible).


Mount lets you do this as well.

  mount -t type device destination_dir

Unless I am missing something in your use case.


> schemes make things needlessly inflexible

Not so. See file: :-)


I recommend opening up developer tools and adding this before reading this article:

  body {
      width: 40em;
      margin: 0 auto;
      font-size: 1.4em;
      line-height: 1.4em;
  }


Or use the Reader View/Reader mode in Firefox/Safari.


What if the interaction were more like OOP - the File class wouldn't necessarily make sense as the top parent.

Would be kind of interesting to call methods on objects rather than read/write files, but it's not immediately obvious to me that that really gains anything over the status quo.

And now that I've written that, I wonder is that what powershell's verb-object does anyway? I've never come close to proficient enough (nor wanted to!) to know.


I have had the (dis)pleasure of learning PowerShell pretty well in the last few months and from that experience I believe forcing different things into the same model doesn't really work. When I read something like "New-Item" I have to take another look and see what it really deals with. Maybe some people think it's "elegant" but I don't agree. It's not like everything that's returned by "New-Item" behaves the same way. .NET is doing it better with static methods like "File.xxxxx" and "Directory.xxxx". You know immediately what you are dealing with.


PowerShell's problem is it's so goddamn wordy and poorly documented. It reminds me of AppleScript, but far worse.


"It reminds me of AppleScript, but far worse."

I wouldn't go that far :-). But you are right. They did good work with the cmdlets but they put a terrible language on top of them.

It would have been much better if they had put an interpreted version of C# (maybe with a few extensions) on top of it.


Agreed, I think F# which already has an interpreter would have been a great and obvious choice. Terse Syntax, amazing type inference, and a great existing community.


This may be interesting. Scripting languages generally aren't typed, but maybe a language with very good type inference would feel like it's untyped.


You mostly don't need types in F#, although types often add readability, especially for function signatures (IMHO).

F# will even compile

    let add a b = a + b
    add 1 2;;
as

    val add : a:int -> b:int -> int
    val it : int = 3
or

    let add a b = a + b
    add 1ul 2ul;;
as

    val add : a:uint32 -> b:uint32 -> uint32
    val it : uint32 = 3u
So it will even infer the type from the first usage of the function.


To each their own. With an auto-completing IDE and aliases I've never found wordiness to be an issue. Certainly helps when you or others need to read it again. Don't find the documentation lacking either. Even if you do, anything you can do or have access to in C# can be replicated in Powershell.


Some things should not require an IDE? Powershell is a shell so it should be easy to type, no?


The shell has auto complete as well and most of the commonly used cmdlets have aliases for shell usage if you don't want to rely on auto complete. You can also create aliases if you so desire.

Edit: The side benefit of the verbosity is the discoverability of less used commands. Is it groupadd or addgroup? No question in Powershell, it would be Add-Group because of the Verb-Noun standard. Bash has all sorts of inconsistencies that require look-up if you don't use those commands often.


I think Powershell made a big mistake going with verb first. I know I want to do something with group but when I type group there is no auto complete. Now I type “add” and I get dozens or hundreds of suggestions which makes auto complete almost useless. They should have gone with target object first so when I type in group I get all the relevant auto complete suggestions.


I'd suggest the following.

  get-command add*group*
Or for brevity

  gcm add*group*
For reference, the Bash version is this:

  compgen -c | grep group
Edit:

Let's get crazier. You want custom tab completion to focus on the command's noun plus Bash style completion.

  $Function:OriginalTabExpansion = $Function:TabExpansion

  function TabExpansion($line, $lastWord) {
      if ($line -match '^!(.*)') {
          $lastWord = $lastWord.TrimStart('!')
          Get-Command -Noun *$lastWord*
      } else {
          OriginalTabExpansion $line $lastWord
      }
  }
  Set-PSReadLineKeyHandler -Chord Tab -Function MenuComplete

  !group<TAB>


Oh man. Somebody who really knows PowerShell. That's really rare :-)


tell system to tell browser to tell tab with name 'HN' to tell comment with name 'bayareanative' to click button with name 'reply' and ...

Oh hell.


This is very much the model of Win32. Everything that exists at least partially in the kernel is exposed as a handle. There are a lot of different types of handles, but all of them can have CloseHandle called on them (maybe there are exceptions? I don't know). Most of them can be used with calls like WaitForSingleObject or WaitForMultipleObjects. So in a way, Windows implements exactly that model.


File descriptors and Windows handles are pretty much the same concept. Just that the former is an int, and the latter is packed into a pointer typedef. Your WaitForMultipleObjects example is not radically different from the poll family of syscalls. (Although there aren't named mutex file descriptors and the like... But Linux has some concepts that remind me of Win32 handle types, such as eventfd(2)).


This is pretty much what I mean. Essentially I'm proposing that the "base class" shouldn't support anything other than close(). You could have objects which don't support read()/write(), but custom verbs with different semantics appropriate to their type. Tape drives (and only tape drives) could support a rewind(), etc.
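A toy sketch of that design, with invented class and verb names: the only universal operation is close(), and each type contributes its own verbs with their own semantics.

```python
# Minimal sketch: the "base class" supports only close(); read()/write()
# are not universal, and type-specific verbs live on the types that need them.
class Handle:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class TapeDrive(Handle):
    def __init__(self):
        super().__init__()
        self.position = 1000

    def rewind(self):          # a verb only tape drives have
        self.position = 0

class NetSocket(Handle):
    def shutdown(self):        # a verb only sockets have
        pass

tape = TapeDrive()
tape.rewind()
tape.close()
print(tape.position, tape.closed)  # 0 True
```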


I think it might be more helpful to think of what semantics could be supported by the everything-is-a-file operations. A tape drive rewind() is just a specialized version of seek(), which any random access object would need to support.

The file metaphor is soooo flexible, so it’s hard for me to think of examples where it breaks down. So, what are some good examples where the file metaphor breaks down? Maybe that’s helpful?


The trouble is, a tape drive can support seek(). But can it support it performantly for all arguments? seek(0, SEEK_SET) is easy. seek(1024, SEEK_CUR) is easy - just read forward a little. But seeking to some arbitrary fixed offset? As far as I'm aware, tape drives are designed to use filemarks for 'searching', not precise offsets.

Of course seek(n, SEEK_SET) could be implemented anyway, in a very un-performant (and tape-wearing manner): by rewinding, and then reading forward n bytes. There's a question of whether the utility of this is desirable when weighed against how surprising it may be to people who don't realise just how bad the performance will be, especially when a tape drive which only supports seek(0, SEEK_SET) can easily have this behaviour emulated on top in userspace by seek(0, SEEK_SET) followed by dummy reads, if you really want it.
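That emulation strategy can be sketched in userspace (a hypothetical wrapper around a forward-only device, simulated here with an in-memory buffer): correct, but O(n) on every seek.

```python
import io

class TapeSeeker:
    """Emulate seek(n, SEEK_SET) on a device that only supports
    rewind-to-zero plus forward reads. Functionally correct, but every
    seek costs a full rewind and n bytes of dummy reads."""

    def __init__(self, tape: io.BytesIO):
        self.tape = tape

    def seek_set(self, n: int) -> None:
        self.tape.seek(0)                    # the only cheap operation
        remaining = n
        while remaining > 0:                 # burn forward with dummy reads
            chunk = self.tape.read(min(4096, remaining))
            if not chunk:
                break
            remaining -= len(chunk)

    def read(self, n: int) -> bytes:
        return self.tape.read(n)

t = TapeSeeker(io.BytesIO(b"abcdefghij"))
t.seek_set(4)
print(t.read(3))  # b'efg'
```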

read() and write() and seek() prove remarkably versatile, but the niggles come with the fact that different types of file/device on POSIX can have subtly different gotchas with these verbs which, on the face of it, appear to be the same verb. Essentially, I might argue they're not the same verb at all - they just seem similar.

For example, read() from a UDP socket and read() from a normal file have extremely different semantics. If you read() with a 64 byte buffer from a UDP socket, the message is truncated and the remainder of it is lost. This is a very different semantic to reading from a file, where you can read in whatever chunks you like.
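This truncation behavior is easy to demonstrate with two loopback UDP sockets (Python sketch; assumes the loopback interface is available):

```python
import socket

# Two UDP sockets on the loopback interface.
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0))
b.bind(("127.0.0.1", 0))

# Send a 100-byte datagram.
b.sendto(b"x" * 100, a.getsockname())

# A 64-byte read returns 64 bytes; the other 36 bytes of the datagram
# are silently discarded, unlike a read from a regular file.
first = a.recv(64)
print(len(first))  # 64

a.settimeout(0.2)
try:
    rest = a.recv(64)   # nothing left: the tail of the datagram is gone
except socket.timeout:
    rest = b""
print(len(rest))  # 0

a.close()
b.close()
```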

I wrote the article upon reflection of precisely this attempt to force everything into the straightjacket of everything-is-a-file that we've had for decades with UNIX. How much code correctly deals with short write()s? "Everything can be expressed as an object on which you can perform read()/write()" can only be true if you ignore the details of a verb's precise semantics, but the precise semantics are important. I think it's fair to argue that write() isn't one verb at this point, but an overloaded verb referring to a set of verbs. Which verb in that set you're invoking is dependent on the type of "file".
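For reference, correctly handling short writes means looping over the remaining buffer until it is drained; a minimal sketch over a raw descriptor:

```python
import os

def write_all(fd: int, data: bytes) -> None:
    """os.write() may accept fewer bytes than asked for; loop until done."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)   # may be a short write
        view = view[n:]

# Demonstrate on a pipe; any byte-stream descriptor behaves the same way.
r, w = os.pipe()
write_all(w, b"all or nothing")
os.close(w)
out = os.read(r, 100)
os.close(r)
print(out)  # b'all or nothing'
```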


The problem, IMO, is less that the file metaphor is not capable of expressing some things and rather that it expresses some things very poorly.

For example, a GPU device doesn't behave like a file. You cannot effectively control a GPU via read/write; read/write are excessively slow for anything you'd want to do, including a simple VGA buffer. Almost all operations on a GPU in Linux involve mmap'ing it and then applying ioctl() liberally.

You can do almost everything using the file metaphor, Plan 9 proves it. But it's at times going to be a very poor metaphor that is better at working at all than working well.


This would be interesting for things like robotics and automation; think of a home() command for a mill center, or a lock-all-doors() command for a house, or bedroom-lights-on(), or make-cappuccino(). I could go on.

Coffee on the command line sounds interesting.

This could also make GUIs far easier to spin up. An operation on the computer could easily spin up a 'new' GUI that depends on system / operation state, using GUI objects available to the entire operating system.


Surely this is just a layer of indirection? It moves the onus for providing implementations of these verbs from software developers to the maintainer of the file type.

Personally I'd rather just stick to the existing analogue of verbs, which we call executables.


But do you really need the open()/close() semantics for everything? What if we wanted to support a hypothetical filesystem with stateless semantics (e.g. a distributed filesystem)? Then we wouldn't even need close().


You would probably still want a close() to release local resources. A stateless object itself might just NOP the close(), but the local handler might still need to release a reference or two.

I think open/close() are probably the minimal interface.


That's kind of the point of stores:

https://github.com/mpw/MPWFoundation/blob/master/Documentati...

and Polymorphic Identifiers:

http://objective.st/URIs/

Hierarchical paths were a good idea, let's use them. Objects were also a good idea, let's use those. A small set of verbs (GET, PUT, POST, DELETE) was also a good idea. Let's combine these!

Abstract from:

   Path    + File       + POSIX I/O
   URI     + Resource   + REST Verbs
Get:

1. Polymorphic Identifiers, which subsume paths, URIs, variables, dictionary keys etc.

2. Stores, which resolve URIs, subsume filesystems, HTTP servers, dictionaries, etc.

3. A small protocol that essentially mirrors REST verbs in-process

See also: In-process REST, https://link.springer.com/chapter/10.1007/978-1-4614-9299-3_...
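A rough in-process sketch of the idea in Python (all names invented; the real design is in the linked documents): scheme-prefixed identifiers are resolved by stores, and a small fixed verb set covers access uniformly.

```python
class DictStore:
    """A store resolves identifiers within its scheme via a tiny
    REST-like protocol: here just get/put."""
    def __init__(self):
        self.data = {}

    def get(self, path):
        return self.data.get(path)

    def put(self, path, value):
        self.data[path] = value

# Scheme -> store: the polymorphic identifier's prefix picks the store.
stores = {"var": DictStore(), "tmp": DictStore()}

def resolve(identifier):
    scheme, path = identifier.split(":", 1)
    return stores[scheme], path

def GET(identifier):
    store, path = resolve(identifier)
    return store.get(path)

def PUT(identifier, value):
    store, path = resolve(identifier)
    store.put(path, value)

PUT("var:myhome/doorbell/ringing", True)
print(GET("var:myhome/doorbell/ringing"))  # True
```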


The whole point of my article was that constraining things to a small, limited set of verbs is actually a bad idea.

Theoretically you can just make up your own verbs for HTTP and use those. In practice people stick to the common ones because they're well supported. This leads to people massaging a problem domain into the straightjacket of GET/PUT/POST/PATCH/DELETE, regardless of how well it fundamentally fits that set of verbs. (I'm also convinced nobody actually knows what "REST" means, but that's another rant for another time.)


The WWW kinda shows that, empirically, having a small set of verbs is a pretty darn good idea. CORBA and most call/return programming have shown that having to invent new verbs all the time ( getX(), getY(), getTopLeftCorner() ...) gets old really fast.

The other thing a common set of verbs gets you is generic endpoints and, even more interesting, generic intermediaries.

My approach is to let resource-y things be resource-y, and let verb-y things be verb-y. After all, language has nouns and adjectives and verbs, maybe there is a good reason for this diversity?

So

     var:myhome/doorbell ring.
(Although I am highly skeptical of IoT, so somewhat wary of such an example).

If you wanted to model this in a more resource-y way, you could do:

     var:/myhome/doorbell/ringing := true.
     // delay
     var:/myhome/doorbell/ringing := false.
That would also get you the ability to read the status of the doorbell.

> I'm also convinced nobody actually knows what "REST" means

Considering REST is the basis of the WWW, the largest and arguably most successful information system of all time, I would say (a) most people understand it "well enough" to work with it and (b) if we don't understand it, it behooves us to make an effort to do so.

Because it's not like there haven't been other attempts to build something like the WWW, they just failed miserably.


What other verbs could be needed?


In the context of http, we have HEAD, but we don't have something like DATAHEAD or METADATA, which would ask the server to provide you what _it_ thinks is the relevant metadata for the endpoint in question. You can fake this by streaming data, or by using something like `Content-Range` _if_ you already know what _you_ think the incoming data is, but this means that the consumer has to already know what it is expecting, which kind of defeats one of the purposes of metadata. For example it would be great to be able to send a METADATA request to a url for a zip file and have the server send you back just the central directory (why did they put the central directory at the _end_ of the file?? -> edit: answer: because it was created at a time when you wanted to write the zipped data to disk in a stream, so you wouldn't actually know what you had until the end; optimized for writing, duh).


Let's suppose that I have an HTTP endpoint for a doorbell (for some reason). The doorbell resource is represented as http://example.com/doorbell.

There's no RING HTTP method, and I could invent one, but heaven knows if various HTTP middleware would be happy with that. In practice, people do something like

    POST http://example.com/doorbell/ring
The problem with this is that you now have a hierarchy of verbs; you have first class verbs (GET, PUT, POST, PATCH, DELETE), and second class verbs which have to be represented as distinct resources. This feels like a hack to me.
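The two-tier hierarchy is easy to make concrete: with a fixed verb set, any domain verb has to be smuggled into the path (a toy dispatcher with hypothetical routes):

```python
FIRST_CLASS = {"GET", "PUT", "POST", "PATCH", "DELETE"}

def dispatch(method, path, handlers):
    """Only the fixed verb set passes; everything else is rejected the
    way uncooperative middleware might reject a custom HTTP method."""
    if method not in FIRST_CLASS:
        raise ValueError(f"middleware rejects method {method!r}")
    return handlers[(method, path)]()

handlers = {
    # The domain verb "ring" has no slot among the methods, so it is
    # encoded as a resource name -- the "second-class verb" hack.
    ("POST", "/doorbell/ring"): lambda: "ding-dong",
}

print(dispatch("POST", "/doorbell/ring", handlers))  # ding-dong
# dispatch("RING", "/doorbell", handlers) would raise ValueError.
```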


Ah okay.

But isn't this basically what RPC vs REST boils down to?

As far as I know people tried the RPC way for years then gave up on it and started doing REST. Seemingly because inventing a whole bunch of methods was inherently flawed.


You're not solving anything; at best you're getting maybe one extra level of abstraction by shifting potential complexity into the application, which may or may not care about an internal file "schema" and thus may have no code for it. That shifts to concrete complexity in the system: your application doesn't use file schemas, but some applications might, so everyone gets a schema field and a bunch of extra code and complexity to support it.

From a security/reliability standpoint it sounds like a nightmare, combining the worst of things like NTFS alternate data streams and shared library loading into one.


See my earlier comment, https://news.ycombinator.com/item?id=14542595 .

Lotus Agenda/Chandler https://en.wikipedia.org/wiki/Chandler_(software) is another part of this long Grail quest.


Files are too finite and low-level, and they lose generation/parsing knowledge that is implemented N times in N places. OSes should read and write message-oriented streams of records (protobuf, Cap'n Proto, or similar) that are invisible to the user, while tools and code see data and data structures. This solves many problems of unnecessary and repeated effort: parsing log files, log file rotation, proprietary file formats, portability, compatibility and extensibility.

Also, programs should be able to dynamically-serve the contents of "files" as well with an "activation symlink", i.e.,

    /etc/resolv ->* resolvconf
The "everything must be plain text" refrain is obsolete and unnecessary because it's trivial to serialize anything to any format, since it would already be a universally-supported data structure in both tools and code.

It's not 1978 anymore.



Two such projects that can help with managing files until we get another system:

TMSU - lets you tag your files and then access them through a virtual filesystem from any other application

https://tmsu.org -- https://github.com/oniony/TMSU

Tagsistant - Semantic filesystem for Linux, with relation reasoner, autotagging plugins and deduplication

https://www.tagsistant.net -- https://github.com/StrumentiResistenti/Tagsistant


The examples given at the end, where verbs are commands at certain paths, looks a lot like a special file system. All the printers are under `/print` and all the print commands are under `/print`. One could imagine all the database tables being under `/db` and all the commands being under `/db/bin`.


Another site, can we please just add something like this:

<style xmlns="http://www.w3.org/1999/xhtml"> body{ max-width: 600px; font-family: "Calibri"; margin-left: auto; margin-right: auto; }

</style>


I've done some silly things [0] with python's pathlib recently that seem related to the issues discussed here. Given that smalltalk message passing finally clicked for me during the process, I am attracted to an object-like solution for everything (or a file-object-like solution for everything, since the practical performance advantages are undeniable). That said, there are some considerations both for the low-level implementation and for high-level things like affordances for 'file' operations.

In direct response to the suggestion about file paths for verbs: Alan Kay says in one (possibly many) of his talks something along the lines of 'every function should have a url.' One of surely many challenges is how to ensure that the mechanism used to populate file system paths with nested functionality (e.g. /usr/bin/ls/all for `ls -a`) doesn't trigger malicious behavior during service/capability discovery. Being able to more deeply introspect file data and metadata as if the file were a folder could potentially be implemented as a plugin, and I worry about the complexity of requiring a file system to know about the contents of the files that it hosts, or of requiring the files themselves to know how to tell the file system about themselves. Existing file systems adhere to a fairly strict separation of concerns, since who knows what new file format or language will appear, and who knows what file system the file will need to exist on.

Said another way I think that the primary issue with the suggested approach is that it is hard to extend. The file system itself needs to know about the new type of object that it is going to represent, rather than simply acting as an index of paths to all objects. If there is a type of object that is opaque to the current version of the file system that object either has to implement a file-system-specific discovery protocol (which surely would have fun security considerations if it were anything other than a static manifest) or the user has to wait for a new version of the file system that knows what to do with that file type.

Some thoughts from my own work. (partially in the context of OJFord's comment below)

Treating files and urls as objects that have identifiers, metadata, and data portions and where the data portion is treated as a generator is very powerful, but the affordances around the expression local_file.data = remote_file.data make me hesitate. When assignment can trigger a network saturating read operation, or when setter doesn't know anything about how much space is on a disk, etc. then there are significant footguns waiting to be fired and I have already shot myself a couple of times.

The more homogeneous the object interface the better. However, this comes with a major risk. If the underlying systems you are wrapping have different operational semantics (think files system vs database transactions) and there is no way to distinguish between them based solely on the interface (because it is homogeneous) then disaster will strike at some point due to a mismatch. To avoid this everything built on top of the object representation has to be implemented under the assumption of the worst case possible behavior, making it difficult to leverage the features of more advanced systems.

As with the affordances around local.data = remote.data, if I have a socket, a local file, a remote web page that I own, a handle to an led, a handle to a stop light, a database row in a table that has triggers set, the stdin to an open ssh session, and a network ring buffer all represented in the same object system, I have as many meanings for file_object.write('something') as I have types of objects, and the consequences and side effects of calling write are so diverse (from flipping bits on a harddrive to triggering arbitrary code execution) that it is all but guaranteed that something will go horribly wrong.

At the very least there would need to be a distinction between operations where all side effects could be accounted for beforehand (e.g. writing a file of known length to disk has the side effect of reducing free disk space, but that is known before the operation starts), and operations where the consequences will depend on the contents of the message (e.g. DROP TABLES), with perhaps a middle ground for cases with static side effects (e.g. the database trigger) but that would not immediately visible to the caller and that might change from time to time.

The distinction between files and folders is quite annoying (non-homogeneous), especially if you want to require that certain pieces of metadata always 'follow' a file. This is from working with xattrs, which are extremely easy to lose if you aren't careful. Xattrs are a great engineering optimization to make use of dead space in the file system data structure, but they aren't quite the full abstraction one would want. It is also not entirely clear what patterns to use when you have a file that is also a folder -- do you make the metadata the outer file and the data the inner file? Or the other way around? Having the metadata as the outer file means that you can change the metadata without changing the data, but that the metadata will always 'update' when its contents (the data) change. However, when I first thought about using such a system, I had it the other way around, and I suspect a system with that much flexibility would have even more footguns than the current one.

Another issue is the long standing question around what constitutes an atomic operation. Everything is simple if only a single well behaved program is ever going to touch the files, but trying to build a full object-like system on top of existing systems is a recipe for leaky abstraction nightmares.

While I was working on this I came across debates from before I was born. For example hardlinks vs symlinks. There are real practical engineering tradeoffs that I can't even begin to comment on because I don't understand the use cases for hardlinks well enough to say why we didn't just get rid of them entirely.

0. https://github.com/SciCrunch/sparc-curation/blob/master/spar...


>and the consequences and side effects of calling write are so diverse (from flipping bits on a harddrive to triggering arbitrary code execution) that it is all but guaranteed that something will go horribly wrong.

This is why I suggest we really need the ability to dynamically add new verbs. POSIX has one write() but in terms of semantics it's really a whole family of verbs as one overloaded method.

>The file system itself needs to know about the new type of object that it is going to represent, rather than simply acting as an index of paths to all objects.

What I had in mind was that a given filesystem driver (e.g. a userspace FUSE process) would provide the object types it supports. So for example, a "printer FS" process, printerfsd, would provide printer objects under e.g. /printer/. But the vfs, the layer that does prefix matching on mount()ed filesystems, wouldn't need to know about new object types, as it's just a dispatcher.

One shortcoming of this is that you can't mv /printer/foo to another filesystem. That's also a shortcoming of e.g. today's /proc or /sys, but there still seems to be enough that's worthwhile about this approach.



