Ions: Develop cloud applications by deploying to a running Datomic cluster (datomic.com)
169 points by simonpure on June 7, 2018 | 53 comments


Can someone explain the benefits of Datomic Ions? What problems is it supposed to solve and how does it compare to existing solutions?

Also, what is the story regarding local development? (I also have this concern about AWS Lambda; it seems you can only realistically run code in the actual cloud environment)

We are investing heavily in the Clojure ecosystem (but not yet Datomic). It seems Cognitect is strongly headed in a direction which involves some kind of holistic vision about a new stack, but they don't seem to explicitly communicate this vision anywhere.

Edit: I asked the above question also on Clojure Reddit and got an interesting response: https://www.reddit.com/r/Clojure/comments/8p3d5s/datomic_ion...


My take is that one of Datomic's original value props was that it ran in your Java app's process. That plus immutability plus cache-foo allowed you to write your Java app as-if your entire db was not only "in-memory", but "in-your-app's-process-memory." That is, um, "pretty dope".

Then they introduced Datomic Client, which is more of the "normal" db paradigm. Your app includes the client lib, which connects to the running db process "somewhere else". I don't know why they did this (easier to include?), but it meant that the "db-in-your-app-process" value prop went away.

Initially, when they deployed Datomic Cloud (a sort of managed Datomic service deployed in AWS), they only supported the client model.

That is context for your question. The problem Ions solves is how to get that "db-in-your-app-process" value prop while using Datomic Cloud. It seems to me that it effectively reverses the original method. The original method was: add the DB as a lib into your app. The Ions method is: push your app into the db (which is running in a configured-for-you cluster on AWS).

(It also allows you to not have to provision/configure servers to actually run your app code in addition to the db servers, but that seems secondary to me.)

Note that I'm not affiliated with Cognitect, and I've never used Datomic or Clojure in a production system. So have a salt shaker.


> I don't know why they did this (easier to include?)

I wanted to use Datomic, but none of the languages I work in run on the JVM. Datomic Client is really the only way I could. (Though I also considered, briefly, reimplementing the entire "transactor" library for this other platform, since the transactor living "inside" your process really is a selling point.)


What is the diff between “in memory” and “in-your-app's-process-memory”?


An "in memory" DB is just a DB that can answer queries quickly because its data lives in RAM rather than on disk. You still speak to it over a network protocol (even if it's going over the loopback interface), where everything gets serialized into long binary strings and then deserialized again. You can maybe achieve zero-copy access into the packets the DB is sending you if the DB is communicating using something like Cap'n Proto, but that's still two copies of the data: one in your app process, and one in the DB process.

A database whose data is in your process (like, say, BDB, or Erlang's ETS, and not like SQLite†) allows you to hold pointers to data (usually either "tuples" or "objects") that are in the DB, from your process, and treat them as if they were just regular data sitting on the stack/heap. Depending on the DB's persistence architecture, this can mean that you can even mutate the data in object form, and that mutation will be automatically persisted—because what you're mutating is, after all, the DB's copy of the data. Unlike e.g. Firebase, there's no secondary synchronization between "your local" copy of an object and "the database's copy" of the object; your process is holding the database's copy of the object.

Or, if you like, you can think of an "in-process-memory" database like a server in a distributed-RPC stack, ala CORBA or Windows COM. You ask the DB for an object, it gives you an RPC handle to the object. Except, since you and the COM server share memory, this handle doesn't need to do any marshalling or IPC or locking; it's an optimized-path "direct" handle.

Another potential path to enlightenment: if you create a "flat-file database" by just mmap(2)ing a file—and then you write structs into that file-backed memory, and keep pointers to those structs—then you've got an "in-process-memory database."
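That mmap approach can be sketched in a few lines of Python (the file name and record layout are invented for illustration): fixed-size structs written into file-backed memory are both "in your process" and persistent, with no serialization boundary.

```python
import mmap
import os
import struct

# Each "record" is a fixed-size struct: a 4-byte int id and a 4-byte int value.
RECORD = struct.Struct("<ii")

# Create a file big enough for 100 records, then map it into our address space.
path = "flatdb.bin"
with open(path, "wb") as f:
    f.write(b"\x00" * (RECORD.size * 100))

f = open(path, "r+b")
mem = mmap.mmap(f.fileno(), 0)

# "Insert" record 7 by writing the struct directly into file-backed memory.
RECORD.pack_into(mem, 7 * RECORD.size, 7, 1234)

# "Query" it back: no wire protocol, no second copy in another process.
rec_id, value = RECORD.unpack_from(mem, 7 * RECORD.size)
print(rec_id, value)  # -> 7 1234

mem.flush()  # persistence: push the dirty pages back to the file
mem.close()
f.close()
os.remove(path)
```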

† SQLite the library resides in your process's memory, and it can even allocate tables that reside only in-memory (which is also "your process's memory") but SQLite still communicates with your process through the SQL protocol, as if it was on the other side of a socket to your process, with no way to "get at" SQLite's internal representation of its data.
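Python's sqlite3 module makes the footnote concrete: even a purely in-memory SQLite database is only reachable through SQL strings, and rows come back as fresh copies rather than pointers into SQLite's internal pages.

```python
import sqlite3

# An in-memory database: in "your process's memory", but still behind SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'ion')")

# The only way in or out is a SQL string; the row is a new Python tuple
# copied out of SQLite's internal representation, not a handle into it.
row = conn.execute("SELECT id, name FROM t").fetchone()
print(row)  # -> (1, 'ion')
conn.close()
```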


Datomic Ions are a cloud-native version of stored procedures (in Datomic terms). I have not fiddled with Ions, but the local dev story with Datomic is fine. One does not need cloud connectivity for local work, and ordinary functions can be run locally in the write path in Datomic. I would guess/assume that these same functions are deployable as-is with Ions, modulo the CI/dependency machinery required for cloud deployments.


I can't speak for Datomic Ions and Clojure specifically, but there's a bunch of tooling options for "running" AWS Lambda locally. The most popular last time I checked was the Serverless Framework (https://serverless.com/)


Why do you say that you can only run lambdas in the environment? A Lambda function just calls a function with an event and a context parameter.

I run and test my Lambda methods locally just like I do my Controller classes - create "unit tests" (not real unit tests, just a method that calls the Lambda handler with the event body I want to use). Your Lambda handler should be skinny and just translate the message to your business methods, just like a controller.
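A minimal Python sketch of that pattern (the handler and business function names are invented, not a real API): keep the handler thin so a plain local test can drive it with a hand-built event dict, no AWS required.

```python
import json

# Business logic knows nothing about Lambda.
def create_order(customer_id, items):
    return {"customer_id": customer_id, "order_total": sum(items)}

# Skinny Lambda handler: translate the event, delegate, wrap the response.
def handler(event, context):
    body = json.loads(event["body"])
    result = create_order(body["customer_id"], body["items"])
    return {"statusCode": 200, "body": json.dumps(result)}

# "Unit test": invoke the handler locally with a fabricated
# API Gateway-style event, exactly as the comment describes.
event = {"body": json.dumps({"customer_id": "c-42", "items": [10, 5, 7]})}
response = handler(event, context=None)
print(response["statusCode"])  # -> 200
```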


Yes, I understand that, but it doesn't account for all the moving parts of the Lambda service (like container re-use, warmup time, deployment steps...) that have no local equivalent AFAIK. But maybe I'm overestimating their importance...


You can replicate some of the live-like environment using SAM[1] and SAM-local[2]. Combined with services like ngrok[3] you can even test remote hooks and such.

[1] https://github.com/awslabs/serverless-application-model [2] https://github.com/awslabs/aws-sam-cli [3] https://ngrok.com/ or http://serveo.net/


Deployment to a production environment is usually different from running locally. I have a CloudFormation script to handle deployments.

Well, I also have a yml file that builds my Lambda package with CodeBuild, since I develop on Windows and Windows-built Python packages with binary dependencies don't work on Linux-based Lambdas.

The CodeBuild step will package the Linux versions of the packages.


Man, am I ever a dummy! I look at the diagram on that page and it might as well be Egyptian hieroglyphs to me. I'd have to study that thing one long time to even understand what problem Ions is solving. I'm sure it's a real problem with an interesting solution, but geez the modern Cloud world has really gotten quite complex.

I feel particularly silly because I've been writing Clojure professionally for many years.


This is about Datomic, not really Clojure. So if you don't have experience with Datomic, understanding Clojure would not really help you.

Seems to me this is a combination of Datomic and AWS Lambda to deploy programs that can use Datomic without all the surrounding deployment problems.


>Ions let you develop applications for the cloud by deploying your code to a running Datomic cluster

I could be totally off base here as I have no real experience with clojure, but this intro sentence threw me off right away.

>Ions let you develop applications for the cloud by deploying your code to a running Datomic cluster

What is an ion? What is a "Datomic cluster"? Are these terms just new jargon for what is essentially a deployment system? If they're a Clojure thing then ok, but a quick search makes me think they're not. I understand software and I understand deployment. I read the rest of the page and I feel like I would have no problem understanding what this really is if they cut down on the jargon.


Datomic is an immutable database, written in Clojure. It's pretty cool, you should check it out.

The company behind Datomic (Cognitect, which also develops the Clojure language) recently started offering the database as a cloud service. This was something Datomic users had wanted for a long time, but one problem with the new service was that one of Datomic's primary features, i.e. having the database reside in memory with your application, was missing.

Ions is a new service that addresses this problem by allowing you to deploy your app to the same VMs that are hosting Datomic in the cloud (a "datomic cluster"). It's basically PaaS for Datomic-backed applications.

I'm a longtime Clojure user and I had some trouble understanding all this as well, so I agree that they could do a better job with the presentation.


> one of Datomic's primary features, i.e. having the database reside in memory with your application

I'm not sure I follow - are you saying it is desired to have an entire database also in memory? You probably don't mean that, but I thought a key feature of databases is to be your durable storage so things don't need to fit in memory. I wouldn't be able to fit the contents of my SQL databases into memory on any machine I own.


Datomic keeps your database in the same memory space as your application code, backed by SSD, backed by EFS, backed by S3/DynamoDB. Each layer of this cache provides a different value, and the whole cache system is automatic and eternally consistent because the database is an immutable value.

You want the illusion that your database is entirely in memory, just as you want the illusion that all your memory is in your L1 cache.
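The memory → SSD → EFS → S3/DynamoDB layering can be illustrated with a generic read-through cache chain in Python (a conceptual sketch, not Datomic's implementation). Because values are immutable, a filled cache entry never needs invalidation:

```python
class CacheLayer:
    """One read-through cache layer; misses fall through to the next layer."""
    def __init__(self, name, backing=None):
        self.name = name
        self.store = {}
        self.backing = backing

    def get(self, key):
        if key in self.store:
            return self.store[key]
        if self.backing is None:
            raise KeyError(key)
        value = self.backing.get(key)
        self.store[key] = value  # immutable values: caching needs no invalidation
        return value

# memory -> ssd -> efs -> s3, mirroring the layers in the comment.
s3 = CacheLayer("s3")
s3.store["segment-1"] = b"datoms..."  # the durable copy of some data segment
memory = CacheLayer("memory", CacheLayer("ssd", CacheLayer("efs", s3)))

value = memory.get("segment-1")    # falls through to s3, populating each layer
assert "segment-1" in memory.store  # subsequent reads hit process memory
```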


> Each layer of this cache provides a different value

What does that mean?

And what kind of actual memory usage should people expect with a "big" database?


Hm, I should have said "value proposition", e.g. SSDs are low latency but can fail, S3 is high latency but "never" fails, etc.

Datomic is designed for data sets larger than memory, and also to do a good job caching working sets that do fit in memory.


> Datomic is designed for data sets larger than memory

Of course. But you didn't answer my question.

Let's say someone has an application with 100GB of data in Datomic.

What kind of memory usage should he expect for the "peer" process?


That depends on what queries the peer runs. In both On-Prem and Cloud, Datomic maintains an in-memory LRU object cache. Also, you can send queries with different working sets to different processes without having to shard data on ingest. For example, some peers might handle queries related to user transactions while other peers handle analytic or batch work.

See also https://docs.datomic.com/cloud/operation/caching.html.
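An LRU object cache like the one described can be sketched with Python's OrderedDict (a generic illustration of the eviction policy, not Datomic's actual cache):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" becomes least recently used
cache.put("c", 3)      # over capacity: evicts "b"
print(cache.get("b"))  # -> None
```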


> You want the illusion that your database is entirely in memory

I don't want illusions. When it comes to databases I want simple and predictable.


Datomic basically does the same thing your database does - caching the working set in memory and running the queries. The difference is that with Datomic the db is running in your Java process rather than on another server. This should get you better scalability (for reads) and make complex queries faster. (Writes still go through a designated server.)

The data is modeled as a collection of (entity, property, value, txn) tuples (with schema on property). They are stored in a log and a few ordered sets (indexes) that are structured somewhat like 2-layer btrees with large pages. These pages and the tail of the log are loaded as needed while running queries. The nature of the data model also gets you point-in-time queries and queries across time (e.g. all previous values of a property).
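A tiny Python illustration of that tuple model (plain data, not Datomic's API, and ignoring retractions): with (entity, property, value, txn) facts, both point-in-time and across-time queries reduce to simple filters.

```python
# Each fact is an (entity, property, value, txn) tuple, as in the comment.
datoms = [
    ("user-1", "email", "old@example.com", 100),
    ("user-1", "email", "new@example.com", 200),
    ("user-1", "name", "Ada", 100),
]

def as_of(datoms, entity, prop, txn):
    """Point-in-time query: the latest value asserted at or before txn."""
    history = [(tx, v) for e, p, v, tx in datoms
               if e == entity and p == prop and tx <= txn]
    return max(history)[1] if history else None

def history(datoms, entity, prop):
    """Across-time query: all values ever asserted, in transaction order."""
    return [v for e, p, v, tx in sorted(datoms, key=lambda d: d[3])
            if e == entity and p == prop]

print(as_of(datoms, "user-1", "email", 150))  # -> old@example.com
print(history(datoms, "user-1", "email"))     # -> ['old@example.com', 'new@example.com']
```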


There is some cleverness going on behind the scenes, so for a large database you wouldn't necessarily have the whole thing in memory at once. But one of the big selling points of Datomic is that you get to treat the entire database as a value, and when you configure your app as a Datomic Peer (the feature I referred to above), it loads the parts of the database you query into memory as they are needed.


I think the idea of databases being on-disk (vs in-memory) is getting quite outdated. You want durability of course, but most datasets can fit in memory these days.

(Which is natural if you consider how disk and RAM speeds have lagged behind processing power; cf. the memes "disk is the new tape" and "RAM is the new disk".)


> Datomic is an immutable database, written in Clojure.

Ah, that helps (I guess my search was a bit _too_ quick). So Ion is really the only jargon there. I'm not even going to delve deeper as I just don't know the ecosystem at all. I was hesitant to even comment in the first place. Thanks.


They should get you to write their website copy. Thank you for the summary/translation.


Ion is a name as far as I know.

Datomic is a database that consists of multiple services, so you can deploy it on one machine or multiple. I called it a cluster because on AWS, of course, the idea is to deploy it on multiple.

Amazon Lambda is a way to deploy code without telling Amazon about the underlying machines/VMs. Ion is a way to deploy Clojure and Datomic without knowing anything about the underlying machines or VMs.


Ions is the name of a service. Datomic is a database. These aren't jargon.

Are you also bewildered by "Postgres cluster" or "MongoDB cluster"? Why isn't it your responsibility to spend the 10 seconds to see that Datomic is the name of another database?



There are always two sides to this... I think it's badly communicated. Someone said (I can't find the quote) that if you can't explain a complex topic in simple words, you haven't understood it thoroughly.


Einstein said this, but he also said: Everything Should Be Made as Simple as Possible, But Not Simpler.


If you know Clojure and AWS I really don’t think it is poorly communicated. There’s a limit to that idiom of simplicity and understanding being correlated.

This was a walk-through of the mechanics. The TL;DR is: if you are comfortable enough in AWS and Clojure, implementing your system in a scalable, simple, and holistic way just got a bunch more accessible.


I've written a clojure app, used aws and lambda functions and skimming this I really wasn't sure wtf they were talking about. Datomic... Something something lambda functions, deployment. I know that they have an existing ami that uses a cloudformation template to let you set up a datomic server on AWS, this is something similar maybe? I think I could get it if I read in detail but this is definitely poorly communicated, it reads like old Microsoft enterprise sales copy.


We need more open source datalog implementations with disk based storage.

I have my hopes now on Mozilla's Mentat [1], but it seems to still have quite some way to go, and also I wouldn't mind something similar implemented in Go too ... At least something simple, allowing to start working with the datalog paradigm for expressing logic and views.

[1] https://github.com/mozilla/mentat


Here's a really early one based on persisting DataScript:

https://github.com/replikativ/datahike


I really love Datomic and it solves so many problems that I usually have when doing application development, especially in banking and financial contexts.

We are already using Clojure and I would love to introduce Datomic as well but being a commercial database makes it pretty difficult.

This new setup sounds cool, but I would love something like that on a Kubernetes stack. In the real world Amazon is a no-go in so many situations.

I must say I often found the setup needlessly complicated. Not much effort has gone into making setup and development easy outside of AWS. There is no official way to set up the whole stack with Docker or Kubernetes, for example.


> We are already using Clojure and I would love to introduce Datomic as well but being a commercial database makes it pretty difficult.

I don't see why paying for Datomic would be a problem, but if the license is as crazy as it used to be, that's a bit scary.

For example, the license at least used to forbid even saying "Datomic" aloud in public.

Publishing benchmarks was also forbidden, which is a big red flag. If Datomic performs well, wouldn't Cognitect want lots of benchmarks out there? If it doesn't, they wouldn't!

> I must say I often found the setup needlessly complicated. [..] No official way to set up of the whole stack with Docker or Kubernetes for example.

You don't find Kubernetes needlessly complicated? Do you actually need Docker or Kubernetes?


>You don't find Kubernetes needlessly complicated? Do you actually need Docker or Kubernetes?

They make life easier in my experience especially if you need something more than a simple self-enclosed application. For what I do we need Apache Spark, Airflow (which in turn uses RabbitMQ and Celery if you want to scale out), possibly JupyterHub (which also needs some way to spin up child servers), and then the application code itself.

I've wired some of it up myself in the past and it's a massive pain in the backside with lots of edge cases (Spark used to have a shitty massive python script for cloud deployments which always was missing something). Deploying it on Kubernetes with Docker on the other hand is rather simple. I basically trade the complexity of Kubernetes/Docker for the complexity of custom deployment/scaling/management code. In my experience, the code quality of the former is going to be much much higher than the code that an internal DevOps team outputs.


I've spent time wiring up a similar setup with Airflow and Celery on Kube, Kafka too, and running Spark clusters, and I echo the same sentiment. I also find Kubernetes easier/simpler compared to DC/OS, perhaps because it's a bit more structured and opinionated.


> I don't see why paying for Datomic would be a problem, but if the license is as crazy as it used to be, that's a bit scary.

Paying is not the only problem with using commercial tools, especially in the context of customers and their own infrastructure.

> You don't find Kubernetes needlessly complicated? Do you actually need Docker or Kubernetes?

Kubernetes is complicated, but what it does is very complex. Yes, I do need it, because if you have many customers and many environments, not using Kubernetes is 10x more complicated. That said, we don't require it, but having it easily available would be very useful when considering using it in the company.

Having Docker to run on the local machine is way preferable to other setups. I have not used a natively installed MySQL or Oracle db for testing in a long time. Given how tricky the setup for Datomic is, this would make things easier.


> For example, the license at least used to forbid even saying "Datomic" aloud in public.

> Publishing benchmarks was also forbidden

Seriously? I find this absurd, but I have no reason to think you aren't being accurate.

That is some corporate paranoia.


From the license https://www.datomic.com/datomic-pro-edition-eula.html

> The Licensee hereby agrees ... it will not: ... (j) publicly display or communicate the results of internal performance testing or other benchmarking or performance evaluation of the Software;


This kind of clause is usually known as a "DeWitt clause"[1], after a University of Wisconsin professor who benchmarked a couple databases, including Oracle (which performed poorly). Oracle/Larry Ellison didn't react well to that and decided to forbid benchmarks.

[1]: https://danluu.com/anon-benchmark/


Database performance depends on so many variables - thread pools, queue sizes, RAM allocations to different purposes, disk layout - the list goes on. There are whole consulting fields that are essentially doing a random walk through database configuration files looking for better performance - ascending the performance gradient if you will.

So in this world, it's sensible to say "benchmarks are uninformative and misleading - your mileage will almost certainly vary".


Benchmarks are easily abused, misused, and misinterpreted. E.g., benchmarks looking at some very specific aspect of query performance being extrapolated to more complex/real-world queries.

Also, trade-offs are rarely mentioned in benchmark numbers, e.g., great write throughput, at the expense of: ?

It's fun to be cynical about stuff like this, but it's rarely as simple as "Ellison didn't react well to that and decided to forbid benchmarks".


> Oracle/Larry Ellison didn't react well to that and decided to forbid benchmarks.

So you kind of have to wonder why Cognitect is going Oracle on us..

The most obvious explanation is that Datomic just doesn't perform well and they don't want people to know.


Anyone who has done serious performance testing on a DB knows that there's a massive gap between initial findings and a well tuned system designed with the help of the database maintainers. I've seen some nasty performance out of Riak, Cassandra, SQL, ElasticSearch etc. But with each of those, once I talked to the DB owners and fully understood the limitations of the system it was possible to make massive gains in performance.

Databases are complex programs, and if I ever wrote one, it would be infuriating for someone to pick it up, assume it was "just like MySQL" and then write a blog post crapping on it because it failed to meet their expectations.


Yes, benchmarks can give a misleading impression of a database's performance.

So what? Somehow PostgreSQL is doing fine despite that.

Which is worse publicity for Cognitect: people publishing bad benchmarks or Cognitect forbidding benchmarks Oracle style?


Well, the absurd one was:

>the license at least used to forbid even saying "Datomic" aloud in public.


It seems Rich (yes and the team at cognitect) might have jumped the shark on this one. (Or this was released too early and the next conf. talk will clear up what the problem is and what it is that this is attempting to solve).

Either way it's rare for Rich not to describe the problem a little more before throwing a solution at it. Hopefully more will come to light in time.

The brief takeaway I see from this, as a non-AWS-deployment-based user, is the ability to go beyond the current limitation of only having access to core library functions in the context of Datomic. (I see that deployment of applications also plays a part in this, but I'm not sure what that solves exactly.)


This looks like a Godsend :)

The only qualm I have is it being only on AWS.

Other than that this enables just what I could use right about now!


DynamoDB is the preferred backend, and thus AWS only.



