Hacker News
Generative Agents: Interactive Simulacra of Human Behavior, Now Open Source (github.com/joonspk-research)
174 points by sirobg on Aug 10, 2023 | 55 comments


It would be better to extend this to use Llama 2 and break away from the OpenAI dependency. Still, I think this is striking in the right direction with LLMs: embed them in a goal-based agent framework with a variety of abilities and available actions mediated by planners, optimizers, constraint systems, etc., using the abductive capabilities of LLMs to provide the abstract semantic glue while delegating other functions to classical AI and algorithms.

I always find it weird that people are obsessed with LLMs being unable to solve quadratic equations, or recite pi to some digit, or play chess. Those aren't the interesting abilities they demonstrate, and we already have techniques that do those tasks as well as you would ever need or want. But we have never had something that can operate in a natural language space and "reason" in that abstract space so effectively. That language alone is sufficient to bring out such amazing capabilities is cool, and mixing models and techniques with LLMs as the glue between them will (IMO) be where they really change things.


Plenty of people I know can't solve quadratic equations or recite more than 4-5 digits of pi, and nobody claims they're not intelligent.

Once again it's a kind of backhanded milestone for AI that in less than a year the goalposts have moved from "can it even once hold a half-coherent conversation" to "if anyone can think of a question it can't answer then it's dumb and not AI".


It is definitely a nice milestone, for researchers.

I do think, though, that one of the rewards of LLMs becoming promising was the discovery that the bar for "what is useful" at any task is higher than what the average person can do. That could be seen as moving the goalposts, but another way of looking at it is: we've finally bothered to install a scoreboard.

Like, if a person can’t solve a quadratic equation, they probably just aren’t useful for manipulating equations. Which is fine, of course, the average person is also not a very good hammer or screwdriver.

We’re all specialized, the bar for usefulness is probably something more like “how would a person who’s taken an intro college course on this do it.”


The goal isn’t to be more or less useful than a person; it’s to expand the capabilities of computers. And computers are already exceptionally good at mathematical tasks. Why do we care whether LLMs are good at things computers already do exceptionally well? The point is that they’re exceptionally good at things computers have, to date, been exceptionally bad at. The rest is weird goalpost-moving noise. I wouldn’t use ChatGPT to solve a quadratic equation; I would use Mathematica, or any number of numeric libraries and runtimes. But I challenge you to get Mathematica to summarize a book, carry on a conversation, or even call an API from natural human language without a rigorous specification a priori. These are holy grails of NLP, a capability that enhances what computing technologies already offer today.


What we need are LLMs that can use the tools for mathematics (and other disciplines) that already exist, and that can write and execute new programs to hopefully solve novel problems, or at least glue the existing tools together with shells.


Well, this project uses LLMs to decide routes to take, then delegates to a pathfinding algorithm. That’s why it’s interesting, IMO. There are an increasing number of things that do that. But I would note the LLM doesn’t have to use the tools: a new tool can use the LLM and be the coordinator between tasks to solve. A lot of what folks are struggling with is making an LLM reliably invoke APIs and such. I suspect doing that lobotomizes the LLM, just as guardrails and safety tuning do. Better, IMO, to manage the LLM in an extrinsic framework like this one.
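The pattern described here, where the LLM picks the destination and a classical algorithm handles the route, can be sketched in a few lines. Everything below is illustrative: `llm_choose_destination` is a hypothetical stand-in for the actual prompt call, and the BFS is a generic grid pathfinder, not this project's code.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Classical BFS pathfinding on a 2D grid; 0 = walkable, 1 = wall."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None  # no route exists

def llm_choose_destination(agent_state, landmarks):
    """Hypothetical stand-in for the LLM call. In the real system this would
    be a prompt like 'Given that Klaus is hungry, where should he go next?'"""
    return landmarks["cafe"]

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
landmarks = {"cafe": (2, 0)}
dest = llm_choose_destination({"hunger": "high"}, landmarks)
print(bfs_path(grid, (0, 0), dest))  # the LLM picks *where*; BFS decides *how*
```

The LLM never touches coordinates or walls; it only answers an abstract question, which keeps movement reliable even when the model isn't.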


Agreed, it's apologetics for human intelligence.


More like a reasonable response to VC funded hysterics.


That too, but this trend has been going on for decades in the field of linguistics. "Language is X," people say. Oh, it turns out that prairie dogs can do X. "Oh, well then language is X-prime!" Turns out parrots can do X-prime. "Oh, umm, maybe language is X-prime-prime! Whatever it is, only humans can do it!"


I find it pretty insane to argue that the human mind isn't somehow exceptional, considering we're currently communicating through devices and massive infrastructure that required centuries of research produced by those minds.


It's not the status of the human mind we're talking about here, it's definitions of 'language' and 'intelligence.' It seems offensive to some people that those labels might be applied to entities other than humans.


It makes sense that animals are smart in similar ways to how we are smart since they're using the same meat we are. Comparing us to computers is a bit offensive in my view.


I don't think anyone's arguing that we're not exceptional at all. After all, those prairie dogs aren't sitting around arguing about whether humans are actually on their level. Really, what we're wondering when discussing AI is whether we're exceptional enough to build machines smarter than ourselves.

Has the Singularity already happened, and was it us?


Side note -- discussion on adding support for local models is here, along with a preliminary fork that adds support: https://github.com/joonspk-research/generative_agents/issues...


Agreed -- now that the repository is open-source, it feels ripe for adapting to a smaller local model. Is it too much dreaming to imagine that if we can perhaps fine-tune a small LLaMa2 model to perform well in this context, we might even be able to get this small enough to run on consumer hardware in an actual "Sims" type game...?


My dwarves in DF would be much more predictable. I’m all for it. So long as the model is safely trained and constraints are in place to prevent Actor A from doing something shockingly bad to Actor B in a way that wasn’t designed.


There is no technical reason why you couldn't unbolt the OpenAI interface and bolt in Llama. Moreover, once you have this, you need only load the model into memory once. Emulating different agents would be handled exclusively through the context windows that LLMs expect: each agent would just have its own evolving context window. Round-robin the submissions; repeat.

what's crazy to think about is what new things will become possible as the context window sizes creep up.
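The round-robin idea above can be sketched roughly as follows, under the assumption of a single locally loaded model. `run_model` here is a placeholder that echoes its input, not a real llama.cpp binding.

```python
def run_model(context):
    """Hypothetical local-model call; echoes for illustration. In practice this
    would hand the assembled context to the one model loaded in memory."""
    return f"(reply to: {context[-1]})"

class Agent:
    def __init__(self, name, persona, max_turns=8):
        self.name = name
        self.context = [persona]       # persona always stays at the front
        self.max_turns = max_turns

    def step(self, observation):
        self.context.append(observation)
        # Trim the rolling window, keeping the persona line pinned.
        self.context = self.context[:1] + self.context[1:][-self.max_turns:]
        reply = run_model(self.context)
        self.context.append(reply)
        return reply

agents = [Agent("Klaus", "You are Klaus, a student."),
          Agent("Abigail", "You are Abigail, an artist.")]
for tick in range(3):                  # round-robin the submissions; repeat
    for agent in agents:
        agent.step(f"tick {tick}: what do you do?")
```

One model, N cheap context buffers: the per-agent memory cost is just text, which is why this scales so much better than loading N models.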


Or as models become trainable (integrated with new information) on the fly, giving a somewhat limitless context window.


I think there are actual technical problems in doing this.


Well, cost-prohibitive on big models, obviously. But even now, things like DreamBooth training extra associations onto sets of new images amount to tuning the model as you go.


Not your point, but I'm pretty sure GPT-4 is more than capable of solving quadratic equations and playing chess, and it probably knows pi to quite a few digits... it also knows many great algorithms for computing pi.


Yeah, but it’s not nearly as good at playing chess as a basic chess algorithm. Why would I use GPT4 for chess when Chessmaster 2000 can beat it 100/100 times?


Wolfram Alpha was doing those things and more way before GPT was a thing. I don't think any of the attributes you listed make intelligent machines interesting.


I would be shocked if GPT4 could reliably solve novel quadratic equations when it struggles with simple arithmetic.


Why would you be shocked? GPT-4 scores in the 89th percentile on SAT Math.


Because SAT math is all multiple choice, which can be reverse engineered. Ask it to multiply numbers too large to be in the memorization set.


I don't pay for GPT4, but here's Claude falling on its face:

Q:

Provide the quadratic roots of the following equation. Work out the answer step by step.

21 - 135 x + 19 x^2 = 0

A:

Okay, here are the step-by-step workings:

1) Start with the quadratic equation: 21 - 135x + 19x^2 = 0

2) Group the x^2 and x terms together: 19x^2 - 135x + 21 = 0

3) Use the quadratic formula to find the roots: x = (-b ± sqrt(b^2 - 4ac)) / 2a Where: a = 19 b = -135 c = 21

4) Substitute the values into the formula: x = (-(-135) ± sqrt((-135)^2 - 4(19)(21))) / 2(19)

5) Simplify: x = (135 ± sqrt(18225 - 1596)) / 38

6) Evaluate the square root: x = (135 ± 135) / 38

7) The two roots are: x = 5 x = 3

Therefore, the roots of the quadratic equation 21 - 135x + 19x^2 = 0 are x = 5 and x = 3.
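For the record, step 6 is where it falls over: sqrt(16629) is about 128.95, not 135 (the model appears to have reused sqrt(18225) = 135 and ignored the subtraction). A few lines of Python confirm the real roots:

```python
import math

# Check Claude's answer directly: 19x^2 - 135x + 21 = 0
a, b, c = 19, -135, 21
disc = b * b - 4 * a * c            # 18225 - 1596 = 16629
r1 = (-b + math.sqrt(disc)) / (2 * a)
r2 = (-b - math.sqrt(disc)) / (2 * a)
print(disc)                          # 16629 -- sqrt is ~128.95, not 135
print(r1, r2)                        # ~6.946 and ~0.159, not 5 and 3
print(a * 5**2 + b * 5 + c)          # plugging in x=5 gives -179, not 0
```

So the model produced a perfectly structured derivation with a confidently wrong arithmetic step in the middle, which is exactly the failure mode upthread.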


We need a Llama 2 shim that runs locally and understands OpenAI calls; then we could run all this software just by changing our hosts file, at no additional complexity for the implementer.
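The core of such a shim is just request/response translation. A minimal sketch of that layer follows; `local_llama` is a placeholder for whatever local binding you actually run, and the response fields mirror the ones clients typically read from OpenAI's chat completions format (an HTTP server around this is left out).

```python
import time
import uuid

def local_llama(prompt):
    """Hypothetical local-model call (llama.cpp, etc.); stubbed here."""
    return "stub completion"

def messages_to_prompt(messages):
    # Naive flattening; real chat templates differ per model.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"

def handle_chat_completion(request):
    """Accept an OpenAI-style chat request, answer in the OpenAI response shape."""
    prompt = messages_to_prompt(request["messages"])
    text = local_llama(prompt)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.get("model", "llama-2-local"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }

resp = handle_chat_completion({
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "hello"}],
})
print(resp["choices"][0]["message"]["content"])
```

Since most client libraries only read `choices[0].message.content`, even this small surface is enough to fool a lot of existing code once it's behind a redirected hostname.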


It feels plausible that within the next few years LLMs will be powering NPCs/enemies in AAA video games. On a technical side, I think we will be gated by what a PS5/Xbox Series X can process locally for this generation. On the gameplay side, I think this could open up a lot. The main loop of summarizing experiences and then feeding that summary into the next prompt can unlock characters/enemies that are much more dynamic. Here are two examples of gameplay elements I think would benefit from it.

Resident Evil 2 Remake was famous for having a strong enemy AI for Mr. X. He stalked the protagonist around a three-story building. Players had to flee from him, and once they had successfully lost him, they had to quietly sneak around the building to avoid detection as he searched for them. I imagine being able to learn from past encounters would make him even more frightening to run from. A stalking AI could take into account hiding places it had found the player in before, tactics they had used to flee, and where the player's next objective is when deciding its plan of action.

There's another genre of games this might find itself useful in: games in which you interact with a village on a social level. Stardew Valley and Majora's Mask come to mind. Having more dynamic interactions with the townsfolk, ones that impact future interactions, could help draw users into the simulated-social aspect of these games.


Procedural generation has been around since the dawn of computer games: Rogue, one of the earliest games, randomly generates a dungeon each time you enter it.

Despite that, procedural generation is still quite rare in shipping games outside of a few niche genres. I think the biggest problem is one of control. A huge part of the process of making and shipping a game is balancing it and testing it to ensure the play experience never goes off the rails.

Even relatively simple procedural generation can make that very difficult. Imagine playing a Zelda-like game where it turns out that 0.0001% of the time, the item you need to make progress is stuck behind a wall where the player can't reach it. Worse, they won't discover this until many hours into the game. That kind of stuff keeps game designers and producers up at night.

They would rather have a less varied, hand-authored gameplay experience, if the result is one they have more control over and understand better.

Bolting an LLM onto your game for NPC dialog sounds really cool until a popular Twitch streamer is playing your game for an audience of millions and some random NPC spouts a racist slur.

What I do think will be very common is game designers using LLMs offline to generate dialog and other assets, and then after the designer has vetted them, putting them into the game as fixed authored content. That kind of procedural generation is used all the time and has been for decades.


> Imagine playing a Zelda-like game where it turns out that 0.0001% of the time, the item you need to make progress is stuck behind a wall where the player can't reach it.

The very concept of "an item you need to make progress" is an artifact of a non-procedurally generated, railroaded plot. And even if your procedurally-generated game includes such things, it's a solvable problem to make this basically never happen.


I think your RE-style AI suggestion can be done in the traditional sense of AI in game development; that is, without machine learning involved at all. It's all possible with goal-oriented action planning and similar techniques. Game developers often find, though, that making the AI too smart can actually make players dislike playing, or that putting in completely fair but very smart AI makes players report that it feels like the AI is cheating or unfair. So it gets dumbed down.
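For readers unfamiliar with the term, goal-oriented action planning is essentially search over world states: actions declare preconditions and effects, and a planner finds a sequence reaching the goal. A toy sketch (the action names are made up for illustration, not from any shipped game):

```python
from collections import deque

# (name, preconditions, effects) -- world-state facts are plain strings here.
ACTIONS = [
    ("draw_weapon",  set(),                 {"armed"}),
    ("find_cover",   set(),                 {"in_cover"}),
    ("shoot_player", {"armed", "in_cover"}, {"player_suppressed"}),
]

def plan(start_facts, goal_facts):
    """Breadth-first search over world states; returns a list of action names."""
    start = frozenset(start_facts)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, actions = queue.popleft()
        if set(goal_facts) <= facts:
            return actions
        for name, pre, eff in ACTIONS:
            if pre <= facts:                  # action applicable in this state
                new = frozenset(facts | eff)  # apply its effects
                if new not in seen:
                    seen.add(new)
                    queue.append((new, actions + [name]))
    return None  # goal unreachable

print(plan(set(), {"player_suppressed"}))
# ['draw_weapon', 'find_cover', 'shoot_player']
```

Because the plan is an explicit, inspectable list of actions, designers can test and tune it, which is exactly the control that bolting an LLM directly onto behaviour gives up.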

The main issue I see with generative AI and games is this: it's all well and good to be able to chat with an NPC as if it were a person with a personality and knowledge of the world. There's an issue of fidelity, though: how can you ensure the AI only reports things that are true about the game world? And then there's the issue of actual behaviour: an LLM might generate speech that has the NPC you're talking to say they're going off to the inn at 5pm to recruit some sellswords, then marching to the den of a fire dragon to defeat it, stopping along the way to collect an ice sword from its guardian maiden who lives in a giant tree nearby. OK, it's easy to generate that text from the player's prompts, but it's very difficult to then actually have the NPC act out the things the LLM has just said it would do, tying in pathfinding, scheduling, animation, group behaviour and so on; and carrying out those actions would probably involve more traditional game AI techniques (again, not machine learning) anyway. Maybe that'll be solved some other way; maybe this repo does something like that already, I didn't check.


I'm reminded about how people still talk about how the original F.E.A.R. has the best AI in video games [1], while the actual behavior in a technical sense is quite simple [2] and is fundamentally designed around the player being time-pressured but still understanding what's going on. If you just plug an LLM into everything you lose that intentionality around players understanding the system without actually needing to be a subject matter expert on whatever the enemy should be 'realistically' doing.

[1]: https://www.rockpapershotgun.com/why-fears-ai-is-still-the-b...

[2]: https://alumni.media.mit.edu/~jorkin/gdc2006_orkin_jeff_fear...


You also have to consider how AI techniques fit into the context of a running game. ML techniques haven't really caught on with existing games because they completely tank performance, in a program that has to make computations every frame, in addition to either completely murdering the player or acting weird/being hard to debug or just not being that compelling.


What I've seen of current experiments with integrating LLMs into NPC conversations (I believe someone even got one integrated with a Skyrim mod, iirc) is that to skirt this issue, they just make HTTP requests to OpenAI or whatever. The game runs smoothly, but there's still a weird lag around NPC responses that makes them feel stilted and unnatural, and it obviously relies on having a stable internet connection at all times.


It will be exciting to see what people can do with something like a fine-tuned local Llama 2 model. I don't think the way GPT is set up is very conducive to game systems, unfortunately.


I agree with this, I think it definitely has potential along these lines. Open world sandbox games that can play out in different variations from a starting cast of characters with different personalities and motivations.


Interesting project. Though the sample conversations in the picture read like what someone thinks a human sounds like in conversation. Or like those snippets you get in a language-learning module:

>[Abigail]: Hey Klaus, mind if I join you for coffee?

>[Klaus]: Not at all, Abigail. How are you?

>[John]: Hey, have you heard anything about the upcoming mayoral election?

>[Tom]: No, not really. Do you know who is running?


Or expository dialog in a movie. For example, most people have context in the real world when you say “any news on the election?”: “the election” is going to have enough local or national significance that “upcoming” is unnecessary (there would be news about the outcome, dates are known in advance, etc.), and “mayoral” might be kind of helpful (I don’t mean city council or county commissioners), but I have never heard anyone use the term outside The Media. This is, however, exactly the kind of dialogue I expect from on-screen characters at the start of a movie or episode.


> For example, most people have context in the real world when you say “any news on the election?”

There was a quote (I thought from Hacker News but can't find it) that went something like this:

"As an engineer, I've always imagined that working in sales is something like this:

- you are on the golf course with a client

- someone says 'hey, did you guys see the game last night?'

- somehow, everyone magically knows what game you are talking about!"


If they are trained on a corpus of text it seems likely they got far more antiquated and/or stilted written dialog in that sample than they did transcribed modern conversations.


I'd be curious how it handles something like:

[A] Not so good. I lost my mom last week.

There are any number of ways robo-Klaus might respond to that in a completely inappropriate way.


Have you tried looking in the sofa? I found some change I had lost last week.


Link to a precomputed simulation: https://reverie.herokuapp.com/arXiv_Demo


The "tip" made me chuckle.

> We've noticed that OpenAI's API can hang when it reaches the hourly rate limit. When this happens, you may need to restart your simulation.

What's the current status about if we're in a simulation?


a cointoss


I would like to see someone try to add something like this on top of Dwarf Fortress, using its character descriptions etc. as a base.


This is what I thought about too! I have no idea how you could connect Dwarf Fortress to an LLM, though.


Reminds me of this small github project I stumbled upon many moons ago: https://hermanya.github.io/sims_kind_of_game/


Is the title of the paper a nod to Simulacra and Simulation? (And hence, also a nod to the Matrix by Wachowski sisters)


Just when the hype around LLMs starts to die down, those with stakes in the game drop newly hyped news. The lack of effectiveness and usefulness of these agents has been shown time and again. Interesting thought experiments, but not very useful.

This is similar to APL, Smalltalk, and Lisp: novel ideas that don't scale.


>The lack of effectiveness and usefulness of these agents has been shown time and over.

Has it ?

https://arxiv.org/abs/2307.07924

https://arxiv.org/abs/2307.02485

The first one doesn't even use GPT-4


Could anyone please explain, like I'm five, what the purpose of this project is and its use case?


To prove that we are in a simulation.


This is awesome! Thank you for sharing!



