Recent Posts

ExVenture Overworld

Posted on 19 Jun 2018

This week I wrapped up the absolute basics of ExVenture's largest new feature, the overworld.

Overworld overview

This is a huge feature, and so far I have barely scratched the surface of it.

Map Editor

First up, I wanted a nice way to edit the overworld in the browser. This was my first big React app. You can see it in action below:

Overworld map editing

I have a lot of ideas on how to make this nicer, but the editor works for now. Things I want to add are a nicer color picker, which will display the color instead of just using the word, and some way to paint-bucket large areas with symbols and colors.

I may also set up a terrain swatch to bundle together a symbol and color (and eventually cell attributes) into a nice picker.

Exit Editor

After the map is created, you need to hook it up with rooms so players can move in and out of the map. This is another React page that does live API updates.

Overworld exit editor

Moving Around

Once exits are created, players can move freely between the room-based zones and the overworld-based zones.

Here is movement on MidMUD, going from the room-based town of Milay to the countryside around it:

Overworld movement

Backing Architecture

Overworld zones are set up as a pool of Sector processes. Each sector handles a portion of the grid of cells in a map, 10x10 right now. The calling or casting process determines which sector to talk to so this should scale well.
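The sector lookup described above can be sketched roughly like this. It is a minimal illustration rather than ExVenture's actual code: the module name, the function names, and the `{:global, ...}` naming scheme are all assumptions, and only the 10x10 sector size comes from the post.

```elixir
defmodule SectorSketch do
  # Cells per sector side; sectors are 10x10 right now.
  @sector_size 10

  # Returns the {column, row} of the sector that owns the given cell.
  # Assumes non-negative integer coordinates with {0, 0} at one corner.
  def sector_for({x, y}) when is_integer(x) and is_integer(y) and x >= 0 and y >= 0 do
    {div(x, @sector_size), div(y, @sector_size)}
  end

  # Hypothetical process name that a caller could pass to GenServer.call/2
  # or GenServer.cast/2 to reach the right sector.
  def sector_name(zone_id, cell) do
    {:global, {:sector, zone_id, sector_for(cell)}}
  end
end
```

Because the caller does this arithmetic itself, there is no central process that every movement has to go through to find its sector.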

Only minor updates to the commands were needed to get the overworld working. Mostly the updates stripped out functionality so that commands do less on the overworld. Commands involving items in particular do not work on the overworld, because I didn't want to implement item storage there quite yet.

TODOs

I've started a GitHub issue to track all the things that are still missing. The big ones right now are:

  • Items in the overworld
  • NPCs in the overworld
  • Say communication should go further than your current cell
  • Seeing other players on the map

I am looking forward to seeing the overworld slowly fill out now that the basics of being able to get to it and move around are done. It is a huge relief to have this part over.

Links for MidMUD & ExVenture

ExVenture Updates for May 2018

Posted on 29 May 2018

The last month of ExVenture had a lot of development, from a Mudlet package to fixing bugs in how ExVenture clusters to more NPC customization.

Links for MidMUD & ExVenture:

Mudlet Package

Mudlet is a client that can be used to connect to MidMUD or any other MUD. They have a way for the server to push a client package down to the client. This lets Mudlet be customized for MidMUD and any ExVenture server.

Through the server push mechanism, you can also auto-update the package. This lets the package stay in lockstep with the server as new things are added and changed.

The package now contains an auto-mapper, stat gauges, and a tabbed chat.

Mudlet package

Telnet Connection Flow

To go together with the Mudlet additions, I worked on the telnet connection flow. Previously you needed to get a one-time password and enter that into your client.

I changed it up to work more like how Netflix authorizes a remote device. The telnet connection gets an ID and registers it as a session waiting to be authorized. You then click the authorize link the connection is giving you. On this page you can authorize the connection and be signed in when you switch back to your client.
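The authorization flow above boils down to a small state machine per connection. The sketch below models it with a plain map. It is not how ExVenture stores pending connections (the post does not say); the module name, function names, and the random ID format are assumptions made for illustration.

```elixir
defmodule TelnetAuthSketch do
  # Pending connections are kept in a map of id => :pending | {:authorized, user_id}.
  def new, do: %{}

  # Called when a telnet connection wants to sign in: hand back an ID that can
  # be embedded in the authorize link shown to the player.
  def register(sessions) do
    id = Base.url_encode64(:crypto.strong_rand_bytes(16), padding: false)
    {id, Map.put(sessions, id, :pending)}
  end

  # Called from the web page once a signed-in user approves the connection.
  # Each ID can only be authorized once.
  def authorize(sessions, id, user_id) do
    case Map.fetch(sessions, id) do
      {:ok, :pending} -> {:ok, Map.put(sessions, id, {:authorized, user_id})}
      _other -> {:error, :not_pending}
    end
  end

  # Checked by the telnet side to find out whether it has been signed in yet.
  def status(sessions, id), do: Map.get(sessions, id, :unknown)
end
```

The password never passes through the telnet connection in this shape; only the random ID does, and it is useless until a signed-in user approves it.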

This is so much more convenient than copy-pasting a password, and I think it's just as secure.

Telnet authorize

Large Scale Metrics

I spent an afternoon a few weeks ago trying to figure out how well ExVenture will scale as more and more data is pushed in and thousands of processes spin up. ExVenture performed wonderfully in my testing. I only tested the number of NPCs performing movement.

I set up roughly 14,000 rooms with 20,000 NPC processes wandering around them. This was on a quad-core MacBook Pro. The 99th percentile for movement was 350 microseconds. I am still slightly shocked by how little Erlang cared about data size.

Movement stabilization
Movement stabilizing over time
Movement metrics

Multi-Node Bugs

As expected, turning ExVenture into a distributed app came with a significant number of bugs. Luckily I had some help on the ExVenture Discord in finding them. I set up Sentry to start recording new exceptions as they come in. This lets me see what is going on and records them better than a log I never look at.

A summary of the bugs I found so far:

  • Channel communication was not spanning the cluster
  • Raft deadlock
  • Keeping players out of the game until it is online
  • Rebalance zones as nodes drop out
  • Current connected players were locking up
  • Lots of functions were not expecting bad data back
  • Strange transient data errors, I think due to processes not rebooting properly
  • A room can crash often enough to tank the entire supervision tree and kill a node
  • Session recovery exponential back off

Not all of these are fixed, namely the room tanking the entire tree, but most are. Getting session recovery to back off exponentially is a big win. A single connected player could generate thousands of exceptions with how I had it before.
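For readers who have not seen exponential back off before, here is a minimal sketch of the delay calculation. The base delay, the cap, and the module name are assumptions chosen for illustration, not the values ExVenture uses.

```elixir
defmodule BackoffSketch do
  # Delay in milliseconds before retry number `attempt` (starting at 0).
  # The delay doubles on every attempt and is capped at `max_ms`.
  def delay(attempt, base_ms \\ 100, max_ms \\ 30_000)
      when is_integer(attempt) and attempt >= 0 do
    # Clamp the exponent so :math.pow/2 never overflows on huge attempt counts.
    exponent = min(attempt, 20)
    min(base_ms * trunc(:math.pow(2, exponent)), max_ms)
  end
end

# A recovering session would wait 100, 200, 400, 800 ms and so on,
# instead of retrying (and raising) in a tight loop.
Enum.map(0..3, &BackoffSketch.delay/1)
```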

Game Jam

The MUD Coders Guild had a Game Jam going on during the beginning of May. In the last moments I joined with MidMUD in the cheapest feature I could think of: a random name selection for players signing up.

This is a simple feature, but one that I am glad to get in. Picking a name has to be the hardest part of making a character, so simplifying this for new people is a good win.

Home Page Updates

The home page now includes a web chat client! This was fun to add since it is the prototypical Phoenix application. Phoenix channels are finally being used "right" in ExVenture!

The admin UI also got an update. I had been watching Refactoring UI and became inspired to make the admin look less crappy. There are now flash messages on all actions, and the forms should look a lot better.

If you attach an email to your account, you can trigger a password reset.

Web chat

Smaller Tweaks

  • Debug command
  • A basic API is available for public information
    • curl -H "Accept: application/json" https://midmud.com
  • Admin can change web client colors
  • Home page is slightly tweaked color wise
  • NPCs don't target players after a respawn
  • If you try using a skill that exists but you don't have, you get a nicer message for it
  • Get all items in a room at once
  • Delete rooms from the admin
  • Handle web client disconnects, by reconnecting
  • Send mail from the home page
  • Disable user accounts
  • Bug: Adding an item spawning did not trigger its timer
  • Bug: Session recovery did not start regen
  • Quests always show their progress as 100% after completion
  • Bug: several bugs in the telnet protocol surrounding IAC
  • Bug: web page redirects after signing in was slightly broken
  • NPCs can have delayed actions
  • NPCs can have multiple actions in a single event, queued up
  • Channel chatter is recorded for replay when connecting to the web client
  • scan command to view the surrounding area
  • Who list tweaks
  • Commands can be marked admin only
  • Automatic balancing of NPCs, change their level and the stats boost to the minimum for that level
  • Continuous effect for stat boost
  • Skills can default target yourself
  • Target yourself via self
  • Hide yourself when looking at a room
  • Format chat messages by capitalizing and adding punctuation
  • Bug: NPCs targeted you before entering a room
  • GMCP heartbeat message
  • Lock movement when skills are cooling down
  • GMCP skill status message

Next Month

Next month I am going to continue with bug fixing while the app is in clustered mode. I have been getting some suggestions about what to tackle next, so I will probably continue with those. I also wouldn't mind getting to in-game forums in the next month.

I might also start switching these to a faster update cycle if I continue to have this many changes.

Going Multi-Node with Elixir

Posted on 17 May 2018

As of the last update, ExVenture can be configured to run across multiple nodes. In this post I'll show you the basics of how I did that.

What is ExVenture?

ExVenture is a server for multiplayer text-based games, also known as Multi-User Dungeons (MUDs).

You can see more information about ExVenture at exventure.org and GitHub. You can see it running on MidMUD. There is also a public Trello board.

Starting point

ExVenture was heavily geared towards running as a single node to start out. I used Registry very heavily, which only works for a single node. I also have a few local cache processes that out of the gate wouldn't work well when spanning multiple nodes.

My first thoughts were about splitting up the app into an umbrella app and figuring out how to boot the web on one node, the telnet connection on another, etc. I started talking about this in the MUD Coders Guild, and I got a question of "why?"

This was a great question and got me thinking about what else I could do.

I eventually settled on trying to get the same application booting on all nodes and a leader node starting the processes that can only exist once in the cluster, e.g. the world.

Clustering

First up was connecting the nodes in an automated fashion. Since my app was heavily geared towards being a single node, this was safe to do on its own: the nodes wouldn't talk to each other yet, and the entire world would be spun up on each node.

This was extremely simple with libcluster. It was as simple as installing the hex package and adding this to my configuration files:

config :libcluster,
  topologies: [
    local: [
      strategy: Cluster.Strategy.Epmd,
      config: [hosts: [:"world1@host", :"world2@host"]]
    ]
  ]

Then when starting my app I switched to booting with iex to get the sname flag available.

iex --sname world1 -S mix
iex --sname world2 -S mix

Picking a Leader

Once the world was clustered, I started on picking a leader. I had heard about Raft before, but never really looked into it.

If you are interested in clustering at all, I'd highly recommend giving the paper a read. It is very simple to follow along and understand, which was the main point of creating Raft: a simple to understand consensus algorithm.

For ExVenture, I went with implementing my own Raft module because I only wanted the leader election part of Raft. I don't need (at least yet!) the rest of Raft.

You can see that in the Raft module. There is a lot to this module that doesn't need to be covered here, but it boils down to this: the group picks a leader, and that leader calls the subscriptions that care about who the leader is.

A leader is picked

Once a leader is picked, the Game.World.Master process uses pg2 to find all of the other Game.World.Master processes and sees what zones are alive in the cluster.

After finding out what nodes are alive, it spins up the zones that are not online across the cluster, using a simple rebalancing algorithm.
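The post does not spell out the rebalancing algorithm, so the following is only one plausible shape for it: give each missing zone to whichever node is currently running the fewest zones. The module name, function name, and data layout are assumptions.

```elixir
defmodule RebalanceSketch do
  # `online` maps each node name to the zone ids it already runs, for example
  # %{world1: [1, 2], world2: []}. `all_zones` lists every zone id that
  # should be alive somewhere. Assumes at least one node is present.
  def assign_missing(online, all_zones) do
    running = online |> Map.values() |> List.flatten() |> MapSet.new()
    missing = Enum.reject(all_zones, &MapSet.member?(running, &1))

    Enum.reduce(missing, online, fn zone_id, acc ->
      # Pick the least loaded node; ties are broken by node name so the
      # result is deterministic.
      {name, _zones} = Enum.min_by(acc, fn {name, zones} -> {length(zones), name} end)
      Map.update!(acc, name, &[zone_id | &1])
    end)
  end
end

# Zones 3, 4 and 5 are offline; world2 is empty so it fills up first.
RebalanceSketch.assign_missing(%{world1: [1, 2], world2: []}, [1, 2, 3, 4, 5])
```

The leader would then ask each node to start whatever zones it was newly assigned.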

Global process registry

I also switched to using the global process registry as part of this. I started looking at swarm but it seemed to be something different than what I was looking for.

I will most likely end up changing this in the future but switching {:via, Registry, {Game.NPC, id}} to {:global, {Game.NPC, id}} was an extremely simple change that worked. So I went with it and haven't looked back.
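To make the size of that change concrete, here is a minimal GenServer registered under a `:global` name. Only the `{:global, {Game.NPC, id}}` tuple comes from the post; the module and its functions are a hypothetical stand-in for the real NPC process.

```elixir
defmodule GlobalNameSketch do
  use GenServer

  # The name is the only thing that has to change when moving off Registry.
  # Before: {:via, Registry, {Game.NPC, id}}
  defp name(id), do: {:global, {Game.NPC, id}}

  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: name(id))

  # Callers use the same name tuple; :global resolves it to a pid on
  # whichever connected node the process lives on.
  def whoami(id), do: GenServer.call(name(id), :whoami)

  @impl true
  def init(id), do: {:ok, %{id: id}}

  @impl true
  def handle_call(:whoami, _from, state) do
    {:reply, {state.id, node()}, state}
  end
end
```

Starting a second process with the same id returns `{:error, {:already_started, pid}}`, which is what keeps each NPC unique across the cluster.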

Messages spanning the cluster

The game was now officially multi-node, and you could play on either iex server and see players and NPCs that live on the other one.

There was only one final step and that was setting up pg2 groups for each of my caches and my communication layer.

This is very simple: join the group in the init function, then make a slight change to how you call or cast to your GenServers so that every member of the group gets the message.

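@key :items

def init(_) do
  # Create the group (nothing happens if it already exists) and join it
  :ok = :pg2.create(@key)
  :ok = :pg2.join(@key, self())

  #...
end

def insert(item) do
  # Every cache process in the group, on any node in the cluster
  members = :pg2.get_members(@key)

  # Send the insert to each member so all nodes hold the same item
  Enum.map(members, fn member ->
    GenServer.call(member, {:insert, item})
  end)
end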
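If you are reading this on a newer Erlang/OTP release, note that `:pg2` was removed in OTP 24 and `:pg` is its replacement. The sketch below shows the same join-and-fan-out pattern using `:pg` instead. It swaps in a different module than the one used in ExVenture at the time, and the cache module, its functions, and the item shape are all made up for illustration.

```elixir
defmodule PgCacheSketch do
  use GenServer

  @group :items

  # The default :pg scope has to be running before anyone joins a group.
  # In a real app it would sit in the supervision tree; here it is simply
  # started (and linked to the caller) on demand.
  def ensure_scope do
    case :pg.start_link() do
      {:ok, _pid} -> :ok
      {:error, {:already_started, _pid}} -> :ok
    end
  end

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Fan the insert out to every cache process in the group.
  def insert(item) do
    @group
    |> :pg.get_members()
    |> Enum.map(fn member -> GenServer.call(member, {:insert, item}) end)
  end

  def get(server, id), do: GenServer.call(server, {:get, id})

  @impl true
  def init(:ok) do
    # Unlike :pg2, groups in :pg are created implicitly on first join.
    :ok = :pg.join(@group, self())
    {:ok, %{}}
  end

  @impl true
  def handle_call({:insert, item}, _from, items) do
    {:reply, :ok, Map.put(items, item.id, item)}
  end

  def handle_call({:get, id}, _from, items) do
    {:reply, Map.get(items, id), items}
  end
end
```

With two caches started after `PgCacheSketch.ensure_scope()`, a single `PgCacheSketch.insert(%{id: 1, name: "sword"})` lands in both of them.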

Next Steps

With this up and running I was able to get MidMUD into a multi-node setup (3 world servers) for production. It has been working out pretty well so far, and I have only found one large bug: channels were not able to communicate across nodes.

I will post more updates as I continue enhancing the distributed nature of ExVenture.

You can see everything described here in these two pull requests, #37 and #39.

ExVenture Updates for April 2018

Posted on 25 Apr 2018

The last month of ExVenture had a lot of updates to clustering and some extra world details.

The documentation website is exventure.org. You can see the latest additions here on MidMUD, my running instance of ExVenture. There is also a public Trello board now.

Also check out the MUD Coders Guild, a Slack team devoted to developing MUDs.

Distributed Erlang

The biggest new feature of ExVenture in the last month is the improved support for Erlang node clustering. I started with adding libcluster to join nodes together. Then I had the world start spanning the cluster via a really simple leader election.

Next up was using pg2 to make the player registry span the cluster and to send cache updates across it. The last step was using :global as the registration mechanism for world processes. I tried out swarm for this, but it had the weird property of restarting all of the processes when a single one died.

You can see this step in PR #37.

Raft

The most recent step was implementing the leader election of Raft. It's mostly done, but it should handle rejoining nodes and adding nodes to the cluster. Once a leader is picked, I have a set of modules that get called on the winner node. On start up this will spin out the world across the cluster.

You can see this step in PR #39.

Damage Types

Early on in the month I added custom damage types and damage resistances. Each damage type has an opposing stat that reduces the damage. This step also added an "echo back" of the damage that was actually applied. A character calculates the effects to send over, sends them, and hears back what actually happened.

You can see this in PR #27 and PR #28.

Custom Colors

This is easier to show in pictures than text.

Custom colors in the admin

Customize your colors

You can see this in PR #29 and PR #31.

Listening

A new command was added, listen. This lets you tune into noises that are in the same room as you. This feature has lots of room to grow as more things make sound.

You can see this in PR #33.

NPC Status Engine

NPCs can change their status line and listen text via emotes now. As part of this they set a status key, which will eventually enable gating the events they do. For example, an NPC might only start combat if it is in a certain "mood", and can get out of that mood later on.

PR #34 has this.

Smaller Tweaks

  • Admin panel tweaks for events
  • Refactoring events

Next Month

For next month I want to continue with the distributed part of ExVenture. I definitely need to get the world rebalancing as nodes fall out of and re-join the cluster. I would also like to continue with the other game additions from this month. Listening and the NPC status engine are both very good foundations for a lot of extra features in the future.

Once I have more work done on the clustering of ExVenture I want to do a deep dive blog post.

Looking at ExVenture's Supervision Tree

Posted on 11 Apr 2018

I was watching The Hitchhiker's Guide to the Unexpected (YouTube link) by Fred Hebert, which includes a neat exercise: write out your supervision tree on a whiteboard and see how things would fail. With this you can better determine what happens to your application as things go wrong.

I decided this would be a good exercise to do on ExVenture. This is a fairly long post that goes through the full supervision tree for ExVenture.

You can see ExVenture in action on MidMUD.

Supervision Tree

ExVenture Supervision Tree

This is the supervision tree that ExVenture ships with now. There are roughly 3 levels in the photo.

First Level

This is the top level directly underneath the application. It contains, in start up order:

  • Data.Repo - the Ecto repo
  • Web.Supervisor - the Phoenix supervisor
  • Game.Registries - a collection of Registry processes
  • Game.Supervisor - the top level supervisor of the game
  • A ranch listener is also started at this level, but it spins off into the ranch application

At this level the supervision strategy is rest_for_one. This is fine because if the Repo dies, something went wrong and the rest of the app should be rebooted. As we'll find later on, processes are started with an ID and fetch their data from the database, which ensures a clean state on process restarts (if something crashes).
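If rest_for_one is new to you, the sketch below shows its effect with four stand-in children in the same order as the list above. The children are plain Agents rather than the real Repo or Phoenix processes, and the module name is made up.

```elixir
defmodule TreeSketch do
  # Four placeholder children, started in the same order as the real tree.
  def start_link do
    children = [
      Supervisor.child_spec({Agent, fn -> :repo end}, id: :repo),
      Supervisor.child_spec({Agent, fn -> :web end}, id: :web),
      Supervisor.child_spec({Agent, fn -> :registries end}, id: :registries),
      Supervisor.child_spec({Agent, fn -> :game end}, id: :game)
    ]

    # rest_for_one: when a child dies, it and every child started after it
    # are restarted; children started before it are left alone.
    Supervisor.start_link(children, strategy: :rest_for_one)
  end

  # Map of child id => current pid, handy for comparing before and after.
  def pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{} do
      {id, pid}
    end
  end
end
```

Killing the `:registries` child and comparing `TreeSketch.pids/1` before and after shows new pids for `:registries` and `:game`, while `:repo` and `:web` keep theirs. Killing `:repo` replaces all four.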

Second Level - Web.Supervisor

This supervisor mostly sits on top of the Phoenix Endpoint, along with a few process monitors for the TelnetChannel and a Cachex cache. It uses a one_for_one strategy. This is fine as none of them are really connected to each other; this supervision level is mostly there to break sections up for my benefit.

Second Level - Game.Supervisor

This supervisor contains the "world" along with supporting processes. In start up order:

  • Game.Config - an agent that caches game configuration
  • Game.Caches - a supervisor of Cachex caches along with GenServer processes that are related to caching
  • Game.Server - a tiny process that used to do more, but now keeps player telemetry up to date
  • Game.Session.Supervisor - the supervisor for player sessions
  • Game.Channel - a gen server that tracks player sessions and which channels they are joined to, inspired by Phoenix Channels
  • Game.World - the supervisor that supervises the game world, see more below
  • Game.Insight - a small GenServer that tracks bad command parsing
  • Game.Help.Agent - an agent that load internal game help

This level has one_for_one as its strategy. At this level most sub-trees are fairly separate and can handle rebooting (to my knowledge) without interfering with other sub-trees.

Third Level - Game.World

This is the heart of the app. It contains everything the user interacts with in the game. Its direct children are Zone.Supervisor supervisors. This level has a strategy of one_for_one. This is fine because each zone is self contained and can reboot on its own.

Zone.Supervisor

This level has in startup order:

  • Game.Zone - the zone's state, which tracks what rooms/npcs/shops are online
  • Game.Room.Supervisor - A supervisor of rooms that belong to the zone
  • Game.NPC.Supervisor - A supervisor of NPCs that belong to the zone
  • Game.Shop.Supervisor - A supervisor of shops that belong to the zone

The reboot here is one_for_all. If any of these processes die something bad happened and the whole zone should restart. To further go into this, the Zone process tracks processes inside the sibling supervisors and if that dies then the rest should go as well. If the supervisors at this level died something really bad beneath them happened and the rest should be restarted.

When the sibling supervisors start they are started with the zone id. With this they figure out which children should be loaded at boot. These supervisors start processes as transient because they may be terminated normally and should not be rebooted, e.g. if someone deletes a spawner for an NPC then the process will be terminated cleanly.

The sibling supervisors also use a one_for_one strategy. This is fine as each process under them is fairly self contained. They are separated mostly for programmer benefit; this could probably be a big bag of processes directly under the Zone.Supervisor.
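Putting the zone level together, a stripped-down version might look like the sketch below. It uses DynamicSupervisor and Agents as stand-ins, which is an assumption and not how ExVenture defines these modules; all names here are hypothetical.

```elixir
defmodule ZoneSketch do
  use Supervisor

  def start_link(zone_id), do: Supervisor.start_link(__MODULE__, zone_id)

  @impl true
  def init(zone_id) do
    children = [
      # Stand-in for Game.Zone, the zone's own state
      Supervisor.child_spec({Agent, fn -> %{zone_id: zone_id} end}, id: :zone),
      # Stand-ins for the room and NPC supervisors
      Supervisor.child_spec({DynamicSupervisor, strategy: :one_for_one}, id: :rooms),
      Supervisor.child_spec({DynamicSupervisor, strategy: :one_for_one}, id: :npcs)
    ]

    # one_for_all: if any child dies, the whole zone restarts together.
    Supervisor.init(children, strategy: :one_for_all)
  end

  # Look up one of the children above by its id.
  def child(sup, id) do
    {^id, pid, _type, _modules} = List.keyfind(Supervisor.which_children(sup), id, 0)
    pid
  end

  # Transient children are restarted after a crash, but not after a normal
  # exit such as an admin deleting the NPC's spawner.
  def start_npc(npc_sup, npc_id) do
    spec = Supervisor.child_spec({Agent, fn -> %{npc_id: npc_id} end}, restart: :transient)
    DynamicSupervisor.start_child(npc_sup, spec)
  end
end
```

Stopping an NPC with `Agent.stop/1` (a normal exit) leaves the NPC supervisor with no active children, whereas killing it would bring a fresh process back.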

Take Aways

While doing this I was able to rework some of the tree. I pushed Game.Config further up the tree since that seems important. I also pushed more GenServers into the Cache sub-tree since they were similar.

One of the other reasons I did this was to figure out how to split up the app on separate nodes. This exercise taught me that it's currently not as easy as I was hoping. I figure the Web tree could be pulled off without doing much of anything, yet I found out that the Game tree is connected in a few spots that prevent it from immediately being pulled off. This would have been an annoying lesson to learn in the middle of doing the split. Now I know beforehand and can fix the problems I found first.

In going multi-node, each of the first level children would be good as a separate OTP app in an umbrella app. I had previously started with that, but the application was too new for that to be useful. If I split them up again, I can boot nodes that are just for web, just for telnet connections, or just the world. I think this is a next step for going multi-node.

I hope it was useful to read through why I picked what I did, and to see me find out I had a few things ordered wrong. I hope you go through your own apps and try out a similar exercise on them.

Eric Oestrich
This site's content is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise specified. Code on this site is licensed under the MIT License unless otherwise specified.