Having built a scalable, distributed content distribution system in C# for Cheezburger, you might think I would just do that again for chatasaur.us, and initially I thought I might. My plan was to start with JabbR, an open source chat service written in C#, and customize it to my needs. After about an hour of poking through the JabbR codebase, I decided that JabbR wasn’t the best jumping-off point for chatasaur.us. I was facing two problems. The first is just a problem with JabbR itself: it is kind of a mess. I didn’t want to start off with a codebase like that.
The second problem was that as great a platform as ASP.NET MVC is to use, it does have its failings. From building scalable web apps in C# in the past, I knew the easiest thing to screw up was shared state. Shared state is just too easy in .NET. Whether it is using session state (something easy to track down after the fact) or static fields (something that can be quite hard to disentangle), ASP.NET’s philosophy on shared state is counter to what is best when building scalable web applications.
While I was lamenting the sad state of things (aka whining) to my longtime friend and colleague, Jacob Krall, he asked why even start with C# at all. This got me thinking about alternatives that might make it easier to just get it right from the get-go. So I started looking around, and at some point I found myself reading learnyousomeerlang.com. The more I read, the more excited I was, because Erlang was really starting to jibe with what I had learned building stuff throughout my career.
Before I dig into why Erlang is so awesome, it is worth spending a little time on some related topics.
The CAP theorem states that in any distributed system, you can only guarantee that you have two of the following three things: Consistency, Availability, and Partition tolerance. That is to say that when you are designing your system, you get to pick two but have to live without the third. Consistency means that every part of the system has a consistent view of the underlying data. Availability means that the system is up for everyone. Partition tolerance means that when parts of the system are either down or unable to communicate with each other, things continue to function properly. It should be pretty clear that a non-distributed system has only consistency, since there is only one partition; if it goes down the entire system is down, and so the system is neither available nor tolerant of partition failure. This is why we create distributed systems. We can make the system available by adding partitions, but then we have to choose between consistency and partition tolerance. While we can’t have both, there are many techniques that can ensure that we are eventually consistent and tolerant of partition failures.
One technique for providing eventual consistency is called Command Query Responsibility Segregation (CQRS). The basic idea is that each partition has a read-only database that it uses to query data, while commands are performed against the write-only database, and eventually data is propagated from the write-only database to the read-only databases. This approach ensures that if a partition fails, the remaining partitions still have a copy of the data to query, even if the partition that failed is the one with the write-only database.
In order to make CQRS work (or even most other approaches to eventual consistency), you need a message bus. A message bus is a system that can be used to send messages between the different components of your system. A CQRS system might have a series of components that each fire off a message to the next component in order to perform a command. For example, to perform an action, a message might trigger a change to the write-only database, which then sends a message that triggers all the read-only databases to be re-synced. There is plenty more to read and learn about the CAP theorem and CQRS on the internet. If this is your first time reading about either, then I recommend doing some more digging.
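To make the command and query split a little more concrete, here is a minimal sketch in Python. It is not the code behind chatasaur.us, and every name in it (MessageBus, WriteStore, ReadStore, the topic strings, the room data) is made up for illustration. The bus here delivers messages synchronously inside one program, which is an assumption made to keep the example short; a real message bus would deliver them asynchronously across machines, and that delay is where the "eventual" in eventual consistency comes from.

```python
from collections import defaultdict


class MessageBus:
    """A tiny in-process stand-in for a real message bus (hypothetical)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler registered for the topic.
        for handler in self._handlers[topic]:
            handler(message)


class WriteStore:
    """Command side: applies changes, then announces them on the bus."""

    def __init__(self, bus):
        self._bus = bus
        self._rooms = {}
        bus.subscribe("command.rename_room", self.rename_room)

    def rename_room(self, command):
        self._rooms[command["room_id"]] = command["name"]
        self._bus.publish("event.room_renamed", dict(command))


class ReadStore:
    """Query side: a copy of the data that only changes in response to events."""

    def __init__(self, bus):
        self._rooms = {}
        bus.subscribe("event.room_renamed", self._on_room_renamed)

    def _on_room_renamed(self, event):
        self._rooms[event["room_id"]] = event["name"]

    def room_name(self, room_id):
        return self._rooms.get(room_id)


bus = MessageBus()
write_store = WriteStore(bus)
replicas = [ReadStore(bus), ReadStore(bus)]

# A command goes to the write side; the resulting event updates every replica.
bus.publish("command.rename_room", {"room_id": 1, "name": "raptor-pen"})
names = [replica.room_name(1) for replica in replicas]
```

Queries never touch the write store, so each replica can keep answering from its own copy even if the write side goes away.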
Now back to Erlang. The single biggest feature of Erlang that made me consider it for chatasaur.us is the Erlang process model. Erlang provides cheap, lightweight processes, so cheap in fact that you can create hundreds of thousands in a matter of seconds. In addition, each process has its own message queue and powerful message routing abilities. Cheap processes and built-in message queues are pretty cool, but alone they don’t tell the whole story. Messages are Erlang’s only mechanism for interprocess communication and shared state. So you are forced to write more flexible code. To top it off, Erlang can almost transparently send and receive messages with Erlang processes running on different nodes in a cluster. Given the processes and message passing awesomeness going on, Erlang could be thought of as the message bus language. (In fact, the very popular message bus RabbitMQ is written in Erlang!)
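For readers who have not seen this model before, here is a rough imitation of it in Python. Treat it as a sketch under heavy assumptions: the Process class and the counter handler are invented for this example, and an operating system thread is far more expensive than an Erlang process, so this shows the shape of the idea rather than its performance. The important part is that the counter's state lives inside one process and the only way to read or change it is to send that process a message.

```python
import queue
import threading


class Process:
    """A thread with a private mailbox, loosely imitating an Erlang process."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Sending never touches the receiver's state, it only queues a message.
        self._mailbox.put(message)

    def join(self):
        self._thread.join()

    def _run(self):
        state = 0  # private to this process; nothing else can reach it
        while True:
            message = self._mailbox.get()
            if message == "stop":
                return
            state = self._handler(state, message)


def counter(state, message):
    """Handle one message and return the next state."""
    kind, payload = message
    if kind == "add":
        return state + payload
    if kind == "get":
        payload.put(state)  # payload is the caller's reply mailbox
    return state


proc = Process(counter)
reply = queue.Queue()
for n in (1, 2, 3):
    proc.send(("add", n))
proc.send(("get", reply))
total = reply.get(timeout=1)
proc.send("stop")
proc.join()
```

Because messages are handled one at a time in the order they arrive, the handler needs no locks even though several threads may be sending to it.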
Another thing that makes Erlang great is its pattern matching mechanism. Almost all of Erlang’s flow control is built around pattern matching, most notably when calling functions and receiving messages, and this allows for clear, concise code and data filtering. Erlang isn’t unique in this regard. A number of other languages have similar structural pattern matching mechanisms, and it is pretty darn awesome wherever it is found. Once pattern matching and receiving messages are combined, as they are in Erlang, it is a recipe for amazing productivity (and talkative dinosaurs).
The last thing on my list is probably the most controversial and probably the least important, but I think it is particularly useful. In Erlang most everything is immutable, and so “variables” are write once, meaning once you set the value, it has the value forever inside the current scope. Some people don’t like this, and it is the thing people seem to complain about the most often after the way Erlang uses commas, periods, and semicolons. As I’ve grown as a developer, I find myself defaulting more and more to making fields readonly in C# and making things final in Java. In fact, I mark literally every field, variable, and argument as final when I’m writing Java. I only remove the final modifier if I absolutely must. The benefit of this approach is that your code becomes cleaner. It does require more thought to avoid excess use of silly variable names (e.g. Client, Client2, Client3, Client4) as you transform a value, but that also results in better code. Again your mileage may vary, but I find this aspect of the language benefits me.
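The same habit can be approximated in Python with a frozen dataclass. This is a small sketch with a hypothetical Client type, not anything from chatasaur.us: instead of changing a value in place, you build a new value from the old one and give it a name that says what it is.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Client:
    name: str
    room: str


client = Client(name="rex", room="lobby")

# "Changing" a field produces a new value; the original is left untouched.
moved = replace(client, room="raptor-pen")

# Assigning to a field of a frozen instance raises instead of mutating it.
try:
    client.room = "elsewhere"
except FrozenInstanceError:
    rejected = True
else:
    rejected = False
```

Naming the second value moved rather than client2 is the kind of extra thought the write-once rule pushes you toward.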
Though it is still early, I am super happy with Erlang. I find it a joy to code in and am very glad I chose it for chatasaur.us.