Devblog 80 - Postmortem

Apologies are in order. The patch on Thursday was fucked and we didn't communicate well.

3 October 2015

These seem to have been caused by sound pooling, which was a last-day, untested addition that we figured couldn't hurt anything. As it turns out, it hurt a lot. This is usually the way with shipping bugs: we make a change or fix a bug, make assumptions about it, and they turn out to be wrong. We disabled this change and posted an update - which seems to have fixed the problems most people were experiencing.

There was a save bug on the original server release. It meant that if you were running that server for a while, then updated your server to the patch we released, a lot of your buildings would fall down. Subsequent restarts have no problems. This is one of the shittiest types of bugs possible in Rust. It was my fuckup, tinkering on patch day. I'm sorry. Hopefully no-one lost too much progress.

We get lots of reports of memory leaks. We've even kind of seen it happen ourselves, but not to the extreme that other people have shown. I've seen Rust use up to 6GB of memory and then fall down to 1GB. Unity's memory profiler doesn't account for it, and nothing we're doing on the native side really accounts for it. It only seems to have started happening since we updated to Unity 5.2 - although that's not confirmed. It's not something we can easily diagnose, because Unity's memory tools only tell us about the C# side and don't offer any runtime memory profiling we could use to report back to us and flag up issues. We might have to get Unity Support involved.

There are a bunch of other bugs. They're due to the amount of shit we touched in this patch. A lot of stuff changed, and a lot of new bugs were introduced. We're working through them and I think we're on top of things, but expect a couple more patches.

We fixed some performance stuff in the hot patches after the main patch. I'm hoping this helps, although I'm not optimistic, because I get the feeling that we only really ever hear about performance when it's bad. So this poll question is probably a good idea - please answer based on the latest version.

The patch was broken and we didn't tell anyone. You didn't know we were working on it, and you didn't know that we were aware of the problems. We posted in a couple of places, but you shouldn't have to go searching for this stuff. It should go on Twitter, like everything else. We run under the assumption that if something is broken you know we're going to fix it, but I accept that it isn't a given, and when a patch is as fucked up as this one was, a simple heads-up is appreciated.

We aren't doing a very good job of testing shit before we push it. That's obvious. We don't have an internal QA team; that seems like an outdated concept to me. Instead we rely on people playing the dev branch. But for this update that didn't work, for a number of reasons I guess. The biggest reason is that we don't have any mechanism for people on the dev branch (or the main branch really) to report problems. Most of the problems in this update were visible on the dev branch - but no-one was playing on it - because of these bugs. We fixed a bunch of problems before the update, but we didn't fix them soon enough to enable the dev branch testing. Our internal error reporting doesn't differentiate between the dev branch and main branch, so errors reported on the dev branch were getting eclipsed by the day-to-day errors on the main branch - so automatic reporting wasn't helping us.

So this is somewhere we need to improve. We need to get more people on the dev branch.
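
To make the error reporting problem concrete: if every report carried the branch it came from, a handful of dev branch errors wouldn't get buried under the main branch noise. Here's a minimal sketch in Python of what that grouping could look like. It isn't our actual reporting code; `ErrorReport`, `top_errors_by_branch`, the branch names and the error messages are all made up for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    branch: str   # hypothetical tag sent with each report, e.g. "dev" or "main"
    message: str  # the error text reported by the client


def top_errors_by_branch(reports, limit=3):
    """Count error messages separately per branch and return the most
    frequent ones for each, so a small branch isn't drowned out by a big one."""
    counts = defaultdict(Counter)
    for report in reports:
        counts[report.branch][report.message] += 1
    return {branch: counter.most_common(limit) for branch, counter in counts.items()}


# Placeholder data: lots of day-to-day main branch errors, a few dev branch ones.
reports = (
    [ErrorReport("main", "example error A")] * 500
    + [ErrorReport("dev", "example error B")] * 4
)
summary = top_errors_by_branch(reports)
```

In a single combined list, the four dev branch reports would sit far below the 500 main branch ones. Split by branch, they're the top entry under `summary["dev"]`.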

Come patch day we feel a lot of pressure to push a patch and then fix any bugs that come up. The reason we post patches on a Thursday instead of a Friday is that we expect bugs, so we want a day to fix them before the weekend. The bugs this week weren't the type of bugs we expect to ship with and patch out, and we probably knew there was a chance this could all turn to shit when we posted it. The thing is, from our point of view, we're fucked either way. If we delay the patch for a week we're dicks because we had two weeks to fix this shit; if we don't, we're dicks because we've had two weeks to test this shit. Especially on a wipe patch, where people have fucked up their servers in preparation.

In future I guess we need to feel less bound by the patch day, and if something isn't ready then delay it and take the heat.

This has happened a couple of times in recent history now, so it's stuff we need to give serious consideration to. The game is in development, it's in alpha - but that's no excuse for throwing any old shit at the main branch. We should be keeping stuff on the dev branch until it's had a reasonable amount of testing - even if that means we end up with irregular patches.
