Postmortem of a fun couple bugs
Story at my previous job:
Tieg: Hey Tyrel, I can't run invoke sign 5555, can you help with this?
This is How my night started last night at 10pm. My coworker Tieg did some work on our CLI project and was trying to release the latest version. We use invoke to run our code signing and deployment scripts, so I thought it was just a quick "oh maybe I screwed up some python!" fix. It wasn't.
I spent from 10:30 until 1:30am this morning going through and looking into why Tieg wasn't able to sign the code. The first thing I did was re-run the build on CircleCI, which had the same error, so hey! at least it was reproducible. The problem was that in our Makefile scripts we run tidelift version > tidelift-cli.version and then upload that to our deployment directories, but this was failing for some reason. We let clients download this file to see what the latest version is and then our CLI tool has the ability to selfupdate (except on homebrew) to pull this latest version if you're outdated.
Once I knew what was failing, I was able to use CircleCI's ssh commands and log in, and see what happened, but I was getting some other errors. I was seeing some problems with dbus-launch so I promptly (mistakenly) yelled to the void on twitter about dubs-launch. Well would you know it, I may have mentioned before, but I work with Havoc Pennington.
Havoc Pennington: fortunately I wrote dbus-launch so may be able to tell you something, unfortunately it was like 15 years ago
Pumped about this new revelation, I started looking at our keychain dependency, because I thought the issue was there as that's the only thing that uses dbus on Linux. Then we decided (Havoc Pointed it out) that it was a red herring, and maybe the problem was elsewhere. I at least learned a bit about dbus and what it does, but not enough to really talk about it to any detail.
Would you know it, the problem was elsewhere. Tieg was running dtruss and saw that one time it was checking his /etc/hosts file when it was failing, and another time it was NOT, which was passing. Then pointed out a 50ms lookup to our download.tidelift.com host.
Tieg then found Issue 49517 this issue where someone mentions that …
[read post]