When I worked at Propel Marketing, we used to outsource static websites to a third-party vendor and then host them on our server. It was our job as developers to pull down the finished website zip file from the vendor, check that they had used the proper domain name (a lot of the time they hadn’t), and make sure it actually looked nice. If these few criteria were met, we could launch the site.
Part of this process was SCPing the directory to our sites server. The sites server was where we had Apache running with every custom static site as a vhost. We would put the website in /var/www/vhosts/domain.name.here/ and then create the proper files in sites-available and sites-enabled (more on this in another entry). After that, the next step was to run a config check and restart Apache.
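For a rough idea of what one of those per-site files might contain, here is a minimal Python sketch that renders a vhost definition. The directives are standard Apache ones, but the `render_vhost` helper, the log directory, and the example domain are assumptions made for illustration, not the configuration we actually used.

```python
from pathlib import PurePosixPath

VHOST_ROOT = PurePosixPath("/var/www/vhosts")  # site layout described in the post
LOG_ROOT = PurePosixPath("/var/log/apache2")   # assumed log directory


def render_vhost(domain: str) -> str:
    """Return the text of a hypothetical sites-available entry for one static site."""
    docroot = VHOST_ROOT / domain
    return "\n".join([
        "<VirtualHost *:80>",
        f"    ServerName {domain}",
        f"    DocumentRoot {docroot}",
        # One error log and one access log per site.
        f"    ErrorLog {LOG_ROOT}/{domain}-error.log",
        f"    CustomLog {LOG_ROOT}/{domain}-access.log combined",
        "</VirtualHost>",
        "",
    ])


if __name__ == "__main__":
    print(render_vhost("example.com"))
```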
Here’s where it all went wrong one day. If I recall correctly, my boss was on vacation, so he had me doing a bit of extra work and launching a few more sites than I usually did. Not only that, but we also had an end-of-the-month deadline, which was either the next day or the day after. I figured I’d just set up two days’ worth of my sites at once and then have some extra time the next day to work on other things. So I started launching my sites. After each one, I would add the domain it was supposed to live at to my /etc/hosts file and make sure it worked.
I was probably halfway done with my sites when I suddenly ran into one that didn’t work. I checked another one to see if maybe it was just my network being stupid and not liking my hosts file, but no, that wasn’t the problem. Suddenly, EVERY SITE on this server had stopped working. Panicking, I deleted the symlink in sites-enabled and restarted Apache. Everything worked again. I put that site aside; maybe something in its PHP files was breaking the server, who knows, but I had other sites I could launch.
I set up the next site and the same thing happened again: no sites worked. Okay, now it was time to freak out and call our sysadmin. He didn’t answer his phone, so I called my boss, JB. I told him the problem and he said he would reach out to the sysadmin and find out what was going on, all the while I was telling JB, “It’s not broken broken, it just doesn’t work, it’s not my fault,” etc. A couple of hours later, our sysadmin emailed us back and said he had been able to fix the problem.
It turns out there’s a hard limit, set per user, on the number of files a process can have open, and it was set to 1000 for the www-data user. The site I launched was coincidentally the 500th site on that server, and each site had an access.log and an error.log. Apache apparently keeps both of those files open for every site the whole time so it can log to them, so 500 sites meant 1000 open files. Our sysadmin was able to raise www-data’s ulimit to something a lot higher (I don’t recall now what it was), and that gave us a lot more leeway in how many sites the sites server could host.
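The arithmetic is easy to check. The Python sketch below reads the open-file limit of the current process with the standard `resource` module (Unix only) and estimates how many sites fit under it, assuming two log files per site as in the story. The `max_sites` helper and its `reserved` allowance for other descriptors are hypothetical names of mine, not anything our sysadmin ran.

```python
import resource

LOGS_PER_SITE = 2  # access.log and error.log, as in the story


def max_sites(fd_limit: int, logs_per_site: int = LOGS_PER_SITE, reserved: int = 0) -> int:
    """Estimate how many sites fit under an open-file limit.

    `reserved` is an assumed allowance for sockets and other descriptors
    the server needs besides log files.
    """
    return max(0, (fd_limit - reserved) // logs_per_site)


if __name__ == "__main__":
    # Soft and hard limits on open files for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("No soft limit on open files")
    else:
        print(f"soft limit {soft}: room for about {max_sites(soft)} sites")
```

With a limit of 1000, this gives 500 sites at most, which matches the point where everything fell over. In practice the server needs descriptors for other things too, so the real ceiling would sit a little lower.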