This post is a follow-up to my last post, in which I spoke about the software that I use to build software. In this post, I want to talk about some of the hardware and operating system infrastructure that I have in place and the role it performs.
As I said in my last post, I do not like to do work that could be automated, and a large part of the work that should be automated surrounds the build process. More than anything else, successful software development depends on a repeatable build process in which the code that is built is thoroughly tested, installed, and verified before it is considered stable. Understanding how this all works starts with the hardware and network infrastructure, so that's where I am going to start.
The diagram at left shows the hardware configuration that I have in place. I have a wireless hardware router that is connected to my DSL line to the internet. Connected to that router are several Linksys hubs (there's only one shown in the diagram because it is the only one that bears upon the discussion), and a Netgear 10/100/1000 switch. Why do I have all of these?
Well, I have several machines unrelated to my work that are connected to the network. All of our entertainment devices (DIRECTV set-top boxes, Playstation, Wii, and Blu-ray players), my kids' notebooks and desktop PCs, and my wife's notebook are all connected to the network, some via wires and others wirelessly.
The Dell Inspiron Laptop and its companion Dell Dimension Desktop are two older machines (they're both over 5 years old) that I use for all kinds of testing and experimentation. I'm not afraid to replace and reinstall operating systems and software on these machines all the time. The Inspiron Laptop functions as my Linux server when I am presenting and I need a server away from home.
My HP Pavilion notebook is my workhorse workstation. All my blogging, development work, specs, and photography (yes, I enjoy amateur photography in my copious spare time) happen on my notebook.
One of the most useful and valuable purchases I ever made was a LaCie 2Big Network drive. It is a network file store that hangs off the Netgear switch at 1Gb/s. This device contains all our backups and acts as a file server for all the other devices on the network. Once a week I plug in one of a few 1TB external drives, back up the LaCie drive to it, and store it off-site. All the other equipment backs up onto this device, so we can always go back to it to restore.
The Dell PowerEdge T710 is a relatively new addition to the network. As I mentioned in my previous post, it hosts the virtual machines that I need to get most of my work done. It has 4 physical network cards that are all connected to the Netgear switch.
When I first got the PowerEdge, I was worried about how much power it would consume. Surprisingly, it is more cost-efficient than the Dell Dimension Desktop, which is why I was able to free up the Desktop. Until fairly recently, the Desktop performed the role of DNS server, DHCP server, NIS server, and DMZ. That has now changed: the DMZ has moved onto the PowerEdge and runs in its own VM, the DNS, DHCP, and NIS services run in a separate VM, and the firewalls are set up appropriately. My network is also segmented so that the personal network and the work network are completely separate.
I have been looking at virtualization for quite a while. When I worked for my last employer, we had a couple of Sun servers running OpenEdge AppServers, databases, and other software that I was using to diagnose issues with their OpenEdge-to-Java configuration. I was surprised at how well the system coped with the load that I threw at it, even though everything was sharing the same physical processors.
When I started work on the Microsoft Exchange Integration project, I knew I was going to need at least 2 Windows 2008 Servers and a couple of Linux boxes. I spent some time discussing the issue with a colleague of mine, and he suggested looking into VMware again. I had used VMware Fusion on my Apple PowerBook at my previous employer, but had not really understood how powerful this product is.
My colleague told me about VMware ESXi, a free bare-metal hypervisor that runs directly on a single machine and lets you administer that machine as a standalone virtualization server. We spec'ed out what we thought I would need for the work I am doing, and I ordered the PowerEdge accordingly. When it arrived, I installed VMware ESXi on it, and it has been up and running now for almost 2 months with no downtime.
What has stunned me is how efficiently this machine runs. At any point in time, I have at least 8 virtual machines (3 of which are Windows Servers) running on it. What I had completely missed is how much time a typical machine spends doing nothing, which means there is a lot of overlapping processor idle time that can be exploited. You can easily over-provision this machine and still have it perform very well.
I'm going to need to spend some money on another 16GB of memory and another 2 TB of disk space, but that is a very simple upgrade for this box.
The Big Score with Virtualization
The thing I had not considered about virtualization, and which has turned out to be the biggest score, is that everything around virtual machine management can be scripted. Not only that, it can be scripted from a Linux virtual machine running on the very hardware that hosts the virtual machines!
Now the reason this is really cool is because of the automation of the build process, and this is where the whole infrastructure discussion culminates.
As a software engineer, I live by the following rule:
The software does not work until it is packaged, installed, and run on a (set of) machine(s) that is (are) not your development environment.
In other words, the fact that the code works on my notebook does not mean that it works in production. To prove that it does, I have to check it in, have an automated build kick off and build the code, have the automated unit tests run, deploy the code to the test environment, have it run there, and have the tests all pass. Finally, and this is really important for the Microsoft Exchange Integration project, I need to understand how all of this is likely to perform under load.
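To make that sequence concrete, here is a minimal sketch of a pipeline driver in Python. Every step command here is a hypothetical placeholder (the deploy and smoke-test scripts are made up); the real steps are whatever your build job invokes, and the point is only that a failure at any step aborts the rest.

```python
#!/usr/bin/env python3
"""Minimal sketch of a nightly pipeline driver.

All step commands are hypothetical placeholders -- the real versions
would invoke Ant/Maven targets and deployment scripts.
"""
import subprocess

# Ordered pipeline: a failure at any step aborts the remaining steps.
PIPELINE = [
    ("checkout",  ["svn", "update"]),            # hypothetical: pull latest source
    ("build",     ["ant", "dist"]),              # hypothetical: compile and package
    ("unit-test", ["ant", "test"]),              # hypothetical: run the unit suites
    ("deploy",    ["./deploy.sh", "test-env"]),  # hypothetical: push to test VMs
    ("verify",    ["./smoke.sh"]),               # hypothetical: install/run checks
]

def run_pipeline(steps, runner=subprocess.call):
    """Run steps in order; return (completed_steps, failed_step_or_None)."""
    done = []
    for name, cmd in steps:
        if runner(cmd) != 0:      # non-zero exit code means the step failed
            return done, name
        done.append(name)
    return done, None
```

The `runner` argument is injectable so the driver itself can be exercised without touching a real build environment.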
Building and Installing
With virtualization, I am able to create snapshots of machines, power them up by means of a script, install new software by means of a script, extract the results, power down the machines, revert to the snapshot, and start again whenever I want to.
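As a sketch of that power-up/power-down/revert cycle, assuming ESXi's `vim-cmd vmsvc` command namespace: the VM id and snapshot id below are hypothetical (on a real host you would look them up with `vim-cmd vmsvc/getallvms`), and the exact `snapshot.revert` arguments can vary by ESXi version.

```python
#!/usr/bin/env python3
"""Sketch of one scripted VM test cycle against ESXi.

Assumes the `vim-cmd vmsvc/...` commands available on an ESXi host;
the VM id and snapshot id are hypothetical examples.
"""
import subprocess

def vm_cycle_commands(vmid, snapshot_id):
    """Build the vim-cmd invocations for one cycle:
    power on, (tests run in between), power off, revert to snapshot."""
    return [
        ["vim-cmd", "vmsvc/power.on", str(vmid)],
        ["vim-cmd", "vmsvc/power.off", str(vmid)],
        ["vim-cmd", "vmsvc/snapshot.revert", str(vmid), str(snapshot_id), "0"],
    ]

def run_cycle(vmid, snapshot_id, runner=subprocess.check_call):
    """Execute the cycle; `runner` is injectable for dry runs."""
    for cmd in vm_cycle_commands(vmid, snapshot_id):
        runner(cmd)
```

Separating command construction from execution makes it easy to log or dry-run the cycle before letting it loose on real virtual machines.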
By means of one script, I am able to validate how successfully the build ran, and as it is a cron job that runs off my development server on a nightly basis, I have a good build and test report first thing every morning when I get up.
Moreover, the setup is able to use my quiet time to run any stress tests that need to be run. Assuming the build is successful (at least 95% of tests pass, and none of the critical tests fail), the build is stored on the LaCie drive. At 8:00am every morning the stress test kicks off, because this is when the network is likely to be quietest and the others in the house will not be affected. The script powers down any machines that are not needed during the stress test, starts the stress-test machines, runs the stress tests (during which the processors are pegged at 95%), and keeps a log of how the stress tests ran.
At the end of the test (at 4:00pm), it e-mails a report of the results, powers the virtual machines down, and reverts each of them to its snapshot. Before reverting, it copies the virtual machines across to the LaCie drive so that I can always restore the test machines and examine the logs if there is ever any question.
The cool thing about the test ending at 4:00pm is that I normally get the stress-test e-mail on my Blackberry as I am driving out of the parking garage at work, so I know what I am in for when I get home that evening.
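The "build is successful" rule above (at least 95% of tests pass and no critical test fails) is simple enough to express directly. Here is a sketch; the dictionary-of-results format is my own invention for illustration, not the format any of these tools actually emit.

```python
def build_ok(results, critical, pass_threshold=0.95):
    """Decide whether a build passes the bar described above.

    results:  dict mapping test name -> True (passed) / False (failed)
              (a hypothetical format for illustration)
    critical: names of tests that must never fail
    """
    if not results:
        return False  # no test results at all is not a successful build
    # Any critical failure (or missing critical result) sinks the build.
    if any(not results.get(name, False) for name in critical):
        return False
    passed = sum(1 for ok in results.values() if ok)
    return passed / len(results) >= pass_threshold
```

A rule like this is worth encoding in one place, so the nightly script and any manual check agree on what "successful" means.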
The Value of Laziness
The real value of being lazy is that because I won't do repetitive things manually, I have put together a fairly stable build configuration that allows me to prove the code thoroughly before it ever makes it out the door. There are several important things that this process relies on, though.
I have to be ruthless about building tests for each of the new features that I build. This does not mean I believe in Test Driven Development – I don't. Test Driven Development is about building tests and then coding for the tests. I believe in building the code and then figuring out how to break it, and trust me, I am good at breaking my own code.
I have to make sure I add additional load to stress test additional features where I can. This can be really tedious because I spend a lot of time writing code that generates data.
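As an illustration of what that data-generation code tends to look like, here is a hypothetical sketch that produces synthetic mail-like records. The record shape is invented for this example; the real generator produces whatever the Exchange integration tests consume. The fixed seed matters: a failed stress run can be reproduced with identical data.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a reproducible stress-test data generator."""
import random
import string

def random_word(rng, n=8):
    """A lowercase pseudo-word of length n from the given RNG."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))

def make_messages(count, seed=42):
    """Generate `count` synthetic mail-like records, reproducibly.

    Using a local random.Random with a fixed seed means reruns
    produce byte-for-byte identical data.
    """
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "sender": f"{random_word(rng)}@example.com",
            "subject": " ".join(random_word(rng, 5) for _ in range(4)),
        }
        for i in range(count)
    ]
```
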
Finally, I couldn't do this without software like Hudson CI, JUnit, NUnit, Apache Ant, NAnt, Apache Maven, and most of all, VMware.