It happens to the best of us

Monday, October 5th, 2009

We just had some customers report a bug. Not good. We didn’t get an exception email. All the tests passed. We couldn’t see anything untoward in the log files. But it was there. We could reproduce it, both in staging and in production. Not good at all.

But the weirdest thing was we couldn’t figure out the cause. Well I could see why the code was failing (after adding some extra log messages). But ‘git blame’ said those lines of code were unchanged in twelve months. Why hadn’t people complained before? Why hadn’t we noticed it?

After much hunting through log files we found the point when the feature last worked. It coincided with a deployment. That deployment was our Rails 2.3.4 forms vulnerability fix. And the bug was in a form – a missing form parameter that earlier versions of Rails had ignored, but that the newer Rails was choking on.

But why didn’t the tests catch it?

After more hunting I saw that the Cucumber test that exercised the form didn’t have a “When I press the Update button” step. And the subsequent tests were passing, even though the update button hadn’t been pressed.
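To make that failure mode concrete, here is a toy sketch in plain Ruby. It is not Cucumber or Rails code: `ToyForm`, its fields and its methods are all invented for illustration. The point is only that a scenario which fills in a form but never submits it cannot trip over a bug that lives in the submit handling.

```ruby
# Toy stand-in for a web form; every name here is hypothetical.
class ToyForm
  REQUIRED = [:name, :email].freeze

  attr_reader :fields, :saved

  def initialize
    @fields = {}
    @saved = false
  end

  def fill_in(field, value)
    @fields[field] = value
  end

  # Submitting is the only place a missing parameter is noticed.
  def press_update
    missing = REQUIRED - @fields.keys
    raise ArgumentError, "missing parameters: #{missing.join(', ')}" unless missing.empty?
    @saved = true
  end
end

# Scenario without the "press" step: nothing is submitted, so nothing fails.
form = ToyForm.new
form.fill_in(:name, "Wotsit")
scenario_without_press = (form.fields[:name] == "Wotsit")

# Same scenario with the step restored: the missing parameter surfaces.
error_with_press =
  begin
    form.press_update
    nil
  rescue ArgumentError => e
    e.message
  end
```

In the first half the assertion only looks at what was typed in, so it passes whatever the server would have done with the form.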

So I added the step in and made the feature pass. Then deployed it as an emergency fix.

However, what are the lessons to learn here (as there are always some)?

  • Firstly, testing cannot catch everything.
  • Secondly, the cracks in your tests are where the bugs are.
  • Thirdly, we probably need some sort of peer review for tests. I feel that this is more important than for code, because once the tests are right you can refactor the code without worry.
  • Fourthly, you really need to log everything. Absolutely everything. Don’t worry about your huge log files – that’s what `logrotate` is for. Get it written down so that one time when you have an obscure bug, you’ll be able to find it easily.

Switching off transactions for a single spec when using RSpec

Friday, May 8th, 2009

I have just written a load of test code that needed to verify that a particular set of classes behaved correctly when a transaction was rolled back.

However, the rest of my suite relied on transactional fixtures (which is Rails’ badly named way of saying that a transaction is started before each test and rolled back at the end, leaving your test database in a pristine state before the next case is run).

In particular, my spec_helper.rb had the following:

Spec::Runner.configure do |config|
  config.use_transactional_fixtures = true
  # stuff
end

The code being tested looked something like this:

begin
  Model.transaction do
    do_something # may fail with an exception
    do_something_else # may fail with an exception
  end
rescue Exception => ex
  do_some_recovery_stuff
end

I had a spec for the successful path (checking that the outcomes of do_something and do_something_else were what I expected).

However when I tried the same for the failure paths, the outcomes matched the successful path. The time-tested debugging method of sticking some puts statements in various methods showed that do_some_recovery_stuff was being called as expected. But the outcomes were still wrong.

And the reason? Transactions. This was a Rails 2.2 project, running on MySQL (InnoDB). As RSpec/Test::Unit starts a transaction before the specification clause runs (and then rolls it back on completion), when the Model.transaction statement is reached the spec is actually starting a second transaction, nested within RSpec/Test::Unit’s. Which means when the inner transaction is rolled back, the database doesn’t actually do anything – there’s still an outer transaction that may or may not be rolled back. (I think Rails 2.3 corrects this behaviour and if you roll back an inner transaction then the outer transaction reflects the correct state, but I’m not 100% on that).
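A toy model may help here. The sketch below is plain Ruby, not ActiveRecord: `ToyDatabase` and `risky_update` are names I have made up, and the assumption baked in is the behaviour described above, where only the outermost transaction block can really roll anything back.

```ruby
# Toy model of nested transaction blocks; not ActiveRecord code.
class ToyDatabase
  attr_reader :rows

  def initialize
    @rows = []
    @depth = 0
    @snapshot = nil
  end

  def insert(row)
    @rows << row
  end

  # Only the outermost block opens a "real" transaction (takes a snapshot)
  # and only the outermost block can restore it on failure.
  def transaction
    outermost = @depth.zero?
    @snapshot = @rows.dup if outermost
    @depth += 1
    begin
      yield
    rescue StandardError
      @rows = @snapshot if outermost # a nested block has nothing to roll back
      raise
    ensure
      @depth -= 1
    end
  end
end

# Mirrors the code under test: a transaction whose failure is rescued.
def risky_update(db)
  db.transaction do
    db.insert(:half_finished)
    raise "boom"
  end
rescue StandardError
  :recovered
end

standalone = ToyDatabase.new
risky_update(standalone)
standalone.rows # => []

wrapped = ToyDatabase.new
wrapped.transaction do # plays the part of the test framework's transaction
  risky_update(wrapped)
  wrapped.rows # => [:half_finished]
end
```

Run on its own, the failed update leaves no rows behind. Wrapped in an outer transaction, the half-finished row is still visible after the "rollback", which is the state my failure-path specs were seeing.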

So I had a choice – move the (production) app to Rails 2.3 to fix this one bug (which is very urgent) or figure out how to switch the outer transaction off for these particular steps. Google wasn’t very helpful – lots of stuff on how RSpec extends Test::Unit, lots of stuff on how Rails extends Test::Unit to add the fixtures (and transaction) support. But no concrete example on how to actually switch it off.

After much playing around (overriding methods like use_transaction?(method_name) and runs_in_transaction?) I eventually stumbled across the answer. And it’s pretty simple.

Set your default to be config.use_transactional_fixtures = true in spec_helper.rb. Then, for the specs that are not transactional, create a describe block and add the following:

describe MyClass, "when doing its stuff" do
  self.use_transactional_fixtures = false
  it "should do this"
  it "should do that"
  it "should do the other"
end

The only thing to be aware of is you may well need an after :each block to clean up after yourself.

Fixing bugs in untested code

Wednesday, May 6th, 2009

When you’ve got an application that has little or no test coverage it can be quite daunting making changes. What if you alter X and it breaks Y? Without running through the entire app by hand how will you know what you’ve broken?

Well you won’t.

Even worse, what if your client reports a bug in that application? That makes things even worse doesn’t it?

No. It’s actually an opportunity. Because even if your boss thinks “testing is a great idea, we’ll start on the next project when we’ve got more time”, a bug fix is one of those things where the time to fix varies. So take advantage of that.

Start by writing a test that reproduces some functionality in the same area that you know works.

This won’t be easy. You need to get the database into the correct state, set up the session correctly. To save you some time, try using an object factory – with the correct configuration you can concentrate on creating just the models you need without having to fill the entire database with test data.
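If you have not used an object factory before, the idea fits in a few lines of plain Ruby. This is only a sketch of the concept, not any particular factory library: `ToyFactory`, `Customer` and the attribute names are invented for the example.

```ruby
# Minimal object factory sketch; Customer and its attributes are hypothetical.
Customer = Struct.new(:name, :email, :active)

class ToyFactory
  def initialize
    @definitions = {}
  end

  # Register a class together with the defaults that make a valid instance.
  def define(name, klass, defaults)
    @definitions[name] = [klass, defaults]
  end

  # Build an instance, overriding only the attributes the test cares about.
  def build(name, overrides = {})
    klass, defaults = @definitions.fetch(name)
    attributes = defaults.merge(overrides)
    klass.new(*klass.members.map { |member| attributes[member] })
  end
end

FACTORY = ToyFactory.new
FACTORY.define(:customer, Customer,
               { :name => "Test Customer", :email => "test@example.com", :active => true })

inactive = FACTORY.build(:customer, { :active => false })
inactive.name   # => "Test Customer"
inactive.active # => false
```

The defaults are written once per model. Each test then states only the attribute that matters to it, which is what keeps the setup short.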

Take a look at the code you are testing as you are writing the tests. But make sure your test checks the outcomes of that code, not the implementation – when using mocks it’s pretty easy to end up effectively rewriting the real code as a series of should_receive(:something) calls. Which looks great until you come to refactor, at which point it becomes a nightmare.

Get your test to pass. Remember this is a feature that works, so it shouldn’t be that hard to get it to pass. And you are building some important foundations, as you are setting up the configuration for your object factory one model at a time.

Once you’ve got the first test working, prove that it’s doing what it’s supposed to be doing. Comment out bits of the implementation and watch it fail. If it doesn’t fail then you’ve got a problem – your test is testing something other than that particular implementation.

Now we’ve got a test that really checks your existing code. I like to add another test that checks the error handling in that existing code as well (there is error handling in your existing code isn’t there?) Follow the same process as before – test the outcomes (probably catching an exception if you’re at the model level, probably looking for a particular redirect and flash message at the controller level), and then prove that it works by commenting code out. Hopefully this should be pretty quick to write as most of the hard work was done when setting up the original test.

And finally we can get to the meat of it. Write a test that reproduces your bug (that is, your test exercises your app in the way expected and your app should fail). Again, most of your setup work should already be done, so it shouldn’t take too long. Now run the test and watch it fail. If it doesn’t fail then your test isn’t right.

After all of this, we are finally in a position to fix the bug.

Your test should prove that it’s been fixed. And prove that the bug won’t reappear in a future version. But you’ve also taken an important first step to wrapping your application in tests, making your life easier in future.

Quick Tip: make it easier to debug your full-stack acceptance tests

Tuesday, March 24th, 2009

Spanner in the works

One of the issues when using Selenium or Watir to power your full-stack acceptance testing (apart from the time it takes for the test suite to run) is that stuff happens within your browser, fails, and then Cucumber happily moves on to the next test before you get a chance to look at what went wrong.

If you are just using plain old Webrat you can pepper your code with puts statements so you can check the value of variables, the existence of HTML elements and the flow of code as it happens. But with Selenium or Watir, you need to run your app separately from Cucumber, normally in a hidden background process, so the output of your puts statements is lost in the ether (or an empty pipe).

After chasing a particularly annoying and hard-to-trace bug, related to an interaction between form content and JavaScript, I came up with an extremely simple debugging tool.

Just add the following into one of your steps files:

When /^I pause$/ do
  STDIN.gets
end

Then, find the feature that is causing you grief and insert a “when I pause” step at the appropriate time.

When I do this
And I do that
And I pause
And I press "Save"
Then I see my newly created object

Cucumber will power your app, poking it until it gets to the “when I pause” step. It will then pause, waiting on STDIN for you to hit return – giving you time to open your inspector window and poke around in the form as the tests see it.
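One optional refinement, if a silently hanging terminal bothers you: print a prompt before waiting. The helper below is a sketch; `pause_for_inspection` is a name I have made up, and the input and output streams are parameters only so that it can be exercised without a real terminal.

```ruby
# Hypothetical helper: announce the pause, then block until return is pressed.
def pause_for_inspection(input = $stdin, output = $stderr)
  output.puts "Paused. Inspect the browser, then press return to continue."
  input.gets
  nil
end

# In a steps file it would be wired up like this (left as a comment here,
# because it needs Cucumber to be loaded):
#
#   When /^I pause$/ do
#     pause_for_inspection
#   end
```

The prompt goes to standard error so that it is less likely to be swallowed if standard output is being captured or redirected.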

In this particular case, my steps file had an incorrectly named element within it – all it took was an inspection of the element in question and I saw the error. Hours of frustration wiped out by one of the simplest commands there is.

Spanners by woodsy

The trouble with mocks (or design versus acceptance)

Thursday, March 12th, 2009

I had the pleasure of speaking to Luke Redpath the other day.

I started off by thanking him for his Demeter’s Revenge plugin, which is one of the first things I install on a new project. He said he didn’t use it much any more, as he doesn’t do mocking, except during the design process. This surprised me, but thinking about it, the distinction between what your different types of tests are for is an important one.

The most important part of using mocks is their value in designing your class’s public API. You start with a cucumber feature (your acceptance test) and as you are implementing it, fill in specs for each component in turn. As features are defined in terms of the user interface, you start by specifying and designing your view and controller. The controller needs to interact with a model. And this is where mocking comes into its own.


describe WotsitPokesController do
  it "should poke a wotsit" do
    on_posting_to :create, :id => '1' do
      @wotsit = mock_model Wotsit
      Wotsit.should_receive(:find).with('1').and_return(@wotsit)
      @wotsit.should_receive(:poke).and_return(true)
    end
    response.should redirect_to(edit_wotsit_path(@wotsit))
    flash[:notice].should == 'Your wotsit has been poked'
  end
end

(this uses the RSpec-Rails Extensions to tidy up the specification)

Whatever @wotsit.poke does, when you are mocking, it is in your interests to encapsulate its behaviour within a single method. Otherwise you end up writing tons of “fake” code within your spec, which makes the tests brittle and hard to maintain.

A small housekeeping note: as soon as your controller spec references @wotsit.poke, you need to go to your model spec and add:


  it "should poke itself"

A pending specification, just as a reminder that you’ve got a method to implement in a few minutes time.

But Luke had started doing full-stack testing alone; using shoulda (with its nested contexts) to write acceptance tests, rather than RSpec and mocking.

I’m not so sure about this. His reason was the brittleness of mocked tests. Agreed, it can be dangerous; if you rename the “poke” method on your Wotsit, your controller spec will still pass. But, crucially, your cucumber story (or shoulda integration test) will not.

So I guess the point to make here is that specifying with mocks alone can be a bad idea. You need a full-stack test to catch the stuff that “falls between the cracks”, the proof that your application does what it is supposed to do.
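The rename problem is easy to reproduce in miniature. The sketch below is plain Ruby with a hand-rolled double standing in for mock_model; the classes are simplified stand-ins for the real model and controller, and `prod` is a made-up new name for the renamed method.

```ruby
# Simplified stand-ins; none of this is the real application code.
class Wotsit
  # The method used to be called `poke`; it has since been renamed.
  def prod
    true
  end
end

class WotsitPokesController
  def create(wotsit)
    wotsit.poke ? "Your wotsit has been poked" : "Poke failed"
  end
end

# A double that still answers to the old name, as a stale mock would.
class StaleWotsitDouble
  def poke
    true
  end
end

controller = WotsitPokesController.new

# Isolated check against the double: passes, although the app is broken.
mocked_result = controller.create(StaleWotsitDouble.new)

# Full-stack style check against the real object: exposes the rename.
full_stack_result =
  begin
    controller.create(Wotsit.new)
  rescue NoMethodError => e
    "NoMethodError: #{e.name}"
  end
```

Here `mocked_result` is the success message while `full_stack_result` records the NoMethodError, which is the gap the acceptance test is there to close.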

Can you just forgo the mocks completely? I don’t think so; they are too valuable during the exploration/design phase.

Should you just get rid of the mocks once your acceptance test is passing? Luke says yes. I’m still undecided, but certainly am tending towards no.

So, what’s your take? Are mocks too brittle to be used in long term development? Or, coupled with an acceptance test, is the value they give so great we need to keep them in our test suite?

(interestingly, since I drafted this post, Pat Maddox has also written about much the same subject with a similar outcome – including Matt Wynne stating “Acceptance tests are for regression coverage, unit tests are for design” in the comments, although Pat himself says you can delete the specs once the design is done).