This is written from the perspective of pytest, so some needs may change based on the language and testing framework used, but these principles should extend to all automated testing in some capacity.
Fixtures should also be named after the thing they do or provide. They can be made to do both at the same time, but only if the name you would give to the thing a fixture provides also adequately describes the thing it does.
A good measure to go by when trying to figure out if you should break something out into a standalone fixture is very similar to the one used for functions: if a chunk of the code in your fixture can be described with a simple phrase, break it out into its own fixture. Fixtures can request other fixtures just like tests can, so we can (and should) abstract these operations and resources just like we can (and should) with normal code.
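For instance, here's a minimal sketch of one fixture requesting another (the ApiClient and its methods are hypothetical stand-ins for whatever your suite actually talks to):

import pytest


class ApiClient:
    """Hypothetical stand-in for a client of the system under test."""

    def __init__(self):
        self.users = {}

    def create_user(self, name):
        self.users[name] = {"name": name}
        return self.users[name]


@pytest.fixture
def client():
    # "provides a client"
    return ApiClient()


@pytest.fixture
def user(client):
    # "provides a user that already exists in the system"
    # this fixture requests the client fixture, just like a test would
    return client.create_user(name="Chris")


def test_user_is_registered(client, user):
    assert user["name"] in client.users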
On top of this, a fixture that yields only attempts to run what's after the yield statement (i.e. the teardown/cleanup) if the code before the yield completed successfully. If you do multiple things in a fixture that each need to be cleaned up, then none of those cleanup steps will run if an error is thrown anywhere before the yield.
Typically, if a step throws an error, that means whatever modification it sought to make wasn't actually made. If it wasn't actually made, then there's nothing to clean up, so there's no need to run the rest of the fixture. Fixtures should be designed with this in mind, which means they should only try to do one thing at a time.
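As a rough sketch of what that looks like (the settings file is just an arbitrary example), each fixture below makes exactly one change, and cleans up only that change after its yield:

import configparser

import pytest


@pytest.fixture
def settings_file(tmp_path):
    # one change: create the file (tmp_path is a built-in pytest fixture)
    path = tmp_path / "settings.ini"
    path.write_text("[app]\ndebug = true\n")
    yield path
    # one cleanup step: remove the file; this only runs if the file was created
    path.unlink()


@pytest.fixture
def loaded_settings(settings_file):
    # a separate step in a separate fixture, so a failure here can't
    # prevent settings_file's cleanup from running
    config = configparser.ConfigParser()
    config.read(settings_file)
    return config


def test_debug_is_enabled(loaded_settings):
    assert loaded_settings["app"].getboolean("debug") is True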
Your test functions are for the end result, not the steps to get there, so don’t bother asserting those went well in the test. Your fixtures are meant to get your SUT to a specific state that, once reached, is not modified until all your tests are done running (Only observe. Do not interact).
The setups are merely to get to the behavior you want to test, and then to trigger that behavior. The test itself is how you demonstrate that the behavior worked in a way that aligns with what you want. If your assert doesn’t demonstrate this, then the test serves no purpose.
Fixtures and tests should be written with the assumption that everything that went before them went as expected. This makes them fragile, but that’s the idea (at least in this context). The tests should be deterministic and operate consistently, so if there’s a slight deviation in behavior, the tests shouldn’t pass.
Steps in the setup process can be broken out into standalone fixtures that should throw errors if something didn't go well. When those errors are thrown, they will show up as an error in the test report, rather than a failure in the test itself. When a test has an error, it should represent that the desired state for that test could not be reached. If a test fails, it should represent that the state was reached, but something about the state is wrong.
Additionally, if a fixture has a problem, then every test that exists in the scope of that fixture will automatically show that problem as an error, and the cascade resolves quickly, since none of those tests have to actually run (performance boost!).
This lets us know at a quick glance which tests are actually failing according to what they should be testing, and which ones are just being impacted by problems in some earlier step of the workflow. For every step in the setup process, there should be an associated test scenario that tests this state.
Those fixtures should also have their own tests anyway, and it’s not the responsibility of this test to make sure those went well.
For example, let’s say you want to run an end-to-end test using Selenium to make sure logging in works (and we’ll assume that after you log in successfully, it would bring you to some sort of landing page). That’s not something you can (or should) test directly with an end-to-end test, because the process itself is handed off from the browser almost immediately, and happens mostly in the backend. So instead, you can only have your test make sure that after trying to go through the process of logging in, it ended up where it was supposed to.
def test_login(driver):
    driver.get("https://www.mysite.com/login")
    page = LoginPage(driver)
    assert page.title == "login"
    page.login(username="username", password="password")
    page = LandingPage(driver)
    assert page.title == "Welcome!"
This makes you have to repeat yourself, and potentially perform steps more times than is necessary over the course of your test suite. The test scenario is making sure the landing page is as it should be after logging in, but this test attempts to validate more parts of the process, while also failing to adequately test the landing page. Sure, if it passes, it would tell you that logging in worked, but it wastes resources and opportunity because you don’t need the browser to test that logging in works, and the opportunity provided by those expensive preparations was wasted by not running more checks against the static page.
test_landing_page.py:
class TestAfterLogIn():

    @pytest.fixture(scope="class", autouse=True)
    def login_page(self, driver):
        driver.get("https://www.mysite.com/login")
        page = LoginPage(driver)
        return page

    @pytest.fixture(scope="class", autouse=True)
    def login(self, login_page):
        login_page.login(username="username", password="password")

    @pytest.fixture(scope="class", autouse=True)
    def page(self, driver):
        return LandingPage(driver)

    def test_title(self, page):
        assert page.title == "Welcome!"

    # other tests
If there was an issue with the login page, then this structure will tell you exactly where the problem was without the need to dig through the stacktrace, as it will tell you what fixture threw an error, and each fixture is only responsible for one step. For example, if the username field didn't show up on the login page, then login_page.login() would throw an error inside the login fixture.
This also leverages the fixtures that were run for the first test in the class so you can run as many tests as you need against this particular state without having to rerun anything. The setups/teardowns are usually much more resource-demanding than actual tests, especially for end-to-end tests, so taking advantage of an already established state to run several tests is very valuable.
For every fixture that changes something about the state of the SUT, there should be an alternate version of this fixture that doesn’t have an assert in it, and that fixture should be used to make the actual test for that state.
In combination with the previous point, this creates a system where you can quickly and easily identify where the breakdowns are in the flows of the SUT, and how widely impacting certain problems are. Failures will tell you exactly where a problem is, while errors tell you how much impact that problem has. It also gives you the liberty to take shortcuts to help make your tests faster, because you know that the steps you're skipping over have test cases of their own. In the previous example, that would just mean the login page itself would have test cases of its own.
To put it another way, the tests and fixtures should be written with the assumption that everything before them worked properly. This approach proves that was the case. If one of the changes that was assumed to be working, wasn’t actually working, then everything that depends on it should be considered a failed test, because one of the assumptions it was making was wrong.
Parallelization also helps speed your tests up, and the more broken up the tests are, the greater the potential benefit of parallelizing.
Only 1 assert per test function/method, and nothing else. The only thing that should be in a test function is a single assert statement. It should be simple, concise, and easily readable.
If you're working with multiple assert statements because your test subject is very complex and there are many things about it that need to be validated, you can use a class, module, or even an entire package as the scope for your test scenario. Your fixtures can be made to work around this scope, just like they can for the function-level scope. This gives you room to breathe so you can give each package/module/class/function a meaningful name, and you can maintain the rule of only 1 assert statement per test function/method.
You can also rely on magic comparison methods (e.g. __eq__) along with special state comparison objects to create more intuitive assert statements that can check more than one thing at once while providing more readable failure messages.
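Here's a rough sketch of that idea (the State and Header classes are illustrative, not a real library, and the test is written to fail on purpose so you can see the message). The comparison object records every mismatch during __eq__, so the repr that pytest prints for the failed comparison lists all of them at once:

class Header:
    """Stand-in for whatever your page object returns for the header."""

    def __init__(self, text, tag_name):
        self.text = text
        self.tag_name = tag_name


class State:
    """Compares equal to an object only if every expected attribute matches,
    and remembers what didn't match so the failure message stays readable."""

    def __init__(self, **expected):
        self.expected = expected
        self.problems = []

    def __eq__(self, other):
        self.problems = [
            f"{name}: {getattr(other, name, None)!r} is not {value!r}"
            for name, value in self.expected.items()
            if getattr(other, name, None) != value
        ]
        return not self.problems

    def __repr__(self):
        if not self.problems:
            return "State(matched)"
        return "State(" + ", ".join(self.problems) + ")"


def test_header():
    header = Header(text="Something else", tag_name="div")
    # one assert that checks several things, with each mismatch reported
    assert header == State(text="My Header", tag_name="h1")

Because Header doesn't define an __eq__ of its own, Python falls back to State.__eq__ for the comparison, so a single assert can check several attributes while still producing a readable failure message.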
If your test has more than one assert statement and an earlier one fails, you won't know whether the later ones would pass or fail, because test execution stops on the first failure, and this prevents you from getting a complete understanding of what is wrong.
The assert should be the test, because that is the thing you're asserting should be the case, so more than one assert means you're asserting more than one thing, and thus you are testing something different.
You can still check everything you need to in python and pytest by having multiple test methods in a test class/module/package, or by utilizing magic comparison methods like __eq__ in custom data type objects.
https://testing.googleblog.com/2018/06/testing-on-toilet-keep-tests-focused.html
Code:
@pytest.fixture
def apple():
    return Apple()


def test_apple(apple):
    assert apple.is_sweet()
    assert apple.color == "Red"
    assert len(apple.worms) == 0
Output:
_______________________________ test_apple ________________________________

apple = <test_fruit.Apple object at 0x7f36857b5cf8>

    def test_apple(apple):
>       assert apple.is_sweet()
E       assert False

test_fruit.py:17: AssertionError
Was the apple red? Did it have any worms?
Code:
@pytest.fixture(scope="class")
def apple():
    return Apple()


class TestApple():

    def test_is_sweet(self, apple):
        assert apple.is_sweet()

    def test_color_is_red(self, apple):
        assert apple.color == "Red"

    def test_has_no_worms(self, apple):
        assert len(apple.worms) == 0
Output:
_________________________ TestApple.test_is_sweet _________________________

self = <test_fruit.TestApple object at 0x7f6a9b48db70>
apple = <test_fruit.Apple object at 0x7f6a9b471b70>

    def test_is_sweet(self, apple):
>       assert apple.is_sweet()
E       assert False

test_fruit.py:20: AssertionError
_______________________ TestApple.test_color_is_red _______________________

self = <test_fruit.TestApple object at 0x7f6a9b48da20>
apple = <test_fruit.Apple object at 0x7f6a9b471b70>

    def test_color_is_red(self, apple):
>       assert apple.color == "Red"
E       AssertionError: assert 'Green' == 'Red'
E         - Green
E         + Red

test_fruit.py:22: AssertionError
_______________________ TestApple.test_has_no_worms _______________________

self = <test_fruit.TestApple object at 0x7f6a9b487400>
apple = <test_fruit.Apple object at 0x7f6a9b471b70>

    def test_has_no_worms(self, apple):
>       assert len(apple.worms) == 0
E       assert 1 == 0

test_fruit.py:24: AssertionError
Use assert statements instead of the unittest.TestCase assert methods. Don't use the assert methods provided by the unittest.TestCase class. Instead, use the standard assert keyword.
pytest will show context-sensitive comparisons automatically for failed tests.
pytest will do assertion introspection to give more information surrounding a failed test.
It removes the need to inherit from unittest.TestCase, which is essential for the next point.
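As a quick (deliberately failing) illustration of the difference:

def test_color_is_red():
    color = "Green"
    # with unittest.TestCase, this would be: self.assertEqual(color, "Red")
    # the plain assert gives pytest the whole expression to introspect,
    # so the failure output shows both values being compared
    assert color == "Red"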
Don't use unittest.TestCase in test classes (either directly, or indirectly). This causes conflicts within pytest with regards to how it executes tests and fixtures. As a result, it heavily limits the functionality of pytest that is available to you (e.g. parameterization is not possible).
A test should not involve any superfluous resources. If you don’t need to involve some resource in order to test some behavior, then there’s no reason to have the test require it.
When a test uses fewer resources, it will inherently run faster. The faster a test is, the more often it can be run, and the less time a developer is left waiting for it.
https://www.youtube.com/watch?reload=9&v=AJ7u_Z-TS-A
Design your tests around the behavior you want to test, not the implementation you ended up with. If you are changing behavior, you should be thinking about how you can test that behavior. You should try to find ways to test it that aren’t dependent on the implementation.
https://testing.googleblog.com/2013/08/testing-on-toilet-test-behavior-not.html
Note: this does not mean don’t ever use non-state-changing method calls in your tests.
https://testing.googleblog.com/2017/12/testing-on-toilet-only-verify-state.html
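As a small, hedged illustration (the registration function and fake mailbox are hypothetical), verify the state you can observe afterwards rather than the sequence of calls that produced it:

class FakeMailbox:
    """Hypothetical in-memory stand-in for an email gateway."""

    def __init__(self):
        self.messages = []

    def send(self, address, body):
        self.messages.append((address, body))


def register_user(email, mailbox):
    # hypothetical behavior under test: registering a user sends a welcome email
    mailbox.send(email, "Welcome!")


def test_registration_sends_welcome_email():
    mailbox = FakeMailbox()
    register_user("new.user@example.com", mailbox)
    # verify the resulting state (what ended up in the mailbox),
    # not which internal calls were made along the way
    assert ("new.user@example.com", "Welcome!") in mailbox.messages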
You can utilize classes, modules, and even entire packages to hold the tests for a single state. All it requires is creating fixtures for those scopes.
@pytest.fixture(scope="class")
def apple():
return Apple()
class TestApple():
def test_is_sweet(self, apple):
assert apple.is_sweet()
def test_color_is_red(self, apple):
assert apple.color == "Red"
def test_has_no_worms(self, apple):
assert len(apple.worms) == 0
In this example, only a single Apple instance is created, but the state is represented by the entire class, so you have the opportunity to test individual attributes in their own test methods.
This is more of an organizational point, but what the test subject is can often be confusing. To simplify this, you should always consider your test subject to be the end result of the process, rather than the process itself.
For example, let’s say you want to test that logging in to your website works, and you’re going to do this with an end-to-end test. Is this a test for the log in page? the log in process? or the page you land on after logging in? Because your tests are running while you’re on the landing page, and the landing page is the only thing you can use to test that logging in worked, that means the landing page is your test subject.
Tests should be grouped together based on the behavior they are testing, and the scenario under which they’re being tested.
The names of the outermost surrounding scopes should combine to describe what unit/component is being tested, the innermost surrounding scope should describe the scenario, and the test function/method itself should describe the behavior being tested. The only exception is if you're using a state comparison object to check multiple things at once in a single test function/method, where each thing being checked is self-explanatory and, if the test fails, the failure output will very clearly describe each issue that was found.
tests.py::test_check_product_is_good FAILED
tests/test_website/test_website_landing_page.py::TestWebsiteLandingPageAfterLogIn::test_website_landing_page_after_login_header_is_displayed FAILED
tests/website/test_landing_page.py::TestAfterLogIn::test_footer PASSED
tests/website/test_landing_page.py::TestAfterLogIn::test_user_info_in_navbar FAILED
tests/website/test_landing_page.py::TestAfterLogIn::test_welcome_message FAILED
Failure Messages:
=================================== FAILURES ======================================
_____________________ TestAfterLogIn.test_user_info_in_navbar _____________________

self = <src.tests.website.test_landing_page.TestAfterLogIn object at 0x7f8d20e0fd30>
page = <src.utils.pages.test_landing_page.LandingPage object at 0x7f8d20e0f860>
user = <src.utils.user.User object at 0x7f2d21a0e970>

    def test_user_info_in_navbar(self, page, user):
>       assert page.navbar.user == user
E       AttributeError: No user info to parse from navbar.

website/test_landing_page.py:25: AttributeError
_______________________ TestAfterLogIn.test_welcome_message _______________________

self = <tests.website.test_landing_page.TestAfterLogIn object at 0x7f8d20e0f550>
page = <tests.website.test_landing_page.LandingPage object at 0x7f8d20e0f860>

    def test_welcome_message(self, page):
>       assert page.welcome_message == "Welcome!"
E       NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#welcome"}

website/test_landing_page.py:28: NoSuchElementException
tests/website/test_landing_page.py::TestAfterLogIn::test_header FAILED
Failure Message:
==================================== FAILURES =====================================
___________________________ TestAfterLogIn.test_header ____________________________

self = <tests.website.test_landing_page.TestAfterLogIn object at 0x7faf95aee240>
page = <tests.website.test_landing_page.LandingPage object at 0x7faf95aaad30>

    def test_header(self, page):
>       assert page.header == State(
            IsDisplayed(),
            Text("My Header"),
            TagName("h1"),
        )
E       AssertionError: assert Comparing Header State:
E         Text: 'Something else' is not 'My Header'
E         TagName: 'div' is not 'h1'

website/test_landing_page.py:22: AssertionError
https://testing.googleblog.com/2014/10/testing-on-toilet-writing-descriptive.html
https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices#naming-your-tests
The order pytest runs tests in is not deterministic, and can change at any time, especially when the tests may be run in parallel.
Every test is given a nodeid by pytest, and that nodeid can be passed as an argument to pytest on the command line so that it is run directly.
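For example, using the node IDs from the landing page example above (the exact path depends on how your suite is laid out):

pytest "tests/website/test_landing_page.py::TestAfterLogIn::test_welcome_message"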
When parameterizing a test, or scope, pytest will automatically generate an identifier from each parameter set and append it to the test nodeids, so you can see which specific set is being used for a given test, and also so you can target one or more tests to run them individually. You can override this behavior using the ids argument when parameterizing. This accepts either an iterable of strings that maps 1:1 to the parameter sets, or a callable so you can rely on some logic to generate the name for a given parameter set. Whichever approach is used, it should be something that is unique and adequately describes what that set is being used to test, i.e. it should provide insight as to what exactly it's being used to test, and why the other parameter sets wouldn't cover the same thing.
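Here's a small sketch of the ids argument (validate_username is a made-up function under test):

import pytest


def validate_username(username):
    # hypothetical function under test
    if not username:
        return "Username is required"
    if len(username) > 255:
        return "Username is too long"
    return None


@pytest.mark.parametrize(
    "username, expected_error",
    [
        ("", "Username is required"),
        ("a" * 300, "Username is too long"),
    ],
    ids=["empty-username", "overlong-username"],
)
def test_invalid_usernames_are_rejected(username, expected_error):
    assert validate_username(username) == expected_error

This produces node IDs like test_invalid_usernames_are_rejected[empty-username], which describe why each parameter set exists instead of just echoing the raw values.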
https://testing.googleblog.com/2014/10/testing-on-toilet-writing-descriptive.html
Some tests should not be allowed to be split up across multiple threads, simply because it would be inefficient. For example, you may have a test class that is used to house several tests for a single test scenario, i.e. they are all meant to be run against a single, common state. Those tests should not be split apart, because it's more efficient to keep them together and only have to run those fixtures once total, instead of once for each thread they're split across. You can do this quite easily with the pytest-xdist plugin by either providing a standard scope to the --dist argument, or by defining your own FixtureScheduling class for the plugin to use. An example of a custom FixtureScheduling class can be found here.
There are plans to improve this capability and make it more refined, but that’s still being discussed here.
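Assuming pytest-xdist is installed, the simplest version of this looks something like:

pytest -n auto --dist loadscope

Here, loadscope keeps all the tests from the same class/module on the same worker (so class-scoped fixtures only run once), while -n auto starts one worker per available CPU.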
Even so, the tests should be designed to assume that they may be running at the same time as any other test.
A test should be able to build its entire environment from scratch, including the test data. This does not include things that need to be installed, however. Once everything is installed, you should be able to just run the pytest command and have everything run.
A test should never pass or fail on a whim. If the test can be run and have different results without any changes to the code, then that test is useless and can’t be trusted.
Try to avoid mocking things whenever possible.
https://testing.googleblog.com/2013/05/testing-on-toilet-dont-overuse-mocks.html
Just because there is good code coverage in the tests doesn't mean those are good tests, or that they're even trying to test something. At best, code coverage can be used to determine what areas of your code should be pruned, or it can be used as a debugging tool. At worst, it gives developers false confidence in the quality of their code.
However, it can be leveraged during the CI/CD pipeline to determine if developers at least tried to hit the lines of code they added or changed. You can (and should) configure your build system to check that any lines that were added/modified have tests that at least hit those lines, and if not, fail the build. It doesn’t tell you that the lines that were hit were tested properly, but it’s a cheap and easy way to tell that lines that were added/modified definitely aren’t tested at all. Enforcing this doesn’t just make sure that the developers know what they’re adding/modifying and how that code can be hit, but it also encourages them to consider the behavior of the actual application so they can write more effective tests.
https://testing.googleblog.com/2008/03/tott-understanding-your-coverage-data.html
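One way to approximate this check, assuming the pytest-cov and diff-cover packages are part of your pipeline (myproject and origin/main are placeholders for your own package and base branch):

pytest --cov=myproject --cov-report=xml
diff-cover coverage.xml --compare-branch=origin/main --fail-under=100

The first command produces a coverage report, and the second fails the build if any lines changed relative to the base branch were never hit by a test.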
Testing code should not be difficult. If the code is hard to test, then the code is hard to use. This is also indicative of tightly coupled code.
Testing should be a primary concern when writing code. Whenever you change behavior, you should be thinking about how you can test that change in a way that would have failed before the change. If you aren't changing behavior and are only changing the implementation, you should not have to worry about changing your tests.
If you find that you can’t easily test your code, you should change the code so it can be easily tested.
https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices#less-coupled-code
Have your tests and the tools built for them take and return data using interfaces/custom data types. Everything should be communicating using uniform data shapes. Methods/functions should both accept and return data using these interfaces/data types, and your tests should be running asserts using them.
If everything is using the same interfaces/data types to communicate, there's less to maintain and the code is easier to read. You can also implement custom comparison logic into them for your tests to leverage. If implementation needs to change in one part of the code, the core of what that data is and how it should be compared will remain the same, so you only need to change the implementation in that one part of the code. If how the data is compared changes, then you only need to change the comparison logic inside the interface/data type.
It also reduces the number of arguments your functions take while making them more idiomatic, and can produce more intelligent failure reports (if made with that in mind).
Plus, you can then use type annotations, which opens the door to lots of helpful tools.
@pytest.fixture
def user():
    return {"name": "Chris", "hair_color": Color("brown")}


@pytest.fixture(autouse=True)
def set_user(client, user):
    # equivalent to client.set_user(name=user["name"], hair_color=user["hair_color"])
    client.set_user(**user)


def test_set_user(client, user):
    # client.get_user() returns a dict similar to the one returned by the user fixture
    assert client.get_user() == user
@pytest.fixture
def user():
    return User(name="Chris", hair_color=Color("brown"))


@pytest.fixture(autouse=True)
def set_user(client, user):
    client.set_user(user)


def test_set_user(client, user):
    # client.get_user() returns another User object
    assert client.get_user() == user
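A minimal sketch of what those User and Color types could look like (using dataclasses, which generate __eq__ field-by-field for free):

from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    name: str


@dataclass(frozen=True)
class User:
    name: str
    hair_color: Color

With this, client.get_user() == user compares every field at once, and a failed assert shows the full repr of both objects so you can see exactly which fields differ.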
Your code should read like a short story written by C. S. Lewis. That is, it should be trivial and straightforward to read and shouldn’t involve anything fancy. What goes on behind the scenes is another story, but at the level seen in the tests themselves, it should look like any other normal, simple code that you would typically see in the language you’re using.
@pytest.fixture(scope="class")
def page(driver, url):
    driver.get(url)
    return CarTablePage(driver)


@pytest.fixture(scope="class")
def car():
    return Car(CarMake.CHEVROLET, ChevroletModel.IMPALA, 1995, Color.RED)


class TestTableIsEmptyOnLoad:

    def test_table_has_no_entries(self, page):
        assert page.table.has_no_cars()


class TestCarIsAdded:

    @pytest.fixture(scope="class", autouse=True)
    def add_car(self, car, page):
        page.add_car(car)

    def test_car_is_in_table(self, page, car):
        assert page.table.has_car(car)


class TestCarIsRemoved:

    @pytest.fixture(scope="class", autouse=True)
    def add_car(self, car, page):
        page.add_car(car)

    @pytest.fixture(scope="class", autouse=True)
    def remove_car(self, car, page):
        page.remove_car(car)

    def test_car_is_not_in_table(self, page, car):
        assert not page.table.has_car(car)
I’m pulling this directly from an example in my POM’s docs, if you were curious as to how the implementation looks:
@pytest.fixture(scope="class")
def page(driver, url):
    driver.get(url)
    return CarTablePage(driver)


@pytest.fixture(scope="class")
def car():
    return Car(CarMake.CHEVROLET, ChevroletModel.IMPALA, 1995, Color.RED)


class TestTableIsEmptyOnLoad:

    def test_table_has_no_entries(self, page):
        assert len(page.cars) == 0


class TestCarIsAdded:

    @pytest.fixture(scope="class", autouse=True)
    def add_car(self, car, page):
        page.add_car(car)

    def test_car_is_in_table(self, page, car):
        assert car in page.cars


class TestCarIsRemoved:

    @pytest.fixture(scope="class", autouse=True)
    def add_car(self, car, page):
        page.add_car(car)

    @pytest.fixture(scope="class", autouse=True)
    def remove_car(self, car, page):
        page.remove_car(car)

    def test_car_is_not_in_table(self, page, car):
        assert car not in page.cars