Introduction to property-based testing


Example-based tests

  • Define examples of given input and expected output
  • Indirectly imply rules dictating the behaviour of your code
def test_mysort():
  l = [5, 3, 2, 1, 6]
  l_sorted = my_sort(l)
  assert l_sorted == [1, 2, 3, 5, 6]

Property-based tests

Property-based testing (PBT) asks you to come up with the rules first and them tests them for you
from hypothesis import given
import hypothesis.strategies as some

def test_my_sort(l):
    l_sorted = my_sort(l)
    assert len(l_sorted) == len(l)
    assert set(l_sorted) == set(l)
    for i in range(len(l) - 1):
        assert l_sorted[i] <= l_sorted[i+1]        

Why property-based tests?

  • Exhaustively comb through your code
  • Write more declarative tests instead of hard-coded examples
  • Verify assumptions
  • Test behaviour you could have never imagined


  • T. Arts found 25 important bugs from Project FIFO
    • 460 lines of test code
    • Covered 60 000 lines of code
    • Timing errors, race conditions, type errors, errors in documentation,...
  • J. W. Norton found sequences of 17 and 31 specific calls that could corrupt databases in Google's levelDB

Why not PBT?

  • Property-based tests do not replace example-based tests
  • Example-based tests are good, for example, for
    • Regression tests
    • Anchors ensuring properties don't "drift"
    • Human-readable examples of input and output
  • Coming up with good properties can be hard and error-prone


Basic concepts

  • Properties
  • Generators
  • Shrinking (not discussed)

Coming up with properties

  • Modeling
  • Invariants
  • Symmetries
  • Generalize example-based tests
    • Replace hard-coded input values with generators


property "finds biggest element" do
  forall x <- non_empty(list(integer())) do
    MyModule.biggest(x) == model_biggest(x)

def model_biggest(x) do


property "symmetric encoding/decoding" do
  forall data <- list({atom(), any()}) do
    encoded = encode(data)
    is_binary(encoded) and data == decode(encoded)


property "a sorted list keeps its size" do
  forall l <- list(number()) do
    length(l) == length(MyModule.sort(l))


  • Generators generate test data
    • Integers, lists, objects, one-of, symbolic calls, HTTP requests...
  • Generators are composable
    • Lists of integers, map with strings, symbolic calls with integer arguments
  • Generators can be resized, transformed, and restricted
    • Lists with more than one million values
    • List of HTTP requests where POST requests always precede GET requests
    • Only non-leap-day dates

Fancy generators

Generator statistics

  • Very important to understand what is (not) being tested
  • Example output from hypothesis.event:
      - Events:
        *  63.37%, input list length in range 0-5
        *  27.72%, input list length in range 5-10
        *   3.96%, input list length in range 10-15
        *   2.97%, input list length in range 15-20
        *   0.99%, input list length in range 35-40


Repository: ksaaskil/introduction-to-property-based-testing

Stateful test execution

  • PropEr divides execution of stateful tests into abstract and real phases
  • Abstract phase
    • is only based on the model: it lacks postconditions or actual calls to the system
    • defines the sequences of commands run in the real execution phase
  • Illustration from [1]:

Real execution phase

  • Apply the commands from the abstract execution phase to the real system
  • Also check postconditions as well as preconditions
  • Important: model defines tried sequences of operations, the actual system never drives the test