Log binomial regression, that is, regression on a binary outcome with a log link, is becoming increasingly popular because it provides estimates of relative risk. However, little work has been done on evaluating the fit of such models. We used simulations to compare the performance of five goodness-of-fit statistics applied to different models in a log binomial setting: the Hosmer–Lemeshow statistic, the normalized Pearson chi-square, the normalized unweighted sum of squares, le Cessie and van Houwelingen's statistic based on smoothed residuals, and the Hjort–Hosmer test. The normalized Pearson chi-square was unsuitable because its rejection rate depended not only on model fit but also on the range of predicted probabilities. Le Cessie and van Houwelingen's statistic had poor sampling properties when evaluating a correct model and was therefore also considered unsuitable in this context. The remaining three statistics performed comparably in most simulations. However, when applied to real data, the Hjort–Hosmer test outperformed the other two.
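To make the first of these statistics concrete, here is a minimal Python sketch of the Hosmer–Lemeshow statistic computed from observed binary outcomes and fitted probabilities (for a log binomial model, the fitted probabilities would be exp(x'β) from the fitted coefficients). It is an illustration, not the implementation used in the study: the function name `hosmer_lemeshow`, the default of g = 10 groups of nearly equal size, the tie handling, and the skipping of degenerate groups are all assumptions.

```python
from typing import Sequence


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], g: int = 10) -> float:
    """Hosmer-Lemeshow statistic for binary outcomes y and fitted probabilities p.

    Hypothetical helper: observations are sorted by fitted probability and split
    into g groups of nearly equal size; each group contributes
    (O - E)^2 / (E * (1 - E / n_k)), i.e. (O - E)^2 / (n_k * p_bar * (1 - p_bar)).
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    if g < 1:
        raise ValueError("g must be positive")
    # Sort by fitted probability; ties are broken by outcome (an assumption).
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        if not group:
            continue  # more groups requested than observations
        n_k = len(group)
        observed = sum(outcome for _, outcome in group)
        expected = sum(prob for prob, _ in group)
        p_bar = expected / n_k
        denom = expected * (1.0 - p_bar)
        # Skip degenerate groups (mean fitted probability 0 or 1), which can
        # arise when log binomial estimates lie on the boundary (assumption).
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


if __name__ == "__main__":
    # Toy data: fitted risks 0.2 and 0.8 whose observed rates match exactly.
    p = [0.2] * 5 + [0.8] * 5
    y = [0, 0, 0, 0, 1] + [1, 1, 1, 1, 0]
    print(hosmer_lemeshow(y, p, g=2))  # approximately 0: observed equals expected
```

Under a correctly specified model the statistic is conventionally referred to a chi-square distribution with g - 2 degrees of freedom; large values indicate lack of fit.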