Models for the prediction error distribution in losslessly encoded grey-scale images are explored. The prediction error distribution arises from the use of linear predictors, but the techniques described in this paper may also be applied to distributions that arise from other methods of losslessly encoding digital images. A compact method for representing the prediction error distribution of 12-bit grey-scale images is used, and the trade-off between the space required to store a distribution and the use of multiple distributions is investigated. The models considered include zeroth- and first-order Markov models. The variation of the prediction error distribution over the image is examined and shown to be important for achieving better compression. Adaptive choice of the predictor formula is also investigated.
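The following minimal Python sketch illustrates zeroth- and first-order models of a prediction error distribution. It is not the method of this paper. It assumes a hypothetical linear predictor that averages the west and north neighbours, with neighbours outside the image taken as 0. It computes the prediction errors in raster order and then estimates two quantities in bits per pixel: the zeroth-order entropy H(E) and the first-order Markov conditional entropy H(E_i | E_{i-1}). All function names are illustrative only.

```python
import math
from collections import Counter


def prediction_errors(img):
    """Return residuals e = x - x_hat in raster order.

    Hypothetical predictor (assumption, not from the paper):
    x_hat = (west + north) // 2, with out-of-image neighbours treated as 0.
    """
    errs = []
    for r, row in enumerate(img):
        for c, x in enumerate(row):
            west = row[c - 1] if c > 0 else 0
            north = img[r - 1][c] if r > 0 else 0
            errs.append(x - (west + north) // 2)
    return errs


def zeroth_order_entropy(errs):
    """Empirical H(E): a single distribution shared by every pixel."""
    n = len(errs)
    counts = Counter(errs)
    return -sum(k / n * math.log2(k / n) for k in counts.values())


def first_order_entropy(errs):
    """Empirical H(E_i | E_{i-1}): one distribution per previous-error context."""
    pairs = Counter(zip(errs, errs[1:]))  # joint counts of (previous, current)
    prev = Counter(errs[:-1])             # marginal counts of the context
    n = len(errs) - 1
    return -sum(k / n * math.log2(k / prev[a]) for (a, _), k in pairs.items())


if __name__ == "__main__":
    e = prediction_errors([[5, 5], [5, 5]])
    print(e, zeroth_order_entropy(e), first_order_entropy(e))
```

Consider a 2 by 2 image whose pixels all equal 5. The errors are [5, 3, 3, 0]. The zeroth-order estimate is 1.5 bits and the first-order estimate is about 0.67 bits, so conditioning on the previous error lowers the estimated code length. Estimates from such a tiny sample are optimistic, however. They ignore the cost of storing one distribution per context, which is the space versus multiple-distribution trade-off mentioned above.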