Performance of Haskell programs

Pedro Vasconcelos, DCC/FCUP

May 2020

Performance of Haskell programs

Bibliography

“Real World Haskell”, Bryan O’Sullivan, John Goerzen & Don Stewart. Chapter 25.

Haskell performance tutorial @ CUFP, Johan Tibell:

http://blog.johantibell.com/2010/07/high-performance-haskell-tutorial-at.html

Pros and cons of high-level languages

Reasoning about performance

Profiling

Profiling

Example: computing averages

avg :: [Double] -> Double
avg xs = sum / fromIntegral (length xs)
 -- sum :: [Double] -> Double
 -- length :: [Double] -> Int

main = do
   n <- readLn
   print (avg [1..n])

How much memory does this program use for input \(n\)?

Step 1: Collecting runtime statistics

$ ghc Avg.hs 
$ ./Avg +RTS -s 

Sample output

$ echo 1e6 | ./Avg +RTS -s
     257,391,016 bytes allocated in the heap
     280,151,128 bytes copied during GC
      67,918,488 bytes maximum residency (9 sample(s))
       7,771,496 bytes maximum slop
             145 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       482 colls,     0 par    0.143s   0.143s     0.0003s    0.0011s
  Gen  1         9 colls,     0 par    0.158s   0.158s     0.0176s    0.0681s

Overview of statistics collected

145 MB total memory in use
%GC     time      76.3%  (76.7% elapsed)

Step 2: Profiling

$ ghc -prof -auto-all Avg.hs
$ echo 1e6 | ./Avg +RTS -p -hc
# generates Avg.prof

        Mon May 25 16:51 2020 Time and Allocation Profiling Report  (Final)

           Avg +RTS -p -hc -RTS

        total time  =        0.14 secs   (141 ticks @ 1000 us, 1 processor)
        total alloc = 224,128,320 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                   %time %alloc

main        Main   Avg.hs:(8,1)-(10,20)   58.2   75.0
avg         Main   Avg.hs:5:1-42          41.8   25.0
...

Profiling heap types

Run profiling again to view heap allocation by data types.

$ echo 1e6 | ./Avg +RTS -p -hy
# generates `Avg.prof' and `Avg.hp'
$ hp2ps -c -e5in Avg.hp
# generates `Avg.ps' chart

Profiling retentions

Run profiling again to view which functions are causing heap retention.

$ echo 1e6 | ./Avg +RTS -p -hr
$ hp2ps -c -e5in Avg.hp

Understanding the profile

avg :: [Double] -> Double
avg xs = sum xs / fromIntegral (length xs)

main = print (avg [1..1e6])

Attempt to improve retention

avg :: [Double] -> Double
avg xs = loop xs (0,0)
  where
  -- tail recursive function
  loop [] (s,n)     = s / fromIntegral n
  loop (x:xs) (s,n) = loop xs (x+s,1+n)

Profiling again

Compiled with optimizations (-O2)

Compiled without optimizations (-O0)

Lazy evaluation

Lazy evaluation of average

  avg [1,2,3]
= loop [1,2,3] (0,0)
= loop [2,3] (1+0,1+0)
= loop [3] (2+1+0,1+1+0)
= loop [] (3+2+1+0,1+1+1+0)
= (3+2+1+0) / fromInt (1+1+1+0)
= 6 / 3
= 2.0

Eager evaluation of average

  avg [1,2,3]
= loop [1,2,3] (0,0)
= loop [2,3] (1+0, 1+0)
= loop [2,3] (1, 1)
= loop [3] (2+1,1+1)
= loop [3] (3, 2)
= loop [] (3+3,1+2)
= loop [] (6, 3)
= 6 / 3
= 2.0

Optimizing lazy evaluation

Controling evaluation

The primitive function seq function allows controlling evaluation order.

seq :: a -> b -> b

Evaluation of seq x y first evaluates x and then returns y.

Controlling evaluation (1)

We can use seq to force evaluation of s and n before the tail call.

avg :: [Double] -> Double
avg xs = loop xs (0,0)
  where
   loop [] (s,n) = s / fromIntegral n
   loop (x:xs) (s,n)
       = s `seq` n `seq` loop xs (x+s,1+n)

Controlling evaluation (2)

Easier alternative: just use bang patterns.

{-# LANGUAGE BangPatterns #-}

avg :: [Double] -> Double
avg xs = loop xs (0,0)
  where
   loop []     (!s,!n) = s / fromIntegral n
   loop (x:xs) (!s,!n) = loop xs (x+s,1+n)

Results

Folds

Fold operations

Recall the higher-order fold operations (also called reduce) that process a list using some argument function.

https://en.wikipedia.org/wiki/Fold_(higher-order_function)

Left and right folds

foldr (+) 0 [1, 2, 3, 4, 5]
= 1 + (2 + (3 + (4 + (5 + 0))))

foldl (+) 0 [1, 2, 3, 4, 5]
= ((((0 + 1) + 2) + 3) + 4) + 5

Left vs. right folds

Accumulating using a left fold

We could rewrite the average example using a left fold.

avg :: [Double] -> Double
avg xs = s / fromIntegral (n :: Int)
   where (s, n) = foldl f (0,0) xs
         f (!s,!n) x = (x+s,1+n)

However, even with bang patterns, this still exhibits excessive space use!

The problem with foldl

foldl f z [] = z
foldl f z (x:xs) = f (fold f z xs) x


  foldl f z [x1, x2, x3, x4]
=
  f (f (f (f z x4) x3) x2) x1

For a list with \(n\) elements foldl builds a a thunk of \(n\) f applications.

A strict left fold

To avoid the excessive lazyness we can use the strict left fold foldl' (from Data.List):

foldl' f z [] = z
foldl' f z (x:xs) = let z' = f z x
                    in z' `seq` foldl' f z' xs

foldl' forces evaluation of f before each recursive call.

Constant-space average

import Data.List (foldl')

avg :: [Double] -> Double
avg xs = s / fromIntegral (n :: Int)
   where (s, n) = foldl' f (0,0) xs
         f (!s,!n) x = (x+s,1+n)

Note that we still need the bang patterns (!).

Reasoning about space usage

Reasoning about space usage

Knowning how GHC represents values in memory is useful

Memory layout

Example: the list [1,2].



Memory usage for constructors

   data T a b = C1 a | C2 a b
   -- C1 takes 2 words; C2 takes 3 words

Unboxed types

   data Int = I# Int#

Quiz

data IntPair = IP Int Int

How many words to store a value of this type?



Answer: 7

Why box basic types?

Arguments must be passed unevaluated to allow lazy evaluation. Example:

cond :: Bool -> Int -> Int -> Int
cond True x y  = x
cond False x y = y

> cond True 42 (1`div`0)
42

Unpacking

GHC allows controls over data representation using the UNPACK pragma:

Using UNPACK

Add the pragma just before the strictness annotation (!):

data Foo = Foo {-# UNPACK #-} !Int

GHC will also unpack automatically if you compile with optimizations.

Example of unpacking

data IntPair = IP Int Int

data IntPair = IP {-# UNPACK #-} !Int
                  {-# UNPACK #-} !Int

Nested unpacking

Unpacking also works for nested data types.

Example: a pair of pairs.

data IntQuad = IQ {-# UNPACK -#} !IntPair
                  {-# UNPACK -#} !IntPair

Benefits of unpacking

However: there are situations where making fields strict and unpacking them can hurt performance e.g. the data is passed to a non-strict function.