May 2020
“Real World Haskell”, Bryan O’Sullivan, John Goerzen & Don Stewart. Chapter 25.
Haskell performance tutorial @ CUFP, Johan Tibell:
http://blog.johantibell.com/2010/07/high-performance-haskell-tutorial-at.html
avg :: [Double] -> Double
avg xs = sum / fromIntegral (length xs)
-- sum :: [Double] -> Double
-- length :: [Double] -> Int
main = do
n <- readLn
print (avg [1..n])
How much memory does this program use for input \(n\)?
+RTS
are for the runtime system+RTS -s
prints a summary of runtime statistics$ echo 1e6 | ./Avg +RTS -s
257,391,016 bytes allocated in the heap
280,151,128 bytes copied during GC
67,918,488 bytes maximum residency (9 sample(s))
7,771,496 bytes maximum slop
145 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 482 colls, 0 par 0.143s 0.143s 0.0003s 0.0011s
Gen 1 9 colls, 0 par 0.158s 0.158s 0.0176s 0.0681s
145 MB total memory in use
%GC time 76.3% (76.7% elapsed)
Mon May 25 16:51 2020 Time and Allocation Profiling Report (Final)
Avg +RTS -p -hc -RTS
total time = 0.14 secs (141 ticks @ 1000 us, 1 processor)
total alloc = 224,128,320 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
main Main Avg.hs:(8,1)-(10,20) 58.2 75.0
avg Main Avg.hs:5:1-42 41.8 25.0
...
Run profiling again to view heap allocation by data types.
$ echo 1e6 | ./Avg +RTS -p -hy
# generates `Avg.prof' and `Avg.hp'
$ hp2ps -c -e5in Avg.hp
# generates `Avg.ps' chart
Run profiling again to view which functions are causing heap retention.
avg
retain the entire list rather than consume it one value at a time?avg :: [Double] -> Double
avg xs = loop xs (0,0)
where
-- tail recursive function
loop [] (s,n) = s / fromIntegral n
loop (x:xs) (s,n) = loop xs (x+s,1+n)
-O2
)-O0
)+
, -
, *
, etc.)case
or if
avg [1,2,3]
= loop [1,2,3] (0,0)
= loop [2,3] (1+0,1+0)
= loop [3] (2+1+0,1+1+0)
= loop [] (3+2+1+0,1+1+1+0)
= (3+2+1+0) / fromInt (1+1+1+0)
= 6 / 3
= 2.0
avg [1,2,3]
= loop [1,2,3] (0,0)
= loop [2,3] (1+0, 1+0)
= loop [2,3] (1, 1)
= loop [3] (2+1,1+1)
= loop [3] (3, 2)
= loop [] (3+3,1+2)
= loop [] (6, 3)
= 6 / 3
= 2.0
The primitive function seq
function allows controlling evaluation order.
Evaluation of seq x y
first evaluates x
and then returns y
.
We can use seq
to force evaluation of s
and n
before the tail call.
avg :: [Double] -> Double
avg xs = loop xs (0,0)
where
loop [] (s,n) = s / fromIntegral n
loop (x:xs) (s,n)
= s `seq` n `seq` loop xs (x+s,1+n)
Easier alternative: just use bang patterns.
{-# LANGUAGE BangPatterns #-}
avg :: [Double] -> Double
avg xs = loop xs (0,0)
where
loop [] (!s,!n) = s / fromIntegral n
loop (x:xs) (!s,!n) = loop xs (x+s,1+n)
Recall the higher-order fold operations (also called reduce) that process a list using some argument function.
https://en.wikipedia.org/wiki/Fold_(higher-order_function)
foldr (+) 0 [1, 2, 3, 4, 5]
= 1 + (2 + (3 + (4 + (5 + 0))))
foldl (+) 0 [1, 2, 3, 4, 5]
= ((((0 + 1) + 2) + 3) + 4) + 5
foldr
requires traversing the entire before we can accumulatingfoldl
can start accumulating immediatelyWe could rewrite the average example using a left fold.
avg :: [Double] -> Double
avg xs = s / fromIntegral (n :: Int)
where (s, n) = foldl f (0,0) xs
f (!s,!n) x = (x+s,1+n)
However, even with bang patterns, this still exhibits excessive space use!
foldl
foldl f z [] = z
foldl f z (x:xs) = f (fold f z xs) x
foldl f z [x1, x2, x3, x4]
=
f (f (f (f z x4) x3) x2) x1
For a list with \(n\) elements foldl
builds a a thunk of \(n\) f
applications.
To avoid the excessive lazyness we can use the strict left fold foldl'
(from Data.List
):
foldl'
forces evaluation of f
before each recursive call.
import Data.List (foldl')
avg :: [Double] -> Double
avg xs = s / fromIntegral (n :: Int)
where (s, n) = foldl' f (0,0) xs
f (!s,!n) x = (x+s,1+n)
Note that we still need the bang patterns (!
).
Knowning how GHC represents values in memory is useful
Example: the list [1,2]
.
Nothing
, True
or []
) take zero space because they are shared#
Int#
, Double#
Int
are called boxed (they are stored as small “boxes” in the heap)How many words to store a value of this type?
Answer: 7
Arguments must be passed unevaluated to allow lazy evaluation. Example:
GHC allows controls over data representation using the UNPACK pragma:
Add the pragma just before the strictness annotation (!):
GHC will also unpack automatically if you compile with optimizations.
Unpacking also works for nested data types.
Example: a pair of pairs.
However: there are situations where making fields strict and unpacking them can hurt performance e.g. the data is passed to a non-strict function.