Saturday, 12 October 2013

A thread-safe object cache that doesn't leak

Consider the problem of lazily initializing objects from handles. Requesting the object corresponding to any given handle more than once should return the same physical object that the first such call returned. The simplest solution is to use a dictionary to memoize a create function:

let get create =
  // Memoize create with a plain (non-thread-safe) dictionary from handle to object.
  let cache = System.Collections.Generic.Dictionary()
  fun id ->
    let mutable obj = Unchecked.defaultof<_>
    if cache.TryGetValue(id, &obj) then obj else
      // Not yet cached: create the object and remember it for next time.
      obj <- create id
      cache.[id] <- obj
      obj
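
As a quick sanity check of the requirement above, requesting the same handle twice should hand back the same physical object. A minimal usage example (the list-valued create function is just for illustration):

do
  let getList = get (fun n -> [n])
  // Same handle: reference equality holds because the dictionary memoizes the result.
  printfn "%b" (obj.ReferenceEquals(getList 5, getList 5))   // true
  // Different handles produce different objects.
  printfn "%b" (obj.ReferenceEquals(getList 5, getList 6))   // false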

But this is not thread-safe. No problem: thanks to .NET's ConcurrentDictionary, the thread-safe version is actually simpler:

let get create =
  let cache = System.Collections.Concurrent.ConcurrentDictionary<int, 'a>()
  // GetOrAdd returns the cached value for id, calling create to make one if none is present yet.
  fun id -> cache.GetOrAdd(id, System.Func<_, _>(create))
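
To see the thread safety in action, here is a minimal sketch (assuming the ConcurrentDictionary-based get above) that hits the cache from many threads at once and still observes a single physical object per handle. Note that GetOrAdd may invoke create more than once under contention, but only one result is ever stored and returned to callers:

do
  let getList = get (fun n -> [n])
  // Request the same handle from many threads in parallel.
  let results = Array.Parallel.init 1000 (fun _ -> getList 42)
  // Every call returned the same physical object.
  printfn "%b" (results |> Array.forall (fun r -> obj.ReferenceEquals(r, results.[0])))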

The problem with this approach is that entries are never removed from the dictionary, so the cache holds a strong reference to every object it has ever created and, therefore, this solution leaks memory. For example, the following test:

do
  let get = Leaks.get (fun n -> [n])
  for i in 1..1000000000 do
    ignore(get i)
    // Report the total managed heap size at every power of two.
    if i &&& (i-1) = 0 then
      printfn "%d created, %dMB in use"
        i (System.GC.GetTotalMemory false / 1048576L)

produces this output, showing that the program leaks away over a gigabyte of memory in just a few seconds:

1 created, 0MB in use
2 created, 0MB in use
4 created, 0MB in use
4194304 created, 331MB in use
8388608 created, 649MB in use
16777216 created, 1297MB in use

Fortunately, this mistake is easy to fix on .NET by using a concurrent weak dictionary that has strong keys and weak values, so the cache itself never keeps its objects alive. One such implementation is available here, in the TvdP.Collections library. The challenge can then be solved correctly in a dozen lines of F# code as follows:

let get create =
  // Strong keys, weak values: the garbage collector can reclaim cached
  // objects that are no longer referenced elsewhere, so nothing leaks.
  let cache = TvdP.Collections.ConcurrentWeakDictionaryStrongKeys()
  fun id ->
    lock cache (fun () ->
      let mutable value = Unchecked.defaultof<_>
      if cache.TryGetValue(id, &value) then
        value
      else
        // Hold the new object in a local rather than reading it back from
        // the weak dictionary, where it could already have been collected.
        value <- create id
        cache.[id] <- value
        value)
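
If you would rather avoid the external dependency, the same strong-key/weak-value idea can be approximated with a plain ConcurrentDictionary holding WeakReference values (.NET 4.5+). The following is only a sketch, with the helper name getWeak and the int key type assumed for illustration. It overwrites dead entries on the next request for the same handle but never prunes them, so the tiny WeakReference wrappers themselves still accumulate; cleaning those up is exactly the bookkeeping a dedicated implementation such as TvdP's does for you:

let getWeak (create : int -> 'a) =
  let cache = System.Collections.Concurrent.ConcurrentDictionary<int, System.WeakReference<'a>>()
  fun id ->
    lock cache (fun () ->
      let mutable weak = Unchecked.defaultof<System.WeakReference<'a>>
      let mutable value = Unchecked.defaultof<'a>
      if cache.TryGetValue(id, &weak) && weak.TryGetTarget(&value) then
        // Cached and not yet collected: reuse the existing object.
        value
      else
        // Missing or already collected: create it and store only a weak
        // reference so the cache never keeps the object alive by itself.
        value <- create id
        cache.[id] <- System.WeakReference<'a>(value)
        value)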

With this solution we find that the memory usage never rises beyond 20MB:

1 created, 0MB in use
2 created, 0MB in use
4 created, 0MB in use
4194304 created, 17MB in use
8388608 created, 13MB in use
16777216 created, 16MB in use

To learn more, subscribe to the F# Journal today!
