Sunday, 27 December 2009

Evolution: the weasel program

An interesting challenge is called Evolutionary Algorithm and draws upon a famous computer simulation written by Richard Dawkins to demonstrate the power of random variation and non-random cumulative selection in natural and artificial evolutionary systems, and how this process differs from chance.

The program begins with a random string of letters and spawns a generation of 200 random mutations of the parent string, selects the fittest mutation (by similarity to an "ideal" string) and uses that individual to spawn the next generation. The ideal string used is "METHINKS IT IS LIKE A WEASEL".

The results of this program clearly demonstrates how selection can drive random processes to solve problems extremely efficiently: only 6 generations were required to reproduce the ideal string exactly. In fact, randomized algorithms are of great practical importance and a similar technique, simulated annealing, is used as an example in our F# for Technical Computing book.

Our F# solution is as follows:

let target = "METHINKS IT IS LIKE A WEASEL"
let charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
 
let rand = System.Random()
 
let fitness (trial: string) =
  Seq.zip target trial
  |> Seq.fold (fun d (c1, c2) -> if c1=c2 then d+1 else d) 0
 
let mutate parent rate _ =
  String.mapi (fun _ c ->
    if rand.NextDouble() < rate then c else
      charset.[rand.Next charset.Length]) parent
 
do
  let mutable parent =
    String.init target.Length (fun _ ->
      charset.[rand.Next charset.Length] |> string)
  let mutable i = 0
  while parent <> target do
    let pfit = fitness parent
    let best, f =
      Seq.init 200 (mutate parent (float pfit / float target.Length))
      |> Seq.map (fun s -> (s, fitness s))
      |> Seq.append [parent, pfit]
      |> Seq.maxBy (fun (_, f) -> f)
    if i % 100 = 0 then
      printf "%5d - '%s'  (fitness:%2d)\n" i parent f
    parent <- best
    i <- i + 1
  printf "%5d - '%s'\n" i parent

This program produces the following output:

    0 - 'CEUMIDXSIXOOTSEHHXVMD IHTFWP'  (fitness: 6)
  100 - 'PEPHIZLB NGSIO LCWE AQEKCSZQ'  (fitness:11)
  200 - 'MESHIZHB IQ IO LTWGGAQWMKSRX'  (fitness:13)
  300 - 'MESHIZHB IQ IO LTWGGAQWMKSRX'  (fitness:13)
  400 - 'METHIVKS ITLIN LYKJPABWDASEU'  (fitness:19)
  500 - 'METHINKS IT IB LIKEFA WDASEL'  (fitness:25)
  518 - 'METHINKS IT IS LIKE A WEASEL'

F# examples on Rosetta Code

Rosetta Code is a wonderful community-driven wiki containing examples of dozens of programming tasks written in dozens of languages.

Naturally, we couldn't resist the challenge to solve many of these tasks in F# and we shall be publicizing our results here and even developing some into full-blown F#.NET Journal articles.

Perhaps the most surprising result is that the F# solutions are extremely concise even when compared with bleeding-edge research languages such as Haskell. For example, the Haskell solution to the Data Munging 2 problem is:

type Date = String
type Value = Double
type Flag = Int
type Record = (Date, [(Value,Flag)])
 
duplicatedDates :: [Record] -> [Date]
duplicatedDates [] = []
duplicatedDates [_] = []
duplicatedDates (a:b:tl)
    | sameDate a b = date a : duplicatedDates tl
    | otherwise    = duplicatedDates (b:tl)
    where sameDate a b = date a == date b
          date = fst
 
numGoodRecords :: [Record] -> Int
numGoodRecords = length . filter recordOk
    where recordOk :: Record -> Bool
          recordOk (_,record) = sumOk == 24
              where sumOk = length $ filter isOk record
                    isOk (_,v) = v >= 1
 
parseLine :: String -> Record
parseLine line = (date, records')
    where (date:records) = words line
          records' = mapRecords records
 
          mapRecords :: [String] -> [(Value,Flag)]
          mapRecords [] = []
          mapRecords [_] = error "invalid data"
          mapRecords (value:flag:tail) =
              (read value, read flag) : mapRecords tail
 
main :: IO ()
main = do
  contents <- readFile "readings.txt"
  let inputs = map parseLine $ lines contents
  putStrLn $ show (length inputs) ++ " total lines"
  putStrLn "duplicated dates:"
  mapM_ putStrLn $ duplicatedDates inputs
  putStrLn $ "number of good records: " ++ show (numGoodRecords inputs)

whereas our F# solution is simply:

let dates = HashSet(HashIdentity.Structural)
let mutable ok = 0
 
do
  for line in System.IO.File.ReadAllLines @"readings.txt" do
    match String.split [' '; '\t'] line with
    | [] -> ()
    | date::xys ->
        if dates.Contains date then
          printf "Date %s is duplicated\n" date
        else
          dates.Add date
        let f (b, t) h = not b, if b then int h::t else t
        let _, states = Seq.fold f (false, []) xys
        if Seq.forall (fun s -> s >= 1) states then
          ok <- ok + 1
  printf "%d records were ok\n" ok

This really highlights the dramatic step forward that F# represents, not only improving upon the previous generation of mainstream programming languages but even leaping beyond research.

Tuesday, 22 December 2009

Zach Cox' word count challenge

Zach Cox published an interesting challenge on his blog recently: to traverse a directory hierarchy of files and compute the total distribution of word frequencies. Here is a simple F# solution:

open System.IO

let rec words_of_dir dir =
  seq { for dir in Directory.GetDirectories dir do
          yield! words_of_dir dir
        for file in Directory.GetFiles dir do
          use reader = new StreamReader(File.OpenRead file)
          while not reader.EndOfStream do
            yield! reader.ReadLine() |> String.split [' '; '.'; ':'; ';'; '?'] }

do
  @"C:\Users\Jon\Downloads\20_newsgroups"
  |> words_of_dir
  |> Seq.countBy id
  |> Seq.sortBy (fun (_, n) -> -n)
  |> printf "%A\n"

Taking only 17.5s at only 14 lines of code, this F# combines the brevity of the most concise solution (Ruby) with the performance of the fastest solution (Scala).

Monday, 21 December 2009

Trending F# jobs in the UK

As a pioneering functional programming language on the world's favorite platform, F# is set to explode onto the job scene following its first major release as part of Visual Studio 2010 on 22nd March 2010.

The ITJobsWatch website is everyone's first stop for analysing trends in the UK job market and, interestingly, their statistics for the F# language indicate that the number of F# jobs is over 4× higher than the same period last year and the average salary has risen to £70k or €79k or US$112k:

This is the fastest growth in demand and salary of any functional programming language! Extrapolating the current trends, demand for F# programmers will be higher than for C++ programmers in 2012!

Suffice to say, the demand and salary of F# programmers in the UK will be a trend to watch over the next couple of years.

Wednesday, 16 December 2009

Generating and visualizing 2D mazes

The F#.NET Journal just published an article about puzzles:

"Maze generation is a remarkably complex and diverse subject that essentially falls into the category of network or graph theory but is most often seen in the context of games and puzzles. The characteristics that make a maze interesting for humans to try to navigate are subjective and not easily defined and, consequently, the design and implementation of an automatic maze generator is as much an art as it is a science. This article describes the design and implementation of a simple but effective maze generation algorithm including WPF-based visualization. The algorithm is elegantly expressed in terms of recursive functions and purely functional data structures..."

To read this article and more, subscribe to The F#.NET Journal today!

Parsing and visualizing binary Geographic Information System data

The F#.NET Journal just published an article about parsing binary file formats:

"Detailed vector descriptions of worldwide geographical data are freely available in the Shapefile format. This article describes how polygonal data can be parsed from the binary Shapefile format and visualized using Windows Presentation Foundation easily and efficiently using F#. The result is a simple program that draws a map of the world and can be used as the basis for a wide variety of Geographic Information System (GIS) applications from cartography to climatology..."

To read this article and more, subscribe to The F#.NET Journal today!

Tuesday, 8 December 2009

Book review: F# for Scientists

James Litsios has kindly published the eighth review of our book F# for Scientists on Amazon and, like every other review, gave the book the highest possible rating of 5/5 stars and described the book as "An excellent book that presents a broad set of examples of use of F#".

We also recently published the F# for Technical Computing book that covers Windows Presentation Foundation, the Task Parallel Library and Message Passing Interface (MPI) for parallel programming, asynchronous workflows for concurrent programming, LINQ for XML, named and optional arguments, standard .NET interfaces for interoperability and dozens of other exciting new features.