Tuesday, 18 August 2009

Dynamic programming: Levenshtein edit distance

The F#.NET Journal just published an article about dynamic programming:

"Levenshtein distance is a metric used to measure the amount of difference between a pair of sequences in terms of the number of insertions, deletions and substitutions required to get from one sequence to the other. Computing the Levenshtein distance is an interesting problem in dynamic programming with many practical applications including spell checkers and the study of DNA in bioinformatics. This article describes how the basic algorithm may be implemented and then optimized, including the use of both generic and specialized memoization..."

To read this article and more, subscribe to The F#.NET Journal today!

Advantages of Pattern Matching

Following the May 2009 CTP release of F#, this new programming language is finally starting to gain some serious traction. However, many people trying to learn about the advantages and disadvantages of F# are being misled by much of the oversimplified journalistic material being made available. We shall endeavour to address this by releasing short articles in the form of blog posts that augment the common information with valuable insights.

Perhaps the most common misconception is that functional programming is the primary advantage of the F# programming language over alternatives like C#. This is misleading for two main reasons. Firstly, the latest C# 3.0 version provides extensive support for funtional programming including lambda functions and the .NET 3.5 framework is already taking advantage of this with the System.Func and System.Action generic delegates. Secondly, the F# programming language provides a wealth of other valuable features that are as advantageous as functional programming but which are not found in any other mainstream programming languages. Pattern matching over variant types is one such feature.

Grouping code by function rather than class lets you use pattern matching, which is far more powerful and efficient than alternatives such as method dispatch in object-oriented programming:

  • Pattern matches can act upon ints, floats, strings and other types as well as objects. Method dispatch requires an object.
  • Pattern matches can act upon several different values simultaneously: parallel pattern matching. Method dispatch is limited to the single this case in mainstream languages.
  • Patterns can be nested, allowing dispatch over trees of arbitrary depth. Method dispatch is limited to the non-nested case.
  • Or-patterns allow subpatterns to be shared. Method dispatch only allows sharing when methods are from classes that happen to share a base class. Otherwise you must manually factor out the commonality into a separate member (giving it a name) and then manually insert calls from all appropriate places to this superfluous function.
  • Pattern matching provides exhaustiveness and redundancy checking which catches many errors and is particularly useful when types evolve during development. Object orientation provides exhaustiveness checking (interface implementations must implement all members) but not redundancy checking.
  • Non-trivial parallel pattern matches are optimized for you by the F# compiler. Method dispatch does not convey enough information to the compiler's optimizer so comparable performance can only be achieved in other mainstream languages by painstakingly optimizing the decision tree by hand, resulting in unmaintainable code.
  • Active patterns allow you to inject custom dispatch semantics.

Interestingly, although F# is one of the world's first production-quality functional programming language implementations, it still provides features like or-patterns and active patterns that are not available in other academic functional languages such as Haskell.

You can learn all about pattern matching in F# from the F#.NET Journal article "How to Leverage Pattern Matching" (16th April 2007).

Tuesday, 4 August 2009

Optimizing the Burrows-Wheeler Transform

The F#.NET Journal just published an article about data compression:

"The Burrows-Wheeler Transform (BWT) is one of the few preconditioners used in data compression and, in particular, is the core of the popular bzip2 compression utility. This article describes a simple 12-line implementation of the BWT in F# and progressively optimizes the implementation to use a customized quicksort then shared-memory parallelism. On 8 cores, the final optimized implementation is over 1,000× faster than the original..."

To read this article and more, subscribe to The F#.NET Journal today!