Wednesday, September 15, 2010

Dataflow programming in F# and C#




Introduction into dataflow programming



What is dataflow programming all about? In classical imperative programming a program is basically a set of
operations working on mutable state, which effectively hides the paths along which data travels. Dataflow
programming is more like a series of workers or robots on an assembly line, each of which executes only when its
input material arrives. The imperative style can introduce non-determinism under concurrent execution
(multithreading) when proper synchronization is missing. In dataflow programming, execution depends on the data
itself, or on data availability to be precise. A program written purely in the dataflow style is deterministic.


Let’s introduce the dataflow variable, one of the main concepts of dataflow programming. A dataflow variable has
two possible states: bound (a value has been assigned) or unbound (no value has been assigned yet). Whenever a
thread tries to read an unbound dataflow variable, it blocks until some other thread binds the variable. A dataflow
variable can be bound only once; subsequent attempts to bind it fail. Dataflow programming, then, is programming
with such single-assignment variables. With dataflow variables one can also build blocking queues and streams, and
the actor model can be implemented on top of such blocking queues.
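
To make the bound/unbound semantics concrete before the hand-written implementations, here is a minimal sketch that borrows TaskCompletionSource<T> from the .NET base class library as a stand-in for a dataflow variable. This is a swapped-in illustration, not the implementation presented in this article; the class name SingleAssignmentSketch and the helper BindTwice are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SingleAssignmentSketch
{
    // Hypothetical helper: tries to bind the cell twice and reports
    // whether each attempt succeeded.
    public static bool[] BindTwice(TaskCompletionSource<int> cell, int first, int second)
    {
        return new[] { cell.TrySetResult(first), cell.TrySetResult(second) };
    }

    public static void Main()
    {
        var cell = new TaskCompletionSource<int>();   // unbound

        // The reader blocks on Task.Result until some thread binds the cell.
        var reader = new Thread(() => Console.WriteLine("read: " + cell.Task.Result));
        reader.Start();

        Thread.Sleep(100);                            // give the reader time to start waiting
        bool[] bound = BindTwice(cell, 42, 7);
        reader.Join();

        Console.WriteLine("first bind succeeded: " + bound[0]);
        Console.WriteLine("second bind succeeded: " + bound[1]);
    }
}
```

The reader only gets past Task.Result once the value is set, and only the first TrySetResult call takes effect, which matches the "bind once, block until bound" rule described above.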

You can find more information on dataflow programming in the Wikipedia article on the subject. There is also a nice article in the Groovy GPars guide.


Overview of the article


This article presents basic implementations of a dataflow variable in both C# and F#. It also demonstrates
examples of dataflow programming in C# using futures. Dataflow programming works best in programming languages
that follow the principles of the declarative model. In our case C# is an imperative language, and programming in a
dataflow style requires developers to be self-disciplined. Surprisingly, F#, although considered a functional
programming language and therefore a follower of the declarative paradigm, also lets developers program in an
imperative way (via the mutable keyword). Adding dataflow variables to C# and F# does not automatically make them
dataflow programming languages, because the necessary syntactic sugar and language support are still missing.

Clojure is one of the most popular modern languages that enable dataflow programming. Clojure supports dataflow
programming through promises. It is also possible to do dataflow programming in other popular languages such as
Groovy, Scala and Ruby using open-source libraries (GPars for Groovy, for example), but none of those languages
provides syntactic support for dataflow variables. As a genuine dataflow programming language I would single out
Oz, which treats all variables as dataflow variables: a reader trying to read an unbound (uninitialized) variable
is blocked until the variable is bound (initialized). On the one hand this saves us from the famous
NullReferenceException, but on the other hand it can make a program hang.

First I will present the implementations in C# and F#, and later I will dig into the thread synchronization details.

Dataflow variables in C#


Let’s start with a simple example of how to use a dataflow variable in C#.
var variable = new DataflowVariable<int>(); // create variable
variable.Bind(value);                       // bind variable
int result = 1000 + variable;               // read variable

C# is not very extensible when it comes to operator overloading (as you will see later in the F# implementation),
which is why we use a Bind method here. Whether to use operators or plain properties and functions when working
with dataflow variables is a matter of taste, but to me operators look more natural. What I love about C# is its
implicit conversion operators, which let a dataflow variable be read as if it were an ordinary value.
Now the code itself:
using System;
using System.Threading;

public class DataflowVariable<T>
{
    private readonly object syncLock = new object();
    private volatile bool isInitialized = false;
    private volatile object value;

    private T Value
    {
        get
        {
            // Fast path: skip the lock once the variable is bound.
            if (!isInitialized)
            {
                lock (syncLock)
                {
                    // Block until another thread binds the variable.
                    while (!isInitialized)
                        Monitor.Wait(syncLock);
                }
            }
            return (T)value;
        }
        set
        {
            lock (syncLock)
            {
                if (isInitialized)
                    throw new InvalidOperationException("Dataflow variable can be set only once.");

                this.value = value;
                isInitialized = true;
                Monitor.PulseAll(syncLock); // wake up all blocked readers
            }
        }
    }

    public void Bind(T newValue)
    {
        this.Value = newValue;
    }

    // Reading the variable through an implicit conversion blocks until it is bound.
    public static implicit operator T(DataflowVariable<T> myVar)
    {
        return myVar.Value;
    }
}




Dataflow variables in F#


Let’s start with a simple example of how to use a dataflow variable in F#.
let myVar = new DataflowVariable<int>() // create variable
myVar <~ value                          // bind variable

let result = (1000 + !!myVar)           // read variable

Here we use the operator (<~) to bind the dataflow variable and the operator (!!) to read its value.

Now the code itself:
type public DataflowVariable<'T> () =
    class

        [<VolatileField>]
        let mutable value : option<'T> = None

        let syncLock = new System.Object()

        member private this.Value
            with get() : 'T =
                match value with
                | Some(initializedVal) -> initializedVal
                | None ->
                    // Block until another thread binds the variable.
                    lock syncLock (fun () ->
                        while (value.IsNone) do
                            ignore (System.Threading.Monitor.Wait(syncLock))
                        value.Value)
            and set(newVal : 'T) =
                lock syncLock (fun () ->
                    match value with
                    | Some(_) -> invalidOp "Dataflow variable can be set only once."
                    | None ->
                        value <- Some(newVal)
                        System.Threading.Monitor.PulseAll(syncLock))

        static member public (<~) (var:DataflowVariable<'T>, initValue:'T) =
            var.Value <- initValue

        static member public (!!) (var:DataflowVariable<'T>) : 'T =
            var.Value
    end



You may have noticed the [<VolatileField>] attribute. According to the rather sparse documentation, this attribute
takes the place of the volatile keyword in C#, but I haven’t tested it thoroughly enough to verify that. What, F#
has no keyword for volatile fields? That is how it should be. Volatile fields belong to the domain of imperative
programming, and F#, being first of all a functional programming language (that is, an implementation of the
declarative model), tries to avoid shared state (remember the mutable keyword?). F# does not support overloading of
implicit conversion operators, which is why we need some kind of dereferencing prefix operator (!!).
The F# implementation is more elegant because we use the Option type here and thus do not have to deal with a
separate isInitialized field as in the C# implementation.

Implementation details and some thoughts on thread synchronization


For synchronization, both implementations use volatile fields in conjunction with a simple Monitor.Wait/
Monitor.PulseAll pattern. You can find more information on Monitor.Pulse and Monitor.Wait in the very nice
article by Joe Albahari.
The volatile fields are used here to prevent instruction reordering and to ensure CPU cache synchronization.
As an alternative to a volatile field, we could use the Thread.VolatileRead method. We would not also need
Thread.VolatileWrite, because the actual write is done within the lock statement, which effectively prevents
reordering and both flushes and invalidates the CPU cache; in any case Thread.VolatileWrite only flushes the CPU
cache and does not invalidate it. Basically, the static VolatileRead and VolatileWrite methods of the Thread class
read or write a variable while enforcing (technically, a superset of) the guarantees made by the volatile keyword.

Dataflow programming examples in C# and F#


In C# I will demonstrate a simple example of dataflow programming with the Parallel Extensions library (futures
and continuations). Using Task.Factory.ContinueWhenAll one can achieve similar results to those obtained with
dataflow variables, but dataflow variables give developers much more flexibility.
var input1 = new DataflowVariable<int>();
var input2 = new DataflowVariable<int>();
var output1 = new DataflowVariable<int>();
var output2 = new DataflowVariable<int>();

// Blocks until input1 and input2 are bound, then binds output1.
Task<int> task1 = Task.Factory.StartNew<int>(() =>
{
    output1.Bind(input1 + input2);
    return output1 * 10;
});

// Blocks until input1 and output1 are bound, then binds output2.
Task<int> task2 = Task.Factory.StartNew<int>(() =>
{
    output2.Bind(input1 + output1);
    return input1;
});

input1.Bind(333);
input2.Bind(888);

// Blocks until both outputs are bound.
Console.WriteLine(10 + output1 + output2);
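
For comparison, the same data dependencies can be expressed with Task.Factory.ContinueWhenAll alone. The following is only a sketch under my own assumptions: the class ContinueWhenAllSketch and the method Compute are hypothetical names, and the inputs are modelled as already-started tasks rather than as dataflow variables.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinueWhenAllSketch
{
    // Hypothetical helper mirroring the example above:
    // output1 = input1 + input2, output2 = input1 + output1.
    public static int Compute(int a, int b)
    {
        Task<int> input1 = Task.Factory.StartNew(() => a);
        Task<int> input2 = Task.Factory.StartNew(() => b);

        // Runs only after both inputs have produced a value.
        Task<int> output1 = Task.Factory.ContinueWhenAll(
            new[] { input1, input2 },
            done => done[0].Result + done[1].Result);

        // Runs only after input1 and output1 have produced a value.
        Task<int> output2 = Task.Factory.ContinueWhenAll(
            new[] { input1, output1 },
            done => done[0].Result + done[1].Result);

        // Result blocks until the corresponding task has completed.
        return 10 + output1.Result + output2.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Compute(333, 888));
    }
}
```

Here every dependency has to be listed explicitly when the continuation is created, whereas with dataflow variables a task simply reads whatever it needs and blocks until that value is available.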





Conclusion


This article describes a basic implementation of dataflow variables in C# and F# and basic examples of dataflow
programming using continuations and futures. Please consider it a starting point for a journey into the world of
dataflow programming.






Thursday, November 5, 2009

The best .NET debugger in the world

Recently I've been playing with pretty complex .NET components written by Vitaliy Liptchinsky and at some point came to the conclusion:

The best .NET debugger in the world is... IronPython console.

Why?

Have you ever tried to continuously experiment, set up new relationships, and create complex objects within Visual Studio? Isn't it boring and cumbersome to recompile and re-run the VS project after each modification?

Thursday, March 12, 2009

WCF and delayed computations (C# yield keyword)

Never use delayed computations in a WCF service contract implementation! The reason: the IErrorHandler component is never invoked in this case. If any exception occurs within the delayed computation, it will be very hard to find the cause, even with WCF tracing.
Let's have a detailed look at how WCF works (this is a very simplified version):
in --> WCF serializer --> some other stuff :) --> try { result = CallYourCustomCode() } catch { CallToErrorHandler() ... } --> WCF serializer (result) --> out
So, if you have delayed computations created with the C# yield keyword, CallYourCustomCode returns not the actual result but a kind of reference to your implementation. This reference is resolved and executed during serialization (!). Any exception thrown during serialization therefore closes the WCF channel, bypasses IErrorHandler, and produces a meaningless exception on the WCF client.
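
The deferred behaviour can be reproduced without WCF at all. In the sketch below (DeferredDemo, GetItemsLazy and GetItemsEager are hypothetical names, not part of any WCF API), calling the iterator method throws nothing; the exception only surfaces when the sequence is enumerated, which in a service would happen inside the serializer. Materializing the result before returning it moves the exception back into the method call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Iterator method: the body does not run until the result is enumerated.
    public static IEnumerable<int> GetItemsLazy(int count)
    {
        for (int i = 0; i < count; i++)
        {
            if (i == 2)
                throw new InvalidOperationException("failure inside the iterator");
            yield return i;
        }
    }

    // Eager variant: enumerates inside the call, so the exception is thrown here.
    public static List<int> GetItemsEager(int count)
    {
        return GetItemsLazy(count).ToList();
    }

    public static void Main()
    {
        IEnumerable<int> lazy = GetItemsLazy(5);
        Console.WriteLine("GetItemsLazy returned without throwing");

        try
        {
            foreach (int item in lazy)      // the exception is thrown during enumeration
                Console.WriteLine(item);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("caught during enumeration: " + ex.Message);
        }
    }
}
```

Returning a materialized collection (a List<T> or an array) from the service operation keeps any failure inside the try block in the simplified flow above.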

Friday, February 27, 2009

F# and Parallel Extensions for .NET

Recently I've posted an article about F# workflows based on Coordination and Concurrency Runtime:
http://www.codeproject.com/KB/net-languages/wf_Fsharp_ccr.aspx

Now I've started thinking that in the same way it would be possible to combine F# with Parallel Extensions for .NET....
Really explosive mixture!!!

Wednesday, November 26, 2008

Transactional repository

What transactional repositories do we know at the moment? Here is a list: SQL Server, MSMQ, the file system and the registry (in Windows Vista/Windows Server 2008). Is that enough? Does it cover all possible needs of enterprises?
On CodeProject I've described a custom implementation of a transactional repository based on the Enterprise Library Caching Application Block.
The implementation described in that article illustrates the basic principles required to build any custom transactional repository that can participate in ambient and explicit transactions in .NET.

Friday, November 14, 2008

volatile field and memory barrier: look inside

I've seen a lot of discussions on the web regarding volatile fields. I've carried out my own small investigation of the subject, and here are some thoughts on it:

The two main purposes of C# volatile fields are the following:

1. Introduce memory barriers for all accesses to the field. To improve performance, CPUs keep frequently accessed data in the CPU cache. In multithreaded applications this can cause problems. For instance, imagine one thread that constantly reads some boolean field (the read thread) and another that is responsible for updating it (the write thread). If the OS decides to run these two threads on different CPUs, it is possible that the write thread changes the value of the field in the CPU1 cache while the read thread keeps reading the old value from the CPU2 cache; in other words, the read thread will not see the change made by the write thread until the CPU1 cache is flushed. The situation can be even worse if two threads update the value.
A volatile field introduces memory barriers, which means that the CPU always reads from and writes to main memory rather than the CPU cache.
Nowadays CPU architectures such as x86 and x64 have cache coherency, which means that any change in the cache of one processor is propagated to the caches of the other CPUs. In turn, this means that the JIT compiler for x86 and x64 makes no difference between volatile and non-volatile fields (except as stated in item #2). Also, multicore CPUs usually have at least two levels of cache: typically the first level is private to each core and a higher level is shared between cores.
But CPU architectures with a weak memory model, such as Itanium, give far fewer guarantees, and there the volatile keyword and memory barriers play a significant role in the design of a multithreaded application.
Therefore, I'd recommend always using volatile and memory barriers, even on x86 and x64 CPUs, because otherwise you tie your application to a particular CPU architecture.

Note: you can also introduce memory barriers with Thread.VolatileRead/Thread.VolatileWrite (these two methods can replace the volatile keyword), with Thread.MemoryBarrier, or even with the C# lock keyword.

Below are two CPU architectures: Itanium and AMD (Direct Connect Architecture). As we can see, in AMD's Direct Connect Architecture all processors are connected with each other, so we have memory coherence. In the Itanium architecture the CPUs are not connected with each other and communicate with RAM through the system bus.


2. Prevent instruction reordering. For instance, consider this loop:
while (true)
{
    if (myField)
    {
        // do something
    }
}
For a non-volatile field, the JIT compiler may, for performance reasons, reorder the instructions in the following manner, so that the field is read only once:
if (myField)
{
    while (true)
    {
        // do something
    }
}

If you plan to change myField from a separate thread, this is a significant difference, isn't it?


Usually it is recommended to use the lock statement (Monitor.Enter and Monitor.Exit), but if you change only one field within the block, a volatile field will perform significantly better than the Monitor class.
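
As an illustration of the single-field case, here is a minimal sketch of a worker loop controlled by a volatile stop flag. The names Worker, VolatileFlagDemo and RunAndStop are hypothetical, and the example is only meant to show the pattern, not to measure performance.

```csharp
using System;
using System.Threading;

public class Worker
{
    // volatile: every loop iteration re-reads the field, so the read
    // cannot be hoisted out of the loop by the JIT compiler.
    private volatile bool stopRequested;

    public long Spins; // only read after the worker thread has been joined

    public void Run()
    {
        while (!stopRequested)
        {
            Spins++;
        }
    }

    public void Stop()
    {
        stopRequested = true;
    }
}

public static class VolatileFlagDemo
{
    // Starts a worker, lets it spin for the given time, then asks it to stop.
    // Returns true if the worker thread terminated within five seconds.
    public static bool RunAndStop(int millis)
    {
        var worker = new Worker();
        var thread = new Thread(worker.Run);
        thread.Start();

        Thread.Sleep(millis);
        worker.Stop();

        return thread.Join(5000);
    }

    public static void Main()
    {
        Console.WriteLine("worker stopped: " + RunAndStop(100));
    }
}
```

Only one field is shared between the two threads here, which is the situation where a volatile field is a reasonable substitute for a lock.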

Friday, October 10, 2008

Custom ThreadPool implementation

The .NET Framework BCL contains a very nice implementation of a thread pool (the System.Threading.ThreadPool class).
But this class is not suitable for the following scenarios:
1. Long-running operations. For long-running operations it is usually recommended to use the Thread class.
2. ThreadPool is per process. This means that the pool can run out of available threads quite often, especially when the application hosts a number of app domains (like IIS or SQL Server). What if you have very important and urgent work items and do not want to take that risk?
3. ThreadPool does not support IAsyncResult. The BeginInvoke methods of all delegates internally pass control to ThreadPool, but ThreadPool itself does not support IAsyncResult.

This is just an initial version of CustomThreadPool, and I plan to extend it in the future.

Generally, there are a number of strict recommendations on when you should use ThreadPool and when you should use the Thread class directly or MulticastDelegate.BeginInvoke. Ideally I plan to create a thread pool that also suits the scenarios where the Thread class and BeginInvoke are used today. The main problem of System.Threading.ThreadPool is that it is per process. So if you have a set of very important tasks to do and also host a set of third-party assemblies in the application, there is always a chance that your important tasks will be delayed. With CustomThreadPool you have a separate thread pool for each application domain.
OK, I know, there is a maximum number of threads allowed per process, and if there are too many threads, context switching can be awful...

The sources of the ThreadPool implementation can be found on CodeProject.

The code is intended to demonstrate an idea of how it could be implemented. I haven't tested it carefully, but it seems to be working ;)

Wednesday, October 8, 2008

ESB: WCF implementation

First of all, in order to implement an enterprise service using WCF, we need the idea of a generic service that can handle any incoming message.
This blog describes how to handle generic messages in a WCF service.
This means that we can serialize any custom serializable object and send it as a message to the WCF service, and the service will accept it and try to process it.

But what's next? The question is how the WCF service would know how to process the incoming message... It can be, for instance, a business-logic-related entity or a notification about problems in a remote component...
And, according to the title, our WCF service needs to be generic. The idea below describes a quite simple WCF service that can handle various incoming messages without any recompilation.
The flow is the following:
1. Retrieve contents of body element
2. Based on some criteria (it can be name of the root node plus namespace name), choose XSLT transformation that is appropriate for given message. XSLT transformation file can be stored as local file or in database.
3. Using XSLT transformation transform incoming XML to XAML (or even to c# code!).
4. Compile resulting XAML.
5. Execute compiled code. The resulting code can do anything: from executing database queries to sending e-mail to administrator
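
Steps 1 to 3 of the flow above can be sketched with the XSLT support that ships with .NET. Everything specific in this example is an assumption made for illustration: the MessageRouter class, the in-memory stylesheet registry, the urn:example:orders namespace and the generated ProcessOrder call are all hypothetical, and steps 4 and 5 (compiling and executing the result) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class MessageRouter
{
    // Hypothetical registry: "{namespace}rootName" -> XSLT text.
    // In a real service this could be a local file or a database table.
    private static readonly Dictionary<string, string> Stylesheets =
        new Dictionary<string, string>
        {
            {
                "{urn:example:orders}Order",
                "<xsl:stylesheet version='1.0'" +
                " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'" +
                " xmlns:o='urn:example:orders'>" +
                "<xsl:output method='text'/>" +
                "<xsl:template match='/o:Order'>" +
                "ProcessOrder(<xsl:value-of select='@id'/>);" +
                "</xsl:template>" +
                "</xsl:stylesheet>"
            }
        };

    // Takes the XML of the message body and returns the generated handler text.
    public static string Transform(string bodyXml)
    {
        // Steps 1 and 2: inspect the root element and pick a stylesheet by
        // its expanded name (namespace plus local name).
        XElement root = XElement.Parse(bodyXml);
        string key = root.Name.ToString();

        string xslt;
        if (!Stylesheets.TryGetValue(key, out xslt))
            throw new NotSupportedException("No stylesheet registered for " + key);

        // Step 3: run the transformation.
        var transform = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(sheet);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(bodyXml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Transform("<Order xmlns='urn:example:orders' id='42'/>"));
    }
}
```

Adding support for a new message type then means registering one more stylesheet, with no change to the service code itself.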

Drawbacks of this approach:
1. You are not able to debug the resulting code
2. It requires a nice tool that would produce XSLT transformation based on given message content and handler (c# code).

Benefits:
1. As new messages are introduced in the enterprise, you do not have to modify or redeploy the service. All you need to do is provide the WCF service with additional scripts.
2. It is very flexible: you do not depend on any message types, so theoretically you can receive and process anything.


P.S.

Yes, I know, instead of XSLT transformations we could deploy handler assemblies for each event and load these assemblies dynamically.

Probably this approach will never get a chance to be implemented... I just wanted to share an interesting idea...

Thursday, June 12, 2008

Database connection: static or non-static?

I believe this is quite a common question among developers who use databases in their applications. I’ve seen a lot of mistakes made over this choice.
Is it better to create one static connection and use it throughout the code, or to create a new connection object for each database call?

The answers are:
In the case of a DB server (a stand-alone database like SQL Server Express/Standard/Enterprise or Oracle) it is better to create and dispose of new connection objects, because almost all DB drivers (like ADO.NET, ODBC, Oracle) provide connection pooling, so you gain nothing by keeping one connection alive. A static connection can even decrease performance, because in a multithreaded application a single connection object cannot be used simultaneously by several threads. Static connections also decrease the scalability of applications. Connection pooling usually performs better than custom code that tries to reuse created connections. There is one exception: if you are going to execute a number of SQL statements sequentially, it would be a mistake to create a new connection for each statement!

In the case of embedded databases (like SQL Server CE) it is better to use a static connection, because such databases do not have connection pooling and re-creating a connection is usually expensive.

Wednesday, January 2, 2008

Quite interesting ASP.NET utilization

Hi, I just want to present a quite interesting way of using ASP.NET.

This approach shows how to render HTML files using ASP.NET pages and server controls. For instance, we have some amount of read-only data that we want to present to the user with a high level of interactivity and the ability to print. Let's consider HTML documents and an embedded browser as an ActiveX control. Here we already have pretty good printing capability, and we can also provide users with rich interactivity using JavaScript. If we could generate the HTML using compiled .aspx pages, it would be ideal, because we can edit and create web forms in Visual Studio (and use all the powerful ASP.NET controls like DataGrid), and then all we have to do is produce the HTML using the generated ASP.NET page handlers.

The flow is as follows:
1. Retrieve data from the database
2. Bind the ASP.NET page to the retrieved data
3. Render the ASP.NET page to an output HTML stream in order to obtain the necessary document

The principle of my solution is as follows: create a pseudo-HttpContext (Request, Response, Response stream) and a pseudo-browser -> omit all unnecessary HTTP modules and assign this context directly to the page handler -> launch the lifecycle of the page -> retrieve the rendered HTML. I've already created a proof of concept (it uses simple controls like a button with JavaScript as well as complex controls like DataGrid). Everything works great.

I've looked through the web and haven't found anything similar.

Friday, October 5, 2007

typeof(void)???

What does typeof(void) return? void is a special keyword in C# that corresponds to the System.Void type in the .NET BCL, so typeof(void) returns that type. System.Void is a simple structure with no members other than the default constructor.
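
A short sketch makes this easy to check. The class VoidTypeDemo and the method DoNothing are hypothetical names used only for the demonstration.

```csharp
using System;
using System.Reflection;

public static class VoidTypeDemo
{
    public static void DoNothing()
    {
    }

    public static void Main()
    {
        Type voidType = typeof(void);

        Console.WriteLine(voidType.FullName);     // System.Void
        Console.WriteLine(voidType.IsValueType);  // True

        // Reflection reports the same type as the return type of a void method.
        MethodInfo method = typeof(VoidTypeDemo).GetMethod("DoNothing");
        Console.WriteLine(method.ReturnType == typeof(void)); // True
    }
}
```

This is where typeof(void) is useful in practice: checking through reflection whether a method returns a value.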

Tuesday, September 25, 2007

Be careful with reflection and delegates

For instance, suppose you have a pluggable application that runs a set of assemblies (plugins), each in a separate "sandbox" domain. The following code block, included in a "malicious" plugin, causes the application to crash:


using System;
using System.Reflection;
using System.Runtime.Remoting;

namespace CLRCrashApp
{
    class Program
    {
        static void Main(string[] args)
        {
            AppDomain newDomain = AppDomain.CreateDomain("NewDomain");
            ObjectHandle oh = newDomain.CreateInstance(Assembly.GetExecutingAssembly().FullName, "CLRCrashApp.SomeClass");
            ((SomeClass)oh.Unwrap()).DoSomething();
        }
    }

    public class SomeClass : MarshalByRefObject
    {
        private delegate void DoSomethingDelegate();

        public SomeClass() : base()
        {
        }

        public void DoSomething()
        {
            Console.WriteLine("AppDomain is: {0}", AppDomain.CurrentDomain.FriendlyName);
            try
            {
                // Creating a delegate instance with null target and method pointer.
                Activator.CreateInstance(typeof(DoSomethingDelegate), new object[] { null, null });
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
                if (ex.InnerException != null)
                {
                    Console.WriteLine(ex.InnerException.ToString());
                }
            }
            finally
            {
                Console.WriteLine("Finally is executing...");
            }
        }
    }
}

So, be careful when playing with reflection and delegates. A small mistake can crash your entire application. Even a separate AppDomain won't help you handle this.

In order to protect against such "malicious" plug-ins it is necessary to use the CLR hosting API and host the CLR on your own.

Friday, September 21, 2007

How to precompile method from MSIL to native code without invoking it

In order to precompile a method without invoking it, you should use the System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod method.

A possible situation where this method is really necessary: you need to precompile a huge assembly on a separate thread while the application is idle, and you can't use ngen.exe.

Users always complain about the slow first load of screens in .NET applications. This delay is caused by JIT compilation...
For instance, you have a fat (rich) client application with a log-in screen. During the log-in process you can precompile heavy assemblies on a separate thread.
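
A minimal sketch of such background precompilation could look like the following. The Precompiler class and the PrepareAssembly helper are hypothetical names. The sketch skips abstract methods, open generics and methods without an IL body, and it does not touch constructors, so treat it as a starting point rather than a complete solution.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Threading;

public static class Precompiler
{
    // Hypothetical helper: JIT-compiles every method in the assembly that
    // has an IL body, without invoking it. Returns the number of methods prepared.
    public static int PrepareAssembly(Assembly assembly)
    {
        const BindingFlags All =
            BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static;

        int prepared = 0;
        foreach (Type type in assembly.GetTypes())
        {
            // Open generic types cannot be compiled without type arguments.
            if (type.ContainsGenericParameters)
                continue;

            foreach (MethodInfo method in type.GetMethods(All))
            {
                if (method.IsAbstract || method.ContainsGenericParameters)
                    continue;
                if (method.GetMethodBody() == null)
                    continue; // extern or runtime-provided, nothing to compile

                RuntimeHelpers.PrepareMethod(method.MethodHandle);
                prepared++;
            }
        }
        return prepared;
    }

    public static void Main()
    {
        // Precompile on a background thread, for example while the
        // log-in screen is being shown.
        var worker = new Thread(() =>
        {
            int count = PrepareAssembly(Assembly.GetExecutingAssembly());
            Console.WriteLine("Prepared " + count + " methods");
        });
        worker.IsBackground = true;
        worker.Start();

        worker.Join();
    }
}
```

In a real application the assembly passed to PrepareAssembly would be the heavy one that contains the screens, and the main thread would show the log-in form instead of waiting on Join.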