jeudi 4 décembre 2014

C++: erasing elements of std::vector using a lambda

Removing elements from a vector is a task that one can encounter pretty often and that isn't as easy as one could think.

The simplest case is when the index of the unwanted element is known. The std::vector class provides a first form of the erase() member function that takes an (const) iterator as argument.

Thus, if I want to remove, say the 10th element, it's as easy as:

    std::vector<whatever> myVec;
//... fill with more than 10 elements
    myVec.erase( myVec.begin() + 9 );

And if you want to remove the 3 elements between positions 10 and 12, it will be the second form of this function, which has two arguments:

    myVec.erase( myVec.begin() + 9, myVec.begin() + 12 );

(Yes, the second argument defines the first one you want to keep)

But what happens when you want to remove elements based on their value ? Say remove all elements that have value foo (assuming that value is of type whatever).

This is a task for std::remove(). It actually does not remove anything, it just switches element around so that the ones to be erased will be at the end, and it returns an iterator pointing on the first element to be erased. The next step is to feed that iterator to std::vector::erase().

The code will using its second form:

    myVec.erase(
        std::remove(            // returns iterator on
            myVec.begin(),      // first element to
            myVec.end(),        // be removed
            foo
        ),
        myVec.end()
    );

(This is known as the Erase–remove idiom.)

Next, what if you want to remove elements based on some property they have ? Consider for example a vector of vectors:

   std::vector<std::vector<Whatever>> myVec2;

And now the task is to remove elements that hold less than 2 elements. Okay, so we need to check every element, and decide to remove it or not.

This is a task for the second form of that same algorithm, remove_if(). Instead of a value, it takes as third argument a predicate, and will "remove" (move, actually) the considered element if that predicate returns "true". A predicate is usually implemented as a functor, which is an object of some class that defines the operator() and that returns a bool, based on the given value.

At first, this seems like a harsh constraint, as no one wants to declare a class for such a trivial task. But before C++11 came out, that was required (unless, maybe, using some Boost library). Or else, we needed to iterate through the vector and test each element, copy it or not, and swap:

   vector<vector<whatever>> newv;
   newv.reserve( myVec.size() ); // to avoid resizing when using push_back
      for( size_t i=0; i < myVec.size(); i++ )
         if( myVec[i].size()<MinSize )
            newv.push_back( myVec[i] );
   std::swap( myVec, newv );

This is where C++11 and lambdas come in. A lambda can be seen as a sort of "anonymous inline function", that captures variables in scope. Here, as the function iterates over all the elements, each of them will be a std::vector.

A lambda is made of three parts:

  • [how capturing variables happens],
  • (the functions arguments),
  • {The body of the function}.

The complete code:

   std::size_t MinSize = ... (some value);
   myVec2.erase(
      std::remove_if(
         myVec2.begin(),
         myVec2.end(),
         [&]( const std::vector<whatever>& vw ) // lambda
            { return vw.size() < minsize; } 
     ),
     myVec2.end()
   );

More on c++ lambdas.

mercredi 14 mai 2014

Subversion: colordiff fo html files

(Mostly a reminder:)

Say you have some software project, hosted on some Subversion repository. You happily edit your files, and before committing you want to have a quick look at the edits you have done.

No problem, as simple as:
> svn diff 

But this outputs lots of text, not easily readable. Ok, lets' go with colordiff:
> svn diff | colordiff

And then you get drowned under floods of nice and flashy colors, and you have to painfully scroll your terminal. Well, what else ? A simple redirecting in a file won't keep the colors.

This is where another magic tool shows up: aha. Yeah, that's his name. It's a "ANSI to HTML" converter. Install it with sudo apt-get install aha, and then, go:
svn diff | colordiff | aha >mydiff.html

For conveniency, you can now add a new target to you makefile:
diff:
     svn diff | colordiff | aha >mydiff.html
     xdg-open mydiff.html
Thus, entering make diff at the shell will show you the current edits you have done up to now.
xdg-open is just the Gnome app that opens the default application associated with a file type. On Windows, just use the file name alone, as this OS has some mechanism to open the file with the default application when given a file name.

Edit 20141224: a small improvement: in order to keep track of all these generated diff files, you can append date/time to the filename so that each new one doesn't erase the previous one. This can be done easily with bash (not that hard for Windows either, but no time at present to figure that out):

ifndef COMPSPEC
NOW=$(shell date +%Y%m%d_%H%M)
BROWSER=xdg-open
endif

diff:
     svn diff | colordiff | aha >mydiff_$(NOW).html
     $(BROWSER) mydiff_$(NOW).html

Notice the conditional, so that this makefile should also work out-of-the-box under Windows (except for the time/date, but if you send it to me, I'll publish it ;-) )

vendredi 11 avril 2014

State machine diagrams with Graphviz

Once in a while, I need to draw a simple state machine diagram. These are a quick way to show in a visual way how a system works.

While these can be drawn with general drawing tools, or even with more dedicated tools, I usually prefer the textual way. Describing a drawing through some design language with an acceptable learning curve and letting some application do the drawing is IMO a better approach: editing the graph afterwards is just a matter of editing the source file.

Okay, so what tool ? Some are... funny (and I mean it!), but are not really usable for anything more than a small graph, or anything that needs to be edited many times.

No, this post is about Graphviz and it's associate set of tools. It does truely have some oddities buts its the best around.
Among its oddities, the default size units are in distance units. For an image, yes, no pixels here. Wait, that's not all ! The default units are inches. Inches !
I suppose this is for historical reasons and it seems that there is no option to change this. After thinking about it, once you go with distance, then, as metrication of image density is still not used, you might as well stay with inches.

Another thing, don't expect to be able to define precisely the position of nodes and edges: these are done by a placement algorithm, and adjusting this is not easy although some commands should help.

Nevertheless, say you want to describe some simple state machine. Just a light bulb connected to a switch.

Elementary graph

The associate state machine can be described by the following text file:
digraph g{
   rankdir="LR";
   edge[splines="curved"]
   ON -> OFF;
   OFF -> ON;
}

Assuming you have Graphviz correctly installed, the following shell command will generate the image:

dot -Tpng:cairo myfile.dot >myfile.png   


Transitions

Ok, now lets add the transitions (the switch action). Lets call it "sw": if sw=1, the light bulb will be on, if 0, it will be off.

Ah. Here some problem appear. In the field of electrical engineering, the transitions between the states are frequently based on some boolean variable. Notation for the complement operation seems to be country-dependent. In France, this is usually expressed by a bar over the expression ("sw barre"), in Latex math-syntax, it will be $\bar{sw}$.

So, how can we manage this issue ?

First (and easiest), forget about the "bar" thing, and just go for plain text:

digraph g{
   rankdir="LR";
   edge[splines="curved"]
   ON -> OFF [label="sw=0"];
   OFF -> ON [label="sw=1"];
}




This is not very satisfying, it clutters the diagram.

Second solution: use Unicode. Graphviz natively supports it, and Unicode provides some special character that is supposed to handle this situation. So just enter:
digraph g{
   rankdir="LR";
   edge[splines="curved"]
   OFF -> ON [label="sw"];
   ON -> OFF [label="s̅w̅"];
}
(sorry, seems that the current hosting of this blog does not correctly display this, this is why the bar isn't exactly over the two letters).
 
In GTK+ based apps (Gedit, for instance), Unicode can be entered by hitting CTRL+SHIFT+U, then entering the desired character code (here '+0305') after each letter. Here, you need to do this manually after each letter of the label .

Unfortunately, the final rendering depends on the font used by the layout engine. It seems that the default png output of Graphviz does not use the Cairo library. Or if it does, it does not provide any control on the used font, so the final result looks quite ugly:


Direct insertion into Latex source file

If the graph image is intended to end up in a Latex source file, then check out the Graphviz package. It allows you to insert directly the graph command into the main Latex document. Unfortunately, this does not mean you suddenly have all the associated formatting power: this package only calls the 'dot' command himself, the only benefit is that you don't have to do it yourself and then import the image file into the Latex document. So for the issue detailed up here, it is of no help.

Another tool, dot2tex, has been specifically designed to have it all: direct editing of dot file inside Latex file and Latex formatting for labels and edges. Basically, it converts the dot file into PSTricks and/or PGF/TikZ format using some Python magic, then process it as regular Latex code.
Unfortunately, installation on my machine seems to suffer from some obscure Python bug, so I can't tell more at present! I hope to be able to try this soon.

Edit 2015/05: for more precise positioning of your nodes and vertices and better rendering, you'd better go off with a Latex-based solution. Tikz seems to be the easiest, see for example this sample.
For my own record, here are some relevant links: