
Vermonster - Open Standards through Open Source


Using Perl Pipes for Parallel Interprocess Communication

Posted on October 12th, 2007 under Code, Perl.

A recent Perl-based project called for some parallel processing. I naturally turned first to fork, which I have used successfully in the past. But this time I needed to share information across the processes; they couldn't just run in their own isolated worlds. To do this, I had some options, including shared memory (IPC::Shareable), subs::parallel, threads, and plain pipes with fork().

As the title of this article hints, I went with the last one.

Try, Try and Try Again

To preface all of this, my target environment is quite constrained. It is a hardware appliance, with the day-to-day operations being anything but running my code. So I didn’t have the flexibility to install tons of CPAN modules and also couldn’t tweak the environment (basic out-of-the-box Fedora). I had to “just get it done” with the hand I was dealt.

First, I tried using the IPC::Shareable module. But I found that the process I was trying to run was consuming too much memory, especially when a number of them ran in parallel. I kept getting the error “Could not create semaphore set: No space left on device”. I think this was because the variable being tied could be a fairly complex object.

The next attempt was subs::parallel. In theory, this module should have worked great (it is a really nice module). But my application has a fairly robust ORM (object-relational mapping) layer, as it follows an MVC pattern, so my code was using the “parallelized” return variable almost immediately after the subs::parallel call. The module blocks as soon as the return value is accessed, so it ended up waiting for the result almost immediately and was not any faster (in fact, the overhead made it a wallclock second or two slower). To use this module, I would have had to drastically change the existing (and working) application. So I kept looking.

I do have threaded perl available, so I thought I would give it a shot. Every few tests (I am running my own unit tests based on PerlUnit), I was getting core dumps. Not a good thing. When I gdb’d the core dump, I got an invalid format error, so I decided to pass on threads.

This led me to look at using pipes and a simple fork() to communicate from the child processes to the parent.

Basic Concept

The general idea is you can use IO pipes for interprocess communications. That is, much like opening a file handle (which is a form of a pipe) you can open a pipe that connects one handle to another. For instance, consider this:


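use IO::Handle;

pipe(FROM_SENDER, TO_READER) or die "pipe failed: $!";
TO_READER->autoflush(1);    # flush each print so the reader is not left waiting

# now you can write to the write end (the second handle passed to pipe)
print TO_READER "Hello\n";  # the newline ends the record that <FROM_SENDER> reads

# and it can be read from the read end
my $message = <FROM_SENDER>;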

This is great, but if you combine this with fork(), you will notice that the read blocks, which isn’t all that interesting if you want to launch these processes in parallel. You can use IO::Handle to make the pipe non-blocking, but then if two (or more) of the child processes try to write to the same pipe at the same time, you can lose messages (yes, I saw this in action).

So to implement a non-blocking but non-dropping mechanism, the idea is to create the IO handles dynamically, one pipe per child. The implementation I used is borrowed from the perlfork documentation, under the “Forking pipe open() not yet implemented” section.


sub pipe_from_fork ($) {
    my $parent = shift;
    pipe $parent, my $child or die;
    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid) {
        close $child;
    } else {
        close $parent;
        open(STDOUT, ">&=" . fileno($child)) or die;
    }
    $pid;
}

Now execute the fork process like so:


if (pipe_from_fork('BAR')) {
    # parent
    while (<BAR>) { print; }
    close BAR;
} else {
    # child
    print "pipe_from_fork\n";
    exit(0);
}

This allowed me to fork off x-number of processes and make sure the responses are returned without losing any. This is great, but there was one more issue…the messages were strings.
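To make the "x-number of processes" part concrete, here is a minimal sketch of the same idea with several children at once. It is not the code from my project: run_jobs is a hypothetical helper, it uses lexical handles instead of the bareword handle that pipe_from_fork takes, and it assumes each job returns a single-line string. Every child gets its own pipe, so no two children ever write to the same handle.

```perl
use strict;
use warnings;

# Fork one child per job; each child gets its own pipe.
# Returns a hash of job name => line written by that job's child.
# (run_jobs is a hypothetical helper used only for this sketch.)
sub run_jobs {
    my (%jobs) = @_;
    my %reader_for;    # job name => read end of that child's pipe
    my %pid_for;       # job name => child pid

    for my $name (sort keys %jobs) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork() failed: $!" unless defined $pid;

        if ($pid == 0) {
            # child: close the read ends inherited from the parent, then report
            close $_ for $reader, values %reader_for;
            my $result = $jobs{$name}->();
            print {$writer} "$result\n";
            close $writer;
            exit(0);
        }

        # parent: keep only the read end and move on to the next job
        close $writer;
        $reader_for{$name} = $reader;
        $pid_for{$name}    = $pid;
    }

    # All children are running by now; collect one line from each.
    my %result_for;
    for my $name (sort keys %reader_for) {
        my $line = readline($reader_for{$name});
        chomp $line if defined $line;
        $result_for{$name} = $line;
        close $reader_for{$name};
        waitpid($pid_for{$name}, 0);
    }
    return %result_for;
}

my %results = run_jobs(
    first  => sub { sleep 1; "one" },
    second => sub { sleep 1; "two" },
);
print "$_ => $results{$_}\n" for sort keys %results;
```

The parent reads the pipes one after another, but because the children were all started before the first read, the total wait is roughly that of the slowest child rather than the sum.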

Encoding/Decoding

I had the ability to fork off processes and read what they returned, but I could only send back simple strings. This was not adequate, as my processes mostly returned objects (or arrays of objects). I decided to use the infamous Data::Dumper to handle my encoding and decoding. So the modification, mostly in the child section, is something like this:


use Data::Dumper;

if (pipe_from_fork('BAR')) {
    # parent, read our "string"
    chomp( my $encoded_return = <BAR> );

    # Eval the string to rebuild the data structure
    my $complicated_return = eval($encoded_return);

    close BAR;
} else {
    # child, assume we want to return a complicated thing
    my $complicated_thing = some_thing_that_made_it();

    # Encode it here.
    my $encoded_thing = Data::Dumper->new( [ $complicated_thing ] );
    $encoded_thing->Purity(1)->Terse(1)->Deepcopy(1)->Indent(0);

    # Then print the serialized string (not the Dumper object), one line per message.
    print $encoded_thing->Dump, "\n";

    exit(0);
}
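The encode and decode steps can also be tried on their own, without any forking. The sketch below is an illustration rather than the code I shipped: encode_thing and decode_thing are hypothetical names, and it sets the Data::Dumper package variables instead of using the object interface. I have also assumed Useqq here so that newlines inside strings are escaped and the whole structure stays on one line.

```perl
use strict;
use warnings;
use Data::Dumper;

# Serialize a data structure into a single line of Perl source.
# (encode_thing and decode_thing are hypothetical names for this sketch.)
sub encode_thing {
    my ($thing) = @_;
    local $Data::Dumper::Terse  = 1;    # no "$VAR1 = " prefix
    local $Data::Dumper::Indent = 0;    # no line breaks or indentation
    local $Data::Dumper::Useqq  = 1;    # escape newlines inside strings
    return Dumper($thing);
}

# Rebuild the structure. eval runs whatever code it is handed, so this is
# only reasonable when the string comes from your own child process.
sub decode_thing {
    my ($encoded) = @_;
    my $thing = eval $encoded;
    die "decode failed: $@" if $@;
    return $thing;
}

my $original = {
    name   => "widget",
    tags   => [ "a", "b" ],
    nested => { n => 42, note => "two\nlines" },
};
my $encoded = encode_thing($original);
my $decoded = decode_thing($encoded);

print "$encoded\n";
print "round trip ok\n"
    if $decoded->{nested}{n} == 42 && $decoded->{tags}[1] eq "b";
```

Note that this only carries plain data. Blessed objects come back blessed, but their classes still have to be loaded in the parent, and things like code references or open handles do not survive the trip.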

Putting it Together

This was all working fine, but I wanted something for more general use. So I whipped together a quick and dirty Perl module with a neat interface. First, the interface looks something like this:


#!/usr/bin/perl -w

use SimpleParallel;

my $parallel = SimpleParallel->new;

my ($return_variable1, $return_variable2);

$parallel->processes($return_variable1, sub { sleep(5); return ["hi", "one"]; });
$parallel->processes($return_variable2, sub { sleep(5); return ["hi", "two"]; });

$parallel->execute;

use Data::Dumper;
print "From Process 1:\n"; print Dumper $return_variable1;
print "\n----\n";
print "From Process 2:\n"; print Dumper $return_variable2;
print "\n----\n";

The idea is to add processes along with their return variables (references, bound to each process). Then, when ready, execute them in parallel. Running this with time shows us the whole thing takes roughly 5 seconds (as it should):


$ time perl ./parallel.pl
From Process 1:
$VAR1 = [
          'hi',
          'one'
        ];

----
From Process 2:
$VAR1 = [
          'hi',
          'two'
        ];

----

real    0m5.042s
user    0m0.036s
sys     0m0.000s

The entire library, including POD, is only 150 lines of code. You can download it here: simpleparallel.pm
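To give a feel for how an interface like this can be built from the pieces above, here is a rough sketch. It is not the SimpleParallel source: the package name TinyParallel is made up, error handling is minimal, and it assumes each job returns one value that Data::Dumper can serialize. It relies on the fact that @_ aliases the caller's variables, so a reference to the caller's return variable can be taken inside processes().

```perl
use strict;
use warnings;
use Data::Dumper;

package TinyParallel;    # hypothetical name, not the real SimpleParallel

sub new { return bless { jobs => [] }, shift }

# Register a job. After the shift, $_[0] is still an alias to the caller's
# variable, so \$_[0] points at the caller's own return variable.
sub processes {
    my $self = shift;
    push @{ $self->{jobs} }, { result_ref => \$_[0], code => $_[1] };
    return $self;
}

# Fork every registered job, then read each result back over its own pipe.
sub execute {
    my $self = shift;
    my @running;

    for my $job (@{ $self->{jobs} }) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork() failed: $!" unless defined $pid;

        if ($pid == 0) {
            # child: run the job and send back one line of Perl source
            close $reader;
            close $_->{reader} for @running;
            local $Data::Dumper::Terse  = 1;
            local $Data::Dumper::Indent = 0;
            local $Data::Dumper::Useqq  = 1;
            my $value = $job->{code}->();
            print {$writer} Data::Dumper::Dumper($value), "\n";
            close $writer;
            exit(0);
        }

        close $writer;
        push @running, { pid => $pid, reader => $reader, job => $job };
    }

    for my $child (@running) {
        my $encoded = readline($child->{reader});
        close $child->{reader};
        waitpid($child->{pid}, 0);
        # eval is acceptable here only because the string came from our own child
        ${ $child->{job}{result_ref} } = defined $encoded ? eval $encoded : undef;
    }
    $self->{jobs} = [];
    return;
}

package main;

my ($first, $second);
my $parallel = TinyParallel->new;
$parallel->processes($first,  sub { sleep 1; return [ "hi", "one" ] });
$parallel->processes($second, sub { sleep 1; return [ "hi", "two" ] });
$parallel->execute;

print "$first->[1] $second->[1]\n";
```

Both jobs sleep for a second, yet the script as a whole should finish in about one second, for the same reason the timing above shows roughly 5 seconds instead of 10.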