SvOK(get_sv("main::blog", 0));: adventures in ignorance: each, keys, and values

So, each, keys, and values all use the same iterator under the covers. I knew this and it is the reason I almost never use each. There is just too much chance of each leaving the iterator partway through the hash. For instance, the following code is fine:

while (my ($key, $value) = each %hash) {
    do_something($key, $value);
}

But this code has a subtle bug just waiting to bite you:

eval {
    while (my ($key, $value) = each %hash) {
        do_something($key, $value);
    }
    1;
} or do {
    die $@ unless $@ eq "We're all fine here now, thank you. How are you?\n";
}

If the eval block dies with the acceptable message then the code will continue on with a borked iterator. A more common mistake is the use of last in a loop using each:

while (my ($key, $value) = each %hash) {
    last unless do_something($key, $value);
}

Adding or removing items from the hash while using each can also bite you. So, all of those little problems means I tend to write the loops above like this:

for my $key (keys %hash) {
    my $value = $hash{$key};
    do_something($key, $value);
}

All of this means I only dust off each and try to remember all of its issues when I know the number of keys in the hash (or the size of the keys themselves) is going to be huge in relation to memory (which generally means it is a tied dbm). However, it just struck me today that this behavior has can be used for good. I want to loop over a bunch of hash entries checking to see if the are all equal to each other. Now I could say:

my ($first, @others) = values %hash;
die "bad" if grep { $first ne $_ } @others;

but I could also say:

my ($key, $value) = each %hash;
die "bad" if grep { $value ne $_ } values %hash;

And the second bit of code is between twice as fast (tiny hashes) and five times as fast (mid-size and up):

#!/usr/bin/perl

use strict;
use warnings;

use Carp;
use Benchmark;

sub benchmark {
    my $subs = shift;

    my %results;
    for my $sub (keys %$subs) {
        $results{$sub} = $subs->{$sub}->();
    }
    my ($k, $v) = each %results;
    croak "bad" if grep { $v ne $_ } values %results;

    Benchmark::cmpthese -1, $subs;
}

for my $n (10, 100, 1_000, 10_000) {
    my %h = map { $_ => $_ } 1 .. $n;
    print "for $n:\n";
    benchmark {
        values => sub {
            my ($first, @others) = values %h;
            return join "", $first, @others;
        },
        each => sub {
            my ($k, $v) = each %h;
            return join "", values %h;
        },
    };
}

And yes, the reason I started thinking about this is the bit in the benchmark function. Of course, now that I am looking at it with a critical eye, I see it should be

my ($k, $sub) = each %$subs;
my $value = $sub->();
croak "bad" if first { $value ne $_->() } values %$subs

1 comment:

AristotleAugust 16, 2009 at 2:01 AM
You can use each in scalar context – it will return just the key. No need for the throw-away $value declaration.

Also, keys in void context will reset the iterator without doing anything else; this is documented behaviour. Therefore you can safely use each in loops that won’t ever call any code that’s not under your control – eg. simple construction of a string from the data in a hash.

Some limited HTML markup is allowed by blogger: strong, b, i, and a. You may also use em, but I have repurposed it through the magic of CSS to be behave very much like <tt><code></code></tt>.

SvOK(get_sv("main::blog", 0));

Sunday, July 26, 2009

adventures in ignorance: each, keys, and values

1 comment:

Places to find me

Search This Blog

Blog Archive

About Me

Blogger Syntax Highliter