Sunday, July 26, 2009

adventures in ignorance: each, keys, and values

So, each, keys, and values all use the same iterator under the covers. I knew this and it is the reason I almost never use each. There is just too much chance of each leaving the iterator partway through the hash. For instance, the following code is fine:
while (my ($key, $value) = each %hash) {
do_something($key, $value);
}
But this code has a subtle bug just waiting to bite you:
eval {
while (my ($key, $value) = each %hash) {
do_something($key, $value);
}
1;
} or do {
die $@ unless $@ eq "We're all fine here now, thank you. How are you?\n";
}
If the eval block dies with the acceptable message then the code will continue on with a borked iterator. A more common mistake is the use of last in a loop using each:
while (my ($key, $value) = each %hash) {
last unless do_something($key, $value);
}
Adding or removing items from the hash while using each can also bite you. So, all of those little problems means I tend to write the loops above like this:
for my $key (keys %hash) {
my $value = $hash{$key};
do_something($key, $value);
}
All of this means I only dust off each and try to remember all of its issues when I know the number of keys in the hash (or the size of the keys themselves) is going to be huge in relation to memory (which generally means it is a tied dbm). However, it just struck me today that this behavior has can be used for good. I want to loop over a bunch of hash entries checking to see if the are all equal to each other. Now I could say:
my ($first, @others) = values %hash;
die "bad" if grep { $first ne $_ } @others;
but I could also say:
my ($key, $value) = each %hash;
die "bad" if grep { $value ne $_ } values %hash;
And the second bit of code is between twice as fast (tiny hashes) and five times as fast (mid-size and up):
#!/usr/bin/perl

use strict;
use warnings;

use Carp;
use Benchmark;

sub benchmark {
my $subs = shift;

my %results;
for my $sub (keys %$subs) {
$results{$sub} = $subs->{$sub}->();
}
my ($k, $v) = each %results;
croak "bad" if grep { $v ne $_ } values %results;

Benchmark::cmpthese -1, $subs;
}

for my $n (10, 100, 1_000, 10_000) {
my %h = map { $_ => $_ } 1 .. $n;
print "for $n:\n";
benchmark {
values => sub {
my ($first, @others) = values %h;
return join "", $first, @others;
},
each => sub {
my ($k, $v) = each %h;
return join "", values %h;
},
};
}
And yes, the reason I started thinking about this is the bit in the benchmark function. Of course, now that I am looking at it with a critical eye, I see it should be
my ($k, $sub) = each %$subs;
my $value = $sub->();
croak "bad" if first { $value ne $_->() } values %$subs

1 comment:

  1. You can use each in scalar context – it will return just the key. No need for the throw-away $value declaration.

    Also, keys in void context will reset the iterator without doing anything else; this is documented behaviour. Therefore you can safely use each in loops that won’t ever call any code that’s not under your control – eg. simple construction of a string from the data in a hash.

    ReplyDelete

Some limited HTML markup is allowed by blogger: strong, b, i, and a. You may also use em, but I have repurposed it through the magic of CSS to be behave very much like <tt><code></code></tt>.