What's the best way to manipulate and store huge ordered lists or hashes?
I have a a simple ordered list that could contain 1 million or more items. There are only a few actions that are done with this list:
- lookup in a value exist
- find the index for a value
- find value for index
- add a value
- get number of items in the list
Once a value is added to the list, it never changes. I append items to the list, no insert or delete.
I need to manipulate this big list, and store it persistently. Right now I am using a database Int => String to represent the list, but I think there should me a more efficient way to do that.
I could use memcached, but I think 2 functions are missing:
- persistent storage
- find the index for a value
如果你对这篇文章有疑问,欢迎到本站 社区 发帖提问或使用手Q扫描下方二维码加群参与讨论,获取更多帮助。

评论(2)

How big are your Items? How much memory do you mind using? Are you items unique?
You could probably get away with something like this:
my @list; # This keeps the ordered list
my %keyval; # This maps key to value
my %valkey; # This maps value to key
on each insert you would:
push @list, value;
$valkey{$value} = $#list;
$keyval{$#list} = $value;
And for each of your requirements:
#Existence of a value:
if(exists($valkey{$value}));
#Existence of an index;
if(exists($keyval{$index}));
#value for an index:
$keyval{$index};
#value for a key:
$valkey{$value};
#Size
$#list;

It appears that you also need a String -> Int
mapping table.
In Perl the easiest way to do this is to tie
a hash to a DBM File (see man perltie
).
Sample code, untested, could almost certainly be improved:
use DB_File;
tie %value2index, 'DB_File', 'value2index';
tie %index2value, 'DB_File', 'index2value';
sub index_count() {
return scalar %value2index;
}
sub value_exists() {
my $value = shift;
return exists($value2index{$value});
}
sub append() {
my $value = shift;
if (!value_exits($value)) { # prevent duplicate insertions
my $index = index_count() + 1;
$value2index{$value} = $index;
$index2value{$index} = $value;
}
}
sub find_index() {
my $value = shift;
return $value2index{$value};
}
sub find_value() {
my $index = shift;
return $index2value{$index};
}
Don't use this in a multi-threaded environment, there are non-atomic operations here.
发布评论
需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。