What's the best way to manipulate and store huge ordered lists or hashes?

What's the best way to manipulate and store huge ordered lists or hashes?

眼泪淡了忧伤 发布于 2021-11-28 字数 771 浏览 911 回复 2 原文

I have a a simple ordered list that could contain 1 million or more items. There are only a few actions that are done with this list:

  • lookup in a value exist
  • find the index for a value
  • find value for index
  • add a value
  • get number of items in the list

Once a value is added to the list, it never changes. I append items to the list, no insert or delete.

I need to manipulate this big list, and store it persistently. Right now I am using a database Int => String to represent the list, but I think there should me a more efficient way to do that.

I could use memcached, but I think 2 functions are missing:

  • persistent storage
  • find the index for a value

如果你对这篇文章有疑问,欢迎到本站 社区 发帖提问或使用手Q扫描下方二维码加群参与讨论,获取更多帮助。

扫码加入群聊

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

作死小能手 2022-06-07 2 楼

How big are your Items? How much memory do you mind using? Are you items unique?

You could probably get away with something like this:

my @list; # This keeps the ordered list
my %keyval; # This maps key to value
my %valkey; # This maps value to key

on each insert you would:

push @list, value;
$valkey{$value} = $#list;
$keyval{$#list} = $value;

And for each of your requirements:

#Existence of a value:
if(exists($valkey{$value}));

#Existence of an index;
if(exists($keyval{$index}));

#value for an index:
$keyval{$index};

#value for a key:
$valkey{$value};

#Size
$#list; 
稍尽春風 2022-06-07 1 楼

It appears that you also need a String -> Int mapping table.

In Perl the easiest way to do this is to tie a hash to a DBM File (see man perltie).

Sample code, untested, could almost certainly be improved:

use DB_File;
tie %value2index, 'DB_File', 'value2index';
tie %index2value, 'DB_File', 'index2value';

sub index_count() {
    return scalar %value2index;
}

sub value_exists() {
    my $value = shift;
    return exists($value2index{$value});
}

sub append() {
    my $value = shift;
    if (!value_exits($value)) { # prevent duplicate insertions
        my $index = index_count() + 1;
        $value2index{$value} = $index;
        $index2value{$index} = $value;
    }
}

sub find_index() {
    my $value = shift;
    return $value2index{$value};
}

sub find_value() {
    my $index = shift;
    return $index2value{$index};
}

Don't use this in a multi-threaded environment, there are non-atomic operations here.