PHP: filtering duplicate characters from mixed Chinese/English strings

灵芸, posted 2017-07-07 · 442 characters · 1225 views · 1 reply

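I collected a large batch of names from a database. They contain Chinese, English, and "Martian script" (火星文) characters. I want to filter them so that each distinct character is kept only once: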

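$chars = file_get_contents("log");

$charss = "";
$len = iconv_strlen($chars, "UTF-8");
for ($i = 0; $i < $len; $i++) {
    $char = iconv_substr($chars, $i, 1, "UTF-8");

    // The third argument of iconv_strpos() is the offset; the charset is the fourth.
    // Compare with === false, because a match at position 0 is also falsy.
    if (iconv_strpos($charss, $char, 0, "UTF-8") === false) {
        $charss .= $char;
    }
}

echo $charss;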

Comments (1)

灵芸 2017-08-02 #1

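I think there are two optimizations to make:

1. Chunking: once the string is split into chunks, the traversal no longer has to seek from the start of the whole string every time.
2. Fast lookup: replace the character (string search) check with an integer check.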
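As a small illustration of point 2, here is a minimal sketch of an integer-keyed lookup. It is not the code from this post: the helper name `char_key` is hypothetical, it uses `mb_ord()` (PHP 7.2+) rather than the byte-to-integer conversion in the code that follows, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): map one UTF-8 character to an integer key.
function char_key(string $char): int
{
    // mb_ord() returns the Unicode code point of the first character (PHP 7.2+).
    return mb_ord($char, "UTF-8");
}

$seen = [];     // set of characters already seen, keyed by integer
$result = '';
foreach (["中", "a", "中", "b"] as $char) {
    $key = char_key($char);
    if (!isset($seen[$key])) {   // integer-key lookup instead of a string search
        $seen[$key] = true;
        $result .= $char;
    }
}
echo $result, "\n"; // prints: 中ab
```

An `isset()` on an array key takes roughly constant time, whereas searching the accumulated result string takes longer as that string grows.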

<?php
$str = "中国人abc蓅氓浍娬朮国人ac娬朮";

// Build roughly 1 MiB of test data from whole copies of the sample,
// so that no multibyte character is cut in half at the end.
$str = str_repeat($str, intdiv(1024 * 1024, strlen($str)));

mb_internal_encoding("UTF-8");

$time = time();
$len = mb_strlen($str);

// Process the string in groups of 500 characters
$group_size = 500;
$group_total = (int) ceil($len / $group_size);
$chars = array();   // characters already seen, keyed by integer
$result = '';
for ($i = 0; $i < $group_total; $i++) {
    $tmp = mb_substr($str, 0, $group_size);
    // Once a group has been taken, drop it from the front of the string
    $str = mb_substr($str, $group_size);
    if ($i % 50 == 49) {
        printf("process %d groups, total %d groups, run time %d\n", $i + 1, $group_total, time() - $time);
    }

    // Process the characters of the current group
    $tmp_len = mb_strlen($tmp);
    for ($j = 0; $j < $tmp_len; $j++) {
        // Read from the current group ($tmp), indexed by $j
        $char = mb_substr($tmp, $j, 1);
        // Turn the character's bytes into an integer so the check is an isset() on an int key
        $num = hexdec(bin2hex($char));
        if (isset($chars[$num])) {
            continue;
        }
        $chars[$num] = 1;
        $result .= $char;
    }
}

var_dump($result);
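
For comparison, a more compact variant of the same idea is sketched below. It is a different technique from the code in this post, not the author's method: it splits the string into characters once with `preg_split()` and the `/u` modifier, then uses the characters themselves as array keys. The function name `unique_chars` is hypothetical, and the sketch assumes the input is valid UTF-8 and small enough to hold as an array of characters in memory.

```php
<?php
// Hypothetical helper, not from the original post; assumes valid UTF-8 input.
function unique_chars(string $text): string
{
    // Split into characters in one pass; the /u modifier makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $seen = [];     // characters already emitted, used as array keys
    $result = '';
    foreach ($chars as $char) {
        if (!isset($seen[$char])) {
            $seen[$char] = true;
            $result .= $char;   // keep the first occurrence, in original order
        }
    }
    return $result;
}

echo unique_chars("中国人abc蓅氓浍娬朮国人ac娬朮"), "\n";
// prints: 中国人abc蓅氓浍娬朮
```

Because the string is split only once, this avoids calling `mb_substr()` repeatedly, at the cost of holding every character in an array at the same time.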

Output (as originally posted):

 process 50 groups, total 870 groups, run time 4
process 100 groups, total 870 groups, run time 9
process 150 groups, total 870 groups, run time 13
process 200 groups, total 870 groups, run time 18
process 250 groups, total 870 groups, run time 22
process 300 groups, total 870 groups, run time 27
process 350 groups, total 870 groups, run time 31
process 400 groups, total 870 groups, run time 35
process 450 groups, total 870 groups, run time 40
process 500 groups, total 870 groups, run time 44
process 550 groups, total 870 groups, run time 48
process 600 groups, total 870 groups, run time 52
process 650 groups, total 870 groups, run time 56
process 700 groups, total 870 groups, run time 60
process 750 groups, total 870 groups, run time 65
process 800 groups, total 870 groups, run time 69
process 850 groups, total 870 groups, run time 73
string(27) "氓娬蓅cab人国朮中浍"