Chapter7——基础算法——哈希、二分

作者: crishawy | 来源:发表于2019-07-27 22:16 被阅读0次

Chapter7——基础算法——哈希、二分
消息传递-缓存-转发流程
算法
2021-04-28
『算法』『数据结构』浅谈二分算法，理解程序员必懂必会的计算机常
剑指Offer.C++.code6-10
高频题
29.算法入门
数据结构与算法-排序/二分查找
数据结构与算法之美笔记——二分查找（下）

1. 题目列表

POJ3349（数组hash）
POJ3274（问题转换+数组hash、树状数组）
POJ2151（待完成，概率DP）
POJ1840（待完成）
POJ2002（待完成，二维树状数组）
POJ2503（STL,字符串二分，字典树）

2. POJ3349——Snowflake Snow Snowflakes

2.1 题目描述

Description

You may have heard that no two snowflakes are alike. Your task is to write a program to determine whether this is really true. Your program will read information about a collection of snowflakes, and search for a pair that may be identical. Each snowflake has six arms. For each snowflake, your program will be provided with a measurement of the length of each of the six arms. Any pair of snowflakes which have the same lengths of corresponding arms should be flagged by your program as possibly identical.

Input

The first line of input will contain a single integer n, 0 < n ≤ 100000, the number of snowflakes to follow. This will be followed by n lines, each describing a snowflake. Each snowflake will be described by a line containing six integers (each integer is at least 0 and less than 10000000), the lengths of the arms of the snow ake. The lengths of the arms will be given in order around the snowflake (either clockwise or counterclockwise), but they may begin with any of the six arms. For example, the same snowflake could be described as 1 2 3 4 5 6 or 4 3 2 1 6 5.

Output

If all of the snowflakes are distinct, your program should print the message:
No two snowflakes are alike.
If there is a pair of possibly identical snow akes, your program should print the message:
Twin snowflakes found.

Sample Input

2
1 2 3 4 5 6
4 3 2 1 6 5

Sample Output

Twin snowflakes found.

2.2 解决思路

给定n个数组，判断n个数组是否存在至少两个数组按顺时针或逆时针顺序相同。

枚举判断时间复杂度O(n^{2})超时，
使用hash思想：将每个数组映射到一个key 
 
    由于本题中数组不考虑顺序，因此，我们考虑将数组的
    和作为该数组的唯一标识(key)。
    
    第一步的hash基于这样的事实，如果两个数组和不同，则
    一定不同。通过第一步的hash，基本能够区分两个不同的数组，
    但不完全。
    
    因为两个数组和相同，他们也不一定相同， 
    因此，每一个key值可能存在不同的数组，
    
    对于某个数组，我们遍历其key对应的其他数组，
    判断是否完全匹配即可。

2.3 代码

#include <cstdio>
#include <cmath>
#include <vector>
#include <cstring>
#include <algorithm>
#include <ctime>
using namespace std;
/*
    给定n个数组，判断n个数组排序过后，是否存在
    至少两个数组相同。
    
    枚举判断时间复杂度O(n^{2})超时，
    使用hash思想：将每个数组映射到一个key 
     
        由于本题中数组不考虑顺序，因此，我们考虑将数组的
        和作为该数组的唯一标识(key)。
        
        第一步的hash基于这样的事实，如果两个数组和不同，则
        一定不同。通过第一步的hash，基本能够区分两个不同的数组，
        但不完全。
        
        因为两个数组和相同，他们也不一定相同， 
        因此，每一个key值可能存在不同的数组，
        
        对于某个数组，我们遍历其key对应的其他数组，
        判断是否完全匹配即可。 
*/ 
const int prime = 999983; // prime建议取值10n内的最大素数，能够最大化的离散key，最小化冲突 
const int maxn = 100010;
vector<int> hashtable[prime + 1];
int a[maxn][7];

bool judge(int a[], int b[], int len){
    // 顺时针或逆时针匹配 clockwise or counterclockwise
    for (int i = 0; i < 6; i++){
        if (a[0] == b[i] && a[1] == b[(i + 1) % 6] &&
        a[2] == b[(i + 2) % 6] && a[3] == b[(i + 3) % 6]
        && a[4] == b[(i + 4)] % 6 && a[5] == b[(i + 5) % 6]
        || a[0] == b[i] && a[1] == b[(i + 5) % 6]
        && a[2] == b[(i + 4) % 6] && a[3] == b[(i + 3) % 6]
        && a[4] == b[(i + 2) % 6] && a[5] == b[(i + 1) % 6]
        )
            return true;
    }
    return false; 
}

int main(){
    int n, e;
    scanf("%d", &n);
    for (int i = 0; i < n; i++){
        int sum = 0;
        for (int j = 0; j < 6; j++){
            scanf("%d", &e);
            a[i][j] = e;
            sum += e;
        } 
        sum %= prime;
        hashtable[sum].push_back(i); // 加入该数组的索引 
    }
    bool flag = false;
    for (int i = 0; i < n; i++){
        if (flag) break;
        int sum = 0, cnt = 0;
        for (int j = 0; j < 6; j++)
            sum += a[i][j];
        sum %= prime;
        // 如果 
        for (int k = 0; k < hashtable[sum].size(); k++){
            if (judge(a[i], a[hashtable[sum][k]], 6)){
                if (++cnt > 1){
                    flag = true;
                    break;
                }
            }
        }
    }
    if (flag) printf("Twin snowflakes found.\n");
    else printf("No two snowflakes are alike.\n");
    return 0;
}

注意：

散列时的prime应选择10n以内的最大素数，可以最大化离散程度，最小化冲突。

3. POJ3274——Gold Balanced Lineup（重要）

3.1 题目描述

Description

Farmer John's N cows (1 ≤ N ≤ 100,000) share many similarities. In fact, FJ has been able to narrow down the list of features shared by his cows to a list of only K different features (1 ≤ K ≤ 30). For example, cows exhibiting feature #1 might have spots, cows exhibiting feature #2 might prefer C to Pascal, and so on.

FJ has even devised a concise way to describe each cow in terms of its "feature ID", a single K-bit integer whose binary representation tells us the set of features exhibited by the cow. As an example, suppose a cow has feature ID = 13. Since 13 written in binary is 1101, this means our cow exhibits features 1, 3, and 4 (reading right to left), but not feature 2. More generally, we find a 1 in the 2^(i-1) place if a cow exhibits feature i.

Always the sensitive fellow, FJ lined up cows 1..N in a long row and noticed that certain ranges of cows are somewhat "balanced" in terms of the features the exhibit. A contiguous range of cows i..j is balanced if each of the K possible features is exhibited by the same number of cows in the range. FJ is curious as to the size of the largest balanced range of cows. See if you can determine it.

Input

Line 1: Two space-separated integers, N and K.
Lines 2..N+1: Line i+1 contains a single K-bit integer specifying the features present in cow i. The least-significant bit of this integer is 1 if the cow exhibits feature #1, and the most-significant bit is 1 if the cow exhibits feature #K.
Output

Line 1: A single integer giving the size of the largest contiguous balanced group of cows.

Sample Input

7 3
7
6
7
2
1
4
2
Sample Output

4
Hint

In the range from cow #3 to cow #6 (of size 4), each feature appears in exactly 2 cows in this range

3.3 解决思路

该问题本质是求最大连续相等的列和长度，需要转化问题

第一个关键点在于引入sum[n][k]数组，其中sum[i][j]为
第1头牛到第i头牛特征为j的个数(类似于dp)。从而将i~j范围
的比较计算转换为i、j的比较计算，并初始化sum[0][1~k]=0
 
如果满足banlanced,则从第i头牛到第j头牛的每个特征个数 
之和相等，即
    sum[i][0] - sum[j][0] = sum[i][1] - sum[j][1] =
    ... = sum[i][k-1] - sum[j][k-1]

为了使用hash方法，这里最巧妙的转化是将深度的纵向比较 
计算转为短平的横向计算。 
上式通过移项可以转化为：
    sum[i][1] - sum[i][0] = sum[j][1] - sum[j][0],
    sum[i][2] - sum[i][0] = sum[j][2] - sum[j][0],
    ...
    sum[i][k-1] - sum[i][0] = sum[j][k-1] - sum[j][0]
因此，将sum[i][l]和sum[j][l],l=0,1,...,k-1,都减去第一
个值得到新的序列a[n][k]，只需要求解相距最远的相同两行间
的距离。
由于寻找的数组长度过长达到1000000，因此将行数组hash，
快速搜索，key(i) = abs(sum(a[i][j])),j=0,1,...,k-1
注意处理非负的key。

本题有三个关键点：
1. 如何求解最大连续区间问题：定义sum数组，记录前i个
指标和。
2. 如何高效查询数组中的给定值，根据数组元素进行hash，
若数组元素是一个数组的话，使用数组和作为key；若数组
元素是string的话，使用set或map实现hash
3. key的选择：10n内的最大素数

3.4 代码

#include <cstdio>
#include <cmath>
#include <algorithm>
#include <vector>
#include <set>
using namespace std;

const int maxn = 100010;
const int maxk = 35;
const int prime = 99983; 
int sum[maxn][maxk], a[maxn][maxk]; 
vector<int> hashtable[prime + 1];
set<int> key_rec;  // 使用set存储key

bool judge(int a[], int b[], int len){
    for (int i = 0; i < len; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
void output(int n, int k){
    for (int i = 0; i < n; i++){
        for (int j = 0; j < k; j++)
            printf("%d", sum[i][j]);
        printf("\n");
    }
    printf("\n");
    for (int i = 0; i < n; i++){
        for (int j = 0; j < k; j++)
            printf("%d", a[i][j]);
        printf("\n");
    }
}
int main(){
    int n, k, f, res = 0;
    scanf("%d%d", &n, &k);
    fill(sum[0], sum[0] + k, 0);
    fill(a[0], a[0] + k, 0);
    hashtable[0].push_back(0);
    key_rec.insert(0);
    for (int i = 1; i <= n; i++){
        int key = 0;
        scanf("%d", &f);
        for (int j = 0; j < k; j++){
            sum[i][j] = sum[i - 1][j] + f % 2;
            // 计算a数组
            a[i][j] = sum[i][j] - sum[i][0]; 
            f /= 2;
            key = (key + a[i][j]) % prime;
        }
        // key可能存在负值
        key = abs(key); 
        hashtable[key].push_back(i);
        key_rec.insert(key);
    }
//  output(n, k);
    int MAX = 0;
    // 枚举所有hash的key
    set<int>::iterator iter;
    for (iter = key_rec.begin(); iter != key_rec.end(); iter++){
        // 判断一个key中是否存在两个相等的数组
        int key = *iter;
        if (hashtable[key].size() < 2) continue;
        for (int i = 0; i < hashtable[key].size() - 1; i++){
             for (int j = i + 1; j < hashtable[key].size(); j++){
                int index1 = hashtable[key][i];
                int index2 = hashtable[key][j];
//              printf("index1:%d, index2:%d\n", index1, index2);
                if (judge(a[index1], a[index2], k)){
                    if (abs(index1 - index2) > MAX)
                        MAX = abs(index1 - index2);
                 }
             } 
        }   
    }
    printf("%d\n", MAX);
    return 0;
}

Chapter7——基础算法——哈希、二分
1. 题目列表 POJ3349（数组hash） POJ3274（问题转换+数组hash、树状数组） POJ2151...
消息传递-缓存-转发流程
消息传递缓存查找哈希查找三种查找方式缓存 -> 哈希算法查找当前类 -> 已排序二分查找算法未排序 ...
算法
一.算法基础--算法的特性二.算法基础--算法的复杂度三.顺序查找和二分查找顺序查找二分查找(前提是有序的...
2021-04-28
现在学的东西没有输出，打算把以前收藏的一堆杂的知识来吸收下必学基础 10个算法：递归、排序、二分查找、搜索、哈希...
『算法』『数据结构』浅谈二分算法，理解程序员必懂必会的计算机常
基本认识二分算法，又名二分查找、折半查找，是一种查找算法，是最基础的，最简单易学且高效实用的算法之一。二分算法的...
剑指Offer.C++.code6-10
（1）排序和查找是面试考察算法的重点，如二分查找、归并排序、快速排序等；（2）查找：顺序查找、二分查找、哈希表查找...
高频题
字符串数学数组二叉树二分查找链表动态规划贪心算法堆广度优先搜索深度优先哈希表回溯算法设计
29.算法入门
算法与数据结构基础一、基础算法思想二分：递推：枚举：递归：分治：贪心：试探：模拟：二、简单数据结...
数据结构与算法-排序/二分查找
算法中基础中的基础，排序/二分查找排序 1.快排QuickSort 归并排序堆排序 1. 二分查找
数据结构与算法之美笔记——二分查找（下）
摘要: 基础的二分查找算法无论是概念还是实现都比较简单（关于二分查找基础实现文章可点击此处查看），但二分查找存...