LeetCode | 4. Median of Two Sort

作者: 听这一刻的晚风 | 来源:发表于2019-03-30 23:06 被阅读3次

「每日一道算法题」Median of Two Sorted Ar
[LeetCode] 4. Median of Two Sort
LeetCode - 4. Median of Two Sort
[LeetCode] 4. Median of Two Sort
LeetCode --4. Median of Two Sort
[LeetCode] 4. Median of Two Sort
LeetCode | 4. Median of Two Sort
[leetcode题解]Median of Two Sorted
leetcode-004 Median of Two Sorte
leetcode第4题求两个数组的中位数

题目链接：https://leetcode.com/problems/median-of-two-sorted-arrays/
题目难度：Hard
题目描述：

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

You may assume nums1 and nums2 cannot be both empty.

Example 1:

nums1 = [1, 3]
nums2 = [2]

The median is 2.0

Example 2:

nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5

相关主题： Array, Binary Search, Divide and Conquer

思路 1

假设数列的元素数目为 $n$ ，若 $n$ 是奇数，则中位数是第 $\frac{n}{2} + 1$ 个数；若 $n$ 是偶数，则中位数是第 $\frac{n}{2}$ 个数和第 $\frac{n}{2}+1$ 个数的平均数。一种思路是直接从开头搜索两个数组，并记录搜索的元素数目。但这样做的时间复杂度为 $O(m+n)$ ，不满足题目 $O(\text{log}(m+n))$ 的要求。

虽然思路 1 时间复杂度为 $O(m+n)$ ，但是如果实现得当，实际上运行速度也很快（可能是因为内存开销少），比如下面网站上其他人提交的这份答案。这份代码用 aLeft 和 bLeft 两个指针分别对两个数组进行遍历，用minIndex 变量来统计已经遍历到的元素总数目，当 minIndex 到达中位数的位置时，停止遍历。
时间复杂度： $O(m+n)$
空间复杂度： $O(1)$

// C++, copied from someone's submission on LeetCode
class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        int minIndex = 0;    // to count the elements less than or equal to median

        int aLeft = 0, bLeft = 0;
        int aRight = nums1.size() - 1, bRight = nums2.size() - 1;

        int mLeft, mRight;
        double middle = 0;
        int value = 0;
        unsigned long size = nums1.size() + nums2.size();
        if (size % 2 == 0) {
            mLeft = size / 2 - 1;
            mRight = mLeft + 1;
        } else {
            mLeft = mRight = size / 2;
        }

        for (int i = 0; i < size; ++i) {
            if ((bLeft > bRight) ||
                (aLeft <= aRight && nums1[aLeft] <= nums2[bLeft])) {
                value = nums1[aLeft];
                aLeft++;
            } else {
                value = nums2[bLeft];

                if (bLeft <= bRight) {
                    bLeft++;
                }
            }

            if (minIndex == mLeft) {
                middle = value;
            }

            if (minIndex == mRight) {
                middle = (middle + value) / 2.0;
                break;
            }

            minIndex++;
        }
        return middle;
    }
};

思路 2

自己想了一个 $O(\text{log}(m+n))$ 的思路，类似于“寻找第 K 大的数”，只不过寻找的范围是两个排序好的数组。假设两个数组为 A 和 B，元素数目分别为 m 和 n，就可以每次取 A[m/2] 和 B[n/2]进行比较，然后每次递归缩小搜索的范围。这样做好像有些复杂，没有考虑清楚该怎么写这个递归。

参考了一下官方的解决方案，没看懂。之后我找到了 Frank Dai 的《LeetCode 题解》（题目2.1.5），发现他的思路跟我的有些像。

经过对比，我发现他们的思路和我的思路最大的区别就在于取的索引的不同。在每次迭代时，我的思路是取 A、B 的两个中间位置进行判断，即 i = A.size()/2、j = B.size()/2，这样在考虑第 K 大的数究竟落到哪个范围时就更麻烦。而且我在考虑的时候，还尝试一次把第 K 大和第 K+1 大两个元素都找出来，又把问题变复杂了一点。而参考的两个思路在考虑时，i 和 j 都满足由 K 来约束的关系（例如 i + j = K），这样在确定第 K 大的数落在哪个范围时会更简单一些。结合他们的方法，下面重新整理一下思路。

给定两个有序数组 A、B，元素个数分别为 m 和 n，寻找它们的并集中第 k 大的数。我们在 A 中找到前 i = k/2 大的数，在 B 中找到前 j = k - i 大的数，这样就可以确保索引位置是跟 k 联系在一起的。始终保证两个数组中 B 的元素较多，考虑到 A 中元素的数目有可能小于 k/2，所以取 i = min(m, k/2)。起始的搜索范围是 A[0...end] 和 B[0...end]。一般会有下面有三种情况：

A[i-1] < B[j-1]：说明第 k 大的数一定不在 A[0...i-1] 这个范围内，将搜索范围缩小至 A[i...end] 和 B[0...end]（注：由于 k 的约束，这时实际的搜索范围是 A[i...end] 和 B[0...j-1]，没有必要再对 B 的搜索范围额外约束，下面的情况也是同理），寻找第 k-i 大的数；
A[i-1] > B[j-1]：说明第 k 大的数一定不在 B[0...j-1] 这个范围内，将搜索范围缩小至 A[0...end] 和 B[j...end]，寻找第 k-j 大的数；
A[i-1] == B[j-1]：说明已经找到了第 k 大的数，返回 A[i-1] 或者 B[j-1]。

考虑递归函数的退出条件：

如果 m == 0，说明数组 A 的搜索范围是 0，则返回 B[k-1]；
如果 k == 1，这时 i = k/2 的值为0，无法索引 A[i-1]，返回 min(A[0], B[0])；
如果 A[i-1] == B[j-1]，返回 A[i-1] 或者 B[j-1]。

下面是一个具体的例子：

初始情况：A = {1, 3, 5, 7}，B = {2, 4, 6, 8, 10}，m = 4, n = 5, 因为总元素数目为奇数，设置初始 k = (m+n)/2 = 5。取 i = k/2 = 2，j = k - i = 3，这会将数组 A、B 各分为红、蓝两个部分：
$A: \{\color{red}{1, 3 }, \color{blue}{5, 7 } \}$
$B: \{\color{red}{2, 4, 6 }, \color{blue}{8, 10 } \}$
此时，A[i-1] 的值为 3，B[j-1] 的值为 6，A[i-1] < B[j-1]，此时第 k 大的数一定不在 A 的红色部分中，将这部分从搜索范围内剔除掉，并更新 k = k - i = 3：
$A: \{\xcancel{ \color{red}{1, 3 } }, \color{blue}{5, 7 } \}$
$B: \{\color{red}{2, 4, 6}, \color{blue}{8, 10 } \}$
更新后 A = {5, 7}，B = {2, 4, 6, 8, 10}，m = 2, n = 5, k = 3。取 i = k/2 = 1，j = k - i = 2，再次将数组 A、B 各分为红、蓝两个部分：
$A: \{\color{red}{5 }, \color{blue}{7 } \}$
$B: \{\color{red}{2, 4 }, \color{blue}{6, 8, 10 } \}$
此时，A[i-1] 的值为 5，B[j-1] 的值为 4，A[i-1] > B[j-1]，此时第 k 大的数一定不在 B 的红色部分中，将这部分从搜索范围内剔除掉，并更新 k = k - j = 1：
$A: \{\color{red}{5}, \color{blue}{7 } \}$
$B: \{\xcancel{\color{red}{2, 4 } }, \color{blue}{6, 8, 10 } \}$
更新后 A = {5, 7}，B = {6, 8, 10}，m = 1，n = 3，k = 1。由于 k == 1，返回 min(A[0], B[0]) = min(5, 6) = 5，最终得到中位数为 5。

时间复杂度： $O(\text{log}(m+n))$
空间复杂度： $O(\text{log}(m+n))$

// C++
int find_the_kth_largest_num(vector<int>::const_iterator A, int m,
                             vector<int>::const_iterator B, int n, int k)
{
    if (m > n) return find_the_kth_largest_num(B, n, A, m, k);
    if (m == 0) return *(B + k - 1);
    if (k == 1) return min(*A, *B);
    int i = min(m, k/2), j = k - i;
    if (*(A + i - 1) < *(B + j - 1)) {
        return find_the_kth_largest_num(A + i, m - i, B, n, k - i);
    } else if (*(A + i - 1) > *(B + j - 1)) {
        return find_the_kth_largest_num(A, m, B + j, n - j, k - j);
    } else {
        return *(A + i - 1);
    }
}

class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        int m = nums1.size(), n = nums2.size();
        if ((m + n) % 2 != 0) {
            return find_the_kth_largest_num(nums1.begin(), m, nums2.begin(), n,
                                            (m + n) / 2 + 1);
        } else {
            double a = find_the_kth_largest_num(nums1.begin(), m, nums2.begin(), n,
                                         (m + n) / 2);
            double b = find_the_kth_largest_num(nums1.begin(), m, nums2.begin(), n,
                                         (m + n) / 2 + 1);
            return (a + b) / 2.;
        }
    }
};

因为上面的思路在数组总数目为偶数时需要查找两次，所以我又尝试了在偶数情况下的优化，希望能减少一些搜索的开销。具体来说，当数组总数目为奇数时，跟原来一样，寻找第 $\frac{n}{2} + 1$ 大的数；当数组总数目为偶数时，寻找第 $\frac{n}{2}$ 大的数，并且不退出递归，在即将返回时继续寻找第 $\frac{n}{2} + 1$ 大的数，最后返回二者的平均值。然而实际提交时，时间消耗跟原来几乎没有差别。
时间复杂度： $O(\text{log}(m+n))$
空间复杂度： $O(\text{log}(m+n))$

// C++
double find_the_kth_largest_num_optimize(vector<int>::const_iterator A, int m,
                                         vector<int>::const_iterator B, int n,
                                         int k, bool find_next)
{
    if (m > n)
        return find_the_kth_largest_num_optimize(B, n, A, m, k, find_next);
    if (m == 0) {
        if (!find_next) {
            return *(B + k - 1);
        } else {
            double a = *(B + k - 1);
            double b = *(B + k);
            return (a + b) / 2.;
        }
    }
    if (k == 1) {
        if (!find_next) {
            return min(*A, *B);
        } else {
            double a, b;
            if (*A < *B) {
                a = *A;
                if (m > 1) {
                    b = min(*(A+1), *B);
                } else {
                    b = *B;
                }
            } else if (*A == *B) {
                return *A;
            } else {
                a = *B;
                if (n > 1) {
                    b = min(*A, *(B + 1));
                } else {
                    b = *A;
                }
            }
            return (a + b) / 2.;
        }
    }
    int i = min(m, k / 2), j = k - i;
    if (*(A + i - 1) < *(B + j - 1)) {
        return find_the_kth_largest_num_optimize(A + i, m - i, B, n, k - i,
                                                 find_next);
    } else if (*(A + i - 1) > *(B + j - 1)) {
        return find_the_kth_largest_num_optimize(A, m, B + j, n - j, k - j,
                                                 find_next);
    } else {
        if (!find_next) {
            return *(A + i - 1);
        } else {
            double a = *(A + i - 1);
            double b;
            if (i <= m && j <= n) {
                b = min(*(A + i), *(B + j));
            } else if (j <= n) {
                b = *(B + j);
            } else if (i <= m) {
                b = *(A + i);
            } else {
                exit(-1);
            }
            return (a + b) / 2.;
        }
    }
}

class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        int m = nums1.size(), n = nums2.size();
        bool find_next = ((m + n) % 2 == 0);
        if (find_next) {
            return find_the_kth_largest_num_optimize(
                nums1.begin(), m, nums2.begin(), n, (m + n) / 2, find_next);
        } else {
            return find_the_kth_largest_num_optimize(
                nums1.begin(), m, nums2.begin(), n, (m + n) / 2 + 1, find_next);
        }
    }
};

2019年03月31日

「每日一道算法题」Median of Two Sorted Ar
Algorithm OJ address Leetcode website : 4. Median of Two ...
[LeetCode] 4. Median of Two Sort
原题思路一：题目要求O(log (m+n))复杂度，一开始想到的是(m+n)，遂试了下，发现居然也过了。。新建一...
LeetCode - 4. Median of Two Sort
题目描述 There are two sorted arrays nums1 and nums2 of size ...
[LeetCode] 4. Median of Two Sort
There are two sorted arrays nums1 and nums2 of size m and...
LeetCode --4. Median of Two Sort
陆陆续续在LeetCode上刷了一些题，一直没有记录过，准备集中整理记录一下
[LeetCode] 4. Median of Two Sort
问题描述两个有序的数组 nums1 和 nums2 ，它们的数组长度分别为 m 和 n。要求找到这两个数组的中位...
LeetCode | 4. Median of Two Sort
题目链接：https://leetcode.com/problems/median-of-two-sorted-a...
[leetcode题解]Median of Two Sorted
4. Median of Two Sorted Arrays 题目 https://leetcode.com/pr...
leetcode-004 Median of Two Sorte
[TOC] P004 Median of Two Sorted Arrays There are two sort...
leetcode第4题求两个数组的中位数
@(LeetCode)[数组] leetcode 4 Median of Two Sorted Arrays描述：...

LeetCode | 4. Median of Two Sort

思路 1

思路 2

相关文章

「每日一道算法题」Median of Two Sorted Ar

[LeetCode] 4. Median of Two Sort

LeetCode - 4. Median of Two Sort

[LeetCode] 4. Median of Two Sort

LeetCode --4. Median of Two Sort

[LeetCode] 4. Median of Two Sort

LeetCode | 4. Median of Two Sort

[leetcode题解]Median of Two Sorted

leetcode-004 Median of Two Sorte

leetcode第4题求两个数组的中位数

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

程序员

算法提高之LeetCode刷题

算法