greedy algorithm贪心法

greedy algorithm贪心法

概念

A greedy algorithm constructs a solution to the problem by always making a choice that looks the best at the moment. A greedy algorithm never takes back its choices, but directly constructs the final solution. For this reason, greedy algorithms are usually very efficient.

The difficulty in designing greedy algorithms is to find a greedy strategy that always produces an optimal solution to the problem. The locally optimal choices in a greedy algorithm should also be globally optimal. It is often difficult to argue that a greedy algorithm works.

每一步行动总是按某种指标选取最优的操作进行，该指标只看眼前，并不考虑可能造成的影响。可想而知，并不是所有的时候贪心法都能获得最优解，所以一般使用贪心法的时候，都要确保自己能证明其正确性。

Coin problem硬币问题

As a first example, we consider a problem where we are given a set of coins and our task is to form a sum of money n using the coins. The values of the coins are coins={c_1,_c_2,…,_ck}, and each coin can be used as many times we want. What is the minimum number of coins needed?

硬币的面值有{1, 2, 5, 10, 20, 50, 100, 200}，如果要取得520元，我们至少需要4枚硬币。200+200+100+20 = 520

然而，贪心并不一直都是最佳方案。比如硬币的面值有{1, 3, 4}，如果需要取得6元，使用贪心的思路，每一次取最大的，取的方案是{4, 1, 1}，三枚硬币。然而，我们知道最佳方案是{3, 3}，只需要两枚硬币。（这时，我们就会需要动态规划）

Scheduling时间安排问题

Given n events with their starting and ending times, find a schedule that includes as many events as possible.

event   starting time    ending time
  A            1               3
  B            2               5
  C            3               9
  D            6               8

算法1
The first idea is to select as short events as possible. In the example case this algorithm selects the following events:

However, selecting short events is not always a correct strategy. For example, the algorithm fails in the following case:

If we select the short event, we can only select one event. However, it would be possible to select both long events.

算法2
Another idea is to always select the next possible event that begins as early as possible. This algorithm selects the following events:

However, we can find a counter example also for this algorithm. For example, in the following case, the algorithm only selects one event:

If we select the first event, it is not possible to select any other events. However, it would be possible to select the other two events.

算法3
The third idea is to always select the next possible event that ends as early as possible. This algorithm selects the following events:

[证明]
It turns out that this algorithm always produces an optimal solution. The reason for this is that it is always an optimal choice to first select an event that ends as early as possible. After this, it is an optimal choice to select the next event using the same strategy, etc., until we cannot select any more events.

One way to argue that the algorithm works is to consider what happens if we first select an event that ends later than the event that ends as early as possible. Now, we will have at most an equal number of choices how we can select the next event. Hence, selecting an event that ends later can never yield a better solution, and the greedy algorithm is correct.

Tasks and deadlines任务截止时间问题

Let us now consider a problem where we are given n tasks with durations and deadlines and our task is to choose an order to perform the tasks. For each task, we earn d − x points where d is the task’s deadline and x is the moment when we finish the task. What is the largest possible total score we can obtain?

task     duration     deadline
 A          4             2
 B          3             5
 C          2             7
 D          4             5

In this case, an optimal schedule for the tasks is as follows:

In this solution, C yields 5 points, B yields 0 points, A yields −7 points and D yields −8 points, so the total score is −10.

Surprisingly, the optimal solution to the problem does not depend on the deadlines at all, but a correct greedy strategy is to simply perform the tasks sorted by their durations in increasing order. The reason for this is that if we ever perform two tasks one after another such that the first task takes longer than the second task, we can obtain a better solution if we swap the tasks. For example, consider the following schedule:

Here a > b, so we should swap the tasks:

[证明]
Now X gives b points less and Y gives a points more, so the total score increases by a − b > 0. In an optimal solution, for any two consecutive tasks, it must hold that the shorter task comes before the longer task. Thus, the tasks must be performed sorted by their durations.（假设，右边边界是得分deadline，交换完之后，x得分少得了b分，y得到多了a分，a-b>0，总共会得更多的分数。所以，正确）

Minimizing sums求最小的和

We next consider a problem where we are given n numbers a_1,_a_2,…,_an and our task is to find a value x that minimizes the sum

我们讨论一下，当 c = 1的情况和当 c = 2 的情况

当 c = 1的情况

|a1 - x| + |a2 - x| + ... + |an - x|

For example, if the numbers are [1, 2, 9, 2, 6], the best solution is to select x = 2 which produces the sum

|1 - 2| + |2 - 2| + |9 - 2| + |2 - 2| + |6 - 2| = 1 + 0 + 7 + 0 + 4 = 12

In the general case, the best choice for x is the median of the numbers, i.e., the middle number after sorting. For example, the list [1, 2, 9, 2, 6] becomes [1, 2, 2, 6, 9] after sorting, so the median is 2.

The median is an optimal choice, because if x is smaller than the median, the sum becomes smaller by increasing x, and if x is larger then the median, the sum becomes smaller by decreasing x. Hence, the optimal solution is that x is the median. If n is even and there are two medians, both medians and all values between them are optimal choices.

当 c = 2的情况

(a1 - x)^2 + (a2 - x)^2 + ... + (an - x)^2

For example, if the numbers are [1, 2, 9, 2, 6], the best solution is to select x = 4.

(1 - 4)^2 + (2 - 4)^2 + (9 - 4)^2 + (2 - 4)^2 + (6 - 4)^2 = 46

In the general case, the best choice for x is the average of the numbers. In the example the average is (1+2+9+2+6)/5 = 4.

The last part does not depend on x, so we can ignore it. The remaining parts form a function greedy algorithm贪心法 - 图11 , where greedy algorithm贪心法 - 图12 . This is a parabola opening upwards with roots x = 0 and x = 2s/n, and the minimum value is the average of the roots x = s/n, i.e., the average of the numbers a_1,_a_2,…,_an.