

Huffman code &
knapsack problem
Algorithms
and Data Structures
Problems
- Huffman codes
- the knapsack problem:
  - the 0-1 knapsack problem
  - the fractional knapsack problem
Huffman codes
Suppose we have a 100,000-character data file that we wish to store compactly.
We observe that the characters in the file occur with the frequencies given in
the table below. That is, only six different characters appear, and the
character a occurs 45,000 times.

                            a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16     9     5
Fixed-length codeword     000   001   010   011   100   101
Variable-length codeword    0   101   100   111  1101  1100

Fixed-length code:    3 · 100,000 = 300,000 bits
Variable-length code: (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1,000 = 224,000 bits
Prefix codes
" We consider here only codes in which no codeword is also a prefix
of some other codeword. Such codes are called prefix codes.
" Encoding is always simple for any binary character code; we just
concatenate the codewords representing each character of the file.
For example, with the variable-length prefix code of the table, we
code the 3-character file  abc as 0·101·100 = 0101100, where we
use  · to denote concatenation.
" Prefix codes are desirable because they simplify decoding. Since no
codeword is a prefix of any other, the codeword that begins an
encoded file is unambiguous. We can simply identify the initial
codeword, translate it back to the original character, and repeat the
decoding process on the remainder of the encoded file. In our
example, the string 001011101 parses uniquely as 0 · 0 · 101 ·
1101, which decodes to  aabe .
              a     b     c     d     e     f
prefix code   0   101   100   111  1101  1100
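
To make the decoding procedure concrete, here is a minimal Python sketch (not
part of the original slides; the names `code` and `decode` are mine). It relies
only on the prefix property: the first codeword matching a prefix of the
remaining input is unambiguous.

```python
# Prefix code from the table above (character -> codeword).
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def decode(bits, code):
    """Decode a bit string by repeatedly identifying the initial codeword."""
    inverse = {w: c for c, w in code.items()}    # codeword -> character
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in inverse:            # because no codeword is a prefix of another,
            out.append(inverse[buf])  # the first complete match is the right one
            buf = ''
    return ''.join(out)

print(decode('0101100', code))    # abc
print(decode('001011101', code))  # aabe
```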
Trees
" We interpret the binary codeword for a character as the path from the root to
that character, where 0 means "go to the left child" and 1 means "go to the
right child
100
100
0 1
1
0
86 14 a:45 55
0 1 0 0 1
58 28 25 30
14
0 1 0 1 0 1 0 1 0 1
a:45 b:13 c:12 d:16 e:9 f:5 14
c:12 b:13 d:16
0 1
f:5 e:9
" Given a tree T corresponding to a prefix code, it is a simple
matter to compute the number of bits required to encode a
file. For each character c in the alphabet C, let f(c) denote
the frequency of c in the file and let dT(c) denote the depth of
c's leaf in the tree. Note that dT(c) is also the length of the
B(T ) = f (c)dT (c)
codeword for character c. The number of bits required to
"
encode a file is thus B(T) which we define as the cost of the
c" C
tree T
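
As a concrete check of the formula (a small Python sketch, not from the
slides; the dictionaries `f` and `d` simply restate the example's frequencies
and leaf depths):

```python
# f[c]: frequency in thousands; d[c]: depth of c's leaf in the optimal tree,
# i.e. the length of its codeword.
f = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
d = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

B = sum(f[c] * d[c] for c in f)   # B(T) = sum of f(c) * d_T(c) over c in C
print(B)   # 224, i.e. 224,000 bits for the 100,000-character file
```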
Huffman codes - Code
" We assume that C is a set of n characters and that each character
c"C is an object with a defined frequency f [c]. The algorithm builds
the tree T corresponding to the optimal code in a bottom-up manner.
It begins with a set of |C| leaves and performs a sequence of |C| - 1
"merging" operations to create the final tree. A min-priority queue Q,
keyed on f , is used to identify the two least-frequent objects to
merge together. The result of the merger of two objects is a new
object whose frequency is the sum of the frequencies of the two
objects that were merged.
Huffman(C)
{1} n := |C|
{2} Q := C
{3} for i := 1 to n-1 do
{4}     z := Allocate_Node()
{5}     x := left[z] := Extract_Min(Q)
{6}     y := right[z] := Extract_Min(Q)
{7}     f[z] := f[x] + f[y]
{8}     Insert(Q, z)
{9} return Extract_Min(Q)
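
The same algorithm as runnable Python (a sketch of my own using the standard
heapq module as the min-priority queue; the function name `huffman_codes` and
the tuple node representation are not from the slides):

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman code for {character: frequency}; return {character: codeword}."""
    tick = count()   # tie-breaker, so the heap never has to compare node tuples
    # Q initially contains one leaf (char, None, None) per character, keyed on frequency.
    heap = [(f, next(tick), (c, None, None)) for c, f in freq.items()]
    heapq.heapify(heap)
    for _ in range(len(freq) - 1):             # |C| - 1 merging operations
        fx, _, x = heapq.heappop(heap)         # the two least-frequent objects...
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tick), (None, x, y)))  # ...merged into one
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):                    # 0 = left child, 1 = right child
        char, left, right = node
        if char is not None:
            codes[char] = prefix or '0'        # guard for a one-character alphabet
        else:
            walk(left, prefix + '0')
            walk(right, prefix + '1')
    walk(root, '')
    return codes

print(huffman_codes({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))
```

The exact bit patterns may differ from the table (left/right choices depend on
how ties are broken), but the codeword lengths (1, 3, 3, 3, 4, 4) and hence the
cost B(T) = 224,000 bits come out the same.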
Example
Starting from the leaves ordered by frequency (f:5, e:9, c:12, b:13, d:16,
a:45), the two least-frequent nodes are merged repeatedly:
- f:5  + e:9   ->  14
- c:12 + b:13  ->  25
- 14   + d:16  ->  30
- 25   + 30    ->  55
- a:45 + 55    -> 100 (root)
Huffman codes - complexity
" Q is implemented as a binary minheap. For a set C of n
characters, the initialization of Q in line 2 can be
performed in O(n) time using the BUILD-MIN-HEAP
procedure. The for loop in lines 3-8 is executed exactly
n-1 times, and since each heap operation requires time
O (lg n), the loop contributes O (n lg n) to the running
time. Thus, the total running time of HUFFMAN on a set
of n characters is O (n lg n).
letters and (interesting) frequency:
a:1 b:1 c:2 d:3 e:5 f:8 g:13 h:21
a:1 b:1 c:2 d:4 e:8 f:16 g:32 h:64
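
Running the `huffman_codes` sketch from above on these two frequency sets
illustrates the point (my own check, assuming that sketch is in scope):

```python
fib  = {'a': 1, 'b': 1, 'c': 2, 'd': 3, 'e': 5, 'f': 8, 'g': 13, 'h': 21}
pow2 = {'a': 1, 'b': 1, 'c': 2, 'd': 4, 'e': 8, 'f': 16, 'g': 32, 'h': 64}

for freq in (fib, pow2):
    lengths = sorted(len(w) for w in huffman_codes(freq).values())
    print(lengths)   # [1, 2, 3, 4, 5, 6, 7, 7] for both: a maximally skewed tree
```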
Knapsack problems
" Knapsack problems:

The 0-1 knapsack problem is posed as follows. A
thief robbing a store finds n items; the ith item is worth
vi dollars and weighs wi pounds, where vi and wi are
integers. He wants to take as valuable a load as
possible, but he can carry at most W pounds in his
knapsack for some integer W. Which items should he
take? (This is called the 0-1 knapsack problem
because each item must either be taken or left
behind; the thief cannot take a fractional amount of an
item or take an item more than once.)

In the fractional knapsack problem, the setup is the
same, but the thief can take fractions of items, rather
than having to make a binary (0-1) choice for each
item.
Solution
Fractional knapsack problem - solution (greedy algorithm):
- compute the value per pound v_i/w_i for each item
- take as much as possible of the item with the greatest value per pound; if
  the supply of that item is exhausted and you can still carry more, take as
  much as possible of the item with the next greatest value per pound, and so
  forth until you can't carry any more
- because of sorting the items by value per pound, the greedy algorithm runs
  in O(n lg n) time
For the 0-1 knapsack problem this greedy strategy does not work!
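
A minimal Python sketch of the greedy procedure (the function name
`fractional_knapsack` and the (value, weight) pair representation are my own;
the items used below are the ones from the example on the next slide):

```python
def fractional_knapsack(items, W):
    """items: list of (value, weight) pairs; returns the maximum total value."""
    total = 0.0
    # Sort by value per pound, greatest first, then fill the knapsack greedily.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if W <= 0:
            break
        take = min(weight, W)              # whole item if it fits, otherwise a fraction
        total += value * take / weight
        W -= take
    return total

# $60 / 10 lb, $100 / 20 lb, $120 / 30 lb, knapsack capacity 50 lb:
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # 240.0
```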
Example
Three items: $60 / 10 lb, $100 / 20 lb, $120 / 30 lb (value per pound: $6, $5,
$4); the knapsack holds 50 lb.
- Optimal solution for the fractional knapsack problem: all of items 1 and 2
  plus 20 lb of item 3, value = $240.
- Optimal solution for the 0/1 knapsack problem: items 2 and 3, value = $220.
- Non-optimal solutions for the 0/1 knapsack problem: the greedy choice by
  value per pound takes items 1 and 2 and then cannot fit item 3, giving only
  $160; items 1 and 3 give only $180.
0/1 knapsack problem - solution
" 0/1 knapsack problem  solution: (dynamic programming) Knowing
the best solutions for items from a set {v1,v2,& ,vi} for a knapsack
with capacity from 1 to W, we can find a formula to compute the best
solutions for items from a set {v1,v2,& ,vi,vi+1}
n
witi d" W
"
i= 1
ti " {0,1}
n
viti max
"
i= 1
0 x e" 0,i = 0
Å„Å‚
ôÅ‚
F[i, x] = - " x < 0,i = 0
òÅ‚
Complexity:
ôÅ‚
max{F[i - 1, x],(F[i - 1, x - wi ] + vi )} 1 d" i d" n
ół
O(nW)
F[i, x]
i
i - 1
x - wi x W
W - 1
1
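
A bottom-up Python sketch of this recurrence (my own illustration; iterating x
over 0..W turns the x < 0 / -∞ case into an ordinary bounds check, and the
function name `knapsack_01` is assumed):

```python
def knapsack_01(weights, values, W):
    """Maximum value of a subset of items with total weight at most W."""
    n = len(weights)
    # F[i][x] = best value using the first i items with capacity x; row 0 is all zeros.
    F = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for x in range(W + 1):
            F[i][x] = F[i - 1][x]                           # leave item i behind
            if x >= w:                                      # take item i if it fits
                F[i][x] = max(F[i][x], F[i - 1][x - w] + v)
    return F[n][W]
```

Both time and space are O(nW); the chosen items can be recovered afterwards by
walking back from F[n][W] and checking, row by row, whether each value came
from F[i-1][x] or from F[i-1][x - w_i] + v_i.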
Example
        v1  v2  v3  v4  v5  v6  v7
weight   3  10   8   6   2   9
value    5  12  10   7   1  11
W = 20

Filling the table F[i, x] row by row (columns x = 1 .. 20; rows i >= 3 not yet filled):
i=2: 0 0 5 5 5 5 5 5 5 12 12 ...
i=1: 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
x:   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Example
        v1  v2  v3  v4  v5  v6  v7
weight   3  10   8   6   2   9
value    5  12  10   7   1  11
W = 20

The completed table F[i, x], columns x = 1 .. 20:
i=7:
i=6: 0 1 5 5 6 7 7 10 12 12 15 16 17 17 18 19 22 23 24 26
i=5: 0 1 5 5 6 7 7 10 12 12 15 15 17 17 18 19 22 22 24 24
i=4: 0 0 5 5 5 7 7 10 12 12 15 15 17 17 17 19 22 22 24 24
i=3: 0 0 5 5 5 5 5 10 10 12 15 15 17 17 17 17 17 22 22 22
i=2: 0 0 5 5 5 5 5 5 5 12 12 12 17 17 17 17 17 17 17 17
i=1: 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
x:   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Example
        v1  v2  v3  v4  v5  v6  v7
weight   3  10   8   6   2   9
value    5  12  10   7   1  11
W = 14

F[i, x], columns x = 1 .. 14:
i=7:
i=6: 0 1 5 5 6 7 7 10 12 12 15 16 17 17
i=5: 0 1 5 5 6 7 7 10 12 12 15 15 17 17
i=4: 0 0 5 5 5 7 7 10 12 12 15 15 17 17
i=3: 0 0 5 5 5 5 5 10 10 12 15 15 17 17
i=2: 0 0 5 5 5 5 5 5 5 12 12 12 17 17
i=1: 0 0 5 5 5 5 5 5 5 5 5 5 5 5
x:   1 2 3 4 5 6 7 8 9 10 11 12 13 14
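
For a quick cross-check, feeding the six weight/value pairs listed above into
the `knapsack_01` sketch matches the largest entries shown in these tables
(my own check; the table header also mentions a seventh item, which is left
out here):

```python
weights = [3, 10, 8, 6, 2, 9]
values  = [5, 12, 10, 7, 1, 11]

print(knapsack_01(weights, values, 20))   # 26, e.g. the items of weight 3, 8 and 9
print(knapsack_01(weights, values, 14))   # 17, e.g. the items of weight 3, 2 and 9
```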

