Homework 4
Due Jan 13
The problems in this homework refer to the following f_kd
matrix (one row per document, one column per term):
D1: 2 0 0 0 0 4 0 0 2 1 1 0 0 0 0 0
D2: 0 0 0 0 1 1 0 0 7 0 2 0 0 0 0 6
D3: 0 0 1 0 0 1 0 0 3 0 3 0 2 0 3 0
D4: 0 3 1 0 0 0 0 5 0 0 0 3 0 0 0 0
D5: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 3 0
D6: 0 0 0 0 1 0 0 0 3 0 0 2 0 0 0 0
D7: 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 3
D8: 0 0 1 0 0 0 0 3 0 0 0 2 0 1 0 0
D9: 0 1 2 0 0 0 2 0 3 0 0 0 2 1 0 1
D10: 0 0 0 0 1 1 0 0 0 1 0 5 0 0 0 0
For purposes of calculation, you are advised to load these values into
a spreadsheet program. Alternatively, you may prefer to write your own
code in C++ or Java to do the problems below.
1. Using the cosine similarity measure, compute the similarity matrix
(a 10 by 10 matrix). Find the average similarity of each document and
the overall average similarity.
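Hint for problem 1: a minimal sketch of the computation, written in Python
on a made-up 3-document toy matrix (not the homework data); the same logic
carries over to a spreadsheet, C++, or Java. The function names are
illustrative only. Two assumptions are made here that the problem statement
does not settle: a document's average similarity excludes its similarity to
itself (the diagonal), and an all-zero vector gets similarity 0.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_x * norm_y)

def similarity_matrix(docs):
    """Square matrix whose (i, j) entry is cosine(docs[i], docs[j])."""
    return [[cosine(a, b) for b in docs] for a in docs]

def average_similarities(sim):
    """Per-document and overall averages, skipping the diagonal (assumption)."""
    n = len(sim)
    per_doc = [
        sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    overall = sum(per_doc) / n
    return per_doc, overall

# Toy data for illustration only.
toy_docs = [[1, 0, 1], [0, 2, 0], [1, 1, 0]]
sim = similarity_matrix(toy_docs)
per_doc, overall = average_similarities(sim)
```

If your course convention includes the diagonal in the averages, drop the
`j != i` test and divide by `n` instead of `n - 1`.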
2. Rank the documents for the queries below, using the Dice coefficient:
Q1: 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0
Q2: 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
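Hint for problem 2: a minimal sketch of ranking by a similarity score, in
Python on toy data (not the homework matrix). It assumes the weighted-vector
form of the Dice coefficient, 2 * sum(x_i * y_i) / (sum(x_i^2) + sum(y_i^2)).
Some texts use sum(x_i) + sum(y_i) in the denominator instead, so check which
form your course notes define. The names `dice` and `rank` are illustrative.

```python
def dice(x, y):
    """Dice coefficient for weighted vectors (assumed form, see lead-in)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y)
    return 2 * dot / denom if denom else 0.0

def rank(docs, query, score):
    """Return (label, score) pairs, highest score first."""
    scored = [(label, score(vec, query)) for label, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data for illustration only.
toy_docs = {"A": [1, 0, 1], "B": [0, 2, 0], "C": [1, 1, 1]}
toy_query = [1, 0, 0]
ranking = rank(toy_docs, toy_query, dice)
# ranking lists A (2/3), then C (1/2), then B (0)
```

Because `rank` takes the scoring function as an argument, the same helper
can be reused with any other similarity measure.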
3. Repeat #2 using the Lp distance, for p=1, 2, and infinity.
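Hint for problem 3: the Lp distance between vectors x and y is
(sum |x_i - y_i|^p)^(1/p), and for p = infinity it becomes max |x_i - y_i|.
A minimal Python sketch on toy vectors (not the homework data) follows; the
function name `lp_distance` is illustrative, and the vectors are assumed to
be non-empty and of equal length.

```python
import math

def lp_distance(x, y, p):
    """Lp (Minkowski) distance; pass math.inf for the L-infinity case."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # limit of the general formula as p grows
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Toy vectors for illustration only; component differences are 3, 4, 0.
toy_x = [3, 0, 1]
toy_y = [0, 4, 1]
d1 = lp_distance(toy_x, toy_y, 1)           # 3 + 4 + 0 = 7
d2 = lp_distance(toy_x, toy_y, 2)           # sqrt(9 + 16) = 5
dinf = lp_distance(toy_x, toy_y, math.inf)  # max(3, 4, 0) = 4
```

Note that this is a distance, not a similarity: the best-matching document
is the one with the smallest value, so sort in ascending order when ranking.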
4. Repeat #2 using the cosine measure.