Homework 4

Due Jan 13


The problems in this homework refer to the following f_kd matrix, in which each row is a document (D1 through D10) and each of the 16 columns is a term:
 
 D1:  2   0   0   0   0   4   0   0   2   1   1   0   0   0   0   0
 D2:  0   0   0   0   1   1   0   0   7   0   2   0   0   0   0   6
 D3:  0   0   1   0   0   1   0   0   3   0   3   0   2   0   3   0
 D4:  0   3   1   0   0   0   0   5   0   0   0   3   0   0   0   0
 D5:  0   0   0   1   0   0   0   0   0   0   0   0   0   0   3   0
 D6:  0   0   0   0   1   0   0   0   3   0   0   2   0   0   0   0
 D7:  1   0   0   0   0   0   1   0   0   0   0   0   0   0   0   3
 D8:  0   0   1   0   0   0   0   3   0   0   0   2   0   1   0   0
 D9:  0   1   2   0   0   0   2   0   3   0   0   0   2   1   0   1
D10:  0   0   0   0   1   1   0   0   0   1   0   5   0   0   0   0
For the calculations, you are advised to load these values into a spreadsheet program. Alternatively, you may prefer to write your own code in C++ or Java to do the problems below.
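
If you write code instead of using a spreadsheet, loading the matrix might look like the sketch below. It uses Python purely for illustration (the handout suggests C++ or Java), and the names RAW and F are assumptions, not part of the assignment.

```python
# Rows copied from the matrix above: one row per document (D1..D10),
# one column per term (16 terms).
RAW = """
2 0 0 0 0 4 0 0 2 1 1 0 0 0 0 0
0 0 0 0 1 1 0 0 7 0 2 0 0 0 0 6
0 0 1 0 0 1 0 0 3 0 3 0 2 0 3 0
0 3 1 0 0 0 0 5 0 0 0 3 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 3 0
0 0 0 0 1 0 0 0 3 0 0 2 0 0 0 0
1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 3
0 0 1 0 0 0 0 3 0 0 0 2 0 1 0 0
0 1 2 0 0 0 2 0 3 0 0 0 2 1 0 1
0 0 0 0 1 1 0 0 0 1 0 5 0 0 0 0
"""

# Parse the text into a list of integer rows; F[d][k] is the entry
# for document d+1 and term k+1 (Python indices start at 0).
F = [[int(v) for v in line.split()] for line in RAW.strip().splitlines()]

# Sanity check on the transcription: 10 documents, 16 terms each.
assert len(F) == 10 and all(len(row) == 16 for row in F)
```

The shape check is cheap insurance against a dropped or duplicated value when copying the table by hand.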


1. Using the cosine similarity measure, compute the similarity matrix (a 10 by 10 matrix). Find the average similarity of each document and the overall average similarity.
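
One possible way to organize the computation for problem 1 is sketched below in Python, shown on the first three rows only. The function names are hypothetical. The sketch also assumes that a document's "average similarity" excludes its similarity with itself; if your course notes define it differently, adjust average_similarities accordingly.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def similarity_matrix(docs):
    """Square matrix S with S[i][j] = cosine(docs[i], docs[j])."""
    return [[cosine(x, y) for y in docs] for x in docs]

def average_similarities(S):
    """Per-document and overall averages.

    Assumption: the diagonal (self-similarity) is left out.
    """
    n = len(S)
    per_doc = [sum(S[i][j] for j in range(n) if j != i) / (n - 1)
               for i in range(n)]
    overall = sum(per_doc) / n
    return per_doc, overall

# First three rows of the matrix above, used only as a small demonstration.
D1 = [2, 0, 0, 0, 0, 4, 0, 0, 2, 1, 1, 0, 0, 0, 0, 0]
D2 = [0, 0, 0, 0, 1, 1, 0, 0, 7, 0, 2, 0, 0, 0, 0, 6]
D3 = [0, 0, 1, 0, 0, 1, 0, 0, 3, 0, 3, 0, 2, 0, 3, 0]

S = similarity_matrix([D1, D2, D3])
per_doc, overall = average_similarities(S)
for row in S:
    print(" ".join(f"{v:.4f}" for v in row))
```

Passing all ten rows to similarity_matrix would give the 10 by 10 matrix the problem asks for.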

2. Rank the documents for the queries below, using the Dice coefficient:
 Q1:  0   0   0   0   1   0   0   0   1   1   0   0   0   0   0   0
 Q2:  0   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0
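
A rough Python sketch of Dice-based ranking follows. It assumes the weighted-vector form of the Dice coefficient, 2*sum(x_i*y_i) / (sum(x_i^2) + sum(y_i^2)); some texts use sum(x_i) + sum(y_i) in the denominator instead, so check which definition was given in class. The helper names dice and rank are hypothetical, and only three documents are shown.

```python
def dice(x, y):
    """Dice coefficient, assuming the sum-of-squares denominator."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y)
    return 2 * dot / denom if denom else 0.0

def rank(docs, query, score):
    """Document names ordered from most to least similar to the query."""
    return sorted(docs, key=lambda name: score(docs[name], query), reverse=True)

# Three rows of the matrix above and query Q1, as a small demonstration.
docs = {
    "D1": [2, 0, 0, 0, 0, 4, 0, 0, 2, 1, 1, 0, 0, 0, 0, 0],
    "D2": [0, 0, 0, 0, 1, 1, 0, 0, 7, 0, 2, 0, 0, 0, 0, 6],
    "D6": [0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 0],
}
Q1 = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]

ranking = rank(docs, Q1, dice)
print(ranking)  # most similar document first
```

Because rank takes the scoring function as an argument, the same helper can be reused with any other similarity measure where a larger value means a better match.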

3. Repeat #2 using the Lp distance, for p=1, 2, and infinity.
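
The Lp distance can be sketched in Python as below; the function name lp_distance is an assumption. Note that this is a distance rather than a similarity, so when ranking, the document with the smallest value comes first.

```python
import math

def lp_distance(x, y, p):
    """Lp (Minkowski) distance; pass math.inf for the p = infinity case."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # In the limit p -> infinity, only the largest difference matters.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

# Row D6 of the matrix above and query Q1, as a small demonstration.
D6 = [0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 0]
Q1 = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]

for p in (1, 2, math.inf):
    print(p, lp_distance(D6, Q1, p))
```

To rank all documents, sort them by lp_distance in ascending order for each value of p.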

4. Repeat #2 using the cosine measure.