

1)

Some years are unknown, e.g., "(????)" entries.
Some titles have multiple genres, e.g., "106 & Park Top 10 Live": Music and News.
Some titles have multiple entries, e.g., TV series.
Some actors have multiple entries, e.g., "Macleod, Ben".
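
The unknown-year case can be handled while parsing a title. A minimal sketch, assuming the year appears in parentheses as in "Some Title (1999)" or "Some Title (????)"; the name parse_year and the sample titles are hypothetical and are not taken from lists2dat.py:

```python
import re

# Matches a parenthesised year token: four digits or four question marks.
YEAR_RE = re.compile(r"\((\d{4}|\?{4})\)")


def parse_year(title):
    """Return the year as an int, or None if it is unknown or absent."""
    match = YEAR_RE.search(title)
    if match is None or match.group(1) == "????":
        return None
    return int(match.group(1))
```

Returning None for "(????)" lets later steps decide explicitly whether to drop or keep such entries.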

2) 

dataset 1 - imdb1.*:

Movies in the range [1990, 2010] with at least 100000 votes:
421 movies
2,938,702 edges
19,014 actors (every actor has a label)
edge probabilities: minProbab=0.047619047619 maxProbab=0.380952380952
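
The selection rule for dataset 1 can be sketched as follows, assuming each movie is available as a (title, year, votes) tuple. That record layout and the name select_movies are hypothetical, and skipping entries with an unknown year is an assumption rather than documented behaviour. Both range bounds are inclusive:

```python
def select_movies(movies, first_year=1990, last_year=2010, min_votes=100000):
    """Keep movies with a known year in [first_year, last_year] and enough votes."""
    return [
        (title, year, votes)
        for title, year, votes in movies
        # Assumption: an unknown year is represented as None and is skipped.
        if year is not None
        and first_year <= year <= last_year
        and votes >= min_votes
    ]
```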

3) useful commands:

number of actors:
$ grep -c '",' imdb.dat

number of relations:
$ grep -v '"' imdb.dat | grep -c ,
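
The same two counts can be obtained in one pass from Python. This is a sketch that only mirrors the grep filters above (a line containing `",` counts as an actor; a line with a comma and no double quote counts as a relation); the function name count_actors_and_relations is hypothetical, and nothing else about the layout of imdb.dat is assumed:

```python
def count_actors_and_relations(lines):
    """Count actor and relation lines using the same tests as the grep commands."""
    actors = 0
    relations = 0
    for line in lines:
        # Actor lines: contain a closing double quote followed by a comma.
        if '",' in line:
            actors += 1
        # Relation lines: contain a comma but no double quote at all.
        if '"' not in line and "," in line:
            relations += 1
    return actors, relations
```

Any iterable of lines works, so an open file handle for imdb.dat can be passed directly.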

4) to generate the dataset:

$ ./lists2dat.py


5) IMDB dataset files retrieved on August 13, 2012.

The scripts that retrieve the dataset files are in the directory "scripts".
To retrieve all files, run:

$ ./getfiles.sh

