Cornell University, Ithaca, NY, USA, 29 September 2008
The DBLP Computer Science Bibliography from the University of Trier now includes more than 1.1 million bibliographic records. For CS researchers, the DBLP web site is a popular tool for tracing the work of colleagues and for retrieving bibliographic details when compiling the reference lists of new papers. Another, more controversial use of DBLP is the ranking and profiling of persons, institutions, journals, or conferences.
You may download the DBLP data set for your own experiments. The bibliographic records are contained in a huge XML file (dblp.xml[.gz] in http://dblp.uni-trier.de/xml/). We are aware of more than 400 publications that mention using these data for an amazing variety of purposes (for examples, search the ACM Guide or SpringerLink for papers that mention DBLP).
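Because the file is huge, it is best processed as a stream rather than loaded into memory at once. The following is a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`. It assumes that each record is a child element of `<dblp>` named after its type (`article`, `inproceedings`, ...), carries a `key` attribute, and lists its authors in `<author>` children. The tag set below is an assumption, not an authoritative list. The sample data is made up. One further caveat: the real dblp.xml relies on character entities declared in dblp.dtd, which this parser does not load, so the actual file would need a DTD-aware parser (or entity pre-processing) in place of the plain `ET.iterparse` used here.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

# Assumed record element names (children of <dblp>); not an exhaustive list.
RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis", "www"}


def iter_records(source) -> Iterator[Tuple[str, str, List[str]]]:
    """Stream (record type, key, authors) tuples without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            authors = [a.text or "" for a in elem.findall("author")]
            yield elem.tag, elem.get("key", ""), authors
            elem.clear()  # drop the record's children to limit memory growth


# Made-up sample data in the assumed dblp.xml shape.
SAMPLE = b"""<?xml version="1.0"?>
<dblp>
  <article key="journals/example/DoeR08"><author>Jane Doe</author><author>Richard Roe</author><title>A made-up title.</title><year>2008</year></article>
  <inproceedings key="conf/example/Doe08"><author>Jane Doe</author><title>Another made-up title.</title><year>2008</year></inproceedings>
</dblp>"""

for rec in iter_records(io.BytesIO(SAMPLE)):
    print(rec)
# ('article', 'journals/example/DoeR08', ['Jane Doe', 'Richard Roe'])
# ('inproceedings', 'conf/example/Doe08', ['Jane Doe'])
```

For the compressed download, a binary file object such as `gzip.open("dblp.xml.gz", "rb")` could be passed as `source` instead of the in-memory sample.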
Additionally, we now provide simple direct access to every XML record (try conf/sigmod/GuptaDG08). The DBLP person search can also return its results in XML, and the keys of all records of a person can be listed in XML, too (try JG). This very basic DBLP API should make it easy to develop software that explores the DBLP data dynamically.
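A client of such a record-level API mainly needs to fetch one small XML document per key and parse it. The sketch below assumes that a fetched record looks like an entry of dblp.xml, either as the root element or wrapped in a `<dblp>` element. The URL pattern is deliberately left to the caller: `url_template` is a hypothetical placeholder, not the actual DBLP address scheme. The sample record is made up.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_record(xml_bytes: bytes) -> dict:
    """Extract type, key, title and authors from one DBLP-style record document."""
    root = ET.fromstring(xml_bytes)
    # Assumption: the record is the root itself or the first child of a wrapper.
    rec = root if root.get("key") else root[0]
    title = rec.find("title")
    return {
        "type": rec.tag,
        "key": rec.get("key"),
        # itertext() also collects text inside markup such as <i> in titles
        "title": "".join(title.itertext()) if title is not None else None,
        "authors": [a.text for a in rec.findall("author")],
    }


def fetch_record(key: str, url_template: str) -> dict:
    """Fetch and parse a record by key.

    url_template is a HYPOTHETICAL pattern like "http://example.org/rec/{key}.xml";
    substitute the real record URL scheme before use.
    """
    with urllib.request.urlopen(url_template.format(key=key)) as resp:
        return parse_record(resp.read())


# Made-up record for offline use.
sample = (b'<dblp><inproceedings key="conf/example/DoeR08">'
          b'<author>Jane Doe</author><author>Richard Roe</author>'
          b'<title>A <i>made-up</i> title.</title><year>2008</year>'
          b'</inproceedings></dblp>')
print(parse_record(sample)["title"])  # A made-up title.
```

Keeping parsing separate from fetching lets the same code handle records from the API and records taken from a local copy of the XML file.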
The goal of this talk is not to present new scientific insights, but to inform you about (as yet undocumented) details of the DBLP data and the (emerging) DBLP API. This knowledge may help you use the DBLP web site and the DBLP data more efficiently in your research. There are nice applications of the DBLP data for (undergraduate) students, too: our students seem to find it more fun to test their shortest-path algorithms on the coauthor graph than on a synthetic test graph ("What is your advisor's distance to E. F. Codd?") ...
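Since coauthorship edges are unweighted, breadth-first search is enough to find such distances. Here is a minimal sketch, assuming each publication is given as a list of author names (for example, as extracted from dblp.xml). Names are treated as exact strings, so the spelling under which a person such as E. F. Codd appears in DBLP is an assumption the caller has to get right. The toy graph is made up.

```python
from collections import defaultdict, deque
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set


def build_coauthor_graph(author_lists: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Undirected graph: an edge joins every pair of authors of a common paper."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for authors in author_lists:
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def coauthor_distance(graph: Dict[str, Set[str]], source: str,
                      target: str) -> Optional[int]:
    """Breadth-first search; returns the number of hops, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor == target:
                return dist + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None


# Made-up toy data: each inner list is one paper's author list.
papers = [["Alice", "Bob"], ["Bob", "Carol"], ["Carol", "Dave"], ["Erin"]]
g = build_coauthor_graph(papers)
print(coauthor_distance(g, "Alice", "Dave"))  # 3
print(coauthor_distance(g, "Alice", "Erin"))  # None
```

On the real coauthor graph, the same call with an advisor's name and Codd's DBLP name as arguments would answer the question above, provided both names match DBLP's spelling exactly.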