With 4.7 billion visits a month, Wikipedia is one of the most visited properties on the internet.[1] It has a global website rank of 6, placing it above the likes of Amazon.com, Live.com and Twitter.com. Wikipedia currently hosts 5,324,724 articles containing more than 27 billion words.[2]
DBpedia is a project that was initiated on 10 January 2007 by researchers at the Free University of Berlin and Leipzig University, with the goal of converting the unstructured data within Wikipedia into a structured, queryable database. The unstructured content is converted into key-value pairs, which are stored in DBpedia and can be queried whenever the need arises.
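To make the key-value idea concrete, here is a minimal Python sketch, assuming an infobox that has already been parsed into a dictionary. Each infobox key becomes a predicate and each value an object, attached to a subject URI derived from the article title. The `http://dbpedia.org/resource/` and `http://dbpedia.org/property/` prefixes follow DBpedia's naming, but the function `infobox_to_triples` and the sample infobox are hypothetical. Real DBpedia extraction also normalizes property names, maps many of them onto its ontology and turns links into resource URIs, all of which this sketch skips.

```python
# Hypothetical infobox, as it might look after parsing a Wikipedia article.
INFOBOX = {
    "birth_place": "Ulm",
    "field": "Physics",
}

RESOURCE_PREFIX = "http://dbpedia.org/resource/"
PROPERTY_PREFIX = "http://dbpedia.org/property/"


def infobox_to_triples(title, infobox):
    """Turn one article's infobox key-value pairs into (subject, predicate, object) triples."""
    # The subject URI reuses the article title, with spaces replaced by underscores.
    subject = RESOURCE_PREFIX + title.replace(" ", "_")
    triples = []
    for key, value in infobox.items():
        # Each infobox key becomes a predicate; its value becomes the object.
        triples.append((subject, PROPERTY_PREFIX + key, value))
    return triples


for triple in infobox_to_triples("Albert Einstein", INFOBOX):
    print(triple)
```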
DBpedia knowledge base
DBpedia currently holds structured, queryable information on the following:
- 1,445,000 people
- 735,000 places
- 411,000 creative works, such as music and films
- 241,000 organizations
- 251,000 species
- 6,000 diseases
This data can be studied, queried and analysed to derive useful insights and drive innovation. Its 2014 release consists of more than 3 billion pieces of information extracted from Wikipedia.
How to use DBpedia
The datasets can be queried using SPARQL, the query language for RDF. RDF (Resource Description Framework) is the primary data model in DBpedia: each fact is stored as a subject-predicate-object triple. SPARQL queries can be run across multiple RDF datasets to extract insights and knowledge from the database.
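A minimal sketch of how a program might send a SPARQL query to DBpedia over HTTP, using only Python's standard library. It assumes the public endpoint at `https://dbpedia.org/sparql` accepts the query in a `query` parameter and a result format in a `format` parameter; check the endpoint's current documentation before relying on this. The query asks for the object of one triple: the `dbo:birthPlace` of the resource `dbr:Albert_Einstein`. The helpers `build_request_url` and `run_query` are hypothetical names for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: DBpedia's public SPARQL endpoint; availability and limits may change.
ENDPOINT = "https://dbpedia.org/sparql"

# One triple pattern: subject dbr:Albert_Einstein, predicate dbo:birthPlace, unknown object ?place.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?place WHERE {
  dbr:Albert_Einstein dbo:birthPlace ?place .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the result bindings (requires network access)."""
    with urlopen(build_request_url(query, endpoint), timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]


# Only the encoded URL is printed, so the sketch runs offline.
print(build_request_url(QUERY))
```

With network access, `run_query(QUERY)` would return the JSON result bindings, one dictionary per matching row.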
Real-world examples
- A SPARQL query may be used to derive accurate information on the average duration of all the movies made by a particular movie house.
- A SPARQL query may be used to derive accurate information on the number of documented cases of a particular disease in a particular location.
- A SPARQL query may also be used to find the number of tourists a particular attraction draws.
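The first example above could be expressed roughly as the following SPARQL query, generated here from Python. This is a sketch under assumptions: it treats a film's distributor (`dbo:distributor`) as the "movie house" and reads durations from `dbo:runtime`. DBpedia may record the producing studio under a different property, and the unit of `dbo:runtime` should be checked against the ontology before the average is interpreted. The helper `average_runtime_query` and the example distributor are hypothetical.

```python
DBR = "http://dbpedia.org/resource/"


def average_runtime_query(distributor):
    """Build a SPARQL query for the average runtime of films released by one distributor.

    Assumption: films are typed dbo:Film and carry dbo:distributor and dbo:runtime.
    """
    # Full IRI in angle brackets, so names with unusual characters need no escaping.
    iri = "<" + DBR + distributor + ">"
    # Doubled braces are literal { } in the f-string; AVG and COUNT aggregate over all matches.
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (AVG(?runtime) AS ?averageRuntime) (COUNT(?film) AS ?films) WHERE {{
  ?film a dbo:Film ;
        dbo:distributor {iri} ;
        dbo:runtime ?runtime .
}}"""


print(average_runtime_query("Walt_Disney_Studios_Motion_Pictures"))
```

The resulting string can be sent to the SPARQL endpoint like any other query. Writing the full IRI rather than a `dbr:` prefixed name sidesteps resource names such as `Warner_Bros.`, whose trailing period would otherwise need escaping.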
Quepy is a Python framework that converts natural-language questions into queries in a database query language such as SPARQL.
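Quepy's own API is not reproduced here. Instead, this toy Python sketch illustrates the underlying idea of turning a natural-language question into SPARQL: it matches the question against a few hand-written patterns and fills the matched entity into a query template. The patterns, the `question_to_sparql` function and the property choices (`dbo:birthPlace`, `dbo:director`) are assumptions for illustration; Quepy itself uses a richer grammar that matches on part-of-speech tags.

```python
import re

# Hypothetical question patterns, each mapped to a DBpedia ontology property.
PATTERNS = [
    (re.compile(r"^where was (?P<entity>.+) born\??$", re.IGNORECASE), "dbo:birthPlace"),
    (re.compile(r"^who directed (?P<entity>.+?)\??$", re.IGNORECASE), "dbo:director"),
]


def to_resource(name):
    """Turn a name into a DBpedia resource IRI (title with spaces replaced by underscores)."""
    return "<http://dbpedia.org/resource/" + name.strip().replace(" ", "_") + ">"


def question_to_sparql(question):
    """Return a SPARQL query for a recognised question, or None if no pattern matches."""
    for pattern, prop in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            subject = to_resource(match.group("entity"))
            return (
                "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
                f"SELECT ?answer WHERE {{ {subject} {prop} ?answer . }}"
            )
    return None


print(question_to_sparql("Where was Albert Einstein born?"))
```

A real system also has to resolve names to the right resource: many Wikipedia titles, especially of films, are disambiguated (for example with a "(film)" suffix), which this sketch ignores.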
The Bubble Navigator
The Bubble Navigator is used to visualize semi-structured data and derive insights from it, giving a visual overview of the entire dataset.
The DBpedia Lookup tool returns related DBpedia URIs for a given keyword. The resulting dataset can then be used for analysis and insights.
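The Lookup service itself searches all of DBpedia; the toy Python sketch below only illustrates the idea of going from a keyword to related resource URIs, over a small hypothetical list of titles. It relies on the convention that a DBpedia resource URI is `http://dbpedia.org/resource/` followed by the Wikipedia article title with spaces replaced by underscores. The function names and the ranking rule (exact matches first, then shorter titles) are assumptions for illustration, not how Lookup actually ranks results.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"

# Hypothetical local sample of article titles; the real Lookup service searches all of DBpedia.
TITLES = ["Berlin", "Berlin Wall", "Free University of Berlin", "Leipzig University", "Leipzig"]


def title_to_uri(title):
    """DBpedia resource URIs reuse the Wikipedia title, with spaces replaced by underscores."""
    return RESOURCE_PREFIX + title.replace(" ", "_")


def keyword_lookup(keyword, titles=TITLES):
    """Return URIs of titles containing the keyword: exact matches first, then shorter titles."""
    needle = keyword.lower()
    hits = [t for t in titles if needle in t.lower()]
    # False sorts before True, so exact (case-insensitive) matches come first.
    hits.sort(key=lambda t: (t.lower() != needle, len(t)))
    return [title_to_uri(t) for t in hits]


print(keyword_lookup("berlin"))
```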
DBpedia study on violence
The Dutch company LAB1100 queried DBpedia datasets to study every war in recorded history[3] and identify the bloodiest places on the planet. The study concluded that Europe had been the most violent continent over a span of 4,000 years.
Wikidata is a Wikimedia Foundation project that works similarly to DBpedia. It acts as a central store for the structured data of Wikipedia. The data can be accessed through the Lua Scribunto interface and the Wikidata API.
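As an illustration of the Wikidata API, this Python sketch builds a request to the `wbgetentities` action of `https://www.wikidata.org/w/api.php` for one item, here `Q42`, asking for its labels and statements. The `fetch_label` helper, the `User-Agent` string and the particular parameter choices are assumptions made for the example; only the URL is printed, so it runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def entity_url(entity_id, language="en"):
    """Build a wbgetentities request URL for one item's labels and statements (claims)."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels|claims",
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def fetch_label(entity_id, language="en"):
    """Fetch the item over HTTP and return its label (requires network access)."""
    # Wikimedia asks API clients to send a descriptive User-Agent; this one is a placeholder.
    request = Request(entity_url(entity_id, language),
                      headers={"User-Agent": "dbpedia-article-example/0.1"})
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["entities"][entity_id]["labels"][language]["value"]


print(entity_url("Q42"))
```

With network access, `fetch_label("Q42")` should return the item's English label.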
Difference between DBpedia and Wikidata
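In DBpedia, data is extracted from Wikipedia into DBpedia's own repository. In Wikidata, the structured data is maintained within the Wikimedia projects themselves rather than extracted from Wikipedia by a third party.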
The project relies on donations from individuals and companies alike to keep running and to cover its technology and hardware costs.