Saturday, June 22, 2013

Wikipedia Ontology Extraction: An Introduction

After going through various phases of trying to find myself in all the wrong places, I guess I might have finally stumbled into a position where I 'might be looking in a possibly-right place'. This time, I have begun to collaborate on a research project led by one of the most beloved lecturers in our department, Mr. Nisansa de Silva. On this blog I will only publish what he has already posted publicly on his website, plus minimal additional information, so as to avoid disclosing any sensitive material against his wishes.

I am one of the collaborators on this project, along with my friend Tharindu Amila Perera.

The project idea, in a nutshell, is to extract ontological relations between concepts from Wikipedia articles. Now what the hell is that?

Ontology

Ontology is a branch of philosophy that studies 'entities': their reality, existence, being and becoming, how they can be categorized, and the relationships among them. That sentence was a bunch of big words, but a simple example should give the gist of it. Imagine a classroom. There are various entities such as the teacher, students, blackboard, desks, chairs, pencils, chalk, dusters, lockers, etc. On their own, these entities make little sense, but we can categorize them, for example into "Living" and "Non-Living" entities. Doing so gives a rough 'concept map', as shown by the following lazy, crude MS Paint diagram.

Now I must say that this rough diagram is not a good basis for a good ontology, but it is one nonetheless. The black connecting lines mean generalization/specialization, so "Teacher is-a Living Entity" can be inferred from the diagram. The green lines show aggregation or ownership, for instance "Student has-a Pencil". This is but one of many ontological representations possible for the same classroom. I'm sure you get the picture.
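For the programmers reading this, here is a minimal sketch of how the same two kinds of relation could be written down in code. It is only an illustration and not anything from the project itself: the triple format, the function names `related` and `is_a`, and most of the edges are my own assumptions. Only "Teacher is-a Living Entity" and "Student has-a Pencil" come from the diagram description above.

```python
# Hypothetical encoding of the classroom example as (subject, relation, object)
# triples. Apart from "Teacher is-a Living Entity" and "Student has-a Pencil",
# the edges are assumed for illustration.
TRIPLES = [
    ("Teacher", "is-a", "Living Entity"),
    ("Student", "is-a", "Living Entity"),
    ("Pencil", "is-a", "Non-Living Entity"),
    ("Chalk", "is-a", "Non-Living Entity"),
    ("Student", "has-a", "Pencil"),
]


def related(subject, relation, triples=TRIPLES):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


def is_a(entity, category, triples=TRIPLES):
    """Follow is-a edges upward to check whether `entity` falls under `category`."""
    seen = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        if current == category:
            return True
        if current in seen:
            continue  # already expanded; guards against cycles
        seen.add(current)
        frontier.extend(related(current, "is-a", triples))
    return False


print(is_a("Teacher", "Living Entity"))
print(related("Student", "has-a"))
```

The black lines of the diagram correspond to the "is-a" triples and the green lines to the "has-a" triples. Because `is_a` walks the chain of is-a edges, it would keep working if more levels of categories were added later.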

Ontological relations in Wikipedia

Take any article in Wikipedia, and its title alone will bring related things to mind. A trivial example: if you search for "Sri Lankan Cuisine" in Wikipedia, the next thing that pops into your head may be "Chilli" or "Pepper" or "Spices" or "Kiribath" (or even the contemporary internet meme, "Papadam"). And most likely, you will find hyperlinks to some of those things (for example, "Spices" will definitely be in an article about "Sri Lankan Cuisine").


So the basic idea of this research project is to exploit the fact that articles within a domain are connected to one another through hyperlinks, and to use those connections between article titles as the premise for finding relations.
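To make the hyperlink idea a bit more concrete, here is a rough sketch of pulling link targets out of an article's wiki markup, where internal links are written as `[[Target]]` or `[[Target|label]]`. This is my own toy illustration and not the project's actual method. The sample text, the function names `extract_links` and `link_edges`, and the "links-to" relation label are all made up for the example.

```python
import re

# Matches internal wiki links: [[Target]], [[Target|label]], [[Target#Section|label]].
# Only the target title (group 1) is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Made-up snippet of wiki markup, NOT the real article text.
SAMPLE = (
    "Sri Lankan cuisine uses many [[Spice|spices]], including "
    "[[Black pepper|pepper]] and [[Chili pepper|chilli]]. "
    "[[Kiribath]] is a traditional dish."
)


def extract_links(wikitext):
    """Return the distinct link targets in `wikitext`, in order of first appearance."""
    targets = []
    seen = set()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target and target not in seen:
            seen.add(target)
            targets.append(target)
    return targets


def link_edges(title, wikitext):
    """Turn each hyperlink into a candidate (title, 'links-to', target) triple."""
    return [(title, "links-to", target) for target in extract_links(wikitext)]


for edge in link_edges("Sri Lankan cuisine", SAMPLE):
    print(edge)
```

Each triple is only a candidate: a hyperlink says two titles are connected, not whether the connection is an is-a, a has-a, or something else. This sketch also makes no attempt to filter out non-article links such as files or categories.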

For now, I guess I have bored you enough on this. I will keep you guys updated as I go.

Ciao...