# CPFAQ: Classifying Quora Topics

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

If you study the current Quora topic ontology for competitive programming, it’s clear that it needs some work. The six top-level topics don’t cover all of the subjects that people ask about. And many related topics aren’t even in the ontology, usually because people created them without selecting parent topics.

Here are two steps to create a better ontology: 1) evaluate the related topics that already exist, and add each one to the appropriate place in the ontology, and 2) write a clear description of what questions belong in each topic.

## Evaluating the Current Ontology

As I have been doing for other page elements, I looked into how to extract the topic ontology from the page HTML. One challenge is that the topic ontology page (/organize) is only available to logged-in users, so it’s more complex to extract the page contents programmatically. For now, I just saved the pages manually, since there are fewer than 150 relevant topics.

Once I had the HTML, I looked for identifiers I could use in XPath to extract the ontology for each topic. The result:

• Parents of the topic are listed in a div with class name TopicParents. Each parent has an anchor tag with class name TopicNameLink.
• Children of the topic are listed in a div with class name TopicChildren. The children anchor tags have the same TopicNameLink class name.
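As a rough illustration of the extraction described above, here is a minimal sketch that uses Python's standard-library `html.parser` in place of XPath, so it runs without third-party packages. It assumes the saved page uses the `TopicParents`, `TopicChildren`, and `TopicNameLink` class names exactly as listed. The `TopicOntologyParser` and `extract_ontology` names, the sample HTML, and its topic URLs are hypothetical and only there for demonstration.

```python
from html.parser import HTMLParser


class TopicOntologyParser(HTMLParser):
    """Collect (name, href) pairs for a topic's parents and children."""

    # Assumed mapping from container div class name to result key.
    SECTIONS = {"TopicParents": "parents", "TopicChildren": "children"}

    def __init__(self):
        super().__init__()
        self.links = {"parents": [], "children": []}
        self._section = None  # result key of the container div we are inside
        self._div_depth = 0   # div nesting depth inside that container
        self._link = None     # [section, href, text parts] while inside a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._section is not None:
                self._div_depth += 1
            else:
                for class_name, section in self.SECTIONS.items():
                    if class_name in classes:
                        self._section, self._div_depth = section, 1
                        break
        elif tag == "a" and self._section is not None and "TopicNameLink" in classes:
            self._link = [self._section, attrs.get("href") or "", []]

    def handle_data(self, data):
        if self._link is not None:
            self._link[2].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link is not None:
            section, href, parts = self._link
            self.links[section].append(("".join(parts).strip(), href))
            self._link = None
        elif tag == "div" and self._section is not None:
            self._div_depth -= 1
            if self._div_depth == 0:
                self._section = None


def extract_ontology(html):
    """Return {'parents': [...], 'children': [...]} for one saved topic page."""
    parser = TopicOntologyParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical stand-in for a manually saved topic page.
SAMPLE_HTML = """
<div class="TopicParents">
  <div><a class="TopicNameLink" href="/topic/Parent-A">Parent A</a></div>
</div>
<div class="TopicChildren">
  <a class="TopicNameLink" href="/topic/Child-A">Child A</a>
  <a class="TopicNameLink" href="/topic/Child-B">Child B</a>
</div>
<a class="TopicNameLink" href="/topic/Unrelated">Unrelated</a>
"""

ontology = extract_ontology(SAMPLE_HTML)
print(len(ontology["parents"]), "parent(s),", len(ontology["children"]), "child(ren)")
```

Links with the `TopicNameLink` class that sit outside both container divs are ignored, which matters if the same class is reused elsewhere on the page.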

Using this information, I collected the number of parents and children for each topic, along with names and links for all parents and children. Out of about 140 topics, I found the following results:

• ~15% of topics have two or more parents
• ~30% of topics have one parent
• ~54% of topics have no parent (meaning they are disconnected from the ontology)
• ~9% of topics have two or more children
• ~4% of topics have one child
• ~86% of topics have no children
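A breakdown like the one above can be computed from per-topic counts with a few lines of Python. This is only a sketch: the `bucket`, `distribution`, and `orphans` helpers and the sample topic names and counts are hypothetical, not the data behind the percentages in this article.

```python
from collections import Counter

LABELS = ("two or more", "one", "none")


def bucket(n):
    """Map a parent or child count to one of the three reporting buckets."""
    if n == 0:
        return "none"
    return "one" if n == 1 else "two or more"


def distribution(counts):
    """Return the rounded percentage of topics falling in each bucket."""
    counts = list(counts)
    if not counts:
        return {label: 0 for label in LABELS}
    tally = Counter(bucket(n) for n in counts)
    return {label: round(100 * tally[label] / len(counts)) for label in LABELS}


def orphans(topics):
    """Return the names of topics with no parent, sorted alphabetically."""
    return sorted(name for name, (parents, _children) in topics.items() if parents == 0)


# Hypothetical input: topic name -> (number of parents, number of children).
topics = {
    "Topic A": (0, 0),
    "Topic B": (0, 0),
    "Topic C": (1, 0),
    "Topic D": (2, 3),
}

print("parents: ", distribution(p for p, _ in topics.values()))
print("children:", distribution(c for _, c in topics.values()))
print("orphans: ", orphans(topics))
```

The `orphans` list is the practical output here, since those are the topics that still need a parent assigned.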

It’s expected that the ontology has many leaf nodes, so topics with no children are not a problem. But all topics should have a parent so that they’re part of the ontology.

To assign a parent to a topic, it’s necessary to know what the topic means. For example, what is CodeAgon? Although Quora topic pages provide an About field to describe the topic, it’s often empty. In those cases, research is required to properly classify the topic. And even when the About section is used, it’s usually just a few sentences that may not have any advice about which questions should be tagged with the topic.

(By the way, CodeAgon was a HackerRank contest held in September 2017. I added a link to the topic’s About section.)

## The Competitive Programming Wiki

In planning my project for 2018, I considered the option to publish competitive programming information in a wiki format. I thought I would get to that late in the year, if at all. But as I’ve been looking into the Quora topic ontology, I have realized that I need somewhere to keep track of what each topic is for, and what questions belong in it. And although I could keep that information in a local document, why not make it public?

I use Wikipedia all the time, so I looked into the options for hosting a MediaWiki instance. It turned out to be very easy to install and host side by side with this blog.

My plan is to use CPWIKI to store reference information to help organize and categorize links, including links to Quora questions and topics. A CPWIKI article could be:

• A summary of a competitive programming article that is already covered on Wikipedia (e.g., ACM-ICPC).
• A short article for a topic that isn’t notable enough to have its own Wikipedia article (e.g., CodeAgon, mentioned above).
• A description of a Quora topic, with more information than will fit in the topic About section. This could include notes about which questions should be tagged with the topic, and links to top questions in the topic.