I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.
This week, I published a post to the Quora Topic Gnomery blog on the subject of competitive programming topic cleanup. It generated some discussion about how competitive programming topics on Quora should be organized, and prompted me to look into the Quora Topic Ontology.
Competitive programming questions on Quora are not very well organized. Improving this will require four related activities:
- Organizing topics: Deciding which topics are related to competitive programming, creating any that don’t yet exist, and arranging all of them in Quora’s topic graph.
- Topic cleanup: Merging, deleting, and renaming existing topics. Achieving a clean topic graph will require adjusting the one we have now, not starting from scratch.
- Organizing questions: Selecting the correct set of topics for each question, so that each question appears under the relevant topics, based on the question’s meaning.
- Question cleanup: Improving individual questions and answers by supplying the human signals that Quora relies on. The goal of all this topic manipulation is to make Quora’s competitive programming content more discoverable. The Quora algorithm is designed to surface the best content, but it depends on signals from humans, such as upvoting and downvoting questions and answers, and editing question titles to make them clearer.
Competitive programming enthusiasts have had a few years to ask and answer Quora questions about competitive programming. So there’s plenty of raw data available to help construct a list of topics that classify the landscape of what people want to know about the subject.
This classification is known to information scientists as an ontology. Quora offers an ontology for its topics, and employs an ontology architect to keep things under control. But there are a lot of topics, and while one ontology expert can provide an overall structure and philosophy, specialists need to weigh in with opinions about lower-level topics.
The Quora ontology is a set of topics that each have zero or more parent topics, and zero or more child topics. If a topic has neither parent nor child topics, then it is disconnected from the ontology, which isn’t ideal. Part of the topic organization process is to find an appropriate spot in the ontology for these orphan topics, so that each topic has at least one parent.
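To make the parent/child structure concrete, here is a minimal Python sketch of how disconnected topics could be found. It assumes the topic graph is available as a plain dictionary mapping each topic to its parents. The `PARENTS` data and both function names are hypothetical, and the edges are illustrative rather than a snapshot of Quora’s real graph.

```python
# Hypothetical topic graph: each topic maps to the set of its parent topics.
# Topic names are taken from this article; the edges are illustrative only.
PARENTS = {
    "Computer Programming": set(),
    "Specific Types of Computer Programming": {"Computer Programming"},
    "Competitive Programming": {"Specific Types of Computer Programming"},
    "Online Judges": {"Competitive Programming"},
    "Competitive Programmers": {"Competitive Programming"},
    "C++ in Competitive Programming": set(),
}


def children_of(parents):
    """Invert the parent map so each topic maps to its set of child topics."""
    children = {topic: set() for topic in parents}
    for topic, topic_parents in parents.items():
        for parent in topic_parents:
            children.setdefault(parent, set()).add(topic)
    return children


def disconnected_topics(parents):
    """Return topics that have neither parents nor children, sorted by name."""
    children = children_of(parents)
    return sorted(
        topic
        for topic in children
        if not parents.get(topic) and not children[topic]
    )


print(disconnected_topics(PARENTS))
```

In this toy data, Computer Programming has no parent but does have a child, so it is not reported. Only a topic with no edges at all counts as disconnected.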
The root of the Quora ontology is a parentless topic called Major Topics. Ideally, all Quora topics should be reachable by starting at this root node. Questions shouldn’t be directly tagged with Major Topics. It’s just for organizing topics.
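The reachability goal can be checked with a breadth-first search from the root. The sketch below assumes the graph is stored as a dictionary from each topic to its list of children. The `CHILDREN` data is a shortened, hypothetical slice of the graph (the real path from STEM has more steps), and the function names are my own.

```python
from collections import deque

# Hypothetical, shortened slice of the topic graph: topic -> child topics.
CHILDREN = {
    "Major Topics": ["STEM"],
    "STEM": ["Computer Programming"],
    "Computer Programming": ["Competitive Programming"],
    "Competitive Programming": ["Online Judges"],
    "Online Judges": [],
    "C++ in Competitive Programming": [],
}


def reachable_from(root, children):
    """Breadth-first search: return every topic reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        topic = queue.popleft()
        for child in children.get(topic, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unreachable_topics(root, children):
    """Return topics that cannot be reached from root, sorted by name."""
    all_topics = set(children)
    for kids in children.values():
        all_topics.update(kids)
    return sorted(all_topics - reachable_from(root, children))


print(unreachable_topics("Major Topics", CHILDREN))
```

Any topic this reports would need a parent somewhere in the reachable part of the graph before it could be found by browsing down from Major Topics.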
Since a topic can have multiple parents, there are multiple ways to get from Major Topics to Competitive Programming. One option is to start with STEM, go through a few general technology topics, then follow the path from Software and Applications to Computer Programs, Computer Programming, Specific Types of Computer Programming, and finally to Competitive Programming.
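Because topics can have several parents, the graph is not a tree, and listing every route from the root means following each parent in turn. Here is a minimal sketch that assumes the graph has no cycles. The `PARENTS` data is hypothetical: the route through STEM is shortened, and the Hobbies and Competitions topics are invented just to give Competitive Programming a second parent.

```python
# Hypothetical parent map: topic -> list of parent topics.
# "Hobbies" and "Competitions" are invented to illustrate a second route.
PARENTS = {
    "Major Topics": [],
    "STEM": ["Major Topics"],
    "Hobbies": ["Major Topics"],
    "Computer Programming": ["STEM"],
    "Competitions": ["Hobbies"],
    "Competitive Programming": ["Computer Programming", "Competitions"],
}


def paths_from_root(topic, parents, root="Major Topics"):
    """Return every path from root down to topic (assumes no cycles)."""
    if topic == root:
        return [[root]]
    paths = []
    for parent in parents.get(topic, []):
        # Extend each path that reaches the parent with the current topic.
        for path in paths_from_root(parent, parents, root):
            paths.append(path + [topic])
    return paths


for path in paths_from_root("Competitive Programming", PARENTS):
    print(" > ".join(path))
```

A topic with no route to the root yields an empty list, which is another way to spot topics that are cut off from the rest of the graph.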
Today, Competitive Programming has six child topics:
- Online Judges
- Competitive Programming Competitions
- Competitive Programming Problems
- Algorithms in Competitive Programming
- Training for Competitive Programming
- Competitive Programmers
Those six topics are probably not the best way to segment the parent topic. For example, the popular C++ in Competitive Programming topic currently has no parents or children. Which of those six topics does it belong under? It’s not a very good fit for any of them. Perhaps there should be a Programming Languages in Competitive Programming topic, since people often ask questions about which languages they should use in programming competitions, or which language is best for competitive programming.
One of the best resources for creating a good competitive programming ontology is the large set of questions that Quora already has on the topic. If a set of questions doesn’t logically fit anywhere in the ontology, that’s evidence that we need a new child topic under the main topic. Those top-level topics can then be split further to organize questions into smaller categories. After a few more iterations of topic research to make sure I understand the current state of the competitive programming ontology, I’ll be getting back down to the question level, and I expect that to feed back into topic organization.