A Project for 2018

Happy 2018, everyone.

In my end-of-year post for 2017, I wrote a summary of the programming project that I worked on last year, along with lessons learned from the experience. For 2018, I have a new project in mind.

A Q&A Problem to be Solved

Competitive programming Q&A resources on the Web are not as good as they could be. When I launched this blog three years ago and started writing about competitive programming, it quickly became clear that Quora was the place to go for answers from CP experts. But it was also clear that the site had some problems with quality. In particular, duplicate questions were left un-merged, and low-quality questions were left un-deleted.

In contrast, Stack Overflow is ruthless when it comes to question quality. For competitive programming questions, I would even say they are unreasonably strict. But they do keep a high quality bar.

In December 2016, I supported a proposal for a new Stack Exchange Q&A site on the topic of Algorithmic Competitive Programming. A year later, in accordance with Stack Exchange rules, it was closed due to lack of sufficient participation. It’s hard to know for sure why things worked out that way. There was some negative feedback about the proposal, but I don’t think it came from people in the traditional CP community. For example, one user from the Stack Exchange Programming Puzzles and Code Golf site believed that a new CP site would cover an overlapping topic, and would therefore be redundant with the PP&CG site. (And yet most CP questions still appear on Stack Overflow, not PP&CG, and are quickly closed by Stack Overflow users.)

My guess is that the proposal failed for a simple reason: Most CP enthusiasts didn’t hear about it. The proposal ended with 74 followers. That’s about 0.03% of the 284,500 followers that the CP topic on Quora currently has. And the proposal only got 1789 views, which indicates that most of those Quora followers didn’t even discover the page, whether or not they would have ultimately supported it.

I don’t have a good way to get thousands of Quora users interested in a Stack Exchange site. (I already tried the obvious approach of posting a question on Quora.) So I’m going to attack the problem a different way.

Questions

How does one solve the problem of question quality? Quora itself is working on that problem, but they have to solve it across the whole site. And their business model drives them to do things, like allowing anonymous contributions, that work against their quality goals. I’m focusing on a small set of related technical topics, and I’m not trying to run a big content company.

The problem of question quality has two components: bad questions and duplicate questions. Quora offers a number of tools to deal with the first component: downvoting bad questions, and following, sharing, and answering the good ones. At that point, it’s up to the algorithm to show users good content. Quora, unlike Stack Exchange, is very reluctant to delete questions unless they’re truly egregious.

For the second component, Quora offers question merging. Merging takes a bit of effort, since you have to find the correct merge target, and Quora’s list of suggestions isn’t perfect. To supplement it, I use Google search plus a local text file with links to popular questions.

That system works fine for popular questions, but it doesn’t scale well to the long tail of more specialized questions. For example, here’s one from a few years ago: “How do I use graphs and trees to solve competitive programming questions? What is a way to learn them using C++?” And here’s a similar question with many more followers: “How can I be good at graph theory based programming problems in competitive programming?” Should the first one be merged into the second one? Maybe. To answer definitively, you need to decide how specific you want questions to be in this topic. That requires a strategy.

To create such a strategy, I think the best approach is to use tags. Quora already has tags, but Quora Topic Bot tends to mess with them, so they aren’t completely reliable. For example, QTB added “TopCoder Vs. Sphere Online Judge” and “CodeChef Vs. Sphere Online Judge” tags to the first question above, for no apparent reason.

But if there were a reliable tagging system, perhaps outside of Quora, I think it would be a good way to identify merge candidates by looking for similarities in tag lists. For example, the two questions above could share the tags competitive-programming and graph-theory, while the first question could also have the c++ tag.
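One way to make "similarities in tag lists" concrete is Jaccard similarity: the number of shared tags divided by the number of distinct tags across both questions. The sketch below is only an illustration of that idea, not a description of any existing tool. The question keys, the topcoder tag, and the 0.5 threshold are all assumptions chosen for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two tag sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def merge_candidates(tagged_questions, threshold=0.5):
    """Return (score, key_a, key_b) for every pair of questions whose
    tag similarity is at or above the threshold, highest score first."""
    pairs = []
    for (qa, tags_a), (qb, tags_b) in combinations(tagged_questions.items(), 2):
        score = jaccard(tags_a, tags_b)
        if score >= threshold:
            pairs.append((score, qa, qb))
    return sorted(pairs, reverse=True)

# Hypothetical keys and tags, loosely based on the questions discussed above.
questions = {
    "graphs-and-trees-cpp": {"competitive-programming", "graph-theory", "c++"},
    "good-at-graph-problems": {"competitive-programming", "graph-theory"},
    "what-is-topcoder": {"competitive-programming", "topcoder"},
}

for score, qa, qb in merge_candidates(questions):
    print(f"{score:.2f}  {qa}  <->  {qb}")
```

With these tags, only the two graph questions clear the threshold (2 shared tags out of 3 distinct ones). A pair that scores well is still just a candidate: deciding whether to merge remains manual work.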

Keep in mind that I’m not suggesting re-inventing Quora’s merge suggestion tool. Smart people at Quora are working on that. I’m suggesting that for a small set of related technical topics, the combination of manual work and some local tools can produce good results.

What about answer quality? For some questions, the Quora upvote system is able to highlight good content. But very popular questions like “What is the best strategy to improve my skills in competitive programming in C++ in 2-3 months?” tend to accumulate a lot of answers. In that case, the problem is how to find the unique content (since many authors repeat the same points).

FAQ

Once questions are merged and answers are summarized, what’s left is essentially a FAQ: Take the distinct set of CP questions on Quora (and possibly elsewhere) and summarize the answers. Quora already has a topic FAQ, but it’s limited to 10 questions, and the answers are not summarized (they’re just the original question answers). What I have in mind is a resource that someone could use to find all of the major questions and answers on the topic of competitive programming.

Software Tools

Quora is a tool to manage questions and answers. But just as with question quality, they have business goals that conflict with their goal to provide canonical questions and answers. I have some ideas for a more specialized tool to help organize the FAQ.

Consider a tool that works as a Web research assistant: You enter a URL, or a list of URLs, or a URL that points to a page that lists other URLs. The tool parses the list or the web page, canonicalizes each URL, and checks whether it has seen that URL before. If not, the URL gets added to a database.
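Here is a minimal sketch of the canonicalize-then-check step, using only the Python standard library. The canonicalization rules (lowercase host, drop the fragment, drop a trailing slash, remove some query parameters), the TRACKING_PARAMS list, and the single-column urls table are all assumptions for illustration. A real tool would need rules tuned to the sites it visits.

```python
import sqlite3
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of query parameters that do not change the page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "share", "ref"}

def canonicalize(url):
    """Reduce a URL to one assumed canonical form so duplicates compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep an explicit port only when it is not the default for the scheme.
    port = parts.port
    default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default:
        host = f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS)
    # The fragment is dropped by passing an empty string as the last element.
    return urlunsplit((scheme, host, path, urlencode(kept), ""))

def add_if_new(conn, url):
    """Store the canonical URL if it is new. Return True if it was added."""
    canonical = canonicalize(url)
    seen = conn.execute("SELECT 1 FROM urls WHERE url = ?", (canonical,)).fetchone()
    if seen:
        return False
    conn.execute("INSERT INTO urls (url) VALUES (?)", (canonical,))
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT PRIMARY KEY)")

# Hypothetical input URLs; the first two differ only in case, slash, and extras.
inputs = [
    "HTTPS://www.Quora.com/What-is-TopCoder/?share=1#answers",
    "https://www.quora.com/What-is-TopCoder",
    "https://www.topcoder.com/",
]
results = [add_if_new(conn, u) for u in inputs]
print(results)
```

The second URL is rejected as already seen because both forms reduce to the same canonical string. Swapping the in-memory database for a file path would make the list persist between runs.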

Each URL can then be associated with metadata, such as tags (as described above), a content summary, and the date it was last retrieved. Some metadata could be extracted automatically from the page.
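As a rough sketch of what one record might look like, the example below stores the three kinds of metadata mentioned above and pulls one field, the page title, automatically from the HTML. The UrlRecord and TitleParser names, the title field, and the sample page are assumptions made for this example. The tags and summary are left empty because those would be filled in by hand.

```python
from dataclasses import dataclass, field
from datetime import date
from html.parser import HTMLParser
from typing import Optional

class TitleParser(HTMLParser):
    """Collect the text found inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

@dataclass
class UrlRecord:
    """Hypothetical metadata record for one canonical URL."""
    url: str
    tags: set = field(default_factory=set)  # assigned manually
    summary: str = ""                       # written manually
    title: str = ""                         # extracted from the page
    last_retrieved: Optional[date] = None

def record_from_html(url, html, retrieved):
    """Build a record, filling in the title from already-downloaded HTML."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return UrlRecord(url=url, title=parser.title.strip(), last_retrieved=retrieved)

# Made-up page content standing in for a downloaded page.
page = "<html><head><title> What is TopCoder? </title></head><body></body></html>"
record = record_from_html("https://www.quora.com/What-is-TopCoder", page, date(2018, 1, 1))
record.tags.update({"competitive-programming", "topcoder"})
print(record)
```

Downloading the page is deliberately left out, so the sketch runs without network access.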

A tool like this could help organize references on a topic like competitive programming. It could recognize specific types of URLs, like those that point to Quora question pages, and organize those using knowledge of the page format. It could help find duplicate questions, and could notice new answers to be summarized.
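To show what recognizing a URL type could look like, here is a sketch that classifies URLs by their path shape. It assumes that a Quora question lives at a single path segment (the question slug), that an answer lives at slug/answer/author, and that first segments such as topic and profile are not questions. Those patterns are assumptions based on how Quora URLs appear to be laid out, and they may change. The author name in the example is made up.

```python
from urllib.parse import urlsplit

QUORA_HOSTS = {"quora.com", "www.quora.com"}
# Assumed first path segments that indicate a page other than a question.
NON_QUESTION_SEGMENTS = {"topic", "profile"}

def classify(url):
    """Return (kind, question_slug) for a URL; the slug is None when unknown."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() not in QUORA_HOSTS:
        return ("other", None)
    segments = [s for s in parts.path.split("/") if s]
    if not segments or segments[0] in NON_QUESTION_SEGMENTS:
        return ("quora-other", None)
    slug = segments[0]
    if len(segments) == 1:
        return ("quora-question", slug)
    if len(segments) >= 3 and segments[1] == "answer":
        return ("quora-answer", slug)
    return ("quora-other", None)

def answers_by_question(urls):
    """Group answer URLs under their question slug."""
    grouped = {}
    for url in urls:
        kind, slug = classify(url)
        if kind == "quora-answer":
            grouped.setdefault(slug, []).append(url)
    return grouped

urls = [
    "https://www.quora.com/What-is-TopCoder",
    "https://www.quora.com/What-is-TopCoder/answer/Some-Author",
    "https://www.quora.com/topic/Competitive-Programming",
    "https://stackoverflow.com/questions",
]
for u in urls:
    print(classify(u), u)
```

Comparing the output of answers_by_question between two runs would be one simple way to notice new answers that still need to be summarized.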

Wiki

In addition to the FAQ format, another potential format for organizing CP information is a wiki. A wiki could present some of the same information as a FAQ, just in a different format. For example, a FAQ might have a question like “What is TopCoder?”, while a wiki would just have a topic called TopCoder. For simple factual questions of the “what is” or “who is” form, I think it’s cleaner just to have a topic page. The research tool could even create a stub topic page for each unique base URL that it found.
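The stub-page idea could be sketched as follows. Here "base URL" is taken to mean scheme plus host, which is an assumption, and the stub is just a dictionary with a title, an empty summary, and the list of links that were found under that base. The example URLs are illustrative.

```python
from urllib.parse import urlsplit

def base_url(url):
    """Assumed definition of a base URL: the scheme plus the lowercase host."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}"

def stub_pages(urls):
    """Create one stub topic page per unique base URL, listing its links."""
    stubs = {}
    for url in urls:
        base = base_url(url)
        stub = stubs.setdefault(base, {"title": base, "summary": "", "links": []})
        stub["links"].append(url)
    return stubs

stubs = stub_pages([
    "https://www.topcoder.com/",
    "https://www.quora.com/What-is-TopCoder",
    "https://www.quora.com/topic/Competitive-Programming",
])
for base, stub in stubs.items():
    print(base, len(stub["links"]))
```

The summary starts out empty because writing it is the manual part of the work.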

One resource that is more like a wiki than a FAQ is Jasmine Chen’s Awesome Competitive Programming list. The goal of the list is to summarize the best CP resources. And Jasmine makes sure that low-quality links don’t make it onto the list. The difference between the Awesome List and a wiki has to do with scope. The Awesome List is like a table of contents, while a wiki would provide more detail about each entry.

Project Plan

Here’s how I plan to approach this year’s project:

• Tool: I’m going to start by writing a tool. Unlike last year, I’ll be focusing much more on the output of the tool, rather than the infrastructure details that I drilled into with Time Tortoise. So the coding will move faster.
• Data: I expect the tool to quickly generate a lot of data about competitive programming resources on the Web. This will give me the raw material for the next steps.
• FAQ: I’ll start with the FAQ format, using the tool output to find the most interesting questions, answers, and links. I expect that some Quora cleanup will happen at this point, including merging old questions, writing answers, and maybe even asking new questions if I find gaps in the current question list.
• Wiki: If I have time and I think it still makes sense, I’ll tackle the wiki format last.