I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.
So far this year, I’ve been building tools that operate on text files in tab-separated values (TSV) format. The advantage of this format is that it’s easy to read and write in code, and it imports directly into Excel for manual processing. For example, last week I worked on a TSV file in which each line contains one Quora question title, link, and follower count. I extracted this information automatically using one of my tools. I then imported the TSV file into Excel so I could manually edit each row to add a canonical question and a set of tags.
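To give a sense of how little code the TSV format needs, here is a minimal Python sketch of reading and writing such a file. It is not the actual tool from this series: the column order (title, link, follower count), the function names, the example.com links, and the follower counts are all assumptions made up for illustration.

```python
import csv
import io

# Made-up sample data: one question per line as title, link, follower count.
SAMPLE_TSV = (
    "What are some must-do problems on Codeforces?\thttps://example.com/q1\t12\n"
    "What are some good competitive programming problems?\thttps://example.com/q2\t7\n"
)

# Assumed column order; the real files may differ.
FIELDS = ["title", "link", "followers"]


def read_questions(stream):
    """Parse TSV rows into dictionaries, converting the follower count to int."""
    reader = csv.DictReader(stream, fieldnames=FIELDS, delimiter="\t")
    rows = []
    for row in reader:
        row["followers"] = int(row["followers"])
        rows.append(row)
    return rows


def write_questions(rows, stream):
    """Write the dictionaries back out as tab-separated lines."""
    writer = csv.DictWriter(
        stream, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writerows(rows)


# Round trip through in-memory streams; a real tool would open files instead.
questions = read_questions(io.StringIO(SAMPLE_TSV))
out = io.StringIO()
write_questions(questions, out)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="", encoding="utf-8")`, which is the way the `csv` module documentation recommends opening files.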
As I classify each question, I find it useful to review how previous questions are classified. That helps ensure that question classification is consistent. For example, I classified one question as follows:
- Title: What are some good questions on CodeChef from which I will learn more algorithms?
- Canonical title: What are some good competitive programming problems?
I later came across the question “What are some must-do problems on Codeforces?”, which I thought should have the same classification. So I looked through the current list of classified questions to copy the classification decisions I made earlier.
If I only had a few questions classified, it would be easy enough to scan through the list and find a similar one. But as the list grows longer, that becomes impractical. So I decided this week that it’s time to upgrade my storage technology.