Red-Green-Code

Deliberate practice techniques for software developers

  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter

CPFAQ: Initial Commit: Webliographer

By Duncan Smith Jan 17 0

Bibliography

This week, I made my initial commit to the GitHub repository I’ll be using for my Webliographer project.

The purpose of Webliographer is to collect and manage web references, otherwise known as links or URLs. This initial commit focuses on importing links from various file formats, and exporting a table of link information in TSV (tab-separated value) format.

Text File UI

Webliographer is a .NET command-line program. This type of user interface takes less time to develop than a graphical UI, but it can also make program functions less discoverable. So rather than rely only on command-line arguments, I decided instead to read commands from a text file. The reason: As I was testing the program, I found myself running the same commands repeatedly. To remember which commands to run, I would store the command syntax in a text document and paste the commands into a console window. So why not just have the program read from a text file directly?

The advantage of a command text file is that it takes advantage of the full functionality of your favorite text editor. Commands can be copied, pasted, and modified as needed. It’s faster than pointing and clicking in a UI, though it does require a command-line way of thinking.

To interpret the command file, I wrote a bare-bones parser that reads the file and looks for one or more commands in a section of the file between the delimiters ExecuteStart and ExecuteEnd. Each command has a unique name and zero or more parameters, each on a separate line. Comments (lines that start with #) are ignored. Another section of the file (between LibraryStart and LibraryEnd) is ignored completely, so it can store commands that aren’t currently being used. Commands from this section can be copied and pasted into the active section when needed.

I wrote about the concept of a text file UI last year in Time Tortoise: Text File User Interface. However, Time Tortoise in its current form relies mostly on a graphical UI.

Importing Data

Webliographer needs to get its link data from somewhere. I have started by writing functions to import from a few different file formats:

  • Google search results in JSON format: I was able to get Google search results in this format, which captures information about a Google search query and each of the search results. I use JsonConvert.DeserializeObject to turn the JSON into a C# object graph for further processing.
  • Links in minimal TSV format: Each line in this minimal format has just a page title and a page URL, separate by a tab. It’s easy to create in a spreadsheet and export to a file. No special parser is required to read this format in C#.
  • Links in a Markdown document: I wrote this importer so I could extract the links from the Awesome Competitive Programming List. Rather than a full Markdown parser, I just use a regular expression to look for links on each line.

Storing Data

I don’t yet have a database up and running, so for now I’m storing my links in a master TSV file. Webliographer has a function to import multiple TSV files, consolidate any duplicates, and write out the result in TSV format. This master TSV can then be opened in a spreadsheet program for inspection. Once a database is available, I will import the master TSV and start using the database as the source of record.

For now, the master TSV contains the link title and URL, plus the domain where the page resides, and a rank (relative position).

(Image credit: Alexandre Duret-Lutz)

Categories: CPFAQ

Prev
Next

Stay in the Know

I'm trying out the latest learning techniques on software development concepts, and writing about what works best. Sound interesting? Subscribe to my free newsletter to keep up to date. Learn More
Unsubscribing is easy, and I'll keep your email address private.

Getting Started

Are you new here? Check out my review posts for a tour of the archives:

  • Lessons from the 2020 LeetCode Monthly Challenges
  • 2019 in Review
  • Competitive Programming Frequently Asked Questions: 2018 In Review
  • What I Learned Working On Time Tortoise in 2017
  • 2016 in Review
  • 2015 in Review
  • 2015 Summer Review

Archives

Recent Posts

  • LeetCode 1022: Sum of Root To Leaf Binary Numbers January 27, 2021
  • LeetCode 1288: Remove Covered Intervals January 20, 2021
  • LeetCode 227: Basic Calculator II January 13, 2021
  • A Project for 2021 January 6, 2021
  • Lessons from the 2020 LeetCode Monthly Challenges December 30, 2020
  • Quora: Are Math Courses Useful for Competitive Programming? December 23, 2020
  • Quora: Are Take-Home Assignments a Good Interview Technique? December 17, 2020
  • Quora: Why Don’t Coding Interviews Test Job Skills? December 9, 2020
  • Quora: How Much Time Should it Take to Solve a LeetCode Hard Problem? December 2, 2020
  • Quora: Quantity vs. Quality on LeetCode November 25, 2020
Red-Green-Code
  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter
Copyright © 2021 Duncan Smith