How to get started building a web browser?

Well, break it down into pieces. What is a web browser? What does it do? It:

  • Fetches external content. So you need an HTTP library or (not recommended) to write this yourself. HTTP has a lot of complexity and subtlety, e.g. handling of Expires headers, different protocol versions (although it's mostly 1.1 these days), etc.;
  • Handles different content types. There's a Windows registry for this kind of thing that you can piggyback on. I'm talking about interpreting content based on MIME type here;
  • Parses HTML and XML to create a DOM (Document Object Model);
  • Parses and applies CSS: this entails understanding all the properties, all the units of measure and all the ways values can be specified (e.g. "border: 1px solid black" vs. the separate border-width, etc. properties);
  • Implements the W3C visual model (and this is the real kicker); and
  • Has a JavaScript engine.
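
As a rough illustration of two of the items above (parsing HTML into a DOM and expanding a CSS shorthand), here is a minimal Python sketch. It leans on the standard library's html.parser module. The Node and DomBuilder classes and the expand_border function are hypothetical names made up for this example, and the code assumes well-formed markup and only the simple width/style/color form of "border". A real browser has to follow the full HTML parsing rules and the complete CSS grammar, which is where the effort goes.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a subset, for illustration only).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
# Border style keywords recognised by this sketch (also a subset).
BORDER_STYLES = {"none", "solid", "dashed", "dotted", "double"}


class Node:
    """Hypothetical minimal DOM node: tag, attributes, children, own text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a tree of Node objects; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore strays.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def expand_border(value):
    """Expand a 'border' shorthand value into longhand properties.

    Simplification: a token starting with a digit is treated as the width,
    a known keyword as the style, and anything else as the color.
    """
    longhand = {}
    for token in value.split():
        if token in BORDER_STYLES:
            longhand["border-style"] = token
        elif token[0].isdigit():
            longhand["border-width"] = token
        else:
            longhand["border-color"] = token
    return longhand


builder = DomBuilder()
builder.feed('<p style="border: 1px solid black">Hello <b>world</b></p>')
builder.close()

p = builder.root.children[0]
_, _, value = p.attrs["style"].partition(":")
styles = expand_border(value.strip())
print(p.tag, [child.tag for child in p.children], styles)
```

Even this toy version has to make decisions (void elements, stray end tags, token classification) that the real specifications spell out in far more detail.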

And that's basically a web browser in a nutshell. Now some of these tasks are incredibly complex. Even the easy-sounding ones can be hard. Take fetching external content. You need to deal with issues like:

  • How many concurrent connections to use?
  • Error reporting to the user;
  • Proxies;
  • User options;
  • etc.
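
To make the first question above concrete, here is a small Python sketch of capping concurrent connections with asyncio.Semaphore. Everything in it is an assumption for illustration: fake_fetch only simulates latency instead of making a real request, the example.test URLs are placeholders, and the limit of 6 is just a commonly seen per-host value, not a rule.

```python
import asyncio

MAX_CONNECTIONS = 6  # assumed limit; a real browser would make this tunable


async def fake_fetch(url, limiter, stats):
    """Stand-in for a real HTTP request; it only simulates network latency."""
    async with limiter:  # waits here while MAX_CONNECTIONS are in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)
        stats["active"] -= 1
        return f"body of {url}"


async def fetch_all(urls, limit=MAX_CONNECTIONS):
    """Fetch every URL, never running more than `limit` fetches at once."""
    limiter = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    bodies = await asyncio.gather(
        *(fake_fetch(url, limiter, stats) for url in urls)
    )
    return bodies, stats["peak"]


urls = [f"http://example.test/img{i}.png" for i in range(20)]
bodies, peak = asyncio.run(fetch_all(urls))
print(len(bodies), "responses, peak concurrency", peak)
```

The semaphore is the easy part. Choosing the limit, applying it per host rather than globally, and reporting failures to the user are the decisions that take real thought.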

The reason I and others are collectively raising our eyebrows is that the rendering engine is hard (and, as someone noted, man-years have gone into their development). The major rendering engines around are:

  • Trident: developed by Microsoft for Internet Explorer;
  • Gecko: used in Firefox;
  • WebKit: used in Safari and in Chrome up to version 27;
  • KHTML: used in the KDE desktop environment. WebKit forked from KHTML some years ago;
  • Elektra: used in Opera 4-6;
  • Presto: used in Opera 7-12;
  • Blink: a WebKit fork used in Chrome 28+ and Opera 15+.

The top three have to be considered the major rendering engines used today.

JavaScript engines are also hard. There are several of these, and they tend to be tied to a particular rendering engine:

  • SpiderMonkey: used in Gecko/Firefox;
  • TraceMonkey: adds JIT (just-in-time) compilation to SpiderMonkey; planned for Firefox 3.1, which shipped as Firefox 3.5;
  • KJS: used by Konqueror, tied to KHTML;
  • JScript: the JavaScript engine of Trident, used in Internet Explorer;
  • JavaScriptCore: used in WebKit by the Safari browser;
  • SquirrelFish: will be used in WebKit and adds JIT like TraceMonkey;
  • V8: Google's JavaScript engine used in Chrome and Opera;
  • Opera (12.x and earlier) also used its own.

And of course there's all the user interface stuff: navigation between pages, page history, clearing temporary files, typing in a URL, autocompleting URLs and so on.
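
Even the "simple" user interface parts have some logic behind them. Below is a minimal Python sketch of back/forward page history and prefix-based URL autocompletion. The History class and its method names are hypothetical, invented for this example, and the a.test/b.test/c.test URLs are placeholders.

```python
class History:
    """Hypothetical back/forward history with naive URL autocompletion."""

    def __init__(self):
        self.entries = []   # the current back/forward chain
        self.index = -1     # position of the current page in entries
        self.visited = []   # every URL ever visited, oldest first

    def visit(self, url):
        # Navigating to a new page discards any "forward" entries.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index = len(self.entries) - 1
        self.visited.append(url)

    def current(self):
        return self.entries[self.index] if self.entries else None

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.current()

    def suggest(self, prefix, limit=5):
        """Distinct visited URLs starting with prefix, most recent first."""
        matches = []
        for url in reversed(self.visited):
            if url.startswith(prefix) and url not in matches:
                matches.append(url)
        return matches[:limit]


history = History()
for url in ["http://a.test/", "http://b.test/", "http://b.test/docs"]:
    history.visit(url)

history.back()                    # now on http://b.test/
history.visit("http://c.test/")   # drops the forward entry http://b.test/docs
print(history.entries)
print(history.suggest("http://b"))
```

A real browser would also persist this to disk, rank suggestions by frequency, and handle redirects and private browsing, which is part of why the list above adds up.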

That is a lot of work.


Sounds like a really interesting project, but it will require you to invest an enormous effort.

It's no easy thing, but from an academic point of view, you could learn so much from it.

Some resources that you could check:

  • HTMLayout.NET: fast, lightweight and embeddable HTML/CSS renderer and layout manager component.
  • GeckoFX: Windows Forms control that embeds the Mozilla Gecko browser control in any Windows Forms Application.
  • SwiftDotNet: a browser based on WebKit, written in C#
  • Gecko DotNetEmbed
  • Gecko#
  • Rendering a web page - step by step

But seen from a realistic point of view, the huge effort needed to code it from scratch reminded me of this comic:


(source: geekherocomic.com)

Good Luck :-)

Tags: C#, Browser