Web Development 101

I was clearing my mailbox today and came across an old email exchange with a friend from college. He had asked me for some advice on getting started with web development. Since my reply turned out to be quite long, I thought I’d post it here in the hope that it’ll benefit a few more people.

Although this conversation took place in December 2010, most of what I said still stands. The only thing that I would take back is the recommendation to use Google App Engine. I believe that their pricing changes earlier this year mean that GAE is no longer a platform for hobbyist hackers to experiment on.

Now here’s what my friend asked:

Can you guys suggest some essential reading (or advise on how) to understand:
1. Basic background on the structure of the internet and related ideas
2. Core technology that is useful in building a good web-based database that serves data based on queries from apps etc.
3. Any other things that you think may be useful to a budding web entrepreneur

And here’s my (slightly edited) reply:

Overview

For reading material, you can start with these two:

Both are quite old in Internet Time but should give you the basics.

[NOTE: No links in the following - just Google for terms & phrases!]

Conceptually, most dynamic internet sites have a web server + application layer + relational database all deployed on either Linux or Windows.
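
To make the three layers concrete, here is a minimal sketch using only the Python standard library. It is an illustration under my own assumptions, not a production setup: an in-memory SQLite database stands in for the relational database, a small WSGI function plays the application layer, and wsgiref's development server stands in for Apache or nginx. The table name `greetings` and the helpers `call_app` and `serve` are made up for this example.

```python
import sqlite3
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

# Database layer: in-memory SQLite stands in for MySQL/PostgreSQL (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE greetings (name TEXT PRIMARY KEY, message TEXT)")
db.execute("INSERT INTO greetings VALUES ('world', 'Hello, world!')")
db.commit()


def app(environ, start_response):
    """Application layer: map the request path to a database query."""
    name = environ.get("PATH_INFO", "/").strip("/") or "world"
    row = db.execute(
        "SELECT message FROM greetings WHERE name = ?", (name,)
    ).fetchone()
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if row is None:
        start_response("404 Not Found", headers)
        return [b"No such greeting"]
    start_response("200 OK", headers)
    return [row[0].encode("utf-8")]


def call_app(path):
    """Hypothetical helper: call the app without a network, return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


def serve(port=8000):
    """Web server layer: a development server only, not for real traffic."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8000/world` in a browser exercises all three layers. In a real deployment each layer would be a separate, sturdier piece of software.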

Web Server

On the web server side, Apache is the most common, although newer upstarts like nginx are gaining popularity. In Microsoft land, IIS is what you would go with.

Application Layer

Coming to the application layer, this depends on your language of choice. PHP is by far the most common. Other options are Python, Ruby, Java, .NET, etc.

Beyond the language, 99% of the time you would also choose a web development framework that makes it easier to do web dev in your chosen language.

Database

Finally comes the database where you would store your data. Note that this isn’t absolutely necessary - there are plenty of use-cases where you don’t actually have to save any data, or where the little data that needs saving could be put in plain text files. But again, most of the time, you are modelling some entities & need a DB as the storage layer.
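
As a rough illustration of what "modelling some entities" looks like, here is a small sketch using Python's built-in sqlite3 module. SQLite is just a convenient stand-in here, and the `authors` and `posts` tables, the sample rows, and the function names are all invented for the example.

```python
import sqlite3


def build_db():
    """Create an in-memory database with two related entities."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE posts (
            id        INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title     TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Asha')")
    conn.executemany(
        "INSERT INTO posts (author_id, title) VALUES (?, ?)",
        [(1, "Hello"), (1, "Second post")],
    )
    conn.commit()
    return conn


def titles_by_author(conn, name):
    """Return post titles for one author, oldest first, via a join."""
    rows = conn.execute(
        "SELECT posts.title FROM posts "
        "JOIN authors ON authors.id = posts.author_id "
        "WHERE authors.name = ? ORDER BY posts.id",
        (name,),
    )
    return [title for (title,) in rows]
```

The same SQL would work with only minor changes on MySQL or PostgreSQL. The point is that relationships between entities (an author has many posts) are what push you from text files to a relational database.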

Options: MySQL, PostgreSQL, Oracle, Sybase, SQL Server.

The first two are free & open source while the rest are proprietary & expensive.

For a long time, MySQL was the DB of choice for web startups since it was free & open source, and unless you needed support, you didn’t have to pay a dime to anyone. However, Oracle’s acquisition of Sun - and by extension MySQL - has left plenty of people worried about how Oracle will tighten the screws on the free offering. Thus PostgreSQL, which is already equivalent to MySQL functionally, is increasingly gaining traction.

Deployment - Traditional

Finally, you deploy everything on either Linux or Windows. If you are choosing anything other than the C#/ASP.NET/IIS combo, you pick Linux.

Aside: This is the famous LAMP stack - Linux + Apache + MySQL + (PHP/Perl/Python) which powered plenty of startups in the first web boom when we were still roaming the corridors of Ganga!

Now once you have developed your application using some stack of your choice, you need to host it somewhere. This typically means signing up for a hosting account with a web hosting provider. The options run the gamut from shared hosting to Virtual Private Servers (VPS) to dedicated servers co-located in the provider’s data-center.

If you get to any kind of scale, i.e. traffic of a few hundred thousand visits per month, you’d need someone well versed in web site deployments to set up caching, load balancing proxies, reverse proxies, etc. Otherwise, there’s no way the vanilla stack alone will handle the traffic.
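
To give a feel for what caching and load balancing mean, here is a toy sketch in Python. It is not how a real proxy is built; the class names, the `fetch` callback and the backend names are all hypothetical. The front end hands requests to backends in round-robin order and remembers responses so that repeat requests never reach a backend.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends one after another, wrapping around at the end."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class CachingFrontend:
    """Serve repeat requests from memory; forward misses to a backend."""

    def __init__(self, balancer, fetch):
        self._balancer = balancer
        self._fetch = fetch  # callable(backend, path) -> response text
        self._cache = {}     # no expiry here; a real cache needs invalidation

    def get(self, path):
        if path not in self._cache:
            backend = self._balancer.pick()
            self._cache[path] = self._fetch(backend, path)
        return self._cache[path]


def demo():
    """Record which backend served which path for a short request sequence."""
    calls = []

    def fetch(backend, path):
        calls.append((backend, path))
        return f"{path} served by {backend}"

    frontend = CachingFrontend(RoundRobinBalancer(["app1", "app2"]), fetch)
    responses = [frontend.get(p) for p in ["/home", "/home", "/about"]]
    return calls, responses
```

In `demo()`, the second request for `/home` is answered from the cache, so only two requests ever reach the backends. In practice you would not write this yourself; software such as nginx covers both jobs.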

That’s the traditional stack. Next we come to the higher level ‘cloud’ providers.

Deployment - Cloud

First is Amazon Web Services (AWS) which is basically ‘Infrastructure as a Service’. In the old (above described) world, whenever you need a server, you find a hosting provider, pick a server/bandwidth plan that matches your needs & then pay a monthly fee for use of that server/bandwidth. If you have a sudden traffic burst (‘cos you got linked from the Yahoo homepage) which requires that you get another server to handle the increased load, you are in a tight spot. By the time you get another server & set it up, your site will probably have gone down & the traffic surge will be gone. Worse, now you are stuck with an extra server for at least a month even though you no longer need it!

Simply put, AWS is a metered, pay-as-you-use hosting platform. You can automatically (or rather, programmatically using APIs) request servers, storage, etc. Request & power up new servers as soon as you get traffic & shut them down when you don’t need them. You move from a cap-ex model to a rental model for your IT infrastructure.
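
The difference between the two billing models is easy to see with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the traffic numbers, the per-server capacity and the price are made up, and it assumes the fixed plan must be sized for the peak hour. It does not call any real AWS API.

```python
import math


def servers_needed(requests_per_hour, capacity_per_server):
    """Servers required for one hour of load; always keep at least one up."""
    return max(1, math.ceil(requests_per_hour / capacity_per_server))


def metered_cost(hourly_load, capacity_per_server, cents_per_server_hour):
    """Pay-as-you-use: each hour is billed for the servers running that hour."""
    return sum(
        servers_needed(load, capacity_per_server) * cents_per_server_hour
        for load in hourly_load
    )


def fixed_cost(hourly_load, capacity_per_server, cents_per_server_hour):
    """Traditional hosting: enough servers for the peak, paid for every hour."""
    peak = max(servers_needed(load, capacity_per_server) for load in hourly_load)
    return peak * cents_per_server_hour * len(hourly_load)


# Hypothetical day: quiet for 22 hours, then a 2-hour surge.
day = [100] * 22 + [1000] * 2
metered = metered_cost(day, capacity_per_server=500, cents_per_server_hour=10)
fixed = fixed_cost(day, capacity_per_server=500, cents_per_server_hour=10)
```

With these invented numbers, the metered bill covers 26 server-hours (260 cents) while the fixed plan pays for 48 (480 cents). Real pricing is more involved, but the shape of the saving is the same: you stop paying for peak capacity during quiet hours.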

Further up the stack in ‘cloud computing’ is ‘Platform as a Service’. In this category, you have big players like Salesforce.com, Google App Engine & Microsoft Azure as well as smaller players like Heroku, etc.

Here, the provider gives you a chosen stack to develop for (language + runtime + storage) and then manages the deployment, scaling and other headaches for you. This is basically a zero sysadmin deployment option. As long as your application & dev team can work within the restrictions of the provider’s platform (and every platform has them), you don’t have to worry about the ‘running’ part of your web service. These are again ‘pay for what you use’ type of platforms. The biggest difference when developing for these platforms is that usually, you won’t get a regular relational database (like MySQL, etc.) to work with. Thus, there is a good deal of lock-in once you pick a platform.

tl;dr

Now for some concrete & admittedly biased advice:

That should be enough to give you a good idea of how the web works.