I use Eclipse almost every day at work for Java development. It isn't the best piece of software out there but it mostly works, has a plethora of features and you can't beat the price!
I do have to make several changes to the default settings to make Eclipse work according to my tastes. However, these changes are linked to a workspace and thus, every time that I create a new workspace, I have to remember the tweaks and re-do them.
So here's my attempt at listing them all out so I have a quick reference the next time I need to set up Eclipse.
Installation
Head to eclipse.org/downloads/ and get the relevant Eclipse bundle. The Eclipse IDE for Java Developers is what would generally suit me as a core Java developer. However, I find this bundle to contain unnecessary features (integrated CVS client, really?!) and prefer building up from a base installation starting with just the Eclipse Platform Runtime Binary. Unfortunately, the Eclipse folks keep re-organizing their site and each time, I have to hunt for the download location of the platform binary package. A bit of web searching should help here.
Once you've installed just the base Eclipse platform runtime, you can add the relevant plugins from within Eclipse. Go to Help > Install New Software or Help > Eclipse Marketplace and search for the required features.
My current list of plugins:
- Eclipse Java Development Tools
- Eclipse Color Theme
- ExploreFS
- Perforce SCM Support
- Vrapper (Vim Emulator)
Startup
Eclipse is a memory hog. It is unlikely that the default memory settings will give you a decent experience so it is best to change them. Edit the eclipse.ini file (lives next to eclipse.exe) as follows:
-startup
plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_1.1.2.R36x_v20101222
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
-vmargs
-Xms40m
-Xmx512m
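The first few lines should already be present in the file. Add the memory-related arguments (--launcher.XXMaxPermSize, -Xms and -Xmx) with values suited to your development machine. Keep -vmargs at the end: everything after it is passed straight to the JVM.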
Configuration
Finally, time to tweak some preferences. These need to be set for each new workspace that you create.
Open the Window > Preferences dialog and change the following:
- General > Appearance:
Uncheck Enable animations
- General > Editors > Text Editors:
Set Display tab width to 4; Check Insert spaces for tabs
- General > Editors > Text Editors > Spelling:
Uncheck Enable spell checking
- General > Startup and Shutdown:
Uncheck Mylyn Tasks UI
- General > Workspace:
Set Text file encoding to UTF-8; Set New text file line delimiter to Unix
- General > Compare/Patch > General:
Uncheck Open structure compare automatically; Check Ignore white space
- General > Compare/Patch > Text Compare:
Uncheck Connect ranges with single line
- Java > Code Style > Formatter:
Download Android's code style profile and Import it here
- Java > Editor > Content Assist > Advanced:
Check only the following proposals in both sections: Java Non-Type Proposals, Java Proposals, Java Type Proposals and Template Proposals
- Java > Editor > Content Assist > Favorites:
Add New Type org.junit.Assert
- Java > Installed JREs:
Add a new JDK by pointing to the install directory; Add -server to the Default VM Arguments; Use Add External JARs to add tools.jar to the default libraries.
- Team > Perforce:
Check Use "move" command during refactoring operations
Ideally, you should be able to use File > Export > General > Preferences in one workspace and then File > Import > General > Preferences in a new workspace. But it takes only a couple of minutes to make the above config changes and I don't have to worry that Eclipse screwed up something else under the covers. :-)
Take any Java application of a reasonable size and there's an almost 100% chance that at least one class in the codebase uses the lazy initialization pattern. One typical usage is in the creation of Singleton classes.
For a basic pattern in such widespread use, it is surprising how often it is implemented incorrectly! Why does this happen? Let's take a look with a simple example:
class Demo {
    private Collaborator collaborator = new Collaborator();
    public Collaborator getCollaborator() {
        return collaborator;
    }
    public static void main(String... args) {
        Demo demo = new Demo();
        Collaborator collaborator = demo.getCollaborator();
    }
}
Perfectly pedestrian stuff so far. We define a Demo class in which a Collaborator object is created and is ready for use as soon as an instance of Demo is created. But what if we don't always need the collaborator? We would be paying the cost of creating it even in situations where it isn't used. This becomes a real concern if it is relatively expensive to create a new collaborator. Enter lazy initialization:
class Demo {
    private Collaborator collaborator;
    public Collaborator getCollaborator() {
        if (collaborator == null) {
            collaborator = new Collaborator();
        }
        return collaborator;
    }
}
In the revamped Demo class, we've delayed the construction of Collaborator until the getter method is called, thus ensuring that we don't create an instance before we need it.
Although it solves our first problem, it introduces another one: the Demo class will likely exhibit unexpected behaviour in a multi-threaded environment. If two or more threads simultaneously invoke the getCollaborator() method on an instance of Demo, it is quite possible that more than one instance of Collaborator gets created, because each thread can see a null collaborator before any of them has assigned it. Depending on what Collaborator actually is, the effects of this can range from simply wasteful to downright dangerous!
Making the Demo class' behaviour predictable in a multi-threaded environment is easy - just make the getCollaborator() method synchronized.
class Demo {
    private Collaborator collaborator;
    public synchronized Collaborator getCollaborator() {
        if (collaborator == null) {
            collaborator = new Collaborator();
        }
        return collaborator;
    }
}
By making the getCollaborator() method synchronized, we ensure that only one thread can invoke the method at a time and are thus guaranteed that only one instance of collaborator will be created.
However, there's yet another problem with our change. (In case you haven't guessed, this is the pattern for this entire post!)
The problem is that even though we needed to ensure exclusive access to the getter method only during the initial instantiation of Collaborator, we pay the cost of synchronization on all subsequent calls to the method!
Alright, let's try another change:
class Demo {
    private Collaborator collaborator;
    public Collaborator getCollaborator() {
        if (collaborator == null) {
            synchronized (this) {
                if (collaborator == null) {
                    collaborator = new Collaborator();
                }
            }
        }
        return collaborator;
    }
}
This is known as the double-checked locking pattern. What we are doing is first checking whether the collaborator reference is null. If it is, we try to acquire the lock on the Demo object instance (this). Once we hold the lock, we check again whether collaborator is still null. This double check is required because between the time of the first check and the time we get the lock, a different thread could have come in, acquired the lock and constructed a collaborator. So our second check is a defence against that. If we find that collaborator is still null, we go ahead and construct one.
This seems right, doesn't it? Unfortunately, it is not.
This is where our intuitive sense of reasoning starts breaking down in the face of modern technology.
The problem (once again) is that modern compilers and processors do this thing called instruction re-ordering or out-of-order execution. In fact, the Java Language Specification (JLS), through the memory model described in its Chapter 17, explicitly permits implementations to re-order instructions because it can improve execution speed. And Java isn't the only language doing this either; pretty much all modern compilers and CPUs perform these optimizations.
We won't get into the details of how these optimizations work - that is the job of the JVM engineers! As application developers, our job is to know that these things happen and to ensure that our programs don't fail in the presence of these optimizations.
To better demonstrate the effect of instruction reordering, let me define a simple Collaborator class:
class Collaborator {
    public Associate associate;
    public Collaborator() {
        associate = new Associate();
    }
}
With the above class definition in mind, imagine an optimization where the constructor call is inlined in our double-checked Demo class:
class Demo {
    private Collaborator collaborator;
    public Collaborator getCollaborator() {
        if (collaborator == null) {
            synchronized (this) {
                if (collaborator == null) {
                    // pseudo code now
                    associate = new Associate();
                    collaborator = new Collaborator();
                }
            }
        }
        return collaborator;
    }
}
NOTE: I am not saying that this is how the code will look if the constructor is inlined. I am just asking you to visualize the fact that there are two reference assignment operations going on here: (1) the associate reference and (2) the collaborator reference.
Now, since the JVM is free to re-order these instructions, it may choose to move the collaborator reference assignment before the associate reference assignment. If this happens, consider what happens when another thread calls getCollaborator() in between the collaborator reference store and the associate reference store. The second thread will reach the if (collaborator == null) check, find that collaborator is not null (since that store was already done!) and so skip the if block and return the collaborator reference.
Now with the collaborator that it got, if the second thread tries to do anything with collaborator.associate, it'll get an unexpected NullPointerException since the associate reference is still null!
This is how our intuition fails us and this is one of the reasons why folks keep saying that multi-threaded programming is hard!
Some of you may legitimately ask: shouldn't the synchronized block take care of this? Well, a synchronized block is essentially a 'monitor entry' operation at the start of the block and a 'monitor exit' operation at the end of the block. The JLS guarantees that all the memory writes made before a 'monitor exit' are visible to any thread that subsequently enters the same monitor. However, and crucially, our second thread never enters the monitor: it reads collaborator in the first, unsynchronized null check. For such a thread, the JLS makes no ordering guarantee at all, so it may see the writes in any order, even before the 'monitor exit'. See the problem?
So how do we fix this?
Solution 1
We got into this whole mess because we tried to do lazy initialization. So the first question to ask ourselves is whether we really need to initialize our class lazily. If not, the safest option is to just construct the object at Step 0:
class Demo {
    private final Collaborator collaborator = new Collaborator();
    public Collaborator getCollaborator() {
        return collaborator;
    }
}
The critical point to note here is the use of the final modifier. Without it, this class will not be thread-safe. Why? Suppose thread 1 constructs a Demo instance and then hands it off to thread 2 without any synchronization (say, through a plain non-volatile field). There is then no guarantee that thread 2 will see the collaborator reference assignment. To put it another way, the memory operations done by thread 1 are not guaranteed to be visible to thread 2 in the absence of synchronization. Declaring the reference as final is a great way of ensuring visibility without paying the cost of synchronization. This is made possible by the special guarantees that the JLS provides for final fields (JLS §17.5). Go read up on it :-)
Solution 2
If eager initialization is not an option, this is how we can fix our double checked locking code:
class Demo {
    private volatile Collaborator collaborator;
    public Collaborator getCollaborator() {
        if (collaborator == null) {
            synchronized (this) {
                if (collaborator == null) {
                    collaborator = new Collaborator();
                }
            }
        }
        return collaborator;
    }
}
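All we did was to add the volatile modifier to collaborator. By doing this, we rely on the memory model's guarantee for volatile fields (in force since Java 5): a write to a volatile field happens-before every subsequent read of that field. All the writes made inside the Collaborator constructor happen before the volatile write of the reference, so any thread that reads a non-null collaborator also sees a fully constructed object. This rules out the non-apparent instruction reordering that caused our earlier problem. On JVMs older than Java 5, volatile did not provide this guarantee and this version was broken too. Note that we still need the synchronized block: it is what ensures that only one thread constructs a Collaborator!
There are performance implications to using volatile references, but in most scenarios they aren't too bad. At least on x86, a volatile read is almost as cheap as a regular read. Volatile writes, on the other hand, are much more expensive! If you wish to further optimize the above by reducing the number of volatile reads, you can use a local variable, so that once collaborator is initialized, each call performs one volatile read instead of two: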
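The post doesn't include a harness for this, so here is a small, self-contained sketch that exercises the volatile version from many threads at once. DclCheck, the instance counter on Collaborator and the race helper are hypothetical names added for illustration only. Keep in mind that a test like this can only fail to find a bug: the reordering problem described above is rare and timing-dependent, so a passing run of the broken (non-volatile) variant would prove nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class DclCheck {
    // Hypothetical stand-in Collaborator that counts how many times it is constructed.
    static class Collaborator {
        static final AtomicInteger CREATED = new AtomicInteger();

        Collaborator() {
            CREATED.incrementAndGet();
        }
    }

    // The volatile double-checked locking version from Solution 2.
    static class Demo {
        private volatile Collaborator collaborator;

        public Collaborator getCollaborator() {
            if (collaborator == null) {
                synchronized (this) {
                    if (collaborator == null) {
                        collaborator = new Collaborator();
                    }
                }
            }
            return collaborator;
        }
    }

    // Starts `threads` threads that all call getCollaborator() at roughly the same
    // moment and returns the set of distinct Collaborator instances they observed.
    static Set<Collaborator> race(Demo demo, int threads) {
        Set<Collaborator> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();                 // wait until all threads are created
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(demo.getCollaborator());
            });
            workers.add(t);
            t.start();
        }
        start.countDown();                         // release all threads at once
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen;
    }

    public static void main(String... args) {
        Set<Collaborator> seen = race(new Demo(), 32);
        // expected: distinct instances: 1, constructor calls: 1
        System.out.println("distinct instances: " + seen.size()
                + ", constructor calls: " + Collaborator.CREATED.get());
    }
}
```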
class Demo {
    private volatile Collaborator collaborator;
    public Collaborator getCollaborator() {
        Collaborator tmp = collaborator;
        if (tmp == null) {
            synchronized (this) {
                tmp = collaborator;
                if (tmp == null) {
                    tmp = new Collaborator();
                    collaborator = tmp;
                }
            }
        }
        return tmp;
    }
}
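If you find yourself writing this idiom in several places, one option is to wrap it in a small generic helper. The sketch below assumes Java 8 or later (for java.util.function.Supplier); Lazy and LazyDemo are hypothetical names, not part of the JDK or the original code. It follows the local-variable version above and additionally rejects a null result, since null is what marks "not yet initialized".

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper (not from the JDK): a lazily initialized value using
// the local-variable form of volatile double-checked locking.
final class Lazy<T> {
    private final Supplier<? extends T> factory;
    private volatile T value;

    Lazy(Supplier<? extends T> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    T get() {
        T tmp = value;                     // a single volatile read on the fast path
        if (tmp == null) {
            synchronized (this) {
                tmp = value;               // re-check while holding the lock
                if (tmp == null) {
                    tmp = Objects.requireNonNull(factory.get(), "factory returned null");
                    value = tmp;           // one volatile write publishes the object
                }
            }
        }
        return tmp;
    }
}

public class LazyDemo {
    static class Collaborator { }

    static class Demo {
        private final Lazy<Collaborator> collaborator = new Lazy<>(Collaborator::new);

        public Collaborator getCollaborator() {
            return collaborator.get();
        }
    }

    public static void main(String... args) {
        Demo demo = new Demo();
        System.out.println(demo.getCollaborator() == demo.getCollaborator()); // prints "true"
    }
}
```

One trade-off of this design: the factory reference stays alive for as long as the Lazy object does, even after the value has been created.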
Solution 3
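In this final solution, we make use of another guarantee of the JLS: a class, including a static nested class such as CollaboratorHolder below, is not initialized until it is first actively used, for example when one of its static fields is read.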
class Demo {
    private static class CollaboratorHolder {
        public static final Collaborator collaborator = new Collaborator();
    }
    public Collaborator getCollaborator() {
        return CollaboratorHolder.collaborator;
    }
}
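When the JVM initializes our Demo class, it does not initialize the nested CollaboratorHolder class. Only when a caller invokes the getCollaborator() method and reads CollaboratorHolder.collaborator is CollaboratorHolder initialized, which constructs a new Collaborator object. Moreover, this code is thread-safe without any explicit locking, since the JLS guarantees that a class is initialized exactly once and that other threads wait for that initialization to complete before using the class. This pattern is known as the initialization-on-demand holder idiom. Note that the holder's field is static, so every Demo instance shares the same Collaborator; the idiom suits singletons and other static fields rather than per-instance state.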
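To see the laziness for yourself, here is a sketch that records when each piece gets initialized. The EVENTS list, the static initializer block in CollaboratorHolder and the HolderDemo wrapper class are additions for illustration only; the holder itself matches the code above.

```java
import java.util.ArrayList;
import java.util.List;

public class HolderDemo {
    // Records the order in which things happen (single-threaded demo only).
    static final List<String> EVENTS = new ArrayList<>();

    static class Collaborator {
        Collaborator() {
            EVENTS.add("Collaborator constructed");
        }
    }

    static class Demo {
        private static class CollaboratorHolder {
            // Static initializers run in textual order when the class is first used.
            static {
                EVENTS.add("CollaboratorHolder initialized");
            }
            static final Collaborator collaborator = new Collaborator();
        }

        public Collaborator getCollaborator() {
            return CollaboratorHolder.collaborator;
        }
    }

    public static void main(String... args) {
        Demo demo = new Demo();          // does NOT initialize CollaboratorHolder
        EVENTS.add("Demo created");
        demo.getCollaborator();          // first use: holder initialized, Collaborator built
        demo.getCollaborator();          // no further initialization happens
        EVENTS.forEach(System.out::println);
        // prints:
        // Demo created
        // CollaboratorHolder initialized
        // Collaborator constructed
    }
}
```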
Summary
As this simple pattern demonstrates, writing multi-threaded code that is both correct and fast is not an easy task. It is very important that we know our development platform and the facilities it provides. And as a meta-observation: we should try not to "optimize" code unless required!
Further Reading
I was clearing my mailbox today and came across an old email exchange with a friend from college. He had asked me for some advice on getting started with web development. Since my reply turned out to be quite long, I thought I'd post it here in the hopes that it'll benefit a few more people.
Although this conversation took place in December 2010, most of what I said still stands. The only thing that I would take back is the recommendation to use Google App Engine. I believe that their pricing changes earlier this year mean that GAE is no longer a platform for hobbyist hackers to experiment upon.
Now here's what my friend asked:
Can you guys suggest some essential reading (or advise on how) to understand:
1. Basic background on the structure of the internet and related ideas
2. Core technology that is useful in building a good web-based database that serves data based on queries from apps etc.
3. Any other things that you think may be useful to a budding web entrepreneur
And here's my (slightly edited) reply:
Overview
For reading material, you can start with these two:
Both are quite old in Internet Time but should give you the basics.
[NOTE: No links in the following - just Google for terms & phrases!]
Conceptually, most dynamic internet sites have a web server + application layer + relational database all deployed on either Linux or Windows.
Web Server
On the web server side, Apache is the most common although new upstarts like nginx are gaining popularity. In Microsoft land, IIS is what you would go with.
Application Layer
Coming to the application layer, this depends on your language of choice. PHP is by far the most common. Other options are Python, Ruby, Java, .NET, etc.
Beyond the language, 99% of the time you would also choose a web development framework that makes it easier to do web dev in your chosen language.
- Python: Django, Flask, etc.
- Ruby: Ruby on Rails, Sinatra, etc.
- Java: J2EE app servers, Play framework
- .NET: ASP.net, etc.
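The frameworks above all build on the same basic job: mapping an incoming HTTP request to some code that produces a response. As a rough illustration of that application layer with no framework at all, here is a sketch in Java (the language used elsewhere on this blog) using the JDK's built-in com.sun.net.httpserver package, which ships in the jdk.httpserver module of standard OpenJDK builds. The /hello route and the TinyWebApp class are made up for the example; it assumes Java 9+ (for InputStream.readAllBytes) and is nowhere near production-ready.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class TinyWebApp {
    // Starts a server on 127.0.0.1 (port 0 lets the OS pick a free port) with one route.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        // The "application layer": code that turns a request for /hello into a response.
        server.createContext("/hello", exchange -> {
            byte[] body = "Hello from the application layer".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetches a URL with the JDK's HttpURLConnection and returns the body as a string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String... args) throws IOException {
        HttpServer server = start(0);
        int port = server.getAddress().getPort();
        try {
            System.out.println(fetch("http://127.0.0.1:" + port + "/hello"));
        } finally {
            server.stop(0);
        }
    }
}
```

A framework like Django or Rails takes over exactly this plumbing (routing, headers, encoding) and adds templates, database access and much more on top.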
Database
Finally comes the database where you would store your data. Note that this isn't absolutely necessary - there are plenty of use-cases where you don't actually have to save any data or the little data that needs saving could be put in plain text files. But again, most of the time, you are modelling some entities & need a DB as storage layer.
Options: MySQL, PostgreSQL, Oracle, Sybase, SQL Server.
The first two are free & open source while the rest are proprietary & expensive.
For a long time, MySQL was the DB of choice for web startups since it was free & open source & unless you needed support, you didn't have to pay a dime to anyone. However, Oracle's acquisition of Sun - and by extension MySQL - has left plenty of people worried about how Oracle will tighten the screws on the free offering. Thus PostgreSQL, which is already functionally equivalent to MySQL, is increasingly gaining traction.
Deployment - Traditional
Finally, you deploy everything on either Linux or Windows. If you are choosing anything other than the C#/ASP.NET/IIS combo, you pick Linux.
Aside: This is the famous LAMP stack - Linux + Apache + MySQL + (PHP/Perl/Python) which powered plenty of startups in the first web boom when we were still roaming the corridors of Ganga!
Now once you have developed your application using some stack of your choice, you need to host it somewhere. This typically means signing up for a hosting account with a web hosting provider. This runs the gamut from shared hosting to Virtual Private Servers (VPS) to dedicated servers co-located in the provider's data-center.
If you get to any kind of scale, i.e. traffic of a few hundred thousand per month, you'd need someone well versed in web site deployments to set up caching, load balancing proxies, reverse proxies, etc.; otherwise, there's no way the vanilla stack alone will handle the traffic.
That's the traditional stack. Next we come to the higher level 'cloud' providers.
Deployment - Cloud
First is Amazon Web Services (AWS), which is basically 'Infrastructure as a Service'. In the old (above described) world, whenever you need a server, you find a hosting provider, pick a server/bandwidth plan that matches your needs & then pay a monthly fee for use of that server/bandwidth. If you have a sudden traffic burst ('cos you got linked from the Yahoo homepage) that requires another server to handle the increased load, you are in a tight spot. By the time you get another server & set it up, your site will probably have gone down & the traffic surge will be gone. Worse, now you are stuck with an extra server for at least a month even though you no longer need it!
Simply put, AWS is a metered, pay-as-you-use hosting platform. You can automatically (or rather, programmatically using APIs) request servers, storage, etc. Request & power up new servers as soon as you get traffic & shut them down when you don't need them. You move from a cap-ex model to a rental model for your IT infrastructure.
Further up the stack in 'cloud computing' is 'Platform as a Service'. In this category, you have big players like Salesforce.com, Google App Engine & Microsoft Azure as well as smaller players like Heroku, etc.
Here, the provider gives you a chosen stack to develop for (language + runtime + storage) and then manages the deployment, scaling and other headaches for you. This is basically a zero sysadmin deployment option. As long as your application & dev team can work within the restrictions of the provider's platform (and every platform has them), you don't have to worry about the 'running' part of your web service. These are again 'pay for what you use' type of platforms. The biggest difference when developing for these platforms is that usually, you won't get a regular relational database (like MySQL, etc.) to work with. Thus, there is a good deal of lock-in once you pick a platform.
tl;dr
Now for some concrete & admittedly biased advice:
- Learn Python and go through the Django framework tutorial.
- Build a small web app (say an online address book) using Django. Get it running on your local machine.
- Read up on the Google App Engine documentation.
- Redo the same simple (address book) app using Google App Engine.
- Deploy the app to App Engine online.
That should be enough to give you a good idea of how the web works.
I came across a couple of interesting things at work this week.
env
I've always used env to find out what environment variables are set in a shell. Either run it and read the output or pipe it to grep to check if a particular variable is set or not.
This week, I found out something strange:
antrix@cellar:~$ FOO=bar
antrix@cellar:~$ env | grep FOO
antrix@cellar:~$ export FOO=bar
antrix@cellar:~$ env | grep FOO
FOO=bar
antrix@cellar:~$
I don't know about you but I was very surprised by this behaviour! How could this be?
antrix@cellar:~$ which env
/usr/bin/env
antrix@cellar:~$
Never in my years of using env had I suspected that it wasn't a shell built-in! So let's see what the man page for env says:
NAME
env - run a program in a modified environment
SYNOPSIS
env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
<snip>
If no COMMAND, print the resulting environment.
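So essentially, all these years, I was using env without really using it fully! It also explains the puzzle above: env is a separate program, so it only sees the environment that the shell passes to it, and that contains exported variables only. A plain FOO=bar creates a shell variable that child processes never receive, which is why it only showed up after export.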
Let me correct that. I did use it quite a bit as the first line of my Python scripts:
#!/usr/bin/env python
But I never put two & two together. Shame on me!
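The man page's description, "run a program in a modified environment", maps directly onto how any program launches a child process. As an illustration in Java (this blog's usual language), here is a sketch using ProcessBuilder that does roughly what env -i FOO=bar /usr/bin/env does in the shell: it clears the child's environment, sets one variable and runs /usr/bin/env to print what the child actually received. It assumes a Unix-like system with env at /usr/bin/env (as in the which env output above) and Java 9+ for Map.of and readAllBytes; EnvDemo and runWithEnv are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class EnvDemo {
    // Roughly the Java equivalent of: env -i FOO=bar /usr/bin/env
    // Starts /usr/bin/env with exactly the given variables and returns its output.
    static String runWithEnv(Map<String, String> vars) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("/usr/bin/env"); // assumed location of env
        pb.environment().clear();        // drop everything inherited from the JVM
        pb.environment().putAll(vars);   // only these variables reach the child
        pb.redirectErrorStream(true);
        Process p = pb.start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        p.waitFor();
        return out;
    }

    public static void main(String... args) throws Exception {
        System.out.print(runWithEnv(Map.of("FOO", "bar"))); // prints "FOO=bar"
    }
}
```

Just like the un-exported FOO=bar in the shell session above, anything not placed in pb.environment() never reaches the child; and since the map was cleared first, nothing from the JVM's own environment gets through either.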
kill
The second interesting thing was ksh's strange handling of kill. In bash, you can specify any of the normal signals that kill should send to a process using the -s flag. The argument value to that flag can be either the signal number or the name. So both of the following will work:
antrix@cellar:~$ kill -s 15 pid
antrix@cellar:~$ kill -s TERM pid
But in ksh, if you use the -s flag, the value must be a signal name, not a number! Worse, it won't even warn you with an error!
Just another in the long list of ksh idiosyncrasies!
And yes, I realize the title should actually be This Week I Learned :-)
After a long time, Bollywood has seen fit to produce a movie based on that much-neglected theme: male bonding. No, not the Dostana style bonding but the Dil Chahta Hai style. I can't recall the last movie I saw based on this theme. Even Dil Chahta Hai, great though it was, had to resort to the boy-meets-girl motif for the better part of the movie. Rock On comes to mind but personally, I really hated that movie. I could neither empathize nor sympathize with those moping 30 year olds.
Before watching it, I had no idea what Zindagi Na Milegi Dobaara was about nor did I know who was in the movie. Based purely on a couple of positive recommendations and the better half's insistence, I booked the tickets. Like they say, paisa wasool bheedu (money well spent, buddy)!
I don't want to describe the plot in detail; just know that it is the quintessential road trip movie and by the time the trip ends, the three protagonists have overcome their differences, cemented their friendship, found love and discovered new beginnings. The movie is paced well with plenty of laughs along the way. None of the jokes seem forced and as befits the theme, emotions are kept on a tight leash. Incidentally, this is something that I hate in the typical Shahrukh movie: he bloody well has to cry! It is said that for every tear that he sheds in a film, the box office receipts go up by a crore.
The performances by the three men are great and really hold the film together. Katrina gets her usual back story of 'mixed parentage' to justify her accent. Come to think of it, none of the main women in the movie can speak Hindi fluently!
Zoya Akhtar is clearly very good at what she does; this definitely does not feel like just the second movie that she's directed. Technically too, the film is very well produced with excellent cinematography, sound, editing, etc. The only way to improve on it would be to add some transformer bots. Kidding!
The music is a weak point for the film. I couldn't recall any tune after I left the theatre. Movies that stay with us for ages typically have great hummable music. Not so in this case.
I found one minor irritant in the movie, viz. the product placements. No, I am not talking about the long, long advertisement for Spain which, IMHO, the Spanish tourism ministry will be quite pleased with over the coming years. I am talking about product placements which break any sense of realism. For instance, a Royal Enfield shows up in some small Spanish town. That, I can just about digest. But catching a Vodafone India signal in the middle of Spain is taking us into the realm of science fiction!
Overall, I really enjoyed the movie and it brought back great memories of trips taken with friends. I would rate it a solid 4/5.