29 October, 2008

Book Review: An Introduction To Bioinformatics Algorithms

Disclaimer: I'm nowhere near important enough for anyone to pay me to write this review. I have no conflict of interest in writing it... unless you click on my amazon.com link and buy the book, I guess.

So we're using An Introduction to Bioinformatics Algorithms (by Jones & Pevzner) in (surprise, surprise!) my bioinformatics class. In 7 years of higher education, this is the first time I can say "this textbook rocks!". Ok, of course it doesn't "rock"; it's a freaking textbook - not Larry Mullen on the "40" solo. Anyhow, it's a good book, and I'd get it if I were you; and I'd also make sure you click on my amazon.com widget there on the right before you purchase it.

Anyhow, let me tell you two things about this book:

  1. Chapter 3 is a brilliant molecular biology primer. I've taken at least 3 semesters of bio and chem classes, and this book explained all of what I learned in those classes brilliantly and concisely (granted, I never got much out of college). In fact, after reading this book I felt like I finally understood molecular biology and could explain it to my grandma (which is harder than you think seeing how she's been dead for like 5 years now).
  2. One thing that annoys me about algorithm books is that they always have lame samples of how the algorithm could be used. Well, not this book, my friend! This book begins every chapter with a real molecular biology problem and then applies classic computer science algorithms to that problem. This is what really makes me learn and understand the algorithms.
Honestly, if you're considering buying an algorithms book, this is one worth looking at. It's not comprehensive in the number of algorithms it covers, but the ones it does cover, it covers well.

27 October, 2008

Why partial classes are WRONG! (And why you shouldn't use them.)

I'm absolutely positive the fine folks at Microsoft, particularly the brilliant Anders Hejlsberg (C#'s architect), know of a good reason for partial classes. But I've thought about this for a while, and I'm not convinced there's a good use case for partial classes.

To be intellectually honest, I must admit I can actually think of a few scenarios where partial classes make sense; let me just lay those out before we continue:

  1. You should use partial classes when some tool (such as Visual Studio), not a developer, writes something like an ASPX page.
  2. You should sometimes use partial classes when you write code that will be auto generated (with CodeSmith or such tools). The sometimes I'm thinking of is DAL code.
  3. You should use partial classes when... ummm, when you, ummm..
I guess that was it; there are only 1.5 good use cases I can think of. I know there must be other reasons why C# allows for partial classes, so if you know of one, don't hate me for being stupid and leave a comment instead.

The most prevalent argument for partial classes I've heard is the one that says partial classes are great for organizing code. To this I respond: bull. Partial classes are not great for organizing code; if your class needs to be spread across multiple files just to manage its complexity, there's something fundamentally wrong with your class. Furthermore, when was the last time that having to look for something in multiple files made it easier to find? Finally, and this is my biggest complaint about partial classes, the compiler will never know if all the files present make up the entire definition of your class - I'm sure you can see the problems this could create for you.

Mike Murray (a friend and ex-coworker at TGN) and I had a discussion about this very topic, based on his post about partial methods. While we were talking about partial classes, he mentioned that at TGN he had seen fulfillment code that used partial classes. The developer who wrote the code thought it would be a good idea to separate the SKU-specific code into partial classes. At first this may look like a good idea, but again I respond: bull. There's a much better way to implement SKU-specific behavior for fulfillment code; it involves interfaces and polymorphism, two fundamental principles of OOD.
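Here's a minimal sketch of what I mean by interfaces and polymorphism. It's in Java rather than C#, but the idea carries over directly. Every name in it (FulfillmentStrategy, DownloadFulfillment, ShippedFulfillment, FulfillmentService, and the SKUs) is made up for illustration; this is not TGN's code, just one way the SKU-specific behavior could be split up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout: an illustration, not the real fulfillment code.
interface FulfillmentStrategy {
    String fulfill(String orderId);
}

// Each kind of SKU gets its own small class instead of its own partial file.
class DownloadFulfillment implements FulfillmentStrategy {
    @Override
    public String fulfill(String orderId) {
        return "Emailed download link for order " + orderId;
    }
}

class ShippedFulfillment implements FulfillmentStrategy {
    @Override
    public String fulfill(String orderId) {
        return "Queued shipment for order " + orderId;
    }
}

class FulfillmentService {
    private final Map<String, FulfillmentStrategy> strategiesBySku = new HashMap<>();

    void register(String sku, FulfillmentStrategy strategy) {
        strategiesBySku.put(sku, strategy);
    }

    // The service never needs to know which concrete strategy it is calling.
    String fulfill(String sku, String orderId) {
        FulfillmentStrategy strategy = strategiesBySku.get(sku);
        if (strategy == null) {
            throw new IllegalArgumentException("No fulfillment strategy for SKU " + sku);
        }
        return strategy.fulfill(orderId);
    }
}

class FulfillmentDemo {
    public static void main(String[] args) {
        FulfillmentService service = new FulfillmentService();
        service.register("EBOOK-1", new DownloadFulfillment());
        service.register("DVD-1", new ShippedFulfillment());
        System.out.println(service.fulfill("EBOOK-1", "1001"));
        System.out.println(service.fulfill("DVD-1", "1002"));
    }
}
```

Adding a new SKU type means adding one new class and one register call; the service class itself stays in a single file and doesn't change.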

And the list goes on and on. But for just about every argument in favor of partial classes, there's an OO principle that will do the same job in a much better way.

So, in summary, let me tell you why you shouldn't use partial classes: partial classes will only help you write crappy code. I don't know about you, but I'm already pretty good at writing crappy code; I need tools and frameworks that do the opposite.

Just as a footnote: I read an interview where Anders was asked about C#'s support for tooling. He mentioned that yes, he always keeps tool support in mind when designing C#. I'm guessing this is the true reason for partial classes, and why you should steer clear of them.

23 October, 2008

The Cost of Throwing & Catching Exceptions

Seeing how I recently wrote about how throwing exceptions is not a bad idea for, uhmm - you know, communicating errors (believe it or not this is a topic of discussion), I thought I'd post a short follow-up on the cost of throwing exceptions.

Those who argue against throwing exceptions as a means of communicating errors normally cite the "high" cost of catching an exception: unwinding the stack looking for a handler, executing finally blocks, and finally returning control to the right place is allegedly a costly operation.

However, because of the way .NET exception handlers are stored in bytecode and because the code is optimized for the case in which an exception is not thrown (which makes sense considering that we shouldn't see TOO many exceptions):


The overall cost of a try...catch block that never handles an exception is a few bytes of memory - or at worst a few words - for the entry in the protected regions table. The only possible runtime penalty is the extra time to load those extra few bytes into memory. Since they are stored way away from the JITted bytecode stream, it's highly unlikely you're going to incur any additional cache-misses at runtime as a result of the handler too. Thus, the cost is essentially nothing.


However, the cost of not handling an exception is quite large:

The cost of not handling an exception that you should have may well be that your program crashes. This results in unhappy customers, a hit to your reputation and development time to go and do a bug-fix, which will almost certainly be much greater than if you had put it in there in the first place. Obviously, protecting code that can not throw an exception under any circumstances is a waste of your development time. But otherwise, it's best to be safe rather than sorry, safe in the knowledge that even if an exception never does occur in that bit of code, it's not really costing anything anyway.


(Both quotes from The Official Programmer's Heaven Blog on SEH performance)

One Important Consideration

Although the performance cost of structured exception handling is almost always minimal, there is one important consideration to keep in mind when writing code inside try blocks: the compiler cannot optimize code inside a try block as aggressively as code outside one.

Take Peter Ritchie's classic example; the following code


int count = 1;
SomeMethod();
count++;
SomeOtherMethod();
count++;
Console.WriteLine(count);


will get optimized to something that will effectively look like


SomeMethod();
SomeOtherMethod();
Console.WriteLine(3);


The same code as in the first listing, placed inside a try block, however, will not get optimized. The compiler will actually write code that increments the "count" variable twice.

So, when writing exception handling code, try to be brief inside your try blocks. Other than that, you don't have too much to worry about.
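As a rough illustration of "brief", here's a sketch in Java (the optimization claim above is about C#, and I'm not claiming anything about what the JVM does or doesn't optimize; this only shows the shape of the code). The helper parsePortOrDefault is a made-up name. The only statement inside the try block is the one call that can actually throw; everything else lives outside the protected region.

```java
class BriefTryDemo {
    // Hypothetical helper: only the call that can throw sits inside the try block.
    static int parsePortOrDefault(String text, int defaultPort) {
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return defaultPort;
        }
        // The range check cannot throw, so it stays outside the try block.
        if (port < 1 || port > 65535) {
            return defaultPort;
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(parsePortOrDefault("8080", 80));
        System.out.println(parsePortOrDefault("not a number", 80));
    }
}
```

A side benefit: with so little inside the try, it's obvious which statement the catch is there for.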

21 October, 2008

StackOverflow's Dirty Little Secret

Seeing how you probably got here from my link on SO, I won't bore you with the details about what SO is and how awesome it is. For everyone else, however, let me just mention that StackOverflow is a community where developers go to ask and answer questions - that's it, but it's AWESUM!

Here's a list of ideas on how to maximize your StackOverflow experience. In it, you'll find SO's dirty little secret; take a look:

Do:



  1. Put your foot in your mouth (often). Every time I answer incorrectly at SO the community lets me know it promptly and pointedly. It's kind of embarrassing to be wrong online, so you really learn from your mistakes. Not only does the shock factor help you remember the correct answer but, as an added benefit, you will no longer carry with you the baggage of incorrect knowledge in your head.

    I strongly recommend goofing up (and correcting yourself); for a good sample of how I've goofed up, see this question. I can guarantee you, I'll never get that one wrong again.


  2. Ask subjective questions. Even though, by definition, subjective questions do not have one correct answer, there's still value in asking these kinds of questions. I'd say about 50% of the SO population is smarter than I am (yes, I realize why about 50% of the population is smarter than me), so there's a lot of value in seeing what other developers think.

    If you care to see one of the subjective questions I've asked, take a lookie here.


Don't:


  1. Think that people know what they're talking about just because they have a high reputation score. And this is SO's dirty little secret: you can get a high reputation (which allegedly represents expertise) by just gaming the system. I'm not saying that the system is broken - there are a lot of people out there who deserve their high rep; there are also many who don't. The other side of this coin is also true: there are a lot of sharp people with low reputation scores (probably because they have better things to do than try to get a high SO rep score).

    So, again, don't judge people just because of their rep score.


  2. Trust answers just because they have a lot of up-votes. Even though there are a lot of smart people at SO, there are also just a lot of people, and sometimes the herd mentality takes over and answers get up-voted for no other reason than the fact that others have up-voted them.


If you're a developer and you haven't joined StackOverflow yet, I strongly recommend you head over there and start asking and answering questions.

16 October, 2008

How To Write Self-Documenting Code

If I got promoted every time I spotted code like the one that follows, I'd probably be the President of the United States by now. Take a look at Joe's code here:


public static void main(String[] args) {
    String db = args[0];
    String uid = args[1];
    String pwd = args[2];

    String url = "jdbc:postgresql://localhost/" + db;
    Connection conn;
    try
    {
        conn = DriverManager.getConnection(url, uid, pwd);
        String query = "SELECT A.city, A.zip, st_distance(A.longlat_point_meter, " + "B.longlat_point_meter) FROM utzipcode A, utzipcode B WHERE B.city='MENDON' " + "AND A.city<>'MENDON' AND st_distance(A.longlat_point_meter, B.longlat_point_meter) " + "< 20000 ORDER BY 3;";

        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(query);

        ResultSetMetaData resultMetaData = rs.getMetaData();
        System.out.println(resultMetaData.getColumnName(1) + ", " + resultMetaData.getColumnName(2) + ", " +
            resultMetaData.getColumnName(3));
        while(rs.next()){
            System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getString(3));
        }
        rs.close();
    }
    catch(SQLException sqle){
        System.out.println(sqle.getMessage());
    }
}




Can you tell what's going on in the code above? You probably can, with some effort - but why should code so simple be hard to understand? How come the code doesn't tell me what it is doing? Plus, when I'm maintaining this beauty 10 years from now, I'll be hunting Joe down and pulling his fingernails every time I have to debug it - seriously, this code sucks.

Even though just about every programmer I've worked with knows better than to write code that's hard to read, under pressure we all get sloppy. In fact, the inspiration for this post was some code I wrote myself a couple of weeks ago.

So, now that I've gotten that off my chest, let's see if we can clean this up some. The n00b fix would be to pepper in some comments like this:


public static void main(String[] args) {
    String db = args[0];
    String uid = args[1];
    String pwd = args[2];

    String url = "jdbc:postgresql://localhost/" + db;
    Connection conn;
    try
    {
        //get the connection to the db
        conn = DriverManager.getConnection(url, uid, pwd);
        String query = "SELECT A.city, A.zip, st_distance(A.longlat_point_meter, " + "B.longlat_point_meter) FROM utzipcode A, utzipcode B WHERE B.city='MENDON' " + "AND A.city<>'MENDON' AND st_distance(A.longlat_point_meter, B.longlat_point_meter) " + "< 20000 ORDER BY 3;";

        Statement stmt = conn.createStatement();
        //execute the query defined above
        ResultSet rs = stmt.executeQuery(query);

        ResultSetMetaData resultMetaData = rs.getMetaData();

        //print column names
        System.out.println(resultMetaData.getColumnName(1) + ", " + resultMetaData.getColumnName(2) + ", " +
            resultMetaData.getColumnName(3));

        //print every tuple
        while(rs.next()){
            System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getString(3));
        }
        rs.close();
    }
    catch(SQLException sqle){
        System.out.println(sqle.getMessage());
    }
}



That still looks terrible, however. Not only that, the comments aren't really that useful; they don't tell me anything I couldn't have figured out by reading the code. In fact, I think these types of comments make the code harder to read since they interrupt what's really important - the code. But that's kind of a side note: the real problem here is that the comments don't give me any insight or explain the intent of the code.

Alright then... What do we do now? Well, how about something like this:


public static void main(String[] args) {
    String db = args[0];
    String uid = args[1];
    String pwd = args[2];

    String url = "jdbc:postgresql://localhost/" + db;
    Connection conn;
    try
    {
        conn = DriverManager.getConnection(url, uid, pwd);

        //we're hard coding the query here because the db is out of SPs - joe
        String query = "SELECT A.city, A.zip, st_distance(A.longlat_point_meter, " +
            "B.longlat_point_meter) FROM utzipcode A, utzipcode B WHERE B.city='MENDON' " +
            "AND A.city<>'MENDON' AND st_distance(A.longlat_point_meter, B.longlat_point_meter) " +
            "< 20000 ORDER BY 3;";

        ResultSet rs = executeQuery(query, conn);
        printQueryResults(rs);
        rs.close();
    }
    catch(SQLException sqle){
        System.out.println(sqle.getMessage());
    }
}

//errors propagate to main instead of being swallowed in the helpers
private static ResultSet executeQuery(String query, Connection conn) throws SQLException {
    Statement stmt = conn.createStatement();
    return stmt.executeQuery(query);
}

private static void printQueryResults(ResultSet rs) throws SQLException {
    ResultSetMetaData resultMetaData = rs.getMetaData();
    System.out.println(resultMetaData.getColumnName(1) + ", " + resultMetaData.getColumnName(2) + ", " +
        resultMetaData.getColumnName(3));
    while(rs.next()){
        System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getString(3));
    }
}


Ahhh... much better. I can tell what the code is doing in one brief overview - and most importantly, the code itself is telling me what it's doing!

Also, the code is now easier to debug and maintain because each method is only a few lines long; this means I'll know exactly where the code may be having problems next time it breaks by just looking at the stack trace. Finally, look at that comment about the query - that really tells me something I could have never known unless Joe had told me.
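If you want to push the same idea one step further, the magic values buried in the query can get names too. This is only a sketch under assumptions: the class name NearbyCitiesQuery and the constant names are made up, and I'm guessing the 20000 is a distance in meters because of the longlat_point_meter column name. The literals are swapped for standard JDBC "?" placeholders and bound with PreparedStatement.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical class and constant names; the SQL itself is the query from the post.
class NearbyCitiesQuery {
    static final String ORIGIN_CITY = "MENDON";
    // Assumption: the limit is in meters, going by the column name longlat_point_meter.
    static final double SEARCH_RADIUS_METERS = 20000;

    // Same query as before, with the literals replaced by JDBC placeholders.
    static final String SQL =
        "SELECT A.city, A.zip, "
        + "st_distance(A.longlat_point_meter, B.longlat_point_meter) "
        + "FROM utzipcode A, utzipcode B "
        + "WHERE B.city = ? AND A.city <> ? "
        + "AND st_distance(A.longlat_point_meter, B.longlat_point_meter) < ? "
        + "ORDER BY 3";

    // Binds the named values to the three placeholders, in order.
    static void bindParameters(PreparedStatement statement) throws SQLException {
        statement.setString(1, ORIGIN_CITY);
        statement.setString(2, ORIGIN_CITY);
        statement.setDouble(3, SEARCH_RADIUS_METERS);
    }

    public static void main(String[] args) {
        // No database needed just to look at the query text.
        System.out.println(SQL);
    }
}
```

In main you would then call conn.prepareStatement(NearbyCitiesQuery.SQL), pass the result to bindParameters, and run executeQuery() on it. Now the names say what 'MENDON' and 20000 mean, which no amount of staring at the original string could.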

So there you go, that's self-documenting code in 3 easy steps. Please let Joe know how much you appreciate learning from his mistake by dropping a comment. It'll make him feel better.

11 October, 2008

when to throw exceptions

have you ever heard the saying "exceptions are for exceptional circumstances"? well, i have, and until recently, i was a firm believer of such thoughts. i'd normally code for every "exceptional" condition and try to do everything i could to avoid having to throw an exception.

however, just a few days ago, i came across this statement from Krzysztof Cwalina (program manager for the CLR team at MS):

One of the biggest misconceptions about exceptions is that they are for “exceptional conditions.” The reality is that they are for communicating error conditions. From a framework design perspective, there is no such thing as an “exceptional condition”. Whether a condition is exceptional or not depends on the context of usage, --- but reusable libraries rarely know how they will be used. For example, OutOfMemoryException might be exceptional for a simple data entry application; it’s not so exceptional for applications doing their own memory management (e.g. SQL server). In other words, one man’s exceptional condition is another man’s chronic condition.


Cwalina then goes on to say that exceptions should be used for common errors such as (1) usage errors, (2) program errors, and finally (3) system errors. it seems to me this is in direct opposition to the "exceptional exceptions" mantra.

to be honest, i never understood why exceptions should only be used in "exceptional" circumstances. it's hard to define "exceptional", and that's exactly the point Cwalina makes in his quote.

i realize, however, there are circumstances when it doesn't make sense to throw exceptions; for example, why throw a DivideByZeroException when you can easily check for the condition and appropriately terminate?

but for the most part, exceptions provide an object-oriented way to communicate errors to clients. i think i'm switching from the "exceptional exceptions" camp to the "let's use exceptions to communicate errors" camp. what about you?
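here's a small sketch of both sides of that, in java. the names (ExceptionDemo, average, describeAverage) are made up for illustration. the reusable method can't know whether a zero count is routine or a disaster for its caller, so it reports the usage error with an exception; the caller that does know zero is routine simply checks for it first, which is the divide-by-zero case above.

```java
class ExceptionDemo {
    // library side: it has no idea how it will be used, so a bad argument
    // is communicated as an error rather than silently handled.
    static int average(int total, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive, got " + count);
        }
        return total / count; // integer division
    }

    // application side: for this caller an empty data set is routine,
    // so it checks the condition up front instead of catching.
    static String describeAverage(int total, int count) {
        if (count == 0) {
            return "no data";
        }
        return "average: " + average(total, count);
    }

    public static void main(String[] args) {
        System.out.println(describeAverage(10, 4));
        System.out.println(describeAverage(0, 0));
    }
}
```

same condition, two contexts: whether it's "exceptional" depends on who's asking, which is pretty much Cwalina's point.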

Update: I have a new article on the cost of throwing exceptions. If you now think that throwing more exceptions is a good idea, you might want to check this article too.