the ramblings of a .net dude: 2008

19 December, 2008

Why You Should Be Controversial

I have to tell you a story before any of this makes sense, so bear with me for a minute.

Michael Murray, a good buddy of mine, IMed me today telling me he had been invited to write a blog post for the main page of tech.lds.org. I was really happy for him; he deserves it as a tribute to his tireless contributions to the LDS tech community.

Unfortunately, for reasons that I can understand, Mike was having a hard time coming up with what to say. I mean, there's tons you could say, but you have to be somewhat careful... after all, some may mistake criticism of the software the LDS Church employees write for criticism of the LDS Church, period. And then there's also the fact that Mike really cares about the tech.lds community; he's an active moderator in it.

Unfortunately, the tech.lds community suffers from the same problem any extremely homogeneous community does: everyone says the same freaking thing all of the freaking time! Everyone talks about how IT is a means to an end; everyone talks about how we, as IT folk, should seek ways to further the Kingdom and be wiser servants. Please don't get me wrong: I know that what the tech.lds community preaches is true, but haven't we heard enough of that already?

Why don't we talk about some of the real issues with IT at the Church?

Here's a few ideas off the top of my head:

How come LDS software is not open sourced? I still remember when I helped build my chapel back in Ecuador; truly a memorable experience - it was great to be able to contribute beyond tithing money.
How come phone numbers are tied to households in MLS? What's up with that?!? And even more important, how come that wasn't fixed 10 releases ago?
How come the IT department used to pay terrible salaries to IT employees? I recognize this is changing, but what kind of people was the IT department expecting to hire?
Why did it take 10 years for meetinghouses to get broadband internet access? Did IT managers really just learn of white lists?
Why does it take days to transfer membership records into a ward? Isn't it just a simple look up? And if it's not just a simple look up, how come it's not?
Why is the Church developing on the Java stack? When was the last time you saw MLS running on a Linux box?

I've been deliberately controversial on this post because I wanted to make a point: if you're going to say something, you might as well say something that's worth thinking about. I suspect there are good answers to all of the questions I've listed above; furthermore, I realize in the grand scheme of things I know nothing compared to what the fine folks at the LDS Church do. But at least you read this far, and now you're thinking about software at the Church too.

Disclaimer: I cannot say it more clearly than this: I'm criticizing IT at the LDS Church, I am NOT criticizing the Church of Jesus Christ of Latter-day Saints. To some extent, I'm not even trying to criticize the folks that work in IT at the Church; I realize that were I in charge of IT, things would probably be even worse than they are now.

11 December, 2008

The Secret Behind LINQ To SQL

I finally get it! It's all about expression trees!

It all finally clicked for me when I saw the declaration of IQueryable:


public interface IQueryable : IEnumerable
{
  Type ElementType { get; }
  Expression Expression { get; }
  IQueryProvider Provider { get; }
}

See that? It's right there! How could I have been so blind for so long? But I'm getting ahead of myself. Let me tell you about expression trees.

What are expression trees? Well, they're trees with expressions as nodes. What do they do? They provide a mechanism to convert code into data (expressions). Why is that useful? Because you may want to examine and/or modify your code before execution. In particular, it would be very useful if we wanted to take, say, C# code and convert it to, oh... I dunno... SQL statements.

Say then you created the following Expression:


Expression<Func<int, int, int>> expression = 
     (a,b) => a + b;

(There's a little magic going on in the above sample: because the variable is declared as an Expression<...>, the compiler takes the lambda "(a, b) => a + b" and builds an expression tree out of it instead of compiling it into a delegate.)

The above code would literally translate to a "+" (plus) node with two child nodes: "a" and "b". Obviously, I've simplified the tree for clarity. If you really want to see the tree structure, fire up VS2008 and take a look at the expression tree with the ExpressionTreeVisualizer plugin.

Anyhow, now that we have our C# code in a tree, we can easily parse that tree and convert the data to a string that SQL can understand. We can then take that string, send it across the wire to our SQL server and voila: we have LINQ to SQL. Cool, huh?
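To make the "walk the tree, emit a string" idea a bit more concrete, here's a minimal sketch. It is nowhere near what LINQ to SQL really does; TinyTranslator, ToSqlish and the handful of supported node types are names and choices I made up for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class TinyTranslator
{
    // Recursively turns a tiny subset of expression nodes into SQL-like text.
    public static string ToSqlish(Expression node)
    {
        switch (node)
        {
            case LambdaExpression lambda:
                // A lambda is just a wrapper; the interesting part is its body.
                return ToSqlish(lambda.Body);
            case BinaryExpression binary:
                return "(" + ToSqlish(binary.Left) + " "
                    + Operator(binary.NodeType) + " "
                    + ToSqlish(binary.Right) + ")";
            case ParameterExpression parameter:
                // Pretend the parameter name is a column name.
                return parameter.Name;
            case ConstantExpression constant:
                return Convert.ToString(constant.Value);
            default:
                throw new NotSupportedException(node.NodeType.ToString());
        }
    }

    // Maps the node type of a binary expression to an operator symbol.
    private static string Operator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Add: return "+";
            case ExpressionType.Subtract: return "-";
            case ExpressionType.Multiply: return "*";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.Equal: return "=";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}
```

Feeding it the expression from the sample above should give back "(a + b)". A real provider has to deal with method calls, joins, parameters and much more, but the basic move is the same: the code arrives as data, and you decide what to do with it.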

Now, here's something else to think about: IEnumerable offers most of the same methods IQueryable offers, yet the declaration is completely different:


public interface IEnumerable<T> : IEnumerable
{
   IEnumerator<T> GetEnumerator();
}

Notice how IEnumerable does not have an Expression tree? That's a fundamental difference between the two interfaces. This means that IEnumerable won't do anything with your code but execute it. Therefore, if you order, filter, or project an IEnumerable collection, the action will execute in the process where the collection lives; it will not be sent to a beefy SQL box that can handle ordering large sets easily (Yes, I've made that mistake. In fact, that's what inspired this post).
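Here's a small sketch of that difference, using an in-memory array so it runs without a database. QueryShapes and its method names are made up for this example; AsQueryable, OrderBy and MethodCallExpression are the real framework pieces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryShapes
{
    private static readonly int[] Numbers = { 5, 1, 4 };

    // IEnumerable<T>: the lambda becomes a delegate and the sort runs
    // right here, in this process, when the sequence is enumerated.
    public static int[] SortedInProcess()
    {
        IEnumerable<int> sorted = Numbers.OrderBy(n => n);
        return sorted.ToArray();
    }

    // IQueryable<T>: the very same call is recorded as data. The Expression
    // property holds a method-call node that a provider could translate.
    public static string RecordedCall()
    {
        IQueryable<int> query = Numbers.AsQueryable().OrderBy(n => n);
        var call = (MethodCallExpression)query.Expression;
        return call.Method.Name;
    }
}
```

RecordedCall() hands back the name of the method that was captured in the tree ("OrderBy"), which is exactly the information a provider like LINQ to SQL needs to produce an ORDER BY. The IEnumerable version has nothing like that to inspect; all it can do is run.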

Turns out my CS professors were right: trees really are useful data structures.

09 December, 2008

Why Programmers Should Have Private Offices.

Non-programmers don't get it: interruptions are a huge deal; it takes a lot of mental effort to load the program you're working on onto your brain.

Once you're in "the zone", you're effective and you write good code. Naturally then, having some dude from marketing stop by to ask a "quick question" is a total disaster; you have to literally unload a bunch of important information from your brain, just to listen to the marketing clown ask you for the 23rd time when the product will ship. Not cool. Not cool at all.

What non-programmers don't get is how hard it is to resume what you were working on. They don't understand that, to some extent, you have to load all the classes, variables, functions, etc. in your code right into your memory; you have to enter the freaking Matrix, and that's no easy task.

I'm guessing non-programmers think going back to writing code is as easy as going back to writing an email: you just read the last sentence you wrote and pick up from there. Well, it's not that simple.

Joel Spolsky has been advocating private offices for developers for almost a decade now. Yet there are few companies out there that give their developers such a luxury. So, if you ever come across such a perk, there's a good chance that's a company you want to work for.

05 December, 2008

From Recursion To Dynamic Programming

Dynamic programming is a really useful technique. When used appropriately, it saves processor time and memory space.

Shoot! After writing all the code for this entry, I just noticed that the Wikipedia entry for dynamic programming uses the same example I thought about using in this post. Oh well, let me just show you some code and explain just a little.


private int recursiveFib(int n)
{
  if (n == 0)
   return 0;
  if (n == 1)
   return 1;
  return recursiveFib(n - 1) + recursiveFib(n - 2);
}

The above code calculates the nth Fibonacci number for any non-negative integer n. The code is easy to read and effective. However, there's one problem: we repeat calls to recursiveFib even though we may have already computed a value. For example, for n=4 we'll call recursiveFib(2) twice; this may not seem like a big problem, but for larger values of n the number of unique calls to recursiveFib is very small compared to the total number of calls we actually make. What a waste of processor cycles!

Now that we've realized we're wasting resources because the number of unique calls to our recursive methods is small, we're ready to move on to dynamic programming. Here's the dynamic code:


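private int dynamicFib(int n)
{
  // Base cases (n is assumed non-negative, as in recursiveFib);
  // this also guarantees the array below has at least two slots.
  if (n < 2)
   return n;
  // Slot i holds the i-th Fibonacci number, so we need n + 1 slots.
  int[] dynamicArray = new int[n + 1];
  dynamicArray[0] = 0;
  dynamicArray[1] = 1;
  for (int i = 2; i <= n; i++)
  {
   dynamicArray[i] = dynamicArray[i - 1] + dynamicArray[i - 2];
  }
  return dynamicArray[n];
}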

Even though this code may not be as succinct as the recursive code, it saves resources. By taking a bottom up approach (rather than the top to bottom approach of the recursive method) and saving our previous calculations, we never have to recompute a value again! Although, with oil being as cheap as it is nowadays, you may not really care.

In summary:
If you want to solve a problem with dynamic programming, I recommend you first solve it recursively. This may seem like a waste of time, but you need the recursive solution to spot where the improvement to the algorithm needs to occur. Once you've figured that out, all you have to do is compute and store your partial solutions as you go.
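A handy halfway point between the two versions is to keep the recursive shape and simply remember what you've already computed (usually called memoization). This is only a sketch; the Fib class, the Memoized name and the choice of a Dictionary as the cache are mine.

```csharp
using System.Collections.Generic;

public static class Fib
{
    // Partial solutions we've already computed, keyed by n.
    private static readonly Dictionary<int, long> cache = new Dictionary<int, long>();

    public static long Memoized(int n)
    {
        // Same base cases as the recursive version (n assumed non-negative).
        if (n < 2)
            return n;

        // If we've seen this n before, reuse the stored answer.
        long known;
        if (cache.TryGetValue(n, out known))
            return known;

        // Otherwise compute it once, store it, and never compute it again.
        long value = Memoized(n - 1) + Memoized(n - 2);
        cache[n] = value;
        return value;
    }
}
```

I used long instead of int here only because Fibonacci numbers outgrow an int fairly quickly. Once the memoized version works, turning the cache into an array that you fill from the bottom up is a small step.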

25 November, 2008

Think About It!

Disclaimer: My apologies to the fine people who, unlike me, actually write source control systems.

If I asked you to write a source control system, you might say: "Pff. That's easy dude! Just store a copy of the document every time you make a change. When you need a particular revision, you just access that copy of the document". And although your system would work, I'd say: Think about it!

Storing a full copy of the document for every change is wasteful; revision changes are small: a line of text here and a line of text there. Why would you store a full copy of the document just for a small change?

At this point you may be tempted to say: "Well, then just store the original document, and store the changes as small delta files. We can then apply the deltas to get you any revision you want." This, of course, is a much better solution as far as storage is concerned; however, I'd still say: Think about it!

Under normal circumstances, users access the most recent revision of a document far more often than any other revision. Furthermore, starting from the original document and applying all of its changes is computationally expensive. It would then seem that having to apply all these deltas for our most common operation is a bad idea. Seeing this, how can we further improve our system?

Well, how about we store the most recent version of the document instead of the original? This would mean we would have to store deltas to take us all the way back to the original version, but that's OK - it's not much different than what we were thinking about doing before. However, with this change, we can now perform our most common operation (return the current version of a document) in constant time. Also, the expensive operation (returning an older version of the document) now occurs on rare occasions. Better, huh?
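Here's a rough sketch of that last design, just to show the moving parts. Everything in it is an assumption of mine: the ReverseDeltaStore name, numbering revisions from 0 (the empty document), and the very naive "common prefix / common suffix" delta, which a real system would replace with a proper diff.

```csharp
using System;
using System.Collections.Generic;

public sealed class ReverseDeltaStore
{
    // Turns a newer text back into the older one: keep Prefix characters from
    // the start and Suffix characters from the end, with Middle in between.
    private sealed class Delta
    {
        public int Prefix;
        public int Suffix;
        public string Middle;
    }

    private string latest = "";

    // backwards[i] turns revision i + 1 into revision i.
    private readonly List<Delta> backwards = new List<Delta>();

    public int LatestRevision { get { return backwards.Count; } }

    public void Commit(string newText)
    {
        // Remember how to get from the new text back to the current one.
        backwards.Add(Diff(newText, latest));
        latest = newText;
    }

    public string Get(int revision)
    {
        if (revision < 0 || revision > LatestRevision)
            throw new ArgumentOutOfRangeException("revision");

        // The latest revision needs no work; older ones walk backwards.
        string text = latest;
        for (int i = backwards.Count - 1; i >= revision; i--)
            text = Apply(text, backwards[i]);
        return text;
    }

    private static Delta Diff(string from, string to)
    {
        int max = Math.Min(from.Length, to.Length);
        int prefix = 0;
        while (prefix < max && from[prefix] == to[prefix])
            prefix++;
        int suffix = 0;
        while (suffix < max - prefix
            && from[from.Length - 1 - suffix] == to[to.Length - 1 - suffix])
            suffix++;
        return new Delta
        {
            Prefix = prefix,
            Suffix = suffix,
            Middle = to.Substring(prefix, to.Length - prefix - suffix)
        };
    }

    private static string Apply(string text, Delta delta)
    {
        return text.Substring(0, delta.Prefix)
            + delta.Middle
            + text.Substring(text.Length - delta.Suffix);
    }
}
```

Get(LatestRevision) never touches a delta, which is the whole point: the common operation is cheap, and only the rare trip into history pays for applying deltas.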

And now that I'm out of ideas for our source control system, I'm going to go back and "Think about it!", a little bit more, because I'm sure there's still lots of room for improvement.

20 November, 2008

Spell Checking The Right Way

In my fun-as-a-rock database class I recently got an assignment to correct misspellings in a file full of city names.

Now, there's two ways to do spell checking: the Microsoft way, and the Google way. Care to guess which is the wrong way to do it? Yup, you got it: the Microsoft way sucks! Ok, maybe it didn't suck back in the 17th century when Spanish Monks were doing all the spell checking known to mankind (which I think consisted of 3 or 4 individuals that actually knew how to read, or cared about spelling for that matter).

So, if the Microsoft spell checker and the Google spell checker could talk, what would they say?

Microsoft would say: Listen buster! My dictionary contains all the correct words in the universe; either you comply or you don't. Got it?

Google would say: What do I know about spelling? I'm just trying to figure out a way to make more money from all this content I just indexed. Oh, and by the way, that word you just typed, it looks awful close to this other word I see a lot in my index. Is that what you meant?

The problem with the Microsoft approach should be obvious, but it's important to point out that the Google approach is not without faults either.

The biggest problem with the Google approach is that to some extent it's a form of crowdsourcing. If your crowd can't spell, then you're toast.

Last, but not least, I'd just like to show you some pseudo code on how I implemented my spell checker:

Read all the city names in the file while keeping track of every variation we've seen and how many times we've seen it (in a hash, dictionary, etc). Take the most popular spelling for each city, and call that the correct spelling.
To correct word X, calculate its edit distance to all the correct spellings. Chances are word X is really the "correct spelling" it most resembles.
Figure out what to do if you've never seen X before.
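The second step is the one that's easiest to show in real code. Below is a minimal sketch, assuming the list of "correct" spellings has already been built in step one; CitySpelling, EditDistance and Correct are names I picked for the example, and the distance is the plain Levenshtein distance.

```csharp
using System;
using System.Collections.Generic;

public static class CitySpelling
{
    // Levenshtein distance: the number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            // The row we just filled becomes the "previous" row.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }

    // Returns the known spelling closest to word, or null if none are known.
    public static string Correct(string word, IEnumerable<string> correctSpellings)
    {
        string best = null;
        int bestDistance = int.MaxValue;
        foreach (string candidate in correctSpellings)
        {
            int distance = EditDistance(word, candidate);
            if (distance < bestDistance)
            {
                best = candidate;
                bestDistance = distance;
            }
        }
        return best;
    }
}
```

This compares word X against every known spelling, which is fine for a file of city names. It also always returns something, so the third step (what to do with a word that resembles nothing you've seen) is still up to you; a maximum allowed distance would be one way to handle it.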

And that concludes today's post. Now if I could just get Google to write grammatically correct sentences for me, I'd never have to worry about proof reading my posts ever again.

Disclaimer: I would just like it to be known that I'm in no way a MS hater; in fact, I'm somewhat of a MS fan. I'd also like it to be known that I'm not a Google fan boy; in fact, I'm a little afraid of them - they read my email, and I'm sure they're the new federal agency that's in charge of spying on citizens.

18 November, 2008

How To Be A Better Programmer

Glen Wagley, after reading my post on pair programming, said to me: "I understand what you mean about cutting corners. But I don't do it anymore; it's not worth it".

That was a slap in the face. Here I am blogging about ways to be a better programmer, yet I still cut corners myself.

All of this led me to think that the number one thing you can do to become a better programmer is to have courage and integrity: if you see code that needs to be refactored, refactor it; if you know you need to throw away some of your code, don't be hesitant and throw it away; write your unit tests first; write your documentation... you get the idea: do those things you know you should do even though you don't always want to.

I guess what I'm trying to say is: if you're Mormon, CTR. If you're not Mormon, please contact your local LDS missionaries; they'll be glad to teach you what CTR means. Seriously, however, it's sad that we have sites like The Daily WTF where we laugh about the crap we, so-called "pros", write. I know that the profession is new, but that should be no excuse; I don't see a site where surgeons laugh about all the times they've left scissors inside their patients (OK, I really haven't searched and there's probably one out there). I realize we all make honest mistakes, but we should draw the line somewhere.

Now that I'm done ranting, and while I'm on the topic, here's a list of other things you could do to become a better programmer (in no particular order): learn languages & platforms other than the ones you currently use; learn to write good prose (in more than one language?); learn to touch type (as Steve Yegge suggests here); read tons of technical books and even more non-technical books; learn how to market your ideas; be humble and learn from others (regardless of their title/position); use a text editor effectively...

I have a ton of these, but I'd rather hear from you now. What do you do to become a better programmer?

Update: Since writing this article, a good buddy of mine wrote a rather inspirational story on standing up for what's right. If you so desire, you can find said story here.

15 November, 2008

Brevity In Code

Yes, I'm going to try to convince you to write less code. But, this should be an easy feat; I have Shakespeare also advocating my cause:

Brevity is the soul of wit.

Shakespeare's truism is readily apparent in good code: code that expresses complex concepts in succinct statements is beautiful and worthy of admiration. For several reasons, all other things being equal, smaller code is better code.

Fewer lines of code mean fewer bugs and lower maintenance cost. There are, however, other less apparent reasons for which you should try to write as little as possible:

You won't fall prey to the temptation of writing code that's not immediately necessary. In other words, you'll be YAGNI compliant. :)
It is better to be thought a fool than to write code and remove all doubt. Joking aside, however, the more you write, the more likely you are to make a mistake.
You won't get locked into poor decisions. I was watching Abrams & Cwalina speak at PDC today. One of their comments really struck me: they said, and I'm paraphrasing, that refactoring and correcting design mistakes in frameworks is harder when you have more code than what's absolutely necessary. Specifically, they regretted adding a public constructor to the System.Exception base class. That's it! Just one public constructor too many! And although they wish they could change it, they simply can't. If you make a poor decision, you'll have to maintain it.
You're less likely to repeat yourself. Or, to phrase this positively, you'll be DRY compliant.
Finally, by writing less code, you'll avoid the temptation to over engineer your solutions.

Learning to be concise in code is hard; it takes effort and patience. You'll have to refactor ruthlessly and mercilessly, but it will be worth the effort. Even though you won't have much code to show off, you'll be proud of it.

13 November, 2008

Character Encoding Utility

Have you ever heard of tracer ammunition? Well, this post is kind of a tracer post. In a minute you'll see why.

I recently wrote a small utility to change the character encoding of very large files. I'm thinking about writing a GUI for my utility and making it freely available. Yup, that's right: for free.

Except, before I go through the trouble of writing the GUI, I thought I'd find out if there's any interest in such a tool or not. If you're here, reading this, that's enough to tell me you're interested. Now, if you really need it right now, email me and I'll be happy to send you the command line tool.

And now you know why this is a tracer post. See? I did learn something from The Pragmatic Programmer! Or from The 4 Hour Workweek. Take your pick; they're both excellent books.

06 November, 2008

Benefits (And Costs) of Pair Programming

I really enjoy pair programming, and so I thought I'd write about some of the benefits I've seen from pair programming.

i.
Pair programming increases job satisfaction. Believe it or not, I crave the interaction with other geeks; I need the mental stimulus that comes from talking with my peers.

ii.
Pair programming increases code quality and decreases bug counts. I take pride in my craft; I like to write good code. Unfortunately, under pressure I cut corners all over the place (telling myself I'll refactor later). However, when I have someone watching over my shoulder, it's a lot harder for me to write hacky code.

But having someone watch what I'm doing isn't just about the guilt trip. I also appreciate having someone immediately available to discuss ideas and to steer me away from potential problems. This alone literally saves me hours in wasted effort.

iii.
Pair programming makes better programmers. I can't even begin to tell you how much I've learned from sitting next to someone while they code. Now, I must admit, probably 90% of what I've learned has little or nothing to do with programming. But it doesn't matter! Believe it or not, programming is a social activity; to put it simply, good code cannot be developed in isolation.

iv.
Pair programming makes programmers "faster". I honestly feel I'm 3 times more effective when I'm pair programming than when I'm sitting by myself in my cube. This is probably because when I'm stuck, or I have a question, I have someone immediately available to help. Also, there's a bit of added pressure to be faster since every hour at the keyboard is really 2 man hours at the keyboard.

v.
Now, what if I told you that all of these benefits come with a fairly low cost? Well, the good news is that they do: according to this study, the cost is only about a 15% increase in development time. Not bad, huh?

So, if you're still not pair programming, go talk to your boss and start practicing now. You won't regret it!

04 November, 2008

Easy Background Tasks in ASP.NET

Disclaimer: Admittedly, I got this idea from Jeff Atwood, but I think I've improved on it quite a bit.

At work I have an ASP.NET application that needs to check whether the Department of Treasury has published a new OFAC list (a list of people with whom we can't do business). If there's a new list, we need to parse it, store it, and make sure that none of our customers have popped up on the new list.

In phase 2, we plan to move this functionality into a windows service, but for now this is how I made this all happen in the background:

First, we start with the interface for our worker objects:


public interface IAsyncWorker
{
    //the name of the worker object
    string Name { get; }
    //return the next time the object should run
    DateTime AbsoluteExpirationTime { get; }
    //does the actual work the worker needs to do
    void DoWork();
}

Now, all we have to do is simply:

Add an implementation of our IAsyncWorker to the HttpRuntime.Cache.
When the cache item expires, call the worker's DoWork() method.
Add your worker item to the cache again so that it runs again.

If you follow the above steps you should end up with something like:


private static CacheItemRemovedCallback OnCacheRemoved = null;

protected void Application_Start(object sender, EventArgs e)
{
    AddAsyncTask(new BlacklistWorker()); //BlacklistWorker implements IAsyncWorker, of course
}

private void AddAsyncTask(IAsyncWorker worker)
{
    //the cache invokes this callback once the worker's entry expires
    OnCacheRemoved = new CacheItemRemovedCallback(CacheItemRemoved);
    HttpRuntime.Cache.Insert(worker.Name, worker, null,
        worker.AbsoluteExpirationTime, Cache.NoSlidingExpiration,
        CacheItemPriority.NotRemovable, OnCacheRemoved);
}

public void CacheItemRemoved(string workerName, object worker, CacheItemRemovedReason r)
{
    IAsyncWorker asyncWorker = worker as IAsyncWorker;
    if(null != asyncWorker)
    {
         asyncWorker.DoWork();
         AddAsyncTask(asyncWorker);
    }
}

I like this version better than Jeff's because it removes all the conditional statements the CacheItemRemoved() method would have had if we had not created an IAsyncWorker interface.
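For completeness, here's roughly what a worker could look like. This is not our real BlacklistWorker; DailyWorker, its 24-hour interval and the RunCount property are placeholders I made up, and the interface is repeated only so the sketch compiles on its own.

```csharp
using System;

public interface IAsyncWorker
{
    string Name { get; }
    DateTime AbsoluteExpirationTime { get; }
    void DoWork();
}

// Hypothetical worker that wants to run once a day.
public class DailyWorker : IAsyncWorker
{
    // Only here so the sketch has something observable to do.
    public int RunCount { get; private set; }

    // Used as the cache key, so it should be unique per worker.
    public string Name
    {
        get { return "DailyWorker"; }
    }

    // Evaluated each time the worker is (re)added to the cache,
    // so every run schedules the next one 24 hours out.
    public DateTime AbsoluteExpirationTime
    {
        get { return DateTime.UtcNow.AddHours(24); }
    }

    public void DoWork()
    {
        try
        {
            // The real work (download, parse, store, compare) would go here.
            RunCount++;
        }
        catch (Exception)
        {
            // Log and swallow: if DoWork() throws, CacheItemRemoved() never
            // reaches AddAsyncTask() and the worker silently stops running.
        }
    }
}
```

The try/catch is the one design choice worth pointing out: because the worker is re-added to the cache only after DoWork() returns, an exception escaping from it would end the cycle.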

This has been working great in our initial tests, but we still plan to move this to an external windows service at some point.

We're not worried about running out of threads (the thread does come out of the AppPool), since our task only needs to run every 24 hours. However, you might run into issues if you need your code to execute under a different identity than the threads in the AppPool.

This is a great technique: it gives you the ability to do async tasks with very little overhead.

01 November, 2008

Linq To Sql Debugger Visualizer

I usually try not to just post links to someone else's content, but the LINQ To SQL Debug Visualizer deserves an exception.

The LINQ debugger is great; it saves you the trouble of having to use something like LINQPad (another great product I wish I had written), and lets you see what your LINQ query will look like in SQL and what it will return.

If you're like me, you'll be shocked you hadn't heard about this VS plugin before.

Note: A big thanks to Alex Greenfield for showing me this tool.

29 October, 2008

Book Review: An Introduction To Bioinformatics Algorithms

Disclaimer: I'm nowhere near important enough for anyone to pay me to write this review. I have no conflicting interest in writing this review... unless you clicked on my amazon.com link and bought the book, I guess.

So we're using An Introduction to Bioinformatics Algorithms (by Jones & Pevzner) in (surprise, surprise!) my bioinformatics class. In 7 years of higher education, this is the first time I can say "this textbook rocks!". Ok, of course it doesn't "rock"; it's a freaking textbook - not Larry Mullen on the "40" solo. Anyhow, it's a good book, and I'd get it if I was you; and I'd also make sure you click on my amazon.com widget there on the right before you purchase it.

Anyhow, let me tell you two things about this book:

Chapter 3 is a brilliant molecular biology primer. I've taken at least 3 semesters of bio and chem classes, and this book explained all of what I learned in those classes brilliantly and concisely (granted, I never got much out of college). In fact, after reading this book I felt like I finally understood molecular biology and could explain it to my grandma (which is harder than you think seeing how she's been dead for like 5 years now).
One thing that annoys me about algorithm books is that they always have lame samples of how the algorithm could be used. Well, not this book my friend! This book begins every chapter with a real molecular biology problem and then applies classic college computer science algorithms to the problem. This is what really makes me learn and understand the algorithms.

Honestly, if you're considering buying an algorithms book, this is one worth looking at. It's not comprehensive in the number of algorithms it covers, but the ones it does cover, it covers well.

27 October, 2008

Why partial classes are WRONG! (And why you shouldn't use them.)

I'm absolutely positive the fine folks at Microsoft, particularly the brilliant Anders Hejlsberg (C#'s architect), know of a good reason for partial classes. But I've thought about this for a while, and I'm not convinced there's a good use case for partial classes.

To be intellectually honest I must admit I can actually think of a few scenarios where partial classes make sense; let me just lay those out before we continue:

You should use partial classes when some tool (such as Visual Studio), not a developer, writes something like an ASPX page.
You should sometimes use partial classes when you write code that will be auto generated (with CodeSmith or such tools). The sometimes I'm thinking of is DAL code.
You should use partial classes when... ummm, when you, ummm..

I guess that was it; there's only 1.5 good use cases I can think of. I know there must be other reasons why C# allows for partial classes, so if you know of one, don't hate me for being stupid and leave a comment instead.

The most prevalent argument for partial classes I've heard of is the one that says partial classes are great for organizing code. To this I respond: bull. Partial classes are not great for organizing code; if your class needs to be spread across multiple files just to manage its complexity, there's something fundamentally wrong with your class. Furthermore, when was the last time that having to look for something in multiple files made it easier to find? Finally, and this is my biggest complaint about partial classes, the compiler will never know if all the files present make up the entire definition of your class - I'm sure you can see the problems this could create for you.

Mike Murray (a friend and ex-coworker at TGN) and I had a discussion about this very topic, based on his post about partial methods. While we were talking about partial classes, he mentioned that at TGN he had seen fulfillment code that used partial classes. The developer that wrote the code thought it would be a good idea to separate the SKU specific code into partial classes. At first this may look like a good idea, but again I respond: bull. There's a much better way to implement SKU specific behavior for fulfillment code; it involves interfaces and polymorphism, two fundamental principles of OOD.
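To show what I mean by interfaces and polymorphism, here's a small sketch. I never saw the actual TGN code, so every name here (ISkuFulfiller, the two fulfillers, FulfillmentService, the SKU strings) is invented for the example.

```csharp
using System;
using System.Collections.Generic;

// One small interface instead of one giant class split across files.
public interface ISkuFulfiller
{
    string Fulfill(string orderId);
}

// Each kind of SKU gets its own class with its own behavior.
public class DownloadFulfiller : ISkuFulfiller
{
    public string Fulfill(string orderId)
    {
        return "email download link for " + orderId;
    }
}

public class BoxedFulfiller : ISkuFulfiller
{
    public string Fulfill(string orderId)
    {
        return "print shipping label for " + orderId;
    }
}

// The service only knows about the interface, never about specific SKUs.
public class FulfillmentService
{
    private readonly Dictionary<string, ISkuFulfiller> fulfillers =
        new Dictionary<string, ISkuFulfiller>();

    public void Register(string sku, ISkuFulfiller fulfiller)
    {
        fulfillers[sku] = fulfiller;
    }

    public string Fulfill(string sku, string orderId)
    {
        ISkuFulfiller fulfiller;
        if (!fulfillers.TryGetValue(sku, out fulfiller))
            throw new ArgumentException("No fulfiller registered for SKU " + sku, "sku");
        return fulfiller.Fulfill(orderId);
    }
}
```

Adding a new SKU now means adding a new class and registering it; nobody has to open (or hunt through the files of) one ever-growing class.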

And the list goes on and on. But for just about every argument in favor of partial classes, there's an OO principle that will do the same job in a much better way.

So, in summary, let me tell you why you shouldn't use partial classes: partial classes will only help you write crappy code. I don't know about you, but I'm already pretty good at writing crappy code; I need tools and frameworks that do the opposite.

Just as a footnote: I read an interview where Anders was asked about C#'s support for tooling. He mentioned that yes, he always keeps tool support in mind when designing C#. I'm guessing this is the true reason for partial classes, and why you should steer clear of them.

23 October, 2008

The Cost of Throwing & Catching Exceptions

Seeing how I recently wrote about how throwing exceptions is not a bad idea for, uhmm - you know, communicating errors (believe it or not this is a topic of discussion), I thought I'd write a short follow-up on the cost of throwing exceptions.

Those that argue against throwing exceptions as a means of communicating errors normally cite the "high" cost of catching exceptions: unwinding the stack looking for a handler, executing finally blocks, and finally returning control to the right place is allegedly a costly operation.

However, because of the way .NET exception handlers are stored in bytecode and because the code is optimized for the case in which an exception is not thrown (which makes sense considering that we shouldn't see TOO many exceptions):

The overall cost of a try...catch block that never handles an exception is a few bytes of memory - or at worst a few words - for the entry in the protected regions table. The only possible runtime penalty is the extra time to load those extra few bytes into memory. Since they are stored way away from the JITted bytecode stream, it's highly unlikely you're going to incur any additional cache-misses at runtime as a result of the handler too. Thus, the cost is essentially nothing.

However, the cost of not handling an exception is quite large:

The cost of not handling an exception that you should have may well be that your program crashes. This results in unhappy customers, a hit to your reputation and development time to go and do a bug-fix, which will almost certainly be much greater than if you had put it in there in the first place. Obviously, protecting code that can not throw an exception under any circumstances is a waste of your development time. But otherwise, it's best to be safe rather than sorry, safe in the knowledge that even if an exception never does occur in that bit of code, it's not really costing anything anyway.

(Both quotes from The Official Programmer's Heaven Blog on SEH performance)

One Important Consideration

Although the performance cost of structured exception handling is almost always minimal, there is one important consideration that you need to make when writing code inside try blocks: the compiler cannot optimize code inside try blocks.

Take Peter Ritchie's classical example; the following code


  int count = 1;
  SomeMethod();
  count++;
  SomeOtherMethod();
  count++;
  Console.WriteLine(count);

will get optimized to something that will effectively look like


  SomeMethod();
  SomeOtherMethod();
  Console.WriteLine(3);

The same code as in the first snippet, inside a try block, however, will not get optimized. The compiler will actually write code that will increment the "count" variable 2 times.

So, when writing exception handling code, try to be brief inside your try blocks. Other than that, you don't have too much to worry about.
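As a small illustration of "be brief inside your try blocks", here's a sketch where only the call that can throw is protected and everything else stays outside. BriefTry and ParsePort are made-up names, and in real code int.TryParse would be the more natural choice; the point here is only the shape of the try block.

```csharp
using System;

public static class BriefTry
{
    // Hypothetical helper: parses a port number, falling back to a default.
    public static int ParsePort(string text, int fallback)
    {
        int port;
        try
        {
            // Only the call that can actually throw lives in here.
            port = int.Parse(text);
        }
        catch (FormatException)
        {
            return fallback;
        }

        // Plain checks and arithmetic stay outside the protected region.
        if (port < 1 || port > 65535)
            return fallback;
        return port;
    }
}
```

Note that this only handles badly formatted text; a null or an out-of-range number would still throw out of int.Parse, which is the kind of decision you'd want to make deliberately anyway.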

21 October, 2008

StackOverflow's Dirty Little Secret

Seeing how you probably got here from my link on SO, I won't bore you with the details about what SO is and how awesome it is. For everyone else, however, let me just mention that StackOverflow is a community where developers go to ask and answer questions - that's it, but it's AWESUM!

Here's a list of ideas on how to maximize your StackOverflow experience. In it, you'll find SO's dirty little secret; take a look:

Do:

Put your foot in your mouth (often). Every time I answer incorrectly at SO the community lets me know it promptly and pointedly. It's kind of embarrassing to be wrong online, so you really learn from your mistakes. Not only does the shock factor help you remember the correct answer but, as an added benefit, you will no longer carry the baggage of incorrect knowledge in your head.

I strongly recommend goofing up (and correcting yourself); for a good sample of how I've goofed up, see this question. I can guarantee you, I'll never get that one wrong again.

Ask subjective questions. Even though by definition subjective questions do not have one correct answer, there's still value in asking these kinds of questions. I'd say about 50% of the SO population is smarter than I am (yes, I realize why about 50% of the population is smarter than me), so there's a lot of value in seeing what other developers think.

If you care to see one of the subjective questions I've asked, take a lookie here.

Don't:

Think that people know what they're talking about just because they have a high reputation score. And this is SO's dirty little secret: you can get a high reputation (which allegedly represents expertise) by just gaming the system. I'm not saying that the system is broken - there's a lot of people out there that deserve the high rep; there are also many that don't. The other side of this coin is also true: there's a lot of sharp people with low reputation scores (probably because they have better things to do than try to get a high SO rep score).

So, again, don't judge people just because of their rep score.

Trust answers just because they have a lot of up-votes. Even though there's a lot of smart people at SO, there's also just a lot of people, and sometimes the herd mentality takes over and answers get up-voted for no other reason than the fact that others have up-voted them.

If you're a developer and you haven't joined StackOverflow yet, I strongly recommend that you head over there and start asking and answering questions.

16 October, 2008

How To Write Self-Documenting Code

If I got promoted every time I spotted code like the one that follows, I'd probably be the President of the United States by now. Take a look at Joe's code here:


public static void main(String[] args) {
        String db = args[0];
        String uid = args[1];
        String pwd = args[2];
        
        String url = "jdbc:postgresql://localhost/" + db;
        Connection conn;
        try
        {
            conn = DriverManager.getConnection(url, uid, pwd);
            String query =  "SELECT A.city, A.zip, st_distance(A.longlat_point_meter, " + "B.longlat_point_meter) FROM utzipcode A, utzipcode B WHERE B.city='MENDON' " + "AND A.city<>'MENDON' AND st_distance(A.longlat_point_meter, B.longlat_point_meter) " + "< 20000 ORDER BY 3;";
            
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(query);

            ResultSetMetaData resultMetaData = rs.getMetaData();
            System.out.println(resultMetaData.getColumnName(1) + ", " + resultMetaData.getColumnName(2) + ", " +
                    resultMetaData.getColumnName(3));
            while(rs.next()){
                System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getString(3));
            }
            rs.close();
        }
        catch(SQLException sqle){
            System.out.println(sqle.getMessage());
        }   
    }

Can you tell what's going on in the code above? You probably can, with some effort - but why should code so simple be hard to understand? How come the code doesn't tell me what it is doing? Plus, when I'm maintaining this beauty 10 years from now, I'll be hunting Joe down and pulling his fingernails every time I have to debug it - seriously, this code sucks.

Even though just about every programmer I've worked with knows better than to write code that's hard to read, under pressure we all get sloppy. In fact, I was just looking at some code I just wrote a couple of weeks ago as inspiration for this post.

So, now that I've gotten that off my chest, let's see if we can clean this up some. The n00b fix would be to pepper in some comments like this:


public static void main(String[] args) {
        String db = args[0];
        String uid = args[1];
        String pwd = args[2];
        
        String url = "jdbc:postgresql://localhost/" + db;
        Connection conn;
        try
        {
            //get the connection to the db
            conn = DriverManager.getConnection(url, uid, pwd);
            String query =  "SELECT A.city, A.zip, st_distance(A.longlat_point_meter, " + "B.longlat_point_meter) FROM utzipcode A, utzipcode B WHERE B.city='MENDON' " + "AND A.city<>'MENDON' AND st_distance(A.longlat_point_meter, B.longlat_point_meter) " + "< 20000 ORDER BY 3;";
            
            Statement stmt = conn.createStatement();
            //execute the query defined above
            ResultSet rs = stmt.executeQuery(query);

            ResultSetMetaData resultMetaData = rs.getMetaData();
            
            //print column names
            System.out.println(resultMetaData.getColumnName(1) + ", " + resultMetaData.getColumnName(2) + ", " +
                    resultMetaData.getColumnName(3));
            
            //print every tuple
            while(rs.next()){
                System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getString(3));
            }
            rs.close();
        }
        catch(SQLException sqle){
            System.out.println(sqle.getMessage());
        }   
    }

That still looks terrible, however. Not only that, the comments aren't really that useful; they don't tell me anything I couldn't have figured out by reading the code. In fact, I think these types of comments make the code harder to read since they interrupt what's really important - the code. But that's kind of a side note: the real problem here is that the comments don't give me any insight or explain the intent of the code.

Alright then... What do we do now? Well, how about something like this:


public static void main(String[] args) {
        String db = args[0];
        String uid = args[1];
        String pwd = args[2];
        
        String url = "jdbc:postgresql://localhost/" + db;
        Connection conn;
        try
        {
            conn = DriverManager.getConnection(url, uid, pwd);
            
            //we're hard coding the query here because the db is out of SPs - joe
            String query =  "SELECT A.city, A.zip, st_distance(A.longlat_point_meter, " + 
                            "B.longlat_point_meter) FROM utzipcode A, utzipcode B WHERE B.city='MENDON' " + 
                            "AND A.city<>'MENDON' AND st_distance(A.longlat_point_meter, B.longlat_point_meter) " + 
                            "< 20000 ORDER BY 3;";
            
            ResultSet rs = executeQuery(query, conn);
            printQueryResults(rs);
            rs.close();
            conn.close();
        }
        catch(SQLException sqle){
            System.out.println(sqle.getMessage());
        }
        
    }
    
    //the helpers let SQLException propagate so main reports every failure in one place
    private static ResultSet executeQuery(String query, Connection conn) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery(query);
    }
    
    private static void printQueryResults(ResultSet rs) throws SQLException {
        ResultSetMetaData resultMetaData = rs.getMetaData();
        System.out.println(resultMetaData.getColumnName(1) + ", " + resultMetaData.getColumnName(2) + ", " +
                resultMetaData.getColumnName(3));
        while(rs.next()){
            System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getString(3));
        }
    }

Ahhh... much better. I can tell what the code is doing in one brief overview - and most importantly, the code itself is telling me what it's doing!

Also, the code is now easier to debug and maintain because each method is only a few lines long; this means I'll know exactly where the code may be having problems next time it breaks by just looking at the stack trace. Finally, look at that comment about the query - that really tells me something I could have never known unless Joe had told me.

So there you go, that's self-documenting code in 3 easy steps. Please let Joe know how much you appreciate learning from his mistake by dropping a comment. It'll make him feel better.

11 October, 2008

when to throw exceptions

have you ever heard the saying "exceptions are for exceptional circumstances"? well, i have, and until recently, i was a firm believer in it. i'd normally code for every "exceptional" condition and try to do everything i could to avoid having to throw an exception.

however, just a few days ago, i came across this statement from Krzysztof Cwalina (program manager for the CLR team at MS):

One of the biggest misconceptions about exceptions is that they are for “exceptional conditions.” The reality is that they are for communicating error conditions. From a framework design perspective, there is no such thing as an “exceptional condition”. Whether a condition is exceptional or not depends on the context of usage, --- but reusable libraries rarely know how they will be used. For example, OutOfMemoryException might be exceptional for a simple data entry application; it’s not so exceptional for applications doing their own memory management (e.g. SQL server). In other words, one man’s exceptional condition is another man’s chronic condition.

Cwalina then goes on to say that exceptions should be used for common errors such as (1) usage errors, (2) program errors, and finally (3) system errors. it seems to me this is in direct opposition to the "exceptional exceptions" mantra.

to be honest, i never understood why exceptions should only be used in "exceptional" circumstances. it's hard to define "exceptional", and that's exactly the point Cwalina makes in his quote.

i realize, however, there are circumstances when it doesn't make sense to throw exceptions; for example, why throw a DivideByZeroException when you can easily check for the condition and appropriately terminate?

but for the most part, exceptions provide an object-oriented way to communicate errors to clients. i think i'm switching from the "exceptional exceptions" camp to the "let's use exceptions to communicate errors" camp. what about you?
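
here's a minimal java sketch of the two cases above. `AccountStore` and its methods are hypothetical names i made up for illustration, not part of any real library: the reusable class throws to communicate usage errors it can't sensibly handle itself, while the division helper simply checks for zero because the condition is cheap to test locally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// hypothetical reusable class; all names here are illustrative assumptions
class AccountStore {
    private final Map<String, Integer> balances = new HashMap<>();

    // usage errors are communicated to the caller with exceptions
    void open(String id, int cents) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("id must not be empty");
        }
        if (cents < 0) {
            throw new IllegalArgumentException("opening balance must not be negative");
        }
        balances.put(id, cents);
    }

    // the library can't know whether a missing account is "exceptional"
    // for its caller, so it just reports the error
    int balanceOf(String id) {
        Integer balance = balances.get(id);
        if (balance == null) {
            throw new NoSuchElementException("no such account: " + id);
        }
        return balance;
    }

    // a cheap local check instead of letting the division throw
    static int averagePerOwner(int totalCents, int owners) {
        return owners == 0 ? 0 : totalCents / owners;
    }
}
```

the caller decides what each error means in its own context: a data entry screen might show a message, a batch job might skip the record.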

Update: I have a new article on the cost of throwing exceptions. If you now think that throwing more exceptions is a good idea, you might want to check this article too.

07 September, 2008

everything is an object

over at stackoverflow someone asked "how do i think in OO?"

saint_groceon answered:

Some conceptual advice:

"has a": A dog has a tail; therefore the Dog class should have a member "tail"

"is a": A poodle is a dog; therefore Poodle should either be an instance or derived class of Dog

Thinking this way really sped up my ability to design object oriented structures. Otherwise, starting out it's easy to get twisted up and start adding members to classes that should actually just be instances, or vice versa.

to which i responded:

saint_groceon's beginner advice, although correct, can lead to trouble:

Objects are not just a collection of attributes and behaviors. If that was the case, dogs and cats would be indistinguishable from each other, as they both have eyes, mouths and legs, and they both eat, sleep and play. In fact, this kind of "object" is almost no different than a C struct.

Furthermore, the type of thinking described in saint_groceon's post also leads to other problems: If your "Duck" class has a "quack" method, what happens when you need to implement a rubber duck that does not quack? (you may recognize this example from Head First Design Patterns)

I agree with the fact that objects are usually nouns from the domain's language. However an object is more than that: an object is anything capable of providing a limited set of useful services. And then, with these objects "we decompose the complex world around, and assemble those objects in various ways so that they can perform useful tasks on our behalf" (David West).

West nails the definition of an object; the most important characteristic of an object is its ability to be composed with other objects to create useful things, much like all of our body parts (properly assembled) make us useful creatures. So, in some respect, everything is an object!

Edit: I realize that I said little about How To Think OO, and a lot more about what OO is not. I apologize for this, but I'm no OO master, and I'm barely at the stage where I have a feeling what OO IS NOT... I'm not at the point where I can instruct on OO.

This post was mostly an exercise to help me gather my thoughts about what I've learned so far. I hope you'll find it useful nonetheless.
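
to make the rubber duck problem concrete, here's a minimal java sketch of the composition idea. the class and interface names are my own illustrative assumptions (loosely in the spirit of the Head First example, not copied from it): quacking becomes a small service object that a duck is assembled with, instead of a method every duck is forced to inherit.

```java
// the "quack" service lives in its own object
interface QuackBehavior {
    String quack();
}

class LoudQuack implements QuackBehavior {
    public String quack() {
        return "Quack!";
    }
}

// a rubber duck gets composed with this instead of overriding anything
class Silent implements QuackBehavior {
    public String quack() {
        return "";
    }
}

class Duck {
    private final String name;
    private final QuackBehavior quackBehavior;

    // a duck is assembled from the services it needs
    Duck(String name, QuackBehavior quackBehavior) {
        this.name = name;
        this.quackBehavior = quackBehavior;
    }

    String makeNoise() {
        return name + ": " + quackBehavior.quack();
    }
}
```

a mallard is then `new Duck("mallard", new LoudQuack())` and a rubber duck is `new Duck("rubber duck", new Silent())`; no subclass has to pretend it can quack.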

29 August, 2008

delegation in a load balanced environment

last week i tried to get impersonation and delegation working for a couple of wcf services and a website. unfortunately, getting this to work took me a lot longer than it should have, so i thought i'd share some of the things i learned.

in iis 6.0 and earlier, you can only get delegation to work in a load balanced environment if you do the following:

make your app run under a domain account

set the proper SPNs (service principal names) on the account:

HTTP/appserver domainName\accountName
HTTP/appserver.fullyqualified.name domainName\accountName

trust the account (and the machine) for delegation in AD
run aspnet_regiis with the -ga flag and the domain qualified user name
finally add your domain account to IIS_WPG

i'm sure most system engineers know this already, but as a developer, it's the first time i've come across this.

it's also worth mentioning that under iis 7.0 you do not need to do any of this: you can run your apps under NetworkService, and still have them load balanced.

what's funny is that after we finally figured out how to get delegation to work for our load balanced applications, we decided that it was way too much setup and we would move to iis 7.0.
in a way, a lot of the time i put into this was wasted, but i learned a ton about security and delegation/impersonation while doing the research to get this to work. i hope it'll save you some time. if it does, or if you have any other questions, please let me know.

17 July, 2008

the stateless web: it will always bite you!

i spent a good part of the day building a little control to upload documents for our crm tool. because i wanted to give the customer the ability to decide how many files she wants to upload at runtime, i put a little "add another file" button on the control, which would dynamically let the user add another file.

as soon as i had this dynamic functionality built, i threw it on the control where it was supposed to go (yes, my control was nested inside another control, which was then placed on the page where the upload functionality was needed). unbeknown to me, the person who had written this page was also dynamically loading his control; unfortunately, he was only loading the control on the pageload, and not on the postback.

for about an hour i kept trying to figure out why the button on my control wasn't firing its events. finally, in frustration, i decided to step through all of the code: as soon as i saw the pageload i realized what was going on: my control wasn't firing events on the postback because it wasn't even there when the page submitted!

most developers know that web-apps are stateless, but i think we fail to internalize what this really means. we assume that once we declare a variable, it will remain there as long as it's in scope. i'm not trying to blame my co-worker; in fact, i've made this same mistake several times (which is probably why i recognized it almost immediately). i think the problem is we first learn to program in stateful environments, and then make the switch to the web without fully understanding the implications of moving to this new environment.

anyhow, i'm somewhat upset over the fact that i didn't get to finish the control today. i'm planning on getting up early tomorrow, coming in before anyone can bug me and getting the control done - that is until i run into the next nuisance from the stateless web, i guess.

13 July, 2008

google analytics statistics

i was just reviewing some stats from google analytics, and just want to summarize some things i found interesting.

i'm getting about 30 unique visitors every month; i have to admit, however, that i'm probably 3 of those. most of my traffic is organic and comes through google search (surprise, surprise!).

my most visited blog entry is the one i had on particle swarm optimization in c#. my guess is that a whole bunch of lazy college students are looking for code so that they won't have to think in order to do their assignment. the traffic that this post gets is so much greater than anything else i've written, that i'm thinking about catering to lazy C.S. college students ;)

the other blog entries that get quite a bit of traffic are the ones on biztalk tips; i can see why this is since there aren't actually that many people that blog about biztalk.

from looking at the reports, however, i'm starting to learn the importance of writing so that you'll come up in search queries. i have an entry, for example, titled "do i really need an orchestration?"; a much better title would have been "when to use an orchestration", or something like that.

i'm excited about the little traffic i'm getting: there's some dude in sweden that spent 15+ minutes on my site - i didn't even know that i had 15+ minutes worth of material on my blog! even though it's hard to find time to blog, i'll increase my efforts to do so. like jeff atwood, i've found that it truly is almost as much fun to blog about coding as actually writing code is.

02 June, 2008

usu's ezportal security assessment

about 3 weeks ago i finished a security assessment for a new application the university of utah is planning on rolling out so that students, teachers, and staff can manage all of their usu data.

our analysis included looking at the overall application architecture, looking at the coding, and a test to try to exploit vulnerabilities.

i thought it was interesting that every major hole in the application came from the developers trusting the libraries and subcomponents they were using. for example, the developers were using an open source rich text editor (so that users could upload nice looking content without having to know html) that could easily be exploited to upload malicious code, or render content from any other site (yes, as in an xss attack).

so, the mantra of "find the dependencies -- and eliminate them", turns out to be true for security problems too.

06 May, 2008

particle swarm optimization (pso) in c#

i was going to give a presentation at "code camp" on particle swarm optimization (pso), but unfortunately was not able to do so because of circumstances beyond my control.

so, i've decided to post my c# implementation here so you can take a look at it and play with it. the code graphs the movement of the particle swarm, so it's cool to see how the different pso parameters affect the movement of the swarm.

the swarm is not divided into neighborhoods and each member only knows about their local best and the swarm's best value to date. if you want to change the code so that it does neighborhoods, be my guest.
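
for readers who just want the shape of the algorithm, here's a minimal java sketch of that same idea (one swarm, no neighborhoods, each particle pulled toward its own best and the swarm's best). this is not my c# code: the class name, the inertia and acceleration constants, the starting range of [-10, 10], and the sphere test function are all assumptions chosen for illustration.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

class SwarmSketch {
    // simple test function (an assumption): minimum of 0 at the origin
    static double sphere(double[] p) {
        double sum = 0;
        for (double x : p) sum += x * x;
        return sum;
    }

    // returns the best position the swarm found for function f
    static double[] minimize(ToDoubleFunction<double[]> f, int dims,
                             int particles, int iterations, long seed) {
        // commonly quoted parameter values; assumed, not taken from the post
        final double inertia = 0.729, cognitive = 1.49445, social = 1.49445;
        Random rng = new Random(seed);
        double[][] pos = new double[particles][dims];
        double[][] vel = new double[particles][dims]; // velocities start at zero
        double[][] bestPos = new double[particles][];
        double[] bestVal = new double[particles];
        double[] swarmBest = null;
        double swarmBestVal = Double.POSITIVE_INFINITY;

        // scatter the particles and record the initial bests
        for (int i = 0; i < particles; i++) {
            for (int d = 0; d < dims; d++) {
                pos[i][d] = -10 + 20 * rng.nextDouble();
            }
            bestPos[i] = pos[i].clone();
            bestVal[i] = f.applyAsDouble(pos[i]);
            if (bestVal[i] < swarmBestVal) {
                swarmBestVal = bestVal[i];
                swarmBest = pos[i].clone();
            }
        }

        for (int t = 0; t < iterations; t++) {
            for (int i = 0; i < particles; i++) {
                for (int d = 0; d < dims; d++) {
                    // keep some momentum, then pull toward the particle's
                    // own best and toward the swarm's best
                    vel[i][d] = inertia * vel[i][d]
                            + cognitive * rng.nextDouble() * (bestPos[i][d] - pos[i][d])
                            + social * rng.nextDouble() * (swarmBest[d] - pos[i][d]);
                    pos[i][d] += vel[i][d];
                }
                double val = f.applyAsDouble(pos[i]);
                if (val < bestVal[i]) {
                    bestVal[i] = val;
                    bestPos[i] = pos[i].clone();
                }
                if (val < swarmBestVal) {
                    swarmBestVal = val;
                    swarmBest = pos[i].clone();
                }
            }
        }
        return swarmBest;
    }
}
```

a call like `SwarmSketch.minimize(SwarmSketch::sphere, 2, 20, 300, 42L)` should land very close to the origin; changing the three constants is the quickest way to see how they affect the swarm.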

the code is really rough, especially in the way the UI runs. also, there's no way to dynamically change anything; if you want to make any sort of change you have to find the pertinent code, recompile it and rerun it (i.e. there are no config files, no input parameters, nothing).

so, if you think my code sucks, please let me know!

09 April, 2008

biztalk database problems

when we started implementing biztalk at tgn there was hardly any documentation on it, and so we had to resolve all of our problems through trial and error, or through help from blogs.

we had two database problems early on that made us wish someone would have warned us about potential problems:

by default, the biztalk messagebox is set to autogrow by 1mb. this is a problem, because if you have to autogrow your db, it's likely that you'll need to grow it by more than 1mb and so the 1mb is consumed promptly and the db has to autogrow once again. we had problems with this twice: the first time we just saw the cpu usage spike every time we autogrew; the second time the effects were worse as we were running out of space on the disk and the server became stuck autogrowing and rolling back once it realized there wasn't enough space. so, if you're gonna autogrow your BizTalkMessageBoxDb database, autogrow it in big enough chunks to avoid performance problems.
the PurgeSubscriptionJob_BizTalkMsgBoxDb and TrackedMessages_Copy_BizTalkMsgBoxDB jobs need to run every minute to keep biztalk running in good shape, but unfortunately they are not set up to run by default. we only found out about this when our applications started running absurdly slow and all of our clients complained about time outs.

perhaps if we had had a dedicated dba, and admin for our system, we would have noticed these problems before they caused issues. hopefully, you won't have to suffer through these yourself.

04 April, 2008

refactoring code with too many conditional statements

i was once talking with the IS director of a certain company about the telltale signs of bad code. i mentioned a few of the items that raise flags for me: code duplication, low cohesion in classes, etc.

the director, who happens to be a really good coder even though he's been in management for several years, mentioned that when he sees code with too many if statements (or any conditional branching for that matter), he knows the code could be cleaned up. after hearing his statement, i started considering how to clean code that has too many if statements.

last week i had the opportunity to help a coworker refactor some code that looked something like this:


public class CreditCardService
{
  public void DoWork()
  {
       if(creditCardCode == "VISA")
       {
            //about 10 or so lines of code here
       }
       else if (creditCardCode == "MC")
       {
            //about 10 or so lines of code here
            //the code is very similar in all branches
       }
       //a few more branches for the other credit cards we support
       //...
  }
}

the obvious problem with the code above is that it's not making good use of a basic object-oriented principle: polymorphism.

we could clean up the above code by:

writing a base class that implements the common functionality across all credit cards
writing child credit card classes that inherit from the base class and implement what's different in each credit card
writing a credit card factory that returns the right implementation to the CreditCardService class

so, we should end up with code looking something like this:


using System;

public class BaseCreditCard
{
     //all common fields go here

     //virtual so that child classes can extend the common behavior
     public virtual void DoWork()
     {
          //all common functionality goes here
     }
}

public class VisaCreditCard : BaseCreditCard
{
     //all fields pertaining to VISA go here
     
    public override void DoWork()
    {
         base.DoWork();
         //visa specific functionality goes here
    }
}

public class CreditCardFactory
{
     public static BaseCreditCard GetCard(string cardType)
     {
          //return appropriate credit card child class
          if(cardType.Equals("VISA"))
               return new VisaCreditCard();
          //more code like the one above

          //fail loudly on card types we don't support
          throw new ArgumentException("unsupported card type: " + cardType);
     }
}
public class CreditCardService
{
     private BaseCreditCard card;
     public CreditCardService(string cardType)
     {
          card = CreditCardFactory.GetCard(cardType);
     }

     public void DoWork()
     {
          card.DoWork();
     }    
}

ideally, our factory will be really smart about picking the right credit card type, so that all of the if statements necessary to pick the right credit card will all be contained in it.

using polymorphism in this case makes the code much easier to read, and way easier to maintain as we now have all decision logic in one place, all common logic in another place, and type specific logic in its own place.
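
one way to keep the factory "smart" without it growing its own chain of if statements is a lookup table from card code to constructor. here's a minimal java sketch of that idea; the class names mirror the c# ones above but are hypothetical, `MasterCardCreditCard` is made up for illustration, and `doWork` returns a string only so the result is easy to inspect.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

abstract class BaseCreditCard {
    // common functionality shared by every card type
    String doWork() {
        return "common work";
    }
}

class VisaCreditCard extends BaseCreditCard {
    @Override
    String doWork() {
        return super.doWork() + ", visa work";
    }
}

class MasterCardCreditCard extends BaseCreditCard {
    @Override
    String doWork() {
        return super.doWork() + ", mastercard work";
    }
}

final class CreditCardFactory {
    // all decision logic lives in this one table
    private static final Map<String, Supplier<BaseCreditCard>> CARDS = new HashMap<>();
    static {
        CARDS.put("VISA", VisaCreditCard::new);
        CARDS.put("MC", MasterCardCreditCard::new);
    }

    static BaseCreditCard getCard(String cardType) {
        Supplier<BaseCreditCard> maker = CARDS.get(cardType);
        if (maker == null) {
            throw new IllegalArgumentException("unsupported card type: " + cardType);
        }
        return maker.get();
    }
}
```

supporting a new card then means writing one child class and adding one line to the table; the service class never changes.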

28 March, 2008

simple dependency injection

the idea behind dependency injection has been around for a while, and i believe the term "poor man's dependency injection" is popular as well, but i thought i'd share the poor man's method anyhow.

it's common to find that n-tiered applications (especially applications with a persistence layer and a "business logic layer") although layered, are usually tightly coupled.

one nuisance that this coupling creates is the inability to write unit tests that only test the business layer. even though you can write tests for the business layer, these cannot run without calling the persistence layer, and thus you end up with slow tests (because of db calls) and redundant tests (assuming you have tests for your persistence layer).

a very simple method (the poor man's method, of course) to solve this coupling problem is to:

code to an interface (in the case of the service layer, make sure you're not calling a specific implementation of your persistence layer, but an interface).
overload the constructor(s) for your business logic class so that they also take a specific implementation of the interfaces it depends on.

if you do what i've described above, your business layer should end up looking something like this:



public interface IPersistenceLayer
{
    void DoWork();
}

public class BusinessLogicLayer
{
    IPersistenceLayer myDataStore;
    public BusinessLogicLayer(IPersistenceLayer someImplementation)
    {
         myDataStore = someImplementation;
    }
   
    public void DoWork()
    {
         //business logic here
         myDataStore.DoWork();
    }
}

with the above code you can easily write tests for your business layer that just take a dummy IPersistenceLayer and return mock objects. and so now we have fast tests that don't require any database setup and/or maintenance.

there is, however, one obvious problem with the code presented: why should the clients of the business layer have to know about the layer's dependency? the answer is they shouldn't, and that's why we keep all the default constructors, and just have those set the IPersistenceLayer reference to the commonly used implementation of the interface.
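
here's a minimal java sketch of both constructors together with a hand-written fake for tests. every name in it (`PersistenceLayer`, `SqlPersistenceLayer`, `FakePersistenceLayer`, `register`) is a hypothetical stand-in, and the "real" implementation just throws, since this sketch has no database to talk to.

```java
import java.util.ArrayList;
import java.util.List;

interface PersistenceLayer {
    void save(String record);
}

// stand-in for the commonly used implementation
class SqlPersistenceLayer implements PersistenceLayer {
    public void save(String record) {
        throw new UnsupportedOperationException("would talk to the database");
    }
}

class BusinessLogicLayer {
    private final PersistenceLayer dataStore;

    // clients use this constructor and never see the dependency
    BusinessLogicLayer() {
        this(new SqlPersistenceLayer());
    }

    // tests use this constructor to hand in a fake
    BusinessLogicLayer(PersistenceLayer someImplementation) {
        dataStore = someImplementation;
    }

    void register(String name) {
        // the business rule under test: names are trimmed and upper-cased
        dataStore.save(name.trim().toUpperCase());
    }
}

// remembers what was saved instead of touching a database
class FakePersistenceLayer implements PersistenceLayer {
    final List<String> saved = new ArrayList<>();

    public void save(String record) {
        saved.add(record);
    }
}
```

a test builds `new BusinessLogicLayer(fake)`, calls `register(" ada ")`, and checks that `fake.saved` holds "ADA", all without a database in sight.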

although almost trivial, this method of dependency injection is appropriate for simple scenarios and provides much flexibility.

24 January, 2008

linq and lambda expressions

i'm aware that i'm about a year (or more, probably) behind on this, but the new c# lambda and linq features are awesome!

i wrote my first dlinq query today, and i'm now of the opinion that linq will literally change the way we code. i'm aware that i don't recognize all of the repercussions that come from linq, but just the very basic queries i did today literally changed some of the more fundamental programming paradigms i held.

as for lambda expressions, i must say that even though the concept is not new (in fact, anonymous delegates and types are almost as old as programming), there's something to be said for how elegant lambda expressions are: what would be ugly (or sometimes even impossible) constructs in c# are extremely simple and easy to read statements thanks to lambda expressions.

i'll keep posting on linq and lambda expressions as i learn more. but if you haven't had a chance to learn about them, i strongly recommend that you do.