Archive for March 2007

Walking season … and SPAM

2007/03/31

Today I started the 2007 walking season. I did some gentle preparation during the week – a few short, 4-5 km strolls around the neighbourhood – but today was when I really started. It was a beautiful day in Ottawa – sunny, temperature around 8-10 °C – so I took off and did a 12 km loop through Westboro, down south and around Dow’s Lake up to downtown. Just fantastic. My companion on the road was Security Now! – I was a few episodes behind, but I managed to listen to almost 3 full episodes.

An interesting one was about spambots – fleets of remotely controlled zombies that are used to send out spam. Conservative estimates are that of around 600 million PCs, about 150 million are infected zombies – without their owners’ knowledge or consent, of course.

Steve was speaking about how to detect from the email headers that an email was spoofed. Basically, what you need to find is where the chain of Received headers, which contain the IP address of the sender, is broken – that determines the point where the spammer connected to some SMTP server and sent out the message; all headers beneath that point can be spoofed. I know this is not the best explanation, but it is pointless to rephrase what Steve explained very nicely – listen here or read the notes.

So while walking and listening to that, I got an idea – with all the social websites and Web 2.0 communities, there may be a realistic way to cut down the spam wave that is everywhere around us (it is estimated that over 80% of all email is spam).

The key ingredients of the solution are:
1) – the owners of the zombie machines, who do not know about the “service” their PCs are providing. It is not easy to identify these machines, and the owners may not know what to do
2) – the ISPs of these zombie users, who suffer the effects of the spam (and should be motivated to fix it), because it is their bandwidth that is used and their IP ranges that get blacklisted
3) – everybody who hates spam (all of us, minus the spammers), who would happily cooperate and would not mind doing something – as long as participation is easy …

What I was thinking about is a Web site / Web service – somewhere you can forward the spam that ends up in your Junk folder or bounces back to your address. The service would analyze the headers and extract the IPs of the zombies – and keep building and maintaining the list. Extraction is not that hard and is doable with a nice Perl/Python/Ruby script :-). After a while, it would lead to a list of IPs, each with an activity record attached (which would allow an inactive IP to drop off the list) …
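To give an idea of how small the extraction step could be, here is a minimal Python sketch. It assumes that the IP addresses appear as bracketed IPv4 literals in the Received headers and that only the topmost hop (the one written by your own mail server) can be trusted. The function name and the sample message are made up for illustration, and a real service would also have to walk the chain and skip known relays.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 literal such as [203.0.113.7] in a Received header.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_message):
    """Return the IPs found in Received headers, newest hop first."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IP_IN_BRACKETS.findall(header))
    return ips

# Hypothetical spam message; addresses come from documentation ranges.
SAMPLE = """\
Received: from zombie.example.net (unknown [203.0.113.7])
\tby mx.myisp.example with SMTP; Sat, 31 Mar 2007 10:00:00 -0400
Received: from mail.bank.example ([198.51.100.1])
\tby zombie.example.net; Sat, 31 Mar 2007 09:59:00 -0400
From: someone@bank.example
Subject: hello

body
"""

ips = received_ips(SAMPLE)
# The first entry was recorded by our own server, so it is the zombie candidate;
# everything below it could have been forged by the sender.
candidate = ips[0] if ips else None
print(candidate)
```

The second Received header in the sample is exactly the kind of line a spammer can invent, which is why the sketch reports only the first address as a candidate.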

Now imagine that the ISPs could register themselves and enter their IP ranges. They would get back the subset of the zombie list residing in their own address space – and deal with it – for example notify the users, ask them to download some malware removal program or even sell them some additional service. It clearly must be the ISP that deals with the zombie owners, because ISPs are the only ones who have access to their identity, and it is in their interest to limit the amount of bad things originating from their network. It is not only about spam – an infected machine that sends spam can just as easily and likely be part of a DDoS attack, which is quite a different legal category of problem. Either way, the end result would be fewer active zombies around.
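The lookup an ISP would need is simple as well. Here is a rough sketch using Python's standard ipaddress module. The function name, the zombie list and the CIDR range are hypothetical; a real service would keep the list in a database rather than in memory.

```python
import ipaddress

def zombies_in_ranges(zombie_ips, isp_ranges):
    """Return the zombie IPs that fall inside any of the ISP's CIDR ranges."""
    networks = [ipaddress.ip_network(r) for r in isp_ranges]
    return [
        ip for ip in zombie_ips
        if any(ipaddress.ip_address(ip) in net for net in networks)
    ]

# Hypothetical data: the shared zombie list and one ISP's registered range.
zombie_list = ["203.0.113.7", "198.51.100.1", "203.0.113.200"]
isp_ranges = ["203.0.113.0/24"]
print(zombies_in_ranges(zombie_list, isp_ranges))
```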

If the really big email services such as GMail and Yahoo – or the big cable/DSL providers – would participate and supply their own filtered spam (or even a filtered list of zombie candidates), the database would IMHO start to provide valuable data very soon.

What do you think ?

How to access GMail with multiple POP clients

2007/03/30

Man lernt nie aus – as the Germans say – one never stops learning. Thanks to a comment on my blog, I found out that there is a very nice solution to the problem of disappearing emails (after you download a message with one POP client, the other POP clients do not get it). It is even documented in GMail help:

http://mail.google.com/support/bin/answer.py?answer=47948&topic=1556

If you’re accessing your Gmail using POP from multiple clients, Gmail’s recent mode makes sure that all messages are made available to each client, rather than only to the first client to access new mail.

Recent mode fetches the last 30 days of mail, regardless of whether it’s been sent to another POP client already.

If you sign in to Gmail using your Blackberry, you’re signed in to recent mode automatically. For all other POP clients, replace ‘username@gmail.com‘ in your POP client settings with ‘recent:username@gmail.com‘.
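For a scripted client, the same trick might look like the sketch below, which uses Python's standard poplib. The helper names are my own, and the host pop.gmail.com with port 995 is an assumption that is not part of the help text quoted above; nothing connects to the network unless you call count_messages yourself.

```python
import poplib

def recent_mode_user(address):
    """Prefix a Gmail address with 'recent:' as the Gmail help describes."""
    return address if address.startswith("recent:") else "recent:" + address

def count_messages(address, password, host="pop.gmail.com", port=995):
    """Log in over POP3/SSL in recent mode and return the message count."""
    # host and port are assumptions, check your own account settings.
    conn = poplib.POP3_SSL(host, port)
    try:
        conn.user(recent_mode_user(address))
        conn.pass_(password)
        count, _size = conn.stat()
        return count
    finally:
        conn.quit()

print(recent_mode_user("username@gmail.com"))
```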

Thanks to the unknown reader who made me aware of it.

Upgrading Java and Open Source libraries

2007/03/28

As Murphy said – you cannot do just one thing. Because JDK 1.4 is entering its end-of-life support period (and 1.3 has been unsupported for some time), we need to move a few applications happily running on these antique Java platforms to at least Java 5 and gain a few more years of uninterrupted Sun-supported life.

Upgrading the applications was, as expected, a piece of cake – as long as you have a solid build environment with build scripts and everything – which we certainly do. Ant runs on Java 5 just as well as on 1.4 (maybe faster, but I did not measure), Tomcat was the same no-brainer, and switching Eclipse required only selecting a different JRE in the preferences window. Codewise, the only change required inside the application code was renaming a few identifiers that were called enum, which happens to be a keyword in Java 5.
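Finding those identifiers before the compiler complains can be done with a crude scan. The following Python sketch is only an illustration: it flags every line where enum appears as a whole word, so it also reports hits in comments and strings, and the Java snippet is a made-up example rather than code from our application.

```python
import re

# "enum" as a whole word; a crude check that also matches comments and strings.
ENUM_WORD = re.compile(r"\benum\b")

def find_enum_identifiers(source):
    """Return (line number, line) pairs where the word 'enum' appears."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if ENUM_WORD.search(line)
    ]

# Hypothetical pre-Java 5 code that uses enum as a variable name.
SAMPLE_JAVA = """\
Enumeration enum = table.keys();
while (enum.hasMoreElements()) {
    Object enumerated = enum.nextElement();
}
"""

for number, line in find_enum_identifiers(SAMPLE_JAVA):
    print(number, line)
```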

So far so good. What was not so easy were the third-party libraries. Our applications use Jakarta Commons, Struts, Torque and a few other great pieces of open source software. I like to compile everything from source, so I tried to rebuild the 3rd-party jars. No luck – about one third did not compile (I have to remind you that these were versions from late 2001/early 2002). Unlike with our own code, it would make little sense to try to change the 3rd-party code – certainly not when there are many newer versions of each component already available anyway. Therefore, upgrading the components is the way to go. And here the fun begins.

The idea behind components such as Jakarta Commons is that they should be used in other components. And indeed – they are used. With a little luck and enough components, you will end up with one jar being used/required by two, three or more other jars, and over time a nice web of inter-dependencies will somehow grow inside your little app. Nothing wrong with that. The trouble is that as the libraries mature and evolve, new versions are released which add new features. In this process, the component’s APIs sometimes change, some features are deprecated and eventually removed – and backwards compatibility slowly disappears. Because the different open source projects evolve at different speeds, you will eventually end up requiring two different and incompatible versions of the same component at the same time. This is usually not an issue when you are developing a new project, because the time span is so short that the required versions are not very far apart. It does, however, cause problems when you are upgrading and working with enough components over a time span of several years.

There are two possible strategies: many small steps or one big jump. Small steps mean that you try to upgrade as little as possible, keeping the changes in API minimal – just enough to achieve the goal (compilable with Java 5 in our case). The problem with this approach is that it is quite a time-consuming process, because of the cascading updates. Imagine, e.g., that upgrading A1 to A2 requires upgrading B1 to B2 (because A depends on B), but because B depends on C, you also need to upgrade C1 to C2. And because X also depends on B, after upgrading to B2 you may (unless you are lucky) be forced to upgrade X as well. It is an iterative “snowball” process and can easily take much longer than expected. Another problem is that after spending all that time, you have an application that is still running on out-of-date components. Whether this is a problem or not depends on the number and size of the business changes required.
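The snowball can be estimated up front if you write the dependencies down. Here is a small Python sketch of the worst case, under the pessimistic assumption that an upgrade spreads both to whatever a component depends on and to whatever depends on it. The dependency map simply mirrors the A, B, C, X example above, plus an unrelated Y.

```python
from collections import deque

# Hypothetical dependency map: component -> components it depends on.
DEPENDS_ON = {
    "A": ["B"],
    "B": ["C"],
    "X": ["B"],
    "C": [],
    "Y": [],
}

def upgrade_cascade(start, depends_on):
    """Worst-case set of components touched when `start` is upgraded."""
    # Build reverse edges so we can also find who depends on a component.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        neighbours = depends_on.get(current, []) + dependents.get(current, [])
        for other in neighbours:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen

print(sorted(upgrade_cascade("A", DEPENDS_ON)))
```

In practice many of these edges will turn out to be harmless, so the result is an upper bound on the work, not a prediction.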

The big jump strategy is to get the latest versions of everything. Chances are very high that it will compile and run OK – as long as all components are still active projects. The typical catch is that you have just updated B1 to B7, and this is fine with all components except X, which has been a dormant project for the last 3 years and still requires B3 as a dependency. If the APIs of B3 and B7 differ enough, you may not be able to simply use B7 without diving into unknown code and fixing it. The other catch is that your own application may be incompatible with the API of B7 and need changes. This is the better situation, because a) you know the code and b) it may be better code (if you did a good job) than the code of component X – but in both cases you may need to spend time on something that has no direct link to what you wanted to do in the first place: doing just that one thing 🙂
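Spotting the dormant X before the jump is mostly bookkeeping. A minimal Python sketch follows, assuming you have collected by hand which version of each library every component requires. The data is hypothetical, and the check is deliberately naive: it treats any two different versions as incompatible.

```python
# Hypothetical requirements: component -> {library: required version}.
REQUIRES = {
    "A": {"B": "B7"},
    "app": {"B": "B7"},
    "X": {"B": "B3"},
}

def find_conflicts(requires):
    """Return {library: {version: [components]}} for libraries needed in more than one version."""
    by_library = {}
    for component, needs in requires.items():
        for library, version in needs.items():
            by_library.setdefault(library, {}).setdefault(version, []).append(component)
    return {lib: versions for lib, versions in by_library.items() if len(versions) > 1}

print(find_conflicts(REQUIRES))
```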

Ode on Eclipse :-)

2007/03/27

After almost two years of uninterrupted C# and .NET development, I have got back to the Java world for a few days. One of our old customers asked us to do a few enhancements and changes in the application we wrote. And because we always stand behind our work, always answer the call for help and support our customers as long as they want to be supported, I have squeezed a few days’ worth of work into a pretty packed schedule. So here I am on a short detour into the world of class files and jars …

The application is based on Java and its roots go back to 2001. It is a Web application using lots of open-source libraries: several Jakarta Apache Commons modules, Struts (in its first incarnation), Freemarker, Apache Torque and JSP with Tiles. Everything runs on Tomcat, is built with Ant and is developed with Eclipse.

It was such a great feeling to work with Eclipse again. It felt just right, right away – a nice, well polished, extremely user friendly and pleasant experience. So good that it caught me by surprise – what was it that caused this great feeling ? It was not about the language – I actually like C# (from a purely language perspective) maybe even better than Java, because the features allow more natural coding (like properties instead of getter/setter methods). It is certainly not the Java based web platform – using the combination of Struts tags with JSP 1.x is not any better than developing for ASP.NET, quite the opposite … So it must be the IDE.

The problem with Eclipse in the past used to be its memory requirements and speed. Thanks to hardware evolution and the push from Vista’s hardware requirements, today’s typical machine is a 2 GB, dual-core box, and on this box Eclipse just flies. On the same platform, Visual Studio 2005 does not have any issues either, but it just does not feel the same way. I really enjoyed the small things in Eclipse – such as the great and rich refactoring support (out of the box, without add-ons), the helper methods (e.g. implement interface) and the way the source code management systems are seamlessly integrated into the system. It is not that these features are so special, unique and unavailable elsewhere – they are just done right.

One thing that I really admire about Eclipse (and wish I had in VS 2005) is the “fearless configurability”. I have many times been tempted to try out some new add-in for VS, but did not do it, just to avoid the possibility that it could do something nasty to my development box, registry, Windows or all three of them. I just do not trust installation programs that like to put strange DLLs into my Windows/System32 directory and write into the registry. In Eclipse, there is no install required – all you need to do is unzip the files into the proper location and restart the IDE. The way the platform works is very nicely designed and (after some learning period) actually very logical and quite transparent. I feel free to try out everything I want. For example, the Ruby Development Tool plugin :-).

Managing knowledge portfolio

2007/03/25

There are a few technical books that you enjoy the first time you read them, remember the experience and keep coming back to. One such classic is The Pragmatic Programmer – From Journeyman to Master – by David Thomas and Andrew Hunt. If you have not read it, get it; it is worth every cent. Since the book was written – ages ago, back in 2000 – it has not lost any of its freshness and value, which one cannot say about most technical books that are more than 2-3 years old. Inspired by the success of their initial creation, Dave and Andrew started the website and went on to write the whole Pragmatic Programmer series of books.

The book contains 46 tips and explanations, and tip #5 deals with the knowledge portfolio – one of a developer’s main assets. Unfortunately it is an expiring asset, so you have to work hard to keep it recent and in good shape. Dave and Andy recommend using the same strategy as successful investors in financial markets:

  • invest regularly – even if only small amounts, but make investing a habit
  • diversify to embrace change and to widen the reach
  • manage risk – combine high-reward high-risk with low-reward low-risk
  • buy low, sell high – try to identify future hits before they become popular
  • review and rebalance – what was true last year, can be different now

They also recommend a few practical steps and goals for achieving a healthy and well-balanced portfolio, such as:

– learn a new language every year
– read a technical book every quarter
– read nontechnical books too
– participate in local user groups
– experiment with different environments

Which got me thinking about how I am doing with respect to these suggestions. I guess I have no problem with the “read a book” part, either technical or non-technical 🙂 – thanks to Safari I consume 1-2 technical books a month and thanks to e-books even more non-technical ones :-). What I have not done in about 3-4 years is learn a new language. I think I have not learned a really new language since C# in 2002. One could argue that for a Java programmer, C# is not really such a new language – in that case I have not really updated my language toolset since I started with Python back in 2000. That is terrible! Obviously, some action is required – I am overdue by at least 4 new languages. Because it is highly impractical to tackle 4 problems at the same time, for this year I will be adding two.

But which programming languages? The key here is the word pragmatic and the investment strategies above – more specifically, combining high and low risk, buying low and diversifying. I want to explore new territories, but stay away from esoteric and pure-research areas (e.g. languages like Janus, Limbo or Concurrent Haskell :-)). After some research I picked two candidates for the year 2007: Ruby and Objective-C.

There are a few reasons why exactly these two: they are both similar (with a Smalltalk and functional programming/OO heritage) and opposite – one is multiplatform, interpreted and very high level, the other is (practically) single-platform – the OS-X language of choice – compiled and, up to version 2.0, not even garbage collected. One has a pretty clean and nice syntax, the other is – well – simple, but quite ugly. And so on.

I started with Ruby yesterday – found a good book on Safari and started to read and play with the code. I will get back to it when my head stops spinning – and after I get some non-trivial programs done to get a real-life feeling for the language. From the book-reading point of view, it is pretty amazing what I have found in a few hours :-).

Btw, version 1 of the Ruby book by the same two authors is available online.

Thought of the day

2007/03/24

“… the main reason why most software is not open-sourced is pure embarrassment …”

Leo Laporte in Twit episode #90

Converting eBooks to Sony Reader format

2007/03/22

Since yesterday, I have made nice progress in solving my issues with content creation for the PRS500 and its readability. There are several ways to proceed:

The simplest is to download Book Designer. It is free for non-commercial use, and the current version 5.0 Alpha does the job very well. It allows you to load a source in text, HTML, Lit, PDF, PalmDoc (pdb/prc), rb and a few other formats and process it into the native LRF format – plus a few others I do not really care about. The result is a nice, readable LRF file with three font sizes, nicely formatted, with metadata. As an added benefit, because the author is Russian, the program does not assume that the English alphabet is the only one in existence and allows you to select the encoding. With the proper encoding selected (option 1), the result is quite good – most of the extended characters from Czech/Slovak are there; some are missing and displayed as a space (namely ř, ě, ľ …), but it is readable. What is maybe a better option (option 2) is that with English as the language and the default encoding, the software “downscales” the extended characters to their closest English equivalents: ř -> r, ě -> e – which results in the familiar computer Czech/Slovak. I am very comfortable with option 2, and will work on getting the correct font for #1.
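I do not know how Book Designer does the “downscaling” internally, but the same effect is easy to reproduce. Here is a sketch in Python that uses Unicode decomposition instead: it splits each accented letter into a base letter plus combining marks and then throws the marks away. The function name is my own.

```python
import unicodedata

def downscale(text):
    """Replace accented letters with their closest unaccented base letter."""
    # NFD splits e.g. "ř" into "r" followed by a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(downscale("Příliš žluťoučký kůň"))
```

Letters that have no decomposition (for example the Polish ł) pass through unchanged, so this covers Czech and Slovak but is not a general transliteration.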

If you want to read more about the program, go here and here – as long as you can read Russian. I found out that even after 22 years of not using Russian, I can still read and understand it reasonably well …

The program is useful for creating Palm books as well as Microsoft Reader Lit books. I have not tried that yet. The user interface of Book Designer is not exactly Apple-made – extremely technical, geekish – looking as if designed by an engineer for engineers 🙂 – here is what it looks like. But it is the functionality that counts. Thank you – whoever made this possible :-).

If you actually want to understand how the LRF format works and how a book is formatted at a very low level, read the format spec and then download BBeBinder from Google Code. It is a C# 2.0 project which aims to create something similar to Book Designer – but as an open-source, GPL-ed application. It is a very early version (0.2), but in the true spirit of open source, it actually (mostly) works. I have downloaded it and looked inside the code. The solution contains a BBeB encoding/decoding library and the main program, which was nicely designed with extensibility in mind. Using plugins, it allows you to add additional input data formats (it currently works well for text files and some HTML, and I had mixed results with others).

If both of my projects were not in the C# space (which is causing me to be slightly over-see-sharped at the moment), I would not mind volunteering a few hours for this – to make sure that Central European encoding is handled OK :-).