Free Software

Professional Software Development

Since 2006 I have been working (with some interruptions) as a freelance software engineer and data scientist in Berlin, Germany. My areas of expertise include Python, Ruby and Ruby on Rails, Haskell, Java, data science and data engineering (Pandas, NumPy, Matplotlib, etc.), database integration (SQL, PostgreSQL, Active Record, Redis, etc.), search engine integration (Solr), as well as text classification and machine learning (scikit-learn, OSBF, etc.).

Current clients include T-Systems. Among my former clients are Dropscan, Freie Universität Berlin, Liquid Democracy, Project A, Rocket Internet, and Zalando.

Free Software Projects

I have contributed to various free software projects, and have also written a few (mostly small) programs of my own.

I'm a contributor to TOML, a simple, easy-to-read file format for structured data.

Lytspel

The idea: Spelling should be fun, not a burden. The traditional English spelling system is afflicted with exceptions and conflicting rules, making writing and reading texts unnecessarily hard.

Lytspel is a proposal for reforming English spelling so that it strictly follows the alphabetic principle. The alphabetic principle means that there is a predictable relationship between written letters and spoken sounds. When you see a written word, you know how to pronounce it (even if you don't know the word itself), and vice versa.

In addition to the reform proposal, there is an online converter that translates traditional spelling into Lytspel. The converter can also be installed locally as a Python package. It uses a comprehensive dictionary (CSV file) with more than 100,000 entries. The full source code can be found on GitHub.

Here is a short example:

Dhe North Wind and dhe Sun wur di'spiuting wich wos dhe strongger, wen a traveler caim a'long rapd in a worm cloak. Dhay a'greed dhat dhe won hu furst su'xeeded in maiking dhe traveler taik his cloak of shood bee con'siderd strongger dhan dhe udher. Dhen dhe North Wind blu as hard as hi cood, but dhe mor hi blu dhe mor cloassli did dhe traveler foald his cloak e'round him; and at last dhe North Wind gaiv up dhe a'tempt. Dhen dhe Sun shynd out wormli, and i'meediatli dhe traveler took of his cloak. And so dhe North Wind wos o'blyjd tu con'fess dhat dhe Sun wos dhe strongger ov dhe tuu.

Here is the same paragraph written in tradspell (traditional spelling). Lytspel might look a bit unusual at first, but it should be easy to get used to. Just try reading it aloud and you'll soon get the hang of it.

The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveler take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew the more closely did the traveler fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shined out warmly, and immediately the traveler took off his cloak. And so the North Wind was obliged to confess that the Sun was the stronger of the two.
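The dictionary-based conversion mentioned above can be illustrated with a minimal sketch. This is not the actual Lytspel converter: the column names `tradspell` and `lytspel`, the function names, and the capitalization rule are assumptions made for illustration, and the five entries are simply taken from the example paragraph.

```python
import csv
import io
import re

# Hypothetical miniature dictionary; the column names are assumptions and the
# entries are copied from the example paragraph above.
SAMPLE_CSV = """tradspell,lytspel
the,dhe
was,wos
stronger,strongger
of,ov
two,tuu
"""

def load_dictionary(csv_text):
    """Read a two-column CSV into a lookup table keyed by lowercase word."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["tradspell"].lower(): row["lytspel"] for row in reader}

def convert(text, dictionary):
    """Replace every known word; leave unknown words and punctuation alone."""
    def replace(match):
        word = match.group(0)
        respelled = dictionary.get(word.lower())
        if respelled is None:
            return word
        # Carry over an initial capital from the source word.
        if word[0].isupper():
            return respelled[0].upper() + respelled[1:]
        return respelled
    return re.sub(r"[A-Za-z']+", replace, text)

dictionary = load_dictionary(SAMPLE_CSV)
print(convert("The Sun was the stronger of the two.", dictionary))
```

The sketch covers plain word-by-word lookup only. It ignores anything beyond that, such as words whose pronunciation depends on context.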

Miniscripts (on GitHub)

A collection of small and generally not very useful scripts. I wrote them as I needed them, using Perl or other scripting languages. I've put them online for the unlikely case that anybody else might find one of them useful.

Spam Filtering

Information Extraction

The software I wrote during my PhD project is called Trainable Information Extractor, or TIE for short: a statistical system that supports not just information extraction, but also text classification and some related tasks such as preprocessing and XML merging and repair.

It's written in Java and available under the GPL. But be warned that this is experimental software. It worked fine for my purposes, but it's hardly ready for general use, due to a lack of sufficient user documentation, convenient user interfaces, etc. Sorry – you know how it is.

WARNING: apparently, the TIE classifiers don't work correctly under Java 6. Please use Java 5 instead.

Steganography

And then there's NL Stego (Java, GPL), a system for text generation and text-based steganography. You can train it with sample texts (Kant, for example), and it will then generate random pseudo-texts that might lack meaning and grammatical perfection but still bear a clear resemblance to the training texts. In these pseudo-texts you can hide your secret messages. That's called steganography. Usually, steganography is about hiding texts in images or other binary files, but hiding texts in other texts takes less space and is more entertaining.

The program was written mainly as a case study. It works quite well, but there are probably better ways to hide secret messages. Anyway, it can be fun, even if you just want to generate some text without having any secret messages to hide in it (ever needed to finish a paper by midnight? ;-) ).
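To give a rough idea of how text generation and hiding can go together, here is a toy sketch in Python. It is not how NL Stego works internally (that is not described here). It assumes a simple word-pair model in which every choice between possible next words encodes a few bits of the message. All function names are made up for illustration, and the training sentence is borrowed from the fable above.

```python
from collections import defaultdict

def train(text):
    """Map each word to the sorted list of distinct words that follow it."""
    words = text.split()
    successors = defaultdict(set)
    # Treat the text as circular so that every word has a successor.
    for current, following in zip(words, words[1:] + words[:1]):
        successors[current].add(following)
    return {word: sorted(nexts) for word, nexts in successors.items()}

def capacity(options):
    """Number of whole bits a choice among `options` can carry."""
    return len(options).bit_length() - 1

def hide(bits, model, start):
    """Generate pseudo-text whose word choices encode the bit string."""
    if not any(capacity(opts) for opts in model.values()):
        raise ValueError("training text offers no choices to hide bits in")
    words, position = [start], 0
    while position < len(bits):
        options = model[words[-1]]
        k = capacity(options)
        chunk = bits[position:position + k].ljust(k, "0")  # pad the tail
        words.append(options[int(chunk, 2) if k else 0])
        position += k
    return " ".join(words)

def reveal(text, model, length):
    """Recover the first `length` hidden bits from the pseudo-text."""
    words, bits = text.split(), ""
    for current, following in zip(words, words[1:]):
        k = capacity(model[current])
        if k:
            bits += format(model[current].index(following), f"0{k}b")
    return bits[:length]

model = train("the north wind and the sun were disputing which was the "
              "stronger when a traveler came along wrapped in a warm cloak")
secret = "101"
cover = hide(secret, model, start="the")
print(cover)
print(reveal(cover, model, len(secret)))
```

Sender and receiver need the same training text and must agree on the message length. A word with only one possible successor carries no information, so a richer training text hides more bits per word.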

Early Work

From late 2006 to early 2008 I worked on an experimental Web project for Producto, the company best known for its German-language Testberichte website.

From 2001 to 2003 I worked as a software engineer and project leader for the German voice application provider Mundwerk, participating in and finally leading the development of one of the first voice application platforms in the German language. In this context I also wrote my final thesis on "A Toolkit for Caching and Prefetching in the Context of Web Application Platforms" (2002). My system uses statistical methods to improve response times, and thus the user experience, by predicting and asynchronously prefetching future page requests. It was successfully employed as part of the Mundwerk voice platform and is, to my knowledge, still quite unique as of today.
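The general idea of predicting the next page request can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the method from the thesis: it counts which page followed which in past sessions and suggests a prefetch when one successor is clearly dominant. The class name, the page names, and the threshold are all hypothetical.

```python
from collections import Counter, defaultdict

class NextPagePredictor:
    """First-order frequency model over observed page sequences."""

    def __init__(self):
        # transitions[page][next_page] = how often next_page followed page
        self.transitions = defaultdict(Counter)

    def observe(self, session):
        """Record every consecutive pair of pages in one user session."""
        for current, following in zip(session, session[1:]):
            self.transitions[current][following] += 1

    def predict(self, page, threshold=0.5):
        """Return the likeliest next page, or None if no successor is
        frequent enough to justify a prefetch."""
        counts = self.transitions.get(page)
        if not counts:
            return None
        candidate, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= threshold:
            return candidate
        return None

predictor = NextPagePredictor()
for session in (["home", "menu", "weather"],
                ["home", "menu", "news"],
                ["home", "menu", "weather"]):
    predictor.observe(session)

print(predictor.predict("menu"))                 # likeliest page after "menu"
print(predictor.predict("menu", threshold=0.9))  # stricter threshold
```

A real system would then fetch the predicted page in the background while the user is still busy with the current one. The threshold trades wasted fetches against missed opportunities.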

In October 2001 I became a Sun Certified Programmer for the Java 2 Platform.

In 2000/2001, I worked for WorldOS in Brooklyn, New York, a small start-up founded by Lucas Gonze whose goal was to introduce trust and reputation management in peer-to-peer architectures. It was a very ambitious early approach that tackled many questions that are still largely open today. The software was to be released as free software, but the company died in the dot-com crash.

Translations

For the GNU Project, I have translated various texts by Richard Stallman and others into German:

I used to maintain a small Software Development Bibliography that lists some books and articles that I've found useful for software development (no longer updated, but includes some classics).

There is more to freedom than just free software. Check out my free society page.


[Last generated: 2023-08-25]