I write blog entries here using Markdown. Comments also go through Markdown processing, albeit with HTML filtering. I've since discovered that blogs just about represent the limit of Markdown's utility. I've been taking my course notes in Markdown as well, and frankly, it doesn't work.

Markdown is fine for informal, one-step publishing, but trying to do multi-document work, like a series of notes for a class, quickly reveals problems. Markdown isn't meant to be a document preparation language; it's meant to make generating HTML more natural, and it was shortsighted of me to try to coerce it into other roles.

But I wouldn't write a post just to complain. I've gotta regale you with more thoughts, right?

But first some more complaints. LaTeX:

  • Is ugly, verbose, and full of backslashes
  • Is difficult to learn, install, and debug [1]
  • Has a high markup-to-content ratio, and a lot of implicit behavior

It is also far superior to Markdown for what I was trying to use it for.

This is because LaTeX has one overwhelming advantage that Markdown doesn't: it can be abstracted. [2]

When writing math, I know that when I find myself typing:

    \begin{align*}
    f\left(x\right) &= ... \\
    f\left(x\right) &= ... \\
    f\left(x\right) &= ... \\
    f\left(x\right) &= ... \\
    \end{align*}

I will be able to do this instead:

    % Define the repeated expression once, then reuse it
    \newcommand{\fx}{f\left(x\right)}
    \begin{align*}
    \fx &= ... \\
    \fx &= ... \\
    \fx &= ... \\
    \fx &= ... \\
    \end{align*}

There just isn't a way to do that in Markdown, because that's not what it's designed to do.

Furthermore, if I'm writing a document with a lot of code samples - by which I mean running code samples, not the pseudocode cop-outs I post here - getting code from an interpreter into Markdown, and vice versa, is a pain. About the only acceptable approach is to use Elisp functions and keyboard macros to automate copying code to and from a Scheme prompt in Emacs - which is one of the main reasons you'll see Python and Scheme in my course notes: they are languages which can run in Emacs.

But even that is painful, and utterly breaks down when you move past trivial examples.

With LaTeX, in contrast, you can use the listings package to include external files. That lends itself to actually writing the code as real files, and then including those files whole in your documents. The listings package can even find "special" comments with LaTeX content - for things like references - and strip them out, allowing you to reference line numbers just as you would reference anything else in LaTeX.

And since it's fully expected that you will be rendering to PDF, HTML or DVI before publishing, you can automate the entire "build" process by putting your pdflatex commands into a makefile. That same makefile can also make two or ten or a hundred other documents, while simultaneously running your code samples and piping the output to files which will then get included by the listings package into your final PDFs. Which is a huge win: you know that your code samples always work, and that they always produce the output you have in the document. And when it's time to add another chapter or section, your time investment amounts to adding one line to your makefile.
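To make the "run the samples and capture their output" step concrete, here is a rough sketch of it written in Python rather than as makefile rules, purely for illustration. The function name `capture_output` and the `.out` suffix are hypothetical choices of mine; neither make nor the listings package requires them.

```python
import subprocess
import sys
from pathlib import Path


def capture_output(sample: Path) -> Path:
    """Run one Python code sample and save its stdout next to it.

    check=True raises CalledProcessError when the sample exits non-zero,
    so a broken sample stops the build instead of landing in the document.
    """
    result = subprocess.run(
        [sys.executable, str(sample)],
        capture_output=True,
        text=True,
        check=True,
    )
    out_path = sample.with_suffix(".out")  # hypothetical naming convention
    out_path.write_text(result.stdout)
    return out_path
```

A makefile rule would do the same job with a shell redirect; the point is only that the saved output file is what the document pulls in (for example with the listings package's \lstinputlisting), so the PDF can't drift from what the code actually prints.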

Besides being easy to scale to large numbers of output files, LaTeX can also handle ever-increasing amounts of input. If your "quick notes during class" grow to 300 lines and start getting in the way of your main content in the course code, you can just move the parts you want into a separate file, and \input{} it from your first source file.

That's also a great way of centralizing \newcommands, BTW.

In all, LaTeX scales, and Markdown hits a limit quickly. Which is why I'm going to have to rewrite everything in LaTeX. Lesson learned.

The moral of the story is "use the right tool for the job", or something like that, "even when the right tool has backslashes."

"Because if you don't, you'll just have to blog about it to admonish fellow coders - er, document preparers - students? - people - to not make the same mistake."

Yeah, cause I made that mistake. And you shouldn't. Got it? Cool.

But wait! There's more.

Python Packaging - aka, Eggs!

I got a new box recently. It's a Dell XPS 410n. Three things of note: It's the first Dell I've bought, the "n" means it's one of the Ubuntu Dells, and I wouldn't have gotten a Dell at all if it hadn't been for #2. So, all told, I guess this means Dell got at least some extra money for choosing to sell desktops with Ubuntu. Good for them. I might post a review of the computer soon, but not today.

Anyway, the reason I bring it up is that on said new box I am simply not going to install easy_install. I don't like using easy_install, although I don't mind eggs in the slightest (stick with me for a second). Easy_install messes up my sys.path. It wants root, unless I set up a dot-file for it, which, because I can't remember it offhand and it's not #1 when I Google "easy_install local", I'm not going to set up [3]. It's literally too much of an investment of time, because the alternative way is so much easier.

The great thing about eggs is that, a few weird cases notwithstanding, they're distributed, or at least are available, as tarballs. Which means I can and do do the following:

    wget tarball.tar.gz
    tar -xvzf tarball.tar.gz
    cd tarball/
    mv src/tarball ~/.lib/python/

The last step varies a bit. It could be in src/packagename, lib/python/packagename, or just packagename. But I've never seen it anywhere other than those three locations, and it's easy to see with a quick ls.

It's hygienic, easier to debug, and doesn't munge my sys.path. Which is good, because I like my sys.path to be, you know, human-parsable. And to fit on a single screen. Because that's really useful when the path isn't working how you want it to, and you want to debug it. It also makes doing things like running from a development version of, say, SQLAlchemy possible when you want to build some patches against the trunk but still have the stable one available for when you're done. With easy_install, I know there's a way of installing from an SVN repo, but is there a way to revert easily back to stable?
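For completeness, here is a minimal sketch of how a directory like ~/.lib/python can be made importable from inside a script. The helper name `add_local_lib` is hypothetical, and this assumes the directory isn't already being added some other way.

```python
import os
import sys


def add_local_lib(path: str = "~/.lib/python") -> str:
    """Put a per-user library directory at the front of sys.path.

    Returns the expanded absolute path. Does nothing if the directory
    is already listed, so sys.path stays short and readable.
    """
    full = os.path.abspath(os.path.expanduser(path))
    if full not in sys.path:
        sys.path.insert(0, full)
    return full
```

Setting the PYTHONPATH environment variable to the same directory in a shell profile gets much the same result without touching any code. Either way, the path stays one extra entry long, and pointing it at a development checkout instead of the stable copy is a one-line change.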

Dunno, don't care, got a way that works for me.

All this is a warning: Get it right first. If you (and by "you", I mean "a software package" - does talking in the second person to software make me crazy?) become popular before you have every feature under the sun, there will always be annoying people like myself who will complain. Something about 80/20 or something like that.

Anyway.

Onto more juicy topics, assuming any of this can be called juicy.

Unfaithful

I have been straying. (from Python.)

Back when Lisp Envy hit proggit, I got a lot of suggestions as to alternatives I could try. Pico Lisp is too low-level, SBCL is massive and too VM-ey [4], Logix is defunct and non-working with Python 2.5, and I have far too little free time to investigate the remaining alternatives.

Well, past-tense, I had too little time. But over the past few days I've finally given a solid look at one of the systems suggested by a commenter, namely, PLT Scheme. PLT Scheme's claim to fame is DrScheme, which is a great example of what is possible in language-aware text editors/IDEs. But PLT Scheme has one other big advantage: it's not particularly popular (yet), certainly not on the scale of Python or Ruby, and yet it has been developed for a decade, is fairly stable, and fairly featureful.

Don't look now, but the PLT group has managed to put together a great and largely undiscovered system.

There are certainly things that stand out about PLT. For one, because of their work with DrScheme, PLT has a well-developed GUI library. Of course, it still provides a capable terminal-only REPL, the only complaint I'd have there being that readline keybindings aren't enabled by default. It even has a built-in web server, which, like paster or Zope, can be proxied through Apache for scalability goodness. It can compile to native code, and provides many layers of abstraction: procedures, closures, structures, classes, modules, and units. And being a Lisp, it also provides macros. As I understand it, classes and structures are both done using macros. Of course, I haven't really investigated the class or unit systems, so I could be completely off the mark there.

The next thing that surprised me about PLT was the standard library, called mzlib. I'll admit it's not as comprehensive as Python's stdlib, but it's still enough to move PLT Scheme solidly from "interesting" to "useful."

But when learning a new language, by far the most important point is documentation. Documentation is king. And here again, the educational nature of PLT's DrScheme experience shines through: the documentation is actually fairly comprehensive and useful. I have about 50 pages worth printed out that I'm referring to constantly. Being able to do that is an essential stage in learning a new programming language, and PLT's challenge is greater than most, since it must teach functional style in addition to pure syntax and library documentation. But so far, it's doing great.

But back to where we were before... about Python. Python's packaging is pretty nice. You import a package with import, it can refer to a local or system library, and it goes through your sys.path to find it. On the basics, PLT Scheme is fairly close, although by default, and in a fashion typical of Lisps, it imports into the local namespace rather than into a named namespace as Python does. It also differentiates between project, site-specific, and PLT libraries, which is also nice.

But what shocked me about PLT Scheme is that it actually gets packaging done right. The (require) function will actually download and compile any missing libraries from PLaneT, which incidentally, is a great way to waste a few hours once you've gotten PLT installed: just find interesting libraries, browse their source, (require) them and learn. It installs to a local dotfile directory by default (good! Because most users wouldn't change it if it weren't hidden, they'd just be silently annoyed) and will compile to bytecode or native code or something, I honestly don't know what. Don't really care right now, either, because it's snappy enough, and runtime details come a little later.

The PLaneT repository is not unlike PyPI: it provides a huge resource for centralized, if not standard, libraries. However, PLaneT doesn't make you think about snake eggs, because frankly, snake eggs are ugly. Contrast with Java Beans or Ruby Gems: snake eggs are just... yuck. Have you ever seen a picture of one? Those things are ugly. Besides the snake eggs issue, PLaneT wins on two counts: it can handle many versions concurrently and intuitively, and the dependency resolution happens at runtime, not through a separate script. You can download a (trusted) script, and as long as the libraries are in PLaneT, your code will run - the first time, getting libraries from the web on-the-fly and later from your local copies.

Honestly, the require function scared me at first. Downloading programs from the big bad Web and running them with the rights of the user? Erm... But, we essentially have the same problem with easy_install, and the solution with PLT is the same: if you need more assurance than the (presumed accurate) hash, download the .plt files, check them against your own hashes or checksums, and install the local copies.
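Checking a downloaded archive against a known checksum only takes a few lines. This sketch uses SHA-256 from Python's hashlib; which hash PLaneT actually publishes is an assumption here, as are the function names `file_sha256` and `file_matches`.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a checksum from a source you trust."""
    return file_sha256(path) == expected_hex.strip().lower()
```

The check is only as good as where the expected checksum came from: if it was downloaded from the same place as the archive, it guards against corruption but not much else.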

So, perhaps next weekend - honestly, probably not - I'll get to leverage the awesome power of macros to, I dunno, build a motor vehicle with mind power.


  1. As with most such things, the error messages are cryptic until you get used to them. The problem with pdflatex, though, is that it will keep going through errors once it hits a problem (which, I learned today, can be fixed with -halt-on-error) and that it also outputs a lot of text during normal operation, which leads to an information overload when trying to read error messages.

  2. The other killer advantage that TeX (and thus LaTeX) has is that nothing else comes close when it comes to typesetting math. But if you care about typesetting math, you already knew that. 

  3. And if you think I'm going to try to find the right config directive in here, well, documentation pages that long scare me. If you do find the right section, it admonishes you to use the messy, PATH-munging method for a "cleaner, more usable Python configuration". Well, thanks, but no thanks, I've already got a clean, usable way of setting up my boxen, and it involves wget, tar and mv, but not easy_install. Easy_install is great for packagers, and I love that people use it: the relatively standard and sane directory structure makes it easy for me to bypass easy_install entirely via the process shown here. But am I gonna use it? Nope.

  4. Everything in SBCL is compiled to native code; there isn't really an "interpreter" anymore. And I don't like it. The REPL (or shall we call it an RCPL?) manages to take three times as much memory as Python at startup - real memory, not virtual memory, SBCL's virtual memory usage quite honestly scares me - but won't turn my Control-P into an "insert last line"; it'll just print ^P. I mean, come on.

    I suppose I'm a skeptic of the whole JIT thing. "Instead of parsing the code into an AST, and running that through our portable runtime, let's take that code, try to transform it into distinctly nonportable native code, and run that instead." Faster? It can be. The right approach? As Python shows, not always: dynamicity trumps raw execution speed any day. It's the same problem Java has.

    You know, I really shouldn't knock Java. The whole thing about "Well it works for X gazillion people, it must have some merit." Plus I've never actually suffered - er, worked - much with it. But bashing Java is like tricking out your bashrc or using a light-on-dark color scheme for your editor: it just makes you feel deliciously UNIXy. And if I'm gonna bash Java for the VM approach, well, I can't spare SBCL, can I?