Article from January, 1999.


Processing XML with Python

By Bob DuCharme

Bob DuCharme is a senior software engineer at Moody's Investors Service. He is the author of the brand-new XML: The Annotated Specification and SGML CD, a tutorial and user's guide to free SGML software. Both books are part of Prentice Hall's Charles F. Goldfarb Series on Open Information Management.


Abstract

Python is an interpreted, object-oriented computer language that feels like C++ and Perl. The backers of Python would like to see their favorite language become the de facto one for processing XML. Bob discusses the possibilities.


At the recent Markup Technologies '98 conference in Chicago, I had similar conversations with several people on the topic of "Which computer language should I learn next?" The two leading candidates under discussion were Java and Python. Both are recent, object-oriented languages that offer many free resources to the XML developer who wants to write clean, portable code quickly. To learn more about what Python can offer the XML developer, I talked to Paul Prescod, a Consulting Engineer for ISOGEN, based in Austin, Texas, and the "St. Paul" of Python in the XML world.

First, a little background: Python was invented in the early nineties by Dutch programmer Guido van Rossum, who named it after the British comedy show Monty Python's Flying Circus. The language is interpreted, and interpreters are available on most popular computer platforms; together, these traits make it popular for quick-and-dirty development. It was designed to be object-oriented from the ground up, letting you take better advantage of OO techniques than languages like Perl and C++, whose OO features were patched onto non-OO languages. While the Python language itself is fairly simple, it lets you make system calls to perform more complex tasks, including building a graphical user interface for your Python applications. See http://www.python.org for more information.

Your efforts to evangelize Python in the XML world have earned you the nickname "St. Paul." What was your road to Damascus?

Prescod: Believe it or not, most people experience a minor road to Damascus in the same way: they read the Python language tutorial and say to themselves, "This is the language I always wanted to use but could never find." It doesn't have a quirky syntax. It doesn't depend on a Unix or Windows background. It isn't unbearably slow. As I said at the XML Developers' Conference, it is the first language I've used that isn't painful in one way or another. Actually, I think that I said that it is the first language that doesn't suck, but <TAG> is a family magazine. Java also does not suck, but it lives in a different market niche than Python.

Now my kids will wonder why I won't let them read this issue. Next question: why is Python so great for quick-and-dirty development?

Prescod: There are four main factors:

  • Python has a really great standard library. In the old days, languages like C and Pascal had tiny standard libraries, and you were supposed to get everything else from other sources. Developers ended up rewriting that other stuff over and over again, and code reuse and sharing were a headache. Now, languages like Python, Perl and Java are in a race to have the most robust standard libraries. In Python, you can build an HTTP server in three lines of code by subclassing a class from the standard BaseHTTPServer module.

  • There are add-on libraries for everything in the world. The Python community is smaller than the Java or Perl communities, but I think that Python's library support is as good as theirs because Python programmers are very prolific and share everything. A sample:
    • The Python Imaging Library allows image conversions from dozens of formats to dozens of other formats

    • Numeric Python does fast matrix computations

    • Zope builds Web applications quickly

    • There are two CORBA implementations

    • The first two implementations of the WebDAV standard were both in Python



And of course there are many tools for working with XML specifically.

  • Python is interpreted, dynamic and really flexible. There is no compilation step and no need to design an entire type system before you start hacking. If there is code that does almost what you want, you can easily subclass or extend it to finish the job. If you make a mistake, Python gives a very informative traceback that helps you to pinpoint the problem easily.

  • Python is easy to integrate with other stuff. Python talks COM, CORBA, HTTP, FTP, SMTP, CGI, WDDX and almost everything else. You can fetch data from a URL with a single function call. You can integrate C++ libraries in half a day. Actually, I became serious about Python when I compared the ease of developing a C/C++ extension for it with the same task for Perl (XS is really scary stuff!). It is also easy and common to embed Python as a scripting language in a C++ or Java program.
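The standard-library and integration points in this list can be illustrated with a minimal sketch. It assumes modern Python 3, where the 1999-era BaseHTTPServer module now lives in `http.server`; the handler class, the `fetch` and `demo` helpers, and the XML greeting are hypothetical names chosen for illustration. The handler subclasses the stock request handler and overrides only `do_GET`, and the page is then read back with a single `urlopen()` call.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class XMLHelloHandler(BaseHTTPRequestHandler):
    """Reuse the standard request handler; override only the GET hook."""

    def do_GET(self):
        body = b"<greeting>Hello from Python</greeting>"
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def fetch(url, timeout=10):
    """Return the bytes stored at a URL: one urlopen() call does the work."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def demo():
    """Start the server on a free local port, fetch one page, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), XMLHelloHandler)  # port 0: the OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch(f"http://127.0.0.1:{port}/")
    finally:
        server.shutdown()      # stops serve_forever() in the background thread
        server.server_close()  # releases the listening socket
```

Calling `demo()` should return `b'<greeting>Hello from Python</greeting>'`. To serve pages for real, `HTTPServer(("", 8000), XMLHelloHandler).serve_forever()` runs until interrupted; the background thread in `demo()` exists only so the sketch can talk to itself and exit.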

What if you want to do a bigger, more complex system with it? Do the quick-and-dirty scripts scale up well?

Prescod: Python scales really nicely. Deep object orientation helps with modularity. Really solid exception support helps to control errors and allow robust recovery. Easy integration with C means that you can optimize slow parts in C if you need it.
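As a small sketch of the exception-handling point, assuming Python 3's standard `xml.etree.ElementTree` module (the `parse_all` helper and the sample strings are hypothetical), a batch job can catch a parse error on one document, record where it happened, and carry on with the rest instead of aborting.

```python
import xml.etree.ElementTree as ET


def parse_all(documents):
    """Parse every XML string, recording failures instead of stopping at the first."""
    parsed, failures = [], []
    for index, text in enumerate(documents):
        try:
            parsed.append(ET.fromstring(text))
        except ET.ParseError as err:
            # err.position is a (line, column) pair pointing at the problem
            failures.append((index, err.position))
    return parsed, failures


good, bad = parse_all(["<a/>", "<b><c></b>", "<d>ok</d>"])
print(len(good), [index for index, _ in bad])  # 2 [1]
```

The second document has a mismatched end tag, so it lands in `bad` while the other two are still parsed.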

The most important thing is the clean syntax. People with absolutely no Python experience have been known to dive into code and fix bugs. If you are schooled in modern programming languages, much of it will be immediately obvious to you. This means that Python programmers are effectively a dime a dozen: a Java, C++ or VB programmer can be retrained in a couple of days. The end result of all this simplicity is that Python programmers can read each other's code, which is perhaps the most important factor in maintaining large, complex systems.

What does Python offer to the XML developer?

Prescod: Python's underlying language features are very well-tailored to XML processing. First, it has excellent string support: Perl-style regular expressions (without the quirky syntax), fast search primitives and elegant string slice and dice syntax.
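The three string features Prescod lists can be seen on a single record. This is a minimal sketch assuming Python 3; the `record` string is a hypothetical sample, and picking XML apart this way only suits small records of a known shape, since a real parser is the safer tool for general XML.

```python
import re

record = '<price currency="USD">19.95</price>'

# Search primitives: find() returns an index (or -1) without any regex machinery.
start = record.find(">") + 1
end = record.find("</", start)
amount = record[start:end]  # slice between the two indexes
print(amount)  # 19.95

# Perl-style regular expression, here with named groups.
match = re.search(r'currency="(?P<cur>[A-Z]{3})">(?P<amount>[\d.]+)<', record)
print(match.group("cur"), match.group("amount"))  # USD 19.95
```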

Despite all of this, though, I should mention two caveats. The first is that Python is much better at down-translating (converting from XML) than at up-translating (converting to XML): nobody has made a package that emulates OmniMark-style pattern matching and code invocation. The other caveat is that rich Unicode support isn't due until next year.
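Without an OmniMark-style rule engine, an up-translation in Python is typically hand-written: match each input line with a regular expression and build elements from the pieces. The sketch below assumes Python 3's `re` and `xml.etree.ElementTree`; the "Field: value" input format, the `up_translate` function, and the `record` root element are all hypothetical. It is a plain loop, not declarative pattern matching of the kind the caveat describes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input format: one "Field: value" pair per line.
LINE = re.compile(r"^(?P<field>[A-Za-z]+):\s*(?P<value>.*)$")


def up_translate(text, root_tag="record"):
    """Turn 'Field: value' lines into <record><field>value</field>...</record>."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # lines that don't fit the pattern are silently skipped
            child = ET.SubElement(root, match.group("field").lower())
            child.text = match.group("value")  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


print(up_translate("Title: Processing XML with Python\nAuthor: Bob DuCharme\n"))
# <record><title>Processing XML with Python</title><author>Bob DuCharme</author></record>
```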

For down-translations, the Python community has developed

  • validating and non-validating parsers (some fast, some slow),

  • two implementations of the W3C Document Object Model API,

  • several implementations of the Simple API for XML (SAX),

  • a few grove implementations,

  • an architectural form processor,

  • support for several standardized DTDs (e.g., WDDX, XML-RPC, HTML).

Eventually one tool in each category will "shake out" into the leader and be the standardized base that everything else builds upon.
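To contrast the two APIs named in the list, here is a minimal down-translation sketch that pulls the titles out of a small document twice: once with SAX events and once with a DOM tree. It assumes the `xml.sax` and `xml.dom.minidom` modules in today's Python 3 standard library rather than the 1999 add-on packages; the sample document and the helper names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used only for this illustration.
SAMPLE = """<catalog>
  <book id="b1"><title>First Title</title></book>
  <book id="b2"><title>Second Title</title></book>
</catalog>"""


class TitleHandler(xml.sax.ContentHandler):
    """SAX: handle parse events as they stream past instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside <title>, otherwise None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)  # text can arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_with_dom(xml_text):
    """DOM: build the whole tree first, then navigate it."""
    document = minidom.parseString(xml_text)
    return [
        "".join(node.data for node in title.childNodes
                if node.nodeType == node.TEXT_NODE)
        for title in document.getElementsByTagName("title")
    ]


print(titles_with_sax(SAMPLE))  # ['First Title', 'Second Title']
print(titles_with_dom(SAMPLE))  # ['First Title', 'Second Title']
```

The SAX version suits large inputs because it never holds the whole document; the DOM version is shorter to write when the document fits comfortably in memory.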

Python's platform support is unmatched. There are two implementations of Python. One works anywhere there is a C compiler with a C standard library (and sufficient memory). The other works anywhere there is a Java Virtual Machine. Together they cover everything from handheld computers ("Python CE") to Web browsers, and from Unix to Windows 95 to DOS. Python code is almost always portable, but every platform offers extensions that provide access to its special features. For example, on the JVM you get the security sandbox, on Windows you can use COM to talk to Microsoft Office, and on Unix you can write X Windows or curses applications.
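One way to keep code portable while still reaching platform-specific features is to branch only at the edges. This sketch assumes the standard `platform` and `sys` modules; the function names and the returned labels are hypothetical. It uses `str.format` rather than f-strings so that it also runs on Jython, the Java-based implementation, which reports a `sys.platform` string beginning with "java".

```python
import platform
import sys


def describe_runtime():
    """Name the Python implementation, its version, and the host platform."""
    implementation = platform.python_implementation()  # e.g. 'CPython', 'Jython', 'PyPy'
    return "{} {} on {}".format(implementation, platform.python_version(), sys.platform)


def choose_platform_extras():
    """Return a hypothetical label for which platform-specific extras apply.
    Portable code paths never need to call this."""
    if sys.platform.startswith("java"):  # Jython running on a JVM
        return "jvm"
    if sys.platform == "win32":          # Windows, where COM integration is available
        return "windows"
    return "posix"                       # Unix-like systems (X Windows, curses, ...)


print(describe_runtime())
```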

All original material on this site is copyright © 1994-2010 by Architag International Corporation, All rights reserved. No part of this information may be reproduced in any form without express permission from
Architag International Corporation.