Article from January, 1999.

Processing XML with Python

By Bob DuCharme

Bob DuCharme is a senior software engineer at Moody's Investors Service. He is the author of the brand-new XML: The Annotated Specification, and SGML CD, a tutorial and user's guide to free SGML software. Both books are part of Prentice Hall's Charles F. Goldfarb Series on Open Information Management.

Abstract
Python is an interpreted, object-oriented computer language that feels like C++ and Perl. The backers of Python would like to see their favorite language become the de facto one for processing XML. Bob discusses the possibilities.

I had similar conversations with several people at the recent Markup Technologies '98 conference in Chicago, on the topic of "Which computer language should I learn next?" The two leading candidates under discussion were Java and Python. Both are recent, object-oriented languages that offer many free resources to the XML developer who wants to quickly write clean, portable code. To learn more about what Python can offer to the XML developer, I talked to Paul Prescod, a Consulting Engineer for ISOGEN, based in Austin, Texas, and the "St. Paul" of Python in the XML world.

First, a little background: Python was invented in the early nineties by Dutch programmer Guido van Rossum, who named it for the British comedy show Monty Python's Flying Circus. The language is interpreted, and interpreters are available on most popular computer platforms, which makes it popular for quick-and-dirty development. It was designed to be object-oriented from the ground up, letting you take better advantage of OO techniques than languages like Perl and C++, which had their OO features patched on to non-OO languages. While the Python language itself is fairly simple, it lets you make system calls to perform more complex tasks, including the development of a graphical user interface for your Python applications. See http://www.python.org for more information.

Your efforts to evangelize Python in the XML world have earned you the nickname "St. Paul." What was your road to Damascus?

Prescod: Believe it or not, most people experience a minor road to Damascus in the same way: they read the Python language tutorial and say to themselves "This is the language I always wanted to use but could never find." It doesn't have a quirky syntax. It doesn't depend on a Unix or Windows background. It isn't unbearably slow. As I said at the XML Developer's conference, it is the first language I've used that isn't painful in one way or the other. Actually, I think that I said that it is the first language that doesn't suck, but <TAG> is a family magazine. Java also does not suck, but it lives in a different market niche than Python.

Now my kids will wonder why I won't let them read this issue. Next question: why is Python so great for quick-and-dirty development?

Prescod: There are four main factors:
And of course there are many tools for working with XML specifically.
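
To give a flavor of what working with XML in Python can look like, here is a minimal sketch of a small "down-translation" from XML to plain text. It is an illustration only: it uses `xml.etree.ElementTree` and `re` from today's standard library (ElementTree postdates this interview and is not one of the tools Prescod refers to), and the sample document, its element names, and the `xml_to_text` helper are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for illustration.
SAMPLE = """<article>
  <title>Processing XML with Python</title>
  <para>Python is   an interpreted,
  object-oriented language.</para>
  <para>It has <emph>excellent</emph> string support.</para>
</article>"""


def xml_to_text(source):
    """Down-translate a small XML document to plain text, one line per child element."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        # Malformed input surfaces as an exception the caller can handle.
        raise ValueError(f"not well-formed XML: {err}") from err

    lines = []
    for block in root:
        # itertext() yields the text of the element and all of its descendants.
        text = "".join(block.itertext())
        # Collapse runs of whitespace left over from the source formatting.
        text = re.sub(r"\s+", " ", text).strip()
        if block.tag == "title":
            text = text.upper()
        lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(xml_to_text(SAMPLE))
```

The sketch leans on three things: a parser that turns the markup into objects, a regular expression to tidy the text, and an exception to report input that is not well-formed.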
What if you want to do a bigger, more complex system with it? Do the quick-and-dirty scripts scale up well?

Prescod: Python scales really nicely. Deep object orientation helps with modularity. Really solid exception support helps to control errors and allow robust recovery. Easy integration with C means that you can optimize slow parts in C if you need to. The most important thing is the clean syntax. People with absolutely no Python experience have been known to dive into code and fix bugs. If you are schooled in modern programming languages, much of it will be immediately obvious to you. This means that Python programmers are literally a dime a dozen. A Java, C++ or VB programmer can be retrained in a couple of days. The end result of all of this simplicity is that Python programmers can read each other's code. This is perhaps the most important factor in maintaining large, complex systems.

What does Python offer to the XML developer?

Prescod: Python's underlying language features are very well-tailored to XML processing. First, it has excellent string support: Perl-style regular expressions (without the quirky syntax), fast search primitives and elegant string slice-and-dice syntax.

Despite all of this, though, I should mention two caveats. The first is that Python is much better at down-translating (converting from XML) than at up-translating (converting to XML). Nobody has made a package that emulates Omnimark-style pattern matching and code invocation. The other caveat is that rich Unicode support isn't due until next year.

For down-translations, the Python community has developed
Eventually one tool in each category will "shake out" into the leader and be the standardized base that everything else builds upon.

Python's platform support is unmatched. There are two implementations of Python. One works anywhere there is a C compiler with a C standard library (and sufficient memory). The other works anywhere there is a Java Virtual Machine. Together that covers everything from handheld computers ("Python CE") to Web browsers, Unix to Windows 95 to DOS. Python code is almost always portable, but every platform offers extensions that provide access to that platform's special features. For example, on the JVM you get the security sandbox, on Windows you can use COM to talk to Microsoft Office, and on Unix you can make X-Windows or curses apps. <end/>


