Conquering XML

Perhaps conquer isn’t the best word … Maybe I should say that I am finally getting a sense of XML. One of the first subjects tackled in my Introduction to Databases course (Stanford Class2Go) was XML Data and XML Validation. Was XML a brand new concept for me? No. Had I ever tried to actually understand XML before? Heck no.

We were introduced to two types of XML validation: Document Type Definition (DTD) and XML Schema. We didn’t go into too much depth with XML Schema, but we did have a variety of exercises asking us to write DTDs for XML data. I got a little thrill when I pasted my DTD for the first data set into the homework checker and found out it was right on the first try. Woohoo!

It wasn’t entirely easy, but I do feel like I have a handle on it now. The instructor lectured on the topics but didn’t have any kind of handout, so I kept track of the most important points I learned:

  • Sub-elements separated by a comma in parentheses are required and must be listed in that order:
<!ELEMENT country (president, capital)>
  • Optional sub-elements can be denoted by a question mark (?) after that item:
<!ELEMENT country (president?, capital)>
  • Either/or sub-elements use the “|” separator, which means exactly one of the alternatives. Both can occur, in any order, only when the group is followed by * or +. On its own it looks like this:
<!ELEMENT capital (population | governor)>
  • An asterisk (*) means zero or more, while a plus sign (+) means one or more:
<!ELEMENT country (president, capital, state*, language+)>
  • All of these concepts can be combined:
<!ELEMENT country ((president | prime_minister), capital, (state | province)*)>
<!ELEMENT president (first_name, last_name, term+, (degree | past_job)+)>
  • Elements can also refer to other elements through ID and IDREF attributes, with really cool results (the DTD declarations come first, followed by sample XML that uses them):
<!ELEMENT course (title, description)>
<!ATTLIST course number ID #REQUIRED>
<!ELEMENT description (#PCDATA | courseref)*>
<!ELEMENT courseref EMPTY>
<!ATTLIST courseref number IDREF #REQUIRED>
<course number="SPAN101">
   <title>Spanish 101: Beginning Spanish</title>
   <description>Spanish language course for beginners.</description>
</course>
<course number="SPAN201">
   <title>Spanish 201: Intermediate Spanish</title>
   <description>Spanish language course for intermediate students. Students in this course should have completed <courseref number="SPAN101" /> or an equivalent course.</description>
</course>
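
One way I could imagine double-checking these rules outside the homework checker is with a few lines of Python. This is only a rough sketch, not a real DTD validator (Python's standard library does not validate against DTDs). It assumes element-only content models with no #PCDATA, and the helper names `model_to_regex`, `children_valid`, and `dangling_refs`, as well as the `<catalog>` wrapper element, are made up for illustration. The idea is that a content model is basically a regular expression over child element names, and that every IDREF value should match some ID value.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Turn an element-only DTD content model into a compiled regex.

    The regex matches a string of child names, each followed by one space.
    #PCDATA, EMPTY and ANY are not handled in this sketch.
    """
    compact = re.sub(r"\s+", "", model)  # whitespace carries no meaning here
    # Wrap every element name in a group so ?, * and + apply to the whole name.
    grouped = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # A comma only means "followed by", which is plain concatenation in a regex.
    return re.compile(grouped.replace(",", ""))


def children_valid(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


def dangling_refs(xml_text):
    """Return courseref numbers that do not match any course number."""
    root = ET.fromstring(xml_text)
    ids = {course.get("number") for course in root.iter("course")}
    return [ref.get("number") for ref in root.iter("courseref")
            if ref.get("number") not in ids]


# Hypothetical <catalog> root so the two courses form one well-formed document.
SAMPLE = """
<catalog>
  <course number="SPAN101">
    <title>Spanish 101: Beginning Spanish</title>
    <description>Spanish language course for beginners.</description>
  </course>
  <course number="SPAN201">
    <title>Spanish 201: Intermediate Spanish</title>
    <description>Requires <courseref number="SPAN101" /> or an equivalent course.</description>
  </course>
</catalog>
"""

print(children_valid("(president?, capital)", ["capital"]))                    # True
print(children_valid("(population | governor)", ["population", "governor"]))   # False
print(children_valid("(population | governor)*", ["governor", "population"]))  # True
print(dangling_refs(SAMPLE))                                                   # []
```

The second and third calls show the difference the asterisk makes: without it, the “|” group accepts exactly one of the two elements, and with it, both can appear in any order.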

I would love to keep working on this, to get better at it and to get a sense of how to best structure the XML. Sadly, this course will be zipping right along to other concepts, although we will come back to XML in other contexts later. (I know we will learn about querying XML.) I guess I’ll have to find myself other projects to keep practicing these basics! Maybe I should look at how this ties into XHTML …

P.S. Here’s the short code for posting source code on WordPress.com blogs:

[sourcecode language="xml"]
your code here
[/sourcecode]

You can replace xml with the language of your choice and include other parameters. (More info on the Posting Source Code support page.) The shortcode does, however, have its limitations: it changed some of the syntax of my code, for example switching my self-closing “courseref” tag to two separate tags. Just FYI.

Posted by

Excellence Wrangler for Automattic, the company behind WordPress.com. Linguaphile and Translator. Tester.
