Transforming Red Hat's comps.xml File
Red Hat 8.0
Red Hat 9
Fedora Core 1
Red Hat Enterprise 3
Scientific Linux 3
Fedora Core 2
Fedora Core 3
Red Hat Enterprise 4
Scientific Linux 4
Fedora Core 4
Red Hat 9
Note that the browsable, generated html is always color-coded yellow in the linked file tables for each version.
There are other interesting aspects of visualizing the comps.xml file not yet addressed here; for example:
Of course, maybe this is all just a ruse. Maybe I just wanted to give some webbots a _really_ good exercise -- nothing like thousands of links to keep those puppies busy! he he he.
To play with the comps.xml file locally you can make a copy of the file from the install cdrom at RedHat/base/comps.xml.
Note that starting with Fedora Core and Enterprise 3, the full comps.xml file is not on the CD. You must generate it using getfullcomps.py. This utility will whine about missing packages (mostly to do with alternate architectures). These messages can be ignored:
$ cp -p /path/to/top/Fedora/base/comps.xml .
$ rpm2cpio /path/to/top/Fedora/RPMS/comps-extras-9.0.3-2.noarch.rpm | \
    cpio -imvur --no-absolute-filenames ./usr/share/comps-extras/getfullcomps.py
$ chmod u+x ./getfullcomps.py
$ ./getfullcomps.py comps.xml /path/to/top "" > compsfull.xml
# now edit comps.xml and insert the file compsfull.xml at the end, before the
# closing </comps> tag
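The manual edit described in the closing comment of the commands above could also be scripted. What follows is only a minimal sketch: it assumes that the closing </comps> tag sits on a line of its own and that compsfull.xml can be inserted verbatim. The function name insert_before_close is made up for illustration.

```shell
# Hypothetical helper: print a comps.xml file with the contents of a second
# file inserted just before the line holding the closing </comps> tag.
# Assumes </comps> is on its own line (not verified for every release).
insert_before_close() {
    main_file=$1
    extra_file=$2
    awk -v extra="$extra_file" '
        /<\/comps>/ && !inserted {
            # copy the extra file to the output ahead of the closing tag
            while ((getline line < extra) > 0) print line
            close(extra)
            inserted = 1
        }
        { print }
    ' "$main_file"
}

# Possible use (writes a new file rather than editing in place):
#   insert_before_close comps.xml compsfull.xml > comps-merged.xml
```

Writing to a new file keeps the original comps.xml around in case the result needs checking by hand.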
Then you need to include the XSLT stylesheet file by adding a reference at the top of the XML file; for example:
<?xml version="1.0"?>
<?xml-stylesheet href="comps.xsl" type="text/xsl" ?>
(Note that you really don't need the stylesheet reference above if you only use xalan, i.e. if you decided not to clobber your browser by directly loading the XML file with the stylesheet reference.)
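If you do want the reference, adding it can be scripted rather than done in an editor. This is a rough sketch that assumes the <?xml ...?> declaration is on the first line of the file; the function name add_stylesheet_ref is hypothetical.

```shell
# Hypothetical helper: print an XML file with an xml-stylesheet processing
# instruction added right after the <?xml ...?> declaration.
# Assumes the declaration is on line 1; otherwise the file is printed as-is.
add_stylesheet_ref() {
    xml_file=$1
    xsl_href=$2
    awk -v href="$xsl_href" '
        { print }
        NR == 1 && /^<\?xml / {
            printf "<?xml-stylesheet href=\"%s\" type=\"text/xsl\" ?>\n", href
        }
    ' "$xml_file"
}

# Possible use:
#   add_stylesheet_ref comps.xml comps.xsl > comps-styled.xml
```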
You may want to also remove the <!DOCTYPE ..> reference for xalan-java, which in my experience doesn't care for it.
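Dropping the DOCTYPE can be done with a one-line sed filter. The sketch below assumes the whole <!DOCTYPE ...> declaration fits on a single line; a declaration spread over several lines would need a different approach. The function name strip_doctype is made up for illustration.

```shell
# Hypothetical helper: print an XML file without its DOCTYPE line.
# Only handles a <!DOCTYPE ...> declaration that sits on one line by itself.
strip_doctype() {
    sed '/^[[:space:]]*<!DOCTYPE[^>]*>[[:space:]]*$/d' "$1"
}

# Possible use:
#   strip_doctype comps.xml > comps-nodoctype.xml
```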
The files used to transform the XML file are linked below. I have also added the slightly-modified comps.xml file for your convenience. Of course, this XML file is large at more than 600 KB. Not surprisingly, the generated comps.html file is also rather large.
You will want to shift-click (for mozilla/netscape) to pick up the files.
I decided to gzip/zip up the comps.xml file. This will save you the pain of accidentally clicking on it and sending your browser into a spin while it tries to present this large transform to you. This is especially true for the 2nd and 3rd versions of the XSLT file; they are fairly complex...
This time we set a global language parameter at the top of the file called lg. It is actually empty, and will default to English. By passing this parameter from xalan we can loop through the list of supported languages and generate HTML that uses the group names and group descriptions for the specified language.
for i in cs da de es fr is it ja ko no pt ru sv zh_CN zh_TW ; do
    echo $i:::
    xalan -IN comps.xml -XSL comps-lang.xsl -PARAM lg $i -HTML -OUT comps-lang-$i.html
done
In the second version of the stylesheet, I first generated a list of RPMs in XML format from the CDROM images. The list for each RPM includes the name, the architecture and the descriptive summary. With a bit of sed the CDROM number is also added to the output. Finally the output files are combined into a file called comps-disks.xml, and a container tag is pre- and post-pended.
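The original commands for this step are not shown here, so the following is only a rough sketch of how such a list could be produced. It is not the method actually used: it reads tab-separated name, architecture and summary fields and adds the CD number in the same pass instead of using a separate sed step. The element names (rpm, name, arch, summary, disk) and the function name rpm_lines_to_xml are invented for illustration and are not necessarily those found in comps-disks.xml.

```shell
# Hypothetical sketch: turn "name<TAB>arch<TAB>summary" lines on stdin into
# one XML record per line, tagged with the CD number given as $1.
# Element names are made up; adjust them to whatever the stylesheet expects.
rpm_lines_to_xml() {
    disk=$1
    # escape the XML special characters first, then wrap each field in a tag
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' |
    awk -F '\t' -v disk="$disk" '{
        printf "<rpm><name>%s</name><arch>%s</arch><summary>%s</summary><disk>%s</disk></rpm>\n", $1, $2, $3, disk
    }'
}

# Possible use against one CD mounted at a placeholder path:
#   rpm -qp --queryformat '%{NAME}\t%{ARCH}\t%{SUMMARY}\n' /path/to/cd1/RedHat/RPMS/*.rpm |
#       rpm_lines_to_xml 1 > disk1.xml
```

The per-disk output files would then still need to be concatenated and wrapped in a container tag, as described above.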
The new XSLT style sheet file can now use both the comps.xml and the comps-disks.xml files to generate a more detailed, but also a much larger HTML file. The resultant HTML file now shows for each RPM package the descriptive summary, the architecture and on which CD the RPM is found.
Additionally some 'quick' links :-) are added, turning the file into a click-here jamboree! In the following table the new set of files is listed. Note that the comps.xml file above is still needed, as is the CSS stylesheet file, comps.css. Note also that you would need to edit the .xml file to reference the XSLT stylesheet that you wish to use.
In the third version of the stylesheet, I added the source RPMs to the XSLT style sheet of version 2 above. The script 'extractSpecFiles' was used to extract the .spec file and to generate the necessary XML data.
Though most of the data can be generated automatically with this script, there are a few minor problems with about 2% of the .spec files that require some manual intervention:
In the version 3 generated html you will find links to the .spec files. If you click on a spec link a new window will open and present the spec file. There is also an html-ized link that shows up as (html) if you like the Christmas-tree look :o}. The html-ized files were generated with a wee for-loop and the never-ending capacity of gvim to entertain me:
for i in *.spec ; do
    echo $i:::
    gvim -c ':let html_use_css = 1' -f +"syn on" +"run! syntax/2html.vim" +"wq" +"q" "$i"
    sleep 2
done
I also needed to edit the title tag to prevent it from telling the whole world the full real path to the file on the web server:
for i in *.html ; do
    echo $i:::
    sed 's%<title>/path/to/specs/%<title>%' <$i > $i.new
    /bin/mv $i.new $i
done
Note that the CD # does not yet show up for the source package section.
The stylesheet generates the main html page, as in previous versions. However for the detailed 'package' and 'sources' templates three sets of PHP files are produced:
Additional features implemented are (and some scripts and xml files now bear a different name from their predecessors):
The buildLangXML script takes a while to run (about an hour!) and generates many thousands of files using up over 200 MB of disk space. The scripts were clearly not designed to be efficient :-/
The intermediate $lang-comps-disks-desc.xml.zip file for each language is linked under the Description column in the table below.
Only the rpm data is internationalized; I have not attempted to use the specspo data to internationalize the SOURCE package descriptions.
|This web site hosted at:||Triumf - Canada|
|Created:||21 Feb 2003 @ 18:53 GMT|
|Last modified:||28 Sep 2003 @ 14:30 GMT|