comps.xml transforms

Transforming Red Hat's comps.xml File


Red Hat 9

There are five and a half versions of the comps.xml file transformation here. The most useful (and most lightweight) is the fourth version, which is in English. For other languages, see the fifth version.

Note that the browsable, generated HTML is always color-coded yellow in the linked file tables for each version.

  • Version 1: Simple (*cough*) transform of comps.xml
    • Version 1 (lang): The same transform of comps.xml, but using the group language fields
  • Version 2: Add RPM and CD number information to transform of comps.xml
  • Version 3: Add SRPM and specfile information to version 2 of transformation
  • Version 4: Move all RPM summary, architecture and CD number information, AND the source SRPM and specfile information, to JavaScript-launched windows.
  • Version 5: Like version 4, but this time we use the specspo information to generate equivalent web pages in other supported languages.

There are other interesting aspects of visualizing the comps.xml file not yet addressed here; for example:

  • Present packages that are not part of any installation group (e.g. sharutils).
  • Visualize the RPM Application GROUP hierarchy provided by the Version 4 and 5 style sheets (not part of the original comps.xml file).

Of course, maybe this is all just a ruse. Maybe I just wanted to give some webbots a _really_ good exercise -- nothing like thousands of links to keep those puppies busy! he he he.


Version 1

To play with the comps.xml file locally you can make a copy of the file from the install cdrom at RedHat/base/comps.xml.

Note that starting with Fedora Core and Enterprise 3, the full comps.xml file is not on the CD. You must generate it using getfullcomps.py. This utility will whine about missing packages (mostly to do with alternate architectures). These messages can be ignored:

 $ cp -p /path/to/top/Fedora/base/comps.xml .
 $ rpm2cpio /path/to/top/Fedora/RPMS/comps-extras-9.0.3-2.noarch.rpm | \
  cpio -imvur --no-absolute-filenames ./usr/share/comps-extras/getfullcomps.py
   # (-r makes cpio prompt for a new name; answer ./getfullcomps.py)
 $ chmod u+x ./getfullcomps.py
 $ ./getfullcomps.py comps.xml /path/to/top "" > compsfull.xml
   # now edit comps.xml and insert the file compsfull.xml at the end, before the
   # closing </comps> tag
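
The hand edit in the last step can also be scripted. Here is a minimal sketch, assuming the closing </comps> tag sits on a line of its own; the function name splice_before_close and the output file name are made up for illustration:

```shell
# Sketch: copy one file into another, immediately before the line
# that holds the closing </comps> tag.
# Assumption: </comps> is on a line of its own near the end of the file.
splice_before_close() {
    main=$1      # e.g. comps.xml
    extra=$2     # e.g. compsfull.xml
    awk -v extra="$extra" '
        /<\/comps>/ {
            # pass the extra file through before printing the closing tag
            while ((getline line < extra) > 0) print line
            close(extra)
        }
        { print }
    ' "$main"
}

# Usage (writes a new file rather than editing comps.xml in place):
#   splice_before_close comps.xml compsfull.xml > comps-merged.xml
```

Writing to a separate file keeps the original comps.xml around in case the result needs checking by eye.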

Then you need to include the XSLT stylesheet file by adding a reference at the top of the XML file; for example:

  <?xml version="1.0"?>
  <?xml-stylesheet href="comps.xsl" type="text/xsl" ?>
  

(Note that you really don't need the stylesheet reference above if you only use xalan, i.e. if you decided not to clobber your browser by loading the XML file directly with the stylesheet reference.)

You may want to also remove the <!DOCTYPE ..> reference for xalan-java, which in my experience doesn't care for it.
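
Both edits can be done from the shell. This is only a sketch: it assumes the XML declaration is the first line of the file and that the DOCTYPE declaration fits on a single line, and the function names are invented for this example:

```shell
# Sketch: prepare comps.xml for a transform.
#   with_stylesheet  - add an xml-stylesheet reference after the XML declaration
#   without_doctype  - drop a single-line <!DOCTYPE ...> declaration
# Assumptions: the XML declaration is line 1; the DOCTYPE is on one line.
with_stylesheet() {
    xsl=$1
    file=$2
    awk -v xsl="$xsl" '
        { print }
        NR == 1 && /^<\?xml / {
            printf "<?xml-stylesheet href=\"%s\" type=\"text/xsl\" ?>\n", xsl
        }
    ' "$file"
}

without_doctype() {
    grep -v '^[[:space:]]*<!DOCTYPE[^>]*>[[:space:]]*$' "$1"
}

# Usage:
#   with_stylesheet comps.xsl comps.xml > comps-browser.xml   # for the browser
#   without_doctype comps.xml > comps-xalan.xml               # for xalan-java
```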

The files used to transform the XML file are linked below. I have also added the slightly modified comps.xml file for your convenience. Of course, this XML file is large, at more than 600 KB. Not surprisingly, the generated comps.html file is also rather large.

You will want to shift-click (for Mozilla/Netscape) to pick up the files.

I decided to gzip/zip up the comps.xml file. This will save you the pain of accidentally clicking on it and sending your browser into a spin while it tries to present this large transform to you. This is especially true for the 2nd and 3rd versions of the XSLT file; they are fairly complex...

Version 1
File           Size (bytes)  Description

comps.xml.gz   72068         the slightly modified Red Hat 9 comps file in (GNU) gzip format
comps.xml.zip  72190         the slightly modified Red Hat 9 comps file in ZIP format
comps.xsl      9646          the XSLT style sheet used in this example
comps.css      676           the CSS style sheet used in this example
comps.html     417964        the xalan-generated HTML file that allows you to browse the comps.xml file

Version 1 (lang)

This time we declare a global language parameter called lg at the top of the stylesheet. It is empty by default, which selects English. By passing this parameter from xalan, we can loop through the list of supported languages and generate HTML that uses the group names and group descriptions for the specified language.

  # generate one HTML file per supported language code
  for i in cs da de es fr is it ja ko no pt ru sv zh_CN zh_TW ; do
    echo "$i:::"
    xalan -IN comps.xml -XSL comps-lang.xsl -PARAM lg "$i" -HTML -OUT "comps-lang-$i.html"
  done
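
The language list in the loop is typed in by hand. It could instead be read from the file itself. This is a rough sketch that assumes the translated group names carry xml:lang attributes (as in <name xml:lang="de">); list_langs is a made-up helper name:

```shell
# Sketch: list the language codes found in xml:lang attributes.
# Assumption: translations are marked like <name xml:lang="de">...</name>.
list_langs() {
    grep -o 'xml:lang="[^"]*"' "$1" |
        sed -e 's/^xml:lang="//' -e 's/"$//' |
        sort -u
}

# Usage:
#   for lg in $(list_langs comps.xml) ; do
#     xalan -IN comps.xml -XSL comps-lang.xsl -PARAM lg "$lg" -HTML -OUT "comps-lang-$lg.html"
#   done
```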
  
Version 1 (lang)
File                   Size (bytes)  Description

comps-lang.xsl         11223         the XSLT style sheet used in this example
comps-lang-cs.html     420072        Czech group names and descriptions
comps-lang-da.html     419255        Danish group names and descriptions
comps-lang-de.html     418903        German group names and descriptions
comps-lang-es.html     419189        Spanish group names and descriptions
comps-lang-fr.html     419188        French group names and descriptions
comps-lang-is.html     419437        Icelandic group names and descriptions
comps-lang-it.html     418623        Italian group names and descriptions
comps-lang-ja.html     420155        Japanese group names and descriptions
comps-lang-ko.html     419114        Korean group names and descriptions
comps-lang-no.html     418431        Norwegian group names and descriptions
comps-lang-pt.html     419188        Portuguese group names and descriptions
comps-lang-ru.html     422085        Russian group names and descriptions
comps-lang-sv.html     419048        Swedish group names and descriptions
comps-lang-zh_CN.html  417735        Chinese Simplified group names and descriptions
comps-lang-zh_TW.html  417925        Chinese Traditional group names and descriptions

Version 2

In the second version of the stylesheet, I first generated a list of RPMs in XML format from the CDROM images. The entry for each RPM includes the name, the architecture and the descriptive summary. With a bit of sed, the CDROM number is also added to the output. Finally, the output files are combined into a file called comps-disks.xml and wrapped in a container tag.
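
To give a feel for this step, here is a small sketch. It is not the genrpmlist script linked below. The element names (rpmlist, rpm, name, arch, summary, cd) and the function names are invented, and the CD number is passed in as an argument rather than added afterwards with sed:

```shell
# Sketch only; element and function names here are hypothetical.

# Escape the characters that are special in XML text.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Emit one <rpm> element from a name, architecture, summary and CD number.
rpm_entry() {
    printf '<rpm><name>%s</name><arch>%s</arch><summary>%s</summary><cd>%s</cd></rpm>\n' \
        "$(printf '%s' "$1" | xml_escape)" \
        "$(printf '%s' "$2" | xml_escape)" \
        "$(printf '%s' "$3" | xml_escape)" \
        "$4"
}

# Query every RPM in one CD's RPMS directory and tag it with the CD number.
list_cd() {
    cd_no=$1
    dir=$2
    for f in "$dir"/*.rpm ; do
        [ -e "$f" ] || continue
        rpm -qp --qf '%{NAME}\t%{ARCH}\t%{SUMMARY}\n' "$f" |
        while IFS="$(printf '\t')" read -r n a s ; do
            rpm_entry "$n" "$a" "$s" "$cd_no"
        done
    done
}

# Wrap whatever arrives on stdin in a container tag.
wrap_container() {
    printf '<rpmlist>\n'
    cat
    printf '</rpmlist>\n'
}

# Usage (mount points are placeholders):
#   { list_cd 1 /mnt/cd1/RedHat/RPMS ; list_cd 2 /mnt/cd2/RedHat/RPMS ; } |
#     wrap_container > comps-disks.xml
```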

The new XSLT style sheet file can now use both the comps.xml and the comps-disks.xml files to generate a more detailed, but also a much larger HTML file. The resultant HTML file now shows for each RPM package the descriptive summary, the architecture and on which CD the RPM is found.

Additionally, some 'quick' links :-) are added, turning the file into a click-here jamboree! The new set of files is listed in the following table. Note that the comps.xml file above is still needed, as is the CSS stylesheet file, comps.css. You would also need to edit the .xml file to reference the XSLT stylesheet that you wish to use.

Version 2
File                 Size (bytes)  Description

genrpmlist           2014          the shell script used to extract the RPM names, architectures and summary descriptions.
comps-disks.xml.gz   29753         the generated Red Hat 9 CDROM file list in (GNU) gzip format
comps-disks.xml.zip  29881         the generated Red Hat 9 CDROM file list in ZIP format
comps-v2.xsl         13813         the XSLT style sheet used in this example
comps-v2.html        642877        the xalan-generated HTML file that allows you to browse the comps.xml file

Version 3

In the third version of the stylesheet, I added the source RPMs to the XSLT style sheet of version 2 above. The script 'extractSpecFiles' was used to extract the .spec files and to generate the necessary XML data.

Though most of the data can be generated automatically with this script, there are a few minor problems with about 2% of the .spec files that require some manual intervention:

  • A couple of .src.rpm files contain the same spec filename; the script warns about these few cases. You need to edit the output file by hand, and rename any files that have .preserved extensions.
  • Some spec files have an 'Icon:' directive in the preamble, and the rpm query will fail if you have not installed the .src.rpm file.
  • Some spec files want to invoke a macro on the 'Release:' or the 'Version:' directives in the preamble, and the rpm query will fail if you have not installed the .src.rpm file (or in some cases a particular binary RPM [httpd-devel]).
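
The first case can be spotted ahead of time. This is a sketch, not part of extractSpecFiles: it assumes a made-up intermediate format of one "package spec-file" pair per line on stdin, and dup_specs is an invented name:

```shell
# Sketch: report spec file names claimed by more than one source package.
# Input on stdin: one "package.src.rpm specfile.spec" pair per line
# (a hypothetical format chosen for this example).
dup_specs() {
    awk '
        { count[$2]++; pkgs[$2] = pkgs[$2] " " $1 }
        END {
            for (s in count)
                if (count[s] > 1) print s ":" pkgs[s]
        }
    ' | sort
}

# Usage (lists the spec file inside each source package, then checks):
#   for p in *.src.rpm ; do
#     rpm -qpl "$p" | grep '\.spec$' | sed "s|^|$p |"
#   done | dup_specs
```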

In the version 3 generated HTML you will find links to the .spec files. If you click on a spec link, a new window will open and present the spec file. There is also an HTML-ized link that shows up as (html) if you like the Christmas-tree look :o}. The HTML-ized files were generated with a wee for-loop and the never-ending capacity of gvim to entertain me:

  for i in *.spec ; do
    echo $i:::
    gvim -c ':let html_use_css = 1' -f +"syn on" +"run! syntax/2html.vim" +"wq" +"q" "$i"
    sleep 2
  done
  

I also needed to edit the title tag to prevent it from telling the whole world the full real path to the file on the web server:

  # strip the server path from each generated <title>
  for i in *.html ; do
    echo "$i:::"
    sed 's%<title>/path/to/specs/%<title>%' < "$i" > "$i.new"
    /bin/mv "$i.new" "$i"
  done
  

Note that the CD # does not yet show up for the source package section.

Version 3
File              Size (bytes)  Description

extractSpecFiles  5594          the shell script used to extract the SOURCE RPM names, specfiles and summary descriptions.
srpms.xml.gz      40482         the generated Red Hat 9 SRPMS file list in (GNU) gzip format
srpms.xml.zip     40604         the generated Red Hat 9 SRPMS file list in ZIP format
comps-v3.xsl      17097         the XSLT style sheet used in this example
comps-v3.html     1165061       the xalan-generated HTML file that allows you to browse the comps.xml file

Version 4

For this version I have used the xalan-specific extension 'redirect:' to write multiple output PHP files. All links to the packages and the source packages are now javascript() links which launch mini-windows when clicked.

The stylesheet generates the main HTML page, as in previous versions. However, for the detailed 'package' and 'sources' templates, three sets of PHP files are produced:

  • A frameset that associates a package with an SRPM package
  • The detailed RPM package
  • The detailed SRPM package

Additional features are listed below (note that some scripts and XML files now bear different names from their predecessors):

  • The RPM Application GROUP tag was added to the comps-disks.xml file
  • The full DESCRIPTION tag was added to the comps-disks.xml file
  • The number of the CD is now output for the SRPM files
  • The genrpmlist and extractSpecFiles scripts were altered and improved somewhat.

Version 4
File                      Size (bytes)  Description

jshead.php                1476          the small php function headers file that is included in each of the subsequently generated PHP files
comps-disks-desc.xml.gz   143434        the generated Red Hat 9 CDROM file list in (GNU) gzip format
comps-disks-desc.xml.zip  143567        the generated Red Hat 9 CDROM file list in ZIP format
genrpmlist2               2753          the shell script used to extract the RPM names, architectures, CD numbers, summaries and descriptions.
extractSpecFiles2         5988          the shell script used to extract the SOURCE RPM names, specfiles and summary descriptions.
srpms2.xml.gz             37789         the generated Red Hat 9 SRPMS file list in (GNU) gzip format
srpms2.xml.zip            37912         the generated Red Hat 9 SRPMS file list in ZIP format
comps-v4.xsl              20200         the XSLT style sheet used in this example
comps-v4.html             241059        the xalan-generated HTML file that allows you to browse the comps.xml file

Version 5

I again used the version 4 approach of reducing the size of the generated HTML and of pushing detail about RPMs into javascript() linked PHP files. However, this time I pass the language environment to the rpm calls in the scripts. With the language environment set, rpm will use the specspo internationalization data to emit RPM header information in the requested language. Here are a few examples using the kernel RPM:
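
A query of this kind might look like the following sketch. It assumes the specspo package is installed and that the locale name you pass is one that rpm recognises; the function name and the kernel file path are placeholders, not taken from the scripts below:

```shell
# Sketch: print one header tag of a package file in a given language.
# Assumptions: specspo is installed; the locale name (e.g. de_DE) is valid.
# Note: if LC_ALL is set in your environment it takes precedence over LANG.
rpm_tag_in_lang() {
    lang=$1     # locale to request, e.g. de_DE
    tag=$2      # header tag, e.g. SUMMARY, DESCRIPTION or GROUP
    pkg=$3      # path to the .rpm file
    LANG="$lang" rpm -qp --qf "%{$tag}\n" "$pkg"
}

# Usage (the file name is a placeholder):
#   rpm_tag_in_lang de_DE SUMMARY /path/to/top/RedHat/RPMS/kernel-<version>.i386.rpm
#   rpm_tag_in_lang fr_FR GROUP   /path/to/top/RedHat/RPMS/kernel-<version>.i386.rpm
```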

The buildLangXML script takes a while to run (about an hour!) and generates many thousands of files using up over 200 MB of disk space. The scripts were clearly not designed to be efficient :-/

The intermediate $lang-comps-disks-desc.xml.zip file for each language is linked under the Description column in the table below.

Only the rpm data is internationalized; I have not attempted to use the specspo data to internationalize the SOURCE package descriptions.

Version 5 (lang)
File                      Size (bytes)  Description

comps-v4-lang.xsl         23753         the XSLT style sheet used in this example
buildLangXML              2589          the script used to generate the cdrom descriptions and the html
jshead-lang.php           1550          the small php function headers file that is included in each of the subsequently generated PHP files. It is slightly different from the version 4 file, since it passes the language variable in doHeader()
comps-v4-lang-cs.html     249401        Czech (cs) group names, descriptions and specspo data
comps-v4-lang-da.html     248782        Danish (da) group names, descriptions and specspo data
comps-v4-lang-de.html     248430        German (de) group names, descriptions and specspo data
comps-v4-lang-es.html     248716        Spanish (es) group names, descriptions and specspo data
comps-v4-lang-fr.html     248715        French (fr) group names, descriptions and specspo data
comps-v4-lang-is.html     248964        Icelandic (is) group names, descriptions and specspo data
comps-v4-lang-it.html     248150        Italian (it) group names, descriptions and specspo data
comps-v4-lang-ja.html     249682        Japanese (ja) group names, descriptions and specspo data
comps-v4-lang-ko.html     248641        Korean (ko) group names, descriptions and specspo data
comps-v4-lang-no.html     247958        Norwegian (no) group names, descriptions and specspo data
comps-v4-lang-pt.html     248715        Portuguese (pt) group names, descriptions and specspo data
comps-v4-lang-ru.html     251612        Russian (ru) group names, descriptions and specspo data
comps-v4-lang-sv.html     248575        Swedish (sv) group names, descriptions and specspo data
comps-v4-lang-zh_CN.html  253565        Chinese Simplified (zh_CN) group names, descriptions and specspo data
comps-v4-lang-zh_TW.html  253755        Chinese Traditional (zh_TW) group names, descriptions and specspo data

Site Contact:
http://penguin.triumf.ca/home/
This web site hosted at: Triumf - Canada
Created: 21 Feb 2003 @ 18:53 GMT
Last modified: 28 Sep 2003 @ 14:30 GMT