this directory. Don't forget to get the highest version. That directory also contains these notes and files compressed into a single archive.

Mention:

- the core value of SGML is being able to change the script to make mass changes to documents
  - The htmltrans script reliably converts the ASM Handbook Series (23 volumes, 2,000 articles, 22,000 pages, 37,000 images, 10,000 tables) to cross-linked HTML with tables of contents, etc.
  - This separation is also useful even on very small scales, because it guarantees consistency and increases reliability.
- this module predates XSLT
- do/allow things several different ways, to give flexibility to use or develop a variety of idioms
  - maybe some of that potential doesn't pan out; can tighten up later

The processing model

- hierarchical code, like the document structure
- keep code together and intuitive, especially pre-content and post-content code
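
A minimal Perl sketch of this processing model, assuming a hypothetical tree of `{ name => ..., content => [...] }` hashes and hypothetical `process_element`/`process_content` subs (this is not the SGML::ElementMap API). Each handler emits its pre-content output, processes its children in place, then emits its post-content output, so the handler code nests the same way the document does.

```perl
use strict;
use warnings;

# Hypothetical mini-engine: an element is { name => ..., content => [...] },
# where each content item is a string (character data) or a child element.
my %handlers;
my $out = '';

sub process_element {
    my ($elem) = @_;
    my $h = $handlers{ $elem->{name} } || \&default_handler;
    $h->($elem);
}

# Process an element's children in document order.
sub process_content {
    my ($elem) = @_;
    for my $item (@{ $elem->{content} }) {
        if (ref $item) { process_element($item) }
        else           { $out .= $item }
    }
}

sub default_handler { process_content($_[0]) }

# Pre-content, content, and post-content code all live in one handler.
$handlers{SECTION} = sub {
    my ($elem) = @_;
    $out .= "<div>";          # pre-content
    process_content($elem);   # children are handled here, in place
    $out .= "</div>";         # post-content
};
$handlers{TITLE} = sub {
    my ($elem) = @_;
    $out .= "<h2>";
    process_content($elem);
    $out .= "</h2>";
};

my $doc = { name    => 'SECTION',
            content => [ { name => 'TITLE', content => ['Intro'] }, 'Body text.' ] };
process_element($doc);
print "$out\n";   # prints <div><h2>Intro</h2>Body text.</div>
```

Because the child handlers run inside the parent's call, the parent needs no extra state to remember what to emit after its content.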

A code sample for OmniMark and ElementMap. This fragment converts elements from “<ext.xref pointer='ARTICLE_ID'>content</ext.xref>” to “<ext.xref vol.no='NUM' collection='NUM'>content</ext.xref>”, looking up the volume and collection numbers with an external ID command. (The element's other specified attributes, including pointer, are copied through as well.)

```
;; omnimark code
element ext.xref
   local counter junk
   and stream volnum
   and stream colnum
   and switch successful
   output "<%lq"
   repeat over specified attributes as spec-attr
      output " "
      output key of attribute spec-attr
      output "=%"%hv(spec-attr)%""
   again
   activate successful
   reset junk to system-call "%g(idcommand) --format='vol.no=%%v col.no=%%c' --save-output=%g(TempFile) %v(pointer)"
   do unless file "%g(TempFile)" exists
      deactivate successful
      put #error "Warning: auto-generated file %g(TempFile) not found%n"
      increment ErrorCount
   else
      do scan file "%g(TempFile)"
         match "vol.no=" (letter or digit)+ => vol white-space+ "col.no=" (letter or digit)+ => col
            set buffer volnum to "%x(vol)"
            set buffer colnum to "%x(col)"
         else
            deactivate successful
            put #error "Warning: auto-generated file %g(TempFile) is invalid: [%v(pointer)]%n"
            increment ErrorCount
      done
      ;reset junk to system-call "rm %g(TempFile)"
   done
   do when not active successful
      set buffer volnum to "unknown"
      set buffer colnum to "unknown"
   done
   output " vol.no=%"%g(volnum)%" collection=%"%g(colnum)%""
   output ">%c"
   output "</%lq>"
```

OmniMark

- has good SGML parsing and support built in
- has a rough equivalent of Data::Locations
- looks really awful (even worse than Perl!)
- lacks major advances in programming language design, like functions
- actually, it has improved a lot
  - less pointlessly verbose
  - functions

The same fragment with SGML::ElementMap:

```
# this is just thrown together from the omnimark code: there might be errors
$e->element('EXT.XREF', sub {
    my ($engine, $element) = @_;
    my ($attrs, $successful, $line, $volnum, $colnum);
    $output->print("<" . $element->{'Name'});
    $attrs = $element->{'Attributes'};
    # Attributes is a hash ref: copy through each attribute that has a value
    foreach (sort keys %$attrs) {
        next unless defined $attrs->{$_};
        $output->print(" " . $_ . '="' . $attrs->{$_} . '"');
    }
    # list-form system() bypasses the shell, so the format needs no quotes
    system($idcommand, "--format=vol.no=%v col.no=%c",
           "--save-output=" . $TempFile, $attrs->{'pointer'});
    OK: {
        $successful = 0;
        my $fh;
        if (! -f $TempFile || ! open($fh, '<', $TempFile)) {
            warn "Warning: auto-generated file " . $TempFile . " not found\n";
            $ErrorCount += 1;
            last OK;
        }
        $line = <$fh>;
        close($fh);
        if ($line && $line =~ m/vol\.no=(\w+)\s+col\.no=(\w+)/) {
            $volnum = $1;
            $colnum = $2;
        } else {
            warn "Warning: auto-generated file " . $TempFile .
                " is invalid: [" . $attrs->{'pointer'} . "]\n";
            $ErrorCount += 1;
            last OK;
        }
        $successful = 1;
    }
    if (!$successful) {
        $volnum = $colnum = 'unknown';
    }
    $output->print(' vol.no="' . $volnum . '" collection="' . $colnum . '">');
    $engine->process_content;
    $output->print("</" . $element->{'Name'} . ">");
});
```

Other processors

- impossible with PerlSAX
  - SAX calls you, so you MUST return from a start-element handler before you get any content events, and the content and end-element handlers are called at the same level as the start handler
  - you have to track ALL the document state, plus your own to-do list
  - note this is only a problem because Perl lacks threads
    - Java encourages threads, so these were natural decisions for SAX
    - but bad decisions for PerlSAX
    - pull vs. push
      - an event pull API can be translated into a push API
      - the reverse requires threads, to keep the parser's call hierarchy separate from the processing call hierarchy
- XSLT
  - if you want to be able to program in it, then you should use a programming language
    - pointless reinvention fragments progress
    - conflates programmatic processing with static processing
    - discourages development of good style engines and models
    - discourages development of good hooks and APIs for them, too
  - came after my module anyway
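
To make the pull vs. push point under PerlSAX concrete, here is a minimal Perl sketch, assuming a hypothetical pull parser that is just a closure returning `[ type, data ]` events (undef at the end); the event format and sub names are illustrative, not PerlSAX or SGML::ElementMap. A short loop turns pulled events into pushed callbacks, and a pull-based handler can consume its own content before returning, which a SAX start-element handler cannot do without threads.

```perl
use strict;
use warnings;

# Hypothetical pull parser: a closure returning the next event as
# [ type, data ] (type is 'start', 'end' or 'text'), or undef when done.
sub make_puller {
    my @events = @_;
    return sub { return shift @events };
}

# Pull -> push: a loop that pulls each event and pushes it to a callback.
sub drive_push {
    my ($pull, %cb) = @_;
    while (defined(my $ev = $pull->())) {
        my ($type, $data) = @$ev;
        $cb{$type}->($data) if $cb{$type};
    }
}

my @log;
drive_push(
    make_puller([start => 'P'], [text => 'hi'], [end => 'P']),
    start => sub { push @log, "start $_[0]" },
    text  => sub { push @log, "text $_[0]" },
    end   => sub { push @log, "end $_[0]" },
);
print join(', ', @log), "\n";   # prints start P, text hi, end P

# With pull, a handler can consume its own content before returning, so
# pre- and post-content code stay in one call with no external state.
sub collect_text {
    my ($pull) = @_;
    my ($text, $depth) = ('', 1);
    while ($depth > 0 and defined(my $ev = $pull->())) {
        my ($type, $data) = @$ev;
        if    ($type eq 'start') { $depth++ }
        elsif ($type eq 'end')   { $depth-- }
        else                     { $text .= $data }
    }
    return $text;
}

my $pull  = make_puller([start => 'B'], [text => 'bold'], [end => 'B'], [text => ' tail']);
my $first = $pull->();              # the 'start B' event
my $inner = collect_text($pull);    # pulls up to the matching end event
print "$first->[1]: $inner\n";      # prints B: bold
```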

Should read the SGML::ElementMap documentation and start looking at the ElementMap.pm code.

Why use constants to index object data?

- obvious, less important: speed (array indexing is faster than hash lookup)
- less obvious, more important: correctness (no silent name typos, since a misspelled constant fails at compile time; no subclass name conflicts; actual namespace separation)
- see the code for subclass creation (e.g. Driver::SGMLS)
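
A minimal sketch of constant-indexed objects, assuming hypothetical `My::Base` and `My::Driver` classes (this is not the actual ElementMap or Driver::SGMLS code): the base class records its highest field index, and the subclass numbers its own fields after it, so the two never share a slot.

```perl
use strict;
use warnings;

# Hypothetical base class: the object is an array indexed by constants.
package My::Base;
use constant {
    NAME            => 0,
    DEPTH           => 1,
    LAST_BASE_FIELD => 1,   # highest index used by this class
};
sub new {
    my ($class, $name) = @_;
    my $self = bless [], $class;
    $self->[NAME]  = $name;
    $self->[DEPTH] = 0;
    return $self;
}
sub name { $_[0][NAME] }

# The subclass allocates its fields after the base class's last index.
package My::Driver;
our @ISA = ('My::Base');
use constant {
    FILE => My::Base::LAST_BASE_FIELD() + 1,
};
sub new {
    my ($class, $name, $file) = @_;
    my $self = $class->SUPER::new($name);
    $self->[FILE] = $file;
    return $self;
}
sub file { $_[0][FILE] }

package main;
my $d = My::Driver->new('sgmls', 'doc.sgml');
print join(' ', $d->name, $d->file), "\n";   # prints sgmls doc.sgml
# A typo such as $self->[FIEL] is a compile-time error under "use strict",
# whereas $self->{fiel} on a hash-based object silently makes a new key.
```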

What do we do with handlers?

- store them in bunches by handler type
- need to keep the ordering of handlers that might match the same node
- separate into two cases: default handlers and name handlers
  - use a hash, with '' as the name for default handlers
  - only need to keep order under a name and among default handlers
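
A minimal sketch of that storage scheme, assuming hypothetical `add_handler`/`find_handler` subs and patterns that are regexes matched against the whole node path. Treating an empty pattern as "match any path" is an assumption, not documented ElementMap behavior.

```perl
use strict;
use warnings;

# Hypothetical handler table: type => { name => [ [pattern, handler], ... ] }.
# The name '' holds the default handlers; order is kept within each list.
my %handlers;

sub add_handler {
    my ($type, $name, $pattern, $code) = @_;
    push @{ $handlers{$type}{$name} }, [ $pattern, $code ];
}

# Look in the list for this exact name first, then in the defaults; within a
# list, the first handler whose pattern matches the whole node path wins.
sub find_handler {
    my ($type, $name, $path) = @_;
    my $set = $handlers{$type} or return;
    for my $list ($set->{$name}, $set->{''}) {
        next unless $list;
        for my $pair (@$list) {
            my ($pattern, $code) = @$pair;
            # assumption: an empty pattern matches any path
            return $code if $pattern eq '' || $path =~ /^(?:$pattern)$/;
        }
    }
    return;
}

add_handler('Element', 'PARA', '.*/SECTION/.*', sub { 'section para' });
add_handler('Element', 'PARA', '.*',            sub { 'any para' });
add_handler('Element', '',     '',              sub { 'no handler' });

my @got;
for my $case ([ 'PARA',  '/BOOK/SECTION/PARA' ],
              [ 'PARA',  '/BOOK/PARA' ],
              [ 'TITLE', '/BOOK/TITLE' ]) {
    my $code = find_handler('Element', @$case);
    push @got, $code->();
}
print join(', ', @got), "\n";   # prints section para, any para, no handler
```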

What do the main objects look like? (Notice the colons: this is a kind of structure-describing pseudo-Perl. Nothing formal or correct.)

```
mode : {
    'handler_type' => handler_set : {
        'NAME' or '' => handler_pair : [ pattern, handler_ref ]
    }
}

$mode = { '_ MODENAME ' => 'FOO',
          '_ FINALIZE ' => '',
          'Element' => {
              'PARA' => [ '.*/SECTION/.*', \&section_para ],
              ''     => [ '', \&no_handler_warning ] },
          'CData' => {
              ''     => [ '', \&data_accumulate ] },
        };
```

Mode

- mode has its name saved in it
- the current modes are just a list of refs to modes, so the names need to be recoverable from the modes themselves
- modes could easily be arrays
- generally use hashes early on
  - trivial to rearrange data fields
  - might want transient fields
  - later, if the fields are static, convert to arrays
- the finalize field holds a ref to the lookup proc and its data if the mode is finalized: $finalized_data => [ \&lookup_func, $handler1, $handler2, … ]
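
One way finalization might work, as a hypothetical sketch rather than ElementMap's code: walk a mode hash (assuming each name maps to a list of `[ pattern, handler ]` pairs, rather than the single pair in the structure above), precompile the patterns, and return `[ \&lookup, $handler1, $handler2, ... ]`, where the lookup closure returns the index of the handler to call. The `finalize_mode` name is illustrative.

```perl
use strict;
use warnings;

# Hypothetical finalization: flatten a mode into [ \&lookup, @handlers ],
# where lookup($type, $name, $path) returns the index of the handler to call.
sub finalize_mode {
    my ($mode) = @_;
    my (@handlers, %table);
    for my $type (grep { !/^_ / } keys %$mode) {   # skip '_ MODENAME ' etc.
        for my $name (keys %{ $mode->{$type} }) {
            for my $pair (@{ $mode->{$type}{$name} }) {
                my ($pattern, $code) = @$pair;
                push @handlers, $code;
                # precompile once; an empty pattern is assumed to match anything
                my $re = length $pattern ? qr/^(?:$pattern)$/ : qr/^/;
                # final index is offset by 1 because slot 0 holds the lookup
                push @{ $table{$type}{$name} }, [ $re, scalar @handlers ];
            }
        }
    }
    my $lookup = sub {
        my ($type, $name, $path) = @_;
        my $set = $table{$type} or return;
        for my $list ($set->{$name}, $set->{''}) {
            next unless $list;
            for my $entry (@$list) {
                return $entry->[1] if $path =~ $entry->[0];
            }
        }
        return;
    };
    return [ $lookup, @handlers ];
}

my $mode = {
    '_ MODENAME ' => 'FOO',
    '_ FINALIZE ' => '',
    'Element' => {
        'PARA' => [ [ '.*/SECTION/.*', sub { 'section para' } ] ],
        ''     => [ [ '', sub { 'no handler' } ] ],
    },
};
$mode->{'_ FINALIZE '} = finalize_mode($mode);

my $fin = $mode->{'_ FINALIZE '};
my $i = $fin->[0]->('Element', 'PARA', '/BOOK/SECTION/PARA');
print $fin->[$i]->(), "\n";   # prints section para
```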

```
$main = [
    $state_data,
    $all_modes,
    $global_vars,
    $stack_vars
];

$state_data = [
    driver             : SGML::ElementMap::Driver
    node_path          : ''
    handler_modes      : [ $mode, $mode_2, $mode_3, ... ]
    handler_mode_stack : [ $mode_set_1, $mode_set_2, ... ]
    named_handlers     : { 'NAME' => \&handler }
    last_gen_name      : 'aaa'
];

$all_modes = { 'MODE_NAME_1' => $mode,
               'MODE_NAME_2' => $mode_2 };

$global_vars = { 'NAME' => $some_value };

$stack_vars = Hash::Layered;
```

Why global variable support?