IRC log for #koha, 2006-12-12

Today | Next day → | Search | Index

All times shown according to UTC.

Time Nick Message
14:58 thd kados: are you there?
15:00 kados hi thd
15:00 thd: what can I do for you? ;-)
15:00 thd hello
15:00 dewey hello, thd
22:55 kados thd: here are some of the errors:
22:55 #161 has a problem: Premature end of data in tag subfield line 49
22:55 Premature end of data in tag datafield line 48
22:55 Premature end of data in tag record line 7
22:55 Premature end of data in tag collection line 2 at /usr/local/lib/perl/5.8.4/XML/LibXML/SAX.pm line 64
22:55 at /usr/local/share/perl/5.8.4/MARC/File/XML.pm line 450
22:57 thd kados: that is a record parsing error
22:58 we need to get those to Ed so he can fix the underlying library
22:59 kados did he make promises to that effect?
22:59 not to me anyway ;-)
23:00 thd kados: I had a very good feeling about how he expressed something similar if we had reproducible problems which we had made a reasonable effort to trace
23:00 kados what constitutes reasonable effort?
23:01 thd kados: well, testing to be certain that the record itself is not unreasonably misencoded
23:02 kados it doesn't throw an error in marcdump
23:02 is that enough testing? :-)
23:03 thd kados: what happens if you use the Koha rel 2_2 bulkmarcimport.pl with logging set to maximum verbosity (-vv)?
23:04 kados: that would use MARC record instead of MARC-XML
23:05 kados there is only one Perl method to convert to UTF-8 using the codetables.xml file provided by LOC
23:05 that's provided by MARC::File::XML
23:07 thd kados: is that what MARC::Charset uses?
23:08 kados no, but it uses MARC::Charset ;-)
23:10 thd kados: I think that using the MARC-8 to UTF-8 function which I added to bulkmarcimport.pl may convert the record without triggering SAX errors
23:11 kados: If we can convert the encoding first without involving XML we eliminate most sources of error
23:12 kados thd: but your code doesn't utilize the codetables.xml file
23:12 thd: that MARC::Charset uses
23:12 at least I don't think it does
23:12 thd my code uses MARC::Charset
23:13 kados and I'm also not sure whether it properly handles the conversion of multi-byte characters, especially the handling of 'combining characters', since MARC-8 and UTF-8 ordering is so different
23:13 thd kados: the Afognak job was filled with errors like what you reported just now until I wrote my own MARC-8 conversion code
23:15 kados: I did still have problems but they were reduced from many many to about 3 records in 500
23:16 kados: the resulting records all looked fine on my system so I assumed they had worked correctly
23:17 kados: they did not look fine on Afognak's system but after testing extensively I concluded that my code was not at fault
23:19 kados: I think the problem that MARC::File::XML is having is parsing the records first before converting to UTF-8
23:19 kados it looks like I can just call marc8_to_utf8 now
23:20 to convert each subfield
23:20 thd kados: MARC::Record does not care about encoding
23:21 kados: so using MARC::Record allows you to open and step through the record without having immediate errors
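[The per-subfield conversion kados describes might be sketched like this. This is a sketch only, not code from the log: the filename is illustrative, and it assumes the CPAN modules MARC::Batch, MARC::Record and MARC::Charset are installed.]

```perl
# Sketch: walk a batch of binary MARC records and convert every subfield
# from MARC-8 to UTF-8 with MARC::Charset::marc8_to_utf8.
use strict;
use warnings;
use MARC::Batch;
use MARC::Field;
use MARC::Charset qw(marc8_to_utf8);

my $batch = MARC::Batch->new('USMARC', 'records.mrc');  # illustrative filename
while (my $record = $batch->next()) {
    for my $field ($record->fields()) {
        next if $field->is_control_field();   # control fields carry no subfields
        my @converted;
        for my $subfield ($field->subfields()) {
            my ($code, $data) = @$subfield;
            push @converted, $code, marc8_to_utf8($data);
        }
        # rebuild the field with the converted subfields
        my $new = MARC::Field->new($field->tag(), $field->indicator(1),
                                   $field->indicator(2), @converted);
        $field->replace_with($new);
    }
    # flag the record as Unicode: leader position 9 becomes 'a'
    my $leader = $record->leader();
    substr($leader, 9, 1) = 'a';
    $record->leader($leader);
}
```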
23:22 kados I will look at your code again
23:23 thd kados: with M::F::X exclusively, if you want to open the record without errors you must convert to UTF-8 first; but if that is your only tool for UTF-8 conversion, then you have a problem
23:29 kados: I think importing with M::F::X should be a two-stage process ...
23:30 first open the record with MARC::Record and convert encoding
23:30 save the record with MARC::Record
23:31 then use MARC::File::XML to do something after the record is safely converted and now parsable in XML
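[The two-stage flow thd outlines might look roughly as follows. A sketch under assumptions: the CPAN modules are installed, and each subfield code occurs only once per field, since MARC::Field's update() touches only the first occurrence of a repeated code.]

```perl
# Sketch of the two-stage import: (1) open with MARC::Record, which does not
# care about encoding, and convert to UTF-8; (2) only then serialize with
# MARC::File::XML, so the SAX parser sees valid UTF-8.
use strict;
use warnings;
use MARC::Batch;
use MARC::Charset qw(marc8_to_utf8);
use MARC::File::XML (BinaryEncoding => 'utf8');

my $batch = MARC::Batch->new('USMARC', 'records.mrc');  # illustrative filename
while (my $record = $batch->next()) {
    # Stage 1: convert each subfield in place.
    for my $field (grep { !$_->is_control_field() } $record->fields()) {
        for my $subfield ($field->subfields()) {
            my ($code, $data) = @$subfield;
            # caveat: update() changes only the first subfield with this code
            $field->update($code => marc8_to_utf8($data));
        }
    }
    # Stage 2: the record is now safely UTF-8, so XML serialization should
    # no longer raise "premature end of data" errors.
    my $xml = $record->as_xml_record();
    # ... hand $xml to the XSLT stage, Zebra indexing, etc.
}
```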
23:36 after skipping the 3 problem records, there were 2 records with a character or two which may have been invalid MARC-8 and could not be converted to UTF-8, so the whole subfield was deleted by the code I had at the time
23:37 so that was about 5 problems in 500 records
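[Skipping the handful of problem records, as thd describes, could be done by wrapping the conversion in eval. A sketch only: it assumes a failed MARC-8 conversion dies, which depends on how MARC::Charset's error handling is configured, and the filename is illustrative.]

```perl
# Sketch: probe each record's subfields for convertibility and skip (but log)
# the ones that fail, so a few bad records do not abort the whole import.
use strict;
use warnings;
use MARC::Batch;
use MARC::Charset qw(marc8_to_utf8);

my $batch = MARC::Batch->new('USMARC', 'records.mrc');  # illustrative filename
$batch->strict_off();      # keep reading past structurally bad records
my ($count, @skipped) = (0);
while (my $record = $batch->next()) {
    $count++;
    eval {
        for my $field (grep { !$_->is_control_field() } $record->fields()) {
            marc8_to_utf8($_->[1]) for $field->subfields();
        }
    };
    if ($@) {
        push @skipped, $count;
        warn "record $count failed MARC-8 conversion: $@";
        next;
    }
    # ... process the successfully converted record ...
}
warn scalar(@skipped), " of $count records skipped\n";
```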
23:45 kados: does record # 161 pass after using marc8_to_utf8?
23:46 kados thd: I haven't had a chance to write it yet
23:46 thd: got distracted ;-)
00:21 thd kados: look at the statement of responsibility for LCCN 79106336
00:21 Dating the Icelandic sagas
00:22 http://zoomdemo.liblime.com/bib/1972
00:26 kados: do you see an Asian language glyph in the statement of responsibility for http://zoomdemo.liblime.com/bib/1972
00:26 ?
05:44 hdl hi
10:40 kados paul_lunch: are you around?
10:41 paul it's almost 3PM in France. fortunately, lunch is over ;-)
10:41 kados heh
10:41 paul 'morning joshua
10:41 kados hi ...
10:41 http://zoomdemo.liblime.com/search?q=test
10:41 click on a detail page
10:41 paul done.
10:42 kados it's built in about 5 lines of perl
10:42 paul and I bet 10000 that it's marc21 specific ;-)
10:42 kados my $xmlrecord = C4::Biblio::getRecord("biblioserver","Local-number=$biblionumber");
10:42 my $xslfile = "/home/kohacat/etc/xslt/MARC21slim2English.xsl";
10:42 my $parser = XML::LibXML->new();
10:42 my $xslt = XML::LibXSLT->new();
10:42 my $source = $parser->parse_string($xmlrecord);
10:42 my $style_doc = $parser->parse_file($xslfile);
10:42 my $stylesheet = $xslt->parse_stylesheet($style_doc);
10:42 my $results = $stylesheet->transform($source);
10:42 my $newxmlrecord = $stylesheet->output_string($results);
10:42 more than 5 ...
10:43 but it's the first time I've attempted to use XSLT to format a MARCXML record
10:44 paul it's really great & powerful. what does the xslt look like ?
10:44 kados http://www.loc.gov/standards/m[…]1slim2English.xsl
10:45 I have tested several of the xslt files on this page: http://www.loc.gov/standards/marcxml/
10:45 they can all be processed as above
10:45 paul loc.gov doesn't answer on this side of the ocean...
10:45 (or very very very slowly...)
10:45 kados weird
10:45 paul it answered in 50 seconds
10:46 kados zoomdemo.liblime.com/MARC21slim2English.xsl
10:47 paul which Perl packages does it require ?
10:47 how fast/slow is it ? (if you tested for speed)
10:48 kados I haven't tested speed ... it requires XML::LibXML and XML::LibXSLT
10:49 I decided to just play with it to see if it's worth pursuing
10:49 and it turns out the code to do the transformation is simpler than I thought
10:49 owen: time to brush up on your XSLT ;-)
10:49 owen I think to brush up I'd need to have some to brush.
10:51 kados hdl mentioned XSLT at our kohacon
10:53 hdl kados, paul: the only problem is to build good and thorough XSL files from the frameworks... which could be accomplished with a good XML framework description.
10:53 kados hi hdl
10:54 hdl: do you think we could replace a framework with XSLT?
10:54 hdl yes.
10:54 But not at the moment.
10:54 Would take some time.
10:55 1st step would be to define a good DTD for frameworks.
10:55 That would be a base for input, output and summaries.
10:56 But then parsing XML frameworks to produce XSLT would be nice.
10:56 And the XSLT would parse XML records to produce correct HTML.
10:57 But we would need to be quite precise in the framework description
10:57 kados didn't we have such a definition for opencataloger?
10:57 I thought toins created it
10:58 paul toins created something close to our current frameworks, you're right

