Today | Next day → | Search | Index
All times shown according to UTC.
Time | Nick | Message |
---|---|---|
14:58 | thd | kados: are you there? |
15:00 | kados | hi thd |
15:00 | thd: what can I do for you? ;-) | |
15:00 | thd | hello |
15:00 | dewey | privet, thd |
22:55 | kados | thd: here are some of the errors: |
22:55 | #161 has a problem: Premature end of data in tag subfield line 49 | |
22:55 | Premature end of data in tag datafield line 48 | |
22:55 | Premature end of data in tag record line 7 | |
22:55 | Premature end of data in tag collection line 2 at /usr/local/lib/perl/5.8.4/XML/LibXML/SAX.pm line 64 | |
22:55 | at /usr/local/share/perl/5.8.4/MARC/File/XML.pm line 450 | |
22:57 | thd | kados: that is a record parsing error |
22:58 | we need to get those to Ed so he can fix the underlying library | |
22:59 | kados | did he make promises to that effect? |
22:59 | not to me anyway ;-) | |
23:00 | thd | kados: I had a very good feeling about how he expressed something similar if we had reproducible problems which we had made a reasonable effort to trace |
23:00 | kados | what constitutes reasonable effort? |
23:01 | thd | kados: well testing to be certain that the record itself is not unreasonably misencoded |
23:02 | kados | it doesn't throw an error in marcdump |
23:02 | is that enough testing? :-) | |
23:03 | thd | kados: what happens if you use the Koha rel 2_2 bulkmarcimport.pl with logging set to maximum verbosity (-vv)? |
23:04 | kados: that would use MARC record instead of MARC-XML | |
23:05 | kados | there is only one perl method to convert to utf8 using the codetables.xml file provided by LOC |
23:05 | that's provided by MARC::File::XML | |
23:06 | thd | kados: is that what MARC::Charset uses? |
23:08 | kados | no, but it uses MARC::Charset ;-) |
23:10 | thd | kados: I think that using the MARC-8 to UTF-8 function which I added to bulkmarcimport.pl may convert the record without triggering SAX errors |
23:11 | kados: If we can convert the encoding first without involving XML we eliminate most sources of error | |
23:12 | kados | thd: but your code doesn't utilize the codetables.xml file |
23:12 | thd: that MARC::Charset uses | |
23:12 | at least I don't think it does | |
23:12 | thd | my code uses MARC::Charset |
23:13 | kados | and I'm also not sure if it properly handles the conversion of multi-byte characters, especially the handling of 'combining characters', since MARC-8 and UTF-8 ordering is so different |
23:13 | thd | kados: the Afognak job was filled with errors like what you reported just now until I wrote my own MARC-8 conversion code |
23:15 | kados: I did still have problems but they were reduced from many many to about 3 records in 500 | |
23:16 | kados: the resulting records all looked fine on my system so I assumed they had worked correctly | |
23:17 | kados: they did not look fine on Afognak's system but after testing extensively I concluded that my code was not at fault | |
23:19 | kados: I think the problem that MARC::File::XML is having is parsing the records first before converting to UTF-8 | |
23:19 | kados | it looks like I can just call marc8_to_utf8 now |
23:20 | to convert each subfield | |
23:20 | thd | kados: MARC::Record does not care about encoding |
23:21 | kados: so using MARC::Record allows you to open and step through the record without having immediate errors | |
23:22 | kados | I will look at your code again |
23:23 | thd | kados: with M::F::X exclusively, if you want to open the record without errors you must convert to UTF-8 first; but if that is your only tool for UTF-8 conversion, then you have a problem |
23:29 | kados: I think importing with M::F::X should be a two stage process ... | |
23:30 | first open the record with MARC::Record and convert encoding | |
23:30 | save the record with MARC::Record | |
23:31 | then use MARC::File::XML to do something after the record is safely converted and now parsable in XML | |
23:36 | after skipping the 3 problem records, there were 2 records with a character or 2 which may have been invalid MARC-8 and could not be converted to UTF-8, so the whole subfield was deleted by the code I had at the time | |
23:37 | so that was about 5 problems in 500 records | |
23:45 | kados: does record # 161 pass after using marc8_to_utf8? | |
23:46 | kados | thd: I haven't had a chance to write it yet |
23:46 | thd: got distracted ;-) | |
00:21 | thd | kados: look at the statement of responsibility for LCCN 79106336 |
00:21 | Dating the Icelandic sagas | |
00:22 | http://zoomdemo.liblime.com/bib/1972 | |
00:26 | kados: do you see an Asian language glyph in the statement of responsibility for http://zoomdemo.liblime.com/bib/1972? | |
05:44 | hdl | hi |
10:40 | kados | paul_lunch: are you around? |
10:41 | paul | it's almost 3PM in France. fortunately, lunch is over ;-) |
10:41 | kados | heh |
10:41 | paul | 'morning joshua |
10:41 | kados | hi ... |
10:41 | http://zoomdemo.liblime.com/search?q=test | |
10:41 | click on a detail page | |
10:41 | paul | done. |
10:42 | kados | it's built in about 5 lines of perl |
10:42 | paul | and I bet 10000 that it's marc21 specific ;-) |
10:42 | kados | my $xmlrecord = C4::Biblio::getRecord("biblioserver","Local-number=$biblionumber"); |
10:42 | my $xslfile = "/home/kohacat/etc/xslt/MARC21slim2English.xsl"; | |
10:42 | my $parser = XML::LibXML->new(); | |
10:42 | my $xslt = XML::LibXSLT->new(); | |
10:42 | my $source = $parser->parse_string($xmlrecord); | |
10:42 | my $style_doc = $parser->parse_file($xslfile); | |
10:42 | my $stylesheet = $xslt->parse_stylesheet($style_doc); | |
10:42 | my $results = $stylesheet->transform($source); | |
10:42 | my $newxmlrecord = $stylesheet->output_string($results); | |
10:42 | more than 5 ... | |
10:43 | but it's the first time I've attempted to use XSLT to format a MARCXML record | |
10:44 | paul | it's really great & powerful. what does the xslt look like ? |
10:44 | kados | http://www.loc.gov/standards/m[…]1slim2English.xsl |
10:45 | I have tested several of the xslt files on this page: http://www.loc.gov/standards/marcxml/ | |
10:45 | they can all be processed as above | |
10:45 | paul | loc.gov doesn't answer on this side of the ocean... |
10:45 | (or very very very slowly...) | |
10:45 | kados | weird |
10:45 | paul | answered in 50 seconds |
10:46 | kados | zoomdemo.liblime.com/MARC21slim2English.xsl |
10:47 | paul | which perl packages does it require? |
10:47 | how fast/slow is it ? (if you tested for speed) | |
10:48 | kados | I haven't tested speed ... it requires XML::LibXML and XML::LibXSLT |
10:49 | I decided to just play with it to see if it's worth pursuing | |
10:49 | and it turns out the code to do the transformation is simpler than I thought | |
10:49 | owen: time to brush up on your XSLT ;-) | |
10:49 | owen | I think to brush up I'd need to have some to brush. |
10:51 | kados | hdl mentioned XSLT at our kohacon |
10:53 | hdl | kados paul : the only problem is building good and thorough xsl files from frameworks... which could be accomplished with a good xml framework description. |
10:53 | kados | hi hdl |
10:54 | hdl: do you think we could replace a framework with XSLT? | |
10:54 | hdl | yes. |
10:54 | But not at the moment. | |
10:54 | Would take some time. | |
10:55 | 1st step would be to define a good DTD for frameworks. | |
10:55 | That would be a base for input, output, and summaries. | |
10:56 | But then parsing xml frameworks to produce xslt would be nice. | |
10:56 | And xslt would parse xml records to produce correct HTML. | |
10:57 | But we need to be quite precise in the framework description. | |
10:57 | kados | didn't we have such a definition for opencataloger? |
10:57 | I thought toins created it | |
10:58 | paul | toins created something close to our actual frameworks, you're right |
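Editor's note: the two-stage import thd outlines in the 23:29-23:31 exchange (open and convert the encoding with MARC::Record and MARC::Charset first, then involve MARC::File::XML only once the record is safely UTF-8) could be sketched roughly as below. The file name, the field-rebuilding loop, and the absence of error handling are illustrative assumptions, not code from the discussion; `marc8_to_utf8` is the MARC::Charset export the participants mention.

```perl
#!/usr/bin/perl
# Rough sketch of thd's two-stage conversion (untested assumption,
# not the code actually used in the Afognak job).
use strict;
use warnings;
use MARC::Batch;
use MARC::Field;
use MARC::Charset qw(marc8_to_utf8);
use MARC::File::XML (BinaryEncoding => 'utf8');

my $batch = MARC::Batch->new('USMARC', 'records.mrc');  # placeholder file
while (my $record = $batch->next()) {
    # Stage 1: MARC::Record is encoding-agnostic, so we can step
    # through the record and convert each subfield from MARC-8
    # without triggering any XML parsing.
    for my $field ($record->fields()) {
        next if $field->is_control_field();
        my @converted =
            map { ($_->[0], marc8_to_utf8($_->[1])) } $field->subfields();
        my $new = MARC::Field->new(
            $field->tag(), $field->indicator(1), $field->indicator(2),
            @converted,
        );
        $field->replace_with($new);
    }
    $record->encoding('UTF-8');   # flip Leader/09 to 'a'

    # Stage 2: only now serialize with MARC::File::XML, when the
    # data can no longer produce MARC-8 SAX errors.
    print $record->as_xml();
}
```

Records with genuinely invalid MARC-8 bytes (the "about 5 in 500" thd describes) would still need to be caught, e.g. by wrapping the conversion in an eval block.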
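Editor's note: the LOC stylesheets kados links (such as MARC21slim2English.xsl) are ordinary XSLT 1.0 over the MARC21 slim namespace, which is why the short XML::LibXSLT snippet above can run any of them. A hypothetical minimal template in the same style (the 245 title field is chosen arbitrarily for illustration):

```xml
<!-- Minimal illustration only, not an excerpt from the LOC files:
     pull the title statement (tag 245, subfield a) out of a
     MARCXML record and emit it as HTML. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:template match="marc:record">
    <h1>
      <xsl:value-of
          select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
    </h1>
  </xsl:template>
</xsl:stylesheet>
```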