Today | Next day → | Search | Index
All times shown according to UTC.
Time | Nick | Message |
---|---|---|
14:58 | thd | kados: are you there? |
15:00 | kados | hi thd |
15:00 | thd: what can I do for you? ;-) | |
15:00 | thd | hello |
15:00 | dewey | privet, thd |
22:55 | kados | thd: here are some of the errors: |
22:55 | #161 has a problem: Premature end of data in tag subfield line 49 | |
22:55 | Premature end of data in tag datafield line 48 | |
22:55 | Premature end of data in tag record line 7 | |
22:55 | Premature end of data in tag collection line 2 at /usr/local/lib/perl/5.8.4/XML/LibXML/SAX.pm line 64 | |
22:55 | at /usr/local/share/perl/5.8.4/MARC/File/XML.pm line 450 | |
22:57 | thd | kados: that is a record parsing error |
22:58 | we need to get those to Ed so he can fix the underlying library | |
22:59 | kados | did he make promises to that effect? |
22:59 | not to me anyway ;-) | |
23:00 | thd | kados: I had a very good feeling about how he expressed something similar if we had reproducible problems which we had made a reasonable effort to trace |
23:00 | kados | what constitutes reasonable effort? |
23:01 | thd | kados: well testing to be certain that the record itself is not unreasonably misencoded |
23:02 | kados | it doesn't throw an error in marcdump |
23:02 | is that enough testing? :-) | |
23:03 | thd | kados: what happens if you use the Koha rel 2_2 bulkmarcimport.pl with logging set to maximum verbosity (-vv)? |
23:04 | kados: that would use MARC record instead of MARC-XML | |
23:05 | kados | there is only one perl method to convert to utf8 using the codetables.xml file provided by LOC |
23:05 | that's provided by MARC::File::XML | |
23:06 | thd | kados: is that what MARC::Charset uses? |
23:08 | kados | no, but it uses MARC::Charset ;-) |
23:10 | thd | kados: I think that using the MARC-8 to UTF-8 function which I added to bulkmarcimport.pl may convert the record without triggering SAX errors |
23:11 | kados: If we can convert the encoding first without involving XML we eliminate most sources of error | |
23:12 | kados | thd: but your code doesn't utilize the codetables.xml file |
23:12 | thd: that MARC::Charset uses | |
23:12 | at least I don't think it does | |
23:12 | thd | my code uses MARC::Charset |
23:13 | kados | and I'm also not sure if it properly handles the conversion of multi-byte characters, especially the handling of 'combining characters', since MARC-8 and UTF-8 ordering is so different |
23:13 | thd | kados: the Afognak job was filled with errors like what you reported just now until I wrote my own MARC-8 conversion code |
23:15 | kados: I did still have problems but they were reduced from many many to about 3 records in 500 | |
23:16 | kados: the resulting records all looked fine on my system so I assumed they had worked correctly | |
23:17 | kados: they did not look fine on Afognak's system but after testing extensively I concluded that my code was not at fault | |
23:19 | kados: I think the problem that MARC::File::XML is having is parsing the records first before converting to UTF-8 | |
23:19 | kados | it looks like I can just call marc8_to_utf8 now |
23:20 | to convert each subfield | |
23:20 | thd | kados: MARC::Record does not care about encoding |
23:21 | kados: so using MARC::Record allows you to open and step through the record without having immediate errors | |
23:22 | kados | I will look at your code again |
23:23 | thd | kados: with M::F::X exclusively, if you want to open the record without errors you must convert to UTF-8 first; but if that is your only tool for UTF-8 conversion, then you have a problem |
23:29 | kados: I think importing with M::F::X should be a two stage process ... | |
23:30 | first open the record with MARC::Record and convert encoding | |
23:30 | save the record with MARC::Record | |
23:31 | then use MARC::File::XML to do something after the record is safely converted and now parsable in XML | |
23:36 | after skipping the 3 problem records, there were 2 records with a character or 2 which may have been invalid MARC-8 and could not be converted to UTF-8, so the whole subfield was deleted by the code I had at the time | |
23:37 | so that was about 5 problems in 500 records | |
23:45 | kados: does record # 161 pass after using marc8_to_utf8? | |
23:46 | kados | thd: I haven't had a chance to write it yet |
23:46 | thd: got distracted ;-) | |
00:21 | thd | kados: look at the statement of responsibility for LCCN 79106336 |
00:21 | Dating the Icelandic sagas | |
00:22 | http://zoomdemo.liblime.com/bib/1972 | |
00:26 | kados: do you see an Asian language glyph in the statement of responsibility for http://zoomdemo.liblime.com/bib/1972? | |
05:44 | hdl | hi |
10:40 | kados | paul_lunch: are you around? |
10:41 | paul | it's almost 3PM in France. fortunately, lunch is over ;-) |
10:41 | kados | heh |
10:41 | paul | 'morning joshua |
10:41 | kados | hi ... |
10:41 | http://zoomdemo.liblime.com/search?q=test | |
10:41 | click on a detail page | |
10:41 | paul | done. |
10:42 | kados | it's built in about 5 lines of perl |
10:42 | paul | and I bet 10000 that it's marc21 specific ;-) |
10:42 | kados | my $xmlrecord = C4::Biblio::getRecord("biblioserver","Local-number=$biblionumber"); |
10:42 | my $xslfile = "/home/kohacat/etc/xslt/MARC21slim2English.xsl"; | |
10:42 | my $parser = XML::LibXML->new(); | |
10:42 | my $xslt = XML::LibXSLT->new(); | |
10:42 | my $source = $parser->parse_string($xmlrecord); | |
10:42 | my $style_doc = $parser->parse_file($xslfile); | |
10:42 | my $stylesheet = $xslt->parse_stylesheet($style_doc); | |
10:42 | my $results = $stylesheet->transform($source); | |
10:42 | my $newxmlrecord = $stylesheet->output_string($results); | |
10:42 | more than 5 ... | |
10:43 | but it's the first time I've attempted to use XSLT to format a MARCXML record | |
10:44 | paul | it's really great & powerful. what does the xslt look like ? |
10:44 | kados | http://www.loc.gov/standards/m[…]1slim2English.xsl |
10:45 | I have tested several of the xslt files on this page: http://www.loc.gov/standards/marcxml/ | |
10:45 | they can all be processed as above | |
10:45 | paul | loc.gov doesn't answer on this side of the ocean... |
10:45 | (or very very very slowly...) | |
10:45 | kados | weird |
10:45 | paul | answered in 50 seconds |
10:46 | kados | zoomdemo.liblime.com/MARC21slim2English.xsl |
10:47 | paul | which perl packages does it require? |
10:47 | how fast/slow is it ? (if you tested for speed) | |
10:48 | kados | I haven't tested speed ... it requires XML::LibXML and XML::LibXSLT |
10:49 | I decided to just play with it to see if it's worth pursuing | |
10:49 | and it turns out the code to do the transformation is simpler than I thought | |
10:49 | owen: time to brush up on your XSLT ;-) | |
10:49 | owen | I think to brush up I'd need to have some to brush. |
10:51 | kados | hdl mentioned XSLT at our kohacon |
10:53 | hdl | kados paul : the only problem is building good and thorough xsl files from frameworks... which could be accomplished with a good xml framework description. |
10:53 | kados | hi hdl |
10:54 | hdl: do you think we could replace a framework with XSLT? | |
10:54 | hdl | yes. |
10:54 | But not at the moment. | |
10:54 | Would take some time. | |
10:55 | 1st step would be to define a good DTD for frameworks. | |
10:55 | That would be a base for input, output, and summaries. | |
10:56 | But then parsing xml frameworks to produce xslt would be nice. | |
10:56 | And xslt would parse xml records to produce correct HTML. | |
10:57 | But we need to be quite precise in the framework description. | |
10:57 | kados | didn't we have such a definition for opencataloger? |
10:57 | I thought toins created it | |
10:58 | paul | toins created something close to our actual frameworks, you're right |
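Editor's note: the two-stage import thd outlines in the 23:29-23:31 exchange (open and convert the encoding with MARC::Record and MARC::Charset first, then involve MARC::File::XML only once the record is safely UTF-8) could be sketched roughly as below. The file name, the field-rebuilding loop, and the absence of error handling are illustrative assumptions, not code from the discussion; `marc8_to_utf8` is the MARC::Charset export the participants mention.

```perl
#!/usr/bin/perl
# Rough sketch of thd's two-stage conversion (untested assumption,
# not the code actually used in the Afognak job).
use strict;
use warnings;
use MARC::Batch;
use MARC::Field;
use MARC::Charset qw(marc8_to_utf8);
use MARC::File::XML (BinaryEncoding => 'utf8');

my $batch = MARC::Batch->new('USMARC', 'records.mrc');  # placeholder file
while (my $record = $batch->next()) {
    # Stage 1: MARC::Record is encoding-agnostic, so we can step
    # through the record and convert each subfield from MARC-8
    # without triggering any XML parsing.
    for my $field ($record->fields()) {
        next if $field->is_control_field();
        my @converted =
            map { ($_->[0], marc8_to_utf8($_->[1])) } $field->subfields();
        my $new = MARC::Field->new(
            $field->tag(), $field->indicator(1), $field->indicator(2),
            @converted,
        );
        $field->replace_with($new);
    }
    $record->encoding('UTF-8');   # flip Leader/09 to 'a'

    # Stage 2: only now serialize with MARC::File::XML, when the
    # data can no longer produce MARC-8 SAX errors.
    print $record->as_xml();
}
```

Records with genuinely invalid MARC-8 bytes (the "about 5 in 500" thd describes) would still need to be caught, e.g. by wrapping the conversion in an eval block.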
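Editor's note: the LOC stylesheets kados links (such as MARC21slim2English.xsl) are ordinary XSLT 1.0 over the MARC21 slim namespace, which is why the short XML::LibXSLT snippet above can run any of them. A hypothetical minimal template in the same style (the 245 title field is chosen arbitrarily for illustration):

```xml
<!-- Minimal illustration only, not an excerpt from the LOC files:
     pull the title statement (tag 245, subfield a) out of a
     MARCXML record and emit it as HTML. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:template match="marc:record">
    <h1>
      <xsl:value-of
          select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
    </h1>
  </xsl:template>
</xsl:stylesheet>
```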