All times shown according to UTC.
Time | Nick | Message |
---|---|---|
12:46 | kados | sanspach: hi there |
12:46 | sanspach | hey! |
12:46 | kados | sanspach: it seems I'm having some problems with the data |
12:46 | zebra is complaining when I index | |
12:46 | but I don't have any details yet | |
12:46 | I'm thinking of running a check on the records using MARC::Record | |
12:46 | later today | |
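A check along those lines could be sketched with MARC::Batch, which ships with the MARC::Record distribution; the input file name below is only a placeholder, and the duplicate-001 check anticipates the problem sanspach describes next.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# 'records.mrc' is a placeholder for the exported file.
my $batch = MARC::Batch->new('USMARC', 'records.mrc');
$batch->strict_off();    # keep going past bad records and collect warnings

my ($count, $problems) = (0, 0);
while (my $record = $batch->next()) {
    $count++;
    if (my @warnings = $batch->warnings()) {
        $problems++;
        print "record $count: @warnings\n";
    }
    # flag the repeated-001 case discussed below
    my @f001 = $record->field('001');
    print "record $count: ", scalar(@f001), " 001 fields\n" if @f001 > 1;
}
print "checked $count records, $problems with warnings\n";
```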
12:46 | sanspach | there may be some usmarc rules that aren't followed |
12:47 | I'm thinking in particular that there may be repeated 001 fields | |
12:47 | kados | sanspach: it's strange since it gets through quite a few records before crashing |
12:47 | hmmm, that might be a problem | |
12:48 | sanspach | our system doesn't natively store in MARC and therefore doesn't enforce usmarc rules |
12:48 | I knew I needed to strip out our non-standard subfields | |
12:48 | the newer records (created on current system, ca. 4.5 yrs) would have only single 001 | |
12:49 | but records imported from old system went through a merge algorithm and ended up with multiple 001 fields | |
12:49 | didn't think about it until after I'd sent the file | |
12:49 | kados | interesting |
12:50 | do you think it would be easier to modify them in the big raw file or on your end? | |
12:50 | sanspach | depends on your tools; I can't natively work with them in MARC, so I do the edits then convert |
12:51 | if you can work with them in MARC, it might be better for you to manipulate within the file | |
12:52 | also, when I looked at bits of the files I noticed that fields aren't always in usmarc order-- | |
12:52 | specifically 008 seems to be in odd places (sometimes at end) | |
12:54 | kados | I'll give it a shot and if I can't do it I'll let you know |
12:55 | sanspach | great; I've got the data all extracted now, so it will just be a matter of re-parsing the records and converting |
12:55 | kados | sweet |
13:04 | sanspach: if it's not extracted as MARC what is it extracted as? | |
13:04 | (out of curiosity) | |
13:04 | sanspach | flat ascii! (I live and die by perl but I use activestate on win32, so MARC::... aren't available) |
13:10 | kados | sanspach: right |
13:10 | sanspach: so how big is the flat ascii file? | |
13:10 | sanspach: it might actually be easier for me to import that with MARC::Record (as it will automatically enforce correct MARC syntax) | |
13:11 | sanspach | don't have one (merged them after converting to MARC) but could easily do it; in fact, |
13:11 | I could remove the duplicate 001's as I'm merging | |
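If the de-duplication were instead done on the MARC side with MARC::Record, one possible sketch is below; input and output file names are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# placeholders for the merged file and the cleaned copy
my $batch = MARC::Batch->new('USMARC', 'all.mrc');
$batch->strict_off();

open my $out, '>', 'all-dedup.mrc' or die $!;
binmode $out;
while (my $record = $batch->next()) {
    my @f001 = $record->field('001');
    # keep the first 001, drop any duplicates
    $record->delete_field($_) for @f001[1 .. $#f001];
    print {$out} $record->as_usmarc();
}
close $out;
```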
13:14 | kados | hmmm |
13:14 | well it's up to you | |
13:14 | I owe you already for providing the records ;-) | |
13:15 | so a little work to tweak them isn't really a problem | |
13:15 | on the other hand, if you've got the proc time and bandwidth to do another export and ftp that'd be ok too ;-) | |
13:15 | sanspach | I'll try to figure out good compression; re-sending them in ASCII is going to be no problem at all! (MARC's the hard part) |
13:15 | kados | gzip is pretty good |
13:15 | if you'd like to cut back on bandwidth | |
13:19 | sanspach | can MARC::Record read them in from LC's marcbreaker/marcmaker format? |
13:21 | kados | no idea |
13:36 | sanspach | kados: just reviewed MARC::Record docs at cpan and it looks like those tools are for MARC records |
13:37 | so you have a script that reads in flat files and does the creating? | |
13:39 | kados | sort of ... I can use MARC::Record to take data in any format and feed it in to construct a valid MARC record |
13:40 | and export as iso2709 | |
13:40 | I've done this in the past for various projects | |
13:40 | like Koha's Z39.50 server | |
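The general MARC::Record pattern kados is describing looks roughly like this; the bibliographic values are invented for illustration.

```perl
use MARC::Record;
use MARC::Field;

# build a record from arbitrary data, then serialize as ISO 2709
my $record = MARC::Record->new();
$record->append_fields(
    MARC::Field->new('001', 'demo00001'),
    MARC::Field->new('245', '1', '0', a => 'An example title /', c => 'Someone.'),
);
print $record->as_usmarc();    # ISO 2709 / "usmarc" transmission format
```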
13:40 | sanspach | OK; marcbreaker has three idiosyncrasies: |
13:40 | 1) each line (=field) starts with = character | |
13:41 | 2) next comes tag (leader has name LDR rather than 000) and two spaces | |
13:41 | 3) next comes indicators with spaces substituted by \ character (backslash) | |
13:43 | each line is thus /^=(LDR|\d{3})  (.*)$/ | |
13:43 | with $1 being tag and | |
13:44 | $2 being data (where tag<10, all data, tag>9 /^(.)(.)(.*)$/ for ind1=$1,ind2=$2,field=$3) | |
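Putting those three rules together, a converter in the spirit of the script kados later hacks together might look like the sketch below. It assumes records are separated by blank lines and that subfields are delimited with '$' as in LC's MARCMaker format; it only maps the backslash-for-blank convention in the leader, control fields, and indicators, and the file names are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MARC::Record;
use MARC::Field;

open my $in,  '<', 'all.txt' or die $!;
open my $out, '>', 'all.mrc' or die $!;
binmode $out;

my $record = MARC::Record->new();
my $seen   = 0;

sub flush {
    print {$out} $record->as_usmarc() if $seen;
    $record = MARC::Record->new();
    $seen   = 0;
}

while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^\s*$/) { flush(); next; }          # blank line ends a record
    next unless $line =~ /^=(LDR|\d{3})  (.*)$/;      # '=' + tag + two spaces + data
    my ($tag, $data) = ($1, $2);
    $seen = 1;
    if ($tag eq 'LDR') {
        (my $leader = $data) =~ tr/\\/ /;             # '\' stands for a blank
        $record->leader($leader);
    }
    elsif ($tag < 10) {                               # control field: all data
        (my $d = $data) =~ tr/\\/ /;
        $record->append_fields(MARC::Field->new($tag, $d));
    }
    else {                                            # ind1, ind2, then subfields
        my ($i1, $i2, $rest) = $data =~ /^(.)(.)(.*)$/;
        for ($i1, $i2) { $_ = ' ' if $_ eq '\\' }
        my @subfields = map { /^(.)(.*)$/s ? ($1, $2) : () }
                        grep { length } split /\$/, $rest;
        $record->append_fields(MARC::Field->new($tag, $i1, $i2, @subfields))
            if @subfields;
    }
}
flush();
close $in;
close $out;
```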
13:46 | OK, done; I've removed dup 001 (can't say for sure tag order is up to standard); text file slightly smaller | |
13:47 | than MARC file was (makes sense--no directories) | |
13:47 | kados | right |
13:57 | sanspach: if you take a look at http://liblime.com/zap/advanced.html you can see the latest results I'm getting from the data | |
13:57 | sanspach: it looks like the search is working but the data coming back isn't displaying normally | |
13:57 | (choose the LibLime to see your data) | |
13:58 | sanspach | hmm |
13:58 | kados | (also notice that it's extremely fast ;-) |
13:58 | (which is good news) | |
13:58 | I'd be interested in comparing its speed and results to your current system | |
13:58 | do you have a link for that? | |
14:00 | sanspach | specs for z39.50 connection are at http://kb.iu.edu/data/ajhr.html |
14:01 | kados | k ... just a sec |
14:03 | heh ... ok ... try that | |
14:04 | so the result set numbers aren't adding up | |
14:04 | interestingly | |
14:04 | sanspach | yeah, saw that from my favorite author search (durrenmatt) |
14:04 | looks like field and/or record boundaries are all messed up | |
14:04 | kados | yea probably |
14:05 | sanspach | maybe from multiple 001s? |
14:05 | kados | could be ... wanna send me the updated one and we'll try that? |
14:05 | sanspach | working on compressing now |
14:05 | kados | cool |
14:07 | sanspach: it'll be neat to compare Indiana's Zserver to Zap/Zebra | |
14:07 | sanspach | our server's big (/fast) but I'm not sure how optimized we are for z39.50 connections--that's never been very high priority |
14:08 | kados | sanspach: esp since you're prolly paying about 4-6K per year for that module |
14:09 | sanspach | mostly 'cause we think we need it (as state inst. / major research lib. / etc.) not 'cause we actually want to support it |
14:09 | don't think we've got anybody knockin' down our door when it goes down! | |
14:09 | kados | right ... still ... it'd be neat if you were able to propose cutting back on the ILS budget a bit |
14:21 | sanspach | kados: compressed file 26% of original; ftp begun but will take ca. 40 minutes |
14:22 | kados | sweet |
14:23 | let me know when it's done | |
14:23 | (FYI the indexing takes about 4 min too) | |
14:23 | s/4/40/ | |
14:23 | sanspach | will do |
14:24 | still slays me it goes so fast | |
15:00 | kados: ftp is done, right on schedule; let me know if there are any problems with the file or record format | |
15:02 | kados | sanspach: sweet ... I'll get started on the indexing |
15:04 | unzipping now | |
15:04 | tar -xzvf /home/sanspach/all.tar.gz | |
15:04 | all.txt | |
15:04 | tar: Skipping to next header | |
15:04 | tar: Archive contains obsolescent base-64 headers | |
15:07 | sanspach | working on it; google says common error; workaround should be possible... |
15:09 | kados | sanspach: tar: Read 3790 bytes from /home/sanspach/all.tar.gz |
15:10 | tar: Error exit delayed from previous errors | |
15:10 | sanspach: any clue why that's happening? | |
15:11 | sanspach | 'cause I used a win32 tool to tar/gzip?! |
15:11 | kados | could be :-( |
15:11 | sanspach | workaround is to unzip first then tar, but I'm seeing an error there, too; but maybe it will finish ok |
15:12 | ls | |
15:12 | kados | all.txt |
15:12 | sanspach | oops, wrong window :) |
15:12 | kados | hehe |
15:13 | so all.txt is it? | |
15:13 | sanspach | not good; the all.txt file should be about same size as all.tar (ever so slightly smaller: without tar header) |
15:13 | way too small--it is choking partway through or something | |
15:14 | kados | right .. |
15:14 | look at the output from tail | |
15:15 | it's choking here: | |
15:15 | =505 1\[v. 1.] Theoretical and empiri | |
15:15 | for some reason | |
15:15 | sanspach | data is probably irrelevant; most likely bad length from header, etc. |
15:16 | kados | fair enough |
15:24 | sanspach | kados: the all.tar file should be good to use if you can just strip the first few bytes |
15:24 | maybe read in first line and dump everything before the = that is the start of the data? | |
15:25 | don't know what text editing tools you might have that can handle file that large; don't want to read it all into memory! | |
15:31 | kados | grep, sed, awk, bash ;-) |
15:31 | perl even ;-) | |
15:36 | sed 's/*=//' all.tar | |
15:36 | I'm making a backup first | |
15:36 | :-) | |
15:40 | hmmm, seems it didn't work | |
16:05 | sanspach | OK, think I've got it with perl |
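A rough equivalent of that perl workaround, following sanspach's suggestion to dump everything before the first '=', might look like this; it streams the file in chunks rather than reading it all into memory, and any trailing NUL padding from the tar wrapper would still be left at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# copy all.tar to all.txt, dropping the tar header bytes before the first '='
open my $in,  '<', 'all.tar' or die $!;
open my $out, '>', 'all.txt' or die $!;
binmode $_ for $in, $out;

my $started = 0;
while (read($in, my $buf, 65536)) {
    unless ($started) {
        my $pos = index($buf, '=');
        next if $pos < 0;                 # header only so far, keep skipping
        $buf = substr($buf, $pos);        # start of the MARCBreaker data
        $started = 1;
    }
    print {$out} $buf;
}
close $in;
close $out;
```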
16:09 | kados | sweet |
16:09 | let me know when it's done uncompressing | |
16:12 | cool ... done eh? | |
16:12 | sanspach | looks like the right size... |
16:12 | seems right | |
16:12 | kados | ok ... I'm gonna index it (I'll move it first) |
16:12 | sanspach | sorry for the hassle |
16:14 | kados | hmmm, strange error: 14:05:08-07/06 ../../index/zebraidx(32333) [warn] records/sample-records:0 MARC record length < 25, is 0 |
16:14 | it's not indexing the file | |
16:15 | sanspach | it's flat ascii, not marc |
16:15 | kados | well that would explain it ;-) |
16:24 | sanspach: so ... just so I have this straight | |
16:24 | the file is currently in MARCBreaker format | |
16:24 | you already tried using MARCMaker and it didn't produce valid MARC records | |
16:25 | so now we're going to try to use MARC::Record to create a valid MARC record | |
16:25 | sound right? | |
16:25 | sanspach | well, only sort of |
16:25 | I had separate small files which I converted into MARC | |
16:26 | I'm guessing the problem was the repeated 001's | |
16:26 | kados | using Marc Maker for the conversion (and join) |
16:26 | right | |
16:27 | sanspach | I used MarcMaker for the conversion; I joined them afterward |
16:27 | kados | ok ... how big was each file (approx) |
16:28 | sanspach | 100 MB |
16:54 | kados | sanspach: I'm headed home now ... I hacked together a start of a script to convert from marcmaker to usmarc using MARC::Record and I'll try to finish it up tonight |
16:55 | sanspach | OK; if I think of anything brilliant, I'll let you know :) |
16:56 | kados | sanspach: sounds good ;-) |
02:37 | osmoze | hello |
02:37 | paul | hi js |
02:37 | osmoze | hey paul |
02:37 | do you have two minutes? | |
02:38 | paul | go ahead, I'm listening |
02:39 | osmoze | I have a question: is there a simple way to get the overdues list in overdue in the form personne1 --> livre1,livre2,livre3 instead of personne1 --> livre1; personne1 --> livre2, etc.? |
02:39 | this would be for a mail merge | |
02:40 | paul | in the next version we do get the list of overdue items. |
02:40 | osmoze | I had written a little php script, but I have an error I can't fix :( |
02:40 | so I'm turning to your services :) | |
02:40 | paul | it was an obvious gap. |
02:40 | osmoze | how so? |
02:40 | there was already a module (overdue.pl) | |
02:41 | paul | overduenotice.pl has been improved. |
02:41 | it sends an email to every borrower who has an email address, giving them their list of overdues. | |
02:41 | and it sends an email to the library listing all the borrowers who have overdues but no email address | |
02:41 | osmoze | the problem is that you can't make use of that data |
02:42 | from the email to the library | |
02:42 | because my goal was to create a form letter for a mail merge and include the names automatically | |
02:42 | but that only works with a database or a properly structured text file | |
02:43 | paul | exactly. In fact, we should attach a CSV file with the info |
02:43 | osmoze | that's exactly it |
02:43 | but there will still be the problem of repeated names | |
02:44 | for borrowers who have more than one overdue book | |
02:46 | paul | yes. We could imagine doing it with the titles separated by a comma |
02:46 | they would appear on a single line in the mail merge. | |
02:48 | osmoze | that's exactly what I'm looking for :) |
02:50 | that way I can do a quick and efficient mail merge for sending out letters ^^ | |
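A minimal sketch of the CSV attachment paul describes: one semicolon-separated line per borrower, with that borrower's overdue titles joined by commas. The @overdues list is only a stand-in for whatever overduenotice.pl actually pulls from the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# stand-in data: (borrower, title) pairs
my @overdues = (
    [ 'personne1', 'livre1' ],
    [ 'personne1', 'livre2' ],
    [ 'personne2', 'livre3' ],
);

my %titles_for;
push @{ $titles_for{ $_->[0] } }, $_->[1] for @overdues;

open my $csv, '>', 'overdues.csv' or die $!;
print {$csv} "borrower;titles\n";    # semicolon-separated, as hdl notes below
for my $borrower (sort keys %titles_for) {
    print {$csv} join(';', $borrower, join(', ', @{ $titles_for{$borrower} })), "\n";
}
close $csv;
```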
02:50 | paul | with OpenOffice? |
02:51 | if so, we'll need to put that document in the CVS. | |
02:51 | osmoze | I had tried it with word; I'll try with openoffice |
02:51 | (the front-desk machines run windows + word, but I'm not ruling out installing openoffice-win32) | |
02:53 | hdl_away | hi. |
02:53 | osmoze | hello hdl |
02:54 | paul, you wouldn't happen to have a ready-made little csv file at hand? ^^ | |
02:54 | hdl | osmoze: it's just a text file with semicolon-separated values. ;) |
02:55 | osmoze | so that's fine |
02:55 | it works well | |
04:55 | jean | hi/bonjour |
05:04 | paul | Wednesday is children's day AND Jean's day on Koha ;-) |
05:04 | hello Jean. How are you? | |
05:05 | jean | :) |
05:05 | yes, very well | |
05:06 | paul | is your optimization doc coming along well? |
05:06 | jean | I expect to release it today |
05:06 | paul | great! |
05:06 | I can't wait to read it. | |
05:06 | jean | but I've only worked on it one day a week, and even then with lots of slowdowns |
05:06 | that's why it has taken a little while :) | |
05:07 | anyway, I'm counting on you to give me your opinion | |
05:07 | paul | you can count on it. |
05:10 | right, time to eat. See you later | |
10:29 | Sylvain | hi |
10:29 | hdl | hi Sylvain. |
10:34 | Sylvain | is it envisaged to include xml in koha in any way ? |
10:35 | hdl | No, as far as I know. Are you interested in doing it ;) ? |
10:36 | Sylvain | no, just because a customer was asking ... I hadn't heard anything about it so I wondered |
10:38 | paul | sylvain : "include xml" is not enough. |
10:39 | what does he want : exporting XML, importing xml, showing xml... | |
10:39 | hdl | ... using xml ? |
10:39 | Sylvain | I know paul, it's not enough :) But the customer is a librarian and didn't say more in his mail. So I was asking if anything was envisaged for xml |
10:40 | paul | zebra speaks xml pretty well, it seems... |
10:40 | Sylvain | hmm, nothing very specific about xml then, in any case |
10:41 | paul | we're in the "bazaar" phase, and the roadmap should be ready by the end of the month |
10:42 | Sylvain | ok |
10:42 | and 2.2.3, is there a firm date? (it may have gone by on the mailing lists but I wasn't paying attention | |
10:42 | ) | |
10:43 | paul | I'm going to announce it for next week. |
10:43 | mostly there's translation left to do, and possibly a bit of polishing. | |
10:43 | (for example, I need to copy your unimarc plugins into 2.2 | |
10:43 | ) | |
10:45 | Sylvain | matthieu is the one who did that, but ok |
11:13 | owen | Hi sanspach |
11:33 | Sylvain | can someone explain to me the meaning of "datelastseen"? |
11:36 | hdl | the latest date the book was seen... for inventory purposes, IMHO. |
11:36 | Sylvain | last time it was "barcoded" ? |
11:37 | scanned with the barcode reader ;) | |
11:37 | sanspach | hi owen (sorry, started IRC then walked away!) |
11:38 | hdl | Not only that; see the Inventory/Stocktaking tab in the stats ;) |
11:38 | English: Not only, have a look at Inventory/StockTaking in reports | |
11:38 | Sylvain | "01:53 +1d chris datelastseen is the last time the item was issued, or returned, or transfered between branches" |
11:39 | hdl the stats are too powerful and have too many things, I haven't had time yet to explore them ;) | |
11:39 | hdl | That's why I told you about that. |
11:40 | And *I* am not the only one to have worked on that ;) | |
11:40 | Sylvain | ok, I thought you had done it all alone |
11:40 | hdl | So long ;) |
11:47 | owen | sanspach, you're in Indiana? |
11:47 | sanspach | yes |
11:51 | kados | sanspach: I tried indexing the new marc file |
11:51 | sanspach: results are displaying weirdly | |
11:51 | similar to before | |
11:51 | http://liblime.com/zap/advanced.html | |
11:52 | sanspach | kados: yes, I see; very odd |
11:52 | almost like the directory is off and the fields are getting all mangled | |
11:52 | kados | yea |
11:53 | sanspach | only this time the MARC was generated w/MARC::Record |
11:53 | kados | right ... so it should be valid |
11:53 | sanspach | how can both (very different) methods produce the same problems? |
11:53 | kados | well it may be the indexer |
11:53 | but I haven't had trouble with it using other MARC records | |
11:54 | sanspach | do you want batches of smaller sections of the db? I still have the original 54 files |
11:54 | kados | sure ... send em over |
11:54 | maybe if we do them one-by-one we can catch the problem | |
11:55 | sanspach | I'll ftp in batches of 10, in numeric order (you'll see the pattern) |
11:55 | kados | k ... |