IRC log for #koha, 2005-06-09

All times shown according to UTC.

Time Nick Message
12:46 kados sanspach: hi there
12:46 sanspach hey!
12:46 kados sanspach: it seems I'm having some problems with the data
12:46 zebra is complaining when I index
12:46 but I don't have any details yet
12:46 I'm thinking of running a check on the records using MARC::Record
12:46 later today
12:46 sanspach there may be some usmarc rules that aren't followed
12:47 I'm thinking in particular that there may be repeated 001 fields
12:47 kados sanspach: it's strange since it gets through quite a few records before crashing
12:47 hmmm, that might be a problem
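The check kados has in mind can be sketched with MARC::Record's batch reader; this is a minimal illustration of counting repeated 001s, and the input file name all.mrc is a placeholder:

    #!/usr/bin/perl
    # Count records carrying more than one 001 control field.
    use strict;
    use warnings;
    use MARC::Batch;

    my $batch = MARC::Batch->new( 'USMARC', 'all.mrc' );
    $batch->strict_off();    # keep reading past records with structural warnings

    my $n = 0;
    while ( my $record = $batch->next() ) {
        $n++;
        my @f001 = $record->field('001');    # list context: every 001 present
        printf "record %d has %d 001 fields\n", $n, scalar @f001 if @f001 > 1;
    }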
12:48 sanspach our system doesn't natively store in MARC and therefore doesn't enforce usmarc rules
12:48 I knew I needed to strip out our non-standard subfields
12:48 the newer records (created on current system, ca. 4.5 yrs) would have only single 001
12:49 but records imported from old system went through a merge algorithm and ended up with multiple 001 fields
12:49 didn't think about it until after I'd sent the file
12:49 kados interesting
12:50 do you think it would be easier to modify them in the big raw file, or on your end?
12:50 sanspach depends on your tools; I can't natively work with them in MARC, so I do the edits then convert
12:51 if you can work with them in MARC, it might be better for you to manipulate within the file
12:52 also, when I looked at bits of the files I noticed that fields aren't always in usmarc order--
12:52 specifically 008 seems to be in odd places (sometimes at end)
12:54 kados I'll give it a shot and if I can't do it I'll let you know
12:55 sanspach great; I've got the data all extracted now, so it will just be a matter of re-parsing the records and converting
12:55 kados sweet
13:04 sanspach: if it's not extracted as MARC what is it extracted as?
13:04 (out of curiosity)
13:04 sanspach flat ascii!   (I live and die by perl but I use activestate on win32, so MARC::... aren't available)
13:10 kados sanspach: right
13:10 sanspach: so how big is the flat ascii file?
13:10 sanspach: it might actually be easier for me to import that with MARC::Record (as it will automatically enforce correct MARC syntax)
13:11 sanspach don't have one (merged them after converting to MARC) but could easily do it; in fact,
13:11 I could remove the duplicate 001's as I'm merging
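Stripping the duplicates during a copy pass is a small extension of the same idea; a minimal sketch, again with hypothetical file names:

    #!/usr/bin/perl
    # Keep only the first 001 in each record while copying the batch.
    use strict;
    use warnings;
    use MARC::Batch;

    my $batch = MARC::Batch->new( 'USMARC', 'all.mrc' );
    open my $out, '>', 'deduped.mrc' or die $!;
    while ( my $record = $batch->next() ) {
        my ( $keep, @dups ) = $record->field('001');
        $record->delete_field($_) for @dups;    # drop every 001 after the first
        print {$out} $record->as_usmarc();
    }
    close $out;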
13:14 kados hmmm
13:14 well it's up to you
13:14 I owe you already for providing the records ;-)
13:15 so a little work to tweak them isn't really a problem
13:15 on the other hand, if you've got the proc time and bandwidth to do another export and ftp that'd be ok too ;-)
13:15 sanspach I'll try to figure out good compression; re-sending them in ASCII is going to be no problem at all! (MARC's the hard part)
13:15 kados gzip is pretty good
13:15 if you'd like to cut back on bandwidth
13:19 sanspach can MARC::Record read them in from LC's marcbreaker/marcmaker format?
13:21 kados no idea
13:36 sanspach kados: just reviewed MARC::Record docs at cpan and it looks like those tools are for MARC records
13:37 so you have a script that reads in flat files and does the creating?
13:39 kados sort of ... I can use MARC::Record to take data in any format and feed it in to construct a valid MARC record
13:40 and export as iso2709
13:40 I've done this in the past for various projects
13:40 like Koha's Z39.50 server
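The approach kados describes, assembling fields programmatically and letting the module serialize, might look like the following; the field values are made-up placeholders, and as_usmarc() recomputes the record length and directory on output:

    #!/usr/bin/perl
    # Build a record from arbitrary source data, then emit ISO2709.
    use strict;
    use warnings;
    use MARC::Record;
    use MARC::Field;

    my $record = MARC::Record->new();
    $record->leader('00000nam a2200000 a 4500');    # lengths fixed up on output
    $record->append_fields(
        MARC::Field->new( '001', 'demo0001' ),                        # control field
        MARC::Field->new( '245', '1', '0', a => 'An example title' ),
    );
    print $record->as_usmarc();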
13:40 sanspach OK; marcbreaker has three idiosyncrasies:
13:40 1) each line (=field) starts with = character
13:41 2) next comes tag (leader has name LDR rather than 000) and two spaces
13:41 3) next comes indicators with spaces substituted by \ character (backslash)
13:43 each line is thus /^=(LDR|\d{3})  (.*)$/
13:43 with $1 being tag and
13:44 $2 being data (where tag<10, all data, tag>9 /^(.)(.)(.*)$/ for ind1=$1,ind2=$2,field=$3)
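Those three rules are enough to sketch the breaker-to-ISO2709 converter kados later starts on. Two details the log does not state are assumed here: records are separated by blank lines, and subfields are delimited by '$', as in LC's MARCMaker/MARCBreaker conventions.

    #!/usr/bin/perl
    # Sketch: parse MARCBreaker lines into MARC::Record objects, emit ISO2709.
    use strict;
    use warnings;
    use MARC::Record;
    use MARC::Field;

    local $/ = '';    # paragraph mode: one blank-line-delimited record per read
    while ( my $chunk = <> ) {
        my $record = MARC::Record->new();
        for my $line ( split /\r?\n/, $chunk ) {
            my ( $tag, $rest ) = $line =~ /^=(LDR|\d{3})  (.*)$/ or next;
            if ( $tag eq 'LDR' ) {
                ( my $ldr = $rest ) =~ tr/\\/ /;    # breaker blank-escapes fixed data too
                $record->leader($ldr);
            }
            elsif ( $tag < 10 ) {                   # control field: bare data
                ( my $data = $rest ) =~ tr/\\/ /;
                $record->append_fields( MARC::Field->new( $tag, $data ) );
            }
            else {                                  # ind1, ind2, then subfields
                my ( $i1, $i2, $data ) = $rest =~ /^(.)(.)(.*)$/ or next;
                tr/\\/ / for $i1, $i2;              # rule 3: '\' stands for a blank
                my @pairs = map { ( substr( $_, 0, 1 ), substr( $_, 1 ) ) }
                            grep { length } split /\$/, $data;
                $record->append_fields( MARC::Field->new( $tag, $i1, $i2, @pairs ) )
                    if @pairs;
            }
        }
        print $record->as_usmarc() if $record->fields;
    }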
13:46 OK, done; I've removed dup 001 (can't say for sure tag order is up to standard); text file slightly smaller
13:47 than MARC file was (makes sense--no directories)
13:47 kados right
13:57 sanspach: if you take a look at http://liblime.com/zap/advanced.html you can see the latest results I'm getting from the data
13:57 sanspach: it looks like the search is working but the data coming back isn't displaying normally
13:57 (choose the LibLime target to see your data)
13:58 sanspach hmm
13:58 kados (also notice that it's extremely fast ;-)
13:58 (which is good news)
13:58 I'd be interested in comparing its speed and results to your current system
13:58 do you have a link for that?
14:00 sanspach specs for z39.50 connection are at http://kb.iu.edu/data/ajhr.html
14:01 kados k ... just a sec
14:03 heh ... ok ... try that
14:04 so the result set numbers aren't adding up
14:04 interestingly
14:04 sanspach yeah, saw that from my favorite author search (durrenmatt)
14:04 looks like field and/or record boundaries are all messed up
14:04 kados yea probably
14:05 sanspach maybe from multiple 001s?
14:05 kados could be ... wanna send me the updated one and we'll try that?
14:05 sanspach working on compressing now
14:05 kados cool
14:07 sanspach: it'll be neat to compare Indiana's Zserver to Zap/Zebra
14:07 sanspach our server's big (/fast) but I'm not sure how optimized we are for z39.50 connections--that's never been very high priority
14:08 kados sanspach: esp since you're prolly paying about 4-6K per year for that module
14:09 sanspach mostly 'cause we think we need it (as state inst. / major research lib. / etc.) not 'cause we actually want to support it
14:09 don't think we've got anybody knockin' down our door when it goes down!
14:09 kados right ... still ... it'd be neat if you were able to propose cutting back on the ILS budget a bit
14:21 sanspach kados: compressed file 26% of original; ftp begun but will take ca. 40 minutes
14:22 kados sweet
14:23 let me know when it's done
14:23 (FYI the indexing takes about 4 min too)
14:23 s/4/40/
14:23 sanspach will do
14:24 still slays me it goes so fast
15:00 kados: ftp is done, right on schedule; let me know if there are any problems with the file or record format
15:02 kados sanspach: sweet ... I'll get started on the indexing
15:04 unzipping now
15:04 tar -xzvf /home/sanspach/all.tar.gz
15:04 all.txt
15:04 tar: Skipping to next header
15:04 tar: Archive contains obsolescent base-64 headers
15:07 sanspach working on it; google says common error; workaround should be possible...
15:09 kados sanspach: tar: Read 3790 bytes from /home/sanspach/all.tar.gz
15:10 tar: Error exit delayed from previous errors
15:10 sanspach: any clue why that's happening?
15:11 sanspach 'cause I used a win32 tool to tar/gzip?!
15:11 kados could be :-(
15:11 sanspach workaround is to unzip first then tar, but I'm seeing an error there, too; but maybe it will finish ok
15:12 ls
15:12 kados all.txt
15:12 sanspach oops, wrong window :)
15:12 kados hehe
15:13 so all.txt is it?
15:13 sanspach not good; the all.txt file should be about the same size as all.tar (ever so slightly smaller: without the tar header)
15:13 way too small--it is choking partway through or something
15:14 kados right ..
15:14 look at the output from tail
15:15 it's choking here:
15:15 =505  1\[v. 1.] Theoretical and empiri
15:15 for some reason
15:15 sanspach data is probably irrelevant; most likely bad length from header, etc.
15:16 kados fair enough
15:24 sanspach kados: the all.tar file should be good to use if you can just strip the first few bytes
15:24 maybe read in first line and dump everything before the = that is the start of the data?
15:25 don't know what text editing tools you might have that can handle file that large; don't want to read it all into memory!
15:31 kados grep, sed, awk, bash ;-)
15:31 perl even ;-)
15:36 sed 's/*=//' all.tar
15:36 I'm making a backup first
15:36 :-)
15:40 hmmm, seems it didn't work
16:05 sanspach OK, think I've got it with perl
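What sanspach suggests at 15:24 amounts to dropping everything ahead of the first '='; a perl pass along those lines might look like this (file names hypothetical; note that tar also pads the end of an archive with NUL blocks, so the tail may need the same trimming):

    #!/usr/bin/perl
    # Copy all.tar to all.txt, discarding everything before the first '='.
    use strict;
    use warnings;

    open my $in,  '<:raw', 'all.tar' or die $!;
    open my $out, '>:raw', 'all.txt' or die $!;
    my $seen = 0;
    while ( my $line = <$in> ) {
        if ( !$seen and $line =~ /=/ ) {
            $line =~ s/^[^=]*//;    # keep from the first '=' onward
            $seen = 1;
        }
        print {$out} $line if $seen;
    }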
16:09 kados sweet
16:09 let me know when it's done uncompressing
16:12 cool ... done eh?
16:12 sanspach looks like the right size...
16:12 seems right
16:12 kados ok ... I'm gonna index it (I'll move it first)
16:12 sanspach sorry for the hassle
16:14 kados hmmm, strange error: 14:05:08-07/06 ../../index/zebraidx(32333) [warn] records/sample-records:0 MARC record length < 25, is 0
16:14 it's not indexing the file
16:15 sanspach it's flat ascii, not marc
16:15 kados well that would explain it ;-)
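The error also shows why: the first five bytes of an ISO2709 record are a five-digit record length, so a file that opens with '=LDR' parses as length 0. A quick sniff test, as a sketch:

    #!/usr/bin/perl
    # Peek at the first five bytes: real ISO2709 starts with a 5-digit length.
    use strict;
    use warnings;

    open my $fh, '<:raw', $ARGV[0] or die $!;
    read $fh, my $head, 5;
    if ( $head =~ /^\d{5}$/ ) {
        print "looks like ISO2709 (declared length $head)\n";
    }
    else {
        print "not ISO2709: leading bytes are '$head'\n";
    }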
16:24 sanspach: so ... just so I have this straight
16:24 the file is currently in MARCBreaker format
16:24 you already tried using MARCMaker and it didn't produce valid MARC records
16:25 so now we're going to try to use MARC::Record to create a valid MARC record
16:25 sound right?
16:25 sanspach well, only sort of
16:25 I had separate small files which I converted into MARC
16:26 I'm guessing the problem was the repeated 001's
16:26 kados using MarcMaker for the conversion (and join)?
16:26 right
16:27 sanspach I used MarcMaker for the conversion; I joined them afterward
16:27 kados ok ... how big was each file (approx)
16:28 sanspach ca. 100 MB
16:54 kados sanspach: I'm headed home now ... I hacked together a start of a script to convert from marcmaker to usmarc using MARC::Record and I'll try to finish it up tonight
16:55 sanspach OK; if I think of anything brilliant, I'll let you know :)
16:56 kados sanspach: sounds good ;-)
02:37 osmoze hello
02:37 paul hi js
02:37 osmoze hey paul
02:37 got two minutes?
02:38 paul go ahead, I'm listening
02:39 osmoze I have a question: is there a simple way to get the overdues list in overdue in the form person1 --> book1, book2, book3 instead of person1 --> book1; person1 --> book2, etc.?
02:39 this is for a mailing
02:40 paul in the next version, we do have the list of overdue books.
02:40 osmoze I'd written a little php script, but I have an error I can't fix :(
02:40 so I'm turning to your services :)
02:40 paul it was an obvious gap.
02:40 osmoze how so?
02:40 there was already a module (overdue.pl)
02:41 paul overduenotice.pl has been improved.
02:41 it sends an email to every borrower who has an email address, giving them their list of overdues.
02:41 and it sends an email to the library with all the borrowers who have overdues but no email address
02:41 osmoze the problem is that you can't reuse that data
02:42 from the email to the library
02:42 because my goal was to build a mail-merge form letter and include the names automatically
02:42 but that only works with a database or a properly structured text file
02:43 paul exactly. In fact, we should attach a CSV file with the info
02:43 osmoze that's it exactly
02:43 but there will still be the problem of names repeating
02:44 for borrowers who have more than one overdue book
02:46 paul yes. We could imagine doing it with the titles separated by a ,
02:46 they would then appear on a single line in the mail merge.
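The grouping paul sketches could be done in one pass over the circulation tables; the query and column names below are assumptions modeled on a typical Koha 2.x layout, not taken from the log:

    #!/usr/bin/perl
    # One semicolon-separated line per borrower, overdue titles joined by commas.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:Koha', 'kohauser', 'password',
        { RaiseError => 1 } );
    my $sth = $dbh->prepare(q{
        SELECT borrowers.surname, borrowers.firstname, biblio.title
        FROM issues
        JOIN items     ON items.itemnumber         = issues.itemnumber
        JOIN biblio    ON biblio.biblionumber      = items.biblionumber
        JOIN borrowers ON borrowers.borrowernumber = issues.borrowernumber
        WHERE issues.returndate IS NULL AND issues.date_due < NOW()
    });
    $sth->execute();

    my %overdue;    # "surname;firstname" => [ titles ]
    while ( my ( $sn, $fn, $title ) = $sth->fetchrow_array ) {
        push @{ $overdue{"$sn;$fn"} }, $title;
    }
    for my $who ( sort keys %overdue ) {
        print join( ';', $who, join( ', ', @{ $overdue{$who} } ) ), "\n";
    }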
02:48 osmoze that's exactly what I'm looking for :)
02:50 that way I can do a quick, efficient mail merge for sending letters ^^
02:50 paul with OpenOffice?
02:51 if so, we should put that doc in the CVS.
02:51 osmoze I had tested with word; I'll test with openoffice
02:51 (the front-desk machines run windows + word, but I'm not ruling out installing openoffice-win32)
02:53 hdl_away hi.
02:53 osmoze hello hdl
02:54 paul, you wouldn't have a ready-made little csv file on hand, would you? ^^
02:54 hdl osmoze: it's just a text file with semicolon-separated values. ;)
02:55 osmoze so that's fine
02:55 it works well
04:55 jean hi/bonjour
05:04 paul Wednesday is children's day AND Jean's Koha day ;-)
05:04 hello Jean. How are you?
05:05 jean :)
05:05 yes, very well
05:06 paul is your optimization doc coming along well?
05:06 jean I expect to release it today
05:06 paul great!
05:06 I'm looking forward to reading it.
05:06 jean but I've only worked on it one day a week, and even then with lots of slowdowns
05:06 that's why it's taken a little while :)
05:07 well, in any case I'm counting on you to give me your opinion
05:07 paul you can count on it.
05:10 right, off to lunch. See you later
10:29 Sylvain hi
10:29 hdl hi Sylvain.
10:34 Sylvain is it envisaged to include xml in koha in any way ?
10:35 hdl No, as far as I know. Are you interested in doing it ;) ?
10:36 Sylvain no, just because a customer was asking ... I hadn't heard anything about it so I wondered
10:38 paul sylvain : "include xml" is not enough.
10:39 what does he want : exporting XML, importing xml, showing xml...
10:39 hdl ... using xml ?
10:39 Sylvain I know paul it's not enough :) But the customer is a librarian and didn't say more in his mail. So I was asking if anything was planned for xml
10:40 paul zebra speaks xml pretty well, it seems...
10:40 Sylvain hmm, nothing very specific about xml then, in any case
10:41 paul we're in the "bazaar" phase, and the roadmap should be ready by the end of the month
10:42 Sylvain ok
10:42 and is there a firm date for 2.2.3? (it may have gone by on the MLs but I wasn't paying attention
)
10:43 paul I'm going to announce it for next week.
10:43 it's mostly translation left to do, and possibly a bit of polishing.
10:43 (for example, I need to copy your unimarc plugins into 2.2
)
10:45 Sylvain matthieu is the one who did that, but ok
11:13 owen Hi sanspach
11:33 Sylvain can someone explain to me the meaning of "datelastseen"?
11:36 hdl the latest date the book was seen... for inventory purposes, IMHO.
11:36 Sylvain last time it was "barcoded" ?
11:37 scanned with the barcode reader ;)
11:37 sanspach hi owen (sorry, started IRC then walked away!)
11:38 hdl Not only that; have a look at the Inventory/StockTaking tab in the reports ;)
11:38 Sylvain "01:53 +1d  chris  datelastseen is the last time the item was issued, or returned, or transfered between branches"
11:39 hdl the stats are very powerful and cover a lot of things; I haven't had time yet to explore them ;)
11:39 hdl That's why I told you about that.
11:40 And *I* am not the only one to have worked on that ;)
11:40 Sylvain ok, I thought you had done it all alone
11:40 hdl So long ;)
11:47 owen sanspach, you're in Indiana?
11:47 sanspach yes
11:51 kados sanspach: I tried indexing the new marc file
11:51 sanspach: results are displaying weirdly
11:51 similar to before
11:51 http://liblime.com/zap/advanced.html
11:52 sanspach kados: yes, I see; very odd
11:52 almost like the directory is off and the fields are getting all mangled
11:52 kados yea
11:53 sanspach only this time the MARC was generated w/MARC::Record
11:53 kados right ... so it should be valid
11:53 sanspach how can both (very different) methods produce the same problems?
11:53 kados well it may be the indexer
11:53 but I haven't had trouble with it using other MARC records
11:54 sanspach do you want batches of smaller sections of the db?  I still have the original 54 files
11:54 kados sure ... send em over
11:54 maybe if we do them one-by-one we can catch the problem
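One cheap check while working file-by-file: walk each output file record-by-record and compare the leader's declared length against the bytes actually present, since a mangled directory usually shows up as a length mismatch. A sketch:

    #!/usr/bin/perl
    # Flag ISO2709 records whose declared length disagrees with reality.
    use strict;
    use warnings;

    open my $fh, '<:raw', $ARGV[0] or die $!;
    local $/ = "\x1D";    # ISO2709 record terminator
    my $n = 0;
    while ( my $rec = <$fh> ) {
        $n++;
        my $declared = substr( $rec, 0, 5 );    # leader bytes 0-4: record length
        printf "record %d: leader says %s bytes, found %d\n",
            $n, $declared, length $rec
            if $declared ne sprintf( '%05d', length $rec );
    }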
11:55 sanspach I'll ftp in batches of 10, in numeric order (you'll see the pattern)
11:55 kados k ...
