IRC log for #koha, 2005-06-05

← Previous day | Today | Next day → | Search | Index

All times shown according to UTC.

Time Nick Message
12:02 gavin they should be in there now
12:02 you can try it at [opac]/cgi-bin/koha/opac-search-biblio.pl
12:06 kados cool
12:06 I'll test it out on NPL's 150,000 biblio db
12:06 give me a minute
12:06 gavin that would be a good test alright.
12:06 by the way, where's the cvs mailing list?
12:07 duh! http://lists.sourceforge.net/l[…]listinfo/koha-cvs
12:08 kados :-)
12:10 welccome sanspach
12:10 welcome even ;-)
12:10 sanspach just testing out irc client
12:11 been lurking on lists for a bit, but just now installed mIRC
12:11 kados cool ...
12:11 gavin work for you?
12:11 kados gavin: still updating from sourceforge which is quite slow today ;-)
12:12 gavin sorry, i see :)
12:14 kados it's working -- not sure if it's much faster though:
12:14 http://search.athenscounty.lib[…]-search-biblio.pl
12:14 still testing
12:15 gavin i take it you built the indexes?
12:17 kados didn't know there were indexes ;-)
12:17 where are they ?
12:17 gavin it's possible it might not be that much quicker, but it's quite a bit simpler and adds relevance ordering and boolean searching
12:17 kados cool
12:17 gavin ALTER TABLE biblio ADD FULLTEXT (author,title,unititle,seriestitle);
12:18 kados hmmm, that could take a while on our database ... better run it on the test machine to time it ;-) (staff is already complaining about slowness ;-))
12:18 gavin fair enough
12:18 kados gavin: is that the only index?
12:18 gavin for now
12:19 what I would intend adding is that one and one on each individual text col in biblio
12:19 kados ok ... only 8 secs ... I can do that
12:19 hmmm, but you do realize that will greatly slow down editing of biblios
12:19 gavin this is a possible problem, yes
12:20 but it might mitigate the need for the marc_word index (I'm very unsure of this)
12:20 kados ok ... index is written
12:21 gavin hmmm, that's no quicker at all
12:21 perhaps i have the index wrong
12:23 sanspach with commas between, isn't that just one index with a very long key?
12:23 don't separate indexes need to be built for each field?
12:23 gavin I don't think so for the search I'm trying to do:
12:23 WHERE MATCH(biblio.title,biblio.author,biblio.unititle,biblio.seriestitle)
12:24 kados well by default mysql only indexes the first 3 (or is it 4) chars in any given string
12:25 so that may be part of the problem too
12:25 gavin " The MATCH() column list must exactly match the column list in some FULLTEXT index definition for the table, unless this MATCH()  is IN BOOLEAN MODE."
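Putting gavin's two snippets together, a minimal sketch of the approach under discussion (assuming a MySQL-backed Koha with a MyISAM `biblio` table; MyISAM was the only engine with FULLTEXT support at the time):

```shell
mysql koha <<'SQL'
-- Build the index gavin posted above
ALTER TABLE biblio ADD FULLTEXT (author,title,unititle,seriestitle);

-- The MATCH() column list must exactly match the FULLTEXT definition
-- (same columns, same order) unless the query is IN BOOLEAN MODE:
SELECT biblionumber, title, author
FROM biblio
WHERE MATCH(author,title,unititle,seriestitle)
      AGAINST ('+neal -stephenson' IN BOOLEAN MODE);
SQL
```

Note that a column-list mismatch between the index definition and the MATCH() call is one plausible reason a query silently falls back to a slow non-indexed path.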
12:25 kados we're likely returning a very large set even with the index
12:26 my personal opinion is that we won't see much improvement until we start using a textual database engine for searches
12:26 the other problem with fulltext is it's mysql specific
12:26 so far we're database-independent
12:26 gavin it exists for postgresql but in contrib
12:27 kados right but what about, say, oracle ;-)
12:27 gavin actually I'm unsure
12:27 kados we've got some folks using oracle with Koha
12:27 gavin i read something recently which suggested something like that was there but I don't know if it's at all compatible
12:27 kados I'm not saying it's a bad thing to be dependent on mysql -- it's certainly not going anywhere
12:28 I just think we should carefully consider if before we go that route
12:28 gavin well, i wouldn't break compatibility on a whim :)
12:28 kados s/if/it/
12:28 right ;-)
12:28 gavin: have you much experience with non-RDB textual search engines?
12:28 (like Lucene?)
12:29 I think this is still working:
12:29 gavin little bits from some years ago (verity, htdig) but not a lot
12:29 kados http://search.athenscounty.lib[…]?query=stephenson
12:30 gavin Internal Server Error?
12:30 kados looks like it's not ;-)
12:30 (I suspect the index has gone away from /tmp)
12:30 gavin ah.
12:30 kados yep
12:30 bummer ...
12:31 gavin that'll take a while to rebuild i imagine
12:31 kados ya ... about 6 hours
12:31 I'll put it out of tmp next time ;-)
12:31 gavin did you test the more advanced syntax in boolean +,-,<,>,*
12:31 kados nope ... it was just a proof of concept
12:31 but Plucene supports all of that
12:32 which would be nice
12:32 but I wasn't impressed with the speed
12:32 gavin i mean in mysql
12:32 kados things should be lightning quick -- we're only dealing with 150,000 items here ... it's not a huge data set
12:32 gavin indeed.
12:33 what version of mysql is that?
12:33 kados 4.0.24
12:33 I've thought about upgrading to 4.1
12:34 gavin don't think it should make much of a functional difference to what I was suggesting. something seems wrong.
12:34 does anyone have a nice big data set I could borrow to test things?
12:34 sanspach how big/complex?
12:35 gavin something reasonably big I guess. plenty of biblio entries particularly
12:36 sanspach I've gotten nothing into Koha yet; all my data's still in full MARC communications format
12:36 gavin kados: something weird is happening on that test dn
12:36 sanspach but I've got lots of it
12:36 gavin s/dn/db/
12:36 kados gavin: yea?
12:36 gavin well as long as it's not sensitive
12:37 well, the +- syntax isn't working which is rather odd.
12:37 kados what's happening?
12:37 hmmm ... so in keyword I should be able to do:
12:37 +neal -stephenson
12:38 and it will find all the neals without stephenson, right?
12:38 gavin I put in ireland and then ireland -paradise and i get what i would expect from ireland +paradise
12:38 supposed to yes
12:39 i've just thought of an enormous optimisation.  will implement it
12:39 that page is taking *all* results regardless of how many were to be displayed
12:40 that's very silly
12:40 on my part
12:40 kados here's something funny
12:40 gavin ?
12:40 kados although the search starts out using opac-search-biblio.pl
12:40 it ends using opac-search.pl
12:40 was that intentional?
12:41 gavin no, it certainly wasn't <argh>
12:41 just spotted that now
12:41 kados it's probably looping over on itself
12:41 and on my db it's only displaying the first 20 results as it should
12:41 (I suspect in fact, that it's not using your script at all ;-))
12:42 gavin yes, but the query is getting all 1500
12:42 if it's running mine at all which it may not be
12:42 kados getting all 1500 is not a bad thing
12:43 we need to be able to sort by different things: title, author, date published, popularity, etc.
12:43 gavin it is a bad thing if you're not displaying them, it just slows things down
12:43 does your template specify opac-search.pl in the form tag?
12:44 i'm not seeing that behaviour here
12:44 kados let me check
12:44 yep
12:44 I'll make a new one
12:44 just a sec
12:45 gavin seems to be according to the html anyway
12:45 you should be able to just leave the action empty to get it to post to itself
12:46 kados heh ... well now it's fast
12:47 it just doesn't return anything ;-)
12:47 hmmm, not so fast anymore ... and it's returning stuff
12:47 a search on 'new' returns no results
12:47 gavin i just got 131 records
12:48 that's got to be fixed
12:48 mysql's default min word limit is 4 I think
12:48 use a bigger word
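Gavin's point about the word-length minimum can be checked and changed; a sketch for MySQL 4.x of that era (the variable and the rebuild step are from the MyISAM fulltext machinery):

```shell
# Default ft_min_word_len is 4, which is why a search on 'new' finds
# nothing: three-letter words never enter the fulltext index.
mysql -e "SHOW VARIABLES LIKE 'ft_min_word_len';"

# To lower it, add to my.cnf under [mysqld] and restart the server:
#   ft_min_word_len = 3

# Then rebuild the fulltext index so existing rows are re-tokenized:
mysql koha -e "REPAIR TABLE biblio QUICK;"
```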
12:48 kados oooh ... do a search on 'neal stephenson'
12:48 you're returning all the items as biblios
12:49 rather than a 'biblio' with item status information attached to it
12:49 gavin i know, work-in-progress
12:49 kados also, it's much slower than the original ;-)
12:49 gavin how long does the original take?
12:50 kados try it
12:51 gavin both are fairly similar for me. perhaps bandwidth is getting in my way
12:51 kados well this isn't an accurate test ... but I count about 9 secs for the old search about 14 for the new one
12:51 we could use 'time' to find out for sure
12:52 gavin no you're right it is slowe
12:52 kados and dprof
12:52 gavin .er
12:53 i'll look at it further. I think the mysql indexing should be quite a bit quicker, it's probably what I'm doing that's not
12:56 kados I get to the second message:
12:56 The main problem at the minute is that everything is still too slow.
12:56 ;-)
12:56 so i guess we're not the only ones with that analysis
12:57 gavin eh
12:58 heh
13:00 you said the index took a very short time to complete (like 15 seconds?)
13:00 kados yep
13:01 more like 8 secs
13:01 which is very unusual
13:01 building an index on marc_word for instance usually takes about 4-6 hours ;-)
13:01 gavin yeah, i would have expected longer. the select took longer
13:03 there are lots of interesting posts on the manual page, I think i need to look into these.
13:04 kados yep ... the mysql manual is invaluable
13:09 this Plucene bit is an interesting read ... apparently, (at the time of writing) Plucene was 65 times slower than Lucene in searches
13:13 gavin is it a free clone of lucene or what?
13:16 sanspach: any chance of getting some of that data to test with?
13:16 \/who
13:16 oops
13:16 kados gavin: yea plucene is a perl port of lucene
13:16 lucene is OSS
13:17 sanspach how many MARC records would you like?  I can see what I've got laying around
13:17 kados sanspach: as many as you've got :-)
13:18 gavin yeah, assuming I'm not going to be downloading gigs of course
13:18 kados gavin: if you just need a few thousand MARC records you're welcome to use LibLime's data
13:19 gavin: have you seen our demos?
13:19 gavin: opac.liblime.com is the opac
13:19 gavin: koha.liblime.com is the intranet, etc.
13:19 gavin: you can use the marc exporter to nab all the data on the intranet
13:20 gavin: though I'll warn you it's not a very good collection of items for show-and-tell
13:20 gavin well, i'm really looking to test performance so whatever
13:21 kados right ... well go ahead and grab those MARC records from LibLime
13:21 gavin it wouldn't be possible to just do a mysql dump?
13:21 kados well ... it would ... but that's actually much more complicated
13:21 gavin fair enough
13:22 kados the easiest thing to do would be to grab the MARC records (they come out as a single file)
13:22 then import with misc/migration_tools/bulkmarcimport.pl
13:22 it's a cinch
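The round trip kados describes might look like this (the export URL is elided in the log, and bulkmarcimport.pl's flags varied between Koha versions, so treat the details as illustrative):

```shell
# 1. Grab the MARC export produced by the intranet "Tools" page
#    (exact URL illustrative; the export comes out as a single file)
wget -O /tmp/liblime.mrc 'http://koha.liblime.com/cgi-bin/koha/...'

# 2. Load it from a Koha source tree with the bulk importer
cd /path/to/koha
perl misc/migration_tools/bulkmarcimport.pl -file /tmp/liblime.mrc
```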
13:22 gavin where's the exporter?
13:23 kados http://koha.liblime.com/cgi-bi[…]oha/admin-home.pl
13:23 at the bottom ... under "Tools"
13:23 MARC biblio export
13:23 just leave both blank to get all the records
13:24 gavin great, that's downloading them now
13:24 how big roughly is it?
13:24 oh, they're done
13:24 kados about 3000 biblios
13:25 pretty small file
13:27 gavin importing now
13:27 kados sweet
13:27 gavin should give me a better idea of the times
13:27 kados yep ... though 3000 is much smaller than 150,000 ;-)
13:28 as I've come to realize ;-)
13:33 gavin yes, that would make a bit of a difference
13:33 yes I've just over 3000 records now
13:34 kados sweet
13:37 tim kados: I get an internal server error when I try to make barcodes on the liblime demo.
13:37 kados right ... that feature isn't implemented
13:37 it's on my list tho
13:37 tim I wanted to see how it works since mine doesn't.
13:37 kados right
13:37 IIRC you're missing the perl module that it requires
13:38 tim that's one of the things we'll need before we start using Koha.
13:38 kados gotcha
13:38 tim hmm... I guess I gotta figure out what module and how to get it.
13:38 gavin thanks for the help kados, i'm going to go have my dinner
13:39 tim but I need to head out now so it'll have to wait.
13:41 kados shoot ... I've got the fix
13:42 perl -MCPAN -e 'install PDF::API2';
13:42 oughta do it
14:44 chris around?
14:44 I think this is what we've been looking for:
14:44 http://indexdata.dk/zebra/doc/[…]ion.tkl#id2518049
14:45 zebra not only does fulltext indexing ... it's got an integrated Z39.50 server
14:51 zebra can also index MARC directly ... so ...
14:51 1. dump out MARC to a single file
14:52 2. index with zebra
14:52 3. update periodically with new MARC files as things are added to the catalog
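The three steps above, sketched with Index Data's command-line tools (assumes a zebra.cfg whose record filter understands MARC; paths are illustrative):

```shell
# 1. dump MARC to a single file (e.g. via Koha's export) and stage it
cp /tmp/marc.records /var/lib/zebra/records/

# 2. index it with zebra, then commit the shadow files
zebraidx -c zebra.cfg update /var/lib/zebra/records
zebraidx -c zebra.cfg commit

# 3. serve the index over Z39.50; re-run 'zebraidx update' periodically
#    with new MARC files as things are added to the catalog
zebrasrv -c zebra.cfg @:2100
```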
15:00 owen Oh no, another program named Zebra?!
15:12 kados hehe ... that's what I thought too
15:12 I was thinking what staff would think
15:12 "so we started with Zebra and now we're using .... Zebra?"
15:12 hehe
15:12 I actually wonder if Spydus uses Zebra ;-)
15:12 I bet they do
15:13 sanspach kados: OK, I've successfully genericized my project-oriented MARC record retrieval
15:13 the largest set I have at the moment is ca. 10,000
15:13 kados sweet!
15:14 sanspach but I can now extract any arbitrary subset of our ca. 5 million bibs
15:14 kados well ... I'd take all 5mil ;-)
15:14 it would be extremely useful for testing purposes
15:14 (if you're willing to donate that many)
15:15 sanspach let me think for a minute or two about which ones we don't normally expose to z39.50
15:15 kados sure
15:15 sanspach I'd think you could have any of the rest
15:15 kados sweet ... thanks!
15:15 we'll add you to the list of contributors to Koha ;-)
17:15 chris morning
17:21 kados morning chris
17:22 http://indexdata.dk/zebra/doc/[…]ion.tkl#id2518049
17:22 sanspach kados: just sent you email to arrange file transfer, etc. of sample records
17:22 leaving soon; talk to you later
17:22 kados sanspach: sweet ... ahh yes ... I see it
17:22 sanspach: thanks!
17:22 sanspach: a single file would work well
17:23 sanspach OK, will combine when they've all downloaded
17:25 kados sanspach: and if you can make the file available via http I can grab it ... otherwise, I can set you up with a login and you can ftp it to one of our servers
17:25 sanspach it might be better for me to ftp it to you;
17:26 I've got quota issues on our webserver and this'll surely cause me trouble!
17:26 kados ok ... I'll email you login info
17:26 do you have a gpg key?
17:27 sanspach I do; just had a harddrive crash and haven't reinstalled the mail client plugin,
17:27 but if you look me up, I'll manage to decrypt
17:29 kados cool ...
17:30 sanspach OK, more later (6 segments out of 54 are complete at this point)
17:30 gotta go
17:31 kados sweet ... read you soon ... and thanks!
17:43 chris ahhh
17:43 well thats fast
17:44 4.6 seconds to index 2646 marc records
17:45 now i just have to figure out the attributes so i can actually search it
17:58 kados chris trying out zebra?
17:58 chris yep
17:58 kados sweet ... I was just getting to that ;-)
17:59 chris it seems the trick is going to be setting up the config files right
17:59 the rest is just going to be writing a wrapper to search a z3950 server
18:01 if it works, you could do the same thing as i was thinking for plucene
18:01 each branch runs its own zebrasrv
18:02 which you search against, then just talk to the main db for item status, circulation, etc
18:02 kados right
18:11 chris woot
18:13 got a sec joshua?
18:14 kados yep
18:14 got it going?
18:14 chris yep, do you have yaz installed?
18:14 kados sure
18:15 chris try
18:15 yaz-client bigballofwax.co.nz:2100
18:16 kados sweet
18:16 chris right
18:16 now try
18:16 find chris
18:16 find chris
18:16 Sent searchRequest.
18:16 Received SearchResponse.
18:16 Search was a success.
18:16 Number of hits: 8, setno 2
18:16 SearchResult-1: chris(8)
18:16 records returned: 0
18:16 Elapsed: 0.000915
18:16 now thats fast
18:16 and then
18:16 show
18:16 will give you the record
18:16 kados sure is!
18:16 chris show again gives you the next one
18:16 etc
18:16 kados wow! ... super fast
18:17 that's the kind of speed I'm talking about ;-)
18:17 chris so i think the trick is wrapping it
18:17 its only 2646 records
18:17 (liblimes db)
18:17 but it looks promising
18:17 kados right
18:17 damn gpg signing
18:18 chris ill email you my config
18:18 kados sweet
18:19 chris on its way
18:21 woo
18:23 kados chris: that's a pretty long config file
18:23 chris its 4 different files
18:23 kados right ... mutt combined them
18:24 so I wonder if I can export MARC to a file on the server (without using the web interface)
18:24 chris probably
18:25 running the script from the commandline
18:25 and >
18:25 kados yep
18:27 ok ... running
18:27 this may take a while ;-)
18:29 chris how big is your marc file?
18:29 chris 2124
18:29 not big
18:29 kados right ...
18:30 -rw-r--r--  1 root   root   5369856 Jun  3 19:29 marc.records
18:30 and growing
18:31 ya know, zebra could probably be the storage engine for all our marc stuff
18:31 but that's another topic ;-)
18:32 chris yep
18:32 thats a 3.0 thing
18:33 kados yea
18:33 chris we can do this so that you can switch it on in systempreferences
18:33 for 2.4
18:33 kados sweet
18:33 chris ie .. things will work without zebra still
18:33 kados right
18:34 indexdata folks are the shit!
18:34 chris yeah
18:41 kados wow this is going to be so easy to customize too
18:41 these config files basically give you total control over which marc fields are searched
18:41 chris yep
18:42 excellent
18:42 my little perl script is working
18:42 kados :-)
18:43 chris #!/usr/bin/perl
18:43 use Net::Z3950;
18:43 use strict;
18:43 my $conn = new Net::Z3950::Connection('localhost', 2100);
18:43 my $rs = $conn->search('@attr 1=4 adventure') or die $conn->errmsg();
18:44 returns me all the records with adventure in the title
18:44 the trick is going to be getting the biblio numbers
18:44 but that should be a config thing
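The same search chris's Perl snippet does can also be run non-interactively through yaz-client, which is handy for poking at attribute combinations ('@attr 1=4' is the Bib-1 use attribute for title; server address is the test one from earlier in the log):

```shell
# Scripted yaz-client session: search titles for 'adventure',
# display the first hit, then quit.
yaz-client bigballofwax.co.nz:2100 <<'EOF'
find @attr 1=4 adventure
show 1
quit
EOF
```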
18:45 kados right
18:46 chris also, u can make it rank the results
18:46 kados if you can get an array of biblio numbers to return we can insert that code directly into catalogsearch
18:46 chris and just hand you back n number
18:47 kados :-) ... that's sweet
18:48 chris im pretty sure that
18:48 my $rs = $conn->search('@attr 1=4 @attr 1=4 a') or die $conn->errmsg();
18:48 returns u them sorted by title
18:48 have a good dinner
18:48 kados so just repeat the @attr to sort?
18:49 chris i think
18:49 more reading will be required
18:49 but ill wait and see how fast the searches are on a big db
18:56 kados right ... I'll let you know as soon as the export is finished
20:03 chris: OK ... it finished, but I can't work on it for a couple of hours (finishing up some LibLime marketing stuff)
20:03 chris: I can either make the file available to you
20:04 or just catch up with you later tonight
22:35 Genji hiya all.
22:35 back from away.
23:18 kados chris: I'm back
00:09 for some reason dumping out to a file didn't result in a valid marc record so I'm using the web interface to dump out the records again -- should be done in about 20 minutes
00:09 emilda.org an interesting small open source ILS
00:10 (integrated with zebra already)
00:26 Genji kados: tried my search options bar?
00:33 kados Genji: haven't had a chance yet ... maybe tomorrow
00:33 (it's at the top of the list ;-))
00:34 Genji cool.
00:34 whats the structure of your itemtypes list?
00:34 kados Genji: what's your affiliation?
00:35 Genji Theosophical Society, Palmerston North Branch, New Zealand.
00:35 kados ahh ... and you run Koha?
00:35 (supported by Katipo?)
00:38 Genji Nope. Supported by me, a lot of human delays in getting the library and Koha ready. Well.. Koha is ready, the library isn't. Still needs shelf names typed up, for shelf barcodes and the virtual shelf list.
00:38 kados ahh right ...
00:39 Genji Theosophy comprises Science, Philosophy and Religion.
00:39 kados Genji: do you want a dump of my itemtypes table?
00:40 Genji Just the itemtype codes.
00:41 kados right ... well there are quite a few of them
00:42 | MZ   | Magazine                            | 14 | 2 | 0.0000 | NULL |
00:42 | AF   | Fiction                             | 14 | 2 | 0.0000 | NULL |
00:42 | AV   | Videocassette                       | 14 | 2 | 0.0000 | NULL |
00:42 | AVJ  | Videocassette, juvenile             | 14 | 2 | 0.0000 | NULL |
00:42 | AVNF | Videocassette, non-fiction          | 14 | 2 | 0.0000 | NULL |
00:42 | BIO  | Biography                           | 14 | 2 | 0.0000 | NULL |
00:42 | DVD  | Video disc                          | 14 | 2 | 0.0000 | NULL |
00:42 | EASY | Juvenile picture books              |
00:42 | JB   | Juvenile biography                  | 14 | 2 | 0.0000 | NULL |
00:42 | JNF  | Juvenile non-fiction                | 14 | 2 | 0.0000 | NULL |
00:42 | JF   | Juvenile fiction                    | 14 | 2 | 0.0000 | NULL |
00:42 | JREF | Juvenile reference                  |
00:42 | LH   | Local history                       |  0 | 0 | 0.0000 |    1 |
00:42 | LP   | Fiction, large print                | 14 | 2 | 0.0000 | NULL |
00:42 | LPNF | Non-fiction, large print            | 14 | 2 | 0.0000 | NULL |
00:43 | MYS  | Mystery                             |
00:43 | NF   | Non-fiction                         | 14 | 2 | 0.0000 | NULL |
00:43 | REF  | Reference                           |  0 | 0 | 0.0000 |    1 |
00:43 | SCI  | Science fiction/Fantasy             |
00:43 | WES  | Western                             | 14 | 2 | 0.0000 | NULL |
00:43 | YA   | Young Adult fiction                 | 14 | 2 | 0.0000 | NULL |
00:43 | YANF | Young Adult non-fiction             | 14 | 2 | 0.0000 | NULL |
00:43 | CDM  | Music CD                            |
00:43 | KIT  | Juvenile book/tape kit              |
00:43 | AVJN | Videocassette, juvenile non-fiction | 14 | 2 | 0.0000 | NULL |
00:43 | CDR  | CD-ROM software                     | 14 | 2 | 0.0000 | NULL |
00:43 | JAC  | Juvenile audiobook                  | 14 | 2 | 0.0000 | NULL |
00:43 | PB   | Paperback romance                   | 14 | 2 | 0.0000 | NULL |
00:43 | YAC  | Young Adult audiobook               | 14 | 2 | 0.0000 | NULL |
00:43 | AB   | Audiobook                           | 14 | 2 | 0.0000 | NULL |
00:43 | AC   | Audiobook (cassette)                |
00:43 that's it
00:44 Genji k. why do you use itemtypes for subject areas, instead of Virtual shelves?
00:44 kados er?
00:45 virtual shelves are just editable lists of items
00:46 Genji SCI, Science Fiction... Yes, they could be used to locate books in physicality.
00:46 kados hum ... how so?
00:47 Genji okay. you create a virtual shelf for each physical shelf.
00:47 and put books on the virtual shelf.
00:47 kados yikes!
00:48 that's a ton of human overhead
00:48 I've got like 400,000 items here ;-)
00:48 Genji nope.
00:52 kados why use virtual shelves when there is a location and callnumber?
00:52 (in KOha)
00:58 Genji easy, huh?
01:00 kados huh ... I need to think about it a bit
01:27 chris around?
01:28 the indexing of our records took about 3 minutes
01:28 66.213.78.76:9999 is the z-server
01:28 using yaz it's FAST!
02:40 osmoze hi
02:58 kados chris around?
02:59 here's the link to that generic z-server:
02:59 http://www.g7.fed.us/enrm/pilot/genericz.html
02:59 for some reason, it's not returning results on my implementation
02:59 66.213.78.76:9999
03:00 it looks like it might be broken as it's also not returning results for my actual z-server
03:08 course, that's currently down ;-)
03:08 sigh
03:40 I found a way to run tests via LOC's online z-client
03:41 http://www.loc.gov/cgi-bin/zga[…]66.213.78.76,9999
03:41 I think it's a tricky way to go about it though
03:41 there's no way to specify a database name in the CGI (that I could find)
03:42 so I'm re-indexing my data with the VOYAGER database name to see if that works
03:42 should be ready to test in about 2 minutes
03:43 yep ... it works
03:43 wow this is fast!
03:44 like crazy fast!
03:47 Genji kados: to finish my idea. ill be creating a page in koha where you can copy the barcodes resulting from the stocktake scan, and paste them into the page, and it'll find the shelf barcode, put the books into the shelf.. come across another shelf barcode, and put the books into that shelf etc.
03:53 osmoze hum...I have a question : In overdue.pl, we have a list with borrower1--> book1 ; Borrower1-6>Book2 etc etc...  I want borrower 1--> book1,book2 etc etc for a mailing after
03:53 have you got some idea ?
03:53 kados hmmm
03:53 Genji: sounds great!
03:54 osmoze: I'm too tired to think ;-) (it's 4:55am here ;-))
03:54 I've got to get to sleep
03:55 osmoze oh, excuse  kados, i wish you a good night ^^ (it's too late for you, are you a geek ;) )
03:56 for the others, the date's not important ... perhaps it's a group by?
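One way to read osmoze's question: collapse the overdue list to one row per borrower. A hedged sketch only; the table and column names are guesses at the Koha 2.x schema, and GROUP_CONCAT needs MySQL 4.1 or later:

```shell
mysql koha <<'SQL'
-- Hypothetical: one row per borrower, all their overdue titles joined
-- into a single field for a later mailing.
SELECT i.borrowernumber,
       GROUP_CONCAT(b.title SEPARATOR '; ') AS overdue_titles
FROM issues i
JOIN items  it ON it.itemnumber  = i.itemnumber
JOIN biblio b  ON b.biblionumber = it.biblionumber
WHERE i.returndate IS NULL
  AND i.date_due < CURDATE()
GROUP BY i.borrowernumber;
SQL
```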
04:10 gavin kados: can you ask sanspach for permission to copy those records to me?
04:10 (when you wake up of course)

