All times are shown in UTC.
Time | Nick | Message |
---|---|---|
15:08 | Burgwork | owen, having fun with css and IE? |
15:09 | owen | Yeah, loads |
17:31 | thd | tumer [A]: are you there? |
17:33 | tumer | hi thd |
17:34 | thd | tumer: hello. Would you explain the top two levels of your proposed XML schema to me? |
17:35 | tumer | thd: look at http://library.neu.edu.tr/kohanamespace |
17:35 | schema is there together with koharaecod.xsd | |
17:36 | i already put it to production code at NEU | |
17:36 | thd | tumer: I cannot connect to that |
17:36 | tumer | one sec |
17:36 | you may now | |
17:41 | thd | tumer: that was what I had assumed |
17:41 | tumer | you mean you do not approve? |
17:41 | thd | tumer: no I definitely approve |
17:42 | tumer | was it your question at yaz-list? |
17:42 | thd | tumer: I just had not understood how the top two elements in your schema worked |
17:43 | tumer | top-two meaning kohacollection and koharecord? |
17:43 | thd | tumer: I asked a question about indexing on the koha-zebra list |
17:43 | tumer: Index Data has not answered | |
17:43 | tumer | well you see you can index say 001 bibliographic separately from 001 holdings |
17:44 | i already do that | |
17:44 | thd | tumer: I asked poorly the first time and now they are punishing me |
17:44 | tumer: what method do you use? | |
17:44 | tumer | my holdings records have their own 001 004 005 and 008 separately indexed |
17:45 | the schema allows me to define them separately as xpaths | |
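A minimal sketch of the meta-record layout this implies, assuming element names taken from this conversation (koharecord, holdings, record); tumer's published schema at library.neu.edu.tr may differ in detail. The point is that the same MARC tag (001) sits at two different XPaths, so the bibliographic and holdings control numbers can be indexed separately.

```xml
<!-- Sketch only: element names are inferred from the discussion,
     not copied from the actual koha namespace schema. -->
<koharecord>
  <record>                                   <!-- bibliographic part -->
    <controlfield tag="001">12345</controlfield>
  </record>
  <holdings>
    <record>                                 <!-- one holdings record -->
      <controlfield tag="001">H-0001</controlfield>  <!-- holdings control number -->
      <controlfield tag="004">12345</controlfield>   <!-- link back to the bib 001 -->
    </record>
  </holdings>
</koharecord>
```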
17:45 | thd | tumer: how well does that scale? |
17:46 | tumer | slower on retrieval, very fast indexing; overall acceptable and scalable |
17:47 | thd | tumer: slow retrieval was what I was trying to avoid |
17:47 | tumer | the retrieval is slower cause i have to get xml and not marc-record |
17:47 | but i could not find any other way | |
17:47 | thd | tumer: why is getting XML necessarily slower? |
17:48 | tumer | lots of verbal coming through |
17:48 | about twice the size of marc-record | |
17:49 | thd | tumer: In my koha-zebra list message I quoted an Index Data zebra list message from February |
17:49 | tumer | did i miss that? |
17:49 | thd | tumer: In the message that I quoted, Marc tells Paul there is a fast method |
17:50 | tumer: Paul was satisfied with a slower method for what he had asked about indexing authorities at the time | |
17:51 | tumer | i dont remember this |
17:53 | thd | tumer: http://lists.nongnu.org/archiv[…]-08/msg00001.html |
17:53 | tumer: my purpose is this http://wiki.koha.org/doku.php?[…]er_meta_record_db | |
17:59 | tumer | thd:marcxml already has type definition for bibliographic,holdings,authorities,community etc that you can set as <record> attribute |
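The MARCXML type attribute tumer is pointing to looks roughly like this; the records and values below are only a sketch of how one <collection> can carry differently typed <record> elements.

```xml
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Bibliographic">
    <controlfield tag="001">12345</controlfield>
  </record>
  <record type="Holdings">
    <controlfield tag="004">12345</controlfield>
  </record>
  <record type="Authority">
    <controlfield tag="001">a-0001</controlfield>
  </record>
</collection>
```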
17:59 | thd | tumer: yes but it does not have one for FRBR |
18:00 | or FRAR, FRSAR, etc. | |
18:00 | tumer | using the new alvis filter of zebra and an xsl stylesheet i can index anything and now i can get DC, MODS and FRBR from my meta-record |
18:01 | on the fly conversion to any type | |
18:01 | thd | tumer: FRBR is not a record type |
18:01 | at least not yet | |
18:02 | tumer | i know but there is a conversion route defined by loc |
18:02 | from marcxml to frbr | |
18:03 | thd | tumer: yes but very primitive for works and expressions only and not efficient |
18:03 | tumer | and not much important for me at this stage for koha |
18:05 | thd | tumer: if Koha can provide a design for such a major preoccupation of library science for the past 10 years then it could be very important for Koha to do so. |
18:06 | tumer | yes and you may work and improve on this. not in my priorities list |
18:06 | thd | tumer: but about what Marc told paul |
18:07 | tumer | i am not quite sure what he said |
18:07 | thd | tumer: did you see that part of my koha-zebra list message? |
18:07 | tumer | yes i read it but did not understand Marc's suggestion |
18:09 | thd | tumer: the problem paul was trying to answer was how to index $a in 200 differently from $a in 700 |
18:10 | tumer | well we already do dont we? |
18:10 | thd | tumer: in MARCXML at the element level all $a look alike |
18:10 | tumer | no they are melm 700$a and melm 200$a |
18:10 | thd | tumer: I thought we were only indexing fields mostly not subfields specifically |
18:11 | tumer | no we index at either level field or subfield by subfield |
18:11 | i index 952 at 25 different subfield level | |
18:12 | thd | tumer: what method do we use to distinguish the subfields from different fields? |
18:13 | tumer | syntax is melm 700$a Author melm 700$d Date etc. |
18:13 | or just melm 245 Title | |
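Assembled into a record.abs fragment, the lines tumer quotes look like this (only the mappings he names; the comment is editorial):

```
# Zebra record.abs style mappings: field- or subfield-level index assignments.
melm 245      Title
melm 700$a    Author
melm 700$d    Date
```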
18:14 | thd | tumer: and that works for MARCXML in addition to MARC? |
18:14 | tumer | yes |
18:14 | although i do not use that any more. Thats dev-week indexing | |
18:14 | thd | tumer: What do you use? |
18:15 | tumer | koharecord/holdings/record/datafield(856)/subfield(a) |
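A sketch of how that path might be expressed in one of the alvis-filter XSLT stylesheets tumer mentions, assuming the z: indexing namespace shown in the Zebra documentation; the index name ("url") and the overall stylesheet shape are illustrative, not his actual file.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1">

  <!-- Illustrative sketch: emit a z:index element for each node Zebra should index. -->
  <xsl:template match="/koharecord">
    <z:record>
      <xsl:apply-templates
          select="holdings/record/datafield[@tag='856']/subfield[@code='a']"/>
    </z:record>
  </xsl:template>

  <xsl:template match="subfield">
    <z:index name="url:w">
      <xsl:value-of select="."/>
    </z:index>
  </xsl:template>
</xsl:stylesheet>
```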
18:16 | thd | tumer: is retrieval efficiency a function of the XPATH length in any way? |
18:17 | tumer | i am not quite sure yet |
18:18 | thd | tumer: the answer for efficiency was a focus of my koha zebra list question |
18:19 | tumer: and if you look closely I want to be able to index UNIMARC and MARC 21 and anything else in the same DB. | |
18:20 | tumer | same server but different db names will do |
18:21 | thd | tumer: I want to put them in the same meta-record |
18:21 | tumer | that is possible |
18:21 | thd | and separate them by a different XPATH |
18:21 | tumer | than you can index any part of meta-record as you wish |
18:22 | thd | tumer: I am trying to find the maximally efficient method for indexing |
18:22 | tumer | thats what i do for holdings |
18:22 | thd | tumer: not merely something that can work |
18:23 | tumer: I think it will be very useful for you to have related authorities in the same meta-record as a bibliographic record | |
18:23 | tumer | yazproxy converts from marc21 to unimarc on the fly |
18:24 | and duplicate every authority for 1000 times? | |
18:24 | thd | tumer: really, does it do the inverse as well |
18:24 | ? | |
18:24 | tumer | yes it uses usemarcon utility from any marc to anymarc |
18:25 | thd | tumer: yes many duplicates but very efficient retrieval by authority tracings and references |
18:26 | tumer: I have usemarcon but I have no functioning configuration files for doing any conversion | |
18:27 | tumer | well british library provides the one for unimarc both ways |
18:28 | thd | tumer: I downloaded from a BL link and they left it out with a nice note saying that I could buy commercial support from the original developer if I wanted more default features |
18:29 | tumer | typical brits |
18:29 | thd | tumer: I downloaded it last about a year ago |
18:29 | tumer: do you understand the indexing advantage of having related copies of authorities in the same meta-record as bibliographic records? | |
18:29 | tumer | try again i think now they have it |
18:30 | i know but i have to look into some benchmarks | |
18:31 | thd | tumer: including authorities in the meta-record should give you the same advantage as including holdings in the meta-record with bibliographic records |
18:33 | tumer: given the indexing limitations of Zebra including authorities in the meta-record seems to me the only way to do interesting things with authorities | |
18:34 | tumer | searching the authorities (separately) during retrieval is fast enough already |
18:35 | thd | tumer: yes if you only want to retrieve matches from one authority at a time |
18:36 | tumer: yet suppose I do a subject search and want to sort by the ones with the largest number of holdings or some other factor of most used | |
18:37 | tumer | well its endless i know |
18:38 | thd | tumer: If my result set is small it may be manageable but if my result set is large it is the same problem as with 10,000 biblio matches and knowing which are in a particular library |
18:39 | tumer: storing copies of related records together should solve that problem at the indexing level | |
18:39 | together in the same meta-record | |
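A hypothetical sketch of the extension thd is arguing for: the <authorities> wrapper below is invented for illustration and is not part of tumer's schema.

```xml
<koharecord>
  <record>                       <!-- bibliographic record -->
    <controlfield tag="001">12345</controlfield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Automobiles</subfield>
    </datafield>
  </record>
  <holdings>
    <record>                     <!-- holdings record -->
      <controlfield tag="004">12345</controlfield>
    </record>
  </holdings>
  <authorities>                  <!-- hypothetical: copies of related authority records -->
    <record>
      <datafield tag="150" ind1=" " ind2=" ">
        <subfield code="a">Automobiles</subfield>
      </datafield>
      <datafield tag="450" ind1=" " ind2=" ">
        <subfield code="a">Cars (Automobiles)</subfield>
      </datafield>
    </record>
  </authorities>
</koharecord>
```

With the authority tracings carried in the same meta-record, a single index pass can make the 4XX terms searchable alongside the bibliographic fields.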
18:40 | tumer | thd: i understand the question. It did not pose a problem to me yet. if you have such a need then you have to include them in your meta-record |
18:40 | thd | tumer: your users are not demanding enough |
18:41 | tumer: you need me breaking your system with a single query | |
18:41 | tumer | my users only take what i give them |
18:42 | thd | tumer: I used to cause stack overflows at a fairly distant library's circulation system by borrowing too much at one time to avoid the 5-hour round-trip commute |
18:44 | tumer: do you understand well what FRBR does? | |
18:44 | tumer: I have a simple power point for you | |
18:45 | tumer | i have seen a powerpoint at oclc i think it was |
18:45 | thd | yes this one http://www.oclc.org/research/p[…]eill/frbrddb2.ppt |
18:45 | tumer | power fights |
18:46 | thd | tumer: try this though which includes FRAR and FRSAR http://www.kaapeli.fi/~fla/frb[…]ject%20access.pdf |
18:47 | tumer | well nice talking to you thd, i'll check those later i have to go to sleep now. G'night! |
18:48 | thd | tumer: look at that pdf link when you are awake |
18:48 | tumer: good night | |
18:48 | tumer | will |
01:10 | ai | hi, |
01:11 | any here ?? can give me some help plz | |
02:37 | hdl | hi |
02:37 | osmoze | hello all |
02:37 | toins | hello |
08:54 | thd | kados: are you there? |
08:57 | kados | thd: of course :-) |
08:58 | thd | kados: did you see the conversation I had with tumer last night in the logs? |
08:58 | kados | no |
08:59 | paul | kados, take time to read my mail on koha-devel & tell me what you think of my suggestion. |
08:59 | (& good morning to kados & thd) | |
08:59 | thd | kados: tumer has implemented his schema for bibliographic and holdings records in one record |
08:59 | good morning paul | |
09:00 | kados: he is indexing fields differently with xpath | |
09:01 | paul: I do not find any abs files using xpath in CVS. Am I looking in the wrong place? | |
09:02 | kados: I am suspecting that tumer has not yet committed his recent improvements | |
09:03 | kados: I only see devel-week related files for zebra in CVS | |
09:05 | kados: are you still there? | |
09:19 | kados | paul: did you see mj's response? |
09:19 | paul | not yet arrived. |
09:19 | kados | paul: i agree with much of what he says there |
09:19 | wait ... | |
09:20 | that was your response to MJ :-) | |
09:20 | paul | ah, ok |
09:20 | kados | yes, I very much agree |
09:20 | I will try to respond today | |
09:20 | phone running off the hook today ... | |
09:21 | thd: he hasn't committed his stuff yet | |
09:21 | thd | kados: yes I just checked the koha-cvs logs |
09:49 | slef: you fix users instead of problems? | |
09:50 | paul | thd : lol |
09:50 | thd | slef: are users the problem? |
09:52 | tumer: are you really here | |
09:52 | tumer | yes looking for paul |
09:52 | paul | hi tumer, i'm here |
09:52 | (as well as toins) | |
09:52 | thd | tumer: do you use xpath in your queries |
09:52 | tumer | hi paul and toins and thd |
09:52 | toins | hi tumer |
09:52 | dewey | hi tumer is still strugling |
09:53 | thd | hello tumer |
09:53 | tumer | paul: why dont we have the structure as it installs |
09:53 | i always thought you keep cvs this way to make it DIFFICULT to install | |
09:54 | paul | tumer : I agree with you, as what I suggest is almost the install structure in fact ;-) |
09:54 | tumer | my installer makes koha ->intranet &opac |
09:54 | thd | tumer: you mean that you want CVS to be organised the same as the install? |
09:54 | paul | (I just separate htdocs & non htdocs for templates) |
09:55 | thd : yep. | |
09:55 | tumer | intranet -> cgi-bin & htdocs |
09:55 | opac -> cgi-bin & htdocs | |
09:55 | paul | tumer: i'm not sure we need this additional level, as it will just contain 2 sub dirs |
09:55 | so, 2x2 or 4 ? | |
09:55 | I think 4 is OK | |
09:55 | tumer | so separate htdocs for both opac and intranet |
09:56 | paul | yes, in my structure they are. |
09:56 | tumer | well 2_2 installer does that on windows |
09:56 | so why not have it as it should be once installed as well from the beginning | |
09:57 | thd | paul: Do you mean that your original suggestion was to have CVS organised the same as the install? |
09:57 | tumer | so even untaring it should be enough if you are not running upgrade |
09:57 | yes thats what i am saying | |
09:57 | paul | thd : yes indeed. |
09:58 | but maybe we should continue to speak of this on mailing list, to let other express their opinion | |
09:58 | tumer | k |
09:59 | thd | paul: Are you suggesting changing both CVS and the install, then, to something matching, not keeping either the same as now? |
09:59 | paul | thd : look at my mail on koha-devel, i hope it's self explanatory |
09:59 | thd | tumer: before you leave, do you use xpath in your queries? |
09:59 | paul | (at least I really hope, because it needed almost 1 hour to write !) |
09:59 | tumer | paul: i think we have not solved the utf8 problem either. We just managed to get over it |
10:00 | paul | thd: nope |
10:00 | tumer | thd: i use xsl stylesheets with xpath indexing |
10:00 | paul | hdl may be interested in this tumer, as he should play with encoding problems in the coming weeks |
10:01 | thd | tumer: so your queries are no different than without XPATH? |
10:01 | tumer | i had problems using XML::LibXML, same problem we had with MARC::File::USMARC |
10:02 | so probably we are missing something | |
10:02 | whats wrong with irc, am i on or not? | |
10:02 | thd | tumer: but you have no queries like find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm |
10:03 | tumer: you are on now | |
10:03 | tumer: I just posted a hypothetical example query from Sebastian in February | |
10:04 | tumer | thd: no i still use the old way of searching |
10:04 | i index every field that i will require | |
10:04 | even if they do not exist now | |
10:04 | thd | tumer: when do you plan to commit your abs file? |
10:04 | tumer | all 3.0 stuff |
10:05 | it will not change the way you query though | |
10:05 | thd | tumer: I understand but I want to see exactly so that I can understand perfectly what you are doing |
10:06 | tumer | i will commit when toins and paul are ready |
10:06 | thd | tumer: why wait for them? |
10:06 | tumer | that whats agreed |
10:06 | thd | tumer: I am ready :) |
10:06 | tumer | synching and so on |
10:07 | thd | tumer: what does it harm if you commit early. Oh, does that harm synching? |
10:07 | tumer | thd: yes, at the IRC meeting thats whats agreed |
10:08 | thd | tumer: would you commit your abs file to me? |
10:08 | tumer | there is no more abs file, a whole bunch of xsl files |
10:09 | thd | tumer: you use xsl files for indexing? |
10:09 | tumer | yes |
10:09 | thd | tumer: is that documented? |
10:09 | tumer | ID zebra documented |
10:10 | thd | tumer: yes, is that in the ID Zebra documentation? |
10:10 | tumer | yes thats what i said |
10:10 | thd | tumer: ok, I have not read it thoroughly enough |
10:11 | tumer | get the new one from their cvs |
10:11 | thd | tumer: let me give a good example for the trivial one I gave last night |
10:11 | tumer: maybe I am using an out of date set of documentation | |
10:12 | tumer: in authorities you can search by references and tracings to search for the authorised form using non-authorised forms | |
10:13 | tumer: currently to fill authorised headings a separate search must be done for each authorised heading | |
10:14 | tumer: currently to fill authorised headings in the search form a separate search must be done for each authorised heading | |
10:15 | tumer: that is a good careful but not extra quick way of performing searches using authorities | |
10:17 | tumer: it is also often necessary because the user may never guess the authorised heading successfully unless the user is a librarian with years of experience or otherwise especially familiar with the authorised headings needed | |
10:17 | dewey | okay, thd. |
10:18 | thd | tumer: however, there could be an option to search the authorities references and tracings directly from the search form collectively |
10:18 | tumer: so instead of building the query slowly for more than one authorised heading the user types in whatever terms come to mind | |
10:19 | tumer: the only way that would work for indexing is if the meta-record contained authorities | |
10:20 | tumer | but the bibliographic record already has 650 with authorities filled in |
10:20 | thd | tumer: this would allow finding records with the conjunction of two subject headings without knowing the precise headings |
10:20 | tumer | or 100 or 700 for that matter |
10:21 | thd | tumer: yes but the 650 100 700 only contains the authorised heading |
10:21 | tumer | what else are we looking for, enlight me |
10:22 | s/enligten/ | |
10:22 | thd | tumer: with authorities you can find authorised headings by searching the 4XX 5XX in authority records for non-authorised forms |
10:22 | tumer | or whatever |
10:22 | i see | |
10:22 | thd | tumer: so for example maize is a food plant native to North America |
10:23 | tumer: maize is no longer the authorised heading under LCSH | |
10:23 | tumer | so you want to find maize by searching corn |
10:23 | thd | tumer: yes |
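The authority record behind this example, sketched in MARCXML: the authorised heading sits in the 1XX and the unused form in a 4XX see-from tracing, which is the field thd wants searchable directly from the search form.

```xml
<record xmlns="http://www.loc.gov/MARC21/slim" type="Authority">
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Corn</subfield>       <!-- authorised heading -->
  </datafield>
  <datafield tag="450" ind1=" " ind2=" ">
    <subfield code="a">Maize</subfield>      <!-- see-from (UF) reference -->
  </datafield>
</record>
```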
10:24 | tumer: I can do that now for more than one authority by building the query one authority at a time | |
10:24 | tumer | the proxy or irc is blocking me today |
10:25 | thd | tumer: that is the slow careful good method |
10:26 | tumer: users growing up with Google are unlikely to have patience to be slow and careful most of the time | |
10:26 | tumer | i see where you are heading thd |
10:27 | thd | tumer: the system can still give people good results faster and they can still use authorities to refine their query afterwards |
10:29 | tumer: this is not quite FRSAR it is basic linking that any system should be able to do yet only Sirsi Unicorn does to my knowledge | |
10:30 | tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely | |
10:31 | tumer: this is merely an explicitly defined relationship contained in library systems records | |
10:32 | tumer: some FR*R relationships are not explicitly defined and would require something extra | |
10:33 | tumer | my experience is we are giving too many answers to the user; they prefer fewer, more precise answers |
10:33 | thd | tumer: they can always have that option |
10:33 | tumer: paul just wants to be certain that behaviour is not default for his users so as not to surprise them when the result set does not match the query terms precisely | |
10:33 | dewey | i already had it that way, thd. |
10:34 | thd | tumer: yet, most users fail to do successful subject searches because they seldom choose the correct authorised terms |
10:35 | tumer: the Google mentality is that any results are good enough | |
10:36 | tumer: library systems allow better results than Google | |
10:36 | tumer: we should not deprive users of better results | |
10:38 | tumer: if my very precise but uninformed (ignorant of the actual database content for authorised forms) query returns 3 bib records I may or may not be satisfied. | |
10:39 | tumer: if the best record for addressing some problem that I am trying to solve is in a larger set with 10 records then I need the 10 record result set and 3 records was insufficient | |
10:41 | tumer: subject searches with a small number of query terms in a large collection tend to give much larger result sets and the user needs help from the system. | |
10:42 | tumer: do you not see that as a significant advantage for every user as long as the user has the option of turning the behaviour on or off for the query | |
10:42 | ? | |
10:43 | tumer | well it is an advanced system |
10:43 | thd | tumer: Koha is already an advanced system |
10:44 | tumer: what is difficult about adding copies of authorities to meta-records | |
10:44 | tumer | i did not say difficult, its feasible |
10:45 | thd | tumer: paul has had the references and tracings working for building queries slowly for a couple of years |
10:45 | tumer: what is your hesitation over feasible? | |
10:46 | tumer | why dropped out? |
10:46 | thd | tumer: paul has had the references and tracings working for building queries slowly for a couple of years |
10:46 | tumer | having too big a record to index is my concern |
10:46 | thd | tumer: do you think that would be a performance problem? |
10:47 | tumer | thats the only concern i have |
10:48 | thd | tumer: that is also my concern which is why I asked Index Data about the efficiency of XPATH indexing |
10:48 | tumer: they have not responded | |
10:49 | tumer | they have answered before saying its slow on indexing |
10:49 | thd | tumer: I did not ask my question of Index Data correctly the first time |
10:50 | tumer: maybe there could be a supplementary database with larger meta-records which was slower to index | |
10:50 | tumer | you could not do this with existing version of zebra anyway. only forthcoming zebra |
10:51 | thd | tumer: why, what is the problem? |
10:51 | tumer | no meta-record indexing was possible |
10:51 | thd | tumer: Does your meta-record work? |
10:52 | tumer | yes but with cvs zebra |
10:52 | thd | tumer: so I understand that this only works in CVS now |
10:52 | tumer | not released yet |
10:52 | yes | |
10:53 | thd | tumer: yet, It is intended for release in due course |
10:53 | tumer | hope so |
10:53 | thd | :) |
10:54 | tumer: what would be wrong with having a supplementary database of records which was slower to index | |
10:54 | tumer | i was thinking along the smae lines |
10:54 | same | |
10:55 | thd | tumer: those could be updated by a batch process while the smaller records were updated in real time |
10:55 | tumer | correct |
10:55 | thd | tumer: well then we are thinking along similar paths |
10:56 | tumer | i even slowed my realtime updating to within 2 minutes, safer on zebra db |
10:57 | thd | tumer: Joshua was willing to rephrase my XPATH indexing question |
10:58 | tumer | i already answered your xpath question i thought |
10:58 | thd | tumer: we should have some answer from Index Data about how to get maximum performance from XPATH |
10:58 | tumer | i index xpath with xslt stylesheets |
10:58 | thd | tumer: my question is really about whether shorter XPATHs make a difference in performance |
10:59 | tumer | at indexing it makes it faster not for searching |
10:59 | thd | tumer: yet that is important |
11:00 | tumer | but having said that if you xpath everything then i think it will be slow and cumbersome |
11:00 | thd | tumer: we might be able to design meta-records with shorter XPATHs |
11:00 | tumer: do you not XPATH everything now? | |
11:01 | tumer | i have the shortest path to bibliographic record keeping in sync with marc21 |
11:01 | no i do not | |
11:01 | i do not index everything | |
11:02 | similar to record.abs | |
11:02 | thd | tumer: so you still have elem 100$a sometimes? |
11:02 | tumer | i choose what to index |
11:02 | similar structure | |
11:02 | i choose which paths to index | |
11:02 | and only those | |
11:04 | thd | tumer: I do not quite understand what you mean by choosing which paths to index except as opposed to indexing every arbitrary and unneeded path |
11:04 | tumer | so its a hybrid xpath indexing. xpath only allows me to index the same datafields with different indexes |
11:04 | 001 bibliographic for biblionumber, 001 holdings for itemnumber etc | |
11:05 | cause they have different xpaths | |
11:05 | thd | tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster? |
11:06 | tumer: so if the same datafield is not being indexed you use the devel_week method of elem 100$a or whatever because it is faster? | |
11:06 | dewey | i already had it that way, thd. |
11:06 | tumer | yes but elem 100$a does not distinguish different datafields, this does |
11:08 | i do not use xpath-enabled indexing, it is slow and so says the ID documentation | |
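For contrast, the route tumer says he avoids is Zebra's generic XPath indexing, switched on with a single directive in the GRS-1 .abs file; per the Index Data documentation it indexes every element path, which is what produces the bigger, slower indexes he describes.

```
# record.abs fragment: generic XPath indexing in the GRS-1 filter.
# Indexes all paths (slow, large indexes); tumer instead lists only
# the chosen paths in his XSLT stylesheets.
xpath enable
```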
11:08 | thd | tumer: ahh, so your xslt method is much faster |
11:09 | tumer | yes |
11:09 | 100K metarecords less than 10 min | |
11:10 | thd | tumer: I had assumed the slowness for xpath enabled was a function of allowing xpath in queries |
11:11 | tumer | no it is slow in indexing not in retrieval |
11:11 | thd | tumer: Marc wrote that you could speed things up with xpath enabled by indexing xpaths |
11:12 | tumer | xpath enabled makes bigger indexes and is slow |
11:12 | but he suggested xelem which does not exist | |
11:12 | not out of the box, pay ID and they will write it | |
11:13 | thd | tumer: yes, Sebastian corrected him. But I misunderstood what out of the box meant. |
11:14 | tumer | if there is nothing else i have to go for dinner now |
11:14 | thd | tumer: would you zip or tar your xslt files so that I can see them? |
11:15 | tumer | one sec |
11:15 | thd | tumer: I would like to see everything related to how you are indexing now |
11:16 | tumer: I want to understand this perfectly | |
11:17 | tumer | http://library.neu.edu.tr/koha[…]ce/koha2index.xsl |