IRC log for #koha, 2006-03-05


All times shown according to UTC.

Time Nick Message
11:00 pierrick_ OK, I try your documentation and tell you later where I'm blocked ;-)
11:00 kados sounds good :-)
11:01 (of course, you must replace references to sourceforge with savannah)
11:01 pierrick_ (I've already done the correct checkout)
11:02 |hdl| pierrick_: for zebraserver, you have to change the call : zebrasrv localhost:2100/yourbase
11:02 rebooting
11:04 kados morning owen
11:04 owen Hi
11:05 kados owen: this afternoon I'll be setting up the new Koha server for NPL
11:45 |hdl| hello again.
11:45 hehe
11:45 paul it works!
11:47 |hdl| I'm in Swiss French, but in UTF-8 :)
12:00 kados :
12:00 DBD::mysql::db do failed: Can't create table './kohazebra/#sql-cf8_d.frm' (errno: 150) at ../../updater/updatedatabase line 973.
12:01 kados |hdl|: what is this caused by?
12:02 |hdl| my $sql="alter table $table ADD FOREIGN KEY $row->{key} ($row->{key}) REFERENCES $row->{foreigntable} ($row->{foreignkey})";
12:02                        $sql .= " on update ".$row->{onUpdate} if $row->{onUpdate};
12:02                        $sql .= " on delete ".$row->{onDelete} if $row->{onDelete};
12:02
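A hedged Python rendering of the statement that the Perl lines above assemble, handy for printing the exact SQL and running it by hand. MySQL reports errno 150 when InnoDB cannot create a foreign key constraint, commonly because the referenced table is not InnoDB or the referenced column is unindexed or of a different type. The table and column names in the example call are illustrative assumptions, not taken from updatedatabase.

```python
def build_foreign_key_sql(table, row):
    """Build the same ALTER TABLE statement as the Perl snippet above."""
    sql = (
        f"ALTER TABLE {table} ADD FOREIGN KEY {row['key']} ({row['key']}) "
        f"REFERENCES {row['foreigntable']} ({row['foreignkey']})"
    )
    # The optional clauses are appended only when they are set, as in the Perl code.
    if row.get("onUpdate"):
        sql += " ON UPDATE " + row["onUpdate"]
    if row.get("onDelete"):
        sql += " ON DELETE " + row["onDelete"]
    return sql


# Hypothetical row, for illustration only.
example_row = {
    "key": "biblionumber",
    "foreigntable": "biblio",
    "foreignkey": "biblionumber",
    "onDelete": "CASCADE",
}
statement = build_foreign_key_sql("items", example_row)
print(statement)
```

Printing the statement before executing it makes it easier to see which table pair triggers the error.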
12:02 kados you have mysql 4.1?
12:02 |hdl| yes
12:02 paul mmm... sounds like a mysql corruption
12:02 (or permission)
12:03 kados maybe myisamchk?
12:04 |hdl| is not a Myisam DB :/
12:05 InnoDB :)
12:06 only bibliothesaurus is MyISAM
12:07 paul don't worry :
12:07 drop table bibliothesaurus !
12:07 it's useless
12:17 |hdl| paul, kados: did you know convmv?
12:17 convmv is a Perl utility which converts filenames to a given character set; in my case UTF-8
12:43 and get it into zebra..
12:47 kados I didn't know of convmv
12:48 but hdl, the latest MARC::File::XML will automatically convert your MARC records to utf-8
12:48 and since it's the primary way we currently bulkmarcimport this should already happen
12:49 I am 90% sure that the utf-8 problems are in fact just the system locale
12:49 though I haven't had a spare moment to test this
12:58 |hdl| I mentioned this utility because it is written in Perl and deals with encoding.
13:06 kados |hdl|: try changing your system locale to utf-8
13:06 |hdl|: i suspect it will work perfectly after this
13:06 |hdl| I did.
13:06 kados still not working?
13:06 |hdl| Still my warning: Wide character in print.
13:07 kados why does it work fine on koha.liblime.com?
13:07 |hdl| when using rebuild_zebra_idx.pl
13:07 kados using rel_2_2
13:07 the only difference that I can tell is that head uses utf-8 encoding in mysql
13:07 |hdl| Americans do not have the same fancy characters as the French :)
13:07 kados and I don't know why that is necessary
13:08 |hdl|: but it works with chinese or french in koha.liblime.com
13:08 |hdl|: try it yourself
13:08 |hdl|: add a new branch with a french name (with fancy characters)
13:09 http://koha.liblime.com/cgi-bi[…]admin/branches.pl
13:11 |hdl| It is not a problem with the database, nor with the characters in the database.
13:12 It is when writing UTF-8 files on disk for import into zebra :)
13:12 In my case, at my point :)
13:16 kados back
13:19 |hdl| open F,">:utf-8","$filename";
13:26 no.
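Perl prints "Wide character in print" when a string containing characters above code point 255 is written to a filehandle that has no encoding layer; in Perl the layer is normally spelled `:utf8` or `:encoding(UTF-8)`. The same idea as a minimal Python sketch: state the encoding explicitly when opening the file instead of relying on the system locale. The file name and sample text are made up for illustration.

```python
import os
import tempfile


def write_utf8(path, text):
    """Write text to path as UTF-8, independent of the system locale."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


# Hypothetical export file and content.
sample_text = "<title>Bibliothèque française</title>\n"
sample_path = os.path.join(tempfile.mkdtemp(), "biblio.xml")
write_utf8(sample_path, sample_text)

# Read the raw bytes back to confirm what actually landed on disk.
with open(sample_path, "rb") as handle:
    raw_bytes = handle.read()
```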
13:28 kados writing from where?
13:28 what are you doing exactly?
13:33 paul update_zebra_idx
13:34 to generate XML biblios in files, to be able to reindex with: zebraidx update biblios/
13:40 kados why would you want to run zebraidx from the command line?
13:41 if biblios already exist in Koha, it should just be a matter of exporting as MARC::Record, converting to a MARC::File::XML object and indexing directly in zebra
13:41 no need to write the files to disk
13:42 paul kados : |hdl|tried that because the zoom update_zebra was very slow.
13:42 kados right
13:42 paul maybe he should try the new one (and copy your zebra.cfg improvements to unimarc)
13:42 kados i will commit the new Context.pm
13:42 paul (like shadow register)
13:43 kados should be 5 times faster
13:43 but still quite slow in fact
13:43 we are still working on this
13:43 (shadow registers are probably required in fact)
13:45 committed
13:45 my load tests indicate that the slowest part of the import
13:46 is converting from MARC::Record to MARC::File::XML
13:46 especially the conversion from MARC-8 to UTF-8
13:46 paul which xml parser do you use ?
13:46 (as the pure perl one is really slow :-( )
13:46 kados I only use MARC::File::XML which uses MARC::Charset
13:47 can I change the default xml parser somehow?
13:49 paul I installed another one (search for a mail about this on koha-zebra, or zebra, or perl4lib, from me) and it has been automatically chosen
13:50 kados ok ... I will
13:53 paul holidays / news committed on HEAD
13:55 kados great! I will check it out asap
13:58 paul (wait a minute, some missing files)
14:06 kados thd-away: are you present?
14:11 |hdl| kados: where are your zebra improvements?
14:13 kados |hdl|: just to Context.pm
14:13 |hdl|: commented out the 'search' when checking for a Zconn
14:14 paul_away bye & see you on monday
14:14 |hdl| bye paul_away
14:14 kados bye paul_away
14:14 osmoze i m following paul, bye all
14:27 |hdl| Why is there always Record 0 Type XML ? Is this normal ?
14:28 in zebrasrv log.
14:31 kados I don't know
14:31 |hdl|: I can't remember if I tested rebuild_zebra.pl with the new subroutines
14:31 |hdl|: I will check
14:32 |hdl| seems it works. But I am waiting for the results.
14:32 kados it does not use new routines
14:33 in fact, it should be fixed
14:33 I will do so and commit immediately
14:33 pierrick_ I've encountered many problems installing ZOOM (because the available version was not compatible with my yaz version) and fewer than 50% of the tests passed during "make test" :-/ I've run "make install" to finish the installation. I haven't done the symbolic links yet. I'll do them on Monday (or this weekend if I feel like working on Koha ;-) See you on Monday, enjoy your weekend
14:34 kados pierrick_: you will need the most recent versions of yaz and zebra
14:34 pierrick_: and the tests will fail
14:34 pierrick_: because currently the test server that it runs queries on is down
14:35 pierrick_: but don't worry, it works fine
14:35 pierrick_ (I realised that and downloaded the last version of yaz, ZOOM and zebra and compiled them from source)
14:35 kados pierrick_: you MUST upgrade to latest version of yaz and zebra for perl-ZOOM to work
14:35 pierrick_: if you run debian this is quite easy
14:35 pierrick_: as index data maintains a deb repo
14:36 pierrick_ I don't really run Debian at work (because my laptop graphic chipset was not very well recognized), so I installed Ubuntu
14:37 ... installing from source is not a problem. updating may be one...
14:38 kados pierrick_: I should be around most of the weekend if you need help
14:38 pierrick_: Monday I'll be in and out
14:38 pierrick_: but here for the Koha meeting
14:38 pierrick_: (will you attend it?)
14:38 pierrick_: (it's great to have you on board btw!)
14:38 pierrick_: (I look forward to working with you!)
14:38 pierrick_ (Paul told me it was at 21h, French time)
14:39 (I hope Erwann, my 7-month-old son, will be sleeping)
14:40 (very happy to work on the project, meeting with Paul was great, I learned many things at once)
14:41 bye :-)
15:06 thd kados: I am present again
15:07 kados thd: I'm working on a opac authorities search
15:07 http://opac.liblime.com/cgi-bi[…]thorities-home.pl
15:07 still quite early in the dev :-)
15:08 my goal is to get it to be as functional as authorities.loc.gov
15:08 thd kados: did you understand the problem that I had identified with subjects in MARC 21 that could need multiple $9s ?
15:08 kados not fully
15:09 I am currently trying to expand the thesaurus frameworks
15:09 thd kados: Did you read the full MARC 21 authority doc?
15:10 kados not all of it, but much of it
15:10 thd kados: What are you adding to the thesaurus?
15:10 kados leader and fixed fields :-)
15:11 (I hope they are supported)
15:11 thd kados: Oh, I had imagined that you were adding columns to the controlling table
15:11 kados do we need more columns?
15:12 thd kados: We do for all frameworks to work very well but that is not a problem for minimal working.
15:13 kados I'd like to see how far we can take the current model
15:13 so that 2.2.6 at least supports what Koha is capable of now
15:13 thd kados: It would still be the current model but we should get to minimal working first
15:13 kados of course, once I understand what it can currently do and what the limitations are, that will enable me to decide on a path for 3.0
15:13 thd with the current columns if possible
15:14 kados sounds like a plan
15:14 http://authorities.loc.gov/cgi[…]03130510&PID=3574
15:14 I'm going to add that record to Koha
15:14 and see what I can do with it
15:14 but first I need to add the frameworks support
15:15 then, maybe tomorrow, I will begin working on bulkauthimport.pl
15:17 thd kados: how would you get an external record in without bulkauthimport.pl doing something?
15:18 kados i will hand code it :-)
15:18 well ... copy/paste
15:19 thd: in your opinion, should the tag 003 be marked mandatory? or should the subfield @ be marked mandatory?
15:19 thd kados: I cannot seem to get LC to open it for me.  What is in the 1XX for that record?
15:19 kados 953716
15:20 thd kados: if the field is a control field then both should necessarily be mandatory.
15:21 kados: I think that you gave me the 001 but what is the 1XX?
15:21 kados Lewis, C. S.
15:21 sorry
15:25 thd: is the marc21 leader plugin going to work for authorities too?
15:25 thd kados: well, with modification
15:25 kados what modification will I need to make?
15:26 thd kados: 000/06-07 at the very least are not the same
15:26 kados ok ... I'll create a new plugin right now
15:27 thd kados: you do not need to know the media type of a person or a concept
15:28 kados: the starting point should be the bibliographic plugin
15:28 kados yep
15:32 I wonder why a tag would ever be given an authorized value
15:41 owen-away: when you get back ... I'm still having trouble adding an authority type using the npl templates
15:51 thd kados: the default should be 000: #####nz##a22#####o##4500
15:52 where # is blank and ought to be filled by MARC::Record for most cases.
15:52 s/cases/positions
15:52 kados: http://www.loc.gov/marc/authority/ecadldrd.html
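The default leader quoted above can be checked mechanically. This is a small sketch, assuming only what the log states: `#` stands for a blank, and the blanked length and base-address positions are filled in later by the record library. The function name is hypothetical.

```python
LEADER_TEMPLATE = "#####nz##a22#####o##4500"  # template quoted in the log


def default_authority_leader(template=LEADER_TEMPLATE):
    """Turn the '#' placeholders into blanks and check the fixed length."""
    leader = template.replace("#", " ")
    if len(leader) != 24:
        raise ValueError("a MARC leader is always 24 characters long")
    return leader


leader = default_authority_leader()
# Position 05 is the record status and position 06 the record type
# ('z' marks authority data, which is where it differs from a bibliographic leader).
print(repr(leader), leader[5], leader[6])
```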
15:52 kados k
15:53 thd kados: The authorised values that you were most likely looking at were for the indicators
15:53 kados: That is something where more columns are needed
15:54 kados why?
15:54 http://koha.liblime.com/cgi-bi[…]etail.pl?authid=5
15:54 I have cataloged it to the best of my ability given the limitations in the current auth editor
15:55 (seems it's in an even worse state than the MARC editor)
15:55 thd kados: More columns to support separation of the indicators, and plugins instead of merely an authorised value list collectively for both independent indicators
15:55 kados: No one uses it therefore no fixes have been applied
15:55 kados I will fix it :-)
15:56 ok, so now I will find all books by C.S. Lewis in this collection
15:56 and change the author to use the authorized value
15:57 thd kados: paul uses building from the bibliographic record and no one has the time to create their own references and tracings, especially on a buggy editor.
16:00 kados: remember that the authorised value should be $a Lewis, C. S. $q (Clive Staples), $d 1898-1963  not merely $a Lewis, C. S.
16:00 kados ?
16:01 I believe that's what I entered
16:01 http://koha.liblime.com/cgi-bi[…]etail.pl?authid=5
16:01 thd kados: I was just reminding you about the $a limitation currently.
16:01 kados or do you mean that the thesaurus plugin does not currently put in the $q and $d?
16:01 ahh ... right
16:01 so I'll need to fix that
16:02 there is so much to fix :/
16:02 how can any library use this?
16:02 it baffles me :-)
16:03 so first, can tag 100 in a bib record have a $q and a $d?
16:03 thd kados: remember, they are using this where one library is not even using 200 which is the UNIMARC equivalent of 100
16:04 kados ok ... $q and $d added to default MARC framework on koha.liblime.com
16:05 thd kados: that library is only entering the statement of responsibility in the title field, which is not supposed to be an authority controlled field.
16:05 kados thd: it works!
16:06 thd: http://koha.liblime.com/cgi-bi[…]etail.pl?bib=1400
16:06 it filled in $q and $d automatically
16:06 !!!
16:06 wohooo!
16:06 this IS a nice feature :-)
16:06 I can't wait to show it to a client :-)
16:06 but first we must prettify it :-)
16:07 thd kados: $q and $d are not the only considerations
16:07 kados of course not
16:08 thd kados: you should not necessarily have it fill every subfield, some could be special
16:09 kados: sorry I wrote that incorrectly
16:09 kados thd: 100$a should 'search also' 100$q right?
16:10 thd kados: you should have it fill every subfield but you may need protection for the value returned used in searches except that it would find $9 and not matter.
16:11 kados: yes it should but it does not matter if $9 is there and populated.
16:11 kados: oh except that the OPAC is not checking authorities.
16:12 kados it can now
16:12 well ... it will very soon
16:12 thd kados: so until the OPAC always uses authorities for every Koha install those other things are important
16:12 kados http://opac.liblime.com/cgi-bi[…]thorities-home.pl
16:13 do a search on Lewis
16:13 yay, it's working! :-)
16:15 thd kados: you have not adjusted the code enough for the OPAC
16:16 kados of course not
16:16 thd kados: obviously the edit authority record should become view authority record in MARC
16:16 kados yep
16:17 don't worry, it will :-)
16:17 thd kados: but the 6 biblio link must still be linked to the intranet file
16:17 kados: I get a file not found for search.pl
16:18 kados 'Used in' is working now
16:18 owen: http://opac.liblime.com/cgi-bi[…]thorities-home.pl
16:18 owen: search for 'Lewis'
16:19 owen Excellent. Summary and 'used in' both working!
16:20 thd owen: Koha may surpass Sirsi while you blink :)
16:21 kados I chose CS Lewis because that's such a problematic search currently
16:23 thd kados: There are much more problematic ones than him
16:24 owen Funny... the authority search is basically a 'browse' search.
16:24 thd kados: He at least has an English name
16:24 kados: Variant name transliterations are where the worst problems happen without authorised names
16:25 kados owen: it introduces something we don't have currently in collections like NPLs: a relationship between different records
16:26 owen: once you have an authorities catalog you can start doing all your author searches using the authorities search
16:26 owen: same with subjects
16:26 owen: so the patron then selects a given author/subject and then they get all the results that have that exact one
16:27 thd kados: Your collections do not seem to be old enough to see what has happened to transliterations of famous Russian names over time.
16:28 owen thd: we have problems enough with names like John Le Carre, with or without the accent on the E
16:34 kados w00t!
16:34 'View Authority Record' now working
16:35 http://opac.liblime.com/cgi-bi[…]thorities-home.pl
16:42 owen ?
16:55 kados owen: not working for you?
16:55 owen Where is view authority record?
16:56 kados search for Lewis
16:56 needs major template work :-)
16:56 owen Oh, okay. I swear it still said 'edit' last time I looked :)
16:57 How does this authority stuff fit into the general scheme of the opac?
16:59 kados well it should eventually be on the main search page
16:59 as a search type
16:59 http://authorities.loc.gov/
16:59 that's the goal
17:00 also, I think we'll need a syspref for 'OpacSubjectAuthorities' and 'OpacNameAuthorities'
17:00 those will let us turn on/off a true authorities search when clicking on a Name in the OPAC
17:00 rather than just a normal author search as is done now
17:01 (I think in fact that people are surprised when they click on the name now and get results for items without that name)
17:03 bbiab
17:04 thd kados: authorities.loc.gov is useful but left anchored searches are a big limitation
18:19 owen Long drive ;)
18:20 kados heh
18:20 well I've been here a while actually
18:20 having some problems with installing the server
18:24 thd: you consorting with the enemy? :-)
18:25 thd kados: yes
18:26 kados: I cannot get an answer from LC CDS for two weeks now but I want to solve this well
19:04 kados: did you see the problem that I was describing on code4lib?
19:05 kados: I mean the subject authority problem
19:09 kados no I didn't
19:10 thd: I'll read my log
19:13 thd kados: more authority records at http://batman.gsu.edu/pineshel[…]nes-authority.mrc
19:13 kados thd: I don't see any discussion of problems with subject authorities
19:13 thd: maybe I didn't read back far enough
19:13 thd: yea got that already :-)
19:13 thd kados: busy channel
19:13 kados yep
19:14 this:
19:14 I am having trouble finding subdivisions for subject authority records from http://authorities.loc.gov .  I find no 180 fields for example only 150.
19:14 edsu: I want to find the records with 180 etc. as opposed to 150.
19:15 thd kados: that and posts before and after
19:15 kados right
19:15 I don't quite comprehend the problem
19:15 thd kados: mostly after
19:15 kados but I think I see your goal
19:16 currently Koha's authorities system will only allow you to fill values within a single tag
19:16 unless I'm wrong
19:16 thd kados: the problem that I did not relate on code4lib is just what you said.
19:17 kados cool
19:17 I don't think it will be that hard to fix that
19:17 thd kados: With repeatable $9 ?
19:17 kados the trick will be how to map which tags in the auth record match which tags in the bib record
19:18 one way to do it is to hard-code the mapping in the thesaurus plugin
19:18 this would probably be the quickest solution
19:18 thd kados: it is multiple authority records for a single bibliographic field with subdivided subjects
19:18 kados but long-term I think we would want a way to easily configure it
19:18 I don't really understand subjects in marc
19:21 so what possible tags in the bib records should get their values from the authority record for subjects?
19:21 ie, what is the mapping?
19:21 thd kados: Subjects can be subdivided. in MARC 21 as in UNIMARC.
19:22 kados: There is a difference in the definition for authority records relating to subjects between MARC 21 and UNIMARC.
19:24 kados: MARC 21 seems to specify different authority record types for various subdivisions of the main subject while UNIMARC has just one type of subject authority record no matter how a subject may be subdivided.
19:25 kados so given 650  #0$aArchitecture$zIllinois$zChicago$xHistory$vPictorial works.
19:27 kados: there ought to be a heading topical term authority using $a in 150
19:27 kados: that much is trivial
19:28 kados hmm
19:30 thd kados: then the two $z would be repeated geographic subdivisions in a 150 or maybe two separate 150 authority records.
19:31 if they were two separate then $z would not be repeated within the 181
19:31 s/150/181/ on the line before
19:37 kados: the general subdivision in $x would be in a separate 180 authority record that might include the $y as well, otherwise one more authority record is needed for the $y in a 155
19:41 kados: although I do see that the 155 examples have it all in one such as 155
19:41 ##$aDictionaries$xFrench$y18th century
19:44 kados: how do I search a MARC file for the presence of records containing a particular field?
19:48 kados use dumpmarc.pl
19:48 it's in misc/ directory
19:48 dumpmarc.pl -file marc.mrc |more
19:49 then you can use the / to search once it's there
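An alternative to paging through the dump is to read the directory of each raw record. This is a minimal sketch based on the ISO 2709 layout that MARC files use (24-byte leader, 12-byte directory entries, 0x1E field terminator, 0x1D record terminator); it is not part of Koha, and the function names are hypothetical.

```python
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12      # 3-byte tag + 4-byte length + 5-byte offset
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def tags_in_record(record):
    """Return the field tags listed in one raw record's directory."""
    directory_end = record.index(FIELD_TERMINATOR, LEADER_LENGTH)
    directory = record[LEADER_LENGTH:directory_end]
    return [
        directory[i:i + 3].decode("ascii")
        for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH)
    ]


def records_with_tag(data, tag):
    """Yield (record number, raw record) for records that contain the tag."""
    for number, record in enumerate(data.split(RECORD_TERMINATOR), start=1):
        if len(record) <= LEADER_LENGTH:
            continue  # skip the empty chunk after the last terminator
        if tag in tags_in_record(record):
            yield number, record
```

Typical use would be to read a file in binary mode and call, for example, `list(records_with_tag(data, "180"))` to see whether any record carries a 180 field.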
19:52 thd ok
19:52 kados: I may have been looking in the wrong place
19:53 kados: I know MARC bibliographic well but I have not spent many years looking at authority records
19:53 kados: the answer may be in 7XX rather than 1XX
20:08 kados: The question is how to find the correct authority records to match a subdivided subject like the example and I think I know now.
20:13 kados: there is also the search help qualification that I had guessed for authorities.loc.gov  "This release does not include ... Search access to form, genre, and topical subject subdivisions"
21:34 kados: are you around?
21:55 kados thd: I am now
21:55 thd: just got back from dinner
21:56 thd kados: the pines records do not have the right data for subject subdivisions
21:56 kados: Do you have any from your authorities client?
21:57 kados let me see
21:58 here is an example:
21:58 NUMBER 28 =>
21:58 LDR 00102nz   2200037o  4500
21:58 150  0 _aChildren
21:58       _xPreparation for medical care
21:58       _xJuvenile literature
21:59 another:
21:59 NUMBER 32 =>
21:59 LDR 00089nz   2200037o  4500
21:59 150  0 _aWitchcraft
21:59       _zTexas
21:59       _zTexas Hill Country
21:59       _xFiction
21:59 another:
21:59 thd kados: records with type 180, 181,182,185
21:59 kados NUMBER 36 =>
21:59 LDR 00139nz   2200037o  4500
21:59 151  0 _aWest (U.S.)
21:59       _xSocial life and customs
21:59       _xStudy and teaching
21:59       _xActivity programs
21:59       _vJuvenile literature
21:59 ahh
21:59 I see no 180s
22:00 thd kados: although I see $x and $v
22:00 kados I still don't fully understand subjects in MARC :-)
22:00 thd kados: and $z
22:00 kados yep
22:01 and repeated $x
22:01 though I don't really understand the significance of this
22:01 or what the expectations are for how an ILS will treat them
22:03 thd kados: I understand LCSH reasonably well for bibliographic records but I have a gap for how that applies perfectly well to authority files.
22:05 kados: There is someone from Canada on the autocat list who knows systems.
22:05 kados: These questions need to be asked Monday through Wednesday to get a good answer
22:07 kados: Very few to know systems actually do what we are intending so knowledge about this is scarce.
22:07 s/know/no/
22:09 kados: This really could be a great leap forward from how even the most sophisticated systems manage authorities.
22:09 kados cool
22:10 I've got about 20 minutes of work to do setting up a new cvs repo for openncip
22:10 then i intend to spend the rest of the evening on authorities
22:10 so bear with me for a bit
22:10 thd kados:I will fetch food
22:23 kados hey rach
22:23 good to hear from you
23:57 thd kados: is your cvs server merely down or are you struggling with how Debian has packaged cvs files?
23:58 s/files/related software/
00:07 kados I just can't get the server to bind to the proper port
00:09 thd kados: Are you trying to change from the default port?
00:11 kados nope
00:11 s'all good ... I'll just register a new project at SF
00:11 i've wasted too much time on it :-)
00:11 ok ... so ... authorities
00:12 thd kados: Is it as easy to register at Savannah?
00:13 kados I did ... but that was about two weeks ago
00:13 and I haven't heard back :/
00:13 ok ... first thing I'm going to do is commit my authorities work thus far
00:13 thd I guess that means it is much more difficult
00:15 kados: Why not just create a test tag in the Koha cvs?
00:15 kados cause it's for openncip
00:19 thd kados: I had confused your authorities work thus far with openncip
00:21 will be back shortly
00:26 kados k
01:07 thd kados: so I found what may be part of the solution for subject authorities
01:08 kados: are you still there?
01:11 kados: It is necessarily still multiple authorities but I have seen an applicable authority for my example 650
01:35 kados thd: I'm here
01:35 thd: been doing some cleaning in rel_2
01:35 so what is the solution (and first, in simple language, what is the problem?)
01:38 thd so it requires subdividing the 1XX
01:38 kados: chopping it right in the middle
01:42 kados I don't understand what you mean and I don't understand the problem fully
01:44 there are apparently 7 types of authority records:
01:44 http://www.itsmarc.com/crs/Auth0077.htm
01:46 thd kados: now I am off phone
01:46 kados though I do see that LOC has 'Title' authorities
01:46 but I don't see that on the MARC Authorities pages of the Cataloger's reference
01:47 thd kados: uniform titles and series
01:47 kados thd: do you know why LOC has only four 'types'?
01:47 ahh ... types of headings
01:48 thd kados: by LOC you mean authorities.loc.gov ?
01:48 kados yep
01:48 so should Koha support all types of headings?
01:48 thd kados: there is also the search help qualification that I had guessed for authorities.loc.gov  "This release does not include ... Search access to form, genre, and topical subject subdivisions"
01:49 Koha should support authorities completely.
01:49 kados we currently don't distinguish between established and unestablished headings
01:49 i don't think
01:50 thd kados: by established you mean authority controlled and not authority controlled?
01:51 for established and not established?
01:53 kados: what are you distinguishing with established and unestablished?
01:54 kados: authority control applied, NACO heading used, authorised form used, or something else?
01:55 kados Established heading: A heading that is authorized for use in other MARC records as a main entry (1XX), added entry (700-730), or series added entry (440 or 800-830) field or as the lead element in a subject access (600-655; 654-657) field.
01:55 Unestablished heading: A heading that is not authorized for use in other MARC records as the lead element of a main, added, series, or subject access field. An unestablished heading may be a reference to a variant form of the established heading, a form of the heading used only for authority file organizational purposes, or a subject subdivision that is authorized for use with an established heading in an extended subject heading.
01:55 http://www.itsmarc.com/crs/auth1532.htm
01:59 is the unestablished heading where the 'see also' comes from?
02:00 thd kados: I believe that is a distinction between a heading conforming to the cooperative authorities database maintained by NACO for most AACR2 users and a local system heading
02:01 kados but it says that 'An unestablished heading may be a reference to a variant form of the established heading'
02:03 thd kados: well can you find an unestablished heading in records that you have?
02:03 kados no
02:03 thd kados: did you search for one already?
02:06 kados no :-)
02:08 I wouldn't know how to look for that
02:15 thd I am trying to construct the regex for the MARC dump but it stops one character past the 008
02:15 kados: does vim not use greedy matching?
02:17 kados: I already have my 17k copy of pines authorities open but you could use grep
02:18 kados: or awk or Perl if you like
02:23 kados: in any case you want to find the tenth position past ^008 and then [find whatever the confused documentation claims is correct]
02:23 the tenth position is 008/09
02:26 kados: well reading the documentation more closely, using the heading as the lead element is an established heading.
02:27 kados: searching 008 is no help in that case since about every possibility applies
02:27 kados: established headings start with $a
02:28 kados: unestablished headings do not start with $a
02:29 kados: The lead element used for an authorised heading is always $a
02:32 kados: I had forgotten to escape my \+ in vim.  Why cannot every program agree on the one true regex standard?
02:34 kados: If you have the following regex in your MARC dump then you have an unestablished heading
02:36 kados: ^[14]\d\d....[^a]
02:37 kados: that matches nothing in the pines authority file
02:38 kados: oops should have been ^[14]\d\d...._[^a]
02:38 typo
02:39 as I originally typed it also matched nothing
02:40 kados: you do however have extended subject headings
02:40 kados is that grep ^[14]\d\d...._[^a] AUTH.mrc ?
02:41 if so I have none
02:42 thd kados: yes that regex is grep compatible
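The corrected pattern can be tried against dump lines in a few lines of Python. This sketch assumes the text layout shown in the dumps earlier in this log (tag, indicators, then `_` plus the subfield code); the 185 sample line is a made-up illustration. With plain grep the pattern is passed as a bare argument without slashes, and `[0-9]` is the portable spelling of `\d`.

```python
import re

# thd's corrected pattern: a 1XX or 4XX field whose first subfield is not _a.
UNESTABLISHED = re.compile(r"^[14]\d\d...._[^a]")


def unestablished_lines(dump_text):
    """Return the dump lines whose heading does not start with subfield a."""
    return [line for line in dump_text.splitlines() if UNESTABLISHED.match(line)]


sample_dump = "\n".join([
    "150  0 _aWitchcraft",
    "      _zTexas",
    "151  0 _aWest (U.S.)",
    "185    _vPictorial works",   # hypothetical subdivision record
])
print(unestablished_lines(sample_dump))
```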
02:44 kados: although do you not need grep /regex/ AUTH.mrc with the '/' for using a regex instead of a string match?
02:51 kados: this will find your extended subject records with a vim regex ^[147]\d\d...._a[^_]\+\n[^_]\+_[vzxy]
02:52 kados: grep probably does not need escaping the \+ but I do not know if it can match across the newline
02:54 kados: there is an easier search for 008 that should match those in 1XX
03:03 kados can you find any matches for ^008..............[^a]   ?
03:04 kados: I have no matches for the previous regex they are all for ^008..............[a]
03:05 kados: which means that every record is an established heading record
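The 008 test can be made a little more informative by naming the code found at 008/09. This is a sketch under two assumptions: the dump prints control fields as the tag followed by five spaces (which is what the fourteen dots in the regex imply), and the code table below reflects the MARC 21 authority format's "Kind of record" values as best recalled, so it should be checked against the LC documentation before being relied on.

```python
# 008/09 "Kind of record" codes (assumed; verify against the LC documentation).
KIND_OF_RECORD = {
    "a": "established heading",
    "b": "untraced reference",
    "c": "traced reference",
    "d": "subdivision",
    "e": "node label",
    "f": "established heading and subdivision",
    "g": "reference and subdivision",
}


def kind_of_record(dump_line):
    """Describe 008/09 for an 008 dump line, or return None for other lines."""
    if not dump_line.startswith("008"):
        return None
    # 3 characters of tag + 5 spaces + positions 00-08 puts position 09 at index 17.
    code = dump_line[17:18]
    return KIND_OF_RECORD.get(code, "unknown code %r" % code)


# Hypothetical 008 line in the assumed dump layout.
print(kind_of_record("008     860211n| azannaabn"))
```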
03:07 kados: are you still awake?  I can tell you how this would work in my 650 example with architecture in Chicago
03:08 kados I'm here and listening
03:08 thd kados so given 650  #0$aArchitecture$zIllinois$zChicago$xHistory$vPictorial works.
03:09 kados: If we want to create that record ...
03:10 kados: we search for architecture history
03:10 kados: you can succeed in finding that at authorities.loc.gov
03:10 kados lets actually do it in Koha
03:11 isn't that 'history of architecture'?
03:11 'in illinois and chicago'?
03:11 thd kados: yet we do not have the authority records there although we could create them
03:11 kados lets create them
03:11 thd ok
03:12 kados: what is the test server?
03:12 kados go ahead and use koha.liblime.com
03:12 or koha2.liblime.com
03:12 (but if you use koha2.liblime.com it will not be in the live demo)
03:13 and also, I haven't fully fixed authorities on koha2.liblime.com
03:13 thd kados: well let us see if it will work where you have fixed them
03:13 kados k
03:14 thd kados: this cannot work yet.
03:14 kados: this requires 3 authority records
03:15 kados: I will create the first one and describe what should happen where the second and later ones should be used
03:15 kados why three?
03:17 thd There is no single authority record for $aArchitecture$zIllinois$zChicago$xHistory$vPictorial works.  except maybe in UNIMARC authorities where there is only one type of subject authority
03:17 kados: That is built from information contained in 3 authority records
03:19 kados: we can build all the required authority records but a change is needed to manage all 3
03:19 within a single 650 for the bibliographic record
03:20 kados: shall I build the 3 authority records that would be needed?
03:21 kados: or should I describe the process and then build them?
03:22 kados describe the process first
03:22 thd kados: ok
03:23 kados: so the first record needed is 150 $aArchitecture$xHistory
03:24 kados: Koha can now add both subfields to the 650 in the bibliographic record and link with $9 to the authority record
03:25 the framework may need $x for 150 if it is not here yet
03:26 kados: so I fill 650 with $aArchitecture$xHistory
03:26 kados ok
03:26 why not with $aArchitecture$zIllinois$zChicago$xHistory$vPictorial works.?
03:26 thd kados: now for the fun and confusion
03:26 kados (really, you don't fill 650, you fill 150, right?)
03:27 thd kados: there is no such complete authority in the NACO database
03:28 kados: I believe that if NACO worked the way UNIMARC authorities must do that would be there in an authority record
03:29 s/NACO/MARC 21 authorities
03:30 kados: Instead we move our field position location to between the $a and $x in the 650
03:31 kados wait ... I'm confused
03:32 where in the auth record are you storing the values?
03:54 thd kados: I will repost what only the ether saw
03:55 <thd> kados: so right in the middle of  $aArchitecture  [right here]   $xHistory   we need a link to add more
03:55 <thd> kados: actually we should have links before and after every subject subfield if not for many other types of fields as well
03:55 <thd> kados: so now we will add the geographic subdivision in the correct place after $a
03:55 <thd> kados: so we search for Chicago (ill.)
03:55 <thd> kados: and we find it but instead of adding 151 $aChicago (Ill.)
03:55 <thd> kados: in that same 151 geographic authority record is the form when used as a geographic subdivision
03:55 <thd> kados: that appears as 781 $zIllinois$zChicago in that same 151 $aChicago (Ill.) authority record
03:55 <thd> kados: so the system must know from the context when we are filling for a subdivision
03:55 <thd> kados: then it will use the 7XX form
03:55 <thd> kados now we have 650  #0$aArchitecture$zIllinois$zChicago$xHistory with one more subfield to go, and who knows how the authority records are tracked except by one $9 for each subfield applied
03:55 <thd> so we go to the end of the subfield and search for picture books or something as a form subdivision
03:55 <thd> kados: and it returns the form subdivision that is not searchable at authorities.loc.gov
03:55 <thd> kados: that should have been the end of the field after the $x, not the end of the subfield, one line above
03:55 <thd> kados: our search returns the form subdivision authority 185 $vPictorial works
03:55 <thd> kados: we append that to the end and we are done
03:55 <thd> kados: the system probably has to supply the final full stop to the last subfield
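For readers following along, the assembly thd describes above can be sketched in Python. This is only an illustration under stated assumptions: the dictionaries stand in for the three authority records (150 topical, 151 geographic with its 781 subdivision form, 185 form subdivision), and the record ids and the `build_650` and `render` helpers are hypothetical. None of it is Koha code.

```python
# Hypothetical stand-ins for three MARC 21 authority records.
# Each field is a list of (subfield code, value) pairs.
TOPICAL = {"id": 1, "150": [("a", "Architecture"), ("x", "History")]}
GEOGRAPHIC = {
    "id": 2,
    "151": [("a", "Chicago (Ill.)")],
    # 781 holds the form used when the place is a geographic subdivision.
    "781": [("z", "Illinois"), ("z", "Chicago")],
}
FORM = {"id": 3, "185": [("v", "Pictorial works")]}


def build_650(topical, geographic, form):
    """Assemble 650 subfields, tagging each with its source authority id."""
    heading = [(code, value, topical["id"]) for code, value in topical["150"]]
    # The geographic subdivision goes right after $a, using the 781 form.
    geo = [(code, value, geographic["id"]) for code, value in geographic["781"]]
    heading[1:1] = geo
    # The form subdivision is appended at the end of the field.
    heading += [(code, value, form["id"]) for code, value in form["185"]]
    # Supply the final full stop on the last subfield.
    code, value, auth_id = heading[-1]
    if not value.endswith("."):
        heading[-1] = (code, value + ".", auth_id)
    return heading


def render(heading):
    """Render the subfields as a flat string such as $aArchitecture$z..."""
    return "".join(f"${code}{value}" for code, value, _ in heading)


field = build_650(TOPICAL, GEOGRAPHIC, FORM)
print(render(field))
```

The third element of each tuple is where a per-subfield link back to the source authority record would live, so the five subfields here point at three distinct records.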
03:57 kados hmmm
03:58 I think what you describe is possible in Koha
03:59 but I'm not sure why or how to fully use it
03:59 thd kados: from your question 650 in the bibliographic record was filled from 3 different types of authority records 150 topical, 151 geographic, and 185 for subdivision
03:59 kados ie, why would you _ever_ want three separate subject authority records for a single biblio?
04:00 those three are separate records?
04:00 (are they separate _types_ of subject auth records?)
04:01 thd kados: You would want the UNIMARC way if you were designing this from nothing but we have NACO with MARC 21 authorities
04:01 kados: yes those were three separate authority records
04:02 kados: before we say that MARC 21 is all bad consider this problem
04:03 kados: UNIMARC authorities would need a system generating all possible authority records in advance or require the user to build them when they are missing, much the way we would have done to fill our biblio in this example
04:06 kados: MARC 21 systems will match against the pre-existing supply of 650 fields in biblio records but there are no authority records for very many common cases.
04:07 kados: 3 authority records referenced would be uncommon but 2 would be common.
04:08 3 would not be unusual merely not prevalent
04:12 kados: having a $9 for each subfield could work with quite a bit of code change
04:12 kados: so you would have 5 $9 linking to 3 authority records for my example
04:13 kados hmmm
04:13 right now it is only possible to have a single authority record for a single bib tag ... right?
04:14 that's what you're saying?
04:14 thd kados: yes
04:14 kados I think you're also saying that it's only possible to have an authority record add values within a single tag -- whereas it should allow us to add values outside of a given tag
04:14 right?
04:16 thd kados you mean outside of a single subfield do you not?
04:17 kados no ... because currently it will already add multiple values within a single tag
04:17 ie if I have a subject authority that contains $a and $x in 150
04:17 when I add it to the 650
04:17 it will populate $a and $x
04:18 take a look at the auth record for Lewis, C. S.
04:18 to see that in action
04:18 (well ... look at the linked bib records off of that auth record)
04:18 you can do so from the opac now
04:18 opac.liblime.com
04:19 thd kados: I understand that you made that change earlier and that is all that is needed for the easy non-subject authorities
04:19 kados the first thing I must do
04:20 is to fix the authorities editor
04:20 so it is at least on par with the bib record editor
04:20 thd kados: that will also work for about half of subject fields maybe even a little more than that in the world of bibliographic records.
04:20 kados so that's good ... but I think we can do better
04:21 but like I said, let me see how paul has it set up now
04:22 thd kados: so this is for the subject headings of all the interesting books and all the extremely boring books, unless you are a specialist in whatever and find them extremely interesting
04:23 kados: there is another aspect of how paul has it set up
04:24 kados: Currently geographic 151 authority records would go with 651 subject headings but we needed to use them in our 650 as well.
04:25 kados: The framework design would need extension to accommodate that change
04:26 kados: Did you get enough funding for the generalised solution?
04:27 kados I still don't know what the generalized solution is :-)
04:27 thd kados: I just gave it to you in a vague directional outline
04:28 kados I'm still digesting it :-)
04:29 thd kados: the thing that troubles me is dividing $a from $x derived from a single 150 when adding the geographic qualification
04:32 kados I don't quite understand that (having trouble parsing that sentence)
04:32 you mean that we currently have to divide $a from $x because of Koha's limitations?
04:32 or that we should be able to divide them but Koha can't?
04:34 thd kados: It does not seem much of a great problem in the bibliographic record editor
04:38 kados: but bulkauthimport.pl would have to match the 150 $aArchitecture$xHistory to a record that had divided those two subfields with a geographic subdivision. That seems understandable but requires a level of search matching that needs extra thought.
04:40 kados: I mean when importing authorities to set up $9 for records that do not have $9 yet and for newly copy catalogued bibliographic records
04:47 kados: just to be clear, separating $a from $x follows the practice used in existing records.  My description had removed all the limitations from Koha.
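One way to picture the matching problem thd raises is as an ordered-subsequence test: the authority's subfields must appear in the bibliographic field in the same order, but other subfields may sit between them. The Python sketch below is an assumption about what such matching could look like, not a description of bulkauthimport.pl; the `is_subsequence` helper is hypothetical.

```python
def is_subsequence(authority, bib_field):
    """True if the authority subfields occur in bib_field in the same order,
    possibly with other subfields in between."""
    remaining = iter(bib_field)
    # "pair in remaining" consumes the iterator up to the first match,
    # so each later pair is only searched for after the previous one.
    return all(pair in remaining for pair in authority)


auth_150 = [("a", "Architecture"), ("x", "History")]
bib_650 = [
    ("a", "Architecture"),
    ("z", "Illinois"),
    ("z", "Chicago"),
    ("x", "History"),
    ("v", "Pictorial works."),
]

print(is_subsequence(auth_150, bib_650))
print(is_subsequence(list(reversed(auth_150)), bib_650))
```

The first call matches even though $z subfields divide $a from $x; the second fails because the order is reversed. A real matcher would also need to normalise punctuation and case, which this sketch ignores.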
04:50 kados thd: let's examine build_authorities.pl
04:51 thd: and make one that works for koha.liblime.com
04:51 BEFORE RUNNING this script, you MUST edit it & adapt the %whattodo hash to fit your needs. It contains :
04:51 * as key, the code of the authority to be created. It's the one you've chosen (or will choose) in Koha >> parameters >> thesaurus structure >> add. It can be whatever you want. NP/CO/NG/TI/NC in CVS refers to UNIMARC French RAMEAU category codes.
04:51 * in values a sub-hash with the following values :
04:51     taglist : the list of MARC tags using this authority
04:51     key : the list of MARC subfields used as key for authority. 2 entries in biblio having the same key will be considered as the same.
04:51     other : the list of MARC subfields not used as key, but to be copied in authority.
04:51 thd kados: do you mean the poor man's way? :)
04:51 kados     authtag : the field in authority that will be reported in biblio. Remember that all subfields in tag "authtag" will be reported in the same subfield of the biblio (in MARC tags that are in "taglist")
04:51 don't forget to define the itemfield. In UNIMARC, it should be 995, in MARC21, probably 852
04:52 yea
04:52 I just want it to work on the demo
04:52 (for now)
04:52 so it doesn't seem like a broken feature :-)
04:52 so we have two codes right now:
04:52 SUBJECT
04:52 AUTHOR
04:52 should we create others?
04:53 thd kados: that script combined with bulkmarcimport.pl would be the starting point for bulkauthimport.pl as the existing code in bulkauthimport.pl is useless.
04:53 kados right
04:54 thd kados: we should also have UNIFORMTITLE
04:54 kados # the list of MARC tags using this authority
04:54                                taglist => "700|701|702",
04:54                                # the list of MARC subfields used as key for authority. 2 entries in biblio having the same key will be considered as the same.
04:54                                key             => "a|b|c|d|f|x|y|z",
04:54                                # the list of MARC subfields not used as key, but to be copied in authority.
04:54                                other   => "j",
04:54                                # the field in authority that will be reported in biblio. Remember that all subfields in tag "authtag" will be reported in the same subfield of the biblio (in MARC tags that are in "taglist")
04:54                                authtag => "200",
04:54 do you understand what 'other' is?
04:55 thd kados: and SERIESTITLE
04:55 kados (notice also that multiple tags can be specified in 'taglist')
04:55 thd kados: where is the 'other' ?
04:56 kados # the list of MARC subfields not
04:56               used as key, but to be copied in authority.
04:56 23:53 < kados>                                 other   => "j",
04:59 thd: notice the last one in the hash has comments that I pasted above
04:59 thd kados: what cvs dir is this in?
05:00 kados misc/migration_tools
05:00 thd: what bib tags should use a SUBJECT authority?
05:00 I know 650 ... but what others?
05:01 thd: ?
05:01 thd kados 6XX except that we actually have multiple types of subject authorities even for $a
05:02 so 150 fills 650 $a for topical headings
05:03 151 fills 651 $a for geographic headings
05:05 100 fills 600 $a for personal name subject headings
05:05 kados so it seems like we need a different authority framework for each of these
05:06 thd 110 fills 610 $a for corporate name subject headings
05:06 kados but the problem is, there is no way to search across multiple authority frameworks is there?
05:06 so let's make a quick list of all the types of headings we'll need
05:06 thd kados: In what context are you wanting to search across multiple subject headings?
05:07 kados I don't know yet
05:07 let's just get the data in and then we can see what it does :-)
05:07 cause frankly I'm still confused by how subjects are supposed to work
05:07 should we do a minimal test case?
05:08 thd kados: under current behaviour if you are filling 650 it will or should only search 150 authorities
05:08 although it needs to search others for subdivisions
05:08 kados thd: do you know what 'other' is for in the hash?
05:09 here is what I have so far:
05:09 SUBJECT =>      {       taglist => "650",
05:09                                key             => "a|i|x|k|l|m|n|q|y|z",
05:09                                other   => "",
05:09                                authtag => "150",
05:09                        },
05:09 AUTHOR =>       {
05:09                                taglist => "100",
05:09                                key             => "a|b|c|d|f|x|y|z",
05:09                                other   => "j",
05:09                                authtag => "100",
05:09                        },
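A rough reading of the pasted instructions ("2 entries in biblio having the same key will be considered as the same") can be sketched in Python. This is an interpretation of that README text only, not the logic of build_authorities.pl itself: the `authority_key` and `build_authorities` helpers and the sample fields are hypothetical, and 'other' is treated as the README words it (copied into the authority but not part of the key).

```python
SUBJECT = {"taglist": "650", "key": "a|i|x|k|l|m|n|q|y|z", "other": "", "authtag": "150"}


def authority_key(subfields, key_codes):
    """Join the values of the key subfields, in record order, into one match key."""
    wanted = set(key_codes.split("|"))
    return "|".join(value for code, value in subfields if code in wanted)


def build_authorities(biblio_fields, config):
    """Group biblio fields by key; one hypothetical authority per distinct key."""
    tags = set(config["taglist"].split("|"))
    keep = set(config["key"].split("|"))
    if config["other"]:
        keep |= set(config["other"].split("|"))
    authorities = {}
    for tag, subfields in biblio_fields:
        if tag not in tags:
            continue
        key = authority_key(subfields, config["key"])
        # First occurrence wins; later fields with the same key are duplicates.
        authorities.setdefault(key, [(c, v) for c, v in subfields if c in keep])
    return authorities


biblio_fields = [
    ("650", [("a", "Architecture"), ("x", "History")]),
    ("650", [("a", "Architecture"), ("x", "History")]),
    ("650", [("a", "Frontier and pioneer life")]),
    ("100", [("a", "Lewis, C. S.")]),
]
for key, subfields in build_authorities(biblio_fields, SUBJECT).items():
    print(SUBJECT["authtag"], key, subfields)
```

With this sample, the two identical 650 fields collapse into one candidate 150 authority and the 100 field is skipped because it is not in 'taglist'.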
05:09 thd kados: I was looking for that when you pinged about the subsequent question for the multiple frameworks needed
05:09 kados thd: what do I put in 'other'
05:09 thd: and what subfields should 650 have and what subfields should 100 have?
05:11 thd easy answer first
05:11 650 should have at least $a $z $x $y $v
05:14 kados: both are easy after reading
05:15 except maybe I will read some more to be sure I am right
05:15 kados: did you see an example from UNIMARC?
05:16 kados: an example of the key and an example of the other?
05:17 kados: I see an example and now check the UNIMARC documentation
05:20 kados: having checked the UNIMARC documentation the other used in the example makes no sense
05:21 kados: other should be empty or a numeric field
05:21 s/field/subfield/
05:21 kados I'm running it right now on just SUBJECT
05:21 with other as empty
05:22 thd kados: the example shows a key for an or boolean operating on all letter subfields
05:25 kados: I imagine it will be awhile building 650 for your 50k records :)
05:26 kados thd: it's not creating 'summary'
05:26 I wonder if that's what 'other' is for
05:26 thd kados: other is for excluded subfields
05:27 kados: other is empty in most examples given for UNIMARC
05:28 kados thd: http://opac.liblime.com/cgi-bi[…]thorities-home.pl
05:28 thd kados: summary is for the framework not the authority records themselves
05:28 kados thd: do a search on Frontier
05:30 summary isn't getting populated for some reason
05:31 thd kados: maybe the templates were not fixed on this system or summary was empty all along for the subject authority framework
05:32 osmoze hello
05:33 kados hi osmoze
05:33 osmoze: are you familiar with paul's authorities system?
05:36 osmoze not really
05:36 thd kados: it seems to be working well except for the value of the 1XX from the authority to appear in the template
05:37 kados ?
05:37 ahh ... you mean the summary
05:37 I have no idea why it's not
05:37 since it is set up the same way as NAME
05:38 thd kados: summary actually was from the authority framework originally
05:39 kados ?
05:39 what do you mean?
05:39 thd kados: I see that for Lewis, C.S. only 100 $a appears in the summary column
05:39 kados right, but I could change that
05:39 what else do you want to show up there?
05:40 thd: ?
05:40 thd kados: summary is a framework column, it is not the right name for what ought to be called the authorised heading column
05:41 or something like authorised heading
05:41 kados ok I'll change it
05:41 what fields should show up for the NAMES authorized heading?
05:41 thd kados: summary as a column may have been meant to show the framework type originally
05:42 kados $a $q $d according to LOC
05:42 ok ... they should show up now
05:43 thd kados: more than that although that was there in the case of CS Lewis
05:44 kados: there is some code for creating the correct HTML in Biblio.pm that can be adapted from 6XX use
05:46 kados: that will capture all the subfields that may be present in the correct order.
05:48 kados thd: got it!
05:48 thd: so let's talk about what the authorized heading should look like for SUBJECT
05:48 what subfields should it have in what order?
05:54 thd kados: well that is easy, we do not even need to change the code really
05:55 kados: you do not inform the system what order the subfields should be in you read that from the system
05:55 s/from the system/from the record/
05:55 kados we don't have that choice unfortunately
05:56 thd kados we do have a choice and the code is already written
06:01 sorry not Biblio.pm but SearchMarc.pm
06:01 kados: getMARCsubjects
06:04 kados ok ...
06:04 so the authorized heading should be built using that SQL?
06:05 thd kados: obviously you need only one known 6XX to match one authority framework starting at $a
06:05 kados that will require re-writing paul's use of ISBD for display of the authorized heading
06:06 thd kados: oh yes the whole ISBD system in Koha is backwards
06:07 kados :-)
06:07 how so?
06:07 thd kados: everything throughout Koha should follow the model of getMARCsubjects
06:08 kados for display you mean
06:08 thd kados: order should be read from the record not set by the system
06:09 kados before we start that ... what should the subfields be for a NAME authorities record?
06:09 so I can restart the batch process
06:10 then I will take a look at ISBD
06:10 thd kados: the system should only display which fields and subfields are included not their relative order within a field or repeated set of fields
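thd's point, that configuration should say which subfields are shown while the record supplies their order, can be illustrated with a small Python sketch. The `authorised_heading` helper is hypothetical and is not the getMARCsubjects code; the sample record uses the Lewis, C. S. heading mentioned earlier in the log.

```python
def authorised_heading(subfields, shown_codes):
    """Keep only the configured subfield codes, preserving record order."""
    return " ".join(value for code, value in subfields if code in shown_codes)


# A 100 field as (subfield code, value) pairs, in the order stored in the record.
lewis = [("a", "Lewis, C. S."), ("q", "(Clive Staples),"), ("d", "1898-1963")]

# The order of codes in the configuration string makes no difference.
print(authorised_heading(lewis, "aqd"))
print(authorised_heading(lewis, "dqa"))
print(authorised_heading(lewis, "a"))
```

Because the configuration is only used as a membership test, listing the codes as "aqd" or "dqa" gives the same heading; only the record decides the sequence.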
06:14 abcdefghjklmnopqrstvxyz
06:14 kados thd: are you getting that from here:
06:15 http://www.loc.gov/marc/author[…]head.html#mrca100
06:15 ?
06:15 thd kados: well that is one place but most of those would never be found in a record
06:18 kados: abcqd are the most common
06:19 with e for good measure
06:19 kados do all of the heading types listed on that page correspond to the tags in bibliographic records?
06:19 ie, do they map exactly?
06:20 thd kados: it is a one to many mapping
06:21 kados er?
06:21 so you mean that tag 100 in an authority record maps to many bib record tags?
06:22 thd kados: authority 100 maps to bibliographic 100, 600, 700 commonly and maybe others less commonly
06:23 kados but the value in 100 and 600 and 700 is always the same right?
06:23 it's that problem of MARC not being normalized?
06:23 (ie, the same value is in three places)
06:24 thd kados: not the same value in the same record unless it is an autobiography
06:24 kados: and then there would be no 700
06:25 kados so it only goes in one place then
06:25 how do we know which place it goes in?
06:26 ie, personal names ... do they always go in 100 $a?
06:26 thd kados: authority 100 is for a personal name and goes in at least 100, 600, and 700 if applicable for that bibliographic material being catalogued
06:27 kados so it _does_ put the same value in multiple places
06:27 thd kados: 700 is for an additional author if there is a co-author
06:28 kados http://www.loc.gov/marc/authority/ecadintr.html
06:28 thd kados: the same value would only be the case where the author 100 and the subject 600 were the same
06:28 kados thd: so should we have a separate authority type for each individual type?
06:28 ie for name there are many types:
06:28    * Personal names (X00)
06:28    * Corporate names(X10)
06:28    * Meeting names (X11)
06:28    * Names of jurisdictions (X51)
06:28    * Uniform titles (X30)
06:28    * Name/title combinations
06:28 thd kados: yes
06:29 kados should we try to pack them all into NAME? or should they all be separate authority types?
06:29 thd what is NAME? ?
06:29 kados http://www.loc.gov/marc/authority/ecadintr.html
06:29 NAME would be a higher-level grouping of all of those types
06:30 and SUBJECT would be a higher-level grouping of the types:
06:30    * Chronological terms (X48)
06:30    * Topical terms (X50)
06:30    * Geographic names (X51)
06:30    * Names with subject subdivisions
06:30    * Terms and names used as subject subdivisions
06:31 thd kados: name is an authority concept that is not helpful to OPAC users
06:31 kados or should all of those types have their own auth types?
06:31 thd kados: OPAC users often expect to search all author types or all title types
06:32 kados: although it can be useful for searching names as subjects
06:33 kados so we need 'auth group'
06:33 so we can group together the individual types for searching
06:33 thd kados: so name authorities can be useful to OPAC users
06:34 kados: this grouping that you are describing is not part of Koha now is it?
06:34 kados no
06:35 thd kados: except for branches
06:35 kados how do we tell when to put the values from 100 into 600 and 700?
06:35 thd kados: in what context?
06:36 kados: for bulkauthimport.pl ?
06:36 kados yes
06:36 (so I will delete NAME and SUBJECT and create the many types I listed above ... sound right to you?)
06:38 thd kados: you search each 100, 600, 700 in the bibliographic record for a matching 100 authority record or the other way around
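The one-to-many mapping and the search thd describes can be sketched in Python as follows. The tag mapping uses only the pairs named in this log (100 to 100/600/700, 110 to 610, 150 to 650, 151 to 651); the `find_links` helper, the record ids, and the decision to compare $a alone are simplifying assumptions, not how bulkauthimport.pl works.

```python
# Authority tag -> bibliographic tags it can control (pairs taken from the log).
AUTH_TO_BIB_TAGS = {
    "100": ["100", "600", "700"],
    "110": ["610"],
    "150": ["650"],
    "151": ["651"],
}


def first_a(subfields):
    """Return the value of the first $a subfield, or None if there is none."""
    return next((value for code, value in subfields if code == "a"), None)


def find_links(biblio_fields, authorities):
    """Return (field position, bib tag, authority id) for every match on $a."""
    links = []
    for position, (tag, subfields) in enumerate(biblio_fields):
        for auth in authorities:
            allowed = AUTH_TO_BIB_TAGS.get(auth["tag"], [])
            if tag in allowed and first_a(subfields) == first_a(auth["subfields"]):
                links.append((position, tag, auth["id"]))
    return links


# An autobiography: the same person is both author (100) and subject (600).
biblio = [
    ("100", [("a", "Lewis, C. S.")]),
    ("600", [("a", "Lewis, C. S.")]),
    ("650", [("a", "Architecture")]),
]
authorities = [
    {"id": 10, "tag": "100", "subfields": [("a", "Lewis, C. S.")]},
    {"id": 20, "tag": "150", "subfields": [("a", "Architecture")]},
]
print(find_links(biblio, authorities))
```

Each returned triple says which field in the bibliographic record would receive a link to which authority record; here the single 100 authority links to both the 100 and the 600.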
06:39 kados: yes we need several types in the current flat arrangement that could be hierarchical with a few more columns for the framework
06:42 kados: I had thought that there may have been a problem with build_authorities.pl not showing the subject because the key had an or connector.  Try building where the key is only 'a'.
06:44 kados thd: http://koha.liblime.com/cgi-bi[…]dmin/authtypes.pl
06:44 thd: does that look right?
06:44 thd: is that what you had in mind?
06:47 thd: ?
06:48 thd kados: series is missing
06:48 kados I don't see it in the concise authorities list
06:50 thd kados: no and I do not see jurisdiction name
06:51 kados: there is genre/form though
06:51 kados http://www.loc.gov/marc/authority/ecadintr.html
06:51 Names of Jurisdictions
06:51 and also very tired
06:54 thd kados: authority uniform title must also be used for series title when applied to the bibliographic record
06:56 kados I will have to continue working on this tomorrow
06:57 I still don't understand how our framework even comes close to providing what we need
06:57 but we can discuss it tomorrow :-)
06:57 good night thd
06:57 thd kados: that name jurisdiction is what the geographic name is mapped to, but I think it only needs the 151 geographic name authority unless we need one authority type for every controlled bibliographic field
06:58 kados: you never had so much fun
06:58 kados: the fun will be spoilt if you are tired though :)
06:59 kados: I will see you at some mutually awake hour which will probably not still be the morning :)
06:59 good night kados
