All times shown according to UTC.
Time | Nick | Message |
---|---|---|
15:01 | rach | it's halloween? |
15:02 | owen | No, I checked. |
15:02 | | I haven't checked for the possibility of a mummy's curse, though. |
15:10 | kados | hehe |
15:18 | rach | :-) |
15:24 | | hi |
15:24 | sanspach | hello |
15:28 | rach | you guys have had a busy night :-) |
15:28 | | well - day for you :-) |
15:28 | sanspach | yeah; still nothing solved, though :( |
15:30 | rach | you worked out the ^m tho |
15:30 | | they are windows line breaks |
15:30 | | end of line markers |
15:30 | sanspach | yeah, but still not certain which files are affected by it |
15:31 | | and not exactly clear why some of the files that don't have them still don't work |
15:31 | rach | and if you change one record to get rid of them that doesn't help - so make a 1 record clean file? |
15:32 | | but I see gavin tried that |
15:33 | | so gavin didn't manage to get it in? |
15:34 | gavin | hi |
15:34 | rach | hi |
15:34 | gavin | the stuff was inserting for me at the end |
15:36 | | i took one of sanspach's files which he emailed me (small.sample2.mrc) and substituted one delimiter for another, after which it worked |
15:36 | rach | ah cool |
15:36 | gavin | then a newer file failed due to having some weird win32 linebreaks stuck in the middle |
15:36 | | no idea why they're there |
15:36 | rach | yep you'll have to take them out too, in the same sort of way |
15:36 | | the magic of windows :-) |
15:37 | gavin | I haven't seen kados' big file so I don't know what is wrong with what he got |
15:37 | sanspach | it is probably messed up in exactly the same way |
15:37 | it seems that MARC::Record doesn't strip the trailing ^M from the leader field when it re-writes it | |
15:38 | | or maybe I messed it up; I'll have to check |
15:38 | gavin | yes if i take the ^M out that one works too |
15:38 | | the ^Ms are all over the middle of records |
15:38 | | it almost looks like an editor wrapped them or something |
15:39 | sanspach | they're *all* separate lines to begin with ("flat" format) |
15:39 | | but when MARC::Record writes them out, I figured all the formatting would be fixed |
15:39 | gavin | not any of the ones i've seen |
15:39 | rach | you wish :-) |
15:40 | gavin | do you mean marc format should have linebreaks? none that I've seen have them |
15:40 | sanspach | no, no just for me |
15:40 | gavin | but i know little or nothing about marc |
15:41 | sanspach | I get the data out of our system db (Oracle, but same for mysql) as separate lines |
15:41 | gavin | i see, and you patch them up together? |
15:41 | sanspach | then I put everything back together and have MARC::Record create true marc format out of them |
15:42 | gavin | Oracle. that's an expensive library system! |
15:42 | sanspach | not for a univ. that has a site license already (!) |
15:43 | | but yes, actually, Sirsi's Unicorn product isn't the cheapest out there |
15:43 | gavin | universities are indeed wonderful places |
15:43 | rach | ah well, at least it sounds like you know how to work on the data now |
15:44 | gavin | sanspach: what do you think we need to do with kados' data? |
15:45 | sanspach | rm * and start over |
15:45 | gavin | not fixable? |
15:45 | sanspach | I've lost track of what the problems might be. |
15:45 | | if it is just ^M we could strip those |
15:46 | | if it is subfield delimiters too, we could do that |
15:46 | gavin | as far as I can tell it boils down to ^M and possibly delimiter substitution which would be very quick |
15:46 | | rather than go through the pain of downloading 2GB again |
15:48 | sanspach | problem is, I think the delimiter that's wrong is used elsewhere in the data, which means no global replace |
15:48 | | I think the data's got to be processed again |
15:48 | gavin | ah. |
15:49 | | in that case I guess we'd better get the recreation process moving |
15:50 | | would it help if we rehearsed on a small data set? |
15:50 | sanspach | definitely! |
15:51 | gavin | well if you want to give it a go and send me some stuff I'll try it out |
15:51 | | then we can organise getting the 2GB batch off you |
15:52 | | i have a good amount of bandwidth in my university which I can use for that |
15:55 | sanspach | OK, how should I get you the test files? I don't think putting them on my windows box and then |
15:55 | | sending them through email is good ?! |
15:56 | gavin | you were able to put it on a web server before |
15:56 | | if you bzip it, windows will just treat it as a blob and it should be safe |
15:56 | sanspach | I'll work on that |
15:56 | gavin | so, whatever works |
15:58 | sanspach | OK, same place: two files--one with 2 records, one with 100 |
16:01 | gavin | those seem fine to me |
16:04 | sanspach | want to try 10K ? |
16:05 | gavin | yeah if you like. whatever size |
16:05 | | but start thinking about bzipping it |
16:05 | | it'll save both of us time and bandwith |
16:06 | | width.. |
16:06 | sanspach | gzip? |
16:06 | gavin | yeah, either is fine, bzip2 just gets greater compression (although it takes more cpu time) |
16:07 | | if we step up to 2gb that'll make a whale of a difference |
16:08 | sanspach | don't seem to find bzip/bzip2 so I'll have to use gzip |
16:08 | gavin | no prob |
19:23 | kados | well that's a trick ;-) |
19:24 | chris | what's that then? |
22:07 | sanspach | kados: problems? |
22:14 | kados | sanspach: you still around? |
22:14 | sanspach | yeah |
22:14 | kados | sanspach: What's the deal with the latest conversion? |
22:14 | | (looks like the process stopped) |
22:14 | sanspach | looks like the script stopped executing; I got disconnected a couple times, but I thought it would keep going |
22:15 | | it was only about 1/4 done |
22:15 | kados | hmmm, guess not ... |
22:15 | | I can start it on my end -- sound good? |
22:15 | sanspach | I removed the partial files |
22:15 | | I had it running on my machine and it has finished |
22:15 | kados | sweet |
22:15 | sanspach | I'm bzip2'ing it now |
22:15 | kados | great |
22:17 | sanspach | as soon as it is done I'll start it transferring, but then I'm going to bed |
22:17 | kados | that's cool |
22:18 | | shoot me an email with the size and I'll start indexing when it's finished uploading |
22:18 | sanspach | will do |
22:32 | Genji | kados: tried my search options sidebar? |
02:31 | paul | salut hdl |
02:39 | hdl | salut paul |
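
The ^M problem discussed between 15:30 and 15:38 (stray carriage returns inside MARC records, and uncertainty about which files are affected) can be illustrated with a short sketch. This is not the script anyone in the channel used. It assumes the files are raw ISO 2709 MARC, where byte 0x1D ends each record, and the function names (`split_records`, `records_with_cr`, `strip_cr`) are hypothetical names chosen for illustration.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 / MARC 21 end-of-record byte
CR = b"\r"                   # carriage return, shown as ^M by many editors


def split_records(data: bytes) -> list:
    """Split a raw MARC byte stream into records, keeping each terminator."""
    parts = data.split(RECORD_TERMINATOR)
    # Every piece before the last one was followed by a terminator.
    records = [part + RECORD_TERMINATOR for part in parts[:-1]]
    if parts[-1]:
        # Trailing bytes with no terminator (for example a stray CR/LF at EOF).
        records.append(parts[-1])
    return records


def records_with_cr(data: bytes) -> list:
    """Return the zero-based indexes of the pieces that contain a CR byte."""
    return [i for i, rec in enumerate(split_records(data)) if CR in rec]


def strip_cr(data: bytes) -> bytes:
    """Remove every CR byte. Assumption: CR never occurs as legitimate data."""
    return data.replace(CR, b"")


if __name__ == "__main__":
    # Tiny made-up byte string standing in for a file; not real MARC data.
    sample = b"00010abc\r\x1ddef\x1dgh\ri\x1d"
    print("affected records:", records_with_cr(sample))
    print("cleaned:", strip_cr(sample))
```

Stripping bytes shortens the record, so the record length in the leader and the offsets in the directory no longer match the data, and a strict parser may still reject the result. Reprocessing from the source data, which is where the channel ends up at 15:48, sidesteps that.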