| * ndushay leaves | 03:32 | |
| * cbeer joins | 09:22 | |
| * bess joins | 10:04 | |
| * MrDys joins | 10:08 | |
| * bess leaves | 10:23 | |
| * jamieorc joins | 10:33 | |
| * erikhatcher leaves | 11:18 | |
| * bess joins | 11:26 | |
| * bess leaves | ||
| * bess joins | 11:28 | |
| * erikhatcher joins | 11:48 | |
| * BillDueber joins | 12:09 | |
| * jkeck joins | 12:13 | |
| * ndushay joins | 13:32 | |
| * ndushay leaves | 13:35 | |
| * ndushay joins | 14:02 | |
| * ndushay leaves | 15:33 | |
| * ndushay joins | 15:35 | |
| * bess leaves | 16:27 | |
| <ndushay> | erikhatcher: you here? | 16:48 |
| (or there) | ||
| <erikhatcher> | i am, but only for a few more | |
| fyi - i plan on getting one of the MARC sets that rsinger pointed me to and giving SolrMarc a try this week to see if i can find out what is causing the indexing slowdown | 16:49 | |
| <ndushay> | erikhatcher: ok | |
| i can give you my jar if you like | 16:50 | |
| with all the stanford processing goodies | ||
| but we have 999s | ||
| i can probably get you our marc data also | ||
| just need to find out where to put it | ||
| and don't forget - we don't write to the index via Solr; we just write to the index directly | 16:51 | |
| <erikhatcher> | ndushay: i'll see if i can provide an ftp spot for that stuff | |
| ndushay: well, that writing to the index directly is what i want to change! it's not the scalable way to do it | ||
| whatever is going on can be fixed if the actual MARC processing isn't the bottleneck - no reason indexing is taking that long otherwise | ||
| <ndushay> | erikhatcher: neither bob nor i is very facile with profiling | 16:53 |
| erikhatcher: any help you can provide would be awesome | ||
| it's about 8G of data for our 6M records | ||
| and the index is about 27G | ||
| <erikhatcher> | ndushay: how many MARC files? | |
| <ndushay> | 6G | |
| M for million - whoops | 16:54 | |
| <erikhatcher> | not how big, how many? | |
| <ndushay> | oh | |
| 17 | ||
| <erikhatcher> | one issue is if you want to parallelize the indexing and process multiple MARC files at a time, you can't do it with the embedded indexer anyway | |
| at least not without writing the code to be multithreaded yourself | ||
| <ndushay> | k | |
| <erikhatcher> | but with indexing via HTTP you can fire up multiple indexers | |
| i'll take a look at this stuff this week. maybe not tomorrow, but soon, promise | 16:55 | |
| <ndushay> | so i would potentially do 17 indexers? | |
| <erikhatcher> | this is in my ramp up to c4lcon :) | |
| <ndushay> | awesome | |
| <erikhatcher> | yup, you could fire up that many indexers probably | |
| <ndushay> | let me know if you want me to do any part of the solr black belt workshop | |
| i figure i'll just be in there learning and helping folks when i can. | 16:56 | |
| <erikhatcher> | that'd potentially cut indexing time down to 1/17th | |
| <ndushay> | yep. | |
| <erikhatcher> | to do any part of it? you're co-teaching it! :) 50/50 dear! | |
| <ndushay> | lol | |
| <erikhatcher> | no worries... it'll be fun | |
| <ndushay> | well, what could i possibly have to say that you wouldn't say better? | |
| i think our marc processing is a lot more involved than UVa's | 16:57 | |
| just fyi. | ||
| !#$!@#$ call numbers, for one thing. | ||
| and 500 flavors of title fields. | ||
| <erikhatcher> | even UVa's indexer is too slow IMO | |
| anyway, gotta run, more later | 16:58 | |
| * erikhatcher leaves | ||
| <jrochkind> | the marc processing is definitely involved. First step is figuring out how much of your/our time is SolrMarc processing, and how much is Solr indexing itself. | 17:03 |
| If a bunch of it is SolrMarc, then we can probably find bottlenecks to optimize in SolrMarc. But, yeah, there's gonna be a lot of processing no matter what. | 17:04 | |
| * ndushay leaves | 18:07 | |
| * ndushay joins | 18:47 | |
| * cbeer_ joins | 18:51 | |
| * ndushay leaves | 19:31 | |
| * jkeck leaves | 20:04 | |
| * ndushay joins | 21:00 | |
| * erikhatcher joins | 21:18 | |
| * bess joins | 21:24 | |
| * bess leaves | 21:32 | |
| * bess joins | 21:52 | |
| * bess leaves | 23:54 | |
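In the log, erikhatcher points out that the embedded indexer can't process several MARC files at once unless you write the multithreading yourself, whereas indexing via HTTP lets you fire up multiple indexers, e.g. one for each of the 17 MARC files. Below is a minimal Python sketch of that fan-out. It assumes a hypothetical `index_one_file` callable that parses one MARC file and posts its records to Solr over HTTP. Neither SolrMarc nor Solr provides a function by that name; it stands in for whatever per-file indexer you actually run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def index_files_in_parallel(
    marc_files: Iterable[str],
    index_one_file: Callable[[str], int],
    workers: int = 17,
) -> int:
    """Run one indexer per MARC file, at most `workers` at a time.

    `index_one_file` is a hypothetical callable: it should parse one MARC
    file, send the resulting documents to Solr over HTTP, and return the
    number of records it indexed.
    """
    # Each file is handed to its own worker; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(index_one_file, marc_files))
    # Total records indexed across all files.
    return sum(counts)
```

Threads only help here when each worker spends most of its time waiting on Solr's HTTP responses. Because MARC processing is CPU-heavy, several independently launched indexer processes, all sending to the same Solr server, are the closer match to "multiple indexers".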
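The "1/17th" estimate holds only if the 17 files take about the same time to index. As a rough bound, assume per-file indexing times t_1, ..., t_n, k parallel indexers, and a Solr server that is not itself the bottleneck. The wall-clock time T_k then satisfies:

```latex
T_k \;\ge\; \max\!\left( \max_{1 \le i \le n} t_i,\; \frac{1}{k}\sum_{i=1}^{n} t_i \right)
```

With n = k = 17, this reduces to T_17 >= max_i t_i. The run can finish no sooner than its slowest file, so the full 17x speedup is only reachable when the files are similar in size and processing cost.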
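jrochkind's suggested first step is to split total indexing time into SolrMarc (MARC processing) time and Solr indexing time. One way to measure that split is to time each stage separately. The sketch below is minimal and assumes two hypothetical callables in place of the real SolrMarc and Solr calls: `to_solr_doc`, which maps one MARC record to a Solr document, and `send_batch`, which posts a batch of documents to Solr.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def profile_stages(
    records: Iterable[Any],
    to_solr_doc: Callable[[Any], Dict[str, Any]],
    send_batch: Callable[[List[Dict[str, Any]]], None],
    batch_size: int = 1000,
) -> Dict[str, float]:
    """Return seconds spent in MARC processing vs. Solr indexing.

    `to_solr_doc` and `send_batch` are hypothetical stand-ins for the
    MARC-to-Solr mapping step and the call that pushes documents to Solr.
    """
    timings = {"marc_processing": 0.0, "solr_indexing": 0.0}
    batch: List[Dict[str, Any]] = []

    def flush() -> None:
        # Time only the hand-off to Solr, not the record mapping.
        start = time.perf_counter()
        send_batch(batch)
        timings["solr_indexing"] += time.perf_counter() - start

    for record in records:
        # Time only the MARC-to-document mapping for this record.
        start = time.perf_counter()
        batch.append(to_solr_doc(record))
        timings["marc_processing"] += time.perf_counter() - start
        if len(batch) >= batch_size:
            flush()
            batch = []  # start a new list; the sent batch is left untouched
    if batch:
        flush()  # send any final partial batch
    return timings
```

If `marc_processing` dominates, the bottleneck is in the MARC-to-Solr mapping code (call numbers, title fields, and so on). If `solr_indexing` dominates, look at the Solr side or at running several indexers in parallel.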