Disambiguating individuals from Medal Card info
I have additional information about Sapper Edward J. Wilson of the Royal Engineers. However, there are two people with exactly this name, rank and regiment in the Medal Card data. I have no idea what his service number was.
I need to be able to flag that my additional facts relate to one of these individuals, but I can't be sure which.
Is there additional information available from the IWM which could help with disambiguation of individuals? For example, my Edward was from Grasmere.
Alathea Anderssohn commented
I have a number of family members with very common Welsh names (David Griffiths, for example). When I look for these I find dozens of matches, and no way of narrowing them down, since I don't know the regiments (or because there were several people with the same name in a single regiment). What I do know are their home addresses in 1914, and I know that medal cards often included the addresses. It would be very helpful to be able to view the medal cards themselves (e.g. as on Ancestry) to see whether the address matches up, or for the transcript provided by Lives of the First World War to include the address where one is given.
David Underdown commented
[quote]3) Adding new data, which increases duplicates:
Life Story pages are only created from some special record sets; the rest of the records are there so they can be attached to those already created Life Stories, to add more colour and background to their lives. By this arrangement hopefully fewer duplicates are created. A person who served in the Army and then the Air Force might, yes, get duplicated when we create Life Stories from each major record set. In all cases, though, this disambiguation is very difficult to write programs for, and human eyes should be more accurate[/quote]
Going slightly off-topic, the example you cite of someone serving in the army and then the RAF is actually one where, in a lot of cases, automated matching will work. Men in the RFC retained the same service number in the new RAF (obviously there are some who were merely attached and retained their original regimental number, who might subsequently have received a new RAF number). Likewise, men going from the RNAS to the RAF for the most part had an RN number prefixed F. On transfer, the numeric part of their RNAS number remained the same, but the F prefix was changed to a 2 followed by as many zeroes as were required to make it into a six-figure number, i.e. hypothetically F 345 would become 200345, F4738 becomes 204738 and F38713 becomes 238713. There are edge cases, and sometimes the numbering got confused, but following those matching rules (and matching on surname too for safety) will allow linkage of a high proportion of RAF men to their previous service.
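To make the RNAS-to-RAF rule above concrete, here is a minimal sketch in Python. The function names (`rnas_to_raf_number`, `likely_same_man`) are made up for illustration and are not part of any IWM system. It only covers the straightforward F-prefix case described above and ignores the edge cases.

```python
import re


def rnas_to_raf_number(rnas_number: str) -> str:
    """Return the RAF number expected for an RNAS 'F' number.

    Rule as described above: drop the F prefix, then prefix a 2 and
    pad with zeroes to make a six-figure number.
    (Hypothetical helper; edge cases are not handled.)
    """
    match = re.fullmatch(r"F\s*(\d{1,5})", rnas_number.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable RNAS F number: {rnas_number!r}")
    # "2" plus the numeric part padded to five digits gives six figures.
    return "2" + match.group(1).zfill(5)


def likely_same_man(rnas_number: str, rnas_surname: str,
                    raf_number: str, raf_surname: str) -> bool:
    """Match on the converted number AND the surname, for safety."""
    try:
        expected = rnas_to_raf_number(rnas_number)
    except ValueError:
        return False
    same_number = expected == raf_number.strip()
    same_surname = rnas_surname.strip().casefold() == raf_surname.strip().casefold()
    return same_number and same_surname


# The three hypothetical examples given above:
print(rnas_to_raf_number("F 345"))    # 200345
print(rnas_to_raf_number("F4738"))    # 204738
print(rnas_to_raf_number("F38713"))   # 238713
```

A match from something like this would still only be a candidate link for a human to confirm, given the cases where the numbering got confused.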
Returning to the original query, sadly this situation is all too common, largely due to the number of service records which did not survive the Second World War. Sometimes a process of elimination does work: there's an address noted on the back of one medal card which allows one to be ruled in or out; one or more of the candidates can also be found in CWGC registers with address details; some can be found in the London Gazette for the award of a decoration where a residence location is mentioned (though sometimes this can add further confusion, as the location is often actually that recorded for the man's next of kin, who may have ended up somewhere unexpected). The great thing about this project is that ultimately there should be enough people with knowledge from local newspapers, church magazines etc. to allow many of these questions to be resolved.
Thank you for the explanations. I can certainly see the sense in your approach to creating new records. As regards merging, if you go for a master/slave design, you could still allow users to create an "orphan slave" record, which would only become generally visible once linked to a "master". I think a design where each source is clearly separated, and/but the "best" information from all sources is correlated into the "master", would be good.
As regards the paywall issue, my point was simply that you have provided lots of additional data sets, such as census data and regimental lists, in the Search Official Records activity. My assumption is that users will no longer be able to check these, and decide which records relate to this person, without paying for the privilege.
The project is interesting, in that it focuses on "stories", which implies narrative text, photographs, etc., while putting a lot of its energy into fact-gathering. I would be keen for the result, insofar as it is a collection of facts, to be visible (and searchable) as a Linked Data resource. Are you thinking along these lines? Might it be helpful if I came in to have a chat about this aspect of the project?
Hi Richard, great feedback and I'm glad that people are questioning the assumptions we've had to make in order to get the project as far as we see today. I'll attempt to answer everything you've raised, but there's quite a bit there, so do nudge me if I've missed or misunderstood anything:
1) Contributors not being able to create Life Stories
We decided to retain the rights to do this (initially at least) from a quality point of view. Whilst I believe that you and many other users knowledgeable about history would use this ability responsibly, we felt that some others might not. It might not even be malicious, but we felt it wouldn't be long before we had Bart Simpson and Mickey Mouse as Life Stories. Also we have the issue of this site being largely open to the public, and thought it was a strong possibility that the living relatives of people in Lives might create themselves on the system. Of course this opens up privacy questions that we are not 100% equipped to deal with, compared with running a site where the vast majority of the information relates to people who are sadly no longer with us.
We felt that if these things did come about, whilst we can of course fix them and to an extent "police" against them, this is all effort that would be better put into building new site features.
One such feature that we do hope to get to would be a system where some trusted members can graduate to an administrator level, at which point they too would have the rights to also create Life Stories.
2) Merging Stories
As you point out, merging does in a sense imply some sort of irreversibility. The alternative, to us at least, seems to be uncertainty as to which Life Story I should, or am allowed to, improve. Plus we might get into situations where we have competing versions of the same person. Details are not yet 100% nailed down, but I think we favour a more Master/Slave type approach where everything remains visible, but it's clearly signposted which is the master record, and that it is the right place for all improvements. Hopefully this would allow a future separation too if needed. But like I say we can still be influenced on the design of this feature, so very happy to hear ideas you might have.
3) Adding new data, which increases duplicates:
Life Story pages are only created from some special record sets; the rest of the records are there so they can be attached to those already created Life Stories, to add more colour and background to their lives. By this arrangement hopefully fewer duplicates are created. A person who served in the Army and then the Air Force might, yes, get duplicated when we create Life Stories from each major record set. In all cases, though, this disambiguation is very difficult to write programs for, and human eyes should be more accurate.
On this point, I think you're saying that by charging for some records, we stand a lower chance of records having as many eyes as possible to correct any issues in the records? Please do correct me if I'm wrong.
As for Life Stories themselves, there are a number of ways to improve a Life Story, many of which will always be free. We hope too that people will pitch in to help take facts from letters and diaries once we have built the document viewer for that. I have the diary of my Great Uncle from 1918 and, once I get the chance to upload it, I would never expect a charge to be levied for other people to collaborate with me to transcribe it.
Also, when your paywall goes up this week, you will thereby deprive non-paying contributors of the possibility of improving your paid-for data sources by linking them to the "medals card" records. Isn't that something of an own goal?
I wonder if your underlying model is actually going to work at scale. As I understand it, you are retaining the sole right to create person records; contributors can't do this at all. Conversely, you will be adding new person data sets, which will contain duplicate records for some of these people. You will make no effort to correlate these multiple sources yourselves.
Thus, contributors will have zero or more possible records against which to match their stories, assuming perfect knowledge as to which is the "right" person. If they aren't sure which person record matches their data, they either have to guess, or give up on adding their information to the shared story. Neither outcome is satisfactory.
If you were offering a complete, authoritative source yourselves, the proposed strategy might have more going for it, but the medals data is pretty sparse, to be honest, and clearly fails to cover whole groups of people who your contributors want to add.
Also, merging records implies an irreversible certainty that the identity matches, which isn't in my view how historical research actually works (or should work). I still think it would be better (a) to allow contributors to create person records and (b) to support links between records rather than going for a merging strategy.
On the slightly different topic of merging records for the same person, this is not something that can be automated (for exactly the reasons we've been discussing around disambiguation!), and is exactly the sort of thing this project is designed to work on. It would be too huge a task for a small team of researchers, but with thousands of people finding duplicate entries for the individuals they're interested in, we can make fantastic progress.
If you think you've found two or more Life Stories for the same person (rather than two Life Stories for two individuals with the same name in the same regiment), here's more about what to do: http://support.livesofthefirstworldwar.org/knowledgebase/articles/316832-what-if-i-ve-found-two-or-more-life-stories-for-th
Thanks for the replies. It seems to me that you yourselves will have to address this "identity" issue, as soon as you add another 'official' data source (and for every subsequent data source). Unless the set of individuals in data set #2 is guaranteed to be disjoint with your medal card set ("data set #1"), you will have to do something about it. You could either undertake a disambiguation exercise, and fully merge all #2 data into the matching #1 record (which is what you are asking us to do, with our contributed data), or keep the records separate and accept that there are now lots of duplicate records describing the same person.
A middle way would be to have separate records, but to support cross-record links which say "definitely the same person as X", "possibly the same", "definitely different", etc. However, if you're doing that for 'official' records, we could also use it for contributed data, and my problem would be resolved.
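To illustrate the kind of cross-record link I mean, here is a rough sketch in Python. Every name in it (`LinkCertainty`, `RecordLink`, `candidates_for`, the record ids) is hypothetical and invented for this example. It is not a description of how the site stores anything.

```python
from dataclasses import dataclass
from enum import Enum


class LinkCertainty(Enum):
    DEFINITELY_SAME = "definitely the same person"
    POSSIBLY_SAME = "possibly the same person"
    DEFINITELY_DIFFERENT = "definitely a different person"


@dataclass(frozen=True)
class RecordLink:
    """An assertion by one contributor about two separate records."""
    record_a: str            # hypothetical record id
    record_b: str            # hypothetical record id
    certainty: LinkCertainty
    asserted_by: str
    note: str = ""


def candidates_for(record_id: str, links: list[RecordLink]) -> set[str]:
    """Ids of records linked to record_id that have not been ruled out."""
    found = set()
    for link in links:
        if link.certainty is LinkCertainty.DEFINITELY_DIFFERENT:
            continue
        if link.record_a == record_id:
            found.add(link.record_b)
        elif link.record_b == record_id:
            found.add(link.record_a)
    return found
```

Because each record stays separate and a link is just another assertion, a link can be withdrawn or downgraded later without having to unpick a merge.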
Hi Richard - the difficulty is that there is no definitive matching up of medal index cards with other data held on individuals (at the time they didn't foresee us all trying to do all this research!). This is actually one of the reasons behind this project - to pull together information like this - because it's never been done before.
So there is no definitive resource that says 'the medal index card for this Edward is for Edward from Grasmere, and that Edward is Edward from Bridgnorth'.
As we've used the medal index cards as seed data, the medal index cards are your starting point for research.
Assuming you have no chance of finding out your Edward's service number from a surprise discovery of perfectly-preserved family documents, there are a few things you can do to start narrowing it down:
- search the 'official records' by putting in the name and one or other of the two service numbers; you may find there's another official record (like an army death record) that lists both his name and service number, and helps you pin down a bit more detail.
- search the records we add to the site in future; you may find new records that connect the service number with the name, and also list extra information like addresses.
- view the scans of medal index cards available on the National Archives site as well as in other places; sometimes one may have an address or additional notes written on the back that can help disambiguate.
Hi Richard, at this stage you're right, there are a few too many cases where there's very little to go on and it's hard to determine whether you've remembered the right individual. The answer, of course, is to add more record sets, and I hope you believe me when I say that doing this is one of our highest priorities! Fingers crossed we'll have more to share in the next week or so.