I'm fairly confident it's possible to get a system in place within one census, as ultimately the online census database is providing all the data. I agree that the quality and accuracy of data from one census to the other is a big issue, and manual comparison by a human may be the only workable system. I tried my five-point system with the 1901 census. I had also added indexing, 1-9500 to the full census and 1-1900 to the refined heads of families, as I was very anxious not to lose my original work. I do keep reference copies of the raw collected data and the refined data so I can start again if my experiments don't work. I persisted with the manual comparison and it was very time consuming. There was an unexpected problem with the way Excel sorts alphanumeric data, so although the majority did line up there were many exceptions which took more time, as did the repeated entries where the system wasn't able to discriminate sufficiently well. I also did a significant amount of what I call data cleaning, such as eliminating spelling mistakes and standardising the way data was presented. Taking religion as an example, there were so many variations: Roman Catholic, RC, Catholic, Church of Rome, etc., plus spelling errors. The same goes for place names, occupations and marital status. I also added a DOB worked out from the declared age, and an age for 1911. The order of the data has to be changed, as the initial wildcard search returns it in a different order to the families data, and this also differs between censuses. This cleaning was required again when I had downloaded the dependants from the refined heads of family list, some 8000 people in total, particularly on the five search categories for the identifier; otherwise it wouldn't work at all.
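The cleaning steps described above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not the method actually used in the spreadsheet: the variant table, the function names and the idea that the Excel sorting problem was of the "10 sorts before 2" kind are all hypothetical, and the birth year is only an estimate because the census gives a declared age rather than a date.

```python
import re

# Hypothetical variant table; a real one would be built from the
# spellings actually found in the collected data.
RELIGION_VARIANTS = {
    "roman catholic": "Roman Catholic",
    "rc": "Roman Catholic",
    "catholic": "Roman Catholic",
    "church of rome": "Roman Catholic",
}

def normalise_religion(raw):
    """Map a raw religion entry to one standard spelling where known."""
    key = re.sub(r"[^a-z ]", "", raw.lower())  # drop dots, commas, digits
    key = " ".join(key.split())                # collapse repeated spaces
    # Unknown entries are returned trimmed but otherwise unchanged.
    return RELIGION_VARIANTS.get(key, raw.strip())

def estimated_birth_year(age, census_year=1901):
    """Approximate birth year from the declared age (could be a year out)."""
    return census_year - age

def natural_key(text):
    """Sort key that compares runs of digits as numbers, not as text."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", text)]

print(normalise_religion("R.C."))
print(estimated_birth_year(35))
print(sorted(["House 10", "House 2"], key=natural_key))
```

Keeping the variant table as data rather than burying it in formulas means each new misspelling found later is a one-line addition, and the same table can be reused on the 1911 download.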
One problem I have had is insufficient skill with Excel, and this is an area where I'm improving; making a macro to do even the simplest thing saves so much time. For instance, once I have the refined heads of family list, extra lines have to be added between each entry to accept the dependants. I used to add these extra lines manually before collecting the data, but now I have a macro which adds 10 free lines after each entry. The more I can do like this, the quicker I'll get to the end result. I'd like to see a referencing identifier added which is flexible enough to add people later and has an inherent meaning, ideally one which can be added in an automated way. One other problem is that each family group is not simply made up of Father, Mother, Child 1-N; there are numerous others: parents-in-law, grandchildren, cousins, etc. Ideally each generation would have a standard form, but the census doesn't supply names of deceased parents. A numbering system to cope with all these permutations might be possible, based on one of the standard systems for each family group plus a sequential family ID. But it is getting more and more labour intensive to achieve the goal, and it's beginning to look like a relational database implemented in a spreadsheet.
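As a rough sketch of the two ideas above (padding each head of family with free lines, and an identifier built from a sequential family ID plus a member number), something like the following Python could work. The "F0012-03" format, the function names and the use of None for a free line are assumptions made for illustration; they are not one of the standard genealogical numbering systems and not the actual Excel macro.

```python
def pad_with_blank_rows(heads, blanks=10):
    """Return the heads list with `blanks` empty rows after each entry."""
    rows = []
    for head in heads:
        rows.append(head)
        rows.extend([None] * blanks)  # None stands in for a free line
    return rows

def member_id(family_no, member_no):
    """Build an identifier such as 'F0012-03' (hypothetical format):
    family 12, third person recorded in that family group."""
    return f"F{family_no:04d}-{member_no:02d}"

def parse_member_id(identifier):
    """Split an identifier back into (family number, member number)."""
    family, member = identifier[1:].split("-")
    return int(family), int(member)

rows = pad_with_blank_rows(["Head A", "Head B"])
print(len(rows))
print(member_id(12, 3))
print(parse_member_id("F0012-03"))
```

Because the member number is just the next free number within the family, a person found later can be added without renumbering anyone else. It does not encode the relationship itself (in-law, grandchild, cousin), so that would still need its own column.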
There's still the issue of exporting to GEDCOM. I'm thinking out loud here, but I'll have to progress the 1911 family groups until I can compare them to the 1901 list of families. Maybe there's a total of something like 2500 groups when the two censuses are combined, as there must be a large overlap between the two sets of data. Then I can start manually building a family tree for every family, in one large tree, which is possible in FTM 2012, being careful to keep the final arrangement for identifiers, which might be a combination of a straightforward numerical list plus one of the standard methods for family members. I'd like it to be future proof, as future generations will come along (i.e. the 1921 census), and there must already be instances within the data on hand which should be included within older groups and not added as a new group.
I'm now more than 50% through the collection of dependants in the 1911 census, so I'll soon have the benefit of experience when trying to combine the families in due course. I've just been made redundant, so I could well have the time in the new year. I'm not quite old enough to retire, but if that's the way it turns out then I know I'll have plenty to keep me occupied, and I'll be off to Ireland to do some research too.