Video tutorial: Marking History Sheets

We’ve published our first video tutorial on marking History Sheets. We will be publishing more video tutorials about marking and transcribing in the upcoming weeks.

We’d love your feedback on which aspects of marking and transcribing you would like pointers on.

Thanks for your contributions to Measuring the ANZACs. Let’s keep transcribing!

Our first seminar with Measuring the ANZACs data

In our first blog post introducing Measuring the ANZACs to the world we told some of the story of the Dibble brothers from Auckland.

The story of the Dibble brothers from Auckland (New Zealand) illustrates some of the questions we will explore using data from Measuring the ANZACs. Victor Thomas Dibble, along with brothers Ralph Ambrose and Jesse Cyril, enlisted together in the NZEF in 1916. Victor and Ralph were both bankers, while Jesse was a farmer. Their serial numbers were sequential: 26571, 26572, and 26573. Portions of their service files are mistakenly interleaved with each other. All three served in France in the New Zealand Rifle Brigade.

3/4 portrait of three Dibble brothers, Corporal (later Sergeant) Ralph Ambrose Dibble, Reg No 26572 (centre), and Privates Jesse Cyril Dibble (right), Reg No 26571, and Victor Thomas Dibble (left), Reg No 26573, all of the New Zealand Rifle Brigade, 8th Reinforcements to the 4th Battalion, H Company, 17th Reinforcements.

Jesse’s service was marked with both distinction (receiving a Croix de Guerre from the Belgian government) and disciplinary issues including overstaying and drunkenness. Ralph’s service was more ordinary than his brothers’, and he stayed in service until the end of the war. In 1917 he was evacuated to England for treatment of a lacerated hand suffered when trying to open a bottle by banging it against a bank. Letters in his file show that despite the incident being his own fault he was not disciplined, and was given base duty while he recuperated. Victor meanwhile was injured in action by a shell, and had his left leg amputated. He recuperated in England at Oatlands Park, and returned to New Zealand in February 1919. All three had survived the war, unlike 18,166 of their compatriots.

Victor married in 1927, and his rehabilitation training helped him gain a job as the Secretary to the Manawatu Racing Club in Palmerston North three years later. But in 1932 Victor’s body was discovered on the grounds of the race course. He had shot himself. More than a decade after the war ended it still exacted its toll on New Zealand’s soldiers. Jesse Dibble served in World War II and lived to be 81, while Ralph returned to his job in the National Bank and lived to age 93. The story of these brothers who grew up together, enlisted together, fought together, and came home together encapsulates many of our goals in Measuring the ANZACs.

Studying suicide in returned soldiers was not part of our original research plan in the project that led to Measuring the ANZACs. But we found interesting stories that spoke to the larger issues we were interested in about how war affected men’s health in the long-term, and the data in New Zealand is quite good for asking about suicide in particular.

But we have a relatively small sample of men whose lives we have traced in depth, and we need a larger sample to more effectively study suicide. Suicide is relatively rare, so we need large numbers—the complete transcription you are helping us with—to find the stories of men who later took their lives.

On Talk several people have pointed us to files where there is evidence of post-war suicide by men who survived the war. We were able to then look at these files and think about these men’s stories in comparison to the group whose information we’ve already collected. Today we presented a seminar about suicide in World War I soldiers, and included some of the stories that you, our citizen scientists, have turned up on Measuring the ANZACs. So, thank you, your work is helping us with our research.

But … we must do more. Our understanding of suicide will be tremendously improved by the data from the History Sheets and the Statement of Services telling us what experiences men had in the war: their wounds, sickness, and the units and battles they were in. This is data we don’t yet have, and we must transcribe.

Our research on suicide stands only as an example of the kind of research that can be done with a complete transcription of these records where we can connect people to their families, to their peers in the same military units, and find rare stories. 125,000 New Zealand men served. If we have stories about things that happened to 1000 or 2000 men we can find them with a complete transcription. Onwards with Measuring the ANZACs, and thank you.

Who is Māori in the ANZAC data?

It was Waitangi Day over the past weekend, New Zealand’s national day. It marks the signing of the Treaty of Waitangi between the indigenous Māori population and the British Crown. Māori make up 15 percent of New Zealand’s population, a relatively high proportion for a former colony. In Australia, Canada, and the United States the indigenous population makes up less than 5 percent of the population.

Thus in New Zealand attention to questions of Māori well-being is more prominent in daily life. Our research attempts to add a long-term dimension to this question. In a recent paper the research team found that Māori and Pākehā were about as tall as each other among people born before about 1900, after which Māori average stature fell behind. After 1950 Māori caught up very rapidly, and now the European-descended (Pākehā) and Māori populations have about the same average height.

Throughout this research we have been confronting a very basic problem. Who is Māori? Interestingly the question was never asked on World War I or World War II enlistment forms. To modern readers used to being asked their race in surveys this seems strange. In New Zealand surveys today people are asked if they have Māori descent. It seems strange also to people familiar with the frequent attempts to measure race in other societies, such as the United States.

So in the military records we have to identify Māori by using Māori names. We assume that someone with a Māori name has a Māori background. Here is an example from a History Sheet of a man named Huia, who we assume to have Māori ancestry (transcribe his record today).

Names are obviously an imperfect measure of ethnicity. Pākehā could have given their children Māori names. We think from reading the work of other New Zealand historians that this wasn’t as common for boys. And Māori given European names will be lost to this strategy.

Information on complexion from the medical side of the attestation can also help us. Māori had darker hair, and may have been described as having darker skin (transcribe the description of Huia Lister). As we gather a complete collection of data we’ll be able to get a better sense of what descriptors correlate with Māori names, or with enlistment in units known to have many Māori.

We anticipate that the data we’re collecting with our citizen scientists’ help on next of kin will help us identify other Māori connections and ancestry in New Zealand society. Understanding the intertwined and yet sometimes differing history of Māori and Pākehā in New Zealand society is an important part of our research. We hope this post has given you some insight into the challenges we face in proceeding, and how we’re trying to overcome them. Your help in creating more data to measure more ANZACs is incredibly important. Thank you.

Evan Roberts

(Please be in touch with the research team for copies of papers based on our research. Our email is linked above.)

Ditto to you!

Perhaps the third most frequent question that we get on Measuring the ANZACs Talk (check it out and join the conversation) is “How do you want us to deal with ditto marks?” Just in case you don’t know what a ditto mark is, it’s the quotation mark (") or series of quotation marks (" " " ") that indicates the entry for a particular line is the same as the one above it. In this image there is a series of ditto marks for rank, indicating that our subject, Stewart Litchfield, stayed a private from January 1918 to March 1919.

This is a great question, and in posing it our citizen scientists have recognized some of the challenges and tensions in creating accurate, structured data that still reflects the original sources.

One principle of transcription that we ask you to adhere to and promote is “Type what you see.” Don’t make editorial judgments. Spelling mistakes are interesting in the original sources. Although the research team is first and foremost interested in measuring the ANZACs (how tall were they, what did they weigh, what did they die of, what were their jobs), we know that we’re collecting a large amount of text that will give incredible insights into language in everyday use in the early twentieth century.

One way that spelling mistakes and abbreviations are interesting is in indicating what was common enough to be abbreviated. It’s interesting to know if men with names that could have a diminutive used those names. Did James call himself Jim when he enlisted? Did William call himself Bill? Type what you see.

Ditto marks are a challenge to that principle, because they introduce the potential for error into the data we’re creating. When we have all the data entered for all the fields (read this, then go transcribe, and tell your friends to transcribe) the ditto marks won’t be a problem. We’ll sort the data by date (we can also sort it by the X-Y co-ordinates of the marks, which are recorded in the database), and wherever we see a ditto in the Rank field we can replace it with the entry from the row above. This is quite straightforward in a statistical package, because carrying a value down to the next row is a common problem in data analysis (if you’re curious, follow this link).
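As a minimal sketch of that fill-down step (not the research team’s actual code), the Python below sorts some hypothetical rows by date and replaces each ditto mark in the rank field with the most recent real entry above it. The field names, the sample rows, and the `is_ditto` and `fill_dittos` helpers are all assumptions made for illustration.

```python
from datetime import date

# Characters a transcriber might plausibly type for a ditto mark (an assumed list).
DITTO_MARKS = {'"', '”', '“', '〃', "''"}

def is_ditto(text):
    """Return True if the text consists only of ditto marks and spaces."""
    parts = text.split()
    return bool(parts) and all(p in DITTO_MARKS for p in parts)

def fill_dittos(rows, field, order_by="date"):
    """Return new rows, sorted by `order_by`, with dittos in `field` filled from the row above."""
    filled = []
    last_value = None
    for row in sorted(rows, key=lambda r: r[order_by]):
        new_row = dict(row)
        if is_ditto(new_row[field]):
            # Leave the ditto untouched if there is nothing above it to carry down.
            if last_value is not None:
                new_row[field] = last_value
        else:
            last_value = new_row[field]
        filled.append(new_row)
    return filled

# Hypothetical rows, loosely modelled on the rank column described above.
rows = [
    {"date": date(1918, 6, 1), "rank": '"'},
    {"date": date(1918, 1, 10), "rank": "Private"},
    {"date": date(1919, 3, 5), "rank": '" "'},
]
filled = fill_dittos(rows, "rank")
print([r["rank"] for r in filled])  # ['Private', 'Private', 'Private']
```

The sketch only gives the right answer if every row above a ditto has been captured. If a row is missing, the wrong value is carried down without any warning.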

But your questions get at the potential for error. What if a row is missed and the ditto marks end up being replaced by the wrong original entry? This is a real concern.

Thus, when you come across a ditto mark we’d like you to enter the text that the ditto stands for. Look up the column, find the original entry, and enter that. This is, after all, what the ditto mark is indicating: “This entry in row n is what’s written in row n-1.”

So, the instructions here differ a little from the strict “type what you see” dictum that we otherwise want our citizen scientists to adhere to. But typing in the real thing and not the ditto mark is good practice in historical social science. Overall it reduces the possibilities of errors in the data we’re creating. Thanks for reading, and thanks for your contributions to Measuring the ANZACs!

Evan Roberts



Why is there not a back button?

One of the most frequent questions we get from our citizen scientists on Measuring the ANZACs is “Why is there not a back button?”, so that people can correct quickly realised mistakes when classifying or transcribing.

The second most frequent question we get is “How does Measuring the ANZACs come to involve Minnesota?!” Here’s an answer to that one.

So, why is there not a back button?

The answer lies in the intersection of computer software and the origins of Zooniverse in classification projects in the sciences. By classification, we mean the projects on Zooniverse that ask you categorical and sometimes binary questions such as:

  • Are there any animals in this picture?
    • How many animals are there?
    • If so, what kind of animal is it?
    • What is the animal doing?

With classification projects you’re often working more with your mouse or trackpad than the keyboard.

The databases and software underlying Zooniverse projects are all structured so that when you make a decision it is passed to the server straight away, and a new record of your classification is created in the database. This design has worked really well, and very few citizen scientists asked about re-doing their work on these projects.

What we’ve discovered on Measuring the ANZACs, which is one of the first transcription projects on Zooniverse, is that people often realize a mistake just after they’ve submitted an entry. (Do check out the other transcription projects, Emigrant City and Old Weather Whaling, which are both based on Scribe software. But come right back to Measuring the ANZACs.)

This experience is consistent with my own experience in transcribing lots of historical documents, and my experience in working with dozens of undergraduate research assistants on ANZAC research and other historical transcription projects. The process of data entry from historical documents is that when you’re transcribing hard-to-read material you often realize your better guess a few seconds after you’ve finished your first effort. All of which is to say, we know the problem from personal experience!

Hence, your question! Why is there not a back button? It really would be a great idea.

The problem, and it’s not an unsolvable one, gets back to the design of the software and database. To have a back button we’d need to change the software so that when you finish the data entry on a particular field the information would be retained locally for a short period, and then transmitted to the server with a timed lag. This would give you some time to realize “oops, that was really Palmerston North, not Paston North” and go back to correct it.

Building this lag into the software is technically possible, but has a couple of costs for the user experience and for our data capture. First of all, it has the potential to make the web browser experience a little slower as your browser is now holding the information for a period of time. Second, it increases the chance of data loss.
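The timed-lag idea described above can be sketched roughly as follows. This is an illustration only, not how the Scribe software works: the `DelayedSubmitter` class, its method names, and the ten-second delay are all hypothetical, and the clock is passed in so the behaviour can be shown without actually waiting.

```python
import time

class DelayedSubmitter:
    """Hold each entry locally for `delay` seconds before sending it on."""

    def __init__(self, send, delay=10.0, clock=time.monotonic):
        self.send = send      # callable that transmits one entry to the server
        self.delay = delay    # how long an entry stays editable, in seconds
        self.clock = clock    # injectable clock, so the sketch can be demonstrated
        self.pending = []     # [deadline, field, value] entries, oldest first

    def submit(self, field, value):
        """Queue an entry locally instead of sending it straight away."""
        self.pending.append([self.clock() + self.delay, field, value])

    def undo_last(self, new_value):
        """The 'back button': correct the newest entry if it is still held locally."""
        if not self.pending:
            return False      # too late, the entry has already been sent
        self.pending[-1][2] = new_value
        return True

    def flush_due(self):
        """Send every entry whose waiting period has expired."""
        now = self.clock()
        while self.pending and self.pending[0][0] <= now:
            _, field, value = self.pending.pop(0)
            self.send(field, value)

# Demonstration with a fake clock and a list standing in for the server.
sent = []
fake_now = [0.0]
submitter = DelayedSubmitter(lambda f, v: sent.append((f, v)),
                             delay=10.0, clock=lambda: fake_now[0])
submitter.submit("birthplace", "Paston North")
submitter.undo_last("Palmerston North")   # corrected within the waiting period
fake_now[0] = 11.0                        # pretend eleven seconds have passed
submitter.flush_due()
print(sent)  # [('birthplace', 'Palmerston North')]
```

Anything still sitting in `pending` when the browser tab is closed never reaches the server, which is the data-loss cost of this design.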

In short, we’re aware this is an issue and that we need to think about it, try to assess how big of a problem it is, and the costs and benefits of the solution.

In closing I want to leave you with a couple of thoughts. First, we understand your frustration and really appreciate that our citizen scientists want to get it absolutely right! Thank you. We hope it’ll reassure you to know that:

  • Everything is transcribed multiple times. The chance of everyone making the same mistake at the same place on the same piece of text is pretty small.
  • A lot of the information is “coded,” so minor spelling mistakes in place names and occupations, and even some acronyms aren’t going to matter. If you type NSRB instead of NZRB (for NZ Rifle Brigade) it will be obvious when we look at all the entries that there are thousands of NZRBs and a couple of mistaken NSRBs, and given the “fat finger” closeness of S and Z, it’s obvious what was meant.
  • The data we’ve seen from the first few months of Measuring the ANZACs looks great. We’re not worried about the quality of transcriptions.
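A small sketch of why repeated transcription is forgiving, assuming a simple majority vote (the project’s real reconciliation rules are not described here, so the `consensus` and `variant_counts` helpers and the 50 percent threshold are assumptions):

```python
from collections import Counter

def consensus(transcriptions, min_share=0.5):
    """Return the most common transcription if it has more than `min_share` of the votes, else None."""
    cleaned = [t.strip().upper() for t in transcriptions if t.strip()]
    if not cleaned:
        return None
    value, votes = Counter(cleaned).most_common(1)[0]
    return value if votes / len(cleaned) > min_share else None

def variant_counts(all_entries):
    """Count every variant across the whole dataset so that rare misspellings stand out."""
    return Counter(e.strip().upper() for e in all_entries)

# Three people transcribe the same unit abbreviation; one slips from Z to S.
print(consensus(["NZRB", "nzrb", "NSRB"]))   # NZRB
# With only two disagreeing entries there is no majority, so nothing is accepted yet.
print(consensus(["NZRB", "NSRB"]))           # None

# Across the whole dataset the mistaken variant is obviously rare.
counts = variant_counts(["NZRB"] * 1000 + ["NSRB"] * 2)
print(counts.most_common())                  # [('NZRB', 1000), ('NSRB', 2)]
```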

We hope this blog post has given you an insight into the interesting intersection of historical transcription and website design, and thank you all for joining the forces at Measuring the ANZACs.


The sad story of Frederick McReynolds

On Boxing Day we tweeted about the sad story of Private Frederick McReynolds, who committed suicide at Trentham Camp at Christmas 1915.

A couple of other World War I Twitter accounts picked up on the timely story (see the conversation here on Storify), and raised the question of when Private McReynolds passed away. The Commonwealth War Graves Commission lists the date of his death as 26 December 1916, not 1915. One thing we’ve learned in our work with demographic data (and military records are demographic data) is that dates can be wrong. It’s easy to write one date down wrong, and propagate errors through many sources.

Looking at the whole of Private McReynolds’ file supports our initial story that the date of his death was 26 December 1915, but also hints at what must have been an incredibly sad story.

Frederick Thomas McReynolds was born in Auckland on 11 September 1882 to Mary Ann and Thomas McReynolds (you can find his birth certificate details at the NZ Births, Deaths and Marriages site under registration 1881/10210). A brother, William Higgins McReynolds, was born to the McReynolds 4 years later (registration 1885/604). Searches show Frederick in the New Zealand electoral rolls, living in Onehunga in Auckland in the early twentieth century, and working in a workshop, and then as a “carter” in 1914. There is no record of a marriage in New Zealand, and the electoral rolls show him living with his parents in 1914, on the eve of the war. His brother meanwhile had married in 1907.

On Christmas Eve 1915, Frederick McReynolds attested for service.

He was still living on Trafalgar St in Onehunga, and working as a driver, a natural extension of his previous job as a “carter.”

His attestation was voluntary, with conscription yet to be introduced, though it was being vigorously debated at the time. When McReynolds enlisted a national war census had just been taken, requiring all men between 17 and 60 to register. The pressure on men to enlist was heavy, though we do not know what it was like for any individual man. But the fact that McReynolds was single made him more likely to be a target of pressure to enlist voluntarily.

Although just overweight (161 pounds on a 5′ 7″ frame) McReynolds was otherwise judged healthy and fit to serve, after examination on the 20th of December.

The details of McReynolds’ suicide are scant. The file notes that his death was due to “suffocation caused by self-inflicted wound,” having cut his own throat.

A telegram was sent to the family, apparently by Captain William Edward Vine. Although the file does not give any other details of Vine, there was only one man named Vine in the NZEF who reached the rank of Captain and was present in New Zealand at the time of McReynolds’ death. Several pictures of Vine later in the war can be seen in PapersPast, the National Library of New Zealand’s excellent digitized newspaper collection.

There are no named mentions of McReynolds in the New Zealand newspapers, nor in official papers. But the official reports from the Defence Forces show several other suicides in New Zealand’s military camps in 1916 and in 1917. We know little about these sad stories. Military authorities were understandably not keen on publicizing them at the time, and like other stories of suicide, suicide in military service may be hidden by family members as well. There is little scholarly literature on suicide during service in World War I, despite official attention to the question of soldiers’ suicides in nineteenth century Britain (see this chapter by Janet Padiak).

As we progress with Measuring the ANZACs we will uncover the stories of the other men who took their own lives while in the New Zealand Expeditionary Forces, just as we will uncover the stories of all others who served. Let us remember Frederick McReynolds.

Measuring the ANZACs Tutorial 5: Attestations

In our first four tutorials (One, Two, Three, Four) we covered classification, and marking of Death Notifications and History Sheets, including the Statement of Services. Today we move along to the last key document in the files: the attestation. (Check out the Field Guide for a shorter synopsis.)

We’ve emphasized in previous posts how Measuring the ANZACs is trying to create an efficient index to the documents. We’d love to transcribe everything right now, but it’s more realistic to think we can do some key information that tells us the kind of people we have in the files, who they are, and the types of things that happened to them.

You can think of the three different types of form we’re collecting in this way:

  1. Attestations describe who men were when they arrived in the New Zealand Expeditionary Force. They are a snapshot of each man’s life at the start of his engagement with the war. We call this “cross-sectional” data in our analyses.
  2. The History Sheet and Statement of Services describe key events that happened to men in service, and after. We call this “longitudinal” data in our analyses.
  3. The Death Notifications are a memorial to those who fell in service, and tell us something about men’s post-war lives. Ultimately the research team want to study how men’s lives and health before and during the war influenced how long they lived.

So the Attestations are like a survey of men before their lives were changed by the war. New Zealand burned its census records after the results were published. While family historians and social scientists have used the censuses in Canada, Britain, Scandinavia, and the United States to study social life in the nineteenth century and early twentieth century, New Zealanders can’t do that. The military attestations are like a census of 10% of the New Zealand population in the early twentieth century, telling us about men’s occupations and birthplaces and education.

Because the attestations are cross-sectional, they are a little easier for the research team to analyze. We don’t need to order a series of events like we do on the Statement of Services or the History Sheet. The Attestations are also mercifully free of the sticky notes that were a design challenge for the History Sheets.

But there was a design challenge with the attestations. The form changed significantly over the course of the war. The research team has a database of 23,000 men from World War I, so we selected two attestation dates from each month of the war. We looked at the questions asked in each month, and found there were more than 30 different versions of the form. It’s really hard to predict what will be on each form! That’s where you, the citizen scientists of Measuring the ANZACs, come in. You have to recognize what’s on the form, match it to the questions we’re expecting, and draw the boxes for transcription.
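One way such a count of form versions might be made is sketched below. This is an illustration rather than the research team’s procedure: the `form_versions` helper, the month labels, and the question wordings in the sample are all hypothetical.

```python
from collections import defaultdict

def form_versions(forms):
    """Group sampled forms by their exact sequence of questions.

    `forms` is an iterable of (attestation_month, questions) pairs. The result maps
    each distinct question sequence to the sorted list of months it was seen in.
    """
    versions = defaultdict(set)
    for month, questions in forms:
        # Normalise case and spacing so trivial differences are not counted as new versions.
        key = tuple(" ".join(q.lower().split()) for q in questions)
        versions[key].add(month)
    return {key: sorted(months) for key, months in versions.items()}

# Hypothetical sample: the first two forms differ only by a stray double space.
sample = [
    ("1914-08", ["What is your name?", "Where were you born?"]),
    ("1914-09", ["What is your  name?", "Where were you born?"]),
    ("1916-11", ["What is your name?", "What is the date of your birth?", "Where were you born?"]),
]
versions = form_versions(sample)
print(len(versions))  # 2
```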

A final design challenge with the attestations was that some questions are conditional or multi-part. They ask, for example, “Have you served in the military before?” If you have, they sometimes go on to ask which branch, and how you were discharged. Across the various versions of the form we have seen two-, three- and four-part questions. They’re the hardest to recognize.

With that background, let’s look at how we identify an attestation and then mark it.

Identifying an attestation

One way to identify an attestation is that it says so right up the top! A lot of the forms are also labeled E.F. Form No. 2 (E.F. stands for Expeditionary Force). Many of these forms will appear to be of poor quality. The original paper copies were microfilmed in the 1960s, and for some files the originals were then destroyed. New paper copies were printed from the microfilm, and inserted into the files. These copies were then scanned in the 2000s to create the images used in Measuring the ANZACs. But you will see some original attestation forms, and in some files you’ll notice two versions. Please identify and mark both! It’s better to have too much information than not enough.

Another way to recognize the Attestation (General Form) is the initial sequence of questions, which often starts with the recruit’s name, and questions about his birth date, birth place and next-of-kin. The questions about kin vary tremendously in form across the war.

Marking the attestation

The first thing you’ll be asked to identify is the serial number which is written in the header of the page. The location of this can vary tremendously. Sometimes it’s in a specified place and labeled serial number or regimental number. Other times it’s just written in the header. We’re asking you to transcribe this as a check that we’re getting the right serial numbers associated with the right person.

We then ask you to identify the various questions, with slightly different dialogs for the format of the question.

Here are some examples of one-part questions marked. Notice that the boxes can overlap a little.

The next questions (below) are two-part questions, asking the same thing about the recruit’s father and mother. These are quite obviously two-part questions because they ask about two things, and there are two lines.

But there are more complicated two-part questions, as seen in these examples.

Let’s take a closer look. The first part of the question has the form “Did something happen”, and the second part has the form “If so, tell us more”.

We go all the way down the page marking sections where there are questions or text, making sure to conclude with the date and place of attestation, which is at the bottom (in this example, which is hard to read, it is Christchurch, 24th day of August 1917).

How does this work when we get to transcription?

Because there are so many different forms of the question we, unfortunately, have to solicit your help in transcribing the questions too.

Luckily the question text is printed, and easier to read! You don’t need to transcribe the number of the question.

Here’s the next entry, for which we’re doing the same thing. Note that we’ve transcribed the birthplace here exactly as written: “Ch Ch”. We know this is Christchurch, and we’ll be able to classify it as such without you needing to correct the abbreviation.

If you think a mark has been placed around the wrong place on the page you can select “bad subject”. Sometimes in marking people draw erroneous boxes. This is our way of handling them.

If you can’t read the text please mark “Illegible.” This is more helpful than a bad guess. If it’s marked Illegible it’ll be offered up for someone else to transcribe. This is the power of the crowd! We try to distribute the work to someone who can do it.
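The idea of offering an illegible mark to someone else can be sketched as a small queue. This is a simplified illustration, not the project’s software: the `TranscriptionQueue` class and its method names are hypothetical, and it accepts a single transcription per mark, whereas in practice everything is transcribed several times.

```python
from collections import deque

class TranscriptionQueue:
    """Hand out marks for transcription and re-offer any that were marked illegible."""

    def __init__(self, mark_ids):
        self.waiting = deque(mark_ids)   # marks still needing a transcription
        self.skipped_by = {}             # mark id -> users who marked it illegible
        self.done = {}                   # mark id -> transcribed text

    def next_for(self, user):
        """Return the first waiting mark this user has not marked illegible, or None."""
        for mark_id in self.waiting:
            if user not in self.skipped_by.get(mark_id, set()):
                return mark_id
        return None

    def record(self, user, mark_id, text=None):
        """Store a transcription; with text=None the mark stays waiting for someone else."""
        if text is None:
            self.skipped_by.setdefault(mark_id, set()).add(user)
        else:
            self.waiting.remove(mark_id)
            self.done[mark_id] = text

queue = TranscriptionQueue(["mark-1", "mark-2"])
queue.record("ann", "mark-1")            # Ann cannot read mark-1
print(queue.next_for("ann"))             # mark-2
print(queue.next_for("ben"))             # mark-1
queue.record("ben", "mark-1", "Ch Ch")   # Ben can read it
print(queue.done)                        # {'mark-1': 'Ch Ch'}
```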

In our next tutorial we’ll turn the page again, and look at the medical part of the attestation. This is a really important part of the researchers’ data collection, and it turns out to be one of the simplest pages to mark and transcribe. The information and layout didn’t change much, and the writing is often much better.

Happy marking and transcribing, and thanks for Measuring the ANZACs with us!