Archive | January 2016

Ditto to you!

Perhaps the third most frequent question that we get on Measuring the ANZACs Talk (check it out and join the conversation) is “How do you want us to deal with ditto marks?” Just in case you don’t know what a ditto mark is, it’s the quotation mark (") or series of quotation marks (" " " ") indicating that the entry for a particular line is the same as the one above it. In this image there is a series of ditto marks for rank, indicating that our subject, Stewart Litchfield, stayed a private from January 1918 to March 1919.

[Image: screenshot of a record with ditto marks in the rank column]

This is a great question, and in posing it our citizen scientists have recognized some of the challenges and tensions in creating accurate, structured data that still reflects the original sources.

One principle of transcription that we ask you to adhere to and promote is “Type what you see.” Don’t make editorial judgments. Spelling mistakes are interesting in the original sources. Although the research team is first and foremost interested in measuring the ANZACs (how tall were they, what did they weigh, what did they die of, what were their jobs), we know that we’re collecting a large amount of text that will give incredible insights into language in everyday use in the early twentieth century.

One way that spelling mistakes and abbreviations are interesting is in indicating what was common enough to be abbreviated. It’s interesting to know if men with names that could have a diminutive used those names. Did James call himself Jim when he enlisted? Did William call himself Bill? Type what you see.

Ditto marks are a challenge to that principle, because they introduce the potential for error into the data we’re creating. When we have all the data entered for all the fields (read this, then go transcribe, and tell your friends to transcribe) the ditto marks won’t be a problem. We’ll sort the data by date (we can also sort it by the X-Y co-ordinates of the marks, which are recorded in the database), and wherever we see a ditto in the Rank field we’ll replace it with the entry above it. This is quite straightforward in a statistical package, because filling a value down from the row above is a common task in data analysis (if you’re curious, follow this link).
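As a rough illustration of how dittos can be resolved once the rows are sorted, here is a minimal Python sketch. It is not the research team's actual code: the field names `date` and `rank`, the set of characters treated as ditto marks, and the sample rows are all assumptions made up for the example.

```python
from datetime import date

# Characters treated as ditto marks in this sketch (an assumption, not a project rule).
DITTO_MARKS = {'"', "“", "”"}


def is_ditto(value):
    """Return True if a transcribed cell contains nothing but ditto marks."""
    tokens = value.strip().split()
    return bool(tokens) and all(token in DITTO_MARKS for token in tokens)


def fill_dittos(rows, field, sort_key):
    """Sort rows, then replace dittos in `field` with the last real value above."""
    filled = []
    last_value = None
    for row in sorted(rows, key=lambda r: r[sort_key]):
        new_row = dict(row)  # copy, so the original transcription is untouched
        if is_ditto(new_row[field]):
            # A ditto with nothing above it is left alone so the gap stays visible.
            if last_value is not None:
                new_row[field] = last_value
        else:
            last_value = new_row[field]
        filled.append(new_row)
    return filled


# Hypothetical rows, deliberately out of order to show why sorting comes first.
records = [
    {"date": date(1918, 6, 1), "rank": '"'},
    {"date": date(1918, 1, 15), "rank": "Private"},
    {"date": date(1919, 3, 20), "rank": '" "'},
]
filled = fill_dittos(records, field="rank", sort_key="date")
```

The fill is only as good as the row order. If a row is missing or sorted into the wrong place, the ditto picks up whatever value happens to sit above it.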

But your questions get at the potential for error. What if a row is missed and the ditto marks end up being replaced by the wrong original entry? This is a real concern.

Thus, when you come across a ditto mark, we’d like you to enter the text it stands for. Look up the column, find the original entry, and enter that. This is, after all, what the ditto mark is indicating: “This entry in row n is what’s written in row n-1.”

So, the instructions here differ a little from the strict “type what you see” dictum that we otherwise want our citizen scientists to adhere to. But typing in the real thing and not the ditto mark is good practice in historical social science. Overall it reduces the possibilities of errors in the data we’re creating. Thanks for reading, and thanks for your contributions to Measuring the ANZACs!

Evan Roberts




Why is there not a back button?

One of the most frequent questions we get from our citizen scientists on Measuring the ANZACs is “Why is there not a back button, so people can correct quickly-realised mistakes when classifying or transcribing?”

The second most frequent question we get is “How does Measuring the ANZACs come to involve Minnesota?!” Here’s an answer to that one.

So, why is there not a back button?

The answer lies in the way the software works and in the origins of Zooniverse in scientific classification projects. By classification, we mean the projects on Zooniverse that ask you categorical and sometimes binary questions such as:

  • Are there any animals in this picture?
    • If so, how many animals are there?
    • What kind of animal is it?
    • What is the animal doing?

With classification projects you’re often working more with your mouse or trackpad than the keyboard.

The databases and software underlying Zooniverse projects are all structured so that when you make a decision it is passed to the server straight away, and a new record of your classification is created in the database. This design has worked really well, and very few citizen scientists have asked about re-doing their work on these projects.
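To make that design concrete, here is a toy Python sketch of a submit-immediately, append-only store. It is not Zooniverse's actual software or schema: `InMemoryServer`, `Classification`, and every field name are hypothetical, chosen only to show that each decision becomes a new record the moment it is made.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Classification:
    """One stored decision; frozen because records are never edited afterwards."""
    record_id: int
    subject_id: str
    field: str
    value: str


class InMemoryServer:
    """Stand-in for the project database: it only ever appends new records."""

    def __init__(self):
        self.records = []
        self._ids = count(1)

    def submit(self, subject_id, field, value):
        record = Classification(next(self._ids), subject_id, field, value)
        self.records.append(record)
        return record


server = InMemoryServer()
# Each call stands for one decision, sent as soon as it is made.
server.submit("page-001", "birthplace", "Paston North")
server.submit("page-001", "rank", "Private")
```

In this sketch there is no step that reaches back into `records` to change an earlier entry, which is why there is nothing for a back button to act on.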

What we’ve discovered on Measuring the ANZACs, which is one of the first transcription projects on Zooniverse, is that people often realize a mistake just after they’ve submitted an entry. (Do check out the other transcription projects, Emigrant City and Old Weather Whaling, which are both based on Scribe software. But come right back to Measuring the ANZACs.)

This is consistent with my own experience in transcribing lots of historical documents, and in working with dozens of undergraduate research assistants on ANZAC research and other historical transcription projects. When you’re transcribing hard-to-read material, you often arrive at a better guess a few seconds after you’ve finished your first effort. All of which is to say, we know the problem from personal experience!

Hence, your question! Why is there not a back button? It really would be a great idea.

The problem, and it’s not an unsolvable one, gets back to the design of the software and database. To have a back button we’d need to change the software so that when you finish the data entry on a particular field the information would be retained locally for a short period, and then transmitted to the server with a timed lag. This would give you some time to realize “oops, that was really Palmerston North, not Paston North” and go back to correct it.
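The idea of holding an entry locally and sending it after a lag might look something like the Python sketch below. This is only an illustration under assumptions, not how Scribe or Zooniverse is built: `DelayedSubmitter`, its method names, and the ten-second window are invented for the example, and time is passed in explicitly so the behaviour is easy to follow.

```python
class DelayedSubmitter:
    """Hold each entry locally for `delay_seconds` before sending it on."""

    def __init__(self, send, delay_seconds=10.0):
        self._send = send            # callable that transmits one entry
        self._delay = delay_seconds  # hypothetical length of the "oops" window
        self._pending = []           # (due_time, entry) pairs, oldest first

    def enter(self, entry, now):
        """Record an entry locally; it becomes due after the delay."""
        self._pending.append((now + self._delay, entry))

    def undo_last(self):
        """Take back the most recent entry if it has not been sent yet."""
        if not self._pending:
            return None
        _, entry = self._pending.pop()
        return entry

    def flush(self, now):
        """Send every entry whose holding period has expired."""
        still_waiting = []
        for due, entry in self._pending:
            if due <= now:
                self._send(entry)
            else:
                still_waiting.append((due, entry))
        self._pending = still_waiting


sent = []  # stands in for the server
submitter = DelayedSubmitter(send=sent.append, delay_seconds=10.0)

submitter.enter({"field": "birthplace", "value": "Paston North"}, now=0.0)
mistake = submitter.undo_last()  # the "oops" moment, inside the window
submitter.enter({"field": "birthplace", "value": "Palmerston North"}, now=3.0)

submitter.flush(now=5.0)   # too early: the corrected entry is still held
early = list(sent)
submitter.flush(now=13.0)  # the window has passed, so the entry is sent
```

Until `flush` runs, an entry exists only in the `_pending` list on the transcriber's side, so the "server" in this sketch knows nothing about it.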

Building this lag into the software is technically possible, but has a couple of costs for the user experience and for our data capture. First of all, it has the potential to make the web browser experience a little slower as your browser is now holding the information for a period of time. Second, it increases the chance of data loss.

In short, we’re aware this is an issue. We need to think about it, assess how big a problem it is, and weigh the costs and benefits of a solution.

In closing, I want to leave you with a couple of thoughts. First, we understand your frustration and really appreciate that our citizen scientists want to get it absolutely right! Thank you. We hope it’ll reassure you to know that:

  • Everything is transcribed multiple times. The chance of everyone making the same mistake at the same place on the same piece of text is pretty small.
  • A lot of the information is “coded,” so minor spelling mistakes in place names and occupations, and even in some acronyms, aren’t going to matter. If you type NSRB instead of NZRB (for NZ Rifle Brigade), it will be obvious when we look at all the entries that there are thousands of NZRBs and a couple of mistaken NSRBs. Given the “fat finger” closeness of S and Z on the keyboard, it’s obvious what was meant.
  • The data we’ve seen from the first few months of Measuring the ANZACs looks great. We’re not worried about the quality of transcriptions.
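The first two points can be sketched in a few lines of Python. This is a hedged illustration rather than the project's real pipeline: the function names, the `min_count` and `cutoff` thresholds, and the sample counts are assumptions. The first function takes the most common of several transcriptions of the same text, and the second maps rarely seen codes onto the closest frequently seen one using the standard library's `difflib`.

```python
from collections import Counter
from difflib import get_close_matches


def consensus(transcriptions):
    """Return the most common transcription and the share of entries that agree."""
    counts = Counter(t.strip() for t in transcriptions)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(transcriptions)


def fold_rare_variants(values, min_count=3, cutoff=0.7):
    """Map rarely seen codes onto the closest frequently seen code, if there is one."""
    counts = Counter(values)
    common = [v for v, n in counts.items() if n >= min_count]
    mapping = {}
    for value, n in counts.items():
        if n < min_count:
            match = get_close_matches(value, common, n=1, cutoff=cutoff)
            if match:
                mapping[value] = match[0]
    return mapping


# Three hypothetical transcriptions of the same piece of text.
best, agreement = consensus(["Palmerston North", "Palmerston North", "Paston North"])

# Made-up counts: many correct codes and a couple of slips.
units = ["NZRB"] * 50 + ["NSRB"] * 2
corrections = fold_rare_variants(units)
```

Here `best` is the majority reading, and `corrections` maps the rare NSRB onto NZRB. A real cleaning step would want a person to review such mappings rather than apply them blindly.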

We hope this blog post has given you an insight into the interesting intersection of historical transcription and website design, and thank you all for joining the forces at Measuring the ANZACs.