ABBYY Screenshot Reader | Dr Andus's toolbox

Dragon NaturallySpeaking 12 is an indispensable tool for me and this weekend (begins midnight Thursday, ends midnight Monday) Nuance has a 50% off sale for the home edition, and also for some other software, at its UK online store. I’m not sure if these offers are also available to non-UK shoppers but it might be worth a try. Here are the prices:

Dragon NaturallySpeaking 12 Home – ~~£79.99~~ £39.99

PDF Converter Professional 8 – ~~£99.99~~ £49.99

PaperPort 14 – ~~£49.99~~ £24.99

OmniPage 18 – ~~£79.99~~ £39.99

I already have DNS 12, but I’m looking for an OCR application that can convert entire PDF articles, including scanned ones, to text (that would be OmniPage, then). That way I can import them into ConnectedText in one go and then delete the unnecessary bits, rather than continuing with my current piecemeal method of converting bits of text with the otherwise excellent ABBYY Screenshot Reader.

Update:

After a bit more research I’ve decided to pay the extra couple of pounds and go for ABBYY FineReader 11 Pro. For one, it has generally better reviews than OmniPage 18, but also I’ve been so impressed with their Screenshot Reader – which I got for free and have been using for years – that I’m happy to reward them for that.

Update no. 2:

Wow, OCR applications have moved on since the last time I used them a few years ago! So far I’m very pleased with ABBYY FineReader. It only took a couple of seconds to convert a 17-page PDF article to Word, complete with not only all the footnotes and headings but also all of the images.

The main purpose of the scanning is to end up with a text version to be imported into ConnectedText. The scan wasn’t entirely perfect in the sense that the first page with the abstract ended up appended to the end of the Word document, but it was easy to fix that with a cut and paste job. ~~There were a couple of italicised words in the original that weren’t kept in italics in the scan and some block quotes didn’t end up looking the way they were supposed to~~; however, the OCR itself, which matters to me the most, was very close to 100% accurate.

Update no. 3:

Correction: in fact the ABBYY FineReader scan was perfect: the problem I mentioned regarding lost italicisation and indentation of block quotes had nothing to do with ABBYY. Those features got lost in the conversion process from .docx to .htm and the CT import process. Apologies to the ABBYY folks…

If you’re wondering, I’m towards the end of the qualitative data analysis process that I’ve been describing in my ConnectedText (CT) tutorials (and in particular in this chart on the right). More specifically, I’m working on my “Findings” topic (a ‘topic’ is a document in CT’s lingo), and just today I have finally managed to complete the analysis and evaluation of all the underlying topic levels. This means that the “Findings” topic has now collected (by way of CT’s magical “include” markup) all the =Final findings= sections of its child-topics, gathering all the findings of my empirical research on one page.
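For readers who don't use CT, here is a rough Python sketch of the idea behind that "include" step. It is not CT's actual markup or implementation: the function names (`extract_section`, `collect_sections`) and the sample topics are made up for illustration, and the only thing borrowed from CT is the `=Heading=` style of section titles.

```python
import re

# A heading line looks like =Title= or ==Title== (same number of "=" on both sides).
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")


def extract_section(text, title):
    """Return the text under the heading `title`, stopping at the next
    heading of the same or a higher level. Returns "" if not found."""
    body = []
    level = None  # heading level of the section once we have found it
    for line in text.splitlines():
        match = HEADING.match(line)
        if level is None:
            # Still searching for the wanted heading.
            if match and match.group(2) == title:
                level = len(match.group(1))
            continue
        if match and len(match.group(1)) <= level:
            break  # a sibling or parent heading ends the section
        body.append(line)
    return "\n".join(body).strip()


def collect_sections(topics, child_names, title="Final findings"):
    """Gather the named section from each child topic into one text."""
    parts = []
    for name in child_names:
        section = extract_section(topics[name], title)
        if section:
            parts.append(f"{name}: {section}")
    return "\n".join(parts)


# Hypothetical topics, standing in for the real case-study topics.
topics = {
    "Case A": "=Data=\nraw notes\n=Final findings=\nA worked.\n=Other=\nx",
    "Case B": "=Final findings=\nB failed.\n==Detail==\nmore",
}
findings = collect_sections(topics, ["Case A", "Case B"])
```

The point is only that the parent page holds no findings of its own: it is rebuilt from the child topics every time, so editing a child's section changes the collected page too.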

This is obviously an important moment for my research project, as this will be the first time that I will be able to survey all the disparate conclusions I have drawn from nine case studies. The primary data that I have imported into CT amounts to around 800,000 words. The secondary material that I have externally linked to and which I have also reviewed could probably double that figure. This material and its associated analysis are contained in exactly 560 topics in my CT database as of today. My “Findings” topic is pulling the analysis from all these topics together into a single topic. Under the =Summary of final findings= I have now a structured list of conclusions, with several levels of headings.

The text of the “included” findings amounts to 2,834 words. This is the output of what I half-jokingly referred to as my “idea-sausage machine,” which has allowed me to process and reduce 1 million+ words into 2,834 words. In a sense this model was really a kind of machine, as part of the process involved mechanical extraction of text from one document and incorporating it into another document, over and over again. However, the other half of the machine was my brain, where the “theory filters” had been applied to the data and where abstraction was carried out.

Nevertheless, both processes depended on each other: the mechanical extraction was part of the mental process of abstraction, and abstraction was equally part of extraction. Tool and thought were very much co-dependent and they needed each other to produce the result: 2,834 words, which hopefully carry some interesting messages that can help answer my original research question.

So what next? Tomorrow I will need to reduce these 2,834 words further, possibly to a single (thesis) sentence of 15 or so words, which will be the short answer to my research question, and then to a 150-word abstract and also some more verbose formulations of my findings. How will I do that? Well, as my attached chart shows, I have been through the hoops and loops of “abstraction by way of extraction” a few times, as I gradually climbed my way up the daisy chain of my CT model.

Basically, I will need to use some tools in association with some brain cells once more. The key process involves classifying the remaining text on the basis of the themes and arguments it contains, and organising these into a hierarchy, where the more general points (conclusions) rise to the top of the hierarchy, while the specific points (examples, supporting evidence, details) are relegated to lower levels of the hierarchy, and superfluous points are relegated to the bottom of the list or deleted altogether. This task calls for some kind of outliner software.

Now, CT is fairly well equipped to provide you with tools to guide you through this entire process. If the text to be analysed is simple enough, you could do the above analysis in the body of a CT topic itself, or in CT’s Notes pane, or in its dedicated Outliner tool, which can be docked or undocked. Indeed, I have constructed some enormous outlines with CT’s outliner, e.g. one with over 1,200 items. However, when it comes to the very last stages of drawing conclusions and making sense of complex outlines with important information, I like to switch to my favourite outliner, Natara Bonsai (Desktop Edition – see my mini-review of it here).

CT allows you to export its outline as an OPML file, which then can be imported into Bonsai (if you install this OPML filter here). However, if you don’t have a CT outline, you can just as well copy your text in CT’s view mode and paste it into the body of a new Bonsai outline, and it will look decent enough. Bonsai’s killer features for the type of analysis I need are the following:

1) Its ability to choose different colours for different levels of the outline hierarchy. This just makes the analysis so much easier, especially if you end up staring at an outline with a thousand items for several hours.

2) One-click collapsing of levels, so you can choose to see only level 1 items (which at the end of the analysis will be your main findings, as they will have risen to the top of the hierarchy), or also level 2, 3, or 4 items, or have all items expanded. This allows you to toggle on and off layers of different degrees of detail (with each layer or level being a different colour).

3) Finally, its feature to zoom in and out of a branch of an item (also called “hoisting” in other outliners) with one click again makes it very quick to shut out the noise and focus on analysing just a single theme or issue.

When I’m done with my analysis and my most important findings have been promoted to the top of the outline hierarchy in Bonsai, I could just export the outline as OPML and import it back into CT as an outline. Although it’s easy enough to do, there is a quicker way. I could just directly copy and paste my new outline into the body of a CT topic. The slight problem with that is that if you select all the top level outline items in a collapsed view, you will still copy and paste the underlying levels of the hierarchy as well, which you may no longer need (as we are after abstraction here).

This is where a wonderful little tool comes in very handy: ABBYY Screenshot Reader. It just sits in my Windows toolbar at the bottom right, and when I click on it, it allows me to select any area of my screen, take a screenshot of it, extract the text via OCR and copy it to my clipboard. It literally takes two clicks to select the area, and then CTRL+V to paste the text. All I need to do is use ABBYY to read my screenshot of the collapsed top-level hierarchy of findings in Bonsai, and presto, my abstractions are extracted and pasted into CT. Abstraction through extraction…
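If you would rather stay with the OPML route and still leave the lower levels behind, a few lines of Python could probably do the same job. This is only a sketch, and I'm assuming the exported file follows the usual OPML layout, with nested `outline` elements inside `body` and the item text in a `text` attribute. The function name `outline_items` and the sample outline are my own inventions for the example.

```python
import xml.etree.ElementTree as ET


def outline_items(opml_text, max_depth=1):
    """Return (depth, text) pairs for all outline items down to max_depth.

    Depth 1 is the top level of the outline, i.e. the main findings.
    """
    root = ET.fromstring(opml_text)
    body = root.find("body")
    items = []

    def walk(node, depth):
        # Only direct <outline> children belong to this level.
        for child in node.findall("outline"):
            items.append((depth, child.get("text", "")))
            if depth < max_depth:
                walk(child, depth + 1)

    if body is not None:
        walk(body, 1)
    return items


# A made-up outline standing in for a real export.
SAMPLE = """<opml version="2.0">
  <head><title>Findings</title></head>
  <body>
    <outline text="Finding one">
      <outline text="Supporting example"/>
    </outline>
    <outline text="Finding two"/>
  </body>
</opml>"""

top_level = [text for _, text in outline_items(SAMPLE)]
two_levels = outline_items(SAMPLE, max_depth=2)
```

With `max_depth=1` you get just the collapsed top level, which is what the screenshot trick captures; raising it brings back the lower layers one at a time.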

By the way, you can also do this ABBYY trick (i.e. extract the top level of a hierarchy) with BrainStorm, which is another VERY interesting tool I’ve been recently playing around with to carry out this final sorting of lists of findings. (See Manfred Kuehn’s post on how he uses BrainStorm with CT.)


Nuance software 50% off in the UK this weekend

Abstraction through extraction