Pathagoras can 'scrape' the usable data from completed Acrobat PDFs. Scraping is a 'deeper' process than 'scanning'. Scraping digs down to the fillable field name, not just the surface value discussed in the previous pages.
Let’s say you have sent an Acrobat form to a client or customer to complete. When it is returned, you can tell Pathagoras to pull the data from the form and import it directly into your Pathagoras system.
To scrape data from an Adobe Acrobat form, select the form and click the 'Scrape Data' box in the Action column.
When you 'scrape' a document for its values, two elements of each PDF field are involved:
1. The field name. Each fillable field has a name associated with it. The field name is not visible when you look at a PDF form. It sits in the background, but it is essential to scraping. The field name serves as the 'variable name' for purposes of creating an Instant Database record.
2. The field value. This is the part of the completed form that you can see (and that you or your client/customer completed).
When the document is scraped, Pathagoras records the invisible field names and pairs them with what is typed in the corresponding field value. It then transfers both values to the Instant Database screen (in Word).
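The pairing described above can be pictured with a small Python sketch. It only illustrates the idea and is not Pathagoras's actual code: the `PdfField` type, the `scrape` function, and the sample field names and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PdfField:
    """One fillable field: a hidden name plus the visible value."""
    name: str
    value: str


def scrape(fields):
    """Pair each hidden field name with the value typed into that field."""
    return {field.name: field.value for field in fields}


# Hypothetical completed form with two fields.
completed_form = [
    PdfField(name="Client Name", value="Jane Doe"),
    PdfField(name="Date of Birth", value="4/1/1980"),
]

# Each key plays the role of an Instant Database variable name.
record = scrape(completed_form)
```

In this picture, the dictionary keys are what would appear as variable names on the Instant Database screen, and the dictionary values are what the client typed.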
With no further setup, you can scrape any data from a PDF. The current 'raw' field names and their corresponding values will be displayed. The problem is that the raw field names are probably useless. They typically have names like "Field1" and "Field2" or, if the author put some thought into the naming scheme, something like "User.Name.First" or "FirstName". (While the latter field names are not useless, they probably don't match the names you already use in your Instant Database system.) Not to worry. We have that all figured out. We allow two solutions. The first is simply to rename the fields to match your variables. The other is to use a 'Pairing Table.' Pairing tables are discussed on another page.
So, the prerequisite to scraping data that is immediately usable in your Instant Database system is that the PDF's field names must match the variable names you already use. Pathagoras makes this easy to do. Pathagoras can rename your field names automatically.
•The easiest way to change the base field names to [bracketed variable] names is to open the PDF and type into each field the name of the variable you want that field to represent. Something like this will work:
Click the Rename button on the Acrobat screen and click the Next button:
You will be presented with two options. Select the first one. (It reads 'Replace the name with the [bracketed value] of each field.') The replacements take place almost immediately. Save the PDF. You can now distribute the PDF to your clients/customers for completion. When the completed forms are returned to you, you will be able to scrape the data they contain directly into an Instant Database record.
VALIDATED DATE FIELDS: You will not be able to place a [bracketed variable] into a field that Acrobat has been told to 'validate'. For example, if the field calls for a date, and the field is 'validated' to make sure that what you inserted is a real date, Acrobat will reject [Date of Birth]. In such a case, you should put in a valid, but 'coded', date. Remember your code. (E.g., 2/22/2222 might in your 'code' stand for Date of Marriage. When Pathagoras encounters a date field that is outside of the last 100 years, it will stop and ask for a bracketed variable.)
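The 'outside of the last 100 years' test can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Pathagoras's own logic: the function name `is_coded_date`, the month/day/year input format, the treatment of future dates as 'coded', and the `CODED_DATES` code book are all hypothetical.

```python
from datetime import date, datetime


def is_coded_date(text, today=None):
    """Return True if the date falls outside the last 100 years."""
    today = today or date.today()
    # Assumes dates are typed as month/day/year, e.g. 2/22/2222.
    value = datetime.strptime(text, "%m/%d/%Y").date()
    # Start of the 100-year window; Feb 29 is shifted to Feb 28 if needed.
    try:
        earliest = today.replace(year=today.year - 100)
    except ValueError:
        earliest = today.replace(year=today.year - 100, day=28)
    return value < earliest or value > today


# Hypothetical code book recording what each coded date stands for.
CODED_DATES = {"2/22/2222": "Date of Marriage"}
```

Keeping a written code book like `CODED_DATES` is one way to 'remember your code' when Pathagoras stops and asks for the bracketed variable.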
An alternative way to rename date fields (or any field, for that matter) is to manually edit the form. Instructions on how to manually edit a PDF are provided by Adobe at this link. Third parties also provide good information on editing Acrobat forms. Here is one example.
•The alternative way involves using the Instant Database screen. The left side of the IDB screen will contain the current field names and the right side will reflect the new names. To complete the left side, you can tell Pathagoras to SCAN the selected PDF and choose the 'Raw values' option. Pathagoras also provides a tool called 'Print Chart', which lists all of the field names and the values currently typed into those fields in a table. This will help you to identify which field names you need to change.
More often than not, the names of the Adobe fields will be meaningless. Pathagoras lets you use a 'Pairing Table' that you create in Excel and store in the same folder as the scraped file. These are the requirements:
Column A contains the names of the original (PDF) fields.
Column B contains the names of the Instant Database variables (no brackets required).
The file must be stored in the same folder as the PDF file being scraped, and must be named with the PDF's base name (no extension) PLUS a space PLUS "(pairing table)" PLUS ".xlsx" (the Excel extension).
E.g., if the PDF file is named "Intake Form.pdf", the pairing table must be called "Intake Form (pairing table).xlsx".
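The naming rule and the two-column lookup can be sketched in Python. This is only an illustration under stated assumptions: the function names `pairing_table_path` and `apply_pairing` are hypothetical, the pairing table is represented as a plain dictionary rather than read from Excel, and leaving unmatched fields under their raw names is a guess about behavior, not something confirmed by Pathagoras.

```python
from pathlib import Path


def pairing_table_path(pdf_path):
    """Build the pairing table path that sits next to the given PDF."""
    pdf = Path(pdf_path)
    # Base name (no extension) + space + "(pairing table)" + ".xlsx"
    return pdf.with_name(f"{pdf.stem} (pairing table).xlsx")


def apply_pairing(raw_record, pairing):
    """Rename raw PDF field names (Column A) to variable names (Column B).

    Fields with no Column A entry keep their raw name (an assumption).
    """
    return {pairing.get(name, name): value
            for name, value in raw_record.items()}


# Hypothetical scraped values and pairing table contents.
raw = {"Field1": "Jane", "Field2": "Doe"}
pairing = {"Field1": "First Name", "Field2": "Last Name"}
paired = apply_pairing(raw, pairing)
```

For "Intake Form.pdf", `pairing_table_path` yields "Intake Form (pairing table).xlsx" in the same folder, matching the example above.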