Renaming and tagging based on additional XML file

Hello everyone,

I am just switching to the Mac world, and to Hazel as well, so first of all: hello!
As a newbie I may need to ask more than once before I understand everything correctly, so apologies in advance.

On my Windows machine I used a DMS called ecoDMS, which worked well. Unfortunately, I cannot get it running under macOS, so I looked for an alternative. DEVONthink seems to be what I am looking for.
Now that I know where I want to go, I am looking for a way to get there. The ecoDMS export gives me my roughly 5,000 documents (plus, for some of them, the TIFF data from the scanning process) along with an XML file, which is structured like this:

<document docid='4'>
   <files id='4' origname='20120204201433174.pdf' filePath='ecodms_docid_0000004.pdf'>
      <fileVersion id='4' version='1' origname='20120204201433174.tif' filePath='ecodms_docid_0000004_revision_0001.tif'>
         <pdfFile origName='20120204201433174.pdf' filePath='ecodms_docid_0000004_revision_0001.pdf'/>
         <user></user>
         <fixed>true</fixed>
         <date>MjAxMi0wMi0wNCAyMToxMDozMC4w</date>
         <fixuser></fixuser>
         <fixdate>2012-02-04 21:10:30.0</fixdate>
      </fileVersion>
   </files>
   <classifyInfos>
      <classifyInfo cla_docs_id='4' revision_count='2' trashed='false'>
         <Version>
            <ordner>GKV</ordner>
            <hauptordner>Finanzen</hauptordner>
            <bemerkung>&#220;bersendung elektronische Gesundheitskarte</bemerkung>
            <status>Erledigt</status>
            <revision>1.1</revision>
            <dokumentenart>Anschreiben und Informationen</dokumentenart>
            <letzte-änderung>2012-02-04 21:25:07.815</letzte-änderung>
            <datum>2011-12-30</datum>
            <bearbeitet-von>Philip Deubner</bearbeitet-von>
            <zurückgestellt-bis></zurückgestellt-bis>
            <zu-bearbeiten></zu-bearbeiten>
            <zur-ansicht></zur-ansicht>
            <belegnummer>null</belegnummer>
            <kunden--kontonummer></kunden--kontonummer>
            <steuerrelevant></steuerrelevant>
            <ordner-extkey></ordner-extkey>
         </Version>
         <Version>
            <ordner>GKV</ordner>
            <hauptordner>Finanzen</hauptordner>
            <bemerkung>&#220;bersendung elektronische Gesundheitskarte</bemerkung>
            <status>Zu Bearbeiten</status>
            <revision>1.0</revision>
            <dokumentenart>Anschreiben und Informationen</dokumentenart>
            <letzte-änderung>2012-02-04 21:10:31.799</letzte-änderung>
            <datum>2011-12-30</datum>
            <bearbeitet-von>Philip Deubner</bearbeitet-von>
            <zurückgestellt-bis>null</zurückgestellt-bis>
            <zu-bearbeiten></zu-bearbeiten>
            <zur-ansicht></zur-ansicht>
            <belegnummer>null</belegnummer>
            <kunden--kontonummer></kunden--kontonummer>
            <steuerrelevant></steuerrelevant>
            <ordner-extkey></ordner-extkey>
         </Version>
      </classifyInfo>
   </classifyInfos>
</document>


The XML file contains one such block for each of the files.
What I want to achieve is to have the files in DEVONthink (DT) with

- <bemerkung> as file name
- <datum> as creation date
- <ordner>, <dokumentenart>, <belegnummer> and <kunden--kontonummer> as tags

My idea is to import the prepared files into DT afterwards.
Open question: how can this be done?
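
A minimal Python sketch of the extraction step, in case it helps: it reads one <document> block like the one above and pulls out the file name, the date and the tags. Taking the first <Version> as the current one, treating the literal text "null" as empty, and the function name extract_metadata are assumptions made on top of the sample; nothing in the export is known to guarantee them.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample block above (only the fields used here).
SAMPLE = """<document docid='4'>
   <files id='4' origname='20120204201433174.pdf' filePath='ecodms_docid_0000004.pdf'/>
   <classifyInfos>
      <classifyInfo cla_docs_id='4' revision_count='2' trashed='false'>
         <Version>
            <ordner>GKV</ordner>
            <bemerkung>&#220;bersendung elektronische Gesundheitskarte</bemerkung>
            <dokumentenart>Anschreiben und Informationen</dokumentenart>
            <datum>2011-12-30</datum>
            <belegnummer>null</belegnummer>
            <kunden--kontonummer></kunden--kontonummer>
         </Version>
      </classifyInfo>
   </classifyInfos>
</document>"""

# Elements that should become tags, in the order listed in the post.
TAG_FIELDS = ("ordner", "dokumentenart", "belegnummer", "kunden--kontonummer")


def extract_metadata(xml_text):
    """Return file path, title, date and tags for one <document> block."""
    doc = ET.fromstring(xml_text)

    # Assumption: the first <Version> in the block is the current revision.
    version = next(doc.iter("Version"), None)
    if version is None:
        raise ValueError("no <Version> element found")

    # Collect the child elements; empty values and the text "null" become "".
    fields = {}
    for child in version:
        value = (child.text or "").strip()
        fields[child.tag] = "" if value == "null" else value

    files = doc.find("files")
    return {
        "pdf": files.get("filePath") if files is not None else None,
        "name": fields.get("bemerkung", ""),
        "date": fields.get("datum", ""),
        "tags": [fields[name] for name in TAG_FIELDS if fields.get(name)],
    }


if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

The XML parser decodes the character reference &#220; to Ü, so the title comes out readable. Empty and "null" fields are skipped and do not end up as tags.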

Thank you for sharing your thoughts and for pointing me in the right direction.
Best regards,

Connor
Connor
 
Posts: 8
Joined: Sun Jun 02, 2019 1:56 pm

You're going to need an AppleScript to import into DEVONthink. There may be one here you can adapt. Try searching around; otherwise, you might have better luck on the DEVONthink forums.

If there is a separate XML file for each file, then you might be able to have Hazel drive it. You can use "Contents contain match" and set up a custom attribute to grab the text between tags. You can then send those custom attributes to the AppleScript.
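
For anyone who wants to try the "text between tags" idea outside Hazel first, here is a rough Python sketch of the same kind of match. It is only an illustration of the pattern and not how Hazel works internally; the helper name text_between_tags is made up. The sample XML stores Ü as &#220;, so the sketch also decodes character references. This thread does not say whether a Hazel match does the same.

```python
import html
import re


def text_between_tags(xml_text, tag):
    """Return the text between the first <tag>...</tag> pair, or None."""
    name = re.escape(tag)  # tags such as kunden--kontonummer contain hyphens
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    match = pattern.search(xml_text)
    if match is None:
        return None
    # Turn references like &#220; into the real character.
    return html.unescape(match.group(1)).strip()


if __name__ == "__main__":
    snippet = "<bemerkung>&#220;bersendung elektronische Gesundheitskarte</bemerkung>"
    print(text_between_tags(snippet, "bemerkung"))
```

A plain match like this only sees the first occurrence of a tag. That matters for the sample above, where each document carries several <Version> blocks.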
Mr_Noodle
Site Admin
 
Posts: 7976
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

This seems to be a rather involved task.
Thank you for your feedback. I will try to get the separate XML files first.

Best regards - and thank you for your support,

Connor

For anyone looking into a similar topic:

I was able to split the XML file with a csplit command based on the <document> tag.
This worked extremely well. After splitting the XML file, I needed to add a .txt extension to the file names.
With that done, renaming the files based on "Contents contain match" did the trick.
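
For those without csplit at hand, the split can also be sketched in Python. This is not the command I used, just a rough equivalent that cuts the export at each <document> block. The wrapper element in the demo, the output directory name and the document_<docid>.xml naming are made up for illustration.

```python
import re
from pathlib import Path

# One <document ...> ... </document> block, non-greedy, across line breaks.
DOCUMENT_RE = re.compile(r"<document\b.*?</document>", re.DOTALL)
# The docid attribute on the opening tag, with single or double quotes.
DOCID_RE = re.compile(r"""<document\b[^>]*\bdocid=['"]([^'"]+)['"]""")


def split_documents(export_text):
    """Return every <document> block in the export as its own string."""
    return DOCUMENT_RE.findall(export_text)


def write_documents(blocks, out_dir):
    """Write each block to out_dir/document_<docid>.xml and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, block in enumerate(blocks, start=1):
        match = DOCID_RE.match(block)
        # Fall back to a running number if a block has no docid attribute.
        docid = match.group(1) if match else f"unknown_{index}"
        target = out / f"document_{docid}.xml"
        target.write_text(block + "\n", encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical two-document export; the <export> wrapper is an assumption.
    demo = (
        "<export>\n"
        "<document docid='4'><files id='4'/></document>\n"
        "<document docid='5'><files id='5'/></document>\n"
        "</export>"
    )
    for path in write_documents(split_documents(demo), "split_xml"):
        print(path)
```

If the files need a .txt extension, as they did in my case, the suffix in write_documents can simply be changed.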

So the first step is done. Next I need to work out how to get the respective .pdf files renamed, related and tagged based on the now available .xml files.
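
As a starting point for that next step, here is a hedged sketch of how a rename plan could be computed before touching any file. It assumes each document has already been reduced to a record holding the exported PDF name (the filePath attribute) and the <bemerkung> text. The record layout and both function names are assumptions. Tagging and creation dates are left out, since they depend on Hazel or DEVONthink.

```python
import re


def safe_filename(title, fallback="untitled"):
    """Make a title usable as a file name on macOS."""
    # Slashes, backslashes and colons cause trouble in file names.
    cleaned = re.sub(r"[/\\:]", "-", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned or fallback


def build_rename_plan(records):
    """Return (old_name, new_name) pairs, numbering repeated titles."""
    seen = {}
    plan = []
    for record in records:
        base = safe_filename(record["name"])
        # Compare case-insensitively, as the default macOS file system does.
        key = base.casefold()
        seen[key] = seen.get(key, 0) + 1
        count = seen[key]
        new_name = f"{base}.pdf" if count == 1 else f"{base} ({count}).pdf"
        plan.append((record["pdf"], new_name))
    return plan


if __name__ == "__main__":
    # The second record is hypothetical, to show how a repeated title is handled.
    records = [
        {"pdf": "ecodms_docid_0000004.pdf",
         "name": "Übersendung elektronische Gesundheitskarte"},
        {"pdf": "ecodms_docid_0000005.pdf",
         "name": "Übersendung elektronische Gesundheitskarte"},
    ]
    for old, new in build_rename_plan(records):
        print(old, "->", new)
```

With roughly 5,000 documents, repeated <bemerkung> texts are likely, which is why the plan numbers duplicates instead of overwriting files. The numbering is naive: a document that is literally titled with a "(2)" suffix could still collide.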

Best regards,

Connor

