Title of scientific articles PDF

Moderator: Mr_Noodle

Title of scientific articles PDF Wed May 17, 2023 2:02 am • by cristianepv96
I handle many PDFs of scientific articles, and I would like to name them after the articles' titles, because they always download with very strange names. The problem is that I can't find a pattern that they all follow, so I can't make Hazel automatically use the title as the file name. I only want the title as the file name. Note: I tried using metadata, but it doesn't work in most cases. I thought about using the DOI to obtain the title, but I'm not sure whether Hazel can do that, or whether the code for it would be very complex. Or maybe there's another way?
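A minimal sketch of the DOI idea, assuming the DOI appears somewhere in the PDF's text (as in the `https://doi.org/10.5817/...` reference quoted later in this thread) and swapping in the public Crossref REST API (`https://api.crossref.org/works/{doi}`), whose JSON response carries the title as a list under `message.title`. The DOI regex is a rough approximation rather than the full DOI syntax, and Hazel itself would only run this as an external script:

```python
import json
import re
import urllib.parse
import urllib.request

# Rough DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix with no whitespace.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string in text, or None."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Drop sentence punctuation that the suffix pattern swallowed.
    return match.group(0).rstrip(".,;")


def title_from_crossref(payload):
    """Pick the first title from a parsed Crossref /works/{doi} response."""
    titles = payload.get("message", {}).get("title") or []
    return titles[0] if titles else None


def lookup_title(doi):
    """Ask the Crossref REST API for the title of one DOI (needs network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=10) as response:
        return title_from_crossref(json.load(response))
```

Something like `lookup_title(find_doi(page_text))`, where `page_text` is a placeholder for the text you extracted from the PDF, would then return the title to use as the file name (check for `None` first, since not every PDF contains a DOI).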

The only characteristic I see that they all share is that the title is always on the first page, in a larger font size than the rest of the text on that page (and usually in the whole document).
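A sketch of that font-size heuristic, assuming the first page's text is already available as (text, font size) pairs in reading order. PyMuPDF, for example, reports spans with `text` and `size` fields through `page.get_text("dict")`, but that extraction step is left out here, and `title_by_font_size` is just an illustrative name:

```python
def title_by_font_size(spans, tolerance=0.5):
    """Join the spans whose font size is within `tolerance` points of the largest.

    `spans` holds (text, size) pairs for the first page in reading order.
    Titles often wrap over several lines, so every span at (about) the
    largest size is kept, in order.
    """
    cleaned = [(text.strip(), size) for text, size in spans if text.strip()]
    if not cleaned:
        return None
    largest = max(size for _, size in cleaned)
    return " ".join(text for text, size in cleaned if largest - size <= tolerance)
```

A large journal name, a logo rendered as text, or a drop cap could outrank the title, so this is a heuristic rather than a guarantee.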
cristianepv96
 
Posts: 4
Joined: Wed May 17, 2023 1:55 am

Re: Title of scientific articles PDF Wed May 17, 2023 9:00 am • by Mr_Noodle
Can you provide a specific example?
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Title of scientific articles PDF Wed May 17, 2023 10:16 am • by cristianepv96
https://ibb.co/zhcddxd
https://ibb.co/tpQ5VQQ

As in those two images, the article title is always on the first page and in a larger font size, but I can't find a clear pattern that works for most articles.

Re: Title of scientific articles PDF Thu May 18, 2023 9:04 am • by Mr_Noodle
Are these PDFs? If you select the files in Finder and do "Get Info", do you see any metadata there that might work (check under the "More info" section)?

Re: Title of scientific articles PDF Sat Jul 01, 2023 2:57 am • by IvanPsy
I'm in the same boat.

This is an example:
Filename: 31019-Full article-57958-1-10-20230630.pdf

This is what I see in Finder's Get Info window, and when I check the text inside Hazel:
https://imgur.com/a/dBg4Drt

I hope you can open the link; it should be public.

Is it possible to create a rule to change the name accordingly?

Any help is much appreciated...
IvanPsy
 
Posts: 26
Joined: Thu Jul 14, 2022 3:38 am

Re: Title of scientific articles PDF Mon Jul 03, 2023 8:27 am • by Mr_Noodle
I do see "Titolo" (I assume that means "Title"). Is there a reason you can't use that?

Re: Title of scientific articles PDF Mon Jul 03, 2023 2:18 pm • by IvanPsy
Mr_Noodle wrote: I do see "Titolo" (I assume that means "Title"). Is there a reason you can't use that?


Alas, I can't.
The title of the document is:

"Microsoft Word - Salton 2023_17_2 WillingnessToSelf-Disclose_proof_word to pdf_04_04"

but the real title of the research is:

"Willingness to Self-Disclose Cyber Victimization to Friends or Parents: Gender Differences in Cyber Victimization a Year Later"

Looking closer, the full text there includes:

Salton, M. R., Cohen, R.,Deptula, P. D., & Ray, G. E. (2023). Willingness to self- disclose cyber victimization to friends or parents: Gender differences in cyber victimization a year later. Cyberpsychology: Journal of Psychosocial Research on Cyberspace, 17(2), Article 2. https://doi.org/10.5817/CP2023-2-2

The title is between the year "(2023)." and the name of the journal "Cyberpsychology: Journal of Psychosocial Research on Cyberspace".
Maybe I can work with that, but how?

Re: Title of scientific articles PDF Tue Jul 04, 2023 9:31 am • by Mr_Noodle
Is the title always after the date (in parentheses) and terminated with a period? If so, you can try and extract it using that contextual info.
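A rough regular-expression version of that suggestion (standard Python `re` syntax, not Hazel's own pattern tokens), assuming an APA-style reference line: a four-digit year in parentheses, a period, then the title up to the next period followed by a space. The journal name is not hardcoded, but a title that itself contains ". " would be cut short, and one ending in "?" or "!" would run on into the journal name:

```python
import re

# "(2023). " (or "(2019a). "), then the shortest run of text up to the next ". ".
CITATION_TITLE_RE = re.compile(r"\(\d{4}[a-z]?\)\.\s+(.+?)\.\s")


def title_after_date(text):
    """Return the text between '(YYYY). ' and the next '. ', or None."""
    match = CITATION_TITLE_RE.search(text)
    return match.group(1) if match else None
```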

Re: Title of scientific articles PDF Tue Jul 04, 2023 11:36 am • by IvanPsy
Mr_Noodle wrote: Is the title always after the date (in parentheses) and terminated with a period? If so, you can try and extract it using that contextual info.


Indeed, there is always the date (which is not necessarily the current year, if the research is older), and it ends with these two characters followed by a space:

Code:
).


Before the date come the authors, who of course differ from article to article.

After the title there is a period, a space, and the text

Code:
Cyberpsychology: Journal of Psychosocial Research on Cyberspace


followed by the volume and issue numbers.

So, following your tip, how do I tell Hazel to extract the text between

Code:
*).


and

Code:
. Cyberpsychology: Journal of Psychosocial Research on Cyberspace*


where the asterisk means "any character"?
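In regular-expression terms (Python's `re` here, not Hazel's own pattern syntax), the parts marked "*" become "anything", and the text between the two fragments is what gets captured. A minimal sketch with the journal name hardcoded exactly as in this thread, so it only fits that one journal:

```python
import re

JOURNAL = "Cyberpsychology: Journal of Psychosocial Research on Cyberspace"

# "\)\.\s+"          -> the ")." that closes the date, plus the following space(s)
# "(.+?)"            -> the title (shortest possible run)
# "\.\s+" + journal  -> ". Cyberpsychology: ..." marks where the title stops
# re.escape keeps characters in the journal name from being read as regex syntax.
PATTERN = re.compile(r"\)\.\s+(.+?)\.\s+" + re.escape(JOURNAL))


def title_before_journal(text):
    """Return the title between ')." and ". <JOURNAL>', or None if absent."""
    flat = " ".join(text.split())  # join lines so a wrapped title still matches
    match = PATTERN.search(flat)
    return match.group(1) if match else None
```

Because the search starts at the first ")." from which the rest can match, an earlier ")." on the page (for example in a header) could pull extra text into the result.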

Re: Title of scientific articles PDF Wed Jul 05, 2023 8:20 am • by Mr_Noodle
You can do a pattern like:
Code:
). <<custom attribute>>.


where <<custom attribute>> is a custom text attribute you create. You can set it to match "anything".

Re: Title of scientific articles PDF Thu Jul 06, 2023 12:50 am • by cristianepv96
Mr_Noodle wrote: You can do a pattern like:
Code:
). <<custom attribute>>.


where <<custom attribute>> is a custom text attribute you create. You can set it to match "anything".


Hey, I'm still trying to get this working and haven't managed it yet. I created a regex that could be useful: it matches text that starts with a capitalized word, continues with at least five lowercase words, and stops before the next word that starts with a capital letter:

[A-Z][a-z]+\s+(?:[a-z]+\s+(?![A-Z][a-z]+)){4,}[a-z]+

When I tested it, it extracted the text perfectly.

I'm not very good with AppleScript or any kind of code, so I asked ChatGPT to write a shell script or AppleScript that uses it, but it doesn't seem to work.

Code:
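on hazelProcessFile(theFile)
    tell application "System Events" to set fileName to name of theFile
    -- "name extension" (and "parent" below) are Finder/System Events terms,
    -- so outside a tell block these lines fail.
    set fileExtension to name extension of theFile
    -- This runs sed on the file NAME, not on the PDF text, so the title is never seen.
    -- macOS sed -E uses POSIX extended regexes, which reject (?:...) and (?!...),
    -- and even if it ran, s/(X)/\1/ would just replace the match with itself.
    set newFileName to do shell script "echo " & quoted form of fileName & " | sed -E 's/([A-Z][a-z]+\\s+(?:[a-z]+\\s+(?![A-Z][a-z]+)){4,}[a-z]+)/\\1/'"
    set newFilePath to POSIX path of (parent of theFile) & newFileName & "." & fileExtension
    tell application "Finder"
        -- Renames without re-adding the extension computed above.
        set name of theFile to newFileName
    end tell
    log newFilePath
end hazelProcessFile

-- Hazel calls hazelProcessFile for each matching file itself,
-- so this wrapper is not needed.
on hazelProcessFiles(theFiles)
    repeat with theFile in theFiles
        hazelProcessFile(theFile)
    end repeat
end hazelProcessFiles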


It occurred to me to replicate that regex directly in Hazel, but I can't find an option that recognizes only words starting with a capital letter. I don't think one exists.

Thanks a lot.
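For comparison, a minimal Python sketch of what that script seems to be aiming for: read the text of the PDF's first page (not its file name), apply the same regex with Python's `re` module, which does support `(?:...)` and `(?!...)`, and rename the file while keeping its extension. It assumes the third-party `pypdf` package is installed, and that Hazel runs it through a "Run shell script" action with a Python interpreter as the shell, passing the file path as the first argument; both are assumptions to check against your own setup. `find_title`, `safe_filename`, and `rename_pdf` are illustrative names:

```python
import os
import re
import sys

# The regex from the post above: a capitalized word, at least five lowercase
# words, stopping before the next capitalized word.
TITLE_RE = re.compile(r"[A-Z][a-z]+\s+(?:[a-z]+\s+(?![A-Z][a-z]+)){4,}[a-z]+")


def find_title(text):
    """Return the first run of text matching TITLE_RE, whitespace collapsed."""
    match = TITLE_RE.search(text)
    return " ".join(match.group(0).split()) if match else None


def safe_filename(title, max_len=200):
    """'/' cannot appear in a POSIX file name, and ':' shows up as '/' in Finder."""
    cleaned = re.sub(r"[/:]", " ", title)
    return " ".join(cleaned.split())[:max_len]


def rename_pdf(path):
    from pypdf import PdfReader  # third-party; assumed installed (pip install pypdf)

    first_page = PdfReader(path).pages[0].extract_text() or ""
    title = find_title(first_page)
    if title is None:
        return None  # leave the file alone when nothing matches
    folder, old_name = os.path.split(path)
    extension = os.path.splitext(old_name)[1]
    new_path = os.path.join(folder, safe_filename(title) + extension)
    if os.path.exists(new_path):
        return None  # do not overwrite an existing file
    os.rename(path, new_path)
    return new_path


if __name__ == "__main__":
    rename_pdf(sys.argv[1])
```

Note that this regex only finds sentence-case text. On the reference line quoted earlier in this thread it returns just "Gender differences in cyber victimization a year later", because the hyphen in "self- disclose" interrupts the run of lowercase words after "Willingness", and a title in Title Case would not match at all.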

Re: Title of scientific articles PDF Thu Jul 06, 2023 6:24 am • by IvanPsy
Mr_Noodle wrote: You can do a pattern like:
Code:
). <<custom attribute>>.


where <<custom attribute>> is a custom text attribute you create. You can set it to match "anything".


I modified the rule this way:
https://imgur.com/a/aqZLIzT

But when I run the rule, the file name is changed to
Code:
strategies
and the whole earlier part of the title is missing.

BTW: it doesn't let me save the changes.

Re: Title of scientific articles PDF Thu Jul 06, 2023 8:30 am • by Mr_Noodle
Odd, since the preview is showing the whole title matching (the yellow highlighted part). Can you check the logs?

Also, in your pattern, is the title of the journal always the same? If not, it might not be a good idea to hard code that name there.

Re: Title of scientific articles PDF Thu Jul 06, 2023 3:23 pm • by IvanPsy
Mr_Noodle wrote: Odd, since the preview is showing the whole title matching (the yellow highlighted part). Can you check the logs?


I quit and reopened Hazel, and now it lets me save the rule.
But another issue occurs. Please see the screenshot:
https://imgur.com/a/BQ4aiKJ

It doesn't match the content, even though the text is right.
I've checked both with and without the "..." before the brackets and the dot, with the same result: the content doesn't match.

Mr_Noodle wrote: Also, in your pattern, is the title of the journal always the same? If not, it might not be a good idea to hard code that name there.


Yes, it is; I've checked several PDFs from the same journal.
Why do you ask? Is there a smarter way?
I'm here to learn.

Re: Title of scientific articles PDF Fri Jul 07, 2023 9:18 am • by Mr_Noodle
If the journal name is hardcoded in there, then that means that the rule will only work with that journal. If you deal with multiple journals, then you end up having a separate rule for each one. If you have a limited set of journals, you can look into using a custom table attribute where you can list all the journals you know about.
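Outside Hazel, the same idea (a list of known journals instead of one hardcoded name) can be sketched in Python as a regex alternation built from that list. This is plain `re`, not Hazel's custom table attribute, and `KNOWN_JOURNALS` is a hypothetical starter list holding only the journal from this thread:

```python
import re

# Hypothetical starter list; add every journal you handle.
KNOWN_JOURNALS = [
    "Cyberpsychology: Journal of Psychosocial Research on Cyberspace",
]


def build_pattern(journals):
    """Compile '(YYYY). <title>. <one of the journals>' into a single regex."""
    if not journals:
        raise ValueError("need at least one journal name")
    # Longest names first, so "Journal of X Research" is tried before "Journal of X".
    names = sorted(journals, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(
        r"\(\d{4}\)\.\s+(?P<title>.+?)\.\s+(?P<journal>" + alternation + r")"
    )


def title_and_journal(text, journals=KNOWN_JOURNALS):
    """Return (title, journal) for the first citation naming a known journal, else None."""
    match = build_pattern(journals).search(" ".join(text.split()))
    if match is None:
        return None
    return match.group("title"), match.group("journal")
```

Adding a journal then means adding one line to the list rather than writing a new rule.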

As for why it isn't working, it's not clear from the screenshot. What does the end of your pattern look like (it's cut off in the screenshot)? What if you replace the whole journal name with (...)?
