Category: pdf

A process to find and extract data-points from graphs in pdf files

Ever since I discovered that it’s sometimes possible to extract the x/y values of the points/circles/diamonds appearing in a graph within a pdf, I have been trying to automate the process. Within a pdf there are two ways of encoding an image, such as the one below. The information can be specified using a graphics […]

July 27, 2025
Working with PDF Highlight Annotations Programmatically

PDFs are the format of choice in academia, but extracting the information they contain is annoyingly hard. I’ve just started working on my degree’s final project. An academic project requires lots of research, which means reading lots of papers. Papers a…

January 14, 2018
