.pdf to .csv converter or .pdf to database converter for EDAX EDS files
These Python scripts convert EDAX EDS reports from .pdf to .csv format, so the data can be uploaded directly to a database such as PostgreSQL or MariaDB.
Purposes
Struggling with energy dispersive X-ray analysis (EDXA or EDAX) results that the scanning electron microscope (SEM) outputs as PDF files? These scripts convert the PDF outputs to CSV format or upload the data directly to a database.
Energy dispersive X-ray analysis (EDXA or EDAX), also known as energy-dispersive X-ray spectroscopy (EDS, EDX, EDXS or XEDS) or energy dispersive X-ray microanalysis (EDXMA), is a common analytical technique used for the elemental analysis or chemical characterization of a sample. However, some SEM-EDS setups only output data as uneditable .pdf files.
To avoid hours of transcribing data by hand, these Python scripts have been written to do all the work for you!
.csv files are easily editable and compatible with most text editors and graphing software, including Excel, Origin, and Veusz.
Functionality
The code is written to be compatible with files produced by a Tescan Vega XMU scanning electron microscope (SEM) coupled to a 40 mm² EDAX Apollo™ energy dispersive X-ray (EDS) detector running EDAX Genesis™ software. The code is easy to edit and can be adapted to the outputs of other SEM-EDS instruments and software.
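Adapting the code to another instrument mostly comes down to changing how the report text is parsed. The sketch below is a hypothetical example, not the repository's parser: it assumes each element row of the extracted text reads like "SiK   23.45   18.20" (element symbol, X-ray line, Wt%, At%). ROW_PATTERN, parse_element_table, and pdf_text are illustrative names; pdf_text uses PyMuPDF's page.get_text() and is only needed for real PDFs.

```python
import re

# Hypothetical row layout: element symbol, X-ray line (K, L or M), Wt%, At%,
# e.g. "SiK   23.45   18.20" or "C K   12.34   20.11".
# Adjust this pattern to match the text your instrument's PDFs actually contain.
ROW_PATTERN = re.compile(
    r"^\s*(?P<element>[A-Z][a-z]?)\s?(?P<line>[KLM])\s+"
    r"(?P<wt>\d+(?:\.\d+)?)\s+(?P<at>\d+(?:\.\d+)?)"
)


def parse_element_table(text):
    """Return one dict per element row found in the extracted report text."""
    rows = []
    for raw_line in text.splitlines():
        match = ROW_PATTERN.match(raw_line)
        if match is None:
            continue  # skip headers, totals, and any other non-table text
        rows.append({
            "element": match["element"],
            "line": match["line"],
            "wt_percent": float(match["wt"]),
            "at_percent": float(match["at"]),
        })
    return rows


def pdf_text(path):
    """Extract the text of every page of a PDF with PyMuPDF (imported as fitz)."""
    import fitz  # PyMuPDF; imported here so the parser itself has no dependency

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

Because table layouts can differ between software versions, it is worth printing pdf_text(path) for one report and checking it against the pattern before converting a whole batch.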
Future Work:
- Import metadata from a generalized file-naming structure
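One possible shape for this planned feature, sketched under a hypothetical naming convention "<tube_name>_<sample_name>_<YYYYMMDD>.pdf" whose field names mirror columns used in the database query: a regular expression with named groups turns a file name into a metadata dict, and generalizing the naming structure would amount to supplying a different pattern. metadata_from_filename and DEFAULT_PATTERN are illustrative and not part of the current code.

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical convention: "<tube_name>_<sample_name>_<YYYYMMDD>.pdf".
# Each named group becomes a metadata field; pass another pattern to change it.
DEFAULT_PATTERN = r"(?P<tube_name>[^_]+)_(?P<sample_name>[^_]+)_(?P<sem_date>\d{8})"


def metadata_from_filename(path, pattern=DEFAULT_PATTERN):
    """Split a PDF file name (without extension) into metadata fields."""
    match = re.fullmatch(pattern, Path(path).stem)
    if match is None:
        raise ValueError(f"file name does not follow the expected pattern: {path}")
    fields = match.groupdict()
    if fields.get("sem_date"):
        # Convert the YYYYMMDD stamp into a datetime.date
        fields["sem_date"] = datetime.strptime(fields["sem_date"], "%Y%m%d").date()
    return fields
```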
Usage
This code converts EDAX data saved as PDF to CSV format (or uploads it to a database) with a few lines of code.
To use it, specify the directory that contains the .pdf files to convert; the directory may hold a single file or many.
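A minimal sketch of the batch step, assuming a hypothetical extract_rows(path) callable that returns the element rows of one PDF as dicts with "element" and "wt_percent" keys (for instance, PyMuPDF text extraction followed by a layout-specific parser). convert_directory is an illustrative name: it collects every .pdf in the directory and writes a single CSV with one row per element, tagged with the source file name.

```python
import csv
from pathlib import Path

FIELDNAMES = ["file", "element", "wt_percent"]


def convert_directory(pdf_dir, out_csv, extract_rows):
    """Write one CSV row per element for every .pdf file in pdf_dir.

    extract_rows is a hypothetical callable: PDF path -> list of dicts with
    "element" and "wt_percent" keys. Returns the number of PDFs processed.
    """
    # Sorted for a stable row order; on case-sensitive file systems this
    # pattern does not match files ending in ".PDF".
    pdf_paths = sorted(Path(pdf_dir).glob("*.pdf"))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for pdf_path in pdf_paths:
            for row in extract_rows(pdf_path):
                writer.writerow({
                    "file": pdf_path.name,
                    "element": row["element"],
                    "wt_percent": row["wt_percent"],
                })
    return len(pdf_paths)
```

A call such as convert_directory("reports", "edax_results.csv", extract_rows) (hypothetical paths) produces one CSV that opens directly in Excel, Origin, or Veusz, or can be bulk-loaded into a database table.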
Setting up an Environment
Pip
pip install -r requirements.txt
Conda
For a new environment
conda env update -n my_sem_edax_env --file environment.yml
In an existing environment (e.g., one set up with PyCharm)
conda env update --file environment.yml
Updating Dependencies
To publish updated environment configurations, export both a conda environment YAML file and a pip requirements file:
conda env export --no-builds > environment.yml
pipreqs --mode compat --use-local --force .
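Querying the Database
To fetch data from PostgreSQL, use the following query, which joins each sem_chemistry result to its sample, tube, and instrument metadata: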
 SELECT sem_chemistry.sample_id,
    sem_chemistry.sem_chem_id AS chem_id,
    sem_chemistry.sem_im_id AS im_id,
    sem_chemistry.element AS sem_element,
    sem_chemistry.wt_percent AS wt_pct,
    sample.sample_name,
    sample.tube_id,
    tube.tube_name,
    tube.depth_top AS tube_depth_top,
    tube.depth_bottom,
    sem_instrument_metadata.method AS sem_method,
    sem_instrument_metadata.quantification_method AS quant_method,
    sem_instrument_metadata.quantification_standard AS quant_standard,
    sem_instrument_metadata.sem_user,
    sem_instrument_metadata.date AS sem_date
   FROM sem_chemistry
     LEFT JOIN sem_instrument_metadata ON sem_chemistry.sem_im_id = sem_instrument_metadata.sem_im_id
     LEFT JOIN sample ON sem_chemistry.sample_id = sample.sample_id
     LEFT JOIN tube ON sample.tube_id = tube.tube_id;
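The rows this query returns are written by the upload step. As a hedged sketch, insert_chemistry below inserts parsed element rows into sem_chemistry through any DB-API connection; with psycopg2 the placeholder is %s. So that the example runs without a PostgreSQL server, demo() swaps in Python's built-in sqlite3 with simplified, hypothetical versions of the four tables (only the columns the query reads), made-up example values, and a trimmed form of the same joins. None of these names come from the repository.

```python
import sqlite3

# Simplified, hypothetical tables holding only the columns the query reads.
SCHEMA = """
CREATE TABLE tube (tube_id INTEGER PRIMARY KEY, tube_name TEXT,
                   depth_top REAL, depth_bottom REAL);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_name TEXT, tube_id INTEGER);
CREATE TABLE sem_instrument_metadata (sem_im_id INTEGER PRIMARY KEY, method TEXT,
    quantification_method TEXT, quantification_standard TEXT,
    sem_user TEXT, date TEXT);
CREATE TABLE sem_chemistry (sem_chem_id INTEGER PRIMARY KEY, sample_id INTEGER,
    sem_im_id INTEGER, element TEXT, wt_percent REAL);
"""

# Trimmed version of the PostgreSQL query: same joins, fewer columns.
QUERY = """
SELECT sem_chemistry.sample_id, sem_chemistry.element AS sem_element,
       sem_chemistry.wt_percent AS wt_pct, sample.sample_name,
       tube.tube_name, sem_instrument_metadata.method AS sem_method
  FROM sem_chemistry
  LEFT JOIN sem_instrument_metadata
         ON sem_chemistry.sem_im_id = sem_instrument_metadata.sem_im_id
  LEFT JOIN sample ON sem_chemistry.sample_id = sample.sample_id
  LEFT JOIN tube ON sample.tube_id = tube.tube_id
 ORDER BY sem_chemistry.sem_chem_id
"""


def insert_chemistry(conn, sample_id, sem_im_id, rows, placeholder="%s"):
    """Insert parsed element rows into sem_chemistry via a DB-API connection.

    placeholder is "%s" for psycopg2 and "?" for sqlite3.
    """
    marks = ", ".join([placeholder] * 4)
    sql = ("INSERT INTO sem_chemistry (sample_id, sem_im_id, element, wt_percent) "
           f"VALUES ({marks})")
    params = [(sample_id, sem_im_id, row["element"], row["wt_percent"]) for row in rows]
    cur = conn.cursor()
    cur.executemany(sql, params)
    conn.commit()
    return len(params)


def demo():
    """Load made-up example data into in-memory SQLite and run the joins."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO tube VALUES (1, 'T1', 0.0, 0.5)")
    conn.execute("INSERT INTO sample VALUES (10, 'S1', 1)")
    conn.execute("INSERT INTO sem_instrument_metadata "
                 "VALUES (100, 'EDS', 'quant-method', 'quant-standard', 'user', '2023-01-01')")
    rows = [{"element": "O", "wt_percent": 48.2}, {"element": "Si", "wt_percent": 23.45}]
    insert_chemistry(conn, 10, 100, rows, placeholder="?")
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Against the real database, the same insert_chemistry call would take a connection from psycopg2.connect(...) and keep the default "%s" placeholder.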
Dependencies
Python Version
Python 3.11
Python Libraries
- PyMuPDF ==1.19.5
- pillow ==8.4.0
- python-dateutil ==2.8.2
- pandas ==1.4.1
- hyperspy ==1.6.5
- hyperspy-base ==1.6.5
- psycopg2 ==2.8.6
- tqdm ==4.62.3
- python-dotenv ==0.20.0
- tabulate ==0.8.9
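To confirm that an environment matches these pins, installed versions can be compared with importlib.metadata.version from the standard library. This is a sketch, not part of the repository: PINS uses PyPI distribution names, leaves out hyperspy-base, and compares version strings exactly.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from the Python Libraries list, keyed by PyPI distribution name
# (hyperspy-base is left out of this sketch).
PINS = {
    "PyMuPDF": "1.19.5",
    "pillow": "8.4.0",
    "python-dateutil": "2.8.2",
    "pandas": "1.4.1",
    "hyperspy": "1.6.5",
    "psycopg2": "2.8.6",
    "tqdm": "4.62.3",
    "python-dotenv": "0.20.0",
    "tabulate": "0.8.9",
}


def check_pins(pins=PINS, lookup=version):
    """Return {name: (pinned, installed)} for every pin that does not match.

    installed is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_pins().items():
        print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```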
Support
If you experience issues with the code, support can be sought by emailing hanna.brooks@maine.edu.
Authors and acknowledgment
Written by Hanna L Brooks and Camden G Bock. Last update: 2023.
License
This code is licensed under the MIT License. See the LICENSE file for more information.