Leveraging Python for Multi-Drug Cocrystal Screening

  • Sadaf Taheri

Student thesis: Doctoral Thesis


Cocrystals offer the ability to modify and optimise the physicochemical properties of a parent active pharmaceutical ingredient (API) through the inclusion of a second component. Using another API as the second component therefore offers multiple benefits, including a preformulation solution to poor solubility as well as improvements in adherence through the administration of multiple APIs. Although several screening methods have been proposed, data analysis remains a tedious, manual process that becomes considerably more time-consuming as data volumes increase. This work aimed to improve data analysis efficiency for high-throughput screening of multi-API cocrystals by investigating the use of Python and its numerical packages to automate parts of the workflow. Specifically, it addressed the automation and optimisation of the analysis of differential scanning calorimetry (DSC) and powder X-ray diffraction (PXRD) signals to screen for binary mixtures capable of cocrystal formation.

The databases of DSC and PXRD signals were generated using 90 binary mixtures of 18 therapeutically relevant APIs commonly used in the treatment of cardiovascular diseases. APIs used in this work include, but are not limited to, aspirin, felodipine and valsartan. Tests were carried out in triplicate to generate large pools of data. Python was used to rapidly analyse every sample in the database of DSC and PXRD signals to isolate potential hits. The key Python packages used for this work were re, NumPy, Pandas, SciPy, Plotly and Dash. API combinations were identified as potential hits by detecting peaks in their corresponding signals and assessing the total number and location of peaks. The performance of computational algorithms for baseline correction, based on asymmetric least squares smoothing (AsLS), was also investigated. Finally, a bespoke multi-page dashboard was developed to make analysis of new data accessible to users with no prior programming knowledge.
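
The AsLS baseline correction mentioned above can be illustrated with a minimal sketch of the general algorithm. It is not taken from the thesis code: the function name asls_baseline and the default values of lam (smoothness), p (asymmetry) and n_iter are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a baseline by asymmetric least squares smoothing.

    lam, p and n_iter are illustrative defaults (assumptions), not
    values reported in the thesis.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0)
        # Weighted, smoothness-penalised least squares fit.
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the fit (likely peaks) get the small weight p,
        # points on or below it get the large weight 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic signal: sloping baseline plus one Gaussian peak.
x = np.arange(500, dtype=float)
true_baseline = 1.0 + 0.002 * x
signal = true_baseline + 5.0 * np.exp(-0.5 * ((x - 250.0) / 5.0) ** 2)
corrected = signal - asls_baseline(signal)
```

In this sketch a larger lam gives a stiffer baseline and a smaller p makes the fit ignore upward peaks more strongly. For a signal whose peaks point downwards, the sign of the signal would need to be flipped first.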

Python and regular expressions were found to be useful tools for converting different data file types to a uniform format for screening. Python was successfully used to detect peaks in DSC and PXRD signals. Optimisation of the peak-finding parameters, especially prominence, was found to be essential. The ideal parameter values could be determined once and then applied to all of the samples in the database. The time taken to perform tasks was reduced from days to seconds. A total of 60 combinations were consistently found to be potentially capable of cocrystal formation. Python was also used to isolate a total of 9 amorphous or partially crystalline materials. The pharmaceutical particle analysis dashboard (PPAD) can be used on any machine with access to the internet and a web browser. PPAD was successfully used to perform the presented screening method to identify potential hits. Further functionalities, including baseline correction and peak fitting, were also added. The code was refactored and simplified multiple times, with the end user in mind, in case custom pages and features need to be implemented. The code repository for PPAD is available as open source on GitHub under the MIT license.
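
The role of the prominence parameter in peak detection can be illustrated with a short sketch using scipy.signal.find_peaks. The helper name detect_peaks, the synthetic diffraction-like signal and the threshold values are assumptions made for this example; they are not the settings or data used in the thesis.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_peaks(x, y, prominence):
    """Return peak positions and prominences (hypothetical helper)."""
    idx, props = find_peaks(y, prominence=prominence)
    return x[idx], props["prominences"]


def gaussian(x, centre, height, width):
    return height * np.exp(-0.5 * ((x - centre) / width) ** 2)


# Synthetic pattern: two sharp peaks plus random noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(5.0, 40.0, 3501)
y = (gaussian(x, 12.0, 100.0, 0.1)
     + gaussian(x, 25.0, 60.0, 0.1)
     + rng.normal(0.0, 1.0, x.size))

# A prominence threshold well above the noise keeps only the real peaks.
positions, prominences = detect_peaks(x, y, prominence=20.0)

# A threshold near the noise level also reports spurious noise peaks.
noisy_positions, _ = detect_peaks(x, y, prominence=0.5)
```

Once a threshold that separates real peaks from noise has been found for one type of signal, the same value can be passed to every sample in a loop, which matches the "determine once, apply to all" approach described above.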

The work presented here showed that Python can be leveraged to effectively and efficiently analyse DSC and PXRD signal data to isolate potential hits when screening for cocrystals. Python's object-oriented design features were essential for structuring the code so that it was easy to maintain and extend. PPAD allows users with no prior knowledge of Python to rapidly visualise and analyse DSC and PXRD signals. Pharmaceutical scientists would benefit from employing Python for increased productivity when analysing large datasets.

Date of Award: 26 Oct 2022
Original language: English
Supervisor: Kofi Asare-Addo (Main Supervisor) & Lisa Gillie (Co-Supervisor)
