DIRAC Interware is a platform for integrating different computing resources into a unified system. At JINR it is used to run jobs related to Monte Carlo generation and data reconstruction. So far, almost a million jobs have been executed successfully, consuming 500 years of wall time in total. With the monitoring data available, it is possible to predict the duration of jobs running on the integrated infrastructure, which would allow the load to be planned more precisely. This project involves learning the terminology of High-Throughput Computing and making practical use of the collected data. Please include a link to your GitHub (GitLab, Bitbucket) account if you already have completed projects in Python.
Tasks
1. Analyze the workloads that have already been executed.
2. Design a model of the system.
3. Apply existing parameters to the model.
4. Predict the behaviour of future workloads.
5. Compare the predictions with real data.
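The tasks above can be illustrated with a minimal sketch. It is only a baseline under stated assumptions, not the model the project is expected to produce: the record layout (site, job type, wall time in hours), the sample values, and the function names are all hypothetical and do not come from DIRAC or its monitoring data. The "model" is simply the median wall time per (site, job type) group, and the comparison with real data uses the mean absolute error.

```python
from collections import defaultdict
from statistics import median

# Hypothetical finished jobs: (site, job_type, wall_time_hours).
# Real monitoring records would have a different, richer layout.
history = [
    ("site_a", "simulation", 4.0),
    ("site_a", "simulation", 5.0),
    ("site_a", "simulation", 6.0),
    ("site_b", "reconstruction", 2.0),
    ("site_b", "reconstruction", 3.0),
]


def fit_median_model(records):
    """Return per-group median wall times and a global fallback median."""
    groups = defaultdict(list)
    for site, job_type, wall_time in records:
        groups[(site, job_type)].append(wall_time)
    per_group = {key: median(values) for key, values in groups.items()}
    fallback = median(wall_time for _, _, wall_time in records)
    return per_group, fallback


def predict(model, site, job_type):
    """Predict wall time for a job; unseen groups get the global median."""
    per_group, fallback = model
    return per_group.get((site, job_type), fallback)


def mean_absolute_error(model, records):
    """Average absolute gap between predicted and observed wall times."""
    errors = [
        abs(predict(model, site, job_type) - wall_time)
        for site, job_type, wall_time in records
    ]
    return sum(errors) / len(errors)


# Hypothetical "future" jobs whose real wall times became known later.
new_jobs = [
    ("site_a", "simulation", 5.5),
    ("site_b", "reconstruction", 2.0),
    ("site_c", "simulation", 4.0),  # group never seen in the history
]

model = fit_median_model(history)
for site, job_type, observed in new_jobs:
    predicted = predict(model, site, job_type)
    print(f"{site} {job_type}: predicted {predicted:.2f} h, observed {observed:.2f} h")
print(f"mean absolute error: {mean_absolute_error(model, new_jobs):.3f} h")
```

The median is used here because job durations tend to have long tails, which makes it a more robust starting point than the mean. A real model would likely take more parameters into account.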
Preliminary schedule by topics/tasks
The schedule will be established after the project begins.
Required skills
Strong Python programming skills and a good knowledge of OOP are required. Knowledge of design patterns will be an advantage.
Acquired skills and experience
Practical experience in product development with Python. Data analysis experience.
Recommended literature
"Design Patterns: Elements of Reusable Object-Oriented Software" by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (the "Gang of Four")