
This article outlines how Pew Research Center adopted Kubernetes to support its computational social science team. It highlights benefits such as improved resource flexibility, as well as challenges such as the steep learning curve for researchers. Integrating Kubernetes into the Center’s infrastructure has improved team efficiency, but Kubernetes may not suit every organization because of its complexity and maintenance requirements.
Main Points
Adoption of Kubernetes at Pew Research Center
Pew Research Center’s adoption of Kubernetes facilitated a flexible, scalable, and efficient data science infrastructure, improving resource availability and team collaboration.
Challenges with Kubernetes
Challenges with Kubernetes include its complexity, operational overhead, and the need for additional technical skills among researchers.
Careful consideration for Kubernetes adoption
Despite its benefits, careful consideration is advised before adopting Kubernetes, given its demands on skills, resources, and maintenance.
Insights
Kubernetes solves infrastructure scalability and management issues.
Kubernetes organizes and controls a cloud-based computing cluster. It provisions resources on demand without manual configuration or management, keeps applications healthy, and autoscales capacity to match the workload.
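To make the autoscaling idea concrete, here is a minimal Python sketch of the scaling rule described in the Kubernetes documentation for the Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. This is an illustration only, not Pew Research Center's configuration. The function name, the replica bounds, and the example numbers are assumptions made for the sketch, and the 0.1 tolerance is the documented default rather than a value taken from the article.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Illustrative version of the Horizontal Pod Autoscaler scaling rule.

    All parameter names and default bounds here are hypothetical; they are
    chosen for the example and do not come from the article.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the observed metric is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Four replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

In this sketch, load above the target adds replicas and load below it removes them, which is the sense in which resources are provisioned on demand rather than configured by hand.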
Kubernetes presents deployment and management challenges.
Incorporating Kubernetes into a research team is challenging due to its complexity, the overhead it adds, and the requirement for additional engineering resources to maintain the cluster.
Kubernetes impacts the workflow and skill requirements for researchers.
Kubernetes’ complexity and engineering focus present a steep learning curve for researchers unfamiliar with deeper levels of software development, potentially creating friction in their everyday work.
Links
- How we review code at Pew Research Center
- How Pew Research Center uses git and GitHub for version control
- How we built our data science infrastructure at Pew Research Center
- Data Labs unit
- standard “data science stack”
- Kubernetes
- recommended ways to run JupyterHub