Olive42.GitHub.io


Olivier Tharan Site Reliability Engineer / DevOps olivier@tharan.org +33 6 25 13 31 08

I enjoy working on infrastructure and making sure it is always on, and designing computer systems that allow automation, on the principle of “Don’t Repeat Yourself”, in order to make people more productive. Over my career, I have evolved from a pure operations perspective, with oncall duties on the firefighting side, to a more development-oriented one. I would like to share this knowledge with other teams and grow with them.

Work experience

Criteo (2016 - current)

Senior DevOps engineer for the Devtools team, based in Paris, France. My main responsibility is the Continuous Integration infrastructure: Gerrit for code reviews, and Jenkins with 2500+ jobs spanning Linux, Windows and Mac builders, continuously building and testing all Criteo code from source and preparing releases.

Google (2004 - 2016)

Site Reliability Engineer for the Corporate Engineering team, based in Dublin, Ireland, then Paris, France. My main job for 12 years was to ensure that internal services for the company kept running, and that the 60,000+ Google employees could work productively.

Beyond Corp “redefine the concept of enterprise network perimeter”

Redefine the concept of enterprise architecture beyond the traditional network perimeter by providing finer-grained access controls for all internal services and no longer relying on internal IP addresses as indicators of trust. A presentation on this project was given at LISA’13.

I initiated the project, which was mostly program management and directing the work of dozens of different teams across the IT and Security departments.

I designed a client-server project to send logs from all corporate machines to our centralized logging systems; I wrote the server in C++.

Moving from a permit-all network to permissions based on machine certificates and LDAP group membership requires self-service tools for Google employees. I am currently designing and implementing a tool to troubleshoot network access issues for corporate users, leading a small team around it and building the Go-based server software.

Ganeti “put all corporate servers on virtual machines”

Administer our fleet of internal virtual machines running the Ganeti software (open source, built in-house). These range from 2-machine clusters to several racks' worth of virtual machines located in our offices and datacenters. My job was to respond to alerts, perform upgrades and build automation tools to manage large clusters of virtual machines.

Other projects

Be oncall for critical Google services with a tight SLA (response in under 5 minutes).

Change management: we have a corporate change management process, which consists of reviewing upcoming changes and making sure they are planned with appropriate thought given to user communication, rollback plans, escalation paths, etc. I chaired the weekly meetings and brought many improvements to the process itself.

Manage the fleet of NetApp fileservers across the company. Google has one of the largest distributed fleets of filers for its corporate needs.

Manage a fleet of proxy servers (Squid and in-house).

Major skills

Write and deploy application servers (HTTP + RPC) written in C++, Go and Python on the standard Google production environment (Borg), with all the relevant SRE practices. Gained internal readability in C++ and Python, and working toward it in Go. Project management skills include tracking projects, making sure various services are up to SRE standards by performing a Production Readiness Review (and updating the review process itself), helping development teams reach these standards, leading 2-3 person teams on a project, and communicating with teams all across the company.

Systems design: I wrote several designs that became full-fledged projects. The major one is “Beyond Corp”, described above. I also reviewed dozens of projects and provided relevant feedback.

8-year proven track record of working remotely from the rest of my respective project teams: from Paris, France with teams in Switzerland, Ireland, US East and West Coast.

A taste for security-oriented architecture and large-scale networking.

Institut Pasteur (2002 - 2004)

Systems and network administrator for a 2000-person biology research campus in the center of Paris. My main tasks were maintaining servers, providing user support and ensuring network connectivity for the campus. Significant accomplishments: redid the email infrastructure using Postfix and tied it to the internal LDAP servers for user lookup; set up replication between 3 LDAP servers; redid the caching proxy infrastructure; upgraded some relatively high-traffic (for the time) web servers.

IDEALX (now OpenTrust) (2000 - 2002)

IDEALX was one of the first entirely open-source-oriented companies in France, with a dedication to releasing most of its client projects as open source. I worked with clients on various projects; the main ones consisted of performing systems administration for Tiscali (massive web hosting provider, part of the ISP), and doing Perl-SOAP development for Cegetel (telecommunications provider and ISP).

Skills

Very good knowledge of Unix-like systems administration: Linux, FreeBSD; shell scripting, Python, Go; fair, but limited knowledge of C++, Java, Groovy.

System automation, change management, use of version control systems: Perforce, and build management tools: Chef, Jenkins, some internal to Google. Make heavy use of automation and unit testing to ensure code and data will perform as intended.

Education

Master's degree as engineer-manager (majored in networking and distributed systems) from Telecom Sud Paris, one of the leading French engineering schools.