
Site Reliability Engineer

We are a marketing platform and analytics solution that helps Fashion, Luxury and Cosmetics professionals discover, activate and measure the voices that matter for their brands.

Founded in NYC with operating headquarters in Paris and offices in London, Milan, Los Angeles, Tokyo, Madrid, Girona and Craiova, we work with over 1,000 brands in more than 100 countries, as well as partners like IMG, the Council of Fashion Designers of America, the British Fashion Council, Pitti Uomo, and Google, to accelerate their business and build lasting exposure. Our industry communities, GPS Radar and Style Coalition, bring together over 50,000 influencers, editors, buyers and other professionals to share content, events, news and images.

ROLE:

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that our services (both internally critical and externally visible systems) have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance.

SRE is also a mindset and a set of engineering approaches to running better infrastructure systems; we build our own solutions to operations problems. Much of our software development focuses on optimising existing systems, building infrastructure and eliminating manual work through automation. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems and proactive identification of potential outages feed the iterative improvement that is key both to product quality and to interesting, dynamic day-to-day work.

SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Launchmetrics brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

From developing and maintaining our data centers to building the next generation of our platforms, we make Launchmetrics’ product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We're always on call to keep our networks up and running, ensuring our users have the best and fastest experience possible.

RESPONSIBILITIES:

  • Administer our infrastructure built on Amazon Web Services.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.
  • Automate key deployment, monitoring, testing, and verification processes.
  • Work collaboratively with developers to support new features, services and releases.
  • Continuously monitor and improve the quality of our platforms.
  • Take part in the on-call rotation.

WHAT WE'RE LOOKING FOR: 

  • A minimum of 5 years (ideally 5-7) of demonstrable systems administration experience in large-scale environments.
  • Linux (fundamental)
  • Strong experience in AWS
    • Networking: VPC, subnets, security groups
    • Compute services: EC2, ECS, Lambda
    • Developer tools: CodePipeline, CodeBuild, CodeDeploy
    • Databases: RDS (MySQL, SQL Server), ElastiCache, Elasticsearch
  • Strong experience with configuration management systems (preferably Ansible)
  • Experience using Terraform to manage AWS infrastructure
  • Experience in one or more of the following:
    • Perl
    • JavaScript
    • Java
    • (other backend/frontend languages are welcome)

Personal Attributes:

  • Team player. Above all, we believe in the power of teamwork over individual success.
  • Highly motivated self-starter.
  • Quick learner, able to grasp new concepts and tools rapidly.
  • Great communication skills, including keeping others informed about work in progress.

Languages:

  • Spanish
  • English
  • Catalan

WHAT CAN WE OFFER?

  • A fantastic opportunity to be part of a fast-paced and rapidly growing team that has revolutionised the industry.
  • A chance to work with a passionate, driven and fun global team.
  • Hands-on work after a quick ramp-up, and openness to new solutions and improvements.
  • Want to know more? Check out: launchmetrics.com

