Job Details

Site Reliability Engineer II

LEWISVILLE-75027, TX, US
11/11/2019


Required Skills

    Shell scripting
Company

Infinity Consulting Solutions, Inc

Experience

2 to 4 Year(s)

Job Description

SITE RELIABILITY ENGINEER III

Plano, TX

6+ month contract to hire

The Position: GTI-Application Software Engineering (ASE) is focused on developing and delivering services that integrate software solutions with infrastructure in innovative, cost-effective, and efficient ways.

The ASE team provides services for use by business-aligned application delivery organizations, across all layers of the software stack, which may include application interface services, productivity and collaboration tools, and data integration solutions.

We are looking for a Site Reliability Engineer (SRE) who runs, maintains, and improves the service/product against established Service Level Objectives by applying software engineering practices.

The SRE is responsible for the availability, performance, change management, monitoring, and capacity management of their services.

What You'll Do:

Designs, develops, tests, and delivers software to automate manual operational work

Troubleshoots priority incidents, conducts blameless post-mortems, and ensures permanent closure of incidents

Engages with the development team throughout the life cycle to help build software for reliability

Applies analytics to historical data, such as incidents and usage patterns, to predict issues and take proactive action

Drives adoption of self-healing and resiliency patterns such as circuit breaker and bulkhead

Designs and conducts performance tests; identifies bottlenecks, optimization opportunities, and capacity demand

Defines and drives adoption of best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting

Deploys the software and product upgrades

Adds value to team delivery, works with the team to complete tasks to a high standard, and actively learns new skills

Facilitates maximum speed of delivery by objectively adhering to the service's error budgets

Manages the effort split between manual operational work and engineering work

Takes part in 24x7 support coverage as needed

Coaches other team members and manages teams as needed

Skills/Experience You'll Need:

Bachelor's degree (or equivalent experience) in Computer Science/Engineering

2+ years of experience developing enterprise software and proficiency in multiple technologies, preferably Java, Python, and shell scripting

2+ years of experience in performance engineering and monitoring using tools such as AppDynamics, Splunk, Apica, JMeter, and BlazeMeter

1+ years of incident resolution experience in a large-scale operations environment

Experience with configuration management tools such as Ansible, Puppet, Chef, or PowerShell

Proven ability to understand and troubleshoot complex problems under pressure

Experience working in an Agile development environment

Experience/knowledge administering application servers, web servers, and databases (Tomcat, WebSphere, Nginx, Microsoft IIS, Oracle, MySQL, etc.)

Experience with private and public cloud environments is a plus




Others
Information Technology

No Preference
Full-Time Job
Other
1

Candidate Requirements
-
Bachelors

Walkin Information
-
10/23/2019
-

Recruiter Details
Doug Klares
1350 Broadway, Suite 2205, NEW YORK-10018, NY
-