One of our direct clients is urgently looking for an SRE DevOps Engineer in Sunnyvale, CA
TITLE: SRE DevOps Engineer
LOCATION: Sunnyvale, CA
Duration: 6 to 12+ Months
Rate: DOE
Key Skills: Splunk, Grafana, SRE, Cloud, DevOps, Azure, Docker, Kubernetes, Java (Basic), Python (Scripting)
Description:
You’ll sweep us off our feet if…
- Able to dig into issues on our eCommerce site and identify root cause, with experience supporting and triaging production incidents
- Subject Matter Expert in creating dashboards, alerting, and monitoring
- Experience with Application development and root cause analysis
- Develops innovation strategies, processes, and automation; has failover experience
- Drives the execution of multiple business plans and projects
- Experience in driving high availability across multiple organizations
- Experience in putting together architecture diagrams
- Experience in managing workloads in private and public data centers
- Infrastructure experience that involves setup, scaling, and decommissioning
- Prior cloud experience, planning and driving efficiencies
- Automation and CI/CD experience
- Application container experience using Kubernetes
- Experience with event streaming platforms like Kafka is a plus
- Experience with analytics & monitoring platforms like Grafana/Graphite/MMS/Splunk is a plus
You’ll make an impact by:
- Supporting Java full-stack backend application system components in a massively scalable, high-performance, multi-tenant, international eCommerce platform with multiple microservices deployed in a cloud environment, root-causing every reactive/proactive production issue.
- Leads and participates in medium- to large-scale, complex, cross-functional projects
- Partners with architects and development leads to come up with high-level designs to accelerate omni customer experience, recommending out-of-box engineering best practices.
- Proactively identifies areas to drive automation/speed/innovation
- Troubleshoots business and production issues by gathering information (for example, issue, impact criticality, possible root cause); performing root cause analysis to reduce future issues; engaging support teams to assist in the resolution of issues; developing solutions; driving the development of an action plan; performing actions as designated in the plan; interpreting the results to determine further action; and completing online documentation.
- Provides support to the business by responding to user questions, concerns, and issues (for example, technical feasibility, implementation strategies); researching and identifying needed solutions; determining implementation designs; providing guidance regarding implications of new and enhanced systems; identifying short- and long-term solutions; and directing users to appropriate contacts for issues outside of associate's domain.
- Assists in providing guidance to small groups of 5 to 6 engineers, including offshore associates, for assigned Engineering projects by providing pertinent documents, directions, examples, and timelines.
- Demonstrates up-to-date expertise and applies this to the development, execution, and improvement of action plans by providing expert advice and guidance to others in the application of information and best practices; supporting and aligning efforts to meet customer and business needs; and building commitment for perspectives and rationales.
- Models compliance with company policies and procedures and supports company mission, values, and standards of ethics and integrity by incorporating these into the development and implementation/support of business plans; using the Open Door Policy; and demonstrating and assisting others with how to apply these in executing business processes and practices.
- Provides and supports the implementation of business solutions by building relationships and partnerships with key stakeholders; identifying business needs; determining and carrying out necessary processes and practices; monitoring progress and results; recognizing and capitalizing on improvement opportunities; and adapting to competing demands, organizational changes, and new responsibilities.
Minimum Qualifications:
- Hands-on experience debugging HTTP 5xx and 4xx errors
- Java/Spring and Node/Python experience is required
- Creating database objects (tables, views, indexes)
- CI/CD automation and implementation experience
- Kubernetes and Docker experience is a plus
- Implementing database structures such as tables and indexes
- Reviewing and tuning SQL scripts
- Reviewing database structure changes provided by application developers and data modelers
- Working with application developers to tune the performance of the database
Ideal Candidate Must-Haves:
- Experience creating best-in-class application availability metrics and dashboards
- Managing infrastructure scale, setup and decommissioning
- Public cloud (Azure, GCP) and private data center experience
- Driving P1 production incident calls, communicating to the point, summarizing action plans for each owner, and following up until closure
- Ability to make the right priority decisions and drive operational excellence with innovative ideas, without much guidance/supervision
- Ability to build and run tools necessary for operational success
- Documenting SOPs for repetitive issues, building knowledge base articles for the team’s benefit