We are urgently looking for a Data Engineer for a direct client requirement.
TITLE: Data Engineer (Spark, Flink, Scala Engineer)
LOCATION: Irvine, CA
DURATION: 6+ Months
RATE: DOE
Job Description:
Role Purpose:
We are looking for Scala engineers with experience building batch and/or streaming jobs. We use Spark for batch jobs and Flink for real-time streaming jobs. Experience with Hadoop, Hive, and AWS S3 is also an asset.
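To illustrate the kind of batch work involved, here is a minimal sketch of a Spark batch job in Scala; the S3 paths, column names, and aggregation logic are illustrative assumptions, not an actual client workload:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal Spark batch job: aggregate daily event counts from Parquet on S3.
    // Paths and column names are hypothetical placeholders.
    object DailyEventCounts {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("daily-event-counts").getOrCreate()

        val events = spark.read.parquet("s3a://example-bucket/events/")

        events
          .groupBy(col("event_date"), col("event_type"))
          .agg(count("*").as("event_count"))
          .write
          .mode("overwrite")
          .parquet("s3a://example-bucket/daily-event-counts/")

        spark.stop()
      }
    }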
Major Responsibilities:
- Create new, and maintain existing, Spark jobs written in Scala
- Create new, and maintain existing, Flink jobs written in Scala (a minimal streaming sketch follows this list)
- Produce unit and system tests for all code
- Participate in design discussions to improve our existing frameworks
- Define scalable calculation logic for interactive and batch use cases
- Work with infrastructure and data teams to produce complex analyses across datasets
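For reference, a minimal sketch of a Flink streaming job, assuming Flink's Scala DataStream API; the socket source and word-count logic are illustrative stand-ins for real pipelines, which would typically read from Kafka:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    // Minimal Flink streaming job: 10-second windowed word counts from a socket.
    // Host, port, and logic are hypothetical placeholders.
    object StreamingWordCount {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        env.socketTextStream("localhost", 9999)
          .flatMap(_.toLowerCase.split("\\s+"))
          .filter(_.nonEmpty)
          .map((_, 1))
          .keyBy(_._1)
          .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
          .sum(1)
          .print()

        env.execute("streaming-word-count")
      }
    }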
Background, Experience & Qualifications:
- A minimum of 2 years of experience with Scala and/or Java
- Required experience with Hadoop and Spark
- Knowledge and experience with cloud-based technologies
- Experience in batch or real-time data streaming
- Ability to adapt to new big-data frameworks and open-source tools as project needs demand
- Knowledge of design strategies for developing scalable, resilient, always-on data lakes
- Strong development/automation skills
- Must be very comfortable with reading and writing Scala code
- An aptitude for analytical problem solving
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
- Good understanding of Hadoop architecture and its components, including the JobTracker, TaskTracker, NameNode, and DataNode daemons, HDFS high availability (HA), and the MapReduce programming paradigm
- Experience working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features
- Experience developing Spark applications using the RDD, Spark SQL, Spark on YARN, Spark MLlib, and DataFrame APIs
- Experience with real-time data processing and streaming using Spark Streaming and Kafka, including moving data in and out of HDFS and relational databases (see the Kafka sketch after this list)
- Familiarity with open-source configuration management and development tools
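As a point of reference for the streaming requirement above, here is a minimal Structured Streaming sketch that reads from Kafka; the broker address and topic name are hypothetical, and it assumes the spark-sql-kafka-0-10 connector is on the classpath:

    import org.apache.spark.sql.SparkSession

    // Minimal Structured Streaming job: read a Kafka topic and echo records
    // to the console. Broker and topic names are hypothetical placeholders;
    // requires the spark-sql-kafka-0-10 connector on the classpath.
    object KafkaToConsole {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-console").getOrCreate()

        spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
          .writeStream
          .format("console")
          .start()
          .awaitTermination()
      }
    }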
Other:
- Hands-on experience and production use of Hadoop/Cassandra, Spark, Flink, and other distributed technologies would be a plus
- Other technologies: ScalaTest, Gradle/Maven, Airflow, SQL, AWS (a brief ScalaTest sketch follows)
- Bachelor's degree required
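Since the role calls for unit tests on all code and lists ScalaTest among its tools, here is a minimal ScalaTest sketch; it assumes ScalaTest 3.1+, and the function under test is a hypothetical placeholder for real job logic:

    import org.scalatest.funsuite.AnyFunSuite

    // Hypothetical function standing in for real job logic under test.
    object WordCount {
      def count(words: Seq[String]): Map[String, Int] =
        words.groupBy(identity).map { case (w, ws) => w -> ws.size }
    }

    // Minimal ScalaTest unit test (AnyFunSuite is the ScalaTest 3.1+ name;
    // older releases use org.scalatest.FunSuite).
    class WordCountSuite extends AnyFunSuite {
      test("count tallies occurrences of each word") {
        assert(WordCount.count(Seq("a", "b", "a")) === Map("a" -> 2, "b" -> 1))
      }
    }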