Staff Software Engineer, Data Platform – Near Real Time Data Collection

  • The Walt Disney Company
  • Bristol, CT, USA
  • May 18, 2019
Full-time Agile AWS Cloud Computer Science Data Collection DevOps Hadoop Kafka Machine Learning Spark

Job Description

We have created a new Big Data Platforms group within Disney Direct-To-Consumer and International (DTCI) technology with the skills, drive, and passion to innovate, create, and succeed in enabling a Direct to Consumer Strategy for ESPN, Disney and ABC products. We are here to disrupt and to start a cultural revolution in the application of data and analytics across The Walt Disney Company, focused on Content Personalization/Recommendation, Deep User Understanding, Audience Segmentation for Linear to Digital Ad Sales, and Analytics. We need an experienced Staff Software Engineer who can drive multiple data initiatives, applying innovative architecture that can scale in the cloud. We are looking for a creative and talented individual who loves to design scalable platforms that operate at petabyte level and extract value from both structured and unstructured real-time data. Specifically, we are looking for a technology leader to build a highly scalable and extensible Big Data platform that enables real-time collection, storage, modeling, and analysis of massive data sets from numerous channels. You will also build a self-serve machine learning (and deep learning) pipeline that lets multiple data scientists develop, test, deploy, and A/B test models on top of the data platform you are responsible for. You must be self-driven to continuously evaluate new technologies, innovate, and deliver solutions for business-critical applications with little to no oversight from the management team.

The internet-scale platforms that you design and build will be core assets in delivering the highest quality content to over 150MM consumers on a monthly basis. This is an opportunity to fundamentally evolve how DTCI delivers content and monetizes our audiences.

Job Summary:

  • Build cool things – Build software across our entire cutting-edge data platform, including near real-time data collection, processing, storage, and serving real-time analytics and other real-time use cases such as recommender systems & segmentation.
  • Lead and coach – Mentor other senior engineers by developing reusable frameworks. Review designs and code produced by other engineers working across the organization.
  • Harness curiosity – Change how we think, act, and utilize our data by performing exploratory and quantitative analytics, data mining, and discovery.
  • Innovate and inspire – Think of new ways to help make our data platform more scalable, resilient and reliable and then work across our team to put your ideas into action.
  • Think at scale – Lead the transformation of a petabyte-scale, batch-based processing platform to a near real-time streaming platform using technologies such as Apache Kafka, Cassandra, Spark and other open source frameworks.
  • Have pride – Ensure performance isn’t our weakness by implementing and refining robust data processing, REST services, RPC (in and out of HTTP), and caching technologies.
  • Grow with us – Help us stay ahead of the curve by working closely with data architects, stream processing specialists, API developers, our DevOps team, and analysts to design systems which can scale elastically in ways which make other groups jealous.
  • ML First – Provide expert-level advice to data scientists, data engineers, and operations to deliver high-quality analytics from machine learning and deep learning through data pipelines and APIs.
  • Build and Support – Embrace the DevOps mentality to build, deploy and support applications in the cloud with minimal help from other teams.

Basic Qualifications:

  • Not your first rodeo – Have 10+ years of experience developing with a mix of languages (Java, Scala etc.) and open source frameworks to implement data collection, processing, and serving technologies on a near real-time basis.
  • Data and API ninja – You are also very handy with big data frameworks such as Hadoop and Apache Spark, NoSQL systems such as Cassandra or DynamoDB, and streaming technologies such as Apache Kafka. You understand reactive programming and dependency injection frameworks such as Spring for developing REST services.
  • Have a technology toolbox – Hands-on experience with newer technologies relevant to the data space such as Spark, Kafka, and Apache Druid (or any other OLAP database).
  • Cloud First – Plenty of experience developing and deploying in a cloud-native environment, preferably AWS.
  • Embrace ML – Work with data scientists to operationalize machine learning models and build apps that make use of the power of machine learning.
  • Problem solver – Enjoy new and meaningful technology or business challenges which require you to think and respond quickly.
  • Passion and creativity – Are passionate about data, technology, & creative innovation.

Preferred Qualifications:

  • Prior experience building internet-scale platforms – handling petabyte-scale data and operationalizing clusters with hundreds of compute nodes in a cloud environment.
  • Prior experience in building real-time data collection infrastructure including client SDKs will be a huge plus.
  • Experience in operationalizing Machine Learning workflows to scale will be a huge plus as well.
  • Experience with Content Personalization/Recommendation, Audience Segmentation for Linear to Digital Ad Sales, and/or Analytics.
  • Experience with open source technologies such as Spring, Hadoop, Spark, Kafka, Druid, Pilosa and YARN/Kubernetes.
  • Experience in working with Data Scientists to operationalize machine learning models.
  • Proficiency with agile development methodologies, shipping features every two weeks. It would be awesome if you have a robust portfolio on GitHub and/or open source contributions you are proud to share.

Required Education:

  • Bachelor’s degree or better in Computer Science or a related technical field or equivalent job experience.

Preferred Education:

  • Master’s degree in Computer Science or a similar field is preferred.

Job ID

667465BR