Tech Roots: Ancestry.com Tech Roots Blog

2015 Hack Days at Ancestry
By Christopher Bradford | April 29, 2015

Several years ago, we introduced FedEx Day at Ancestry: a 24-hour hackathon to build something fun & innovative, work with people other than your everyday team, and learn new technologies and skills. Participation is voluntary and we noticed that the number of people participating was starting to decline. We gathered feedback from the teams and made a few changes in February of this year in response.

First, the name “FedEx Day”, as noted in the post linked above, came from the report of Atlassian’s 24-hour hackathon in Daniel Pink’s book, Drive – essentially asking “What can you deliver in 24 hours?” Atlassian has since renamed their event “ShipIt Days”, and we have also changed the name of our event to “Hack Days”. Our format has shifted from a single 24-hour period to two full business days, with our showcase on the afternoon of the second day. Teams are still welcome to stay overnight as long as they like, but we learned that for many people, the 24-hour format was a deterrent. Another important piece of feedback was that there were essentially two kinds of projects people wanted to work on: “just for fun” projects that may not have any connection to our business, and product ideas that they would like to see included in our product offerings. (It’s remarkable how many of our software developers are really passionate about our customers and the service Ancestry offers!) So we decided to create two corresponding prize categories, with fun awards (board games, trophies, bobbleheads, etc.) for the first and cash awards for the second.

So, what was the outcome of these changes? We had our highest participation ever, with over 30 teams showcasing their work (we even had to extend our showcase by half an hour to accommodate everyone). We also noticed that the quality of the projects was very high, with far fewer crashes, bugs, and “it worked on my machine 15 minutes ago!” (we think this may be because people actually slept at some point during the night).

Judges for our event included senior executives, among them our CEO, Tim Sullivan, who were very impressed with the great ideas and execution.

The prize-winning projects included:

“Just For Fun” awards

Bug Award: Team Automagic — Michael Russo, Jed Burgon. This was an internal tool built by IT folks to help automate the configuration and installation of custom software packages for teams.

Most Evil: Roots of Evil — Ishpeck Tedjamulia, Peter Funk, Emanuel Blanco — a dungeon-like game that pits your battle skills against those of ancestors in your family tree!

Geek Award: Team Loosely Coupled — Robert Schultz, Alex Arkhipov, Chris Bradford — a proof of concept for cloud-deployed, automatically managed microservices running Node and ZooKeeper on AWS

Most Entertaining: Team Rewarders — Ramya Rengarajan, Phani Kumar Balusu, Bonnie Bingham, Jason Bramble — award badges, merchandise, and discounts for activity on the site

Product-related cash awards

Honorable Mention: Team Awesome — Danny Darais, David Graham, Dave Menninger, George Gerard, Chris Adams, Jeff Alton, Kelv Cutler, Mike Smeltzer, Jeff Lord, John Mulholland — break through “brick walls” in your research by requesting help from other users, and offer your help to others for recognition on a leaderboard

3rd Place: 1940CEnsRecs2AMT — Roy Mill, Jeff Gardner — give a face and a story to those in the census, connecting you from census to picture to story

2nd Place: Mobile — Gary Mangum, Brian Mullen, Jon Bott, Eric Williamson, Sam Gubler, Kory Garner, Keld Sperry, Bart Whiteley, Dan Lincoln, Sophal Mok — a new mobile app experience

1st Place: Eye of Sauron — Gaurav Shetti, Gann Bierner, Alex Kudinov, Max Bolotin, Hui Zheng — A guided search experience drawing on the characteristics of our data collections to help users narrow down searches

We were very pleased with the success of our first event this year incorporating these changes. We have heard from a number of teams that their good experience with Hack Days has started to influence how they work together as a team in their day-to-day work — an added benefit! Teams find themselves really energized by sitting and working together to brainstorm and solve problems without as much regard to roles & process. Some teams are incorporating short “Hack Days”-like sessions into their sprints to swarm on solving interesting problems in their current projects.

With the success of Hack Days, our next big challenge is to figure out how to make this scale as participation continues to grow.

Recent University of Oxford study sheds light on estimating Great Britain ethnicity
By Julie Granka | April 21, 2015

This post was co-written with Peter Carbonetto, Ph.D., Computational Biologist at AncestryDNA

For every AncestryDNA customer, we estimate the ethnic origins of their ancestors from their DNA sample—what we call a “genetic ethnicity estimate.” AncestryDNA customers can currently trace their ancestral origins to specific parts of the world, including 26 regions across Europe, Africa, Asia, and the Americas. Since the AncestryDNA science team is always on the lookout for ways to improve our genetic ethnicity estimates, we were excited about the appearance of a new scientific article in the journal Nature, “The fine-scale genetic structure of the British population.” How well do their findings match up with patterns of genetic ethnicity for individuals with British ancestry at AncestryDNA?

In the study, researchers at several universities, including the University of Oxford, attempted to learn about historical trends from the DNA of people living in the United Kingdom today. They reconstructed a detailed portrait of the history and diversity of the British Isles from the DNA of over 2,000 people with deep roots in the UK – one of the most comprehensive collections of genetic data from the UK to date. The study reports several new discoveries, a number of which are of particular interest to us at AncestryDNA. While their findings suggest the potential for more detailed ethnicity estimates for people of British ancestry, the study also illuminates some of the challenges in pinpointing British origins from DNA.

Teasing apart the geographic origins of British and continental European DNA is extremely challenging due to the complex history of Europe. Individuals with ancestors from Britain might find that their AncestryDNA ethnicity estimate has a lower proportion of “Great Britain” than they might expect. In fact, it is common for even individuals with all four of their grandparents born in Britain to have much less than 100% of their ancestry assigned to Great Britain.

Screenshot from the Great Britain ethnicity estimate content page on AncestryDNA. This diagram shows, from top to bottom, the maximum, 75th percentile, median, 25th percentile, and minimum amount of Great Britain ethnicity estimated for individuals who have four grandparents born in Great Britain.

Such estimates are due to the fact that there has historically been continuous, frequent genetic exchange between Great Britain and neighboring regions of Europe. In other words, people have moved to and from the British Isles a lot over the past thousand years or so, and their DNA has been shared with other groups in complicated ways. For example, we find that Europe West, Scandinavia, and other regions often appear in the ethnicity estimates of people with deep ancestry in Great Britain.

Screenshot from the Great Britain ethnicity estimate content page on AncestryDNA. This diagram shows, for individuals with all four grandparents born in Great Britain, the proportion who also have genetic ethnicity estimated from other regions. For example, almost 50% of these individuals have some estimated Scandinavian ancestry.

In our continued research into ethnicity estimates for individuals born in the UK, we also discovered that the proportion of DNA attributed to such continental European ancestry even varies by location in the UK. For example, we find the greatest concentration of Scandinavian ancestry in the East Midlands and Northern England, and we find higher proportions of Europe West ancestry in the South East of England.

Mean Scandinavia (left) and Europe West (right) ancestry proportions for all AncestryDNA customers born in different parts of Britain and Ireland.

These patterns of genetic ethnicity for individuals with British ancestry at AncestryDNA complement those in the recent study published in Nature. For example, one major finding of the Nature paper highlights the incredible genetic diversity of the British Isles: DNA from western German, northern Belgian, Danish, and French parts of continental Europe has contributed heavily to the DNA of individuals across the British Isles. This likely reflects past migration events into the UK, including the Roman occupation, the settlement of Saxons from the Danish peninsula, and the Norman invasion (see Figure 3 of the study). The Nature study reinforces the difficulty of estimating genetic ancestry in the British Isles because of the region’s complex history.

However, the study also showed that by identifying “clusters” of individuals based on their DNA, the British Isles themselves can be subdivided into different regions. Some parts of Britain, such as Wales, the Orkney Islands, and a third region that includes Scotland and Northern England, were found to be highly genetically differentiated from other regions. On the other hand, a single genetically homogeneous “cluster” covers most of central and southern England – meaning that further subdividing this region is difficult with genetic data alone (see the red dots in Figure 1 of the study).

These results suggest the potential to subdivide Great Britain ethnicity estimates into finer-scale regions, including Wales, Scotland and the Orkney Islands. Realizing this possibility for our AncestryDNA customers would require the right statistical tools and adequate DNA samples from the British Isles. First, we need to obtain adequate genetic data from individuals with deep ancestry in these regions. Second, more basic research is needed to translate these results to individualized ethnicity estimates. The Nature study only examined trends in the genetic data, and did not attempt to calculate ethnicity predictions for individual people. Despite these challenges, these new findings suggest the exciting potential for providing more detailed estimates of British ancestry from DNA.

In summary, the complex genetics of the modern-day British mirrors the complicated history of England and the British Isles. Thus, while the study underscores the challenges of estimating Great Britain ancestry from genetic information, at the same time it highlights potential ways to provide more detailed ethnicity estimates within Great Britain. As we continue to improve and refine our ethnicity estimates at AncestryDNA, we’ll continue to survey the latest science to enhance our customers’ discoveries of their ancestral origins.

The Science Behind New Ancestor Discoveries
By Julie Granka | April 2, 2015

At AncestryDNA, we empower our customers to uncover exciting details about their family stories.  Today, we announced a new AncestryDNA experience based on years of research and development by the AncestryDNA science team that is revolutionizing the way people discover, preserve, and share their family history. Learn more about the announcement here.

We are combining DNA testing with the power of 65 million family trees to make it faster and easier than ever to discover ancestors you never knew you had. New Ancestor Discoveries, our new patent-pending capability available with every AncestryDNA test, are a new way to discover your story – finding possible ancestors or relatives for you, even if you know nothing about your family history.

New Ancestor Discoveries provide you with a list of individuals who might be your ancestors – allowing you to build an otherwise empty family tree.

The building blocks: DNA matching and member trees

New Ancestor Discoveries build on two features that are already part of AncestryDNA.  The first building block of New Ancestor Discoveries is DNA matching: the identification of pairs of people who seem to share identical pieces of DNA with one another, and so are likely to be related.

The second building block? While your New Ancestor Discoveries are not based upon any information in your own tree, they are based upon the trees of your DNA matches, or genetic relatives.

New Ancestor Discoveries extend from an existing AncestryDNA feature known as DNA Circles™, which integrate these two building blocks.

A DNA Circle is a group of people who all claim to be descended from a particular ancestor (say, William Ogden) in their family trees.  But in addition, they all share DNA with other members in the group.  These two pieces of evidence, and some statistics behind the scenes, support that the members of the DNA Circle really are descendants of William Ogden.

Network representation of a DNA Circle. Each icon is an individual in the DNA Circle; an orange line between two individuals means that two people share DNA that they likely inherited from the common ancestor of the DNA Circle.

New Ancestor Discoveries: the power of DNA matching

But what about a descendant of William Ogden who doesn’t know it, and consequently isn’t in the William Ogden DNA Circle?  That is where New Ancestor Discoveries can help – by drawing his or her missing link to the DNA Circle and to William Ogden.

The key idea is that if your DNA matching shows that you are related to a significant number of an ancestor’s descendants in a DNA Circle, you too are likely a descendant – and so should receive a New Ancestor Discovery to this ancestor.

Kathleen Smith would receive a New Ancestor Discovery to this DNA Circle because she matches a large number of people in the Circle.

For every AncestryDNA member, we examine their DNA matching results to every DNA Circle in the database.  If it looks like they belong to a DNA Circle, they’ll get a New Ancestor Discovery.

That means that the AncestryDNA member who didn’t know she was a descendant of William Ogden might receive a New Ancestor Discovery, and jump-start her family tree.

The nitty gritty

The reality is that if you share DNA with members of a DNA Circle, it does not necessarily mean that you also share the DNA Circle ancestor.  You could instead have another ancestor in common with the Circle members – for example, if the Circle ancestor is the sister of your great-grandmother.  You could also share several different common ancestors with multiple members of the Circle – even if none of them are actually the ancestor of the Circle.

To provide New Ancestor Discoveries that more often suggest direct-line ancestors, we combined several pieces of information into our algorithm: the number of people in the DNA Circle with whom you share DNA, the amount of DNA shared with each DNA Circle member, the number of generations back to the ancestor for each individual in the Circle, and our confidence that you and each member of the Circle share only one common ancestor.
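
To make the shape of such a combination concrete, here is a purely hypothetical sketch in JavaScript. The feature names, weights, and scoring formula are invented for illustration only; this is not AncestryDNA’s actual (patent-pending) algorithm.

```javascript
// Hypothetical illustration only -- NOT AncestryDNA's actual algorithm.
// `circleMatches` holds one entry per DNA Circle member the tested person shares DNA with.
function newAncestorEvidence(circleMatches) {
  var score = 0;
  circleMatches.forEach(function (m) {
    // More shared DNA, fewer generations back to the ancestor, and higher confidence
    // that only one common ancestor is shared all strengthen the evidence.
    score += (m.sharedCentimorgans * m.singleAncestorConfidence) / m.generationsToAncestor;
  });
  // The sheer number of matching Circle members matters too.
  return score * Math.log(1 + circleMatches.length);
}

// Example with invented numbers: three matches in a Circle.
var evidence = newAncestorEvidence([
  { sharedCentimorgans: 25, generationsToAncestor: 6, singleAncestorConfidence: 0.9 },
  { sharedCentimorgans: 12, generationsToAncestor: 7, singleAncestorConfidence: 0.8 },
  { sharedCentimorgans: 40, generationsToAncestor: 5, singleAncestorConfidence: 0.95 }
]);
console.log(evidence); // compare against a tuned threshold before surfacing a discovery
```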

Then, to assess the algorithm’s performance, the science team did what we love and conducted a lot of experiments. We took thousands of individuals from a variety of ethnic backgrounds with deep, full family trees, removed them from their DNA Circles, and calculated their New Ancestor Discoveries.

We found that a New Ancestor Discovery correctly pointed to an ancestor in those individuals’ trees, or to someone related to that ancestor in another way, about 70% of the time. Keeping in mind that, as with all statistics, there’s a tradeoff between how many discoveries we can provide to customers and how often they’re correct, this is an impressive feat: we are providing almost 1 million New Ancestor Discoveries to AncestryDNA customers. Even when New Ancestor Discoveries do not point to a direct-line ancestor, they provide a useful starting point for genealogical research.

While someone’s experience with New Ancestor Discoveries may differ depending on their family’s history and how much they already know about their family tree, our testing has convinced us that New Ancestor Discoveries are the way forward: providing a glimpse of your ancestors, or a head start in finding them, as has never before been possible.

As AncestryDNA members update their trees and as the AncestryDNA database grows, we will continue to identify new DNA matches and discover additional and larger DNA Circles.  So the most exciting part about New Ancestor Discoveries is that their impact can only increase over time – bringing the power of DNA matching integrated with family trees to even more AncestryDNA members with diverse and complex family histories.

Scaling Node.js in the Enterprise
By Robert Schultz | March 31, 2015

Last year we began an effort internally at Ancestry to determine whether we could scale out Node.js within the frontend applications teams. Node.js is a platform that we felt could meet many of our needs as a business for building modern, scalable, distributed applications using one of our favorite languages: JavaScript. I want to outline the steps we took, the challenges we have faced, and what we have learned so far after six months of officially scaling out a Node.js ecosystem within an enterprise organization.

Guild

We introduced the concept of a guild in Q4 of 2014 to bring together everyone who is working with, or interested in, Node.js. The guild concept comes from Spotify and their agile engineering model: a guild is a group of people who are passionate about a particular subject. In this case, we wanted to get everyone together to identify the steps we needed to take to get Node.js adopted within the organization. We meet once a month to discuss topics related to Node.js, which promotes a high level of transparency across the company, and anyone is welcome to join and recommend topics. Once we established the guild, it was a great starting point for getting passionate people in the same room.

Training

Before we began to invest in Node.js as a platform, we wanted to ensure we had a consistent level of knowledge across our engineering group on building Node.js applications. We organized two training sessions for small groups of engineers, one in our Provo, UT office and one in our San Francisco office, led by the awesome guys over at nearForm. Each session had about 15 engineers. The idea behind keeping the groups small was to build a wide enough base of influence: the individuals who were part of the training would start building applications and, in turn, spread their knowledge. This worked well, as teams immediately started to think about components that could be built in Node.js.

Interoperability

As you accumulate multiple technologies in your ecosystem, you need to ensure they are all interoperable. This means you need to decouple some of your systems, communicate over a common protocol everyone understands, such as HTTP, and use a common transport format such as JSON. We have a lot of backend services in our infrastructure that were built with C#, so in order to support multiple technologies we needed to work with the dependent service teams to ensure we had pure REST services exposed.

We also distribute service clients via NuGet, the standard package management system for C#, but that is not going to work for any other language. You will need to ensure that you are building extremely thin clients with well-documented API specifications. We want to treat our internal clients like we would any external consumer of an API. This allows any platform to build on top of our backend services and lets us prepare and scale for any emerging technologies in the future.
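
To make “extremely thin” concrete, here is a minimal sketch of such a client in Node.js. The module name, host, and resource path are hypothetical; the point is that the client only knows the endpoint and the JSON contract, and contains no business logic.

```javascript
// person-client.js -- hypothetical thin client for an internal REST service.
var http = require('http');

function getPerson(personId, callback) {
  // Assumes a documented endpoint like GET /persons/{id} returning JSON.
  var options = { host: 'persons.internal.example.com', path: '/persons/' + personId };
  http.get(options, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      if (res.statusCode !== 200) {
        return callback(new Error('Unexpected status ' + res.statusCode));
      }
      callback(null, JSON.parse(body));
    });
  }).on('error', callback);
}

module.exports = { getPerson: getPerson };
```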

Monolithic Architecture

One of the biggest anti-patterns for Node.js applications is the monolithic architecture. This is an older pattern in which we build large applications that handle multiple responsibilities: hosting the client-side components such as HTML, CSS, and JavaScript; hosting an application API with many endpoints and responsibilities; managing the cache of all of the services; rendering the output of each page; and so forth. This type of architecture has several problems and risks.

First, it’s extremely volatile for continuous deployment. Rolling out one feature of the application potentially can break the whole application, thus disrupting all of your customers.

It’s also extremely difficult to refactor or rewrite an application down the road if it’s all built as one big application; three or four separate components are easier to rebuild or throw away than one large one.

Last, everything should be a service. Everything. Having a large web application that combines different responsibilities goes against this; those responsibilities should be separate.

As you begin to break down your monolithic applications, one recommendation is to use a good reverse proxy to route external traffic to the new, separate applications while keeping your URIs and endpoints stable.
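
As a minimal sketch of that idea, the Node.js http-proxy module can route requests by URI prefix to separate applications; the ports and prefixes below are hypothetical, and a dedicated proxy such as Nginx can serve the same purpose in production.

```javascript
// proxy.js -- sketch of path-based routing to separate Node.js apps (npm install http-proxy).
var http = require('http');
var httpProxy = require('http-proxy');

var proxy = httpProxy.createProxyServer({});

http.createServer(function (req, res) {
  // External URLs stay stable while each backend remains an independent, deployable app.
  if (req.url.indexOf('/search') === 0) {
    proxy.web(req, res, { target: 'http://localhost:3001' });
  } else if (req.url.indexOf('/profile') === 0) {
    proxy.web(req, res, { target: 'http://localhost:3002' });
  } else {
    proxy.web(req, res, { target: 'http://localhost:3000' });
  }
}).listen(8080);
```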

Documentation

You need to document everything. We created an internal guide covering anything and everything related to building Node.js applications at Ancestry: architecture, best practices, use cases, web frameworks, supported versions, testing, and deployment. Anyone within our engineering team who is interested in adopting Node.js can use this guide as a first step to get up and running. It ensures that we have an open and transparent model for how to set up, configure, build, test, and deploy applications. It is an evolving document that we review together often as a group.

Define Governance

Since Node.js is evolving so quickly, it is wise to establish a small governance group to manage it within your organization. This group should be responsible for defining standards, adopting new frameworks, optimizing the architecture, and so forth. Again, keep it transparent and open to provide a successful ecosystem. For example, this group decides which web application framework we use, such as Express or Hapi.

Scaffolding

It’s extremely important to help engineers get started on a new platform. With technology stacks like Microsoft ASP.NET or Java Spring MVC, the conventions are much more defined. In the Node.js world there are many different ways to do one thing, so we want to make this process more standardized and simple. We also want to ensure all engineers include common functionality in their applications without having to add it in themselves piece by piece.

So we have built generators using a tool called Yeoman. It allows you to define templates, or generators as they call them, to scaffold out new Node.js applications easily. This ensures consistency with the Node.js architecture: all common components and middleware are included, an initial set of unit tests with mocks and stubs is added, build tools are configured (such as Grunt or Gulp), and even your local hosting environment is scripted out with Vagrant and Docker configuration.
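
As a rough illustration, a stripped-down internal generator might look like the sketch below. The generator name, prompt, and template file are hypothetical, and the API shown follows the yeoman-generator style that was current at the time; check the Yeoman documentation for the exact API of the version you use.

```javascript
// index.js of a hypothetical internal generator (e.g. generator-ancestry-node).
var generators = require('yeoman-generator');

module.exports = generators.Base.extend({
  prompting: function () {
    var done = this.async();
    this.prompt({ type: 'input', name: 'appName', message: 'Application name?' }, function (answers) {
      this.appName = answers.appName;
      done();
    }.bind(this));
  },
  writing: function () {
    // Copy templated files so every new app starts with the same middleware,
    // test layout, and build configuration.
    this.fs.copyTpl(
      this.templatePath('app.js'),
      this.destinationPath('app.js'),
      { appName: this.appName }
    );
  }
});
```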

Internal Modules

As your engineering teams begin to scale out efforts in Node.js, you will begin to need cross-cutting functionality. One of the principles of Node.js is that it’s great at doing very lean things well. This is a core Unix philosophy, and in the case of Node.js it should also apply to your common functionality. The package management system for Node.js is NPM. When you build applications you’re essentially building a composite application from open source modules in the community. Today all of these are hosted on npmjs.org. But larger companies with security policies in place do not want to publish their common functionality to the public, so you will need a way to host your modules internally.

Initially we went with Sinopia. It’s an open source NPM registry server that allows you to publish your modules to internally. It also acts as a proxy: if a module isn’t hosted internally, it will fetch it from npmjs.org and cache it. This is great for hosting all of your common code as well as improving performance, since your build system doesn’t have to fetch every package from the public registry every time.

Over time, as more teams began to publish packages, we needed something that would scale better. We introduced Artifactory, which provides a lot more functionality and also hosts many other package management systems such as NuGet, Maven, and Docker. This allows us to define granular rules around package whitelists, blacklists, aggregation of multiple package sources, and more.

Ownership

Building common shared functionality across teams can be difficult to maintain. Our approach was more of an open source model. Each team has the ability to build common functionality that they need to implement a Node.js application, but they must follow a few rules to allow features, bug fixes and enhancements to go into modules. First, they have to define a clear readme.md in their git repository. Second, each module always has an active maintainer. This maintainer is listed right at the top of the readme.md and is the go to for questions or even pull requests. This allows for a flexible ownership model and transparency on these common bits of functionality. You absolutely must agree on your process as an organization for this to work.

Security

When you adopt any new platform you need to ensure security is a top concern. We’ve done this by using the helmet module, which gives you a lot of protection against common web attacks such as XSS and covers much of the OWASP Top 10. It’s easy enough for anyone to use and comes as Express middleware. We are also investing in authentication at our middleware layer.
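
For reference, wiring helmet into an Express application is only a couple of lines; the sketch below is a minimal, self-contained example (the port and route are arbitrary).

```javascript
// app.js -- minimal Express app with helmet's default protections enabled.
var express = require('express');
var helmet = require('helmet');

var app = express();

// helmet() applies a set of security-related HTTP headers as Express middleware.
app.use(helmet());

app.get('/', function (req, res) {
  res.send('Hello, secure world');
});

app.listen(3000);
```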

You also want to make sure that the modules you’re using are trusted. Since the Node.js community is built on free and open source modules, there is a risk that an engineer will use one without validating whether it’s trusted or secure. We want to use only modules from sources we know or that have a high level of community confidence on npmjs.org. This is also where our internal npm registry comes in, so that we can effectively blacklist npm modules that do not fit our criteria.

Last, validate the licenses of the modules you use against your licensing requirements. Using a module with an MIT license is fine, but as an enterprise you may have stricter requirements for other licenses. I recommend looking into off-the-shelf software to do this, or initially investing in some open source tools; there are npm modules that can do this for you.

DevOps

In your DevOps organization, you will most likely need to make adjustments to support Node.js deployments. A Node.js application deploys differently from other applications, but it’s actually quite simple. Here we use Chef to provision our deployments, so we needed to adjust our Chef recipes to add support for Node.js.

We needed to provision our servers to install Node.js, Supervisor, and Nginx. We use this setup to gain the highest amount of throughput in a production environment.

Supervisor manages the Node.js process to ensure that if it dies it is automatically restarted. It also manages the number of Node.js instances that run on the server. We take advantage of multiple cores on the server to scale both vertically and horizontally.

Nginx handles balancing incoming requests across the Node.js instances on the box. Nginx is extremely efficient and is able to scale web requests really well. We prefer to use tools that do a specific job and do it well.

If you have already used Node.js you are aware of the cluster module. The concern with using the cluster module to load balance your requests is that it is still experimental according to the Node.js stability index. We prefer to build a long-lasting model around deploying and managing Node.js instances, in case the cluster module changes its API or gets deprecated one day.
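
For contrast, this is roughly what the cluster-module approach we decided not to depend on looks like, following the standard pattern from the Node.js documentation (the port is arbitrary):

```javascript
// cluster-example.js -- fork one worker per CPU core and let the cluster module
// distribute incoming connections across them.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own HTTP server.
  http.createServer(function (req, res) {
    res.end('handled by worker ' + process.pid);
  }).listen(3000);
}
```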

Community

The Node.js community is really amazing, and we leverage it as much as we can in many ways. One way is reaching out to others in the community to learn how they overcame challenges in their adoption of Node.js. We’ve also brought in a few speakers to talk with our engineering group about the same topic and to build relationships with others in the community. For example, we’ve invited both Groupon and PayPal in to talk with our group, which provided a lot of insight; you recognize that everyone has different business models, but we’re after a lot of the same technology goals, such as scalability, performance, and security.

Envy

As we have continued to make progress and ship Node.js applications to production, something interesting started to happen. Other teams have begun to want to build new applications and prototype new ideas in Node.js, which has effectively started to create engineer envy. This is how we want to roll out an emerging platform. If your engineering team feels that a real problem is being solved and that it will help them be better at their jobs, they are much more inclined to adopt it. Happy engineers can ultimately lead to amazing products and new ideas.

Future

So what are our next steps in scaling Node.js here at Ancestry?

We’re continuing to invest in common, cross-cutting concerns; this is crucial to ensure that shared dependencies get built, and built the right way. We are also optimizing our architecture: ensuring everything is exposed as a service communicating over common protocols and transports is crucial for some applications. We continue to make introductions to other industry leaders in the Node.js space and to be more visible, which is extremely helpful, including more presence at Node.js meetups. We are also working to host Node.js meetups in our SF office soon.

This year we are also pushing to build our application service architecture around microservices. This includes optimizing our application delivery platform with containerization and Docker.

Conclusion

Overall, it’s been an awesome learning experience for us, but we are just getting started. Node.js doesn’t come as a free lunch and takes work. Hopefully this helps you adopt it efficiently and gives you some tips. Oh, and we’re hiring!

Lesson Learned: Sharing Code With Git Submodule
By Seng Lin Shee | February 26, 2015

You are probably aware of Git Submodules. If you aren’t, you may want to read about them from Atlassian and Git itself. In summary, Git provides a way to embed a reference to a separate project within a main project, while treating both projects as separate entities (versioning, commits, etc.). This article applies to any project that makes use of such scenarios, irrespective of programming language.

Recently, my team had issues working with submodules. These ranged from changes in project structure to abrupt changes in the tools and commands used when working on projects that involve submodules. In the industry, there are opinions that consider Git submodules an anti-pattern, where the ideal solution is to reference shared code only as precompiled packages (e.g. NuGet, Nexus, etc.).

This post is a short reflection on how we can restrict submodule use to only certain scenarios and how best to work, day to day, on projects that use submodules.

When should you use submodules?

  • When you want to share code with multiple projects
  • When you want to work on the code in the submodule at the same time as you work on the code in the main project

In a different light, this blog highlights several scenarios where Git submodule will actually make your project management a living nightmare.

Why use submodules?

To solve the following problems:

  • When a new person starts working on a project, you’ll want to avoid having this person track down the individual shared reference repositories that need to be cloned. The main repository should be self-contained.
  • In a CI (continuous integration) environment, plainly hooking up shared reference Git repositories as material pulls would be detrimental, as there is no coupling of versions between modules. Any modification in a repository would trigger the dependent CI pipeline, possibly blocking the pipeline if there is a breaking change.
  • Allow independent development of projects. For example, both ConsumerProject1 and ConsumerProject2, which depend on a SharedProject, can be worked on without worrying about breaking changes that would affect the pipeline status (which may block development and deployment of the separate projects/services).

How should submodules be restricted?

We found that the best way to prevent complexity from creeping into this methodology is to do the following:

  • Avoid nested submodule structures, meaning submodules containing other submodules, which may share the same submodules as the others and thus create duplicates. The parent repository should therefore NEVER be a shared project.
  • Depending on the development environment (i.e. Visual Studio), submodules should only be worked on when opened through the solution file of the parent repository. This ensures consistent relative paths across other parent repositories which consume the same submodules.
  • Submodules should always be added to the root of the parent repository (for consistency).
  • The parent repository is responsible for satisfying the dependency requirements of submodules by linking the necessary submodules together, similar to the responsibility of an IoC (Inversion of Control) container (i.e. Unity).

What are the main Git commands when working with submodules?

  • When importing a repository that contains submodules:
    git clone --recursive <url>
  • When pulling in the latest changes for a repository (e.g. parent project) that contains submodules:
    git pull
    git submodule update --init --recursive

    The ‘update’ command is used to update the contents of the submodule folders after the first git pull updates the commit references of the submodules in the parent project (yes, it’s weird).

  • When you want to update all submodules for a repository to their latest commits:
    git submodule update --init --remote --recursive
  • By default, the above submodule update commands will leave your submodules in a detached HEAD state. Before you begin work, create a branch to track all changes.
    git checkout -b <branchname>
    git submodule foreach 'git checkout -b <branchname>'

So, how do you use Git submodules? What best practices do you use to keep Git modules well-managed and within a certain amount of complexity? Do share your experience and feedback here in the comment section below.

AncestryDNA Scientists Achieve Advancement in Human Genome Reconstruction
By Julie Granka | December 16, 2014

Passed down through the generations, fragments of the genomes of long-gone ancestors exist today in the genomes of their living descendants.

Those fragments can actually be used to recover parts of those ancestors’ genomes – without having to resort to some more morbid techniques for obtaining their DNA.  That means a potentially easier way for you to be able to trace your freckles, for example, back to a particular ancestor.

The AncestryDNA science team is excited to unveil new developments in the science of ancestral human genome reconstruction using genetic data of living people. Using an approach similar to reassembling a document that has been shredded, we have attributed an unprecedented proportion of a human genome to a 19th Century American and his two successive wives using the genome-wide genetic material of their descendants.

This scientific feat is a step forward in the use of consumer genetics in family history.

Diving deeper

Attributing pieces of DNA from living individuals to a particular ancestor requires pedigree data in addition to genetic data – and lots of both.  AncestryDNA, with over half a million DNA samples and over 60 million family trees, was thus in a unique position to attempt it.

For all pairs of individuals in the member database, AncestryDNA determines whether they “share” DNA; that is, whether they share a nearly identical haplotype, likely because they both inherited it from a common ancestor.  Then, we leverage our database of member family trees to search for their shared ancestors.

Recently, we released a new statistical algorithm that integrates these two analyses to identify groups of likely descendants of a particular ancestor.  These groups, called DNA Circles™, are sets of individuals who all share DNA with one another and all have a particular individual as their most recent common ancestor. To date, we have identified over 150,000 DNA Circles – connecting groups of descendants of over 150,000 distant ancestors.

To prove out the task of genome reconstruction, we used DNA Circles to first identify a set of individuals with pedigree and genetic evidence suggesting that they were likely descendants of a man named David Speegle, born in the early 1800s in Alabama.

Image obtained from therestorationmovement.com

Why David Speegle?  With many children between his two successive marriages to wives Nancy and Winifred, Mr. Speegle and his spouses were excellent candidates for reconstruction.  That’s because having lots of children means having a large number of living descendants potentially carrying pieces of their DNA.

Validating the statistical methods underlying DNA Circles, Ancestry family historian Crista Cowan conducted extensive genealogical research, verifying that the individuals in David Speegle’s DNA Circle were indeed his descendants.  In the process, she also discovered how most of them were related to one another genealogically.

The science team then applied two different statistical models of genetic inheritance to these identified genetic and genealogical relationships.  They were able to infer, with a high degree of confidence, which parts of David, Nancy, and Winifred’s genomes were passed down to these descendants – thereby piecing together parts of their genomes.

Developed by the science team in-house, the first method used is a computational algorithm that efficiently identifies and stitches together chunks of DNA from a set of ancestors (like David, Winifred, and Nancy) given genetic data shared among their descendants. Unlike the first method, the second approach requires a pedigree linking all descendants and uses methods specifically designed in the academic research community for inferring inheritance of DNA in family trees.

Our discoveries

Both methods showed strong agreement in identifying DNA segments that could be attributed to the Speegle family ancestors and used to piece together portions of their genomes. For example, they identified pieces of the genome indicating that David Speegle or his spouses potentially had a version of a gene associated with a higher likelihood of male pattern baldness. And while Mr. Speegle likely did not pass down versions of the genes for darker hair or freckles to his descendants, he did likely pass along the version of a gene needed for blue eyes.

In addition to these selected traits, for nearly half of the length of the human genome the team was able to find representation of at least one of the copies of the genomes of David, Nancy, and Winifred Speegle.  And, because David Speegle had two wives, for roughly 12% of the length of the genome the team was able to identify genomic material that likely belonged to David Speegle himself.

These proportions are remarkable considering the number of generations separating Speegle from his descendants – an average of over six generations (that’s your great-great-great-great grandparent!).  The fact that the team could reach these numbers attests to the power of AncestryDNA’s massive dataset – a power that will only grow as the dataset continues to expand.

Although we’re still refining our methods for reconstructing pieces of the genome of human ancestors from genetic material from their descendants, we’re excited about the implications of this research in genetic genealogy and in the genomics industry. Future insights gained may come in the form of tracing the source of particular traits in a population, reaching a better understanding of recent population history, and enabling more targeted genetic genealogy research.

The new DNA Circles experience and the genome reconstruction project are just a fraction of the AncestryDNA science team’s ongoing research efforts to further personalize findings from big data. By leveraging AncestryDNA’s continually expanding database of DNA samples paired with Ancestry family tree data, the team will continue to innovate to provide unique insights to both consumers and the scientific community — potentially even elucidating the genetic makeup of many more distant ancestors.

Lessons Learned from a Monster Artist
By Dan Lawyer | November 19, 2014

Yes, we made monsters out of clay.

If you happened to be in Midway, Utah at the very end of September you might have bumped into the Ancestry product team holding our annual product summit. About 80 of us gathered for an action packed two-day event filled with team building, strategic conversations, and a few non-conventional outside-of-the-box squeeze-your-mind kind of activities aimed at keeping creativity top of mind in our work. It was thrilling to see the passion for our users and our business in the hearts and minds of this talented group.

One of the unique and memorable activities was a session led by Howard Lyon, a professional artist. We were lucky enough to have him come in and share some thoughts on how process aids creativity. On the surface many people believe the words process and creativity are not compatible. Howard made a great case for how process is critical to creating works of art. He walked us through some fascinating insights into the processes the masters use and then shared his own process accompanied by visuals of every stage from inception to finished masterpiece.

Things got real for the team when Howard paired us up in twos, passed out modeling clay, and gave us a simplified process for how to build a monster. First we answered a series of questions like “What is the name of my monster?” “Where does my monster live?” “Is it a monster for kids or adults?” “What does it eat?”  Then we drew a small sketch of the monster based on the answers to our questions. After we each had a sketch we paired up with another team member, got our hands dirty (literally) and created some of the masterpieces below. Once we finished our monsters we experimented with the effects of lighting by taking photos of our models from different angles against a green backdrop. We then submitted our photos to our designers to add in some final effects.

At the end of our summit we took home our cool monsters plus a renewed determination to build excellent experiences for our amazing users.

Monitoring progress of SOA HPC jobs programmatically
By Chad Groneman | October 17, 2014

Here at Ancestry.com, we currently use Microsoft’s High Performance Computing (HPC) cluster to do a variety of things.  My team uses an HPC cluster for multiple purposes.  Interestingly enough, we don’t communicate with HPC in exactly the same way for any two job types.  We’re using the Service Oriented Architecture (SOA) model for two of our use cases, but even those communicate differently.

Recently, I was working on a problem where I wanted our program to know exactly how many tasks in a job had completed (not just the percentage of progress), similar to what can be seen in HPC Job manager.  The code for these HPC jobs uses the BrokerClient to send tasks.  With the BrokerClient, you can “fire and forget”, which is what this solution does.  I should note that the BrokerClient can retrieve results, after the job is finished, but that wasn’t my use case.  I thought there should be a simple way to ask HPC how many tasks had completed.  It turns out that this is not as easy as you might expect, when using the SOA model.  I couldn’t find any documentation on how to do it.  I found a solution that worked for me, and I thought I’d share it.

HPC Session Request Breakdown, as shown in HPC Job Manager

With a BrokerClient, your link back to the HPC job comes from the Session object used to create the BrokerClient.  From a Scheduler, you can get your ISchedulerJob that corresponds with the Session by matching the ISchedulerJob.Id to the Session.Id.  My first thought was to use ISchedulerJob.GetTaskList() to retrieve the tasks and look at the task details.  It turns out that for SOA jobs, tasks do not correspond to requests.  The tasks don’t have any methods on them to indicate how many requests they’ve fulfilled, either.

My solution was found while looking at the results of the ISchedulerJob.GetCustomProperties() method.  I was surprised to find the solution there, since the MSDN documentation states that these are “application-defined properties”.

I found four name-value pairs which may be useful for knowing the state of tasks in a SOA job, with the following keys:

  • “HPC_Calculating”
  • “HPC_Caclulated”
  • “HPC_Faulted”
  • “HPC_PurgedProcessed”

I should note that some of these properties don’t exist when the job is brand new, with no requests sent to it yet.  Also, I was disappointed to find no key corresponding to the “incoming” requests, since some applications might not be able to calculate that themselves.

With that information, I was able to write code to monitor the SOA jobs.

With all that said, I should also say that our other SOA HPC use case monitors the state of the tasks, and is capable of more detailed real-time information.  We do this by creating our own ChannelFactory and channels.  By using that, the requests are not “fire and forget” – we get results back from each request individually as it completes.  We know how many outstanding requests there are, and how many have completed.  If we wanted to, we could use the same solution presented for the BrokerClient to find out how many are in the “calculating” state.

One last disclaimer:  These “Custom Properties” are not documented, but they are publicly exposed.  Microsoft could change them.  If they ever do, I hope they would consider it a breaking change, and document it.  There are no guarantees of that, so use discretion when considering this solution.

2 Talks and 4 Posters in 4 Days at the ASHG Annual Meeting
By Julie Granka | October 15, 2014

For the AncestryDNA science team, October brings more than fall foliage and pumpkins.  It also brings us the yearly meeting of the American Society of Human Genetics (ASHG), the main conference of the year in our field.

On Saturday, we’ll arrive in San Diego to join thousands of other scientists for a four day conference to discuss topics in genetics, exchange ideas with colleagues, listen to talks and presentations – and importantly, to give some presentations of our own.

We’re always on the lookout for ways that we can translate the latest scientific findings into future features for AncestryDNA customers.  The ASHG Annual Meeting is a chance for all of us to soak up the newest advancements in human genetics.

This year, the number and variety of presentations that we are giving at ASHG attests to the fact that AncestryDNA, too, plays a role in these advancements.

This year, we’re proud to be giving two platform presentations – only 8% of applications for platform presentations at ASHG were accepted. Keith Noto will be giving a platform talk entitled “Underdog: A Fully-Supervised Phasing Algorithm that Learns from Hundreds of Thousands of Samples and Phases in Minutes,” discussing the workings behind an impressive algorithm we’ve developed to phase genotype data extremely quickly and accurately. Yong Wang’s platform talk will reveal a few fascinating discoveries about U.S. population history from studying patterns of ethnicity and identity-by-descent among AncestryDNA customers.

We’ll also be giving a number of poster presentations.  Mathew Barber will be presenting the method behind another algorithm that we’ve developed to better identify true identical-by-descent DNA matches.  I’ll be presenting a method we’ve developed to reconstruct the genomes of ancestors from genotype data of their descendants.  Jake Byrnes will be presenting a poster with a collaborator from Stanford University about inferring sub-continental local genomic ancestry. Finally, Eunjung Han and Peter Carbonetto will each present results from previous research they conducted at the University of California, Los Angeles and the University of Chicago, respectively.

We’re looking forward to engaging in insightful dialogue about our work with the scientific community. Even if we won’t see much fall foliage in San Diego.

The post 2 Talks and 4 Posters in 4 Days at the ASHG Annual Meeting appeared first on Tech Roots.

]]>
http://blogs.ancestry.com/techroots/2-talks-and-4-posters-in-4-days-at-the-ashg-annual-meeting/feed/ 0
External APIs: To Explode, or Not to Explode, That is the Questionhttp://blogs.ancestry.com/techroots/external-apis-to-explode-or-not-to-explode-that-is-the-question/ http://blogs.ancestry.com/techroots/external-apis-to-explode-or-not-to-explode-that-is-the-question/#comments Mon, 29 Sep 2014 17:00:30 +0000 Harold Madsen http://blogs.ancestry.com/techroots/?p=2830 Shakespeare might not approve of my taking liberties with his play Hamlet, though prince Hamlet was essentially saying the same thing as I was feeling last year: To be, or not to be, that is the question— Whether ’tis Nobler in the mind to suffer The Slings and Arrows of outrageous Fortune, Or to take… Read more

The post External APIs: To Explode, or Not to Explode, That is the Question appeared first on Tech Roots.

]]>
William Shakespeare – hs-augsburg.de

Shakespeare might not approve of my taking liberties with his play Hamlet, though Prince Hamlet was essentially expressing what I was feeling last year:

To be, or not to be, that is the question—

Whether ’tis Nobler in the mind to suffer

The Slings and Arrows of outrageous Fortune,

Or to take Arms against a Sea of troubles…

Would Hamlet go on or cease from all? Yes, I may have felt just as Hamlet did in the Nunnery Scene when I thought about my “sea of troubles” just one year ago. Well, maybe I’m waxing a bit too dramatic, but there were real concerns on my part regarding last year’s events (oh, how “smart a lash” that memory doth make!). What was that worrisome memory? Allow me to retrace my steps to that challenging day and give you context for my soliloquy.

This story begins last fall and there are many actors on the stage of events. Yes, my story begins with our mobile app and our external API and reaches its climax when seemingly all users download their family trees all at once! Oh the misery. Let us begin our tale of woe.

Ancestry.com (the bastion of family history) has an external API that is used by our mobile apps and other strategic initiatives to share and update a person’s family tree, events, stories, and photos. Our external API has been most important in our mobile efforts (11 million downloads) and in working with the popular TV series, “Who Do You Think You Are?” Our mobile team had successfully grown our mobile usage to such an extent that I began to worry it might actually tax our systems. That concern was beginning to bubble up from my subconscious in the fall of last year, and this leads us to that disquieting day. Last year, our iOS mobile app was promoted in the Apple App Store because of the updates we had made for the release (along with other Ancestry promotions). Those promotions led to large numbers of simultaneous family-tree downloads, which weakened the mighty knees of our backend services. We endured a week of extreme traffic and were forced to throttle (limit usage for) our mobile users (the API team saved the day by throttling usage and thus preserving the backend services). After experiencing that calamitous week, we might well have cried, “Get thee to a nunnery!” or “O, woe is me,” but we repressed such impulses.

OK, it wasn’t actually a “calamitous week”; I was just getting into my Hamlet character. Since the impact to our website was quite minimal, most of our users had a good experience. However, it was a bit frustrating for many of our mobile users – it took too long for many of them to download their family tree to their mobile device. The growth in our mobile traffic really is great news, but we realized we had to architect a plan to take us through the next round of application and user growth. Here’s how it happened:

That experience caused us to reconsider how we deploy our mobile apps, how our mobile apps interact with the API, how we call our backend services, how we deliver a tree download, and whether we should continue to aggregate our services at the API layer. Each of these areas of the company came under review to see how we might optimize our systems. After periodic meetings, discussions, and code reviews over several months, a plan began to gel. Below is a list of some of the changes made to our systems and application:

  • Pass Through: Rather than aggregate our services at the API layer, we took the strategy of creating a “pass-through” layer back to our backend services. This put the responsibility directly on our services to further optimize their code, and in some cases, create new endpoints specifically with mobile usage in mind. This methodology also enabled our mobile teams to more effectively cache data according to their needs and Service team recommendations. More on that below.
  • Mobile Usage: As our users became more mobile, traffic through our APIs from mobile devices increased. Last fall our mobile usage at Ancestry.com reached critical mass and put serious pressure on our services (especially during big promotions and app updates). Because mobile usage differs from website usage in important ways, it was time to address this in our backend services. After several meetings involving cross-functional teams, a few service calls were designed with mobile usage in mind. One of the results was that downloading your entire family tree became much faster: downloading a tree with 10,000 persons (and all their associated events, etc.) dropped from several minutes to under one minute.
  • Caching: Because we changed our API model to a pass-through, our mobile app could now cache data from each call at appropriate intervals, taking pressure off of our backend services and network. This meant fewer calls (in the long run) to the external API.
  • Mobile App Optimization: One area of review was our mobile application. After the code review we theorized that our app might have put undue pressure on our services. What was the root cause? Apple has two new, interesting features:
    • Apple can automatically download and install new applications on your iPhone or iPad
    • Apple can wake up apps in the background and do tasks

When we released our app last year, we believe it was automatically downloaded by Apple (onto most Apple devices) and then, in the background, it automatically downloaded that user’s main tree. To be sure, this process would have happened anyway once the end user opened the app manually (that was required for this app update), but doing it manually would have spread the traffic over several days rather than all at once. Of course, this is just a theory, but we wanted to ensure it was not happening and would not happen next time.

  • User Queuing: As you know, a queue is just another word for a line. People get in line to buy a new iPhone or to buy tickets for a concert. That’s what we do when there are too many requests at a given moment. Anticipating high traffic from our new 6.0 mobile app (plus other site promotions at that time of year), we created a new way of throttling excess traffic. Rather than throttling a percentage of calls to our API (which makes it hard for any one user to successfully download their family tree to their mobile device), we created a system called User Queuing, which allowed a certain number of users into our system at one time. Allowing X users into our systems for 10 minutes of uninterrupted usage ensured each would have a pristine experience, and it protected our backend services from being overloaded. We could adjust on the fly how many users were allowed through our API at any one moment. Thus more individuals would have a better experience, and the others would be invited to return in a few minutes. We would only turn on User Queuing if too many users made requests at the same moment. (A minimal sketch of this admission-control idea appears after this list.)
  • Load Tests: To ensure our systems and new service calls could handle beyond-expected peak load, we ran them through a gauntlet of load tests. This series of tests confirmed we had proper capacity.
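
To make the User Queuing idea concrete, here is a minimal sketch of that admission-control logic. The class name, numbers, and expiry strategy are illustrative only; our production system lives in the API layer and its limits are tunable at runtime.

    // Sketch: admit at most MaxConcurrentUsers at a time, each for a fixed window;
    // everyone else gets a "please try again in a few minutes" response.
    using System;
    using System.Collections.Concurrent;

    public class UserQueue
    {
        private readonly ConcurrentDictionary<string, DateTime> _admitted =
            new ConcurrentDictionary<string, DateTime>();
        private readonly TimeSpan _window = TimeSpan.FromMinutes(10);

        // Adjustable on the fly, just as we could tune the real system mid-launch.
        public int MaxConcurrentUsers { get; set; }

        public UserQueue(int maxConcurrentUsers)
        {
            MaxConcurrentUsers = maxConcurrentUsers;
        }

        public bool TryAdmit(string userId)
        {
            DateTime now = DateTime.UtcNow;

            // A user already inside their 10-minute window stays admitted.
            DateTime admittedAt;
            if (_admitted.TryGetValue(userId, out admittedAt) && now - admittedAt < _window)
                return true;

            // Expire old admissions so their slots free up.
            foreach (var entry in _admitted)
            {
                if (now - entry.Value >= _window)
                {
                    DateTime ignored;
                    _admitted.TryRemove(entry.Key, out ignored);
                }
            }

            // The count check is approximate under heavy concurrency, which is
            // acceptable for throttling; reject if no slot is free.
            if (_admitted.Count >= MaxConcurrentUsers)
                return false;

            _admitted[userId] = now;
            return true;
        }
    }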

Now, once our app was approved by Apple, we could have released it immediately, but there were things to consider. Here is how we timed the successful release:

  • We received permission from Apple to release our app in the App Store the day before the Apple promotion – thus helping us take some of the steam out of the release.
  • We decided to release at a time of day when we anticipated traffic would be somewhat low.
  • We decided to release when our engineers and database administrators were all available in case we needed to react quickly and also to monitor traffic.

Finally, the day arrived and we were ready. All hands on deck. User Queuing ready to trigger. There was great excitement and nerves. How would our systems hold out? Which internal system might buckle under pressure or show up with a previously undiscovered bug? How long after the launch would we need to kick in User Queuing and how many users would be temporarily turned away by the queue? Did we have enough servers, or memory or database throughput? On the other hand, we had tested our code so well, how could it fail? There was much excitement in the air.

All engineers were readied…and…the button was pushed to release our new mobile app!

Did it all collapse? Were there cascading failures? Was the load too much to bear? Did everything explode?

Nope!

Nothing happened. OK, it seemed like nothing. The load gradually increased over the next few hours, but our systems held up wonderfully. No strain, no collapse, no running low on memory, no bottlenecks. Nothing. Yes, there were a few minor bugs to fix, but most customers had a great experience and it went very smoothly. The time, effort, and planning paid off. It worked!

We were so happy – and relieved. We had done our job. In the coming days several teams went to lunch to celebrate the successful execution of months of planning and work. Some of the engineers actually smiled on the day that nothing happened. Even Hamlet dropped by and asked me a question: “Didst thou not explode with a sea of troubles?” And I said, “not on your life!”

The post External APIs: To Explode, or Not to Explode, That is the Question appeared first on Tech Roots.

]]>
http://blogs.ancestry.com/techroots/external-apis-to-explode-or-not-to-explode-that-is-the-question/feed/ 0