Tech Roots » Seng Lin Shee | Ancestry.com Tech Roots Blogs (http://blogs.ancestry.com/techroots)

Lesson Learned: Sharing Code With Git Submodule
By Seng Lin Shee | February 26, 2015

You are probably aware of Git submodules. If you haven't come across them, you may want to read about them from Atlassian and from Git itself. In summary, Git provides a way to embed a reference to a separate project within a main project, while treating both projects as separate entities (versioning, commits, etc.). This article applies to any project that makes use of such scenarios, irrespective of programming language.

Recently, my team ran into issues when working with submodules. These ranged from changes in project structure to abrupt changes in the tools and commands needed when working on projects that involve submodules. In the industry, some consider Git submodules an anti-pattern, arguing that the ideal solution is to reference shared code only as precompiled packages (e.g. NuGet, Nexus, etc.).

This post is a short reflection on how we can restrict ourselves to certain scenarios and how best to work with projects that use submodules on a day-to-day basis.

When should you use submodules?

  • When you want to share code with multiple projects
  • When you want to work on the code in the submodule at the same time as you work on the code in the main project

In a different light, this blog highlights several scenarios where Git submodules will actually make your project management a living nightmare.

Why use submodules?

To solve the following problems:

  • When a new person starts working on a project, you'll want to avoid having that person hunt down each of the shared reference repositories that need to be cloned. The main repository should be self-contained.
  • In a CI (continuous integration) environment, plainly hooking up shared reference Git repositories as material pulls would be detrimental, as there is no coupling of versions between modules. Any modification in a shared repository would trigger the dependent CI pipelines, possibly blocking a pipeline if there is a breaking change.
  • Allow independent development of projects. For example, both ConsumerProject1 and ConsumerProject2, which depend on a SharedProject, can be worked on without worrying about breaking changes that would affect pipeline status (which may block development and deployment of the separate projects/services).

How should submodules be restricted?

We found that the best way to prevent complexity from creeping into this methodology is to do the following:

  • Avoid nested submodule structures, i.e. submodules containing other submodules, which may share the same submodules as one another and thus create duplicates. In other words, the parent repository should NEVER itself be a shared project.
  • Depending on the development environment (e.g. Visual Studio), submodules should only be worked on when opened through the solution file of the parent repository. This ensures that relative paths work consistently across other parent repositories which consume the same submodules.
  • Submodules should always be added to the root of the parent repository (for consistency).
  • The parent repository is responsible for satisfying the dependency requirements of submodules by linking the necessary submodules together, similar to the responsibility of an IoC (Inversion of Control) container (e.g. Unity). A minimal example of adding submodules at the repository root follows this list.
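
As a rough illustration of these conventions, here is a minimal sketch of wiring submodules into the root of a parent repository; the URLs and folder names are hypothetical:

    # Inside the parent repository, add shared code as submodules at the repository root.
    # The URLs and folder names below are examples only.
    git submodule add https://example.com/git/SharedProject.git SharedProject
    git submodule add https://example.com/git/SharedUtilities.git SharedUtilities

    # Record the chosen submodule commits in the parent repository.
    git commit -m "Add SharedProject and SharedUtilities as submodules"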

What are the main Git commands when working with submodules?

  • When importing a repository that contains submodules:
    git clone --recursive <url>
  • When pulling in the latest changes for a repository (e.g. the parent project) that contains submodules:
    git pull
    git submodule update --init --recursive

    The ‘update’ command is used to update the contents of the submodule folders after the first git pull updates the commit references of the submodules in the parent project (yes, it’s weird)

  • When you want to update all submodules in a repository to their latest commits:
    git submodule update --init --remote --recursive
  • By default, the above submodule update commands will leave your submodules in a detached HEAD state. Before you begin work, create a branch to track all changes (a combined day-to-day workflow is sketched after this list).
    git checkout -b <branchname>
    git submodule foreach 'git checkout -b <branchname>'
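
Putting these commands together, a typical day-to-day flow might look like the following sketch; the repository URL and branch names are illustrative only:

    # Clone the parent project together with its submodules.
    git clone --recursive https://example.com/git/ConsumerProject1.git
    cd ConsumerProject1

    # Create working branches in the parent repository and in every submodule.
    git checkout -b feature/my-change
    git submodule foreach 'git checkout -b feature/my-change'

    # ...edit code in a submodule, then commit and push the submodule first...
    cd SharedProject
    git commit -am "Fix shared helper"
    git push origin feature/my-change
    cd ..

    # Record the new submodule commit in the parent repository and push.
    git add SharedProject
    git commit -m "Point SharedProject submodule at the fixed commit"
    git push origin feature/my-change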

 

So, how do you use Git submodules? What best practices do you use to keep Git modules well-managed and within a certain amount of complexity? Do share your experience and feedback here in the comment section below.

Big Data for Developers at Ancestry
By Seng Lin Shee | September 25, 2014

Big Data has been all the rage. Business, marketing and project managers like it because they can plot out trends to make decisions. To us developers, Big Data is just a bunch of logs. In this blog post, I would like to point out that Big Data (or logs with context) can be leveraged by development teams to understand how our APIs are used.

Developers have implemented logging for a very long time. There are transaction logs, error logs, access logs and more. So, how has logging changed today? Big Data is not all that different from logging. In fact, I would consider Big Data logs to be logs with context. Context allows you to do interesting things with the data. Now, we can correlate user activity with what's happening in the system.

A Different Type of Log

So, what are logs? Logs are records of events, and are frequently created in the case of applications with very little user interaction. It goes without saying that many logs are transaction logs or error logs.

However, there is a difference between forensic logs and business logs. Big Data is normally associated with the events, actions and behaviors of users using the system. Examples include records of purchases, which are linked to a user profile and span time. We call these business logs. Data and business analysts would love to get hold of this data, run some machine learning algorithms and finally predict the outcome of a certain decision to improve user experience.

Now back to the developer. How does Big Data help us? On our end, we can utilize forensic logs. Logs get more interesting and helpful if we can combine records from multiple sources. Imagine hooking in and correlating IIS logs, method logs and performance counters together.

Big Data for Monitoring and Forensics

I would like to advocate that Big Data can and should be leveraged by web service developers to:

  1. Better understand the system and improve the performance of critical paths
  2. Investigate failure trends which might lead to errors or exacerbate current issues

Logs can include the following (a hypothetical sample record is sketched after this list):

  1. Method calls (including the context of the call – user login, IP address, parameter values, return values, etc.)
  2. Execution time of method
  3. Chain of calls (e.g. method names, server names etc.)
    This can be used to trace where method calls originate
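
To make this concrete, a single forensic log record might look like the hypothetical example below; the field, service and method names are illustrative, not an actual Ancestry schema:

    {
      "timestamp": "2014-09-25T17:30:12Z",
      "service": "PersonService",
      "method": "GetPersonById",
      "userId": "hypothetical-user-123",
      "clientIp": "203.0.113.42",
      "parameters": { "personId": "abc123" },
      "durationMs": 87,
      "correlationId": "d4f1c2e0-5a52-4c1f-9a53-3f2b7c8d9e10",
      "callChain": ["iOS app", "External API", "PersonService"]
    }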

With the various data being logged for every single call, it is important that the logging system is able to hold and process huge volumes of data. Big Data has to be handled on a whole different scale. The screenshots below are charts from Kibana. Please refer here to find out how to set up data collection and dashboard display using this suite of open source tools.

Example Usage

Based on what kind of monitoring is required, the relevant information (e.g. context, method latency, class/method names) should be included in the Big Data logs.

Detecting Problematic Dependencies

Plotting the time spent in classes of incoming and outgoing components gives us visibility into the proportion of time spent in each layer of the service. The plot below revealed that the service was spending more and more time in a particular component, thus warranting an investigation.

Time in Classes

Discovering Faulty Queries

Logging all exceptions, together with the appropriate error messages and details, allows developers to determine the circumstances under which a method would fail. The plot below shows that MySql exceptions started occurring at 17:30. Because the team included parameters within the logs, we were able to determine that invalid queries were being used (typos and syntax errors).

Exceptions

Determining Traffic Patterns

Tapping into the IP addresses of incoming requests reveals very interesting traffic patterns. In the example below, the graph indicates a spike in traffic. However, upon closer inspection, the graph shows that the spike spanned ALL countries. This suggests that the spike in traffic was not due to user behavior, which led to further investigation of other possible causes (e.g., DoS attacks, simultaneous updates of mobile apps, errors in logs, etc.). In this case, we found it was a false positive: repeated reads by log forwarders within the logging infrastructure.

Country Traffic With Indicator

Determining Faulty Dependents (as opposed to Dependencies)

Big Data log generation can be enhanced to include IDs that track the chain of service calls from clients through the various services in the system. The first column below indicates that traffic from the iOS mobile app passes through the External API gateway before reaching our service. Other columns indicate different flows, giving developers enough information to detect and isolate problems to different systems if needed.

Event Flows

Tracking Progression Through Various Services

Ancestry.com has implemented a Big Data framework across all services to support call tracking across different services. This helps developers (who are knowledgeable about the underlying architecture) debug whenever a scenario doesn't work as expected. The graph below depicts different methods being exercised across different services, where each color refers to a single scenario. Such data provides full visibility into the interactions among different services across the organization.

Test Tracking

Summary

Forensic logs can be harnessed and used with Big Data tools and frameworks to greatly improve the effectiveness of development teams. By combining various views (such as the examples above) into a single dashboard, we are able to provide developers with a health snapshot of the system at any time in order to determine failures or to improve architectural designs.

By leveraging Big Data for forensic logging, we, as developers, are able to determine faults and reproduce error messages without conventional debugging tools. We have full visibility into the various processes in the system (assuming we have sufficient logs). Gone are the days when we needed to instrument code on LIVE boxes because an issue only occurred in the LIVE environment.

All of this work is done independently of the business analysts and is, in fact, crucial to the team's ability to react quickly to issues and to continuously improve the system.

Do your developers use Big Data as part of daily development and maintenance of web services? What would you add to increase visibility in the system and to reduce bug-detection time?

Migrating From TFS to Git-based Repositories (II)
By Seng Lin Shee | August 8, 2014

Previously, I wrote about why Git-based repositories have become popular and why TFS users ought to migrate to Git.

In this article, I would like to take a stab at providing a quick guide for longtime TFS / Visual Studio users to quickly ramp up on the knowledge required to work with Git-based repositories. This article presents Git usage from the perspective of a TFS user. Of course, there are some Git-only concepts, but I will try my best to lower the learning curve for the reader.

I do not intend to thoroughly explore the basic Git concepts. There are very good tutorials out there with amazing visualizations (e.g. see Git tutorial). However, this is more like a no-frills quick guide for no-messing-around people to quickly get something done in a Git-based world (Am I hitting a nerve yet? :P ).

Visual Studio has done a good job abstracting the complex commands behind the scenes, though I would highly recommend going through the nitty-gritty details of each Git command if you are vested in using Git for the long term.

For this tutorial, I only require that you have one of the following installed:

  1. Visual Studio 2013
  2. Visual Studio 2012 with Visual Studio Tools for Git

Remapping your TFS-trained brain to a Git-based one

Let’s compare the different approaches between TFS and Git.

TFS Terminology

Tfs flow

  1. You will start your work on a TFS solution by synchronizing the repository to your local folder.
  2. Every time you modify a file, you will check out that file.
  3. Checking in a file commits the change to the central repository; hence, it requires all contributors who are working on that file to ensure that conflicts have been resolved.
  4. The one thing to note is that TFS keeps track of the entire content of files, rather than the changes made to the contents.
  5. Additionally, versioning and branching requires developers to obtain special rights to the TFS project repository.

Git Terminology

Git flow

If you are part of the core contributor group (left part of diagram):

  1. Git, on the other hand, introduces the concept of a local repository. Each local repository represents a standalone repository that allows the contributor to continue working, even when there is no network connection.
  2. The contributor is able to commit work to the local repository and create branches based on the last snapshot taken from the remote repository.
  3. When the contributor is ready to push the changes to the main repository, a sync is performed (pull, followed by a push). If conflicts do occur, a fetch and a merge are performed which requires the contributor to resolve conflicts.
  4. Following conflict resolution, a commit is performed against the local repository and then finally a sync back to the remote repository.
  5. The image above excludes the branching concept. You can read more about it here.

If you are an interested third party who wants to contribute (right part of diagram):

  1. The selling point of Git is the ability for external users (who have read-only access) to contribute (with control from registered contributors).
  2. Anyone who has read-only access is able to set up a Personal Project within the Git service and fork the repository.
  3. Within this project, the external contributor has full access to modify any files. This Personal Project also has a remote repository and local repository component. Once ready, the helpful contributor may launch a pull request against the contributors of the main Team Project (see above).
  4. With Git, unregistered contributors are able to get involved and contribute to the main Team Project without risking breaking the main project.
  5. There can be as many personal projects forking from any repositories as needed.
  6. It should be noted that any project (be it a Personal or Team Project) can step up to become the main working project in the event that the other projects disappear or lose popularity. Welcome to the wild world of open source development.

Guide to Your Very First Git Project

Outlined below are the steps you will take to make your very first check-in to a Git repository. This walkthrough assumes you are new to Git but have been using Visual Studio + TFS for a period of time.

Start from the very top and make your way to the bottom by trying out different approaches based on your situation and scenario. These approaches are the fork and pull (blue) and the shared repository (green) models. I intentionally include the feature branching model (yellow), which I will not elaborate on in this article, to show the similarities. You can read about these collaborative development models here.

Feel free to click on any particular step to learn more about it in detail.


Git Guide

Migrating from TFS to Git-based Repository

  1. Create a new folder for your repository. TFS and Git temp files do not play nicely with each other. The following initializes the folder with the relevant files for Visual Studio projects and solutions (namely the .gitignore and .gitattributes files). A command-line equivalent of this whole flow is sketched after this list.
    Git step 1
  2. Copy your TFS project over to this folder.
  3. I would advise running the following command to remove the "READ ONLY" flag from all files in the folder (TFS sets this automatically when files are not checked out).
    >attrib -R /S /D *
  4. Click on Changes.
    Git step 2
  5. You will notice the generated files (.gitattributes & .gitignore). For now, you want to add all the files that you have just added. Click Add All under the Untracked Files drop-down menu.
    Git step 3
  6. Then, click the Unsynced Commits.
  7. Enter the URL of the newly created remote repository. This URL is obtained from the Git service when creating a new project repository.
    Git step 4
  8. You will be greeted with the following prompt:
    Git step 5
  9. You will then see Visual Studio solution files listed in the view. If you do not have solution files, then unfortunately, you will have to rely on tools such as the Git command line or other visualization tools.
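
For readers who prefer the command line over Visual Studio, a rough equivalent of the flow above looks like the following sketch; the remote URL is a placeholder:

    # Inside the new folder that now contains the copied project files:
    git init
    git add .
    git commit -m "Initial commit migrated from TFS"

    # Point the local repository at the newly created remote repository and push.
    git remote add origin https://example.com/git/MyProject.git
    git push -u origin master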

Back to index

Clone Repository onto Your Local Machine

  1. Within Visual Studio, in the Team Explorer bar,
    Git step 6

    1. Click the Connect to Team Projects.
    2. Clone a repository by entering the URL of the remote repository.
    3. Provide a path for your local repository files.
  2. You will then see Visual Studio solution files listed in the view. If you do not have solution files, then unfortunately, you will have to rely on tools such as the Git command line or other visualization tools.

Back to index

Commit the Changes

  1.  All basic operations (Create, Remove, Update, Delete, Rename, Move, Undelete, Undo, View History) are identical to the ones used with TFS.
  2. Please note that committing the changes DOES NOT affect the remote repository. This only saves the changes to your local repository.
  3. After committing, you will see the following notification:
    Git step 7

Back to index

Pushing to the Remote Repository

  1. Committing does not store the changes to the remote repository. You will need to push the change.
  2. Sync refers to the sequential steps of pulling (to ensure both local and remote repositories have the same base changes) and pushing from/to the remote repository.
    Git step 8

Back to index

Pulling from the Remote Repository and Conflict Resolution

  1. If no other contributors have added a change that conflicts with your change, you are good to go. Otherwise, the following message will appear:
    Git step 9
  2. Click on Resolve the Conflict link to bring up the Resolve Conflict page. This is similar to conflict resolution in TFS. Click on Merge to bring up the Merge Window.
    Git step 10
  3. Once you are done merging, hit the Accept Merge button.
    Git step 11
  4. Merging creates new changes on top of your existing changes to match the base change in the remote repository. Click Commit Merge, followed by Commit in order to commit this change to your local repository.
    Git step 12
  5. Now, you can finally click Sync.
    Git step 13
  6. If you see the following message, you have completed the “check-in” to your remote repository.
    Git step 14
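
The command-line equivalent of this pull, merge and push cycle is roughly the following sketch (branch and file names are placeholders):

    # Bring in the latest changes from the remote repository; this may report conflicts.
    git pull origin master

    # Resolve the conflicts in your editor, then mark them as resolved and commit the merge.
    git add <conflicted-files>
    git commit -m "Merge remote changes"

    # Push the merged result back to the remote repository.
    git push origin master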

Back to index

Forking a Repository

  1. Forking is needed when developers have restricted (read-only) access to a repository.  See Workflow Strategies in Stash for more information.
  2. One thing to note is that forking is essentially server-side cloning. You can fork any repository provided you have read access. This allows anyone to contribute and share changes with the community.
  3. There are two ways to ensure your fork stays current with the original repository (the manual approach is sketched after this list):
    1. Git services such as Stash have features that automatically sync the forked repository with the original repository.
    2. Manually syncing with the remote server
  4. You are probably wondering what the difference is between branching and forking. Here is a good answer to that question. One simple answer is that you have to be a registered collaborator in order to make a branch or pull/push an existing branch.
  5. Each Git service has its own way of creating a fork. The feature will be available when you have selected the right repository project and a branch to fork. Here are the references for GitHub and Stash respectively.
  6. Once you have forked a repository, you will have your own personal URL for the newly created/cloned repository.
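
As a minimal sketch of the manual approach, assuming the original repository is registered as a remote named upstream (the URL is a placeholder):

    # Register the original repository as an additional remote called "upstream".
    git remote add upstream https://example.com/git/OriginalProject.git

    # Fetch its latest commits and merge them into your local master branch.
    git fetch upstream
    git checkout master
    git merge upstream/master

    # Push the refreshed branch to your fork.
    git push origin master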

Back to index

Submit a Pull Request

  1. Pull requests are useful to notify project maintainers about changes in your fork, which you want integrated into the main branch.
  2. A pull request initiates the process to merge each change from the contributors’ branch to the main branch after approval.
  3. Depending on the Git service, a pull request provides a means to conduct code reviews amongst stakeholders.
  4. Most Git services will be able to trigger a pull request within the branch view of the repository. Please read these sites for specific instructions for BitBucket, GitHub and Stash.
  5. A pull request can only be approved if there are no conflicts with the targeted branch. Otherwise, the repository will provide specific instructions to merge changes from the main repository back to your working repository.

Back to index

Summary

Git is a newer approach to version control and has been very successful with open source projects as well as with development teams who adopt the open source methodology. There are benefits to both Git and TFS repositories. Whether a project is a suitable candidate for adopting Git depends on factors such as team size, team dynamics, project cadence and requirements.

What are your thoughts about when Git should be the preferred version control for a project? What is the best approach for lowering the learning curve for long-term TFS users? How was your (or your team’s) experience in migrating towards full Git adoption? Did it work out? What Git tools do you use to improve Git-related tasks? Please share your experience in the comment section below.

 

Migrating From TFS to Git-based Repositories (Part I)
By Seng Lin Shee | April 29, 2014

Git, a distributed revision control and source code management system, has been making waves for years, and many software houses have been slowly adopting it not only as their source code repository, but also as a way to manage software development projects.

There is much debate about using either a centralized or distributed revision control system, so I am not going to delve into promoting one system over the other. What I hope to do is shed some light on the rationale and concepts of Git with long-time Team Foundation Server (TFS) users. I would also like to provide a mini tutorial on how to quickly get started on your recently migrated Git projects. The following blog article is my opinion and experience of Git and does not necessarily reflect the position of the company.

Today's blog article is a summary of what I have learned about Git and the rationale for migrating away from TFS, targeted at long-time TFS users.

What is Git?

Git is a distributed revision control and source code management (SCM) system. Note that the keyword here is distributed. The main difference from centralized revision control systems like TFS is that there does not have to be one main repository. Anyone can fork a branch (assuming it's public) into his/her personal repository, and anyone can participate in improving anybody's branch. Git allows local (on your machine) repositories to be disconnected from the network completely.

If you would like to explore the different repository installations, check out the MSDN blog discussion here and a heated discussion on Stack Overflow here. My two cents: TFS is still viable for projects that require tight integration with Visual Studio and Microsoft products. In these scenarios, TFS functions as more than just a source code repository system.

I find that the following talk by Linus Torvalds explains why Git operates the way it does, and why it is architected the way it is.

Not bothered to watch the whole thing? Summary: Git is architected for an open (as in open source) and distributed (a network of developers) development process.

How Git improves on TFS

I'm approaching Git as a long-time TFS user. The following paragraphs assume that you have sufficient experience with TFS.

Feature work branching

Have you ever found it frustrating to work on multiple features at the same time, juggling to keep a shelveset as minimal as possible in order to make code reviews easier? Have you ever copied all of the files in the repository into a different folder just to make sure that the baseline repository is not buggy in the first place? Do you squirm when another team member checks in before you do, forcing you to spend a chunk of your time syncing before you can check in?

Git provides an easy way to work on different work items separately by creating branches within your repository.  Git doesn’t store data as a series of changesets or deltas, but instead as a series of snapshots. When merging, you do not need to make sure you have all the previous changes in the targeted branch.

All of this can be done without duplicating your project folder and without connecting to the network.
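
As a minimal sketch (the branch names are illustrative):

    # Create and switch to a branch for the first work item.
    git checkout -b feature/search-improvements
    # ...commit work here...

    # Start a second, independent branch for another work item.
    git checkout master
    git checkout -b feature/login-fix
    # ...commit work here...

    # Merge whichever feature is ready back into master.
    git checkout master
    git merge feature/search-improvements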

Offline work

Have you ever been forced to wait for slow network connections in order to check out files? What if the network connection goes down? What if you are working away from office and do not have a stable internet connection?

Within a standard workflow, you always work off a local copy of the repository (known as the local branch). Apart from that, there is the concept of a remote branch, which represents the state of branches in your remote repositories. Your changes are not made available to everyone until you 'push' them back to the remote repository. So, yes, Git still uses a central repository as a communication hub for all developers.

A remote branch represents a snapshot in time of the remote repository on the remote server. With a remote branch snapshot (fetched) locally, you can always fork various copies of that branch to begin different feature work without any connection to the repository server.
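
For example, once the remote branches have been fetched, new local branches can be created and worked on with no network connection at all; the names below are illustrative:

    # Needs a network connection only to refresh the remote-tracking branches.
    git fetch origin

    # Everything below works completely offline.
    git checkout -b feature/offline-work origin/master
    git commit -am "Work done without a network connection"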

Open Collaboration

TFS is tightly integrated with LDAP/Windows Authentication and makes use of this to provide secured access to the source code. However, this hinders widespread participation from potential coders because of the access requirements for checking out files and submitting changes. There is no easy way to fully utilize the capabilities of the source code repository (code review / change tracking) unless you have a certain level of permissions on the project repository.

Git encourages an open source development model, whereby multiple people can fork/duplicate projects easily. Even if the main branch is maintained by only a single individual, any contributor can propose changes by forking the repository, making the modification and then submitting a 'pull request'. This essentially means "pulling in a change", or putting the changeset under consideration by the project administrator.

Web Based Code Review

Web interfaces such as GitHub and Stash have made it much easier to manage projects, pull requests (shelvesets for consideration) and even code reviews. These web interfaces provide a great channel for communication among developers, and even allow observers (including managers and product owners) to participate in the review process without using any development tools.

Backup

If the project has many developers, each cloned copy of the project on a developer's machine is effectively a backup. Git intrinsically saves an entire history of snapshots within your local repository. TFS, on the other hand, saves only the latest synced version of your files.

When not to consider Git

Git is a totally different beast and has different concepts from TFS. It requires some retraining in order to fully utilize its capabilities.

If you have a small team and do not benefit from any of the features presented above, then there is no significant benefit to migrating away from TFS. In fact, you would lose productive hours, and even days, learning new syntax and flows just for the sake of using a new technology.

At the end of the day, it depends on what workflow suits your development projects and if the organization wants centralized control.

Summary

Git is a distributed revision control system that advocates collaboration at a wide scale and supports offline development work. Its killer feature is branching and merging of projects with ease and speed. This style of code management is ideal for parallel development work used by open source projects.

In Part II, I present a set of Git commands analogous to the way you work with TFS projects, in order to get you started right away with Git.

Featured Article: Migration to Continuous Delivery at Ancestry.com
By Seng Lin Shee | December 7, 2013

Starting with the adoption of Agile development practices, Ancestry.com has progressed to a continuous delivery model to enable code release whenever the business requires it. Transitioning from large, weekly or bi-weekly software rollouts to smaller, incremental updates has allowed Ancestry.com to increase responsiveness and deliver new features to customers more quickly. Ancestry.com has come a long way in regards to developing a continuous delivery model and will continue to evolve to further adapt to the fast changing pace of the market.

The lessons learned from our efforts in building a continuous delivery model have been featured in TechTarget's SearchSoftwareQuality online magazine. You can view our photo story here.

Exposing APIs to Your Clients
By Seng Lin Shee | November 27, 2013

So, you want to share your super awesome system with the world. You have it all figured out. You implemented it as a web service, and you have exposed the necessary APIs as HTTP endpoints. Your hope is that people will start to leverage those endpoints and begin to build awe-inspiring apps that will further increase the value of your web service API.

Of course, there are decisions you need to make, such as which protocols to use, security concerns, documentation, tutorials, support and the whole shebang.

There is one contentious decision that you as a developer have to make: Should a client library be provided for developers to use your web service? Or is exposing an endpoint sufficient for developers to start building applications?

We can go on for hours debating the pros and cons of providing or omitting client libraries. The argument is very similar to Amazon proving that REST doesn't matter for Cloud APIs. There are standards that we can follow and that are backed by an army of proponents. However, as with most of my blog articles, I came to the conclusion that the decision depends on:

  1. How the APIs are used.
  2. How you want to govern the APIs.
  3. Who uses them.

In support of Client Libraries

Popular APIs from Google, Flickr, YouTube, Twitter and Amazon provide developer SDKs that make it really easy to use these APIs in applications across a myriad of platforms and languages. The motivation for this is to lower the learning curve for developers getting on board with the API platform. Evidently, the underlying API calls which have been abstracted away are pretty complex.

Another argument for client libraries is that they provide a control point: a programmable interface that provides a logical abstraction of the actions to be performed. A client library acts as a proxy between the application and the actual endpoint. By mediating traffic between the application and the endpoint, the proxy is able to address cross-cutting concerns such as analytics collection, traffic control and even access rights. As Sam Ramji pointed out, API clients are necessary for the sustained growth of web and cloud API usage.

Without a client library or an SDK, a developer is required to write extra logic to communicate with a service endpoint. There is no forcing function for a standard or common way to access the endpoint (unless documented explicitly). This becomes extremely critical in an enterprise composed of a multitude of web services that as a whole represent the entire company's assets. The absence of a framework gives each developer the freedom to address the same concerns in his or her own way, leading to a plethora of different coding styles. The existence of an SDK allows the service to establish a common programming pattern and model that become part of the nature and identity of the service.

Argument against Client Libraries

If different systems require different client libraries or SDKs, then a client that uses a multitude of services would eventually be overwhelmed by the sheer number of different models and programming patterns (e.g. imagine forcing a developer to use Unity just to instantiate an object, when the developer has no interest in using the IoC pattern).

If an API endpoint is simple, with straightforward message formats, a common tool/library could be built to provide a communication mechanism for accessing and parsing messages to and from API endpoints.
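
For instance, if the endpoint speaks plain HTTP and JSON, a consumer may need nothing more than a standard HTTP client; the endpoint below is purely hypothetical:

    # Calling a simple, hypothetical endpoint directly, with no client library involved.
    curl -H "Accept: application/json" "https://api.example.com/v1/persons/1234"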

Also, SDKs and client libraries are usually maintained by the endpoint developers. Resources have to be dedicated to supporting the development toolset (aka SDKs/client libraries) every time a new iteration of the service is rolled out. The team needs to evaluate whether such a commitment can be maintained over a long period of time. Furthermore, this maintenance may include updating SDKs/client libraries across multiple languages and environments.

Following on from the above, from the client's perspective, each library coded for a service is written by someone else. A great deal of trust is needed to rely on the skills and experience of the library developer. It is also possible that the client libraries may not be properly maintained, updated or tested.

Conclusion

The current trend is to architect your system into tiers, so that you can separate business logic, processing and storage implementations. This calls for abstracting services behind service client APIs, or the usage of SDKs. However, this should not always be the de facto solution for abstracting services. Over-engineering only complicates matters, and sometimes the simplest solution is the most viable answer.

Using all of the above arguments, I hope you are able to make the right decision for your service, which would also attract the right developer for your project.

Untangling Authentication and Authorization
By Seng Lin Shee | September 27, 2013

When designing web service APIs, a decision has to be made to protect the usage of such APIs. If you are working within a protected firewall, and you trust every single user or machine on the network, this article does not apply to you – you are in API heaven.

For the rest of us, please read on. This post is meant to provide a sufficient overview, or answer questions that have been puzzling fellow developers and architects who are new to this field. I use Federated Identity as an example of an authentication mechanism, and OAuth, a standard for web service authorizations.

Within the latest Ancestry App for the new iOS 7 operating system, we introduced a new way of building family trees by integrating with Facebook. The integration allows users to identify themselves via Facebook (authentication) and provides permissions for Ancestry.com to retrieve limited Facebook information (authorization). These distinctions are crucial to fellow developers, as well as program/project managers in order to avoid over-engineering any user flows within an application.

API Security

A common misconception is that security is viewed as one single solution and implementation. In fact, the security work can mostly be categorized into two phases:

  1. Authentication identifies the participant of the session. You have probably used Windows, Forms and Basic (username/password) Authentication; these are protocols to verify your identity to the web service. WS-Federation is one such protocol that facilitates Single Sign-On (SSO) and brokering of identity between federation partners. This allows an individual to use a resource in a partner company if there is a trust established between the individual’s company and the partner company.
  2. Authorization deals with access rights of the identity that is associated with the current session or context of the request. For example, Microsoft SharePoint uses role-based authorization, whereas WS-Federation utilizes claim-based authorization to evaluate the rights of a user session. The popular authorization method for web service APIs today revolves around OAuth, which allows third-party applications to perform operations on behalf of users.

By separating out these responsibilities, interesting applications and scenarios can arise when different parties (internal or external) innovate on each responsibility independently. For example, Google is a major identity provider that authenticates the user; any application can re-utilize Google logins without requiring users to create new account credentials for the new application/website. Essentially, you are delegating authentication to a trusted third party. More details on this in the next section.

Authentication

Sears.com's social login experience

Sears.com allows you to log in via Facebook, Google or Yahoo! without the need to create a separate, new account at sears.com.

What is Federated Identity?

Let’s skip standard authentication mechanisms such as Basic, Forms, Windows and Certificate authentications. These mechanisms are apparent when logging into browser-based applications. Native applications may also utilize these mechanisms when prompting for credentials, as the backend API calls may require similar authentication schemes.

Federated Identity is a mechanism to link up an individual’s identity and attributes across different identity management systems. Examples of such applications include foursquare.com and sears.com (implementation may differ). Other non-obvious examples would be when you try logging in to outsourced applications such as insurance, paystub and 401(k) management sites through your corporate firewall. These are perfect examples of applications from different systems that utilize identity information from a completely separate identity repository.

With Federated Identity, after selecting your 'home' system (or being automatically identified), you proceed to authenticate using conventional mechanisms such as Basic (aka username/password), Forms, Windows, Certificate or other protocols on your 'home' system. A 'trust' is established between the 'home' system and the 'application' system being utilized by the user. This allows the application system to identify and authorize the user session.

The advantage of a federated system is that the website/application can easily extend to support different identity providers that use the same type of federation protocol (e.g. WS-Federation, SAML-P, OpenID, etc.).

When is Federation Used?

Like all features, you need to assess the protocol before adopting it as part of your security strategy. A Federated Identity system opens up your system to users who would like to use their existing credentials with your application (single sign-on). A resource protected behind a federation system can perform authorization based on information from the identity provider.

Similar to what is written in a Layer7 blog, utilizing social logins dramatically improves the login experience of the user on a mobile device. This is especially true when an integration library exists that makes use of stored credentials in the system. Note that social login also encompasses a small portion of delegation (covered in the OAuth topic below).

The core questions that should be asked regarding this kind of implementation include:

  1. Is this the user experience that is desired?
  2. How does this affect external developers that are developing against your application APIs?
  3. How easy is it to establish a trust between an identity provider and the resource (web services/API endpoints)?
  4. Does this meet your security requirements?

It should also be pointed out that there are similar identification mechanisms which are not federation protocols, but rather delegation protocols, i.e. Facebook Login.

Authorization

What is OAuth?

One should not confuse OAuth with Federated Identity protocols. OAuth is an authorization protocol and is not primarily used to identify a user. OAuth provides a simple way to verify the access level of a request to a web service. It provides a mechanism for application users to delegate access to a third party (application backend services, which will perform actions in the background) to work on their behalf.

The confusion arises because OAuth is used as a social login by many sites. For example, you can log in to sites that use social logins via Facebook and LinkedIn. What's happening on the backend is a call back to the Facebook/LinkedIn endpoint, using a pre-established access token, to retrieve specific identity/profile information. This endpoint is not part of the protocol, but is actually a regular API endpoint established by the individual corporations.
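
Conceptually, that backend call is just an ordinary HTTP request that presents the access token; a sketch against a hypothetical profile endpoint:

    # Present an already-obtained OAuth access token to a hypothetical profile endpoint.
    curl -H "Authorization: Bearer ACCESS_TOKEN_FROM_OAUTH_FLOW" "https://api.example-provider.com/v1/me"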

Ancestry iOS application Facebook Login experience

The new Ancestry iOS application utilizes Facebook Login to provide seamless login to the application, and to provide delegated access to friends and family information from the user’s Facebook Profile.

As we have already established, OAuth is an authorization protocol, so it makes sense that OAuth can be combined with any authentication protocol. While Vittorio Bertocci blogs about the inadequacy of OAuth as a Federated Identity protocol, there are federation protocols that utilize OAuth. OAuth alone does not provide the mechanism for a user to log in through different identity providers (Yahoo!, Google, Microsoft, etc.).

When is OAuth Used?

As more scenarios require applications to work unobtrusively in the background, there is a need for a protocol that does not frequently prompt the user to enter credentials (due to passwords expiring or simply an application restart). Such applications include desktop widgets, mobile applications and background daemons running on servers.

A delegation mechanism is required, one which allows the application to perform tasks on behalf of the user without compromising the full privacy of the user. An example is the application posting to a social network on the user's behalf; your service/application can leverage this to further increase its fame and notoriety in public.

Secondly, the delegation mechanism of OAuth allows you to open your API endpoints to third-party developers, while still providing confidence to your end user base that the third-party developers’ scope is restricted.

OAuth provides the end user with control over specific actions the application can actually perform, by approving a given set of permissions.

Conclusion

The complexity of implementing security, and its relevance to the business goals, should not be overlooked. If there is a business model that harnesses an open API, then a strategy must exist to create and manage a community of developers, and to maintain relevant and up-to-date documentation.

If it makes sense to allow anyone access to a website via any identity provider (Google, Yahoo!, Microsoft, Facebook), then a plan for some kind of Federated Identity should be in the works.

As different sites use different protocols to perform authentication and authorization, or even both at the same time, implementation becomes more complex and more confusing. Couple this with the effort to support multiple social logins, and it's no wonder you get lost in the protocol web. Let the dominant standard prevail!

Last but not least, it is imperative that the user experience is not bogged down by the many standards in authentication and authorization protocols. Identify the standard patterns in the marketplace and incorporate the protocol with the most intuitive user experience. Have fun!

Acceptance Testing at Ancestry.com
By Seng Lin Shee | April 2, 2013

What Are Acceptance Tests?

Many developers are confused by the jargon used by test and software engineers when developing tests. Even test developers (TE/SET/SDET) are confused by these terms.

In general, test suites occur in the following varieties:

  • Unit tests
  • Integration tests
  • End-to-end tests

To add to the confusion, there are:

  • Functional tests
  • Acceptance tests

We understand that it is good practice to run a product against a set of acceptance test suites before releasing code, or before a feature can be considered complete. As our team practices Test-Driven Development (TDD) and Behavior-Driven Development (BDD), developers need to create an acceptance test first, and are often initially confused about what the tests should look like.

I refer you to Jonas Bandi’s blog. Jonas provided a good explanation of what an acceptance test is.

Acceptance tests in relation to unit, integration and end-to-end tests

To summarize, an acceptance test is a way to represent a test case; one that meets the acceptance criteria of a story. An acceptance test is basically the contract between the product owner and the developers when feature development starts in a development life cycle (be it a Sprint or the Waterfall model). A story in this context refers to a feature set that product management wants delivered to the customer.

So, the question would be: What would be the best approach to write acceptance tests?

My answer would be: It depends…

To clarify my answer, let me provide three scenarios. You see, unit tests, integration tests and end-to-end tests can all be written as acceptance tests, depending on what the feature is.

  • A unit test may be appropriate if we are delivering something simple that doesn't require complex dependencies or strict behavior between interfaces.
  • Integration tests are commonly utilized when the acceptance criteria are restricted to interactions between well-known modules.
  • Most of the time, features are shipped based on their usability and the ability to perform a certain task or function. This involves user scenarios that may require multiple feature sets to work together to achieve the desired result. This is when an end-to-end test comes in handy. Such tests are normally the closest to real-world scenarios.

The key point here is that the feature specification should be known beforehand. If there is an unknown, either that has to be fleshed out, or the story needs to be sliced into smaller pieces and addressed independently.

Integration of Acceptance Tests into Development Cycle

Be it the Waterfall or Agile model, Acceptance Tests are an integral part of the TDD and the BDD methodology applied in a software development process.

The Ancestry API team uses Agile for their development process. In order to fully benefit from acceptance testing, the act of designing and writing the tests has to be part of the Sprint cycle. This means that everyone (stakeholders, dependent parties, developers, testers, etc.) has to participate in a grooming meeting to vet out the requirements of a proposed feature that will be developed.

The grooming/planning meeting for vetting out the acceptance criteria, which will in turn be used to construct the acceptance test, provides an avenue for the customer/product owner to convey the requirements and expected scenarios to the development team. The important concept is that there should be two-way communication between the development team and the product owner (e.g. stakeholders, program managers, etc.). If there are any ambiguities during the conversation (related to implementation, dependencies, testability, etc.), those have to be fleshed out before a story can be considered READY for development to start.

Catching Problems in Specification Phase

The READINESS of the story is critical because unknown variables may impact the velocity of the team. Problems such as an incorrect architecture may throw off an entire sprint cycle and render the implementation unusable.

Our team has benefited greatly from this approach when unknowns were identified in upstream dependencies. In one case, we concluded that the library methods did not provide enough features and data to implement the required scenarios. As a result, the story was considered not ready and was sent back to the drawing board. This saved the team plenty of time compared to missing the problem and wasting effort designing an implementation that did not meet the requirements.

Acceptance Criteria to Test Code

The development of the acceptance test is an integral part of the development process. One approach that we have taken is to represent an acceptance test in a form which is readable to the product owner. In our case, we utilize Gherkin in .NET via the SpecFlow framework.

Benefits of using the SpecFlow framework:

  • Reusability of GIVEN, WHEN and THEN statements
  • Isolation of responsibilities across statements makes execution and validation within a test easier
  • Regular expressions in statement definitions allow flexibility in reusing statements for various scenarios
  • Good integration with Visual Studio (debugging etc.)

An acceptance test would ideally be a direct translation of the acceptance criteria defined by the stakeholders of the feature. There is a melding of skill sets between stakeholders and developers during this stage. Stakeholders write acceptance criteria in the Gherkin form, being aware of:

  • Initial states of the scenario (GIVEN statements)
  • Feature to implement (WHEN statements)
  • Expected outcome (THEN statements)

What results is a specification that ensures:

  • Development team is building the feature correctly
  • Feature team is building the right feature
  • Product is testable – which allows automation
  • All dependencies and ambiguities have been fleshed out

The development team would rewrite this in SpecFlow, automate the tests and introduce the new acceptance tests into the test suite. When all acceptance tests pass, the feature would be considered completed in terms of meeting the requirements of stake holders.
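
As an illustration only (not an actual Ancestry feature file), a Gherkin scenario derived from such acceptance criteria might look like this:

    Feature: Person search
      In order to find relevant records
      As an Ancestry user
      I want to search for a person by name

      # GIVEN statements set up the initial state, WHEN exercises the feature, THEN checks the outcome.
      Scenario: Search returns matching records
        Given a signed-in user
        And the record index contains a person named "John Smith"
        When the user searches for "John Smith"
        Then the results should contain at least one record for "John Smith"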

Is That The Only Testing Needed For a Feature?

Of course not. There are plenty of other tests needed to ensure the quality of the product; a topic which is not the focus of this article. However, acceptance testing drives the behavior-driven development approach of the development process and is critical to ensuring a healthy communication channel between stakeholders and the development team.

Future Talks

I will be giving a presentation on Continuous Delivery at Ancestry.com on June 5, 2013 at Caesars Palace, Las Vegas, NV during the Better Software & AGILE Development Conference WEST.
