Faria Engineering Manual

This manual is primarily written for developers, but contains content applicable to others in R&D.

Our R&D Philosophy

You can read Faria’s overall philosophy in ‘The Faria Way’. This section extends that philosophy with specifics for our R&D organisation.

You’re working for Faria because we believe in your talent for turning raw ideas into software that delights our customers and helps us grow. But we expect you to do one more thing:

THINK

In the current competitive landscape, it is simply not enough to turn specifications into features (though we expect you to be able to do that!). You need to think through how your work connects to others in the company, and most importantly to our customers, and plan accordingly. Some examples:

  • If you receive a CloudFlare error trying to check out a feature on one of our production servers, investigate and escalate, don’t just refresh and ignore.

  • If you run across a major customer-facing bug (one that is costing our schools, parents, or students significant distress), make sure it is treated as a priority.

  • If you’re building a feature with a new user interface, is S&S in the loop? Design? Operations? Don’t assume someone else is handling tutorials, icons, or server monitoring: ask and verify.

  • If you think a schedule is unworkable, make sure your supervisor hears your concerns, and if you’re not satisfied with their response, talk to senior technical staff, the CEO or Managing Director.

The key habit of thought is to always do the right thing. Ask yourself:

Does this contribute to helping our customers?

Can I explain how it helps them? If not, why am I doing this?

An experienced team member develops three spheres of awareness:

Customer Awareness: Knowing how our customers are affected by your work and what will improve their Faria experience.

Situational Awareness: Knowing how your own work interlocks with the work done elsewhere in the company.

Security Awareness: Knowing how your work actively protects against threats to our customer and company data.

Working with others is at the heart of this three-way awareness. Some key rules are:

  • If you are the person to do something, commit to when it will be done and how others will know. Record your commitment so that you remember it. Then execute and deliver.

  • If you need feedback from others, put a deadline on the task (“Can you get your comments on the design back to me by Tuesday EOD?”).

  • If you are asked for feedback and don’t give any, then your silence implies you approve.

  • Don’t let things sit, fester, and get lost. Know what you’re waiting for and follow up. Work may slip but everyone involved needs to know about the slippage.

  • You can always take a minute to thank people.

We expect you to make mistakes. If you never make a mistake, then you’re not pushing hard enough to do your best work. Here’s the three-point plan for dealing with mistakes:

  1. Own up to your mistake. Tell others in the company so we can adjust plans and resources.

  2. Apologise to anyone you hurt and fix the problem. Get help fixing it if you need to, but if it’s your mistake, you need to be the one driving the fix.

  3. Most critically, make a different mistake next time. If you make the same mistake over and over again, you’re not learning.

Finally, the two golden rules of R&D:

NO SURPRISES

Communicate, communicate, communicate. No one in the company should ever be surprised that a project is late, that you need a design sketch or a tutorial created, that a customer is facing a serious issue. Use Slack or pick up the phone if you have to.

NO YAKS

As technologists, we are prone to digressing into an endless series of subsidiary tasks to achieve perfection. The next thing you know, you’re shaving a yak. Our policy towards this is summed up in Seth Godin’s blog:

“The minute you start walking down a path toward a yak shaving party, it’s worth making a compromise. Doing it well now is much better than doing it perfectly later.”

We solve customer issues. We don’t shave yaks. If you’re unsure whether you have a yak in the room, talk to a co-worker or member of the Senior Technical Staff for a reality check.

How Faria is Organised

Faria is a world-wide company (actually, it’s several companies, but you don’t need to worry about that in Engineering) with a lot of moving parts. This section will give you an overview of how your job fits into the whole.

Organisation Chart and Job Descriptions

The current organisation chart is found in the Staff Handbook. With respect to the R&D side of things, there are seven key jobs you should understand:

CEO: The Chief Executive Officer is ultimately responsible for running the entire company. We have a very hands-on CEO who is deeply involved with all aspects of the company, including technical matters. In critical situations, the CEO will give the Senior Technical Staff assistance in making sure that customers are satisfied and deadlines are met.

CTO: The Chief Technical Officer is responsible for assisting teams in making the technical choices that ensure code quality. Plans to use a new tool or technology should be known to and usually approved by the CTO. The CTO works with the HTO (Head of Technical Operations) to standardise cross-team tools as necessary.

VP Engineering: The Vice President of Engineering is responsible for making sure that our software ships on time with high quality. The VP Engineering focuses on removing blockers, allocating resources, and encouraging communication.

VP Product: The Vice President of Product is responsible for the overall customer-facing direction of the company. VP Product knows what we are planning to ship, and why we are planning to ship it.

Design Director: The Director of Design is responsible for setting the UI & UX direction for all Faria products, and ensuring that we have an appropriate level of consistency across teams and products.

Head of Technical Operations: The HTO runs our DevOps group. Additionally, the HTO is the default R&D lead for incidents, and is responsible for monitoring and ensuring our ISO 27001 compliance.

DRI: The Directly Responsible Individual is a key concept for Faria. On the organisation chart, you will see that every product has a DRI. But we also believe in having a DRI for epics, features, conferences, and in fact every other workflow at Faria. You will be the DRI for parts of the work that are directly assigned to you. The DRI for an entire product is also referred to as the Principal Developer (PD).

The CTO, VP Product, and VP Engineering make up our Senior Technical Staff. They are empowered to make company-wide decisions when necessary.

Keep in mind that we are a small and rapidly-growing company. Many people wear multiple hats at the moment. Job descriptions are meant to help everyone find our key resources, not to limit the scope of employee initiative.

Product Teams and Other Teams

Product development inside of Faria is handled by individual teams: we have a ManageBac team, an OpenApply team, and so on. Each team self-organises its own work (more on that later), and Senior Technical Staff communicates on a regular basis to coordinate work across teams.

There are also other key teams within R&D that provide support to the product teams:

  • DevOps is responsible for the smooth running of the systems that present our software to the outside world, as well as for providing and maintaining internal tools and ensuring corporate security.

  • The Interop Team is responsible for software that works with more than one of our products (such as LaunchPad, our SSO solution) as well as data exchange and SSO with external partners.

  • The Design Team is responsible for providing design assets to all projects.

  • The Documentation Team is responsible for external-facing documentation.

Note that some of these teams currently consist of very few people, or even a single person. We expect them to grow in the future.

Incident Management

Everyone at Faria is responsible for incident management. You are expected to be familiar with the overall Incident Management procedure in the Staff Handbook, as well as the R&D-specific Incident Management procedure. The material in this section supplements and emphasises those documents.

Pulling the Fire Alarm

Here’s the key point from the Staff Handbook: If you’re not sure whether or not some loss of functionality is an incident, treat it as an incident and escalate it. We would rather have false alarms than unhappy customers.

We offer a 99% system uptime SLA to our customers. This means:

  • No more than 3 days, 15 hours downtime in the course of a year

  • No more than 7 hours downtime in the course of a month

  • No more than 14 minutes downtime in a single day
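
For a quick sanity check, these budgets follow directly from the 99% figure (assuming a 365-day year and a 30-day month):

$ echo "365 * 24 * 0.01" | bc   # 87.60 hours, i.e. roughly 3 days 15 hours per year
$ echo "30 * 24 * 0.01" | bc    # 7.20 hours per month
$ echo "24 * 60 * 0.01" | bc    # 14.40 minutes per day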

It is imperative that we identify & remediate incidents quickly to meet our SLA. If you think you have noticed an incident, following up takes precedence over your other work.

If you only remember one thing from the incident management procedure, remember this: it is your duty to notify @operators on Slack if you think something is seriously wrong. We would much rather deal with a false alarm than have a customer hurt because you assumed someone else was fixing the problem.

Incident Management in R&D

Here’s a scenario: You happen to be on Slack on Sunday afternoon because you wanted to grab a copy of a cat GIF that was posted to your product chat room. Someone from S&S comes in and reports that the entire site is returning CloudFlare 520 screens for users in Japan for every request.

Congratulations, you’re it! Until someone else senior to you shows up, you’re managing this incident. Remember these things while you’re scrambling to look up the full procedure:

  1. Don’t panic!

  2. Acknowledge the S&S person and let them know you’ll be the R&D lead until further notice.

  3. If you feel competent to handle the incident, great! Start down the checklist. If not, grab your Emergency Contact Card and get someone else involved: your team lead, any member of the Senior Technical Staff, or the CEO or Managing Director. Ping them on Slack, Skype them, call them. If the site is down, it’s “all hands on deck” regardless of other work or timezones.

Even if you feel competent to handle the response, do let the DRI know when you get the chance. They can help provide additional resources and double-check your actions.

Scheduled Downtime Procedures

Some downtime is unavoidable. For example, we might be moving a server to another data center, or performing a migration that has to revise millions of rows of data. In cases where downtime is certain, or even likely, we follow a procedure designed to minimise customer impact.

Note: Scheduled downtime always occurs on a Saturday or major holiday (e.g. Christmas vacation, Easter break, weekend of July 4th). We also avoid scheduling downtime during regional conferences or user group conferences (UGCs); S&S is in charge of ensuring that there are no major conflicts.

Scheduled downtime must be coordinated between R&D (product team & DevOps) and S&S. It is critical that all teams involved have someone present & on call while the downtime is actually occurring.

Prior to any scheduled downtime, R&D must provide via Basecamp:

  • Name of R&D Lead.

  • Start time.

  • Expected duration.

  • Applications/functionality that will be affected.

  • Reason why (should be kept as broad as possible – e.g. “server maintenance” / “network configuration”).

DevOps, Product Team Lead, VP Engineering and S&S (MD or RD) should all be tagged in the Basecamp ticket.

This notice must be given at least 72 hours in advance of the planned downtime. Longer notice – up to two weeks in advance – is preferable. R&D should also try to group upcoming downtime into as few windows as possible. That is, if multiple applications or areas need maintenance we try to perform all in the same window.

Within 24 hours, a senior S&S team member (Managing Director or Regional Director) will confirm back to R&D:

  • The downtime occurs during an acceptable time (check if there is a conference occurring — i.e. we would not schedule downtime on a Saturday during a regional conference or UGC).

  • Responsible S&S person is available, i.e. not on a long flight.

  • The message for blog & Twitter, for technical review.

  • Name of Support Lead.

If S&S does not sign off explicitly on the downtime, then it cannot occur!

After the message has been approved by R&D, the blog and Twitter will be updated 24+ hours prior to the scheduled downtime by S&S.

Guidelines for planning downtime:

  • The downtime duration should be as conservative as possible (i.e. if we estimate the whole process to take 1 hour, we should announce it as 90 minutes. Always add a buffer of 25-50%, so we can finish early and beat expectations).

  • To avoid confusion with timezones / daylight savings, times in Basecamp should always be posted in UTC. S&S will handle communicating local times to affected customers.

  • The reason given should allow S&S to provide a justification to any inquiring customers.

  • The preferred maintenance time is in daylight, at a low-usage point at the start of the working day for those who need to do the work (not the middle of the night after a long day).

  • Scheduling requests should be made as far in advance as possible, and never less than 72 hours ahead.

  • Any prerequisites (e.g. server builds, licenses that need to be purchased, etc.) should be included on the Basecamp ticket, so all potential risk areas that can delay or impact downtime are known in advance.
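
Two of these guidelines lend themselves to quick command-line checks (a sketch; any equivalent tooling works):

$ echo "60 * 1.5" | bc            # announce a 60-minute estimate as 90 minutes (50% buffer)
$ date -u +'%Y-%m-%d %H:%M UTC'   # the current time in UTC, ready for the Basecamp post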

When the scheduled time arrives, Head of Technical Operations (or SRE on duty) should confirm:

  • All involved (Product Team, DevOps, S&S) have appropriate staff present & online to handle any unexpected failures

  • All involved confirm that we are “go” for downtime. Concerns from any department are enough cause to postpone/reschedule.

  • Written work plan/checklist exists for the work, including maintenance page and status page updates.

After confirming these factors, HTO or SRE on duty will give the go-ahead for the downtime to proceed.

Assuming the maintenance proceeds smoothly, HTO or SRE on duty notifies S&S when the downtime is over so they can cross-check and then post to Twitter and the blog. If there are any issues, HTO/Product representative (in consultation with Senior Technical Staff if necessary) will determine the best course of action – rollback/abort, extend maintenance, adjust plan. S&S must be kept informed so they can handle any customer queries.

If scheduled downtime becomes unduly extended (i.e. an hour or more past the planned end time), it should be escalated according to the unplanned downtime Incident Management procedure.

The Software & Planning Lifecycle

Several documents govern our overall software planning and delivery. These include the Software Development Policy (SDP) and Secure Software Development Life Cycle (SSDLC) documents that are part of our ISO 27001 procedures, as well as the overall standardised Faria process.

The SDP is a formalised framework that has been approved as part of our ISO 27001 certification. It is designed to provide accountability while still allowing individual product teams and developers the flexibility to do their work. The SDP reproduced in the appendix to this manual is a copy of the SDP as it exists in our ISO 27001 document repository.

The SSDLC is a formal policy under ISO 27001 that addresses the way in which security requirements are managed alongside functional requirements. Two versions of the SSDLC are presented in appendices to this manual: one that applies to the bulk of Faria products, and one that has been customised for OpenApply.

The Faria Product Management Guide helps translate the SDP and SSDLC into day-to-day actions. Within the framework of the formal processes, it provides the actions that we expect R&D staff will take to succeed.

Keep in mind that individual product teams may, with CEO/Senior Technical Staff approval, choose to modify these procedures. Any modifications to the SDP or SSDLC must be communicated back to the Information Security Steering Group so that they may be properly documented and audited.

Bug Priorities

Everyone in R&D should be aware of our bug priority scale:

Critical: Critical bugs are those that cause serious monetary, functionality, or security damage and that affect a large number of users. These are the bugs that threaten our good relationships with customers, and that have to be fixed immediately. For critical bugs, we will call people to work in the middle of the night if need be, and invoke our incident handling procedure to coordinate technical fixes and customer messaging.

Major: Major bugs affect some area of functionality, without having the serious consequences of critical bugs. A particular school having unusual report generation issues, or a menu that paints incorrectly while still being functional, are good examples of major bugs. We expect major bugs to be fixed during the sprint when they are found, but they can be triaged and scheduled along with routine feature work.

Minor: Minor bugs are annoyances with low urgency: design inconsistencies, buttons that could have better labels, and so on. Minor bugs may be triaged into future sprints so they can be cleaned up during a batch of similar bugs.

Bugs are initially classified by the person who reports them, and confirmed by Project Managers (PMs) and Principal Developers (PDs). If you’re assigned a bug, take a moment to think about whether the classification is correct. If you think a bug is more or less severe than its current priority, it’s your responsibility to raise the issue.

Guide for Product Managers

We break product management down into five key areas:

  • Sprint planning.

  • Specification.

  • Design.

  • Implementation.

  • QA and review.

Sprint Planning

This is an overview of the Sprint planning process together with guidance. Depending on the team, sprint planning may be done by the CEO, PMs, or PDs.

  1. Identify good features. This should be the result of ongoing, frequent (every 2 weeks or less) communication between S&S, PM, and PDs.

  2. Include:

    1. Sales drivers (integrations, highly demanded features).

    2. Solutions to customer pains (e.g. fixing non-intuitive UX).

    3. Fixes for future issues (product alignments, curriculum changes).

    4. R&D needs (technical debt).

  3. Avoid:

    1. Overly-complex solutions.

    2. Major system changes.

    3. R&D time drains.

    4. Solutions specific to one or a few customers.

  4. Make sure the “ingredients for success” are in place:

    1. Adequate specifications with clearly-defined scope.

    2. Photoshop / Sketch files, icons, components, wireframes.

    3. Resources needed from DevOps (IAM, servers, environments).

    4. Respect the lead time needed by other parts of the organisation.

  5. Prioritising the sprint (1 week before the sprint starts):

    1. “Cobweb items”: legacy baggage from previous releases and sprints that needs to be resolved immediately. We practice “first in, first out”: these items come at a mental cost to S&S and R&D, so it’s imperative that we finish what’s already started before beginning new items.

    2. Bug Fixes: These should be ongoing and are high priority. The litmus test here is that by the end of every week, the Bugs list should be cleared out and close to zero. If the Bugs list chronically requires more than 10-15% of the team’s time, then the PM needs to help the team take corrective action through reducing technical debt.

    3. Design: Items that are missing specs/wireframes/design assets need to be flagged and recorded as a list for the Director of Design and CEO.

    4. Quick Wins: These should be resolved first: quick wins provide momentum and can help tide over customers with mid-sprint deploys. They give everyone good “flow” and momentum.

    5. Big Wins: These need to be attacked with focus after clearing out cobwebs, bug fixes, and quick wins. PMs should insulate R&D and maintain radio silence around them while these ‘Big Wins’ are in progress. Hold new requests back from review until the ‘Big Wins’ ship to production, and defer future sprint planning until after that happens.

  6. Account for time conservatively:

    1. Add “gut-feeling” time estimates to all tasks. PDs will guide PMs in this process.

    2. Compare the sum-total estimates to capacity (determined by the number of developers on the team and the working days in the sprint); see the worked example after this list.

    3. Leave a 10-15% buffer for the unexpected.

    4. Shift out what is over capacity, and rearrange future sprints to accommodate.

  7. Kick-off with R&D just before the start of the sprint:

    1. Confirm time estimates and work with PD to assign R&D DRIs for each feature.

    2. Create a timeline for each item, with expected dates for releases to staging.

  8. Kick-off with S&S on day 1 of the sprint:

    1. Create a memo for the items in the sprint and post to the #ss channel on Slack.

    2. Sample kick-off.

    3. Each line should show what is being done, who will do it, how long it will take and the current status.

  9. Ensure transparency and clear communication throughout the sprint:

    1. Slack:

      1. All conversation about sprint updates should happen in the #product channel.

      2. Post updates on deployments and progress to @here in #ss to keep everyone in the loop.

      3. At the end of each week, contact the S&S Feature Friday DRI directly with a list of that week’s deploys.

      4. At the end of the sprint, confirm to #ss with a review of what was actually deployed.

    2. Aha!

      1. Update the status of a card as soon as change happens, so feature boards accurately reflect the status of the sprint.

      2. Attach PSDs, specs, icons, files directly to the Aha! card.

      3. Task-specific communication and questions can be left as comments on the card.

      4. Comments should @-tag the DRI so they receive an email.

      5. For features with many sub-tasks to track, create to-do lists and comment directly on the to-dos.

    3. Basecamp (BC):

      1. Review/respond to BC posts daily at a set time, preferably in the morning. All inbound S&S requests should have a reply and acknowledgement from PMs within 24 hours. Link the response to Aha! or GitHub issues where applicable.

      2. Close tickets on completion, after S&S QA confirms.

      3. If a ticket must be postponed, comment with the reason why this is going into the backlog.
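
As a worked example of the capacity arithmetic in step 6 (hypothetical numbers: 4 developers, a 10-day sprint, 8 hours per day):

$ echo "4 * 10 * 8" | bc      # 320 developer-hours of raw capacity
$ echo "320 * 0.85" | bc      # 272.00 hours to schedule after a 15% buffer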

Specification

Good specifications require thinking through the customers’ shared jobs-to-be-done (what they wish to accomplish as a customer group as a whole) and working backwards from that intent. The goal is a clear, well-thought-out feature that is semi-permanent (i.e. not requiring rework and further changes 3-6 weeks hence) and that meets the business requirements optimally (i.e. implementing what is most practical to provide 80-90% of the business value).

PMs need to carefully filter and review specifications before passing to R&D, while considering the following:

1. From the individual to the whole: customer expectations are frequently focused on their needs in isolation, without consideration for the broader set of customers. Since we are designing a multi-tenant system, we cannot implement custom features for a specific customer; we can only implement features that serve shared needs. This is the first test that every specification must meet: if a feature only serves an individual customer, it should be discarded.

2. Variance: whenever a specification is built, we need to consider how needs vary across customers: filtering options, text & terminology changes, display options, and so on. The OA Forms builder is a good example of addressing the challenge of variance, but it’s important to understand the trade-offs; not every feature should be customisable. Every feature that provides variance means another point of configuration in Settings, another point of support & training, and potentially another set of customisation requests. PMs need to make decisions that standardise logically, e.g. the Model-T Ford comes in black only.

3. Clarity: the most important quality of every spec is to be clear and precise about what R&D needs to do. By the time a specification is passed to R&D, there should be no further material discussion about implementation scope or what needs to be changed. If there is post-spec discussion or flip-flopping on change requests, then the PMs, in coordination with S&S, have not thought through the requirements thoroughly.

4. Accidental Removal: when enhancing existing features or releasing a new UI, PMs need to be careful not to remove features or customisation options without sufficient discussion & warning to S&S. We should strive to maintain backwards compatibility at all times, except when a new way is clearly better and will not impose a huge support cost.

5. Obsolescence: certain features will have a limited life-span and will only be used for 2-3 years. In these cases, we need to consider how to develop them flexibly, so that technical debt will be manageable when these features enter an archived state. Specifications need to consider active, archived and deleted states, and how data is stored and retained 3 years later.

6. Architecture & Data Migration: back-end changes can be the most complex and require careful data migration planning. Whenever we change the data model, we must be careful to run migrations and have a rollback plan.

A good spec is composed of the following:

  1. Implementation: in this section include any documents that describe how this feature can be implemented. For example, a spec for Google Drive integration would link to the Google API documents. Include an explanation of why we are choosing this implementation.

  2. Scope: this maps out all the areas this feature will affect. Consider both the front end (what is visible to the user) and the back end (including data migrations and handling of legacy and historical data). Include a list of all pages that will need changes.

  3. Blockers: understand what will rate-limit or restrict the R&D process. Often this will be in designs, integration APIs, or 3rd-party and cross-product development.

  4. Reference & design implementation: someone has always done something before us. Study what’s out there before starting anything. Scour competitor feature boards regularly to ensure that we are developing at or above parity.

  5. User stories, wireframes & workflow diagrams: based on the scope, pick one page and illustrate end-to-end how it will work.

    1. User stories should describe the various types of users who can benefit from this feature, and what they need it to do.

    2. Wireframes offer an outline of the visual implementation.

    3. Workflow diagrams describe the step-by-step process the user goes through when using this feature.

Design

Preparing designs in the form of PSDs or wireframes, together with components, before the start of each sprint helps ensure that principal developers and S&S can visually confirm what is expected to be delivered on production. Careful early review and three-way discussion help avoid problems in development.

Designs function as the visual blueprints of what we plan to build, whereas the components are the Lego pieces (icons, buttons, custom UI styles, etc.).

The design process is carried out as follows:

1. Specification review: during this stage the specifications are reviewed carefully. This is usually done 1-1 in private without other distractions.

2. Sketch & wireframe: an initial sketch or wireframe is drawn. Example content values and data ranges, which are either fixed or variable, are identified. Multi-step workflows are mapped out and considered.

3. PSD: the wireframe is translated into a hi-res PSD together with the example content.

4. Joint review: between S&S, PD and Design, to ensure that all the user stories are covered adequately and that the workflows are logical and as simple as possible, without unnecessary steps. This stage will often require rework as features are implemented and new issues arise.

5. User states & miscellaneous components: ensure that we cover different user states (e.g. student vs. administrator) fully in the designs. For example, students will often only have read-only access, so ‘Edit’ & ‘Delete’ buttons are hidden. Guidance text also needs to be adapted and prepared, together with links to help tutorials.

6. Zero-state, blank slates and QuickStart guides: these are the screens that appear when there is no content, or when a user is new to the system and has not completed the configuration steps. Zero-state partials are typically used for tabs, menus and limited areas; blank slates guide the user with a trail towards a particular action (e.g. ‘Add New Deadline’ on a calendar with no deadlines); QuickStart guides are more complex blank slates used to provide a setup checklist or to guide the user through a multi-step sequential process (e.g. configuring settings, updating a profile, etc.).

Design work can be passed to Principal Developers in partial stages, but development work should not commence until all designs & components are in place.

Implementation

Armed with specifications and designs, and with all components/icons/etc. in hand, the Product Team can implement the feature in code. The PD can choose to implement the feature, to delegate it to a particular developer, or, for complex features/epics, to assign a team of developers (while always keeping in mind that each feature needs a single DRI).

Implementation isn’t complete until all cross-checks have passed, including:

  • Security review.

  • Code quality review (manual and/or automatic via Code Climate).

  • Test coverage & passing tests on CI.

  • Peer review of all pull requests before merging back into the main line.

QA, Testing & Design Review

Thorough QA testing and a Design Review ensure that only well-built features are released into production. This process covers the following steps:

Ensuring test case coverage: to fully cover a functional area of a system with test cases it is important that you map:

  1. All functions of the system under regular use.

  2. All functions of the system under excess strain.

  3. All edge cases.

  4. ‘Blank slate’ cases.

For example, when writing test cases for a form editor, the cases should include (mapped back to the list above):

  1. Regular input for the entire form.

  2. Excessively long input into the fields in the form.

  3. Attempts to break the system in some way (strange characters, other languages, navigating away with partial input or saving with empty input).

  4. The appearance of the form without any input.

Writing a test case: There are four parts to a good test case:

  1. Name

    1. When writing a test case name, keep it clear and concise.

    2. Functional area mapping of the case should be marked by folder navigation and sections rather than name, to keep the name short.

    3. Be considerate: other team members will have to find the case again quickly when reporting bugs and feature requests.

  2. Preconditions

    1. What accounts will the tester need to be logged into?

    2. What objects need to be created in the system to execute the test?

    3. Does the tester need any integrations or settings enabled/disabled?

    4. Clear and thorough preconditions mean quick test runs.

  3. Steps

    1. Carefully map out each step the tester needs to execute.

    2. Do not describe the expected outcomes here.

  4. Expected Results

    1. It is important that this section is as thorough as possible.

    2. List all the outcomes of the test case that should happen, and when in the process they should occur.

    3. The expected results may span multiple functional areas and multiple users. For example, editing student data should change the view seen by parents, and show up on any exports.

Preparing a Test Run

  1. First, collect all necessary cases prior to testing

    1. Identify every single test case pertinent to what you are trying to test.

    2. If this does not offer full enough coverage, create more test cases for any new function.

    3. Check for outdated cases – do you need to change the steps or expected results for any test case to accommodate the new feature being put into place?

  2. Create the test run with the proper title, description, and cases

    1. Title – Reference the date of the run, and what it is testing.

    2. Description – Further specify the new feature, bug fix, or change this run will be testing.

    3. Cases – Select/specify the cases and check off all the cases you collected in step 1.

  3. Comment on BC post or Aha! card with a link to the test run.

Testing

  1. Open the test run, and select a case.

  2. Run through the case exactly as specified.

  3. Once complete, compare your results with the expected results and add a status.

    1. If all results pass, update status to “Passed” and move on to the next case.

    2. If you were unable to run through the test because of an unsatisfied precondition (for example, another feature needed for the test run has not yet been deployed), select “Blocked” and comment with the unsatisfied precondition.

    3. If the expected and actual results do not match, update status to “Failed” and comment with a clear description of what needs to change. Post to Basecamp as a bug with your explanation when necessary.

  4. As bugs are fixed and preconditions are met, you can change the status to “Retest” and continue this cycle until all cases are “Passed”.

  5. Once all cases pass, you can click the ‘log’ button to close the run, and let everyone know testing is complete.

  6. Note that it may be necessary to test twice, once on staging and once on production.


Design Review

We have several checks and parallel processes in design review. The goal is to ship high-quality UI and UX while avoiding bottlenecks. Our principles:

  1. All front-end developers report directly to the Director of Design, and all design questions are directed to them. Generally, they set aside one day a week to answer open issues in batch.

  2. Functional implementation can start as soon as wireframes/specs are ready, using prototype components when necessary. Developers can start work before the PSD is available and provide functional/implementation feedback while PSDs are still in process.

  3. Design will post monthly plans with the goal of stopping last-minute rushed PSDs and constantly jumping between products. Design tries to work in 2-week sprints with focus, i.e. 2 weeks on MB followed by 2 weeks on OA, etc.

  4. This means that all front-end developers (with back-end support as needed) must provide a minimum of 1 week’s and ideally 2 weeks’ notice for all required design work.

  5. Try to limit incremental design requests to the wireframe/prototype/spec stage.

  6. Designs will conform to our master UI style kit. Design review meetings can identify any required addition to the kit.


Refer to the Work, Design, Style & Conventions Guide for more information on our design process.

Tutorials, Communications & Marketing

A feature isn’t done until we properly communicate it. Our S&S team owns all customer-facing communication, but they need developer support to do a good job. Their products include:

  1. Blog posts (“Feature Fridays”) highlighting new features in our products.

  2. In-product tutorials (ScreenSteps) showing how to use features to their best advantage.

  3. Newsletters, conferences, and other forms of one-to-many marketing.

  4. Direct contact with individual customers in person and through telephone calls, email, UserVoice and so on.

It is important to keep S&S informed as the product takes shape and ship dates near. Be sure you communicate any changes in plan, especially those that will affect user interface or functionality.

Deployment

Each team will set their own rules on deployment, but in general:

  1. Major feature deployments require 48-hour notice to S&S and confirmation that all communications workstreams have been completed.

  2. Minor deployments (UI cleanups, bug fixes) can be pushed any time, so long as there is no disruption of regular operations.

  3. Deployments that require extensive downtime (such as large database migrations) must be handled according to the scheduled downtime procedures.

  4. We do not schedule routine deployments on Friday.

  5. Staging deployment & S&S cross-check should always precede production deployment.

  6. Deployments to the demo (sandbox) environments must be coordinated with S&S to be sure not to disrupt sales demos.

DevOps is responsible for providing deployment access based on team needs. We require developers to use SSH keys in GitHub for deployments.
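
If you don’t yet have an SSH key in GitHub, a minimal sketch (the key type and comment are illustrative):

$ ssh-keygen -t ed25519 -C "<your-github-email>"
# then add the public key (~/.ssh/id_ed25519.pub) to your GitHub account settings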

Day to Day Tools

This section includes information on how best to use some of our common tools: git, GitHub, Basecamp, and Slack, as well as information on policies related to tools. Remember, this information is just a starting point. Individual teams may choose to modify the standard processes to handle their own unique situation, so long as they stay within the framework provided by the SDP.

Faria Developer Tools Policy

Our policy on developer tools is designed to strike a balance that enables R&D to do their best work while not imposing avoidable costs on the company as a whole. Key considerations include:

  • To the greatest extent possible, developers should be able to use the tools that make them most productive.

  • For ISO 27001 certification, we must approve all tools and evaluate their security impact as a central corporate function.

  • To keep the budget under control, we need one point of contact for all developer tools purchases.

  • Company-wide standardisation matters more for efforts that cross teams than for efforts within a single team.

  • Standardising external tools (such as transactional mail) and cross-product shared services (such as ruby versions) helps us keep DevOps lean and focused.

Within this framework:

  1. PDs may approve free tools for use within their own team. However, operations *must* still be notified if these tools are hosted outside of FEG servers so that they may be added to the “ISO 27001 Third-Party Relationships” document and their security impact evaluated.

  2. PDs may approve paid tools for use within their own team, but must get budget approval from the Head of Technical Operations before tools are purchased. All licenses must be purchased by Operations, and all tools will be licensed to Faria. Paid tools must be added to the “ISO 27001 Third-Party Relationships” document and their security impact evaluated.

  3. The Head of Technical Operations must approve all tools that are used by multiple teams, and is responsible for budgeting & evaluating security impacts for such tools. DevOps will aid in evaluating major tool categories in conjunction with PDs to move us continually towards best-of-breed tools with minimised costs.

  4. These policies apply regardless of whether tools run on external SaaS and cloud services, main FEG servers, locally-hosted servers, or developer desktops. “Bootleg” tools that access FEG data or source code are a serious violation of company policy.

Git Usage

How to Branch and Deploy

Most Faria Systems applications follow the Git Flow process, where there are two separate main branches, one for development and one for production releases.

The production branch can be found either as master or as stable. It is expected that the code tagged as the production branch is always immediately deployable.

The development branch, on the other hand, should always be named “develop”. Code on the development branch represents the upcoming release and may be ahead of the production branch.

It is expected that code on both master and develop is clean and that all test cases pass. If the test suite fails, the branch should not be merged.
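
For a fresh repository, this layout is what Git Flow sets up by default; a minimal sketch, where -d accepts the default master/develop branch names:

$ git flow init -d

For products that use stable as the production branch, run git flow init without -d and answer the prompts accordingly.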

When and Where to Branch

There are two main kinds of branches. You should choose the type of your new branch based on the characteristics below.

  1. Hotfix branches: hotfix branches should be branched from the production branch; they contain code that should go into production as soon as possible.

  2. Feature branches: feature branches should be branched off the development branch; they contain code responsible for delivering a new feature, or code that otherwise needs to be tested (perhaps through end-user testing).

How to Name Your Branch

Usually, your branch names should capture the following information:

  1. The ticketing system where the reason for change originated.

  2. The relevant ticket or ToDo Item in the ticketing system.

  3. A simple summary of the change.

Changes can originate from GitHub Issues, Basecamp ToDos, or another source. Try to capture both the name of the system and an identifier that allows you to correlate the branch back to the ticket directly within the branch name. (A link to the relevant resource in the commit message is also welcome, but in practice links are harder to find during bisects, so they are not sufficient on their own.)

For example, a feature branch opened in order to complete a Basecamp To-Do item may be named “feature/bc-1048576-frobulator-revision”. In this case “bc” represents Basecamp and the issue number is captured within the branch as well.

A hotfix opened in order to fix an Airbrake warning, on the other hand, may be named “hotfix/ab-1048576-fix-dangling-fubar”. (In case of hotfixes, when capturing the best identifier for issue correlation, try to go for the issue group ID instead of individual issue IDs.)
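
Putting the two examples together, the corresponding Git Flow commands would be (a sketch reusing the hypothetical ticket numbers above):

$ git flow feature start bc-1048576-frobulator-revision   # cut from develop
$ git flow hotfix start ab-1048576-fix-dangling-fubar     # cut from the production branch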

A few commonly used abbreviations are listed below.

  • GitHub Issues: gh.

  • Basecamp: bc.

  • New Relic: nr.

  • Airbrake: ab.

How to Create Your Commits

Each commit should be a standalone unit of change. If you have changed a file back-and-forth multiple times, you may want to squash the commits using git rebase --interactive so the history remains pleasant for the reviewer.
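
For example, to squash a run of back-and-forth commits (a sketch, assuming the branch was cut from develop):

$ git rebase --interactive develop
# in the editor, mark the redundant commits as "squash" or "fixup"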

Commit messages should start with a single line describing the change; if more context is desired, add an empty line and then document the additional information.

For example, a small commit message looks like this:

fixes navigation bar overlapping with search bar

A larger commit message may look like this:

adds migration which pulls Current School and Prior School information out of the payload and into dedicated UUID fields

this facilitates implementation of the Authorised Scope for proper permissions control

Commit messages should pass US English or UK English spell checks, and there should not be multiple “FIX” or “WIP” commits without other information. However, it is up to each principal developer whether such commits are to be accepted.

The Unit of Change in our process is at the branch level, so micro commits are fine. There is no requirement to coalesce a lot of changes into a big commit or split all changes out. You can select the best model that works for you at the branch level.

How to Test Your Branch

Every push to your repository automatically triggers the test suite for your application. Currently this is done via CircleCI, which you can access with your GitHub account.

Branches that trigger test failures will not be merged.

Feature branches should also undergo end-user testing, either by engaging QA or by live sessions with the product owner. This kind of testing can be done in the Staging Environment.

The product’s deployment script allows deployments to a Staging Environment. It is for you to use. To gain access to Staging Environments or to request unusual arrangements, please liaise with Operations.

How to Merge a Branch

Depending on the nature of your branch, you will usually run git flow feature finish or git flow hotfix finish.

There are several finer points to highlight.

Do not Use Fast Forwarding

Git Flow, by default, merges a feature branch with a single commit using fast forward. Although this is quite a contentious issue, you are not to merge with fast-forward enabled.

During git bisect, if all merges have been done with fast-forward enabled, the number of commits you must check goes up significantly. This slows down validation during a critical time and is not recommended for the sake of developer sanity.

If the development or production branch does not have a merge commit at its tip, it is not clear whether the branch can be safely rebased upon. If the tip of the branch points to a merge commit, it is obvious that the branch is in a finalised state, where other branches can be rebased upon it.
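
Using plain git for illustration (with the branch name from the earlier hypothetical example), an explicit no-fast-forward merge looks like this:

$ git checkout develop
$ git merge --no-ff feature/bc-1048576-frobulator-revision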

Rebase Before Reviews and Deployment

Before submitting your branch for code review in a Pull Request, rebase it against the latest development or production branch. This work should be done by the creator of the branch and this is not negotiable.
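
A minimal sketch of that rebase, assuming a feature branch targeting develop:

$ git fetch origin
$ git rebase origin/develop
# resolve any conflicts, re-run the tests, then update your remote branch
$ git push --force-with-lease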

Creating Production Releases from the Development Branch

To cut a new production release from the development branch, it is recommended that you create a new Git Flow release and then immediately close it:

$ git flow release start `date -u +%Y%m%d%H%M%S`

#…

$ git flow release finish

This gives you a date and time for the release and is suitable for Web Applications where there is no versioning requirement.

How to Deploy Your Branch

Normally, the principal developer for a given product should handle deployments, but each Product Team is free to make their own arrangements.

All branches, no matter how inconsequential, should pass all test cases prior to deployment.

An exception can be made for true emergencies, where the fix is confidently expected to work and additional delay in resolution is undesirable. The principal developer of the affected product can make this overriding decision on their own authority.

Remember: the Production Application SLA we hold is 99% uptime.

GitHub Usage

A few guidelines for GitHub:

  1. All company source code must be on GitHub. We do not use other server-side repositories.

  2. All GitHub users must have 2FA turned on.

  3. All developers should use signed commits (see the setup sketch after this list).

  4. All code should be merged using GitHub pull requests, rather than pushed directly to the master or develop branches.

  5. The master and develop branches should be set as protected branches that require reviews before merging pull requests. PDs are encouraged to also require successful Code Climate and CI runs before merges are allowed.

  6. Contact DevOps if you need a new repository, configuration for an existing repository, or changes in who can access a repository.
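
For item 3, a minimal setup sketch (this assumes a GPG key has already been generated and uploaded to GitHub; the key ID is a placeholder):

$ git config --global user.signingkey <YOUR-KEY-ID>
$ git config --global commit.gpgsign true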

Basecamp Usage

Our product development work is organised into two main projects on Basecamp:

  1. Each project has a ToDo list on the @Inbox board. This list is where feature requests and other work items to be tackled as part of the normal sprint process should be entered.

  2. Each project has a ToDo list on the @Bugs board. This list is where bugs should be entered for triage and assignment.

There are also some cross-team functional projects in Basecamp, e.g. DevOps is managed via the Technical Operations project.

Within each list, we use bold-faced TODO items to indicate steps in a kanban-style workflow. Individual PMs and PDs establish the most appropriate series of steps for each product.

Guidance for using Basecamp effectively:

  1. Any work item should be documented on BC via ToDos, Messages and Files.

  2. Deadlines may be set via Milestones. If you have completed a Milestone and its related ToDos, check it off. As mentioned above, if you feel a milestone due date is unrealistic, raise the issue immediately with the PM.

  3. When reviewing a complex ToDo, set a time estimate and due date on BC.

  4. If your ToDo is urgent or system-wide (e.g. WSOD, downtime, etc.), please add the prefix (URGENT) and assign to PM. Follow-up with a phone call or 1-on-1 Slack if it is critical.

  5. Ensure that your ToDos are organised in order of priority at all times in the sequence that you plan to complete them.

  6. If you post a ToDo always ensure that there is appropriate detail (e.g. if it is a bug, show the screenshot, URL, browser details, etc. or any relevant contextual information that would be required to replicate the issue).

  7. Ensure that you have assigned each ToDo to the person who is most capable of addressing the issue.

  8. Ensure that your file formatting is consistent, especially if these are system-related files.

Remote Presence & Slack

We use Slack for day-to-day communication. These are some ground rules and helpful points of guidance for effective remote presence.

Things you can do to build positive FLOW:

  • Say “good morning” or “evening”, send greetings frequently to your colleagues and use emoticons to build friendly rapport.

  • Post an update of what you have done or what you plan to do today or this week (giving your colleagues a heads-up on a deploy, confirming a priority, sharing information, etc. is always good).

  • Seek clarification in the relevant channel and request feedback from a DRI, or request an action from a specific individual. Before seeking clarification, batch your questions and post them in a numbered format (organised into a bulk list); this avoids subsequent interruptions and provides full resolution more quickly.

  • Clarify and settle an issue to a conclusion in as few back-and-forth exchanges as possible, linking to the relevant Basecamp item or URL and posting the final confirmation (e.g. “Deployed to production” to let S&S know).

  • Know when to pick up the phone (or Skype) and call someone directly (most relevant when a 50-60-message chat could be settled in a 5-minute phone call). If you feel your blood pressure rising, always pick up the phone and call.

  • Share thoughts or ideas, interesting things you have read in #General or funny things in #random.

  • Post in advance that you are going off Slack for focused quiet time / AFK (away from keyboard), etc. and will not be accessible, then go offline. If you feel a discussion is going to risk overflowing, then let the other person know you have X time left.

  • Confirm a scheduled meeting 1-2 hours in advance to your counterpart. They will appreciate the friendly reminder.

Things you may do inadvertently, but which are bad for FLOW:

  • Leave your computer in the middle of a discussion with someone without saying AFK or BRB. This is bad because the other person doesn’t know you have left; the lack of response creates anxiety and uncertainty, causing confusion & stress that is avoidable through simple communication. If you are on-call, make sure your status is apparent (at keyboard, on phone, etc.).

  • Type in ALL CAPS. This may be accidental.

  • Post about a customer issue or describe a problem without providing a relevant link to full context.

  • @-mention individuals who have notifications turned on at odd hours, resulting in a discussion continuing (or starting) in the middle of the night when there is no urgency. You can adjust your own notification preferences to avoid being woken up.

  • Hide your status: if you are not green, colleagues will not know whether you are online.

  • Message others 1-to-1 about things that should be in the product channels. A private discussion that relates to product delivery forces duplicative communication to the PM & other colleagues on the product team; this is bad for everyone because it creates communication silos.

  • Make snarky or rude comments, or fail to acknowledge someone’s question or final comment within a few minutes. A simple “sorry” and an apology for delays or unavailability go a long way.

Things you should never do (unless there are exceptional circumstances i.e. reputation / armageddon risk to the team or company):

  • Post a non-critical issue on Slack before having posted it to Basecamp with full replication details. This is bad for R&D because it interrupts their focus for a non-urgent issue, and it breaks the process.

  • Disconnect randomly, or go home/offline in the middle of a discussion with your phone turned off, without settling the issue. Not only is this disrespectful to your colleague, it causes unnecessary frustration on all sides.

  • Keep colleagues up late at night needlessly, in a way that is inconsiderate of their time and work schedule. If you need to review something, propose a routine, mutually known and acceptable “check-in” time to get in sync. Click on someone in Slack to see what time it is in their timezone.

Final point for management:

If your colleague or staff member is staying online all night to resolve a problem, make sure you are checking in or staying up with them, getting them coffee, ordering them dinner, etc.: whatever will help them resolve the issue.

Database Access Policy

  1. Live production database access is limited to senior technical staff & PDs, and only when there is no other way to investigate an issue or solve a problem.

  2. All access to live production databases must be done from within our production security zone. We do not allow tunnelling SSH connections to database servers from outside of our production machines.

  3. Whenever possible, database work should be performed in less-risky ways. Consider using a read-only connection or a restored snapshot copy of the database instead of a read-write production connection (see the sketch after this list). Limit work to the smallest possible subset of the data.

  4. Production database credentials should not be shared with non-authorised users.

  5. Any exceptions to the above policy must be approved by the Security Team, or (in emergencies only) by the Head of Technical Operations, CEO, Managing Director, CTO, VP Engineering, or VP Product.
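
One way to honour point 3 at the session level, if the product runs on PostgreSQL (a sketch; the host, role, and database names are placeholders, and other engines have equivalents):

$ PGOPTIONS='-c default_transaction_read_only=on' psql -h <replica-host> -U <readonly-role> <database>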

Balancing Work and Life

We are a relatively small company in a fiercely competitive market. This requires us all to offer extra effort at times to help us succeed. But we are also in a market with strong ups-and-downs over time (weekly cycle, semester cycle, annual cycle), which means peak workloads arrive at easily predictable times. This section outlines what you should expect when deadlines are looming, and contains some tips on working in a sustainable fashion.

Death marches and 110% effort

We believe in supporting our staff (not just in R&D) with practical deadlines & reasonable workloads, to avoid a “death march” scenario where you slog along day after day in pursuit of an unreachable goal. The CEO, PMs, Senior Technical Staff, and PDs work together to regulate the flow of work so that at any time staff know what is next up and when critical deadlines are coming – and to ensure that those deadlines can be met. If you think you’re being saddled with more work than can be done, raise the alarm early so that the workload can be adjusted.

There will be times, though, when extra effort will be appreciated and even expected. You may have Saturdays when you need to be present for a scheduled downtime, or night/weekend call hours. A critical customer-facing bug could require extra effort to solve. A teammate could be under pressure due to an illness at home. By its nature, software R&D is unpredictable!

What we expect

We expect R&D staff to average a 40-hour week, with occasional spikes of extreme effort. You can expect management to notice and support your efforts, and to address any major mismatch of workload.

How We Judge Success

Ultimately, our corporate success is judged by whether our customers are happy. Your individual success is judged by your contribution to that happiness. We do have a few particular metrics that we track, but we’re more interested in whether you are focused and effective in your efforts.

Management Metrics

The bottom line for management is simple: how much we are spending on a particular product (R&D, S&S, operations, marketing…) versus how much revenue that product is bringing in. We also keep an eye on some numeric measures, such as:

  • Cost per visit.

  • Number of bugs and incidents.

  • Cost of servers per visit.

  • Mean time to fix bugs.

Accelerated Journey Matrix

This matrix is our way of categorising where a particular developer is from a job development standpoint. It is unlikely that a single developer will be perfectly aligned in all areas, so this tool also helps us identify areas where mentoring or other training opportunities may be helpful.

Junior (0-2 years)

Hiring requirements:

  • Basic technical ability (not a newbie).

  • Demonstrated competence / self-learning (academic success or resourcefulness).

  • Eagerness & base ambition to achieve.

Milestones:

  • GTD: ships without rework.

  • Internalised the Product Development Process.

  • Proves commitment during a crisis.

Capabilities:

  • Technical ability.

  • Self-management.

  • Observe & improve.

  • Presence & responsive communication.

Character & skills to focus on building: ambition for accelerating the feature, a customer-first attitude, care & trust for colleagues, responsibility & consistency.

Success metrics:

  • Bugs fixed.

  • Contribution to features / refactoring / design.

  • Features on production.

Base compensation (monthly): $2,000–3,000.

Intermediate (2-3 years)

Hiring requirements:

  • Minimum production experience (1-2 years): knows the stack & how to balance bugs vs. tests.

  • Proven ability to translate specs into features.

  • Good sense & methods (knows when to push back).

Milestones:

  • Proven leadership & management skills.

  • Total understanding & maximum clarity of all parts, from server to app, up to the UI.

  • Independent Basecamp management.

Capabilities:

  • Proactive solutions.

  • Judgment & evaluation.

  • Enforcing the “right” process.

Character & skills to focus on building: urgency & intensity, acuity & OCD-level perception, sequential thinking applied to process management via reality checks & objective analysis.

Success metrics:

  • SLA resolution.

  • Basecamp management.

  • Product-team stability & health.

Base compensation (monthly): $4,000–5,000.

Senior (2-3 years)

Hiring requirements:

  • Built & managed a team effectively.

  • Defined architecture & scope / built to completion (idea to app to market).

  • Improvement history.

Milestones:

  • Time & sweat through production scaling challenges (overcame all obstacles).

  • Wit & persistence, tolerance & patience, wisdom.

Capabilities:

  • Constructing strong, effective teams.

  • Momentum in success.

  • Creating the right R&D environment.

Character & skills to focus on building: wisdom & patience; knowing optimums, the right people, the plan, & how to scale.

Success metrics:

  • Process refinement & effective hiring and delegation.

  • Ability to tackle new markets & products.

Base compensation (monthly): $8,000–15,000.

Reviews

Our review policies are reprinted here from the Staff Handbook.

30 Day Review

This is a formal review by the new staff member’s immediate supervisor, which should include:

  • Areas of achievement.

  • Areas for improvement.

  • Final conclusion (Strong / Weak).

The review must be recorded in PeopleHR.

Annual Review

While feedback is fast, continuous, and objective at this company, each staff member is entitled to a formal review at least once a year (usually in February), to cover areas of achievement, areas for improvement, and to have a career path and objectives laid out. Copies of annual reviews must be sent to [email protected] for record-keeping.

Other Stuff You Should Know

This section collects a variety of things that didn’t fit anywhere else in the Engineering Handbook, but that every R&D member should know about.

Faria Applications

If you’re interested in knowing what applications Faria produces, what we use as components of our own applications, or which 3rd parties use our services in their applications, read this developer overview.

ISO 27001

Faria is pursuing ISO 27001 accreditation. This is a global standard that attests to our customers that we take the security of their data very seriously. The certification process is also helping to drive us towards best practices in data security, disaster recovery, staff training, and other areas.

We expect to complete the initial application process and receive accreditation from our auditors in mid-2017. The basic idea behind ISO 27001 is that we must document all procedures that touch upon security, and then follow the documented procedures.

Much of the required documentation for ISO 27001 is extremely specialised, covering areas such as the composition of our Security Incident Response Team or the procedures under our Business Continuity and Disaster Recovery Plan. As your job is touched by these areas, you’ll be introduced to the formal plans.

At bottom, though, ISO 27001 is all about protecting our customers’ data and our own internal data, through security consciousness and careful work. If you should find an area where you feel our ISO 27001 documentation is not aligned with what we actually do, please raise the issue with the company’s Information Security Officer.

RCAs & Post-Mortems

At some point you will be asked to write a Root Cause Analysis (RCA) or post-mortem. These may be extensive and formal (as in the case of an incident post-mortem) or somewhat more informal (as in the case of updating a Basecamp ticket with an explanation of what went wrong).

A good RCA should include:

  • A concise explanation of the issue.

  • An analysis of the factors that led to the issue. The goal is understanding, not blame – but if you were responsible, admit the responsibility.

  • A list of corrective actions (which should already be complete, or else should be followed up on).

An example RCA:

The immediate issue: parents could not log in to OA to complete re-enrollment, a critical and time-sensitive process.

Why? Our sync software deleted their OA accounts.

Why? The OA accounts in question were linked to MB accounts that we had told the school were OK to delete, and so the school deleted those MB accounts. This was an error on my part.

Why? OA and MB accounts are usually linked by sharing the same “core ID” value, but this can be overridden. Any MB account at a school can be linked to any OA account at that school, internally in the sync database. I failed to check for this in compiling the initial list, and assumed the accounts were linked by core ID.

Why? The current sync software is too complex and too poorly documented, and I forgot to look for this possibility.

Why? We have had multiple changes of ownership for this code, and understanding has been difficult to acquire.

Remedial actions:

1. (completed): Turn off sync at GAIS and clean up the mess. Monitoring closely with Kristy; we will treat any additional issues they report as drop-everything fires, with the hope of regaining their trust.

2. (completed): Add notes on the situation to the sync documentation I’ve been compiling as things go wrong.

3. (completed): Update my diagnostic software to find cases where records are directly linked, instead of assuming that the Core ID controls. This was already done by the time I compiled the most recent lists (hence the warning that Angelica noted), but by the time I realised this it was too late: the school had already done deletes based on the earlier, erroneous list.

4. If and when they are ready to move back to sync, delete all of the current linking information in the sync database and start over.

5. The longer-term fix is to take the current sync software out and replace it. This is being actively pursued; Oleh and Yulian are both on this project, with Alexey Kisel from OA.

6. Work with Oleh to use a copy of GAIS data as a test case for the new sync engine. If we can get good results with their data in testing then it would make more sense to just move them straight there instead of back to old sync and then to new sync. If we can’t get good results, it means the new sync engine isn’t done yet.

This particular RCA uses the “5 Whys” technique to dig into the cascading sequence of events that led to the problem.

For a good example of a more formal RCA for a production incident, see GitLab’s postmortem of a database outage.

Open Source Policy

Faria’s software is built upon a foundation of open source, and our company policy is to help improve the foundation. When you make an improvement to an open source library, please submit your changes to the upstream project so that others may benefit from your work. This policy does not apply to changes that include Faria confidential information.

NOTE: We require a member of senior technical staff to approve open source contributions as a cross-check that they do not contain confidential information.

Appendix 1: Software Development Policy

Objective

This Software Development Policy provides specific instructions and requirements for the development and change of secure applications developed for Faria Education Group (FEG), either internally or through a third party. This policy must be followed for all new developments and major upgrades to FEG applications.

In implementing a Software Development Life Cycle (SDLC) approach, FEG has employed industry-leading security standards, including the Open Web Application Security Project (OWASP) standard.

Approach

FEG, directly or indirectly through its third parties, employs a number of structured software development processes and phases, which include properly authorising, testing, approving, implementing, documenting and maintaining the specified system/application. SDLC activities, ranging from the initial requirements stage to the routine maintenance of a system/application, are administered by FEG.

The SDLC methodology encompasses a number of phases, each concluding with a major milestone. Assessments are conducted after each phase to determine if objectives have been satisfied. The following principles apply:

  • The requirements of this policy apply to software acquired or developed by, or on behalf of FEG for production implementation.

  • All development or acquisition of software must follow the software development Change Request procedure for testing and production implementation.

  • Once development or acquisition has occurred, the Asset Management procedure must be followed and the asset register updated.

  • All software development or acquisition must follow the FEG approved Software Development procedures (outlined below) and be performed by authorised persons.

  • Appropriate documentation must be approved and retained for each project that utilises the Software Development procedure.

A workflow for the request, development and release of software changes can be found at the end of this document. Activities for internally-developed systems/applications consist of the following procedures and phases:

New System/Application or Feature Set Development

A new system/application or feature set development includes the implementation of a new service or addition to the features and functions of a current product. The same processes are involved when adding major enhancements to existing functionality. When considering enhancements for this process, we group individual features into feature sets (also called epics) and proceed with the overall process on a feature set basis.

Continuous Development

FEG employs a continuous/agile approach to implementing individual features within feature sets. This means that not all features in a feature set will be at the same point within the overall process at the same time. In particular, requirements analysis, design, implementation, and quality assurance for individual features may proceed asynchronously and in parallel. Entire feature sets are released to production as a unified group.

Request for New System/Application or Feature Set

The process begins with the request for a new system/application, feature set or tool. Authorised personnel will initiate the request. All requests are to be appropriately logged using a FEG Change Request.

All of the main sections of the Change Request should be completed and appropriate authorisation requested/received by the Requestor. The following areas will require completion:

  • Requestor Details.

  • Details of Change Request.

  • Resourcing.

  • Approval.

  • Post Implementation Checks.

All Change Requests must be approved by FEG Management (CEO, Managing Director, or Head of Technical Operations) or their designees before proceeding to development and/or deployment.

Feasibility Study

Once a request for a new system/application, feature set or tool is received, FEG will analyse the request and evaluate the operational impact. This may involve undertaking a risk assessment, following the Risk Management Policy. A feasibility study may also be undertaken with the assistance of the Principal Developers.  For completely new systems and applications, a business case may be requested as part of the feasibility study.

Based on the requirements analysis and feasibility study, if the requested system/application, feature set or tool is to proceed, a work estimate to implement it is prepared.

Estimate of Infrastructure Requirements

Along with estimating the effort and time required to implement the new system/application, feature set or tool, an estimate of any new hardware and/or software required for development and final deployment is prepared. These estimates are passed on to the business unit (e.g. Sales & Support, being the requestor) for final approval of costs that will be charged to the business unit before development proceeds.

Management Decision

After reviewing the business case for the new system/application, feature set or tool, FEG management (CEO, Managing Director, Head of Technical Operations or their designees) must decide whether the cost/benefits and company’s strategic direction warrant proceeding with development. FEG Management will classify and prioritise requests:

  • Requests approved for immediate implementation will proceed to the development and deployment phases immediately.

  • Requests approved for future implementation will be scheduled by FEG project management according to the company’s overall plan and resource availability.

  • Requests denied will not be scheduled, and the original requestor will be notified.

Requirements Analysis

During this phase, a detailed requirements analysis of the new system/application, feature set or tool is conducted and documented in the form of a requirements specification. Documents and activities for this phase include:

  • Obtaining copies of current documentation for the activity being analysed.

  • Interviewing personnel for major activities during this phase.

Requirements analysis will generally be completed at the feature set level, rather than the individual feature level.

Design

In this phase, various technology and project personnel from FEG collaborate to develop a detailed design of the various activities involved. The software development team charged with implementing the feature reviews the design and the final version is documented in the form of a design specification document. If the feature or tool is to be a part of an existing system/application or functionality, the existing design document may be modified instead of creating a new document. Test plans and procedures for system tests are also developed. Individual features may proceed from design to implementation even if other features in the same feature set do not yet have a finalised design.

Implementation

Once the design is finalised, the actual implementation of the system/application, feature or tool begins, with testing in a development environment. After all errors found during the testing stage are triaged, the application code is released to a staging (test) server for quality assurance. The Principal Developer for an individual feature may schedule an error for future remediation if they determine that the error has no security or operational impact. Individual features may proceed from implementation to quality assurance even if other features in the same feature set are not yet ready for QA.

Quality Assurance

After a new feature is deployed to a staging (test) server and integrated in the test environment, any necessary test data will also be created on the staging (test) server by the development team. The test environment is configured as a replica of the production environment or a specific client environment; however, there may be external interfaces which, at times, may not be duplicated, and approximations may be used. Testers then assess the new feature in this test environment. Test cases and scripts are written and documented as required. Any discrepancies are resolved with the development team and any other additional testing is conducted. Sales & Support may be involved at different levels in this phase of the project cycle, based on a mutual understanding of verification requirements. Test results are documented and reviewed by the Software Development team and S&S team for final approval. All open issues must be triaged and either fixed, closed without change, or scheduled for future work before a feature is approved for production release.

Release for Production

Once the system/application, feature set or tool is successful in the staging (test) environment, FEG approves the release for production. Feature sets are not approved for production until all constituent features have passed quality assurance. The FEG Operations team coordinates with the Development team to ensure a smooth deployment. After a new feature set is deployed to production, the S&S team is responsible for final acceptance testing.

Security Considerations

The following security controls should be applied to all new software developments or major application upgrades:

  • Validation of all input (to prevent cross-site scripting, injection flaws and malicious file execution) is conducted as needed (see the sketch after this list).

  • Validation of proper error handling is conducted as needed.

  • Validation of secure cryptographic storage is conducted as needed.

  • Validation of secure communications is conducted as needed.

  • Validation of proper Role Based Access Control (RBAC) is conducted as needed.

  • The development/staging (test) environments are separated from the production environments, with access control in place to enforce the division.

  • There is a separation of duties between personnel assigned to the development/test environments and those assigned to the production environments.

  • Unless live debugging is required of the system/application, no changes will be made directly to the production environment.

  • All incidents affecting production systems (including system malfunctions) will be recorded through the Incident Management system. See the separate Incident Management Policy.

  • Production data are not used for testing and development, or are sanitised before use. In the case where production data must be used on a staging server to verify proper operation, the minimum necessary amount of production data will be used and that server will be secured to production standards.

  • Test data and accounts are removed before a production system becomes active.

  • Custom application accounts, user IDs and/or passwords are removed before the system goes into production or is released to S&S and Customers.
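To make the first of these controls concrete, here is a minimal sketch of validating input, using a parameterised query against SQL injection and output escaping against cross-site scripting. It is written in Python purely for illustration; the function, table, and field names are hypothetical, not taken from any FEG product:

    # Hypothetical input-validation sketch (not FEG production code).
    import html
    import re

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def save_contact(cur, name: str, email: str) -> None:
        # Validate input before it touches storage or markup.
        if not EMAIL_RE.match(email):
            raise ValueError("invalid email address")
        if len(name) > 200:
            raise ValueError("name too long")
        # Parameterised query: the driver quotes values, preventing SQL injection.
        cur.execute("INSERT INTO contacts (name, email) VALUES (%s, %s)", (name, email))

    def render_contact(name: str) -> str:
        # Escape on output to prevent cross-site scripting.
        return "<li>" + html.escape(name) + "</li>"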

Software Development Procedure

The following workflow is recommended for all development procedures.

Appendix 2: Secure Software Development Life Cycle

Version 1.0

Status: Released

Last revision: 3 December 2016

DRI: Mike Gunderloy

Objective

This document describes the Secure Software Development Life Cycle (SSDLC) practices employed at Faria Education Group Ltd (FEG). The goal of these practices is to ensure that information security requirements are managed as part of a continuous development process alongside functional requirements.

Philosophy

At FEG, we feel very strongly that security is not an activity separated from other development, nor is it something to be “bolted on” when the rest of software development is finished. Rather, we expect and encourage our developers to consider secure development practices throughout the software development lifecycle (SDLC) so that our products are secure by default.

Phases

The SSDLC proceeds through six phases:

  1. Requirements gathering & risk analysis.

  2. Architecture & design.

  3. Coding.

  4. Testing.

  5. Software release.

  6. Ongoing operations.

Because the SSDLC is a continuous process, these phases will be executed multiple times as releases of our software product move from requirements gathering to operations. There will be times when more than one of these phases is being executed simultaneously, in regard to different requirements.

The activities for each phase are detailed below.

Requirements Gathering & Risk Analysis

High level information security risk analysis: FEG conducts periodic reviews of information security risks that affect the entire organization and its component activities and products. Identified threats and vulnerabilities are ranked on likelihood and impact, and mitigation plans developed based on overall risk.
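The policy does not prescribe a particular scoring formula. One common convention, shown here purely as an illustration with invented threats and scales, is to score likelihood and impact on a small numeric scale and rank risks by their product:

    # Illustrative risk ranking only; the threats and 1-5 scales are made up.
    # Assumed convention: risk score = likelihood x impact.
    threats = [
        ("stolen staff laptop",      3, 4),
        ("SQL injection in web app", 2, 5),
        ("data centre power outage", 1, 3),
    ]

    for name, likelihood, impact in sorted(
        threats, key=lambda t: t[1] * t[2], reverse=True
    ):
        # Highest-scoring risks get mitigation plans first.
        print(f"{name}: risk {likelihood * impact}")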

Legal and compliance requirements analysis: FEG maintains a matrix of legal and compliance requirements that are critical to our business. This includes standards such as ISO 27001 and PCI-DSS, as well as laws related to student records and privacy in multiple jurisdictions.

Secure Development Training and Mentoring: All developers at FEG undergo mandatory training in secure development techniques on an annual basis. Senior developers are expected to mentor other developers on secure software techniques. FEG supports professional development through conference attendance and self-study on an ongoing basis.

Risk Awareness: FEG Senior R&D personnel actively seek out current resources on web application and server security, and share information with the rest of the company. We strive to cultivate a culture of security-awareness.

Architecture & Design

Security Requirements: As part of our software design process, security requirements are explicitly maintained on the sprint plan for every FEG product. These requirements are prioritised along with functional requirements.

Threat Modeling: Individual product teams within FEG maintain their own models of potential threats to their applications and the data that they manage. In addition, senior R&D and Operations personnel maintain threat models for issues that impact multiple FEG products.

Technical Risk Assessment: Identified threats and vulnerabilities for each product and sprint are ranked on likelihood and impact, and mitigation plans developed based on overall risk.

Coding

Coding Guidelines: FEG developers are expected to be familiar with and follow secure coding guidelines, including the OWASP Secure Coding Practices and the Ruby on Rails Security Guide. These resources are made available to all FEG developers as part of the developer onboarding process.

Security Tests: All security requirements identified during the Architecture & Design phase will have corresponding unit and integration tests. We strive for 100% test coverage of all security-related code.
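As a sketch of what such a test might look like, here is a hypothetical example; the endpoint, client fixture, and helper method are illustrative stand-ins, not part of an actual FEG test suite:

    # Hypothetical security-requirement tests (all names are illustrative).
    # Requirement under test: student records must not be readable without login.
    def test_student_records_require_login(client):
        response = client.get("/students/42")      # no session cookie sent
        assert response.status_code in (302, 401)  # redirect to login, or denied

    def test_student_records_enforce_roles(client):
        client.log_in(role="parent")               # hypothetical test helper
        response = client.get("/admin/students")
        assert response.status_code == 403         # parents are not admins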

Peer Code Review: All new code is subject to peer code review before being deployed to non-development servers. We do not allow merging pull requests without an approved peer code review.

Automated Code Review: All new code is subject to automated code review via Code Climate. We do not allow merging pull requests without a successful review.

Testing

Internal Security Testing: We perform manual internal security testing as part of our acceptance tests for every product release. For security tests, we refer to the OWASP Testing Guide. FEG has a set of standard acceptance tests for each product that include checking security-related items.

External Pen Tests:  FEG contracts for external pen testing of all public-facing applications and servers on an annual basis. The results of these tests are shared with senior R&D and are used to update threat models, risk assessments, and security requirements.

Software Release

Final Security Review: Every FEG software release is subject to a final security review by both R&D and S&S. Any review participant is empowered to block a release until all security concerns are resolved.

Secure Deployment: All FEG software is deployed via secured connections. All deployments are done by specific FEG staff, and access to deploy is limited only to authorised staff. Source code is tagged so that we always know exactly which revision is deployed on which servers.

Secure Configuration: All FEG servers are configured by secure means, and all configuration information is maintained under change management. Configuration change requests are handled and documented by FEG operations staff.

Ongoing Operations

Network Configuration and Firewall: FEG maintains secure network configuration according to industry best practices. All access to FEG servers is limited to secure ports, audited, and only authorised for necessary job functions. FEG maintains a web application firewall for all applications.

Monitoring, Logging, and Alerting: FEG monitors all servers on a continuous basis for evidence of security incidents. All server access is logged for future inspection. Alerts are transmitted to key FEG personnel on a real-time basis.

Security Incident Process: Any known or suspected security incident is treated as the highest priority in FEG operations. We maintain an active list of staff to be notified 24x7 for any security incident, and all employees are trained on the use of this list. FEG maintains a library of incident post-mortem reports.

Appendix 3: Secure Software Development Life Cycle (OpenApply Team)

Version: 1.0

Status: Released

Last revision: 15 December 2016

DRI: Paul Nikitochkin/Mike Gunderloy

Objective

This document describes the Secure Software Development Life Cycle (SSDLC) practices employed in the OpenApply product team at Faria Education Group Ltd (FEG). The goal of these practices is to ensure that information security requirements are managed as part of a continuous development and deployment process alongside functional requirements.

Philosophy

At FEG, we feel very strongly that security is not an activity separated from other development, nor is it something to be “bolted on” when the rest of software development is finished. Rather, we expect and encourage our developers to consider secure development practices throughout the software development lifecycle (SDLC) so that our products are secure by default.

Phases

The SSDLC proceeds through six phases:

  1. Requirements gathering & risk analysis.

  2. Architecture & design.

  3. Coding.

  4. Testing.

  5. Software release.

  6. Ongoing operations.

Because the SSDLC is a continuous process, these phases will be executed multiple times as releases of our software product move from requirements gathering to operations. Indeed, there will be times when more than one of these phases is being executed simultaneously, in regard to different requirements.

The activities for each phase are detailed below.

Requirements Gathering & Risk Analysis

High level information security risk analysis: FEG conducts periodic reviews of information security risks that affect the entire organization and its component activities and products. Identified threats and vulnerabilities are ranked on likelihood and impact, and mitigation plans developed based on overall risk.

Legal and compliance requirements analysis: FEG maintains a matrix of legal and compliance requirements that are critical to our business. This includes standards such as ISO 27001 and PCI-DSS, as well as laws related to student records and privacy in multiple jurisdictions.

Secure Development Training and Mentoring: All developers at FEG undergo mandatory training in secure development techniques on an annual basis. Senior developers are expected to mentor other developers on secure software techniques. FEG supports professional development through conference attendance and self-study on an ongoing basis.

Risk Awareness: FEG Senior R&D personnel actively seek out current resources on web application and server security, and share information with the rest of the company. We strive to cultivate a culture of security-awareness.

Architecture & Design

Security Requirements: As part of our software design process, security requirements are explicitly maintained on the ongoing plan for OpenApply. These requirements are prioritised along with functional requirements.

Coding

Coding Guidelines: OpenApply developers are expected to be familiar with and follow secure coding guidelines, including the OWASP Secure Coding Practices and the Ruby on Rails Security Guide. These resources are made available to all OpenApply developers as part of the developer onboarding process.

Security Tests: All security requirements identified during the Architecture & Design phase will have corresponding unit and integration tests. We strive for 100% test coverage of all security-related code.

Peer Code Review: All new code is subject to peer code review before being deployed to non-development servers. We do not allow merging pull requests without an approved peer code review. We explicitly include finding and removing security issues within our peer code review standards.

Automated Static Code Analysis: All new code is subject to automated static code analysis and test coverage checking to help identify security issues.

Testing

Internal Security Testing: We perform manual internal security testing as part of our acceptance tests for every product release. For security tests, we refer to the OWASP Testing Guide. FEG has a set of standard acceptance tests for OpenApply that include checking security-related items.

External Pen Tests:  FEG contracts for external pen testing of all public-facing applications and servers (including OpenApply) on an annual basis. The results of these tests are shared with senior R&D and are used to update threat models, risk assessments, and security requirements.

Software Release

Continuous Deployment: OpenApply is continuously deployed to staging servers on the completion of each successful test pass.

Secure Deployment: OpenApply is deployed via secured connections. All deployments are done by specific FEG staff, and access to deploy code to production is limited only to authorised staff. Source code is tagged so that we always know exactly which revision is deployed on which servers.

Secure Configuration: All FEG servers are configured by secure means, and all configuration information is maintained under change management. Configuration change requests are handled and documented by FEG operations staff.

Ongoing Operations

Network Configuration and Firewall: FEG maintains secure network configuration according to industry best practices. All access to FEG servers is limited to secure ports, audited, and only authorised for necessary job functions. FEG maintains a web application firewall for all applications.

Monitoring, Logging, and Alerting: FEG monitors all servers on a continuous basis for evidence of security incidents. All server access is logged for future inspection. Alerts are transmitted to key FEG personnel on a real-time basis.

Security Incident Process: Any known or suspected security incident is treated as the highest priority in FEG operations. We maintain an active list of staff to be notified 24x7 for any security incident, and all employees are trained on the use of this list. FEG maintains a library of incident post-mortem reports.

Changelog

Date        Who                              What changed

02/03/17    Chris Ward and Mike Gunderloy    Overall consistency, spelling and grammar check.